NCERT Textbook: Protein Informatics and Cheminformatics | Biotechnology for Class 11 - NEET PDF Download

Protein Informatics 
and Cheminformatics
10.1 Protein informatics
10.2  Cheminformatics
10.1 Protein informatics
10.1.1 Introduction
Protein informatics is the collection and analysis of information 
about proteins using the techniques of information technology. It 
has been of tremendous help in identifying the geometrical location 
of the functional site, the biochemical function and the biological 
function of hypothetical proteins. In addition, it has led to 
the determination of the tertiary structures of many 
hypothetical proteins, whose molecular functions could not 
be understood using conventional methods. Heterogeneous 
databases and various descriptors of amino acid sequences, 
tertiary structures and pathways on the proteome scale 
have also been of help in developing protein informatics.
Chapter 10
Chapter 10.indd   256 09/01/2025   15:18:32
Reprint 2025-26
10.1.2 Protein data types
Extracting information computationally requires raw 
protein data. These data can be of the 
following types:
(i) Microscopic image of heat-denatured protein 
aggregate
(ii) Protein in solution form
(iii) Protein sequence as output of Matrix Assisted Laser 
Desorption Ionisation (MALDI)
(iv) Assembled protein sequence
(v) Protein crystal structure in Protein Data Bank (PDB) 
format
(vi) Protein-protein, protein-ligand or protein-nucleotide 
interaction file
(vii) Nuclear Magnetic Resonance (NMR) data and Mass 
Spectrometry (MS) data
(viii) Protein sequences derived directly from genomic 
sequences, for which there is no known evidence 
of existence (hypothetical proteins)
The above-mentioned types of protein data can be used to 
obtain useful information such as:
(i) Multi-fractal property of microscopic image of heat-
denatured protein aggregate is used for designing 
protein-marker.
(ii) Protein data in solution are useful for analysing 
physico-chemical properties and kinetics 
information.
(iii) Fragmented short sequences of proteins from MALDI 
are used to find out the full length sequence.
(iv) Protein crystal structures are used to study 
mutations and interactions.
(v) PDB, NMR and MS data are also used for the 
prediction of structure of non-crystallised protein 
(directly from the sequence).
(vi) Proteins with no known evidence of 
existence (hypothetical proteins) can be 
identified from genomic sequences.
(vii) Network mapping of proteins provides information 
about possible treatment targets for different 
diseases.
In order to carry out the protein informatics analysis, 
the following two basic facilities are required: 
(i) Availability of the raw data from various databases, 
such as NCBI, PDB, ChEMBL, BioModels, etc.
(ii) Informatics tools and techniques used for the 
analyses. Some of the well known techniques are: 
(a) image analysis by the wavelet techniques, (b) 
sequence similarity and homology calculations, (c) 
structure optimisation techniques, (d) data analysis 
by statistical and machine learning techniques such as 
Artificial Neural Network (ANN), Support Vector 
Machine (SVM) and Hidden Markov Model (HMM), 
(e) Network Mapping Technique, and (f) Systems 
Biology Mark-up Language (SBML).
10.1.3 Computational prediction of protein 
structures
Protein structure prediction using bioinformatics tools 
aims to explore how amino acid sequences specify 
the structure of proteins and how these proteins bind to 
substrates and other molecules to perform their functions. 
The structure of a protein (including that of a hypothetical 
protein) can be predicted with bioinformatics tools 
even when only the gene sequence is known, i.e., in 
the absence of the protein sequence. Many computational tools 
are available from different sources for predicting the 
structural and physico-chemical properties of proteins. 
The major advantages of computational methods are the 
short time frame involved, the low cost and the feasibility of 
high-throughput screening.
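When only the gene sequence is known, the first computational step is usually to translate the coding sequence into an amino acid sequence. The following is a minimal Python sketch of that step, assuming the standard genetic code, a coding sequence that starts in reading frame 1 and contains only the letters A, C, G and T (or U). The function name translate is chosen here for illustration and is not part of any particular tool.

```python
from itertools import product

# Standard genetic code, written compactly with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence (frame 1) up to the first stop codon."""
    cds = cds.upper().replace("U", "T")  # accept RNA as well as DNA
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]  # KeyError for non-ACGT letters
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCAAGTGGTAA"))  # prints MAKW
```

The resulting one-letter sequence is the input that the primary, secondary and 3D structure prediction tools described in the following sections work with.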
10.1.3.1 Primary structure prediction
Protein primary structure prediction involves physico-chemical 
characterisation, such as the isoelectric point, 
extinction coefficient, instability index, aliphatic index and 
grand average hydropathy. All of these can be calculated with 
the ProtParam tool of the ExPASy Proteomics Server. 
Some of these physico-chemical properties are 
described in brief below.
Isoelectric point— Isoelectric point (pI) is the pH at 
which the surface of protein is covered with charge but 
net charge of protein is zero. At pI, proteins are stable and 
compact. If the computed pI is less than 7 (pI < 7), the 
protein is considered acidic.
A pI greater than 7 (pI > 7) indicates that the protein is basic 
in character. The computed isoelectric point (pI) is 
useful for developing the buffer system for purification by 
the isoelectric focusing method.
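The idea of a pH at which the net charge is zero can be illustrated with a short Python sketch. It estimates the net charge from the Henderson-Hasselbalch equation and searches for the pH where that charge crosses zero. The pKa values below are approximate, illustrative values and are an assumption of this sketch; ProtParam and other tools use their own pKa tables, so their results may differ slightly. The function names are also chosen here for illustration.

```python
# Approximate pKa values (illustrative assumption, not a tool's own table).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA, C_TERM_PKA = 8.6, 3.6


def net_charge(seq: str, ph: float) -> float:
    """Estimated net charge of the sequence at a given pH."""
    seq = seq.upper()
    # Free N-terminus (positive when protonated) and C-terminus (negative).
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))
    for aa, pka in POSITIVE_PKA.items():
        charge += seq.count(aa) / (1.0 + 10 ** (ph - pka))
    for aa, pka in NEGATIVE_PKA.items():
        charge -= seq.count(aa) / (1.0 + 10 ** (pka - ph))
    return charge


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Find the pH of zero net charge by bisection between pH 0 and 14."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid  # still positively charged: pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2.0


for peptide in ("DDDDEE", "KKKKRR"):
    pi = isoelectric_point(peptide)
    print(peptide, round(pi, 2), "acidic" if pi < 7 else "basic")
```

Bisection works here because the estimated net charge only decreases as the pH rises, so there is a single crossing point for typical sequences.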
The aliphatic index— The aliphatic index (AI), which 
is defined as the relative volume of a protein occupied 
by aliphatic side chains (A, V, I and L), is regarded as a 
positive factor for the increase of thermal stability of 
globular proteins. A very high aliphatic index 
indicates that the protein may be stable over a wide 
temperature range.
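As a rough illustration, the aliphatic index can be computed from the mole percentages of the four aliphatic residues. The sketch below uses the commonly cited formula AI = X(Ala) + 2.9 X(Val) + 3.9 [X(Ile) + X(Leu)], where the coefficients 2.9 and 3.9 are the relative volumes of the valine and isoleucine/leucine side chains compared with alanine. The formula and the function name are not given in the text above and are supplied here as an assumption about how tools such as ProtParam calculate the value.

```python
def aliphatic_index(seq: str) -> float:
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    # Mole percent (100 x mole fraction) of each aliphatic residue.
    x = {aa: 100.0 * seq.count(aa) / len(seq) for aa in "AVIL"}
    return x["A"] + 2.9 * x["V"] + 3.9 * (x["I"] + x["L"])


print(aliphatic_index("AVILGGGG"))  # half of the residues are aliphatic
```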
The instability index—The instability index provides 
an estimate of the stability of a protein in a test tube. Certain 
dipeptides occur at significantly 
different frequencies in unstable proteins compared with 
stable ones. The method assigns each dipeptide a weight value 
of instability, and these weight values are used to 
compute the instability index. A protein whose instability 
index is smaller than 40 is predicted to be stable, whereas a value 
above 40 predicts that the protein may be unstable.
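The calculation can be written compactly as follows. This is the form in which the index is usually stated; the symbols are introduced here for illustration, with L the number of residues in the sequence and DIWV the instability weight value assigned to the dipeptide formed by residues i and i+1. The table of weight values itself is not reproduced here.

```latex
\mathrm{II} = \frac{10}{L} \sum_{i=1}^{L-1} \mathrm{DIWV}\left(x_i x_{i+1}\right)
```

The sum runs over all L-1 overlapping dipeptides, so a sequence rich in dipeptides with large weight values obtains a higher index.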
The Grand Average Hydropathy (GRAVY) value — The 
Grand Average Hydropathy (GRAVY) value for a peptide or 
protein is calculated as the sum of hydropathy values of 
all the amino acids, divided by the number of residues in 
the sequence. A low (more negative) GRAVY value indicates the 
possibility of better interaction with water.
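The calculation described above can be sketched in a few lines of Python. The hydropathy values used here are those of the Kyte-Doolittle scale, which is the scale commonly used for GRAVY; the function name gravy is chosen for illustration.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Sum of hydropathy values divided by the number of residues."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


print(gravy("RKDE") < 0)  # charged residues give a negative (hydrophilic) value
print(gravy("ILVF") > 0)  # hydrophobic residues give a positive value
```

A non-standard letter raises a KeyError in this sketch; a real tool would decide how to handle ambiguous residues.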
10.1.3.2 Secondary Structure Prediction
Protein secondary structure has been studied 
intensely, since it helps to reveal the functions of 
proteins with unknown structures. In addition, 
the prediction of protein secondary structure is 
a step towards protein 3-dimensional structure prediction. 
APSSP, CFSSP, SOPMA, and GOR are common protein 
secondary structure prediction tools.
10.1.3.3 Three dimensional (3D) Structure 
Prediction
The following three computational methods are commonly 
used to predict protein 3D structure.
Homology modelling—For homology modelling, 
the amino acid sequence of a protein with unknown 
structure is aligned against sequences of proteins 
with known structures. High degrees of homology (very 
similar sequences across and between the proteins) can 
be used to determine the global structure of the protein 
with unknown structure and place it into a certain fold 
category. Lower degrees of homology may still be used 
to determine the local structures, an example being the 
Chou-Fasman method for predicting secondary structure. 
An advantage of homology modelling methods is lack of 
dependence on the knowledge of physical determinants. 
MODELLER and SWISS-MODEL are commonly used tools 
for homology modelling.
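The alignment step that homology modelling starts from can be illustrated with a simple global alignment. The sketch below is a basic Needleman-Wunsch alignment with a fixed score for matches, mismatches and gaps, followed by a percent identity calculation. It is only an illustration under these assumptions: the scoring values and function names are chosen here, and tools such as MODELLER and SWISS-MODEL use more elaborate methods (for example substitution matrices and template searches).

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences; returns the two aligned strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a, aln_b):
    """Identical positions as a percentage of the alignment length."""
    matches = sum(x == y and x != "-" for x, y in zip(aln_a, aln_b))
    return 100.0 * matches / len(aln_a)


target, template = needleman_wunsch("ACDEFG", "ACDFG")
print(target)    # ACDEFG
print(template)  # ACD-FG
print(round(percent_identity(target, template), 1))  # 83.3
```

Percent identity is computed here over the full alignment length, which is one of several conventions in use; the higher the identity to a protein of known structure, the more reliable a homology model is expected to be.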
Fold prediction—With the method called ‘threading’, 
the sequence of a protein with unknown structure is 
forced to take the conformation of the backbone (the main 
chain, not the side chains) of a protein with known structure. These 
methods tend to be more compute-intensive than homology 
modelling methods, but they give more confidence in the 
physical viability of the results. LIBELLULA and Threader 
are commonly used tools for this method.
De novo protein structure prediction: It is an 
algorithmic process by which protein tertiary structure is 
predicted from its amino acid primary sequence. QUARK 
is a computer algorithm for ab initio protein structure 
prediction and protein peptide folding, which aims to 
construct the correct protein 3D model from amino acid 
sequence only. 
The computationally elucidated structure of a protein 
is recorded as atomic coordinates in Protein Data Bank 
files. The three-dimensional coordinates are stored in a 
text file, the PDB file, with the file extension .pdb, 
in the Protein Data Bank (PDB) database. The database contains data 
from X-ray crystallography, NMR and a few theoretical 
structure models. 
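Because a PDB file is plain text with fixed-width columns, the atomic coordinates can be read with simple string slicing. The Python sketch below parses one ATOM record using the column positions of the PDB format. The sample record is made up for illustration and is not taken from a real entry, and the function name parse_atom_line is hypothetical.

```python
# A made-up ATOM record, assembled field by field to show the fixed columns.
SAMPLE = (
    "ATOM  "        # columns 1-6: record name
    + "    1"       # columns 7-11: atom serial number
    + " "
    + " N  "        # columns 13-16: atom name
    + " "           # column 17: alternate location
    + "MET"         # columns 18-20: residue name
    + " "
    + "A"           # column 22: chain identifier
    + "   1"        # columns 23-26: residue sequence number
    + "    "        # column 27 insertion code, then 3 blanks
    + "  27.340"    # columns 31-38: x coordinate
    + "  24.430"    # columns 39-46: y coordinate
    + "   2.614"    # columns 47-54: z coordinate
    + "  1.00"      # columns 55-60: occupancy
    + "  9.67"      # columns 61-66: temperature factor
    + " " * 10
    + " N"          # columns 77-78: element symbol
)


def parse_atom_line(line: str) -> dict:
    """Extract the main fields of an ATOM or HETATM record."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


atom = parse_atom_line(SAMPLE)
print(atom["name"], atom["res_name"], atom["chain"], atom["x"], atom["y"], atom["z"])
```

Slicing by column rather than splitting on whitespace matters because adjacent fields in a PDB record are not always separated by a space.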
Domain prediction: Domains are distinct functional 
and/or structural units of a protein. Each is an independently folding 
unit of the polypeptide chain that also carries a specific function. 
Domains are often identified as recurring (sequence or 
structure) units, which may exist in various contexts. 
They provide the most valuable information for the 
prediction of protein structure, function, evolution and 
design. The most common tools for domain prediction are 
InterProScan of EMBL-EBI and CDD search of NCBI.

FAQs on NCERT Textbook: Protein Informatics and Cheminformatics - Biotechnology for Class 11 - NEET

1. What is the significance of protein informatics in biological research?
Ans. Protein informatics is crucial in biological research as it helps in understanding the structure, function, and interactions of proteins. It enables researchers to analyze large datasets, predict protein structures, and identify potential drug targets, thereby accelerating discoveries in fields such as genomics and proteomics.
2. How does cheminformatics contribute to drug discovery?
Ans. Cheminformatics plays a vital role in drug discovery by providing tools for the analysis and visualization of chemical data. It aids in the identification of lead compounds, optimization of chemical structures, and prediction of biological activities, ultimately streamlining the drug development process and reducing costs.
3. What are the key tools used in protein informatics?
Ans. Key tools in protein informatics include databases like UniProt and PDB for protein sequences and structures, software for molecular modeling (e.g., PyMOL, Chimera), and algorithms for predicting protein-protein interactions and folding patterns. These tools facilitate in-depth analyses of protein functions and interactions.
4. What is the relationship between protein informatics and cheminformatics?
Ans. The relationship between protein informatics and cheminformatics lies in their shared goal of understanding biological systems at the molecular level. While protein informatics focuses on protein data, cheminformatics deals with chemical data. Together, they enhance the understanding of drug interactions with proteins and support the design of new therapeutics.
5. What are the challenges faced in protein and cheminformatics?
Ans. Challenges in protein and cheminformatics include data quality and standardization, the complexity of biological systems, and the integration of diverse datasets. Additionally, the rapid growth of data necessitates the development of efficient algorithms and tools to analyze and interpret this information effectively.