Quantitative Biology
and Bioinformatics
Unit IV
Chapter 9
Introduction to Bioinformatics
Chapter 10
Protein Informatics and
Cheminformatics
Chapter 11
Programming and Systems
Biology
Recent advances in science have resulted in the generation of enormous amounts of data from genome sequencing and functional genomics. Handling such vast data from diverse sources is beyond the capacity of manual human analysis. This has given rise to a whole new field of science called bioinformatics. Chapter 9 describes the basic terminology and concepts of bioinformatics. Chapter 10 discusses how raw biological data can be used to retrieve important information about proteins of interest. Chapter 11 deals with the retrieval, filtering and manipulation of biological data with the help of programming languages.
Bioinformatics
Chapter 9.indd 233 09/01/2025 15:18:55
Reprint 2025-26
Margaret Oakley Dayhoff
(1925-1983)
Margaret Oakley Dayhoff (1925-1983) was an American physical chemist and one of the most important figures in the field of bioinformatics. She received her doctoral degree from the Department of Chemistry at Columbia University and dedicated her entire career to applying mathematics and computational methods to biochemistry. In 1965, she published the Atlas of Protein Sequence and Structure, a comprehensive, open-source collection of protein sequences. It subsequently became a model for the sequence databases that were developed later. She also developed the one-letter codes for amino acids in an attempt to reduce the size of data files in computer applications.
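To see why one-letter codes shrink data files, the short sketch below converts a peptide written in three-letter codes into one-letter codes and compares the lengths. It is a minimal Python illustration assuming the standard IUPAC code table for the 20 common amino acids; the function name `to_one_letter` and the example peptide are hypothetical, not taken from Dayhoff's own work.

```python
# Standard IUPAC three-letter to one-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter_seq: str) -> str:
    """Convert concatenated three-letter codes (e.g. 'MetLysTrp')
    into one-letter codes (e.g. 'MKW')."""
    if len(three_letter_seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    # Split the string into consecutive 3-character residue codes.
    residues = [three_letter_seq[i:i + 3] for i in range(0, len(three_letter_seq), 3)]
    return "".join(THREE_TO_ONE[r] for r in residues)


if __name__ == "__main__":
    peptide = "MetAlaLysTrpValGly"  # hypothetical example peptide
    short = to_one_letter(peptide)
    print(short)
    print(len(peptide), "->", len(short))
```

Each residue drops from three characters to one, so a sequence stored this way takes about a third of the space.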
Introduction to
Bioinformatics
9.1 The Utility of Basic Mathematical and Statistical Concepts to Understand Biological Systems and Processes
9.2 Introduction
9.3 Biological Databases
9.4 Genome Informatics
9.5 Role of Artificial Intelligence (AI) in Future
9.1 The Utility of Basic Mathematical and Statistical Concepts to Understand Biological Systems and Processes

The objective of this section is to explain why an understanding of basic mathematical and statistical concepts is important to a biologist.
The outcome of any biological experiment is data. Previously, biologists generated and analysed data without the help of sophisticated software, computational tools and statistical tests. This is no longer the case. With the advent of instruments such as high-throughput DNA sequencers, powerful microscopes and other imaging systems, and analytical instruments capable of generating large volumes of data, biologists can no longer handle their data using notebooks and Excel spreadsheets. Instead, they need computational and statistical tools. Large volumes of data often require quantitative analyses to interpret them and extract biological meaning. Performing such analyses requires one to have
a good working knowledge of computational and statistical concepts, for example, machine learning, regression, variance and correlation. Mathematical and statistical concepts can only aid biologists in interpreting their data; they are not a replacement for asking the right questions or for biological acumen. Some of the statistical terms commonly used in biology are listed in Box 1.
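As a small illustration of the variance concept mentioned above, the sketch below uses Python's standard `statistics` module to summarise the heart rates of the ten patients listed in Table 9.1. The variable names are illustrative; note that `statistics.variance` and `statistics.stdev` compute the sample versions, which divide by n - 1 rather than n.

```python
import statistics

# Heart rates (beats per minute) of the ten patients in Table 9.1.
heart_rates = [112, 83, 92, 121, 85, 111, 94, 88, 102, 111]

mean_hr = statistics.mean(heart_rates)       # arithmetic mean
var_hr = statistics.variance(heart_rates)    # sample variance (divides by n - 1)
sd_hr = statistics.stdev(heart_rates)        # sample standard deviation = sqrt(variance)

print(f"mean = {mean_hr:.1f}, variance = {var_hr:.2f}, SD = {sd_hr:.2f}")
```

Variance measures how widely the values spread around the mean; the standard deviation expresses the same spread in the original units (beats per minute).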
Let us examine, with specific examples, how knowledge of both computing and statistics can help us understand biological phenomena better. Suppose we want to understand the association, if any, between blood pressure and heart rate in ten patients (Table 9.1). A simple visual inspection of the data (Fig. 9.1) is not sufficient to determine the relationship (correlation) between the two variables accurately. To quantify it, one can fit a regression line to the data. Correlation and regression are distinct, yet related. Correlation quantifies how strongly two variables are associated, whereas regression defines a statistical relationship between two or more variables in which a change in one variable is associated with a change in another. Therefore, in the example above, a simple linear regression will tell us whether there is a direct (linear) relationship between heart rate and blood pressure. The output of a linear regression analysis includes the R² value (coefficient of determination), a statistical measure of how close the data are to the fitted regression line. The R² value ranges from 0 (no correlation between the variables) to 1 (perfect correlation between the variables). As shown in Fig. 9.1, the R² value suggests that there is a good correlation between the two variables. Strictly, rejecting the null hypothesis rests on the p-value of the fitted slope falling below the chosen significance cutoff, since a high R² indicates a close fit but is not itself a significance test. In this case, the null hypothesis (that there is no relationship between heart rate and blood pressure) is rejected.

Box 1: Glossary of commonly used statistical terms in biology
Null hypothesis: A statement that there is no relationship between two measured phenomena.
Statistical significance: A result is statistically significant when it would be very unlikely to occur by chance if the null hypothesis were true.
p-value: The probability of obtaining results at least as extreme as those observed when the null hypothesis of a study question is true.
t-test: A statistical test that compares the means of two populations.
Multivariate analysis: A set of techniques used to analyse data that contain more than one variable.
Regression analysis: A technique to investigate the relationship between a dependent and an independent variable.
Multiple testing correction: A procedure that adjusts for the number of statistical tests performed, so as to keep the overall error rate at or below the user-specified p-value cutoff.
Analysis of Variance (ANOVA): A collection of statistical models used to analyse the differences among group means in a sample.
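The multiple testing correction in Box 1 can be illustrated with the Bonferroni method, which is one common (and conservative) choice among several; the text does not specify a particular correction. The sketch below assumes five hypothetical p-values, for example from testing five genes, and a cutoff of 0.05; the function name `bonferroni` is illustrative.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction: with m tests, each p-value is compared
    against alpha / m, which keeps the family-wise error rate (the chance
    of at least one false positive among all m tests) at or below alpha."""
    m = len(p_values)
    threshold = alpha / m
    # Bonferroni-adjusted p-values: multiply by m, capped at 1.
    adjusted = [min(p * m, 1.0) for p in p_values]
    significant = [p <= threshold for p in p_values]
    return adjusted, significant


# Hypothetical p-values from five independent tests.
p_vals = [0.001, 0.008, 0.02, 0.04, 0.30]
adj, sig = bonferroni(p_vals, alpha=0.05)
for p, a, s in zip(p_vals, adj, sig):
    print(f"p = {p:<6} adjusted = {a:.3f} significant = {s}")
```

Only the first two p-values fall below 0.05 / 5 = 0.01 and remain significant after correction, even though four of the five are below 0.05 on their own. Less conservative procedures, such as the Benjamini-Hochberg false discovery rate method, are often preferred when thousands of genes are tested at once.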
Table 9.1: Heart rate and blood pressure recorded in ten patients

Patient   Heart rate (beats/min)   Systolic blood pressure (mmHg)
1         112                      189
2         83                       140
3         92                       153
4         121                      192
5         85                       147
6         111                      178
7         94                       135
8         88                       143
9         102                      177
10        111                      189
Fig. 9.1: Correlation between the two variables with a simple linear regression line
[Figure: scatter plot with fitted regression line; axis label: Heart Rate; R² value shown on the plot]
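The regression shown in Fig. 9.1 can be reproduced from the values in Table 9.1. The sketch below is a minimal ordinary least-squares fit in plain Python with no external libraries; it also computes the t statistic for the null hypothesis that the slope is zero and compares it with the tabulated two-sided 5% critical value of Student's t for 8 degrees of freedom (2.306). The function name `linear_fit` is hypothetical. For these ten patients it gives R² of approximately 0.86 and t of approximately 6.9, well beyond the critical value, which is consistent with rejecting the null hypothesis.

```python
import math

# Data from Table 9.1: heart rate (x, beats/min) and systolic blood pressure (y, mmHg).
heart_rate = [112, 83, 92, 121, 85, 111, 94, 88, 102, 111]
systolic_bp = [189, 140, 153, 192, 147, 178, 135, 143, 177, 189]


def linear_fit(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.
    Returns (slope, intercept, r_squared, t_statistic)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and of cross-products.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)   # Pearson correlation coefficient
    r_squared = r ** 2               # for simple linear regression, R^2 equals r^2
    # t statistic for H0: slope = 0 (equivalently, correlation = 0), with n - 2 df.
    t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)
    return slope, intercept, r_squared, t_stat


slope, intercept, r2, t = linear_fit(heart_rate, systolic_bp)
T_CRITICAL = 2.306  # two-sided 5% critical value of Student's t with 8 df

print(f"BP = {intercept:.2f} + {slope:.3f} * HR")
print(f"R^2 = {r2:.3f}, t = {t:.2f}")
print("reject H0" if abs(t) > T_CRITICAL else "cannot reject H0")
```

A statistics package would normally also report the exact p-value; comparing against a tabulated critical value keeps this example free of dependencies.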