Introduction to Bioinformatics
Institute of Lifelong Learning, University of Delhi
Subject : Bioinformatics
Lesson : Introduction to Bioinformatics
Lesson Developer : Sandip Das
College/Department : Department of Botany, University of Delhi
Table of Contents
Chapter: Introduction to Bioinformatics
• Introduction
• Why Bioinformatics
• Databases and tools
• Bioinformatics databases and tools for DNA analysis
    o Annotation
    o Similarity searching
    o Molecular evolution
• Bioinformatics databases and tools for RNA analysis
    o Gene expression analysis
    o RNA structure prediction
• Bioinformatics databases and tools for protein analysis
    o Sequence analysis
    o Structure prediction of proteins
• Bioinformatics-based databases and tools for whole genome analysis
    o Comparative genomics and genome structure
    o Interactome and biological network tools
• Applications
• Summary
• Exercise/Practice
• Glossary
• References
Introduction
The biological sciences were traditionally concerned primarily with the observation and
descriptive study of organisms. Over time, this approach gave rise to several subject areas
that amassed large amounts of factual information on morphology, inheritance, anatomy,
taxonomy, life cycles, physiology, ecological and environmental relationships, and infectivity.
Gradually, the scientific community became curious about the “basis” of these
characteristic features of living organisms and of the variation that exists among them. This
shift in scientific paradigm prompted an in-depth study of the “molecular” basis of
life forms. From the identification of the genetic material (nucleic acids) to the sequencing
of the entire genomes of several organisms, biology has been substantially redefined. In
this endeavor, biologists benefited immensely from inputs from the physical, chemical and
mathematical sciences. Studying biological systems with a “Why is it so?” approach gave
birth to several new areas of research, viz. molecular genetics, genomics, proteomics,
recombinant DNA technology and transgenic technology. Extensive work in these areas on
different biological systems generated large volumes of data on linkage maps,
genomes, transcriptomes, proteomes and molecular structures, which became
impossible to analyze manually. The use of computational power to analyze biological data
came to be seen as unavoidable, leading to the birth of a new science called
“Bioinformatics”.
Why Bioinformatics
Imagine trying to solve a complex mathematical calculation, or to find a pattern in a
jumbled string of letters or numbers, all by yourself without the aid of any
computational device such as a calculator or a computer. Such a task may not only be
extremely time-consuming but may even turn out to be “unsolvable”. With a calculator
or a computer to help, however, the same task can be performed in a much shorter
time. Of course, you need to know how to operate the calculator or computer and the
sequence of commands to give the machine! Analogously, understanding the meaning of
just four letters of life, namely Adenine, Guanine, Cytosine and Thymine (Uracil in RNA),
as building blocks of life and a storehouse of information can prove daunting
unless we are able to decipher the hidden meaning for the
maintenance and functionality of the genome. A, G, C and T/U represent just one level of
information content. As we know from the central dogma of life, this sequence serves as the
blueprint, with the message being conveyed from the genetic material (DNA/RNA) to
messenger RNA and eventually to proteins. At the minimum level, therefore, four
nucleotides and twenty amino acids hold the entire key to life (and this is without even
considering the enormous variety of metabolites, biomolecules and other compounds that
play a major role in the functioning of life).
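The flow of information described above (DNA to messenger RNA to protein) can be illustrated with a minimal Python sketch. It assumes the input is the coding strand of the DNA, read in frame from its first base, and uses the standard genetic code; the function names `transcribe` and `translate` and the example sequence are hypothetical, chosen only for illustration.

```python
# Standard genetic code written compactly: codons are ordered with
# bases T, C, A, G at each of the three positions; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T replaced by U)."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        # The lookup table is keyed on DNA letters, so map U back to T.
        codon = mrna[i:i + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCCAAGTAA"          # hypothetical 12-base example
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna)     # AUGGCCAAGUAA
print(protein)  # MAK (one-letter amino acid codes)
```

Four letters in the DNA and twenty one-letter codes in the protein are all this sketch needs, which mirrors the point made in the paragraph above.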
Bioinformatics, therefore, attempts to unravel the information in the genome and can be
understood as comprising two components:
Biology (bio) + Information Technology (informatics) = Computational Biology
It can be summarized as the use of information technology to generate, acquire, manage
and analyze data related to the biological sciences.
Computers and the internet have played a major role and may be regarded as the backbone
on which the entire field of bioinformatics is flourishing.
Algorithms are well-defined sets of steps for the generation, storage and analysis of data;
they are implemented as specialized computer programs (software) written by specialists.
The need for high-speed processing of biological data was felt primarily on account of
the huge volume of sequencing data being generated. In a matter of about 10 years, the
cost of sequencing dropped from nearly US$5200.00 per megabase in September 2001 to
about US$0.09 (roughly 9 cents) per megabase in January 2012
(http://www.dnasequencing.org/history-of-dna).
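To put these two cost figures in perspective, the short Python sketch below works out the fold reduction and, under the simplifying assumption of a steady exponential decline, how often the cost would have halved. It reads the January 2012 figure as US$0.09 per megabase (an assumption about the unit); the variable names are illustrative only.

```python
import math

cost_2001 = 5200.00  # US$ per megabase, September 2001 (figure quoted in the text)
cost_2012 = 0.09     # US$ per megabase, January 2012 (assumed to be in dollars)

# How many times cheaper sequencing became over the period.
fold_reduction = cost_2001 / cost_2012

# Months elapsed from September 2001 to January 2012.
months = (2012 - 2001) * 12 + (1 - 9)

# If the decline were perfectly exponential (a simplification),
# the number of halvings is log2 of the fold reduction.
halvings = math.log2(fold_reduction)
months_per_halving = months / halvings

print(f"Fold reduction: about {fold_reduction:,.0f}x")
print(f"Elapsed time: {months} months")
print(f"Cost halved roughly every {months_per_halving:.1f} months")
```

Under these assumptions the cost per megabase halved in well under a year on average, far faster than a halving every two years would give, which is why the computational side has had to keep pace.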
Throughput has grown correspondingly: from a few hundred megabases/year with Sanger’s
dideoxy chain termination method of sequencing, we can today generate close to 6 billion bp
in two weeks using one of the Next Generation Sequencing machines
(http://www.dnasequencing.org/history-of-dna;
http://www.illumina.com/systems/hiseq_comparison.ilmn). The need for higher-performing
computational tools is therefore even greater!
Although bioinformatics is largely concerned with the analysis of biological data using
computational tools, it has rapidly emerged as a multidisciplinary science that touches
upon subject areas in all branches of science, including the physical sciences, chemical
sciences, mathematics and artificial intelligence.
Today, bioinformatics can be applied to the analysis of a variety of data, some of which
are listed below:
• DNA sequence:
    o Annotation
    o Analysis such as
        ▪ Similarity search
        ▪ Functional information
        ▪ Evolution
        ▪ Polymorphism
• RNA level:
    o Expression analysis using
        ▪ Microarray
        ▪ RNA sequencing
    o Structure prediction
• Protein level:
    o Domain and motif analysis
    o Structure determination
    o Evolution
    o Functional role
• Whole genome/cell/tissue/organism level:
    o Genome structure and comparative genomics
    o Interactome analysis
    o Metabolic pathways
• Drug design
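As a small taste of the simplest item in this list, similarity search, the Python sketch below compares two DNA sequences of equal length position by position and reports their percent identity. This is only a naive, ungapped comparison under stated assumptions; real similarity-search tools align sequences and allow for gaps. The function name `percent_identity` and both sequences are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this ungapped comparison needs equal-length sequences")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Compare the sequences position by position, ignoring letter case.
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# Two hypothetical 10-base sequences that differ at a single position.
reference = "ATGGCCAAGT"
variant = "ATGGCTAAGT"
print(percent_identity(reference, variant))  # 90.0
```

The single mismatching position is also a toy example of a polymorphism, another item in the list above.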