Databases in Bioinformatics
Institute of Lifelong Learning, University of Delhi
Subject : Bioinformatics
Lesson : Databases in Bioinformatics
Lesson Developer : Arun Jagannath
College/ Department : Department of Botany,
University of Delhi
Table of Contents
Chapter: Databases in Bioinformatics
• Introduction
• Biological databases
• Classification of databases
   o Type of data/information
   o Source of data/information
• Biological database retrieval systems – Case studies
   o Identification and classification of databases
   o Retrieval of nucleotide sequences
   o Bibliographic databases
   o Whole genome sequence databases
   o Organism-specific databases
   o Gene expression databases
   o Protein databases
• Summary
• Exercises
• Glossary
• References
Introduction
Living organisms have been studied at many levels, including structure
(morphology, anatomy), function (physiology, biochemistry), inheritance
(genetics), evolution and taxonomy. Over the last few decades, scientists have
also attempted to unravel the molecular basis of processes that are integral to
organism biology and diversity. These studies initially focused on relatively
simple organisms that came to be referred to as Model Organisms or Model
Systems. Such organisms span a wide range of life forms, from viruses and
bacteria to higher plants and animals. Notable examples include Drosophila,
C. elegans, Arabidopsis, mice, yeast and, more recently, Oryza sativa, Medicago
and Lotus. Molecular genetic studies on many of these life forms led to the
development of markers and linkage maps, which in turn facilitated whole-genome
sequencing programs to extract the encoded information (genome sequence) that
supports life. Subsequent analysis of gene function based on expression
profiling (transcriptome studies) and mutant analysis (functional genomics)
contributed further to our understanding of biological systems. Rapid
developments in sequencing chemistry ushered in an era of high-throughput
genome and transcriptome sequencing, which led to an explosion of biological
data across the world, extending well beyond the “model systems” originally
used for biological studies. Seminal developments in Bioinformatics centered
mainly on the development of Databases, which function as electronic filing
cabinets for the organization and analysis of the large amounts of biological
data generated by such studies.
Biological Databases
Biological databases serve a critical purpose in the collation and organization
of data related to biological systems. They provide computational support and a
user-friendly interface that allow a researcher to analyze biological data such
as gene and protein sequences and molecular structures. Computational tools and
techniques have also been used successfully for simulation studies on
biological macromolecules, their structures and interactions, molecular
modeling and drug design. These interdisciplinary areas have generated
significant amounts of data and are dealt with separately in later units of
this paper.
This lesson provides a brief overview of the different types/categories of
databases. It avoids detailed descriptions, which can be found in several
standard Bioinformatics textbooks or on the home pages of the databases
themselves. A few practice exercises on access and retrieval of information are
provided at the end of the lesson. Some of these exercises are supported with
step-by-step instructions for the benefit of beginners, while others are to be
completed by students on their own.
Questions:
How would I know whether a database relevant to my interest/study exists or
not?
How can I be assured of the authenticity of the information available in any
database?
Answer:
Every year, the journal Nucleic Acids Research (NAR) publishes in its January
issue a comprehensive compilation of peer-reviewed databases and online tools.
These issues can be accessed at http://nar.oxfordjournals.org/. The peer review
process helps ensure that the published databases and their contents are
accurate.
Classification of Biological Databases
As mentioned earlier, the quantum of biological information available and its rate
of increase have necessitated the creation of databases to collect and organize
the data in a meaningful form. In order to maintain quality, improve accessibility
of information and reduce redundancy, databases have been classified into
different types.
NOTE:
The mode of database classification might vary in published literature. It is more
important for a student/researcher to identify the information that he/she is
searching for and attempt to access it from a relevant database rather than dwell
upon its hierarchy.
Two main approaches have been used to classify databases:
Type of data/information
In this mode of classification, databases are categorized based on the data type.
A few examples are listed below.
S. No. | Type of data | Example(s) | Weblinks
1. | Sequence of biomolecules (DNA, RNA, proteins) | GenBank, EMBL, DDBJ, Swiss-Prot, PIR | (i) www.ncbi.nlm.nih.gov/genbank/ (ii) https://www.ebi.ac.uk/embl/ (iii) www.ddbj.nig.ac.jp/ (iv) http://web.expasy.org/docs/swiss-prot_guideline.html (v) http://pir.georgetown.edu/
2. | Bio-molecular structures | PDB | http://www.rcsb.org/pdb/home/home.do
3. | Bibliography/scientific literature ** | PubMed, Scopus (Search engine) | (i) www.ncbi.nlm.nih.gov/pubmed (ii) www.scopus.com
4. | Patent databases | USPTO | www.uspto.gov/
5. | Metabolic pathways / molecular interactions | KEGG | http://www.genome.jp/kegg/pathway.htm
6. | Gene expression profiles | eFP Browser | http://bar.utoronto.ca/efp/cgi-bin/efpWeb.cgi
7. | Genetic disorders | OMIM | www.ncbi.nlm.nih.gov/omim
8. | Whole genome sequences | Entrez Genomes | www.ncbi.nlm.nih.gov/sites/entrez?db=genome
9. | Education | Teaching tools – Plant Cell | http://www.plantcell.org/site/teachingtools/teaching.xhtml
**: Some bibliographic databases/search engines require a subscription to
access their contents. The Delhi University Library System has procured online
subscriptions to several national/international journals of repute and to
search engines, such as Scopus, that are relevant to different disciplines.
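The classification by data type in the table above can also be thought of as a simple lookup structure. The following is a minimal sketch in Python, assuming a hypothetical dictionary named CATALOG and two hypothetical helper functions, databases_for and data_type_of. None of these names come from the lesson or from any database's own software; the entries are copied from the table and are illustrative only.

```python
# Hypothetical catalog that mirrors the table above: data type -> example databases.
# The names CATALOG, databases_for and data_type_of are illustrative assumptions.
CATALOG = {
    "Sequence of biomolecules": ["GenBank", "EMBL", "DDBJ", "Swiss-Prot", "PIR"],
    "Bio-molecular structures": ["PDB"],
    "Bibliography/scientific literature": ["PubMed", "Scopus"],
    "Patent databases": ["USPTO"],
    "Metabolic pathways / molecular interactions": ["KEGG"],
    "Gene expression profiles": ["eFP Browser"],
    "Genetic disorders": ["OMIM"],
    "Whole genome sequences": ["Entrez Genomes"],
    "Education": ["Teaching tools - Plant Cell"],
}


def databases_for(data_type):
    """Return the example databases listed for a data type (case-insensitive)."""
    wanted = data_type.strip().lower()
    for key, names in CATALOG.items():
        if key.lower() == wanted:
            return list(names)  # return a copy so callers cannot alter the catalog
    return []


def data_type_of(database):
    """Return the data type under which a database is listed, or None if absent."""
    wanted = database.strip().lower()
    for key, names in CATALOG.items():
        if any(name.lower() == wanted for name in names):
            return key
    return None
```

For example, databases_for("Genetic disorders") returns the single entry OMIM, while data_type_of("KEGG") returns the pathway/interaction category. Starting from the kind of information needed and working towards a database reflects the advice in the NOTE above: identify the information first, then look for a relevant database.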
Question:
Is it necessary to remember the website addresses of databases?
Answer:
No. It is easier to access a database through its published reference or by
searching for its home page with a search engine such as Google.
Source of data/information