NCBI: Data Submission
Institute of Life Long Learning
Subject: Bioinformatics
Lesson: NCBI: Data Submission
Lesson Developer: Sandip Das
College/Department: Department of Botany, University of Delhi
Table of Contents
• Chapter: NCBI: Data Submission
    o Introduction
    o NCBI
        - Roles and Goals of NCBI
        - Resources at NCBI
    o Data Submission at NCBI
        - Submission of sequence data
            BankIt
            Sequin
            Barcode
            Batch Submission
            Genome Sequence data
        - Submission of non-sequence data
            GEO
            dbGaP
            PubChem
    o Growth of NCBI
• Summary
• Exercises
• Glossary
• References
Introduction
The National Center for Biotechnology Information (NCBI) was established in 1988 at the
National Institutes of Health (NIH) as part of the National Library of Medicine (NLM) and is
located in Bethesda, Maryland, USA. This association of NCBI with NIH and NLM is reflected in
its web address (www.ncbi.nlm.nih.gov). NCBI was set up to collate information, create
databases, conduct research in molecular biology (especially on biomedical data), and
develop computational tools. Since then, its databases and computational tools have
expanded to cover diverse organisms, including plants, so that they now encompass data not
only from the biomedical field but also from agriculture, food and other plant-derived
resources. NCBI has emerged as a primary source of free public-access data spanning a wide
range of disciplines: literature, sequence information, expression profiles, protein
sequence and structure, chemical structures and bioassays, and taxonomy. In addition, NCBI
has developed a variety of analysis tools that are freely available for download and use.
Figure: Webpage of NCBI
Source: http://www.ncbi.nlm.nih.gov
Roles and Goals of NCBI
The diverse activities of NCBI can be broadly categorized into:
a. conducting research at the molecular level on fundamental problems in biology, using
mathematical and computational tools
b. formulating uniform standards for the generation and deposition of computational data
and for the nomenclature and annotation of biological material and information, and
facilitating the exchange of such standards
c. developing and distributing databases and software
d. developing and maintaining collaborations with academia, industry and other
governmental agencies at the national and international level through a visitors program
e. fostering scientific communication by sponsoring and organizing meetings,
workshops and lectures
f. supporting training programs on basic and applied aspects of computational biology
Resources at NCBI
The resources at NCBI are organized into major groups. The following are some of the
broad sets of databases and tools developed, curated and hosted at NCBI:
Submissions:
GenBank: BankIt
GenBank: Barcode
GenBank: Sequin
GEO Web deposit
NIH Manuscript Submission (NIHMS)
SNP submission
PubChem Deposition Gateway
BioProject submission
Databases:
Literature (PubMed, PubMed Central, NCBI Bookshelf)
Entrez and the Entrez Programming Utilities
DNA and RNA (RefSeq, Nucleotide, EST, GSS, WGS, PopSet, Trace Archive, SRA)
Proteins (Reference sequences, GenPept, UniProt/SwissProt, PRF, PDB, Protein
Clusters, GEO, Structure, UniGene, CDD)
Genomes (Map Viewer, Genome Workbench, Plant Genome Central, Genome
Reference Consortium, Epigenomics, Genomic Structural Variation)
Maps
Taxonomy
PubChem Substance
OMIM
Tools:
Data mining
Sequence analysis (VecScreen, BLAST, CDART)
Electronic PCR (forward and reverse)
GEO BLAST
Genetic codes
ORF Finder
Splign
3-D structure viewer (Cn3D)
3-D structure and similarity searching
1000 Genomes Browser
Others:
FTP download sites
Collaborative cancer research
Entrez is the single-point database search and retrieval system of NCBI. It allows a user
to search and retrieve records from either "all" databases or one "specific" database, and
the records it returns are interlinked across databases.
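The search-then-retrieve pattern described above can also be driven programmatically through the Entrez Programming Utilities listed among the resources. The following is a minimal sketch, not an official client: it only builds the request URLs for the ESearch and EFetch utilities and makes no network call. The function names, the default database and the record IDs in the usage lines are hypothetical choices made for illustration.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities, the programmatic interface to Entrez.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term, db="nucleotide", retmax=5):
    """Return an ESearch URL that searches one Entrez database for `term`.

    `db` selects the "specific" database; `retmax` caps the number of IDs returned.
    """
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def build_efetch_url(ids, db="nucleotide", rettype="fasta"):
    """Return an EFetch URL that retrieves the records with the given IDs."""
    params = {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": "text"}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Step 1: search a specific database (the query string is only an example).
    print(build_esearch_url("Arabidopsis thaliana[Organism] AND rbcL"))
    # Step 2: fetch records by ID (these IDs are placeholders, not real records).
    print(build_efetch_url(["12345", "67890"]))
```

Keeping URL construction separate from the actual HTTP request makes the two Entrez steps (search, then retrieve) easy to inspect before any request is sent to NCBI.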
Data Submission at NCBI
NCBI relies on the research community to submit accurately annotated and curated data.
The data can be grouped into two major types: sequence and non-sequence. Because the data
hosted at NCBI are so diverse, each dataset must be deposited into the appropriate database,
in the required format and with annotations. The following section introduces the several
forms of biological data and the corresponding submission gateways at NCBI.
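The two-way grouping above can be pictured as a simple lookup from data type to submission gateway. The sketch below is illustrative only: the gateway names are those listed in this lesson's table of contents, while the dictionary layout and the helper function `data_type_for` are assumptions introduced here and are not part of any NCBI software.

```python
# Submission gateways grouped by data type, as named in this lesson's outline.
SUBMISSION_GATEWAYS = {
    "sequence": ["BankIt", "Sequin", "Barcode", "Batch Submission", "Genome Sequence data"],
    "non-sequence": ["GEO", "dbGaP", "PubChem"],
}


def data_type_for(gateway):
    """Return "sequence" or "non-sequence" for a gateway name, or None if unknown.

    The comparison ignores letter case, so "geo" and "GEO" are treated alike.
    """
    wanted = gateway.lower()
    for data_type, gateways in SUBMISSION_GATEWAYS.items():
        if wanted in (name.lower() for name in gateways):
            return data_type
    return None


if __name__ == "__main__":
    for name in ("BankIt", "GEO", "PubChem"):
        print(name, "->", data_type_for(name))
```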
Submission of sequence data: