Biological Sequence Databases: Protein Information Resource (PIR)
Institute of Lifelong Learning, University of Delhi
Subject: Bioinformatics
Lesson: Biological Sequence Databases: Protein Information Resource (PIR)
Lesson Developer: Suman Sharma
College/Department: Department of Botany, Ramjas College, University of Delhi
Table of Contents
Chapter 1: Protein Information Resource (PIR)
• Introduction
• Features of PIR
    o Classification
    o Non-Redundancy
    o Standardized Annotation
    o Cross Reference
    o Comprehensiveness
    o Regular releases with free accessibility
    o Retrieval of information from the site
• Database organization and annotation
    o PIR-International sequences and auxiliary databases
• PIR Resources
    o Data Retrieval system
    o Databases in PIR
• Summary
• Exercises
• Glossary
• Suggested Reading
Introduction
The rapid increase in the number of genome sequencing projects has generated an enormous
amount of molecular data. Making full sense of these data requires computational tools that
can identify the structure, function and other biologically relevant features of the sequences.
The Protein Information Resource (PIR) was established to serve this purpose, providing the
scientific community with tools and resources for the storage and analysis of protein
sequences.
In 1984, the National Biomedical Research Foundation (NBRF) established PIR as a resource
for the identification and interpretation of protein sequence information
(http://www.nbrf.georgetown.edu/pir/find.html). The database grew out of the 'Atlas of
Protein Sequence and Structure', first published by Margaret O. Dayhoff in 1965. Four years
after PIR was founded, in 1988, NBRF joined the Munich Information Centre for Protein
Sequences (MIPS) and the Japan International Protein Information Database (JIPID) to form a
consortium called PIR-International, with four main aims:
(1) to create an organized, non-redundant and comprehensive protein database for studying
structural, functional and evolutionary relationships;
(2) to provide information on the biological origin of protein sequences;
(3) to make the database easily accessible in the public domain;
(4) to enable cross-referencing with other databases so that structural information on
biomolecules can be presented.
PIR is one of the best-established public-domain databases of annotated protein sequences.
Besides sequence similarity searches, the expanded PIR website supports text-based searches
of protein sequences with links to auxiliary databases, annotation-sorted searches, domain
searches, combined global and domain searches, and interactive text searches.
Figure: PIR Homepage
Source: http://pir.georgetown.edu/pirwww/
Features of PIR Database
Classification: On the basis of sequence similarity, PIR classifies sequences into families,
superfamilies and homology domains. The members of each family are organized and aligned,
so the database can easily be searched by the name of a gene family.
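To make similarity-based grouping concrete, here is a minimal Python sketch. It is not PIR's actual classification procedure: it assumes the sequences are already aligned to equal length, measures similarity as simple percent identity, and uses a hypothetical 50% threshold with single-linkage grouping, so two sequences land in the same family whenever a chain of sufficiently similar pairs connects them. The function names and toy sequences are illustrative only.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical residues in two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def group_into_families(seqs: dict[str, str], threshold: float = 50.0) -> list[set[str]]:
    """Single-linkage grouping using a union-find structure.

    The 50% default threshold is a hypothetical value chosen for illustration.
    """
    parent = {name: name for name in seqs}

    def find(name: str) -> str:
        # Walk up to the root of the group, halving the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    # Join the groups of every pair that meets the identity threshold.
    for a, b in combinations(seqs, 2):
        if percent_identity(seqs[a], seqs[b]) >= threshold:
            parent[find(a)] = find(b)

    families: dict[str, set[str]] = {}
    for name in seqs:
        families.setdefault(find(name), set()).add(name)
    return list(families.values())


# Toy, made-up sequences (not real PIR entries).
toy = {"A": "MKTAYIAK", "B": "MKTAYIAR", "C": "GGSWLLPQ", "D": "GGSWLLPE"}
print(sorted(sorted(f) for f in group_into_families(toy)))  # [['A', 'B'], ['C', 'D']]
```

Single linkage is used here only because it is the simplest rule to read; real family classification also weighs alignment quality, domain structure and curation.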
Cross Reference: Every PIR entry is cross-referenced to literature and molecular databases
such as MEDLINE, GenBank, EMBL, DDBJ, the Protein Data Bank and the Human Genome
Database, which makes information retrieval more efficient. Cross-referenced database
entries are presented as hypertext links.
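The Python sketch below shows one way (database, accession) cross-references could be rendered as hypertext links. The URL templates are hypothetical placeholders on example.org, not the real link formats of GenBank, PDB or MEDLINE, and the function name is illustrative; a cross-reference to a database with no known template is shown as plain text.

```python
from html import escape
from urllib.parse import quote

# Hypothetical URL templates for illustration only; they are NOT the real
# link formats of these databases.
LINK_TEMPLATES = {
    "GenBank": "https://example.org/genbank/{acc}",
    "PDB": "https://example.org/pdb/{acc}",
    "MEDLINE": "https://example.org/medline/{acc}",
}


def render_cross_references(xrefs: list[tuple[str, str]]) -> str:
    """Render (database, accession) pairs as an HTML list of hypertext links."""
    items = []
    for db, acc in xrefs:
        label = f"{escape(db)}: {escape(acc)}"
        template = LINK_TEMPLATES.get(db)
        if template is None:
            # No known link format: show the cross-reference as plain text.
            items.append(f"<li>{label}</li>")
        else:
            # URL-encode the accession, then HTML-escape the finished URL.
            url = escape(template.format(acc=quote(acc, safe="")), quote=True)
            items.append(f'<li><a href="{url}">{label}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


print(render_cross_references([("PDB", "TOY1"), ("EMBL", "TOY0001")]))
```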
Non-Redundancy: PIR is a non-redundant database: sequences from the same species that show
very high identity and similarity are merged into a single entry. Merging does not erase the
identity of each independently reported sequence; its differences from the canonical sequence
remain visible, so the reported sequence can be reconstructed on the PIR site.
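A minimal Python sketch of this merge-and-reconstruct idea follows. It assumes substitution-only differences between equal-length sequences from one species, takes the first reported sequence as the canonical one, and uses a hypothetical 95% identity cut-off; PIR's actual merging criteria and storage format may differ. Each merged report keeps only its list of (position, residue) differences, which is enough to rebuild it exactly. The class and function names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class MergedEntry:
    """Hypothetical non-redundant entry: one canonical sequence plus, for each
    independently reported sequence, the residues where it differs."""
    species: str
    canonical: str
    variants: dict[str, list[tuple[int, str]]] = field(default_factory=dict)

    def add_report(self, report_id: str, seq: str) -> None:
        # Assumes substitution-only differences (equal length, no gaps).
        if len(seq) != len(self.canonical):
            raise ValueError("this sketch only merges equal-length sequences")
        self.variants[report_id] = [
            (pos, res) for pos, (can, res) in enumerate(zip(self.canonical, seq)) if can != res
        ]

    def reconstruct(self, report_id: str) -> str:
        # Apply the stored substitutions to a copy of the canonical sequence.
        residues = list(self.canonical)
        for pos, res in self.variants[report_id]:
            residues[pos] = res
        return "".join(residues)


def identity(a: str, b: str) -> float:
    """Percent identity of two equal-length sequences (0 for empty input)."""
    if not a:
        return 0.0
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def merge_reports(species: str, reports: list[tuple[str, str]],
                  min_identity: float = 95.0) -> list[MergedEntry]:
    """Merge reports from one species into an existing entry when their identity
    to its canonical sequence reaches the hypothetical `min_identity` cut-off."""
    entries: list[MergedEntry] = []
    for report_id, seq in reports:
        for entry in entries:
            if len(seq) == len(entry.canonical) and identity(seq, entry.canonical) >= min_identity:
                entry.add_report(report_id, seq)
                break
        else:
            # No close match: this report becomes a new canonical entry.
            entry = MergedEntry(species, seq)
            entry.add_report(report_id, seq)
            entries.append(entry)
    return entries


reports = [
    ("R1", "MKTAYIAKQRQISFVKSHFSRQ"),
    ("R2", "MKTAYIAKQRQISFVKSHFSRE"),  # one substitution relative to R1
    ("R3", "GGSWLLPQGGSWLLPQGGSWLL"),  # unrelated toy sequence
]
entries = merge_reports("toy species", reports)
print(len(entries), entries[0].variants["R2"])  # 2 [(21, 'E')]
```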
Standardized Annotation: Unlike databases that keep entries as originally submitted, PIR
annotates its entries itself. Every entry carries complete citations, including article titles,
together with genetic information such as mapped genes and intron positions. Controlled,
standardized terminology and annotation are used throughout the database for high
consistency and accuracy.
Comprehensiveness: Together with the other databases it maintains, PIR offers one of the
most comprehensive repositories of protein sequences.
Regular releases and free accessibility: The database is updated and released quarterly,
and weekly updates can also be searched on the PIR website. Unlike some other databases,
PIR makes sequences publicly accessible as soon as the resource receives them.
Retrieval of information from the site: Data and knowledge can be retrieved by
superfamily, feature, author, keyword and sequence similarity. Multiple sequence alignments
and family classifications, connected by hypertext links, allow fast retrieval of information
on related sequences either within PIR or in other molecular databases.
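Field-based retrieval of the kind listed above can be sketched in Python as follows. The Entry record, its fields and the toy entries are hypothetical and do not reflect PIR's real entry format or query interface; the sketch returns the IDs of entries whose chosen fields contain every query string (case-insensitive), combining several criteria with AND.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical, simplified protein entry with a few searchable fields."""
    entry_id: str
    title: str
    superfamily: str
    authors: list[str]
    keywords: list[str]


SEARCHABLE_FIELDS = {"title", "superfamily", "authors", "keywords"}


def search(entries: list[Entry], **criteria: str) -> list[str]:
    """Return IDs of entries matching every field=query criterion.

    A criterion matches when the query occurs (case-insensitively) in the
    field value, or in any element of a list-valued field.
    """
    unknown = set(criteria) - SEARCHABLE_FIELDS
    if unknown:
        raise ValueError(f"unsupported search fields: {sorted(unknown)}")

    def matches(entry: Entry, field_name: str, query: str) -> bool:
        value = getattr(entry, field_name)
        values = value if isinstance(value, list) else [value]
        return any(query.lower() in v.lower() for v in values)

    return [
        e.entry_id
        for e in entries
        if all(matches(e, f, q) for f, q in criteria.items())
    ]


# Made-up toy entries for illustration only.
toy = [
    Entry("TOY1", "cytochrome c (toy)", "cytochrome c", ["Author, A."], ["electron transport", "heme"]),
    Entry("TOY2", "globin (toy)", "globin", ["Author, B."], ["oxygen transport", "heme"]),
    Entry("TOY3", "lysozyme (toy)", "lysozyme", ["Author, A."], ["hydrolase"]),
]
print(search(toy, keywords="heme"))                        # ['TOY1', 'TOY2']
print(search(toy, authors="author, a.", keywords="heme"))  # ['TOY1']
```

Substring matching keeps the sketch short; a production text search would typically use an index and exact or token-based field matching instead.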
Table: PIR website URLs
Tool                     URL
PIR Home Page            http://www.nbrf.georgetown.edu/pir/
MIPS Home Page           http://www.mips.biochem.mpg.de/
Text Search              http://www.nbrf.georgetown.edu/pir/find.html
Sequence Scan            http://www.nbrf.georgetown.edu/nbrf/scan.html
Sequence Search          http://www.nbrf.georgetown.edu/nbrf/search.html
Complete Genome          http://www.nbrf.georgetown.edu/pir/genome.html
PIR Alignment Search     http://www.nbrf.georgetown.edu/nbrf/getaln.html