Lecture 8 - Sequence Alignment | Bioinformatics - Botany

Sequence Alignment 
Institute of Lifelong Learning, University of Delhi 
 
 
 
 
 
Subject: Bioinformatics 
Lesson: Sequence Alignment 
Lesson Developer: Sandip Das 
College/ Department : Department of Botany, University of Delhi 
 
Table of Contents       
 
Chapter: Sequence Alignment 
• Introduction
• Principle of alignment
• Matrices for alignment
    o DNA matrices
    o Protein matrices
• Multiple Sequence Alignment
• Summary
• Exercise/ Practice
• Glossary
• References/ Bibliography/ Further Reading
 
 
 
 
 
 
 
 
 
Introduction 
One of the central themes in bioinformatics is the concept of “similarity” or “relatedness”,
which in turn is based on evolutionary relationship or ancestry. We use this concept in a
variety of applications such as
• Gene and genetic element finding
• Molecular evolution or phylogeny
• Comparative genomics
• Structure prediction through homology modeling, and several others
The principle on which all of these are based is sequence similarity, which can be deduced
via Sequence Alignment.
We can often deduce relationships among objects by identifying similar features or
characters. Alignment attempts to identify similarity between two or more sequences by
applying the same logic, except that the events that may have led to similarity or
dissimilarity (such as the types, frequency and occurrence of mutations) are also taken
into account.
Before we delve into the principles of sequence alignment, it is useful to refresh some
concepts of mutation and evolution and to keep them in mind while studying alignment.
a. Mutations occur at the level of DNA
b. Mutations survive, or are accepted, if they are not harmful (selectively
neutral) or confer some selective advantage on the organism and population. A
mutation that is harmful, and in the extreme case lethal, tends to be lost from
the population
c. Small mutations include single-base changes (transitions and transversions)
and insertions and deletions of bases
d. Transitions are encountered more frequently than transversions
e. Non-coding DNA can accumulate mutations at a higher rate than coding regions
(because changes in coding regions have consequences for the encoded proteins)
f. Due to the degeneracy of codons and wobble bases, not all mutations at the DNA
level have an impact at the protein level; those that do not are deemed silent
g. Any change in DNA sequence that does not alter the protein sequence is termed
synonymous; a change in DNA that leads to incorporation of a different amino
acid is termed non-synonymous
h. Within proteins, replacement of one amino acid by another is rarely observed
within domains or functional units
i. Amino acids with similar chemical or physical properties are more likely to
replace one another
j. The rate of evolution of DNA is higher than that of proteins; in other words,
protein sequences are more conserved than DNA sequences
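
Points c and d of the list above can be illustrated with a short Python sketch. It classifies each position of two equal-length DNA strings as identical, a transition (purine to purine, or pyrimidine to pyrimidine) or a transversion (purine to pyrimidine, or the reverse). The function names and the example sequences are hypothetical, chosen only for illustration; insertions and deletions are not handled here.

```python
# Purines and pyrimidines of DNA; a transition stays within one class,
# a transversion crosses between the two classes.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_substitution(ref, alt):
    """Return 'identical', 'transition' or 'transversion' for two DNA bases."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES:
        raise ValueError("only the bases A, C, G and T are supported")
    if ref == alt:
        return "identical"
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def count_substitutions(seq1, seq2):
    """Tally substitution types position by position (no gaps allowed)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    counts = {"identical": 0, "transition": 0, "transversion": 0}
    for a, b in zip(seq1, seq2):
        counts[classify_substitution(a, b)] += 1
    return counts


# Hypothetical example: A->G is a transition, G->T is a transversion.
print(count_substitutions("ACGT", "GCTT"))
```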
 
As alignment aims to find matches between similar residues, concepts of evolutionary 
biology are widely used. DNA sequences that shared a last common ancestor up to 600 
million years ago, and proteins that diverged up to a billion years ago, can be 
successfully aligned. 
 
Principle of Alignment:  
The course of evolution proceeds in small incremental stages, i.e. instead of large-scale 
disruptions that span entire genomes, evolution favours small variations spread throughout 
the genome. Of course, it is difficult to define the physical boundaries of what 
constitutes “large” or “small”! For the sake of simplicity, let us limit our definition of “small” 
to single bases or amino acids, and of “large” to several kilobases or even megabases. 
As the majority of changes are small, it is possible for us to detect similar 
regions within the genome through alignment. We also presume that regions that share 
considerable levels of similarity, as measured through alignment, have shared ancestry 
or a common evolutionary history. Such regions are termed homologous sequences. 
Homology can be further sub-divided into orthology, where the shared evolutionary history 
arises through speciation, and paralogy, where it arises through duplication. A note of 
caution: two sequences can also share high similarity without sharing recent ancestry. 
Such sequences are termed xenologs and are generally acquired through horizontal gene 
transfer.  
 
 
Figure: Homologs: Orthologs and Paralogs 
Source: Dr Sandeep Das  
An alignment attempts to create a matrix of rows and columns where each row denotes a 
sequence and each column is occupied either by similar characters drawn from each sequence 
or by a gap. Pairwise alignment attempts to align two sequences at a time, whereas 
multiple sequence alignment (MSA) attempts to align more than two sequences. If 
several sequences are derived from organisms with a common ancestry 
or evolutionary history, we expect these sequences to exhibit similarity without being 
exactly identical, i.e. we expect to find similar characters or residues and also some 
differences. The differences or dissimilarities encountered are a result of mutational events: 
the more time that has passed since the common ancestor, the more mutations have 
accumulated and the larger the number of dissimilar residues. The number of changes is 
therefore, to a first approximation, directly proportional to evolutionary time. 
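
The row-and-column view described above can be sketched in a few lines of Python. The example below takes sequences that are already aligned (gaps written as "-"), turns the rows into columns, and reports the percentage of identical residues for a pair. The function names, the example sequences and the choice to exclude gap-containing columns from the identity calculation are assumptions made for illustration; other conventions for percent identity exist.

```python
def alignment_columns(rows):
    """Turn aligned rows (equal-length strings, '-' for a gap) into columns."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return list(zip(*rows))


def percent_identity(row_a, row_b):
    """Percentage of identical residues over columns that contain no gap."""
    columns = alignment_columns([row_a, row_b])
    compared = [(x, y) for x, y in columns if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


# Hypothetical pre-aligned pair: one mismatch (G/C) and one gap column.
print(alignment_columns(["ACGT-A", "ACCTTA"]))
print(percent_identity("ACGT-A", "ACCTTA"))
```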
Alignment tools therefore try to generate the matrix such that the columns contain as many 
identical and/or similar residues as possible. It is worth pointing out that when a mutational 
event or events lead to deletion (or insertion) of nucleotides, “gaps” are introduced while 
performing the alignment to “mimic” the event and “achieve” an alignment with maximal identity. 
Sequence alignment is therefore a combination of correctly identifying and placing similar 
and dissimilar residues in columns. 
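
How gaps are placed to maximise matches can be illustrated with a minimal sketch of global pairwise alignment by dynamic programming, in the style of the Needleman-Wunsch algorithm. This is a sketch under stated assumptions, not the method of any particular tool: the function name global_align and the scores (match = 1, mismatch = -1, gap = -2) are arbitrary choices for illustration, and the substitution matrices and more refined gap penalties used by real tools are not modelled.

```python
def global_align(s1, s2, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences; return (aligned_s1, aligned_s2, score)."""
    n, m = len(s1), len(s2)

    def pair_score(a, b):
        # Assumed toy scoring: reward identity, penalise substitution.
        return match if a == b else mismatch

    # score[i][j] = best score for aligning s1[:i] with s2[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(s1[i - 1], s2[j - 1]),
                score[i - 1][j] + gap,   # gap in s2
                score[i][j - 1] + gap,   # gap in s1
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out1, out2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                score[i][j] == score[i - 1][j - 1] + pair_score(s1[i - 1], s2[j - 1])):
            out1.append(s1[i - 1])
            out2.append(s2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out1.append(s1[i - 1])
            out2.append("-")
            i -= 1
        else:
            out1.append("-")
            out2.append(s2[j - 1])
            j -= 1
    return "".join(reversed(out1)), "".join(reversed(out2)), score[n][m]


# Hypothetical example: the second sequence lacks the G, so a gap is introduced.
top, bottom, total = global_align("ACGT", "ACT")
print(top)
print(bottom)
print(total)
```

With these assumed scores a single gap costs more than a mismatch, so gaps appear only where they allow enough additional matches to pay for themselves; changing the gap penalty changes which alignment is preferred.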

FAQs on Lecture 8 - Sequence Alignment - Bioinformatics - Botany

1. What is sequence alignment in botany?
Ans. Sequence alignment in botany refers to the process of comparing and matching the genetic sequences of different plant species. It helps identify similarities and differences in the DNA or protein sequences, providing insights into evolutionary relationships and genetic variations among plants.
2. Why is sequence alignment important in botany research?
Ans. Sequence alignment plays a crucial role in botany research as it allows scientists to study the genetic makeup of plants. By aligning sequences, researchers can identify conserved regions, detect mutations, and infer evolutionary relationships among different plant species. This information is valuable for understanding plant diversity, developing new crop varieties, and studying plant adaptation to environmental changes.
3. What are the methods used for sequence alignment in botany?
Ans. In botany, various methods are used for sequence alignment, including pairwise alignment and multiple sequence alignment. Pairwise alignment compares two sequences at a time, while multiple sequence alignment involves aligning three or more sequences. Commonly used methods include the Needleman-Wunsch algorithm (global pairwise alignment), the Smith-Waterman algorithm (local pairwise alignment), and ClustalW (a multiple sequence alignment program).
4. How does sequence alignment help in identifying plant species?
Ans. Sequence alignment aids in identifying plant species by comparing their genetic sequences with known DNA or protein sequences in databases. By aligning the sequences, researchers can determine the degree of similarity between the unknown plant sequence and the reference sequences. This similarity analysis can provide valuable information about the taxonomic classification and evolutionary relationships of the plant species.
5. Can sequence alignment be used to study plant diseases?
Ans. Yes, sequence alignment is widely used in studying plant diseases. By aligning the genetic sequences of diseased plants with healthy plants or known pathogenic sequences, researchers can identify specific genetic variations or mutations associated with the disease. This information helps in understanding the molecular basis of plant diseases, developing diagnostic tools, and designing strategies for disease management in agriculture.