Short Notes: Parsing | Short Notes for Computer Science Engineering - Computer Science Engineering (CSE) PDF Download

Download, print and study this document offline
Please wait while the PDF view is loading
 Page 1


Syntax Analyzer (Parser)
Syntax analyzer creates the syntactic structure of the given source program. This 
syntactic structure is mostly a parse tree. The syntax of a programming is 
described by a Context-Free Grammar (CFG). We will use BNF (Backus-Naur Form) 
notation in the description of CFGs.
The syntax analyzer (parser) checks whether a given source program satisfies the 
rules implied by a context-free grammar or not. If it satisfies, the parser creates the 
parse tree of that program. Otherwise the parser gives the error messages.
What syntax analysis cannot do!
• To check whether variables are of types on which operations are allowed
• To check whether a variable has been declared before use
• To check whether a variable has been initialized
• These issues will be handled in semantic analysis
We categorise the parser into two groups
1. Top-down parser (starts from the root).
2. Bottom-up parser (starts from the leaf).
• Both top-down and bottom-up parsers scan the input from left-to-right (one 
symbol at a time).
• Efficient top-down and bottom-up parsers can be implemented only for 
subclasses of context-free grammars.
1. LL for top-down parsing
2. LR for bottom-up parsing)
Example: Consider the following grammar
Page 2


Syntax Analyzer (Parser)
Syntax analyzer creates the syntactic structure of the given source program. This 
syntactic structure is mostly a parse tree. The syntax of a programming is 
described by a Context-Free Grammar (CFG). We will use BNF (Backus-Naur Form) 
notation in the description of CFGs.
The syntax analyzer (parser) checks whether a given source program satisfies the 
rules implied by a context-free grammar or not. If it satisfies, the parser creates the 
parse tree of that program. Otherwise the parser gives the error messages.
What syntax analysis cannot do!
• To check whether variables are of types on which operations are allowed
• To check whether a variable has been declared before use
• To check whether a variable has been initialized
• These issues will be handled in semantic analysis
We categorise the parser into two groups
1. Top-down parser (starts from the root).
2. Bottom-up parser (starts from the leaf).
• Both top-down and bottom-up parsers scan the input from left-to-right (one 
symbol at a time).
• Efficient top-down and bottom-up parsers can be implemented only for 
subclasses of context-free grammars.
1. LL for top-down parsing
2. LR for bottom-up parsing)
Example: Consider the following grammar
E ::= E + T | E - T | T T ::= T * F | T / F | F F ::= num | id 
For the input string: id(x) + num(2) * id(y)
Analysis of the top-down parsing:
E => E + T
=> E + T * F 
=> T + T * F 
=> T + F * F 
=> T + num * F 
=> F + num * F 
=> id + num * F 
=> id + num * id
Top down parsing uses left most derivation to derive the string and uses 
substitutions during derivation process.
Analysis of the Bottom-up parsing:
id(x) + num(2) * id(y)
=> id(x) + num(2) * F 
=> id(x) + F * F 
=> id(x) + T * F 
=> id(x) + T 
=> F + T 
=> T + T 
=> E + T 
=> E
Bottom up parsing uses reverse of right most derivation to verify the string and 
uses reductions during the process.
Context-Free Grammars
Inherently recursive structures of a programming language are defined by a CFG. In 
a CFG, we have A start symbol (one of the non-terminals). A finite set of terminals 
(in our case, this will be the set of tokens). A set of non-terminals (syntactic 
variables).
A finite set of production rules in the following form
A — ? a, where A is non-terminal and a is a string of terminals (including the empty 
string).
Parse Trees
Graphical representation for a derivation that filters out the order of choosing non­
terminals to avoid rewriting. The root node represents the start symbol, inner nodes 
of a parse tree are non-terminal Symbol.
Ambiguity
A grammar produces more than one parse tree for a sentence is called as 
ambiguous grammar. Unambiguous grammar refers unique selection of the parse 
tree for a sentence.
Ambiguity elimination:
Page 3


Syntax Analyzer (Parser)
Syntax analyzer creates the syntactic structure of the given source program. This 
syntactic structure is mostly a parse tree. The syntax of a programming is 
described by a Context-Free Grammar (CFG). We will use BNF (Backus-Naur Form) 
notation in the description of CFGs.
The syntax analyzer (parser) checks whether a given source program satisfies the 
rules implied by a context-free grammar or not. If it satisfies, the parser creates the 
parse tree of that program. Otherwise the parser gives the error messages.
What syntax analysis cannot do!
• To check whether variables are of types on which operations are allowed
• To check whether a variable has been declared before use
• To check whether a variable has been initialized
• These issues will be handled in semantic analysis
We categorise the parser into two groups
1. Top-down parser (starts from the root).
2. Bottom-up parser (starts from the leaf).
• Both top-down and bottom-up parsers scan the input from left-to-right (one 
symbol at a time).
• Efficient top-down and bottom-up parsers can be implemented only for 
subclasses of context-free grammars.
1. LL for top-down parsing
2. LR for bottom-up parsing)
Example: Consider the following grammar
E ::= E + T | E - T | T T ::= T * F | T / F | F F ::= num | id 
For the input string: id(x) + num(2) * id(y)
Analysis of the top-down parsing:
E => E + T
=> E + T * F 
=> T + T * F 
=> T + F * F 
=> T + num * F 
=> F + num * F 
=> id + num * F 
=> id + num * id
Top down parsing uses left most derivation to derive the string and uses 
substitutions during derivation process.
Analysis of the Bottom-up parsing:
id(x) + num(2) * id(y)
=> id(x) + num(2) * F 
=> id(x) + F * F 
=> id(x) + T * F 
=> id(x) + T 
=> F + T 
=> T + T 
=> E + T 
=> E
Bottom up parsing uses reverse of right most derivation to verify the string and 
uses reductions during the process.
Context-Free Grammars
Inherently recursive structures of a programming language are defined by a CFG. In 
a CFG, we have A start symbol (one of the non-terminals). A finite set of terminals 
(in our case, this will be the set of tokens). A set of non-terminals (syntactic 
variables).
A finite set of production rules in the following form
A — ? a, where A is non-terminal and a is a string of terminals (including the empty 
string).
Parse Trees
Graphical representation for a derivation that filters out the order of choosing non­
terminals to avoid rewriting. The root node represents the start symbol, inner nodes 
of a parse tree are non-terminal Symbol.
Ambiguity
A grammar produces more than one parse tree for a sentence is called as 
ambiguous grammar. Unambiguous grammar refers unique selection of the parse 
tree for a sentence.
Ambiguity elimination:
. Ambiguity is problematic because meaning of the programs can be incorrect 
. Ambiguity can be handled in several ways 
Enforce associativity and precedence 
Rewrite the grammar (cleanest way)
. There are no general techniques for handling ambiguity
. It is impossible to convert automatically an ambiguous grammar to an 
unambiguous one 
Left Recursion
A grammar is left recursive, if it has a non-terminal A such that there is a 
derivation.
A = > Aa for some string a
The left-recursion may appear in a single step of the derivation (immediate left 
recursion) or may appear in more than one step of the derivation.
A top down parser with production A ^ A a may loop forever
From the grammar A — ? A a | b left recursion may be eliminated by transforming the
grammar to
A — ? b R
R -? a R | e
Left recursion is an issue of concern in top down parsers. A grammar is left- 
recursive if we can find some non-terminal A which will eventually derive a 
sentential form with itself as the left-symbol. In other words, a grammar is left 
recursive if it has a non-terminal A such that there is a derivation
A — ? + A a for some string a. These derivations may lead to an infinite loop.
Top-down parsing technique can't handle left recursive grammars. So, we have to 
convert our left recursive grammar into an equivalent grammar which is not left 
recursive.
Removal of left recursion 
In general
A — » Aai |Aa2|...|Aam 
I P i IP 2 I— I P n 
Transforms to 
A ^ p 1 A'|p2A'|....|pn A''
A ^ a 1 A|a2A|...|am A,|€
Left Factoring
A predictive parser (a top-down parser without backtracking) insists that the 
grammar must be left factored.
grammar — ? a new equivalent grammar suitable for predictive parsing, 
stmt -» if expr then stmt else stmt | if expr then stmt
When we see, if we can't know which production rule is to be chosen then rewrite 
stmt in the derivation,
In general,
A — » a3, | < x3 .
Page 4


Syntax Analyzer (Parser)
Syntax analyzer creates the syntactic structure of the given source program. This 
syntactic structure is mostly a parse tree. The syntax of a programming is 
described by a Context-Free Grammar (CFG). We will use BNF (Backus-Naur Form) 
notation in the description of CFGs.
The syntax analyzer (parser) checks whether a given source program satisfies the 
rules implied by a context-free grammar or not. If it satisfies, the parser creates the 
parse tree of that program. Otherwise the parser gives the error messages.
What syntax analysis cannot do!
• To check whether variables are of types on which operations are allowed
• To check whether a variable has been declared before use
• To check whether a variable has been initialized
• These issues will be handled in semantic analysis
We categorise the parser into two groups
1. Top-down parser (starts from the root).
2. Bottom-up parser (starts from the leaf).
• Both top-down and bottom-up parsers scan the input from left-to-right (one 
symbol at a time).
• Efficient top-down and bottom-up parsers can be implemented only for 
subclasses of context-free grammars.
1. LL for top-down parsing
2. LR for bottom-up parsing)
Example: Consider the following grammar
E ::= E + T | E - T | T T ::= T * F | T / F | F F ::= num | id 
For the input string: id(x) + num(2) * id(y)
Analysis of the top-down parsing:
E => E + T
=> E + T * F 
=> T + T * F 
=> T + F * F 
=> T + num * F 
=> F + num * F 
=> id + num * F 
=> id + num * id
Top down parsing uses left most derivation to derive the string and uses 
substitutions during derivation process.
Analysis of the Bottom-up parsing:
id(x) + num(2) * id(y)
=> id(x) + num(2) * F 
=> id(x) + F * F 
=> id(x) + T * F 
=> id(x) + T 
=> F + T 
=> T + T 
=> E + T 
=> E
Bottom up parsing uses reverse of right most derivation to verify the string and 
uses reductions during the process.
Context-Free Grammars
Inherently recursive structures of a programming language are defined by a CFG. In 
a CFG, we have A start symbol (one of the non-terminals). A finite set of terminals 
(in our case, this will be the set of tokens). A set of non-terminals (syntactic 
variables).
A finite set of production rules in the following form
A — ? a, where A is non-terminal and a is a string of terminals (including the empty 
string).
Parse Trees
Graphical representation for a derivation that filters out the order of choosing non­
terminals to avoid rewriting. The root node represents the start symbol, inner nodes 
of a parse tree are non-terminal Symbol.
Ambiguity
A grammar produces more than one parse tree for a sentence is called as 
ambiguous grammar. Unambiguous grammar refers unique selection of the parse 
tree for a sentence.
Ambiguity elimination:
. Ambiguity is problematic because meaning of the programs can be incorrect 
. Ambiguity can be handled in several ways 
Enforce associativity and precedence 
Rewrite the grammar (cleanest way)
. There are no general techniques for handling ambiguity
. It is impossible to convert automatically an ambiguous grammar to an 
unambiguous one 
Left Recursion
A grammar is left recursive, if it has a non-terminal A such that there is a 
derivation.
A = > Aa for some string a
The left-recursion may appear in a single step of the derivation (immediate left 
recursion) or may appear in more than one step of the derivation.
A top down parser with production A ^ A a may loop forever
From the grammar A — ? A a | b left recursion may be eliminated by transforming the
grammar to
A — ? b R
R -? a R | e
Left recursion is an issue of concern in top down parsers. A grammar is left- 
recursive if we can find some non-terminal A which will eventually derive a 
sentential form with itself as the left-symbol. In other words, a grammar is left 
recursive if it has a non-terminal A such that there is a derivation
A — ? + A a for some string a. These derivations may lead to an infinite loop.
Top-down parsing technique can't handle left recursive grammars. So, we have to 
convert our left recursive grammar into an equivalent grammar which is not left 
recursive.
Removal of left recursion 
In general
A — » Aai |Aa2|...|Aam 
I P i IP 2 I— I P n 
Transforms to 
A ^ p 1 A'|p2A'|....|pn A''
A ^ a 1 A|a2A|...|am A,|€
Left Factoring
A predictive parser (a top-down parser without backtracking) insists that the 
grammar must be left factored.
grammar — ? a new equivalent grammar suitable for predictive parsing, 
stmt -» if expr then stmt else stmt | if expr then stmt
When we see, if we can't know which production rule is to be chosen then rewrite 
stmt in the derivation,
In general,
A — » a3, | < x3 .
where a is not empty and the first symbols of Pi and p2 (if they have one) are 
different.
When processing a, we can't know whether expand
But, if we rewrite the grammar as follows 
A -? aA' A' -? Pi | p2 , so we can immediately expand A to aA'.
Dangling else problem can be handled by left factoring 
stmt -» if expr then stmt else stmt | if expr then stmt 
can be transformed to
stmt -? if expr then stmt S’
S’ -? else stmt | e 
Top-down Parsing
There are two main techniques to achieve top-down parse tree
1. Recursive descent parsing
2. Predictive parsing
Recursive Descent Parsing (Uses Backtracking)
Backtracking is needed (if a choice of a production rule does not work, we 
backtrack to try other alternatives). It tries to find the left most derivation. It is not 
efficient.
e.g., If the grammar is 
S -? aBc
B bc|b and the input is abc
In case failure, ft backtracks to generate the new one
Predictive Parser
. A non recursive top down parsing method 
. Parser "predicts" which production to use
. It removes backtracking by fixing one production for every non-terminal and input 
token(s)
. Predictive parsers accept LL(k) languages 
First L stands for left to right scan of input 
Second L stands for leftmost derivation 
k stands for number of lookahead token 
. In practice LL(1) is used
A -? aPi | ap2
£ C
Page 5


Syntax Analyzer (Parser)
Syntax analyzer creates the syntactic structure of the given source program. This 
syntactic structure is mostly a parse tree. The syntax of a programming is 
described by a Context-Free Grammar (CFG). We will use BNF (Backus-Naur Form) 
notation in the description of CFGs.
The syntax analyzer (parser) checks whether a given source program satisfies the 
rules implied by a context-free grammar or not. If it satisfies, the parser creates the 
parse tree of that program. Otherwise the parser gives the error messages.
What syntax analysis cannot do!
• To check whether variables are of types on which operations are allowed
• To check whether a variable has been declared before use
• To check whether a variable has been initialized
• These issues will be handled in semantic analysis
We categorise the parser into two groups
1. Top-down parser (starts from the root).
2. Bottom-up parser (starts from the leaf).
• Both top-down and bottom-up parsers scan the input from left-to-right (one 
symbol at a time).
• Efficient top-down and bottom-up parsers can be implemented only for 
subclasses of context-free grammars.
1. LL for top-down parsing
2. LR for bottom-up parsing)
Example: Consider the following grammar
E ::= E + T | E - T | T T ::= T * F | T / F | F F ::= num | id 
For the input string: id(x) + num(2) * id(y)
Analysis of the top-down parsing:
E => E + T
=> E + T * F 
=> T + T * F 
=> T + F * F 
=> T + num * F 
=> F + num * F 
=> id + num * F 
=> id + num * id
Top down parsing uses left most derivation to derive the string and uses 
substitutions during derivation process.
Analysis of the Bottom-up parsing:
id(x) + num(2) * id(y)
=> id(x) + num(2) * F 
=> id(x) + F * F 
=> id(x) + T * F 
=> id(x) + T 
=> F + T 
=> T + T 
=> E + T 
=> E
Bottom up parsing uses reverse of right most derivation to verify the string and 
uses reductions during the process.
Context-Free Grammars
Inherently recursive structures of a programming language are defined by a CFG. In 
a CFG, we have A start symbol (one of the non-terminals). A finite set of terminals 
(in our case, this will be the set of tokens). A set of non-terminals (syntactic 
variables).
A finite set of production rules in the following form
A — ? a, where A is non-terminal and a is a string of terminals (including the empty 
string).
Parse Trees
Graphical representation for a derivation that filters out the order of choosing non­
terminals to avoid rewriting. The root node represents the start symbol, inner nodes 
of a parse tree are non-terminal Symbol.
Ambiguity
A grammar produces more than one parse tree for a sentence is called as 
ambiguous grammar. Unambiguous grammar refers unique selection of the parse 
tree for a sentence.
Ambiguity elimination:
. Ambiguity is problematic because meaning of the programs can be incorrect 
. Ambiguity can be handled in several ways 
Enforce associativity and precedence 
Rewrite the grammar (cleanest way)
. There are no general techniques for handling ambiguity
. It is impossible to convert automatically an ambiguous grammar to an 
unambiguous one 
Left Recursion
A grammar is left recursive, if it has a non-terminal A such that there is a 
derivation.
A = > Aa for some string a
The left-recursion may appear in a single step of the derivation (immediate left 
recursion) or may appear in more than one step of the derivation.
A top down parser with production A ^ A a may loop forever
From the grammar A — ? A a | b left recursion may be eliminated by transforming the
grammar to
A — ? b R
R -? a R | e
Left recursion is an issue of concern in top down parsers. A grammar is left- 
recursive if we can find some non-terminal A which will eventually derive a 
sentential form with itself as the left-symbol. In other words, a grammar is left 
recursive if it has a non-terminal A such that there is a derivation
A — ? + A a for some string a. These derivations may lead to an infinite loop.
Top-down parsing technique can't handle left recursive grammars. So, we have to 
convert our left recursive grammar into an equivalent grammar which is not left 
recursive.
Removal of left recursion 
In general
A — » Aai |Aa2|...|Aam 
I P i IP 2 I— I P n 
Transforms to 
A ^ p 1 A'|p2A'|....|pn A''
A ^ a 1 A|a2A|...|am A,|€
Left Factoring
A predictive parser (a top-down parser without backtracking) insists that the 
grammar must be left factored.
grammar — ? a new equivalent grammar suitable for predictive parsing, 
stmt -» if expr then stmt else stmt | if expr then stmt
When we see, if we can't know which production rule is to be chosen then rewrite 
stmt in the derivation,
In general,
A — » a3, | < x3 .
where a is not empty and the first symbols of Pi and p2 (if they have one) are 
different.
When processing a, we can't know whether expand
But, if we rewrite the grammar as follows 
A -? aA' A' -? Pi | p2 , so we can immediately expand A to aA'.
Dangling else problem can be handled by left factoring 
stmt -» if expr then stmt else stmt | if expr then stmt 
can be transformed to
stmt -? if expr then stmt S’
S’ -? else stmt | e 
Top-down Parsing
There are two main techniques to achieve top-down parse tree
1. Recursive descent parsing
2. Predictive parsing
Recursive Descent Parsing (Uses Backtracking)
Backtracking is needed (if a choice of a production rule does not work, we 
backtrack to try other alternatives). It tries to find the left most derivation. It is not 
efficient.
e.g., If the grammar is 
S -? aBc
B bc|b and the input is abc
In case failure, ft backtracks to generate the new one
Predictive Parser
. A non recursive top down parsing method 
. Parser "predicts" which production to use
. It removes backtracking by fixing one production for every non-terminal and input 
token(s)
. Predictive parsers accept LL(k) languages 
First L stands for left to right scan of input 
Second L stands for leftmost derivation 
k stands for number of lookahead token 
. In practice LL(1) is used
A -? aPi | ap2
£ C
Functions used in Constructing LL (1) Parsing Tables
• Two functions are used in the construction of LL (1) parsing tables: FIRST and 
FOLLOW.
• FIRST (a) is a set of the terminal symbols which occur as first symbols in 
strings derived from a, where a is any string of grammar symbols. If a derives 
to a, then e is also in FIRST (a).
• FOLLOW (A) is the set of the terminals which occur immediately after 
(FOLLOW) the non-terminal A in the strings derived from the starting symbol.
• First set is computed for all non-terminals, but follow set is computed only for 
those non-terminals in their first set contain epsilon.
• For every terminal x in FIRST(X), there is an entry (production which derives x) 
in LL(1) table when x is not an epsilon.
• For every terminal y in Follow(Y), there is an entry (null production) in the 
table.
To Compute FIRST of any String X
• If X is a terminal symbol -» FIRST (X) = {X}
• If X is a non-terminal symbol and X — ? e is a production rule -» e is in FIRST
(X).
• If X is a non-terminal symbol and X — ? Y1 ( Y2, ...., Yn is a production rule. If a 
terminal a in FIRST (Yj) and e is in all FIRST (Yj) for j = 1,..., i -1, then a is in 
FIRST (X). If e is in all FIRST (Yj) for j = 1,... n, then e is in FIRST (X).
. If X is e , then FIRST (X)= {e }
Read More
90 docs

Top Courses for Computer Science Engineering (CSE)

Explore Courses for Computer Science Engineering (CSE) exam

Top Courses for Computer Science Engineering (CSE)

Signup for Free!
Signup to see your scores go up within 7 days! Learn & Practice with 1000+ FREE Notes, Videos & Tests.
10M+ students study on EduRev
Related Searches

study material

,

practice quizzes

,

past year papers

,

Short Notes: Parsing | Short Notes for Computer Science Engineering - Computer Science Engineering (CSE)

,

MCQs

,

Viva Questions

,

pdf

,

Semester Notes

,

mock tests for examination

,

Extra Questions

,

Free

,

Short Notes: Parsing | Short Notes for Computer Science Engineering - Computer Science Engineering (CSE)

,

Short Notes: Parsing | Short Notes for Computer Science Engineering - Computer Science Engineering (CSE)

,

Summary

,

Exam

,

video lectures

,

Important questions

,

shortcuts and tricks

,

ppt

,

Sample Paper

,

Previous Year Questions with Solutions

,

Objective type Questions

;