Page 1
Syntax Analyzer (Parser)
Syntax analyzer creates the syntactic structure of the given source program. This
syntactic structure is mostly a parse tree. The syntax of a programming is
described by a Context-Free Grammar (CFG). We will use BNF (Backus-Naur Form)
notation in the description of CFGs.
The syntax analyzer (parser) checks whether a given source program satisfies the
rules implied by a context-free grammar or not. If it satisfies, the parser creates the
parse tree of that program. Otherwise the parser gives the error messages.
What syntax analysis cannot do!
• To check whether variables are of types on which operations are allowed
• To check whether a variable has been declared before use
• To check whether a variable has been initialized
• These issues will be handled in semantic analysis
We categorise the parser into two groups
1. Top-down parser (starts from the root).
2. Bottom-up parser (starts from the leaf).
• Both top-down and bottom-up parsers scan the input from left-to-right (one
symbol at a time).
• Efficient top-down and bottom-up parsers can be implemented only for
subclasses of context-free grammars.
1. LL for top-down parsing
2. LR for bottom-up parsing)
Example: Consider the following grammar
Page 2
Syntax Analyzer (Parser)
Syntax analyzer creates the syntactic structure of the given source program. This
syntactic structure is mostly a parse tree. The syntax of a programming is
described by a Context-Free Grammar (CFG). We will use BNF (Backus-Naur Form)
notation in the description of CFGs.
The syntax analyzer (parser) checks whether a given source program satisfies the
rules implied by a context-free grammar or not. If it satisfies, the parser creates the
parse tree of that program. Otherwise the parser gives the error messages.
What syntax analysis cannot do!
• To check whether variables are of types on which operations are allowed
• To check whether a variable has been declared before use
• To check whether a variable has been initialized
• These issues will be handled in semantic analysis
We categorise the parser into two groups
1. Top-down parser (starts from the root).
2. Bottom-up parser (starts from the leaf).
• Both top-down and bottom-up parsers scan the input from left-to-right (one
symbol at a time).
• Efficient top-down and bottom-up parsers can be implemented only for
subclasses of context-free grammars.
1. LL for top-down parsing
2. LR for bottom-up parsing)
Example: Consider the following grammar
E ::= E + T | E - T | T T ::= T * F | T / F | F F ::= num | id
For the input string: id(x) + num(2) * id(y)
Analysis of the top-down parsing:
E => E + T
=> E + T * F
=> T + T * F
=> T + F * F
=> T + num * F
=> F + num * F
=> id + num * F
=> id + num * id
Top down parsing uses left most derivation to derive the string and uses
substitutions during derivation process.
Analysis of the Bottom-up parsing:
id(x) + num(2) * id(y)
=> id(x) + num(2) * F
=> id(x) + F * F
=> id(x) + T * F
=> id(x) + T
=> F + T
=> T + T
=> E + T
=> E
Bottom up parsing uses reverse of right most derivation to verify the string and
uses reductions during the process.
Context-Free Grammars
Inherently recursive structures of a programming language are defined by a CFG. In
a CFG, we have A start symbol (one of the non-terminals). A finite set of terminals
(in our case, this will be the set of tokens). A set of non-terminals (syntactic
variables).
A finite set of production rules in the following form
A — ? a, where A is non-terminal and a is a string of terminals (including the empty
string).
Parse Trees
Graphical representation for a derivation that filters out the order of choosing non
terminals to avoid rewriting. The root node represents the start symbol, inner nodes
of a parse tree are non-terminal Symbol.
Ambiguity
A grammar produces more than one parse tree for a sentence is called as
ambiguous grammar. Unambiguous grammar refers unique selection of the parse
tree for a sentence.
Ambiguity elimination:
Page 3
Syntax Analyzer (Parser)
Syntax analyzer creates the syntactic structure of the given source program. This
syntactic structure is mostly a parse tree. The syntax of a programming is
described by a Context-Free Grammar (CFG). We will use BNF (Backus-Naur Form)
notation in the description of CFGs.
The syntax analyzer (parser) checks whether a given source program satisfies the
rules implied by a context-free grammar or not. If it satisfies, the parser creates the
parse tree of that program. Otherwise the parser gives the error messages.
What syntax analysis cannot do!
• To check whether variables are of types on which operations are allowed
• To check whether a variable has been declared before use
• To check whether a variable has been initialized
• These issues will be handled in semantic analysis
We categorise the parser into two groups
1. Top-down parser (starts from the root).
2. Bottom-up parser (starts from the leaf).
• Both top-down and bottom-up parsers scan the input from left-to-right (one
symbol at a time).
• Efficient top-down and bottom-up parsers can be implemented only for
subclasses of context-free grammars.
1. LL for top-down parsing
2. LR for bottom-up parsing)
Example: Consider the following grammar
E ::= E + T | E - T | T T ::= T * F | T / F | F F ::= num | id
For the input string: id(x) + num(2) * id(y)
Analysis of the top-down parsing:
E => E + T
=> E + T * F
=> T + T * F
=> T + F * F
=> T + num * F
=> F + num * F
=> id + num * F
=> id + num * id
Top down parsing uses left most derivation to derive the string and uses
substitutions during derivation process.
Analysis of the Bottom-up parsing:
id(x) + num(2) * id(y)
=> id(x) + num(2) * F
=> id(x) + F * F
=> id(x) + T * F
=> id(x) + T
=> F + T
=> T + T
=> E + T
=> E
Bottom up parsing uses reverse of right most derivation to verify the string and
uses reductions during the process.
Context-Free Grammars
Inherently recursive structures of a programming language are defined by a CFG. In
a CFG, we have A start symbol (one of the non-terminals). A finite set of terminals
(in our case, this will be the set of tokens). A set of non-terminals (syntactic
variables).
A finite set of production rules in the following form
A — ? a, where A is non-terminal and a is a string of terminals (including the empty
string).
Parse Trees
Graphical representation for a derivation that filters out the order of choosing non
terminals to avoid rewriting. The root node represents the start symbol, inner nodes
of a parse tree are non-terminal Symbol.
Ambiguity
A grammar produces more than one parse tree for a sentence is called as
ambiguous grammar. Unambiguous grammar refers unique selection of the parse
tree for a sentence.
Ambiguity elimination:
. Ambiguity is problematic because meaning of the programs can be incorrect
. Ambiguity can be handled in several ways
Enforce associativity and precedence
Rewrite the grammar (cleanest way)
. There are no general techniques for handling ambiguity
. It is impossible to convert automatically an ambiguous grammar to an
unambiguous one
Left Recursion
A grammar is left recursive, if it has a non-terminal A such that there is a
derivation.
A = > Aa for some string a
The left-recursion may appear in a single step of the derivation (immediate left
recursion) or may appear in more than one step of the derivation.
A top down parser with production A ^ A a may loop forever
From the grammar A — ? A a | b left recursion may be eliminated by transforming the
grammar to
A — ? b R
R -? a R | e
Left recursion is an issue of concern in top down parsers. A grammar is left-
recursive if we can find some non-terminal A which will eventually derive a
sentential form with itself as the left-symbol. In other words, a grammar is left
recursive if it has a non-terminal A such that there is a derivation
A — ? + A a for some string a. These derivations may lead to an infinite loop.
Top-down parsing technique can't handle left recursive grammars. So, we have to
convert our left recursive grammar into an equivalent grammar which is not left
recursive.
Removal of left recursion
In general
A — » Aai |Aa2|...|Aam
I P i IP 2 I— I P n
Transforms to
A ^ p 1 A'|p2A'|....|pn A''
A ^ a 1 A|a2A|...|am A,|€
Left Factoring
A predictive parser (a top-down parser without backtracking) insists that the
grammar must be left factored.
grammar — ? a new equivalent grammar suitable for predictive parsing,
stmt -» if expr then stmt else stmt | if expr then stmt
When we see, if we can't know which production rule is to be chosen then rewrite
stmt in the derivation,
In general,
A — » a3, | < x3 .
Page 4
Syntax Analyzer (Parser)
Syntax analyzer creates the syntactic structure of the given source program. This
syntactic structure is mostly a parse tree. The syntax of a programming is
described by a Context-Free Grammar (CFG). We will use BNF (Backus-Naur Form)
notation in the description of CFGs.
The syntax analyzer (parser) checks whether a given source program satisfies the
rules implied by a context-free grammar or not. If it satisfies, the parser creates the
parse tree of that program. Otherwise the parser gives the error messages.
What syntax analysis cannot do!
• To check whether variables are of types on which operations are allowed
• To check whether a variable has been declared before use
• To check whether a variable has been initialized
• These issues will be handled in semantic analysis
We categorise the parser into two groups
1. Top-down parser (starts from the root).
2. Bottom-up parser (starts from the leaf).
• Both top-down and bottom-up parsers scan the input from left-to-right (one
symbol at a time).
• Efficient top-down and bottom-up parsers can be implemented only for
subclasses of context-free grammars.
1. LL for top-down parsing
2. LR for bottom-up parsing)
Example: Consider the following grammar
E ::= E + T | E - T | T T ::= T * F | T / F | F F ::= num | id
For the input string: id(x) + num(2) * id(y)
Analysis of the top-down parsing:
E => E + T
=> E + T * F
=> T + T * F
=> T + F * F
=> T + num * F
=> F + num * F
=> id + num * F
=> id + num * id
Top down parsing uses left most derivation to derive the string and uses
substitutions during derivation process.
Analysis of the Bottom-up parsing:
id(x) + num(2) * id(y)
=> id(x) + num(2) * F
=> id(x) + F * F
=> id(x) + T * F
=> id(x) + T
=> F + T
=> T + T
=> E + T
=> E
Bottom up parsing uses reverse of right most derivation to verify the string and
uses reductions during the process.
Context-Free Grammars
Inherently recursive structures of a programming language are defined by a CFG. In
a CFG, we have A start symbol (one of the non-terminals). A finite set of terminals
(in our case, this will be the set of tokens). A set of non-terminals (syntactic
variables).
A finite set of production rules in the following form
A — ? a, where A is non-terminal and a is a string of terminals (including the empty
string).
Parse Trees
Graphical representation for a derivation that filters out the order of choosing non
terminals to avoid rewriting. The root node represents the start symbol, inner nodes
of a parse tree are non-terminal Symbol.
Ambiguity
A grammar produces more than one parse tree for a sentence is called as
ambiguous grammar. Unambiguous grammar refers unique selection of the parse
tree for a sentence.
Ambiguity elimination:
. Ambiguity is problematic because meaning of the programs can be incorrect
. Ambiguity can be handled in several ways
Enforce associativity and precedence
Rewrite the grammar (cleanest way)
. There are no general techniques for handling ambiguity
. It is impossible to convert automatically an ambiguous grammar to an
unambiguous one
Left Recursion
A grammar is left recursive, if it has a non-terminal A such that there is a
derivation.
A = > Aa for some string a
The left-recursion may appear in a single step of the derivation (immediate left
recursion) or may appear in more than one step of the derivation.
A top down parser with production A ^ A a may loop forever
From the grammar A — ? A a | b left recursion may be eliminated by transforming the
grammar to
A — ? b R
R -? a R | e
Left recursion is an issue of concern in top down parsers. A grammar is left-
recursive if we can find some non-terminal A which will eventually derive a
sentential form with itself as the left-symbol. In other words, a grammar is left
recursive if it has a non-terminal A such that there is a derivation
A — ? + A a for some string a. These derivations may lead to an infinite loop.
Top-down parsing technique can't handle left recursive grammars. So, we have to
convert our left recursive grammar into an equivalent grammar which is not left
recursive.
Removal of left recursion
In general
A — » Aai |Aa2|...|Aam
I P i IP 2 I— I P n
Transforms to
A ^ p 1 A'|p2A'|....|pn A''
A ^ a 1 A|a2A|...|am A,|€
Left Factoring
A predictive parser (a top-down parser without backtracking) insists that the
grammar must be left factored.
grammar — ? a new equivalent grammar suitable for predictive parsing,
stmt -» if expr then stmt else stmt | if expr then stmt
When we see, if we can't know which production rule is to be chosen then rewrite
stmt in the derivation,
In general,
A — » a3, | < x3 .
where a is not empty and the first symbols of Pi and p2 (if they have one) are
different.
When processing a, we can't know whether expand
But, if we rewrite the grammar as follows
A -? aA' A' -? Pi | p2 , so we can immediately expand A to aA'.
Dangling else problem can be handled by left factoring
stmt -» if expr then stmt else stmt | if expr then stmt
can be transformed to
stmt -? if expr then stmt S’
S’ -? else stmt | e
Top-down Parsing
There are two main techniques to achieve top-down parse tree
1. Recursive descent parsing
2. Predictive parsing
Recursive Descent Parsing (Uses Backtracking)
Backtracking is needed (if a choice of a production rule does not work, we
backtrack to try other alternatives). It tries to find the left most derivation. It is not
efficient.
e.g., If the grammar is
S -? aBc
B bc|b and the input is abc
In case failure, ft backtracks to generate the new one
Predictive Parser
. A non recursive top down parsing method
. Parser "predicts" which production to use
. It removes backtracking by fixing one production for every non-terminal and input
token(s)
. Predictive parsers accept LL(k) languages
First L stands for left to right scan of input
Second L stands for leftmost derivation
k stands for number of lookahead token
. In practice LL(1) is used
A -? aPi | ap2
£ C
Page 5
Syntax Analyzer (Parser)
Syntax analyzer creates the syntactic structure of the given source program. This
syntactic structure is mostly a parse tree. The syntax of a programming is
described by a Context-Free Grammar (CFG). We will use BNF (Backus-Naur Form)
notation in the description of CFGs.
The syntax analyzer (parser) checks whether a given source program satisfies the
rules implied by a context-free grammar or not. If it satisfies, the parser creates the
parse tree of that program. Otherwise the parser gives the error messages.
What syntax analysis cannot do!
• To check whether variables are of types on which operations are allowed
• To check whether a variable has been declared before use
• To check whether a variable has been initialized
• These issues will be handled in semantic analysis
We categorise the parser into two groups
1. Top-down parser (starts from the root).
2. Bottom-up parser (starts from the leaf).
• Both top-down and bottom-up parsers scan the input from left-to-right (one
symbol at a time).
• Efficient top-down and bottom-up parsers can be implemented only for
subclasses of context-free grammars.
1. LL for top-down parsing
2. LR for bottom-up parsing)
Example: Consider the following grammar
E ::= E + T | E - T | T T ::= T * F | T / F | F F ::= num | id
For the input string: id(x) + num(2) * id(y)
Analysis of the top-down parsing:
E => E + T
=> E + T * F
=> T + T * F
=> T + F * F
=> T + num * F
=> F + num * F
=> id + num * F
=> id + num * id
Top down parsing uses left most derivation to derive the string and uses
substitutions during derivation process.
Analysis of the Bottom-up parsing:
id(x) + num(2) * id(y)
=> id(x) + num(2) * F
=> id(x) + F * F
=> id(x) + T * F
=> id(x) + T
=> F + T
=> T + T
=> E + T
=> E
Bottom up parsing uses reverse of right most derivation to verify the string and
uses reductions during the process.
Context-Free Grammars
Inherently recursive structures of a programming language are defined by a CFG. In
a CFG, we have A start symbol (one of the non-terminals). A finite set of terminals
(in our case, this will be the set of tokens). A set of non-terminals (syntactic
variables).
A finite set of production rules in the following form
A — ? a, where A is non-terminal and a is a string of terminals (including the empty
string).
Parse Trees
Graphical representation for a derivation that filters out the order of choosing non
terminals to avoid rewriting. The root node represents the start symbol, inner nodes
of a parse tree are non-terminal Symbol.
Ambiguity
A grammar produces more than one parse tree for a sentence is called as
ambiguous grammar. Unambiguous grammar refers unique selection of the parse
tree for a sentence.
Ambiguity elimination:
. Ambiguity is problematic because meaning of the programs can be incorrect
. Ambiguity can be handled in several ways
Enforce associativity and precedence
Rewrite the grammar (cleanest way)
. There are no general techniques for handling ambiguity
. It is impossible to convert automatically an ambiguous grammar to an
unambiguous one
Left Recursion
A grammar is left recursive, if it has a non-terminal A such that there is a
derivation.
A = > Aa for some string a
The left-recursion may appear in a single step of the derivation (immediate left
recursion) or may appear in more than one step of the derivation.
A top down parser with production A ^ A a may loop forever
From the grammar A — ? A a | b left recursion may be eliminated by transforming the
grammar to
A — ? b R
R -? a R | e
Left recursion is an issue of concern in top down parsers. A grammar is left-
recursive if we can find some non-terminal A which will eventually derive a
sentential form with itself as the left-symbol. In other words, a grammar is left
recursive if it has a non-terminal A such that there is a derivation
A — ? + A a for some string a. These derivations may lead to an infinite loop.
Top-down parsing technique can't handle left recursive grammars. So, we have to
convert our left recursive grammar into an equivalent grammar which is not left
recursive.
Removal of left recursion
In general
A — » Aai |Aa2|...|Aam
I P i IP 2 I— I P n
Transforms to
A ^ p 1 A'|p2A'|....|pn A''
A ^ a 1 A|a2A|...|am A,|€
Left Factoring
A predictive parser (a top-down parser without backtracking) insists that the
grammar must be left factored.
grammar — ? a new equivalent grammar suitable for predictive parsing,
stmt -» if expr then stmt else stmt | if expr then stmt
When we see, if we can't know which production rule is to be chosen then rewrite
stmt in the derivation,
In general,
A — » a3, | < x3 .
where a is not empty and the first symbols of Pi and p2 (if they have one) are
different.
When processing a, we can't know whether expand
But, if we rewrite the grammar as follows
A -? aA' A' -? Pi | p2 , so we can immediately expand A to aA'.
Dangling else problem can be handled by left factoring
stmt -» if expr then stmt else stmt | if expr then stmt
can be transformed to
stmt -? if expr then stmt S’
S’ -? else stmt | e
Top-down Parsing
There are two main techniques to achieve top-down parse tree
1. Recursive descent parsing
2. Predictive parsing
Recursive Descent Parsing (Uses Backtracking)
Backtracking is needed (if a choice of a production rule does not work, we
backtrack to try other alternatives). It tries to find the left most derivation. It is not
efficient.
e.g., If the grammar is
S -? aBc
B bc|b and the input is abc
In case failure, ft backtracks to generate the new one
Predictive Parser
. A non recursive top down parsing method
. Parser "predicts" which production to use
. It removes backtracking by fixing one production for every non-terminal and input
token(s)
. Predictive parsers accept LL(k) languages
First L stands for left to right scan of input
Second L stands for leftmost derivation
k stands for number of lookahead token
. In practice LL(1) is used
A -? aPi | ap2
£ C
Functions used in Constructing LL (1) Parsing Tables
• Two functions are used in the construction of LL (1) parsing tables: FIRST and
FOLLOW.
• FIRST (a) is a set of the terminal symbols which occur as first symbols in
strings derived from a, where a is any string of grammar symbols. If a derives
to a, then e is also in FIRST (a).
• FOLLOW (A) is the set of the terminals which occur immediately after
(FOLLOW) the non-terminal A in the strings derived from the starting symbol.
• First set is computed for all non-terminals, but follow set is computed only for
those non-terminals in their first set contain epsilon.
• For every terminal x in FIRST(X), there is an entry (production which derives x)
in LL(1) table when x is not an epsilon.
• For every terminal y in Follow(Y), there is an entry (null production) in the
table.
To Compute FIRST of any String X
• If X is a terminal symbol -» FIRST (X) = {X}
• If X is a non-terminal symbol and X — ? e is a production rule -» e is in FIRST
(X).
• If X is a non-terminal symbol and X — ? Y1 ( Y2, ...., Yn is a production rule. If a
terminal a in FIRST (Yj) and e is in all FIRST (Yj) for j = 1,..., i -1, then a is in
FIRST (X). If e is in all FIRST (Yj) for j = 1,... n, then e is in FIRST (X).
. If X is e , then FIRST (X)= {e }
Read More