The lexical analyzer reads the source program character by character and returns the
tokens of the source program. It puts information about identifiers into the symbol
table.
The Role of the Lexical Analyzer:
• It is the first phase of a compiler.
• It reads the input characters and produces a sequence of tokens that the
parser uses for syntax analysis.
• It can work either as a separate module or as a submodule of the parser.
• The lexical analyzer is also responsible for eliminating comments and white
space from the source program.
• It also reports lexical errors.
[Figure: the source program flows into the Lexical Analyzer, which returns a token
to the Syntax Analyzer on each "get next token" request; both components consult
the Symbol Table.]
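The lexer–parser interaction in the figure can be sketched as a lexer that hands over one token each time the parser asks for the next one. A minimal illustration in Python (the token types, patterns, and input below are hypothetical, not from the source):

```python
import re

# Hypothetical token specification: (token type, regex pattern)
TOKEN_SPEC = [
    ("NUM", r"\d+"),
    ("ID",  r"[A-Za-z_]\w*"),
    ("OP",  r"[+\-*/=]"),
    ("WS",  r"\s+"),          # white space: eliminated, never returned
]

def get_next_token(source):
    """Generator yielding (type, lexeme) pairs, one per parser request."""
    pos = 0
    while pos < len(source):
        for kind, pattern in TOKEN_SPEC:
            m = re.match(pattern, source[pos:])
            if m:
                pos += m.end()
                if kind != "WS":      # comments/white space are skipped here
                    yield kind, m.group()
                break
        else:
            # no pattern matched: a lexical error
            raise SyntaxError(f"lexical error at position {pos}")

# The parser would call next() on this generator for each "get next token" request.
tokens = list(get_next_token("sum = a + 42"))
print(tokens)
```

Running this on `sum = a + 42` yields the token stream `ID OP ID OP NUM`, with the white space between lexemes eliminated as described above.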
Tokens, Lexemes and Patterns
• A token describes a pattern of characters that have the same meaning in the
source program, such as identifiers, operators, keywords, numbers, delimiters,
and so on. A token may have a single attribute which holds the required
information for that token. For identifiers, this attribute is a pointer into the
symbol table, and the symbol table holds the actual attributes for that token.
• A token type and its attribute uniquely identify a lexeme.
• Regular expressions are widely used to specify patterns.
Lexeme: A sequence of characters in the source program that is matched against
the pattern for a token.
Pattern: The rule associated with each set of strings. A lexeme is matched against
a pattern to generate a token.
Token: A token is the word that describes the lexeme in the source program. It is
generated when the lexeme matches a pattern.
Example:
Lexeme: A1, Sum, Total
• Pattern: starts with a letter, followed by letters or digits, and is not a keyword.
• Token: ID
Lexeme: If | Then | Else
• Pattern: If | Then | Else
• Token: IF | THEN | ELSE
Lexeme: 123.45
• Pattern: starts with a digit, followed by further digits, an optional fraction, and
an optional exponent.
• Token: NUM
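The three patterns above can be expressed directly as regular expressions. A small sketch (the exact regex forms and the `classify` helper are illustrative, not from the source):

```python
import re

# Assumed regex forms of the three patterns described above
KEYWORDS = {"If", "Then", "Else"}
ID_PATTERN  = re.compile(r"[A-Za-z][A-Za-z0-9]*")        # letter, then letters/digits
NUM_PATTERN = re.compile(r"\d+(\.\d+)?([eE][+-]?\d+)?")  # digits, optional fraction/exponent

def classify(lexeme):
    """Match a lexeme against each pattern and return its token."""
    if lexeme in KEYWORDS:            # keywords are excluded from ID
        return lexeme.upper()         # IF | THEN | ELSE
    if ID_PATTERN.fullmatch(lexeme):
        return "ID"
    if NUM_PATTERN.fullmatch(lexeme):
        return "NUM"
    return "ERROR"

print(classify("A1"))        # ID
print(classify("If"))        # IF
print(classify("123.45"))    # NUM
```

Checking keywords before the identifier pattern implements the "but not a keyword" clause: a keyword also matches the ID regex, so the lexer must test for it first.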
Counting the Number of Tokens:
A token is usually described by an integer representing the kind of token, possibly
together with an attribute representing the value of the token. For example, in most
programming languages we have the following kinds of tokens.
• Identifiers (x, y, average, etc.)
• Reserved words or keywords (if, else, while, etc.)
• Integer constants (42, 0xFF, 0177, etc.)
• Floating point constants (5.6, 3.6e8, etc.)
• String constants ("hello there\n", etc.)
• Character constants ('a', 'b', etc.)
• Special symbols ( ( ) ; := + - etc.)
• Comments (to be ignored)
• Compiler directives (directives to include files, define macros, etc.)
• Line information (we might need to detect newline characters as tokens, if
they are syntactically important; we must also increment the line count, so
that we can indicate the line number in error messages)
• White space (blanks and tabs that are used to separate tokens, but are
otherwise not important)
• End of file
Each reserved word or special symbol is considered to be a different kind of token,
as far as the parser is concerned. They are distinguished by different integers
representing their kinds.
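Representing each token kind by a distinct integer, as described above, might look like the following sketch (the kind names and numbering are illustrative):

```python
from enum import IntEnum

# Each token kind gets a distinct integer; every reserved word and special
# symbol is its own kind, since the parser must tell them apart.
class TokenKind(IntEnum):
    ID           = 1
    IF           = 2    # reserved word
    ELSE         = 3    # reserved word
    WHILE        = 4    # reserved word
    INT_CONST    = 5
    FLOAT_CONST  = 6
    STRING_CONST = 7
    CHAR_CONST   = 8
    ASSIGN       = 9    # special symbol :=
    PLUS         = 10
    MINUS        = 11
    EOF          = 12

# A token pairs its kind with an attribute, e.g. a symbol-table index for an ID.
token = (TokenKind.ID, 42)   # hypothetical identifier at symbol-table entry 42
print(token)
```

The attribute slot is what lets two different identifiers share the single kind `ID` while remaining distinguishable through their symbol-table entries.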
Example:
How many tokens are there in the following C statement?
/* abc */printf("what's up %d", ++&&***a)
Solution:
The comment /* abc */ is ignored by the lexical analyzer. The string literal
"what's up %d" is entirely one token. The tokens are therefore printf, (,
"what's up %d", the comma, ++, &&, *, *, *, a, and ), so the token count is 11.