Short Notes: Lexical Analysis | Short Notes for Computer Science Engineering - Computer Science Engineering (CSE) PDF Download

The lexical analyzer reads the source program character by character and returns the
tokens of the source program. It also puts information about identifiers into the symbol
table.
The Role of the Lexical Analyzer:
• It is the first phase of a compiler.
• It reads the input characters and produces as output a sequence of tokens that the
parser uses for syntax analysis.
• It can work either as a separate module or as a submodule called by the parser.
• It is also responsible for eliminating comments and white space from the source
program.
• It also detects and reports lexical errors.
[Figure: The source program is read by the lexical analyzer, which returns a token to the syntax analyzer each time the parser asks to "get next token"; both consult the symbol table.]
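The interaction above can be sketched as a pull-based lexer: the parser calls get_next_token() whenever it needs a token, and the lexer discards white space and comments, counts lines for error messages, and reports a lexical error when no pattern matches. This is a minimal sketch, not a definitive implementation; the class names Lexer and LexicalError and the tiny token set in TOKEN_SPEC are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical token set for a tiny C-like language (illustration only).
# Python's re tries alternatives in order, so COMMENT must precede OP.
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/"),            # block comment: discarded
    ("NEWLINE", r"\n"),                   # discarded, but advances the line count
    ("SKIP",    r"[ \t]+"),               # blanks and tabs: discarded
    ("NUM",     r"[0-9]+"),
    ("ID",      r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP",      r"[-+*/=;()]"),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC), re.DOTALL
)


class LexicalError(Exception):
    """Raised when no token pattern matches the input."""


class Lexer:
    """Hands the parser one token per get_next_token() call."""

    def __init__(self, source):
        self.source = source
        self.pos = 0
        self.line = 1

    def get_next_token(self):
        """Return the next (kind, lexeme) pair, or ("EOF", "") at end of input."""
        while self.pos < len(self.source):
            m = MASTER.match(self.source, self.pos)
            if m is None:
                bad = self.source[self.pos]
                raise LexicalError(f"line {self.line}: unexpected character {bad!r}")
            self.pos = m.end()
            kind, text = m.lastgroup, m.group()
            if kind in ("NEWLINE", "COMMENT"):
                self.line += text.count("\n")  # comments may span several lines
            elif kind != "SKIP":
                return (kind, text)
        return ("EOF", "")


# A parser would pull tokens on demand; here we simply drain them.
lexer = Lexer("x = 10; /* a\ncomment */\ny = x + 1;")
token = lexer.get_next_token()
while token[0] != "EOF":
    print(lexer.line, token)
    token = lexer.get_next_token()
```

Producing tokens on demand, rather than tokenizing the whole file up front, matches the "get next token" request in the figure; the line counter is what lets the error message name a line.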
Tokens, Lexemes and Patterns
• A token describes a class of character sequences that have the same meaning in the
source program, such as identifiers, operators, keywords, numbers and delimiters. A
token may have a single attribute that holds the required information
for that token. For identifiers, this attribute is a pointer to the identifier's entry in
the symbol table, and the symbol table holds the actual attributes for that token.
• The token type together with its attribute uniquely identifies a lexeme.
• Regular expressions are widely used to specify patterns.
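To make the attribute idea concrete, here is a small sketch in which an identifier token's attribute is an index into a symbol table (standing in for the pointer), while a number token carries its value directly. The names SymbolEntry, SymbolTable, Token and make_token are hypothetical, and using an index instead of a real pointer is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class SymbolEntry:
    lexeme: str
    attributes: dict = field(default_factory=dict)  # filled in by later phases (type, scope, ...)


class SymbolTable:
    def __init__(self):
        self.entries = []   # index -> SymbolEntry
        self._index = {}    # lexeme -> index, so each identifier is stored once

    def insert(self, lexeme):
        """Return the index of lexeme's entry, creating the entry on first sight."""
        if lexeme not in self._index:
            self._index[lexeme] = len(self.entries)
            self.entries.append(SymbolEntry(lexeme))
        return self._index[lexeme]


@dataclass(frozen=True)
class Token:
    kind: str          # token type, e.g. "ID", "NUM", "PLUS"
    attribute: object  # symbol-table index for ID, numeric value for NUM, else None


def make_token(kind, lexeme, table):
    """Build a token; an ID's attribute 'points' to its symbol-table entry."""
    if kind == "ID":
        return Token("ID", table.insert(lexeme))
    if kind == "NUM":
        return Token("NUM", int(lexeme))
    return Token(kind, None)


table = SymbolTable()
first = make_token("ID", "total", table)
second = make_token("ID", "rate", table)
again = make_token("ID", "total", table)
print(first, second, again)
print(table.entries[again.attribute].lexeme)
```

Both occurrences of total yield the same pair (ID, 0) while rate yields (ID, 1), which is the sense in which the token type plus attribute identifies the lexeme.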
• Lexeme: A sequence of characters in the source program that is matched against the
pattern for a token.
• Pattern: The rule that describes the set of strings belonging to a token. A lexeme is
matched against a pattern to generate a token.
• Token: The name that describes a lexeme in the source program. It is generated when
a lexeme matches a pattern.
Examples:
Lexeme: A1, Sum, Total
• Pattern: A letter followed by any number of letters or digits, excluding keywords
• Token: ID
Lexeme: If | Then | Else
• Pattern: If | Then | Else
• Token: IF | THEN | ELSE
Lexeme: 123.45
• Pattern: One or more digits, followed by an optional fraction and/or an optional
exponent
• Token: NUM
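The three example patterns can be written as regular expressions. This sketch assumes keywords are matched exactly as spelled in the notes (If, Then, Else) and checks them before the identifier rule, since every keyword also fits the ID pattern; classify is a hypothetical helper.

```python
import re

# Patterns from the examples above, written as regular expressions.
KEYWORDS = {"If": "IF", "Then": "THEN", "Else": "ELSE"}          # spelled as in the notes
ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]*")                  # letter, then letters/digits
NUM_PATTERN = re.compile(r"[0-9]+(\.[0-9]+)?([Ee][+-]?[0-9]+)?")  # digits, fraction?, exponent?


def classify(lexeme):
    """Return the token name for one lexeme, or None if no pattern matches."""
    if lexeme in KEYWORDS:           # keywords also fit ID_PATTERN, so test them first
        return KEYWORDS[lexeme]
    if ID_PATTERN.fullmatch(lexeme):
        return "ID"
    if NUM_PATTERN.fullmatch(lexeme):
        return "NUM"
    return None


for lexeme in ["A1", "Sum", "Total", "If", "Then", "Else", "123.45", "1A"]:
    print(lexeme, classify(lexeme))
```

A lexeme such as 1A matches neither pattern as a whole, so classify returns None; a real lexer would instead split it or report a lexical error.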
Counting the Number of Tokens:
A token is usually described by an integer representing the kind of token, possibly
together with an attribute representing the value of the token. For example, most
programming languages have the following kinds of tokens:
• Identifiers (x, y, average, etc.)
• Reserved or keywords (if, else, while, etc.)
• Integer constants (42, 0xFF, 0177, etc.)
• Floating point constants (5.6, 3.6e8, etc.)
• String constants ("hello there\n", etc.)
• Character constants ('a', 'b', etc.)
• Special symbols (such as ( ) : := + -, etc.)
• Comments (To be ignored.)
• Compiler directives (Directives to include files, define macros, etc.)
• Line information (We might need to detect newline characters as tokens, if 
they are syntactically important. We must also increment the line count, so 
that we can indicate the line number for error messages.)
• White space (Blanks and tabs that are used to separate tokens, but are 
otherwise not important).
• End of file
As far as the parser is concerned, each reserved word or special symbol is a different
kind of token, and each kind is distinguished by its own integer code.
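Integer token codes can be sketched with Python's IntEnum. The particular kinds and numeric values below are arbitrary assumptions; the point is only that every reserved word and special symbol gets its own code, and an attribute carries a value where one is needed. kind_of is a hypothetical helper.

```python
from enum import IntEnum


class TokenKind(IntEnum):
    # The kinds and numbers are arbitrary; each just needs a distinct integer.
    ID = 1
    INT_CONST = 2
    IF = 10
    ELSE = 11
    WHILE = 12
    LPAREN = 20
    RPAREN = 21
    PLUS = 22
    MINUS = 23
    ASSIGN = 24
    EOF = 99


RESERVED = {"if": TokenKind.IF, "else": TokenKind.ELSE, "while": TokenKind.WHILE}
SYMBOLS = {"(": TokenKind.LPAREN, ")": TokenKind.RPAREN, "+": TokenKind.PLUS,
           "-": TokenKind.MINUS, ":=": TokenKind.ASSIGN}


def kind_of(lexeme):
    """Map a lexeme to (kind, attribute); the attribute carries a value only where needed."""
    if lexeme in RESERVED:
        return (RESERVED[lexeme], None)
    if lexeme in SYMBOLS:
        return (SYMBOLS[lexeme], None)
    if lexeme.isascii() and lexeme.isdigit():
        return (TokenKind.INT_CONST, int(lexeme))
    return (TokenKind.ID, lexeme)  # anything else is treated as an identifier here


for lx in ["while", "(", "count", ":=", "42"]:
    kind, attr = kind_of(lx)
    print(lx, int(kind), kind.name, attr)
```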
Example:
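The number of tokens in the following C statement is ___
/* abc */ printf("what's up %d", ++&&***a)
Solution:
• /* abc */ is a comment, so it is ignored by the lexical analyzer.
• "what's up %d" is a string literal, so the entire string is 1 token.
• ++ and && are single tokens, while *** is three separate * tokens.
• Tokens: printf ( "what's up %d" , ++ && * * * a )
Hence the token count is 11. (If the statement ends with a semicolon, the ; is a 12th token.)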
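The count can be checked with a small regex tokenizer over a subset of C (an assumption; real C defines many more token kinds). Python's re tries alternatives in order, so ++ and && are listed before the single-character operators to emulate the longest-match rule, while *** still splits into three * tokens because ** is not a C operator. count_tokens is a hypothetical helper.

```python
import re

# A small subset of C token patterns (an assumption; real C defines many more).
# Alternatives are tried in order, so multi-character operators come first.
TOKEN_RE = re.compile(
    r"""
      (?P<comment>/\*.*?\*/)                 # block comment: not a token
    | (?P<ws>\s+)                            # white space: not a token
    | (?P<string>"(?:\\.|[^"\\])*")          # string literal: one token
    | (?P<ident>[A-Za-z_]\w*)                # identifier or keyword
    | (?P<number>[0-9]+)                     # integer constant
    | (?P<op>\+\+|--|&&|\|\||[-+*/%&|!=<>(),;{}\[\]])  # operators and punctuation
    """,
    re.VERBOSE | re.DOTALL,
)


def count_tokens(code):
    """Count the tokens in code; comments and white space are skipped, not counted."""
    count, pos = 0, 0
    while pos < len(code):
        m = TOKEN_RE.match(code, pos)
        if m is None:
            raise ValueError(f"lexical error at position {pos}: {code[pos]!r}")
        if m.lastgroup not in ("comment", "ws"):
            count += 1
        pos = m.end()
    return count


STATEMENT = '/* abc */ printf("what\'s up %d", ++&&***a)'
print(count_tokens(STATEMENT))  # 11
```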