Lexical Analysis | Compiler Design - Computer Science Engineering (CSE) PDF Download

Introduction to Lexical Analysis

  • Lexical Analysis is the initial phase of the compiler, also known as a scanner.
  • It converts high-level input programs into a sequence of Tokens.
  • Implemented using Deterministic Finite Automata.
  • The output is a sequence of tokens sent to the parser for syntax analysis.

Token Definition

A lexical token is a sequence of characters treated as a unit in the grammar of programming languages.

Tokens : It  include type tokens (id, number, real), punctuation tokens (IF, void, return), alphabetic tokens (keywords like for, while, if), identifiers (variable name, function name), operators (+, ++, -), and separators (, ;).

Keywords; Examples-for, while, if etc.
Identifier; Examples-Variable name, function name etc.
Operators; Examples '+', '++', '-' etc.
Separators; Examples ',' ';' etc

Non-tokens:  It include comments, preprocessor directives, macros, blanks, tabs, newline, etc.

Lexeme: A lexeme is the sequence of characters forming a token or a single token's input sequence (e.g., "float", "abs_zero_Kelvin", "=", "-", "273", ";").

  • Input Preprocessing: Clean up input by removing comments, whitespace, and non-essential characters.
  • Tokenization: Break input into tokens using patterns or regular expressions.
  • Token Classification: Determine the type of each token (keywords, identifiers, operators).
  • Token Validation: Check each token's validity according to language rules.
  • Output Generation: Generate the final list of tokens for the next compilation or interpretation stage.Lexical Analysis | Compiler Design - Computer Science Engineering (CSE)

Lexical Analyzer in Action

  • Identifies errors with the help of automation and language grammar, providing row and column numbers.
  • Generates token sequences, e.g., for the statement "a = b + c;" it creates "id=id+id;" where each id refers to a variable in the symbol table.

For example, consider the program

int main()
{
// 2 variables
int a, b;
a = 10;
return 0;
}

All the valid tokens are:

'int'  'main'  '('  ')'  '{'  '}'  'int'  'a'  'b'  ';'
'a' '=' '10' ';' 'return' '0' ';' '}'
  • Above are the valid tokens.
  • You can observe that we have omitted comments.
  • As another example, consider below printf statement.

Lexical Analysis | Compiler Design - Computer Science Engineering (CSE)

There are 5 valid token in this printf statement.
Exercise 1:
Count number of tokens :

int main()
{
int a = 10, b = 20;
printf("sum is :%d",a+b);
return 0;
}
Answer: Total number of token: 27.


Exercise: Count number of tokens: int max(int i);
Answer: Lexical analyzer first read int and finds it to be valid and accepts as token.max is read by it and found to be a valid function name after reading (int is also a token , then again I as another token and finally.
Hence, Total number of tokens 7:     
int, max, ( ,int, i, ), ;

Representation of Tokens

Lexical Analysis | Compiler Design - Computer Science Engineering (CSE)

Advantages

  • Efficiency: Improves parsing efficiency by breaking down input into smaller, manageable chunks.
  • Flexibility: Allows the use of keywords and reserved words, aiding language creation and modification.
  • Error Detection: Detects errors like misspellings, missing semicolons, and undefined variables.
  • Code Optimization: Identifies patterns for code optimization, improving program performance.

Disadvantages

  • Complexity: Can be complex, requiring significant computational power, making it challenging to implement in some languages.
  • Limited Error Detection: Does not detect all errors, such as logic errors or type errors.
  • Increased Code Size: Addition of keywords and reserved words may increase code size, affecting readability.
  • Reduced Flexibility: Use of keywords and reserved words may reduce language flexibility.

The document Lexical Analysis | Compiler Design - Computer Science Engineering (CSE) is a part of the Computer Science Engineering (CSE) Course Compiler Design.
All you need of Computer Science Engineering (CSE) at this link: Computer Science Engineering (CSE)
26 videos|66 docs|30 tests

Top Courses for Computer Science Engineering (CSE)

FAQs on Lexical Analysis - Compiler Design - Computer Science Engineering (CSE)

1. What is the main purpose of lexical analysis in computer science engineering?
Ans. Lexical analysis, also known as scanning, is the process of converting a sequence of characters into a sequence of tokens. Its main purpose is to recognize the basic elements of a programming language, such as keywords, identifiers, literals, and operators.
2. How does lexical analysis contribute to the overall compiler design process?
Ans. Lexical analysis is the first phase of the compiler design process. It helps in breaking down the input source code into tokens, which are then used by the parser for syntax analysis. This division of labor between lexical analysis and syntax analysis simplifies the overall compiler design.
3. What are some common challenges faced during lexical analysis?
Ans. Some common challenges faced during lexical analysis include handling white spaces, comments, and special characters, as well as identifying and differentiating between similar-looking tokens. Lexical analysis also needs to consider the language's reserved words and identifiers.
4. How does lexical analysis differ from syntax analysis in compiler design?
Ans. Lexical analysis focuses on recognizing and categorizing the basic elements of a programming language, such as keywords and identifiers, while syntax analysis focuses on the grammar rules and the structure of the language. Lexical analysis precedes syntax analysis in the compiler design process.
5. Can lexical analysis be performed manually, or is it usually automated in modern compilers?
Ans. While lexical analysis can be done manually, it is usually automated in modern compilers using tools like lex or Flex. Automating lexical analysis helps in improving efficiency, accuracy, and maintainability of the compiler design process.
26 videos|66 docs|30 tests
Download as PDF
Explore Courses for Computer Science Engineering (CSE) exam

Top Courses for Computer Science Engineering (CSE)

Signup for Free!
Signup to see your scores go up within 7 days! Learn & Practice with 1000+ FREE Notes, Videos & Tests.
10M+ students study on EduRev
Related Searches

Lexical Analysis | Compiler Design - Computer Science Engineering (CSE)

,

Lexical Analysis | Compiler Design - Computer Science Engineering (CSE)

,

Free

,

practice quizzes

,

pdf

,

Extra Questions

,

Previous Year Questions with Solutions

,

Exam

,

Sample Paper

,

Objective type Questions

,

ppt

,

past year papers

,

Lexical Analysis | Compiler Design - Computer Science Engineering (CSE)

,

shortcuts and tricks

,

Semester Notes

,

Summary

,

Viva Questions

,

MCQs

,

Important questions

,

mock tests for examination

,

study material

,

video lectures

;