Lexical Analysis | Compiler Design - Computer Science Engineering (CSE)

Introduction to Lexical Analysis

  • Lexical analysis is the first phase of the compiler; the component that performs it is called a scanner.
  • It converts the high-level input program into a sequence of tokens.
  • It is typically implemented using deterministic finite automata (DFA).
  • The output is a sequence of tokens that is sent to the parser for syntax analysis.

Token Definition

A lexical token is a sequence of characters treated as a unit in the grammar of programming languages.

Tokens: These include type tokens (id, number, real), punctuation tokens (IF, void, return), and the following broad categories:

Keywords: e.g. for, while, if.
Identifiers: e.g. variable names, function names.
Operators: e.g. '+', '++', '-'.
Separators: e.g. ',', ';'.

Non-tokens: These include comments, preprocessor directives, macros, blanks, tabs, and newlines.

Lexeme: A lexeme is the actual sequence of characters in the source program that matches the pattern for a token (e.g., "float", "abs_zero_Kelvin", "=", "-", "273", ";").

Stages of Lexical Analysis

  • Input Preprocessing: Clean up the input by removing comments, whitespace, and other non-essential characters.
  • Tokenization: Break the input into tokens using patterns or regular expressions.
  • Token Classification: Determine the type of each token (keyword, identifier, operator, etc.).
  • Token Validation: Check each token's validity according to the language's rules.
  • Output Generation: Generate the final list of tokens for the next stage of compilation or interpretation.
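The stages above can be sketched with a small regex-based scanner. This is a minimal illustration, not a production C lexer; the token classes, patterns, and keyword set are simplified assumptions:

```python
import re

# Minimal scanner sketch: patterns and token classes are illustrative.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),            # integer literals
    ("ID",     r"[A-Za-z_]\w*"),   # identifiers (keywords reclassified below)
    ("OP",     r"\+\+|[+\-*/=]"),  # operators
    ("SEP",    r"[(),;{}]"),       # separators / punctuation
    ("SKIP",   r"\s+|//[^\n]*"),   # non-tokens: whitespace and // comments
]
KEYWORDS = {"int", "float", "for", "while", "if", "return", "void"}

def tokenize(code):
    pattern = "|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC)
    tokens = []
    for m in re.finditer(pattern, code):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "SKIP":
            continue                      # input preprocessing: drop non-tokens
        if kind == "ID" and lexeme in KEYWORDS:
            kind = "KEYWORD"              # token classification
        tokens.append((kind, lexeme))     # output generation
    return tokens

print(tokenize("int a = 10; // comment"))
# [('KEYWORD', 'int'), ('ID', 'a'), ('OP', '='), ('NUMBER', '10'), ('SEP', ';')]
```

Each matched lexeme is emitted as a (type, lexeme) pair, and the comment is silently discarded because it is a non-token.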

Lexical Analyzer in Action

  • Identifies lexical errors with the help of automata and the language grammar, reporting the row and column number of each error.
  • Generates the token sequence, e.g., for the statement "a = b + c;" it produces "id=id+id;", where each id refers to the variable's entry in the symbol table.
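That mapping can be sketched as follows (the function name abstract_stream and the token-tuple shape are illustrative assumptions):

```python
# Sketch: replace each identifier with the abstract token "id" and record it
# in a symbol table, as in "a = b + c;" -> "id=id+id;".
def abstract_stream(tokens):
    symtab, out = {}, []
    for kind, lexeme in tokens:
        if kind == "ID":
            symtab.setdefault(lexeme, len(symtab) + 1)  # index of the entry
            out.append("id")
        else:
            out.append(lexeme)
    return "".join(out), symtab

stream, table = abstract_stream(
    [("ID", "a"), ("OP", "="), ("ID", "b"), ("OP", "+"), ("ID", "c"), ("SEP", ";")]
)
print(stream)  # id=id+id;
print(table)   # {'a': 1, 'b': 2, 'c': 3}
```

The parser sees only the abstract stream; the concrete names survive in the symbol table.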

For example, consider the program

int main()
{
    // 2 variables
    int a, b;
    a = 10;
    return 0;
}

All the valid tokens are:

'int'  'main'  '('  ')'  '{'  'int'  'a'  ','  'b'  ';'
'a'  '='  '10'  ';'  'return'  '0'  ';'  '}'

  • That is 18 valid tokens in total.
  • Note that the comment has been omitted: comments are non-tokens.
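The count can be checked mechanically with a rough sketch (the pattern below is a simplification, not a full C lexer): strip the comment first, then match words, numbers, and single punctuation characters.

```python
import re

# Rough token count for the example program above.
code = """int main()
{
    // 2 variables
    int a, b;
    a = 10;
    return 0;
}"""
code = re.sub(r"//[^\n]*", "", code)        # comments are non-tokens
tokens = re.findall(r"\w+|[^\s\w]", code)   # words, numbers, punctuation
print(len(tokens))  # 18
```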
As another example, consider a printf statement such as printf("Hello");.

There are 5 valid tokens in this printf statement: printf, (, the string literal (which counts as a single token), ), and ;.
Exercise 1:
Count the number of tokens in:

int main()
{
    int a = 10, b = 20;
    printf("sum is :%d", a + b);
    return 0;
}
Answer: Total number of tokens: 27.
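This answer can also be checked with a rough sketch, assuming a string literal counts as one token and whitespace is skipped (not a full C lexer):

```python
import re

# Rough token count for Exercise 1; the string literal matches as one token.
code = """int main()
{
    int a = 10, b = 20;
    printf("sum is :%d", a + b);
    return 0;
}"""
tokens = re.findall(r'"[^"]*"|\w+|[^\s\w]', code)
print(len(tokens))  # 27
```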


Exercise 2: Count the number of tokens in: int max(int i);
Answer: The lexical analyzer reads int and accepts it as a valid token; max is then read and accepted as a valid function name; after that, (, int, i, ), and ; are each accepted as tokens.
Hence, the total number of tokens is 7:
int, max, (, int, i, ), ;

Representation of Tokens

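In most compiler texts, each token is passed to the parser as a pair <token-type, attribute-value>, where an identifier's attribute is a pointer (here, an index) into the symbol table and a number's attribute is its value. A minimal sketch with illustrative names:

```python
# "a = b + 2" represented as <token-type, attribute> pairs.
symtab = ["a", "b"]      # symbol-table entries
tokens = [
    ("id", 0),           # <id, index of "a">
    ("assign", None),    # <assign, ->
    ("id", 1),           # <id, index of "b">
    ("plus", None),      # <plus, ->
    ("number", 2),       # <number, value 2>
]
# Recover the readable form by resolving identifier attributes:
print([symtab[attr] if kind == "id" else kind for kind, attr in tokens])
# ['a', 'assign', 'b', 'plus', 'number']
```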


Advantages

  • Efficiency: Improves parsing efficiency by breaking down input into smaller, manageable chunks.
  • Flexibility: Allows the use of keywords and reserved words, aiding language creation and modification.
  • Error Detection: Helps surface errors early, such as invalid characters and malformed tokens (later phases catch missing semicolons and undefined variables).
  • Code Optimization: Identifies patterns for code optimization, improving program performance.

Disadvantages

  • Complexity: Can be complex, requiring significant computational power, making it challenging to implement in some languages.
  • Limited Error Detection: Does not detect all errors, such as logic errors or type errors.
  • Increased Code Size: Addition of keywords and reserved words may increase code size, affecting readability.
  • Reduced Flexibility: Use of keywords and reserved words may reduce language flexibility.

The document Lexical Analysis | Compiler Design - Computer Science Engineering (CSE) is a part of the Computer Science Engineering (CSE) Course Compiler Design.
FAQs on Lexical Analysis - Compiler Design - Computer Science Engineering (CSE)

1. What is the main purpose of lexical analysis in computer science engineering?
Ans. Lexical analysis, also known as scanning, is the process of converting a sequence of characters into a sequence of tokens. Its main purpose is to recognize the basic elements of a programming language, such as keywords, identifiers, literals, and operators.
2. How does lexical analysis contribute to the overall compiler design process?
Ans. Lexical analysis is the first phase of the compiler design process. It helps in breaking down the input source code into tokens, which are then used by the parser for syntax analysis. This division of labor between lexical analysis and syntax analysis simplifies the overall compiler design.
3. What are some common challenges faced during lexical analysis?
Ans. Some common challenges faced during lexical analysis include handling white spaces, comments, and special characters, as well as identifying and differentiating between similar-looking tokens. Lexical analysis also needs to consider the language's reserved words and identifiers.
4. How does lexical analysis differ from syntax analysis in compiler design?
Ans. Lexical analysis focuses on recognizing and categorizing the basic elements of a programming language, such as keywords and identifiers, while syntax analysis focuses on the grammar rules and the structure of the language. Lexical analysis precedes syntax analysis in the compiler design process.
5. Can lexical analysis be performed manually, or is it usually automated in modern compilers?
Ans. While lexical analysis can be done manually, it is usually automated in modern compilers using tools like lex or Flex. Automating lexical analysis helps in improving efficiency, accuracy, and maintainability of the compiler design process.