
LEXICAL ANALYSIS

A simple way to build a lexical analyzer is to construct a diagram that illustrates the structure of the tokens of the source language, and then to hand-translate the diagram into a program for finding tokens. Efficient lexical analyzers can be produced in this manner.

Role of the Lexical Analyzer

The lexical analyzer is the first phase of a compiler. Its main task is to read the input characters and produce as output a sequence of tokens that the parser uses for syntax analysis. Upon receiving a "get next token" command from the parser, the lexical analyzer reads input characters until it can identify the next token.


Since the lexical analyzer is the part of the compiler that reads the source text, it may also perform certain secondary tasks at the user interface. One such task is stripping out of the source program comments and white space in the form of blank, tab, and newline characters. Another is correlating error messages from the compiler with the source program.
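The interaction described above can be sketched in code. The token names, patterns, and keyword set below are illustrative assumptions, not taken from the text; the point is that the parser pulls one token at a time while comments and white space are stripped and never reach it.

```python
import re

# Illustrative token classes and patterns (assumed for this sketch).
KEYWORDS = {"const", "if", "then", "else"}
TOKEN_SPEC = [
    ("NUM",     r"\d+(?:\.\d+)?"),     # unsigned integer or real constant
    ("ID",      r"[A-Za-z_]\w*"),      # identifier (reclassified as KEYWORD below)
    ("ASSIGN",  r"="),
    ("SEMI",    r";"),
    ("COMMENT", r"\{[^}]*\}"),         # Pascal-style comment: stripped
    ("SKIP",    r"\s+"),               # blanks, tabs, newlines: stripped
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def get_tokens(source):
    """Yield (token, lexeme) pairs on demand, as a parser would request them."""
    for m in MASTER.finditer(source):
        kind, lexeme = m.lastgroup, m.group()
        if kind in ("SKIP", "COMMENT"):
            continue                   # secondary task: strip these out
        if kind == "ID" and lexeme in KEYWORDS:
            kind = "KEYWORD"
        yield kind, lexeme

print(list(get_tokens("const pi = 3.1416; { a comment }")))
# [('KEYWORD', 'const'), ('ID', 'pi'), ('ASSIGN', '='), ('NUM', '3.1416'), ('SEMI', ';')]
```

Each call for the next token simply advances the generator, which mirrors the "get next token" command from the parser.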

 

Issues in Lexical Analysis

There are several reasons for separating the analysis phase of compiling into lexical analysis and parsing:

1)  Simpler design is the most important consideration. The separation of lexical analysis from syntax analysis often allows us to simplify one or the other of these phases.

2) Compiler efficiency is improved.

3) Compiler portability is enhanced.

 

Tokens, Patterns, and Lexemes

There is a set of strings in the input for which the same token is produced as output. This set of strings is described by a rule called a pattern associated with the token. The pattern is said to match each string in the set.

In most programming languages, the following constructs are treated as tokens: keywords, operators, identifiers, constants, literal strings, and punctuation symbols such as parentheses, commas, and semicolons.

Lexeme

A lexeme is a sequence of characters in the source program that is matched by the pattern for a token. For example, in the Pascal statement const pi = 3.1416; the substring pi is a lexeme for the token identifier.

 

Patterns

A pattern is a rule describing the set of lexemes that can represent a particular token in the source program. The pattern for the token const is just the single string const that spells out the keyword.
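Patterns are conveniently written as regular expressions. The expressions below are illustrative assumptions for a few Pascal-like tokens; each one describes the whole set of lexemes its token can take, while the keyword pattern matches exactly one string.

```python
import re

# Assumed patterns for three token classes (not taken from the text).
PATTERNS = {
    "const": r"const",                  # keyword: the single string "const"
    "id":    r"[A-Za-z][A-Za-z0-9]*",   # a letter followed by letters or digits
    "num":   r"\d+(?:\.\d+)?",          # unsigned integer or real constant
}

def matches(token, lexeme):
    """True if the whole lexeme is matched by the token's pattern."""
    return re.fullmatch(PATTERNS[token], lexeme) is not None

print(matches("id", "pi"))       # True: "pi" is a lexeme for the token id
print(matches("num", "3.1416"))  # True
print(matches("const", "pi"))    # False: only the string "const" matches
```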


Certain language conventions impact the difficulty of lexical analysis. Languages such as FORTRAN require certain constructs to appear in fixed positions on the input line. Thus the alignment of a lexeme may be important in determining the correctness of a source program.

 

Attributes of Tokens

The lexical analyzer returns to the parser a representation for the token it has found. The representation is an integer code if the token is a simple construct such as a left parenthesis, comma, or colon. The representation is a pair consisting of an integer code and a pointer to a table if the token is a more complex element such as an identifier or constant.

The integer code gives the token type, and the pointer points to the value of that token. Pairs are also returned whenever we wish to distinguish between instances of a token.

The attributes influence the translation of tokens.

i) Constants: the value of the constant.

ii) Identifiers: a pointer to the corresponding symbol table entry.
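The two attribute kinds above can be sketched as follows. The integer codes and the symbol-table layout are assumptions for illustration; the point is that simple tokens need only a code, while identifiers and constants carry a second component.

```python
# Assumed integer token codes (any distinct integers would do).
LPAREN, COMMA, ID, NUM = 1, 2, 3, 4

symbol_table = []  # one entry per distinct identifier

def id_token(name):
    """Return (code, pointer) for an identifier, installing it if new."""
    if name not in symbol_table:
        symbol_table.append(name)
    return (ID, symbol_table.index(name))

def num_token(text):
    """Return (code, value) for a numeric constant: the value is the attribute."""
    return (NUM, float(text))

# A simple token like a comma is just its integer code: COMMA == 2.
# Two instances of the token ID are distinguished by their attributes:
print(id_token("pi"))       # (3, 0)
print(id_token("r"))        # (3, 1)
print(num_token("3.1416"))  # (4, 3.1416)
```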

 

Error Recovery Strategies in Lexical Analysis

The following are the error-recovery actions in lexical analysis:

1) Deleting an extraneous character.

2) Inserting a missing character.

3) Replacing an incorrect character by a correct character.

4) Transposing two adjacent characters.

5) Panic mode recovery: deleting successive characters from the remaining input until the error is resolved.
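Panic mode (strategy 5) can be sketched as follows. The token pattern and the scanning loop are assumptions for illustration: when no pattern matches at the current position, the lexer deletes successive characters until scanning can resume.

```python
import re

# Assumed pattern covering identifiers, numbers, and a few operators.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|[=;+*()-]")

def scan_with_recovery(source):
    """Scan the input; on an unmatchable character, delete it and continue."""
    tokens, deleted, i = [], [], 0
    while i < len(source):
        if source[i].isspace():
            i += 1
            continue
        m = TOKEN.match(source, i)
        if m:
            tokens.append(m.group())
            i = m.end()
        else:
            deleted.append(source[i])  # panic mode: drop the offending character
            i += 1                     # and try again from the next one
    return tokens, deleted

toks, errs = scan_with_recovery("x = 3 @# + y;")
print(toks)  # ['x', '=', '3', '+', 'y', ';']
print(errs)  # ['@', '#']
```

A production lexer would also report each deleted character with its position so the errors can be correlated with the source program.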


FAQs on Lexical Analysis

1. What is lexical analysis in computer science and IT engineering?
Ans. Lexical analysis, also known as scanning, is an important phase in the compilation process of a programming language. It involves breaking down the source code into a stream of tokens, which are the smallest units of meaning in the language. These tokens are then used as input for the subsequent phases of the compiler.
2. What is the role of lexical analysis in computer science and IT engineering?
Ans. The role of lexical analysis is to analyze the source code and convert it into a sequence of tokens. This helps in identifying and categorizing the individual lexemes (such as keywords, identifiers, operators, etc.) in the code. Lexical analysis ensures that the syntax of the code is correct and prepares it for the subsequent stages of the compilation process.
3. How does lexical analysis work in computer science and IT engineering?
Ans. Lexical analysis works by scanning the source code character by character. It uses a set of rules, known as lexical rules or regular expressions, to identify and extract the tokens from the code. These rules define the patterns for different types of tokens, allowing the lexer to recognize keywords, literals, identifiers, and other language constructs.
4. What are some common challenges in lexical analysis for computer science and IT engineering?
Ans. Some common challenges in lexical analysis include handling whitespace and comments, dealing with escape sequences in strings and characters, and resolving ambiguities between similar token patterns. The lexer also needs to efficiently handle large source code files and provide informative error messages in case of lexical errors.
5. Can lexical analysis be performed manually in computer science and IT engineering?
Ans. Yes, lexical analysis can be performed manually, but it is often automated using tools known as lexers or lexical analyzers. These tools generate lexical analyzers based on a set of lexical rules and can handle complex tokenization tasks more efficiently. Manual lexical analysis can be error-prone and time-consuming, especially for large and complex programming languages.