Cheatsheet Compiler Design - RRB JE for Computer Science Engineering -

1. Compiler Phases

1.1 Overview

PhaseDescription
Lexical AnalysisReads character stream, generates tokens
Syntax AnalysisCreates parse tree from tokens using grammar
Semantic AnalysisType checking, scope resolution, semantic validation
Intermediate Code GenerationGenerates intermediate representation (IR)
Code OptimizationImproves IR for efficiency
Code GenerationProduces target machine code

2. Lexical Analysis

2.1 Key Concepts

TermDefinition
LexemeSequence of characters matching a token pattern
Token<token-name, attribute-value> pair
PatternRule describing lexeme formation

2.2 Regular Expressions

OperatorNotation
Unionr|s or r+s
Concatenationrs
Kleene Closurer* (zero or more)
Positive Closurer+ (one or more)
Optionalr? (zero or one)

2.3 Finite Automata

TypeProperties
NFAε-transitions allowed, multiple transitions on same symbol, non-deterministic
DFANo ε-transitions, exactly one transition per symbol, deterministic
  • RE → NFA: Thompson's Construction
  • NFA → DFA: Subset Construction (Powerset Construction)
  • DFA Minimization: State reduction using equivalence classes
  • Number of states in minimized DFA ≤ Number of states in DFA

3. Syntax Analysis (Parsing)

3.1 Context-Free Grammars

ComponentDescription
Terminals (T)Basic symbols (tokens)
Non-terminals (N)Syntactic variables
Start Symbol (S)Initial non-terminal
Productions (P)Rules: A → α

3.2 Derivations

TypeDescription
Leftmost DerivationReplace leftmost non-terminal at each step
Rightmost DerivationReplace rightmost non-terminal at each step
Ambiguous GrammarMultiple parse trees for same string

3.3 Parse Trees vs. Syntax Trees

  • Parse Tree: Shows complete derivation with all grammar symbols
  • Abstract Syntax Tree (AST): Condensed representation, operators as internal nodes

3.4 Top-Down Parsing

3.4.1 Recursive Descent Parsing

  • Each non-terminal has a procedure
  • May require backtracking
  • Predictive parsing: No backtracking with lookahead

3.4.2 FIRST and FOLLOW Sets

SetDefinition
FIRST(α)Set of terminals that begin strings derived from α; includes ε if α ⇒* ε
FOLLOW(A)Set of terminals that can appear immediately after A in a sentential form

3.4.3 LL(1) Parsing

  • Left-to-right scan, Leftmost derivation, 1 lookahead symbol
  • Uses parsing table M[N, T]
  • Table entry M[A, a] = {A → α} if a ∈ FIRST(α) or (ε ∈ FIRST(α) and a ∈ FOLLOW(A))
  • No entry with multiple productions → Grammar is LL(1)
  • LL(1) conditions: No ambiguity, no left recursion, left-factored

3.5 Bottom-Up Parsing

3.5.1 Shift-Reduce Parsing

ActionDescription
ShiftMove next input symbol onto stack
ReduceReplace handle on stack with LHS non-terminal
AcceptParsing complete successfully
ErrorSyntax error detected
  • Handle: Substring matching RHS of production in rightmost derivation
  • Viable Prefix: Prefix of right-sentential form not extending past handle

3.5.2 LR Parsing

  • Left-to-right scan, Rightmost derivation in reverse
  • LR(0), SLR(1), LALR(1), CLR(1) (Canonical LR)
  • Power hierarchy: LR(0) ⊂ SLR(1) ⊂ LALR(1) ⊂ CLR(1) (= LR(1)).

3.5.3 LR(0) Items

  • Item: Production with dot (•) in RHS indicating position
  • A → •XYZ, A → X•YZ, A → XY•Z, A → XYZ•
  • Augmented Grammar: Add S' → S to original grammar

3.5.4 SLR(1) Parsing

  • Construct canonical LR(0) collection of items
  • Reduce by A → α if [A → α•] in state and lookahead ∈ FOLLOW(A)
  • Conflicts: shift-reduce, reduce-reduce

3.5.5 CLR(1) vs LALR(1)

TypeDescription
CLR(1)Canonical LR, largest parsing table, most powerful
LALR(1)Merges CLR(1) states with same core, smaller table than CLR(1)
  • LALR(1) may introduce shift-reduce or reduce-reduce conflicts not present in CLR(1).
  • LALR(1) may introduce reduce-reduce conflicts not in CLR(1)

3.5.6 Operator Precedence Parsing

  • Uses precedence relations: ⋖ (yields precedence), ≐ (equal precedence), ⋗ (takes precedence)
  • Applicable only to operator grammars (no ε-productions and no two adjacent non-terminals)

4. Syntax-Directed Translation

4.1 Definitions

TermDefinition
Syntax-Directed Definition (SDD)CFG with semantic rules attached to productions
Synthesized AttributeValue computed from child attributes (bottom-up)
Inherited AttributeValue computed from parent/sibling attributes (top-down)

4.2 Attribute Grammar Types

TypeProperties
S-attributedOnly synthesized attributes, evaluated bottom-up
L-attributedInherited attributes depend on left siblings/parent, evaluated in depth-first order
  • All S-attributed grammars are L-attributed
  • L-attributed can be evaluated during LL parsing

5. Intermediate Code Generation

5.1 Intermediate Representations

TypeDescription
Three-Address Code (TAC)At most three operands per instruction: x = y op z
Quadruples(operator, arg1, arg2, result)
Triples(operator, arg1, arg2) with implicit result position
Indirect TriplesList of pointers to triples

5.2 Three-Address Code Instructions

  • Assignment: x = y op z, x = op y, x = y
  • Copy: x = y
  • Unconditional jump: goto L
  • Conditional jump: if x relop y goto L
  • Procedure call: call p, n; return y
  • Indexed assignment: x = y[i], x[i] = y
  • Address operations: x = &y, x = *y, *x = y

5.3 Translation of Expressions

  • Postfix notation: No parentheses needed, operands before operators
  • Infix to Postfix: Use operator stack with precedence rules
  • DAG (Directed Acyclic Graph): Identifies common subexpressions

5.4 Boolean Expressions

  • Numerical representation: true = 1, false = 0
  • Short-circuit evaluation: Conditional jumps, more efficient
  • Backpatching: Fill in addresses of jumps during one-pass compilation

5.5 Control Flow Translation

  • If-then-else: Generate labels for true, false, and next
  • While loop: Test at beginning, jump back to test
  • For loop: Initialization, test, increment, body
  • Switch-case: Jump table or sequence of conditional jumps

6. Code Optimization

6.1 Types of Optimization

LevelScope
LocalWithin basic block
GlobalWithin function (across basic blocks)
Inter-proceduralAcross functions

6.2 Basic Block

  • Sequence of consecutive statements with single entry and single exit
  • Leaders: First statement, target of jump, statement after jump
  • Control Flow Graph (CFG): Nodes are basic blocks, edges are control flow

6.3 Local Optimizations

TechniqueDescription
Common Subexpression EliminationReuse previously computed values
Constant FoldingEvaluate constant expressions at compile time
Copy PropagationReplace variable with its value after assignment
Dead Code EliminationRemove unreachable or unused code
Algebraic Simplificationx + 0 = x, x * 1 = x, x * 0 = 0
Strength ReductionReplace expensive operations: 2 * x → x + x, x² → x * x

6.4 Global Optimizations

6.4.1 Data Flow Analysis

AnalysisPurpose
Reaching DefinitionsWhich definitions reach a point
Live Variable AnalysisWhich variables are live at a point
Available ExpressionsWhich expressions are available at a point
Very Busy ExpressionsExpressions computed on all paths

6.4.2 Reaching Definitions

  • Forward analysis, union over all paths
  • gen[B]: Definitions generated in B
  • kill[B]: Definitions killed in B
  • out[B] = gen[B] ∪ (in[B] - kill[B])
  • in[B] = ∪ out[P] for all predecessors P

6.4.3 Live Variable Analysis

  • Backward analysis, union over all paths
  • use[B]: Variables used before definition in B
  • def[B]: Variables defined in B
  • in[B] = use[B] ∪ (out[B] - def[B])
  • out[B] = ∪ in[S] for all successors S

6.4.4 Available Expressions

  • Forward analysis, intersection over all paths
  • e_gen[B]: Expressions computed in B
  • e_kill[B]: Expressions killed in B
  • out[B] = e_gen[B] ∪ (in[B] - e_kill[B])
  • in[B] = ∩ out[P] for all predecessors P

6.5 Loop Optimizations

TechniqueDescription
Loop Invariant Code MotionMove computations outside loop if result unchanged
Induction Variable EliminationRemove variables that are linear functions of loop counter
Loop UnrollingReplicate loop body to reduce overhead
Loop FusionCombine adjacent loops with same bounds
  • Dominator: Node d dominates n if every path to n goes through d
  • Natural Loop: Single entry (header), back edge from node to dominating header

7. Runtime Environments

7.1 Storage Organization

RegionContents
CodeExecutable instructions
Static DataGlobal variables, constants
StackActivation records (grows downward)
HeapDynamic memory (grows upward)

7.2 Activation Records

  • Components: Return address, arguments, local variables, saved registers, control link, access link
  • Frame pointer (FP): Points to fixed location in activation record
  • Stack pointer (SP): Points to top of stack

7.3 Parameter Passing

MethodDescription
Call by ValueCopy of argument passed, changes not reflected
Call by ReferenceAddress passed, changes reflected in caller
Call by Copy-RestoreCopy in, compute, copy out
Call by NameTextual substitution with late binding

7.4 Access Links

  • Static scope: Access link points to activation record of lexically enclosing procedure
  • Nesting depth: Level of nesting in program structure
  • Follow access links to find non-local variables

7.5 Heap Management

TechniqueDescription
First FitAllocate first block large enough
Best FitAllocate smallest sufficient block
Worst FitAllocate largest available block

7.6 Garbage Collection

AlgorithmMethod
Reference CountingTrack number of references to each object
Mark and SweepMark reachable objects, sweep unmarked
Copy CollectionCopy live objects to new space
GenerationalSeparate young and old objects

8. Code Generation

8.1 Issues in Code Generation

  • Instruction selection: Choose appropriate machine instructions
  • Register allocation: Assign variables to registers
  • Instruction ordering: Optimize instruction sequence

8.2 Register Allocation

8.2.1 Register Descriptors

  • Track which variables are in which registers
  • Address descriptors: Track locations of variable values

8.2.2 Graph Coloring

  • Interference graph: Nodes are variables, edges connect simultaneously live variables
  • k-colorable if chromatic number ≤ k (k = number of registers)
  • Spilling: Store variable in memory when insufficient registers

8.3 Instruction Selection

  • Tile IR with machine instruction patterns
  • Tree pattern matching for expression evaluation
  • Consider cost of different instruction sequences

8.4 Peephole Optimization

TechniqueExample
Redundant Load/StoreMOV R, a; MOV a, R → MOV R, a
Unreachable CodeRemove code after unconditional jump
Control Flowgoto L1; L1: goto L2 → goto L2
Strength Reductionx = x * 2 → x = x + x

9. Error Handling

9.1 Error Types

TypeDetected By
LexicalScanner (e.g., illegal character)
SyntaxParser (e.g., missing semicolon)
SemanticSemantic analyzer (e.g., type mismatch)
LogicalRuntime or testing

9.2 Error Recovery Strategies

StrategyDescription
Panic ModeDiscard tokens until synchronizing token found
Phrase LevelLocal correction (insert/delete/replace token)
Error ProductionsAdd productions for common errors to grammar
Global CorrectionFind minimal sequence of changes (expensive)

10. Important Formulas and Facts

10.1 Time Complexity

  • Lexical analysis: O(n) where n = input length
  • LL(1) parsing: O(n)
  • LR parsing: O(n)
  • NFA to DFA conversion: O(2ⁿ) worst case where n = NFA states
  • DFA minimization: O(n log n) using Hopcroft's algorithm

10.2 Grammar Properties

  • Every regular language has a CFG
  • Not all CFLs are regular
  • Ambiguous grammar cannot be LL(1) or LR(k)
  • Left-recursive grammar cannot be LL(1)
  • Every LL(1) grammar is unambiguous
  • Every LR(1) grammar is unambiguous

10.3 Parsing Power

  • LL(1) ⊂ LR(0) ⊂ SLR(1) ⊂ LALR(1) ⊂ CLR(1)
  • Number of LR(0) states ≤ Number of CLR(1) states
  • Number of SLR(1) states = Number of LR(0) states
  • Number of LALR(1) states = Number of SLR(1) states

10.4 Optimization

  • Basic block optimization: Always safe, preserves semantics
  • Loop optimization: High impact on performance
  • Data flow equations: Iterated until fixed point reached
  • Optimizations must preserve program semantics
The document Cheatsheet: Compiler Design is a part of the Computer Science Engineering (CSE) Course RRB JE for Computer Science Engineering.
All you need of Computer Science Engineering (CSE) at this link: Computer Science Engineering (CSE)

FAQs on Cheatsheet: Compiler Design

1. What are the main phases of a compiler?
Ans. The main phases of a compiler include Lexical Analysis, Syntax Analysis (Parsing), Syntax-Directed Translation, Intermediate Code Generation, Code Optimization, Runtime Environments, and Code Generation. Each phase plays a crucial role in transforming high-level source code into machine code.
2. What is the purpose of Lexical Analysis in a compiler?
Ans. Lexical Analysis serves to read the source code and convert it into tokens, which are the basic building blocks used in the subsequent phases of compilation. It identifies keywords, operators, identifiers, and other components, ensuring that the code is syntactically correct before passing it to the Syntax Analysis phase.
3. How does Syntax Analysis contribute to the compilation process?
Ans. Syntax Analysis, or Parsing, checks the sequence of tokens generated by the Lexical Analysis phase to ensure they conform to the grammatical rules of the programming language. It constructs a parse tree or abstract syntax tree, which represents the hierarchical structure of the source code, facilitating further processing in later phases.
4. What is Code Optimization and why is it important?
Ans. Code Optimization is the phase of compilation that improves the efficiency of the intermediate code generated. It aims to reduce resource consumption, such as execution time and memory usage, by applying various techniques, such as removing redundant code and improving data access patterns, ultimately leading to better performance of the final executable.
5. What role does Error Handling play in compiler design?
Ans. Error Handling is crucial in compiler design as it detects and manages errors that occur during various compilation phases. It provides feedback to the programmer about syntax errors, semantic errors, and runtime errors, allowing for debugging and correction. Effective error handling enhances the usability of the compiler by providing informative messages and facilitating a smoother development process.
Explore Courses for Computer Science Engineering (CSE) exam
Get EduRev Notes directly in your Google search
Related Searches
Cheatsheet: Compiler Design, shortcuts and tricks, Cheatsheet: Compiler Design, MCQs, Viva Questions, Previous Year Questions with Solutions, Sample Paper, study material, Semester Notes, Exam, Summary, Cheatsheet: Compiler Design, practice quizzes, Important questions, mock tests for examination, Objective type Questions, video lectures, past year papers, Extra Questions, Free, pdf , ppt;