Introduction to Compiler Construction Notes | EduRev

: Introduction to Compiler Construction Notes | EduRev

 Page 1


9/26/2008
1
Introduction to Compiler Construction
David Notkin
Autumn 2008
Source
Program
[Higher-Level
Programming
Language]
Compiler
Target
Program
[Lower-Level
Language/
Architecture]
CSE401
“Compiler”: from the web
• The Oxford English Dictionary (OED) indicates that the first 
usage of the term is circa 1330, referring to one who collects 
and puts together materials
– They also note a usage “Diuerse translatours and 
compilaris” from Scotland in 1549
• Most dictionaries give the above definition as well as the 
computing-based definition (which the OED dates to 1953)
– A program that translates programs written in a high-level 
programming language into equivalent programs in a lower-
level language
• Wikipedia credits Grace Hopper with the first compiler (for a 
language called A-0) in 1952, and John Backus’ IBM team with 
the first complete compiler (for FORTRAN) in 1957
Trivia: In what year was I born?
CSE401 Au08 2
A world with no compilers
CSE401 Au08 3
Assembly/machine language coding
• …is slow, error-prone, tedious, not portable, …
• The size (roughly, lines of code) of a high-level 
language program relative to its assembly language 
equivalent is approximately linear – but that may well 
be a factor of 10 or even 100
– Microsoft Vista is something like 50 million lines of 
source code (50 MLOC)
• Printed double-sided something like triple the 
height of the Allen Center
• Something like 20 person-years just to retype
• Q: Why is harder to build a program 10 times larger?
CSE401 Au08 4
Page 2


9/26/2008
1
Introduction to Compiler Construction
David Notkin
Autumn 2008
Source
Program
[Higher-Level
Programming
Language]
Compiler
Target
Program
[Lower-Level
Language/
Architecture]
CSE401
“Compiler”: from the web
• The Oxford English Dictionary (OED) indicates that the first 
usage of the term is circa 1330, referring to one who collects 
and puts together materials
– They also note a usage “Diuerse translatours and 
compilaris” from Scotland in 1549
• Most dictionaries give the above definition as well as the 
computing-based definition (which the OED dates to 1953)
– A program that translates programs written in a high-level 
programming language into equivalent programs in a lower-
level language
• Wikipedia credits Grace Hopper with the first compiler (for a 
language called A-0) in 1952, and John Backus’ IBM team with 
the first complete compiler (for FORTRAN) in 1957
Trivia: In what year was I born?
CSE401 Au08 2
A world with no compilers
CSE401 Au08 3
Assembly/machine language coding
• …is slow, error-prone, tedious, not portable, …
• The size (roughly, lines of code) of a high-level 
language program relative to its assembly language 
equivalent is approximately linear – but that may well 
be a factor of 10 or even 100
– Microsoft Vista is something like 50 million lines of 
source code (50 MLOC)
• Printed double-sided something like triple the 
height of the Allen Center
• Something like 20 person-years just to retype
• Q: Why is harder to build a program 10 times larger?
CSE401 Au08 4
9/26/2008
2
Ergo: we need compilers
• And to have compilers, somebody has to build 
compilers
– At least every time there is a need to program in a 
new <programming language, architecture> pair
– Roughly how many pl’s and how many ISA’s?  
Cross product?
• Unless the compilers could be generated 
automatically – and parts can (a bit more on this later 
in the course)
Trivia: In what year did I first write a program?
In what language?  On what architecture?
CSE401 Au08 5
But why might you care?
• Crass reasons: jobs
• Class reasons: grade in 401
• Cool reasons: loveliest blending of theory and practice in 
computer science & engineering
• Cruel reasons: we all had to learn it ?
• Practice reasons: more experience with software design, 
modifying software written by others, etc.
• Practical reasons: the techniques are widely used outside of 
conventional compilers
• Super-practical reasons: lays foundation for understanding or 
even researching really cool stuff like JIT (just-in-time) 
compilers, compiling for multicore, building interpreters, scripting 
languages, (de)serializing data for distribution, and more…
CSE401 Au08 6
Better understand…
• Compile-time vs. run-time
• Interactions among
– language features
– implementation efficiency
– compiler complexity
– architectural features
CSE401 Au08 7
Compiling (or related) Turing Awards
• 1966 Alan Perlis
• 1972 Edsger Dijkstra
• 1976 Michael Rabin 
and Dana Scott
• 1977 John Backus
• 1978 Bob Floyd
• 1979 Bob Iverson
• 1980 Tony Hoare
• 1984 Niklaus Wirth
• 1987 John Cocke
• 2001 Ole-Johan Dahl 
and Kristen Nygaard
• 2003 Alan Kay
• 2005 Peter Naur
• 2006 Fran Allen
CSE401 Au08 8
Page 3


9/26/2008
1
Introduction to Compiler Construction
David Notkin
Autumn 2008
Source
Program
[Higher-Level
Programming
Language]
Compiler
Target
Program
[Lower-Level
Language/
Architecture]
CSE401
“Compiler”: from the web
• The Oxford English Dictionary (OED) indicates that the first 
usage of the term is circa 1330, referring to one who collects 
and puts together materials
– They also note a usage “Diuerse translatours and 
compilaris” from Scotland in 1549
• Most dictionaries give the above definition as well as the 
computing-based definition (which the OED dates to 1953)
– A program that translates programs written in a high-level 
programming language into equivalent programs in a lower-
level language
• Wikipedia credits Grace Hopper with the first compiler (for a 
language called A-0) in 1952, and John Backus’ IBM team with 
the first complete compiler (for FORTRAN) in 1957
Trivia: In what year was I born?
CSE401 Au08 2
A world with no compilers
CSE401 Au08 3
Assembly/machine language coding
• …is slow, error-prone, tedious, not portable, …
• The size (roughly, lines of code) of a high-level 
language program relative to its assembly language 
equivalent is approximately linear – but that may well 
be a factor of 10 or even 100
– Microsoft Vista is something like 50 million lines of 
source code (50 MLOC)
• Printed double-sided something like triple the 
height of the Allen Center
• Something like 20 person-years just to retype
• Q: Why is harder to build a program 10 times larger?
CSE401 Au08 4
9/26/2008
2
Ergo: we need compilers
• And to have compilers, somebody has to build 
compilers
– At least every time there is a need to program in a 
new <programming language, architecture> pair
– Roughly how many pl’s and how many ISA’s?  
Cross product?
• Unless the compilers could be generated 
automatically – and parts can (a bit more on this later 
in the course)
Trivia: In what year did I first write a program?
In what language?  On what architecture?
CSE401 Au08 5
But why might you care?
• Crass reasons: jobs
• Class reasons: grade in 401
• Cool reasons: loveliest blending of theory and practice in 
computer science & engineering
• Cruel reasons: we all had to learn it ?
• Practice reasons: more experience with software design, 
modifying software written by others, etc.
• Practical reasons: the techniques are widely used outside of 
conventional compilers
• Super-practical reasons: lays foundation for understanding or 
even researching really cool stuff like JIT (just-in-time) 
compilers, compiling for multicore, building interpreters, scripting 
languages, (de)serializing data for distribution, and more…
CSE401 Au08 6
Better understand…
• Compile-time vs. run-time
• Interactions among
– language features
– implementation efficiency
– compiler complexity
– architectural features
CSE401 Au08 7
Compiling (or related) Turing Awards
• 1966 Alan Perlis
• 1972 Edsger Dijkstra
• 1976 Michael Rabin 
and Dana Scott
• 1977 John Backus
• 1978 Bob Floyd
• 1979 Bob Iverson
• 1980 Tony Hoare
• 1984 Niklaus Wirth
• 1987 John Cocke
• 2001 Ole-Johan Dahl 
and Kristen Nygaard
• 2003 Alan Kay
• 2005 Peter Naur
• 2006 Fran Allen
CSE401 Au08 8
9/26/2008
3
Questions?
CSE401 Au08 9
Administrivia: see web
• Text: Engineering a Compiler, Cooper and Torczon, 
Morgan-Kaufmann 2004
• Mail list – automatically subscribed
• Google calendar with links
• Grading
– Project 40%
– Homework 15%
– Midterm 15%
– Final 25%
– Other (class participation, extra credit, etc.) 5%
CSE401 Au08 10
Project
• Start with a MiniJava compiler in Java
• Add features such as comments, floating-point, 
arrays, class variables, for loops, etc.
• Completed in stages over the term
• Not teams: but you can talk to each other (“Prison 
Break” rule, see web) for the project
• Grading basis: correctness, clarity of design and 
implementation, quality of test cases, etc.
CSE401 Au08 11
Compiler structure: overview
Source
Program
Target
Program
Compiler
Generate
(back end)
Analyze
(front end)
Intermediate
Representation
Lexical &
Syntactic &
Semantic
Intermediate Code Generation &
Optimization &
Code Generation
CSE401 Au08 12
Page 4


9/26/2008
1
Introduction to Compiler Construction
David Notkin
Autumn 2008
Source
Program
[Higher-Level
Programming
Language]
Compiler
Target
Program
[Lower-Level
Language/
Architecture]
CSE401
“Compiler”: from the web
• The Oxford English Dictionary (OED) indicates that the first 
usage of the term is circa 1330, referring to one who collects 
and puts together materials
– They also note a usage “Diuerse translatours and 
compilaris” from Scotland in 1549
• Most dictionaries give the above definition as well as the 
computing-based definition (which the OED dates to 1953)
– A program that translates programs written in a high-level 
programming language into equivalent programs in a lower-
level language
• Wikipedia credits Grace Hopper with the first compiler (for a 
language called A-0) in 1952, and John Backus’ IBM team with 
the first complete compiler (for FORTRAN) in 1957
Trivia: In what year was I born?
CSE401 Au08 2
A world with no compilers
CSE401 Au08 3
Assembly/machine language coding
• …is slow, error-prone, tedious, not portable, …
• The size (roughly, lines of code) of a high-level 
language program relative to its assembly language 
equivalent is approximately linear – but that may well 
be a factor of 10 or even 100
– Microsoft Vista is something like 50 million lines of 
source code (50 MLOC)
• Printed double-sided something like triple the 
height of the Allen Center
• Something like 20 person-years just to retype
• Q: Why is harder to build a program 10 times larger?
CSE401 Au08 4
9/26/2008
2
Ergo: we need compilers
• And to have compilers, somebody has to build 
compilers
– At least every time there is a need to program in a 
new <programming language, architecture> pair
– Roughly how many pl’s and how many ISA’s?  
Cross product?
• Unless the compilers could be generated 
automatically – and parts can (a bit more on this later 
in the course)
Trivia: In what year did I first write a program?
In what language?  On what architecture?
CSE401 Au08 5
But why might you care?
• Crass reasons: jobs
• Class reasons: grade in 401
• Cool reasons: loveliest blending of theory and practice in 
computer science & engineering
• Cruel reasons: we all had to learn it ?
• Practice reasons: more experience with software design, 
modifying software written by others, etc.
• Practical reasons: the techniques are widely used outside of 
conventional compilers
• Super-practical reasons: lays foundation for understanding or 
even researching really cool stuff like JIT (just-in-time) 
compilers, compiling for multicore, building interpreters, scripting 
languages, (de)serializing data for distribution, and more…
CSE401 Au08 6
Better understand…
• Compile-time vs. run-time
• Interactions among
– language features
– implementation efficiency
– compiler complexity
– architectural features
CSE401 Au08 7
Compiling (or related) Turing Awards
• 1966 Alan Perlis
• 1972 Edsger Dijkstra
• 1976 Michael Rabin 
and Dana Scott
• 1977 John Backus
• 1978 Bob Floyd
• 1979 Bob Iverson
• 1980 Tony Hoare
• 1984 Niklaus Wirth
• 1987 John Cocke
• 2001 Ole-Johan Dahl 
and Kristen Nygaard
• 2003 Alan Kay
• 2005 Peter Naur
• 2006 Fran Allen
CSE401 Au08 8
9/26/2008
3
Questions?
CSE401 Au08 9
Administrivia: see web
• Text: Engineering a Compiler, Cooper and Torczon, 
Morgan-Kaufmann 2004
• Mail list – automatically subscribed
• Google calendar with links
• Grading
– Project 40%
– Homework 15%
– Midterm 15%
– Final 25%
– Other (class participation, extra credit, etc.) 5%
CSE401 Au08 10
Project
• Start with a MiniJava compiler in Java
• Add features such as comments, floating-point, 
arrays, class variables, for loops, etc.
• Completed in stages over the term
• Not teams: but you can talk to each other (“Prison 
Break” rule, see web) for the project
• Grading basis: correctness, clarity of design and 
implementation, quality of test cases, etc.
CSE401 Au08 11
Compiler structure: overview
Source
Program
Target
Program
Compiler
Generate
(back end)
Analyze
(front end)
Intermediate
Representation
Lexical &
Syntactic &
Semantic
Intermediate Code Generation &
Optimization &
Code Generation
CSE401 Au08 12
9/26/2008
4
Lexical analysis (scanning, lexing)
Source
Program
Analyze:
scan; parse
Intermediate
Representation
t 6 :=
Fac . Com put eFac ( t hi s , t 3) ;
Scan
(lexical analysis)
Token Stream
Character Stream
28 characters not counting
whitespace
name=t6,assign,name=Fac,period,
name=ComputeFac,lparen,name=this,
comma,name=t3,rparen,semicolon
(11 tokens)
CSE401 Au08 13
Syntactic analysis
name=t6,assign,name=Fac,period,
name=ComputeFac,lparen,name=this,
comma,name=t3,rparen,semicolon
Assignment 
statement
Lefthand
side
Identifier: 
t6
Righthand
side
Method 
invocation
Method name
QualifiedName
Identifier:
Fac
Identifer: 
ComputeFac
Parameter List
Identifier: 
this
Identifier: 
t3
Analyze:
scan;parse
Abstract
syntax tree
CSE401 Au08 14
Assign… 
statement
Lefthand
side
Identifier
: t6
Righthand
side
Method 
invocation
Method name
Qualified…
Identifier:
Fac
Identifer: 
ComputeFac
Parameter List
Identifier: 
this
Identifier
: t3
Semantic analysis
• Annotate abstract 
syntax tree
• Primarily determine 
which identifiers 
are associated with 
which declarations
• Scoping is key 
issue
• Symbol table is key 
data structure
CSE401 Au08 15
Code generation (backend)
Target
Program
Generate
(back end)
Annotated abstract
syntax tree
Intermediate
Language
Intermediate 
code 
generation
Annotated abstract
syntax tree
Target code 
generation
Target
Program
CSE401 Au08 16
Page 5


9/26/2008
1
Introduction to Compiler Construction
David Notkin
Autumn 2008
Source
Program
[Higher-Level
Programming
Language]
Compiler
Target
Program
[Lower-Level
Language/
Architecture]
CSE401
“Compiler”: from the web
• The Oxford English Dictionary (OED) indicates that the first 
usage of the term is circa 1330, referring to one who collects 
and puts together materials
– They also note a usage “Diuerse translatours and 
compilaris” from Scotland in 1549
• Most dictionaries give the above definition as well as the 
computing-based definition (which the OED dates to 1953)
– A program that translates programs written in a high-level 
programming language into equivalent programs in a lower-
level language
• Wikipedia credits Grace Hopper with the first compiler (for a 
language called A-0) in 1952, and John Backus’ IBM team with 
the first complete compiler (for FORTRAN) in 1957
Trivia: In what year was I born?
CSE401 Au08 2
A world with no compilers
CSE401 Au08 3
Assembly/machine language coding
• …is slow, error-prone, tedious, not portable, …
• The size (roughly, lines of code) of a high-level 
language program relative to its assembly language 
equivalent is approximately linear – but that may well 
be a factor of 10 or even 100
– Microsoft Vista is something like 50 million lines of 
source code (50 MLOC)
• Printed double-sided something like triple the 
height of the Allen Center
• Something like 20 person-years just to retype
• Q: Why is harder to build a program 10 times larger?
CSE401 Au08 4
9/26/2008
2
Ergo: we need compilers
• And to have compilers, somebody has to build 
compilers
– At least every time there is a need to program in a 
new <programming language, architecture> pair
– Roughly how many pl’s and how many ISA’s?  
Cross product?
• Unless the compilers could be generated 
automatically – and parts can (a bit more on this later 
in the course)
Trivia: In what year did I first write a program?
In what language?  On what architecture?
CSE401 Au08 5
But why might you care?
• Crass reasons: jobs
• Class reasons: grade in 401
• Cool reasons: loveliest blending of theory and practice in 
computer science & engineering
• Cruel reasons: we all had to learn it ?
• Practice reasons: more experience with software design, 
modifying software written by others, etc.
• Practical reasons: the techniques are widely used outside of 
conventional compilers
• Super-practical reasons: lays foundation for understanding or 
even researching really cool stuff like JIT (just-in-time) 
compilers, compiling for multicore, building interpreters, scripting 
languages, (de)serializing data for distribution, and more…
CSE401 Au08 6
Better understand…
• Compile-time vs. run-time
• Interactions among
– language features
– implementation efficiency
– compiler complexity
– architectural features
CSE401 Au08 7
Compiling (or related) Turing Awards
• 1966 Alan Perlis
• 1972 Edsger Dijkstra
• 1976 Michael Rabin 
and Dana Scott
• 1977 John Backus
• 1978 Bob Floyd
• 1979 Bob Iverson
• 1980 Tony Hoare
• 1984 Niklaus Wirth
• 1987 John Cocke
• 2001 Ole-Johan Dahl 
and Kristen Nygaard
• 2003 Alan Kay
• 2005 Peter Naur
• 2006 Fran Allen
CSE401 Au08 8
9/26/2008
3
Questions?
CSE401 Au08 9
Administrivia: see web
• Text: Engineering a Compiler, Cooper and Torczon, 
Morgan-Kaufmann 2004
• Mail list – automatically subscribed
• Google calendar with links
• Grading
– Project 40%
– Homework 15%
– Midterm 15%
– Final 25%
– Other (class participation, extra credit, etc.) 5%
CSE401 Au08 10
Project
• Start with a MiniJava compiler in Java
• Add features such as comments, floating-point, 
arrays, class variables, for loops, etc.
• Completed in stages over the term
• Not teams: but you can talk to each other (“Prison 
Break” rule, see web) for the project
• Grading basis: correctness, clarity of design and 
implementation, quality of test cases, etc.
CSE401 Au08 11
Compiler structure: overview
Source
Program
Target
Program
Compiler
Generate
(back end)
Analyze
(front end)
Intermediate
Representation
Lexical &
Syntactic &
Semantic
Intermediate Code Generation &
Optimization &
Code Generation
CSE401 Au08 12
9/26/2008
4
Lexical analysis (scanning, lexing)
Source
Program
Analyze:
scan; parse
Intermediate
Representation
t 6 :=
Fac . Com put eFac ( t hi s , t 3) ;
Scan
(lexical analysis)
Token Stream
Character Stream
28 characters not counting
whitespace
name=t6,assign,name=Fac,period,
name=ComputeFac,lparen,name=this,
comma,name=t3,rparen,semicolon
(11 tokens)
CSE401 Au08 13
Syntactic analysis
name=t6,assign,name=Fac,period,
name=ComputeFac,lparen,name=this,
comma,name=t3,rparen,semicolon
Assignment 
statement
Lefthand
side
Identifier: 
t6
Righthand
side
Method 
invocation
Method name
QualifiedName
Identifier:
Fac
Identifer: 
ComputeFac
Parameter List
Identifier: 
this
Identifier: 
t3
Analyze:
scan;parse
Abstract
syntax tree
CSE401 Au08 14
Assign… 
statement
Lefthand
side
Identifier
: t6
Righthand
side
Method 
invocation
Method name
Qualified…
Identifier:
Fac
Identifer: 
ComputeFac
Parameter List
Identifier: 
this
Identifier
: t3
Semantic analysis
• Annotate abstract 
syntax tree
• Primarily determine 
which identifiers 
are associated with 
which declarations
• Scoping is key 
issue
• Symbol table is key 
data structure
CSE401 Au08 15
Code generation (backend)
Target
Program
Generate
(back end)
Annotated abstract
syntax tree
Intermediate
Language
Intermediate 
code 
generation
Annotated abstract
syntax tree
Target code 
generation
Target
Program
CSE401 Au08 16
9/26/2008
5
Optimization
• Takes place at various (and multiple) places during 
code generation
– Might optimize the intermediate language code
– Might optimize the target code
– Might optimize during execution of the program
• Q: Is it better to have an optimizing compiler or to 
hand-optimize code?
CSE401 Au08 17
Quotations about optimization
• Michael Jackson
– Rule 1: Don't do it.
– Rule 2 (for experts only): Don't do it yet.
• Bill Wulf
– More computing sins are committed in the name of 
efficiency (without necessarily achieving it) than for 
any other single reason – including blind stupidity.
• Don Knuth
– We should forget about small efficiencies, say about 
97% of the time: premature optimization is the root 
of all evil.
CSE401 Au08 18
Questions?
CSE401 Au08 19
Lexing: reprise
• Read in characters
• Clump into tokens
• Strip out whitespace and comments
• Tokens are specified using regular expressions
Ident ::= Letter AlphaNum* 
Integer ::= Digit+ 
AlphaNum ::= Letter | Digit 
Letter ::= 'a' | … | 'z' | 'A' | … | 'Z' 
Digit ::= '0' | … | '9'
• Q: regular expressions are equivalent to something 
you’ve previously learned about… what is it?
CSE401 Au08 20
Read More
Offer running on EduRev: Apply code STAYHOME200 to get INR 200 off on our premium plan EduRev Infinity!