Page 1
1
11/30/2007 1
Code Generation
CSE413
Autumn 2007
11/30/2007 2
Compiler Structure (review)
Source Target
Scanner
Parser
Middle
(optimization)
Code Gen
characters
tokens
IR
IR (maybe different)
Assembly code
11/30/2007 3
What We Need
To run a D program:
Space needs to be allocated for a stack
and a heap
ESP and other registers need to have
sensible initial values
We need some way to communicate with
the outside world (I/O)
11/30/2007 4
Bootstraping from C
Idea: take advantage of the existing C
runtime library
Write a small C main program that calls
the main method in the assembly you
generate as if it were a C function.
C’s standard library provides the execution
environment
We can write C functions for get() and
put() and call them from our asm code.
11/30/2007 5
Bootstrap Program
The bootstrap will be a tiny C program
that calls your compiled code as if it
were an ordinary C function
It also contains some functions that
compiled code can call as needed
Mini “runtime library”
get(), put()
11/30/2007 6
Example Bootstrap Program
#include <stdio.h>
int d$main(); /* prototype for external function */
int main() {
printf("\nValue returned from D main: %d.\n", d$main());
return 0;
}
/* return next integer from standard input */
int get() { scanf.... }
/* write x to standard output */
int put(int x) { printf... }
Page 2
1
11/30/2007 1
Code Generation
CSE413
Autumn 2007
11/30/2007 2
Compiler Structure (review)
Source Target
Scanner
Parser
Middle
(optimization)
Code Gen
characters
tokens
IR
IR (maybe different)
Assembly code
11/30/2007 3
What We Need
To run a D program:
Space needs to be allocated for a stack
and a heap
ESP and other registers need to have
sensible initial values
We need some way to communicate with
the outside world (I/O)
11/30/2007 4
Bootstraping from C
Idea: take advantage of the existing C
runtime library
Write a small C main program that calls
the main method in the assembly you
generate as if it were a C function.
C’s standard library provides the execution
environment
We can write C functions for get() and
put() and call them from our asm code.
11/30/2007 5
Bootstrap Program
The bootstrap will be a tiny C program
that calls your compiled code as if it
were an ordinary C function
It also contains some functions that
compiled code can call as needed
Mini “runtime library”
get(), put()
11/30/2007 6
Example Bootstrap Program
#include <stdio.h>
int d$main(); /* prototype for external function */
int main() {
printf("\nValue returned from D main: %d.\n", d$main());
return 0;
}
/* return next integer from standard input */
int get() { scanf.... }
/* write x to standard output */
int put(int x) { printf... }
2
11/30/2007 7
Code Generation in Our
Project
11/30/2007 8
Generating .asm Code
Suggestion: isolate the actual output
operations in a handful of routines
Modularity & saves some typing
Possibilities
// write code string s to .asm output
void gen(String s) { … }
// write “op src,dst” to .asm output
void genbin(String op, String src, String dst) { … }
// write label L to .asm output as “L:”
void genLabel(String L) { … }
A handful of these methods should do it
11/30/2007 9
Agenda
Mapping source code to x86
Today: Stuff Needed for project: basic
statements and expressions
Next: Other cool stuff: Object
representation, method calls, and dynamic
dispatch
11/30/2007 10
Conventions for Examples
Examples show code snippets in isolation
A “real” code generator needs to worry about things
like :
Which registers are busy at which point in the program
Which registers to spill into memory when a new register is
needed and no free ones are available
You won’t need to worry about this for your compiler!
Register eax used below as a generic example
Rename as needed for more complex code involving multiple
registers
But for the most part in our compiler we can stick to just
using eax and ecx.
11/30/2007 11
A Simple Code Generation
Strategy
Priority: quick ‘n dirty correct code first,
optimize later if time
Treat the x86 as a 1-register stack
machine
11/30/2007 12
x86 as a Stack Machine
Idea: Use x86 stack for expression evaluation with
eax as the “top” of the stack
Invariant: Whenever an expression (or part of one)
is evaluated at runtime, the result is in eax
If a value needs to be preserved while another
expression is evaluated, push eax, evaluate, then pop
when needed
Remember: always pop what you push
Will produce lots of redundant, but correct, code
Page 3
1
11/30/2007 1
Code Generation
CSE413
Autumn 2007
11/30/2007 2
Compiler Structure (review)
Source Target
Scanner
Parser
Middle
(optimization)
Code Gen
characters
tokens
IR
IR (maybe different)
Assembly code
11/30/2007 3
What We Need
To run a D program:
Space needs to be allocated for a stack
and a heap
ESP and other registers need to have
sensible initial values
We need some way to communicate with
the outside world (I/O)
11/30/2007 4
Bootstraping from C
Idea: take advantage of the existing C
runtime library
Write a small C main program that calls
the main method in the assembly you
generate as if it were a C function.
C’s standard library provides the execution
environment
We can write C functions for get() and
put() and call them from our asm code.
11/30/2007 5
Bootstrap Program
The bootstrap will be a tiny C program
that calls your compiled code as if it
were an ordinary C function
It also contains some functions that
compiled code can call as needed
Mini “runtime library”
get(), put()
11/30/2007 6
Example Bootstrap Program
#include <stdio.h>
int d$main(); /* prototype for external function */
int main() {
printf("\nValue returned from D main: %d.\n", d$main());
return 0;
}
/* return next integer from standard input */
int get() { scanf.... }
/* write x to standard output */
int put(int x) { printf... }
2
11/30/2007 7
Code Generation in Our
Project
11/30/2007 8
Generating .asm Code
Suggestion: isolate the actual output
operations in a handful of routines
Modularity & saves some typing
Possibilities
// write code string s to .asm output
void gen(String s) { … }
// write “op src,dst” to .asm output
void genbin(String op, String src, String dst) { … }
// write label L to .asm output as “L:”
void genLabel(String L) { … }
A handful of these methods should do it
11/30/2007 9
Agenda
Mapping source code to x86
Today: Stuff Needed for project: basic
statements and expressions
Next: Other cool stuff: Object
representation, method calls, and dynamic
dispatch
11/30/2007 10
Conventions for Examples
Examples show code snippets in isolation
A “real” code generator needs to worry about things
like :
Which registers are busy at which point in the program
Which registers to spill into memory when a new register is
needed and no free ones are available
You won’t need to worry about this for your compiler!
Register eax used below as a generic example
Rename as needed for more complex code involving multiple
registers
But for the most part in our compiler we can stick to just
using eax and ecx.
11/30/2007 11
A Simple Code Generation
Strategy
Priority: quick ‘n dirty correct code first,
optimize later if time
Treat the x86 as a 1-register stack
machine
11/30/2007 12
x86 as a Stack Machine
Idea: Use x86 stack for expression evaluation with
eax as the “top” of the stack
Invariant: Whenever an expression (or part of one)
is evaluated at runtime, the result is in eax
If a value needs to be preserved while another
expression is evaluated, push eax, evaluate, then pop
when needed
Remember: always pop what you push
Will produce lots of redundant, but correct, code
3
11/30/2007 13
Constants
Source
17
x86
mov eax, 17
Idea: realize constant value in a register
Aside: Optimization: if constant is 0
xor eax, eax
11/30/2007 14
Use (RHS) of Variables
Source
(use of variable) a
x86
mov eax, [ebp+ -8]
All variables in our programs will be addressable
relative to ebp.
In this example a is a local variable.
If the offset was positive, it would be a parameter.
Check your symbol table to generate the correct
offset for a given variable.
11/30/2007 15
Assign (LHS) of Variables
Source
a = exp
x86
<code for exp, leaves result in eax>
mov [ebp+ -8], eax
11/30/2007 16
Example: var = exp;
Assuming that var is a local variable
<code for exp>
Generates code that leaves the result of
evaluating exp in eax
gen(mov [ebp+offset of variable],eax)
11/30/2007 17
Example: Generate Code for
Constants and Identifiers
Integer constants, say 17
gen(mov eax,17)
leaves value in eax
Variables
gen(mov eax, [ebp + appropriate offset])
also leaves value in eax
11/30/2007 18
Assignment Statement
Source
var = exp;
x86
<code to evaluate exp,
leaving result in register eax>
mov [ebp+offset
var
], eax
Page 4
1
11/30/2007 1
Code Generation
CSE413
Autumn 2007
11/30/2007 2
Compiler Structure (review)
Source Target
Scanner
Parser
Middle
(optimization)
Code Gen
characters
tokens
IR
IR (maybe different)
Assembly code
11/30/2007 3
What We Need
To run a D program:
Space needs to be allocated for a stack
and a heap
ESP and other registers need to have
sensible initial values
We need some way to communicate with
the outside world (I/O)
11/30/2007 4
Bootstraping from C
Idea: take advantage of the existing C
runtime library
Write a small C main program that calls
the main method in the assembly you
generate as if it were a C function.
C’s standard library provides the execution
environment
We can write C functions for get() and
put() and call them from our asm code.
11/30/2007 5
Bootstrap Program
The bootstrap will be a tiny C program
that calls your compiled code as if it
were an ordinary C function
It also contains some functions that
compiled code can call as needed
Mini “runtime library”
get(), put()
11/30/2007 6
Example Bootstrap Program
#include <stdio.h>
int d$main(); /* prototype for external function */
int main() {
printf("\nValue returned from D main: %d.\n", d$main());
return 0;
}
/* return next integer from standard input */
int get() { scanf.... }
/* write x to standard output */
int put(int x) { printf... }
2
11/30/2007 7
Code Generation in Our
Project
11/30/2007 8
Generating .asm Code
Suggestion: isolate the actual output
operations in a handful of routines
Modularity & saves some typing
Possibilities
// write code string s to .asm output
void gen(String s) { … }
// write “op src,dst” to .asm output
void genbin(String op, String src, String dst) { … }
// write label L to .asm output as “L:”
void genLabel(String L) { … }
A handful of these methods should do it
11/30/2007 9
Agenda
Mapping source code to x86
Today: Stuff Needed for project: basic
statements and expressions
Next: Other cool stuff: Object
representation, method calls, and dynamic
dispatch
11/30/2007 10
Conventions for Examples
Examples show code snippets in isolation
A “real” code generator needs to worry about things
like :
Which registers are busy at which point in the program
Which registers to spill into memory when a new register is
needed and no free ones are available
You won’t need to worry about this for your compiler!
Register eax used below as a generic example
Rename as needed for more complex code involving multiple
registers
But for the most part in our compiler we can stick to just
using eax and ecx.
11/30/2007 11
A Simple Code Generation
Strategy
Priority: quick ‘n dirty correct code first,
optimize later if time
Treat the x86 as a 1-register stack
machine
11/30/2007 12
x86 as a Stack Machine
Idea: Use x86 stack for expression evaluation with
eax as the “top” of the stack
Invariant: Whenever an expression (or part of one)
is evaluated at runtime, the result is in eax
If a value needs to be preserved while another
expression is evaluated, push eax, evaluate, then pop
when needed
Remember: always pop what you push
Will produce lots of redundant, but correct, code
3
11/30/2007 13
Constants
Source
17
x86
mov eax, 17
Idea: realize constant value in a register
Aside: Optimization: if constant is 0
xor eax, eax
11/30/2007 14
Use (RHS) of Variables
Source
(use of variable) a
x86
mov eax, [ebp+ -8]
All variables in our programs will be addressable
relative to ebp.
In this example a is a local variable.
If the offset was positive, it would be a parameter.
Check your symbol table to generate the correct
offset for a given variable.
11/30/2007 15
Assign (LHS) of Variables
Source
a = exp
x86
<code for exp, leaves result in eax>
mov [ebp+ -8], eax
11/30/2007 16
Example: var = exp;
Assuming that var is a local variable
<code for exp>
Generates code that leaves the result of
evaluating exp in eax
gen(mov [ebp+offset of variable],eax)
11/30/2007 17
Example: Generate Code for
Constants and Identifiers
Integer constants, say 17
gen(mov eax,17)
leaves value in eax
Variables
gen(mov eax, [ebp + appropriate offset])
also leaves value in eax
11/30/2007 18
Assignment Statement
Source
var = exp;
x86
<code to evaluate exp,
leaving result in register eax>
mov [ebp+offset
var
], eax
4
11/30/2007 19
Binary +
Source
exp1 + exp2
x86
<code evaluating exp1 into eax>
<code evaluating exp2 into ecx>
add eax, ecx
11/30/2007 20
Example: Generate Code for
exp1 + exp2
<code to calculate exp1>
generates code to evaluate exp1 and put result in eax
gen(push eax)
generate a push instruction
<code to calculate exp2>
generates code for exp2; result in eax
gen(pop ecx)
pop left argument into ecx; cleans up stack
gen(add eax, ecx)
perform the addition; result in eax
11/30/2007 21
Binary -, *
Same as +
Use sub for –
// Be sure to get order of operands correct!!
Use imul for *
Aside: Optimizations
Use left shift to multiply by powers of 2
Use x+x instead of 2*x, etc. (faster)
Use dec for x-1
11/30/2007 22
Control Flow
Basic idea: decompose higher level operation
into conditional and unconditional gotos
In the following, j
false
is used to mean:
“jump when a condition is false”
No such instruction on x86
Will have to realize with appropriate sequence of
instructions to set condition codes followed by
conditional jumps
Normally won’t actually generate the value “true”
or “false” in a register
11/30/2007 23
While
Source
while (cond) stmt
x86
test: <code evaluating cond>
j
false
done
<code for stmt>
jmp test
done:
11/30/2007 24
Labels
In x86 assembly language we’ll need to
produce unique labels for each if, while,
etc.
Labels can appear on a line by
themselves.
jmp label
will start execution on the first line of x86
code it finds after seeing the label.
Page 5
1
11/30/2007 1
Code Generation
CSE413
Autumn 2007
11/30/2007 2
Compiler Structure (review)
Source Target
Scanner
Parser
Middle
(optimization)
Code Gen
characters
tokens
IR
IR (maybe different)
Assembly code
11/30/2007 3
What We Need
To run a D program:
Space needs to be allocated for a stack
and a heap
ESP and other registers need to have
sensible initial values
We need some way to communicate with
the outside world (I/O)
11/30/2007 4
Bootstraping from C
Idea: take advantage of the existing C
runtime library
Write a small C main program that calls
the main method in the assembly you
generate as if it were a C function.
C’s standard library provides the execution
environment
We can write C functions for get() and
put() and call them from our asm code.
11/30/2007 5
Bootstrap Program
The bootstrap will be a tiny C program
that calls your compiled code as if it
were an ordinary C function
It also contains some functions that
compiled code can call as needed
Mini “runtime library”
get(), put()
11/30/2007 6
Example Bootstrap Program
#include <stdio.h>
int d$main(); /* prototype for external function */
int main() {
printf("\nValue returned from D main: %d.\n", d$main());
return 0;
}
/* return next integer from standard input */
int get() { scanf.... }
/* write x to standard output */
int put(int x) { printf... }
2
11/30/2007 7
Code Generation in Our
Project
11/30/2007 8
Generating .asm Code
Suggestion: isolate the actual output
operations in a handful of routines
Modularity & saves some typing
Possibilities
// write code string s to .asm output
void gen(String s) { … }
// write “op src,dst” to .asm output
void genbin(String op, String src, String dst) { … }
// write label L to .asm output as “L:”
void genLabel(String L) { … }
A handful of these methods should do it
11/30/2007 9
Agenda
Mapping source code to x86
Today: Stuff Needed for project: basic
statements and expressions
Next: Other cool stuff: Object
representation, method calls, and dynamic
dispatch
11/30/2007 10
Conventions for Examples
Examples show code snippets in isolation
A “real” code generator needs to worry about things
like :
Which registers are busy at which point in the program
Which registers to spill into memory when a new register is
needed and no free ones are available
You won’t need to worry about this for your compiler!
Register eax used below as a generic example
Rename as needed for more complex code involving multiple
registers
But for the most part in our compiler we can stick to just
using eax and ecx.
11/30/2007 11
A Simple Code Generation
Strategy
Priority: quick ‘n dirty correct code first,
optimize later if time
Treat the x86 as a 1-register stack
machine
11/30/2007 12
x86 as a Stack Machine
Idea: Use x86 stack for expression evaluation with
eax as the “top” of the stack
Invariant: Whenever an expression (or part of one)
is evaluated at runtime, the result is in eax
If a value needs to be preserved while another
expression is evaluated, push eax, evaluate, then pop
when needed
Remember: always pop what you push
Will produce lots of redundant, but correct, code
3
11/30/2007 13
Constants
Source
17
x86
mov eax, 17
Idea: realize constant value in a register
Aside: Optimization: if constant is 0
xor eax, eax
11/30/2007 14
Use (RHS) of Variables
Source
(use of variable) a
x86
mov eax, [ebp+ -8]
All variables in our programs will be addressable
relative to ebp.
In this example a is a local variable.
If the offset was positive, it would be a parameter.
Check your symbol table to generate the correct
offset for a given variable.
11/30/2007 15
Assign (LHS) of Variables
Source
a = exp
x86
<code for exp, leaves result in eax>
mov [ebp+ -8], eax
11/30/2007 16
Example: var = exp;
Assuming that var is a local variable
<code for exp>
Generates code that leaves the result of
evaluating exp in eax
gen(mov [ebp+offset of variable],eax)
11/30/2007 17
Example: Generate Code for
Constants and Identifiers
Integer constants, say 17
gen(mov eax,17)
leaves value in eax
Variables
gen(mov eax, [ebp + appropriate offset])
also leaves value in eax
11/30/2007 18
Assignment Statement
Source
var = exp;
x86
<code to evaluate exp,
leaving result in register eax>
mov [ebp+offset
var
], eax
4
11/30/2007 19
Binary +
Source
exp1 + exp2
x86
<code evaluating exp1 into eax>
<code evaluating exp2 into ecx>
add eax, ecx
11/30/2007 20
Example: Generate Code for
exp1 + exp2
<code to calculate exp1>
generates code to evaluate exp1 and put result in eax
gen(push eax)
generate a push instruction
<code to calculate exp2>
generates code for exp2; result in eax
gen(pop ecx)
pop left argument into ecx; cleans up stack
gen(add eax, ecx)
perform the addition; result in eax
11/30/2007 21
Binary -, *
Same as +
Use sub for –
// Be sure to get order of operands correct!!
Use imul for *
Aside: Optimizations
Use left shift to multiply by powers of 2
Use x+x instead of 2*x, etc. (faster)
Use dec for x-1
11/30/2007 22
Control Flow
Basic idea: decompose higher level operation
into conditional and unconditional gotos
In the following, j
false
is used to mean:
“jump when a condition is false”
No such instruction on x86
Will have to realize with appropriate sequence of
instructions to set condition codes followed by
conditional jumps
Normally won’t actually generate the value “true”
or “false” in a register
11/30/2007 23
While
Source
while (cond) stmt
x86
test: <code evaluating cond>
j
false
done
<code for stmt>
jmp test
done:
11/30/2007 24
Labels
In x86 assembly language we’ll need to
produce unique labels for each if, while,
etc.
Labels can appear on a line by
themselves.
jmp label
will start execution on the first line of x86
code it finds after seeing the label.
5
11/30/2007 25
Example:
Control Flow: Unique Labels
Needed: a String-valued method that
returns a different label each time it is
called (e.g., L1, L2, L3, …)
Variation: a set of methods that generate
different kinds of labels for different
constructs (can really help readability of
the generated code)
(while1, while2, while3, …; if1, if2, …; else1,
else2, ….)
11/30/2007 26
If
Source
if (cond) stmt
x86
<code evaluating cond>
j
false
skip
<code for stmt>
skip:
11/30/2007 27
If-Else
Source
if (cond) stmt1 else stmt2
x86
<code evaluating cond>
j
false
else
<code for stmt1>
jmp done
else: <code for stmt2>
done:
11/30/2007 28
Boolean Expressions
What do we do with this?
x > y
It is an expression that evaluates to
true or false
Could generate the value (0/1 or whatever
the local convention is)
But normally we don’t want/need the
value; we’re only trying to decide whether
to jump
11/30/2007 29
Code for exp1 > exp2
Basic idea: designate jump target, and
whether to jump if the condition is true
or if it is false
Example: exp1 > exp2, target L123,
jump on false
<evaluate exp1 to eax>
<evaluate exp2 to ecx>
cmp eax, ecx
jng L123
11/30/2007 30
Example exp1 < exp2
Similar to other binary operators
Difference: context is a target label and whether to
jump if true or false
x86:
<evaluate exp1 to eax>
gen(push eax)
<evaluate exp2 to eax>// eax contains exp2
gen(pop ecx) // ecx contains exp1
gen(cmp ecx, eax)
gen(condjump targetLabel)
appropriate conditional jump depending on sense of test
Read More