CS 536 / Fall 2016
CS 536 / Fall 2020
Introduction to programming languages and compilers
Aws Albarghouthi
aws@cs.wisc.edu
About me
PhD at University of Toronto
Joined University of Wisconsin in 2015
Part of madPL group
Program verification
Program synthesis
http://pages.cs.wisc.edu/~aws/
2
About the course
We will study compilers
We will understand how they work
We will build a full compiler
We will have fun
3
Course Mechanics
Home page: http://cs.wisc.edu/~aws/courses/cs536-f20/
Workload:
6 Programs (40% = 5% + 7% + 7% + 7% + 7% + 7%)
2 exams (midterm: 30% + final: 30%)
5
A compiler is a
recognizer of language S
a translator from S to T
a program in language H
6
front end = recognize source code S;
map S to IR
IR = intermediate representation
back end = map IR to T
Executing the T program produces the same
result as executing the S program?
Phases of a compiler
7
front end
back end
Symbol
table
P1
P2
P3
P4, P5
P6
Scanner (P2)
Input: characters from source program
Output: sequence of tokens
Actions:
group chars into lexemes (tokens)
Identify and ignore whitespace, comments, etc.
Error checking:
bad characters such as ^
unterminated strings, e.g., “Hello
int literals that are too large
8
Example
9
a = 2 * b + abs(-71)
scanner
ident
(a)
asgn
int lit
(2)
times
ident
(b)
plus
ident
(abs)
lparens
minus
int lit
(71)
rparens
a = 2 *b+
abs ( –
71)
ident
(a)
asgn
int lit
(2)
times
ident
(b)
plus
ident
(abs)
lparens
minus
int lit
(71)
rparens
Whitespace (spaces, tabs, and newlines) filtered out
The scanner’s output is still the sequence
Parser (P3)
Input: sequence of tokens from the scanner
Output: AST (abstract syntax tree)
Actions:
groups tokens into sentences
Error checking:
syntax errors, e.g., x = y *= 5
(possibly) static semantic errors, e.g., use of undeclared variables
10
Semantic analyzer (P4,P5)
Input: AST
Output: annotated AST
Actions: does more static semantic checks
Name analysis
process declarations and uses of variables
enforces scope
Type checking
checks types
augments AST w/ types
11
Semantic analyzer (P4,P5)
Scope example:
12
…
{
int i = 4;
i++;
}
i = 5;
out of scope
Intermediate code generation
Input: annotated AST (assumes no errors)
Output: intermediate representation (IR)
e.g., 3-address code
instructions have 3 operands at most
easy to generate from AST
1 instr per AST internal node
13
Phases of a compiler
14
front end
back end
Symbol
table
P1
P2
P3
P6
P4, P5
Example
15
a = 2 * b + abs(-71)
scanner
parser
ident
(a)
asgn
int lit
(2)
times
ident
(b)
plus
ident
(abs)
lparens
minus
int lit
(71)
rparens
Example (cont’d)
16
semantic analyzer
Symbol
table
a var int
b var int
abs fun int->int
Example (cont’d)
17
code generation
tmp1 = 0 – 71
move tmp1 param1
call abs
move ret1 tmp2
tmp3 = 2*b
tmp4 = tmp3 + tmp2
a = tmp4
Optimizer
Input: IR
Output: optimized IR
Actions: Improve code
make it run faster; make it smaller
several passes: local and global optimization
more time spent in compilation; less time in execution
18
Code generator (~P6)
Input: IR from optimizer
Output: target code
19
This slide does not exactly match what will be done in Program 6. In CS 536, we are building a simple compiler that does not have an optimization phase. Thus, instead of translating the (annotated) AST to intermediate code and then translating the intermediate code to target code, P6 goes directly from the (annotated) AST to target code
Symbol table (P1)
Compiler keeps track of names in
semantic analyzer — both name analysis and type checking
code generation — offsets into stack
optimizer — def-use info
P1: implement symbol table
20
Symbol table
Block-structured language
Java, C, C++
Ideas:
nested visibility of names (no access to a variable out of scope)
easy to tell which def of a name applies (nearest definition)
lifetime of data is bound to scope
21
Symbol table
int x, y;
void A() {
double x, z;
C(x, y, z)
}
void B() {
C(x, y, z);
}
22
block structure: need
symbol table with nesting
implement as list of hashtables