Compiler Design Week 8
Much of the materials on the slides were originally prepared by Prof. Engelen, FSU.
Detailed content
Weekly program
Week Week Week Week Week Week Week
Week
Week Week Week Week Week
1 – Introduction to Compiler Design 2 – Lexical Analysis
3 – CD Programming Language
4 – Syntax Analysis
5 – Top Down Parsing
6 – Symbol Table and Error Recovery 7 – Bottom-Up Parsing
8 – Semantic Analysis
9 – Stack Machine (SM) Architecture 10 – Code Generation
11 – Code Optimisation
12 – Revision
13 – Extra revision (if needed)
2
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
Week 08 Lecture Outline
Semantic Analysis
Semantic Analysis: what, why and when Syntax Directed Definition
Synthesized & Inherited Attributes
S-attributed & L-attributed definitions
Type System and Type Expressions
SDD for Type Checking
CD21 Semantic Analysis
3
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
Semantic Analysis
Source Code
Lexemes (tokens)
Syntax Object Tree Code
Lexical Analyser
• All tokens are structured according to the given CFG – from syntax analysis
• Can we start emitting target object code?
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
Syntax Analyser
Symbol Table
Code Gen.
Semantic Analysis will generally be done here
• All tokens are valid
– from lexical analysis
4
Semantic Analysis
• Semantics: the branch of linguistics, programming languages and
formal logic concerned with meaning.
• Semantic analysis is the front end’s penultimate phase and the compiler’s last chance to weed out incorrect programs.
• We need to ensure the program is sound enough to carry on to code generation where we will utilize the syntax tree for generating object code.
• Now is the time to ensure that all the context-sensitive rules are upheld.
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
5
Semantic Analysis: Tasks
• Compliance with the rules that govern the types of the operands in expressions and in assignments
• Compliance with the rules of visibility and uniqueness of identifiers
• Correctness of the language control structures
• Compliance with other semantic rules of
– import and export of objects
– inheritance relationship
– name-related checks
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
6
Example
Type Checks, Overloading, Coercion, and Polymorphism
int op(int), op(float);
int f(float);
int a, c[10], d;
d = c+d;
*d = a;
a = op(d);
a = f(d); vector
// FAIL
// FAIL
// OK: overloading (C++)
// OK: coersion
// OK: template instantiation
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
7
Example Uniqueness Checks
myfunc()
{ int i, j, i; // ERROR
… }
cnufym(int a, int a) // ERROR {…
}
struct myrec
{ int name;
};
struct myrec // ERROR
{ int id;
};
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
8
Example Flow-of-Control Checks
myfunc() {…
switch (a)
{ case 0:
…
break; // OK
case 1:
… }
}
myfunc() {…
break; // ERROR
}
myfunc() {…
while (n) {…
if (i>10)
break; // OK
} }
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
9
Example Name-Related Checks
LoopA: for (int I = 0; I < n; I++) {...
if (a[I] == 0)
break LoopB;
... }
21/09/2020
COMP3290 - Semester 2 - 2020| www.newcastle.edu.au
10
Semantic Analysis: Output
• The only real output of Semantic Analysis will be:
– If semantic errors are found:
• output of error messages
• suppression of code generation
– If there is no semantic error,
• the (valid and meaningful) syntax tree (and symbol table) is passed to the Code Generator.
21/09/2020
COMP3290 - Semester 2 - 2020| www.newcastle.edu.au
11
Semantic Error Reporting
• Notice no recovery or correction - you really are in a situation where the programmer does not know what he/she means.
– Recovery? - There is nothing to recover from - we have a program with a confusing meaning.
– Correction? - We could possibly implicitly (automatically) declare variables and warn of them but it is too dangerous.
• It is likely that an
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
12
Syntax-Directed Definitions
• A syntax-directed definition (SDD) is a CFG with attributes and rules
– Called attribute grammar (if there is no side effect in setting attribute values e.g. update of a global variable)
• InSDD
– Productions are bound with a set of semantic rules – Terminals and nonterminals have attributes
• Given a program’s syntax tree, the node for the grammar symbol gathers semantic information from its parent/sibling/children in the syntax tree and updates these attributes.
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
13
Example Attribute Grammar
21/09/2020
Production
L E n EE1 +T E T TT1 *F T F
F ( E )
F digit
Semantic Rule
L.val = E.val E.val:=E1.val+T.val E.val := T.val T.val:=T1.val*F.val T.val := F.val
F.val := E.val
F.val := digit.lexval
Note: all attributes in this example are of the synthesized type
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
14
Annotated Parse Tree
A parse tree showing the value(s) of its attribute(s)
L.val=16
E.val = 16
E.val = 14 T.val = 2
E.val = 9 T.val = 5 T.val = 9 F.val = 5 F.val = 9
F.val = 5
9+5+2n
Note: all attributes in this example are of the synthesized type
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
15
Attributes
• Attribute values can represent
– Numbers (literal constants)
– Strings (literal constants)
– A data type for type checking of expressions
– Scoping information for local declarations
– Intermediate program representations
– Memory locations, such as of a local variable or function argument
– Etc.
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
16
Synthesized VS Inherited Attributes
• Given a production
A
then each semantic rule is of the form
b := f(c1,c2,…,ck)
where f is a function and ci are attributes of A and , and either
– b is a synthesized attribute of A
– b is an inherited attribute of one of the grammar symbols in
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
17
Synthesized VS Inherited Attributes (cont’d)
21/09/2020
Production
Semantic Rule
inherited
D T L
T int ……
L id … := L.in
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
L.in := T.type T.type := ‘integer’
synthesized
18
Example Attribute Grammar with Synthesized+Inherited Attributes
Synthesized: T.val, T.syn Inherited: T.inh
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
19
Evaluation of SDD
Production
A B
Semantic Rule
A.s := B.i B.i := A.s +1
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
20
Dependency Graphs for Parse Trees
•
Edges in the dependence graph show the evaluation order for attribute values
A.a
AXY
A.a := f(X.x, Y.y)
X.x
Y.y
A.a
Direction of
A.a
X.x
Y.y
X.x := f(A.a, Y.y)
value dependence
Y.y := f(A.a, X.x)
X.x COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
Y.y
21/09/2020
21
Dependency Graphs with Cycles?
• Dependency graphs cannot be cyclic
A.a X.x Y.y
A.a := f(X.x) X.x := f(Y.y) Y.y := f(A.a)
Error: cyclic dependence
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
22
Example Annotated Parse Tree with Dependency Graph
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
23
Topological sort of Dependency Graph
• Topological sort: 1, 2, 3, 4, 5, 6, 7, 8, 9
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
24
S-Attributed Definitions
• A syntax-directed definition that uses synthesized attributes exclusively is called an S-attributed definition (or S-attributed grammar)
• In an S-attributed SDD, each rule computes an attribute for the nonterminal at the head of a production from attributes taken from the body of the production.
• A parse tree of an S-attributed definition can be annotated with a simple bottom-up traversal
• An S-attributed SDD can be implemented naturally in conjunction with an LR parser
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
25
Annotating a Parse Tree With Depth-First Traversals
21/09/2020
procedure visit(n : node); begin
for each child m of n, from left to right do visit(m);
evaluate semantic rules at node n end
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
26
Depth-First Traversals (Example)
L.val=16
E.val = 16
E.val = 14 T.val = 2
E.val = 9 T.val = 5 T.val = 9 F.val = 5 F.val = 9
F.val = 5
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
9+5+2n
Note: S-attributed grammar
27
L-Attributed Definitions
•
More precisely, a syntax-directed definition is L- attributed if each attribute must be
– –
•
• •
Either synthesized attribute or
Each inherited attribute of Xj on the right side of A X1 X2
… Xn depends only on
the inherited or synthesized attributes of the symbols X1, X2, …, Xj-1
the inherited attributes of A
“Use information from above or from the left”
Shown: dependences of inherited attributes
A.a X1.x X2.x
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
28
L-Attributed Definitions (cont’d)
•
L-attributed definitions allow for a natural order of evaluating attributes: depth-first and left to right
A X Y X.i:=A.i A A.s:=Y.s X.i := A.i Y.i := X.s
• •
A.s := Y.s Note: every S-attributed syntax-directed definition is also L-
attributed
Note: L-attribute is an essential grammar property for a one- pass compiler, because semantic rules can be applied directly during parsing and parse trees do not need to be kept in memory
XY
Y.i:=X.s
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
29
Recursive Descent Parser for L-Attributed Grammar
• Productions for each nonterminal are implemented as a subroutine
• Subroutine returns synthesized attributes of the nonterminal
• Subroutine takes inherited attributes of the nonterminal as subroutine arguments
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
30
S-Attributed Definitions for Generating Abstract Syntax Trees
Production E E1 + T E E1 – T ET TT1 *id TT1 /id T id
Semantic Rule
E.nptr := mknode(‘+’, E1.nptr, T.nptr)
E.nptr := mknode(‘-’, E1.nptr, T.nptr)
E.nptr := T.nptr
T.nptr := mknode(‘*’, T1.nptr, mkleaf(id, id.entry)) T.nptr := mknode(‘/’, T1.nptr, mkleaf(id, id.entry)) T.nptr := mkleaf(id, id.entry)
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
31
Translation Schemes
• A variant of Syntax Directed Definitions
Production
DTL
T int
T real LL1 ,id L id
Translation Scheme
Semantic Rule
L.in := T.type
T.type := ‘integer’
T.type := ‘real’
L1.in := L.in; addtype(id.entry, L.in) addtype(id.entry, L.in)
Semantic actions
D T { L.in := T.type } L
T int { T.type := ‘integer’ }
T real { T.type := ‘real’ }
L { L1.in := L.in } L1 , id { addtype(id.entry, L.in) } L id { addtype(id.entry, L.in) }
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
32
Type Systems
• A type is a set of values and associated operations.
• A type system is a collection of rules for assigning type expressions to various parts of the program.
– Impose constraints that help enforce correctness.
– Provide a high-level interface for commonly used constructs (for example, arrays, records).
– Make it possible to tailor computations to the type, increasing efficiency (for example, integer vs. real arithmetic).
– Different languages have different type systems.
– A type checker implements a type system.
– A programming language and said Strongly-Typed (Statically Typed), if each program is compiled only if free of type errors.
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
33
Type Expression
• The type system of a language is denoted by the type expression.
• A type expression can be:
– Basic types: such as int and bool
– Type names: A name that denotes the type of an expression.
• Such as typedefs in C and named types in Pascal
– Type Constructors (applied to other type expressions) • Arrays
• Pointers
• Functions
• Type system specifies the type equivalence technique
– Name Equivalence
– Structural Equivalence
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
34
Name Equivalence
• Each type name is a distinct type, even when the type expressions the names refer to are the same
• Types are identical only if names match
• Used by Pascal (inconsistently)
type link = ^node;
var next : link;
last : link;
p : ^node;
q, r : ^node;
With name equivalence in Pascal:
p ≠ next
p ≠ last p = q = r next = last
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
35
Structural Equivalence
• Two types are the same if they are structurally identical
• Used in C, Java, C#
pointer = struct
val next
pointer
struct
val int
next
int
pointer
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
36
Type Checking
•
We need to be able to assign types to all expressions in a program and show that they are all being used correctly.
Input: x * y + f(a,b)
• Are x, y and f declared?
• Can x and y be multiplied together?
• What is the return type of function f?
• Does f have the right number and
type of parameters?
• Can f’s return type be added to
something?
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
37
SDD for Type Checking
• While parsing input program, need to:
– Process declarations for given symbols
• Scope – what are the visible symbols in the current scope?
• Type – what is the declared type of the symbol?
• Lookup symbols used in program to find current binding
• Determine the type of the expressions in the program
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
38
A Simple Language Example
PD;S DD;D
| id : T
T boolean
| char
| integer
| array [ num ] of T |^T
S id := E
| if E then S
| while E do S
P D;S
|S;SD;DS;S E true
| false
| literal
| num
| id
| E and E |E+E |E[E] |E^
21/09/2020
id := E E+E
num id num
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
id : T integer
id:T id:=E
E+E
i: integer; j: integer; i := i + 1; j := j + 1
integer
id
39
Processing Declarations
D id : T
T boolean
T char
T integer
T array [ num ] of T1 T ^ T1
Put information into the symbol table
{ addtype(id.entry, T.type) }
{ T.type := boolean }
{ T.type := char }
{ T.type := integer }
{ T.type := array(1..num.val, T1.type) } { T.type := pointer(T1)
Accumulating information about the declared type
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
40
Type Checking Expressions
21/09/2020
E true E false E literal E num E id
…
{ E.type = boolean }
{ E.type = boolean }
{ E.type = char }
{ E.type = integer }
{ E.type = lookup(id.entry) }
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
41
Type Checking Expressions (cont’d)
E E1 and E2 E E1 + E2
E E1 [ E2 ] E E1 ^
{ E.type := if E1.type = boolean and E2.type = boolean
then boolean else type_error } { E.type := if E1.type = integer and
E2.type = integer
then integer else type_error }
{ E.type := if E1.type = array of t and E2.type = integer
then t else type_error } { E.type := if E1.type = pointer(t)
then t else type_error }
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
42
Type Checking Statements
S id := E
S if E then S1
S while E do S1 SS1 ;S2
{ S.type := if id.type = E.type then void else type_error }
{ S.type := if E.type = boolean then S1.type else type_error }
{ S.type := if E.type = boolean then S1.type else type_error }
{S.type:=ifS1.type=voidandS2.type=void then void else type_error }
In this case, we assume that statements do not have types (not always the case).
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
43
CD21 Semantic Analysis
• A simplified variant of Syntax Directed Definitions
• This can be done either as part of the parser or as a prelude to code generation.
• We simply update/augment the parser. Why?
– Because all the relevant information is right at hand.
• It will generally mean doing specific checks as the syntax tree is built – just before you return from a recursive descent routine.
– Normally you will still be able to build a valid syntax tree (useless but still valid syntactically)
• Each TreeNode returning function will incorporate its relevant semantic rules before it returns.
– For each semantic rule there will be extra code to check that usage in the source program satisfies the semantics.
– The Symbol Table now becomes the main ally of the syntax tree in being able to carry out Semantic Analysis.
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
44
Semantic Errors in CD21 (1)
•
• Array size – must be known at compile time – how can we check this?
• Strong typing exists for real variables, real arrays, arithmetic
operations, and boolean expressions. This flows through to all
statements as well as expressions and operators.
• The assignment operator accepts real variables or whole arrays on the
LHS
– Real array elements are real variables in their own right.
– Whole of array operations (eg A = B) are allowed.
– Whole of structure (array element) operations (eg A[i] = B[j]) allowed
– Whole arrays may be passed as actual parameters provided the
corresponding formal parameter is an array of the same type (either read only or read/write).
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
45
Semantic Errors in CD21 (2)
•
– The old declaration becomes inaccessible for the duration of the procedure or function.
•
• Actual parameters in a procedure or function call must match the type of their respective formal parameter in the procedure definition.
• The number of actual parameters in a procedure call must be equal to the number of formal parameters in the procedure definition.
• The function must have at least one return statement
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
46
The Role of The Symbol Table
• It is likely that you have symbol table records with attributes:
– a String which is the name
– a line number or position where the identifier was first found when it
was inserted into the symbol table.
• We now need to add extra fields, such as:
– Has this variable previously been declared?
– If so, what is its type (int, real, boolean, constant, structure-type-
name, array-type-name, array, function, etc – how would be a good
way to handle all these possibilities?)
– What “sort” of variable is it – parameter, global array, simple
variable – each of these may need different storage requirements in the code generator. The code generator will (later on) need to be able to allocate some style of memory allocation to the variable (e.g. address or register/offset).
– What new methods will you add to the Symbol Table class?
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
47
The Role of The Syntax Tree
• You will be able to do most (all?) of the semantic checking as you build the Syntax Tree (or immediately after).
– Build structures and relationships at declare time and check at usage time. • What about expressions?
– You can probably determine the type of value returned by a particular sub- expression at the time that that particular part of the syntax tree is being built.
– eg. you can test that a relational operator has two int/real (resultant sub-tree) operands and can record the knowledge that it will produce a boolean result. The sub-tree is built “on the way down” the recursive descent calls, and the type is added “on the way back out”.
– Once a complete tree’s type is known then it can be checked against the LHS for assignment or checked to be boolean, in the context of boolean expressions in for, repeat, and if statements
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
48
Example 1
• x : real, y: real, z: real
decls
decls
decl
decl
decl
X: line 12 yes
Y: line 12 yes
Z: line 12 yes
Real (type) special s/t obj
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
49
Example 1
• x : real
1. getNextToken(): (TIDENT,x)
NSDECL
2.getNextToken(): (TCOLN)
3.getNextToken(): TREAL
NSDECL
Name: x
Line No: 5
ID: TIDENT
Declared?= True
Type:
Name: x
Line No: 5
ID: TIDENT
Declared?= T/F
real
What happens if another variable X is declared?
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
50
Example 2
• x = y + z;
assign simple var
simple var
Add
simple var
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
X: line 12 yes
The asgn node will be able to check types
Y: line 12 yes
Z: line 12 yes
Real (type) special s/t obj
51
Example 3
CD21 ex1 ………
arrays R : X, Q : X
func Pr1 (par1 : X, par2 : real, par3 : real) : void loc1 : X, loc2 : real;
end
end CD21 ex1
…
par1[1].J = par2; Q[1].J = Q[1].J + 1; …
…
q, p;
Pr1(R, q, p);
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
52
Example 3
func Pr1 (par1 : X, par2 : real, par3 : real) : void
“Pr1” NFUND void
par local stats
Name: “Pr1”
Line No: 5
ID: TIDENT
Declared?= True
Type:
TypeName: simple
…
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
void
53
Example 3
func Pr1 (par1 : X, par2 : real, par3 : real) : void
“Pr1” void
NFUND
par local
stats
Check symbol table
“Pr1” void?
Pr1(R, q, p);
NCALL
R
q
Name: “Pr1”
Line No: 5
ID: TIDENT
Declared?= True
Type:
TypeName: simple
…
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
void
p
54
Example 3
func Pr1 (par1 : X, par2 : real, par3 : real) : void
“Pr1” void
NFUND
par1
par2 q
local
stats
Pr1(R, q, p);
NCALL
R
q
Name: “Pr1”
Line No: 5
ID: TIDENT
Declared?= True
Type:
TypeName: simple
…
par3 p
“Pr1” void?
Check symbol table
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
void
p
55
Example 4
CD21 ex1 ………
arrays R : X, Q : X
func Pr1 (par1 : X, par2 : real, R : real) : void Q : X, loc2 : real;
…
par1[1].J = par2; Q[1].J = Q[1].J + 1; …
end
func Pr2 (par1 : X, par2 : real, R : real) : void
end
end CD21 ex1
21/09/2020
Q : X, loc2 : real; …
…
local par1, loc2;
Pr1(R, par1, loc2);
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
56
References
• Compilers: Principles, Techniques, and Tools (2nd Edition)
– By A.V. Aho, Lam, R. Sethi, . Ullman
• Chapter 5, 6
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
57
Synthesized VS Inherited Attributes (cont’d)
• N.a is NOT allowed to be defined in terms of attribute values at the
children of node N, where a is inherited attribute node N
• N.a is allowed to be defined in terms of inherited attribute values at
node N itself, where a is a synthesized attribute of node N
• Terminals can have synthesized attributes, but not inherited attributes.
– Attributes for terminals have lexical values that are supplied by the lexical analyser
– There are no semantic rules in the SDD itself for computing the value of an attribute for a terminal.
21/09/2020
COMP3290 – Semester 2 – 2020| www.newcastle.edu.au
63