Semantic Analysis
Semantic Analysis
CS106 — Compiler Principles and Construction
Fall 2011
MUST FIT
Zhiyao Liang
Semantic Analysis Dr. Zhiyao Liang
1
What is semantic analysis
There are requirements by the programming language definition that cannot be specified by context-free grammar.
Semantics here we mean static semantics, the properties of a program that can be checked by a compiler before running the program.
Common tasks of a semantic analyzer:
Maintain a symbol table
to keep track the meaning of names, like scope of variables.
Type checking.
A type checker is a semantic analyzer that computes the data type attribute of all language entities and verifies that these types conform to the type rules of the language.
Semantic Analysis Dr. Zhiyao Liang
2
Two categories and two aspects of semantic analysis
Two categories of purposes of semantic analysis:
For correctness requirements of programming language.
Ada and C are statically typed.
LISP and Smalltalk have no static semantic analysis at all.
Pascal is between the two extremes.
For generating efficient code.
Like constant folding, replace x = 3 + 4 + 5; with x = 12;
Two aspects of works are involved:
Description of the requirements
Implementation
A compiler writer need to do the two aspects of work. There is no widely used automatic analyzer generator (unlike Lex and Yacc).
Semantic Analysis Dr. Zhiyao Liang
3
Description of the analyses to perform.
Common approach is to identify attributes of program entities.
such as type, value, syntax tree of a varaible X.
Relations between these attributes can be described by attribute equations, (also called semantic rules)
These attributes and attributes equations are together called attribute grammar.
There is no universal specification mechanism.
Semantic Analysis Dr. Zhiyao Liang
4
Attributes of a program
Data type
value of an expression.
Location of a variable in memory.
radix of number (octal, decimal, or hexadecimal).
Syntax tree
…
Semantic Analysis Dr. Zhiyao Liang
5
Attribute Grammar
Given a collection of attributes
a1, a2, …ak,
For a grammar rule
X0 X1X2…Xn,
An attribute equation or semantic rule has the form:
xi.aj =
fij( x0.a1,…, x0.ak,
x1.a1,…, x1.ak,
…,
xn.a1,…, xn.ak)
Semantic Analysis Dr. Zhiyao Liang
6
Example of attribute grammar
Grammar:
number number digit | digit
digit 0 | 1 | 2| 3 | 4| 5| 6| 7 | 8 | 9 |
Attribute grammar:
number.val = digit.val
number1.val = number2.val * 10 + digit.val
Semantic Analysis Dr. Zhiyao Liang
7
Semantic Analysis Dr. Zhiyao Liang
8
number.val can be computed in a parser.
A parse tree showing attribute computation.
Semantic Analysis Dr. Zhiyao Liang
9
We do not need to construct this tree; it just shows the computation structure.
Another example: values
Semantic Analysis Dr. Zhiyao Liang
10
Parse tree for
(34-3) *42
Semantic Analysis Dr. Zhiyao Liang
11
Semantic Analysis Dr. Zhiyao Liang
12
Parse tree for
float x, y
Semantic Analysis Dr. Zhiyao Liang
13
We can use if then else .
We can also use functions .
Attribute grammar for abstract syntax trees of simple integer arithmetic expressions
Semantic Analysis Dr. Zhiyao Liang
14
Dependency Graph and Evaluation Order
Dependency graph shows the computation order in a attribute grammar.
number.val = digit.val
number1.val = number2.val * 10 + digit.val
Semantic Analysis Dr. Zhiyao Liang
15
Example
decl type var-list
var-list id, var-list
type int | float
Semantic Analysis Dr. Zhiyao Liang
16
id.dtpe = var-list1.dtype
var-list2.dtpe = var-list.dtype
Dependency graph of float x, y
over its parse tree (dashed line)
Semantic Analysis Dr. Zhiyao Liang
17
dependency graph of 345o
refer to slide 13
Semantic Analysis Dr. Zhiyao Liang
18
Computation order
Semantic Analysis Dr. Zhiyao Liang
19
Only when the dependency graph has no circle (A DAG) can we do it.
Synthesized and Inherited Attributes
An attribute is synthesized if all its dependencies point from child to parent in the parse tree.
An attribute grammar in which all attributes are synthesized is called an S-attributed grammar.
An attribute that is not synthesized is called an inherited attribute
Has dependency either from parent or from sibling.
Semantic Analysis Dr. Zhiyao Liang
20
Computing attributes based on a syntax tree.
Attributes of an S-attributed grammar can be computed by a post-order traversal of the tree:
Semantic Analysis Dr. Zhiyao Liang
21
Computing attributes based on a syntax tree, cont.
If all attributes are inherited, then these attributes can be computed by a pre-order traversal of the tree.
Semantic Analysis Dr. Zhiyao Liang
22
Example of Pre-order traversal
float x, y
Semantic Analysis Dr. Zhiyao Liang
23
Computing attributes based on a syntax tree, cont.
If synthesized attributes depend on inherited attributes, but inherited attributes do not depend on synthesized attribute, we can combine the pre-order and post-order evaluation together:
Semantic Analysis Dr. Zhiyao Liang
24
Other forms of attributes computation
Attributes can be the return value or parameter of function calls.
For example, how do we compute the syntax tree of C-Minus?
Attributes can be stored by external data structure, instead of by the tree nodes.
The prime example, symbol table.
Some attributes can be computed during parsing.
Like the syntax tree.
Semantic Analysis Dr. Zhiyao Liang
25
Symbol table
Principle operations on a symbol tables are:
Insert
Lookup
Delete
Which data structure can do these three kind of operation efficiently?
Commonly, hash tables are used.
Semantic Analysis Dr. Zhiyao Liang
26
About hash tables.
A hash table is an array of entries, called buckets, with integer indices.
A hash function turns the search key (A character string) into an integer hash value in the index range.
Collisions occur when two keys are mapped to the same hash value.
Open addressing.
Separate chaining.
double hashing .
Semantic Analysis Dr. Zhiyao Liang
27
Separate chaining is common
Semantic Analysis Dr. Zhiyao Liang
28
Common hash function
Semantic Analysis Dr. Zhiyao Liang
29
Avoid the overflow problem
Declarations
The names (and associated information) introduced by declarations will be saved in symbol tables.
There are different kinds of declarations:
Constant declarations: const int size = 199;
#define is a preprocessor, not handled by compiler.
Type declarations:
struct, union, enum in C.
typedef in C: typedef struct Entry * EntryPtr;
Variable declarations: int a, b[100];
Procedure/function declarations.
Like the function prototypes and definitions in C.
Semantic Analysis Dr. Zhiyao Liang
30
Declarations, cont.
Explicit declarations: like those in C.
Implicit declarations (declaration by use)
like in Fortran and Basic
For example, variables with name starting with I through N, others are real.
Semantic Analysis Dr. Zhiyao Liang
31
How many symbol tables
For some language: one symbol table can be used to hold all declarations.
For languages Algol-derived languages like C and Pascal, whose programs have scopes:
we may wish to associated separate symbol tables with different regions (different scopes) and link them together.
Semantic Analysis Dr. Zhiyao Liang
32
Binding attributes to a name introduced by a declaration
Binding data type to a name.
Binding scope to a name.
Scope is implied by the position of the declaration in a program.
Memory allocation of a name:
Whether memory is allocated for the name.
For example: extern declarations do not perform allocation.
Declaration vs. definition
Lifetime or extent of the memory allocation.
For example, static local variables and global variables exist through out the whole program execution.
Automatic duration vs. static duration.
Semantic Analysis Dr. Zhiyao Liang
33
Declaration vs. definition:
In C, we call a statement a “declaration” when memory allocation is not performed, and definition when otherwise.
33
Scope Rules and Block Structure
Declaration before use
Like C and Pascal
Advantage: allow one-pass compiling.
A block in a program is any construct that can contain declaration.
A language is block structured if it permits the nesting blocks inside other blocks.
Most closely nested rule : If there are several legal declarations with the same, the one in the most closely nested block to the reference applies.
In C, a block is surrounded by { }.
How about those used by struct declaration?
We can consider the function head (name and parameters) and its body forms a single block.
Semantic Analysis Dr. Zhiyao Liang
34
Scope rules in C
Declaration before use.
Most nested rule.
One name cannot be declared twice on the same level.
The scope of a name is the region in the program that starts from the declaration of the name and finish at the end of the most nested block containing the name.
A name can only be (seen) used within its scope.
There are complicate situations.
in C++, a filed f of class A can be accessed by A::f.
in C, a field f of a struct A can be accessed as A.f.
Understanding: extern, static, global variables, functions.
Semantic Analysis Dr. Zhiyao Liang
35
Semantic Analysis Dr. Zhiyao Liang
36
Example using One Pascal program. One design is to use one symbol table, and let it expand and shrink, and it only keeps the information of currently usable names according to the scope rules.
36
Maybe a better choice
Semantic Analysis Dr. Zhiyao Liang
37
We show another choice of symbol table design. comparing to the graph (b) of the previous slide. Sometimes the delete operations can be implemented by “hidden” those information.
37
Dynamic scope vs. Static scope
#include
int i =1;
void f(void)
{printf(“%d\n”, i);}
int main(void){
int i = 2;
f();
return 0;
}
Semantic Analysis Dr. Zhiyao Liang
38
What value is printed out?
Using static scope, like C, the value printed out is 1.
Using dynamic scope, like the older versions of LISP, SNOBOL, some database query language, the value printed out is 2.
About dynamic scope
The symbol table should simulate execution, and is part of the runtime environment.
Incompatible for static type checking.
If global i is defined as a double, what is the type of the i inside f()?
Interaction of Same-Level Declarations
One common rule:
A name cannot appear in two declarations on the same level.
What is the value of h?
With sequential declaration, like C, h is 3.
With collateral declaration, like ML and Scheme, h is 2.
Semantic Analysis Dr. Zhiyao Liang
39
int j = 1;
void f(void){
int j = 2, h = j + 1;
…
}
What does a semantic analyzer do?
We can let semantic analyzer to check scope requirements.
Memory location of global and static variables.
Semantic Analysis Dr. Zhiyao Liang
40
Interaction of Same-Level Declarations, cont.
For recursive declaration (a function calls itself), the name of the function should be entered into the symbol table before the recursive call.
C uses function prototype to deal with mutual recursion.
C-Minus should avoid mutual recursion, according to its grammar.
Semantic Analysis Dr. Zhiyao Liang
41
An extended example of an attribute grammar using a symbol table.
What are the values of the following let expressions?
let x = 2+1, y = 3+4 in x + y
let x = 2, y = 3 in
(let x = x + 1,
y = (let z = 3 in x + y + z)
in (x+y)
)
Semantic Analysis Dr. Zhiyao Liang
42
Attribute grammar to compute the err attributes of an expression
Semantic Analysis Dr. Zhiyao Liang
43
Which attributes are inherited, which are synthesized?
43
cont.
Semantic Analysis Dr. Zhiyao Liang
44
The function insert() produce a new symbol table, without changing the original one.
44
Data type and type checking
Type inference
For example: Given int arr[5], what is the type of arr[3]?
type checking.
For example:
printf(): data and format specifier
function call should matches the definition.
What is the type of a function?
Array range checking.
Theoretically, a data type is a set of values with certain associated operations.
Semantic Analysis Dr. Zhiyao Liang
45
Type Expressions and Type Constructors.
There are built-in types: int, double, …
There are also constructed types:
array, enum, structure, union.
Pointers
Functions
Explicit type and implicit types.
Semantic Analysis Dr. Zhiyao Liang
46
type expressions
typedef char Ar[3];
void sort (int a[], int first, int last);
// open indexed array
Semantic Analysis Dr. Zhiyao Liang
47
cont.
// error:
struct instBST
{ int isNull;
int val;
struct intBST left, right;
}
Semantic Analysis Dr. Zhiyao Liang
48
// OK: indirect recursion
struct instBST
{ int isNull;
int val;
struct intBST *left, *right;
}
typedef struct intBST * intBST;
48
Type equivalence
Given two expressions, check if their values belong to the same type.
function typeEqual (t1, t2: TypeExp): Boolean
Three ways of checking type equivalence.
Structural equivalence
Name equivalence:
Two type expressions are equivalent if and only if they are the same simple type or are the same type name.
Declaration equivalence
t1 = t2; means that t1 is the alias of type t2.
Two types are the same if they are alias of the same base type ( after a tracing back a chain of alias).
Semantic Analysis Dr. Zhiyao Liang
49
Structural equivalence
Given a structural type expression (struct, union, function)
We can describe the type expression as a tree.
Two type expressions are equivalent iff
The top nodes are of the same kind
The two top nodes have the same number of children.
Children have the same kind (pairwisely).
Semantic Analysis Dr. Zhiyao Liang
50
Describing a structural type as a tree
Semantic Analysis Dr. Zhiyao Liang
51
Type inference and type checking
Semantic Analysis Dr. Zhiyao Liang
52
Cont.
Semantic Analysis Dr. Zhiyao Liang
53
Additional topics in type checking
Overloading:
An operator is overloaded if the same operator name is used for two differernt operations.
2 + 3 2.0 + 3.0
procedure max(x, y: integer): integer;
procedure max(x,y: real): real;
C++ allows overloading, but C and Pascal does not.
Semantic Analysis Dr. Zhiyao Liang
54
Additional topics in type checking, cont.
Type conversion: 2.1 + 3
Type coercing: 2.1 + float(3)
Semantic Analysis Dr. Zhiyao Liang
55
Polymorphic typing
procedure swap(var x, y: anytype);
var x, y: integer;
a, b: char;
…
swap(x, y);
swap(a, b);
swap(a, x);
Some sophisticated pattern matching algorithm may be involved for this.
Semantic Analysis Dr. Zhiyao Liang
56
/docProps/thumbnail.jpeg