COMP90045 Programming Language Implementation
Semantic Analysis
Harald Søndergaard Lecture 13
Semester 1, 2019
PLI (Sem 1, 2019) Semantic Analysis ⃝c University of Melbourne 1 / 26
Semantic Analysis
The parser checks whether the input matches the syntax of the language. However, programs must pass “static semantic” checks as well as syntactic ones.
The set of semantic checks of course depends on the language, but the following checks are part of most languages:
Variables and functions must be declared before being used. They must be declared only once in any one scope.
Function calls and function declarations must match in the number and type of arguments, including return values.
The semantic analyzer also gathers information for the code generator. For example, in most languages it figures out whether
the + operator denotes int addition, float addition or double addition.
PLI (Sem 1, 2019) Semantic Analysis ⃝c University of Melbourne 2 / 26
Attribute Grammars
Semantic analyzers can be formally described by attribute grammars.
An attribute grammar consists of a context-free grammar,
a set of attributes for each grammar symbol (both terminal and nonterminal),
rules for assigning values to the attributes of the nodes in a parse tree, and
a set of conditions on attribute values.
The idea is that the input satisfies the semantic requirements of the language if and only if all the conditions are satisfied, with each violation of a condition representing a semantic error.
PLI (Sem 1, 2019) Semantic Analysis ⃝c University of Melbourne 3 / 26
Attributes
You can think of the set of attributes of a grammar symbol as the set of fields of a C structure, or a Haskell enumeration using record syntax: each attribute has a name and a type.
An attribute grammar effectively attaches one such structure to each node of the parse tree, except the nodes for grammar symbols that have no attributes.
Assigning values to the fields of these structures is called annotating or decorating the parse tree.
PLI (Sem 1, 2019) Semantic Analysis ⃝c University of Melbourne 4 / 26
Attributes: Expressions
What attributes you associate with each nonterminal depends on the program’s overall purpose.
In a desk calculator, the nonterminal representing expressions should have an attribute representing the value of the expression.
In a compiler, the nonterminal representing expressions should have an attribute representing the structure of the expression:
Whether it is a reference to the value of an identifier or constant, and if so, which identifier or constant;
If not, what operation is being performed (integer addition, floating point addition, etc.) and what are its operands.
It should also have other attributes giving its type, and any other information of interest to the analyzer and the code generator.
PLI (Sem 1, 2019) Semantic Analysis ⃝c University of Melbourne 5 / 26
QUIZ: Type Checking
In C, what is the type of:
3+4/5+7 x ? 3 : 7.0 3+x/5+8
PLI (Sem 1, 2019)
Semantic Analysis
⃝c University of Melbourne
6 / 26
QUIZ: Type Checking
In C, what is the type of:
3+4/5+7 x ? 3 : 7.0 3+x/5+8
3+4/5+7 has typeint
PLI (Sem 1, 2019) Semantic Analysis ⃝c University of Melbourne 6 / 26
QUIZ: Type Checking
In C, what is the type of:
3+4/5+7 x ? 3 : 7.0 3+x/5+8
3+4/5+7 has typeint x?3 : 7.0hastypedouble
PLI (Sem 1, 2019) Semantic Analysis ⃝c University of Melbourne 6 / 26
QUIZ: Type Checking
In C, what is the type of:
3+4/5+7 x ? 3 : 7.0 3+x/5+8
3+4/5+7 has typeint
x?3 : 7.0hastypedouble
3 + x/5 + 8 has a type that depends on typeof(x)
Information flows upwards in the parse tree. But we also need to resolve information about symbols.
PLI (Sem 1, 2019) Semantic Analysis ⃝c University of Melbourne 6 / 26
Attributes: Environments
In compilers for languages such as C, the nonterminal for variable declarations should have an attribute listing the names and types of the declared variables.
The set of such declarations visible to a statement or expression is the environment of that statement or expression.
You need the environment to figure out, for example, what the type of an expression is when that expression refers to a variable. The nonterminals for statements and expressions therefore usually have an environment attribute.
The environment is the symbol table, or a part thereof. If there is only one and it is updated destructively, people tend to call it the symbol table; if several different versions can exist simultaneously, people tend to call it the environment.
PLI (Sem 1, 2019) Semantic Analysis ⃝c University of Melbourne 7 / 26
Environments: C
typedef int x;
void g(void);
int main(int argc, char** argv) { x x = 3;
x y = 4; /* ? */ }
void f(void) { g(); } void g(void) { }
PLI (Sem 1, 2019) Semantic Analysis ⃝c University of Melbourne 8 / 26
Environments: Haskell
data X = X Integer f x = g (x+y) gx=Xy
where y = x+1 y=1
PLI (Sem 1, 2019)
Semantic Analysis
⃝c University of Melbourne
9 / 26
Attributes for Error Handling
Some attributes help not in generating code or in detecting errors, but in generating good error messages.
It is a good idea to give all grammar symbols an attribute that says which part of the input file the symbol was derived from, for use in error messages.
At the minimum, this attribute should contain “line L in file F”. At the maximum, it should contain “from line L1 column C1 to line L2 column C2 in file F”. The latter is especially useful in IDEs.
You can also enhance environments with information you can use to suppress duplicate error messages. For example, environments could contain a list, initially empty, listing the variables that so far have been used while undeclared.
PLI (Sem 1, 2019) Semantic Analysis ⃝c University of Melbourne 10 / 26
Attribute Grammars
In attribute grammars, each production has associated with it a set of semantic rules.
Given a production A → X1 . . . Xn, its semantic rules have the form b := f (c1, . . . , ck )
where f is a function and b and c1, . . . , ck are all attributes of either A or of one of the Xi .
PLI (Sem 1, 2019) Semantic Analysis ⃝c University of Melbourne 11 / 26
Synthesized vs Inherited Attributes
The attributes of each grammar symbol are divided into two sets: the synthesized attributes and the inherited attributes.
If A.s is a synthesized attribute of A, then every production of the form A → X1,…,Xn must have a semantic rule A.s := f(c1,…,ck).
If A.i is an inherited attribute of A, then every production of the form X → X1,…,A,…,Xn must have a semantic rule
A.i :=f(c1,…,ck).
Synthesized attributes represent information flowing upward in the parse tree, while inherited attributes represent information flowing down and sideways.
Terminals usually only have synthesized attributes set by lexical analysis.
PLI (Sem 1, 2019) Semantic Analysis ⃝c University of Melbourne 12 / 26
Semantic Rules: An Example
production var → id
var.type := lookup var type(var.env, id.name)
production expr → var1 op var2
var1.env := expr.env
var2.env := expr.env
expr.type := expr type(op.name, var1.type, var2.type) expr.operation := expr op(op.name, expr.type)
production assignment → var := expr
var.env := assignment.env
expr.env := assignment.env
assignment.operation := assign op(var.type, expr.type)
PLI (Sem 1, 2019) Semantic Analysis ⃝c University of Melbourne 13 / 26
Attribute Dependencies
In this example, assignment.operation depends on var.type and expr.type: the code generator needs to know the number of bytes to move, and whether a conversion is required (say, int to float).
Givenasemanticruleb:=f(c1,…,ck),everyoneoftheci mustbe computed before we can compute b. We say that b depends on each of the ci .
We can plot the dependencies for a given parse tree as a graph. This graph should be acyclic, since no attribute should depend on itself, directly or indirectly: if it does, then its value can never be computed.
Attributes can be evaluated in any order compatible with the dependencies. Such an order can be computed by a topological sort of the graph.
PLI (Sem 1, 2019) Semantic Analysis ⃝c University of Melbourne 14 / 26
A Dependency Graph
assignment: operation, env
var: type, env
ident: name
expr: type, operation, env
PLI (Sem 1, 2019)
var: type, env
ident: name Semantic Analysis
op: name var: type, env
ident: name
⃝c University of Melbourne 15 / 26
Attribute Evaluation
It is obviously possible to
read the input and build a parse tree, compute the attribute dependency graph, check that it is noncircular,
find an evaluation order, and
evaluate the attributes.
However, this does all the work at runtime.
It would be nice to be able to rule out circularities and to compute the evaluation order when we build the compiler.
It would also be nice to compute attribute values during parsing; reducing the number of tree traversals speeds up the compiler.
PLI (Sem 1, 2019) Semantic Analysis ⃝c University of Melbourne 16 / 26
S-Attributed Definitions
An S-attributed definition is a syntax-directed definition that uses only synthesized attributes. This one is for a small calculator:
E → E1+T E.val:=E1.val+T.val
T.val := F.val F.val := E.val F.val := num.val
E.val := T.val
E → T
T → T1∗F T.val:=T1.val∗F.val
T → F
F → (E) F → num
Here, val is a synthesized attribute of E, F, and T (set by the parser) and also of num (set by the scanner).
PLI (Sem 1, 2019) Semantic Analysis ⃝c University of Melbourne 17 / 26
S-Attributed Definition Example
Consider “3 * 5 + 4”:
E: val
E: val + T: val T: val F: val
T: val * F: val num: val
F: val num: val num: val
PLI (Sem 1, 2019) Semantic Analysis ⃝c University of Melbourne 18 / 26
S-Attributed Definitions in happy
It is not hard to systematically convert an S-attributed definition to a happy or yacc specification.
Since happy associates at most one value with each grammar symbol, you must group all the attributes of a nonterminal into a single value. If the nonterminal has more than one attribute, each attribute becomes one element of a tuple or algebraic data type.
You can then implement the semantic rules of a production by putting an action that implements them at the end of the production. This action has access to the synthesized attributes of all the symbols of the RHS of the production, and can compute from them the synthesized attributes of the nonterminal on the LHS.
PLI (Sem 1, 2019) Semantic Analysis ⃝c University of Melbourne 19 / 26
Yacc Example
%token ADD MUL LPAR RPAR %token
%type
%start expr
%%
expr :exprADDterm | term
;
term :termMULfactor
| factor
;
factor :LPARexprRPAR
| NUM ;
{$$=$1+$3} { $$ = $1 }
{$$=$1*$3} { $$ = $1 }
{$$=$2} { $$ = $1 }
See Lecture 12, slides 10–11, for a similar example written for happy. PLI (Sem 1, 2019) Semantic Analysis ⃝c University of Melbourne 20 / 26
Yacc Example
In attribute grammars, the values of attributes should be defined using pure functions. This is assured by the use of a pure functional language in which there are no side-effects and no global variables.
This purity is what allows an evaluator to compute attribute values in any order that is compatible with the dependencies.
Yacc specifications like the one on the previous slide are not attribute grammars but translation schemes. In translation schemes, the order of attribute evaluation is fixed (actions are evaluated when the production is reduced), and the actions may have side effects:
line : expr NEWLINE { printf(“%d\n”,$1) } ;
PLI (Sem 1, 2019) Semantic Analysis ⃝c University of Melbourne 21 / 26
Building Abstract Syntax Trees
Instead of evaluating expressions, parsers in compilers usually build representations of them. This representation is usually an abstract syntax tree (sometimes also called just a syntax tree), which discards irrelevant details from parse trees. Consider 3 * 5 + 4:
Abstract syntax tree Parse tree
+E *4 E+T 35TF T*F4
F5 3
PLI (Sem 1, 2019)
Semantic Analysis ⃝c University of Melbourne 22 / 26
Constructing Syntax Trees
E → E → E → T → T → T →
E1 + T E.node := mknode ′ +′ E1.node T.node E1 − T E.node := mknode ′ −′ E1.node T.node
T (E) id num
E.node := T.node
T.node := E.node
T.node := mkleaf id id.entry T.node := mkleaf num num.value
Here each nonterminal has one synthesized attribute pointing to a node in a tree representing an expression and its components.
PLI (Sem 1, 2019) Semantic Analysis ⃝c University of Melbourne 23 / 26
Constructing Syntax Trees
Example: a – 4 + c
+
•
•
E: node T: node
id
E: node –
E: node +
T: node num
T: node id
−
•
•
id
c
num
4
id
a
PLI (Sem 1, 2019)
Semantic Analysis
⃝c University of Melbourne
24 / 26
Syntax DAGs
The abstract syntax tree need not be a tree; it can be a directed acyclic graph (or DAG for short). DAGs generalize trees by allowing a node to have more than one parent.
Shared subtrees can represent common subexpressions. For example, we can representx+x*(2-y)+(2-y)*yas:
+ +* *
x– 2y
PLI (Sem 1, 2019)
Semantic Analysis
⃝c University of Melbourne
25 / 26
Constructing DAGs
To achieve this, get mkleaf and mknode to look up a hash table to see if the desired subtree already exists (this is hash-consing).
mknode(’+’, •, •)
mknode(’+’, •, •)
mknode(’*’, •, •)
mknode(’*’, •, •)
mkleaf(id,x)
mknode(’-’, •, •)
mkleaf(num,2)
mkleaf(id,y)
PLI (Sem 1, 2019) Semantic Analysis ⃝c University of Melbourne 26 / 26