Compiler Design Week 5
Detailed content
Weekly program
Week Week Week Week
Week
1 – Introduction to Compiler Design 2 – Lexical Analysis
3 – CD Programming Language
4 – Syntax Analysis
5 – Top Down Parsing
2
Week
Week
Week
Week
Week
Week 11 – Code Optimisation Week 12 – Revision
6 – Symbol Table and Error Recovery 7 – Bottom-Up Parsing
8 – Stack Machine (SM) Architecture 9 – Semantic Analysis
10 – Code Generation
Week 13 – Extra revision (if needed)
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
Announcement
Project Part 2 in BB
Deadline: 12 Sep 2021
3
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
Week 05 Lecture Outline
Top Down Parsing
Top Down Parsing
LL(K) Parsing
Recursive Descent Parser
Backtracking
Predictive Parser
Abstract Syntax Tree
Implementing Recursive Descent Parser FIRST and FOLLOW sets
Table Dirven Parsing
Predictive Parsing Table Construction
4
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
Top-Down Parsing
• LL methods (Left-to-right, Leftmost derivation)
–
Constructs the parse tree for the input string starting from the root and creates the nodes of the tree in preorder (depth-first order)
Grammar:
ET+T T(E) T-E
T id
Leftmost derivation:
Elm T+T lm id+T lm id + id
EEEE TTTTTT
+ id + id + id COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
17/08/2021
5
Top-Down Parsing (cont)
• We begin with start symbol and repeatedly apply productions to obtain a new sentential form which is closer to the input string.
L lm ….. lm ( ( L ) L ) L
• At each step we have 2 decisions:
1. Which non-terminal in the current sentential form to expand.
2. Which of the productions for this non-terminal do we use in making the expansion.
Grammar: L(L)L L
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
6
Top-Down Parsing (cont)
• There is an obvious choice for decision 1
– always choose the leftmost non-terminal.
The reason is that we generally read the input stream from a file sequentially so we would read the left-most characters of the input first.
Egthesententialform(a1 a2…an B0 X1 X2 …Xm)with ai T, B0 N,and Xi (N U T).
Goal: match input symbols, we will ensure that the ai’s match the first n characters from the input. If not, then our derivation must fail.
Suppose: we decide to expand some Xi (e.g. X1 N) next. Can we do that without expanding B0?
So our parser will always choose the leftmost non-terminal at each derivation step and so produce a leftmost derivation.
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
7
Top-Down Parsing (cont)
• Which production to use in expanding the leftmost non-terminal
• This can be very difficult – consider the previous example:
– Suppose the productions for B0 are
• B0 ::= Mz1 and B0 ::= Mz2 where M N and zi (N U T). (or
maybe z2 is ε.)
– Which rule do we use? We can’t tell easily. We would have to read all the symbols for M first.
• A CD21 example:
• In both examples we need to know what is after M or
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
8
LL(1) Grammars
• We restrict our attention to grammars where these problems cannot occur.
read left to right
LL(1)
produce a leftmost derivation
only 1 input symbol
read ahead is needed to be able to choose a production
• So we can read left to right and construct a leftmost derivation using only the next input symbol (after the characters that have already been matched).
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
9
LL(k) Parsing
• Process input k symbols at a time.
• Initially, ‘current’ non-terminal is start symbol.
• Algorithm
– Loop until no more input
• Given next k input tokens and ‘current’ non-terminal T, choose a rule R (T …)
• For each element X in rule R from left to right,
– if X is a non-terminal, we will need to ‘expand’ X
– else if symbol X is a terminal, see if next input symbol matches X; if so, update from the input
• Typically, we consider LL(1)
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
10
Approaches to Top-Down Parsers
• Recursive Descent parsing
– A set of mutually recursive procedures
– Code tailored to the grammar
• Table Driven parsing
– A generic parsing engine and a parse table to drive the engine
– Table tailored to the grammar
– General Algorithm
• Both algorithms driven by the tokens coming from the lexer.
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
11
Writing a Recursive Descent parser
•
1) 2) 3) 4) 5) 6) 7)
Generate a procedure for each non-terminal A. void A() {
Choose an A-production, A X1 X2 . . . Xk ; for ( i = 1 to k ) {
} }
if ( Xi is a nonterminal ) call procedure Xi () ;
else if ( Xi equals the current input symbol a ) advance the input to the next symbol;
else /* an error has occurred * / ;
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
12
Writing a Recursive Descent parser
• Use next token (lookahead) to choose (PREDICT) which production to ‘mimic’.
• Ex: B b C D B() {
if (lookahead == ‘b’)
{ match(‘b’); C(); D(); } else …
}
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
13
Writing a Recursive Descent parser • Also need the following:
match(symbol) {
if (symbol == lookahead) lookahead = nexttoken()
else error() }
main() {
lookahead = nextoken();
S(); /* S is the start symbol */ if (lookahead == EOF) then accept else reject
}
error() { … }
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
14
Writing a Recursive Descent parser
S() {
if (lookahead == a ) { match(a);B(); } else if (lookahead == b) { match(b); C(); } else error(“expecting a or b”);
} B() {
if (lookahead == b) {match(b); match(b); C();} else error();
} C() {
if (lookahead == c)
{ match(c) ; match(c) ;} else error();
}
S a B S b C
B b b C
C c c
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
15
SaB |bC
BbbC Ccc
Parsing abbcc
S
abbcc
Call S() from main()
S() {
if (lookahead == a ) { match(a);B(); } else if (lookahead == b) { match(b); C(); } else error(“expecting a or b”);
}
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
S a B S b C
16
SaB |bC
BbbC Ccc
Parsing abbcc
S aB
abbcc
Call B() from S()
B() {
if (lookahead == b)
{match(b); match(b); C();}
else error();
}
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
B b b C
17
SaB |bC
BbbC Ccc
Parsing abbcc
S aB
bbC
abbcc
Call C() from B()
C() {
cc
if (lookahead == c)
{ match(c) ; match(c) ;}
else error();
}
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
C c c
18
SaB |bC
BbbC Ccc
Backtracking
• General Recursive Descent may require backtracking
– May require repeated scans over the input
– Determines which production to use by trying each production in turn
• Backtracking is not very efficient – Should be avoided
• Backtracking is rarely needed to parse programming language constructs
• Recursive descent with backtracking is not limited to LL(k) grammars,
– Not guaranteed to terminate unless the grammar is LL(k).
– May require exponential time.
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
19
•
cad
Input: w = cad
cad cad cad
cad
Backtracking
scAd A a b Ia
Backtracking
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
20
Recursive Descent parser with backtracking
•
1) 2) 3) 4) 5) 6) 7)
How should we change the following skeleton code? void A() {
Choose an A-production, A X1 X2 . . . Xk ; for ( i = 1 to k ) {
} }
if ( Xi is a nonterminal ) call procedure Xi () ;
else if ( Xi equals the current input symbol a ) advance the input to the next symbol;
else /* an error has occurred * / ;
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
21
Predictive Parsing
• Aim of predictive (recursive descent) parsing
– To construct a top-down parser that never backtracks.
– To do so, we must first transform a grammar
• Eliminate left recursion to avoid infinite loop
• Perform left factoring to eliminate backtracking.
• Perform layering to avoid ambiguity
– Possible only for the class of LL(k) grammars
– Predictive parser runs in linear time
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
22
Implementing Recursive Descent Parser
• Goal: To construct the abstract syntax tree starting at
backtracking.
• Assumptions: Grammar is free of left recursion and is left-factored.
• Input: Stream of tokens from Lexer as and when called.
• Output: An AST parsing of the input program or Error Listing
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
23
Abstract Syntax Tree
• The abstract syntax tree captures/represents the underlying structure
of the program,
– It contains the important bits of the parse tree rather than containing all the steps of the derivation
• In an abstract syntax tree for an expression, each interior node represents an operator; the children of the node represent the operands of the operator.
• More generally, any programming construct can be handled by making up an operator for the construct and treating as operands the semantically meaningful components of that construct.
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
24
Abstract Syntax Tree
E → E+T | E-T | T T→ T*F|T/F|F F→ id|(E)
id→ 1|2|…|9
Input: 5-3-2
E
–
T
F
F
id
– –
53
E E-T
2
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
T F id
id
3 5
2
25
Abstract Syntax Tree
• Our parser will create a syntax tree
• AST is a very common intermediate representation
• Subsequent phases may use AST, add additional information to AST or transform AST
• AST can be used for translating the program in intermediate code
• With our recursive descent parser, the parse tree is actually constructed (in fact, stored) in the procedure calling sequence that the input string forces the parser to take
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
26
AST: Implementation (abstract)
• Node Value: e.g. NUNDEF, NPROG, NGLOB, NILIST, NINIT, NFUNCS, NMAIN, etc.
• Reference to Children (3 offspring is sufficient)
• Reference to Symbol Table record
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
27
Parser Implementation (abstract)
• Need to keep track of (attributes)
– Next T oken Read by Looking Ahead
– Root of the abstract syntax tree
– Scanner object for getting next lexeme
– Output object 1 (for showing errors and listing)
– Output object 2 (for showing AST and later SM code)
• Suggestive Methods
– Token getNextToken(): This method essentially gets the next lexeme using the scanner object.
– TreeNode NT () {} for each non-terminal NT
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
28
Parser Implementation (abstract)
NPROG
• TreeNode program():
– Create a new TreeNode with node value NPROG. This node refers to the non-terminal
– Set its reference to the symbol table record for the lexeme of the current token (which is the name of the program, e.g., Ex1 in CD21 Ex1)
– token = getNextToken() // get the next token
• if token has a value of TCONS or TTYPS or TARRS, then create a TreeNode for non-terminal globals() and make it the left child of the TreeNode for NPROG
• if token has a value of TFUNC, then create a TreeNode for non-terminal funcs() and make it the middle child of the TreeNode for NPROG
• Create a TreeNode for non-terminal mainbody() and make it the right child of the TreeNode for NPROG
– Return TreeNode for NPROG
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
29
Parser Implementation: Left Factoring
OR
private TreeNode initlist() {
TreeNode tr1 = null, tr2 = null;
}
tr1 = init();
return inittail(tr1);
private TreeNode inittail(TreeNode tr1) { TreeNode tr2 = null;
if (next-token is of type TokId.TTYPS or TokId.TARRS or TokId.FUNC or TokId.TMAIN)
// this is the ε path
return tr1;
if (token is TokId.TCOMA) {
token = nextToken();
tr2 = initlist();
return new TreeNode(Node.NILIST, tr1, tr2);
} }
// tr1 is for
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
30
Parser Implementation: Left Factoring
OR
• May combine the above two functions (i.e., reorganize parser rather than reorganizing the grammar)
• In general, anything that can be left factored can be written directly as recursive descent routines.
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
31
private TreeNode initlist() {
TreeNode tr1 = null, tr2 = null;
tr1 = init();
if (next-token is of type TokId.TTYPS or TokId.TARRS or TokId.FUNC or TokId.TMAIN)
// only one
return tr1;
if (token is TokId.TCOMA) {
token = nextToken();
tr2 = initlist();
return new TreeNode(Node.NILIST, tr1, tr2);
} }
// tr1 is for
Parser Implementation: Left Recursion
private TreeNode bool() {
TreeNode tr1 = null;
}
tr1 = rel();
return booltail(tr1);
private TreeNode booltail(TreeNode left) {
TreeNode parent, right;
if (token is an AND, OR, XOR keyword) { // logcal operator, consume token
// Node for non-terminal
if (token is AND) nodeValue = Node.NAND;
if (token is OR) nodeValue = Node.NOR;
if (token is XOR) nodeValue = Node.NXOR;
parent = new TreeNode(nodeValue); parent.setLeft(left); // rel() – LHS of logop // step to next token
token = getNextToken();
right = rel(); // rel() – RHS of logop parent.setRight(right);
return (booltail(parent));
} else return(left);
}
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
32
Parse Tree VS AST
i = 10
NASGN
NSIMV i
NILIT 10
= i
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
33
Structure of AST
• Even though the grammar for the projects needs to be transformed to make it LL(1), the syntax tree must reflect the original grammar, rather than the new LL(1) version.
• You must be very careful when doing the left-recursion removal, that you still build the syntax tree with the correct associativity for the operation being performed
– Be careful, because the natural interpretation of applying the revised grammatical rules alters the associativity
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
34
Structure of AST
• e.g. A – B – C – D can come out wrong if you don’t take care when you use the transformed expression grammar.
—
-DA-
-C B-
AB CD
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
35
Recursive Descent Parsing Limitation
• Any modification to grammar needs a change in the parsing procedures.
• Inefficiency due to calling a non-terminal processing function, while trying to match a single token or to recognize the right-hand side of a production.
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
36
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
37
FIRST and FOLLOW sets
• Construction of top-down (and bottom-up) parsers are aided by functions
– FIRST and FOLLOW
• FIRST and FOLLOW allow us to choose which production to apply based on the next input symbol
• Elements of FOLLOW can be used for error recovery
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
38
FIRST
• FIRST() is the set of terminals that begin all strings derived from a
• FIRST() = {a T | * a } { } if *
• FIRST(a) = {a} if a T FIRST() = {}
FIRST(A) = A FIRST() for A P FIRST(X1X2…Xk) =
17/08/2021
if for all j = 1, …, i-1 : FIRST(Xj) then
add non- in FIRST(Xi) to FIRST(X1X2…Xk)
if for all j = 1, …, k : FIRST(Xj) then add to FIRST(X1X2…Xk)
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
39
FIRST – Example 1
• P i |c |n T S
• Q P |a S| d Sc S T
• R b |
• S e|R n|
• T R S q
• FIRST(P) = {i, c, n}
• FIRST(Q) = {i, c, n, a, d}
• FIRST(R) = {b, }
• FIRST(S) = {e, b, n, }
• FIRST(T) = {b, e, n, q}
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
40
FIRST – Example 2
E T E
E + T E | T F T
T * F T | F ( E ) | id
• FIRST(E) = { }
• FIRST(E) = { }
• FIRST(T) = { }
• FIRST(T) = { }
• FIRST(F) = { }
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
41
FIRST – Example 2
E T E
E + T E | T F T
T * F T | F ( E ) | id
• FIRST(E) = {(, id}
• FIRST(E) = {+, }
• FIRST(T) = {(, id}
• FIRST(T) = {*, }
• FIRST(F) = {(, id}
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
42
FOLLOW
• FOLLOW(A) is the set of terminals (including end of file – $) that may follow non-terminal A in some sentential form.
• FOLLOW(A) = {c T | S + …Ac…} {$} if S + …A
• NOTE: is never in FOLLOW sets
• FOLLOW(A) =
for all (B A ) P do
17/08/2021
add FIRST()\{} to FOLLOW(A) for all (B A) P do
add FOLLOW(B) to FOLLOW(A)
for all (B A ) P and FIRST() do add FOLLOW(B) to FOLLOW(A)
if A is the start symbol S then
add $ to FOLLOW(A)
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
43
FOLLOW – Example 1
• S A B C |AD
• A|aA
• B b |c |
• CDdC
• D e b|f c
• FOLLOW (S) = {$}
• FOLLOW (A) = {b, c, e, f}
• FOLLOW (B) = {e, f}
• FOLLOW (C) = {$}
• FOLLOW (D) = {$}
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
FIRST(S) ={a, b, c, e, f} FIRST(A) ={a, } FIRST(B) ={b, c, } FIRST(C) ={e, f} FIRST(D) ={e, f}
44
FOLLOW – Example 2
E T E
E + T E | T F T
T * F T | F ( E ) | id
• FOLLOW{E} = { }
• FOLLOW{E} = { }
• FOLLOW{T} = { }
• FOLLOW{T} = { }
• FOLLOW{F} = { }
FIRST(E) = {(, id} FIRST(E) = {+, } FIRST(T) = {(, id} FIRST(T) = {*, } FIRST(F) = {(, id}
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
45
FOLLOW – Example 2
E T E
E + T E | T F T
T * F T | F ( E ) | id
• FOLLOW{E} = {$, )}
• FOLLOW{E} = {$, )}
• FOLLOW{T} = {+, $, )}
• FOLLOW{T} = {+, $, )}
• FOLLOW{F} = {*, +, $, )}
FIRST(E) = {(, id} FIRST(E) = {+, } FIRST(T) = {(, id} FIRST(T) = {*, } FIRST(F) = {(, id}
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
46
LL(1) Grammar
• A grammar G is LL(1) if for each collections of productions A 1 | 2 | … | n
for nonterminal A the following holds:
17/08/2021
1.
2. 3.
FIRST(i) FIRST(j) = for all i j
More than one rule can not derive strings starting with the
same terminal
ifi * then j * forallij
At most one i can derive empty string
if i * then FIRST(j) FOLLOW(A) = for all i j
if i * then j (i j) does not derive a string beginning with a terminal in Follow (A)
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
47
Non-LL(1) Examples
Grammar
Not LL(1) because
SSa|a
Left recursive
SaS|a
FIRST(a S) FIRST(a)
SaR| RS|
ForR:S* and*
SaRa RS|
For R:
FIRST(S) FOLLOW(R)
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
48
Using FIRST and FOLLOW to Write a Recursive Descent Parser
expr term rest rest + term rest
| – term rest
procedure rest(); begin
if lookahead in FIRST(+ term rest) then match(‘+’); term(); rest()
else if lookahead in FIRST(- term rest) then match(‘-’); term(); rest()
else if lookahead in FOLLOW(rest) then return
else error() end;
FIRST(+ term rest) = { + } FIRST(- term rest) = { – } FOLLOW(rest) = { $ }
| term id
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
49
Non-Recursive Predictive Parsing
• Given an LL(1) grammar G=(N,T,P,S) construct a table M[A,a] for A N, a T and use a driver program with a stack
a
+
b
$
input stack
output
X
Y
Z
$
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
Predictive parsing program (driver)
Parsing table
M
50
Predictive Parsing Program (Driver)
push($) push(S)
a := lookahead repeat
X := pop()
if X is a terminal or X = $ then
match(X) // move to next token, a := lookahead
else if M[X,a] = X Y1Y2…Yk then
push(Yk, Yk-1, …, Y2, Y1) // such that Y1 is on top produce output and/or invoke actions
else error()
endif until X = $
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
51
Example Table-Driven Parsing
Operation of Predictive Parser
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
Parsing Table
52
Constructing a Predictive Parsing Table for each production A do
for each a FIRST() do add A to M[A,a]
enddo
if FIRST() then
for each b FOLLOW(A) do add A to M[A,b]
enddo endif
enddo
Mark each undefined entry in M error COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
17/08/2021
53
Example Table
E T E E+TE | T F T
T * F T | F ( E ) | id
A
FIRST()
FOLLOW(A)
E T E
( id
$)
E + T E
+
$)
E
TFT
( id
+$ )
T*FT
*
+$ )
T
F(E)
(
* +$ )
F id
id
E
E
T
T 17/08/2021
id
E T E
TFT
F F id
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
+
E + T E
T
*
T*FT
(
E T E
TFT
F(E)
)
E
T
$
E
T
54
LL(1) Grammars are Unambiguous
Ambiguous grammar S i E t S SR | a
SR eS|
Eb
Error: duplicate table entry
A
FIRST()
FOLLOW(A)
S i E t S SR
i
e$
Sa
a
SR e S
e
e$
SR
Eb
b
t
S
SR
17/08/2021
a
Sa
E Eb
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
b
e
SR SR e S
i
S i E t S SR
t
$
SR
55
References
• Compilers: Principles, Techniques, and Tools (2nd Edition)
• By A.V. Aho, Lam, R. Sethi, . Ullman
• Chapter 4
17/08/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
56