程序代写代做代考 AI compiler algorithm Top-Down Parsing

Top-Down Parsing

CS106 — Compiler Principles and Construction
Fall 2011

MUST FIT
Zhiyao Liang

Top-Down Parsing Dr. Zhiyao Liang
1

Two approaches of parsing
Top-Down
Logically, the parse tree is constructed from top to bottom.
The structure of the algorithm is similar to a preorder traversal of the parse tree.
The larger tree is described first in the algorithm.
Easier to understand and program manually.
Bottom-Up
Uses an explicit stack to perform a parse.
More powerful and used by most parser generators.
Top-Down Parsing Dr. Zhiyao Liang
2

Two Top-down Parsing Methods
Backtracking:
Explore all possible ways.
Very slow, not commonly used in practice.
Predictive: Look ahead some tokens, then decide which grammar rule can apply.
Recursive-descendent.
LL(K). It means:
Reading the input from Left to right.
Tracing out a leftmost derivation of the input string.
Read K symbols ahead (lookahead), then decide which parsing rule can apply.
We focus on LL(1).
We will study formal notions: First and Follow ( mentioned in 4.1.3).

Top-Down Parsing Dr. Zhiyao Liang
3

Recursive Descent Parsing
For each rule in the grammar of the form
X  product
Write a function X()
The body of this function is described by product.
The behavior of X():
Reads a string of tokens from the input
This string can be derived from X.
Construct a syntax tree use these tokens according to product.
If there is a variable Y in product, then the body of X() calls Y().
Y() builds a subtree for the tree of X().

Top-Down Parsing Dr. Zhiyao Liang
4

Avoid non-termination in RD-parsing
Consider the rule
exp  exp addop term | term
addop  + | –
If we translate this rule directly into code:
TreeNode * exp(){
TreeNode* t = newAddopNode();
t->leftChild = exp();
t->rightChild = term();
return t;
}
This code could be non-terminating.
Top-Down Parsing Dr. Zhiyao Liang
5

Avoid non-termination in RD-parsing
In the code, the non-recursive case should appear before the recursive.
The rule exp  exp addop term | term
generates a string of the form:
term addop term addop term …. term
A term followed by 0 or more copies of
addop term
It is clearer for coding to describe the above rule as:
exp  term {addop term}
This form of grammar is called EBNF, E for extended.
It is nice to translate BNF grammar to a form that clearly shows the implied computation.
Chapter 3.5 shows two forms of specification for this purpose
EBNF
Syntax Diagrams

Top-Down Parsing Dr. Zhiyao Liang
6

exp  term {addop term}
Implementation
TreeNode * exp(){
TreeNode * t = term();
while(token == PLUS) || (token == MINUS){
TreeNode* q = newAddopNode();
q->leftChild = t;
q->rightChild = term();
t = q;
}
return t;
}
token is a global variable.
token records the next unprocessed token during parsing.
Top-Down Parsing Dr. Zhiyao Liang
7

Overview of LL(1) Parsing
A top down parsing. Simulates a left-most derivation.
Uses a stack in the computation.
A table, called LL(1) parsing table is built to guide the computation (explained later).
Usually only works for a LL(1) grammar ( each entry of its LL(1) parsing table has at most one element).
Also works for a non-LL(1) grammar if some disambiguating rule is used showing difference of priority of choices in a table entry.
If a grammar is not a LL(1) grammar, we may use techniques ( like Left factoring and Left recursion removing ) to adjust the grammar.
But these techniques does not guarantee making a LL(1) grammar.
So LL(1) is not practically often used.
Why do we study LL(1) ?
To know some stack based parsing techniques.
To learn some notions, such as the First set and Flow set.
Top-Down Parsing Dr. Zhiyao Liang
8

LL(1) Parsing Example
Grammar: S  ( S ) S | ϵ
The string: ( )
How to parse?

Parsing stack Input Action
1 $S ( ) $ S  ( S ) S
2 $ S ) S ( ( ) $ match
3 $ S ) S ) $ S  ϵ
4 $ S ) ) $ match
5 $ S $ S  ϵ
6 $ $ accept

Two kinds of actions:
Generate (1, 3, 5)
Match (2, 4, 6)
Top-Down Parsing Dr. Zhiyao Liang
9

Definition of a LL(1) Parse Table
Given a Grammar
N : the set of non-terminals
T : the set of terminals, including $.
A parse Table M(N, T) is a N rows and T columns
A rule A  α in the grammar is in the entry M[A, t] iff
α ==>* t β // t is in First(α)
or
α ==>* ϵ, and S$ ==>* β A t γ //t is in Follow(α)

Top-Down Parsing Dr. Zhiyao Liang
10

Example LL(1) Parse table
Grammar: S  ( S ) S | ϵ

M(N, T) ( ) $
S S  ( S ) S S  ϵ S  ϵ

Top-Down Parsing Dr. Zhiyao Liang
11

LL(1) Grammar and its parsing algorithm
If each entry of the LL(1) parsing table has no more than one production, then the grammar is called a LL(1) Grammar .
It is nice to have a LL(1) grammar, since it has a simple parsing algorithm:
Repeat until both stack top α and the input token β are $
if (α == β)
consume α and β
else if ((α is a terminal) && (α != β) )
Error
else { /* α is a non-terminal */
pop the stack
push the production of the entry M[α, β]
}
Top-Down Parsing Dr. Zhiyao Liang
12

Questions on LL(1) parsing
How to deal with a grammar that is not LL(1)?
When the grammar is ambiguous, it is usually not a LL(1) grammar.
We can add disambiguate rule to make it in fact LL(1), i.e., only one rule can apply at a time.
For example
elsepart  else statement | ϵ
makes the dangling else problem, which can be solved by add higher priority to
elsepart  else statement
We can change the grammar to make it LL(1).
use techniques like left-recursion removal, left factoring.
But there is no guarantee that these technique can produce a LL(1) grammar.
How to construct the parse table?
The table can be computed using First and Follow sets.

Top-Down Parsing Dr. Zhiyao Liang
13

Left Recursion Removal
A grammar with left recursion incurs trouble (non-termination) for LL(1) parsing.
Algorithm could be non-terminating
Eg. exp  exp addop term | term
Given a grammar G with left recursion, we can change it to G’, so that
G’ and G generates the same language
G’ has no left recursion.
Top-Down Parsing Dr. Zhiyao Liang
14

A method of left-recursion removal
Case 1: Simple immediate left recursion
For a rule with the form: A  A α | β
Change it to: A  β A’
A’  α A’ | ϵ
Example:

exp  exp addop term | term
exp  term exp’
exp’  addop term exp’ | ϵ

Top-Down Parsing Dr. Zhiyao Liang
15

Case 2: General immediate left recursion removal
A  A α1 | A α2 | … A αn | … β 1 | β 2 | … β m

A  β1 A’ | β2 A’ | … βm A’
A’  α1 A | α2 A | … αn A | ϵ
Example :

exp  exp + term | exp – term | term
exp  term exp’
exp’  + term exp’ | – term exp’ | ϵ

Top-Down Parsing Dr. Zhiyao Liang
16

Case 3: General Left-recursion removal
We have to consider indirect recursion, like
A  B b …
B  A a …
An algorithm: Order the variables from 1 to m.

for i:= 1 to m do
for j:= 1 to i -1 do
replace each grammar rule of the form
Ai  Aj β
by
Aj  α1 β | α2 β | … αk β ,
where Aj  α1| α2| … αk
is the current rule for Aj
Remove, if necessary, immediate left recursion of Ai.
This algorithm only works for a grammar where here is no rule like A  ϵ, and there is no cycle in derivation, like A =>* A.
Usually works for programming languages.

Top-Down Parsing Dr. Zhiyao Liang
17

Don’t need to worry about the new variables, since they have no left recursion.
Top-Down Parsing Dr. Zhiyao Liang
18

Top-Down Parsing Dr. Zhiyao Liang
19

The parse tree of 3 – 4 – 5
of the grammar with left recursion removed
But, we want to have to a syntax tree like :

–
/ \
– 5
/ \
3 4
How to do it?
Idea:
If we know how to compute the value of the expression based on the parse tree, then we know how to construct the wanted parse tree.
The computation order implies the order of tree construction
Top-Down Parsing Dr. Zhiyao Liang
20

Construct the syntax tree with LL(1) parsing
The syntax tree can be constructed on the fly during LL(1) parsing.
Just like how an expression will be evaluated.

E  E + n | n

is transformed into:

E  n E’
E’  + n # E’ | ϵ

Top-Down Parsing Dr. Zhiyao Liang
21

Left Factoring
It is applied when several choices share the same left prefix.
A  α β1 | α β2 | α β3 | …α βn

A  α A’
A’  β1 | β2 | β3 |… βn

For example:
stmt-sequence  stmt; stmt-sequence | stmt

stmt-sequence  stmt stmt-seq’
stmt-seq’  ; stmt-seq’ | ϵ

There is a simple algorithm that repeats removing shared prefixes until no two choices share a left-prefix in the gramma.
Top-Down Parsing Dr. Zhiyao Liang
22

First set and Follow set
The First set of a string X is the set of symbols that can appear as the first character in a sentential form (string of symbols, terminals and non-terminals) that can be derived from X.
The Follow set of a variable Y is the set of symbols that can appear after Y in a sentential form derived from S, and Y can be reduced to empty.

Top-Down Parsing Dr. Zhiyao Liang
23

First Set — definition
Let X be a grammar symbol (terminal or nonterminal) or ϵ. Then the set First(X) is defined as follows
If X is a terminal or ϵ, then First(X) = {X}
If X is a nonterminal, then for each production choice X  X1X2…Xn,
First(X) contains First(X1) – {ϵ}.
If also for some i < n, all the sets First(X1), … First(Xi) contains ϵ, then First(X) contains First(Xi+1) – {ϵ}. If all the sets First(X1), … First(Xn) contain ϵ, First(X) also contain ϵ. For a string α = X1X2…Xn, First(α) is defined similarly. Top-Down Parsing Dr. Zhiyao Liang 24 Algorithm to find First set for all nonterminals A do First(A) := {} while there are changes to any First(A) do for each production choice A → X1X2…Xn do k:=1 ; Continue := true; while Continue == true and k <= n do add First(Xk) – {ϵ} to First(A); if ϵ is not in First(Xk) then Continue := false; k := k + 1; if Continue == true then add ϵ to First(A) Top-Down Parsing Dr. Zhiyao Liang 25 About ϵ A variable X is nullable if X ==>* ϵ
Theorem:
X is nullable iff First(X) includes ϵ

γ
Top-Down Parsing Dr. Zhiyao Liang
26

Follow Set — definition
Given a nonterminal A, the set Follow(A), consisting of terminals, and possibly $, is defined as follows:
If A is the start symbol, then $ is in Follow(A).
If there is a production B  α A γ, then First(γ) – {ϵ} is in Follow(A).
If there is a production B  α A γ such that ϵ is in First(γ), then Follow(A) contains Follow(B).

Top-Down Parsing Dr. Zhiyao Liang
27

Algorithm to find Follow
Follow(start-symbol) := {$};
for all nonterminals A ≠ start-symbol do
Follow(A) := {};
while there are changes to any Follow sets do
for each production A  X1X2…Xn do
for each Xi that is a nonterminal do
add First(Xi+1Xi+2…Xn)–{ϵ} to
Follow(Xi)
/* Note: if i==n, then Xi+1Xi+2…Xn = ϵ */
if ϵ is in First(Xi+1Xi+2…Xn) then
add Follow(A) to Follow(Xi)

Top-Down Parsing Dr. Zhiyao Liang
28

Example

Top-Down Parsing Dr. Zhiyao Liang
29

Construction of the LL(1) Parsing table
Repeat the following two steps for each nonterminal A and production choice A  α
For each token α in First(α), add A  α to the entry M[A, α].
If ϵ is in First(α), for each element α of Follow(A) (a token or $), add A  α to M[A, α].

Top-Down Parsing Dr. Zhiyao Liang
30

Error Recovery in a Parser
Good design of error recovery in a parser:
Detect an error as soon as possible
After error, pick a place in the token stream to resume the parse.
For example, a form of error recovery is called panic mode. If an error occurs, scan ahead tokens until one of a set of synchronization tokens is reached, then resume parsing.
Parse as much of the code as possible.
Avoid error cascade problem.
Avoid endless error reports.
Top-Down Parsing Dr. Zhiyao Liang
31

Some error recovery code in the TINY parser
TreeNode * stmt_sequence(void)
{ TreeNode * t = statement(); // Error if t == NULL
TreeNode * p = t;
while ((token!=ENDFILE) && (token!=END)
&& (token!=ELSE) && (token!=UNTIL))
{ TreeNode * q;
match(SEMI);
q = statement();
if (q!=NULL) {
// Error recovery when t = NULL
if (t==NULL) t = p = q;
else /* now p cannot be NULL either */
{ p->sibling = q;
p = q;
}
}
}
return t;
}
Top-Down Parsing Dr. Zhiyao Liang
32

/docProps/thumbnail.jpeg

Related Posts