Microsoft PowerPoint – lec11.pptx
Building a Predictive Parser
Copyright By PowCoder代写 加微信 powcoder
I.e., How to build the parse table for a
recursive-descent parser
Last Time: Intro LL(1) Predictive Parser
Predict the parse tree
Parser structure
– 1 token of lookahead
– A stack tracking the
current parse tree’s
– Selector/parse table
Necessary conditions
– Left-factored
– Free of left-recursion
Today: Building the Parse Table
Review grammar transformations
– Why they are necessary
– How they work
Build the parse table
– FIRST(X): Set of terminals that can begin at a subtree
rooted at X
– FOLLOW(X): Set of terminals that can appear after X
Review of LL(1) Grammar Transformations
Necessary (but not sufficient conditions) for LL(1)
– Free of left recursion
• “No left-recursive rules”
• Why? Need to look past the list to know when to cap it
– Left-factored
• “No rules with a common prefix, for any nonterminal”
• Why? We would need to look past the prefix to pick the
production
Why Left Recursion is a Problem
(Blackbox View)
XList XList x | x
How should we grow the tree top-down?
XListCurrent parse tree: Current token:
CFG snippet:
Correct if there are no more xs Correct if there are more xs
We don’t know which to choose without more lookahead
Why Left Recursion is a Problem
(Whitebox View)
XList XList x | x
xXListCurrent parse tree: Current token:
CFG snippet:
Parse table: XList XList x
(Stack overflow)
Left-Recursion Elimination: Review
Where β does not start with A, or may not be present
Preserves the language (a list of αs, starting with a β),
but uses right recursion
A’ α A’ | ε
Head of the list
Left-Recursion Elimination: Ex1
A’ α A’ | ε
E E cross id | id
E’ cross id E’ | ε
Left-Recursion Elimination: Ex2
A’ α A’ | ε
E E + T | T
T T * F | F
F ( E ) | id
E’ + T E’ | ε
T’ * F T’ | ε
F ( E ) | id
Left-Recursion Elimination: Ex3
A’ α A’ | ε
DList DList D | ε
D Type id semi
Type bool | int
DList ε DList’
DList’ D DList’ | ε
D Type id semi
Type bool | int
DList D DList | ε
D Type id semi
Type bool | int
Left Factoring: Review
Removing a common prefix from a grammar
Replace A α β1 | … | α βm | y1 | … | yn
With A α A’ | y1 | … | yn
A’ β1 | … | βm
Where βi and yi are sequence of symbols with no common prefix
Note: yi may not be present, and one of the β may be ε
Combine all “problematic” rules that start with α into one rule α A’
Now A’ represents the suffix of the “problematic” rules
Left Factoring: Example 1
A α β1 | … | α βm | y1 | … | yn
A α A’ | y1 | … | yn
A’ β1 | … | βm
X < a > | < b > | < c > | d
X < X’ | d X’ a > | b > | c >
α α αβ1 β2 β3 γ1
β 1 β 2 β 3
Left Factoring: Example 2
A α β1 | … | α βm | y1 | … | yn
A α A’ | y1 | … | yn
A’ β1 | … | βm
Stmt id assign E | id ( EList ) | return
E intlit | id
Elist E | E comma EList
Stmt id Stmt’ | return
Stmt’ assign E | ( EList )
E intlit | id
Elist E | E comma EList
Left Factoring: Example 3
A α β1 | … | α βm | y1 | … | yn
A α A’ | y1 | … | yn
A’ β1 | … | βm
S if E then S | if E then S else S | semi
S if E then S S’ | semi
S’ else S | ε
α αβ1 = ε β2
Left Factoring: Not Always Immediate
A α β1 | … | α βm | y1 | … | yn
A α A’ | y1 | … | yn
A’ β1 | … | βm
S A | C | return
A id assign E
C id ( EList )
This snippet yearns for left factoring
but we cannot! At least without inlining
S id assign E | id ( Elist ) | return
Let’s be more constructive
So far, we have only talked about what precludes
us from building a predictive parser
It is time to actually build the parse table
Building the Parse Table
What do we actually need to ensure that
production A α is the correct one to apply?
Assume α is an arbitrary sequence of symbols
1. What terminals could α possibly start with
we call this the FIRST set
2. What terminal could possibly come after A
we call this the FOLLOW set
Why is FIRST Important?
Assume the top-of-stack symbol is A and current
token is a
– Production 1: A α
– Production 2: A β
FIRST lets us disambiguate:
– If a is in FIRST(α), we know Production 1 is a viable
– If a is in FIRST(β), we know Production 2 is a viable choice
– If a is only in one of FIRST(α) and FIRST(β), we can
predict the production we need
FIRST Sets
FIRST(α) is the set of terminals that begin the strings
derivable from α, and also, if α can derive ε, then ε is
in FIRST(α).
Formally, let’s write it together
FIRST(α) =
FIRST Sets
FIRST(α) is the set of terminals that begin the strings
derivable from α, and also, if α can derive ε, then ε is
in FIRST(α).
Formally, let’s write it together
FIRST(α) =
FIRST Construction: Single Symbol
We begin by doing FIRST sets for a single,
arbitrary symbol X
– If X is a terminal: FIRST(X) = { X }
– If X is ε: FIRST(ε) = { ε }
– If X is a nonterminal, for each X Y1 Y2 … Yk
• Put FIRST(Y1) – {ε} into FIRST(X)
• If ε is in FIRST(Y1) , put FIRST(Y2) – {ε} into FIRST(X)
• If ε is also in FIRST(Y2), put FIRST(Y3) – {ε} into FIRST(X)
• If ε is in FIRST of all Yi symbols, put ε into FIRST(X)
Repeat this step until there are no changes to any nonterminal’s FIRST set
FIRST(X) Example
Exp Term Exp’
Exp’ minus Term Exp’ | ε
Term Factor Term’
Term’ divide Factor Term’ | ε
Factor intlit | lparen Exp rparen
FIRST(Factor) = { intlit, lparen }
FIRST(Term’) = { divide, ε }
FIRST(Term) = { intlit, lparen }
FIRST(Exp’) = { minus, ε}
FIRST(Exp) = { intlit, lparen}
Building FIRST(X) for nonterm X
for each X Y1 Y2 … Yk
• Add FIRST(Y1) – {ε}
• If ε is in FIRST(Y1 to i-1): add FIRST(Yi) – {ε}
• If ε is in all RHS symbols, add ε
We now extend FIRST to strings of symbols α
– We want to define FIRST for all RHS
Looks very similar to the procedure for single
Let α =Y1 Y2 … Yk
– Put FIRST(Y1) – {ε} in FIRST(α)
– If ε is in FIRST(Y1): add FIRST(Y2) – {ε} to FIRST(α)
– If ε is in FIRST(Y2): add FIRST(Y3) – {ε} to FIRST(α)
– If ε is in FIRST of all Yi symbols, put ε into FIRST(α)
Building FIRST(α) from FIRST(X)
Building FIRST(X) for nonterm X
for each X Y1 Y2 … Yk
• Add FIRST(Y1) – {ε}
• If ε is in FIRST(Y1 to i-1): add FIRST(Yi) – {ε}
• If ε is in all RHS symbols, add ε
Building FIRST(α)
Let α = Y1 Y2 … Yk
• Add FIRST(Y1) – {ε}
• If ε is in FIRST(Y1 to i-1): add FIRST(Yi) – {ε}
• If ε is in all RHS symbols, add ε
FIRST(α) Example
Building FIRST(α)
Let α = Y1 Y2 … Yk
• Add FIRST(Y1) – {ε}
• If ε is in FIRST(Y1 to i-1): add FIRST(Yi) – {ε}
• If, for all RHS symbols Yj, ε is in FIRST(Yj), add ε
X → + T X | ε
Y → * F Y | ε
F → ( E ) | id
FIRST(E) = {(, id}
FIRST(T) = {(, id}
FIRST(F) = {(, id}
FIRST(X) = {+, ε}
FIRST(Y) = {*, ε} FIRST( id ) = { id }
FIRST(T X) = {(, id}
FIRST(+ T X) = { + }
FIRST(F Y) = { (, id }
FIRST (* F Y) = { * }
FIRST( ( E ) ) = { ( }
FIRST sets alone do not provide enough
information to construct a parse table
If a rule R can derive ε, we need to know what
terminals can come just after R
FOLLOW Sets: Pictorially
For nonterminal A, FOLLOW(A) is the set of
terminals that can appear immediately to the
right of A
FOLLOW Sets: Pictorially
For nonterminal A, FOLLOW(A) is the set of
terminals that can appear immediately to the
right of A
table[A,+] = R ε ε
table[A,-] = R
FOLLOW Sets
For nonterminal A, FOLLOW(A) is the set of terminals
that can appear immediately to the right of A
Let’s write it together,
FOLLOW(A) =
FOLLOW Sets
For nonterminal A, FOLLOW(A) is the set of terminals
that can appear immediately to the right of A
Let’s write it together,
FOLLOW(A) =
FOLLOW Sets: Construction
To build FOLLOW(A)
– If A is the start nonterminal, add
– For rules X α A β
• Add FIRST(β) – {ε}
• If ε is in FIRST(β) or β is empty, add
Continue building FOLLOW sets
until reach a fixed point (i.e., no
more symbols can be added)
Where α, β may be empty
FOLLOW Sets Example
FOLLOW(A) for X α A β
If A is the start, add eof
Add FIRST(β) – {ε}
Add FOLLOW(X) if ε in FIRST(β) or β is empty
S B c | D B
B a b | c S
FIRST (D) { d, ε }
{ a, c, d }
FIRST (D B) { d, a, c }
FIRST (B c) { a, c }
FIRST (a b) { a }
FIRST (c S) { c }
FOLLOW (S) { eof } =
FOLLOW (B) { c, eof }=
FOLLOW (D) { a, c } =
FOLLOW (S) { eof, c } =
FOLLOW (B) { c, eof }=
FOLLOW (D) { a, c } =
FOLLOW (S) { eof, c } =
FOLLOW (B) { c, eof }=
FOLLOW (D) { a, c } =
Building the Parse Table
for each production X α {
for each terminal t in FIRST(α){
put α in Table[X][t]
if ε is in FIRST(α){
for each terminal t in FOLLOW(X){
put α in Table[X][t]
Table collision Grammar is not in LL(1)
Putting it all together
Build FIRST sets for each nonterminal
Build FIRST sets for each production’s RHS
Build FOLLOW sets for each nonterminal
Use FIRST and FOLLOW to fill parse table for each
production
Tips n’ Tricks
FIRST sets
– Only contain alphabet terminals and ε
– Defined for arbitrary RHS and nonterminals
– Constructed by starting at the beginning of a
production
FOLLOW sets
– Only contain alphabet terminals and eof
– Defined for nonterminals only
– Constructed by jumping into production
FOLLOW(A) for X α A β
If A is the start, add eof
Add FIRST(β) – {ε}
Add FOLLOW(X) if ε in FIRST(β) or β empty
S B c | D B
B a b | c S
FIRST (D) { d, ε }
{ a, c, d }
FIRST (D B) { d, a, c }
FIRST (B c) { a, c }
FIRST (a b) { a }
FIRST (c S) { c }
FIRST(α) for α = Y1 Y2 … Yk
Add FIRST(Y1) – {ε}
If ε is in FIRST(Y1 to i-1): add FIRST(Yi) – {ε}
If ε is in all RHS symbols, add ε
for each production X α
for each terminal t in FIRST(α)
put α in Table[X][t]
if ε is in FIRST(α){
for each terminal t in FOLLOW(X){
put α in Table[X][t]
Table[X][t]
a b c d eof
B c B c D B
FOLLOW (S) { eof, c } =
FOLLOW (B) { c, eof }=
FOLLOW (D) { a, c } =
FIRST (d) { d }
FIRST (ε) { ε }
S B c | D B
B a b | c S
a b c d eof
B c B c D B
Why is a Table Collision a Problem?
Off Limits!
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com