留学生作业代写 Microsoft PowerPoint – lec11.pptx

Microsoft PowerPoint – lec11.pptx

Building a Predictive Parser

Copyright By PowCoder代写 加微信 powcoder

I.e., How to build the parse table for a
recursive-descent parser

Last Time: Intro LL(1) Predictive Parser

Predict the parse tree
Parser structure
– 1 token of lookahead
– A stack tracking the

current parse tree’s

– Selector/parse table
Necessary conditions
– Left-factored
– Free of left-recursion

Today: Building the Parse Table

Review grammar transformations
– Why they are necessary
– How they work

Build the parse table
– FIRST(X): Set of terminals that can begin at a subtree

rooted at X
– FOLLOW(X): Set of terminals that can appear after X

Review of LL(1) Grammar Transformations

Necessary (but not sufficient conditions) for LL(1)
– Free of left recursion
• “No left-recursive rules”
• Why? Need to look past the list to know when to cap it

– Left-factored
• “No rules with a common prefix, for any nonterminal”
• Why? We would need to look past the prefix to pick the

production

Why Left Recursion is a Problem
(Blackbox View)

XList XList x | x

How should we grow the tree top-down?

XListCurrent parse tree: Current token:

CFG snippet:

Correct if there are no more xs Correct if there are more xs

We don’t know which to choose without more lookahead

Why Left Recursion is a Problem
(Whitebox View)

XList XList x | x

xXListCurrent parse tree: Current token:

CFG snippet:

Parse table: XList XList x

(Stack overflow)

Left-Recursion Elimination: Review

Where β does not start with A, or may not be present

Preserves the language (a list of αs, starting with a β),
but uses right recursion

A’ α A’ | ε

Head of the list

Left-Recursion Elimination: Ex1

A’ α A’ | ε

E E cross id | id
E’ cross id E’ | ε

Left-Recursion Elimination: Ex2

A’ α A’ | ε

E E + T | T
T T * F | F
F ( E ) | id

E’ + T E’ | ε
T’ * F T’ | ε
F ( E ) | id

Left-Recursion Elimination: Ex3

A’ α A’ | ε

DList DList D | ε

D Type id semi

Type bool | int

DList ε DList’

DList’ D DList’ | ε

D Type id semi

Type bool | int

DList D DList | ε

D Type id semi

Type bool | int

Left Factoring: Review
Removing a common prefix from a grammar

Replace A α β1 | … | α βm | y1 | … | yn

With A α A’ | y1 | … | yn
A’ β1 | … | βm

Where βi and yi are sequence of symbols with no common prefix
Note: yi may not be present, and one of the β may be ε

Combine all “problematic” rules that start with α into one rule α A’
Now A’ represents the suffix of the “problematic” rules

Left Factoring: Example 1

A α β1 | … | α βm | y1 | … | yn
A α A’ | y1 | … | yn
A’ β1 | … | βm

X < a > | < b > | < c > | d

X < X’ | d X’ a > | b > | c >

α α αβ1 β2 β3 γ1

β 1 β 2 β 3

Left Factoring: Example 2

A α β1 | … | α βm | y1 | … | yn
A α A’ | y1 | … | yn
A’ β1 | … | βm

Stmt id assign E | id ( EList ) | return

E intlit | id

Elist E | E comma EList

Stmt id Stmt’ | return

Stmt’ assign E | ( EList )

E intlit | id

Elist E | E comma EList

Left Factoring: Example 3

A α β1 | … | α βm | y1 | … | yn
A α A’ | y1 | … | yn
A’ β1 | … | βm

S if E then S | if E then S else S | semi

S if E then S S’ | semi

S’ else S | ε

α αβ1 = ε β2

Left Factoring: Not Always Immediate

A α β1 | … | α βm | y1 | … | yn
A α A’ | y1 | … | yn
A’ β1 | … | βm

S A | C | return
A id assign E
C id ( EList )

This snippet yearns for left factoring

but we cannot! At least without inlining

S id assign E | id ( Elist ) | return

Let’s be more constructive

So far, we have only talked about what precludes
us from building a predictive parser
It is time to actually build the parse table

Building the Parse Table

What do we actually need to ensure that
production A α is the correct one to apply?
Assume α is an arbitrary sequence of symbols

1. What terminals could α possibly start with
 we call this the FIRST set

2. What terminal could possibly come after A
 we call this the FOLLOW set

Why is FIRST Important?

Assume the top-of-stack symbol is A and current
token is a
– Production 1: A α
– Production 2: A β

FIRST lets us disambiguate:
– If a is in FIRST(α), we know Production 1 is a viable

– If a is in FIRST(β), we know Production 2 is a viable choice
– If a is only in one of FIRST(α) and FIRST(β), we can

predict the production we need

FIRST Sets

FIRST(α) is the set of terminals that begin the strings
derivable from α, and also, if α can derive ε, then ε is
in FIRST(α).

Formally, let’s write it together
FIRST(α) =

FIRST Sets

FIRST(α) is the set of terminals that begin the strings
derivable from α, and also, if α can derive ε, then ε is
in FIRST(α).

Formally, let’s write it together
FIRST(α) =

FIRST Construction: Single Symbol

We begin by doing FIRST sets for a single,
arbitrary symbol X
– If X is a terminal: FIRST(X) = { X }
– If X is ε: FIRST(ε) = { ε }
– If X is a nonterminal, for each X Y1 Y2 … Yk
• Put FIRST(Y1) – {ε} into FIRST(X)
• If ε is in FIRST(Y1) , put FIRST(Y2) – {ε} into FIRST(X)
• If ε is also in FIRST(Y2), put FIRST(Y3) – {ε} into FIRST(X)
• If ε is in FIRST of all Yi symbols, put ε into FIRST(X)

Repeat this step until there are no changes to any nonterminal’s FIRST set

FIRST(X) Example

Exp Term Exp’

Exp’ minus Term Exp’ | ε

Term Factor Term’

Term’ divide Factor Term’ | ε

Factor intlit | lparen Exp rparen

FIRST(Factor) = { intlit, lparen }

FIRST(Term’) = { divide, ε }

FIRST(Term) = { intlit, lparen }

FIRST(Exp’) = { minus, ε}

FIRST(Exp) = { intlit, lparen}

Building FIRST(X) for nonterm X
for each X Y1 Y2 … Yk

• Add FIRST(Y1) – {ε}
• If ε is in FIRST(Y1 to i-1): add FIRST(Yi) – {ε}
• If ε is in all RHS symbols, add ε

We now extend FIRST to strings of symbols α
– We want to define FIRST for all RHS

Looks very similar to the procedure for single
Let α =Y1 Y2 … Yk
– Put FIRST(Y1) – {ε} in FIRST(α)
– If ε is in FIRST(Y1): add FIRST(Y2) – {ε} to FIRST(α)
– If ε is in FIRST(Y2): add FIRST(Y3) – {ε} to FIRST(α)
– If ε is in FIRST of all Yi symbols, put ε into FIRST(α)

Building FIRST(α) from FIRST(X)

Building FIRST(X) for nonterm X
for each X Y1 Y2 … Yk

• Add FIRST(Y1) – {ε}
• If ε is in FIRST(Y1 to i-1): add FIRST(Yi) – {ε}
• If ε is in all RHS symbols, add ε

Building FIRST(α)
Let α = Y1 Y2 … Yk

• Add FIRST(Y1) – {ε}
• If ε is in FIRST(Y1 to i-1): add FIRST(Yi) – {ε}
• If ε is in all RHS symbols, add ε

FIRST(α) Example

Building FIRST(α)
Let α = Y1 Y2 … Yk

• Add FIRST(Y1) – {ε}
• If ε is in FIRST(Y1 to i-1): add FIRST(Yi) – {ε}
• If, for all RHS symbols Yj, ε is in FIRST(Yj), add ε

X → + T X | ε
Y → * F Y | ε
F → ( E ) | id

FIRST(E) = {(, id}
FIRST(T) = {(, id}
FIRST(F) = {(, id}
FIRST(X) = {+, ε}
FIRST(Y) = {*, ε} FIRST( id ) = { id }

FIRST(T X) = {(, id}
FIRST(+ T X) = { + }
FIRST(F Y) = { (, id }
FIRST (* F Y) = { * }
FIRST( ( E ) ) = { ( }

FIRST sets alone do not provide enough
information to construct a parse table

If a rule R can derive ε, we need to know what
terminals can come just after R

FOLLOW Sets: Pictorially

For nonterminal A, FOLLOW(A) is the set of
terminals that can appear immediately to the
right of A

FOLLOW Sets: Pictorially

For nonterminal A, FOLLOW(A) is the set of
terminals that can appear immediately to the
right of A

table[A,+] = R ε ε

table[A,-] = R

FOLLOW Sets

For nonterminal A, FOLLOW(A) is the set of terminals
that can appear immediately to the right of A
Let’s write it together,
FOLLOW(A) =

FOLLOW Sets

For nonterminal A, FOLLOW(A) is the set of terminals
that can appear immediately to the right of A
Let’s write it together,
FOLLOW(A) =

FOLLOW Sets: Construction
To build FOLLOW(A)
– If A is the start nonterminal, add

– For rules X α A β
• Add FIRST(β) – {ε}
• If ε is in FIRST(β) or β is empty, add

Continue building FOLLOW sets
until reach a fixed point (i.e., no
more symbols can be added)

Where α, β may be empty

FOLLOW Sets Example

FOLLOW(A) for X α A β
If A is the start, add eof
Add FIRST(β) – {ε}
Add FOLLOW(X) if ε in FIRST(β) or β is empty

S B c | D B

B a b | c S

FIRST (D) { d, ε }

{ a, c, d }

FIRST (D B) { d, a, c }

FIRST (B c) { a, c }

FIRST (a b) { a }

FIRST (c S) { c }

FOLLOW (S) { eof } =
FOLLOW (B) { c, eof }=
FOLLOW (D) { a, c } =

FOLLOW (S) { eof, c } =
FOLLOW (B) { c, eof }=
FOLLOW (D) { a, c } =

FOLLOW (S) { eof, c } =
FOLLOW (B) { c, eof }=
FOLLOW (D) { a, c } =

Building the Parse Table

for each production X α {
for each terminal t in FIRST(α){

put α in Table[X][t]
if ε is in FIRST(α){

for each terminal t in FOLLOW(X){
put α in Table[X][t]

Table collision  Grammar is not in LL(1)

Putting it all together

Build FIRST sets for each nonterminal
Build FIRST sets for each production’s RHS
Build FOLLOW sets for each nonterminal
Use FIRST and FOLLOW to fill parse table for each
production

Tips n’ Tricks

FIRST sets
– Only contain alphabet terminals and ε
– Defined for arbitrary RHS and nonterminals
– Constructed by starting at the beginning of a

production
FOLLOW sets
– Only contain alphabet terminals and eof
– Defined for nonterminals only
– Constructed by jumping into production

FOLLOW(A) for X α A β
If A is the start, add eof
Add FIRST(β) – {ε}
Add FOLLOW(X) if ε in FIRST(β) or β empty

S B c | D B

B a b | c S

FIRST (D) { d, ε }

{ a, c, d }

FIRST (D B) { d, a, c }
FIRST (B c) { a, c }

FIRST (a b) { a }
FIRST (c S) { c }

FIRST(α) for α = Y1 Y2 … Yk
Add FIRST(Y1) – {ε}
If ε is in FIRST(Y1 to i-1): add FIRST(Yi) – {ε}
If ε is in all RHS symbols, add ε

for each production X α
for each terminal t in FIRST(α)
put α in Table[X][t]

if ε is in FIRST(α){
for each terminal t in FOLLOW(X){
put α in Table[X][t]

Table[X][t]

a b c d eof

B c B c D B

FOLLOW (S) { eof, c } =
FOLLOW (B) { c, eof }=
FOLLOW (D) { a, c } =

FIRST (d) { d }
FIRST (ε) { ε }

S B c | D B

B a b | c S

a b c d eof

B c B c D B

Why is a Table Collision a Problem?

Off Limits!

程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com