程序代写代做代考 algorithm Programming Language Syntax: Top-down Parsing

Programming Language Syntax: Top-down Parsing
Read: Scott, Chapter 2.3.2 and 2.3.3

Lecture Outline
n Top-down parsing (also called LL parsing) n LL(1) parsing table
n FIRST, FOLLOW, and PREDICT sets n LL(1) grammars
n Bottom-up parsing (also called LR parsing) n A brief overview, no detail
Programming Languages CSCI 4430, A. Milanova 2

LL(1) Parsing Table
n One dimension: nonterminal to expand n Other dimension: lookahead token
a
n E.g., entry “nonterminal A on terminal a” contains production A ® α
n Meaning: when parser is at nonterminal A and lookahead token is a, then parser
A
expands A by production A ® α
3
α

LL(1) Parsing Table
start ® expr $$
expr®termterm_tail term_tail®+term term_tail|ε term ® id factor_tail factor_tail ® * id factor_tail | ε
start
expr
term_tail
term
factor_tail
id
expr $$
term term_tail

id factor_tail

+


+ term term_tail

ε
*




* id factor_tail


ε

ε
Programming Languages CSCI 4430, A. Milanova 4
$$

Intuition
n Top-down parsing
n Parse tree is built from the top to the leaves n Always expand the leftmost nonterminal
expr id id + id*id
id
term
ε
term_tail
factor_tail ® * id factor_tail
+
factor_tail ® ε
factor_tail
What production applies for factor_tail on +?
+ does not belong to an expansion of factor_tail. However, factor_tail has an epsilon production and + belongs to an expansion of term_tail which follows factor_tail. Thus, predict the epsilon production.
5
expr ® term term_tail term_tail®+term term_tail |ε term ® id factor_tail
factor_tail ® * id factor_tail | ε

Intuition
n Top-down parsing
n Parse tree is built from the top to the leaves n Always expand the leftmost nonterminal
expr id id + id*id
+
term
id factor_tail ε
term_tail ® ε What production applies for term_tail on +?
term_tail ® + term term_tail
term_tail
+ term term_tail
+ is the first symbol in expansions of + term term_tail.
Thus, predict production term_tail ® + term term_tail Programming Languages CSCI 4430, A. Milanova 6
expr ® term term_tail term_tail®+term term_tail |ε term ® id factor_tail
factor_tail ® * id factor_tail | ε

LL(1) Tables and LL(1) Grammars
n We can construct an LL(1) parsing table for any context-free grammar
n In general, the table will contain multiply-defined entries. That is, for some nonterminal and lookahead token, more than one production applies
n A grammar whose LL(1) parsing table has no multiply-defined entries is said to be LL(1) grammar
n LL(1) grammars are a very special subclass of context- free grammars. Why?
Programming Languages CSCI 4430, A. Milanova 7

FIRST and FOLLOW sets
n Let α be any sequence of nonterminals and terminals
n FIRST(α) is the set of terminals a that begin the strings derived from α. E.g., expr Þ* id…, thus id in FIRST(expr)
n If there is a derivation α Þ* ε, then ε is in FIRST(α) n Let A be a nonterminal
n FOLLOW(A) is the set of terminals b (including special end-of-input marker $$) that can appear immediately to the right of A in some sentential form:
start Þ* …Ab… Þ*…
Programming Languages CSCI 4430, A. Milanova 8

Notation:
α is an arbitrary sequence
of terminals and nonterminals
Computing FIRST
n Apply these rules until no more terminals or ε can be added to any FIRST(α) set
(1) If α starts with a terminal a, then FIRST(α) = { a }
(2) If α is a nonterminal X, where X ®ε, then addεto FIRST(α)
(3) If α is a nonterminal X ® Y1Y2…Yk then add a to FIRST(X) if for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), … FIRST(Yi-1). If ε is in all of FIRST(Y1), … FIRST(Yk), add ε to FIRST(X).
n Everything in FIRST(Y1) is surely in FIRST(X)
n If Y1 does not derive ε, then we add nothing more;
Otherwise, we add FIRST(Y2), and so on Similarly, if α is Y1Y2…Yk , we’ll repeat the above
9

Warm-up Exercise
start ® expr $$
expr®termterm_tail term_tail®+term term_tail|ε term ® id factor_tail factor_tail ® * id factor_tail | ε
FIRST(term) = { id }
FIRST(expr) =
FIRST(start) =
FIRST(term_tail) = FIRST(+ term term_tail) =
FIRST(factor_tail) =
Programming Languages CSCI 4430, A. Milanova 10

Exercise
start ® S$$ B ®zS |ε S®xS|Ay C®vS|ε A ® BCD | ε D ® w S
Compute FIRST sets: FIRST(x S) = FIRST(A y) =
FIRST(BCD) = FIRST(z S) =
FIRST(v S) = FIRST(w S) =
FIRST(S) = FIRST(A) =
FIRST(B) = FIRST(C) =
FIRST(D) =
11

Notation:
A,B,S are nonterminals.
α,β are arbitrary sequences
of terminals and nonterminals.
Computing FOLLOW
n Apply these rules until nothing can be added to any FOLLOW(A) set
(1) If there is a production A ® αBβ, then everything in FIRST(β) except for ε should be added to
FOLLOW(B)
(2) If there is a production A ® αB, or a production
A ® αBβ, where FIRST(β) contains ε, then everything in FOLLOW(A) should be added to
FOLLOW(B)
Programming Languages CSCI 4430, A. Milanova 12

Warm-up
start ® expr $$
expr®termterm_tail term_tail®+term term_tail|ε term ® id factor_tail factor_tail ® * id factor_tail | ε
FOLLOW(expr) = { $$ } FOLLOW(term_tail) = FOLLOW(term) = FOLLOW(factor_tail) =
Programming Languages CSCI 4430, A. Milanova 13

Exercise
start ® S$$ B ®zS |ε S®xS|Ay C®vS|ε A ® BCD | ε D ® w S
Compute FOLLOW sets: FOLLOW(A) = FOLLOW(B) = FOLLOW(C) = FOLLOW(D) = FOLLOW(S) =
Programming Languages CSCI 4430, A. Milanova 14

PREDICT Sets
PREDICT(A ® α) =
if α does not derive ε (FIRST(α) – {ε}) U FOLLOW(A)
if α derives ε
Programming Languages CSCI 4430, A. Milanova 15

Constructing LL(1) Parsing Table
n Algorithm uses PREDICT sets:
foreach production A ® α in grammar G foreach terminal a in PREDICT(A ® α) add A ® α into entry parse_table[A,a]
n If each entry in parse_table contains at most one production, then G is said to be LL(1)
Programming Languages CSCI 4430, A. Milanova 16

Exercise
start ® S$$ B ®zS |ε S®xS|Ay C®vS|ε A ® BCD | ε D ® w S
Compute PREDICT sets: PREDICT(S ® x S) = PREDICT(S ® A y) = PREDICT(A ® BCD) = PREDICT(A ® ε) =
… etc…
Programming Languages CSCI 4430, A. Milanova 17

Writing an LL(1) Grammar
n Most context-free grammars are not LL(1) grammars
n Obstacles to LL(1)-ness n Left recursion is an
obstacle. Why?
n Common prefixes are an obstacle.
Why?
expr ® expr + term | term term ® term*id|id
stmt ® if b then stmt else stmt | if b then stmt |
a
Programming Languages CSCI 4430, A. Milanova 18

Removal of Left Recursion
n Left recursion can be removed from a grammar mechanically
n Started from this left-recursive expression grammar:
n After removal of left recursion we obtain this equivalent grammar, which is LL(1):
expr ® expr + term | term term ® term*id|id
expr ® term term_tail term_tail®+term term_tail |ε term ® id factor_tail
factor_tail ® * id factor_tail | ε
CSCI 4430, A. Milanova 19

Removal of Common Prefixes
n Common prefixes can be removed mechanically as well, by using left-factoring
n Original if-then-else grammar:
stmt ® if b then stmt else stmt | ifbthenstmt |
a
n After left-factoring:
Programming Languages CSCI 4430, A. Milanova 20
stmt ® if b then stmt else_part | a else_part ® else stmt | ε

Exercise
n Compute FIRSTs:
FIRST(stmt $$), FIRST(if b then stmt else_part),
FIRST(a), FIRST(else stmt) n Compute FOLLOW:
FOLLOW(else_part)
n Compute PREDICT sets for all 5 productions
n Construct the LL(1) parsing table. Is this grammar an LL(1) grammar?
Programming Languages CSCI 4430, A. Milanova 21
start ® stmt $$
stmt ® if b then stmt else_part | a else_part ® else stmt | ε

Exercise
n Compute FIRSTs: FIRST(stmt $$) =
FIRST(if b then stmt else_part) = FIRST(a) =
FIRST(else stmt) =
Programming Languages CSCI 4430, A. Milanova 22
start ® stmt $$
stmt ® if b then stmt else_part | a else_part ® else stmt | ε

Exercise
n Compute FOLLOW: FOLLOW(else_part) =
Programming Languages CSCI 4430, A. Milanova 23
start ® stmt $$
stmt ® if b then stmt else_part | a else_part ® else stmt | ε

Exercise
n Construct the LL(1) parsing table
n Is this grammar an LL(1) grammar?
Programming Languages CSCI 4430, A. Milanova 24
start ® stmt $$
stmt ® if b then stmt else_part | a else_part ® else stmt | ε

Exercise
Programming Languages CSCI 4430, A. Milanova 25

Lecture Outline
n Top-down parsing (also called LL parsing) n LL(1) parsing table
n FIRST, FOLLOW, and PREDICT sets n LL(1) grammars
n Bottom-up parsing (also called LR parsing) n A brief overview, no detail
Programming Languages CSCI 4430, A. Milanova 26

Bottom-up Parsing
n Terminals are seen in the
order of appearance in the
token stream
id , id , id ;
n Parse tree is constructed n From the leaves to the top
list
id
list_tail
id list_tail
, id list_tail ;
,
n A right-most derivation in reverse
list ® id list_tail
list_tail®,idlist_tail |; Programming Languages CSCI 4430, A. Milanova 27

Bottom-up Parsing
Stack Input Action
id,id,id; shift
id ,id,id;
id, id,id;
id,id ,id;
id,id, id;
id,id,id ;
id,id,id;
Programming Languages CSCI 4430, A. Milanova
shift shift shift shift shift
reduce by list_tail®; 28
list ® id list_tail list_tail®,idlist_tail |;

Bottom-up Parsing
Stack
id,id,id list_tail id,id list_tail
id list_tail
list
Input
Action reduce by
list_tail ® ,id list_tail reduce by
list_tail ® ,id list_tail reduce by
list ® id list_tail ACCEPT
Programming Languages CSCI 4430, A. Milanova
29
list ® id list_tail list_tail®,idlist_tail |;

Bottom-up Parsing
n Also called LR parsing
n LR parsers work with LR(k) grammars n L stands for “left-to-right” scan of input
n R stands for “rightmost” derivation
n k stands for “need k tokens of lookahead”
n We are interested in LR(0) and LR(1) and variants in between
n LR parsing is better than LL parsing! n Accepts larger class of languages
n Just as efficient!
Programming Languages CSCI 4430, A. Milanova 30

LR Parsing
n The parsing method used in practice
n LR parsers recognize virtually all PL constructs
n LR parsers recognize a much larger set of grammars than predictive parsers
n LR parsing is efficient n LR parsing variants
n SLR (or Simple LR)
n LALR (or Lookahead LR) – yacc/bison generate LALR
parsers
n LR (Canonical LR) n SLR