Microsoft PowerPoint – lec10.pptx
Top-Down Parsing
Copyright By PowCoder代写 加微信 powcoder
Parsing: Review of the Big Picture (1)
• Context-free grammars (CFGs)
• Generation:
• Recognition: Given , is
• Translation
• Given , create a parse tree for
• Given , create an AST for
• The AST is passed to the next component of our compiler
Parsing: Review of the Big Picture (2)
• Algorithms
• Top-down (“recursive-descent”) for LL(1) grammars
• How to parse, given the appropriate parse table for
• How to construct the parse table for
• Bottom-up for LALR(1) grammars
• How to parse, given the appropriate parse table for
• How to construct the parse table for
– Step 1: get a grammar in Chomsky Normal Form
– Step 2: Build all possible parse trees bottom-up
• Start with runs of 1 terminal
• Connect 1-terminal runs into 2-terminal runs
• Connect 1- and 2- terminal runs into 3-terminal runs
• Connect 1- and 3- or 2- and 2- terminal runs into 4 terminal runs
• If we can connect the entire tree, rooted at the start symbol,
we’ve found a valid parse
Some Interesting Properties of CYK
Very old algorithm
– Already well known in early 70s
No problems with ambiguous grammars:
– Gives a solution for all possible parse tree
simultaneously
CYK Example
I,N L I,N C I,N R
id id id, )(
1,2 2,3 3,4 4,5 5,6
1,3 2,4 3,5 4,6
1,4 2,5 3,6
In general, go up a column
and down a diagonal
Thinking about Language Design
Balanced considerations
– Powerful enough to be useful
– Simple enough to be parsable
Syntax need not be complex for complex
– Guy Steele’s “Growing a Language”
Video: https://www.youtube.com/watch?v=_ahvzDzKdB0
Text: http://www.cs.virginia.edu/~evans/cs655/readings/steele.pdf
Restricting the Grammar
By restricting our grammars we can
– Detect ambiguity
– Build linear-time, O(n) parsers
LL(1) languages
– Particularly amenable to parsing
– Parsable by predictive (top-down) parsers
• Sometimes called “recursive-descent parsers”
Top-Down Parsers
Start at the Start symbol
Repeatedly: “predict” what production to use
– Example: if the current token to be parsed is an id,
no need to try productions that start with intLiteral
– This might seem simple, but keep in mind that a
chain of productions may have to be used to get to
the rule that handles, e.g., id
Predictive Parser Sketch
Selector table
“Work to do”
EOFa b a a
Token Stream
Row: nonterminal
Column: terminal
S → ( S ) | { S } | ε
( S ) ε { S } ε εS
( ) { } eof
“Work to do”
( { } ) eof
currentcurrentcurrentcurrentcurrent
A Snapshot of a Predictive Parser
“Work to do”
Not yet seenAlready processed
The structure
that the parser
expects to build
stack.push(eof)
stack.push(Start non-term)
t = scanner.getToken()
if stack.top is a terminal y
match y with t
pop y from the stack
t = scanner.next_token()
if stack.top is a nonterminal X
get table[X,t]
pop X from the stack
push production’s RHS (each symbol from Right to Left)
Until one of the following:
stack is empty
stack.top is a terminal that does not match t
stack.top is a non-term and parse-table entry is empty
Initial stack is “Start eof”
Example 2, bad input: You try
S → ( S ) | { S } | ε
( S ) ε { S } ε εS
( ) { } eof
This Parser Works Great!
Given a single token we always knew
exactly what production it started
( S ) ε { S } ε εS
( ) { } eof
Two Outstanding Issues
1. How do we know if the language is LL(1)
– Easy to imagine a grammar where a single token
is not enough to select a rule
1. How do we build the selector table?
– It turns out that there is one answer to both:
S → ( S ) | { S } | ε | ( )
If our selector table has 1 production per cell, then grammar is LL(1)
LL(1) Grammar Transformations
Necessary (but not sufficient conditions) for LL(1)
– Free of left recursion
• “No left-recursive rules”
• Why? Need to look past the list to know when to cap it
– Left-factored
• “No rules with a common prefix, for any nonterminal”
• Why? We would need to look past the prefix to pick the
production
Left-Recursion
• Recall that a grammar for which is
left recursive
• A grammar is immediately left recursive if the
repetition of the LHS nonterminal can happen
in one step, e.g.,
• Fortunately, it is always possible to change the
grammar to remove left recursion without
changing the language it recognizes
Why Left Recursion is a Problem
(Blackbox View)
XList XList x | x
How should we grow the tree top-down?
XListCurrent parse tree: Current token:
CFG snippet:
Correct if there are no more xs Correct if there are more xs
We don’t know which to choose without more lookahead
Why Left Recursion is a Problem
(Whitebox View)
XList XList x | x
xXListCurrent parse tree: Current token:
CFG snippet:
Parse table: XList XList x
(Stack overflow)
Removing Left-Recursion
A → A α | β A → β A’
(for a single immediately left-recursive rule)
Where β does
not begin with A
Exp → Exp – Factor
| Factor
Factor → intlit | ( Exp )
A → A α | β
Exp → Factor Exp’
Exp’ → – Factor Exp’
Factor → intlit | ( Exp )
Let’s check in on the parse tree…
Exp → Exp – Factor
| Factor
Factor → intlit | ( Exp )
Exp → Factor Exp’
Exp’ → – Factor Exp’
Factor → intlit | ( Exp )
2 – 3 grouped together
grouping of 2 – 3 destroyed
… We’ll fix this issue later
General Rule for Removing Immediate
Left-Recursion
A → A α1 | A α2 | … | A αm | β1 | β2 | … | βn
A → β1 A’ | β2 A’ | … | βn A’
A’ → α1 A’ | α2 A’ | … | αm A’ | ε
Left-Factored Grammars
If a nonterminal has two productions whose
right-hand sides have a common prefix, the
grammar is not left-factored, and not LL(1)
Exp → ( Exp ) | ( )
Not left-factored
Left Factoring
Given productions of the form
A → α β1 | α β2
A’ → β1 | β2
Combined Example
Exp → ( Exp ) | Exp Exp | ( )
Exp → ( Exp ) Exp’ | ( ) Exp’
Exp’ → Exp Exp’ | ε
Exp -> ( Exp”
Exp” -> Exp ) Exp’ | ) Exp’
Exp’ -> exp exp’ | ε
Remove immediate left-recursion
Left-factoring
Where are we at?
We’ve set ourselves up for success in building
the selection table
– Two things that prevent a grammar from being LL(1)
were identified and avoided
• Left-recursive grammars
• Non left-factored grammars
– Next time
• Build two data structures that combine to yield a selector
– FIRST sets
– FOLLOW sets
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com