CS代写 Microsoft PowerPoint – lec10.pptx

Microsoft PowerPoint – lec10.pptx

Top-Down Parsing

Copyright By PowCoder代写 加微信 powcoder

Parsing: Review of the Big Picture (1)

• Context-free grammars (CFGs)
• Generation:
• Recognition: Given , is

• Translation
• Given , create a parse tree for
• Given , create an AST for
• The AST is passed to the next component of our compiler

Parsing: Review of the Big Picture (2)

• Algorithms
• Top-down (“recursive-descent”) for LL(1) grammars
• How to parse, given the appropriate parse table for
• How to construct the parse table for

• Bottom-up for LALR(1) grammars
• How to parse, given the appropriate parse table for
• How to construct the parse table for

– Step 1: get a grammar in Chomsky Normal Form
– Step 2: Build all possible parse trees bottom-up

• Start with runs of 1 terminal
• Connect 1-terminal runs into 2-terminal runs
• Connect 1- and 2- terminal runs into 3-terminal runs
• Connect 1- and 3- or 2- and 2- terminal runs into 4 terminal runs
• If we can connect the entire tree, rooted at the start symbol,

we’ve found a valid parse

Some Interesting Properties of CYK

Very old algorithm
– Already well known in early 70s

No problems with ambiguous grammars:
– Gives a solution for all possible parse tree

simultaneously

CYK Example

I,N L I,N C I,N R

id id id, )(

1,2 2,3 3,4 4,5 5,6

1,3 2,4 3,5 4,6

1,4 2,5 3,6

In general, go up a column
and down a diagonal

Thinking about Language Design

Balanced considerations
– Powerful enough to be useful
– Simple enough to be parsable

Syntax need not be complex for complex

– Guy Steele’s “Growing a Language”
Video: https://www.youtube.com/watch?v=_ahvzDzKdB0

Text: http://www.cs.virginia.edu/~evans/cs655/readings/steele.pdf

Restricting the Grammar

By restricting our grammars we can
– Detect ambiguity
– Build linear-time, O(n) parsers

LL(1) languages
– Particularly amenable to parsing
– Parsable by predictive (top-down) parsers

• Sometimes called “recursive-descent parsers”

Top-Down Parsers

Start at the Start symbol
Repeatedly: “predict” what production to use

– Example: if the current token to be parsed is an id,
no need to try productions that start with intLiteral

– This might seem simple, but keep in mind that a
chain of productions may have to be used to get to
the rule that handles, e.g., id

Predictive Parser Sketch

Selector table

“Work to do”

EOFa b a a

Token Stream

Row: nonterminal

Column: terminal

S → ( S ) | { S } | ε

( S ) ε { S } ε εS
( ) { } eof

“Work to do”

( { } ) eof

currentcurrentcurrentcurrentcurrent

A Snapshot of a Predictive Parser

“Work to do”

Not yet seenAlready processed

The structure
that the parser
expects to build

stack.push(eof)
stack.push(Start non-term)
t = scanner.getToken()

if stack.top is a terminal y
match y with t
pop y from the stack
t = scanner.next_token()

if stack.top is a nonterminal X
get table[X,t]
pop X from the stack
push production’s RHS (each symbol from Right to Left)

Until one of the following:
stack is empty
stack.top is a terminal that does not match t
stack.top is a non-term and parse-table entry is empty

Initial stack is “Start eof”

Example 2, bad input: You try

S → ( S ) | { S } | ε

( S ) ε { S } ε εS
( ) { } eof

This Parser Works Great!

Given a single token we always knew
exactly what production it started

( S ) ε { S } ε εS
( ) { } eof

Two Outstanding Issues

1. How do we know if the language is LL(1)
– Easy to imagine a grammar where a single token

is not enough to select a rule

1. How do we build the selector table?
– It turns out that there is one answer to both:

S → ( S ) | { S } | ε | ( )

If our selector table has 1 production per cell, then grammar is LL(1)

LL(1) Grammar Transformations

Necessary (but not sufficient conditions) for LL(1)

– Free of left recursion
• “No left-recursive rules”
• Why? Need to look past the list to know when to cap it

– Left-factored
• “No rules with a common prefix, for any nonterminal”
• Why? We would need to look past the prefix to pick the

production

Left-Recursion

• Recall that a grammar for which is
left recursive

• A grammar is immediately left recursive if the
repetition of the LHS nonterminal can happen
in one step, e.g.,

• Fortunately, it is always possible to change the
grammar to remove left recursion without
changing the language it recognizes

Why Left Recursion is a Problem
(Blackbox View)

XList XList x | x

How should we grow the tree top-down?

XListCurrent parse tree: Current token:

CFG snippet:

Correct if there are no more xs Correct if there are more xs

We don’t know which to choose without more lookahead

Why Left Recursion is a Problem
(Whitebox View)

XList XList x | x

xXListCurrent parse tree: Current token:

CFG snippet:

Parse table: XList XList x

(Stack overflow)

Removing Left-Recursion

A → A α | β A → β A’

(for a single immediately left-recursive rule)

Where β does
not begin with A

Exp → Exp – Factor
| Factor

Factor → intlit | ( Exp )

A → A α | β

Exp → Factor Exp’
Exp’ → – Factor Exp’

Factor → intlit | ( Exp )

Let’s check in on the parse tree…

Exp → Exp – Factor
| Factor

Factor → intlit | ( Exp )

Exp → Factor Exp’
Exp’ → – Factor Exp’

Factor → intlit | ( Exp )

2 – 3 grouped together

grouping of 2 – 3 destroyed

… We’ll fix this issue later

General Rule for Removing Immediate
Left-Recursion

A → A α1 | A α2 | … | A αm | β1 | β2 | … | βn

A → β1 A’ | β2 A’ | … | βn A’
A’ → α1 A’ | α2 A’ | … | αm A’ | ε

Left-Factored Grammars

If a nonterminal has two productions whose
right-hand sides have a common prefix, the
grammar is not left-factored, and not LL(1)

Exp → ( Exp ) | ( )

Not left-factored

Left Factoring

Given productions of the form

A → α β1 | α β2

A’ → β1 | β2

Combined Example

Exp → ( Exp ) | Exp Exp | ( )

Exp → ( Exp ) Exp’ | ( ) Exp’
Exp’ → Exp Exp’ | ε

Exp -> ( Exp”
Exp” -> Exp ) Exp’ | ) Exp’
Exp’ -> exp exp’ | ε

Remove immediate left-recursion

Left-factoring

Where are we at?

We’ve set ourselves up for success in building
the selection table

– Two things that prevent a grammar from being LL(1)
were identified and avoided
• Left-recursive grammars
• Non left-factored grammars

– Next time
• Build two data structures that combine to yield a selector

– FIRST sets
– FOLLOW sets

程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com