程序代写代做代考 python Java algorithm Parsing1

Parsing1

COP5556 Programming Language Principles
Parsing 1

The rest of Scott Chapter 2
Including section 2.3.5 on the online supplement
Skim 2.3.3 Bottom-Up Parsing
2
Reading

Recall: first step of language translation is lexical analysis
Converts sequence of characters into a sequence of tokens
Lexical structure usually specified with regular expressions
Second step of language translation is parsing
Recognizes legal phrases (order of tokens)
Constructs AST

3

Regular expressions are not recursive, thus cannot specify nested constructs

Example: language of expressions
(x + y) + (w + z)
is an expression with
two subexpressions, (x + y) and (w + z)
each with two subexpressions x and y, and w and z

The language of expressions cannot be specified with regular expressions.

4
New formalism

Instead, we specify the language with a grammar using BNF (Backus-Naur Form) or EBNF (extended BNF) notation.

BNF allows us to specify context-free-grammars

5

A grammar is a tuple (, N, P, S) where
 is a set specifying the alphabet of tokens (also called terminal symbols).
N is the set of non-terminal symbols.
non-terminals are variables that denote sets of sentences.
impose a hierarchical structure on sentences.
P is a set of productions (substitution rules).
S is the start symbol where S is in N.
6
Context Free Grammar

Compact and precise specification that allows an infinite number of sentences to be specified with a finite definition
Allows systematic or automatic generation of efficient parsers for certain types of grammars
Provides a framework for specifying semantics. We will see this later.
Allows formal reasoning about languages.
7
Benefits of (E)BNF notation

S ::= 
S ::= (S)S

, N are left implicit
but  = {(,)}, N = {S}
Recursive: S defined in terms of itself
As expected,  is the empty sequence

What language does this grammar generate?
8
Example

S ::= 
S ::= (S)S

, N are left implicit
but  = {(,)}, N = {S}
Note recursion—S defined in terms of itself
As expected,  is the empty sequence

The set of strings of balance parentheses
9
Example

Pure BNF only allows productions only of the form 1 ::= 2, where 1 and 2 are strings of terminal and non-terminal symbols.
EBNF (extended BNF) adds |, *, + from RE
Example
BNF
A ::=  B 
B ::=  B
B ::= 
EBNF
A ::=  * 
10
EBNF Notation
These grammars generate
the same language

Pure BNF only allows productions only of the form 1 ::= 2, where 1 and 2 are strings of terminal and non-terminal symbols.
EBNF (extended BNF) adds |, *, + from RE
Example
BNF
A ::=  B 
B ::=  B
B ::= 
EBNF
A ::=  * 
11
EBNF Notation
Recursion in parser
Iteration in parser

11

expr ::= factor | expr op factor
factor ::= ident | numlit | ( expr )

ident, numlit, op are tokens which were defined earlier.

12
Example
::= | | |

::= * | 0
::= 0 |
::= 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

::=
*
::= A .. Z | a..z | $ | _
::=
|

::= + | == | *
( | )

expr ::= factor | expr op factor
factor ::= ident | numlit | ( expr )

Some expressions:
x
34
s + 2
s==3+4
(4+5)==9
(1 * (2+3) == 4) + 5
abc == 34 * 4
13

expr ::= factor | expr op factor
factor ::= ident | numlit | ( expr )

What is ?

14

expr ::= factor | expr op factor
factor ::= ident | numlit | ( expr )

 = {ident, numlit, op, ( , ) }

15

Derivation of expressions

expr ::= factor | expr op factor
factor ::= ident | numlit | ( expr )

x
expr →
factor →
ident(x)

16

expr ::= factor | expr op factor
factor ::= ident | numlit | ( expr )

34
expr →
factor →
numlit(34)

17

expr ::= factor | expr op factor
factor ::= ident | numlit | ( expr )

s+2
expr →
expr op(+) factor →
factor op(+) factor →
ident(s) op(+) factor →
ident(s) op(+) numlit(2)
18

19
Left as exercise

s==3+4
(4+5)==9
(1 * (2+3) == 4) + 5
abc == 34 * 4

The rule αAβ ::= ασβ
only allows A to be replaced by σ when it appears between α and β.
contextual constraint on the substitution of A makes this grammar NOT context-free.

vs a context-free grammar, where
all the productions have the form A ::= ….. with a single non-terminal on the left side.
This allows A to be replaced anywhere it appears.
20
Non-context free grammars

Programming languages are NOT context free.
But we use context-free grammars anyway

A context-free description of the grammar is a useful starting point for implementation of translators.

The context constraints are specified and checked using different techniques.
21

Grammars give rules for generating sentences in a language
Parsing is the process of recognizing the language (and determining its structure)
Parse trees are a way of representing the structure of a sentence
start symbol at the root
non-terminals at interior nodes
terminals as leaves
22
Parsing

Sentence ()() from grammar S ::= (S)S | ε.

23
Example: parse tree
(
)
S
s
S
s
S
s
ε
(

ε
S
)
S

ε

expr ::= factor | expr op factor
factor ::= ident | numlit | ( expr )

x
expr →
factor →
ident(x)

24
Example: parse tree
factor
expr
ident(x)

Note how the parse tree corresponds to the derivation
24

expr ::= factor | expr op factor
factor ::= ident | numlit | ( expr )

34
expr →
factor →
numlit(34)

25
Example: parse tree
expr
factor
numlit(34)

expr ::= factor | expr op factor
factor ::= ident | numlit | ( expr )

s+2
expr →
expr op(+) factor →
factor op(+) factor →
ident(s) op(+) factor →
ident(s) op(+) numlit(2)
26
Example: Parse tree
expr
expr
op (+)
factor
factor
ident(s)
numlit(2)

Admits more than one parse tree for a sentence.
Example
stmt ::= if_stmt | (other non-if statements)…..
if_stmt ::= if boolean_expr then stmt
if_stmt ::= if boolean_expr then stmt else stmt

if b1 then if b2 then s1 else s2
27
Ambiguous Grammar
if_stmt
if b1 then stmt
if b2 then stmt else stmt
s1
s2

if-stmt
if b1 then stmt else smt
if b2 then s1
s2

Note that these parse trees leave out some detail regarding the boolean expressions.
For example, in the tree on the left, the if_stmt should have two children, boolean_expr which has a child b1, and stmt, which is another statement.
27

Without changing the programming language being specified:
Use disambiguating rule to add “hack” to parser.
else is matched with nearest previously unmatched if.
or, modify the grammar so it is non-ambiguous
Resulting grammar must generate the same language
Grammars are not unique: many grammars may specify the same languages
Change the language (if you can)
Often this improves the language
28
Dealing with ambiguous grammars

else is matched with nearest previously unmatched if
if b1 then if b2 then s1 else s2

29
Example: Ambiguous grammar with disambiguating rule
if_stmt
if b1 then stmt
if b2 then stmt else stmt
s1
s2

Distinguish between matched and unmatched statements
You can’t have an unmatched statement immediately before an else

stmt ::= matched | unmatched
matched ::=
if boolean_expr then matched else matched
| ….(other non-if stmt)
unmatched ::=
if boolean_expr then stmt
| if boolean_expr then matched else unmatched

30
Example: modify the grammar

31
stmt ::= matched | unmatched
matched ::= if boolean_expr then matched else matched
| ….(other non-if stmt)
unmatched ::= if boolean_expr then stmt
| if boolean_expr then matched else unmatched

if b1 then if b2 then s1 else s2

(only one works)
unmatched
if b1 then stmt
matched
if b2 then matched else matched
s1
s2

Change the language to explicitly terminate if statements.

stmt ::= if boolean_expr then stmt endif
| if boolean_expr then stmt else stmt endif
| other non-if statements

Allowed sentences
if b1 then if b2 then s1 else s2 endif endif
if b1 then if b2 then s1 endif else s2 endif

32
Example: modify the language

The difficulty of recognition depends on the complexity of the grammar.
Grammars can be classified according to the class of parsing algorithms that can recognize the language.
There are O(n3) algorithms that work for any context free grammar
For useful subsets of CFG, there are parsers that are linear.
LL (left-to-right, leftmost derivation) top-down (often hand written)
LR (left-to-right, rightmost derivation) bottom-up (usually generated)
33
Parsing Complexity

n is the length of the input program.

33

So far,
Lexical analysis
can specify tokens with RE
recognize tokens with DFA implemented by a scanner
scanners also keep track of position of tokens in the source text and handle white space, keywords, and comments.
34

Phrase structure
use EBNF notation to specify context-free grammar
in contrast to REs, can be recursive
allows nested constructs to be specified
parsers recognize phrases
multiple CFGs can specify the same language; we try to choose one that is most appropriate for our purposes
clear to human reader
can be parsed by a desirable parsing algorithm
parse trees illustrate the structure of a phrase
an ambiguous grammar admits multiple parse trees
Options for dealing with ambiguity
Impose a disambiguating rule
Eliminate ambiguity by using a different grammar that generates the same language but is not ambiguous.
If possible, change the language.

35

bottom-up
we will only briefly cover this
top-down (recursive descent)
we will focus on this and use in our project
characteristics of grammars that allow top-down parsing
LL grammars
FIRST, FOLLOW, and PREDICT sets
transformations on grammars
36
Approaches to parsing

Scan, looking for leftmost substring that matches r.h.s (right hand side) of some production, replace with l.h.s. (left hand side)
Repeat.
If the start symbol is obtained, the sentence is in grammar.

Usually, bottom up parsers are generated by a tool.
37
Bottom-up parsing

S ::= AB
A ::= x | y
B := z | w

String to parse: xw

38
Example: Bottom up parsing

38
Find a prefix of xw on the right hand side of a production

S ::= AB
A ::= x | y
B := z | w

String to parse: xw

xw

Aw

39
Example: Bottom up parsing (2)

S ::= AB
A ::= x | y
B := z | w

String to parse: xw

xw

Aw

AB

40
Example: Bottom up parsing (3)

S ::= AB
A ::= x | y
B := z | w

String to parse: xw

xw

Aw

AB

S This is the start symbol, so the string is in the language
41
Example: Bottom up parsing (4)

S ::= aABe
A ::= Abc | b
B ::= d

String to recognize: abbcde
42
Example 2: Bottom-up parsing (1)

S ::= aABe
A ::= Abc | b
B ::= d

String to recognize: abbcde
Scan, looking for substring that matches the right side of some production.
b and d qualify.
Choose leftmost one, b, and replace it by A
abbcde

aAbcde
43
Example 2: Bottom-up parsing (2)

S ::= aABe
A ::= Abc | b
B ::= d

String to recognize: abbcde
abbcde

aAbcde
Now substrings Abc, b, and d match a r.h.s, Abc is leftmost, so replace Abc by A
aAbcde

aAde

44
Example 2: Bottom-up parsing (3)

S ::= aABe
A ::= Abc | b
B ::= d

String to recognize: abbcde
abbcde

aAbcde

aAde
replace d with B
aAde

aABe
45
Example 2: Bottom-up parsing (4)

S ::= aABe
A ::= Abc | b
B ::= d

String to recognize: abbcde
abbcde

aAbcde

aAde

aABe

S

46
Example 2: Bottom-up parsing (5)

Start with start symbol in language
At each step, replace a nonterminal symbol with one of its productions, trying to match prefixes in the sentence
Remove matching prefixes
If the resulting string is empty, then the sentence is in the grammar
47
Top down parsing

S ::= AB
A ::= x | y
B := z | w

String to parse xw

48
Example: Top-down parsing (1)

S ::= AB
A ::= x | y
B := z | w

String to parse xw
S xw

49
Example: Top-down parsing (2)

S ::= AB
A ::= x | y
B := z | w

String to parse xw
S xw

AB xw (no match, just replace S with its r.h.s)

50
Example: Top-down parsing (3)

S ::= AB
A ::= x | y
B := z | w

String to parse xw
S xw

AB xw (replace A with one of its alternatives)

xB xw

51
Example: Top-down parsing (4)

51
Here, we look at the string we are trying to match to determine which alternative to choose

S ::= AB
A ::= x | y
B := z | w

String to parse xw
S xw

AB xw

xB xw (x is a common prefix)

B w (remove x from both sides)

52
Example: Top-down parsing (5)

S ::= AB
A ::= x | y
B := z | w

String to parse xw
S xw

AB xw

xB xw

B w (replace B with one of its alternatives)

w w

53
Example: Top-down parsing (6)

S ::= AB
A ::= x | y
B := z | w

String to parse xw
S xw

AB xw

xB xw

B w

w w (delete common prefix)

ε ε
OK String is legal

54
Example: Top-down parsing (7)

Recall grammars generate, parsers recognize
O(n3) parsing algorithms for any context-free grammar exist, but we want a linear algorithm

Goal: identify special classes of context-free grammars that can be parsed with linear algorithms
LL grammars (top down parsing alg.)
LR grammars (bottom up parsing alg.)

Now we will explore LL(1) grammars in more detail
Look at some motivating examples to see problems
State rules for grammars that rule out the problematic situations

55
Grammars and Parsing

55
For our project, we want one symbol lookahead

S ::= A|B
A ::= xA | y
B ::= xB | z

String to parse: xxz

56
Top-down parsing

S ::= A|B
A ::= xA | y
B ::= xB | z

String to parse: xxz

57
Top-down parsing
S xxz

S ::= A|B
A ::= xA | y
B ::= xB | z

String to parse: xxz

58
Top-down parsing
S xxz

A xxz

S ::= A|B
A ::= xA | y
B ::= xB | z

String to parse: xxz

59
Top-down parsing
S xxz

A xxz

xA xxz

S ::= A|B
A ::= xA | y
B ::= xB | z

String to parse: xxz

60
Top-down parsing
S xxz

A xxz

xA xxz
→ consume x
A xz

S ::= A|B
A ::= xA | y
B ::= xB | z

String to parse: xxz

Parsing failed,
but string is in language!!
61
Top-down parsing
S xxz
→ (choose A)
A xxz

xA xxz
→ (consume x)
A xz

xA xz
→ (consume x)
A z

S ::= A|B
A ::= xA | y
B ::= xB | z

String to parse: xxz

62
Top-down parsing
S xxz S xxz
→ (choose A) → (choose B)
A xxz B xxz

xA xxz xB xxz
→ → (consume x)
A xz B xz
→ →
xA xz xB xz
fail → (consume x)
B z

Success!!! z z

62

We need to predict which production to choose, preferably by looking only at one symbol
In this example, we must look at the last symbol, thus required lookahead not even bounded.
S ::= A|B
A ::= xA | y 0 more x’s followed by a y or a z
B ::= xB | z
Need to formulate restrictions on grammars that will allow top down parsing with bounded lookahead.
An LL(k) grammar can be parsed by a top-down parser with max k-token lookahead.
An LL(*) grammar can be parsed by a top-down parser with unbounded lookahead.

63
Lookahead and prediction

63
In the past, bounding the lookahead was important. Now, this is not so important since enough memory is available to read the whole input string into memory (and an IDE will do this anyway).
For example, Antlr generates parsers for LL(*) algorithms.

It is conventional in general discussions of grammars to use
lower case letters near the beginning of the alphabet for terminals
lower case letters near the end of the alphabet for strings of terminals
upper case letters near the beginning of the alphabet for non-terminals
upper case letters near the end of the alphabet for arbitrary symbols
Greek letters for arbitrary strings of symbols

64
Notational conventions

FIRST() is the set of tokens (or ) that appear as the first symbol in some string generated from 

FIRST() ≡ {c :  * c }

recall  and  are arbitrary strings of symbols while c is a terminal

65
FIRST sets

65
recall that  and  are arbitrary strings, while a is a terminal.

S ::= AB
A ::= x | y
B ::= z | w
Sentences in language: xz, xw, yz, yw

FIRST(x) = ?????
66
FIRST sets: Example 1

S ::= AB
A ::= x | y
B ::= z | w
Sentences in language: xz, xw, yz, yw

FIRST(x) = {x}
FIRST(y) = ?????
67
FIRST sets: Example 1(2)

S ::= AB
A ::= x | y
B ::= z | w
Sentences in language: xz, xw, yz, yw

FIRST(x) = {x}
FIRST(y) = {y}
FIRST(w) = ?????
68
FIRST sets: Example 1 (3)

S ::= AB
A ::= x | y
B ::= z | w
Sentences in language: xz, xw, yz, yw

FIRST(x) = {x}
FIRST(y) = {y}
FIRST(w) = {w}
FIRST(z} = ?????
69
FIRST sets: Example 1(4)

S ::= AB
A ::= x | y
B ::= z | w
Sentences in language: xz, xw, yz, yw

FIRST(x) = {x}
FIRST(y) = {y}
FIRST(w) = {w}
FIRST(z) = {z}
FIRST(A) = ????
70
FIRST sets: Example 1(5)

S ::= AB
A ::= x | y
B ::= z | w
Sentences in language: xz, xw, yz, yw

FIRST(x) = {x}
FIRST(y) = {y}
FIRST(w) = {w}
FIRST(z} = {z}
FIRST(A) =
FIRST(B) = ????
71
FIRST sets: Example 1(6)

S ::= AB
A ::= x | y
B ::= z | w
Sentences in language: xz, xw, yz, yw

FIRST(x) = {x}
FIRST(y) = {y}
FIRST(w) = {w}
FIRST(z} = {z}
FIRST(A) = FIRST(x) U FIRST(y) = {x,y}
FIRST(B) = FIRST(w) U FIRST(z) = {w,z}
FIRST(S) = ?????
72
FIRST sets: Example 1(7)

S ::= AB
A ::= x | y
B ::= z | w
Sentences in language: xz, xw, yz, yw

FIRST(x) = {x}
FIRST(y) = {y}
FIRST(w) = {w}
FIRST(z) = {z}
FIRST(A) = FIRST(x) U FIRST(y) = {x,y}
FIRST(B) = FIRST(w) U FIRST(z) = {w,z}
FIRST(S) = FIRST(AB) = FIRST(A) = {x,y}
73
FIRST sets: Example 1(8)

S ::= A | B
A ::= xA | y
B ::= xB | z
Sentences in language: y, xy, xxy, …, z, xz, xxz,..

FIRST(xA) = ?????
74
FIRST sets: Example 2 (we saw this earlier)

S ::= A | B
A ::= xA | y
B ::= xB | z
Sentences in language: y, xy, xxy, …, z, xz, xxz,..

FIRST(xA) = FIRST(x) = {x}
FIRST(y) = ?????
75
FIRST sets: Example 2 (2)

S ::= A | B
A ::= xA | y
B ::= xB | z
Sentences in language: y, xy, xxy, …z, xz, xxz,..

FIRST(xA) = FIRST(x) = {x}
FIRST(y) = {y}
FIRST(xB) = ?????
76
FIRST sets: Example 2 (3)

S ::= A | B
A ::= xA | y
B ::= xB | z
Sentences in language: y, xy, xxy, …z, xz, xxz,..

FIRST(xA) = FIRST(x) = {x}
FIRST(y) = {y}
FIRST(xB) = FIRST(x) = {x}
FIRST(z) = ?????
77
FIRST set: Example 2 (4)

S ::= A | B
A ::= xA | y
B ::= xB | z
Sentences in language: y, xy, xxy, …z, xz, xxz,..

FIRST(xA) = FIRST(x) = {x}
FIRST(y) = {y}
FIRST(xB) = FIRST(x) = {x}
FIRST(z) = {z}
FIRST(A) = ?????
78
FIRST sets: Example 2 (5)

S ::= A | B
A ::= xA | y
B ::= xB | z
Sentences in language: y, xy, xxy, …z, xz, xxz,..

FIRST(xA) = FIRST(x) = {x}
FIRST(y) = {y}
FIRST(xB) = FIRST(x) = {x}
FIRST(z) = {z}
FIRST(A) = FIRST(xA) U FIRST(y) = {x,y}
FIRST(B) = ?????
79
FIRST sets: Example 2 (6)

S ::= A | B
A ::= xA | y
B ::= xB | z
Sentences in language: y, xy, xxy, …z, xz, xxz,..

FIRST(xA) = FIRST(x) = {x}
FIRST(y) = {y}
FIRST(xB) = FIRST(x) = {x}
FIRST(z) = {z}
FIRST(A) = FIRST(xA) U FIRST(y) = {x,y}
FIRST(B) = FIRST(xB) U FIRST(z) = {x,z}
FIRST(S) = ?????
80
FIRST sets: Example 2 (7)

S ::= A | B
A ::= xA | y
B ::= xB | z
Sentences in language: y, xy, xxy, …, z, xz, xxz,..

FIRST(xA) = FIRST(x) = {x}
FIRST(y) = {y}
FIRST(xB) = FIRST(x) = {x}
FIRST(z) = {z}
FIRST(A) = FIRST(xA) U FIRST(y) = {x,y}
FIRST(B) = FIRST(xB) U FIRST(z) = {x,z}
FIRST(S) = FIRST(A) U FIRST(B) = {x,y,z}
81
FIRST sets: Example 2 (8)

S ::= Ay
A ::= x | ε
Sentences in language: xy, y

FIRST(x) = { x }

FIRST(A) = { x }

FIRST(S) = { x, y }

Check answer by looking at legal sentences
82
FIRST set: Example 3

Recall that we started this discussion of FIRST sets by giving an example where we couldn’t predict which production to choose just by looking at the first character.
To predict the production to use, we need to satisfy:
The FIRST set of all productions with the same left hand sides are disjoint.
83
First requirement for LL(1)

84
non-LL(1) example, revisited
Example 2

S ::= A
S ::= B
A ::= xA
A ::= y
B ::= xB
B ::= z
(re-written in pure BNF)
Which productions have same left hand side?
This was the example where we had to look to at the last character to know whether to choose A or B

84
See slide 61 for the example

85
non-LL(1) example, revisited
Recall example:

S ::= A
S ::= B
A ::= xA
A ::= y
B ::= xB
B ::= z
FIRST(A) = {x,y}
FIRST(B) = {x,z}

Grammar does not satisfy rule

86
Another non-LL(1) example

S ::= Ax
A ::= xA | ε

Sentences in the language x, xx, xxx, …

S x

Ax x

xAx x

Ax ε
FAILURE

87
Another non-LL(1) example

S ::= Ax
A ::= xA | ε

Sentences in the language x, xx, xxx, …

S x

Ax x

xAx x

Ax ε
FAILURE

Condition on FIRST sets not quite enough

88
Another non-LL(1) example

S ::= Ax
A ::= xA | ε

Sentences in the language x, xx, xxx, …

S x

Ax x

xAx x

Ax ε
FAIL

S x

Ax x

εx x

ε ε
SUCCESS

FOLLOW(A) is the set of all terminal symbols that can immediately follow a subsequence derived from A in a sequence derived from the start symbol,S.

FOLLOW(A) ≡ {c : S +  A c }

89
Definition of FOLLOW set

S ::= AB
A ::= x|y
B ::= z|w

FOLLOW(A) = ????

90
Example
strings in language: {xz,xw, yz, yw}

S ::= AB
A ::= x|y
B ::= z|w

S

AB
→ or →
Az Aw
Thus FOLLOW(A) = FIRST(B) = {w,z}
91
Example
strings in language: {xz,xw, yz, yw}

S ::= AB
A ::= x|y
B ::= z|w

FOLLOW(B) = ????

92
Example
strings in language:
{xz, xw, yz, yw}

S ::= AB
A ::= x|y
B ::= z|w

S

AB

FOLLOW(B) = {}

93
Example
strings in language:
{xz, xw, yz, yw}

S ::= AB
A ::= x|y
B ::= z|w

FOLLOW(x) = ????

94
Example
strings in language:
{xz, xw, yz, yw}

S ::= AB
A ::= x|y
B ::= z|w

Find x on r.h.s of production. It is rightmost symbol in production. Look at FOLLOW of left side.
Thus
FOLLOW(x) = FOLLOW(A) = {z,w}
95
Example
strings in language:
{xz, xw, yz, yw}

S ::= AB strings in language {xz,xw, yz, yw}
A ::= x|y
B ::= z|w

FOLLOW(A) = FIRST(B) = {z,w}
FOLLOW(B) = {}
FOLLOW(S) = FOLLOW(B) = {}
FOLLOW(x) = FOLLOW(A)
FOLLOW(y) = FOLLOW(A)
FOLLOW(z) = FOLLOW(B)
FOLLOW(w) = FOLLOW(B)
96
Example

S ::= Aw|B
A ::= xA | y
B ::= xB | z

FOLLOW(A) = ?????
97
Another Example

S ::= Aw|B
A ::= xA | y
B ::= xB | z

FOLLOW(A) = FIRST(w) = {w}

98
Another Example

S ::= Aw|B
A ::= xA | y
B ::= xB | z

FOLLOW(A) = FIRST(w) = {w}

99
Another Example

S ::= Aw|B
A ::= xA | y
B ::= xB | z

FOLLOW(x) = ????

100
Another Example

S ::= Aw|B
A ::= xA | y
B ::= xB | z

Find all occurrences in right hand sides of productions. x is followed by A in one and B in another, thus

FOLLOW(x) = FIRST(A) U FIRST(B) = {x,y,z}
101
Another Example

S ::= Aw|B
A ::= xA | y
B ::= xB | z

FOLLOW(A) = FIRST(w) = {w}
FOLLOW(B) = {}
FOLLOW(S) = {}
FOLLOW(x) = FIRST(A) U FIRST(B) = {x,y,z}
FOLLOW(y) = FOLLOW(A) = {w}
FOLLOW(z) = FOLLOW(B) = {}
102
summary

S ::= Ay Strings = {xy, y}
A ::= x | ε

FOLLOW(A) = {y}
FOLLOW(x) = FOLLOW(A) = {y}
FOLLOW(y) = {}
FOLLOW(S) = {}

103
Still another example

104
Recall this example that gave us trouble parsing

S ::= Ax
A ::= xA | ε

Sentences in the language x, xx, xxx, …

S x

Ax x

xAx x

Ax ε
FAIL

S x

Ax x

εx x

ε ε
SUCCESS

S ::= Ax
A ::= x | ε

FIRST(A) = { x}
FOLLOW(S) = { ε }
FOLLOW(A) = FIRST(x) = { x }
FOLLOW(x) = FOLLOW(A) = { x }

Additional condition:
for all non-terminals A:
if A →* ε, then FIRST(A) ∩ FOLLOW(A) = { }
105

105
See slide

Combine ideas to get the predict set of a production

Define

EPS(α) = (α →* ε) (this is either true for false)

PREDICT(A ::= α) = FIRST(α) U
(if EPS(α) then FOLLOW(A) else {})

The predict set tells us which production to choose.
If we see non-terminal a, and a in PREDICT(A ::= α), then choose this production
For the choice to be unambiguous, the predict sets with same left side should be unique.
106
Predict sets

The predict sets of all productions with the same left side are disjoint.

Necessary and sufficient for a grammar to be LL(1)
LL(1) means
can be parsed left-to-right (first L)
leftmost derivation (second L)
one symbol look ahead (the 1)
107
LL(1) rule

We use predict sets for
determining whether a grammar is LL(1)
constructing the parser

See Scott for an algorithm to mechanically compute predict sets.
Parser generators need to do this

108

Remarks
LL(k) grammars can be parsed with k symbols lookahead
LL grammars are parsed with top-down parsers

LR grammars are parsed with bottom-up parsers.

There are other classifications, if you are interested, see the Scott CS supplement

109

109
yacc is an example of a parser generator that takes LR grammars and generates bottom-up parsers written in c. javacc takes LL(k) grammars and generates top-down (recursive descent) parsers written in Java. antlr takes LL(*) grammars and generates parsers written in Java, C, C++, Python,…

e
ÎÞÎ

/docProps/thumbnail.jpeg