COMP90045 Programming Language Implementation
LL(1) Grammars and Languages
Harald Søndergaard Lecture 7
Semester 1, 2019
PLI (Sem 1, 2019) LL(1) Grammars and Languages ⃝c University of Melbourne 1 / 34
Types of Parsing Algorithms
Top-down parsing algorithms build parse trees from the top down (starting at the root of the parse tree, the start symbol), while bottom-up parsing algorithms build them from the bottom up (starting at the leaves of the parse tree, the tokens).
Some parsing algorithms build leftmost derivations; others build rightmost derivations.
Some parsing algorithm are fast, but work only on some grammars.
Others work on all grammars, but they tend to have high (cubic time) complexity, measured in the number of tokens of the input.
PLI (Sem 1, 2019) LL(1) Grammars and Languages ⃝c University of Melbourne 2 / 34
Common Parsing Methods
Some top-down parsing algorithms:
LL parsing, for LL grammars.
Recursive descent, for LL grammars.
Recursive descent with backtracking, for arbitrary grammars.
Some bottom-up parsing algorithms:
Shift-reduce parsing, for LR grammars. Operator-precedence parsing, for simple operator grammars.
CYK parsing (named for its designers, Cocke, Younger and Kasami), for so-called Chomsky Normal Form grammars, a form that an arbitrary grammar can be brought into.
PLI (Sem 1, 2019) LL(1) Grammars and Languages ⃝c University of Melbourne 3 / 34
Top-Down Parsing
Top-down parsing algorithms build a leftmost derivation
starting from the start symbol, and
using the input string to choose between productions.
Consider parsing id + (id ) starting with E . We must choose between starting the derivation with E ⇒ E + T or with E ⇒ T.
We can either
guess which one and backtrack if mistaken, or
insist that the grammar make it possible to choose the right production given the input string.
E→E+T E→T T→T∗F T→F F→(E) F → id
PLI (Sem 1, 2019) LL(1) Grammars and Languages ⃝c University of Melbourne 4 / 34
LL Grammars
In LL(k) grammars, the next k terminal symbols in the input uniquely determine which production we need to apply; no production other than the chosen one can yield a derivation for the input.
Deterministic top-down parsers (those that don’t backtrack) require the grammar to be LL(k) for some k. Usually k = 1.
An LL(1) grammar should avoid
left recursive productions (can’t tell how many times to apply it) productions that start the same way (can’t tell which to apply)
Note that mutual left recursion between two or more nonterminals (e.g., A → Bα and B → Aβ) is as bad as direct left recursion.
PLI (Sem 1, 2019) LL(1) Grammars and Languages ⃝c University of Melbourne 5 / 34
First Trick: Simple Left-Recursion Removal
Consider the grammar (assume each αi ̸= ǫ):
A → Aα1 |···|Aαn |β1 |···|βm
The first n productions of A are left-recursive.
We can eliminate the left-recursion by transforming the grammar to
A → β1A′|···|βmA′
A′ → α1A′ |···|αnA′ |ǫ
For example, the productions E → E + T | T become E → TE′
E′ →+TE′|ǫ
PLI (Sem 1, 2019) LL(1) Grammars and Languages ⃝c University of Melbourne 6 / 34
General Left-Recursion Removal
The technique we gave for removal of left-recursion only applies to individual non-terminals; it does not repair indirect left-recursion, as foundinthepairofrulesA → BαandB → Aβ.
A grammar is left-recursive if it has some nonterminal A for which A ⇒+ Aα.
For every grammar G there is an equivalent non-left-recursive grammar G′, and there is a (slightly tricky) algorithm to derive G′ from G.
PLI (Sem 1, 2019) LL(1) Grammars and Languages ⃝c University of Melbourne 7 / 34
Second Trick: Left Factoring
Given
(where α is the longest common prefix) form
A → αA′|γ1|···|γm A′ →β1|···|βn
For example, the grammar fragment on the left becomes the one on the right (it is still ambiguous).
A → αβ1 |···|αβn |γ1 |···|γm
S →iEtS|iEtSeS|o
S →iEtSS′|o S′ → ǫ|eS
PLI (Sem 1, 2019) LL(1) Grammars and Languages ⃝c University of Melbourne 8 / 34
Left Factoring
Left factoring does not necessarily remove “overlapping” rules, because overlaps can be hidden in the grammar.
Consider this (non-left-recursive) grammar:
S →aS|bT|Ta T→b|aS
The rule ‘S → T a’ overlaps with the other S rules, since T can generate strings starting with ‘a’, and also strings starting with ‘b’.
Also note that left-factoring can introduce ǫ rules, which make a top-down parser’s task more difficult.
PLI (Sem 1, 2019) LL(1) Grammars and Languages ⃝c University of Melbourne 9 / 34
nullable
The construction of an LL(1) parser requires knowing, given a string α, whether it is nullable, that is, whether α ⇒∗ ǫ.
Clearly, α ⇒∗ ǫ iff each symbol in α can derive ǫ, and a terminal can never derive ǫ.
We capture whether a nonterminal NT can derive ǫ by nullable(NT): for each nonterminal NT do
nullable(NT) ← FALSE
while some production A → α exists such that ¬nullable(A)
and all symbols in α (if any) are nullable nonterminals do
nullable(A) ← TRUE
PLI (Sem 1, 2019) LL(1) Grammars and Languages ⃝c University of Melbourne 10 / 34
FIRST and FOLLOW Sets for a Grammar
The construction of an LL(1) parser requires knowing the grammar’s FIRST and FOLLOW sets.
Roughly, FIRST sets tell the parser when a production with a nonempty right-hand side is applicable, and FOLLOW sets do the same for productions with empty right-hand sides.
PLI (Sem 1, 2019) LL(1) Grammars and Languages ⃝c University of Melbourne 11 / 34
FIRST Sets
If α is any string of grammar symbols, then FIRST(α) is the set of terminals that begin the strings you can derive from α.
If α = xβ and x is a terminal, then FIRST(α) = {x}.
If α = xβ and x is a nonterminal and ǫ cannot be derived from x,
then FIRST(α) = FIRST(x).
If α = xβ and x is a nonterminal and x is nullable (that is, x ⇒∗ ǫ),
then FIRST(α) = FIRST(x) ∪ FIRST(β).
PLI (Sem 1, 2019) LL(1) Grammars and Languages ⃝c University of Melbourne 12 / 34
Rules for FIRST Sets
Given a production X → Y1Y2 …Yk,
FIRST(Y1) is a subset ofFIRST(X).
If Y1 ⇒∗ ǫ, then FIRST(Y2 ) is a subset of FIRST(X ).
If Y1 Y2 ⇒∗ ǫ, then FIRST(Y3 ) is a subset of FIRST(X ).
…
If Y1Y2 …Yk−1 ⇒∗ ǫ, thenFIRST(Yk) is a subset ofFIRST(X).
We compute the FIRST sets for all nonterminals by initializing them to the empty set and then adding elements to them as needed to fix violations of these rules, until no more additions are needed.
The order in which the additions are performed doesn’t affect the final result. (Such iterations are sometimes called chaotic iterations).
PLI (Sem 1, 2019) LL(1) Grammars and Languages ⃝c University of Melbourne 13 / 34
FIRST Sets: An Example
production
nonterminal
nullable
FIRST
E → TE′
E
FALSE
id,(
E′ →+TE′
E′
TRUE
+
E′ →ǫ
T → FT′
T
FALSE
id,(
T′ →∗FT′
T′
TRUE
∗
T′ →ǫ
F → id
F
FALSE
id,(
F → (E)
PLI (Sem 1, 2019) LL(1) Grammars and Languages ⃝c University of Melbourne 14 / 34
FOLLOW Sets
For any nonterminal A, FOLLOW(A) is the set of terminals that can appear immediately to the right of A in some sentential form, that is, the set of terminals t such that there is a derivation of the form
S ⇒∗ α A t β .
If A can be the rightmost symbol in some sentential form, then $ is also in FOLLOW(A).
($ represents the end of the input; it can be considered an end-of-file token.)
PLI (Sem 1, 2019) LL(1) Grammars and Languages ⃝c University of Melbourne 15 / 34
Rules for FOLLOW Sets
If S is the start symbol, then $ is in FOLLOW(S). Given a production A → αBβ,
FIRST(β) is a subset of FOLLOW(B).
If β ⇒∗ ǫ, then FOLLOW(A) is a subset of FOLLOW(B).
We compute the FOLLOW sets for all nonterminals by initializing them to the empty set and then adding elements to them as needed to fix violations of these rules, until no more additions are necessary.
The order in which the additions are performed doesn’t affect the final result.
PLI (Sem 1, 2019) LL(1) Grammars and Languages ⃝c University of Melbourne 16 / 34
FOLLOW Sets: An Example
production
nonterminal
nullable
FIRST
FOLLOW
E → TE′
E
FALSE
id,(
$,)
E′ →+TE′
E′
TRUE
+
$,)
E′ →ǫ
T → FT′
T
FALSE
id,(
$,+,)
T′ →∗FT′
T′
TRUE
∗
$,+,)
T′ →ǫ
F → id
F
FALSE
id,(
$,+,∗,)
F → (E)
PLI (Sem 1, 2019) LL(1) Grammars and Languages ⃝c University of Melbourne 17 / 34
QUIZ: nullable, FIRST and FOLLOW
Define the nullable function and FIRST and FOLLOW sets for the grammar
S→BAa A → ǫ|BcA B→ǫ|b
PLI (Sem 1, 2019) LL(1) Grammars and Languages ⃝c University of Melbourne 18 / 34
QUIZ: nullable, FIRST and FOLLOW
Define the nullable function and FIRST and FOLLOW sets for the grammar
S→BAa A → ǫ|BcA B→ǫ|b
nullable FIRST FOLLOW
S FALSE A TRUE B TRUE
PLI (Sem 1, 2019)
LL(1) Grammars and Languages ⃝c University of Melbourne 18 / 34
QUIZ: nullable, FIRST and FOLLOW
Define the nullable function and FIRST and FOLLOW sets for the grammar
S→BAa A → ǫ|BcA B→ǫ|b
nullable FIRST FOLLOW
S FALSE
A TRUE
B TRUE b
PLI (Sem 1, 2019)
LL(1) Grammars and Languages ⃝c University of Melbourne 19 / 34
QUIZ: nullable, FIRST and FOLLOW
Define the nullable function and FIRST and FOLLOW sets for the grammar
S→BAa A → ǫ|BcA B→ǫ|b
nullable FIRST FOLLOW
S FALSE
A TRUE b, c B TRUE b
PLI (Sem 1, 2019)
LL(1) Grammars and Languages ⃝c University of Melbourne 20 / 34
QUIZ: nullable, FIRST and FOLLOW
Define the nullable function and FIRST and FOLLOW sets for the grammar
S→BAa A → ǫ|BcA B→ǫ|b
nullable FIRST FOLLOW
S FALSE a,b,c A TRUE b, c B TRUE b
PLI (Sem 1, 2019)
LL(1) Grammars and Languages ⃝c University of Melbourne 21 / 34
QUIZ: nullable, FIRST and FOLLOW
Define the nullable function and FIRST and FOLLOW sets for the grammar
S→BAa A → ǫ|BcA B→ǫ|b
nullable FIRST FOLLOW S FALSE a,b,c $
A TRUE b, c
B TRUE b
PLI (Sem 1, 2019)
LL(1) Grammars and Languages
⃝c University of Melbourne
22 / 34
QUIZ: nullable, FIRST and FOLLOW
Define the nullable function and FIRST and FOLLOW sets for the grammar
S→BAa A → ǫ|BcA B→ǫ|b
nullable FIRST FOLLOW S FALSE a,b,c $
A TRUE b,c a
B TRUE b
PLI (Sem 1, 2019)
LL(1) Grammars and Languages ⃝c University of Melbourne 23 / 34
QUIZ: nullable, FIRST and FOLLOW
Define the nullable function and FIRST and FOLLOW sets for the grammar
S→BAa A → ǫ|BcA B→ǫ|b
nullable FIRST FOLLOW S FALSE a,b,c $
A TRUE b,c a
B TRUE b a,b,c
PLI (Sem 1, 2019)
LL(1) Grammars and Languages ⃝c University of Melbourne 24 / 34
LOOK Sets for a Grammar
We define a lookahead set for each production in the grammar. If ǫ cannot be derived from α, then
LOOK(A → α) = FIRST(α).
If α is nullable, that is, α ⇒∗ ǫ, then
LOOK(A → α) = FIRST(α) ∪ FOLLOW(A).
If we expect to see something of syntactic category A in the input, and the next input symbol is in LOOK(A → α), then we can apply A → α as the first step in the derivation.
A grammar is LL(1) iff for each nonterminal, the LOOK sets of the productions for that nonterminal are disjoint.
PLI (Sem 1, 2019) LL(1) Grammars and Languages ⃝c University of Melbourne 25 / 34
LOOK Sets: An Example
Production
nullable
FIRST
FOLLOW
LOOK
E → TE′
FALSE
id,(
$,)
id,(
E′ →+TE′
TRUE
+
$,)
+
E′ →ǫ
$,)
T →FT′
FALSE
id,(
$,+,)
id,(
T′ →∗FT′
TRUE
∗
$,+,)
∗
T′ →ǫ
$,+,)
F → id
FALSE
id,(
$,+,∗,)
id
F → (E)
(
PLI (Sem 1, 2019) LL(1) Grammars and Languages ⃝c University of Melbourne 26 / 34
QUIZ: LOOK
Define LOOK sets for this grammar. Is it LL(1)?
S→BAa A → ǫ|BcA B→ǫ|b
FIRST FOLLOW LOOK S →B Aa a,b,c $
A→ǫ a A→BcAb,c a B→ǫ a,b,c B→b b a,b,c
PLI (Sem 1, 2019) LL(1) Grammars and Languages ⃝c University of Melbourne 27 / 34
QUIZ: LOOK
Define LOOK sets for this grammar. Is it LL(1)?
S→BAa A → ǫ|BcA B→ǫ|b
FIRST FOLLOW LOOK
S →B Aa a,b,c $ A→ǫ a A→BcAb,c a B→ǫ a,b,c B→b b a,b,c
a,b,c
PLI (Sem 1, 2019) LL(1) Grammars and Languages
⃝c University of Melbourne
28 / 34
QUIZ: LOOK
Define LOOK sets for this grammar. Is it LL(1)?
S→BAa A → ǫ|BcA B→ǫ|b
FIRST FOLLOW LOOK S →B Aa a,b,c $ a,b,c
A→ǫaa A→BcAb,c a
B→ǫ a,b,c B→b b a,b,c
PLI (Sem 1, 2019) LL(1) Grammars and Languages ⃝c University of Melbourne 29 / 34
QUIZ: LOOK
Define LOOK sets for this grammar. Is it LL(1)?
S→BAa A → ǫ|BcA B→ǫ|b
FIRST FOLLOW LOOK S →B Aa a,b,c $ a,b,c
A→ǫaa A→BcA b,c a b,c B→ǫ a,b,c
B→b b a,b,c
PLI (Sem 1, 2019) LL(1) Grammars and Languages ⃝c University of Melbourne 30 / 34
QUIZ: LOOK
Define LOOK sets for this grammar. Is it LL(1)?
S→BAa A → ǫ|BcA B→ǫ|b
FIRST FOLLOW LOOK S →B Aa a,b,c $ a,b,c
A→ǫaa A→BcA b,c a b,c B→ǫ a,b,c a,b,c B→b b a,b,c
PLI (Sem 1, 2019) LL(1) Grammars and Languages ⃝c University of Melbourne 31 / 34
QUIZ: LOOK
Define LOOK sets for this grammar. Is it LL(1)?
S→BAa A → ǫ|BcA B→ǫ|b
FIRST FOLLOW LOOK S →B Aa a,b,c $ a,b,c
A→ǫaa A→BcA b,c a b,c B→ǫ a,b,c a,b,c B →b b a,b,c b
PLI (Sem 1, 2019)
LL(1) Grammars and Languages ⃝c University of Melbourne 32 / 34
QUIZ: LOOK
Define LOOK sets for this grammar. Is it LL(1)? NO!
S→BAa A → ǫ|BcA B→ǫ|b
FIRST FOLLOW LOOK S → B A a a, b, c $ a, b, c
A→ǫaa A→BcA b,c a b,c B→ǫ a,b,c a,b,c B →b b a,b,c b
PLI (Sem 1, 2019)
LL(1) Grammars and Languages ⃝c University of Melbourne 33 / 34
Coming Up …
Tomorrow is about writing parsers in Haskell.
The project is up and running, so come and learn about recursive descent parsing and the use of parser combinators.
Next week we discuss a more systematic, table-driven approach to implementing LL(1) parsers.
If time, we will also start exploring the wonders of a bottom-up technique known as shift-reduce parsing. That kind of technique is used by parsers generated by many parser sophisticated generators, including yacc, bison and happy.
PLI (Sem 1, 2019) LL(1) Grammars and Languages ⃝c University of Melbourne 34 / 34