Parsing
Aditya Thakur
ECS 140A Programming Languages – Winter 2019
Top-down parsing
• Constructs parse tree for input string starting from the root • Finds the leftmost derivation for an input string
Context-free grammar for arithmetic expressions
! → # !$
!$ → +# !$ | ‘ # →(#$
#$ → ∗ ( #$ | ‘ ( → ! |*+
Can we derive the string *+ + *+ ∗ *+?
! → # !$
!$ → +# !$ | ‘ # →(#$
#$ → ∗ ( #$ | ‘ ( → ! |*+
!→#!$
$ $ !→+#!|’
$
$$
# → ∗ ( # | ‘
( → ! |*+
# →(#
!→#!$
$ $ !→+#!|’
$
$$
# → ∗ ( # | ‘
( → ! |*+
# →(#
Top-down parsing
• Constructs parse tree for input string starting from the root
• Finds the leftmost derivation for an input string
• Two types:
• Recursive descent parsing • Predictive parsing
Recursive-descent parsing
• Consists of a procedure for each nonterminal
• Execution begins with the procedure for the start symbol
Nondeterministic, might require backtracking
Typical procedure for nonterminal !
Predictive parser
• Top-down parser that can correctly choose !-production by looking at at the next ” symbols in the input
• No backtracking during parsing
LL(1) parser
• Predictive parser that only looks at the next input symbol • Derives the leftmost derivation
LL(1) parser
• Constructs parsing table using FIRST and FOLLOW sets of the
grammar
FIRST sets
• !”#$% & is the set of terminals that begin strings derived from the string of grammar symbols &
FIRST sets
• !”#$% & is the set of terminals that begin strings derived from the string of grammar symbols &
• If ‘ is a terminal, !”#$%(‘) = {‘}
• If ‘ is a nonterminal and ‘ → ./.0 … .2 is a production, then everything in !”#$% ./ is in !”#$% ‘ .
If ./ derives 4, then add !”#$% .0 to !”#$%(‘), and so on.
If 4 is in all !”#$% .5 then add 4 to !”#$%(‘) • If ‘ → 4 is a production, add 4 to !”#$% ‘ .
FIRST sets
• !”#$% & is the set of terminals that begin strings derived from the string of grammar symbols &
‘ → % ‘)
‘) → +% ‘) | , % →!%)
%) → ∗ ! %) | , ! → ‘ |./
FIRST sets
• !”#$% & is the set of terminals that begin strings derived from the string of grammar symbols &
‘ → % ‘)
‘) → +% ‘) | , % →!%)
%) → ∗ ! %) | , ! → ‘ |./
First
E
E’
T
T’
F
FIRST sets
• !”#$% & is the set of terminals that begin strings derived from the string of grammar symbols &
‘ → % ‘)
‘) → +% ‘) | , % →!%)
%) → ∗ ! %) | , ! → ‘ |./
First
E
E’
T
T’
F
( id
FIRST sets
• !”#$% & is the set of terminals that begin strings derived from the string of grammar symbols &
‘ → % ‘)
‘) → +% ‘) | , % →!%)
%) → ∗ ! %) | , ! → ‘ |./
First
E
E’
T
( id
T’
F
( id
FIRST sets
• !”#$% & is the set of terminals that begin strings derived from the string of grammar symbols &
‘ → % ‘)
‘) → +% ‘) | , % →!%)
%) → ∗ ! %) | , ! → ‘ |./
First
E
( id
E’
T
( id
T’
F
( id
FIRST sets
• !”#$% & is the set of terminals that begin strings derived from the string of grammar symbols &
‘ → % ‘)
‘) → +% ‘) | , % →!%)
%) → ∗ ! %) | , ! → ‘ |./
First
E
( id
E’
+,
T
( id
T’
F
( id
FIRST sets
• !”#$% & is the set of terminals that begin strings derived from the string of grammar symbols &
‘ → % ‘)
‘) → +% ‘) | , % →!%)
%) → ∗ ! %) | , ! → ‘ |./
First
E
( id
E’
+,
T
( id
T’
*,
F
( id
FOLLOW
• !”##”$(&) is the set of terminals that can appear immediately to the right of a nonterminal & in a derivation
FOLLOW
• !”##”$(&) is the set of terminals that can appear immediately to the right of a nonterminal & in a derivation
• $ is in !”##”$(() where ( is the start symbol and $ is an endmarker • If there is a production & → *+,,
then !-.(/(,)\1 is in !”##”$(+)
• If there is a production & → *+ or & → *+,, where !-.(/(,) contains 1, then everything in !”##”$(&) is in !”##”$(+).
FOLLOW
• !”##”$(&) is the set of terminals that can appear immediately to the right of a nonterminal & in a derivation
( → * (+
(+ → +* (+ | . * →!*+
*+ → ∗ ! *+ | . ! → ( |01
FOLLOW
E
)$
E’
)$
T
+)$
T’
+)$
F
+*)$
Parsing table
• Parsing table !: # ×% → ‘ where # is the set of nonterminals, % is the set of terminals, ‘ is the set of production rules
For each production ( → ): •Foreachterminal*in+,’-% ) ,MA,a = (→) •If34+,’-%())and74+8998:((),!(,7 =(→ )
! → # !$
! $ → + # ! $ | ‘
# →(#$
$ $
First
E
( id
E’
+’
T
( id
T’
*’
F
(id
FOLLOW
E
)$
E’
)$
T
+)$
T’
+)$
F
+*)$
#→∗(#|’
( → ! |*+
Table-driven predictive parser
Table-driven predictive
p
a
r
r
s
e
Deriving the string !” + !” ∗ !”
Parsing table
Moves made by the parser
Deriving the string !” + !” ∗ !”
Parsing table
Moves made by the parser
Deriving the string !” + !” ∗ !”
Parsing table
Moves made by the parser
Deriving the string !” + !” ∗ !”
Parsing table
Moves made by the parser
Deriving the string !” + !” ∗ !”
Parsing table
Moves made by the parser
Deriving the string !” + !” ∗ !”
Parsing table
Moves made by the parser
Deriving the string !” + !” ∗ !”
Parsing table
Moves made by the parser
Deriving the string !” + !” ∗ !”
Parsing table
Moves made by the parser
Deriving the string !” + !” ∗ !”
Parsing table
Moves made by the parser
Deriving the string !” + !” ∗ !”
Parsing table
Moves made by the parser
Deriving the string !” + !” ∗ !”
Parsing table
Moves made by the parser
Deriving the string !” + !” ∗ !”
Parsing table
Moves made by the parser
Deriving the string !” + !” ∗ !”
Parsing table
Moves made by the parser
Deriving the string !” + !” ∗ !”
Parsing table
Moves made by the parser
Deriving the string !” + !” ∗ !”
Parsing table
Moves made by the parser
Deriving the string !” + !” ∗ !”
Parsing table
Moves made by the parser
Deriving the string !” + !” ∗ !”
Parsing table
Moves made by the parser
Not all grammars are LL(1)
Grammar
Parsing table
Multiple rules possible in the same entry in the parsing table, because of ambiguity.
All LL(1) grammars are unambiguous.
Optional reading
Aho, A. V., Lam, M. S., Sethi, R., & Ullman, J. D. Compilers: principles, Techniques and tools.
Q: What type of parser is used in the Go compiler?