Parsing
Aditya Thakur
ECS 140A Programming Languages – Winter 2019
Top-down parsing
• Constructs parse tree for input string starting from the root • Finds the leftmost derivation for an input string
Context-free grammar for arithmetic expressions
! → # !$
!$ → +# !$ | ‘ # →(#$
#$ → ∗ ( #$ | ‘ ( → ! |*+
Can we derive the string *+ + *+ ∗ *+?
! → # !$
!$ → +# !$ | ‘ # →(#$
#$ → ∗ ( #$ | ‘ ( → ! |*+
!→#!$
$ $ !→+#!|’
$
$$
# → ∗ ( # | ‘
( → ! |*+
# →(#
!→#!$
$ $ !→+#!|’
$
$$
# → ∗ ( # | ‘
( → ! |*+
# →(#
Top-down parsing
• Constructs parse tree for input string starting from the root
• Finds the leftmost derivation for an input string
• Two types:
• Recursive descent parsing • Predictive parsing
Recursive-descent parsing
• Consists of a procedure for each nonterminal
• Execution begins with the procedure for the start symbol
Nondeterministic, might require backtracking
Typical procedure for nonterminal !
Predictive parser
• Top-down parser that can correctly choose !-production by looking at at the next ” symbols in the input
• No backtracking during parsing
LL(1) parser
• Predictive parser that only looks at the next input symbol • Derives the leftmost derivation
LL(1) parser
• Constructs parsing table using FIRST and FOLLOW sets of the
grammar
FIRST sets
• !”#$% & is the set of terminals that begin strings derived from the string of grammar symbols &
FIRST sets
• !”#$% & is the set of terminals that begin strings derived from the string of grammar symbols &
• If ‘ is a terminal, !”#$%(‘) = {‘}
• If ‘ is a nonterminal and ‘ → ./.0 … .2 is a production, then everything in !”#$% ./ is in !”#$% ‘ .
If ./ derives 4, then add !”#$% .0 to !”#$%(‘), and so on.
If 4 is in all !”#$% .5 then add 4 to !”#$%(‘) • If ‘ → 4 is a production, add 4 to !”#$% ‘ .
FIRST sets
• !”#$% & is the set of terminals that begin strings derived from the string of grammar symbols &
‘ → % ‘)
‘) → +% ‘) | , % →!%)
%) → ∗ ! %) | , ! → ‘ |./
FIRST sets
• !”#$% & is the set of terminals that begin strings derived from the string of grammar symbols &
‘ → % ‘)
‘) → +% ‘) | , % →!%)
%) → ∗ ! %) | , ! → ‘ |./
First
E
E’
T
T’
F
FIRST sets
• !”#$% & is the set of terminals that begin strings derived from the string of grammar symbols &
‘ → % ‘)
‘) → +% ‘) | , % →!%)
%) → ∗ ! %) | , ! → ‘ |./
First
E
E’
T
T’
F
( id
FIRST sets
• !”#$% & is the set of terminals that begin strings derived from the string of grammar symbols &
‘ → % ‘)
‘) → +% ‘) | , % →!%)
%) → ∗ ! %) | , ! → ‘ |./
First
E
E’
T
( id
T’
F
( id
FIRST sets
• !”#$% & is the set of terminals that begin strings derived from the string of grammar symbols &
‘ → % ‘)
‘) → +% ‘) | , % →!%)
%) → ∗ ! %) | , ! → ‘ |./
First
E
( id
E’
T
( id
T’
F
( id
FIRST sets
• !”#$% & is the set of terminals that begin strings derived from the string of grammar symbols &
‘ → % ‘)
‘) → +% ‘) | , % →!%)
%) → ∗ ! %) | , ! → ‘ |./
First
E
( id
E’
+,
T
( id
T’
F
( id
FIRST sets
• !”#$% & is the set of terminals that begin strings derived from the string of grammar symbols &
‘ → % ‘)
‘) → +% ‘) | , % →!%)
%) → ∗ ! %) | , ! → ‘ |./
First
E
( id
E’
+,
T
( id
T’
*,
F
( id
FOLLOW
• !”##”$(&) is the set of terminals that can appear immediately to the right of a nonterminal & in a derivation
FOLLOW
• !”##”$(&) is the set of terminals that can appear immediately to the right of a nonterminal & in a derivation
• $ is in !”##”$(() where ( is the start symbol and $ is an endmarker • If there is a production & → *+,,
then !-.(/(,)\1 is in !”##”$(+)
• If there is a production & → *+ or & → *+,, where !-.(/(,) contains 1, then everything in !”##”$(&) is in !”##”$(+).
FOLLOW
• !”##”$(&) is the set of terminals that can appear immediately to the right of a nonterminal & in a derivation
( → * (+
(+ → +* (+ | . * →!*+
*+ → ∗ ! *+ | . ! → ( |01
FOLLOW
E
)$
E’
)$
T
+)$
T’
+)$
F
+*)$
Parsing table
• Parsing table !: # ×% → ‘ where # is the set of nonterminals, % is the set of terminals, ‘ is the set of production rules
For each production ( → ): •Foreachterminal*in+,’-% ) ,MA,a = (→) •If34+,’-%())and74+8998:((),!(,7 =(→ )
! → # !$
! $ → + # ! $ | ‘