程序代写代做代考 AI C COMP90045 Programming Language Implementation

COMP90045 Programming Language Implementation
Shift-Reduce Parsing
Harald Søndergaard Lecture 10
Semester 1, 2019
PLI (Sem 1, 2019) Shift-Reduce Parsing ⃝c University of Melbourne 1 / 36

Bottom Up Parsers
Top down parsers try to predict, on the basis of the next token or next few tokens, which production to use to match a nonterminal.
This is not always possible, and even when possible, it often requires modifying the grammar (left recursion elimination, left factoring). It is often hard to attach actions to the modified grammar in a natural fashion.
Bottom up parsers “keep an open mind”. Instead of making a prediction and following it, they try to keep track of all possible ways in which the list of tokens seen so far can be extended into a valid string in the language of the grammar.
In effect, bottom up parsers try to construct derivations in reverse. Unlike LL(k) parsers, they construct rightmost derivations.
PLI (Sem 1, 2019) Shift-Reduce Parsing ⃝c University of Melbourne 2 / 36

Reductions: Derivations in Reverse
S→aTUe T→Tbc|b U→d
The sentence abbcde can be reduced to S: abbcde
aTbcde aTde aTUe
S
The marked symbol sequences are the handles.
S
⇒ aTUe
⇒ aTde
⇒ aTbcde
⇒ abbcde
The corresponding rightmost derivation.
PLI (Sem 1, 2019)
Shift-Reduce Parsing ⃝c University of Melbourne 3 / 36

Bottom-Up Parsing
E → E + E | E ∗ E | id | const
Sentential form
Reduction
id + id ∗ const E +id ∗const E + E ∗ const† E +E ∗E† E+E
E
E → id
E → id
E → const E→E∗E E→E+E
† These sentential forms have two handles (grammar is ambiguous).
PLI (Sem 1, 2019) Shift-Reduce Parsing ⃝c University of Melbourne 4 / 36

The Stack in LR Parsers
While a top-down parser has a stack containing the symbols that the parser expects to see next, a bottom-up parser has a stack containing a representation of the input seen so far. The representation may contain both terminals and nonterminals.
Start configuration: The parser starts with an empty stack and the input you want to parse, the string ω:
ω$ Stack Input
Goal: It aims to achieve an emptied input (containing just the end-of-input indicator), with a stack containing just the start symbol.
$ Stack Input
S
PLI (Sem 1, 2019) Shift-Reduce Parsing ⃝c University of Melbourne 5 / 36

LR Parsing: Shift and Reduce
The stack contains viable prefixes. If at any point in time, the stack contains α, then S ⇒∗ αβ for some string of terminals β.
If the input remaining at that time is γ, there is of course no guarantee that S ⇒∗ αγ; the input may not be a sentence.
Bottom up parsers have two main actions.
A shift action moves the first token in the remaining input to the top of the stack.
A reduce action is applicable when the top n symbols on the stack match the right hand side of a production; it replaces those symbols with the nonterminal on the left hand side of that production.
PLI (Sem 1, 2019) Shift-Reduce Parsing ⃝c University of Melbourne 6 / 36

LR Parsing Example
Stack
Input
Action
(id ∗ id)$
shift
(
id ∗ id)$
shift
(id
∗id )$
reduce E → id
(E
∗id )$
shift
(E∗
id )$
shift
(E ∗ id
)$
reduce E → id
(E ∗ E
)$
reduce E → E ∗ E
(E
)$
shift
(E)
$
reduce E → (E)
E
$
accept
E→E+E |E∗E
| (E) | id
E ⇒ (E) ⇒ (E ∗ E) ⇒ (E ∗ id) ⇒ (id ∗ id)
PLI (Sem 1, 2019)
Shift-Reduce Parsing ⃝c University of Melbourne 7 / 36

LR Parsing
The trick in LR parsing is knowing, at each step, whether to shift the current input symbol or to perform a reduction, and if the stack allows more than one reduction, which one to perform.
Given the following grammar fragment, each production can be used in a reduction if the stack has b then a as its topmost elements.
T → ab U→b V→ǫ
To guide the selection of the next action, LR parsers use a state machine constructed from the grammar, and they interleave state numbers with grammar symbols on the stack.
PLI (Sem 1, 2019) Shift-Reduce Parsing ⃝c University of Melbourne 8 / 36

The LR Parser
Stack
Input
$
sm
Xm
sm−1
LR parsing program
Output
Xm−1
s0
Action table + Goto table
The action table says, for each state/input symbol combination, which action to take. The goto table is auxiliary.
PLI (Sem 1, 2019) Shift-Reduce Parsing ⃝c University of Melbourne 9 / 36

LR Parser Actions
The configuration in which the stack contains s0X1s1 …Xmsm and the rest of the input contains ai ai +1 . . . an $ represents the right sentential form X1 …Xmaiai+1 …an.
action[sm,ai] = shift s: push ai and s on stack. (Transition to the configuration with stack s0X1s1 …Xmsmais and input ai+1 …an$.)
action[sm , ai ] = reduce A → α: pop r symbols and r states from stack and push A and s on the stack, where r is the length of α and s = goto[sm−r , A]. (Transition to the configuration with stack s0X1s1 …Xm−rsm−rAs and input aiai+1 …an$.)
action[sm , ai ] = accept: action[sm , ai ] = error:
finish parsing (success). start error recovery.
PLI (Sem 1, 2019)
Shift-Reduce Parsing ⃝c University of Melbourne 10 / 36

LR Parsing Example
A grammar for Prolog programs P.
Uses clauses C and bodies B.
Action : Goto : agec$PCB
s3 s3
s8 r4
r3 s8
r2 acc r2
s5 s6
r1
r4 r3
s9
r6 s10
r5
12 42
7
11
(1) P → C P
(2) P → ǫ
(3) C → agBe (4) C → ae
(5) B → acB (6) B → a
0 1 2 3 4 5 6 7 8 9
10 11
PLI (Sem 1, 2019)
Shift-Reduce Parsing
⃝c University of Melbourne
11 / 36

QUIZ: LR parsing
Use the parse table on the previous slide to parse the token stream a g a c a e.
This sequence of tokens could have come from a clause such as
p :- q, r.
PLI (Sem 1, 2019) Shift-Reduce Parsing ⃝c University of Melbourne 12 / 36

QUIZ: LR parsing
STACK
INPUT
ACTION
0
agacae$
PLI (Sem 1, 2019) Shift-Reduce Parsing ⃝c University of Melbourne 13 / 36

QUIZ: LR parsing
STACK
INPUT
ACTION
0
agacae$
shift 3
0a3
gacae$
PLI (Sem 1, 2019) Shift-Reduce Parsing ⃝c University of Melbourne 14 / 36

QUIZ: LR parsing
STACK
INPUT
ACTION
0
agacae$
shift 3
0a3
gacae$
shift 5
0a3g 5
acae$
PLI (Sem 1, 2019) Shift-Reduce Parsing ⃝c University of Melbourne 15 / 36

QUIZ: LR parsing
STACK
INPUT
ACTION
0
agacae$
shift 3
0a3
gacae$
shift 5
0a3g 5
acae$
shift 8
0a3g 5a8
cae$
PLI (Sem 1, 2019) Shift-Reduce Parsing ⃝c University of Melbourne 16 / 36

QUIZ: LR parsing
STACK
INPUT
ACTION
0
agacae$
shift 3
0a3
gacae$
shift 5
0a3g 5
acae$
shift 8
0a3g 5a8
cae$
shift 10
0a3g 5a8c 10
ae$
shift 8
PLI (Sem 1, 2019) Shift-Reduce Parsing ⃝c University of Melbourne 17 / 36

QUIZ: LR parsing
STACK
INPUT
ACTION
0
agacae$
shift 3
0a3
gacae$
shift 5
0a3g 5
acae$
shift 8
0a3g 5a8
cae$
shift 10
0a3g 5a8c 10
ae$
shift 8
0a3g 5a8c 10a8
e$
PLI (Sem 1, 2019) Shift-Reduce Parsing ⃝c University of Melbourne 18 / 36

QUIZ: LR parsing
STACK
INPUT
ACTION
0
agacae$
shift 3
0a3
gacae$
shift 5
0a3g 5
acae$
shift 8
0a3g 5a8
cae$
shift 10
0a3g 5a8c 10
ae$
shift 8
0a3g 5a8c 10a8
e$
reduce B → a
0a3g 5a8c 10B 11
e$
reduce B → a c B
0a3g5B7
e$
PLI (Sem 1, 2019) Shift-Reduce Parsing ⃝c University of Melbourne 20 / 36

QUIZ: LR parsing
STACK
INPUT
ACTION
0
agacae$
shift 3
0a3
gacae$
shift 5
0a3g 5
acae$
shift 8
0a3g 5a8
cae$
shift 10
0a3g 5a8c 10
ae$
shift 8
0a3g 5a8c 10a8
e$
reduce B → a
0a3g 5a8c 10B 11
e$
reduce B → a c B
0a3g5B7
e$
shift 9
0a3g5B7e9
$
PLI (Sem 1, 2019) Shift-Reduce Parsing ⃝c University of Melbourne 21 / 36

QUIZ: LR parsing
STACK
INPUT
ACTION
0
agacae$
shift 3
0a3
gacae$
shift 5
0a3g 5
acae$
shift 8
0a3g 5a8
cae$
shift 10
0a3g 5a8c 10
ae$
shift 8
0a3g 5a8c 10a8
e$
reduce B → a
0a3g 5a8c 10B 11
e$
reduce B → a c B
0a3g5B7
e$
shift 9
0a3g5B7e9
$
reduce C → a g B e
0C2
$
PLI (Sem 1, 2019) Shift-Reduce Parsing ⃝c University of Melbourne 22 / 36

QUIZ: LR parsing
STACK
INPUT
ACTION
0
agacae$
shift 3
0a3
gacae$
shift 5
0a3g 5
acae$
shift 8
0a3g 5a8
cae$
shift 10
0a3g 5a8c 10
ae$
shift 8
0a3g 5a8c 10a8
e$
reduce B → a
0a3g 5a8c 10B 11
e$
reduce B → a c B
0a3g5B7
e$
shift 9
0a3g5B7e9
$
reduce C → a g B e
0C2
$
reduce P → ǫ
0C2P4
$
PLI (Sem 1, 2019) Shift-Reduce Parsing ⃝c University of Melbourne 23 / 36

QUIZ: LR parsing
STACK
INPUT
ACTION
0
agacae$
shift 3
0a3
gacae$
shift 5
0a3g 5
acae$
shift 8
0a3g 5a8
cae$
shift 10
0a3g 5a8c 10
ae$
shift 8
0a3g 5a8c 10a8
e$
reduce B → a
0a3g 5a8c 10B 11
e$
reduce B → a c B
0a3g5B7
e$
shift 9
0a3g5B7e9
$
reduce C → a g B e
0C2
$
reduce P → ǫ
0C2P4
$
reduce P → C P
0P1
$
PLI (Sem 1, 2019) Shift-Reduce Parsing ⃝c University of Melbourne 24 / 36

QUIZ: LR parsing
STACK
INPUT
ACTION
0
agacae$
shift 3
0a3
gacae$
shift 5
0a3g 5
acae$
shift 8
0a3g 5a8
cae$
shift 10
0a3g 5a8c 10
ae$
shift 8
0a3g 5a8c 10a8
e$
reduce B → a
0a3g 5a8c 10B 11
e$
reduce B → a c B
0a3g5B7
e$
shift 9
0a3g5B7e9
$
reduce C → a g B e
0C2
$
reduce P → ǫ
0C2P4
$
reduce P → C P
0P1
$
accept
PLI (Sem 1, 2019) Shift-Reduce Parsing ⃝c University of Melbourne 25 / 36

LR(k) Items
Each state of the state machine used by an LR parser corresponds to a set of so-called LR(k) items.
An LR(0) item has the form A → α•β, where A → α β is a production of the grammar.
This item indicates that so far we have seen a string matching α, and if we next see β, we can reduce αβ to A.
An LR(k) item is an LR(0) item plus k terminal symbols of lookahead. It indicates that so far we have seen a string matching α, and if we next see β followed by the lookahead terminal symbols, we can reduce αβ to A.
PLI (Sem 1, 2019) Shift-Reduce Parsing ⃝c University of Melbourne 26 / 36

LR(0) Items Example
From the grammar rule C LR(0) items:
C C C C C
→ a g B e we can construct these five
→ •agBe → a•gBe →ag•Be →agB•e →agBe•
From the grammar rule P → ǫ we can construct only one LR(0) item: P→•
PLI (Sem 1, 2019) Shift-Reduce Parsing ⃝c University of Melbourne 27 / 36

Closure of LR(0) Items
This is the workhorse used to build the parser’s state machine.
function closure(I) S←I
while some item of form A → α•Bγ ∈ S is unmarked do mark A → α•Bγ
for each production B → δ do
if B → •δ ̸∈ S then
S ← S ∪ {B → •δ}
return S
PLI (Sem 1, 2019) Shift-Reduce Parsing ⃝c University of Melbourne 28 / 36

Closure Example
P→CP|ǫ
C →agBe|ae B→acB|a
IfI={C →ag•Be}then
 C → a g • B e  closure(I)= B →•acB
B→•a 
PLI (Sem 1, 2019) Shift-Reduce Parsing ⃝c University of Melbourne 29 / 36

Goto of LR(0) Items
Given a set I of LR(0) items and a grammar symbol X (a terminal or a nonterminal symbol), we define goto(I,X) as the closure of the set {A → αX•β |A → α•Xβ∈I}.
If X doesn’t occur just after the dot in I, this will be the empty set. If I = { P → • , P → • C P } then
 P → C • P  
P→•CP  goto(I,C) =  P → • 
  C→•agBe
 C → • a e 
PLI (Sem 1, 2019) Shift-Reduce Parsing ⃝c University of Melbourne 30 / 36

Sets-of-Items Construction
We augment the grammar with a new start symbol S′, with one production: S′ → S, where S is the old start symbol. A reduction using this production signals acceptance of the input.
States ← {closure({S′ → •S})}
while some fs ∈ States is unmarked do
mark fs
for each sym ∈ (ΣT ∪ {$} ∪ΣNT) do
ts ← goto(fs, sym)
if ts is not empty then if ts ̸∈ States then
States ← States ∪ {ts}
add arc labelled sym from state fs to state ts
PLI (Sem 1, 2019) Shift-Reduce Parsing ⃝c University of Melbourne 31 / 36

Partial LR(0) State Machine
0 P′ →•P P→•CP
P→•
C → •agBe C→•ae
P
2P→C•P P→•CP
P→• C→•agBe C→•ae
C
a
C
a
3C→a•gBe C→a•e
These are four of the states of the LR(0) state machine for our example grammar.
PLI (Sem 1, 2019) Shift-Reduce Parsing ⃝c University of Melbourne 32 / 36
1 P′ → P •

Complete LR(0) State Machine
0 P′ → •P P→•CP
P→•
C → •agBe C→•ae
P C
aa
C
2P→C•P P→•CP
P→•
C → •agBe C→•ae
P
g
e
B
e
a
B
ac
PLI (Sem 1, 2019)
Shift-Reduce Parsing
⃝c University of Melbourne
33 / 36
1 P′ →P•
3C→a•gBe C→a•e
5C→ag•Be B→•acB
B→•a
4P→CP•
6C→ae•
8B→a•cB B→a•
7C→agB•e
9C→agBe•
10 B →ac•B B→•acB
11 B →acB•
B→•a

Actions for Items
Items can be classified into two groups based on whether or not they have some grammar symbols after the dot.
An item such as E → E • + T indicates that the string we have seen so far matches E, and suggests that if we now see a +, we should shift that symbol.
An item such as E → E + T • suggests that we should reduce that production.
Most state machines created with the sets-of-items construction will contain at least one state with two items that suggest different actions. This is such a state:
B→a•cB B→a•
PLI (Sem 1, 2019) Shift-Reduce Parsing ⃝c University of Melbourne 34 / 36

Dealing with Conflicts
In a backtracking parser, it would be OK to say “when you are in this state and the next token is this symbol, try this action first, and if that doesn’t work, try this other action”.
A deterministic parser must pick exactly one action to follow in each possible circumstance. Having two actions applicable in the same circumstance would be a conflict.
Different classes of LR parsers use different techniques to try to avoid conflicts.
These techniques differ mostly in how they restrict the set of lookahead tokens in whose presence they perform reductions.
PLI (Sem 1, 2019) Shift-Reduce Parsing ⃝c University of Melbourne 35 / 36

Coming Up
In the next lecture we discuss so-called SLR and LR(1) parsers. We’ll see how they handle conflicts.
We also discuss how parser generators deal with specifications of associativity and precedence.
PLI (Sem 1, 2019) Shift-Reduce Parsing ⃝c University of Melbourne 36 / 36

Related Posts