程序代写代做代考 algorithm C COMP90045 Programming Language Implementation

COMP90045 Programming Language Implementation
SLR and LR(1) Parsers
Harald Søndergaard Lecture 11
Semester 1, 2019
PLI (Sem 1, 2019) SLR and LR(1) Parsers ⃝c University of Melbourne 1 / 27

Let Us Recapitulate
A shift-reduce parser maintains a stack of “viable prefixes”. If the stack contains the sequence of (terminal and nonterminal) symbols α then, for the right sequence β of remaining terminals, S ⇒∗ αβ.
Input
$
Stack
sm
Xm
sm−1
LR parsing program
Output
Xm−1
s0
Action table + Goto table
PLI (Sem 1, 2019) SLR and LR(1) Parsers ⃝c University of Melbourne 2 / 27

Shift/Reduce Parsing
P→CP|ǫ
C → agBe|ae B→acB|a
A parser for the grammar
could, while processing input a g a c a c a e, find itself with the stack looking as shown on the right.
Say it has consumed all the input tokens apart from e.
Its current state is 11; its most recent action was to reduce
something on the top of its stack to B.
If it now makes a reduction that pops the three top elements
off the stack, it will temporarily be back to state 5.
Its action and goto tables tell it what to do next, given its
current state and the next input token.
11
B
10
c
8
a
5
g
3
a
0
PLI (Sem 1, 2019) SLR and LR(1) Parsers ⃝c University of Melbourne 3 / 27

Shift/Reduce Parsing
Its main types of actions are to
shift the next input token onto the stack and change state as
dictated by the action table, or
reduce the top n stack symbols to some non-terminal (if some grammar production justifies that) and change state as dictated by the goto table.
Its remaining possible actions are to accept or to report an error. Ideally the parser is deterministic; it should never have a choice of
which action to perform next.
PLI (Sem 1, 2019) SLR and LR(1) Parsers ⃝c University of Melbourne 4 / 27

Complete LR(0) State Machine
0 P′ → •P P→•CP
P→•
C → •agBe C→•ae
P C
aa
C
1 P′ →P•
2P→C•P P→•CP
P→•
C → •agBe C→•ae
3C→a•gBe C→a•e
P
g
e
a
B
e
ac
B
PLI (Sem 1, 2019)
SLR and LR(1) Parsers
⃝c University of Melbourne
5 / 27
4P→CP•
5C→ag•Be B→•acB
B→•a
6C→ae•
8B→a•cB B→a•
7C→agB•e
9C→agBe•
10 B →ac•B B→•acB
11 B →acB•
B→•a

SLR Parsers
Simple LR parsers (SLR parsers for short, also called SLR(1) parsers) reduce the production A → α in a state containing the item A → α • only if the next token is in FOLLOW(A).
B→a•cB B→a•
For our example grammar, FOLLOW(B) is {e}, so there is no conflict: if the input token is c, shift
if the input token is e, reduce
PLI (Sem 1, 2019) SLR and LR(1) Parsers ⃝c University of Melbourne 6 / 27

Constructing SLR Tables
We use the sets-of-items construction to compute {I1, . . . In}, the collection of sets of LR(0) items. Then, for each set Ii :
1 If A → α • t β is in Ii, t is a terminal, and goto(Ii,t) = Ij, setaction[i,t] to ‘shift j’.
2 IfA→α•isinIi whereAisnotS′,setaction[i,t]to ‘reduce A → α’ for all t in FOLLOW(A).
3 If S′ → S • is in Ii, setaction[i,$] to ‘accept’.
For all nonterminals A, if goto (Ii , A) = Ij , set goto[i , A] to j .
All remaining entries are set to ‘error’.
PLI (Sem 1, 2019) SLR and LR(1) Parsers ⃝c University of Melbourne 7 / 27

Conflicts in Tables
If this algorithm doesn’t try to assign two different actions to the same slot in the action table, then the grammar is SLR. If it does, then the grammar is not SLR, and an SLR parser generator will fail to generate a parser.
There are two ways in which the algorithm may assign more than one action to a table entry:
A shift/reduce conflict: one item in a set suggests a shift action, another suggests a reduce action.
A reduce/reduce conflict: different items suggest reductions using different productions.
Note that two shift actions can never conflict with each other.
PLI (Sem 1, 2019) SLR and LR(1) Parsers ⃝c University of Melbourne 8 / 27

Limitations of SLR
Many standard programming language constructs do not have SLR grammars, or at least not natural ones. The grammar on the right is for a small subset of C.
(0) S′ (1) S (2) S (3) L (4) L (5) R
→ S
→ L=R → R
→ ∗R → id → L
One state contains the following items. It has a conflict when the next token is =, because that token is in FOLLOW(R).
S→L•=R R→L•
PLI (Sem 1, 2019) SLR and LR(1) Parsers ⃝c University of Melbourne 9 / 27

Limitations of SLR
Assume we are parsing id = id, and the next token is =.
In that state, the shift action is the only correct choice; reducing L to
R gets us nowhere.
The item R → L • came via
S′ ⇒ S ⇒ R ⇒ L
and in this context, each of these can only be “followed” by $.
If we made more of an effort to remember the actual right context, we would know that the = in FOLLOW(R) is irrelevant here.
PLI (Sem 1, 2019) SLR and LR(1) Parsers ⃝c University of Melbourne 10 / 27

LR(1) Items
We can incorporate the extra right-context information into items. An LR(1) item has the form
[A → α • β,t] where t is a terminal or ‘$’.
An LR(1) item [A → α •,t] calls for the reduction A → α only if the next input symbol is t.
Formally, an LR(1) item [A → α•β, a] is valid for a viable prefix γ if there is a rightmost derivation S ⇒∗ δAω ⇒ δαβω where γ = δα and a is the first symbol of ω$.
PLI (Sem 1, 2019) SLR and LR(1) Parsers ⃝c University of Melbourne 11 / 27

Sets-of-Items Construction for LR(1)
It is not hard to adapt the sets-of-items construction algorithm, which we originally defined on LR(0) items, to work on LR(1) items. (Notation: if a set of LR(1) items contains both [A → α•β,c] and [A → α•β,d] we usually show this as [A → α•β,cd]).
The starting state is the closure of the item [S′ → •S,$].
The definitions of closure and goto for LR(1) items are almost
identical to their definitions for LR(0) items. goto(I,X) is the closure of the LR(1) items
{[A → αX•β,c] | [A → α•Xβ,c] ∈ I}
The only tricky bit is figuring out the right-context tokens for the
items added by the closure operation.
PLI (Sem 1, 2019) SLR and LR(1) Parsers ⃝c University of Melbourne 12 / 27

Closure of LR(1) Items
function closure(I) S←I
while some item of form [A → α•Bγ, c] ∈ S is unmarked do mark [A → α•Bγ,c]
for each production B → δ do
for each f ∈ FIRST(γc) do if [B → •δ, f ] ̸∈ S then
S ← S ∪ {[B → •δ, f ]}
return S
PLI (Sem 1, 2019) SLR and LR(1) Parsers ⃝c University of Melbourne 13 / 27

C Fragment Example Revisited
0 [S′ → •S ,$] [S → •L=R ,$] [S→•R ,$] [L→•∗R ,=] [L → •id ,=] [R → •L ,$] [L→•∗R ,$] [L → •id ,$]
1 [S′ →S•,$]
S
⇐ The ‘=’ goes here owing to the second item: it is in FIRST(= R)
L
2 [S → L•=R ,$] [R → L• ,$]
⇐ No ‘=’ goes here: the last item came from [R → •L, $]
PLI (Sem 1, 2019)
SLR and LR(1) Parsers ⃝c University of Melbourne 14 / 27

Constructing LR(1) Tables
We use the sets-of-items construction to compute {I1, . . . In}, the collection of sets of LR(1) items. Then, for each set Ii :
1 If[A→α•t β,u]isinIi,t isaterminal,andgoto(Ii,t)=Ij, setaction[i,t] to ‘shift j’.
2 If [A → α •,u] is in Ii where A is not S′, setaction[i,u] to ‘reduce A → α’.
3 If[S′ →S •,$]isinIi,setaction[i,$]to‘accept’.
For all nonterminals A, if goto (Ii , A) = Ij , set goto[i , A] to j .
All remaining entries are set to ‘error’.
PLI (Sem 1, 2019) SLR and LR(1) Parsers ⃝c University of Melbourne 15 / 27

Partial LR(1) Machine for Prolog
P C
1S→P•$
0S→•P$ P→•CP$ P→•$ C→•agBea$ C→•aea$
2P→C•P$ P→•CP$ P→•$ C→•agBea$ C→•aea$
a
a ge
P
3 C →a•gBea$ C→a•ea$
4P→CP•$
5 C →ag•Bea$ B →•acB e B→•ae
6C→ae•a$
PLI (Sem 1, 2019) SLR and LR(1) Parsers ⃝c University of Melbourne 16 / 27
C

Complete LR(1) Machine for Prolog
0S→•P$ P→•CP$ P→•$ C→•agBea$ C→•aea$
3 C →a•gBea$ C→a•ea$
5 C →ag•Bea$ B →•acB e B→•ae
P C
aa
C
1S→P•$
2P→C•P$ P→•CP$ P→•$ C→•agBea$ C→•aea$
g
a
ac
P
e
4P→CP•$
6C→ae•a$
B
8B→a•cBe B→a•e
10 B →ac•Be B →•acBe B→•ae
11 B →acB•e
7 C →agB•ea$
e
9C→agBe•a$
B
PLI (Sem 1, 2019)
SLR and LR(1) Parsers
⃝c University of Melbourne
17 / 27

LR(1) Parser: From State Machine to Tables
The action and goto tables derived from this state machine are exactly the ones we used in Lecture 10:
agec$PCB 0
1 2 3 4 5 6 7 8 9
10 11
s3 s3
s8 r4
r3 s8
r2 acc r2
s5 s6
r1
r4 r3
s9
r6 s10
r5
12 42
7
11
PLI (Sem 1, 2019)
SLR and LR(1) Parsers
⃝c University of Melbourne
18 / 27

Compacting LR Tables
It is not unusual to have (at least) 300 states with 100 terminals = 30,000 entries. Tables that big were infeasible until the early 80s, and even today, they are bad for locality.
action table rows are often identical, so replace rows by pointers to rows.
They are also sparse, so represent them as association lists: (terminal symbol, action), . . . , (default, action).
The default action can replace an error by a reduction without compromising correctness; the error will be discovered later.
goto table error entries are never consulted, so represent goto tables as an association list for each nonterminal, maybe with a default:
(current state, next state), . . . , (default, next state).
PLI (Sem 1, 2019) SLR and LR(1) Parsers ⃝c University of Melbourne 19 / 27

Example of Compaction
P (0,1) (any,4) C (any,2)
B (5, 7) (any , 11)
agec$PCB 00 (a,s3) (any,r2) 11 ($, acc ) (any , error ) 22
3 3 (g,s5) (e,s6) (any,error)
4 4 (any,r1)
5 5 (a,s8) (any,error)
6 6 (any,r4)
7 7 (e,s9) (any,error)
8 8 (c,s10) (any,r6)
s3 r2
1 4
2 2
7
11
acc
s3 r2
s5 s6
r1
s8
r4 r4
s9
r6 s10
r3 r3
s8
r5
9 9
10
11 11
(any,r3) (any,r5)
10
PLI (Sem 1, 2019) SLR and LR(1) Parsers ⃝c University of Melbourne 20 / 27

Left Recursion
While LL parsers cannot handle left recursion, LR parsers actually perform better with left recursion than with right recursion.
A→tA|ǫ B→Bu|ǫ
Given an input containing n t tokens, the parser stack will need to keep all n ts on the stack at the same time before being able to reduce, using production A → ǫ.
Given an input containing n u tokens, the parser can reduce the production B → ǫ at the start; it can then reduce the B → B u production after each shift of a u, using only constant stack size.
PLI (Sem 1, 2019) SLR and LR(1) Parsers ⃝c University of Melbourne 21 / 27

Using Ambiguous Grammars
Ambiguous grammars (such as the one on the left below) always create conflicts in LR parsers, but they may be shorter and/or more natural than unambiguous grammars for the same language (such as the one on the right).

E→E+E E→E∗E E → (E)
E → id
E→E+T E→T T→T∗F T → F
F → (E) F → id
PLI (Sem 1, 2019)
SLR and LR(1) Parsers ⃝c University of Melbourne 22 / 27

Ambiguity: Precedence and Associativity
The parser for the natural grammar has a state with these items:
E→E+E• E→E•+E E→E•∗E
This state has a shift/reduce conflict on both ∗ and +.
If so far we have seen id + id, then if the next token is ∗, we should shift because ∗ has higher precedence than +, while if the next token is +, we should reduce because + is left-associative.
We can often make such choices to eliminate shift/reduce conflicts. That is why parser generators such as yacc, bison and happy allow the specification of precedence and associativity.
PLI (Sem 1, 2019) SLR and LR(1) Parsers ⃝c University of Melbourne 23 / 27

Using Precedence and Associativity
%token NUM /* no precedence */
%left PLUS MINUS /* lowest prec, left associative */ %left TIMES /* middle prec, left associative */ %nonassoc UMINUS /* highest prec, not associative */
In a shift/reduce conflict, bison looks at the precedence and associativity of the lookahead token and of the last token in the reduce action’s production (which may be overridden with %prec).
If the lookahead token has higher precedence, bison shifts. If the lookahead token has lower precedence, bison reduces.
If the two tokens have equal precedence, then they must have the same associativity (as they are from the same declaration).
If they are right associative, bison shifts. If they are left associative, bison reduces.
PLI (Sem 1, 2019) SLR and LR(1) Parsers ⃝c University of Melbourne 24 / 27

Precedence and Associativity: Example
line : expr NEWLINE
expr : expr PLUS expr
| expr MINUS expr
| expr TIMES expr
| MINUS expr %prec UMINUS | LPAR expr RPAR
| NUM
{show$1}
{$1+$3} {$1-$3} {$1*$3} { -$2 }
{ $2 } { $1 }
The %prec UMINUS annotation gives its production the precedence of the unary minus token, which need not (and typically does not) occur anywhere else in the grammar.
PLI (Sem 1, 2019) SLR and LR(1) Parsers ⃝c University of Melbourne 25 / 27

Resolving Conflicts by Default
If either the right-context token or the reduce action’s production doesn’t have a specified precedence, precedence and associativity can’t resolve a shift/reduce conflict.
In such cases, bison’s default is to shift. This is what the programmer wants in most cases, for example, to resolve the dangling-else problem:
stmt → if expr then stmt
stmt → if expr then stmt else stmt
When the conflict is between two reduce actions, bison’s default is to choose to reduce whichever production occurs earlier.
PLI (Sem 1, 2019) SLR and LR(1) Parsers ⃝c University of Melbourne 26 / 27

Handling Conflicts
Bison’s (and happy’s) default strategy for resolving conflicts is sometimes right, sometimes wrong. Whenever it resolves a conflict by falling back on its defaults, bison prints an error message to encourage programmers to find out which category that case falls into.
In the next lecture we will look at happy and its support for debugging parser specifications. One mechanism allows you to trace the parser’s actions, while the other dumps the parser’s state machine. Both can be used to check what the parser does in the state with the conflict.
We will also sketch the idea behind LALR parsers and discuss error recovery.
PLI (Sem 1, 2019) SLR and LR(1) Parsers ⃝c University of Melbourne 27 / 27