FIT2014 Theory of Computation Lecture 15 Parsing
Monash University
Faculty of Information Technology
FIT2014 Theory of Computation
Lecture 15
Parsing
slides by
COMMONWEALTH OF AUSTRALIA
Copyright Regulations 1969
Warning
This material has been reproduced and communicated to you by or on behalf of Monash University
in accordance with s113P of the Copyright Act 1968 (the Act).
The material in this communication may be subject to copyright under the Act.
Any further reproduction or communication of this material by you may be the subject of copyright protection under the Act.
Do not remove this notice.
1 / 18
Overview
I Concepts and definitions
I Examples
I Shift-reduce parser
I Lex & Yacc
2 / 18
Parsing
Suppose you have a Context Free Grammar, and a string of letters.
Parsing: determining whether the string
I is a word in the language,
and if it is,
I finding a parse tree, or a derivation, for it.
Parser: a program that does this.
I Two main types:
I Top-down parsers
I Bottom-up parsers
I reduce the string to the Start symbol
I repeatedly apply production rules in reverse
3 / 18
a∗ba∗b
S −→ BB
B −→ aB
B −→ b
Input string: aabab
S
B B
a B a B
a B b
b
4 / 18
Plus-Times-A
1. S −→ E
2. E −→ E + E
3. E −→ E ∗ E
4. E −→ i
This grammar
is ambiguous.
Input string: i + i ∗ i
S
E
E ∗ E
E + E i
i i
S
E
E + E
E E ∗ E
i i
Two parse trees
5 / 18
Plus-Times-B
S −→ E
E −→ T + E
E −→ T
T −→ F ∗ T
T −→ F
F −→ i
Input string: i + i ∗ i
S
E
T + E
F
i
T
F * T
i F
i
6 / 18
LR Parser
I bottom-up parser
I scans input Left to Right
I constructs a Rightmost derivation in reverse
I implemented using a Deterministic Pushdown Automaton (DPDA)
I Not all CFGs have such a parser: DCFL 6= CFL.
I We’ll look at one type of LR parser:
shift-reduce parser
7 / 18
Shift-reduce Parser
This is a particular type of LR Parser.
It has:
I a stack: terminals and non-terminals processed so far.
I Initially: empty.
I a buffer: the rest of the input string (yet to be processed).
I Initially: contains the entire input string.
Repeatedly . . .
I Shift input letters onto the stack, OR
I When a string of top-most stack symbols equal the right-hand side of a
production rule:
I Reduce that string, i.e., use production rule in reverse
. . . until Stack only has Start symbol, and buffer is empty.
8 / 18
a∗ba∗b
1. S → BB
2. B → aB
3. B → b
Input: abb
abb shift
a bb shift
ab b reduce, (3)
aB b reduce, (2)
B b shift
Bb reduce, (3)
BB reduce, (1)
S ACCEPT
9 / 18
Plus-Times-A
1. S −→ E
2. E −→ E + E
3. E −→ E ∗ E
4. E −→ i
Input: i+i*i
i+i*i shift
i +i*i reduce, (4)
E +i*i shift
E+ i*i shift
E+i *i reduce, (4)
E+E *i
To shift,
or to reduce?
10 / 18
Plus-Times-A
1. S −→ E
2. E −→ E + E
3. E −→ E ∗ E
4. E −→ i
Input: i+i*i
i+i*i shift
i +i*i reduce, (4)
E +i*i shift
E+ i*i shift
E+i *i reduce, (4)
E+E *i shift
E+E* i shift
E+E*i reduce, (4)
E+E*E reduce, (3)
E+E reduce, (2)
E reduce, (1)
S ACCEPT
11 / 18
Plus-Times-A
1. S −→ E
2. E −→ E + E
3. E −→ E ∗ E
4. E −→ i
Input: i+i*i
i+i*i shift
i +i*i reduce, (4)
E +i*i shift
E+ i*i shift
E+i *i reduce, (4)
E+E *i
To shift,
or to reduce?
12 / 18
Plus-Times-A
1. S −→ E
2. E −→ E + E
3. E −→ E ∗ E
4. E −→ i
Input: i+i*i
i+i*i shift
i +i*i reduce, (4)
E +i*i shift
E+ i*i shift
E+i *i reduce, (4)
E+E *i reduce, (2)
E *i shift
E* i shift
E*i reduce, (4)
E*E reduce, (3)
E reduce, (1)
S ACCEPT
shift-reduce conflict
Also:
reduce-reduce conflict: letters on top of stack correspond to > one production rule.
13 / 18
Unix/Linux tools
Yacc
I Yet Another Compiler-Compiler
I a parser-generator
I Input: a Context-Free Grammar . . .
I Output: parser, in file y.tab.c
I Typically used with a lexical analyser, e.g., Lex.
Lexical Analyser
I Input: regular expression for each token . . .
I Output: lexical analyser, in file lex.yy.c
Both are widely available in Unix/Linux
14 / 18
Lex: lexical analysis
filename.l
definitions . . .
%%
regexps + code, for each token . . .
%%
C code . . .
lex
lex.yy.c
. . . . . . yylex() . . . . . .
15 / 18
Yacc: parser generation
filename.y
declarations (incl. token names) . . .
%%
grammar: production rules . . .
%%
C code . . .
yacc
y.tab.c
. . . . . . yyparse() . . . . . .
calls yylex() to get tokens
16 / 18
Lex & Yacc
I Compile y.tab.c and lex.yy.c
using, say, cc.
I yields an executable parser.
I It can evaluate as it parses.
I See Assignment 2.
Conflict resolution in Yacc:
I Shift-reduce: shift
I Reduce-reduce: use the rule listed first.
17 / 18
Revision
I Construct a parse tree for given string and grammar.
I Understand how a Shift-reduce Parser works.
I Start using Lex and Yacc.
18 / 18