Recursive-Descent Parsing
22 March 2019 OSU CSE 1
BL Compiler Structure
Copyright By PowCoder代写 加微信 powcoder
string of characters (source code)
string of tokens (“words”)
abstract program
integers (object code)
22 March 2019
Note that the parser starts with a string of tokens.
Code Generator
Plan for the BL Parser
• Design a context-free grammar (CFG) to specify syntactically valid BL programs
• Use the grammar to implement a recursive-descent parser (i.e., an
algorithm to parse a BL program and construct the corresponding Program
22 March 2019 OSU CSE 3
• A CFG can be used to generate strings in its language
– “Given the CFG, construct a string that is in the language”
• A CFG can also be used to recognize strings in its language
– “Given a string, decide whether it is in the language”
– And, if it is, construct a derivation tree (or AST)
22 March 2019 OSU CSE 4
Parsing generally refers to this last
• A CFG can be used to generate strings in step, i.e., going from a string (in the
its language
language) to its derivation tree or—
for a programming language— – “Given the CFG, construct a string that is in
perhaps to an AST for the program. the language”
• A CFG can also be used to recognize strings in its language
– “Given a string, decide whether it is in the language”
– And, if it is, construct a derivation tree (or AST)
22 March 2019 OSU CSE 5
A Recursive-Descent Parser
• One parse method per non-terminal symbol
• Anon-terminalsymbolontheright-handsideof a rewrite rule leads to a call to the parse method for that non-terminal
• Aterminalsymbolontheright-handsideofa rewrite rule leads to “consuming” that token from the input token string
• |intheCFGleadsto“if-else”intheparser
22 March 2019 OSU CSE 6
Example: Arithmetic Expressions
expr term factor add-op mult-op
digit-seq digit
→ expr add-op term | term
→ term mult-op factor | factor → ( expr ) | digit-seq →+|-
→ * | DIV | REM
→ digit digit-seq | digit →0|1|2|3|4|5|6|7|8|9
22 March 2019
term factor add-op mult-op digit-seq digit
→ expr add-op term | term
→ term mult-op factor | factor → ( expr ) | digit-seq →+|-
→ * | DIV | REM
22 March 2019
→ digit digit-seq | digit
parser for this →0|1|2|3|4|5|6|7|8|9
Do you see a
problem with a
recursive descent
CFG? (Hint!)
expr term factor add-op mult-op
digit-seq digit
A Solution
→ term { add-op term } → factor { mult-op factor } → ( expr ) | digit-seq →+|-
→ * | DIV | REM
→ digit digit-seq | digit →0|1|2|3|4|5|6|7|8|9
22 March 2019
expr term factor add-op mult-op
digit-seq digit
A Solution
→ term { add-op term }
→ factor { mult-op factor }
→ ( expr ) | digit-seq
22 March 2019
OSU CSE 10
The special CFG symbols { and }
→ * | DIV | REM
mean that the enclosed sequence of
→ digit digit-seq | digit
symbols occurs zero or more times;
→0|1|2|3|4|5|6|7|8|9 this helps change a left-recursive
CFG into an equivalent CFG that can be parsed by recursive descent.
expr term factor add-op mult-op number nz-digit
22 March 2019
OSU CSE 11
A Solution
The special CFG symbols { and } also
simplify a non-terminal for a number
→ factor { mult-op factor } → ( expr ) | number →+|-
→ * | DIV | REM
→ 0 | nz-digit { 0 | nz-digit } →1|2|3|4|5|6|7|8|9
A Recursive-Descent Parser
• One parse method per non-terminal symbol
• Anon-terminalsymbolontheright-handsideof a rewrite rule leads to a call to the parse method for that non-terminal
• Aterminalsymbolontheright-handsideofa rewrite rule leads to “consuming” that token from the input token string
• |intheCFGleadsto“if-else”intheparser
• {…}intheCFGleadsto“while”intheparser
22 March 2019 OSU CSE 12
expr term factor add-op mult-op number nz-digit
→ term { add-op term }
then things get simpler for the
22 March 2019
More Improvements
If we treat every number as a token,
→ factor { mult-op factor }
parser: now there are only 5 non-
worry about.
→ ( expr →+|- →*|DIV|REM
→1|2|3|4|5|6|7|8|9
OSU CSE 13
→0| nz-digit {0| nz-digit }
expr term factor add-op mult-op number nz-digit
→ term { add-op term }
as a token, then it’s even simpler:
22 March 2019
OSU CSE 14
More Improvements
If we treat every add-op and mult-op
→ factor { mult-op factor }
there are only 3 non-terminals.
→ ( expr ) | number
→ * | DIV | REM
→ 0 | nz-digit { 0 | nz-digit } →1|2|3|4|5|6|7|8|9
Can you write the tokenizer for this language, so Improvements
every number, add-op, and mult-op is a token?
expr term factor add-op mult-op number nz-digit
→ term { add-op term }
→ factor { mult-op factor }
→ ( expr ) | number
→ * | DIV | REM
→ 0 | nz-digit { 0 | nz-digit } →1|2|3|4|5|6|7|8|9
22 March 2019
OSU CSE 15
Evaluating Arithmetic Expressions
• For this problem, parsing an arithmetic
expression means evaluating it
• The parser goes from a string of tokens in
the language of the CFG on the previous
slide, to the value of that expression as an
22 March 2019 OSU CSE 16
Structure of Solution
“4 + 29 DIV 3”
<"4", "+", "29",
"DIV", "3", "EOI">
value of arithmetic expression
string of characters (arithmetic expression)
string of tokens
22 March 2019
“4 + 29 DIV 3”
Structure of Solution
We will use a Queue
<"4", "+", "29",
"DIV", "3", "EOI">
value of arithmetic expression
string of characters (arithmetic expression)
string of tokens
22 March 2019
“4 + 29 DIV 3”
Structure of Solution
We will also assume the tokenizer adds an “end-of-input” token at the end.
<"4", "+", "29",
"DIV", "3", "EOI">
value of arithmetic expression
string of characters (arithmetic expression)
string of tokens
22 March 2019
Parsing an expr
• We want to parse an expr, which must start with a term and must be followed by zero or more (pairs of) add-ops and terms:
expr → term { add-op term }
• An expr has an int value, which is what
we want returned by the method to parse an expr
22 March 2019 OSU CSE 20
Contract for Parser for expr
* Evaluates an expression and returns its value. * …
* @updates ts
* @requires
* [an expr string is a proper prefix of ts]
* @ensures
* valueOfExpr = [value of longest expr string at
* start of #ts] and
* #ts = [longest expr string at start of #ts] * ts */
private static int valueOfExpr(Queue
Parsing a term
• We want to parse a term, which must start with a factor and must be followed by zero or more (pairs of) mult-ops and factors:
term → factor { mult-op factor }
• A term has an int value, which is what
we want returned by the method to parse a term
22 March 2019 OSU CSE 22
Contract for Parser for term
* Evaluates a term and returns its value.
* @updates ts
* @requires
* [a term string is a proper prefix of ts]
* @ensures
* valueOfTerm = [value of longest term string at * start of #ts] and
* #ts = [longest term string at start of #ts] * ts */
private static int valueOfTerm(Queue
Parsing a factor
• We want to parse a factor, which must
start with the token “(” followed by an expr followed by the token “)”; or it must
be a number token:
factor → ( expr ) | number
• A factor has an int value, which is what we want returned by the method to parse
22 March 2019 OSU CSE 24
Contract for Parser for factor
* Evaluates a factor and returns its value.
* @updates ts
* @requires
* [a factor string is a proper prefix of ts]
* @ensures
* valueOfFactor = [value of longest factor string at * start of #ts] and
* #ts = [longest factor string at start of #ts] * ts */
private static int valueOfFactor(Queue
22 March 2019 OSU CSE 25
Code for Parser for expr
private static int valueOfExpr(Queue
while (ts.front().equals(“+”) ||
ts.front().equals(“-“)) { String op = ts.dequeue();
int nextTerm = valueOfTerm(ts); if (op.equals(“+”)) {
value = value + nextTerm; } else /* “-” */ {
} } value = value – nextTerm;
return value; }
22 March 2019 OSU CSE 26
expr → term { add-op term } Code for Parser for expr
add-op → +|-
private static int valueOfExpr(Queue
while (ts.front().equals(“+”) ||
ts.front().equals(“-“)) { String op = ts.dequeue();
int nextTerm = valueOfTerm(ts); if (op.equals(“+”)) {
value = value + nextTerm; } else /* “-” */ {
} } value = value – nextTerm;
return value; }
22 March 2019 OSU CSE 27
Code for Parser for expr
private static int valueOfExpr(Queue
int value = valueOfTerm(ts); while (ts.front().equals(“+”) || ts.front().equals(“-“)) {
String op = ts.dequeue();
int nextTerm = valueOfTerm(ts); if (op.equals(“+”)) {
value = value + nextTerm; } else /* “-” */ {
} } value = value – nextTerm;
return value; }
22 March 2019 OSU CSE 28
This method is very similar to valueOfExpr.
Code for Parser for expr
private static int valueOfExpr(Queue
int value = valueOfTerm(ts); while (ts.front().equals(“+”) || ts.front().equals(“-“)) {
Look ahead
one token in ts to see
what’s next.
String op = ts.dequeue();
int nextTerm = valueOfTerm(ts); if (op.equals(“+”)) {
value = value + nextTerm; } else /* “-” */ {
} } value = value – nextTerm;
return value; }
22 March 2019 OSU CSE
Code for Parser for expr
private static int valueOfExpr(Queue
int value = valueOfTerm(ts); while (ts.front().equals(“+”) || ts.front().equals(“-“)) {
“Consume” the next token from ts.
} } value = value – nextTerm;
return value; }
22 March 2019 OSU CSE
String op = ts.dequeue();
int nextTerm = valueOfTerm(ts); if (op.equals(“+”)) {
value = value + nextTerm; } else /* “-” */ {
Code for Parser for expr
private static int valueOfExpr(Queue
int value = valueOfTerm(ts); while (ts.front().equals(“+”) || ts.front().equals(“-“)) {
String op = ts.dequeue();
int nextTerm = valueOfTerm(ts); if (op.equals(“+”)) {
value = value + nextTerm; } else /* “-” */ {
} } value = value – nextTerm;
return value; }
22 March 2019 OSU CSE
Evaluate next term.
Code for Parser for expr
private static int valueOfExpr(Queue
int value = valueOfTerm(ts); while (ts.front().equals(“+”) || ts.front().equals(“-“)) {
String op = ts.dequeue();
int nextTerm = valueOfTerm(ts); if (op.equals(“+”)) {
value = value + nextTerm; } else /* “-” */ {
} } value = value – nextTerm;
return value; }
22 March 2019 OSU CSE
Evaluate (some of) the expression.
Code for Parser for term private static int valueOfTerm(Queue
22 March 2019 OSU CSE
Can you write the body, using valueOfExpr as
Code for Parser for factor
private static int valueOfFactor( Queue
int value;
if (ts.front().equals(“(“)) {
ts.dequeue();
value = valueOfExpr(ts);
ts.dequeue();
String number = ts.dequeue();
} value = Integer.parseInt(number);
return value; }
22 March 2019 OSU CSE 34
factor → ( expr ) | number Code for Parser for factor
private static int valueOfFactor( Queue
int value;
if (ts.front().equals(“(“)) {
ts.dequeue();
value = valueOfExpr(ts);
ts.dequeue();
String number = ts.dequeue();
} value = Integer.parseInt(number);
return value; }
22 March 2019 OSU CSE 35
Code for Parser for factor
private static int valueOfFactor( Queue
int value;
if (ts.front().equals(“(“)) {
ts.dequeue();
value = valueOfExpr(ts);
ts.dequeue();
Look ahead
one token in ts to see
what’s next.
} value = Integer.parseInt(number);
return value; }
String number = ts.dequeue();
22 March 2019 OSU CSE 36
Code for Parser for factor
private static int valueOfFactor( Queue
int value;
if (ts.front().equals(“(“)) {
ts.dequeue();
value = valueOfExpr(ts);
ts.dequeue();
What token does this throw away?
} value = Integer.parseInt(number);
return value; }
String number = ts.dequeue();
22 March 2019 OSU CSE 37
Though method is called parseInt, it Code for Parser for factor
is not one of our parser methods; it is a
private static int valueOfFactor(
static method from the Java library’s
ts.dequeue();
value = valueOfExpr(ts);
ts.dequeue();
String number = ts.dequeue();
} value = Integer.parseInt(number); return value;
Queue
Integer class (with int utilities).
int value;
if (ts.front().equals(“(“)) {
22 March 2019 OSU CSE 38
Code for Parser for factor
private static int valueOfFactor( Queue
int value;
if (ts.front().equals(“(“)) {
ts.dequeue();
value = valueOfExpr(ts);
ts.dequeue();
String number = ts.dequeue();
Recursive descent: notice that value = Integer.parseInt(number);
return value;
valueOfExpr calls valueOfTerm, which calls valueOfFactor,
which here may call valueOfExpr.
22 March 2019 OSU CSE 39
Code for Parser for factor
private static int valueOfFactor( Queue
int value;
if (ts.front().equals(“(“)) {
ts.dequeue();
value = valueOfExpr(ts);
ts.dequeue();
How do you know this (indirect) recursion terminates?
} value = Integer.parseInt(number);
return value; }
String number = ts.dequeue();
22 March 2019 OSU CSE 40
A Recursive-Descent Parser
• One parse method per non-terminal symbol
• Anon-terminalsymbolontheright-handsideof a rewrite rule leads to a call to the parse method for that non-terminal
• Aterminalsymbolontheright-handsideofa rewrite rule leads to “consuming” that token from the input token string
• |intheCFGleadsto“if-else”intheparser
• {…}intheCFGleadsto“while”intheparser
22 March 2019 OSU CSE 41
Observations
• This is so formulaic that tools are available that can generate RDPs from CFGs
• In the lab, you will write an RDP for a language similar to the one illustrated here
– The CFG will be a bit different
– There will be no tokenizer, so you will parse a
string of characters in a Java StringBuilder • See methods charAt and deleteCharAt
22 March 2019 OSU CSE 42
• Wikipedia: Recursive Descent Parser
– http://en.wikipedia.org/wiki/Recursive_descent_parser • Java Libraries API: StringBuilder
– http://docs.oracle.com/javase/8/docs/api/
22 March 2019 OSU CSE 43
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com