CS代考 CSE 10

Recursive-Descent Parsing
22 March 2019 OSU CSE 1

BL Compiler Structure

Copyright By PowCoder代写 加微信 powcoder

string of characters (source code)
string of tokens (“words”)
abstract program
integers (object code)
22 March 2019
Note that the parser starts with a string of tokens.
Code Generator

Plan for the BL Parser
• Design a context-free grammar (CFG) to specify syntactically valid BL programs
• Use the grammar to implement a recursive-descent parser (i.e., an
algorithm to parse a BL program and construct the corresponding Program
22 March 2019 OSU CSE 3

• A CFG can be used to generate strings in its language
– “Given the CFG, construct a string that is in the language”
• A CFG can also be used to recognize strings in its language
– “Given a string, decide whether it is in the language”
– And, if it is, construct a derivation tree (or AST)
22 March 2019 OSU CSE 4

Parsing generally refers to this last
• A CFG can be used to generate strings in step, i.e., going from a string (in the
its language
language) to its derivation tree or—
for a programming language— – “Given the CFG, construct a string that is in
perhaps to an AST for the program. the language”
• A CFG can also be used to recognize strings in its language
– “Given a string, decide whether it is in the language”
– And, if it is, construct a derivation tree (or AST)
22 March 2019 OSU CSE 5

A Recursive-Descent Parser
• One parse method per non-terminal symbol
• Anon-terminalsymbolontheright-handsideof a rewrite rule leads to a call to the parse method for that non-terminal
• Aterminalsymbolontheright-handsideofa rewrite rule leads to “consuming” that token from the input token string
• |intheCFGleadsto“if-else”intheparser
22 March 2019 OSU CSE 6

Example: Arithmetic Expressions
expr term factor add-op mult-op
digit-seq digit
→ expr add-op term | term
→ term mult-op factor | factor → ( expr ) | digit-seq →+|-
→ * | DIV | REM
→ digit digit-seq | digit →0|1|2|3|4|5|6|7|8|9
22 March 2019

term factor add-op mult-op digit-seq digit
→ expr add-op term | term
→ term mult-op factor | factor → ( expr ) | digit-seq →+|-
→ * | DIV | REM
22 March 2019
→ digit digit-seq | digit
parser for this →0|1|2|3|4|5|6|7|8|9
Do you see a
problem with a
recursive descent
CFG? (Hint!)

expr term factor add-op mult-op
digit-seq digit
A Solution
→ term { add-op term } → factor { mult-op factor } → ( expr ) | digit-seq →+|-
→ * | DIV | REM
→ digit digit-seq | digit →0|1|2|3|4|5|6|7|8|9
22 March 2019

expr term factor add-op mult-op
digit-seq digit
A Solution
→ term { add-op term }
→ factor { mult-op factor }
→ ( expr ) | digit-seq
22 March 2019
OSU CSE 10
The special CFG symbols { and }
→ * | DIV | REM
mean that the enclosed sequence of
→ digit digit-seq | digit
symbols occurs zero or more times;
→0|1|2|3|4|5|6|7|8|9 this helps change a left-recursive
CFG into an equivalent CFG that can be parsed by recursive descent.

expr term factor add-op mult-op number nz-digit
22 March 2019
OSU CSE 11
A Solution
The special CFG symbols { and } also
simplify a non-terminal for a number
→ factor { mult-op factor } → ( expr ) | number →+|-
→ * | DIV | REM
→ 0 | nz-digit { 0 | nz-digit } →1|2|3|4|5|6|7|8|9

A Recursive-Descent Parser
• One parse method per non-terminal symbol
• Anon-terminalsymbolontheright-handsideof a rewrite rule leads to a call to the parse method for that non-terminal
• Aterminalsymbolontheright-handsideofa rewrite rule leads to “consuming” that token from the input token string
• |intheCFGleadsto“if-else”intheparser
• {…}intheCFGleadsto“while”intheparser
22 March 2019 OSU CSE 12

expr term factor add-op mult-op number nz-digit
→ term { add-op term }
then things get simpler for the
22 March 2019
More Improvements
If we treat every number as a token,
→ factor { mult-op factor }
parser: now there are only 5 non-
worry about.
→ ( expr →+|- →*|DIV|REM
→1|2|3|4|5|6|7|8|9
OSU CSE 13
→0| nz-digit {0| nz-digit }

expr term factor add-op mult-op number nz-digit
→ term { add-op term }
as a token, then it’s even simpler:
22 March 2019
OSU CSE 14
More Improvements
If we treat every add-op and mult-op
→ factor { mult-op factor }
there are only 3 non-terminals.
→ ( expr ) | number
→ * | DIV | REM
→ 0 | nz-digit { 0 | nz-digit } →1|2|3|4|5|6|7|8|9

Can you write the tokenizer for this language, so Improvements
every number, add-op, and mult-op is a token?
expr term factor add-op mult-op number nz-digit
→ term { add-op term }
→ factor { mult-op factor }
→ ( expr ) | number
→ * | DIV | REM
→ 0 | nz-digit { 0 | nz-digit } →1|2|3|4|5|6|7|8|9
22 March 2019
OSU CSE 15

Evaluating Arithmetic Expressions
• For this problem, parsing an arithmetic
expression means evaluating it
• The parser goes from a string of tokens in
the language of the CFG on the previous
slide, to the value of that expression as an
22 March 2019 OSU CSE 16

Structure of Solution
“4 + 29 DIV 3”
<"4", "+", "29", "DIV", "3", "EOI">
value of arithmetic expression
string of characters (arithmetic expression)
string of tokens
22 March 2019

“4 + 29 DIV 3”
Structure of Solution
We will use a Queue to hold a mathematical value like this.
<"4", "+", "29", "DIV", "3", "EOI">
value of arithmetic expression
string of characters (arithmetic expression)
string of tokens
22 March 2019

“4 + 29 DIV 3”
Structure of Solution
We will also assume the tokenizer adds an “end-of-input” token at the end.
<"4", "+", "29", "DIV", "3", "EOI">
value of arithmetic expression
string of characters (arithmetic expression)
string of tokens
22 March 2019

Parsing an expr
• We want to parse an expr, which must start with a term and must be followed by zero or more (pairs of) add-ops and terms:
expr → term { add-op term }
• An expr has an int value, which is what
we want returned by the method to parse an expr
22 March 2019 OSU CSE 20

Contract for Parser for expr
* Evaluates an expression and returns its value. * …
* @updates ts
* @requires
* [an expr string is a proper prefix of ts]
* @ensures
* valueOfExpr = [value of longest expr string at
* start of #ts] and
* #ts = [longest expr string at start of #ts] * ts */
private static int valueOfExpr(Queue ts) {…} 22 March 2019 OSU CSE 21

Parsing a term
• We want to parse a term, which must start with a factor and must be followed by zero or more (pairs of) mult-ops and factors:
term → factor { mult-op factor }
• A term has an int value, which is what
we want returned by the method to parse a term
22 March 2019 OSU CSE 22

Contract for Parser for term
* Evaluates a term and returns its value.
* @updates ts
* @requires
* [a term string is a proper prefix of ts]
* @ensures
* valueOfTerm = [value of longest term string at * start of #ts] and
* #ts = [longest term string at start of #ts] * ts */
private static int valueOfTerm(Queue ts) {…} 22 March 2019 OSU CSE 23

Parsing a factor
• We want to parse a factor, which must
start with the token “(” followed by an expr followed by the token “)”; or it must
be a number token:
factor → ( expr ) | number
• A factor has an int value, which is what we want returned by the method to parse
22 March 2019 OSU CSE 24

Contract for Parser for factor
* Evaluates a factor and returns its value.
* @updates ts
* @requires
* [a factor string is a proper prefix of ts]
* @ensures
* valueOfFactor = [value of longest factor string at * start of #ts] and
* #ts = [longest factor string at start of #ts] * ts */
private static int valueOfFactor(Queue ts){
22 March 2019 OSU CSE 25

Code for Parser for expr
private static int valueOfExpr(Queue ts) { int value = valueOfTerm(ts);
while (ts.front().equals(“+”) ||
ts.front().equals(“-“)) { String op = ts.dequeue();
int nextTerm = valueOfTerm(ts); if (op.equals(“+”)) {
value = value + nextTerm; } else /* “-” */ {
} } value = value – nextTerm;
return value; }
22 March 2019 OSU CSE 26

expr → term { add-op term } Code for Parser for expr
add-op → +|-
private static int valueOfExpr(Queue ts) { int value = valueOfTerm(ts);
while (ts.front().equals(“+”) ||
ts.front().equals(“-“)) { String op = ts.dequeue();
int nextTerm = valueOfTerm(ts); if (op.equals(“+”)) {
value = value + nextTerm; } else /* “-” */ {
} } value = value – nextTerm;
return value; }
22 March 2019 OSU CSE 27

Code for Parser for expr
private static int valueOfExpr(Queue ts) {
int value = valueOfTerm(ts); while (ts.front().equals(“+”) || ts.front().equals(“-“)) {
String op = ts.dequeue();
int nextTerm = valueOfTerm(ts); if (op.equals(“+”)) {
value = value + nextTerm; } else /* “-” */ {
} } value = value – nextTerm;
return value; }
22 March 2019 OSU CSE 28
This method is very similar to valueOfExpr.

Code for Parser for expr
private static int valueOfExpr(Queue ts) {
int value = valueOfTerm(ts); while (ts.front().equals(“+”) || ts.front().equals(“-“)) {
Look ahead
one token in ts to see
what’s next.
String op = ts.dequeue();
int nextTerm = valueOfTerm(ts); if (op.equals(“+”)) {
value = value + nextTerm; } else /* “-” */ {
} } value = value – nextTerm;
return value; }
22 March 2019 OSU CSE

Code for Parser for expr
private static int valueOfExpr(Queue ts) {
int value = valueOfTerm(ts); while (ts.front().equals(“+”) || ts.front().equals(“-“)) {
“Consume” the next token from ts.
} } value = value – nextTerm;
return value; }
22 March 2019 OSU CSE
String op = ts.dequeue();
int nextTerm = valueOfTerm(ts); if (op.equals(“+”)) {
value = value + nextTerm; } else /* “-” */ {

Code for Parser for expr
private static int valueOfExpr(Queue ts) {
int value = valueOfTerm(ts); while (ts.front().equals(“+”) || ts.front().equals(“-“)) {
String op = ts.dequeue();
int nextTerm = valueOfTerm(ts); if (op.equals(“+”)) {
value = value + nextTerm; } else /* “-” */ {
} } value = value – nextTerm;
return value; }
22 March 2019 OSU CSE
Evaluate next term.

Code for Parser for expr
private static int valueOfExpr(Queue ts) {
int value = valueOfTerm(ts); while (ts.front().equals(“+”) || ts.front().equals(“-“)) {
String op = ts.dequeue();
int nextTerm = valueOfTerm(ts); if (op.equals(“+”)) {
value = value + nextTerm; } else /* “-” */ {
} } value = value – nextTerm;
return value; }
22 March 2019 OSU CSE
Evaluate (some of) the expression.

Code for Parser for term private static int valueOfTerm(Queue ts) {
22 March 2019 OSU CSE
Can you write the body, using valueOfExpr as

Code for Parser for factor
private static int valueOfFactor( Queue ts) {
int value;
if (ts.front().equals(“(“)) {
ts.dequeue();
value = valueOfExpr(ts);
ts.dequeue();
String number = ts.dequeue();
} value = Integer.parseInt(number);
return value; }
22 March 2019 OSU CSE 34

factor → ( expr ) | number Code for Parser for factor
private static int valueOfFactor( Queue ts) {
int value;
if (ts.front().equals(“(“)) {
ts.dequeue();
value = valueOfExpr(ts);
ts.dequeue();
String number = ts.dequeue();
} value = Integer.parseInt(number);
return value; }
22 March 2019 OSU CSE 35

Code for Parser for factor
private static int valueOfFactor( Queue ts) {
int value;
if (ts.front().equals(“(“)) {
ts.dequeue();
value = valueOfExpr(ts);
ts.dequeue();
Look ahead
one token in ts to see
what’s next.
} value = Integer.parseInt(number);
return value; }
String number = ts.dequeue();
22 March 2019 OSU CSE 36

Code for Parser for factor
private static int valueOfFactor( Queue ts) {
int value;
if (ts.front().equals(“(“)) {
ts.dequeue();
value = valueOfExpr(ts);
ts.dequeue();
What token does this throw away?
} value = Integer.parseInt(number);
return value; }
String number = ts.dequeue();
22 March 2019 OSU CSE 37

Though method is called parseInt, it Code for Parser for factor
is not one of our parser methods; it is a
private static int valueOfFactor(
static method from the Java library’s
ts.dequeue();
value = valueOfExpr(ts);
ts.dequeue();
String number = ts.dequeue();
} value = Integer.parseInt(number); return value;
Queue ts) {
Integer class (with int utilities).
int value;
if (ts.front().equals(“(“)) {
22 March 2019 OSU CSE 38

Code for Parser for factor
private static int valueOfFactor( Queue ts) {
int value;
if (ts.front().equals(“(“)) {
ts.dequeue();
value = valueOfExpr(ts);
ts.dequeue();
String number = ts.dequeue();
Recursive descent: notice that value = Integer.parseInt(number);
return value;
valueOfExpr calls valueOfTerm, which calls valueOfFactor,
which here may call valueOfExpr.
22 March 2019 OSU CSE 39

Code for Parser for factor
private static int valueOfFactor( Queue ts) {
int value;
if (ts.front().equals(“(“)) {
ts.dequeue();
value = valueOfExpr(ts);
ts.dequeue();
How do you know this (indirect) recursion terminates?
} value = Integer.parseInt(number);
return value; }
String number = ts.dequeue();
22 March 2019 OSU CSE 40

A Recursive-Descent Parser
• One parse method per non-terminal symbol
• Anon-terminalsymbolontheright-handsideof a rewrite rule leads to a call to the parse method for that non-terminal
• Aterminalsymbolontheright-handsideofa rewrite rule leads to “consuming” that token from the input token string
• |intheCFGleadsto“if-else”intheparser
• {…}intheCFGleadsto“while”intheparser
22 March 2019 OSU CSE 41

Observations
• This is so formulaic that tools are available that can generate RDPs from CFGs
• In the lab, you will write an RDP for a language similar to the one illustrated here
– The CFG will be a bit different
– There will be no tokenizer, so you will parse a
string of characters in a Java StringBuilder • See methods charAt and deleteCharAt
22 March 2019 OSU CSE 42

• Wikipedia: Recursive Descent Parser
– http://en.wikipedia.org/wiki/Recursive_descent_parser • Java Libraries API: StringBuilder
– http://docs.oracle.com/javase/8/docs/api/
22 March 2019 OSU CSE 43

程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com