Recursive-Descent Parser

Recursive-Descent Parser

Your task for this stage of the project is to construct a recursive-descent parser for the entire YASL language. The output will be a formatted representation of the resulting abstract syntax tree.

Sample Run

Here is a sample YASL program that shows most of the language:

// This program shows just about all of YASL
program demo3;
{ Declare some constants }
const x = 6;
const y = 7;
const z = -120;
{ Declare some variables }
var a: int;
var b: bool;
var c1: int; var c2: int;
{ Declare some procedures }
proc foo;
const a = 42; // local to foo
var b: bool;
begin
c1 = x * y;
b = a == c1;
if b then
print “Hooray!”;
end;
proc bar(n: int, b: bool, var r: int);
proc fact(n: int, var r: int);
begin
if b or not b then
if n > 0 then
begin
fact(n – 1, r);
r = r * n;
end;
else
r = (y + x) mod 2;
end;
begin
while b do
begin
b = not b;
foo;
end;
fact(n, r);
end;
begin
prompt “Enter a number”, a;
bar(a, x < y, c2);
print “The answer is “””, c2–z+c1, “””!”;
prompt “Hit any key to end”;
end.

When run on this input, the output should be something like:

Program demo3
Block
Const x = 6
Const y = 7
Const z = -120
Var a : Int
Var b : Bool
Var c1 : Int
Var c2 : Int
Proc foo
Block
Const a = 42
Var b : Bool
Assign c1
BinOp Times
Id x
Id y
Assign b
BinOp EQ
Id a
Id c1
IfThen
Id b
Print
StringItem “Hooray!”
Proc bar
Val n : Int
Val b : Bool
Var r : Int
Block
Proc fact
Val n : Int
Var r : Int
Block
IfThen
BinOp Or
Id b
UnOp Not
Id b
IfThenElse
BinOp GT
Id n
Num 0
Sequence
Call fact
BinOp Minus
Id n
Num 1
Id r
Assign r
BinOp Times
Id r
Id n
Assign r
BinOp Mod
BinOp Plus
Id y
Id x
Num 2
While
Id b
Sequence
Assign b
UnOp Not
Id b
Call foo
Call fact
Id n
Id r
Prompt2 “Enter a number”, a
Call bar
Id a
BinOp LT
Id x
Id y
Id c2
Print
StringItem “The answer is “”
ExprItem
BinOp Plus
BinOp Minus
Id c2
UnOp Neg
Id z
Id c1
StringItem “”!”
Prompt “Hit any key to end”

Specifications

Here is the full YASL grammar:

<Program> –>
program ID ; <Block> .

<Block> –>
<ConstDecls> <VarDecls> <ProcDecls> begin <Stmts> end

<ConstDecls> –>
<ConstDecl> <ConstDecls>
|

<ConstDecl> –>
const ID = <Sign> NUM ;

<Sign> –>

|

<VarDecls> –>
<VarDecl> <VarDecls>
|

<VarDecl> –>
var ID : <Type> ;

<Type> –>
int
| bool

<ProcDecls> –>
<ProcDecl> <ProcDecls>
|

<ProcDecl> –>
proc ID <ParamList> ; <Block> ;

<ParamList> –>
( <Params> )
|

<Params> –>
<Param> , <Params>
| <Param>

<Param> –>
ID : <Type>
| var ID : <Type>

<Stmts> –>
<Stmt> <Stmts>
|

<Stmt> –>
ID = <Expr> ;
| ID <ArgList> ;
| begin <Stmts> end ;
| if <Expr> then <Stmt>
| if <Expr> then <Stmt> else <Stmt>
| while <Expr> do <Stmt>
| prompt STRING ;
| prompt STRING , ID ;
| print <Items> ;

<ArgList> –>
( <Args> )
|

<Args> –>
<Expr> , <Args>
| <Expr>

<Items> –>
<Item> , <Items>
| <Item>

<Item> –>
<Expr>
| STRING

<Expr> –>
<SimpleExpr> <RelOp> <SimpleExpr>
| <SimpleExpr>

<RelOp> –>
== | <> | <= | >= | < | >

<SimpleExpr> –>
<SimpleExpr> <AddOp> <Term>
| <Term>

<AddOp> –>
+ | – | or

<Term> –>
<Term> <MulOp> <Factor>
| <Factor>

<MulOp> –>
* | div | mod | and

<Factor> –>
NUM
| ID
| true
| false
| – <Factor>
| not <Factor>
| ( <Expr> )

In addition to the tokens described in Project 1, note that there are several more keywords (var, int, bool, proc, if, then, else, while, do, prompt, and, or, not, true, false) and punctuation symbols (COLON ‘:’, LPAREN ‘(‘, RPAREN ‘)’, and COMMA ‘,’). There are also additional operators (EQUAL ==, NOTEQUAL <>, LESSEQUAL <=, GREATEREQUAL >=, LESS <, and GREATER >), some of which are more than a single character. Finally, there is an additional class of tokens that also need a lexeme: the STRING literals, which consist of (almost) any characters surrounded by double quotes (“). The only exception is that if a string literal contains a double quote it must be doubled (“”). For example, “I said, “”Hello””” is a string literal whose lexeme is the character sequence [I said, “Hello”].

Here is a description of the desired abstract syntax (in the form of Scala case classes — equivalents in C++ and Java will be discussed in class):

case class Program(name: String, block: Block)

case class Block(consts: List[ConstDecl], vars: List[VarDecl], procs: List[ProcDecl], body: List[Stmt])

case class ConstDecl(id: String, value: Int)

case class VarDecl(id: String, typ: Type)

trait Type
case object IntType extends Type
case object BoolType extends Type

case class ProcDecl(id: String, params: List[Param], block: Block)

trait Param
case class ValParam(id: String, typ: Type) extends Param
case class VarParam(id: String, typ: Type) extends Param

trait Stmt
case class Assign(id: String, expr: Expr) extends Stmt
case class Call(id: String, args: List[Expr]) extends Stmt
case class Sequence(body: List[Stmt]) extends Stmt
case class IfThen(test: Expr, trueClause: Stmt) extends Stmt
case class IfThenElse(test: Expr, trueClause: Stmt, falseClause: Stmt) extends Stmt
case class While(test: Expr, body: Stmt) extends Stmt
case class Prompt(message: String) extends Stmt
case class Prompt2(message: String, id: String) extends Stmt
case class Print(items: List[Item]) extends Stmt

trait Item
case class ExprItem(expr: Expr) extends Item
case class StringItem(message: String) extends Item

trait Expr
case class BinOp(left: Expr, op: Op2, right: Expr) extends Expr
case class UnOp(op: Op1, expr: Expr) extends Expr
case class Num(value: Int) extends Expr
case class Id(id: String) extends Expr
case object True extends Expr
case object False extends Expr

trait Op2
case object EQ extends Op2
case object NE extends Op2
case object LE extends Op2
case object GE extends Op2
case object LT extends Op2
case object GT extends Op2
case object Plus extends Op2
case object Minus extends Op2
case object Times extends Op2
case object Div extends Op2
case object Mod extends Op2
case object And extends Op2
case object Or extends Op2

trait Op1
case object Neg extends Op1
case object Not extends Op1

Suggestions

  • First augment your Scanner with the additional keywords, and punctuation (this should be easy). Then add support for the extra operators (this will require expanding the state diagram to handle multiple-character operators), and finally add string literals. Test your revised Scanner before continuing.
  • Define the data structures that will be used to represent the abstract syntax trees. If you are using Scala, you may use the definitions above; otherwise, they will need to be translated (we will do some of this in class).
  • Build the recursive descent parser by writing a parseX function for each non-terminal in the grammar. Each of these functions should return the corresponding abstract syntax tree. You will need to do a little bit of left-recursion elimination and left-factoring, as well as using the “cheat” for the dangling else.
  • Your main program should call parseProgram, then print out the resulting AST, using indentation to show the structure of the tree (see example above).
  • As with the previous project, you may deal with errors by printing out a message and stopping after the first parser error (lexical errors should still show a message and keep going).