CS计算机代考程序代写 Java Assessed Exercise, Task 1: Lexer and Parser

Assessed Exercise, Task 1: Lexer and Parser

Assessed Exercise, Task 1: Lexer and Parser

Summary

In this task you will write a lexer and a parser (or their combination, if suitable) for this simple programming language.
You are free to write these in any way you want, whether ‘by hand’ or with software tools, as long as they adhere to the requirements specified below (i.e. whether your lexer and parser are two separate programs, or how you write them do not matter for the mark).

We strongly recommend using generators (in particular ANTLR) as
the goal can be achieved with much less effort and hand-written code;
however, writing lexer or parser by hand will give you a better understanding of the underlying technical details and difficulties.

Please also carefully read the submission guidelines.

Specification

In what follows, we assume the lexer and the parser are written separately.

Lexer specification

Your lexer will take an arbitrary string (over ASCII characters) as input, and
returns either a suitable token list when the input is lexically valid.
The tokens involved in the language are as follows:

Operator tokens
Token Lexeme Token Lexeme
Assign := Eq ==
Less < Gtr >
LessEq <= GtrEq >=
Plus + Times *
Minus – Div /
And && Or ||
Xor ^^

Expression tokens
Token Lexeme
Idfr
[a-z][A-Za-z0-9_]*

IntLit
0 | [1-9][0-9]*
Skip skip

Type tokens
Token Lexeme
IntType int
BoolType bool
UnitType unit

Control flow tokens
Token Lexeme Token Lexeme
While while If if
Do do Then then
Repeat repeat Else else
Until until

Delimiter tokens
Token Lexeme
LParen (
Comma ,
RParen )
LBrace {
Semicolon ;
RBrace }

(The names of some of these tokens will be used as part of the output to your program.
We do most of the assessment using automated testing: please use the same Tokens given here, for any output your code produces!)

Your lexer will output a list of tokens, where the token and the token store the additional information of the name of an identifier or an integer value. (See the examples below.)
IdfrIntLit

Parser specification

Your parser will take as input a token list, of the sort that your lexer produces.
When the input is syntactically valid, it should return an AST, provided the input adheres to the syntax of the language.

The AST produced by your parser should be represented as an S-expression.
These are nested lists (whose elements may also be lists, and so forth), representing sub-trees of the AST.
Below, we describe the format that we would like you to use for S-expressions — in particular, what we will be looking for in the automated testing.

Implicit and explicit S-expressions

Each S-expression will have one of two formats: either

An explicit S-expression, whose first element is a ‘header label’.
These describes a specific syntactic structure with a specific number of expected elements after the header; or

An implicit S-expression, consisting of a list of zero or more entries.
These may have any number of elements, but do not have a header, so that their syntactical role must be determined from context.

We use ‘implicit’ S-expression only in a few places, where their meaning is clear:

The very top level of the AST (representing a program as a sequence of function declarations);

Block expressions (representing a block simply as a list of sub-expressions);

Lists of function parameters or function arguments.

Every other part of the AST is represented by an ‘explicit’ S-expression — except for parts of the language corresponding to ‘expression tokens’ as above, which we represent simply with the very same tokens.

List of S-expression types

As noted above, the AST may contain some S-expressions with header labels (describing their role in the AST), and some without.
The header labels are the following:

FunDecl Asgmt BinOpExpr FunInvoc
IfStmt WhileLoop RepeatLoop

Apart from , these all indicate different sorts of expressions.
FunDecl

The following are the types of S-expression, which we may use to represent the different parts of an AST for the language above.

Top-level programs are represented by an implicit S-expression, consisting simply of one or more function declarations:

These have no header, as no context is required to interpret the top level of an AST in this language.
[ (fun-decl-1) , (fun-decl-2) , … ]

Function declarations are represented by an S-expression with a “” header, which specify an identifier token (fn-name-idfr) for the function, a return type token (return-type), a list of parameters, and a function body:

The list of parameters is provided as a list, of zero or more pairs, where each pair consists of an identifier token (representing a parameter name) and a type:

The function body (fn-body) is given by an S-expression for a element, as described below.
FunDecl [FunDecl, (fn-name-idfr), (return-type), [list of parameters], (fn-body) ]
[ [(param-1-idfr), (type-1)], [(param-2-idfr), (type-2)], … ]
BLOCK

Block expressions are represented by an S-expression without a header, consisting simply of the list of one or more expressions in the block:

Note that some of these expressions may themselves be blocks, in which case they are also headerless S-expressions.
[ (expr-1), (expr-2), … ]

Assignment expressions are represented by an S-expression with an “” header, and two entries: an identifier (idfr) being assigned to, and an expression (expr) which we are assigning to it.
Asgmt [Asgmt, (idfr), (expr) ]

Binary operator expressions are represented by an S-expression with a “” header, and three entries: a binary operator token, and the two expressions involved in the operation.
BinOpExpr [BinOpExpr, (operator-token), (expr-1), (expr-2) ]

Function invocations are represented by an S-expression with a “” header, and two entries: an identifier (idfr) representing the function name, and a list of arguments to the function:

The list of arguments consists of a list of zero or more expressions (similar to a block, but with a different meaning).
FunInvoc [FunInvoc, (idfr), [list of arguments] ]

If-statements, while loops and repeat loops are represented with similar S-expressions, respectively using headers “”, “”, and “”, which have an expression representing the condition of the loop or if-statement, and either one block S-expression (for a loop body) or two block S-expressions (for the then-body and the else-body of an if-statement).
IfStmtWhileLoopRepeatLoop [IfStmt, (expr), (then-body), (else-body) ]
[WhileLoop, (expr), (loop-body) ]
[RepeatLoop, (expr), (loop-body) ]

Input/output specification

Your overall program is essentially the combination of the lexer and parser described above:

it receives a string as input from , using whitespace to separate tokens but otherwise ignoring it;
System.in

it parses it, produces an AST, and prints it as an S-expression to .
System.out

We do not impose any constraints on what spaces or newlines your output contains.
However, we are happy to offer you a Java class to pretty print these expressions, which may help you to debug your code.
You might wish to produce your S-expression as a , provide this as input to the pretty printer, and then produce the resulting string to .

StringSystem.out
Note. For this task, you can assume all input to be lexically and syntactically valid.

Examples

Here is a lexically and syntactically valid input.

int main() { fun(1, 2, 3) }
int fun(int x, int y, int z) { if (x == y) then { z } else { 0 } }

After lexing, it is turned into the following token stream. (You would not print this as output.)

IntType, Idfr(“main”), LParen, RParen, LBrace, Idfr(“fun”),
LParen, IntLit(1), Comma, IntLit(2), Comma, IntLit(3),
RParen, RBrace, IntType, Idfr(“fun”), LParen, IntType,
Idfr(“x”), Comma, IntType, Idfr(“y”), Comma, IntType,
Idfr(“z”), RParen, LBrace, If, LParen, Idfr(“x”),
Eq, Idfr(“y”), RParen, Then, LBrace, Idfr(“z”),
RBrace, Else, LBrace, IntLit(0), RBrace, RBrace

After parsing, we get the following S-expression (as formatted by the Java class to pretty print these expressions), which we would print to :

System.out

[
[FunDecl,
Idfr(“main”),
IntType,
[],
[
[FunInvoc,
Idfr(“fun”),
[IntLit(1),
IntLit(2),
IntLit(3)]
]
]
],
[FunDecl,
Idfr(“fun”),
IntType,
[
[Idfr(“x”), IntType],
[Idfr(“y”), IntType],
[Idfr(“z”), IntType]
],
[
[IfStmt,
[BinOpExpr,
Eq,
Idfr(“x”),
Idfr(“y”)],
[Idfr(“z”)],
[IntLit(0)]
]
]
]
]

(Do not worry about the formatting — this is something which we can take care of in the automated testing.)