CS计算机代考程序代写 Java Assessed Exercise, Task 1: Lexer and Parser

Assessed Exercise, Task 1: Lexer and Parser
Summary
In this task you will write a lexer and a parser (or their combination, if suitable) for this simple programming language. You are free to write these in any way you want, whether ‘by hand’ or with software tools, as long as they adhere to the requirements specified below (i.e. whether your lexer and parser are two separate programs, or how you write them do not matter for the mark).
We strongly recommend using generators (in particular ANTLR) as the goal can be achieved with much less effort and hand-written code; however, writing lexer or parser by hand will give you a better understanding of the underlying technical details and difficulties.
Please also carefully read the submission guidelines.
Specification
In what follows, we assume the lexer and the parser are written separately.
Lexer specification
Your lexer will take an arbitrary string (over ASCII characters) as input, and returns either a suitable token list when the input is lexically valid. The tokens involved in the language are as follows:

Operator tokens

Token
Lexeme
Token
Lexeme

Assign
:=
Eq
==

Less
< Gtr >

LessEq
<= GtrEq >=

Plus
+
Times
*

Minus

Div
/

And
&&
Or
||

Xor
^^

Expression tokens

Token
Lexeme

Idfr
[a-z][A-Za-z0-9_]*

IntLit
0 | [1-9][0-9]*

Skip
skip

Type tokens

Token
Lexeme

IntType
int

BoolType
bool

UnitType
unit

Control flow tokens

Token
Lexeme
Token
Lexeme

While
while
If
if

Do
do
Then
then

Repeat
repeat
Else
else

Until
until

Delimiter tokens

Token
Lexeme

LParen
(

Comma
,

RParen
)

LBrace
{

Semicolon
;

RBrace
}

(The names of some of these tokens will be used as part of the output to your program. We do most of the assessment using automated testing: please use the same Tokens given here, for any output your code produces!)
Your lexer will output a list of tokens, where the Idfr token and the IntLit token store the additional information of the name of an identifier or an integer value. (See the examples below.)
Parser specification
Your parser will take as input a token list, of the sort that your lexer produces. When the input is syntactically valid, it should return an AST, provided the input adheres to the syntax of the language.
The AST produced by your parser should be represented as an S-expression. These are nested lists (whose elements may also be lists, and so forth), representing sub-trees of the AST. Below, we describe the format that we would like you to use for S-expressions — in particular, what we will be looking for in the automated testing.
Implicit and explicit S-expressions
Each S-expression will have one of two formats: either
· An explicit S-expression, whose first element is a ‘header label’. These describes a specific syntactic structure with a specific number of expected elements after the header; or
· An implicit S-expression, consisting of a list of zero or more entries. These may have any number of elements, but do not have a header, so that their syntactical role must be determined from context.
We use ‘implicit’ S-expression only in a few places, where their meaning is clear:
1. The very top level of the AST (representing a program as a sequence of function declarations);
2. Block expressions (representing a block simply as a list of sub-expressions);
3. Lists of function parameters or function arguments.
Every other part of the AST is represented by an ‘explicit’ S-expression — except for parts of the language corresponding to ‘expression tokens’ as above, which we represent simply with the very same tokens.
List of S-expression types
As noted above, the AST may contain some S-expressions with header labels (describing their role in the AST), and some without. The header labels are the following:
FunDecl Asgmt BinOpExpr FunInvoc
IfStmt WhileLoop RepeatLoop
Apart from FunDecl, these all indicate different sorts of expressions.
The following are the types of S-expression, which we may use to represent the different parts of an AST for the language above.
· Top-level programs are represented by an implicit S-expression, consisting simply of one or more function declarations:
[ (fun-decl-1) , (fun-decl-2) , … ]
These have no header, as no context is required to interpret the top level of an AST in this language.
· Function declarations are represented by an S-expression with a “FunDecl” header, which specify an identifier token (fn-name-idfr) for the function, a return type token (return-type), a list of parameters, and a function body:
[FunDecl, (fn-name-idfr), (return-type), [list of parameters], (fn-body) ]
The list of parameters is provided as a list, of zero or more pairs, where each pair consists of an identifier token (representing a parameter name) and a type:
[ [(param-1-idfr), (type-1)], [(param-2-idfr), (type-2)], … ]
The function body (fn-body) is given by an S-expression for a BLOCK element, as described below.
· Block expressions are represented by an S-expression without a header, consisting simply of the list of one or more expressions in the block:
[ (expr-1), (expr-2), … ]
Note that some of these expressions may themselves be blocks, in which case they are also headerless S-expressions.
· Assignment expressions are represented by an S-expression with an “Asgmt” header, and two entries: an identifier (idfr) being assigned to, and an expression (expr) which we are assigning to it.
[Asgmt, (idfr), (expr) ]

· Binary operator expressions are represented by an S-expression with a “BinOpExpr” header, and three entries: a binary operator token, and the two expressions involved in the operation.
[BinOpExpr, (operator-token), (expr-1), (expr-2) ]

· Function invocations are represented by an S-expression with a “FunInvoc” header, and two entries: an identifier (idfr) representing the function name, and a list of arguments to the function:
[FunInvoc, (idfr), [list of arguments] ]
The list of arguments consists of a list of zero or more expressions (similar to a block, but with a different meaning).
· If-statements, while loops and repeat loops are represented with similar S-expressions, respectively using headers “IfStmt”, “WhileLoop”, and “RepeatLoop”, which have an expression representing the condition of the loop or if-statement, and either one block S-expression (for a loop body) or two block S-expressions (for the then-body and the else-body of an if-statement).
· [IfStmt, (expr), (then-body), (else-body) ]
· [WhileLoop, (expr), (loop-body) ]
[RepeatLoop, (expr), (loop-body) ]
Input/output specification
Your overall program is essentially the combination of the lexer and parser described above:
· it receives a string as input from System.in, using whitespace to separate tokens but otherwise ignoring it;
· it parses it, produces an AST, and prints it as an S-expression to System.out.
We do not impose any constraints on what spaces or newlines your output contains. However, we are happy to offer you a Java class to pretty print these expressions, which may help you to debug your code. You might wish to produce your S-expression as a String, provide this as input to the pretty printer, and then produce the resulting string to System.out.
Note. For this task, you can assume all input to be lexically and syntactically valid.
Examples
Here is a lexically and syntactically valid input.
int main() { fun(1, 2, 3) }
int fun(int x, int y, int z) { if (x == y) then { z } else { 0 } }
After lexing, it is turned into the following token stream. (You would not print this as output.)
IntType, Idfr(“main”), LParen, RParen, LBrace, Idfr(“fun”),
LParen, IntLit(1), Comma, IntLit(2), Comma, IntLit(3),
RParen, RBrace, IntType, Idfr(“fun”), LParen, IntType,
Idfr(“x”), Comma, IntType, Idfr(“y”), Comma, IntType,
Idfr(“z”), RParen, LBrace, If, LParen, Idfr(“x”),
Eq, Idfr(“y”), RParen, Then, LBrace, Idfr(“z”),
RBrace, Else, LBrace, IntLit(0), RBrace, RBrace
After parsing, we get the following S-expression (as formatted by the Java class to pretty print these expressions), which we would print to System.out:
[
[FunDecl,
Idfr(“main”),
IntType,
[],
[
[FunInvoc,
Idfr(“fun”),
[IntLit(1),
IntLit(2),
IntLit(3)]
]
]
],
[FunDecl,
Idfr(“fun”),
IntType,
[
[Idfr(“x”), IntType],
[Idfr(“y”), IntType],
[Idfr(“z”), IntType]
],
[
[IfStmt,
[BinOpExpr,
Eq,
Idfr(“x”),
Idfr(“y”)],
[Idfr(“z”)],
[IntLit(0)]
]
]
]
]
(Do not worry about the formatting — this is something which we can take care of in the automated testing.)