CS计算机代考程序代写 Java Assessed Exercise, Task 1- Lexer and Parser

Assessed Exercise, Task 1- Lexer and Parser

Operator tokens

Token Lexeme Token Lexeme

Assign := Eq ==
Less < Gtr >

LessEq <= GtrEq >=
Plus + Times *

Minus – Div /
And && Or ||
Xor ^^

Expression tokens

Token Lexeme

Idfr [ a-z ][ A-Za-z0-9_ ]*
IntLit 0 | [ 1-9 ][ 0-9 ]*

Skip skip

Type tokens

Token Lexeme

IntType int
BoolType bool
UnitType unit

Control flow tokens

Token Lexeme Token Lexeme

While while If if
Do do Then then

Repeat repeat Else else
Until until

Delimiter tokens

Token Lexeme

LParen (
Comma ,

RParen )
LBrace {

Semicolon ;
RBrace }

Assessed Exercise, Task 1: Lexer and Parser

Summary

In this task you will write a lexer and a parser (or their
combination, if suitable) for this simple programming language.
You are free to write these in any way you want, whether ‘by
hand’ or with software tools, as long as they adhere to the
requirements specified below (i.e. whether your lexer and
parser are two separate programs, or how you write them do
not matter for the mark).

We strongly recommend using generators (in particular
ANTLR) as the goal can be achieved with much less effort and
hand-written code; however, writing lexer or parser by hand will
give you a better understanding of the underlying technical
details and difficulties.

Please also carefully read the submission guidelines.

Specification

In what follows, we assume the lexer and the parser are written
separately.

Lexer specification

Your lexer will take an arbitrary string (over ASCII characters)
as input, and returns either a suitable token list when the input
is lexically valid. The tokens involved in the language are as
follows:

(The names of some of these tokens will be used as part of the
output to your program. We do most of the assessment using
automated testing: please use the same Tokens given here, for
any output your code produces!)

Your lexer will output a list of tokens, where the Idfr token
and the IntLit token store the additional information of the
name of an identifier or an integer value. (See the examples
below.)

Parser specification

Your parser will take as input a token list, of the sort that your
lexer produces. When the input is syntactically valid, it should
return an AST, provided the input adheres to the syntax of the
language.

The AST produced by your parser should be represented as an
S-expression. These are nested lists (whose elements may
also be lists, and so forth), representing sub-trees of the AST.
Below, we describe the format that we would like you to use for
S-expressions — in particular, what we will be looking for in the
automated testing.

Implicit and explicit S-expressions

Each S-expression will have one of two formats: either

An explicit S-expression, whose first element is a ‘header
label’. These describes a specific syntactic structure with a
specific number of expected elements after the header; or
An implicit S-expression, consisting of a list of zero or more
entries. These may have any number of elements, but do not
have a header, so that their syntactical role must be
determined from context.

We use ‘implicit’ S-expression only in a few places, where their
meaning is clear:

1. The very top level of the AST (representing a program as a
sequence of function declarations);

2. Block expressions (representing a block simply as a list of
sub-expressions);

3. Lists of function parameters or function arguments.

Every other part of the AST is represented by an ‘explicit’ S-
expression — except for parts of the language corresponding
to ‘expression tokens’ as above, which we represent simply
with the very same tokens.

List of S-expression types

As noted above, the AST may contain some S-expressions
with header labels (describing their role in the AST), and some
without. The header labels are the following:

FunDecl Asgmt BinOpExpr FunInvoc
IfStmt WhileLoop RepeatLoop

Apart from FunDecl, these all indicate different sorts of
expressions.

The following are the types of S-expression, which we may use
to represent the different parts of an AST for the language
above.

Top-level programs are represented by an implicit S-
expression, consisting simply of one or more function
declarations:
[ (fun-decl-1) , (fun-decl-2) , … ]

These have no header, as no context is required to interpret
the top level of an AST in this language.

Function declarations are represented by an S-expression
with a “FunDecl” header, which specify an identifier token
(fn-name-idfr) for the function, a return type token (return-type), a
list of parameters, and a function body:
[FunDecl, (fn-name-idfr), (return-type), [list of parameters], (fn-body) ]

The list of parameters is provided as a list, of zero or more
pairs, where each pair consists of an identifier token
(representing a parameter name) and a type:
[ [(param-1-idfr), (type-1)], [(param-2-idfr), (type-2)], … ]

The function body (fn-body) is given by an S-expression for a
BLOCK element, as described below.

Block expressions are represented by an S-expression
without a header, consisting simply of the list of one or more
expressions in the block:
[ (expr-1), (expr-2), … ]

Note that some of these expressions may themselves be
blocks, in which case they are also headerless S-
expressions.

Assignment expressions are represented by an S-
expression with an “Asgmt” header, and two entries: an
identifier (idfr) being assigned to, and an expression (expr)
which we are assigning to it.
[Asgmt, (idfr), (expr) ]

Binary operator expressions are represented by an S-
expression with a “BinOpExpr” header, and three entries: a
binary operator token, and the two expressions involved in
the operation.
[BinOpExpr, (operator-token), (expr-1), (expr-2) ]

Function invocations are represented by an S-expression
with a “FunInvoc” header, and two entries: an identifier (idfr)
representing the function name, and a list of arguments to
the function:
[FunInvoc, (idfr), [list of arguments] ]

The list of arguments consists of a list of zero or more
expressions (similar to a block, but with a different meaning).

If-statements, while loops and repeat loops are
represented with similar S-expressions, respectively using
headers “IfStmt”, “WhileLoop”, and “RepeatLoop”,
which have an expression representing the condition of the
loop or if-statement, and either one block S-expression (for a
loop body) or two block S-expressions (for the then-body and
the else-body of an if-statement).
[IfStmt, (expr), (then-body), (else-body) ]
[WhileLoop, (expr), (loop-body) ]
[RepeatLoop, (expr), (loop-body) ]

Input/output specification

Your overall program is essentially the combination of the lexer
and parser described above:

it receives a string as input from System.in, using
whitespace to separate tokens but otherwise ignoring it;
it parses it, produces an AST, and prints it as an S-
expression to System.out.

We do not impose any constraints on what spaces or newlines
your output contains. However, we are happy to offer you a
Java class to pretty print these expressions, which may help
you to debug your code. You might wish to produce your S-
expression as a String, provide this as input to the pretty
printer, and then produce the resulting string to System.out.

Note. For this task, you can assume all input to be lexically and
syntactically valid.

Examples

Here is a lexically and syntactically valid input.

int main() { fun(1, 2, 3) }
int fun(int x, int y, int z) { if (x == y) then { z } else { 0 } }

After lexing, it is turned into the following token stream. (You
would not print this as output.)
IntType, Idfr(“main”), LParen, RParen, LBrace, Idfr(“fun”),
LParen, IntLit(1), Comma, IntLit(2), Comma, IntLit(3),
RParen, RBrace, IntType, Idfr(“fun”), LParen, IntType,
Idfr(“x”), Comma, IntType, Idfr(“y”), Comma, IntType,
Idfr(“z”), RParen, LBrace, If, LParen, Idfr(“x”),
Eq, Idfr(“y”), RParen, Then, LBrace, Idfr(“z”),
RBrace, Else, LBrace, IntLit(0), RBrace, RBrace

After parsing, we get the following S-expression (as formatted
by the Java class to pretty print these expressions), which we
would print to System.out:
[
[FunDecl,
Idfr(“main”),
IntType,
[],
[
[FunInvoc,
Idfr(“fun”),
[IntLit(1),
IntLit(2),
IntLit(3)]
]
]
],
[FunDecl,
Idfr(“fun”),
IntType,
[
[Idfr(“x”), IntType],
[Idfr(“y”), IntType],
[Idfr(“z”), IntType]
],
[
[IfStmt,
[BinOpExpr,
Eq,
Idfr(“x”),
Idfr(“y”)],
[Idfr(“z”)],
[IntLit(0)]
]
]
]
]

(Do not worry about the formatting — this is something which
we can take care of in the automated testing.)

2021/11/27 10:17
⻚码:1/1