Assessed Exercise, Task 2: Semantic analysis
Assessed Exercise, Task 2: Semantic analysis
Summary
In this task you are going to implement, in Java, a type checker for ASTs
for the same simple programming language.
The input for this task is in the form of an S-expression representing an AST,
according to the specification described for Task 1
of the coursework, read from .
(We assume that the input program has passed the syntax analysis phase,
so you may assume that the S-expression is properly formatted.)
The type checker will do the checks (as detailed below) and print the appropriate
output to .
System.inSystem.out
To help you get up to speed, we provide a skeleton project
which you can use as a basis.
However, you can write the type checker in any way you like, as long as it adheres to the requirements specified below.
Please note that we require type errors to be reported through Java’s exception handling mechanism, and according to a specific format (to facilitate automated testing and feedback.)
To help in this, we provide a custom RuntimeException class (feel free to modify interfaces, arguments, etc., but please keep the generated output identical).
Once again, you will need a preprocessor (e.g., a lexer and a parser)
to turn the input AST (represented by an S-expression) into a data structure that you can work on directly.
To save your time, we also provide this as a part of the skeleton project — however, as the input is now guaranteed to be correctly formatted, and is in the form of an S-expression, it may now be manageable to write one manually if you wish.
In any case, please think early and carefully about how you will implement pattern matching in your code and make sure it works.
Please also carefully read the submission guidelines.
Function declarations and identifier scope
Functions may be declared in the top-level scope, in any order, and can invoke functions which are defined
earlier or later in the source file. (That is: the order in which functions are declared, does not matter.)
The functions must have distinct names.
We suggest that you use a symbol table to maintain a record of the functions which have been declared, and
their types.
Any parameters taken by a function must have type either or .
In type-checking a function declaration, it may be useful to define a separate symbol table for the
parameters of that function, and add the names of those parameters together with their declared types.
This type information can then be used to type-check the function body.
The body of the function itself is a , whose type must agree with the
return type of the function.
intboolblock
We may summarise the rules about scope for identifiers as follows.
The definition of one function , may involve the invocation of another function
, which is declared either before or after the function .
fnAfnBfnA
No two functions may have the same name, and no two parameters of the same function
may have the same name.
fnA
The parameters of a function , may have the same name as some function .
fnAfnB
A function parameter is only in-scope in the body of the declaration of that function —
there are no global variables or side-effects.
A function parameter must have type either or .
intbool
Type deduction rules
The following are the rules which specify how expressions of various kinds are to be typed.
Numerical expressions
Numerical expressions
Constants
Γ ⊢ ⟨IntLit⟩ : int
Addition
Γ ⊢ P : int
Γ ⊢ Q : int
Γ ⊢ (P + Q) : int
Subtraction
Γ ⊢ P : int
Γ ⊢ Q : int
Γ ⊢ (P – Q) : int
Multiplication
Γ ⊢ P : int
Γ ⊢ Q : int
Γ ⊢ (P * Q) : int
Division
Γ ⊢ P : int
Γ ⊢ Q : int
Γ ⊢ (P / Q) : int
These are expressions which take values of type , in particular arising from arithmetic operations
on two values of type .
For any expression involving an arithmetic operator as below, where one of the sub-expressions involved
is not of type , should result in an error.
intintint
Comparisons
Comparison expressions
Equality
Γ ⊢ P : int
Γ ⊢ Q : int
Γ ⊢ (P == Q) : bool
Less than
Γ ⊢ P : int
Γ ⊢ Q : int
Γ ⊢ (P < Q) : bool Greater than Γ ⊢ P : int Γ ⊢ Q : int Γ ⊢ (P > Q) : bool
Less than
or equal
Γ ⊢ P : int
Γ ⊢ Q : int
Γ ⊢ (P <= Q) : bool Greater than or equal Γ ⊢ P : int Γ ⊢ Q : int Γ ⊢ (P >= Q) : bool
Comparison expressions are ones which relate two expressions of type , to produce an expression of type .
For any expression involving a comparison operator as below, where one of the sub-expressions involved is not of type , should result in an error.
intboolint
Boolean operations
Boolean operators
Logical AND
Γ ⊢ P : bool
Γ ⊢ Q : bool
Γ ⊢ (P && Q) : bool
Logical OR
Γ ⊢ P : bool
Γ ⊢ Q : bool
Γ ⊢ (P || Q) : bool
Logical XOR
Γ ⊢ P : bool
Γ ⊢ Q : bool
Γ ⊢ (P ^^ Q) : bool
These are expressions which combine two values of type using a logical operation, to produce another value of type .
For any expression involving a logical operator as below, where one of the sub-expressions involved is not of type , should result in an error.
boolboolbool
Statements
Statements
Skip statements
Γ ⊢ : unit
skip
Assignments
Γ ⊢ : α
Γ ⊢ P : α
Γ ⊢ x := P : unit
x
While loops
Γ ⊢ C : bool
Γ ⊢ B : unit
Γ ⊢ while C do {B} : unit
Repeat loops
Γ ⊢ B : unit
Γ ⊢ C : bool
Γ ⊢ repeat {B} until C : unit
These are ‘expressions’ which represent statements — not having an interesting value
(represented by assigning them the type ).
Any statement such as those below, which involve a mismatch of types where type-agreement is
required, or which involve an expression other than type where a boolean expression is expected,
should result in an error.
unitbool
Generic expressions
Generic expressions
Function
invocations
Γ ⊢ subr : (α1 , … , αn) → β
Γ ⊢ E1 : α1
…
Γ ⊢ En : αn
Γ ⊢ subr (E1, … , En) : β
Block
expressions
Γ ⊢ E1 : α1
…
Γ ⊢ En : αn
Γ ⊢ E1 ; … ; En : αn
If statements
Γ ⊢ C : bool
Γ ⊢ B1 : α
Γ ⊢ B2 : α
Γ ⊢ if C then {B1} else {B2} : α
These are expressions, whose types depend entirely on the types of other identifiers or expressions.
In any case where there is a mismatch of types which are required to agree, or where an expression other than type
is found a boolean expression is expected, an error should be returned.
bool
Input/Output specification
Your type checker will read an AST, in the form of an S-expression, from as input.
(The format will be the same as the output of the previous task, but may also contain whitespace.)
It will conduct the following checks (not necessarily in this order):
System.in
check that the program has a function, which takes no arguments, and which has return type mainint;
check that all functions have distinct names, and all of the parameters of each function have distinct names;
check that all identifiers used in a function, are either the names of parameters of that function,
or the names of other functions which are defined somewhere;
Based on the type deduction rules above, type-check the function bodies contained in the AST.
If any one of these conditions fail, your program will throw an instance of
an Exception class that we provide for this purpose,
which will present diagnostic information in a standardised format.
If none of these errors are present, the input program will be printed on
(again as an S-expression), but with the function becoming the first
declared function.
System.outmain
We provide a skeleton project, for you to add code to,
to help you to realise this Task.
It performs the work of reading the S-expression at the input, storing it in an abstract syntax tree;
it also contains the code necessary to verify the existence of a procedure and to
produce output with this being the first declaration.
main
The specific requirements for the output of your code are as follows:
In the case where an exception is thrown, it should be caught by the method in your Java code,
which should then invoke the method of the exception.
mainreport
In the case where the program is semantically valid, you do not have to use whitespace
in any specific way in the printed S-expression, but you may like to use the
Java class which we provide
to pretty print these expressions (this is also included in the
skeleton project), for your own benefit/convenience.
Example #1
Consider the following input AST (as an S-expression).
[
[FunDecl,
Idfr(“fun”), IntType,
[[Idfr(“x”), IntType], [Idfr(“y”), IntType], [Idfr(“z”), IntType]],
[
[IfStmt,
[BinOpExpr, Eq, Idfr(“x”), Idfr(“y”)],
[
[BinOpExpr, Plus, Idfr(“z”),
[BinOpExpr, Eq, IntLit(1), IntLit(1)]
]
],
[IntLit(0)]
]
]
],
[FunDecl,
Idfr(“main”), IntType, [],
[
[FunInvoc,
Idfr(“fun”), [IntLit(1), IntLit(2), IntLit(3)]
]
]
],
[FunDecl,
Idfr(“fun2”), UnitType, [],
[ Skip ]
]
]
This is lexically and syntactically valid, but is of type , and so is invalid as an operand of .
The type checker gives the following output:
[BinOpExpr, Eq, IntLit(1), IntLit(1)]BoolTypePlus
Invalid operands in arithmetic expression:
[BinOpExpr,
Plus,
Idfr(“z”),
[BinOpExpr,
Eq,
IntLit(1),
IntLit(1)]
]
and
[BinOpExpr,
Plus,
Idfr(“z”),
[BinOpExpr,
Eq,
IntLit(1),
IntLit(1)]]
—
Idfr(“z”)
has type INT
,
[BinOpExpr,
Eq,
IntLit(1),
IntLit(1)]
has type BOOL
Example #2
Consider the following input AST (as an S-expression).
[
[FunDecl,
Idfr(“fun”), IntType,
[[Idfr(“x”), IntType], [Idfr(“y”), IntType], [Idfr(“z”), IntType]],
[
[WhileLoop,
[BinOpExpr, Eq, Idfr(“x”), Idfr(“y”)],
[
IntLit(1),
[BinOpExpr, Plus, IntLit(1), IntLit(3)],
Skip,
Idfr(“z”)
]
]
]
],
[FunDecl,
Idfr(“main”), IntType, [],
[
[FunInvoc,
Idfr(“fun”), [IntLit(1), IntLit(2), IntLit(3)]
]
]
],
[FunDecl,
Idfr(“fun2”), UnitType, [],
[Skip]
]
]
This is again lexically and syntactically valid, but the last
expression in the body of the loop
is of type , rather than as
required by our typing rules. The type checker gives the following output for the corresponding AST (S-expression):
Idfr(“z”)whileIntTypeUnitType
Invalid final expression in loop body:
[WhileLoop,
[BinOpExpr,
Eq,
Idfr(“x”),
Idfr(“y”)],
[…,
…,
…,
Idfr(“z”)]
]
—
Idfr(“z”)
has type INT
Do not worry about the formatting of / ANSI colour codes (which are used in the skeleton project to decorate printed code) in the output – these are something which we can take care of in the automated testing.
Resources
The skeleton project (extracted from a sample solution based on ANTLR); the main file that needs to be modified is .
Please note that the skeleton project is currently an incomplete implementation, and as such only one error will be detected and reported, which is inadequate.
To give you some hints, methods that require modification are marked with in comments.
Some additional input/output samples can also be found in the archive; you can run the input samples with the script .
SExpressionsAnalyser.javaTODOrun-tests.sh
The custom RuntimeException class (this is also included in the skeleton project).