Assessed Exercise, Task 2: Semantic analysis
Summary
In this task you are going to implement, in Java, a type checker for ASTs for the same simple programming language. The input for this task is in the form of an S-expression representing an AST, according to the specification described for Task 1 of the coursework, read from System.in. (We assume that the input program has passed the syntax analysis phase, so you may assume that the S-expression is properly formatted.) The type checker will do the checks (as detailed below) and print the appropriate output to System.out.
To help you get up to speed, we provide a skeleton project which you can use as a basis. However, you can write the type checker in any way you like, as long as it adheres to the requirements specified below. Please note that we require type errors to be reported through Java’s exception handling mechanism, and according to a specific format (to facilitate automated testing and feedback.) To help in this, we provide a custom RuntimeException class (feel free to modify interfaces, arguments, etc., but please keep the generated output identical).
Once again, you will need a preprocessor (e.g., a lexer and a parser) to turn the input AST (represented by an S-expression) into a data structure that you can work on directly. To save your time, we also provide this as a part of the skeleton project — however, as the input is now guaranteed to be correctly formatted, and is in the form of an S-expression, it may now be manageable to write one manually if you wish. In any case, please think early and carefully about how you will implement pattern matching in your code and make sure it works.
Please also carefully read the submission guidelines.
Function declarations and identifier scope
Functions may be declared in the top-level scope, in any order, and can invoke functions which are defined earlier or later in the source file. (That is: the order in which functions are declared, does not matter.) The functions must have distinct names. We suggest that you use a symbol table to maintain a record of the functions which have been declared, and their types.
Any parameters taken by a function must have type either int or bool. In type-checking a function declaration, it may be useful to define a separate symbol table for the parameters of that function, and add the names of those parameters together with their declared types. This type information can then be used to type-check the function body. The body of the function itself is a block, whose type must agree with the return type of the function.
We may summarise the rules about scope for identifiers as follows.
· The definition of one function fnA, may involve the invocation of another function fnB, which is declared either before or after the function fnA.
· No two functions may have the same name, and no two parameters of the same function fnA may have the same name.
· The parameters of a function fnA, may have the same name as some function fnB.
· A function parameter is only in-scope in the body of the declaration of that function — there are no global variables or side-effects.
· A function parameter must have type either int or bool.
Type deduction rules
The following are the rules which specify how expressions of various kinds are to be typed.
Numerical expressions
Numerical expressions
Constants
Γ ⊢ ⟨IntLit⟩ : int
Addition
Γ ⊢ P : int Γ ⊢ Q : int
Γ ⊢ (P + Q) : int
Subtraction
Γ ⊢ P : int Γ ⊢ Q : int
Γ ⊢ (P – Q) : int
Multiplication
Γ ⊢ P : int Γ ⊢ Q : int
Γ ⊢ (P * Q) : int
Division
Γ ⊢ P : int Γ ⊢ Q : int
Γ ⊢ (P / Q) : int
These are expressions which take values of type int, in particular arising from arithmetic operations on two values of type int. For any expression involving an arithmetic operator as below, where one of the sub-expressions involved is not of type int, should result in an error.
Comparisons
Comparison expressions
Equality
Γ ⊢ P : int Γ ⊢ Q : int
Γ ⊢ (P == Q) : bool
Less than
Γ ⊢ P : int Γ ⊢ Q : int
Γ ⊢ (P < Q) : bool Greater than Γ ⊢ P : int Γ ⊢ Q : int Γ ⊢ (P > Q) : bool
Less than
or equal
Γ ⊢ P : int Γ ⊢ Q : int
Γ ⊢ (P <= Q) : bool Greater than or equal Γ ⊢ P : int Γ ⊢ Q : int Γ ⊢ (P >= Q) : bool
Comparison expressions are ones which relate two expressions of type int, to produce an expression of type bool. For any expression involving a comparison operator as below, where one of the sub-expressions involved is not of type int, should result in an error.
Boolean operations
Boolean operators
Logical AND
Γ ⊢ P : bool Γ ⊢ Q : bool
Γ ⊢ (P && Q) : bool
Logical OR
Γ ⊢ P : bool Γ ⊢ Q : bool
Γ ⊢ (P || Q) : bool
Logical XOR
Γ ⊢ P : bool Γ ⊢ Q : bool
Γ ⊢ (P ^^ Q) : bool
These are expressions which combine two values of type bool using a logical operation, to produce another value of type bool. For any expression involving a logical operator as below, where one of the sub-expressions involved is not of type bool, should result in an error.
Statements
Statements
Skip statements
Γ ⊢ skip : unit
Assignments
Γ ⊢ x : α Γ ⊢ P : α
Γ ⊢ x := P : unit
While loops
Γ ⊢ C : bool Γ ⊢ B : unit
Γ ⊢ while C do {B} : unit
Repeat loops
Γ ⊢ B : unit Γ ⊢ C : bool
Γ ⊢ repeat {B} until C : unit
These are ‘expressions’ which represent statements — not having an interesting value (represented by assigning them the type unit). Any statement such as those below, which involve a mismatch of types where type-agreement is required, or which involve an expression other than type bool where a boolean expression is expected, should result in an error.
<="" div="" style="clear: both; color: rgb(0, 0, 0); font-family: Arial, Helvetica, sans-serif; font-size: medium; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial;">
Generic expressions
Generic expressions
Function
invocations
Γ ⊢ subr : (α1 , … , αn) → β Γ ⊢ E1 : α1 … Γ ⊢ En : αn
Γ ⊢ subr (E1, … , En) : β
Block
expressions
Γ ⊢ E1 : α1 … Γ ⊢ En : αn
Γ ⊢ E1 ; … ; En : αn
If statements
Γ ⊢ C : bool Γ ⊢ B1 : α Γ ⊢ B2 : α
Γ ⊢ if C then {B1} else {B2} : α
These are expressions, whose types depend entirely on the types of other identifiers or expressions. In any case where there is a mismatch of types which are required to agree, or where an expression other than type bool is found a boolean expression is expected, an error should be returned.
Input/Output specification
Your type checker will read an AST, in the form of an S-expression, from System.in as input. (The format will be the same as the output of the previous task, but may also contain whitespace.) It will conduct the following checks (not necessarily in this order):
1. check that the program has a main function, which takes no arguments, and which has return type int;
2. check that all functions have distinct names, and all of the parameters of each function have distinct names;
3. check that all identifiers used in a function, are either the names of parameters of that function, or the names of other functions which are defined somewhere;
4. Based on the type deduction rules above, type-check the function bodies contained in the AST.
If any one of these conditions fail, your program will throw an instance of an Exception class that we provide for this purpose, which will present diagnostic information in a standardised format. If none of these errors are present, the input program will be printed on System.out (again as an S-expression), but with the main function becoming the first declared function.
We provide a skeleton project, for you to add code to, to help you to realise this Task. It performs the work of reading the S-expression at the input, storing it in an abstract syntax tree; it also contains the code necessary to verify the existence of a main procedure and to produce output with this being the first declaration.
The specific requirements for the output of your code are as follows:
· In the case where an exception is thrown, it should be caught by the main method in your Java code, which should then invoke the report method of the exception.
· In the case where the program is semantically valid, you do not have to use whitespace in any specific way in the printed S-expression, but you may like to use the Java class which we provide to pretty print these expressions (this is also included in the skeleton project), for your own benefit/convenience.
Example #1
Consider the following input AST (as an S-expression).
[
[FunDecl,
Idfr(“fun”), IntType,
[[Idfr(“x”), IntType], [Idfr(“y”), IntType], [Idfr(“z”), IntType]],
[
[IfStmt,
[BinOpExpr, Eq, Idfr(“x”), Idfr(“y”)],
[
[BinOpExpr, Plus, Idfr(“z”),
[BinOpExpr, Eq, IntLit(1), IntLit(1)]
]
],
[IntLit(0)]
]
]
],
[FunDecl,
Idfr(“main”), IntType, [],
[
[FunInvoc,
Idfr(“fun”), [IntLit(1), IntLit(2), IntLit(3)]
]
]
],
[FunDecl,
Idfr(“fun2”), UnitType, [],
[ Skip ]
]
]
This is lexically and syntactically valid, but [BinOpExpr, Eq, IntLit(1), IntLit(1)] is of type BoolType, and so is invalid as an operand of Plus. The type checker gives the following output:
Invalid operands in arithmetic expression:
[BinOpExpr,
Plus,
Idfr(“z”),
[BinOpExpr,
Eq,
IntLit(1),
IntLit(1)]
]
and
[BinOpExpr,
Plus,
Idfr(“z”),
[BinOpExpr,
Eq,
IntLit(1),
IntLit(1)]]
—
Idfr(“z”)
has type INT
,
[BinOpExpr,
Eq,
IntLit(1),
IntLit(1)]
has type BOOL
Example #2
Consider the following input AST (as an S-expression).
[
[FunDecl,
Idfr(“fun”), IntType,
[[Idfr(“x”), IntType], [Idfr(“y”), IntType], [Idfr(“z”), IntType]],
[
[WhileLoop,
[BinOpExpr, Eq, Idfr(“x”), Idfr(“y”)],
[
IntLit(1),
[BinOpExpr, Plus, IntLit(1), IntLit(3)],
Skip,
Idfr(“z”)
]
]
]
],
[FunDecl,
Idfr(“main”), IntType, [],
[
[FunInvoc,
Idfr(“fun”), [IntLit(1), IntLit(2), IntLit(3)]
]
]
],
[FunDecl,
Idfr(“fun2”), UnitType, [],
[Skip]
]
]
This is again lexically and syntactically valid, but the last expression Idfr(“z”) in the body of the while loop is of type IntType, rather than UnitType as required by our typing rules. The type checker gives the following output for the corresponding AST (S-expression):
Invalid final expression in loop body:
[WhileLoop,
[BinOpExpr,
Eq,
Idfr(“x”),
Idfr(“y”)],
[…,
…,
…,
Idfr(“z”)]
]
—
Idfr(“z”)
has type INT
Do not worry about the formatting of / ANSI colour codes (which are used in the skeleton project to decorate printed code) in the output – these are something which we can take care of in the automated testing.
Resources
The skeleton project (extracted from a sample solution based on ANTLR); the main file that needs to be modified is SExpressionsAnalyser.java. Please note that the skeleton project is currently an incomplete implementation, and as such only one error will be detected and reported, which is inadequate. To give you some hints, methods that require modification are marked with TODO in comments. Some additional input/output samples can also be found in the archive; you can run the input samples with the script run-tests.sh.
The custom RuntimeException class (this is also included in the skeleton project).