Assessed Exercise, Task 3: Code Generation
Summary
In this task, you are going to implement a code generator targeting RISC-V machine code, for ASTs for the same simple programming language.
The input for this task is again in the form of an S-expression representing an AST, according to the specification described for Task 1, read from System.in. For this task, we assume that the input program has passed the both the syntax analysis phase (Task 1) and the semantic analysis phase (Task 2).
The generated code, which is printed to System.out, should be valid and executable RISC-V assembly code, with the result of the computation saved in register a0 (also known as x10). For your convenience we provide the simple assembler interface (based on RARS’s Java API) that we use to verify your generated code.
You can make use of the grammar in the skeleton project provided for Task 2 to turn the input AST (represented by an S-expression) into a data structure that you can work on directly (you will find symbol tables handy as you generate code) — but again, you are free to write the code generator in any way you want, as long as it adheres to the requirements specified below.
Please also carefully read the submission guidelines.
Input/Output specification
Input format
Your code generator will read an AST, in the form of an S-expression, from System.in as input. (The format will be the same as the output of the Task 1, but may also contain whitespace.) Note that the input is assumed to satisfy the following:
1. The first function is main, which takes no arguments, and which has return type int;
2. All functions have distinct names, and all parameters of each function have distinct names;
3. All identifiers used in a function, are either the names of parameters of that function, or the names of other functions which are defined somewhere;
4. All the function bodies satisfy the type deduction rules of Task 2.
It will generate valid and executable RISC-V assembly code that implements the input program — in particular, it is required that the generated code can be simulated by RARS through the provided assembler interface, which we use to verify your generated code.
Output format
After the generated code finishes execution, register a0 (also known as x10), which is the second line of the output of RARSInterface.java, should be the result of the computation (the return value of main). For example, if we have rars_46ab74d.jar and RARSInterface.java in the current folder, we can pipe RISC-V assembly code into RARS as follows (depending on your OS):
· On *nix-based systems
We can pipe RISC-V assembly code into RARS with the shell command
printf “li t0, 100\n\nmv a0, t0” | java -classpath ./rars_46ab74d.jar ./RARSInterface.java
· On Windows systems
We can pipe RISC-V assembly code into RARS on the command line using
cmd /c “(echo li t0, 100 & echo mv a0, t0) | java -classpath .\rars_46ab74d.jar .\RARSInterface.java”
Additional technical remarks
· The generated code should not lead to an error (such as memory misalignment) or loop indefinitely while being simulated: the first line of the output of RARSInterface.java should be either CLIFF_TERMINATION or NORMAL_TERMINATION.
· While the main function in the input program receives no argument, we require that the generated code for all other functions to work correctly for (sensible) arbitrary arguments. This means that a suitable procedure call / return mechanism based on activation records should be implemented in the generated code.
· Integers should be mapped to RISC-V 32 bit signed integers, despite the fact that we only allow non-negative integer literals in the input language / S-expression. (You do not have to treat the possibility of integer overflow / underflow.)
Some suggestions
· It might be easier to first compile down to an intermediate representation, such as stack machine code or register machine code, and then embed the IR code into RISC-V assembly. We provide an example assembly file that makes use of RARS’s macro functionality to realise this idea. Note that it might be handy to define some additional custom instructions!
· Each new assembly command should be on a new line — i.e., don’t put more than one assembly command on a line. Hint: Consider generating a list of RISC-V assembly command strings, and then concatenate them with added newlines after / between each.
· You’ll need a function that generates fresh labels. Don’t hardcode labels, ensure that the function generates a fresh label every time it is called. Don’t use a random generator for this, use a global / static variable for this purpose. This is one of the few legitimate uses of global/static variables.
· Check that your submission is not miscompiling conditionals like if / then / else or loops in the sense that if the condition evaluates to true then the else-branch is executed and vice versa.
· You are advised against using global variables for function parameters in your code generation methods; the reason is that functions can be recursive, thus more than one activations of a single function may be active at the same time.
· We strongly recommend testing your code before submission. Note that pasting code into RARS and simply press ‘Assemble’ is insufficient as test, because that doesn’t test the termination of the code generated by your code generator.
Resources
The simple assembler interface that we use to verify your generated code.
An example assembly file based on macros for stack machine instructions.