A5: SSA
exp to Static Single Assignment RTL
Your job in this assignment: translate the Grumpy expression language exp.mli to a new intermediate representation called RTL (“Register Transfer Language”). The definition of this new RTL IL is given (and heavily commented) in ssa.mli. We call the file ssa.mli (rather than, e.g., rtl.mli) because your translation function will generate RTL programs in Static Single Assignment (SSA) form.
Download the entire set of A5 files here. You won’t rely very heavily on theGrumpy language specification for this assignment but we link to it just in case.
Before doing any actual programming, read through all the instructions below. And as always, ask early on Piazza if something’s unclear!
Changes to the Grumpy spec since A4
To maintain compatibility with LLVM (looking forward to A6), the putcharprimitive now has specification:
putchar(int) : int
===================
The [putchar] function prints an integer in the range [0..2^8-1]
to stdout. It returns the integer printed (or on error, the
integer error code EOF, just as in C).
If the integer argument supplied to [putchar] is not in the 8-bit
character range given above, the result is undefined.
Previously in a4, it returned the unit value.
If you’re re-using your own type-checker for this assignment, you’ll have to update putchar’s return type to match the new specification.
- Download the assignment files
First, download the assignment files and untar the resulting tarfile into a new directory.
$ tar xvf a5.tar
In the resulting directory src you’ll find the following file structure:
src/ — compiler source files
Makefile — the project Makefile
_tags — the tags file for ocamlbuild
AST.mli — language-independent abstract syntax stuff
AST.ml — associated helper functions
exp.mli — the definition of Grumpy’s abstract syntax
exp.ml — associated functions
lexer.mll — ocamllex source file (stub)
parser.mly — Menhir source file (stub)
tycheck.mli — The type-checker interface
tycheck.ml — The type-checker (stub)
ssa.mli — Defines the RTL intermediate language
ssa.ml — (Part II)
grumpy.ml — the toplevel compiler program
tests/ — test cases
To build the project, type
$ make
As in a3 and a4, you’ll see a bunch of warnings at this point. That’s OK. The lexer, parser, and type-checker files are the same stubs that were given to you in the last assignment. Before you get started on this assignment, copy your own lexer.mll, parser.mly, and tycheck.ml in their place.
If your type-checker doesn’t work quite right, you can request a working version from Alex. Just shoot him an email. We only ask that you don’t share this file with others or post it on the internet.
Tests
Run the tests by doing
$ make test
or by typing ./run.sh from within the tests directory. (Here’s sample passing test output and sample failing test output.)
For this assignment, the *.expected files in the tests directory contain the values we expect each Grumpy program to return. Building the test target does the following:
- Build grumpy.native
- Run grumpy.native on each .gpy file in tests/. This will generate, for each X.gpy file:
- X.gpy.tychecked
- X.gpy.rtl
- X.gpy.rtl.interp
The last two files are most relevant to this assignment.
- X.gpy.rtl is a textual representation of X.gpy after translation to RTL.
- X.gpl.rtl.interp contains the result of interpreting the RTL program you produced for X.gpy using an interpreter for the RTL language defined at the end of ssa.ml.
You won’t have to edit the RTL interpreter (ssa_interp) for this assignment; however, it may benefit you to read through it, especially as you’re diagnosing errors in your compilation function (generally, these errors will result in testcase exceptions caught at “runtime” by ssa_interp).
- ssa.ml
TL;DR Complete the definitions of fresh_ids, instrs_of_exp, andinstrs_of_explist, which together map type-annotated Grumpy source programs of type (ty, ty exp) prog to RTL programs of type (ty, instr list) prog.
Your job in this part is to implement the compilation functions sketched out as stubs in src/ssa.ml. The top-level compilation function
val ssa_of_prog : (ty, ty exp) prog -> (ty, instr list) prog
maps type-annotated Grumpy expression ASTs to programs in which function bodies and the program result are lists of RTL instructions; it’s implemented for you. The main functions you need to implement are described below.
a.)
Start by opening ssa.mli. Read the commented descriptions of the types iexp and instr, which collectively define the RTL language. For example:
(** [ty]-annotated identifiers *)
type iid = ty tid
(** As in the [exp] language, we define two versions of RTL
expressions, which together correspond to a subset of the
expressions of type [exp] in [exp.mli]. The first, [raw_iexp],
defines the RTL language’s expression constructors. The second,
[iexp], wraps [raw_iexp]s with a type [ty]. *)
type raw_iexp =
| IInt of int32 (** 32-bit integers *)
| IFloat of float (** Double-precision floats *)
| IId of id (** Identifiers *)
…
You may need to reference other files, such as AST.mli, to recall the definitions of types such as tid (typed identifiers).
b.)
Once you’ve thoroughly read the comments describing RTL, define the provided stub functions:
let fresh_ids (p : gensym_pkg) (n : int) : id list = …
let rec instrs_of_exp (p : gensym_pkg) (out : id) (e : ty exp)
: instr list = …
and instrs_of_explist (p : gensym_pkg) (out : id) (el : (ty exp) list)
: instr list = …
Informal specifications for each of these functions are given in comments in the file. In short:
- fresh_ids generates, given a gensym_pkg p (for generating fresh names; see gensym.mli for details), a list of n fresh identifiers.
Implement this function in terms of the function fresh_id defined right above it in the file. This should be reasonably straightforward.
- instrs_of_exp (p: gensym_pkg) (out: id) (e: ty exp): instr list is the main function in this assignment. It takes as parameters:
- A gensym_pkg p
- An identifier, out, intended to hold the result of expression e
- The expression e : ty exp to be compiled.
The result of instrs_of_exp is a list of RTL instructions corresponding to expression e. It may not be immediately obvious how to compile some expressions to instruction lists. If you get stuck, consider taking a break to think about it away from your computer.
- instrs_of_explist extends instrs_of_exp to lists of expressions (as might appear in ESeq). This function should be implemented mutually recursively with instrs_of_exp (instrs_of_exp should call instrs_of_explist and vice versa).
A few of the cases of instrs_of_exp are given for you. For example:
| EInt n -> [IAssign(out, mk_iexp (IInt n) e.ety_of)]
| EFloat f -> [IAssign(out, mk_iexp (IFloat f) e.ety_of)]
| EId x -> [IAssign(out, mk_iexp (IId x) e.ety_of)]
define the integer, float, and id cases. In each, you’ll note that we convert an expression into an assignment instruction (not every expression is compiled to just an assignment of course), enforcing the following invariant: In each call to instrs_of_exp, the ultimate result of expression e is always stored in variable out.
To maintain this programming discipline in other cases, you’ll need to introduce fresh variable names, using fresh_id or fresh_ids, binding these vars to intermediate results.
Other Requirements
Your implementation of instrs_of_exp must additionally ensure that
- the same identifier is never assigned twice (SSA).
To enforce this property, you may find yourself generating fresh variables at a number of points. Think hard, in particular, about ELet. (The new function subst_var described in exp.mli may come in handy while implementing this case.)
- type information is properly preserved in the instructions you generate.
Pay attention to the compilation of type information, which will be used in a later stage of the compiler. In particular, make sure that you’re correctly compiling types through from decorated expressions to generated instructions (the types themselves are not compiled, but should be properly labeled in each iid and iexp that appears in your generated code). Read ssa.mli for pointers on how the types correspond across the two languages.
Hints
- As in many languages, the Grumpy operators && and || have short-circuit behavior (as explained in the Grumpy spec). Think about how to compile these operators (perhaps by desugaring to other existing expressions) in order to get the right semantics.
- The RTL interpreter expects that phi nodes
IPhi(phi_lbl, res, (x,l1), (y,l2)
are entered via either label l1 or label l2, with no intervening labels. You may need to think a bit about this when implementing case EIf, the only case of instrs_of_exp in which phi nodes need appear…
- Describe Your Implementation
1) In a detailed paragraph, describe any difficulties you encountered while completing this assignment, along with how you resolved them (or not :-)).
2) Give an English explanation of the purpose of each line of your implementations of the EIf and EWhile cases of instrs_of_exp.
Do both 1) and 2) in a comment at the top of your ssa.ml.
For 2), you can use a format like:
EIf:
line 1: … English explanation of line 1 of EIf implementation
line 2: … English explanation of line 2 of EIf implementation
…
EWhile:
line 1: … English explanation of line 1 of EWhile implementation
line 2: … English explanation of line 2 of EWhile implementation
…
- Submit
Submit your ssa.ml on or before the due date, via Blackboard.