# CSCI 3366 Programming Languages
### Problem Set 10 (14 Points) : Part 2 of a two-part Compiler for MiniC
##### Due: Wednesday May 1, Midnight
#### NB: The final project cannot be submitted late.
##### R. Muller
—
This is a pair problem set. Please continue working with your partner from part 1 of the project. One person in each partnership should be fluent in MIPS assembly language.
—
> This problem set was greatly improved in April, 2017 by Tiane (Art) Zhu ’17, a student in the Spring 2017 section of CSCI 3366. Tiane ran down several bugs and added features. His contributions are greatly appreciated.
—
This problem set is the second of a two-part project implementing important components of a simple compiler for a clipped version of C, a language that we’re calling miniC. This problem set involves writing two of the major components of the compiler. Each part is worth 7 points.
**The main advice on completing this project is that you should start soon**.
A miniC program is a sequence of C-like functions. E.g.,
“`C
int mod(int a, int b) {
if (b < 0) return 0;
else while (a >= b) a = a – b;
return a;
}
int abss(int a) {
if (a < 0) return 0 - a;
else return a;
}
int gcd(int a, int b) {
a = abss(a);
b = abss(b);
if (a == 0 or b == 0) return a + b;
else if (b > a) return gcd(a, mod(b, a));
else return gcd(mod(a, b), b);
}
int main() {
int n;
n = gcd(14039, 1529);
print n;
return 0;
}
“`
#### Compiler Components
The overall structure of the miniC compiler is
“`
miniC pgm => Lexer ->
Parser ->
TypeChecker ->
Name ->
Lift ->
CopyProp ->
Control ->
Codegen => MIPS pgm
“`
This problem set involves writing the `Control` and `Codegen` phases of the compiler. The `Lexer`, `Parser`, `TypeChecker` and `CopyProp` modules are provided in working form as are support modules `Ast`, `Symbol`, `Quads`, `Mips` etc.
You’ll find information on the logistics of the project below. The first order of business is to review the `Compile` module to ensure that you understand the overall structure of the compiler. Then review the `Quads` and `Mips` modules, they’re central to these two compilation phases.
You should feel free to use the `Name` and `Lift` modules that you developed in solving part 1 of the project. If you’d like you are welcome to use our solutions, the course versions of `Name` and `Lift` are in the `src/` directory.
The compiler builds as distributed. As you implement phases, you can try out your compiler on test files in the `test/` folder. If you compile `test/a.mc` you can look in `test/a.dbg` to see the compiler’s main data structures produced by each compilation phase.
#### solved
The `src/` folder includes a solved version of the miniC compiler. To run it, type
“`bash
> cd src
> ./solved file.mc
“`
If `file.mc` is well-formed, the compiler will generate two output files: `file.asm` containing the compiled object code and `file.dbg` which contains text displaying the compiler data structures after each compilation phase. It would be a good idea to create a few test miniC files and run them through the compiler to see how the compiler works. It produces inefficient code, but it works and most importantly it is relatively simple.
—
### Control : 7 points
The transformations after naming and lifting translate away from the compiler intermediate language of `Ast` toward more machine-centric languages.
“`
Control.translate : Ast.program -> Quads.instructionstream
“`
This translation makes the control-flow that is implicit in `Ast.statement` explicit in the language of quadruples. The target language for this transformation is defined in the harness code file `quads.ml`. This transformation should recursively walk the `Ast.program` tree, translating each `Ast.procedure` to an equivalent `Quads.procedure`. The body of the former is an `Ast.statement` while the body of the latter is a `Quads.instruction list`, i.e., it is a list of simpler, more machine-like instructions that may (or may not) include labels. These labels can be the targets of unconditional (`Quads.Jmp`) or conditional (`Quads.JmpZero`) branch instructions.
– An `Ast.Assign` statement should translate to a `Quads.Get` instruction;
– An `Ast.While` statement of the form `Ast.While {expr; statement}`, if **[i1; …; ik]** is the translation of `expr` and **[ik+2; …, ik+m]** is the translation of `statement` then
“`
[l1:i1; …; ik; jzero x, l2; ik+2; …, ik+m; jmp l1; l2:nop]
“`
(where **x** holds the result of the test term) is the translation;
– An `Ast.IfS` statement of the form `Ast.IfS {expr; thn; els}`, if **[i1; …; ik]** is the translation of `expr`, **[ik+2; …; ik+m]** is the translation of `thn` and **[ik+m+2; …; in]** is the translation of `els` then
“`
[i1; …; ik; jzero x, l1; ik+2; …, ik+m; jmp l2; l1:ik+m+2; …, ik+m; l2:nop]
“`
(where **x** holds the result of the test term) is the translation.
### Codegen : 7 points
The final translation is
“`
Codegen.translate : Quads.instructionstream -> Mips.Codestream.codestream
“`
This translation deals with storage allocation for procedure and function variables as well as function call and return protocols. For the purposes of this gut-simple compiler, we will allocate storage for all variables (both programmer supplied and the temporaries generated by the compiler) in an activation record allocated on the call stack. The structure of an activation record is standard
“`
+—————————-+
A | Caller-Save Regs | higher addresses
A +—————————-+
A | Value of Arg 1 |
A +– : –+
A | : |
A +– –+
A | Value of Arg n |
A +—————————-+
A | Caller’s Return Address |
+—————————-+
B | Caller’s Frame Pointer | <-- fp
B +----------------------------+
B | Local Variable 1 |
B +-- : --+
B | : |
B +-- --+
B | Local Variable k |
B +----------------------------+
B | Callee Save Registers | lower addresses
+----------------------------+
<-- sp
```
where the parts labeled with A (above left) are managed by the caller and the parts labeled with B are managed by the callee (i.e., the procedure being called). Since the miniC compiler has no register allocator, there is no need to save or restore the contents of (most) machine registers around procedure calls and returns. Note that all procedure variables can be referenced indirectly using the frame pointer register fp. In particular, the storage cells allocated for a procedure's parameters are allocated at positive offsets from the fp and storage cells for local variables are allocated at negative offsets from the fp. Activation records are constructed by a combination of actions that are performed
1. around procedure calls,
2. on procedure entry and
3. before procedure return.
- **Procedure call**: code is required to push the arguments onto the call stack, to push the return address (ra) register and then transfer to the called procedure using a jump and link (jal) instruction. The code following the jal must restore the ra register and deallocate the storage for the actual arguments (thereby resetting the stack pointer (sp) register.
- **Procedure entry**: code is required to store the callers frame pointer and then allocate storage at the bottom of the activation record for all local variabls (including temporaries). Note that the activation record on the stack will require one 4-byte word of memory for each variable introduced by the programmer and likewise for each temporary name introduced by earlier phases of the compiler. You're going to need to find all of the temporaries in the `Quads.procedure` allocating a frame pointer offset for each. These variable offsets are best stored in an environment.
- **Procedure return**: if the procedure is a function then the value to be returned to the caller must be stored in the value register v0 register and all storage for local variables must be removed from the stack, leaving the sp register pointing at the caller's saved frame pointer on the stack. The return is completed with a jr instruction.
---
### Logistics
After cloning the repository, you'll find that the `src/` folder contains roughly 41 OCaml source files along with a number of other files and directories. The harness code is configured to be easy to work with in a Unix environment that includes an implementation of OCaml, the Unix make command and the atom editor.
#### solved
The `src/` folder includes a solved version of the miniC compiler. To run it, type
```bash
> cd src
> ./solved file.mc
“`
If `file.mc` is well-formed, the compiler will generate two output files: `file.asm` containing the compiled object code and `file.dbg` which contains text displaying the compiler data structures after each compilation phase. It would be a good idea to create a few test miniC files and run them through the compiler to see how the compiler works. It produces inefficient code, but it works and most importantly it is relatively simple.
—
#### Development Cycle
The development cycle would look as follows:
“`
> cd src
> atom . # or use a different text editor
> make # compile your compiler
> ./mc test/a.mc # assuming that make produced no errors
# fix the errors otherwise
> atom test/a.dbg # to see the debugging info
> make clean # cleans up intermediate files
“`
##### To build the compiler:
“`
> cd src
> make
“`
##### To run the miniC compiler:
“`
> cd src
> ./mc foo.mc
“`
This will create `foo.asm` and `foo.dbg`. The `dbg` file shows the compiler intermediate representations at each of the major phases of compilation.
##### To run the test system:
“`bash
> cd src
> ./test/test.sh
“`
This will compile all of the miniC source files in `test/` then pipe them through the MIPS simulator and compares the results against known results in `test/ans/`. Your compiler passes the test system if the output looks like:
“`bash
———- a ———-
———- b ———-
———- basic_ops ———-
———- c ———-
———- cond1 ———-
———- cond2 ———-
———- cond2_inv ———-
———- d ———-
———- e ———-
———- f ———-
———- fact ———-
———- g ———-
———- gcd ———-
———- gcdImperative ———-
———- t1 ———-
———- totient ———-
———- twofacts ———-
“`
#### Working on the Control & Codegen Phases in Parallel
The input to the `Codegen` phase is the output of the `Control` phase. The harness code is designed to allow your team to work on `Codegen` before `Control` is finished. To test out `Codegen` you can type:
“`bash
> make -f Makefile.try
> try codegen 0
“`
This will create a file `see.dbg` showing the result of the `Lift` transformation on a sample data structure (defined in `try.ml`).
**What to Submit**
Each team should push one repository by the due date with the commit message “Final: NAMES”, where includes the full names of both participants. The OCaml source files should contain comments with the surnames of the team members.