CSC 348: Introduction to Compiler Design: 2018 Winter
Final Project
Last modified 2018 March 12
Purpose:
To go over everything, esp. the inner-workings of a compiler.
Assignment:
Please implement a subset of Pascal (tinyPascal) in a simplified virtual machine known as spvm
.
I am giving you the:
- the tokenizer
- the grammar
- the symbol table
You have to fill-in two C++ methods (compile()
and writeAssembly()
) of class Node
and its derived classes.
Our tinyPascal only has types:
boolean
integer
real
(what C callsfloat
)string
It does not have:
- procedures or functions
- pointers or programmer-defined datatypes
spvm Assembly Language
- Each spvm machine word can hold an integer, boolean, string or float. (It can hold other things too, let us not worry about all that.)
- Unlike with a CPU, constants are not in instructions. Instead, there is a constant pool that stores all the constants refered to by the source code.
- Unlike with a CPU, are no registers. All values are on the stack as offsets from the stack pointer. (Just call them variables.)
- Variables will be assigned their stack offsets by the assembler and will known by their offsets. For example, the first four are
@thisSubj
,@thisJust
,@retSubj
and@retJust
. All variables will automatically be given offsets on the stack, therefore you do not have to worry about. The constant and variable declaration section at the beginning of every program will tell the stack offsets of all variables. In the assembly language variables are prefaced byVAR_PREFIX_CHAR
(=@
). You do not have to worry about adding it, the output operator does that for you already. - Like most assembly languages, there are labels to define where
goto
s actually go. Labels are postfixed by the%
. When they label code they must be preceded with colons (LABEL_POSTFIX_CHAR
). However, do not include the colon in thegotoPoin
instructions.gotoPoin jumpOver% # No ":" in goto noOpPoin # This is not done jumpOver% : # A ":" in used label noOpPoin # This *is* done
- Programs have layout:
$beginPreAsm %beginVarDecl ; Variables and named constants go here %endVarDecl %beginCode ; Assembly language code goes here %endCode $endPreAsm
where all you have to do is print the Assembly Language.
$beginPreAsm
,$endPreAsm
,%beginCode
,%endCode
, and the whole variable declaration section will be printed automatically.For example, this loop:PROGRAM eight; VAR i,sum : Integer; BEGIN sum := 0; FOR i := 1 TO 10 DO sum := sum + i; WRITELN(sum) END.
Could be translated as:
$beginPreAsm %beginVarDecl %var @compilerVar1, Integer, 0 %var @compilerVar2, Integer, 0 %var @compilerVar3, Integer, 0 %var @compilerVar4, Integer, 0 %var @i, Integer, 0 %var @sum, Integer, 0 %endVarDecl %beginCode loadPoin @compilerVar1,0 copyPoin @sum,@compilerVar1 loadPoin @compilerVar2,1 copyPoin @i,@compilerVar2 label0%: loadPoin @compilerVar3,10 greaterPoin @compilerVar4,@i,@compilerVar3 ifTrueGotoPoin @compilerVar4,label1% addPoin @sum,@i copyPoin @sum,@sum addPoin @i, 1 gotoPoin label0% label1%: stdOutPrintLnPoin @sum %endCode $endPreAsm
- You will have the following machine instructions:
noOp
Does nothing copyPoin @destVar, @srcVar
Does @destVar = @srcVar
loadPoin @destVar, i
(i
is an integer constant)Does @destVar = i
loadPoin @destVar, r
(r
is a real constant)Does @destVar = r
loadPoin @destVar, "s"
("s"
is a string constant)Does @destVar = "s"
addPoin @destVar, @srcVar
Computes @destVar += @srcVar
(Important:@srcVar
and@destVar
must either both hold integers or both hold reals.)subPoin @destVar, @srcVar
Computes @destVar -= @srcVar
(Important:@srcVar
and@destVar
must either both hold integers or both hold reals.)mulPoin @destVar, @srcVar
Computes @destVar *= @srcVar
(Important:@srcVar
and@destVar
must either both hold integers or both hold reals.)divPoin @destVar, @srcVar
Computes @destVar /= @srcVar
(Important:@srcVar
and@destVar
must either both hold integers or both hold reals.)intToRealPoin @destVar, @srcVar
Puts the floating point representation of integer @srcVar
in@destVar
lesserPoin @destVar, @lhs, @rhs
Computes @destVar = (@lhs < @rhs)
(Important:@srcVar
and@destVar
must hold same type.)lesserEqualPoin @destVar, @lhs, @rhs
Computes @destVar = (@lhs <= @rhs)
(Important:@srcVar
and@destVar
must hold same type.)greaterPoin @destVar, @lhs, @rhs
Computes @destVar = (@lhs > @rhs)
(Important:@srcVar
and@destVar
must hold same type.)greaterEqualPoin @destVar, @lhs, @rhs
Computes @destVar = (@lhs >= @rhs)
(Important:@srcVar
and@destVar
must hold same type.)intEqualPoin @destVar, @lhs, @rhs
Computes @destVar = (@lhs == @rhs)
(Important:@srcVar
and@destVar
must both be integers.)intNotEqualPoin @destVar, @lhs, @rhs
Computes @destVar = (@lhs != @rhs)
(Important:@srcVar
and@destVar
must both be integers.)gotoPoin label%
Jumps to label%
ifTrueGotoPoin @var,label%
Jumps to label%
only when@var == true
ifFalseGotoPoin @var,label%
Jumps to label%
only when@var == false
stdOutPrintPoin @var
Prints the value @var
with no ending newlinestdOutPrintLnPoin @var
Prints the value @var
with an ending newline
tinyPascal
tinyPascal only has 4 types: boolean
, integer
, real
and string
. It supports the following Pascal operators and functionality:
Declaration of integers, booleans and reals |
Binary integer math ops (+ - * DIV ) |
Binary real math ops (+ - * / ) |
Comparison (< > <= >= = < > ) |
Assignment (:= ) |
if then and if then else |
Grouping statements with begin and end |
Real Pascal is case-insensitive. tinyPascal recognizes 3 “cases”:
- lowercase
- UPPERCASE
- Capitalized
What you must do:
- Download
20178-2Win_CSC348_finalProj.zip
- Write the methods
compile()
andwriteAssembly()
for all classes exceptNode
(where it is abstract),BooleanConstantNode
(I gave to you so you have a basic example),IntegerConstantNode
(I gave to you so you have a basic example), andStatementListNode
(which uses C++ iterators, which have a funky syntax, so I just gave it to you).All of the work you have to do is inNode.h
, exceptVariableNode::compile()
which is intinyPascal.y
.All classes all the member vars they need, and their constructors already initialize them.
- To make the program, type:
make
A Word About C++ in general (and this program in particular)
- Both
compile()
andwriteAssembly()
are recursive, tree-walking methods. - C++ has both pointers and references, so methods can “return” more than one thing. For example, check out the
compile()
method ofclass BooleanConstantNode
(akaBooleanConstantNode::compile()
in C++ parlance):VariableMention* compile (simpleType_ty& simpleType ) { varPtr_ = new VariableMention(BOOLEAN_ST); simpleType = BOOLEAN_ST; return(varPtr_); }
Obviously it returns the address of a new instance of
VariableMention
(the value in member variablevarPtr_
).Additionally, it sets the argumentsimpleType
to valueBOOLEAN_ST
. So, if you call it like:simpleType_ty whatsMyReturnType; VariableMention* varPtr; varPtr = nodePtr->compile(whatsMyReturnType);
then not only will
varPtr
be set to the address of theVariableMention
instance that will receives the value, but alsowhatsMyReturnType
will be set toBOOLEAN_ST
. Sometimes it will be necessary to check this return type, to ensure it agrees with what you expect (e.g.if
-statements require boolean conditions)(For the curious, just like the typeVariableMention*
signifies “a pointer to aVariableMention
instance“, so the typesimpleType_ty&
signifies “a reference to asimpleType_ty
instance“. The argument that you send to it must be asimpleType_ty
variable, and that variable can have its value changed.)
Examples:
PROGRAM one; VAR i : integer; BEGIN i := 1 END. |
$beginPreAsm %beginVarDecl %var @compilerVar1, Integer, 0 %var @i, Integer, 0 %endVarDecl %beginCode loadPoin @compilerVar1,1 copyPoin @i,@compilerVar1 %endCode $endPreAsm |
PROGRAM two; VAR i : integer; BEGIN i := 2 * 3; WRITE(i); END. |
$beginPreAsm %beginVarDecl %var @compilerVar1, Integer, 0 %var @compilerVar2, Integer, 0 %var @i, Integer, 0 %endVarDecl %beginCode loadPoin @compilerVar1,2 loadPoin @compilerVar2,3 mulPoin @compilerVar1,@compilerVar2 copyPoin @i,@compilerVar1 stdOutPrint @i %endCode $endPreAsm |
PROGRAM three; VAR b : BOOLEAN; BEGIN b := true; END. |
$beginPreAsm %beginVarDecl %var @b, Boolean, false %var @compilerVar1, Boolean, false %endVarDecl %beginCode loadPoin @compilerVar1,true copyPoin @b,@compilerVar1 %endCode $endPreAsm |
PROGRAM five; VAR i : Integer; b : Boolean; BEGIN i := 2; b := (i = i); WRITELN(b) END. |
$beginPreAsm %beginVarDecl %var @b, Boolean, false %var @compilerVar1, Integer, 0 %var @compilerVar2, Boolean, false %var @i, Integer, 0 %endVarDecl %beginCode loadPoin @compilerVar1,2 copyPoin @i,@compilerVar1 intEqualPoin @compilerVar2,@i,@i copyPoin @b,@compilerVar2 stdOutPrintLn @b %endCode $endPreAsm |
PROGRAM six; VAR i,j: Integer; result : Boolean; BEGIN i := 1; j := 2; IF i = j THEN result := true END. |
$beginPreAsm %beginVarDecl %var @compilerVar1, Integer, 0 %var @compilerVar2, Integer, 0 %var @compilerVar3, Boolean, false %var @compilerVar4, Boolean, false %var @i, Integer, 0 %var @j, Integer, 0 %var @result, Boolean, false %endVarDecl %beginCode loadPoin @compilerVar1,1 copyPoin @i,@compilerVar1 loadPoin @compilerVar2,2 copyPoin @j,@compilerVar2 intEqualPoin @compilerVar3,@i,@j ifFalseGotoPoin @compilerVar3,label0% loadPoin @compilerVar4,true copyPoin @result,@compilerVar4 gotoPoin label0% label0%: %endCode $endPreAsm |
PROGRAM seven; VAR i,j: Integer; result : Boolean; BEGIN i := 1; j := 2; IF i = j THEN result := true ELSE result := false END. |
$beginPreAsm %beginVarDecl %var @compilerVar1, Integer, 0 %var @compilerVar2, Integer, 0 %var @compilerVar3, Boolean, false %var @compilerVar4, Boolean, false %var @compilerVar5, Boolean, false %var @i, Integer, 0 %var @j, Integer, 0 %var @result, Boolean, false %endVarDecl %beginCode loadPoin @compilerVar1,1 copyPoin @i,@compilerVar1 loadPoin @compilerVar2,2 copyPoin @j,@compilerVar2 intEqualPoin @compilerVar3,@i,@j ifFalseGotoPoin @compilerVar3,label0% loadPoin @compilerVar4,true copyPoin @result,@compilerVar4 gotoPoin label1% label0%: loadPoin @compilerVar5,false copyPoin @result,@compilerVar5 label1%: %endCode $endPreAsm |
PROGRAM eight; VAR i,sum : Integer; BEGIN sum := 0; FOR i := 1 TO 10 DO sum := sum + i; WRITELN(sum) END. |
$beginPreAsm %beginVarDecl %var @compilerVar1, Integer, 0 %var @compilerVar2, Integer, 0 %var @compilerVar3, Integer, 0 %var @compilerVar4, Integer, 0 %var @i, Integer, 0 %var @sum, Integer, 0 %endVarDecl %beginCode loadPoin @compilerVar1,0 copyPoin @sum,@compilerVar1 loadPoin @compilerVar2,1 copyPoin @i,@compilerVar2 label0%: loadPoin @compilerVar3,10 greaterPoin @compilerVar4,@i,@compilerVar3 ifTrueGotoPoin @compilerVar4,label1% addPoin @sum,@i copyPoin @sum,@sum add @i, 1 gotoPoin label0% label1%: stdOutPrintLn @sum %endCode $endPreAsm |
PROGRAM nine; VAR i,sum : Integer; BEGIN sum := 0; FOR i := 10 DOWNTO 1 DO sum := sum + i; WRITELN(sum) END. |
$beginPreAsm %beginVarDecl %var @compilerVar1, Integer, 0 %var @compilerVar2, Integer, 0 %var @compilerVar3, Integer, 0 %var @compilerVar4, Integer, 0 %var @i, Integer, 0 %var @sum, Integer, 0 %endVarDecl %beginCode loadPoin @compilerVar1,0 copyPoin @sum,@compilerVar1 loadPoin @compilerVar2,10 copyPoin @i,@compilerVar2 label0%: loadPoin @compilerVar3,1 lesserPoin @compilerVar4,@i,@compilerVar3 ifTrueGotoPoin @compilerVar4,label1% addPoin @sum,@i copyPoin @sum,@sum add @i, -1 gotoPoin label0% label1%: stdOutPrintLn @sum %endCode $endPreAsm |
PROGRAM hello; BEGIN writeln('Hello world') END. |
$beginPreAsm %beginVarDecl %var @compilerVar1, String, "" %endVarDecl %beginCode loadPoin @compilerVar1,"Hello world" stdOutPrintLn @compilerVar1 %endCode $endPreAsm |
PROGRAM eleven; VAR i : INTEGER; BEGIN FOR i := 1 TO 10 DO WRITE(i); END. |
$beginPreAsm %beginVarDecl %var @compilerVar1, Integer, 0 %var @compilerVar2, Integer, 0 %var @compilerVar3, Integer, 0 %var @i, Integer, 0 %endVarDecl %beginCode loadPoin @compilerVar1,1 copyPoin @i,@compilerVar1 label0%: loadPoin @compilerVar2,10 greaterPoin @compilerVar3,@i,@compilerVar2 ifTrueGotoPoin @compilerVar3,label1% stdOutPrint @i add @i, 1 gotoPoin label0% label1%: %endCode $endPreAsm |
PROGRAM twelve; VAR i : INTEGER; r : REAL; BEGIN i := 2; r := 2.5; WRITELN(i * r) END. |
$beginPreAsm %beginVarDecl %var @compilerVar1, Integer, 0 %var @compilerVar2, Real, 0.0 %var @compilerVar3, Real, 0.0 %var @i, Integer, 0 %var @r, Real, 0.0 %endVarDecl %beginCode loadPoin @compilerVar1,2 copyPoin @i,@compilerVar1 loadPoin @compilerVar2,2.5 copyPoin @r,@compilerVar2 intToReal @compilerVar3,@i mulPoin @compilerVar3,@r stdOutPrintLn @compilerVar3 %endCode $endPreAsm |
PROGRAM thirteen; VAR i : INTEGER; r : REAL; BEGIN i := 2; r := 2.5; WRITELN(r * i) END. |
$beginPreAsm %beginVarDecl %var @compilerVar1, Integer, 0 %var @compilerVar2, Real, 0.0 %var @compilerVar3, Real, 0.0 %var @i, Integer, 0 %var @r, Real, 0.0 %endVarDecl %beginCode loadPoin @compilerVar1,2 copyPoin @i,@compilerVar1 loadPoin @compilerVar2,2.5 copyPoin @r,@compilerVar2 intToReal @compilerVar3,@i mulPoin @r,@compilerVar3 stdOutPrintLn @r %endCode $endPreAsm |
PROGRAM fourteen; VAR r0 : REAL; r1 : REAL; BEGIN r0 := 2.5; r1 := 2.5; WRITELN(r0 * r1) END. |
$beginPreAsm %beginVarDecl %var @compilerVar1, Real, 0.0 %var @compilerVar2, Real, 0.0 %var @r0, Real, 0.0 %var @r1, Real, 0.0 %endVarDecl %beginCode loadPoin @compilerVar1,2.5 copyPoin @r0,@compilerVar1 loadPoin @compilerVar2,2.5 copyPoin @r1,@compilerVar2 mulPoin @r0,@r1 stdOutPrintLn @r0 %endCode $endPreAsm |
PROGRAM bad1; VAR i : INTEGER; BEGIN i := true END. |
Error: Variable expression type mismatch |
PROGRAM bad2; VAR b : BOOLEAN; BEGIN b := 0 END. |
Error: Variable expression type mismatch |
PROGRAM bad3; VAR i : Integer; BEGIN i := (1 + true); END. |
Error: Attempt to do mathematical operation on non mathematical operand |
PROGRAM bad4; VAR i : Integer; BEGIN i := 'two' * 2 END. |
Error: Attempt to do mathematical operation on non mathematical operand |
PROGRAM bad5; VAR b : Boolean; BEGIN IF 6 THEN b := true ELSE b := false END. |
Error: Expected boolean expression if condition |
PROGRAM bad6; BEGIN sum := 0; END. |
Error: Undeclared variable |