Assignment 2 Description
Assignment 2 – Writing an Assembler Weighting and Due Dates
• Marks for this assignment contribute 10% of the overall course mark.
• Marks for functionality will be awarded automatically by the web submission
system.
• Due dates: Milestone – 11:55pm Tuesday of week 9, Final – 11:55pm Friday of
week 9.
• Late penalties: For each part, the maximum mark awarded will be reduced by
25% per day / part day late. If your mark is greater than the maximum, it will
be reduced to the maximum.
• Core Body of Knowledge (CBOK) Areas: abstraction, design, hardware and
software, data and information, and programming.
Project Description
In this assignment you will complete a variation of project 6 in the Nand2Tetris
course. A detailed description of Nand2Tetris Project 6 tailored to this course is shown below. In this assignment the assembler will be written as two separate programs. The executable program, parser, will read a Hack Assembly Language program from standard input and produce an abstract syntax tree on standard output. The executable program, translator, will read the abstract syntax tree from
standard input and assemble a machine code representation of the original Hack Assembly Language program. The assembled code will be formatted as sixteen zeros or ones per line and it will be written to standard output. In addition to an assembler you will also write a program to dis-assemble machine code. The executable program, disassembler, will read a Hack Machine Code program from
standard input and write an equivalent Hack Assembly Language program to standard output.
SVN Repository
You must create a directory in your svn repository
named:
• Makefile – this file is used by make to compile your programs – do not
modify this file.
• Makefile-extras – this file is included by Makefile – do not modify this file.
• parser.cpp – C++ source file
• parser – executable script that will run your compiled parser program – do not
modify this file.
• translator.cpp – C++ source file
• translator – executable script that will run your compiled translator program – do not modify this file.
• disassembler.cpp – C++ source file
• disassembler – executable script that will run your
compiled disassembler program – do not modify this file.
• bin – this directory contains precompiled programs and scripts – do not modify
this directory.
• lib – this directory contains precompiled components – do not modify this directory.
• includes – this directory contains .h files for precompiled classes – do not modify this directory.
• tests – this directory contains test data, you can add your own tests here
Note: if the lib/macos/lib.a and lib/cats/lib.a files do not get added to your svn repository you will need to explicitly add them using:
% svn add lib/*/lib.a
Submission and Marking Scheme
Submissions for this assignment must be made to the web submission systemLinks to an external site. assignment named: Assignment 2 – Submit Here. The
assessment is based on “Assessment of Programming Assignments”.
Note: the Submit Here assignment will show a breakdown of your marks by
category but it will always show your total mark as capped at 0.
Your programs must be written in C++. Your programs will be compiled using the Makefile included in the zip file attached below. The parser program will be compiled using the file parser.cpp, the translator program will be compiled using the file translator.cpp file and the disassembler program will be compiled using the file disassembler.cpp file. No other .cpp or .h files should be present in your
svn directory. Your programs will then be tested using the set of test files that are attached below. In addition a number of secret tests will also be run. Note: if your program fails any of these secret tests you will not receive any feedback about these secret tests, even if you ask!
Assignment 2 – Milestone Submissions: due 11:55pm Tuesday of week 9
The marks awarded by the web submission systemLinks to an external site. for the
milestone submission contribute up to 20% of your marks for assignment 2. The marks for the Milestone Tests are used as the marks for the milestone
submission. Your milestone submission mark, after the application of late penalties, will be posted to the myuni gradebook when the assignment marking is complete.
Automatic Marking
The Milestone Tests are just those where the Hack Assembly language is
syntactically correct and does not use symbols. The marks for the Milestone Tests for each component program will be weighted as follows:
• parser – 40%
• translator – 40%
• disassembler – 20%
You can view the Milestone Tests marks in the Milestone assignment but
submissions must be made using the Assignment 2 – Submit Here assignment. Assignment 2 – Final Submissions: due 11:55pm Friday
of week 9
The marks awarded for the final submission contribute up to 80% of your marks for assignment 2.
Your final submission mark will be the geometric mean of three components, the marks for the Final Tests, a mark for your logbook and a mark for your code. It
will be limited to 20% more than the marks for the Final Tests. See “Assessment – Mark Calculations” for examples of how the marks are combined. The Final Tests are all tests run, including those used for the Milestone Tests. Your final submission mark, after the application of late penalties, will be posted to the myuni gradebook when the assignment marking is complete.
Automatic Marking
The marks for the Final Tests for each component program will be weighted as
follows:
• parser – 15%
• translator – 50%
• disassembler – 35%
You can view the Final Tests marks in the Final assignment but submissions must
be made using the Assignment 2 – Submit Here assignment.
Logbook Marking
Important: the logbook must have entries for all work in this assignment, including your milestone submissions. See “Assessment – Logbook Review” for details of how your logbook will be assessed.
Code Review Marking
For each of your programming assignments you are expected to submit well
written code. See “Assessment – Code Review” for details of how your code will be assessed.
Assignment 2 – Participation Marks
Any submissions to assignment 2 that are made more than one week before the due date for Final Submissions may be awarded up to 10 participation marks. The
participation marks will be the marks awarded for the Final Tests divided by 10. You can view the participation marks awarded in the One Week
Early assignment but submissions must be made using the Assignment 2 – Submit
Here assignment. The participation marks will be allocated to week 8.
Nand2Tetris Project 6: The Assembler
Background
Low-level machine programs are rarely written by humans. Typically, they are generated by compilers. Yet humans can inspect the translated code and learn
important lessons about how to write their high-level programs better, in a way that avoids low-level pitfalls and exploits the underlying hardware better. One of the key players in this translation process is the assembler — a program designed to translate code written in a symbolic machine language into code written in binary machine language.
This project marks an exciting landmark in our Nand to Tetris odyssey: it deals with building the first rung up the software hierarchy, which will eventually end
up in the construction of a compiler for a Java/C++ like high-level language. The relevant reading for this project is Chapter 6. Some of the useful tools available include, the Hack Assembler, the CPU Emulator and working versions of the three programs, bin/parser, bin/translator and bin/disassembler. The CPU Emulator actually uses its own builtin disassembler to display memory containing
instructions. Objective
The Hack assembler is a relatively simple program however, so that you can gain experience with the tools used in other workshops and assignments, you will build
your assembler from three parts. This will involve using a precompiled tokeniser for the Hack assembly language to implement a parser that recognises labels, A- instructions and C-instructions using tokens returned by the tokeniser.
The parser will construct a tree representation of the program that
the translator will walk over in order to assemble the final machine code.
The disassembler will use a precompiled tokeniser for Hack machine code to construct a Hack assembly language program.
Compiling and Running Your Programs
You programs can be compiled with one of the following four commands. To compile just parser:
% make parser
To compile just translator:
% make translator
To compile just disassembler: % make disassembler
To compile all three programs and test your programs at the same time you can use the following command:
% make
g++ –std=c++11 -I. -Iincludes -o lib/cats/parser parser.cpp lib/cats/lib.a Testing student parser against Pxml files.
| Test Input | parser | Exp
ected Test Output | Test Result
Checking “cat tests/Add.milestone | ./parser | ./bin/diffc – tests/Add. milestone.Pxml ” – test passed
…
g++ –std=c++11 -I. -Iincludes -o lib/cat/translator translator.cpp lib/cats/lib.a
Testing student translator against Pxml-hack files.
Checking “cat tests/Add.milestone | ./bin/parser | ./translator | ./bin/di ffc – tests/Add.milestone.Pxml-hack ” – test passed
…
Testing student parser and translator against Pxml-hack files.
Checking “cat tests/Add.milestone | ./parser | ./translator | ./bin/diffc
| Test Input | bin/parser | translator |
Expected Test Output | Test Result
| Test Input | parser | translator |
Expected Test Output | Test Result
– tests/Add.milestone.Pxml-hack ” – …
g++ –std=c++11 -I. -Iincludes -o lib/cats/disassembler disassembler.cpp lib/cats/ lib.a
Testing student disassembler against asm files.
test passed
| Test Input | disassembler | Exp
ected Test Output | Test Result
Checking “cat tests/Add.hack | ./disassembler N | ./bin/diffc – tes ts/Add.hack.asm ” – test passed
…
The test scripts do not show the program outputs, just passed or failed, but they do show you the commands being used to run each test. You can copy-paste these
commands if you want to run a particular test yourself and see all of the output.
If you wish to add new tests, create appropriately named files in the tests directory and then use the command:
% make test-add
Note: If you want to see what your parser or translator programs should be producing you can use the expected output files in the tests directory as a
reference.
Note: Do not modify the Makefile or the sub-
directories includes, bin and lib. They will be replaced during testing by the web
submission system. The Tokeniser
You must use the precompiled tokeniser functions provided by the library in the zip file attached below. The tokens recognised and the tokeniser functions are described in the file includes/tokeniser.h. All error handling is silent, if the
tokeniser encounters a character that cannot be part of a token or a character that is not whitespace, it simply returns the tk_eoi token. Once the tk_eoi token is
returned all future attempts to read a token will also return the tk_eoi token.
A set of extra token kinds are also provided that can be used with
the mustbe() and have() functions to check for membership of a particular group of tokens. For example, the token kind tk_dest can be used to check if a token is a destination component of a C instruction, ie one
of tk_dest_A, tk_dest_M, tk_dest_D, tk_dest_MD, tk_dest_AM, tk_dest_AD, tk_d
est_AMD or tk_null. All of the available groups are described in the file includes/tokeniser.h.
The Assembly Language Grammar
The following table describes the structure of an assembly language program that must be translated into machine code. All comments and whitespace, except for newline characters, will have been removed by the tokeniser. Therefore these
rules are defined purely in terms of the tokens returned by the tokeniser.
Term
program
line
instruction ::= tk_label | A-instr | C-instr A-instr ::= tk_name | tk_number
C-instr ::= tk_dest? tk_alu_op tk_jump?
Definition
::= line* tk_eoi
::= instruction? tk_eol
Notes:
• in a definition the or operator | separates alternative components
• in a definition the question mark character ? indicates that the preceding component may appear 0 or one times
• in a definition the star character * indicates that the preceding component may appear 0 or more times
Abstract Syntax Tree
An abstract syntax tree is just a tree like data structure that holds a representation
of a program. In the case of Hack Assembly language, the abstract syntax tree consists of a root node which has one sub-tree for every label, a-instruction or c- instruction in the program. In addition to the a-instruction node there is also an a- name node to represent a-instructions where the value is still described by a variable or label name. Descriptions of the functions required to create abstract syntax tree nodes and to access their contents can be found in the
file includes/abstract-syntax-tree.h.
All abstract syntax tree nodes are immutable – they cannot be changed – so the a new node cannot be created until all of its sub-trees have been created. This makes it impossible to construct loops in a tree which greatly simplifies their implementation. All tree nodes are referenced by a value of type ast which is simply an integer index into a private table. All tree nodes remain in existence for
the lifetime of the running program that creates them. Parser
The skeleton parser.cpp file has most of the parsing structure already written, you just need to complete the functions that parse labels, A-instructions and C- instructions. The main() function of the parser program prints out an XML
representation of the abstract syntax tree using the precompiled library function ast_print_as_xml(). You do not need to know how to write out XML. XML is produced by the parser because it is easy to read by humans.
If your parser encounters any errors, it must exit immediately and have not produced any output. Examples of errors include, multiple definitions of a
label, an A-instruction with a number that is too large, or a C-instruction with multiple destination components, multiple jump components or no ALU component. In this two program structure, the parser will not be able to directly detect the first two kinds of errors. Multiple definitions of a label will not be detectable until the translator walks the abstract syntax tree and a number that is too large will cause the tokeniser to report that it is at the end of the input. The other errors can be detected by the parser and should result in an immediate exit
with no output.
The iobuffer.h include file gives details of functions you can use to manage when output is produced by your programs.
Translator
The executable program, translator, will read the abstract syntax tree and assemble a machine code representation of the original Hack Assembly Language program. The assembled code will be formatted as sixteen zeros or ones per line and it must
be written to standard output using the write_to_output() function described in includes/iobuffer.h.
The translator program uses the precompiled function ast_parse_xml(), to read the output of the parser program. You do not need to know how to read in and interpret XML. The later workshops and assignments will use this technique to allow their compilers to be written as separate programs.
One advantage of this two program approach is that walking over the tree more
than once is really simple. The skeleton translator.cpp file includes an example of how to walk over the abstract syntax tree produced by the parser. This is important in the translator when it comes to resolving symbols. When a program includes an A-instruction such as @somename you cannot tell if somename is a label or a variable until the entire program has been inspected because the label definition may come later. This will be true for jumps to a later part of a program. We solve this problem by walking over the abstract syntax tree and record all the
labels that we can find. This is Pass 1. When we walk over the abstract syntax tree a second time, Pass 2, every label name will already have been defined. Therefore, all undefined names found in Pass 2 must be variable names.
If your translator encounters any errors, it must exit immediately and have not produced any output. It may be interesting to reflect on why we did not include
missing label definitions amongst the list of errors to be detected. Lookup Tables
Your translator program will need a way of generating the correct set of bits for each part of a C-instruction. For example, “JMP” needs to be mapped to “111”. To
do this you will need some sort of lookup table to record these mappings.
The translator program also needs a lookup table so that it can implement a symbol table to record label addresses and the memory locations for variables. You should use the precompiled symbol table classes described in
the includes/symbols.h file to create any lookup tables that you require.
The includes/symbols.h file includes examples of how to use these lookup tables.
You will need several lookup tables, one for the symbol table and one for each component of a C-instruction. Although most parts of a C-instruction are unique, the M, A and D registers need to be mapped to different sets of bits depending on whether they are used as an alu operation or a destination. For example, “A” as a destination would be “100” but as an alu operation it would be “0110000”.
Disassembler
The disassembler program uses its own tokeniser, which returns one token per instruction to parse a Hack machine code representation of a program. Details of the tokens, tokeniser functions and the grammar that is recognised are all described in the includes/tokeniser-d.h file.
The new Assembly language code that you generate must be formatted as follows and must be written to standard output using the write_to_output() function described in includes/iobuffer.h:
• Labels are on their own line, the opening bracket, ‘(‘, must be at the start of the line.
• A-instructions are on their own line prefixed by 8 spaces.
• C-instructions are on their own line prefixed by 8 spaces.
• If the jump or destination components of a C-instruction are null, they are not
written out.
• If the alu operation component of a C-instruction is not recognised, use the
string “< ** UNDEFINED ALU OPERATION ** >”
• Apart from the 8 space prefixes and within the alu error message, no other
white space can appear on a line of output.
No symbols
If the disassembler program is passed the command line argument “N”,
the disassemble_no_symbols() function will be called. This function will output all
A-instructions in numeric form and there will be no labels or variable names in the output.
Symbols
If no command line arguments are passed to the disassembler program,
the disassemble_symbols() function will be called and A-instructions must be
output in symbolic form where possible. The rules for generating label and variable names are as follows:
If an A-instruction is immediately followed by a C-instruction with a jump (the jump bits are “000”) and the A-instruction contains the ROM address of one of instructions in the program, the A-instruction must be written out using the label
name for that instruction. The first instruction in the program that is the target of a jump is given the label “L1”, the next instruction that is a target is given the label “L2”, and so on. This will require a list of label names to be kept and two passes over the program’s machine code.
If an A-instruction is immediately followed by a C-instruction that reads or writes
RAM and the address in RAM can be given a name, the A-instruction must be written out using that name.
If the RAM address has a predefined name in Assembly language, then that name must be used. Note: addresses 0 to 4 must be named SP, LCL, ARG, THIS or THAT rather than R0, R1, R2, R3 or R4.
If the RAM address is in the range 16 to 255 it may be the address of a variable.
This will require knowing the address of the next free variable in memory. Remember that in the Hack Assembly Language, the first variable is allocated address 16, the next 17 and so on. So, if the address is at least 16 and it is less than or equal to the address of the next free variable, the address refers to a variable and the A-instruction must be written out using the variable’s name. If the address is the address of the next free variable, the address of the next free
variable is also incremented by 1. Variable names can be calculated from the address by subtracting 16 and prefixing the result with “v_”. The first variable is at
address 16 and is named “v_0”, the next is at address 17 and is named “v_1”, and so on.
Disassembler Lookup Tables
Your disassembler program will need a way of generating the correct strings for each part of a C-instruction. For example, for a jump , “111” needs to be mapped to “JMP”. This can be easily done by copying the table initialisation code from
your translator program and swapping the order of the arguments. For the
predefined symbols do not include the mappings for “R0” to “R4” so that the names “SP”, “LCL”, “ARG”, “THIS” and “THIS” will be returned instead.
Test Programs
In addition to the tests in the tests directory, we will use a number of secret tests that may contain various errors. Note: these tests are secret, if your programs fail any of these secret tests you will not receive any feedback about these secret tests,
even if you ask!
The Pong program supplied in the tests directory was written in the Java-like high-level Jack language and translated into the Hack assembly language by the Jack compiler and a Hack Virtual Machine translator. (The Virtual Machine, Jack and the Jack compiler are described in Chapters 7-8, 9 and 10-11, respectively).
Although the original Jack program is only about 300 lines of Jack code, the executable Pong code is naturally much longer. Running this interactive program in the supplied CPU Emulator is a slow affair, so don’t expect a high-powered
Pong game. This slowness is actually a virtue, since it enables your eye to track the graphical behavior of the program. And don’t worry! as we continue to build the
software platform in the next few projects, Pong and and other games will run much faster.
Startup Files
The startup files in the attached zip file should work on most 64-bit Linux systems
and on a Mac. Please see the Startup Files for Workshops and Assignments page for more information.
• assignment-assembler.zip