9/27/21, 10:10 PM Milestone-2: (HA) 3574_Applied Software Design
https://canvas.vt.edu/courses/136167/pages/milestone-2 1/10
Milestone-2
Github Invitation Link: (https://classroom.github.com/a/vbNs2A4Z)
Introduction
The goal of this milestone is to develop the parser for our simulator. The parser will consume tokens
from the lexer and determine if it is a syntactically correct assembly program (grammatically correct).
In addition to the modules from milestone 0 (token and lexer) this milestone adds the parser
( parser.hpp , parser.cpp ). Although it is not evaluated as part of this milestone, you should also
design and begin implementation of your own modules for the data structures representing:
memory, as a sequence of bytes,
the association between constants and values,
the association between labels and memory locations,
the program, as a sequence of instructions, and
the association between labels and instructions.
Combined with the lexer from milestone 0 this will allow us to go from MIPS assembly source files to
data ready for interpretation/simulation in milestone 2. In this milestone you will get practice with
module design and implementation, as well as unit testing. If you were unable to complete milestone
0, you should do so fully before attempting this milestone.
You can use any of your code from milestone 0 (lexer.cpp and any tests you wrote) as a starting
point. You may need to modify and/or extend it as you work on this milestone.
Recommended Background Reading
A Primer on Parsing (https://canvas.vt.edu/courses/136167/pages/a-primer-on-parsing)
A Primer on Command Line Arguments (https://canvas.vt.edu/courses/136167/pages/a-primer-
on-command-line-arguments)
Sections 5 and 6 of “MIPS Assembly Language Programming using QtSpim”, by Ed Jorgensen,
2016.
A BNF tutorial (http://matt.might.net/articles/grammars-bnf-ebnf/) by Professor Matt Might at
UAB.
Format of a MIPS assembly program
MIPS-I is a 32-bit CPU. The word size is 32-bits or 4 bytes long. A half-word is 16-bits or two bytes,
and a byte is 8-bits. Memory is byte-addressable, meaning that individual bytes can be accessed.
The CPU is word aligned and the CPU expects words to start at a memory address that is a multiple
https://classroom.github.com/a/vbNs2A4Z
https://canvas.vt.edu/courses/136167/pages/a-primer-on-parsing
https://canvas.vt.edu/courses/136167/pages/a-primer-on-command-line-arguments
http://matt.might.net/articles/grammars-bnf-ebnf/
9/27/21, 10:10 PM Milestone-2: (HA) 3574_Applied Software Design
https://canvas.vt.edu/courses/136167/pages/milestone-2 2/10
of 4 (we will ignore this complication). Bytes, half, or words may be treated as unsigned, in which
case they are stored in normal binary, or signed, in which case they are stored using two’s
complement. To implement the various signed and unsigned instructions you will need to be familiar
with arithmetic operations in binary and two’s complement (this is covered in ECE 2544 and 2564).
MIPS is unique in that it can be either little-endian or big-endian
(https://en.wikipedia.org/wiki/Endianness) depending on CPU initialization. This refers to the way
multi-byte sequences (half and full words) are ordered in memory, most-significant byte at lower
addresses (big-endian) or least-significant byte at lower addresses (little-endian). We will assume a
little-endian byte order in our simulator. This primarily affects how memory is interpreted and how
transfers to and from registers move individual bytes.
The CPU has 32 32-bit general-purpose registers (labeled $0 to $31). There are also three additional
registers (in the core processor) that contain CPU specific information: the program counter ($pc)
and two registers used with the ALU ($hi and $lo). The co-processors (which we are not simulating)
have others.
MIPS-I has a conventional fetch-decode-execute cycle, however some branches and loads require
two clock cycles. This complicates the simulation significantly, so we will instead define
a virtual machine (VM) that executes instructions directly in C++ code. In particular we will not be
concerned with pipelines, caching, or the details of the data path. We will also not work with the
binary representation of code (machine code) nor be concerned with timing (clock cycles per
instruction). Instead, our simulation will initialize non-code memory and process each instruction as
determined by the program-counter, accessing and updating the registers and memory as required
as well as trapping instruction exceptions. The amount of memory available is fixed during a
simulation and (in future milestones) can be changed from the command line; by default is 1024
bytes.
The general-purpose registers also have aliases based on their typical usage (see table below). For
example, register $8 is also called temporary register 0, or $t0. Most of these aliases are conventions
useful for programming subroutines (procedures and functions) and operating system kernels. We
will just use them as simple aliases for the numbered registers.
Reference: Figure A.6.1 in Patterson and Hennessy
Number Alias Typical Usage
$0 $zero Hardwired to the value of zero
$1 $at Assembler Temporary (reserved for assembler pseudoinstructions)
$2-$3 $v0-$v1 expression evaluation and function results
$4-$7 $a0-$a3 first 4 arguments of procedures
$8-$15 $t0-$t7 Temporaries (not saved)
$16-$23 $s0-$s7 Saved Values
https://en.wikipedia.org/wiki/Endianness
9/27/21, 10:10 PM Milestone-2: (HA) 3574_Applied Software Design
https://canvas.vt.edu/courses/136167/pages/milestone-2 3/10
$24-$25 $t8-$t9 Temporaries (not saved)
$26-$27 $k0-$k1 for interrupt and exception handlers
$28 $gp the global pointer
$29 $sp stack pointer
$30 $fp frame pointer
$31 $ra return address
The grammar for our version of MIPS assembly can be split in two, depending if one is currently
parsing a data section or a text section. If a string token with value “.data” followed by an EOL is
encountered the parser switches to declaration mode and uses the data section grammar. If a string
token with value “.text” followed by an EOL is encountered the parser switches to text mode and
uses the text section grammar. One or the other must be encountered first in a file, or it is an error.
The respective grammars are defined below.
Data Section grammar
The general grammar for a data declaration is a lone label or an optional label followed by a layout
and value. Constants can also be defined in the data section using the EQUAL token. The EOL token
terminates any declaration.
The storage format is interpreted as:
.word followed by a comma separated list of 32 bit integers I. Allocate 4 bytes of space starting at
the last available address and fill with I, LSB first. It is an error if any integer is larger than 32 bits
or the required memory exceeds that specified.
.half followed by a comma separated list of 16 bit integers I. Allocate 2 bytes of space starting at
the last available address and fill with I, LSB first. It is an error if the integer is larger than 16 bits
or the required memory exceeds that specified.
.byte followed by a comma separated list of 8 bit integers. Allocate 1 byte of space starting at
the last available address and fill with the bytes, in order. It is an error if the integer is larger than
8 bits or the required memory exceeds that specified.
.space followed by an integer I specifying the amount of space in bytes. Allocate I bytes of space
starting at the last available address and fill with zeros. It is an error if the integer is larger than 32
9/27/21, 10:10 PM Milestone-2: (HA) 3574_Applied Software Design
https://canvas.vt.edu/courses/136167/pages/milestone-2 4/10
bits or the required memory exceeds that specified.
.ascii followed by a string of length L (not automatically null-terminated). Allocate L bytes of
space starting at the last available address and fill with the bytes of the string in order left-to-right.
It is an error if the required memory exceeds that specified.
.asciiz followed by a string of length L (automatically null-terminated). Same as .ascii but
terminate the string with a null (decimal value 0) character. This will allocate L+1 bytes. It is an
error if the required memory exceeds that specified.
We are going to ignore the concept of alignment – data of any type can be placed at any memory
word without restriction during parsing of the data section. We will assume the data section starts at
a byte address of 0.
Integer value’s in-memory representation is little endian (https://en.wikipedia.org/wiki/Endianness)
(least-significant byte first) with negative values in two’s complement
(https://en.wikipedia.org/wiki/Two%27s_complement) . So for example the following declarations
.data
x: .word 1, -1
y: .byte 42, -42
would result in a memory layout (each byte in hex, memory address 0 on the left):
0x01 0x00 0x00 0x00 0xFF 0xFF 0xFF 0xFF 0x2A 0xD6
and the labels x and y would map to the addresses 0x00000000 and 0x000000008 .
When parsing integers it is necessary to ensure they fit into the number of bits required, based on the
context. To simplify this, for the purposes of bounds checking, if an integer starts with ‘+’ or ‘-’ then it
should be treated as a signed type, otherwise it should be treated as unsigned. For example
.data
.byte 250 # this is ok 0 <= 250 <= 255
.byte +250 # this is an error because -128 <= +250 not<= +127
Text section grammar
Our simulator will work with only a subset of MIPS instructions including data movement
instructions, integer arithmetic instructions, logical instructions, and control instructions. The general
grammar for an instruction is an optional label followed by an operation followed by an EOL token.
An operation is a mnemonic opcode followed by a comma-separated (SEP token) list of arguments,
except in the case of the special nop (no operation) instruction.
https://en.wikipedia.org/wiki/Endianness
https://en.wikipedia.org/wiki/Two%27s_complement
9/27/21, 10:10 PM Milestone-2: (HA) 3574_Applied Software Design
https://canvas.vt.edu/courses/136167/pages/milestone-2 5/10
The number and interpretation of the arguments is context dependent, i.e. it depends on the opcode.
In general an argument is either a register, memory reference, or an immediate value. Registers,
denoted
numeric or alphanumeric sequence as specified in the table above. Memory references,
denoted
argument refers to the actual value, for example a label from the data section. In indirect addressing
the argument is surrounded by () and is treated like a pointer reference in C. The value inside the
parenthesis is treated as a memory location and the value at that location is read. A signed offset to
the address in bytes may be used as well. See section 6 of Jorgensen for more detail.
Immediate value arguments are just the value itself, are denoted
may be integers or (previously defined) constants. In the following
Data Movement Instructions:
Let look at some examples (see also section 5.3 of Jorgensen).
.data
x: .word 100
arr: .byte 10,11,12
.text
main:
# load word from location x into temporary register 0
lw $t0, x
# load address of arr into $t1
la $t1, arr
# and get first value (10) into $t2
lb $t2, ($t0)
# and get second value (11) into $t3
lb $t3, 1($t0)
# and get third value (12)into $t4
lb $t4, 2($t0)
Integer Arithmetic Instructions:
9/27/21, 10:10 PM Milestone-2: (HA) 3574_Applied Software Design
https://canvas.vt.edu/courses/136167/pages/milestone-2 6/10
Logical Instructions:
Parsing the token sequence
Your parser code can use the state machine technique in the background reading, but it is not a
requirement. You will likely need to re-factor your code so the complexity does not get out of hand
and become difficult to debug. Unit tests can be a big help here.
When a data declaration is successfully parsed, the VM memory should be appropriately initialized
and any labels or constants added. Once an instruction is successfully parsed it should be added to
the list of instructions held by the VM. Any instruction labels encountered should be added as well.
If the lexer or parser encounters an error is should halt and store or return the line (in the original file)
and reason for the error. This will be needed for error reporting.
We will use a driver program, called simmips , to do functional testing. This program will eventually
become our textual interface to the simulator, but for now it will just be used to open, lex, parse a file
and report any errors. The file name to read is specified as a command line argument (whatever the
9/27/21, 10:10 PM Milestone-2: (HA) 3574_Applied Software Design
https://canvas.vt.edu/courses/136167/pages/milestone-2 7/10
user types on the command line after the executable). If you need a primer on how to use command
line arguments, see the background reading.
For example, the following program invocation reads the assembly program in the file progn.asm and
attempts to lex and parse it (assuming Windows platform, $ as the command line prompt):
$ .\simmips.exe progn.asm
If a single file name argument is not specified, or the specified file cannot be opened, the program
should print an appropriate error message to standard error and return EXIT_FAILURE . If the input file
fails to lex or parse then an error message should be printed to standard error and the program
should return EXIT_FAILURE . The format of the error message should be:
Error:line: message
where line is the numeric line number of the input file (starting at 1) where the error occurred, and
message is an informative diagnostic message. The portion of the message up to the second ‘:’
should follow this format exactly, however the message text can vary as long as it does not exceed
one line. The parser should halt on the first error and the program report only one error. These
seemingly pedantic requirements are need for grading purposes.
Otherwise, if the input file correctly parses the program should print nothing to standard output or
standard error and should return EXIT_SUCCESS . See the header std::cstdlib
(http://en.cppreference.com/w/cpp/utility/program/EXIT_status) where these return codes are defined.
Testing
Each module of your code, generally a pair of files module.hpp / module.cpp , should have a
corresponding set of unit tests in test_module.cpp . These should use the Catch testing framework
( catch.hpp and main test driver unit_tests.cpp is provided). The quality of your tests is determined by
code coverage which should exceed 95% of lines. The building of these tests is setup via the
provided CMakeLists.txt .
Several CMake-driven functional tests are provided in the CMakeLists.txt file, using
the simmips program to read files from the tests subdirectory of the repository. Do not modify these
test input files. The functional tests can be run using the command cmake –build . –target test or by
building the RUN_TESTS project in Visual Studio or XCode. These are a large subset of the tests that
we will run when grading. Your code should pass these tests before your final submission.
The file test_config.hpp is written by cmake into the build directory (from the file test_config.hpp.in )
at configure time to set a string that points to the test files root directory. You can include this file in
your tests to determine those file names at run time.
http://en.cppreference.com/w/cpp/utility/program/EXIT_status
9/27/21, 10:10 PM Milestone-2: (HA) 3574_Applied Software Design
https://canvas.vt.edu/courses/136167/pages/milestone-2 8/10
Getting Started
Accept the GitHub invitation above and then clone the git repository to your local machine. The initial
git repository contains several files. Your task in this milestone is to implement the parser
functionality in the module parser.hpp / parser.cpp . You can (and probably should) define additional
units of code (classes, functions) and tests as additional files, but add them to the CMakeLists.txt files
in the indicated place near the top.
All unit tests should use the Catch testing framework (see meeting 7). The included CMakeLists.txt
file sets up building these tests for you. Just configure the build, run the build, and then run the tests.
You should use git to commit versions of your program source (only) as you go along, demonstrating
good incremental programming technique.
Steps to build and run the tests in the reference environment (after vagrant ssh).
To configure the build
cmake /vagrant
To run the build
make
or
cmake –build .
To run all tests
make test
or
cmake –build . –target test
The test output is placed in “Testing/Temporary/LastTest.log. You can also just run the unit tests
directly:
./unit_tests
To run your tests and check code coverage.
make coverage
or
9/27/21, 10:10 PM Milestone-2: (HA) 3574_Applied Software Design
https://canvas.vt.edu/courses/136167/pages/milestone-2 9/10
cmake –build . –target coverage
This will create a directory called Coverage_Report in the build directory. You can copy it back to
your host by doing cp -r Coverage_Report /vagrant . Open the index.html file inside it to examine
the detailed report.
To run the memory checks on your unit tests.
make memtest
or
cmake –build . –target memtest
This should report no “definitely lost” leaks and no access errors. Look for
LEAK SUMMARY:
definitely lost: 0 bytes in 0 blocks
and
ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Note: if your unit tests fail, the memory test will also fail.
To configure and run the build in strict mode (increased warnings, warnings become errors)
cmake -DSTRICT=True /vagrant
make clean; make
To Do
Since this module has several pieces, below is a list of suggested “todos” in order of priority:
1. clone the repository, implement a stub for the parser and a functioning version of
the simmips program (can compile and run).
2. implement the parser, writing unit tests for each data declaration and instruction as you go
3. finish implementing the parser so that it passes the provided functional tests
4. design and implement data structures to hold the data encountered while parsing, with unit tests
Todos 1-3 are required for milestone 2. The last todo will help you get started on milestone 3.
Submission
To submit your assignment:
1. Tag the git commit that you wish to be considered for grading as “final”.
9/27/21, 10:10 PM Milestone-2: (HA) 3574_Applied Software Design
https://canvas.vt.edu/courses/136167/pages/milestone-2 10/10
git tag final
2. Push this change to GitHub
git push origin final
If you need to tag a different version of your code as final (before the due date) simply create and
push a new tag, appending a monotonically increasing number to final, e.g. final2, final3, etc.
Be sure you have committed all the changes you intend to. It is a good idea to re-clone your
repository into a separate directory and double check it is what you intend to submit. Failure to
complete these steps by the due date will result in a failed submission.
Grading
There are 52 points allocated to this assignment. All requirements are evaluated with respect to the
course reference environment.
Correct files submitted (cmake configure) 1
Code Compiles 1
Functional Tests 38
Test Quality (coverage) 5
Memory Analysis (valgrind) 5
Code Compiles with No Warnings (STRICT mode) 1
Development Practices 1
Configuration and Compilation are all-or-nothing. Functional Tests, Test Quality, and Memory
Analysis portions of the grade are proportional (you can receive partial points). Good development
practices will be assessed by ensuring you have a regular commit history in your repository.
Correctness can be checked in the auto-grader (https://grader.ece.vt.edu/) and is determined by
the proportion of instructor tests that pass. The auto-grader uses the exact same environment as the
reference so if your code compiles there, it should in the auto-grader as well (but check anyway!).
You are rate-limited to only two submissions every two hours to the auto-grader so as to prevent you
from using it as your development environment and encourage proper debugging skills.
https://grader.ece.vt.edu/