CS 152 Lab 2: Pipelined RV32I Implementation
Due: 11:59 PM, March 13
Version 0.1. 2020-02-17
1. Overview
In this lab, you will take a non-pipelined implementation of an RV32I processor and improve its performance without losing correctness, using ideas covered in class, including (but not limited to) pipelining, branch prediction, and forwarding.
You will also extend the ISA with an additional instruction to support more efficient software.
The final goal of this lab is to modify the architecture such that it can run the provided neural network inference workload as fast as possible, while maintaining correctness.
2. The Environment
2.1. Setting up the environment
The environment is provided on openlab.ics.uci.edu.
1. Once logged in, either run the following command or add it to your .bashrc:
source /home/swjun/cs152/setup.sh
2. Copy the contents of /home/swjun/cs152/vim-bsv into your ~/.vim directory to enable syntax highlighting for Bluespec inside Vim.
3. Copy the directory /home/swjun/cs152/processor to your working directory. Now you are ready to build and execute the processor simulation.
2.2. Running the simulation
Go to your own processor directory, run "make", and then run ./obj/bsim.
In the default setting, it will run the MNIST handwriting recognition benchmark, which takes a few minutes to complete.
2.3. Modifying the environment (mmap.txt)
The file mmap.txt tells the simulation environment what files to load to memory before executing, and what MMIO output to expect.
In the default setting, the mmap.txt file loads sw/obj/microbench.bin to address zero and sets up multiple MMIO mappings to check for correct operation. Please look at sw/microbench.s, as well as the disassembled version sw/obj/microbench.dump, to see what the program does.
If you want to try changing the microbenchmark, edit the .s file and run make in the sw/ directory.
For testing with the neural network benchmark, back up the original mmap.txt somewhere, and copy mmap.txt.nn to mmap.txt. The new mmap.txt loads sw/nn/obj/mnist.bin to address zero, the trained neural network model sw/nn/data/modeli.dat to 0x2000, and the input data sw/nn/data/sample00_answer7.dat to 0x3000. It also sets up an MMIO mapping at address 0x5000 and checks that the value written to that address is 7, which is the correct answer. This mmap.txt includes commented-out lines for other data inputs as well, which you can use to further check correctness.
3. The Processor Code Structure
The code for the processor is stored in the directory src/. The main file to look at is Processor.bsv, and the other files implement various supporting modules and functions. For example, Decode.bsv implements the decode function, which is used by the doDecode rule in Processor.bsv. Execute.bsv implements the execute function, which is used by the doExecute rule in Processor.bsv.
4. The Goals
There are two goals for this lab: adding the multiply instruction and improving performance in terms of cycles per instruction (CPI).
4.1. Adding the multiply instruction
You will need to edit src/Decode.bsv as well as src/Execute.bsv to add decoding and execution for the one new instruction. Keep in mind that the MUL instruction is of type opOp (two register inputs and one register output), its funct7 field is 0x1, and its funct3 field is 0x0, as defined by the Bit#(3) constant fnMUL in Decode.bsv.
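As a reference for the encoding, here is a minimal, self-contained Bluespec sketch (assuming the standard RV32M encodings) of the MUL check and its result computation. The package and function names below are hypothetical; the real change is to extend the existing decode and execute logic in src/Decode.bsv and src/Execute.bsv, whose types and names will differ from this sketch.

// MulSketch.bsv -- illustrative only; not part of the provided code.
// The constant names echo the handout but must be checked against the
// definitions that already exist in src/Decode.bsv.
package MulSketch;

Bit#(7) opOp   = 7'b0110011;  // OP opcode: two register sources, one register destination
Bit#(7) fn7MUL = 7'b0000001;  // funct7 = 0x1 selects the M-extension group
Bit#(3) fnMUL  = 3'b000;      // funct3 = 0x0 selects MUL within that group

// True if the 32-bit instruction word encodes MUL rd, rs1, rs2.
function Bool isMul(Bit#(32) inst);
    return (inst[6:0] == opOp) && (inst[31:25] == fn7MUL) && (inst[14:12] == fnMUL);
endfunction

// MUL writes the low 32 bits of rs1 * rs2 to rd; since only the low half
// is kept, signed and unsigned multiplication produce the same bits.
function Bit#(32) mulResult(Bit#(32) rs1Val, Bit#(32) rs2Val);
    return rs1Val * rs2Val;
endfunction

endpackage

In practice this means adding a case for the new funct7/funct3 combination to the decode function and a multiply operation to the ALU or execute function, rather than adding new files.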
4.2. Improving CPI
For this part you will apply what you have learned during the microarchitecture portion of the course to improve the performance of the processor. The nominal steps for improving performance are the following, but you are free to try whatever you think works well.
1. Pipeline the processor, which requires solving the control hazard problem with a branch predictor. Initially, it is recommended to simply predict always-not-taken (pc_predicted = pc + 4). To maintain correctness across mispredictions, you will also need to implement epochs as covered in lecture; see the epoch sketch after this list.
2. Implement stalling to maintain correctness under data hazards. Use the scoreboard implementation given in Scoreboard.bsv. A scoreboard with four slots is already initialized with the variable name "sb". Look at Scoreboard.bsv to see its interface, which should be very similar to what was covered in class; a toy scoreboard model also appears after this list.
3. Implement forwarding (bypassing) to reduce the stalls caused by data hazards and further improve CPI; a sketch of the forwarding mux appears after this list.
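To make the epoch idea concrete, the following is a minimal, self-contained Bluespec skeleton of a two-stage fetch/execute split with an always-not-taken prediction, a redirect FIFO, and separate fetch-side and execute-side epochs. Every name in it (the module, rules, struct fields, the one-bit epoch, the placeholder nextPc) is a hypothetical sketch of the pattern covered in lecture, not drop-in code for the provided processor.

// EpochSketch.bsv -- illustrative skeleton only.  Decode, register access,
// memory, and the actual next-PC computation are all omitted.
package EpochSketch;

import FIFO::*;

typedef struct {
    Bit#(32) pc;
    Bit#(32) predPc;   // always-not-taken prediction: pc + 4
    Bit#(1)  epoch;    // fetch epoch this instruction was fetched under
} F2E deriving (Bits, Eq);

module mkEpochSketch(Empty);
    Reg#(Bit#(32))  pc        <- mkReg(0);
    Reg#(Bit#(1))   fEpoch    <- mkReg(0);  // fetch's view of the current epoch
    Reg#(Bit#(1))   eEpoch    <- mkReg(0);  // execute's view of the current epoch
    FIFO#(F2E)      f2e       <- mkFIFO;    // fetch-to-execute pipeline FIFO
    FIFO#(Bit#(32)) redirectQ <- mkFIFO;    // corrected PCs sent back to fetch

    // Handle a pending redirect before fetching further down the wrong path.
    (* descending_urgency = "doRedirect, doFetch" *)
    rule doRedirect;
        let newPc = redirectQ.first; redirectQ.deq;
        pc     <= newPc;
        fEpoch <= ~fEpoch;   // newly fetched instructions get the new epoch
    endrule

    rule doFetch;
        // Fetch speculatively, tagging the instruction with the fetch epoch.
        f2e.enq(F2E { pc: pc, predPc: pc + 4, epoch: fEpoch });
        pc <= pc + 4;
    endrule

    rule doExecute;
        let x = f2e.first; f2e.deq;
        if (x.epoch == eEpoch) begin
            // Placeholder: the real rule decodes x and computes the actual
            // next PC (branch target, jump target, or pc + 4).
            Bit#(32) nextPc = x.predPc;
            if (nextPc != x.predPc) begin
                eEpoch <= ~eEpoch;       // start ignoring wrong-path instructions
                redirectQ.enq(nextPc);   // tell fetch where to resume
            end
        end
        // Instructions from a stale epoch are dropped without side effects.
    endrule
endmodule

endpackage

The invariant to preserve is that an instruction may only have architectural side effects (register writes, memory writes, MMIO) when its epoch matches the execute-side epoch; anything else is wrong-path work and must be discarded.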
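The toy model below shows what a scoreboard tracks: which destination registers still have writes in flight. It is not the provided implementation; use Scoreboard.bsv (already instantiated as "sb" with four slots), whose interface and method names may differ from this counter-based sketch, and which also handles the method-concurrency details this sketch ignores.

// ScoreboardSketch.bsv -- a toy model, illustrative only.
package ScoreboardSketch;

import Vector::*;

interface SbSketch;
    method Action insert(Maybe#(Bit#(5)) dst);  // an issued instruction will write dst
    method Action remove(Maybe#(Bit#(5)) dst);  // that write has reached the register file
    method Bool   search(Maybe#(Bit#(5)) src);  // does src still have a pending write?
endinterface

module mkSbSketch(SbSketch);
    // One pending-write counter per architectural register; x0 is never pending.
    Vector#(32, Reg#(Bit#(3))) pending <- replicateM(mkReg(0));

    method Action insert(Maybe#(Bit#(5)) dst);
        if (dst matches tagged Valid .r &&& r != 0)
            pending[r] <= pending[r] + 1;
    endmethod

    method Action remove(Maybe#(Bit#(5)) dst);
        if (dst matches tagged Valid .r &&& r != 0)
            pending[r] <= pending[r] - 1;
    endmethod

    method Bool search(Maybe#(Bit#(5)) src);
        let r = fromMaybe(0, src);
        return isValid(src) && pending[r] != 0;
    endmethod
endmodule

endpackage

In the pipeline, the decode stage stalls whenever the scoreboard reports a pending write to one of the instruction's source registers, inserts the instruction's destination register when it issues, and the writeback stage removes the entry once the register file has been updated.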
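Forwarding can then be viewed as a small combinational mux in front of each register read: if a later stage is producing a value for the register being read this cycle, use that value instead of the stale register-file output. The helper below sketches that mux; the names are hypothetical, and actually routing the in-flight (destination, value) pair back to the earlier stage (for example through a bypass wire, or an EHR-style register if the provided code has one) is the part you need to work out in your own pipeline.

// BypassSketch.bsv -- a combinational forwarding helper, illustrative only.
package BypassSketch;

// rfVal is the value read from the register file for source register src;
// exeResult is Valid (dst, val) if a later stage is producing val for dst
// this cycle, and Invalid otherwise.
function Bit#(32) bypass(Bit#(5) src, Bit#(32) rfVal,
                         Maybe#(Tuple2#(Bit#(5), Bit#(32))) exeResult);
    case (exeResult) matches
        // Forward the in-flight value if it targets our source (never x0).
        tagged Valid {.dst, .val}: return (dst == src && src != 0) ? val : rfVal;
        // Nothing in flight writes src, so the register file value is current.
        tagged Invalid: return rfVal;
    endcase
endfunction

endpackage

Note that forwarding only removes stalls for values that have already been computed (for example ALU results); an instruction that uses a value immediately after the instruction that produces it from memory may still need to stall.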
5. What to Hand In
1. Submit a tar-gzipped version of your "src" directory.
2. Submit a report describing the modifications you made to the processor and how much CPI improvement they resulted in.
6. Grading
Grading will be done using the provided neural network code, with the provided datasets as well as a few additional data and answer pairs.
Full credit will be given to correct implementations that perform similarly to, or better than, the instructor's reference implementation. Partial credit will be given to correct implementations with lower performance. Implementations that produce incorrect results will receive less partial credit than correct but slower implementations.