
ECE/CSE 401 Spring 2018: Project 2
Due: March 6th, 2018
You are going to implement a five-stage MIPS pipeline in behavioral Verilog. An unpipelined implementation is provided with a C++ wrapper, which can handle system calls, library function calls and memory accesses. To complete the design and the testing, you will need to modify both the Verilog code and the C++ wrapper.
1. Use Verilator to Compile and Simulate the Unpipelined Processor
In this project, you are going to use Verilator to compile and simulate the design. Verilator is a software tool that compiles Verilog into a cycle-accurate behavioral model in C/C++.
* Copy the code file to your own directory

cp -r /proj/ece401-spring2018/ClassShare/Proj2 /proj/ece401-spring2018/USERORGROUP
* Enter the copied directory

cd /proj/ece401-spring2018/USERORGROUP/Proj2
* Run the compilation script: ./compile
* Run the VMIPS file
./VMIPS ApplicationName [Duration]
For example:
./VMIPS class
runs application “class” on the simulated MIPS processor implemented in Verilog, and
./VMIPS file 10
runs application “file” and enters debugging mode on the 10th cycle.
Ten applications are provided. Their C source code can be found at /proj/ece401-spring2018/ClassShare/app_src. Their ELF object files can be found at /proj/ece401-spring2018/ClassShare/app_obj. The C++ wrapper reads the object files to simulate each MIPS instruction. All ten applications and their instruction counts are listed below:
Application    Instruction count
class          96914
fib18          305486
fact12         110521
file           95219
hanoi          201314
hello          95442
ical           216216
matrix         136978
noio           1996
sort           104655
You will need to run all 10 applications on the modified version.

2. Simulation Framework
Figure 1 Relationship between MIPS top file and C++ wrapper.
Figure 1 shows how the system works. Value initialization, clock generation, library function substitution, syscall handling, and memory simulation are written in the C++ wrapper (sim_main.cpp). All of the Verilog modules are connected to MIPS.v, and the MIPS.v top module communicates with sim_main.cpp. The main memory in this system is a virtual memory, implemented as a static map (MainMemory) in the C++ file.

During function substitution, when a function call is detected (how a function call is detected is discussed later in this document), the function unit in the C++ wrapper reads some register values and changes the values of registers in the register file. For example, if the function call fxsta64 is detected, reg[4] in the register file receives the value of reg[5], and reg[5] receives the value of reg[6]. The instrInput supplied for the corresponding instrAddr then comes from the wrapper instead of from MainMemory.
When instrInput equals 12, the system enters a syscall. The writebackFlag is passed into the C++ wrapper to indicate whether the system is ready to enter the syscall. Like function substitution, the syscall unit also reads some register values and changes the values of registers in the register file. For example, if the syscall fstat is detected, reg[4] in the register file receives the value of reg[5], and reg[5] receives the value of reg[6]. reg[2] stores the result of the function fstat() (which gets information about the named file and writes it to a caller-supplied area).
*syscall
A system call requests services from the kernel of the operating system it is executed on. (wikipedia: https://en.wikipedia.org/wiki/System_call)
Diagram for syscall (from Operating system study guide)
Example: You can recognize a syscall by checking the instruction encoding: the opcode field is 000000 and the function field is 001100, so the instruction word equals 12. For example, the service number of read is 5, and it is stored in reg[4] in the register file. The system first loads the service number and then calls the specific system call. In this simulation framework, system calls are emulated in the C++ wrapper. At the end of each system call, the return values are stored directly in the register file.
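The syscall check described above can be sketched in C++ as follows. This is only an illustration of the MIPS encoding, not the wrapper's actual code; the function names are hypothetical. Note that this check ignores the code field of the instruction, whereas the framework compares the whole instruction word against 12.

```cpp
#include <cstdint>

// Field extraction for a 32-bit MIPS instruction word.
constexpr std::uint32_t opcode_of(std::uint32_t instr) { return instr >> 26; }
constexpr std::uint32_t funct_of(std::uint32_t instr) { return instr & 0x3Fu; }

// SYSCALL is an R-type instruction: opcode 000000, funct 001100 (decimal 12).
// Hypothetical helper; the code field (bits 25..6) is ignored here.
constexpr bool is_syscall(std::uint32_t instr) {
    return opcode_of(instr) == 0u && funct_of(instr) == 0x0Cu;
}

// A plain syscall encodes as the value 12.
static_assert(is_syscall(0x0000000Cu), "syscall word");
// andi $4,$4,12 has opcode 001100 and low bits 001100, but is not a syscall.
static_assert(!is_syscall(0x3084000Cu), "andi is not a syscall");
```

Checking both fields matters because 001100 is also the opcode of andi, so testing a single 6-bit field in the wrong position would misclassify instructions.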

*Library Function Substitution
The simulation framework emulates library function calls in the C++ wrapper. Figure 2 shows an example of a library call—libc_malloc.
Example:
Figure 2: Instruction that calls a function and the corresponding function call.
The first column in the ELF listing is the PC of each instruction. The second column is the instruction encoding. The third column is the description of each instruction.
Figure 3: String match in the C++ wrapper.
In the C++ wrapper, the system detects function calls by string matching. After detecting a function call, the system executes it and prints the function call in the framework output.
*When certain instructions are fetched, the system enters the library function call substitution part. This part reads some register values from the register file and writes to the register file directly. The C++ wrapper recognizes function calls through string matching and passes a signal to the MIPS design. Some functions also change the instruction inputs directly. Therefore, make sure all previous instructions have written back before processing a substituted function; otherwise the values read from the register file will be incorrect. When the instruction is 12 (the encoded value of the instruction), the system enters the syscall part. (A syscall is used to print values or strings, perform input/output, and indicate the end of the program. Some syscall instructions change register values by writing to the register file.)
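A minimal sketch of string-match detection, assuming the wrapper has a table that maps function entry addresses to symbol names. The table type, the helper name, and the addresses used with it are hypothetical and are not taken from sim_main.cpp.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Hypothetical symbol table: function entry address -> symbol name.
using SymbolTable = std::map<std::uint32_t, std::string>;

// Returns the name of the substituted library function whose entry is `pc`,
// or an empty string when `pc` is not the entry of a substituted function.
inline std::string substituted_function_at(const SymbolTable& symbols,
                                           const std::set<std::string>& substituted,
                                           std::uint32_t pc) {
    auto it = symbols.find(pc);
    if (it == symbols.end()) return "";
    // The "string match": compare the symbol name against the known names.
    return substituted.count(it->second) ? it->second : "";
}
```

When the returned name is non-empty, the design should be signalled so that the substituted call waits in fetch until all older instructions have written back.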
For the time line, notice that:

*In one clock cycle, the Verilog code is evaluated twice, but a posedge occurs only on the evaluation where clk becomes 1.

*Make sure to print values at the right time (place the print statement in the right location) when you debug. For example, if you want to print the instruction fetched in this cycle, printing it in the C++ display area will not show the right value, because the fetched instruction inputs are updated later than the C++ print. You can print in either Verilog or C++, but do not print the same value in both.
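The first note above (two evaluations per cycle, one posedge) can be illustrated with a stand-in model. MockTop below is a hypothetical substitute for the class Verilator generates, and the order of the two evaluations in the actual wrapper may differ; this is a sketch of the idea only.

```cpp
#include <cstdint>

// Stand-in for a Verilated model (hypothetical). It has a clock input and
// counts how often eval() is called and how often it sees a rising edge.
struct MockTop {
    std::uint8_t clk = 0;
    std::uint8_t prev_clk = 0;
    unsigned evals = 0;
    unsigned posedges = 0;

    void eval() {
        ++evals;
        // Logic in "always @(posedge clk)" blocks would run only here.
        if (prev_clk == 0 && clk == 1) ++posedges;
        prev_clk = clk;
    }
};

// One full clock cycle: two evaluations, only one of which is a posedge.
inline void tick(MockTop& top) {
    top.clk = 0;
    top.eval();
    top.clk = 1;
    top.eval();
}
```

A value printed from C++ between the two evaluations therefore reflects a different point in the cycle than a value printed after both.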
3. Requirements
3.1 Pipeline Registers
Divide the building blocks among the stages IF, ID, EXE, MEM and WB as shown in Figure 4. Add pipeline registers as needed. Note that Figure 4 only shows how to divide the data path at a high level. There will be more changes in the actual implementation.
Figure 4 Divide building blocks into 5 stages (adapted from [P&H]).
3.2 Handle Hazards
Implement all of the forwarding paths needed to handle data hazards. Assume all loads and stores hit in the data cache and the memory stage takes 1 cycle. Because of the load delay slot, a load should not cause a stall. For control hazards, resolve branches in the Execute stage. This means that a branch will still cause one pipeline stall after the instruction in the branch delay slot is fetched. Do not implement any branch prediction.
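As a behavioral reference for the forwarding requirement, the following C++ sketch shows the usual textbook selection for one ALU source operand. The type and signal names are hypothetical, and the actual design must be written in Verilog; this only models the priority between the two forwarding sources.

```cpp
#include <cstdint>

// Where an ALU source operand should come from (hypothetical names).
enum class Forward : std::uint8_t { FromRegFile, FromExMem, FromMemWb };

// What an older instruction in a later pipeline register will write.
struct WriterInfo {
    bool reg_write;   // the instruction writes a register
    std::uint8_t rd;  // destination register number
};

// Selects the forwarding source for source register `src` of the instruction
// in EXE. The younger producer (EX/MEM) takes priority over MEM/WB, and
// register 0 is never forwarded because it always reads as zero.
inline Forward select_forward(std::uint8_t src, WriterInfo ex_mem, WriterInfo mem_wb) {
    if (ex_mem.reg_write && ex_mem.rd != 0 && ex_mem.rd == src) return Forward::FromExMem;
    if (mem_wb.reg_write && mem_wb.rd != 0 && mem_wb.rd == src) return Forward::FromMemWb;
    return Forward::FromRegFile;
}
```

The same selection is applied independently to each source operand. A load's result is not consumed by the instruction in its delay slot, so under that assumption the MEM/WB path can supply loaded values without a stall.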
3.3 Handle Syscalls and Library Function Calls and run All of the Applications
In the unpipelined version, both syscalls and function calls take a single cycle to finish. This will not work for the pipelined version, because syscalls and function calls directly read and write the register file as soon as they are fetched. When encountering a syscall or function call, flush the pipeline. This means the syscall or function call instruction should be stalled at the fetch stage while all of the older instructions in the pipeline write back to the register file. All ten applications should be able to run to the end of the program.
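The drain-before-syscall rule can be modeled in C++ as below. The valid bits and function names are hypothetical, and the exact cycle at which WB counts as complete depends on how the register file is written in your design, so treat the stall counts as one possible convention.

```cpp
// Valid bits for the instructions currently in ID, EXE, MEM and WB
// (hypothetical names; not taken from the provided Verilog).
struct PipelineValid {
    bool id;
    bool exe;
    bool mem;
    bool wb;
};

inline bool any_in_flight(const PipelineValid& v) {
    return v.id || v.exe || v.mem || v.wb;
}

// The syscall or substituted call stays in fetch while anything older
// has not yet written back.
inline bool must_hold_in_fetch(bool is_syscall_or_libcall, const PipelineValid& v) {
    return is_syscall_or_libcall && any_in_flight(v);
}

// Number of cycles needed to drain the pipeline when fetch inserts bubbles.
inline unsigned drain_cycles(PipelineValid v) {
    unsigned cycles = 0;
    while (any_in_flight(v)) {
        v.wb = v.mem;   // each instruction advances one stage per cycle
        v.mem = v.exe;
        v.exe = v.id;
        v.id = false;   // a bubble enters ID while fetch is held
        ++cycles;
    }
    return cycles;
}
```

Under this convention, a full pipeline takes four cycles to drain, and each of those cycles counts as a stall.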
3.4 Analyze the Result (Bonus)
Modify the C++ file so that instruction counting and cycle counting are correct for the 5-stage pipelined version. The instruction counts should match those of the unpipelined version. Add a counter for the total number of stalls, which includes both the stalls due to branches and the stalls due to syscalls/function calls. Verify that the following equation holds for all of the applications: Total_Cycles = Total_Instructions + Total_stalls + Pipeline_stages - 1.
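The equation can be checked with a small helper like the one below. The function name is hypothetical, and the stall count in the example is a made-up number for illustration; only the instruction count of noio (1996) comes from the table of applications.

```cpp
#include <cstdint>

// Expected cycle count of an in-order pipeline:
// Total_Cycles = Total_Instructions + Total_stalls + Pipeline_stages - 1.
constexpr std::uint64_t expected_cycles(std::uint64_t instructions,
                                        std::uint64_t stalls,
                                        std::uint64_t stages = 5) {
    return instructions + stalls + stages - 1;
}

// noio executes 1996 instructions; with no stalls it would need 2000 cycles.
static_assert(expected_cycles(1996, 0) == 2000, "fill latency is stages - 1");
// With a hypothetical 100 stalls, the expected total is 2100 cycles.
static_assert(expected_cycles(1996, 100) == 2100, "each stall adds one cycle");
```

Comparing this value against the measured cycle counter at the end of a run is a quick way to find miscounted stalls.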
4. Tips
4.1 Signals can be printed in both the Verilog and C++ code for debugging
When printing in Verilog using $display, note that $display shows the value a signal had before a non-blocking assignment to it is evaluated.
When printing in C++, use (int)top->signalName for a signal that is an input/output port of the top module; use (int)top->MIPS->signalName for a signal that is a local wire in the top module; and use (int)top->MIPS->__PVT__submoduleName__DOT__signalName for a signal in a submodule.
To print a binary value in C++, use std::bitset.
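A short sketch of binary printing with std::bitset. The helper names are hypothetical; the value passed in would be a signal read from the model as described in this section.

```cpp
#include <bitset>
#include <cstdint>
#include <string>

// Formats a 32-bit value as a string of 32 binary digits.
inline std::string to_binary32(std::uint32_t value) {
    return std::bitset<32>(value).to_string();
}

// Formats only the 6-bit opcode field (bits 31..26) of an instruction.
inline std::string opcode_bits(std::uint32_t instr) {
    return std::bitset<6>(instr >> 26).to_string();
}
```

A bitset can also be streamed directly, for example std::cout << std::bitset<32>(value) << std::endl; after including <iostream>.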
4.2 Use both debugging and non-debugging mode to check correctness
Enter debugging mode using ./VMIPS fileName Duration

In this mode, the stack pointer, stack contents, and register contents will be printed. You can start debugging from a predefined cycle. You can also print more signals to help debugging. Press ENTER to step to the next cycle and CTRL+C to terminate.

Enter non-debugging mode using ./VMIPS fileName

In this mode, library function calls, system calls, and IPC results will be printed. You will need to add additional counters to count the number of stalls.
4.3 Cross check the intermediate results between the 5-stage pipelined and unpipelined versions
Since the unpipelined version and the 5-stage pipelined version are both executed in order, you can use the unpipelined processor as a reference.
4.4 Check memory values
All memory writes are recorded in the memoryWrites.txt file after running the code. You can use that file to check whether memory reads and writes are correct.
4.5 Suggested modifications of C/C++ code
Suggested modifications of the C/C++ code are marked with comments. Search for “5 stage”.
5. Report and Hand In
Please submit your final design and a project report. If your design does not work completely, please indicate which applications can run and which cannot. For the applications that cannot finish running, specify how many instructions can run.
5.1 Report requirement
The report should include 1) a diagram of the final design, 2) explanations of the C++ modifications, and 3) cycle count analysis. The final design diagram can be broken down into sub-diagrams. You can modify the C++ code, but all of the hardware changes should be implemented in the Verilog code. Please provide explanations and comments for the C++ modifications. Please report the cycle count and the number of stalls for each of the ten applications.
5.2 Hand-in requirement
Please make a tarball or zip file of all of the source code (all of the .v files and the .cpp file) and the report. Delete the obj_dir directory and memoryWrites.txt before making a tarball/zip file. Submit the tarball/zip file through coursesite.
5.3 Group submission
If you are part of a group, please make only one submission through coursesite. Students in the same group will get the same grade.