UNIVERSITY OF CALIFORNIA, DAVIS
Department of Electrical and Computer Engineering
EEC 170 Introduction to Computer Architecture Winter 2021
Project 3
Due: 11:59 pm Saturday 6th March.
Refer to Getting started with RARS for tips/suggestions.
You can find the recorded lab tutorial here. This project has 4 parts:
Write 2 sort routines (insertion sort, merge sort) in RISC-V assembly, and use them to sort a random 4 kB file of 32-bit unsigned integers
These unsigned integers will never have their high bit set (bit 31 will always be 0).
Write a binary search routine which utilizes the data you sorted with both of the previous two sort routines to find the predecessor of key K (such that predecessor is the largest element before K)
Analyze the instruction trace from your runs
Using the memory trace from your runs, design, implement, and analyze a 1 kB data cache that would accelerate running your three sort routines
Every place in this assignment that we use the term “kB”, we mean “kibibytes” (a power of two) not “kilobytes” (a power of ten). 4 kB, for instance, is 4096 bytes.
WHAT YOU CAN START DOING IMMEDIATELY
Write your TWO sort routines
Write the binary search routine
Write the code that analyzes the instruction/memory trace Write the code that simulates your cache
a) WRITE TWO SORT ROUTINES
You will write TWO sort functions, each of which will take three arguments:
The first argument is the memory address of where you will write your 4 kB output (the sorted list). You are allowed to change the contents of this memory region if you would like.
The second argument is the memory address of the 4 kB unsorted input. You can assume the input is exactly 4 kB (1024 32-bit unsigned integers).
The third argument is the memory address of a 16 kB uninitialized memory region if you find that useful in writing your routines.
You will write a routine to find the predecessor of key K (which is the largest number in the input dataset that is smaller than K, but not equal to K) using binary search, which will utilize the data you sorted using the sort routines that you wrote. You will have to use the search routine with each of the previous sort routines separately, which will later be used for comparison.
The first argument to the sort function is the address of the sorted list and the second argument is the key K (we assume that the key K is always present in the list).
Use these scaffolds: asm files main.asm and file_read.asm, and the dataset data.txt. We will test your code with our own data.txt.
What does main.asm do?
Calls file_read.asm and sort routines. Make sure that your RARS simulator is in the same folder where this file lies, otherwise provide the entire path for the file.
Saves required environment(registers) and clears all registers except for those used in function calls. This avoids unexpected results.
Provides necessary addresses for the sort and search routines.
What does file_read.asm do?
Reads the dataset data.txt into a buffer.
Since the data can only be read as strings, it has a routine which converts them to unsigned integers.
Returns start and end address of unsorted data.
How to use these scaffolds?
Open main.asm, file_read.asm, and your sort routines into RARS.
If you don’t want to call a routine, comment the respective call macro in main.asm (you will utilize this while doing comparison of the two sort routines where you will comment out one of the sort routines). For example, if you don’t want to assemble merge sort, just comment func_call(merge) and in main.asm.
Do not use the label ‘main’ in your sort routines.
Before you assemble, go to setting select ‘Assemble all files currently open’ and ‘Initialize program counter to global ‘main’ if defined’.
The dataset ‘data.txt’ should be in the same directory as ‘rars.jar’.
Follow RISCV calling conventions: save the return address at the beginning of your sort routine, and retrieve it back at the end.
The two sort algorithms you will implement are:
https://en.wikipedia.org/wiki/Insertion_sort (Links to an external site.) https://en.wikipedia.org/wiki/Merge_sort (Links to an external site.)
The search algorithm you will implement: https://en.wikipedia.org/wiki/Binary_search_algorithm (Links to an external site.) Submit your sort routines as:
Insertion sort : filename (insertion.asm), function name (insertion). Merge sort : filename (merge.asm), function name (merge).
Binary search: filename (bsearch.asm), function name (bsearch).
GENERATE AN INSTRUCTION/MEMORY TRACE
Your instructor wrote this trace generator in Java. The jar file you already downloaded from github contains it (it is now part of RARS). You are welcome to compile it yourself. When you
run a
1. 2. 3. 4. 5. 6. 7.
program in (this version of) RARS, you can generate a trace by doing the following:
Open your source file in RARS.
Tools menu, Instruction/Memory Dump.
Change filename to a filename of your choice.
Click button: Connect to Program
Run, Assemble.
Run, Go.
Go back to Instruction/Memory Dump window: click “Dump Log”. This saves the dump to the file you specified in step 3.
These steps are pretty brittle (i.e., do them in this exact order) because your instructor doesn¡¯t know how to use Swing very well, but they seem to work well in practice. If you have suggestions, the instructional staff is happy to hear them.
The file you generate has one line per datum. The four kinds of data you will see in the trace are:
¡®I¡±: The address of an access into instruction memory
¡®i¡¯: A 32-bit RISC-V instruction (the trace first dumps the address then the instruction)
¡®L¡¯: The address of a memory load into data memory
¡®S¡¯: The address of a memory store into data memory (we don¡¯t think you need the contents of the memory load/store for this project, so they aren¡¯t in the trace)
Your instructor put a C++ file that you can use as a scaffold for your analysis, as well as a sample log file, in the Project 3 directory in Canvas.
b) ANALYZE THE INSTRUCTION TRACE
For this part you only need to look at the ¡®I¡¯ and ¡®i¡¯ entries in your trace. In your report, answer the following questions. It seems easiest to the staff that you report separately for each of your two implementations (i.e merge sort and search and insertion sort and search). Throughout this analysis you are supposed to combine each sort with search, that is, you will generate the trace using merge sort and search (comment out the call to insertion sort) and generate trace using insertion sort and search (comment out the call to merge sort).
Compare the performance, in terms of instructions executed, of the two sorts with search.
1) What is the instruction mix that you actually run over your two sort and search methods: what is the fraction of instructions that are loads, stores, branches, arithmetic operations, etc.? Use your judgment on how best to report this mix. Compare for both implementations.
2) List the top 5 most frequent dependent instruction pairs that executed in each trace. By “dependent instruction pairs”, we mean “a pair of instructions that run in sequence where the first instruction produces a register value that is read by the second instruction”. You can ignore register numbers for this problem except to tell if the second instruction depends on the first one (so your answer is going to be 5 pairs like “lw-add”, not “lw x2 … – add x3, x2, x1”).
3) Given our 5-stage RISC-V pipeline, what fraction of instructions executed have at least one register argument that must be forwarded (for both implementations)?
4) How effective is static branch prediction (what percentage of the time will a static branch prediction algorithm¡ªforward branches are not taken; backward branches are taken¡ªmake the right prediction) (for both implementations)?
c) DESIGN A CACHE
You will design a 1 kB data cache, implement it (in C), and analyze its performance using the traces you produce. The most straightforward way for you to do this is to allocate a data structure in your trace analyzer that has the exact same data as the pictures of cache that we showed in class (for example, in the lecture memory-2.pdf, ¡°Example: Intrinsity FastMath¡±; for this example, you will have to keep track of a valid bit and a tag for each entry in the cache; you do NOT have to keep track of data because we don¡¯t need to do so for this assignment, and you should think through why we say this in the assignment so you understand). You are only focusing on data accesses (load/store), not on instruction accesses.
In your simulator, every time you see a data memory access, you will check the cache to see if it is a hit or miss, record whatever data you want to record for your analysis, and update the cache. The size of your cache is fixed (1 kB). You will certainly want to explore varying the block size and associativity of your cache. You may also want to vary replacement policy and write policy (write-through vs. write-back).
Your goal is to maximize system performance. In your report, quantify how much your cache helps system performance vs. having no cache. Assume that your workload is equally split between the two sort routines (you will call each sort routine the same number of times in your workload). Maximizing system performance will primarily mean maximizing hit rate. Justify your
design and your conclusions in your report succinctly but thoroughly. To measure system performance, assume the following:
Cache hits complete in a single cycle.
Cache misses cost an additional 50 cycles (to go to memory).
If you implement an associative cache, it affects the clock cycle of the machine. Any set associative cache has a 10% clock cycle penalty when compared to a direct-mapped cache, and each additional doubling of set associativity adds an additional 2%. So a 2-way s-a cache has a 10% penalty, a 4-way s-a cache has a 12% penalty, 8-way is 14%, 16-way is 16%, etc.
WHAT YOU SUBMIT
A PDF writeup (technical report) of no more than 2 pages that summarizes your findings. MUST BE PDF. DO NOT USE ANOTHER FILE FORMAT.
Your source code for your two sort and search routine.