CS计算机代考程序代写 compiler Java c/c++ cache simulator Hive cache CSc 656 Project 2: Cache simulation

CSc 656 Project 2: Cache simulation
due Monday 4/19/2021, 3pm (standard 48-hour grace period for 75% credit) (9% of your grade)
This is an individual project. Work on your own.
For this project, you will write a cache simulator that uses memory address traces as input. The traces can be found on iLearn in the Spring 2021 CSC 656 Chapter 5 section, and on the tiger.sfsu.edu server at
~whsu/MiBench
The traces are from the MiBench suite of embedded system benchmarks. The source for our traces is not important for our project, but in case you’re interested:
http://vhosts.eecs.umich.edu/mibench/
Sample executables for this project will be found on tiger.sfsu.edu at ~whsu/csc656/Code/S21/P2
A few lines from a sample trace file:
0xb7fc7489: W 0xbff20468 4 0xb7fc748e 0xb7fc748e: R 0xbff20468 4 0xb7fc748e 0xb7fc7495: W 0xbff20478 4 0xbff204b0 0xb7fc749e: R 0xb7fd9ff4 4 0x15f24
etc etc
The first hexadecimal integer is the address of the instruction that caused the data memory access. Then follows W for write, or R for read. The next hexadecimal integer is the memory address for that data access. Then follows the number of bytes read or written. Finally, the data read/written is displayed. In the example shown, for the top line, an instruction at 0xb7fc7489 caused a data write to address 0xbff20468 of 4 bytes; 0xb7fc748e was written to memory.
Benchmark Traces
Trace excerpts (100,000 accesses each) from some benchmarks have been chosen from the Mibench embedded applications suite. Each student will be assigned two traces. These are:
from consumer suite: lame, mad, jpeg from telecom suite: gsm, adpcm, FFT from security suite: sha, blowfish

from network suite: dijkstra, patricia
You should make all measurements for each cache configuration, using the trace for the benchmark.
Cache simulator implementation
You will write a cache simulator to collect statistics on events in the memory system. Your code will allocate some arrays that represent the state of the cache, read each line of a trace file in the specified format, and update the state of the proper cache array elements accordingly.
While you are only required to show measurements for a small number of cache configurations, your code must work for any cache size that is a power of 2 bytes.
Your simulator tracks the behavior of a writeback data cache, as defined below.
Cache System Characteristics
For both benchmarks, study cache configurations with 32-byte blocks. Make measurements for these systems:
System 1: Direct-mapped data cache, 1KB and 2KB System 2: k-way set associative data cache, 1KB and 2KB
Make measurements for each benchmark, for the following parameters:
System
Cache configuration
System 1a
1KB direct-mapped
System 1b
2KB direct-mapped
System 2a
1KB 2-way set associative
System 2b
1KB 4-way set associative
System 2c
2KB 2-way set associative
System 2d
2KB 4-way set associative
Assume that the miss penalty is equal to 100 cycles.
On a write miss, assume that the requested block is always brought into the cache, and the CPU writes into the proper word in the cache.
You should collect these statistics, and display them in order (see sample executables): • number of data reads (i.e., number of loads),
• number of data writes (i.e., number of stores),
• number of data accesses (i.e., number of loads + number of stores),
• number of total data read misses,
• number of total data write misses,
• number of data misses (i.e., number of accesses that miss the cache),

• number of dirty data read misses,
• number of dirty write misses,
• number of bytes read from memory,
• number of bytes written to memory,
• the total access time (in cycles) for reads,
• the total access time (in cycles) for writes,
• the overall data cache miss rate ( (read misses + write misses) / total accesses)
A data read miss is any read access (i.e., load) that results in an access to memory.
A data write miss is any write access (i.e., store) that results in an access to memory. A clean miss is a miss that does not replace a dirty block in the cache (i.e., the dirty bit for the block is off, or the block is not valid).
A dirty miss is a miss that replaces a dirty block in the cache.
(Note that some of these measurements give overlapping information, and it is easy to check whether your results are consistent by comparing some of the measurements.)
Follow the definitions given below for detailed operations, and how hits and misses are counted.
Detailed direct-mapped data cache operation (System 1)
Suppose an access has address A. A maps to index I in the direct-mapped data cache. Each access falls under one of these cases:
Case 1: the block containing A is found in data cache (cache hit) Read: no state changes, access time = 1 cycle
Write: set dirty bit = 1, access time = 1 cycle
[Case 2a/2b cover misses. For a cache miss, index I in the data cache may contain block X (different from the block that contains address A), or may be empty.]
Case 2a: the block containing A is not found in data cache, index I in data cache either contains block X (which is clean), or is empty (clean cache miss)
Read: move block containing A from memory into index I in data cache, dirty bit = 0, access time = (1 + miss penalty) cycles
Write: move block containing A from memory into index I in data cache, dirty bit = 1, access time = (1 + miss penalty) cycles
Case 2b: the block containing A is not found in data cache, index I in data cache contains block X in data cache, which is dirty (dirty cache miss)
Read: write block X to memory
move block containing A from memory into data cache, dirty bit = 0 access time = (1 + 2 * miss penalty) cycles
Write: write block X to memory
move block containing A from memory into data cache, dirty bit = 1 access time = (1 + 2 * miss penalty) cycles

Detailed k-way set-associative data cache operation (System 2)
All cases are basically the same as for System 1, except the cache is k-way set- associative.
Suppose an access has address A. A maps to index I in the direct-mapped data cache. Each index I in the data cache accesses one of k blocks. Assign ids 0 to k-1 to the k blocks within the same index.
For each block in the k-way set-associative cache, in addition to the bit-fields in the direct-mapped cache, there is also a last-used field. This contains the current instruction count of the access that last touched that block, i.e., the order of the access in the trace file, starting from 0.
Index I in the data cache contains block X0 to Xk-1 , corresponding to ids 0 to k-1; each of these k blocks is either different from the block that contains address A, or is empty.
Each access falls under one of these cases:
Case 1: the block containing A is found in index I id D in the data cache (cache hit) Read: last-used for index I id D = current IC, access time = 1 cycle
Write: last-used for index I id D = current IC, set dirty bit = 1, access time = 1 cycle
[Case 2a/2b cover misses. For a cache miss, the block containing A maps to index I. First choose one of k blocks. If there is an empty block, this is the empty block with the smallest id. If there is no empty block, this is the least recently used block, according to the last-used field for that block. Let the id of the chosen block be D.]
Case 2a: the block containing A is not found in data cache, the chosen block with id D is either clean, or is empty (clean cache miss)
Read: move block containing A from memory into block D in data cache, last-used for index I id D = current IC, dirty bit = 0
access time = (1 + miss penalty) cycles
Write: move block containing A from memory into index I data cache last-used for index I id D = current IC, dirty bit = 1
access time = (1 + miss penalty) cycles
Case 2b: the block containing A is not found in data cache, the chosen block with id D is dirty (dirty cache miss)
Read: write block X to memory
move block containing A from memory into block D in data cache, last-used for index I id D = current IC, dirty bit = 0
access time = (1 + 2 * miss penalty) cycles
Write: write block X to memory

move block containing A from memory into block D in data cache, last-used for index I id D = current IC, dirty bit = 1
access time = (1 + 2 * miss penalty) cycles
Simulator input format
System 1 will prompt for tracefile (name of the trace file, file path ok), cachesize (size of the cache in KB, and whether verbose mode is on/off. If verbose mode is on, prompt for the reference numbers for the first and last references to track (ic1 and ic2). A sample run:
whsu@tiger:~/csc656/Code/S21/P2$ ./sys1
Enter trace file: /home/whsu/MiBench/blowfish.xex Enter cache size in KB: 1
Verbose mode (y/n): y
Enter first and last reference to track: 0 10
0 1f 2fe7f7 0 0 0 0 2a
1 1f 2fe7f7 1 2fe7f7 0 1 1
2 1 2fe7f8 0 0 0 0 2a
3 0 2fe7f8 0 0 0 0 2a
4 5 202d1 0 0 0 0 2a
5 5 202d1 1 202d1 0 1 1
6 0 2fe7f8 1 2fe7f8 0 1 1
7 0 2fe7f8 1 2fe7f8 0 1 1
8 0 2fe7f8 1 2fe7f8 0 1 1
9 0 2fe7f8 1 2fe7f8 0 1 1
10 0 2fe7f8 1 2fe7f8 0 1 1
direct-mapped, writeback, size = 1KB
loads 60773 stores 39227 total 100000
rmiss 7273 wmiss 1598 total 8871
dirty rmiss 1697 dirty wmiss 919
bytes read 283872 bytes written 83712
read time 957773 write time 290927
miss rate 0.088710
System 2 has input prompts as sys1, but with an extra argument for set associativity k. A sample run:

whsu@tiger:~/csc656/Code/S21/P2$ ./sys2
Enter trace file: /home/whsu/MiBench/jpeg.xex Enter cache size in KB: 1
Enter set associativity: 2
Verbose mode (y/n): y
Enter first and last reference to track: 0 12 0 a 5fe50c 0 0 0 0 0 0 2a
1 a 5fe50c 1 0 0 5fe50c 1 1 1
2 a 5fe50c 1 0 1 5fe50c 1 1 1
3 a 5fe50c 1 0 2 5fe50c 1 1 1
4 b 5fe50c 0 0 0 0 0 0 2a
5 9 5fe50c 0 0 0 0 0 0 2a
6 d 5fe50c 0 0 0 0 0 0 2a
7 c 5fe50c 0 0 0 0 0 0 2a
8 a 5fe50c 1 0 3 5fe50c 1 1 1
9 9 5fe50c 1 0 5 5fe50c 1 1 1
10 d 5fe50c 1 0 6 5fe50c 0 1 1
11 c 5fe50c 1 0 7 5fe50c 0 1 1
12 a 5fe50c 1 0 8 5fe50c 1 1 1
2-way, writeback, size = 1KB
loads 65864 stores 34136 total 100000
rmiss 3572 wmiss 660 total 4232
dirty rmiss 728 dirty wmiss 231
bytes read 135424 bytes written 30688
read time 495864 write time 123236
miss rate 0.042320
Verbose format
Verbose output is absolutely crucial for debugging simulator applications. I highly recommend getting verbose output working early in your development cycle.
The data memory accesses in the trace file are numbered in sequential order, starting from zero. If verbose mode is on, verbose output is displayed for access ic1 to ic2, inclusive.
For System 1 (direct-mapped cache), the verbose format output comprises:
order of the load/store in address stream (starting from 0, in decimal) cache index calculated from address
cache tag calculated from address
valid bit of cache block accessed
tag stored in cache block accessed (0 if invalid) dirty bit of cache block accessed

hit or miss (0 miss, 1 hit)
Case number according to Detailed Operation (1, 2a, or 2b)
For example, consider three references from adpcm.xex for a 1KB direct-mapped cache that touch index 2. (No other references touch index 2 in this window of references.)
6 2 202b8 0 0 0 0 2a
The first number (6) is the reference number in the trace. The original reference is a read, with address 0x80ae050, which breaks up into index = 2 and tag = 202b8. The cache was empty, so the valid bit is 0, the old tag and dirty bits are also 0. It’s a miss (0), and this is Case 2a (clean miss with empty cache block).
Going down the trace a little further: 28 2 202c9 1 202b8 0 0 2a
This is reference 28, a write. The original reference address is 0x80b2446, which breaks up into index = 2 and tag = 202c9. Index 2 contains the block for reference 6, so the valid bit is 1, the old tag is the same as for reference 6, the dirty bit is 0 (since reference 6 was a read). This is a miss (0), and also Case 2a (clean miss).
A little further down:
41 2 202b8 1 202c9 1 0 2b
This is reference 41, a read. The original reference address is 0x80ae04c, which breaks up into index = 2 and tag = 202b8. Index 2 contains the block for reference 28, so the valid bit is 1, the old tag is the same as for reference 28, the dirty bit is 1 (since reference 28 was a write). This is a miss (0), and Case 2b (dirty miss, since index 2 contains a dirty block).
For System 2 (k-way set-associative cache), the verbose format comprises:
order of the load/store in address stream (starting from 0, in decimal) cache index calculated from address
cache tag calculated from address
valid bit of cache block accessed
id of the block chosen (D, from the specification) last-used field of the block chosen (0 if invalid) tag stored in cache block accessed (0 if invalid) dirty bit of cache block accessed
hit or miss (0 miss, 1 hit)
Case number according to Detailed Operation (1, 2a, or 2b)
For example, consider two references from adpcm.xex for a 1KB 2-way set-associative

cache:
881 2 40570 1 0 846 40570 0 1 1 (original address 0x80ae04c)
903 2 40592 1 1 868 40592 1 1 1 (original address 0x80b245f)
Reference 881 (a read) has index 2, tag 0x40570. It is found in a valid block (1), with id 0. That block was last touched by reference 846, the stored tag is 0x40570 (since it’s a hit), it was clean (0), we have a hit (1), and Case 1.
Reference 903 (a write) has index 2, tag 0x40592. It is found in a valid block (1), with id 1. That block was last touched by reference 868, the stored tag is 0x405792 (since it’s a hit), it was dirty (1), we have a hit (1), and Case 1.
So after references 881 and 903, this is the state of the two blocks with index 2:
Block id=0: valid=1, dirty=0, tag=0x40570, lastUsed=881 Block id=1: valid=1, dirty=1, tag=0x40592, lastUsed=903
Then consider two more references:
911 2 40590 1 0 881 40570 0 0 2a (original address 0x80b2040)
916 2 40570 1 1 903 40592 1 0 2b (original address 0x80ae04c)
Reference 911 (a read) has index 2, tag 0x40590. It is not found in either of the two blocks with index 2. The LRU block is the one with id 0, which was last touched by reference 881; we’ll place reference 911 into id 0. The valid bit is already 1, the id 0. That block was last touched by reference 881, the stored tag is 0x40570 (from reference 881), it was clean (0), we have a miss (0), and Case 2a (clean miss).
After reference 911, the state of the two blocks with index 2:
Block id=0: valid=1, dirty=0, tag=0x40590, lastUsed=911 Block id=1: valid=1, dirty=1, tag=0x40592, lastUsed=903
Reference 916 (a read) has index 2, tag 0x40570. It is not found in either of the two blocks with index 2. The LRU block is the one with id 1, which was last touched by reference 903; we’ll place reference 916 into id 1. The valid bit is already 1, the id 1. That block was last touched by reference 903, the stored tag is 0x40592 (from reference 903), it was dirty (1), we have a miss (0), and Case 2b (dirty miss).
After reference 916, the state of the two blocks with index 2:
Block id=0: valid=1, dirty=0, tag=0x40590, lastUsed=911 Block id=1: valid=1, dirty=0, tag=0x40570, lastUsed=916

Submission (2 parts):
Source code submission: submit a single tar, zip or other archive file using iLearn. Your archive file should expand into a single directory tagged with your login name. The directory should contain your source files and a README file (plaintext) with clear instructions on how to compile and run your code.
I should be able to cd into the directory you submitted, follow the README instructions to compile your code to generate two executables, sys1 (for System 1) and sys2 (for System 2). See input prompt formats in the sample runs.
You may develop your code on any platform, in any language. However, your submission must compile and run on tiger.sfsu.edu (a Linux server), following your instructions, without additional software installations. (Note that the C/C++ and Java compilers are multiple versions behind; please don’t try to use recent features without testing them on tiger. Also, as with most servers, IDEs will not be available for remote access. You have to stick with Unix command line.)
Report submission: Submit a table (in pdf format) showing miss rate and total access time for each system, for each benchmark you are responsible for. Put the pdf in the zip file you submitted for the source code. Your table should look something like
Benchmark 1 miss rate
Benchmark 1 tot. access time
Benchmark 2 miss rate
Benchmark 2 tot. access time
System 1a
System 1b
System 2a
System 2b
System 2c
System 2d
Note: The address traces used in this project are generated using the Pin program instrumentation toolkit, using the atrace or address trace tool. While you don’t need to know anything about Pin, if you’re curious, check out
https://software.intel.com/content/www/us/en/develop/articl
es/pin-a-dynamic-binary-instrumentation-tool.html