EECS 370 Final Exam Winter 2019
Notes:
● Closed book.
● You are allowed one cheat sheet (notes on both sides allowed)
● 13 problems on 19 pages. Count them to be sure you have them all.
● Calculators without wireless are allowed, but no PDAs, Portables, Cell phones, etc.
● This exam is fairly long; don’t spend too much time on any one problem.
● You have 120 minutes for the exam.
● Some questions may be more difficult than others. You may want to skip around.
● Show your work to get partial credit.
● Write legibly and dark enough for the scanners to read your work.
Write your uniqname on the line provided at the top of each page.
You are to abide by the University of Michigan/Engineering honor code. Please sign below to
signify that you have kept the honor code pledge:
I have neither given nor received aid on this exam, nor have I concealed any violations of the Honor Code.
1 of 19
Signature:
Name:
Uniqname:
First and last name of person sitting to your right (write the symbol ⊥ if at the end of a row): _________________________________________
_________________________________________
_________________________________________
_________________________________________
First and last name of person sitting to your left (write the symbol ⊥ if at the end of a row): _________________________________________
2 of 19 uniqname: ______________________
Problem 1. Warm-Up Points: ___/10 Specify whether statements a-j are True or False. Completely fill in the ENTIRE bubble of
your selected answer.
True False
a) CISC has higher CPI than RISC in the common case.
True
b) LRU cache replacement policy can help exploit temporal locality for a direct-mapped cache.
False
c) Tags in a physically addressed cache remain valid across a context switch.
True
d) Data hazards exist in a multi-cycle data path.
False
e) Increasing the number of stages in a pipeline data path will always decrease the execution time of a program.
False
f) Increasing cache block size can reduce compulsory misses in a cache.
True
g) A multi-level page table can consume more space than a single-level page table.
True
h) A multi-level page table requires fewer memory accesses than a single level page table.
False
i) Write-through caches can incur fewer writes to memory than write-back caches.
True/False*
j) In the common case, a multi-level page table takes up more memory than a single-level page table.
False
*The question is ambiguous, as it does not explicitly state whether writes to memory occur byte by byte or not. If writes to memory were byte by byte, then a write through cache will have fewer writes in the case where only one byte is written to a block in the cache and the block size is 4. In a write through cache, this will be one write to memory, but in a write back cache, there will be 4, as the entire block must be written to memory upon eviction. Because of this, we will be accepting both answers. However, the answer to this question will be false if this question were to be asked on the exam, as we would
provide the necessary stipulations.
3 of 19 uniqname: ______________________
Problem 2. Too Much Branching! Points: ___/6
A C program has three branches. Their outcomes in an execution is listed below:
a) Assume static prediction, where a compiler decides whether a branch is predicted always T or NT. Determine the optimal decision and mispredictions for the above execution.
b) Assume each branch is dynamically predicted using a local 2-bit saturating counter for each branch. Assume counters are initialized to weakly taken. Determine mispredictions.
c) Consider a processor with 20 pipeline stages.
● 20% of instructions executed are branches
● Branch prediction accuracy is 96%.
● Branches are resolved in stage 7
● CPI is 1 in the absence of branch mispredictions
Calculate the CPI. Show your work.
CPI : ___1 + (0.2) (0.04) 6_= ____1.048___________
B1
NT
NT
NT
NT
T
B2
NT
T
NT
NT
NT
T
B3
T
T
T
T
B1
B2
B3
T or NT?
NT
NT
T
# Mispredictions
1
2
0
B1
B2
B3
# Mispredictions
2
4
0
4 of 19 uniqname: ______________________
Problem 3. Architecting a Cache
Consider a program that copies one large array (> 1000 words) into another: ⋁ i (destination[i] = source[i])
Points: ___/3
Cache is direct-mapped and uses allocate-on-write policy. Its size is 256 words. Determine configuration for data cache that would perform the best for this program. Assume both arrays are cache-block aligned, and that all cache blocks are evicted before the program terminates.
(a) Select a block size (in words): I. 1 II. 64 III. 128 IV. 256
Larger the block higher the spatial locality for this program. However, for a block of size 256 words which is same as cache size, every accesses after the first access to a cache block would be a conflict miss.
(b) Select the write policy:
I. Write-through
II. Write-back
III. Don’t care. Both would result in same number of writes.
Write-back typically has fewer writes than write-through. However, for this program, due to perfect spatial locality, and no temporal locality, both would result in same number of writes. We will accept both write-back and don’t care. If we are counting the number of bytes written to memory, then write-back and write-through have same number of bytes. It we are counting the number of writes to memory, write back is better.
Problem 4. Hazards Points: ___/8
Assume the 5-stage LC2K pipeline discussed in class. Assume that the register file supports internal forwarding, but the pipeline has NO hazard detection and NO data forwarding.
a) Draw the RAW (read-after-write) data dependencies for the following LC2K program.
b) Reorder instructions and insert no more than one noop to ensure correct execution. When you reorder an instruction, don’t change its number. Use “N” for noop’s
number.
# Instruction
1. add 1 2 3 2. add 2 4 6 3.lw 0 1 loc1 4.sw 6 1 loc2 5. nor 3 2 2 6. nor 4 2 1
# Instruction
1 add123
2 add246
3 lw01loc1
5 nor322
N noop
4 sw 6 1 loc2 6 nor 4 2 1
5 of 19 uniqname: ______________________
RAW: 2->4 (reg 6), 3->4 (reg 1), 5->6 (reg 2)
Problem 5. Pipeline Design Points: ___/5
Assume the 5-stage LC2K pipeline discussed in class. Say we unify the instruction and data memory into one. We provision one read/write port that allows only one read or write to the unified memory in any cycle.
Answer the following questions assuming we use detect-and-stall to resolve this new structural hazard. Completely fill the bubbles.
a) List the instruction(s) that can cause the new hazard. Justify in one sentence. Instructions: _________lw, sw_________________________
Only instructions that can read/write to memory (lw, sw) can cause a hazard as they conflict with instruction fetch that also reads from memory.
b) What is the earliest pipeline stage when we can detect the new hazard for an instruction? Justify in one sentence.
Stage: ___IF or ID*____
Check if opcode is lw or sw in the ID stage.
* Technically, since the question does not restrict the addition of extra hardware and the instruction is already available in the IF stage, decode logic can be added to detect the hazard in the IF stage
c) Which stage should we insert a bubble (noop) to resolve the new hazard? Justify in one sentence.
Stage: ___IF or IF/ID____
Fetch stage cannot read an instruction from memory when there is a lw/sw in MEM stage. Hazard resolved by inserting noop into IF/ID latch.
6 of 19 uniqname: ______________________
Problem 6. JALR Pipeline Design Points: ___/5 Answer the following on resolving control hazards due to the jalr instruction in the standard
5-stage LC2K pipeline.
a) What is the earliest stage we can resolve the target for jalr and update the PC? Explain briefly. Assume that latency of write to PC is negligible.
Stage: __*______
*Because the question does not specify whether additional hardware can be added, it is ambiguous. Technically, the answer could be any stage except for WB. Because of this ambiguity, we will be taking any answer.
b) Direction prediction accuracy for jalr is __higher/better/easier etc.__________ than beq. c) Can branch target buffer (BTB) be used to predict target of jalr? Answer either yes or no
statements.
Yes, we can. But, compared to beq, there is one significant difference
Target (PC + 1 + offset) for a beq is a constant. But jalr’s target ([regA]) can change across its different instances. Therefore, target prediction for jalr can be wrong, and must be checked when jalr resolves the target.
Action
jalr regA regB
[regB] = PC + 1; PC = [regA];
7 of 19 uniqname: ______________________ No, we cannot. Because compared to beq, there is one significant difference
____________________________________________n/a_____________________________
8 of 19 uniqname: ______________________
Problem 7. Performance Points: ___/10
Assume 5 stage pipeline discussed in class. Assume data-forwarding to EX stage and register internal forwarding. Consider a program with the following mix of instructions:
Opcode
%
add/nor
5
lw
50
sw
25
beq
20
Assume 70% of branches are predicted correctly. Consider the following abstract instruction sequence to represent all loads and its following instructions: lw; I1; I2; I3. Assume the following:
● 25% of all loads are followed by an instruction (I1) dependent on that load.
● 30% of all loads are followed by an instruction (I1) that is NOT dependent on
that load, but the next instruction (I2) is dependent on that load.
● 10% of all loads are followed by two instructions (I1 and I2) that are NOT
dependent on that load, but the instruction following those two (I3) is dependent on the load.
a) What is the CPI? Show your work.
CPI = 1 + branch_hazard_penalty + load_data_hazard_penalty = 1 + (0.2) (0.3) (3) + (.5) (.25) (1)
= 1+0.18+.125 = 1.305
b) Now assume that branches are resolved in the EX stage and that the MEM stage takes two cycles (loads and stores don’t finish until the end of the second MEM stage). What is the CPI?
CPI = 1 + branch_hazard_penalty + load_data_hazard_penalty = 1+(0.2)(.3)(2)+0.5(.25*2+0.3*1)
= 1+0.12+.4
= 1.52
9 of 19 uniqname: ______________________
Problem 8. Performance II Points: ___/8
Consider a system with a unified data and instruction cache. The virtual memory system is a single-level page table system. Assume that the system uses a virtually addressed cache.
Consider the latencies and hit rates for different accesses below. Assume TLB hits will not cause a page fault.
● TLB hit rate: 99%
● Cache hit rate: 90%
● Page fault rate: 0.05% of total memory accesses (loads and stores)
● TLB lookup: 1 cycle
● Cache Access: 1 cycle
● Memory Access: 60 cycles (does not include cache access)
● Disk Access: 60,000 cycles (does not include cache or memory access)
Note:
● TLB and cache are not accessed in parallel
● All accesses are sequential (i.e. accessing cache and memory takes 60 + 1 cycles)
● The page table is always pinned in main memory
● Upon retrieval from cache, main memory, or disk, the data is sent immediately to the
CPU. Assume other updates occur in parallel with no impact on performance.
(a) Determine how many cycles it will take to access data that lives in a particular physical location. Note, some of the locations may have two answers. Make sure to include allpossibleanswers.S howyourwork.
Physical Location
Cycles
Cache
1
Main Memory
62 ( if TLB hit ), 122 ( if TLB miss )
Disk
60,062
(b) Calculate the average memory access time (AMAT). Show your work.
1 + .10{1 + .99(60) + .01 [60 + .50(60) + .50(60000)]}
AMAT = 37.13 cycles
Problem 9. Challenging the High C’s Points: ___/12
10 of 19 uniqname: ______________________
Consider a byte-addressable system with 16-bit addresses, and a cache with the following configuration:
● Cache size
● Block size
● Set associative
● LRU replacement policy
● Write-back, allocate-on-write
Assume the cache is initially empty with all tags set to 0.
Simulate the cache for the following memory trace. Addresses are already divided into
Determine accesses that are misses. For misses, determine its type (compulsory, capacity, conflict), and the tags of blocks that are evicted.
A completely empty row/column loses all points for that row/column
64 bytes 8 bytes 2-way
Tag
Set Index
Block Offset
Hit or Miss?
Type of Miss (if applicable)
Tag of evicted block (if applicable)
AB
0
7
Miss
Compulsory
AB
1
1
Miss
Compulsory
BB
1
2
Miss
Compulsory
AB
0
3
Hit
1B
0
0
Miss
Compulsory
EF
0
1
Miss
Compulsory
AB
AB
1
2
Hit
AB
0
3
Miss
Conflict
1B
2F
1
5
Miss
Compulsory
BB
BB
1
1
Miss
Conflict
AB
11 of 19 uniqname: ______________________
Problem 10. What’s the Ca(t)che? Points: ___/15
Consider the following memory trace with addresses and cache hit/miss outcomes. Assume a 8-bit byte-addressable system with a cache of size 64 bytes. Assume LRU, if needed. Determine the cache configuration. Justify. Guesses without justification will not receive credit.
Access# Address Outcome
1. 0x04 – Miss 2. 0x25 – Miss 3. 0x05 – Hit 4. 0x47 – Miss 5. 0x26 – Hit 6. 0x06 – Miss 7. 0x00 – Miss
Block Size?
4 bytes
Associativity?
Direct-mapped
Number of sets?
16
Justify your block size using following constraints. You may NOT need all of them.
(State all block sizes in powers of 2) Circle one relation
Block size is ≥ 4 bytes, because memory access # 5 is a hit Block size is < 8 bytes, because memory access # 7 is a miss
The above access are just an example of possible answers. There may be other answers. Therefore, block size must be = __4_ ___bytes
Correct, but redundant constraints:
Block size is ≥ Block size is < Block size is < Block size is <
2 bytes, because memory access # 3 2^6 bytes, because memory access # 2 2^7 bytes, because memory access # 4 2^6 bytes, because memory access # 6
is a hit
is a miss is a miss is a miss
Justify your associativity using following constraints. You may NOT need all of them.
Circle one relation
Associativity cannot be = 2-way, because memory access # 5 is a hit Associativity cannot be ≥ 4-way, because memory access # 6 is a miss
12 of 19 uniqname: ______________________
There is no counterexample for 1-way
13 of 19
uniqname: ______________________
Problem 11. Multi Level Page Table
Consider the following system:
● Byte-addressable with 48-bit virtual addresses.
● 64MB of RAM installed.
● Page size: 2 KB
● Three level page table
● A second level page table occupies 1 page. A third level page table occupies 1 page. Size of first level page table is not specified.
Points: ___/8
● Assume each page table entry has 4 control bits and a physical page number. All page table entries must occupy a minimal power of 2 number of bytes.
a. Physical address
i. Size of physical address
ii. Size of page offset
iii. Size of physical page number
___________
b. Third level page table
i. Size of a page table entry
ii. Number of entries
iii. Size of index
c. Second level page table
i. Size of a page table entry
ii. Number of entries
iii. Size of index
d. First level page table
i. Size of a page table entry
ii. Size of index
iii. Number of entries
iv. Size of one first level page table (in bytes)
v. How many pages does the first level page table occupy?
26 bits 11 bits
4B 512 9 bits
4B 512 9 bits
4B
19 bits 2^19 2^21 B 2^10
___________ ___________ 15 bits
___________
___________
___________
___________
___________
___________
___________
___________
___________
___________
___________
14 of 19 uniqname: ______________________
15 of 19
uniqname: ______________________
Problem 12. VM Simulation
Consider the following system:
● Virtual address space size
● Page size
● Physical memory size
● Page replacement policy
● Page table entry size
Points: ___/15
● Byte addressable architecture
● Single-level page table
○ Page table size : 128 bytes
: 1 KB
: 128 bytes : 8 pages
: LRU
: 16 bytes
Notes:
On a page fault, the page table is updated before allocating a physical page.
If more than one free page is available, the smallest physical page number is chosen. Physical page #0 is reserved for the operating system (OS). It cannot be replaced.
(a) Assume that a process with Process ID: 11 (PID: 11), has been running. The state of the physical memory is displayed below on the left. Complete the page table for this process on the right. [2 pts]
Physical Page # (PPN)
Memory Contents
0x0
Reserved for OS
0x1
Page Table of PID 11
0x2
0x3
PID 11: VPN 4
0x4
0x5
PID 11: VPN 5
0x6
PID 11: VPN 0
0x7
Virtual Page # (VPN)
Valid
Physical Page # (PPN)
0x0
1
0x6
0x1
0
0x2
0
0x3
0
0x4
1
0x3
0x5
1
0x5
0x6
0
0x7
0
(b) Say a new process (PID: 7) starts. Which physical page would its page table be stored in? Assume the memory state in part (a). [1 pt.]
PPN: 2
16 of 19 uniqname: ______________________
(c) Complete the following table for the given sequence of virtual address requests. Assume that the memory state in part (a), and that space for page table for PID 7 has been allocated. [12 pts]
Time
Process ID
Virtual Address (VA)
Virtual Page # (VPN)
Physical Page # (PPN)
Page Fault? (Y/N)
Physical Address (PA)
0
11
0x28A
0x5
0x5
N
0x28A
1
7
0x28A
0x5
0x4
Y
0x20A
2
7
0x29B
0x5
0x4
N
0x21B
3
11
0x94
0x1
0x7
Y
0x394
4
11
0x200
0x4
0x3
N
0x180
5
7
0x1FF
0x3
0x6
Y
0x37F
6
11
0x10
0x0
0x5
Y
0x290
You may use the tables below as workspace (won’t be graded). Main Memory State Page Table of Process 7
Page Table of Process 11
PPN
Memory Contents
LRU
0x0
Operating System Data
0x1
0x2
0x3
0x4
0x5
0x6
0x7
VPN
Valid
PPN
0x0
0x1
0x2
0x3
0x4
0x5
0x6
0x7
VPN
0x0
Valid
PPN
0x1
0x2
0x3
0x4
0x5
0x6
0x7
17 of 19 uniqname: ______________________
Problem 13. Stack ISA Points: ___/15
Consider the following stack-based ISA, where an instruction can only read/write to memory in the first and/or second position from the top of the stack.
Assume all instructions perform PC = PC + 1, unless specified otherwise.
Instruction
Operational Semantics
pop
stack.pop()
halt
stack.halt()
pushi immediate
stack.push( signExtend(immediate) )
beq immediate
top = stack.pop() if top == 0:
PC = PC + 1 + signExtend(immediate);
load immediate
top = stack.pop()
memAddr = top + signExtend(immediate) stack.push( memory[memAddr] )
store immediate
top = stack.pop()
memAddr = top + signExtend(immediate) data = stack.pop()
memory[memAddr] = data
add
sub
nor
sll
slr
first = stack.pop(); second = stack.pop() result = second OP first
push(result)
where OP is:
add: + sub: - nor: BITWISE NOR sll: << slr: >>
less
first = stack.pop(); second = stack.pop() if first < second:
stack.push(1) else:
stack.push(0)
18 of 19 uniqname: ______________________
Translate the following C program into the new stack-based ISA assembly. Assume that variables i, j, and s, are stored at addresses 33, 34, and 35 (all decimals), respectively.
C Code
Stack-Based ISA Assembly Code
if (i == 3) { j /= 4;
for (; s < 5; s += 2) { i += 2;
} }
else {
j *= 8;
}
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
pushi load pushi sub beq
else pushi load
pushi sll pushi store pushi pushi beq
if load pushi
slr pushi store
for pushi pushi
load
less beq
load
pushi
add
pushi
store
pushi
pushi
beq
end halt i3 j2 s0
0 i 3
if or 9 or 14 0
j
3ori
0 j 0 0
end or 18 or 32 j
2orj
0 j 5 0 s
end i 2
0 i 0 0
for or -13 or 19
19 of 19 uniqname: ______________________
*Problem 13 will not be graded, and every submission will be given full credit. It fails to implement online3oftheCcodeandtheloadinstructiononline14oftheassemblycodewill cause an error if the branch on line 4 is taken.