程序代写代做代考 compiler cache C assembly EECS 370 Final Exam Winter 2019

EECS 370 Final Exam Winter 2019
Notes:
● Closed book.
● You are allowed one cheat sheet (notes on both sides allowed)
● 13 problems on 19 pages​. Count them to be sure you have them all.
● Calculators without wireless are allowed, but no PDAs, Portables, Cell phones, etc.
● This exam is fairly long; ​don’t spend too much time on any one problem​.
● You have ​120 minutes​ for the exam.
● Some questions may be more difficult than others. You may want to skip around.
● Show your work to get partial credit.
● Write legibly and dark enough for the scanners to read your work​.
Write your uniqname on the line provided at the top of each page​.
You are to abide by the University of Michigan/Engineering honor code. Please sign below to
signify that you have kept the honor code pledge:
I have neither given nor received aid on this exam, nor have I concealed any violations of the Honor Code.​
1 of 19
Signature:
Name:
Uniqname:
First and last name of person sitting to your ​right​ (write the symbol ⊥ if at the end of a row): _________________________________________
_________________________________________
_________________________________________
_________________________________________
First and last name of person sitting to your ​left​ (write the symbol ⊥ if at the end of a row): _________________________________________

2 of 19 uniqname: ______________________
Problem 1. Warm-Up Points: ___/10 Specify whether statements ​a-j are True or False. ​Completely fill in the ENTIRE bubble ​of
your selected answer.
True False
a) CISC has higher CPI than RISC in the common case.
True
b) LRU cache replacement policy can help exploit temporal locality for a direct-mapped cache.
False
c) Tags in a physically addressed cache remain valid across a context switch.
True
d) Data hazards exist in a multi-cycle data path.
False
e) Increasing the number of stages in a pipeline data path will always decrease the execution time of a program.
False
f) Increasing cache block size can reduce compulsory misses in a cache.
True
g) A multi-level page table can consume more space than a single-level page table.
True
h) A multi-level page table requires fewer memory accesses than a single level page table.
False
i) Write-through caches can incur fewer writes to memory than write-back caches.
True/​False​*
j) In the common case, a multi-level page table takes up more memory than a single-level page table.
False
*The question is ambiguous, as it does not explicitly state whether writes to memory occur byte by byte or not. If writes to memory were byte by byte, then a write through cache will have fewer writes in the case where only one byte is written to a block in the cache and the block size is 4. In a write through cache, this will be one write to memory, but in a write back cache, there will be 4, as the entire block must be written to memory upon eviction. Because of this, we will be accepting both answers. However, the answer to this question will be false if this question were to be asked on the exam, as we would
provide the necessary stipulations.

3 of 19 uniqname: ______________________
Problem 2. Too Much Branching! Points: ___/6
A C program has three branches. Their outcomes in an execution is listed below:
a) Assume static prediction, where a compiler decides whether a branch is predicted ​always ​T or NT. Determine the optimal decision and mispredictions for the above execution.
b) Assume each branch is dynamically predicted using a local ​2-bit saturating counter ​for each branch. Assume counters are initialized to​ weakly taken​. Determine mispredictions.
c) Consider a processor with 20 pipeline stages.
● 20% of instructions executed are branches
● Branch prediction accuracy is 96%.
● Branches are resolved in stage 7
● CPI is 1 in the absence of branch mispredictions
Calculate the CPI. Show your work.
CPI : ___​1 + (0.2) (0.04) 6_= ____1.048​___________
B1
NT
NT
NT
NT
T
B2
NT
T
NT
NT
NT
T
B3
T
T
T
T
B1
B2
B3
T or NT?
NT
NT
T
# Mispredictions
1
2
0
B1
B2
B3
# Mispredictions
2
4
0

4 of 19 uniqname: ______________________
Problem 3. Architecting a Cache
Consider a program that copies one large array (> 1000 words) into another: ⋁ i (​destination[i] = source[i]​)
Points: ___/3
Cache is direct-mapped and uses allocate-on-write policy. Its size is 256 words. Determine configuration for data cache that would perform the best for this program. Assume both arrays are cache-block aligned, and that all cache blocks are evicted before the program terminates.
(a) Select a block size (in words): I. 1 II. 64 III. ​128 IV. 256
Larger the block higher the spatial locality for this program. However, for a block of size 256 words which is same as cache size, every accesses after the first access to a cache block would be a conflict miss.
(b) Select the write policy:
I. Write-through
II. ​Write-back
III. ​Don’t care. Both would result in same number of writes.
Write-back typically has fewer writes than write-through. However, for this program, due to perfect spatial locality, and no temporal locality, both would result in same number of writes. We will accept both write-back and don’t care. If we are counting the number of bytes written to memory, then write-back and write-through have same number of bytes. It we are counting the number of writes to memory, write back is better.
Problem 4. Hazards Points: ___/8
Assume the 5-stage LC2K pipeline discussed in class. Assume that the register file supports internal forwarding, but the pipeline has ​NO​ hazard detection and ​NO​ data forwarding.
a) Draw the RAW (read-after-write) data dependencies for the following LC2K program.
b) Reorder instructions and insert no more than one ​noop to ensure correct execution. When you reorder an instruction, don’t change its number. Use ​“N” ​for ​noop’s
number.
# Instruction
1. add 1 2 3 2. add 2 4 6 3.lw 0 1 loc1 4.sw 6 1 loc2 5. nor 3 2 2 6. nor 4 2 1
# Instruction
1 add123
2 add246
3 lw01loc1
5 nor322
N noop
4 sw 6 1 loc2 6 nor 4 2 1

5 of 19 uniqname: ______________________
RAW: ​2->4 (reg 6), 3->4 (reg 1), 5->6 (reg 2)
Problem 5. ​Pipeline Design Points: ___/5
Assume the 5-stage LC2K pipeline discussed in class. Say we unify the instruction and data memory into one. We provision one read/write port that allows only one read or write to the unified memory in any cycle.
Answer the following questions assuming we use ​detect-and-stall​ to resolve this new structural hazard. Completely fill the bubbles.
a) List the instruction(s) that can cause the new hazard. Justify in one sentence. Instructions: _________​lw, sw​_________________________
Only instructions that can read/write to memory (lw, sw) can cause a hazard as they conflict with instruction fetch that also reads from memory.
b) What is the earliest pipeline stage when we can detect the new hazard for an instruction? Justify in one sentence.
Stage: ___​IF or ID*​____
Check if opcode is ​lw or sw​ in the ID stage.
* Technically, since the question does not restrict the addition of extra hardware and the instruction is already available in the IF stage, decode logic can be added to detect the hazard in the IF stage
c) Which stage should we insert a bubble (noop) to resolve the new hazard? Justify in one sentence.
Stage: ___​IF or IF/ID​____
Fetch stage cannot read an instruction from memory when there is a lw/sw in MEM stage. Hazard resolved by inserting noop into IF/ID latch.

6 of 19 uniqname: ______________________
Problem 6. JALR ​Pipeline Design Points: ___/5 Answer the following on resolving control hazards due to the ​jalr​ instruction in the standard
5-stage LC2K pipeline.
a) What is the earliest stage we can resolve the target for ​jalr​ and update the PC? Explain briefly. Assume that latency of write to PC is negligible.
Stage: __​*​______
*Because the question does not specify whether additional hardware can be added, it is ambiguous. Technically, the answer could be any stage except for WB. Because of this ambiguity, we will be taking any answer.
b) Direction prediction accuracy for ​jalr ​is ​__​higher/better/easier etc.​__________ ​than ​beq. c) Can branch target buffer (BTB) be used to predict target of ​jalr​? Answer either yes or no
statements.
Yes, we can​. But, compared to ​beq​, there is one significant difference
Target (PC + 1 + offset) for a beq is a constant. But jalr’s target ([regA]) can change across its different instances. Therefore, target prediction for jalr can be wrong, and must be checked when jalr resolves the target.
Action
jalr​ ​regA​ ​regB
[regB] = PC + 1; PC = [regA];

7 of 19 uniqname: ______________________ No, we cannot​. Because compared to ​beq​, there is one significant difference
____________________________________________​n/a​_____________________________

8 of 19 uniqname: ______________________
Problem 7. Performance Points: ___/10
Assume 5 stage pipeline discussed in class. Assume data-forwarding to EX stage and register internal forwarding. Consider a program with the following mix of instructions:
Opcode
%
add/nor
5
lw
50
sw
25
beq
20
Assume 70% of branches are predicted correctly. Consider the following abstract instruction sequence to represent all loads and its following instructions: ​lw; I1; I2; I3. ​Assume the following:
● 25% of all loads are followed by an instruction (​I1​) dependent on that load.
● 30% of all loads are followed by an instruction (​I1​) that is NOT dependent on
that load, but the next instruction (​I2​) is dependent on that load.
● 10% of all loads are followed by two instructions (​I1​ and ​I2​) that are NOT
dependent on that load, but the instruction following those two (​I3​) is dependent on the load.
a) What is the CPI? Show your work.
CPI = 1 + branch_hazard_penalty + load_data_hazard_penalty = 1 + (0.2) (0.3) (3) + (.5) (.25) (1)
= 1+0.18+.125 = 1.305
b) Now assume that branches are resolved in the EX stage and that the MEM stage takes two cycles (loads and stores don’t finish until the end of the second MEM stage). What is the CPI?
CPI = 1 + branch_hazard_penalty + load_data_hazard_penalty = 1+(0.2)(.3)(2)+0.5(.25*2+0.3*1)
= 1+0.12+.4
= 1.52

9 of 19 uniqname: ______________________
Problem 8. Performance II Points: ___/8
Consider a system with a unified data and instruction cache. The virtual memory system is a single-level page table system. Assume that the system uses a ​virtually addressed​ cache.
Consider the latencies and hit rates for different accesses below. Assume TLB hits will​ not cause a page fault.
● TLB hit rate: 99%
● Cache hit rate: 90%
● Page fault rate: 0.05% of ​total memory accesses​ (loads and stores)
● TLB lookup: 1 cycle
● Cache Access: 1 cycle
● Memory Access: 60 cycles (does not include cache access)
● Disk Access: 60,000 cycles (does not include cache or memory access)
Note:
● TLB and cache are ​not​ accessed in parallel
● All accesses are​ sequential​ (i.e. accessing cache and memory takes 60 + 1 cycles)
● The page table is always pinned in main memory
● Upon retrieval from cache, main memory, or disk, the data is sent immediately to the
CPU. Assume other updates occur in parallel with no impact on performance.
(a) Determine how many cycles it will take to access data that lives in a particular physical location. ​Note, some of the locations may have two answers. Make sure to include allpossibleanswers.S​ howyourwork.
Physical Location
Cycles
Cache
1
Main Memory
62 ( if TLB hit ), 122 ( if TLB miss )
Disk
60,062
(b) Calculate the average memory access time (AMAT). Show your work.
1 + .10{1 + .99(60) + .01 [60 + .50(60) + .50(60000)]}
AMAT = ​37.13 cycles
Problem 9. Challenging the High C’s Points: ___/12

10 of 19 uniqname: ______________________
Consider a byte-addressable system with 16-bit addresses, and a cache with the following configuration:
● Cache size
● Block size
● Set associative
● LRU replacement policy
● Write-back, allocate-on-write
Assume the cache is initially empty with all tags set to 0.
Simulate the cache for the following memory trace. Addresses are already divided into tuples.
Determine accesses that are misses. For misses, determine its type (compulsory, capacity, conflict), and the tags of blocks that are evicted.
A completely empty row/column loses all points for that row/column
64 bytes 8 bytes 2-way
Tag
Set Index
Block Offset
Hit or Miss?
Type of Miss (if applicable)
Tag of evicted block (if applicable)
AB
0
7
Miss
Compulsory
AB
1
1
Miss
Compulsory
BB
1
2
Miss
Compulsory
AB
0
3
Hit
1B
0
0
Miss
Compulsory
EF
0
1
Miss
Compulsory
AB
AB
1
2
Hit
AB
0
3
Miss
Conflict
1B
2F
1
5
Miss
Compulsory
BB
BB
1
1
Miss
Conflict
AB

11 of 19 uniqname: ______________________
Problem 10. What’s the Ca(t)che? Points: ___/15
Consider the following memory trace with addresses and cache hit/miss outcomes. Assume a 8-bit byte-addressable system with a cache of size 64 bytes. Assume LRU, if needed. Determine the cache configuration. Justify. ​Guesses without justification will not receive credit.
Access# Address Outcome
1. 0x04 – Miss 2. 0x25 – Miss 3. 0x05 – ​Hit 4. 0x47 – Miss 5. 0x26 – ​Hit 6. 0x06 – Miss 7. 0x00 – Miss
Block Size?
4 bytes
Associativity?
Direct-mapped
Number of sets?
16
Justify your block size using following constraints. You may NOT need all of them.
(State all block sizes in powers of 2) Circle one relation
Block size is ≥ 4 bytes, because memory access # 5 is a ​hit Block size is < 8 bytes, because memory access # 7 is a ​miss The above access are just an example of possible answers. There may be other answers. Therefore, block size must be = __​4_​ ___bytes Correct, but redundant constraints: Block size is ≥ Block size is < Block size is < Block size is < 2 bytes, because memory access # 3 2^6 bytes, because memory access # 2 2^7 bytes, because memory access # 4 2^6 bytes, because memory access # 6 is a ​hit is a ​miss is a ​miss is a ​miss Justify your associativity using following constraints. You may NOT need all of them. Circle one relation Associativity cannot be = 2-way, because memory access # 5 is a ​hit Associativity cannot be ≥ 4-way, because memory access # 6 is a ​miss 12 of 19 uniqname: ______________________ There is no counterexample for 1-way 13 of 19 uniqname: ______________________ Problem 11. Multi Level Page Table Consider the following system: ● Byte-addressable with 48-bit virtual addresses. ● 64MB of RAM installed. ● Page size: 2 KB ● Three level page table ● A second level page table occupies 1 page. A third level page table occupies 1 page. Size of first level page table is not specified. Points: ___/8 ● Assume each page table entry has 4 control bits and a physical page number. All page table entries must occupy a minimal power of 2 number of bytes. a. Physical address i. Size of physical address ii. Size of page offset iii. Size of physical page number ___________ b. Third level page table i. Size of a page table entry ii. Number of entries iii. Size of index c. Second level page table i. Size of a page table entry ii. Number of entries iii. Size of index d. First level page table i. Size of a page table entry ii. Size of index iii. Number of entries iv. Size of one first level page table (in bytes) v. How many pages does the first level page table occupy? 26 bits 11 bits 4B 512 9 bits 4B 512 9 bits 4B 19 bits 2^19 2^21 B 2^10 ___________ ___________ 15 bits ___________ ___________ ___________ ___________ ___________ ___________ ___________ ___________ ___________ ___________ ___________ 14 of 19 uniqname: ______________________ 15 of 19 uniqname: ______________________ Problem 12. VM Simulation Consider the following system: ● Virtual address space size ● Page size ● Physical memory size ● Page replacement policy ● Page table entry size Points: ___/15 ● Byte addressable architecture ● Single-level page table ○ Page table size : 128 bytes : 1 KB : 128 bytes : 8 pages : LRU : 16 bytes Notes​: On a page fault, the page table is updated before allocating a physical page. If more than one free page is available, the smallest physical page number is chosen. Physical page #0 is reserved for the operating system (OS). It cannot be replaced. (a) Assume that a process with Process ID: 11 (PID: 11), has been running. The state of the physical memory is displayed below on the left. Complete the page table for this process on the right. [2 pts] Physical Page # (PPN) Memory Contents 0x0 Reserved for OS 0x1 Page Table of PID 11 0x2 0x3 PID 11: VPN 4 0x4 0x5 PID 11: VPN 5 0x6 PID 11: VPN 0 0x7 Virtual Page # (VPN) Valid Physical Page # (PPN) 0x0 1 0x6 0x1 0 0x2 0 0x3 0 0x4 1 0x3 0x5 1 0x5 0x6 0 0x7 0 (b) Say a new process (PID: 7) starts. Which physical page would its page table be stored in? Assume the memory state in part (a). [1 pt.] PPN: ​2 16 of 19 uniqname: ______________________ (c) Complete the following table for the given sequence of virtual address requests. Assume that the memory state in part (a), and that space for page table for PID 7 has been allocated. [12 pts] Time Process ID Virtual Address (VA) Virtual Page # (VPN) Physical Page # (PPN) Page Fault? (Y/N) Physical Address (PA) 0 11 0x28A 0x5 0x5 N 0x28A 1 7 0x28A 0x5 0x4 Y 0x20A 2 7 0x29B 0x5 0x4 N 0x21B 3 11 0x94 0x1 0x7 Y 0x394 4 11 0x200 0x4 0x3 N 0x180 5 7 0x1FF 0x3 0x6 Y 0x37F 6 11 0x10 0x0 0x5 Y 0x290 You may use the tables below as workspace (​won’t be graded​). Main Memory State Page Table of Process 7 Page Table of Process 11 PPN Memory Contents LRU 0x0 Operating System Data 0x1 0x2 0x3 0x4 0x5 0x6 0x7 VPN Valid PPN 0x0 0x1 0x2 0x3 0x4 0x5 0x6 0x7 VPN 0x0 Valid PPN 0x1 0x2 0x3 0x4 0x5 0x6 0x7 17 of 19 uniqname: ______________________ Problem 13. Stack ISA Points: ___/15 Consider the following stack-based ISA, where an instruction can only read/write to memory in the first and/or second position from the top of the stack. Assume all instructions perform PC = PC + 1, unless specified otherwise. Instruction Operational Semantics pop stack.pop() halt stack.halt() pushi ​immediate stack.push( signExtend(immediate) ) beq ​immediate top = stack.pop() if top == 0: PC = PC + 1 + signExtend(immediate); load ​immediate top = stack.pop() memAddr = top + signExtend(immediate) stack.push( memory[memAddr] ) store ​immediate top = stack.pop() memAddr = top + signExtend(immediate) data = stack.pop() memory[memAddr] = data add sub nor sll slr first = stack.pop(); second = stack.pop() result = second ​OP​ first push(result) where ​OP​ ​is: add: + sub: - nor: BITWISE NOR sll: << slr: >>
less
first = stack.pop(); second = stack.pop() if first < second: stack.push(1) else: stack.push(0) 18 of 19 uniqname: ______________________ Translate the following C program into the new stack-based ISA assembly. Assume that variables ​i, j,​ and ​s​, are stored at addresses 33, 34, and 35 (all decimals), respectively. C Code Stack-Based ISA Assembly Code if (i == 3) { j /= 4; for (; s < 5; s += 2) { i += 2; } } else { j *= 8; } 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 ​pushi load pushi sub beq ​ else pushi load pushi​ ​sll pushi store pushi pushi beq if load pushi ​slr pushi store for pushi pushi load ​ ​less beq load pushi add pushi store pushi pushi beq end halt i3 j2 s0 ​ 0 i 3 ​ if or 9 or 14 0 j ​3ori 0 j 0 0 ​end or 18 or 32 j ​2orj 0 j 5 0 s end i 2 0 i 0 0 ​for or -13 or 19 ​ ​ 19 of 19 uniqname: ______________________ *Problem 13 will not be graded, and every submission will be given full credit. It fails to implement online3oftheCcodeandtheloadinstructiononline14oftheassemblycodewill cause an error if the branch on line 4 is taken.