UNIVERSITY OF CALIFORNIA, DAVIS Department of Electrical and Computer Engineering
EEC 170 Introduction to Computer Architecture Winter 2021
Quiz 4 Solutions
1. Following is the “baseline” class datapath that will be used in Question 1. This is identical to the diagram we have used in lecture, although some elements have slightly changed positions compared to the lecture diagram. Black lines are datapath signals, gray lines are control signals.
Screech hates load-use hazards. He changes the pipelined baseline design to the one below, putting the data memory before the ALU. All control signals are the same except MemToReg has disappeared (its functionality is now subsumed by a more capable ALUSrc).
FETCH
PCSrc
Zero? AND
DECODE EXECUTE
MEM
WB
Instruction Memory Addr
Instr<31:0>
RegWrite
Rd
A =? +0
Reg
Write Enable
Write Data
Reg 1 Reg 2
Write Read Read
ALUSrc
MemWrite ALUOp MemRead
Zero?
MemToReg
Rs1 Rs2
32 64-Bit Registers
Read data 1 Read data 2
B 41
00
PC
WrEn
Data Memory
Data Data In Out
11
+
Imm Gen
Read
Addr
Instr
64
<<1
FETCH
AND
Instr<31:0>
DECODE
Rs1
MEM
MemWrite MemRead
EXECUTE WB
ALUOp ALUSrc
Zero?
A =? B
Instruction Memory Addr
RegWrite
Rd
Rs2
PCSrc Zero?
0 +1
4
0
1 +
<<1
2
PC
Reg
Write Enable
Write Data
Reg 1 Reg 2
Write Read Read
32 64-Bit Registers
Read data 1 Read data 2
WrEn
Data Memory
Data Data In Out
Read
Addr
Imm Gen
Instr
Page 1 of 5
MUX
MUX
MUX
ALU
MUX
MUX
ALU
(a) (4 points) Specify a pair of RISC-V instructions that: • Can be run on either datapath;
• Do not require a stall cycle between them on the baseline; and datapath • Do require a stall cycle between them on Screech’s datapath.
Solution: We want a pair of instructions that has the use-load hazard, where the first instruction produces a value in its ALU stage and the second instruction consumes that value in its memory stage. For instance:
1 add x2, x3, x4
2 lw x5, 0(x2) # offset must be 0 to run on Screech's datapath
(b) (4 points) Specify a single RISC-V instruction that can run on the baseline data- path but cannot run on Screech’s datapath.
(c) (4 points) Write a RISC-V sequence of instructions that can run on Screech’s datapath that implements the instruction you specified in the previous part. Use x10 as a temporary if necessary.
If your answer in the previous part is wrong but your sequence implements your previous answer correctly and can run on Screech’s datapath, we will give up to half credit on this part.
Solution: Screech’s datapath cannot add a register plus an immediate offset and use the result as the address input to a memory operation, e.g., ld x2, 100(x3).
Solution: We need to replace a memory operation that uses base+offset ad- dressing with an add to compute the address then the memory operation, for instance:
1 addi x10, x3, 100
2 ld x2, 0(x10)
(d) (4 points) Specify a pair of RISC-V instructions that require two instructions to run on the baseline datapath but can be run with a single instruction on Screech’s datapath.
Solution: Here we’re looking for an instruction that uses a dataflow that is only available on Screech’s datapath, which will be a load followed by a use of that load. The load needs to be from a register, not register+offset, because
Page 2 of 5
register+offset addressing is not available on Screech’s datapath. For instance, the following two RISC-V instructions could be combined into one instruction on Screech’s datapath:
1 ld x2, 0(x3)
2 add x5, x4, x2
2. Your team of two junior designers, Zack and A.C., of the new RISC-VI processor meets with you with some bad news. “We can’t meet the clock target,” they say, “the memory stage of our 5-stage pipeline just isn’t fast enough. We both have a different suggestion.”
You will evaluate each of their solutions on the following code benchmark. You will be counting total clock cycles, not the total number of instructions/stalls. Note: When you count clock cycles on this problem, count from the beginning of the first instruction (instruction fetch stage) to the completion (writeback stage) of the last instruction. All instructions issue, execute, and complete in order.
For maximum opportunity for partial credit, be clear in your solution where stalls occur.
1 ld
2 add
3 ld
4 ld
5 add
6 add
x2, 0(x10)
x2, x2, x2
x3, 8(x10)
x4, 16(x10)
x3, x3, x4
x5, x2, x3
(a) (4 points) Assuming we ran the benchmark on the baseline 5-stage RISC-V imple- mentation with all forwarding paths enabled, how many cycles would it take? Remember to include any necessary stalls.
Solution: In all solutions on this question, the solution shows ** for stalls and then, if appropriate, delays the beginning of the next instruction to allow it to run stall-free and in order. This could be accounted for in other ways (e.g., propagating the stall to future instructions) but the answer is the same each way.
1 # 1 2 3 4 5 6 7 8 9 10 11 12
2 ld x2,0(x10) IFIDEXMEWB
3 addx2,x2,x2
4 ld x3,8(x10)
5 ld x4,16(x10)
IFID**EXMEWB #load-use,x2 IFIDEXMEWB
IFIDEXMEWB
Page 3 of 5
6 addx3,x3,x4 IFID**EXMEWB #load-use,x4 7 addx5,x2,x3 IFIDEXMEWB
12 cycles. The only two stalls here are due to load-use hazards.
(b) (6 points) “I can make the clock target,” says Zack, “and implement all possible forwarding paths, but now on all instructions we require three memory stages (IF ID EX MEM1 MEM2 MEM3 WB). Results from memory operations would only now be available for forwarding at the end of MEM3.” How many cycles would your benchmark take on Zack’s implementation?
Solution:
1# 1 2 3 4 5 6 7 8 9101112131415161718 2 ld x2,0(x10) IFIDEXM1M2M3WB
3 addx2,x2,x2
4 ld x3,8(x10)
5 ld x4,16(x10)
6 addx3,x3,x4
7 addx5,x2,x3
IFID******EXM1M2M3WB #load-use,x2 IFIDEXM1M2M3WB
IFIDEXM1M2M3WB IFID******EXM1M2M3WB#l-ux4
IFIDEXM1M2M3WB
18 cycles. The effect here is that a load-use hazard now takes 3 stall cycles to resolve.
(c) (6 points) “I can make the clock target,” says A.C., “and keep a 5-stage pipeline, but I have to disable any forwarding paths. Any results that would normally be forwarded instead must be written to the register file. We can write to the register file then read from it on the same stage.” This pipeline has no additional capabilities or changes to the standard 5-stage pipeline; it simply has forwarding disabled. How many cycles would your benchmark take on A.C.’s implementation?
Solution:
1 # 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
2 ld x2,0(x10) IFIDEXMEWB
3 addx2,x2,x2
4 ld x3,8(x10)
5 ld x4,16(x10)
6 addx3,x3,x4
7 addx5,x2,x3
IF****IDEXMEWB #x2 IFIDEXMEWB
IFIDEXMEWB IF****IDEXMEWB #x4
IF****IDEXMEWB #x3
Page 4 of 5
16 cycles. The effect here is that any data hazard does not resolve until the writeback stage.
Page 5 of 5