CO101 Principle of Computer Organization
Assignment 3
Due: Nov 30, 2016
Name: Student Number:
1. In this exercise, we examine how pipelining affects the clock cycle time of the processor. Problems in this exercise assume that individual stages of the datapath have the following latencies:
Stage   | IF    | ID    | EX    | MEM   | WB
Latency | 200ps | 100ps | 120ps | 210ps | 150ps
1) What is the clock cycle time in a pipelined and non-pipelined processor?
Pipelined: 210ps (set by the slowest stage, MEM). Non-pipelined: 200 + 100 + 120 + 210 + 150 = 780ps.
2) What is the total latency of an LW instruction in a pipelined and non-pipelined processor?
Pipelined: 5 * 210ps = 1050ps. Non-pipelined: 780ps.
3) If we can split one stage of the pipelined datapath into two new stages, each with half the latency of the original stage, which stage would you split and what is the new clock cycle time of the processor?
Split MEM, the slowest stage, into two 105ps stages. The new clock cycle time is 200ps, now set by IF.
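The arithmetic behind the three answers above can be checked with a short Python sketch. The names stage_latency_ps and split_slowest_stage are illustrative only and are not part of the assignment.

```python
# Stage latencies from the table in problem 1, in picoseconds.
stage_latency_ps = {"IF": 200, "ID": 100, "EX": 120, "MEM": 210, "WB": 150}

# 1) A pipelined clock must fit the slowest stage; a non-pipelined
#    (single-cycle) clock must fit all stages back to back.
pipelined_cycle = max(stage_latency_ps.values())
non_pipelined_cycle = sum(stage_latency_ps.values())

# 2) An LW passes through every stage, one clock cycle per stage.
lw_latency_pipelined = len(stage_latency_ps) * pipelined_cycle
lw_latency_non_pipelined = non_pipelined_cycle


def split_slowest_stage(latencies):
    """Split the slowest stage into two equal halves.

    Returns the name of the split stage and the new clock cycle time.
    """
    slowest = max(latencies, key=latencies.get)
    half = latencies[slowest] / 2
    new_latencies = {k: v for k, v in latencies.items() if k != slowest}
    new_latencies[slowest + "1"] = half
    new_latencies[slowest + "2"] = half
    return slowest, max(new_latencies.values())


# 3) Splitting the slowest stage is the only split that lowers the maximum.
split_stage, new_cycle = split_slowest_stage(stage_latency_ps)

print(pipelined_cycle, non_pipelined_cycle)            # 210 780
print(lw_latency_pipelined, lw_latency_non_pipelined)  # 1050 780
print(split_stage, new_cycle)                          # MEM 200
```

Splitting any stage other than MEM would leave the 210ps stage in place, so the clock cycle time would not improve.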
2. The following code is run on a 5-stage MIPS pipeline with full forwarding.
lw $t0, 0($a0)
addi $t0, $t0, 1
sw $t0, 0($a0)
addi $a0, $a0, 4
1) Identify all the data hazards that cannot be resolved with forwarding.
The load-use hazard between the first two instructions cannot be resolved with forwarding.
2) Rewrite the code to eliminate stalls on the 5-stage pipeline with full forwarding.
lw $t0, 0($a0)
addi $a0, $a0, 4
addi $t0, $t0, 1
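sw $t0, -4($a0)
The store offset becomes -4 because $a0 has already been incremented by 4 when the store executes.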
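One way to check that the reordering removes the stall is to count load-use pairs. The Python sketch below is a simplified model, not a pipeline simulator: it assumes that with full forwarding the only stall occurs when an instruction reads a register loaded by the immediately preceding lw. The tuple encoding and the function name count_load_use_stalls are hypothetical.

```python
# Each instruction is (opcode, destination register or None, source registers).
# Memory offsets are omitted because they do not affect hazard detection.
original = [
    ("lw",   "t0", ["a0"]),
    ("addi", "t0", ["t0"]),
    ("sw",   None, ["t0", "a0"]),
    ("addi", "a0", ["a0"]),
]

reordered = [
    ("lw",   "t0", ["a0"]),
    ("addi", "a0", ["a0"]),
    ("addi", "t0", ["t0"]),
    ("sw",   None, ["t0", "a0"]),
]


def count_load_use_stalls(program):
    """Count one-cycle stalls under the simplified load-use model."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_srcs = cur
        # A load's result is available only after MEM, one cycle too late
        # for the EX stage of the very next instruction.
        if prev_op == "lw" and prev_dest in cur_srcs:
            stalls += 1
    return stalls


print(count_load_use_stalls(original))   # 1
print(count_load_use_stalls(reordered))  # 0
```

Moving the independent addi $a0 into the slot after the load gives the loaded value one extra cycle to arrive, which is exactly what forwarding needs.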
3. Problems in this exercise refer to the following instruction sequence:
I1: ADD R1, R2, R1
I2: LW R2, 0(R1)
I3: LW R1, 4(R1)
I4: OR R3, R1, R2
1) Find all data dependences in this instruction sequence.
2) Find all hazards in this instruction sequence for a 5-stage pipeline with and then without forwarding.
3) To reduce clock cycle time, we are considering a split of the MEM stage into two stages. Find all hazards in this instruction sequence for this 6-stage pipeline with and then without forwarding.
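As a starting point for part 1), the dependences can be enumerated mechanically. The Python sketch below assumes the usual MIPS operand order (destination first, and the base register of LW treated as a source) and reports a dependence only between an instruction and the nearest conflicting access, so a write that is overwritten before it is read is not counted. The function name find_dependences and the tuple encoding are illustrative.

```python
from itertools import combinations

# (label, destination register, source registers)
PROGRAM = [
    ("I1", "R1", ["R2", "R1"]),  # ADD R1, R2, R1
    ("I2", "R2", ["R1"]),        # LW  R2, 0(R1)
    ("I3", "R1", ["R1"]),        # LW  R1, 4(R1)
    ("I4", "R3", ["R1", "R2"]),  # OR  R3, R1, R2
]


def find_dependences(program):
    """Return (kind, register, earlier, later) tuples for RAW, WAR and WAW."""
    deps = []
    for i, j in combinations(range(len(program)), 2):
        label_i, dest_i, srcs_i = program[i]
        label_j, dest_j, srcs_j = program[j]
        # Registers overwritten strictly between the two instructions.
        written_between = {program[k][1] for k in range(i + 1, j)}
        if dest_i in srcs_j and dest_i not in written_between:
            deps.append(("RAW", dest_i, label_i, label_j))
        if dest_j in srcs_i and dest_j not in written_between:
            deps.append(("WAR", dest_j, label_i, label_j))
        if dest_i == dest_j and dest_i not in written_between:
            deps.append(("WAW", dest_i, label_i, label_j))
    return deps


for dep in find_dependences(PROGRAM):
    print(dep)
```

Under these assumptions the sketch reports four RAW dependences (R1: I1 to I2, I1 to I3, I3 to I4; R2: I2 to I4), three WAR dependences (R2: I1 to I2; R1: I1 to I3, I2 to I3) and one WAW dependence (R1: I1 to I3). Only the RAW dependences can become hazards in an in-order pipeline of this kind; which of them actually stall depends on the pipeline depth and on forwarding, as parts 2) and 3) ask.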
4. For a multi-cycle processor, assume that the operation times for the major function units are as follows:
1) Memory units: 200 ps;
2) ALU and adders: 100 ps;
3) Register file (read or write): 50 ps.
Assume that the multiplexors, control unit, PC accesses, sign extension unit and wires have no delay. Assume that instruction frequencies are as follows:
1) 25% loads;
2) 10% stores;
3) 15% branches;
4) 5% jumps;
5) 45% ALU instructions.
For pipelined execution, assume that half of the load instructions are immediately followed by an instruction that uses the result, that the branch delay on misprediction is 1 clock cycle, and that one-quarter of the branches are mispredicted. Assume that jumps always pay 1 full clock cycle of delay, so their average time is 2 clock cycles. Ignore any other hazards. Now suppose the memory access became 2 clock cycles long.
1) Draw the modified pipeline.
2) List all the possible new forwarding situations and all possible new hazards and their length.
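For reference, the baseline pipelined CPI implied by the stated assumptions (before the memory access is lengthened to 2 clock cycles) can be worked out as in the Python sketch below. The variable names are illustrative, and the 200 ps clock is an assumption that the pipeline cycle is set by the slowest function unit, the memory.

```python
# Instruction mix from problem 4.
mix = {"load": 0.25, "store": 0.10, "branch": 0.15, "jump": 0.05, "alu": 0.45}

# Average cycles per instruction class in the baseline 5-stage pipeline.
cycles = {
    "load": 1 + 0.5 * 1,     # half the loads incur a 1-cycle load-use stall
    "store": 1,
    "branch": 1 + 0.25 * 1,  # one-quarter mispredicted, 1-cycle penalty
    "jump": 2,               # always pays 1 full cycle of delay
    "alu": 1,
}

baseline_cpi = sum(mix[k] * cycles[k] for k in mix)

# Assumption: the clock period equals the slowest unit's latency (memory).
clock_ps = 200
avg_time_ps = baseline_cpi * clock_ps

print(round(baseline_cpi, 4))  # 1.2125
print(round(avg_time_ps, 1))   # 242.5
```

Once the memory access takes 2 cycles, the load-use and branch entries in the cycles table would have to be revised to match the stall lengths found in part 2).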