Computer Architecture LAB 3
By Jahangir Ikram
Experiment: To study the basics of forwarding. For all the programs, make sure that Stall Detection and Forwarding are both ON.
(a) Write a sample program that forwards between the EXE stage and the MEM stage on the upper input of the ALU. Write the program below and show the forwarding with an arrow. Test it on the simulator and state which value is being forwarded. DO NOT USE THE LW INSTRUCTION.
Instruction    1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20
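The forwarding decision exercised in parts (a) to (d) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the simulator's actual logic: it assumes the "upper" ALU input carries the first source register (rs) and the "lower" input the second (rt), that "EXE to MEM" forwarding reads the EX/MEM pipeline register, and that "EXE to WB" forwarding reads the MEM/WB pipeline register. The names `Instr` and `forward_sources` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    """Hypothetical, simplified instruction record (not a simulator type)."""
    op: str
    rd: int  # destination register number (0 = no destination / R0)
    rs: int  # first source register, assumed to feed the upper ALU input
    rt: int  # second source register, assumed to feed the lower ALU input


def forward_sources(consumer: Instr,
                    ex_mem: Optional[Instr],
                    mem_wb: Optional[Instr]) -> Tuple[str, str]:
    """Return where the (upper, lower) ALU inputs of `consumer` come from.

    Each entry is 'REG' (register file), 'EX/MEM' or 'MEM/WB'.
    The newer result (EX/MEM) wins when both older instructions
    write the same register.
    """
    def pick(src: int) -> str:
        if src == 0:  # R0 is hard-wired to zero, never forwarded
            return "REG"
        if ex_mem is not None and ex_mem.rd == src:
            return "EX/MEM"
        if mem_wb is not None and mem_wb.rd == src:
            return "MEM/WB"
        return "REG"

    return pick(consumer.rs), pick(consumer.rt)


# Illustrative pair: ADD R1, R2, R3 followed directly by SUB R4, R1, R5
add = Instr("ADD", rd=1, rs=2, rt=3)
sub = Instr("SUB", rd=4, rs=1, rt=5)
print(forward_sources(sub, ex_mem=add, mem_wb=None))
```

Placing the producer one instruction earlier exercises the EX/MEM path, and two instructions earlier the MEM/WB path; swapping the consumer's source operands moves the forwarded value from the upper input to the lower one.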
(b) Write a sample program that forwards between the EXE stage and the WB stage on the upper input of the ALU. Write the program below and show the forwarding with an arrow. Test it on the simulator and state which value is being forwarded. DO NOT USE THE LW INSTRUCTION.
Instruction    1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20
(c) Write a sample program that forwards between the EXE stage and the MEM stage on the lower input of the ALU. Write the program below and show the forwarding with an arrow. Test it on the simulator and state which value is being forwarded. DO NOT USE THE LW INSTRUCTION.
Instruction    1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20
(d) Write a sample program that forwards between the EXE stage and the WB stage on the lower input of the ALU. Write the program below and show the forwarding with an arrow. Test it on the simulator and state which value is being forwarded. DO NOT USE THE LW INSTRUCTION.
Instruction    1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20
(e) Write a sample program that forwards between the MEM stage and the WB stage. Write the program below and show the forwarding with an arrow. Test it on the simulator and state which value is being forwarded.
Instruction    1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20
(f) Write a program that causes a Load Use Delay Stall. Identify which data has to be moved and note exactly when the required data is passed on to the waiting instruction. Show it as an arrow on the following diagram.
Instruction    1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20
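The condition behind the load-use stall in part (f) can be written down as a small check. The sketch below assumes the textbook five-stage pipeline, where a loaded value only becomes available after the MEM stage, so an instruction that immediately follows an LW and reads its destination register must wait one cycle even with forwarding ON. The function name `needs_load_use_stall` is hypothetical and the simulator may implement the check differently.

```python
def needs_load_use_stall(prev_op, prev_dest, next_sources):
    """Return True if the instruction right after `prev_op` must stall.

    prev_op      -- mnemonic of the older instruction, e.g. "LW"
    prev_dest    -- register number written by the older instruction
    next_sources -- register numbers read by the following instruction
    """
    if prev_op != "LW":
        return False  # ALU results can be forwarded without a stall
    if prev_dest == 0:
        return False  # R0 is hard-wired to zero
    return prev_dest in next_sources


# LW R1, 0(R2) followed directly by ADD R3, R1, R4 -> one stall cycle
print(needs_load_use_stall("LW", 1, (1, 4)))
# ADD R1, R2, R5 followed by ADD R3, R1, R4 -> plain forwarding, no stall
print(needs_load_use_stall("ADD", 1, (1, 4)))
```

Inserting one independent instruction between the load and its consumer makes the check return False for the adjacent pair, which is the usual way to remove the stall by rescheduling.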
BRANCH HAZARDS. (Lab Experiment 2)
(a) Let us now study some branch hazards. First, make sure that the Aggressive Branching option is OFF, Stall Detection is ON, and Forwarding is ON. Select the Always Flush option from the Branch Policy and write the following program, which fills 10 memory locations with the value 222. Does this program work properly? If not, modify the program so that it does. Check what the values of all the registers should be if the program works properly (no useful instruction turning into a NOP or getting flushed). Carefully note the use of the SLTI instruction in the following loop.
ADDI R3, R0, 0
ADDI R1, R0, 0
ADDI R2, R0, 222
Loop: Addi R1, R1, 4
SW R2, 100(R1)
ADDI R3, R3, 1
SLTI R5, R3, 10
BNEQ R5, R0, loop
ADDI R7, R1, 10
ADDI R8, R2, 5
ADDI R2, R2, 100
(i) Calculate the CPI for this program. _______________________
(ii) What changes can we make to this program so that it works properly (useful instructions after the loop are not flushed)? _______________________________________
(iii) Run the same program with the Predict NT option in Branch Policy. What difference do you see when the loop completes?
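For question (i), CPI is the total number of clock cycles divided by the number of instructions executed. The following Python sketch counts the instructions the program should execute when it works as intended (three setup instructions, five per loop iteration, three after the loop) and divides a cycle count by that number. The cycle count must come from the simulator; the value 100 used below is only a placeholder, and the function names are hypothetical.

```python
def count_executed_instructions():
    """Functionally replay the intended program and count instructions."""
    r1, r2, r3 = 0, 222, 0   # ADDI R3,R0,0 / ADDI R1,R0,0 / ADDI R2,R0,222
    executed = 3
    memory = {}
    while True:
        r1 += 4                   # ADDI R1, R1, 4
        memory[100 + r1] = r2     # SW   R2, 100(R1)
        r3 += 1                   # ADDI R3, R3, 1
        r5 = 1 if r3 < 10 else 0  # SLTI R5, R3, 10
        executed += 5             # the four above plus the BNEQ
        if r5 == 0:               # BNEQ R5, R0, Loop falls through
            break
    executed += 3                 # the three ADDI instructions after the loop
    return executed, memory


def cpi(total_cycles, executed):
    """Cycles per instruction for a measured cycle count."""
    return total_cycles / executed


executed, memory = count_executed_instructions()
print(executed, len(memory))
# 100 is a placeholder; substitute the cycle count reported by the simulator.
print(cpi(100, executed))
```

If the simulator flushes useful instructions after the branch, the executed count it reports will differ from this one, which is a quick way to spot the problem the exercise asks about.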
(b) This problem is similar to Exercise A-1 at the end of the book.
For this exercise, type the following program:
        ADDI R2, R0, 100     # Make R2 = 100
        ADDI R3, R2, 40      # Make R3 = R2 + 40
Loop1:  LW   R1, 0(R2)       # Load array element
        ADDI R1, R1, #1      # increment
        SW   R1, 0(R2)       # store it back
        ADDI R2, R2, #4      # Make R2 point to next word
        SUB  R4, R3, R2      # Compare R3 with R2
        BNEQ R4, R0, Loop1   # Loop while R2 < R3
        ADDI R2, R0, 0       # Just another instruction after the loop
        ADDI R3, R0, 0       #
Note that you need to calculate the offset in the actual program in terms of number of instructions. Offset = _________
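The branch offset asked for above is a count of instructions between the branch and its target. The sketch below shows the arithmetic, assuming the program is laid out as two setup instructions, then the six loop instructions starting at Loop1, then two trailing instructions. Whether the simulator measures the offset from the branch itself or from the instruction after it is not stated here, so both conventions are shown; the function name `branch_offset` is hypothetical.

```python
# Assumed instruction order, indexed from 0.
program = [
    "ADDI",  # 0: R2 = 100
    "ADDI",  # 1: R3 = R2 + 40
    "LW",    # 2: Loop1
    "ADDI",  # 3
    "SW",    # 4
    "ADDI",  # 5
    "SUB",   # 6
    "BNEQ",  # 7: branch back to Loop1
    "ADDI",  # 8
    "ADDI",  # 9
]


def branch_offset(branch_index, target_index, relative_to_next=True):
    """Offset in instructions from the branch to its target.

    relative_to_next=True  measures from the instruction after the branch
                           (the PC + 4 convention).
    relative_to_next=False measures from the branch instruction itself.
    """
    base = branch_index + 1 if relative_to_next else branch_index
    return target_index - base


branch = program.index("BNEQ")
target = program.index("LW")
print(branch_offset(branch, target, relative_to_next=True))
print(branch_offset(branch, target, relative_to_next=False))
```

Check the simulator's documentation or try both values to see which convention it expects; a backward branch is negative under either one.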
(A) Processor configuration: Stall Detection: ON, Forwarding: OFF, Aggressive Branching: YES, Branch Policy: Always Flush. Run the above program and fill in the following table for the instructions in the loop body for the first 2 or 3 iterations. Total clock cycles to run the program: ____________
Instruction            1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
LW R1, 0(R2)
ADDI R1, R1, #1
SW R1, 0(R2)
ADDI R2, R2, #4
SUB R4, R3, R2
BNEQ R4, R0, Loop1
ADDI R2, R0, 0
ADDI R3, R0, 0
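As a sanity check on the cycle totals requested in this exercise, it helps to know how many instructions the program executes and how many cycles a stall-free five-stage pipeline would need. The Python sketch below replays the loop functionally and applies the identity cycles = instructions + stages - 1 for the ideal case. It assumes the simulator counts cycles from the first fetch to the last write-back and that uninitialized memory reads as 0; the function names are hypothetical.

```python
def run_loop():
    """Replay the array-increment loop and count executed instructions."""
    r2 = 100                 # ADDI R2, R0, 100
    r3 = r2 + 40             # ADDI R3, R2, 40
    executed = 2
    memory = {}              # assumed to read as 0 where never written
    iterations = 0
    while True:
        r1 = memory.get(r2, 0)   # LW   R1, 0(R2)
        r1 += 1                  # ADDI R1, R1, #1
        memory[r2] = r1          # SW   R1, 0(R2)
        r2 += 4                  # ADDI R2, R2, #4
        r4 = r3 - r2             # SUB  R4, R3, R2
        executed += 6            # the five above plus the branch
        iterations += 1
        if r4 == 0:              # branch not taken once R2 reaches R3
            break
    executed += 2                # the two ADDI instructions after the loop
    return iterations, executed


def ideal_cycles(executed, stages=5):
    """Cycle count of a pipeline with no stalls and no flushed slots."""
    return executed + stages - 1


iterations, executed = run_loop()
print(iterations, executed, ideal_cycles(executed))
```

Under these assumptions, any cycles the simulator reports beyond the ideal figure are stall cycles or flushed branch slots, so the difference shows how much each processor configuration costs.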
(B) Redo part (A) with Forwarding ON and the rest of the processor configuration unchanged. Total clock cycles: __________________
Instruction            1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
LW R1, 0(R2)
ADDI R1, R1, #1
SW R1, 0(R2)
ADDI R2, R2, #4
SUB R4, R3, R2
BNEQ R4, R0, Loop1
ADDI R2, R0, 0
ADDI R3, R0, 0
(C) Now repeat the above exercise after rescheduling the code, with the processor configured to use the delayed branching policy. Total clock cycles in this case: _________________________
Instruction            1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
LW R1, 0(R2)
ADDI R1, R1, #1
SW R1, 0(R2)
ADDI R2, R2, #4
SUB R4, R3, R2
BNEQ R4, R0, Loop1
ADDI R2, R0, 0
ADDI R3, R0, 0
For rough use:
Instruction            1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
LW R1, 0(R2)
ADDI R1, R1, #1
SW R1, 0(R2)
ADDI R2, R2, #4
SUB R4, R3, R2
BNEQ R4, R0, Loop1
ADDI R2, R0, 0
ADDI R3, R0, 0