CS计算机代考程序代写 CS3350B Computer Organization Chapter 4: Instruction-Level Parallelism Hazard Examples

CS3350B Computer Organization Chapter 4: Instruction-Level Parallelism Hazard Examples
Alex Brandt
Department of Computer Science University of Western Ontario, Canada
Tuesday March 24, 2020
Alex Brandt
Chapter 4: ILP , Hazard Examples Tuesday March 24, 2020 1 / 24

Introduction
In pipelining examples, assume we always start with the “basic” datapath; the one as of the end of Lecture 11.
ë This datapath implicitly already solves the two structural hazards in memory and register file.
ë That is, we do not consider structural hazards.
Each optimization should be explicitly added in the question or in your answer for a possible resolution.
ë Each type of forwarding (ALU-ALU, MEM-ALU, MEM-MEM). ë Filling the load delay slot with something other than nop.
ë Branch comparator in ID stage.
ë Delayed branching and branch delay slot.
Alex Brandt Chapter 4: ILP , Hazard Examples Tuesday March 24, 2020 2 / 24

Example 1
lw $t0 , addu $t0 , subu $t4 , addi $s1 , add $t1 ,
If any dependencies exist where
0($s1) $t0 , $s2 $t0 , $t3 $s1,4 $t1 , $t2
are they and what type are they?
Alex Brandt Chapter 4: ILP , Hazard Examples Tuesday March 24, 2020 3 / 24

Example 1
lw $t0 , addu $t0 , subu $t4 , addi $s1 , add $t1 ,
If any dependencies exist where
ë Load-use (RAW) between lw and addu. ë WAW between lw and addu.
ë RAW between addu and sub.
0($s1) $t0 , $s2 $t0 , $t3 $s1,4 $t1 , $t2
are they and what type are they?
Alex Brandt Chapter 4: ILP , Hazard Examples Tuesday March 24, 2020 4 / 24

Example 1
lw $t0 , addu $t0 , subu $t4 , addi $s1 , add $t1 ,
0($s1) $t0 , $s2 $t0 , $t3 $s1,4 $t1 , $t2
On the basic datapath, how many cycles does it take to execute the code fragment (including stalls)?
Alex Brandt Chapter 4: ILP , Hazard Examples Tuesday March 24, 2020 5 / 24

Example 1
lw $t0 , addu $t0 , subu $t4 , addi $s1 , add $t1 ,
0($s1) $t0 , $s2 $t0 , $t3 $s1,4 $t1 , $t2
On the basic datapath, how many cycles does it take to execute the code fragment (including stalls)?
ë ë ë

2 nop between lw and addu. MEM of lw and IF of addu can overlap. 2 nop between addu and sw. MEM of addu and IF of sw can overlap. On 5th cycle lw completes and then one cycle per instruction after that.
Includingnopweget: 5+2nop+1+2nop+2+1=13.
Alex Brandt
Chapter 4: ILP , Hazard Examples Tuesday March 24, 2020 6 / 24

Example 1
lw $t0 , addu $t0 , subu $t4 , addi $s1 , add $t1 ,
0($s1) $t0 , $s2 $t0 , $t3 $s1,4 $t1 , $t2
Alex Brandt
Chapter 4: ILP , Hazard Examples
Tuesday March 24, 2020
7 / 24

Example 1
lw $t0 , addu $t0 , subu $t4 , addi $s1 , add $t1 ,
0($s1) $t0 , $s2 $t0 , $t3 $s1,4 $t1 , $t2
What optimizations can be added to the datapath to reduce the number of cycles? How many cycles are needed to execute the code fragment after optimizations are added?
Alex Brandt Chapter 4: ILP , Hazard Examples Tuesday March 24, 2020 8 / 24

Example 1
lw $t0 , addu $t0 , subu $t4 , addi $s1 , add $t1 ,
0($s1) $t0 , $s2 $t0 , $t3 $s1,4 $t1 , $t2
What optimizations can be added to the datapath to reduce the number of cycles? How many cycles are needed to execute the code fragment after optimizations are added?
ë MEM-ALU forwarding for load-use. Reduces nop count to 1.
ë ALU-ALU forwarding removes both nop between addu and sub ë Clockcycles: 5+1nop+4=10.
Alex Brandt Chapter 4: ILP , Hazard Examples Tuesday March 24, 2020 9 / 24

Example 1
lw $t0 , addu $t0 , subu $t4 , addi $s1 , add $t1 ,
0($s1) $t0 , $s2 $t0 , $t3 $s1,4 $t1 , $t2
Alex Brandt
Chapter 4: ILP , Hazard Examples
Tuesday March 24, 2020
10 / 24

Example 1
lw $t0 , addu $t0 , subu $t4 , addi $s1 , add $t1 ,
Can code re-organization along
to further improve the number of clock cycles needed to execute the code? If so, re-order the code and declare any additional optimizations; what is the number of cycles needed to execute the re-ordered code?
0($s1) $t0 , $s2 $t0 , $t3 $s1,4 $t1 , $t2
with datapath optimizations be used
Alex Brandt Chapter 4: ILP , Hazard Examples Tuesday March 24, 2020 11 / 24

Example 1
lw $t0 , addu $t0 , subu $t4 , addi $s1 , add $t1 ,
Can code re-organization along
to further improve the number of clock cycles needed to execute the code? If so, re-order the code and declare any additional optimizations; what is the number of cycles needed to execute the re-ordered code?
ë Yes.
ë Move addi or add into load-delay slot. ë 9, since we remove the nop.
0($s1) $t0 , $s2 $t0 , $t3 $s1,4 $t1 , $t2
with datapath optimizations be used
Alex Brandt Chapter 4: ILP , Hazard Examples Tuesday March 24, 2020 12 / 24

Example 1
lw $t0 , addu $t0 , subu $t4 , addi $s1 , add $t1 ,
0($s1) $t0 , $s2 $t0 , $t3 $s1,4 $t1 , $t2
Alex Brandt
Chapter 4: ILP , Hazard Examples
Tuesday March 24, 2020
13 / 24

Example 2
sub $t2 , and $t7 , or $t8 , add $t9 , sw $t5 ,
If any dependencies exist where
$t1 , $t3 $t2 , $t5 $t6 , $t2 $t2 , $t2 12( $t2 )
are they and what type are they?
Alex Brandt Chapter 4: ILP , Hazard Examples Tuesday March 24, 2020 14 / 24

Example 2
sub $t2 , and $t7 , or $t8 , add $t9 , sw $t5 ,
If any dependencies exist where ë RAW between sub and and.
ë RAW between sub and or. ë RAW between sub and and. ë RAW between sub and sw.
$t1 , $t3 $t2 , $t5 $t6 , $t2 $t2 , $t2 12( $t2 )
are they and what type are they?
Alex Brandt Chapter 4: ILP , Hazard Examples Tuesday March 24, 2020 15 / 24

Example 2
sub $t2 , and $t7 , or $t8 , add $t9 , sw $t5 ,
$t1 , $t3 $t2 , $t5 $t6 , $t2 $t2 , $t2 12( $t2 )
Consider the basic datapath with ALU-ALU and MEM-ALU forwarding added. In this code fragment where do forwards occur? How many cycles does it take to execute the code fragment?
Alex Brandt Chapter 4: ILP , Hazard Examples Tuesday March 24, 2020 16 / 24

Example 2
sub $t2 , and $t7 , or $t8 , add $t9 , sw $t5 ,
$t1 , $t3 $t2 , $t5 $t6 , $t2 $t2 , $t2 12( $t2 )
Consider the basic datapath with ALU-ALU and MEM-ALU forwarding added. In this code fragment where do forwards occur? How many cycles does it take to execute the code fragment?
ë ALU-ALU from sub to and.
ë MEM-ALU from sub to or.
ë sub to and RAW solved by register file design. ë5+1+1+1+1=9
Alex Brandt Chapter 4: ILP , Hazard Examples Tuesday March 24, 2020 17 / 24

Example 2
sub $t2 , and $t7 , or $t8 , add $t9 , sw $t5 ,
$t1 , $t3 $t2 , $t5 $t6 , $t2 $t2 , $t2 12( $t2 )
Alex Brandt
Chapter 4: ILP , Hazard Examples
Tuesday March 24, 2020
18 / 24

Example 3
for: beq $t6, $t7, end add $t0 , $t0 , $t1
addi $t6, $t6, 1
j for
end: sub $t1 , $t6 , $0
Assuming the basic data path how many cycles does it take to execute two loops within the code fragment (therefore, excluding the sub)?
Alex Brandt Chapter 4: ILP , Hazard Examples Tuesday March 24, 2020 19 / 24

Example 3
for: beq $t6, $t7, end add $t0 , $t0 , $t1
addi $t6, $t6, 1
j for
end: sub $t1 , $t6 , $0
Assuming the basic data path how many cycles does it take to execute two loops within the code fragment (therefore, excluding the sub)?
ë Careful! Since a loop, RAW dependency between andi and beq. ë Two nop follows beq for control hazard.
ë One nop follows j for control hazard.
ë Firstloop: 5+2nop+3+1nop.
ë In the second loop beq overlaps with previous instructions. ë Secondloop: 1+2nop+3+1nop.
ë Total: 18.
Alex Brandt Chapter 4: ILP , Hazard Examples Tuesday March 24, 2020 20 / 24

Example 3
for: beq $t6, $t7, end add $t0 , $t0 , $t1
addi $t6, $t6, 1
j for
end: sub $t1 , $t6 , $0
Alex Brandt
Chapter 4: ILP , Hazard Examples
Tuesday March 24, 2020
21 / 24

Example 3
for: beq $t6, $t7, end add $t0 , $t0 , $t1
addi $t6, $t6, 1
j for
end: sub $t1 , $t6 , $0
Using any datapath optimizations and code re-ordering, minimize the clock cycles required to execute the loop two times. Name the optimizations used. How many cycles does it take to execute this optimized version?
Alex Brandt Chapter 4: ILP , Hazard Examples Tuesday March 24, 2020 22 / 24

Example 3
for: beq $t6, $t7, end add $t0 , $t0 , $t1
addi $t6, $t6, 1
j for
end: sub $t1 , $t6 , $0
Using any datapath optimizations and code re-ordering, minimize the clock cycles required to execute the loop two times. Name the optimizations used. How many cycles does it take to execute this optimized version?
ë Special branch comparator in ID stage.
ë Careful! Cannot fill branch delay slot.
ë Using add would change code meaning.
ë Value of $t6 used again after loop so cannot use addi. ë Cannot use jump for obvious control-flow reasons.
ë Total savings: 1 nop per branch  16 cycles now.
ë (If using branch prediction, all nops are removed after beq).
Alex Brandt Chapter 4: ILP , Hazard Examples Tuesday March 24, 2020 23 / 24

Example 3
for: beq $t6, $t7, end add $t0 , $t0 , $t1
addi $t6, $t6, 1
j for
end: sub $t1 , $t6 , $0
Alex Brandt
Chapter 4: ILP , Hazard Examples
Tuesday March 24, 2020
24 / 24