CS3350B Computer Organization Chapter 4: Instruction-Level Parallelism Part 2: Pipeline Hazards
Alex Brandt
Department of Computer Science University of Western Ontario, Canada
Monday March 15, 2021
Alex Brandt
Chapter 4: ILP , Part 2: Pipeline Hazards Monday March 15, 2021 1 / 32
Outline
1 Overview
2 Structural Hazards
3 Data Hazards
4 Control Hazards
Alex Brandt Chapter 4: ILP , Part 2: Pipeline Hazards Monday March 15, 2021 2 / 32
Pros and Cons of Pipelining
Pipelining overlaps the execution of instructions to keep each stage of the datapath busy at all times.
ë Improves throughput but not latency. ë Might actually increase latency.
Can increase clock frequency using multi-cycle datapath. Ideal speedup can be up to the number of stages.
Ideal speed up never reached.
ë ë
ë
Fill time and drain time limits speedup.
Must account for dependencies between results of previous instructions
and operands of future instructions.
Sometimes the same hardware is needed simultaneously by different pipeline stages and different instructions (e.g. ID and WB stages).
Alex Brandt
Chapter 4: ILP , Part 2: Pipeline Hazards Monday March 15, 2021 3 / 32
Categorizing Pipeline Hazards
Structural Hazards
Conflicts in hardware/circuit use.
Different stages or different instructions attempt to use same piece of hardware at the same time.
Data Hazards
Dependencies between the result of an instruction and the input to another instruction.
Data being used before it is finished being computed or written to memory/registers.
Control Hazards
Ambiguity in the control flow of the program being executed. Branch instructions—if/else, loops.
Take the branch? Don’t take the branch? Which instruction follows a branch instruction in the pipeline?
Alex Brandt Chapter 4: ILP , Part 2: Pipeline Hazards Monday March 15, 2021 4 / 32
“Resolving” Pipeline Hazards
Not an easy task. Simplest solution: just wait or stall. ë Any hazard can always be solved by just waiting.
But:
Ruins potential speedup.
ë Might end up being slower than a single-cycle datapath.
ë Since latency can increase in pipelining, with enough stalls becomes
slower. Increases CPI.
Works against entire principle of pipelining. ë Where’s the performance?
Nonetheless, sometimes it really is the only solution.
Alex Brandt Chapter 4: ILP , Part 2: Pipeline Hazards Monday March 15, 2021 5 / 32
Outline
1 Overview
2 Structural Hazards
3 Data Hazards
4 Control Hazards
Alex Brandt Chapter 4: ILP , Part 2: Pipeline Hazards Monday March 15, 2021 6 / 32
Structural Hazards: Causes and Resolutions
Structural hazards are caused by two instructions needing to use the same hardware at the same time.
Easiest to resolve? Just add in redundant hardware.
ë Works for combinational circuits.
ë Redundant memory would cause problems in needing to keep both
consistent.
Real structural hazards thus lie in state circuits: registers and memory.
ë IF stage and MEM stage. ë ID stage and WB stage.
Alex Brandt Chapter 4: ILP , Part 2: Pipeline Hazards Monday March 15, 2021 7 / 32
Structural Hazards In Memory (1/2)
Consider a unified L1 cache. Reading instructions and reading/writing data could overlap for pipelined instructions.
Alex Brandt Chapter 4: ILP , Part 2: Pipeline Hazards Monday March 15, 2021 8 / 32
Structural Hazards In Memory (2/2)
Simple fix: separate instruction memory from data memory. Can use a banked cache.
Alex Brandt Chapter 4: ILP , Part 2: Pipeline Hazards Monday March 15, 2021 9 / 32
Structural Hazards In Register File (1/2)
ID stage must read from registers while WB stage must write to registers.
Alex Brandt Chapter 4: ILP , Part 2: Pipeline Hazards Monday March 15, 2021 10 / 32
Structural Hazards In Register File (2/2)
In reality, reading from register file is very fast; clock cycle is long enough to allow both ID and WB to occur within a single clock cycle.
Needs independent read and write ports.
Reg
Reg
Alex Brandt Chapter 4: ILP , Part 2: Pipeline Hazards
Monday March 15, 2021 11 / 32
Outline
1 Overview
2 Structural Hazards
3 Data Hazards
4 Control Hazards
Alex Brandt Chapter 4: ILP , Part 2: Pipeline Hazards Monday March 15, 2021 12 / 32
Data Hazards: Causes and Resolutions
Data hazards are caused by dependencies between instruction operands or results.
ë Read After Write (RAW) only true dependency.
ë Read After Read not a hazard.
ë Write After Read (WAR) and Write After Write (WAW) only a hazard
for out-of-order execution Superscalar machines
ë Prelude to register renaming.
Can always be solved by stalling the pipeline.
Can be solved by special forwarding (also called bypass).
Most common type of hazard.
ë It’s the logical way to write programs; locality.
Alex Brandt Chapter 4: ILP , Part 2: Pipeline Hazards Monday March 15, 2021 13 / 32
Data Hazard Example 1 (1/3)
add produces a result which is then read by sub, and, or, xor. Read After Write hazard.
xor is far enough in the future to be okay.
sub, and need more work.
or actually already resolved via structural hazard solution
Alex Brandt Chapter 4: ILP , Part 2: Pipeline Hazards Monday March 15, 2021 14 / 32
Data Hazard Example 1 (2/3)
Possible (but not great) solution: stall the execution. sub structural hazard already solved.
Alex Brandt Chapter 4: ILP , Part 2: Pipeline Hazards Monday March 15, 2021 15 / 32
Data Hazard Example 1 (3/3)
Another possible solution: forwarding.
Implemented via inter-stage registers.
No more stalls!
ALU forwarding for add to sub and add to and. or structural hazard already solved.
Alex Brandt Chapter 4: ILP , Part 2: Pipeline Hazards Monday March 15, 2021 16 / 32
More ALU-ALU Forwarding
Two kinds of ALU forwarding:
Instruction currently in MEM stage to ALU (ALU-ALU forwarding). Instruction currently in WB stage to ALU (MEM-ALU forwarding). Which forwarded value to choose? ï More control, more MUX.
Alex Brandt Chapter 4: ILP , Part 2: Pipeline Hazards Monday March 15, 2021 17 / 32
MEM-MEM Forwarding
For efficient memory copies (a common operation) this optimization results in no stalls.
ë Otherwise, two stalls required.
ë Eight great ideas in computer arch.: make the common case fast.
MEM-MEM also used for forwording an ALU result into a sw ë add $t0, $t1, $t4 and sw $t0 0($t2).
ë $t0 is MEM-MEM forwarded.
Alex Brandt Chapter 4: ILP , Part 2: Pipeline Hazards Monday March 15, 2021 18 / 32
Load-Use Data Hazard
Load-use data hazard, a special kind of RAW hazard.
Forwarding does not help here, still going backwards in time. A stall is required.
Alex Brandt Chapter 4: ILP , Part 2: Pipeline Hazards Monday March 15, 2021 19 / 32
Implementing a Stall: Pipeline Interlock
Pipeline Interlock—hardware detects hazard and stalls the pipeline. Quite literally locks the flow of data between stages (locking writes to
inter-stage registers).
Essentially inserts an air bubble into pipeline.
Alex Brandt Chapter 4: ILP , Part 2: Pipeline Hazards Monday March 15, 2021 20 / 32
Implementing a Stall: NOP
NOP—a “no operation” special instruction inserted into instruction flow by compiler.
Hazards are detected and fixed at compile-time.
Can be combined with forwarding; MEM-ALU in this case.
Alex Brandt Chapter 4: ILP , Part 2: Pipeline Hazards Monday March 15, 2021 21 / 32
Pipeline Interlock vs NOP
Interlocking requires special circuity to dynamically detect hazards and stall the datapath.
nop requires extra effort at compile time to detect and resolve hazards.
Inserted nop instructions bloat instruction memory.
More work at compile time for nop insertion but simpler (= faster?)
datapath and controller.
MIPS: Microprocessor without Interlocked Pipelined Stages
Alex Brandt Chapter 4: ILP , Part 2: Pipeline Hazards Monday March 15, 2021 22 / 32
Data Hazards and Code Structure
Some data hazards are “fake”.
Only caused by the order of instructions and not a true dependency. Re-order code (if possible) so an independent instruction performed instead of a nop.
ë Where the nop would be inserted is called the load delay slot.
ë Load delay slot can be filled with a nop or an independent instruction.
Need at least one instruction between lw and using the loaded word.
lw $t1, 0($t0)
lw $t2, 4($t0) stall add $t3, $t1, $t2
sw $t3, 12($t0)
lw $t1, 0($t0) lw $t2, 4($t0) lw $t4, 8($t0)
add $t3, $t1, $t2 sw $t3, 12($t0)
lw $t4, 8($t0)
stall add $t5, $t1, $t4 add $t5, $t1, $t4
sw $t5, 16($t0)
13 cycles
sw $t5, 16($t0)
11 cycles
Alex Brandt
Chapter 4: ILP , Part 2: Pipeline Hazards
Monday March 15, 2021
23 / 32
Outline
1 Overview
2 Structural Hazards
3 Data Hazards
4 Control Hazards
Alex Brandt Chapter 4: ILP , Part 2: Pipeline Hazards Monday March 15, 2021 24 / 32
Control Hazards: Causes and Resolutions
Control hazards are caused by instructions which change the flow of control.
ë Branching.
ë If statements, loops.
Sometimes called branch hazards.
Since branch condition (beq, bne) not determined until after EX
stage, cannot be certain about next instruction to fetch.
Alex Brandt Chapter 4: ILP , Part 2: Pipeline Hazards Monday March 15, 2021 25 / 32
Control Hazard Resolution: Wait
The simplest resolution is to just wait until branch condition is calculated before fetching next instruction.
Alex Brandt Chapter 4: ILP , Part 2: Pipeline Hazards Monday March 15, 2021 26 / 32
Control Hazard Resolution: Add a Branch Comparator
Add a special circuit used to calculate branch conditions.
Now only one stall needed instead of two.
Similar to load-use hazard we now have a branch delay slot.
Alex Brandt Chapter 4: ILP , Part 2: Pipeline Hazards Monday March 15, 2021 27 / 32
Delayed Branching
The branch delay slot is the instruction immediately following a branch. Can be a nop or a useful instruction.
In delayed branching the instruction in the branch delay slot is always executed whether or not the branch condition holds.
ë Used in conjunction with a special branch comparator.
ë Filling the branch delay slot (and other code re-organization) is usually
handled by compiler/assembler.
ë Cannot fill slot with an instruction that influences branch condition.
Jump instructions also have a delay slot.
addi $v0, $0, 1
add $t0, $s0, $s1
add $t1, $s2, $s3
beq $t0, $t1, L
add $t0, $s0, $s1
add $t1, $s2, $s3
beq $t0, $t1, L
addi $v0, $0, 1
L: … L: …
# addi executed regardless
Alex Brandt Chapter 4: ILP , Part 2: Pipeline Hazards Monday March 15, 2021 28 / 32
Control Hazard Resolution: Branch Prediction
Hardware predicts whether branch will occur of not.
If the branch condition ends up being opposite of prediction flush the pipeline.
This flush shows a pipeline without a special branch comparator in ID stage. Otherwise, only one instruction needs to be flushed.
Alex Brandt Chapter 4: ILP , Part 2: Pipeline Hazards Monday March 15, 2021 29 / 32
Implementing Branch Prediction
Branches have exactly two possibilities: taken or not taken. In MIPS branches are statically predicted to never happen.
Dynamic branch prediction uses run-time information to change prediction between taken or not taken.
ë Use branch history to predict future branches.
ë Simplest method is to use a saturated counter: increment counter if
branch actually taken, decrease counter if branch not taken.
ë Predict based on current count.
ë More advanced predictors evaluate patterns in branch history.
Random branch prediction: statistically 50% correct prediction. A two-bit saturated counter:
Alex Brandt Chapter 4: ILP , Part 2: Pipeline Hazards Monday March 15, 2021 30 / 32
Datapath With Forwarding and Flushing
Alex Brandt Chapter 4: ILP , Part 2: Pipeline Hazards Monday March 15, 2021 31 / 32
Hazard Summary
Structural hazards caused by conflicts accessing hardware.
ë Register access fast enough to happen twice in one clock cycle. ë Banked L1 cache for simultaneous instruction and data access.
Data hazards caused by Read After Write (RAW). ë ALU-ALU forwarding.
ë MEM-MEM forwarding (memory copies).
ë Load-use hazard: stall (load-delay slot) and MEM-ALU forward.
Control hazards caused by branch instructions. ë Special branch comparator in ID stage.
ë Branch delay slot; delayed branching.
ë Branch prediction and pipeline flush.
Compiler handles nop insertion to fix hazards.
Hardware handles fixing hazards with pipeline interlock.
Alex Brandt Chapter 4: ILP , Part 2: Pipeline Hazards Monday March 15, 2021 32 / 32