Consider executing this sequence of instructions in the pipeline:
address instruction
—————————-
36: sub $10, $4, $8 40: beq $1, $3, 72
44: and 48: or
52: add
56: slt $15, $6, $7 …
72: lw $4, 50($7)
Issue:
Should we fetch the instruction at address 44 when beq moves into the ID stage?
– it may or may not be executed next, depending on whether $1 == $3
– we don’t know the address of the branch target instruction yet anyway
$12, $2, $5
$13,$12,$13
$14, $4, $2
Branch Hazards Handling Branches 1
CS@VT Computer Organization II ©2005-2015 McQuain
When beq moves into the ID stage, we don’t even know it is a conditional branch. And… we won’t know if the branch should be taken until beq reaches the end of EX.
Hey!
It’s a branch!
address instruction
???
beq sub
—————————-
36: sub $10, $4, $8 40: beq $1, $3, 72 44: and $12, $2, $5
…
72: lw $4, 50($7)
Branch Hazards Handling Branches 2
CS@VT Computer Organization II ©2005-2015 McQuain
So… we will have already fetched the next (sequential) instruction.
Hey!
It’s a branch!
address instruction
—————————-
36: sub $10, $4, $8 40: beq $1, $3, 72 44: and $12, $2, $5
…
72: lw $4, 50($7)
and
beq sub
Branch Hazards Handling Branches 3
CS@VT Computer Organization II ©2005-2015 McQuain
Stalling for Branch Hazards Handling Branches 4 Idea: insert stalls until we know if the branch will be taken.
We can’t act on that information before beq reaches the MEM stage.
CS@VT Computer Organization II ©2005-2015 McQuain
Idea: insert stalls until we know if the branch will be taken.
cycle action beq —————————-
0: fetch sub
1: fetch beq IF
2: fetch and ID
3: stall EX
4: stall MEM
5: fetch and/lw
That’s expensive.
If we don’t take the branch, we needlessly delayed the and instruction for 2 cycles.
Hamlet
Branch! or
Hamlet Don’t branch!
fetch stall stall beq sub
Stalling for Branch Hazards Handling Branches 5
CS@VT Computer Organization II ©2005-2015 McQuain
Idea:
proceed as if the branch will not be taken;
turn mis-fetched instructions into nops if wrong.
If branch is taken, flush* these instructions
(set control values to 0)
*QTP: will this be too late?
Rollback for Branch Hazards Handling Branches 6
CS@VT Computer Organization II ©2005-2015 McQuain
Could we rearrange the datapath so that the branch decision could be made earlier? Questions to ponder:
– What about calculating the branch target address? Can that be done in the ID stage?
– What about the register comparison?
Can that be done in the ID stage?
What about other kinds of conditional branches (e.g., bgez,)?
Questions Handling Branches 7
CS@VT Computer Organization II ©2005-2015 McQuain
Ideas: simple hardware suffices to compare two registers moving the branch adder is relatively simple
We can determine if the branch will be taken before beq reaches the EX stage.
Making Branch Decisions Earlier Handling Branches 8
CS@VT Computer Organization II ©2005-2015 McQuain
Stall if branch is taken, proceed normally otherwise:
or and beq
and beq
Cost is now one stall if branch is taken, nothing if branch is not taken.
lw nop beq
Making Branch Decisions Earlier Handling Branches 9
CS@VT Computer Organization II ©2005-2015 McQuain
New Control Features Handling Branches 10
CS@VT Computer Organization II ©2005-2015 McQuain
The Big (but not quite final) Picture Handling Branches 11
CS@VT Computer Organization II ©2005-2015 McQuain
If a comparison register in beq is a destination of 2nd or 3rd preceding ALU instruction Can resolve using forwarding
add $1, $2, $3
add $4, $5, $6
…
beq $1, $4, target
IF
ID
EX
MEM
WB
IF
ID
EX
MEM
WB
IF
ID
EX
MEM
WB
IF
ID
EX
beq
QTP: why is the forwarding-to issue different (now) for beq than for the other instructions?
MEM
WB
Data Hazards for Branches Handling Branches 12
CS@VT Computer Organization II ©2005-2015 McQuain
If a comparison register is a destination of preceding ALU instruction or 2nd preceding load instruction
– Need to stall for 1 cycle
lw $1, addr
add $4, $5, $6
beq stalled
beq $1, $4, target
IF
ID
EX
MEM
WB
IF
ID
EX
MEM
WB
IF
ID
ID
beq
EX
MEM
WB
Data Hazards for Branches Handling Branches 13
CS@VT Computer Organization II ©2005-2015 McQuain
If a comparison register is a destination of immediately preceding load instruction
– Need to stall for 2 cycles
lw $1, addr
beq stalled
beq stalled
beq $1, $0, target
IF
ID
EX
MEM
WB
IF
ID
ID
ID
beq
EX
MEM
WB
Data Hazards for Branches Handling Branches 14
CS@VT Computer Organization II ©2005-2015 McQuain
In deeper and superscalar pipelines, the branch penalty is more significant
Use dynamic prediction
– need a branch history table (aka branch prediction buffer)
– indexed by addresses of recent branch instructions
– stores recent outcome(s) for branch (taken/not taken)
To execute a branch:
– check the table, expect consistent behavior with recent past
– start fetching (fall-through instruction or branch target)
– if wrong, flush pipeline (stall) and flip prediction
– updatetableaccordingly
Dynamic Branch Prediction Handling Branches 15
CS@VT Computer Organization II ©2005-2015 McQuain
Idea: use one bit to remember if the branch was taken or not the last time.
Shortcoming: inner loop branches are mispredicted twice!
outer: … … inner: … …
beq …, …, inner
…
beq …, …, outer
Mispredict as taken on last iteration of inner loop
Then mispredict as not taken on first iteration of inner loop next time around
1-Bit Predictor Handling Branches 16
CS@VT Computer Organization II ©2005-2015 McQuain
Idea: use two bits to remember if the branch was taken or not the last time. Only change prediction on two successive mispredictions
2-Bit Predictor Handling Branches 17
CS@VT Computer Organization II ©2005-2015 McQuain
Even with predictor, still need to calculate the target address – 1-cycle penalty if branch is taken
Branch target buffer
– Add a cache of target addresses
– IndexedbyPCwheninstructionisfetched – If hit (i.e., target address is in cache) and
instruction is branch predicted taken, can fetch target immediately
Calculating the Branch Target Handling Branches 18
CS@VT Computer Organization II ©2005-2015 McQuain