Computer Organization and Design
Unit 7: Branch Prediction
Based on slides by Profs. , & C.J. IS 501: Comp. Arch. | Dr. | Branch Prediction 1
Copyright By PowCoder代写 加微信 powcoder
This Unit: Branch Prediction
• Control hazards
• Branch prediction
System software
CIS 501: Comp. Arch. | Dr. | Branch Prediction 2
• Chapter 4
CIS 501: Comp. Arch. | Dr. | Branch Prediction 3
Control Dependences and Branch Prediction
CIS 501: Comp. Arch. | Dr. | Branch Prediction 4
What About Branches?
Register File
• Branchspeculation
• Could just stall to wait for branch outcome (two-cycle penalty)
• Fetchpastbranchinsnsbeforebranchoutcomeisknown
• Default: assume “not-taken” (at fetch, can’t tell it’s a branch) CIS 501: Comp. Arch. | Dr. | Branch Prediction 5
Big Idea: Speculative Execution
• Speculation: “risky transactions on chance of profit”
• Speculativeexecution
• Execute before all parameters known with certainty • Correctspeculation
+ Avoid stall, improve performance
• Incorrectspeculation(mis-speculation)
– Must abort/flush/squash incorrect insns
– Must undo incorrect changes (recover pre-speculation state)
• Controlspeculation:speculationaimedatcontrolhazards • Unknown parameter: are these the correct insns to execute next?
CIS 501: Comp. Arch. | Dr. | Branch Prediction 6
Control Speculation Mechanics
• Guess branch target, start fetching at guessed position • Doing nothing is implicitly guessing target is PC+4
• We were already speculating before!
• Can actively guess other targets: dynamic branch prediction
• Execute branch to verify (check) guess • Correct speculation? keep going
• Mis-speculation? Flush mis-speculated insns
• Hopefully haven’t modified permanent state (Regfile, DMem) + Happens naturally in in-order 5-stage pipeline
CIS 501: Comp. Arch. | Dr. | Branch Prediction 7
When to Perform Branch Prediction?
• Option #1: During Decode
• Look at instruction opcode to determine branch instructions
• Can calculate next PC from instruction (for PC-relative branches)
– One cycle “mis-fetch” penalty even if branch predictor is correct
bnez r3,targ targ:add r4⟵r5,r4
123456789 FDXMW
• Option #2: During Fetch? • Howdowedothat?
CIS 501: Comp. Arch. | Dr. | Branch Prediction 8
Branch Recovery
Register File
• Branchrecovery:whattodowhenbranchisactuallytaken • Insns that are in F and D are wrong
• Flushthem,i.e.,replacethemwithnops
+ They haven’t written permanent state yet (regfile, DMem)
– Two cycle penalty for taken branches
CIS 501: Comp. Arch. | Dr. | Branch Prediction 9
Branch Speculation and Recovery
123456789 FDXMW
speculative
• Mis-speculationrecovery:whattodoonwrongguess • Not too painful in a short, in-order pipeline
• Branch resolves in X
+ Younger insns (in F, D) haven’t changed permanent state
• FlushinsnscurrentlyinDandX(i.e.,replacewithnops) 123456789
addi r3⟵r1,1 bnez r3,targ st r6⟶[r7+4]
mul r10⟵r8,r9
addi r3⟵r1,1 bnez r3,targ st r6⟶[r7+4]
FDXMW FDXMW
F D — — —
F — — — —
mul r10⟵r8,r9 targ:add r4⟵r4,r5
CIS 501: Comp. Arch. | Dr. | Branch Prediction
Branch Performance
• Back of the envelope calculation
• Branch:20%,load:20%,store:10%,other:50% • Say, 75% of branches are taken
• CPI=1+20%*75%*2=
1 + 0.20 * 0.75 * 2 = 1.3
– Branches cause 30% slowdown
• Worse with deeper pipelines (higher mis-prediction penalty)
• Can we do better than assuming branch is not taken?
CIS 501: Comp. Arch. | Dr. | Branch Prediction 11
Dynamic Branch Prediction
Register File
• Dynamicbranchprediction:hardwareguessesoutcome • Start fetching from guessed address
• Flush on mis-prediction
CIS 501: Comp. Arch. | Dr. | Branch Prediction 12
Branch Prediction Performance
• Parameters
• Branch:20%,load:20%,store:10%,other:50% • 75% of branches are taken
• Dynamic branch prediction
• Branches predicted with 95% accuracy • CPI=1+20%*5%*2=1.02
CIS 501: Comp. Arch. | Dr. | Branch Prediction 13
Dynamic Branch Prediction Components
• Step #1: is it a branch? • Easy after decode…
• Step #2: is the branch taken or not taken?
• Direction predictor (applies to conditional branches only) • Predicts taken/not-taken
• Step #3: if the branch is taken, where does it go? • Easy after decode…
CIS 501: Comp. Arch. | Dr. | Branch Prediction 14
Branch Prediction Steps
• Which insn’s behavior are we trying to predict?
• Where does PC come from?
no branch?
is insn a yes T or NT?
Not Taken Taken
prediction source:
branch target buffer
direction predictor
predicted target
CIS 501: Comp. Arch. | Prof. | Branch Prediction 15
BRANCH TARGET PREDICTION
CIS 501: Comp. Arch. | Dr. | Branch Prediction 16
Revisiting Branch Prediction Components
• Step #1: is it a branch?
• Easy after decode… during fetch: predictor
• Step #2: is the branch taken or not taken? • Directionpredictor(later)
• Step #3: if the branch is taken, where does it go? • Branchtargetpredictor(BTB)
• Supplies target PC if branch is taken
CIS 501: Comp. Arch. | Dr. | Branch Prediction 17
Branch Target Buffer
• Learnfrompast,predictthefuture • Record the past in a hardware structure
• Branchtargetbuffer(BTB):
• “guess” the future PC based on past behavior
• “Last time the branch X was taken, it went to address Y”
• “So, in the future, if address X is fetched, fetch address Y next” • PC indexes table of bits target addresses
• Essentially: branch will go to same place it went last time
• What about aliasing?
• Two PCs with the same lower bits?
• No problem, just a prediction!
target target
predicted target
CIS 501: Comp. Arch. | Dr. | Branch Prediction
Branch Target Buffer (continued)
• At Fetch, how does insn know it’s a branch & should read BTB? It doesn’t have to…
• …allinsnsaccessBTBinparallelwithImemFetch
• Key idea: use BTB to predict which insn are branches • Implement by “tagging” each entry with its corresponding PC
• Update BTB on every taken branch insn, record target PC:
• BTB[PC].tag = PC, BTB[PC].target = target of branch • All insns access at Fetch in parallel with Imem
• Check for tag match, signifies insn at that PC is a branch
• Predicted PC = (BTB[PC].tag == PC) ? BTB[PC].target : PC+4
tag BTB target
predicted target
CIS 501: Comp. Arch. | Dr. | Branch Prediction
Why Does a BTB Work?
• Because most control insns use direct targets
• Target encoded in insn itself ® same “taken” target every time
• What about indirect targets?
• Target held in a register ® can be different each time • Two indirect call idioms
+ Dynamically linked functions (DLLs): target always the same
• Dynamically dispatched (virtual) functions: hard but uncommon • Also two indirect unconditional jump idioms
• Switches: hard but uncommon
– Function returns: hard and common but…
CIS 501: Comp. Arch. | Dr. | Branch Prediction 20
Return Address Stack (RAS)
tag target
predicted target
• Returnaddressstack(RAS)
• Call instruction? RAS[TopOfStack++] = PC+4
• Return instruction? Predicted-target = RAS[–TopOfStack]
• Q: how can you tell if an insn is a call/return before decoding it?
• Accessing RAS on every insn BTB-style doesn’t work
• Answer: another predictor (or put them in BTB marked as “return”)
• Or, pre-decode bits in insn mem, written when first executed CIS 501: Comp. Arch. | Dr. | Branch Prediction 21
BRANCH DIRECTION PREDICTION
CIS 501: Comp. Arch. | Dr. | Branch Prediction 22
Branch Direction Prediction
• Learn from past, predict the future • Recordthepastinahardwarestructure
• Direction predictor (DIRP)
• Mapconditional-branchPCtotaken/not-taken(T/N)decision • Individualconditionalbranchesoftenbiasedorweaklybiased
• 90%+ one way or the other considered “biased”
• Why? Loop back edges, checking for uncommon conditions
• Bimodal predictor: simplest predictor
• PCindexesBranchHistoryTableofbits(0=N,1=T),notags • Essentially:branchwillgosamewayitwentlasttime
PC 1:0 BHT T or NT
• Whataboutaliasing?
• Two PC with the same lower bits? • No problem, just a prediction!
CIS 501: Comp. Arch. | Dr. | Branch Prediction
Prediction (taken or not taken) 23
Bimodal Branch Predictor
• simplest direction predictor
• PCindexestableofbits(0=N,1=T), no tags
• Essentially: branch will go same way it went last time
• Problem: inner loop branch below for (i=0;i<100;i++)
for (j=0;j<3;j++)
// whatever
– Two “built-in” mis-predictions per inner loop iteration
– Branch predictor “changes its mind too quickly”
CIS 501: Comp. Arch. | Dr. | Branch Prediction
Outcome Prediction State
Two-Bit Saturating Counters (2bc)
• Two-bitsaturatingcounters(2bc) [Smith 1981]
• Replace each single-bit prediction • (0,1,2,3) = (N,n,t,T)
• Adds “hysteresis”
• Force predictor to mis-predict twice before “changing its mind”
• One mispredict each loop execution (rather than two)
+ Fixes this pathology (which is not contrived, by the way)
• Can we do even better?
CIS 501: Comp. Arch. | Dr. | Branch Prediction
Outcome Prediction State
Branches may be correlated
• Consider:
for (i=0; i<1000000; i++) {
if (i % 3 == 0) {
// Highly biased
// Locally correlated
// Unpredictable
if (random() % 2 == 0) {
if (i % 3 == 0) {
... // Globally correlated
CIS 501: Comp. Arch. | Dr. | Branch Prediction 28
Gshare History-Based Predictor
• Exploits observation that branch outcomes are correlated
• Maintains recent branch outcomes in Branch History Register (BHR)
• In addition to BHT of counters (typically 2-bit sat. counters) • How do we incorporate history into our predictions?
• Use PC xor BHR to index into BHT. Why?
CIS 501: Comp. Arch. | Dr. | Branch Prediction 29
direction prediction (T/NT)
Gshare History-based Predictor
• Gshare working example
• assume program has one branch
• BHT: one 1-bit DIRP entry
• 3BHR:last3branchoutcomes
• train counter, and update BHR after each branch
CIS 501: Comp. Arch. | Dr. | Branch Prediction 30
Outcome Prediction
State Time
Hybrid Predictor
• Hybrid(tournament)predictor[McFarling1993] • Attacks correlated predictor BHT capacity problem
• Idea: combine two predictors
• Simple bimodal predictor for history-independent branches • Correlated predictor for branches that need history
• Chooser assigns branches to one predictor or the other
• Branches start in simple BHT, move mis-prediction threshold
+ Correlated predictor can be made smaller, handles fewer branches + 90–95% accuracy
CIS 501: Comp. Arch. | Dr. | Branch Prediction 33
BHT chooser
REDUCING BRANCH PENALTY
CIS 501: Comp. Arch. | Dr. | Branch Prediction 34
Reducing Penalty: Fast Branches
Register File
• Fastbranch:candecideatD,notX
• Test must be comparison to zero or equality, no time for ALU
+ New taken branch penalty is 1
– Additional insns (slt) for more complex tests, must bypass to D too
CIS 371 (Martin): Pipelining 35
Reducing Penalty: Fast Branches
• Fastbranch:targetscontrol-hazardpenalty • Basically, branch insns that can resolve at D, not X
• Test must be comparison to zero or equality, no time for ALU + New taken branch penalty is 1
– Additional comparison insns (e.g., cmplt, slt) for complex tests
– Must bypass into decode stage now, too
bnez r3,targ
st r6⟶[r7+4] targ:add r4⟵r5,r4
123456789 FDXMW
F D -- -- -- FDXMW
CIS 501: Comp. Arch. | Dr. | Branch Prediction 37
Fast Branch Performance
• Assume: Branch: 20%, 75% of branches are taken • CPI=1+20%*75%*1=1+0.20*0.75*1=1.15 • 15% slowdown (better than the 30% from before)
• But wait, fast branches assume only simple comparisons • Fine for MIPS
• But not fine for ISAs with “branch if $1 > $2” operations
• In such cases, say 25% of branches require an extra insn
• CPI = 1 + (20% * 75% * 1) + 20%*25%*1(extra insn) = 1.2
• Example of ISA and micro-architecture interaction • Type of branch instructions
• Another option: “Delayed branch” or “branch delay slot”
• What about condition codes?
CIS 501: Comp. Arch. | Dr. | Branch Prediction 38
Putting It All Together
• BTB & branch direction predictor during fetch
tag target
predicted target
taken/not-taken
• If branch prediction correct, no taken branch penalty
CIS 501: Comp. Arch. | Dr. | Branch Prediction 39
Branch Prediction Performance
• Dynamic branch prediction • 20% of instruction branches
• Simple predictor: branches predicted with 75% accuracy • CPI=1+(20%*25%*2)=1.1
• More advanced predictor: 95% accuracy • CPI=1+(20%* 5%*2)=1.02
• Branch mis-predictions still a big problem though
• Pipelines are long: typical mis-prediction penalty is 10+ cycles
• For cores that do more per cycle, predictions more costly (later)
CIS 501: Comp. Arch. | Dr. | Branch Prediction 40
PREDICATION
CIS 501: Comp. Arch. | Dr. | Branch Prediction 41
Predication
• Instead of predicting which way we’re going, why not go both ways?
• compute a predicate bit indicating a condition • ISA includes predicated instructions
• predicated insns either execute as normal or as NOPs, depending on the predicate bit
• Examples
• x86 cmov performs conditional load/store
• 32b ARM allows almost all insns to be predicated
• 64b ARM has predicated reg-reg move, inc, dec, not • Nvidia’s CUDA ISA supports predication on most insns
• predicate bits are like LC4 NZP bits
• x86 FLAGS, ARM condition codes
CIS 501: Comp. Arch. | Dr. | Branch Prediction 42
Predication Performance
• Predication overhead is additional insns • Sometimes overhead is zero
• for if-then statement where condition is true – Most of the times it isn’t
• if-then-else statement, only one of the paths is useful
• Calculation for a given branch, predicate (vs speculate) if… • Average number of additional insns > overall mis-prediction penalty • For an individual branch
• Mis-prediction penalty in a 5-stage pipeline = 2
• Mis-prediction rate is <50%, and often <20%
• Overall mis-prediction penalty <1 and often <0.4
• So when is predication ever worth it?
CIS 501: Comp. Arch. | Dr. | Branch Prediction 43
Predication Performance
• What does predication actually accomplish?
• In a scalar 5-stage pipeline (penalty = 2): nothing
• In a 4-way superscalar 15-stage pipeline (penalty = 60): something
• Use when mis-predictions >10% and insn overhead <6 • In a 4-way out-of-order superscalar (penalty ~ 150)
• potentially useful in more situations
• Still: only useful for branches that mis-predict frequently
• Other predication advantages
• Low-power: eliminates the need for a large branch predictor • Real-time: predicated code performs consistently
• Predication disadvantages
• wasted time/energy compared to correct prediction • doesn’t nest well
CIS 501: Comp. Arch. | Dr. | Branch Prediction 44
• Control hazards
• Branch target prediction
• Branch direction prediction
System software
CIS 501: Comp. Arch. | Dr. | Branch Prediction 45
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com