代写代考 IS 501: Comp. Arch. | Dr. | Branch Prediction 1

Computer Organization and Design
Unit 7: Branch Prediction
Based on slides by Profs. , & C.J. IS 501: Comp. Arch. | Dr. | Branch Prediction 1

Copyright By PowCoder代写 加微信 powcoder

This Unit: Branch Prediction
• Control hazards
• Branch prediction
System software
CIS 501: Comp. Arch. | Dr. | Branch Prediction 2

• Chapter 4
CIS 501: Comp. Arch. | Dr. | Branch Prediction 3

Control Dependences and Branch Prediction
CIS 501: Comp. Arch. | Dr. | Branch Prediction 4

What About Branches?
Register File
• Branchspeculation
• Could just stall to wait for branch outcome (two-cycle penalty)
• Fetchpastbranchinsnsbeforebranchoutcomeisknown
• Default: assume “not-taken” (at fetch, can’t tell it’s a branch) CIS 501: Comp. Arch. | Dr. | Branch Prediction 5

Big Idea: Speculative Execution
• Speculation: “risky transactions on chance of profit”
• Speculativeexecution
• Execute before all parameters known with certainty • Correctspeculation
+ Avoid stall, improve performance
• Incorrectspeculation(mis-speculation)
– Must abort/flush/squash incorrect insns
– Must undo incorrect changes (recover pre-speculation state)
• Controlspeculation:speculationaimedatcontrolhazards • Unknown parameter: are these the correct insns to execute next?
CIS 501: Comp. Arch. | Dr. | Branch Prediction 6

Control Speculation Mechanics
• Guess branch target, start fetching at guessed position • Doing nothing is implicitly guessing target is PC+4
• We were already speculating before!
• Can actively guess other targets: dynamic branch prediction
• Execute branch to verify (check) guess • Correct speculation? keep going
• Mis-speculation? Flush mis-speculated insns
• Hopefully haven’t modified permanent state (Regfile, DMem) + Happens naturally in in-order 5-stage pipeline
CIS 501: Comp. Arch. | Dr. | Branch Prediction 7

When to Perform Branch Prediction?
• Option #1: During Decode
• Look at instruction opcode to determine branch instructions
• Can calculate next PC from instruction (for PC-relative branches)
– One cycle “mis-fetch” penalty even if branch predictor is correct
bnez r3,targ targ:add r4⟵r5,r4
123456789 FDXMW
• Option #2: During Fetch? • Howdowedothat?
CIS 501: Comp. Arch. | Dr. | Branch Prediction 8

Branch Recovery
Register File
• Branchrecovery:whattodowhenbranchisactuallytaken • Insns that are in F and D are wrong
• Flushthem,i.e.,replacethemwithnops
+ They haven’t written permanent state yet (regfile, DMem)
– Two cycle penalty for taken branches
CIS 501: Comp. Arch. | Dr. | Branch Prediction 9

Branch Speculation and Recovery
123456789 FDXMW
speculative
• Mis-speculationrecovery:whattodoonwrongguess • Not too painful in a short, in-order pipeline
• Branch resolves in X
+ Younger insns (in F, D) haven’t changed permanent state
• FlushinsnscurrentlyinDandX(i.e.,replacewithnops) 123456789
addi r3⟵r1,1 bnez r3,targ st r6⟶[r7+4]
mul r10⟵r8,r9
addi r3⟵r1,1 bnez r3,targ st r6⟶[r7+4]
FDXMW FDXMW
F D — — —
F — — — —
mul r10⟵r8,r9 targ:add r4⟵r4,r5
CIS 501: Comp. Arch. | Dr. | Branch Prediction

Branch Performance
• Back of the envelope calculation
• Branch:20%,load:20%,store:10%,other:50% • Say, 75% of branches are taken
• CPI=1+20%*75%*2=
1 + 0.20 * 0.75 * 2 = 1.3
– Branches cause 30% slowdown
• Worse with deeper pipelines (higher mis-prediction penalty)
• Can we do better than assuming branch is not taken?
CIS 501: Comp. Arch. | Dr. | Branch Prediction 11

Dynamic Branch Prediction
Register File
• Dynamicbranchprediction:hardwareguessesoutcome • Start fetching from guessed address
• Flush on mis-prediction
CIS 501: Comp. Arch. | Dr. | Branch Prediction 12

Branch Prediction Performance
• Parameters
• Branch:20%,load:20%,store:10%,other:50% • 75% of branches are taken
• Dynamic branch prediction
• Branches predicted with 95% accuracy • CPI=1+20%*5%*2=1.02
CIS 501: Comp. Arch. | Dr. | Branch Prediction 13

Dynamic Branch Prediction Components
• Step #1: is it a branch? • Easy after decode…
• Step #2: is the branch taken or not taken?
• Direction predictor (applies to conditional branches only) • Predicts taken/not-taken
• Step #3: if the branch is taken, where does it go? • Easy after decode…
CIS 501: Comp. Arch. | Dr. | Branch Prediction 14

Branch Prediction Steps
• Which insn’s behavior are we trying to predict?
• Where does PC come from?
no branch?
is insn a yes T or NT?
Not Taken Taken
prediction source:
branch target buffer
direction predictor
predicted target
CIS 501: Comp. Arch. | Prof. | Branch Prediction 15

BRANCH TARGET PREDICTION
CIS 501: Comp. Arch. | Dr. | Branch Prediction 16

Revisiting Branch Prediction Components
• Step #1: is it a branch?
• Easy after decode… during fetch: predictor
• Step #2: is the branch taken or not taken? • Directionpredictor(later)
• Step #3: if the branch is taken, where does it go? • Branchtargetpredictor(BTB)
• Supplies target PC if branch is taken
CIS 501: Comp. Arch. | Dr. | Branch Prediction 17

Branch Target Buffer
• Learnfrompast,predictthefuture • Record the past in a hardware structure
• Branchtargetbuffer(BTB):
• “guess” the future PC based on past behavior
• “Last time the branch X was taken, it went to address Y”
• “So, in the future, if address X is fetched, fetch address Y next” • PC indexes table of bits target addresses
• Essentially: branch will go to same place it went last time
• What about aliasing?
• Two PCs with the same lower bits?
• No problem, just a prediction!
target target
predicted target
CIS 501: Comp. Arch. | Dr. | Branch Prediction

Branch Target Buffer (continued)
• At Fetch, how does insn know it’s a branch & should read BTB? It doesn’t have to…
• …allinsnsaccessBTBinparallelwithImemFetch
• Key idea: use BTB to predict which insn are branches • Implement by “tagging” each entry with its corresponding PC
• Update BTB on every taken branch insn, record target PC:
• BTB[PC].tag = PC, BTB[PC].target = target of branch • All insns access at Fetch in parallel with Imem
• Check for tag match, signifies insn at that PC is a branch
• Predicted PC = (BTB[PC].tag == PC) ? BTB[PC].target : PC+4
tag BTB target
predicted target
CIS 501: Comp. Arch. | Dr. | Branch Prediction

Why Does a BTB Work?
• Because most control insns use direct targets
• Target encoded in insn itself ® same “taken” target every time
• What about indirect targets?
• Target held in a register ® can be different each time • Two indirect call idioms
+ Dynamically linked functions (DLLs): target always the same
• Dynamically dispatched (virtual) functions: hard but uncommon • Also two indirect unconditional jump idioms
• Switches: hard but uncommon
– Function returns: hard and common but…
CIS 501: Comp. Arch. | Dr. | Branch Prediction 20

Return Address Stack (RAS)
tag target
predicted target
• Returnaddressstack(RAS)
• Call instruction? RAS[TopOfStack++] = PC+4
• Return instruction? Predicted-target = RAS[–TopOfStack]
• Q: how can you tell if an insn is a call/return before decoding it?
• Accessing RAS on every insn BTB-style doesn’t work
• Answer: another predictor (or put them in BTB marked as “return”)
• Or, pre-decode bits in insn mem, written when first executed CIS 501: Comp. Arch. | Dr. | Branch Prediction 21

BRANCH DIRECTION PREDICTION
CIS 501: Comp. Arch. | Dr. | Branch Prediction 22

Branch Direction Prediction
• Learn from past, predict the future • Recordthepastinahardwarestructure
• Direction predictor (DIRP)
• Mapconditional-branchPCtotaken/not-taken(T/N)decision • Individualconditionalbranchesoftenbiasedorweaklybiased
• 90%+ one way or the other considered “biased”
• Why? Loop back edges, checking for uncommon conditions
• Bimodal predictor: simplest predictor
• PCindexesBranchHistoryTableofbits(0=N,1=T),notags • Essentially:branchwillgosamewayitwentlasttime
PC 1:0 BHT T or NT
• Whataboutaliasing?
• Two PC with the same lower bits? • No problem, just a prediction!
CIS 501: Comp. Arch. | Dr. | Branch Prediction
Prediction (taken or not taken) 23

Bimodal Branch Predictor
• simplest direction predictor
• PCindexestableofbits(0=N,1=T), no tags
• Essentially: branch will go same way it went last time
• Problem: inner loop branch below for (i=0;i<100;i++) for (j=0;j<3;j++) // whatever – Two “built-in” mis-predictions per inner loop iteration – Branch predictor “changes its mind too quickly” CIS 501: Comp. Arch. | Dr. | Branch Prediction Outcome Prediction State Two-Bit Saturating Counters (2bc) • Two-bitsaturatingcounters(2bc) [Smith 1981] • Replace each single-bit prediction • (0,1,2,3) = (N,n,t,T) • Adds “hysteresis” • Force predictor to mis-predict twice before “changing its mind” • One mispredict each loop execution (rather than two) + Fixes this pathology (which is not contrived, by the way) • Can we do even better? CIS 501: Comp. Arch. | Dr. | Branch Prediction Outcome Prediction State Branches may be correlated • Consider: for (i=0; i<1000000; i++) { if (i % 3 == 0) { // Highly biased // Locally correlated // Unpredictable if (random() % 2 == 0) { if (i % 3 == 0) { ... // Globally correlated CIS 501: Comp. Arch. | Dr. | Branch Prediction 28 Gshare History-Based Predictor • Exploits observation that branch outcomes are correlated • Maintains recent branch outcomes in Branch History Register (BHR) • In addition to BHT of counters (typically 2-bit sat. counters) • How do we incorporate history into our predictions? • Use PC xor BHR to index into BHT. Why? CIS 501: Comp. Arch. | Dr. | Branch Prediction 29 direction prediction (T/NT) Gshare History-based Predictor • Gshare working example • assume program has one branch • BHT: one 1-bit DIRP entry • 3BHR:last3branchoutcomes • train counter, and update BHR after each branch CIS 501: Comp. Arch. | Dr. | Branch Prediction 30 Outcome Prediction State Time Hybrid Predictor • Hybrid(tournament)predictor[McFarling1993] • Attacks correlated predictor BHT capacity problem • Idea: combine two predictors • Simple bimodal predictor for history-independent branches • Correlated predictor for branches that need history • Chooser assigns branches to one predictor or the other • Branches start in simple BHT, move mis-prediction threshold + Correlated predictor can be made smaller, handles fewer branches + 90–95% accuracy CIS 501: Comp. Arch. | Dr. | Branch Prediction 33 BHT chooser REDUCING BRANCH PENALTY CIS 501: Comp. Arch. | Dr. | Branch Prediction 34 Reducing Penalty: Fast Branches Register File • Fastbranch:candecideatD,notX • Test must be comparison to zero or equality, no time for ALU + New taken branch penalty is 1 – Additional insns (slt) for more complex tests, must bypass to D too CIS 371 (Martin): Pipelining 35 Reducing Penalty: Fast Branches • Fastbranch:targetscontrol-hazardpenalty • Basically, branch insns that can resolve at D, not X • Test must be comparison to zero or equality, no time for ALU + New taken branch penalty is 1 – Additional comparison insns (e.g., cmplt, slt) for complex tests – Must bypass into decode stage now, too bnez r3,targ st r6⟶[r7+4] targ:add r4⟵r5,r4 123456789 FDXMW F D -- -- -- FDXMW CIS 501: Comp. Arch. | Dr. | Branch Prediction 37 Fast Branch Performance • Assume: Branch: 20%, 75% of branches are taken • CPI=1+20%*75%*1=1+0.20*0.75*1=1.15 • 15% slowdown (better than the 30% from before) • But wait, fast branches assume only simple comparisons • Fine for MIPS • But not fine for ISAs with “branch if $1 > $2” operations
• In such cases, say 25% of branches require an extra insn
• CPI = 1 + (20% * 75% * 1) + 20%*25%*1(extra insn) = 1.2
• Example of ISA and micro-architecture interaction • Type of branch instructions
• Another option: “Delayed branch” or “branch delay slot”
• What about condition codes?
CIS 501: Comp. Arch. | Dr. | Branch Prediction 38

Putting It All Together
• BTB & branch direction predictor during fetch
tag target
predicted target
taken/not-taken
• If branch prediction correct, no taken branch penalty
CIS 501: Comp. Arch. | Dr. | Branch Prediction 39

Branch Prediction Performance
• Dynamic branch prediction • 20% of instruction branches
• Simple predictor: branches predicted with 75% accuracy • CPI=1+(20%*25%*2)=1.1
• More advanced predictor: 95% accuracy • CPI=1+(20%* 5%*2)=1.02
• Branch mis-predictions still a big problem though
• Pipelines are long: typical mis-prediction penalty is 10+ cycles
• For cores that do more per cycle, predictions more costly (later)
CIS 501: Comp. Arch. | Dr. | Branch Prediction 40

PREDICATION
CIS 501: Comp. Arch. | Dr. | Branch Prediction 41

Predication
• Instead of predicting which way we’re going, why not go both ways?
• compute a predicate bit indicating a condition • ISA includes predicated instructions
• predicated insns either execute as normal or as NOPs, depending on the predicate bit
• Examples
• x86 cmov performs conditional load/store
• 32b ARM allows almost all insns to be predicated
• 64b ARM has predicated reg-reg move, inc, dec, not • Nvidia’s CUDA ISA supports predication on most insns
• predicate bits are like LC4 NZP bits
• x86 FLAGS, ARM condition codes
CIS 501: Comp. Arch. | Dr. | Branch Prediction 42

Predication Performance
• Predication overhead is additional insns • Sometimes overhead is zero
• for if-then statement where condition is true – Most of the times it isn’t
• if-then-else statement, only one of the paths is useful
• Calculation for a given branch, predicate (vs speculate) if… • Average number of additional insns > overall mis-prediction penalty • For an individual branch
• Mis-prediction penalty in a 5-stage pipeline = 2
• Mis-prediction rate is <50%, and often <20% • Overall mis-prediction penalty <1 and often <0.4 • So when is predication ever worth it? CIS 501: Comp. Arch. | Dr. | Branch Prediction 43 Predication Performance • What does predication actually accomplish? • In a scalar 5-stage pipeline (penalty = 2): nothing • In a 4-way superscalar 15-stage pipeline (penalty = 60): something • Use when mis-predictions >10% and insn overhead <6 • In a 4-way out-of-order superscalar (penalty ~ 150) • potentially useful in more situations • Still: only useful for branches that mis-predict frequently • Other predication advantages • Low-power: eliminates the need for a large branch predictor • Real-time: predicated code performs consistently • Predication disadvantages • wasted time/energy compared to correct prediction • doesn’t nest well CIS 501: Comp. Arch. | Dr. | Branch Prediction 44 • Control hazards • Branch target prediction • Branch direction prediction System software CIS 501: Comp. Arch. | Dr. | Branch Prediction 45 程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com