Chapter 3: ILP and Its Exploitation
p 3.1 Concepts and Challenges
p 3.2 Basic Compiler Techniques
p 3.3 Reducing Branch Costs with Advanced Branch Prediction
Copyright By PowCoder代写 加微信 powcoder
p 3.4 Overcoming Data Hazards with Dynamic Scheduling
p 3.5 Hardware-based Speculation
Chapter 3: ILP and Its Exploitation
Control Speculation
p Execute instruction beyond a branch before the
branch is resolved à Performance p Speculative execution
p What if mis-speculated?
m Recovery mechanism
m Squash instructions on the incorrect path
p Branch prediction: Dynamic vs. Static Prof. ’s Slide
Chapter 3: ILP and Its Exploitation
Predict What?
p Direction (1-bit)
m Single direction for unconditional jumps and calls/returns. No need to
m Binary directions for conditional branches
p Target (32-bit or 64-bit addresses) m Some are easy
n One: Uni-directional jumps
n Two: Fall through (Not Taken) vs. Taken m Function Pointer or Indirect Jump (e.g. jr r31)
Prof. ’s Slide
Chapter 3: ILP and Its Exploitation
Categorizing Branches
Conditional Branch
Call/Return
0% 20% 40% 60%
Frequency of branch instructions
SPEC2000INT SPEC2000FP
Prof. ’s Slide
Source: H&P using Alpha
Chapter 3: ILP and Its Exploitation
Why Branch is Predictable?
for (i=0; i<100; i++) {
addi r10, r0, 100 addi r1, r0, r0
L1: ... ... ......
addi r1, r1, 1 bne r1, r10, L1 ......
Prof. ’s Slide
Chapter 3: ILP and Its Exploitation
Static Branch Prediction
p Uni-directional, always predict taken or not taken
p Static prediction is used as a fall-back technique in some processors with dynamic branch when there is not any information for dynamic predictors to use
Modified from Prof. ’s Slide
Chapter 3: ILP and Its Exploitation
Simplest Dynamic Branch Predictor p Prediction based on latest outcome
p Index by some bits in the branch PC m Aliasing
1-bit Branch History Table
for (i=0; i<100; i++) { ....
addi r10, r0, 100 addi r1, r1, r0
addi r1, r1, 1 bne r1, r10, L1 ......
0x40010100 0x40010104
0x40010108
... 0x40010A04 0x40010A08
Prof. ’s Slide
How accurate?
Chapter 3: ILP and Its Exploitation
Simplest Dynamic Branch Predictor for (i=0; i<100; i++) {
if (a[i] == 0) { ...
0x40010100 0x40010104
0x40010108 0x4001010c 0x40010110
0x40010210
0x40010B0c 0x40010B10
1-bit Branch History Table
addi r10, r0, 100
addi r1, r1, r0 L1:
add r21, r20, r1 lw r2, (r21) beq r2, r0, L2 ......
......... L3:
addi r1, r1, 1 bne r1, r10, L1
Prof. ’s Slide
Chapter 3: ILP and Its Exploitation
Typical Table Organization
PC (32 bits) Hash
2N entries
table update
FSM Update Logic
Actual outcome
Prof. ’s Slide
Prediction
Chapter 3: ILP and Its Exploitation
FSM of the Simplest Predictor
p A 2-state machine p Change mind fast
Actual outcome
If branch taken
If branch not taken
0 Predict not taken
1 Predict taken
Prof. ’s Slide
Chapter 3: ILP and Its Exploitation
Example using 1-bit branch history table
for (i=0; i<5; i++) { ....
Actual T T T T NT T T T T NT T
addi r10, r0, 5
addi r1, r0, 0 L1:
addi r1, r1, 1 bne r1, r10, L1
ÖÖÖNNÖÖÖNN Pred 0 1 1 1 1 0 1 1 1 1 0 1
Prof. ’s Slide
Chapter 3: ILP and Its Exploitation
2-bit Saturating Up/Down Counter Predictor
10/ 11/ WT ST
01/ 00/ WN SN
Taken Not Taken
ST: Strongly Taken WT: Weakly Taken WN: Weakly Not Taken SN: Strongly Not Taken
Predict Not taken Predict taken
Prof. ’s Slide
Chapter 3: ILP and Its Exploitation
Example using 2-bit up/down counter
for (i=0; i< ....
Actual T T T T NT T T T T NT T
addi r10, r0, 5
addi r1, r0, 0 L1:
addi r1, r1, 1 bne r1, r10, L1
ÖÖÖNÖÖÖÖNÖ
Pred 01 10 11 11 11 10 11 11 11 11 10 11
Prof. ’s Slide
Chapter 3: ILP and Its Exploitation
p 3.1 Concepts and Challenges
p 3.2 Basic Compiler Techniques
p 3.3 Reducing Branch Costs with Advanced Branch Prediction
p 3.4 Overcoming Data Hazards with Dynamic Scheduling
p 3.5 Hardware-based Speculation
Chapter 3: ILP and Its Exploitation
Motivation
DIV.D F4, F0, F2 ADD.D F10, F4, F8 SUB.D F12, F6, F14
There is little parallelism among the instructions (if executed in order)
Can we execute instructions out of order? To get more parallelism
Yes! Out of Order Execution
p There is a RAW hazard between DIV.D and ADD.D
p ADD.D should be stalled
p SUB.D is not dependent on previous instructions. Thus, it can be executed. But it is blocked by ADD.D
Chapter 3: ILP and Its Exploitation
Motivation
Out of Order Execution
Both Parallelism and Correctness!
Traditionally: In- order issue In-order execution
How do we ensure correctness of execution?
But, Be Careful!
Chapter 3: ILP and Its Exploitation
Improvement Through
Out-of-Order-Execution hardware)
p The ID stage is divided into two parts
m Part 1) checking for any structural hazard
m Part 2) waiting for the absence of a data hazard
p Thus, in-order issue, out-of-order execution m We can use in-order instruction issue (instructions
issued in program order),
m But we want an instruction to begin execution as soon as its data operands are available.
Chapter 3: ILP and Its Exploitation
Issues with Out-of-Order Execution
List all WAR and WAW hazards
DIV.D ADD.D SUB.D MUL.D
F0, F2, F4 F6, F0, F8
F8, F10, F14 F6, F10, F8
Chapter 3: ILP and Its Exploitation
Dynamic Scheduling
Dynamic Scheduling:
Rearrange order of instructions to reduce stalls while maintaining data flow
p Advantages:
m Compiler doesn’t need to have knowledge of
microarchitecture
m Handles cases where dependencies are unknown at compile time
p Disadvantage:
m Substantial increase in hardware complexity m Complicates exceptions
Chapter 3: ILP and Its Exploitation
Two Algorithms for Dynamic Scheduling
Scoreboard Approach
• Adopted by CDC6600 in 1963
Tomasulo’s Approach
• Tracks when operands are available
• Introduces register renaming in hardware
• Minimizes WAW and WAR hazards
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com