CS代考 SPEC2000INT SPEC2000FP

Chapter 3: ILP and Its Exploitation
p 3.1 Concepts and Challenges
p 3.2 Basic Compiler Techniques
p 3.3 Reducing Branch Costs with Advanced Branch Prediction

p 3.4 Overcoming Data Hazards with Dynamic Scheduling
p 3.5 Hardware-based Speculation

Chapter 3: ILP and Its Exploitation
Control Speculation
p Execute instruction beyond a branch before the
branch is resolved à Performance p Speculative execution
p What if mis-speculated?
m Recovery mechanism
m Squash instructions on the incorrect path
p Branch prediction: Dynamic vs. Static Prof. ’s Slide

Chapter 3: ILP and Its Exploitation
Predict What?
p Direction (1-bit)
m Single direction for unconditional jumps and calls/returns. No need to
m Binary directions for conditional branches
p Target (32-bit or 64-bit addresses) m Some are easy
n One: Uni-directional jumps
n Two: Fall through (Not Taken) vs. Taken m Function Pointer or Indirect Jump (e.g. jr r31)
Prof. ’s Slide

Chapter 3: ILP and Its Exploitation
Categorizing Branches
Conditional Branch
Call/Return
0% 20% 40% 60%
Frequency of branch instructions
SPEC2000INT SPEC2000FP
Prof. ’s Slide
Source: H&P using Alpha

Chapter 3: ILP and Its Exploitation
Why Branch is Predictable?
for (i=0; i<100; i++) { addi r10, r0, 100 addi r1, r0, r0 L1: ... ... ...... addi r1, r1, 1 bne r1, r10, L1 ...... Prof. ’s Slide Chapter 3: ILP and Its Exploitation Static Branch Prediction p Uni-directional, always predict taken or not taken p Static prediction is used as a fall-back technique in some processors with dynamic branch when there is not any information for dynamic predictors to use Modified from Prof. ’s Slide Chapter 3: ILP and Its Exploitation Simplest Dynamic Branch Predictor p Prediction based on latest outcome p Index by some bits in the branch PC m Aliasing 1-bit Branch History Table for (i=0; i<100; i++) { .... addi r10, r0, 100 addi r1, r1, r0 addi r1, r1, 1 bne r1, r10, L1 ...... 0x40010100 0x40010104 0x40010108 ... 0x40010A04 0x40010A08 Prof. ’s Slide How accurate? Chapter 3: ILP and Its Exploitation Simplest Dynamic Branch Predictor for (i=0; i<100; i++) { if (a[i] == 0) { ... 0x40010100 0x40010104 0x40010108 0x4001010c 0x40010110 0x40010210 0x40010B0c 0x40010B10 1-bit Branch History Table addi r10, r0, 100 addi r1, r1, r0 L1: add r21, r20, r1 lw r2, (r21) beq r2, r0, L2 ...... ......... L3: addi r1, r1, 1 bne r1, r10, L1 Prof. ’s Slide Chapter 3: ILP and Its Exploitation Typical Table Organization PC (32 bits) Hash 2N entries table update FSM Update Logic Actual outcome Prof. ’s Slide Prediction Chapter 3: ILP and Its Exploitation FSM of the Simplest Predictor p A 2-state machine p Change mind fast Actual outcome If branch taken If branch not taken 0 Predict not taken 1 Predict taken Prof. ’s Slide Chapter 3: ILP and Its Exploitation Example using 1-bit branch history table for (i=0; i<5; i++) { .... Actual T T T T NT T T T T NT T addi r10, r0, 5 addi r1, r0, 0 L1: addi r1, r1, 1 bne r1, r10, L1 ÖÖÖNNÖÖÖNN Pred 0 1 1 1 1 0 1 1 1 1 0 1 Prof. ’s Slide Chapter 3: ILP and Its Exploitation 2-bit Saturating Up/Down Counter Predictor 10/ 11/ WT ST 01/ 00/ WN SN Taken Not Taken ST: Strongly Taken WT: Weakly Taken WN: Weakly Not Taken SN: Strongly Not Taken Predict Not taken Predict taken Prof. ’s Slide Chapter 3: ILP and Its Exploitation Example using 2-bit up/down counter for (i=0; i< .... Actual T T T T NT T T T T NT T addi r10, r0, 5 addi r1, r0, 0 L1: addi r1, r1, 1 bne r1, r10, L1 ÖÖÖNÖÖÖÖNÖ Pred 01 10 11 11 11 10 11 11 11 11 10 11 Prof. ’s Slide Chapter 3: ILP and Its Exploitation p 3.1 Concepts and Challenges p 3.2 Basic Compiler Techniques p 3.3 Reducing Branch Costs with Advanced Branch Prediction p 3.4 Overcoming Data Hazards with Dynamic Scheduling p 3.5 Hardware-based Speculation Chapter 3: ILP and Its Exploitation Motivation DIV.D F4, F0, F2 ADD.D F10, F4, F8 SUB.D F12, F6, F14 There is little parallelism among the instructions (if executed in order) Can we execute instructions out of order? To get more parallelism Yes! Out of Order Execution p There is a RAW hazard between DIV.D and ADD.D p ADD.D should be stalled p SUB.D is not dependent on previous instructions. Thus, it can be executed. But it is blocked by ADD.D Chapter 3: ILP and Its Exploitation Motivation Out of Order Execution Both Parallelism and Correctness! Traditionally: In- order issue In-order execution How do we ensure correctness of execution? But, Be Careful! Chapter 3: ILP and Its Exploitation Improvement Through Out-of-Order-Execution 􏰀hardware) p The ID stage is divided into two parts m Part 1) checking for any structural hazard m Part 2) waiting for the absence of a data hazard p Thus, in-order issue, out-of-order execution m We can use in-order instruction issue (instructions issued in program order), m But we want an instruction to begin execution as soon as its data operands are available. Chapter 3: ILP and Its Exploitation Issues with Out-of-Order Execution List all WAR and WAW hazards DIV.D ADD.D SUB.D MUL.D F0, F2, F4 F6, F0, F8 F8, F10, F14 F6, F10, F8 Chapter 3: ILP and Its Exploitation Dynamic Scheduling Dynamic Scheduling: Rearrange order of instructions to reduce stalls while maintaining data flow p Advantages: m Compiler doesn’t need to have knowledge of microarchitecture m Handles cases where dependencies are unknown at compile time p Disadvantage: m Substantial increase in hardware complexity m Complicates exceptions Chapter 3: ILP and Its Exploitation Two Algorithms for Dynamic Scheduling Scoreboard Approach • Adopted by CDC6600 in 1963 Tomasulo’s Approach • Tracks when operands are available • Introduces register renaming in hardware • Minimizes WAW and WAR hazards 程序代写 CS代考加微信: powcoder QQ: 1823890830 Email: powcoder@163.com

Related Posts