Chapter 3: ILP and Its Exploitation
p 3.1 Concepts and Challenges
p 3.2 Basic Compiler Techniques
p 3.3 Reducing Branch Costs with Advanced Branch Prediction
Copyright By PowCoder代写 加微信 powcoder
p 3.4 Overcoming Data Hazards with Dynamic Scheduling
m Scoreboarding Algorithm m Tomasulo’s Algorithm
p 3.5 Hardware-based Speculation
Chapter 3: ILP and Its Exploitation
Scoreboard: a bookkeeping technique p Out-of-order execution divides ID stage:
1. Issue—decodeinstructions,checkforstructuralhazards
2. Read operands—wait until no data hazards, then read operands
p Scoreboards date back to CDC6600 in 1963
p Instructions execute whenever not dependent on
previous instructions and no hazards.
p CDC 6600: In order issue, out-of-order execution,
out-of-order commit (or completion) m No forwarding!
m Imprecise interrupt/exception model for now
Chapter 3: ILP and Its Exploitation
Scoreboard Architecture (CDC 6600)
SCOREBOARD
FP Divide FP Add
Functional Units
Chapter 3: ILP and Its Exploitation
Scoreboard Implications
p The key question: Out-of-order completion => WAR, WAW hazards?
p Solutions for WAR:
m Detect hazard and stall writeback until registers have been read
m Read registers only during Read Operands stage p Solution for WAW:
m Detect hazard and stall issue of new instruction until other instruction completes
p Need to have multiple instructions in execution phase => multiple execution units or pipelined execution units
p Scoreboard keeps track of dependencies between instructions that have already issued.
Chapter 3: ILP and Its Exploitation
Four Stages of Scoreboard Control
p 1) Issue—decode instructions & check for structural hazards (ID1)
m Instructions issued in program order (for hazard checking)
m Don’t issue if structural hazard
m Don’t issue if instruction is output dependent on any previously issued but uncompleted instruction (no WAW hazards)
p 2) Read operands—wait until no data hazards, then read operands (ID2)
m All real dependencies (RAW hazards) resolved in this stage, since we wait for instructions to write back data.
m No forwarding of data in this model!
Chapter 3: ILP and Its Exploitation
Four Stages of Scoreboard Control
p 3) Execution—operate on operands (EX)
m The functional unit begins execution upon receiving operands. When the
result is ready, it notifies the scoreboard that it has completed execution. p 4) Write result—finish execution (WB)
m Stall until no WAR hazards with previous instructions:
Example: DIVD F0,F2,F4 ADDD F10,F0,F8 SUBD F8,F8,F14
CDC 6600 scoreboard would stall SUBD until ADDD reads operands
Chapter 3: ILP and Its Exploitation
Three Parts of the Scoreboard
p Instruction status:
Which of 4 steps the instruction is in
p Functional unit status:—Indicates the state of the functional unit (FU). 9 fields for each functional unit
p Busy: Indicates whether the unit is busy or not
Op: Operation to perform in the unit (e.g., + or –)
Fi: Destination register
Fj,Fk: Source-register numbers
Qj,Qk: Functional units producing source registers Fj, ,Rk: Flags indicating when Fj, Fk are ready
p Register result status—Indicates which functional unit will write each register, if one exists. Blank when no pending instructions will write that register
Chapter 3: ILP and Its Exploitation
Scoreboard Example
Instruction status:
Instruction j k LD F634+R2 LD F245+R3 MULTD F0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2
Functional unit status:
Integer Mult1 Mult2 Add Divide
Register result status:
Read Exec Write Issue Oper Comp Result
F0 F2 F4 F6 F8 F10 F12 … F30
dest S1 S2
No No No No No
Chapter 3: ILP and Its Exploitation
Detailed Scoreboard Pipeline Control
Instruction status
Wait until
Bookkeeping
Not busy (FU) and not result(D)
Busy(FU)¬ yes; Op(FU)¬ op; Fi(FU)¬ `D’; Fj(FU)¬ `S1’; Fk(FU)¬ `S2’; Qj¬ Result(‘S1’); Qk¬ Result(`S2’); Rj¬ not Qj; Rk¬ not Qk; (‘D’)¬ FU;
Read operands
Rj and ¬ No; Rk¬ No
Execution complete
Functional unit done
Write result
“f((Fj(f)≠Fi(FU) or Rj(f)=No) & (Fk(f) ≠Fi(FU) or Rk( f )=No))
“f(if Qj(f)=FU then Rj(f)¬ Yes); “f(if Qk(f)=FU then Rj(f)¬ Yes); Result(Fi(FU))¬ Ø; Busy(FU)¬ No
Result() denotes the content in the register status
Chapter 3: ILP and Its Exploitation
Scoreboard Example: Cycle 1
Instruction status:
Instruction j k LD F634+R2 LD F245+R3 MULTD F0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2
Functional unit status:
Integer Mult1 Mult2 Add Divide
Register result status:
Read Exec Write Issue Oper Comp Result
dest S1 S2 FU
FU Fj? Fk?
Yes Load F6 R2 Yes No
F0 F2 F4 F6 F8 F10 F12 … F30
Chapter 3: ILP and Its Exploitation
Scoreboard Example: Cycle 2
Instruction status:
Instruction j k LD F634+R2 LD F245+R3 MULTD F0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2
Functional unit status:
Integer Mult1 Mult2 Add Divide
Register result status:
2 FU • Issue 2nd LD?
Read Exec Write Issue Oper Comp Result
dest S1 S2
Yes Load F6 R2 Yes No
F0 F2 F4 F6 F8 F10 F12 … F30
Chapter 3: ILP and Its Exploitation
Scoreboard Example: Cycle 3
Instruction status:
Instruction j k LD F634+R2 LD F245+R3 MULTD F0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2
Functional unit status:
Integer Mult1 Mult2 Add Divide
Register result status:
3 FU • Issue MULT?
Read Exec Write Issue Oper Comp Result
dest S1 S2
Yes Load F6 R2 No No
F0 F2 F4 F6 F8 F10 F12 … F30
Chapter 3: ILP and Its Exploitation
Scoreboard Example: Cycle 4
Instruction status:
Instruction j k LD F634+R2 LD F245+R3 MULTD F0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2
Functional unit status:
Integer Mult1 Mult2 Add Divide
Register result status:
Read Exec Write Issue Oper Comp Result
dest S1 S2
FU FU Qj ? Fk?
No No No No No
F0 F2 F4 F6 F8 F10 F12 … F30
Chapter 3: ILP and Its Exploitation
Scoreboard Example: Cycle 5
Instruction status:
Instruction j k LD F634+R2 LD F245+R3 MULTD F0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2
Functional unit status:
Mult1 Mult2 Add Divide
Register result status:
Read Exec Write Issue Oper Comp Result
dest S1 S2
Yes Load F2 R3 Yes No
F0 F2 F4 F6 F8 F10 F12 … F30
Chapter 3: ILP and Its Exploitation
Scoreboard Example: Cycle 6
Instruction status:
Instruction j k LD F634+R2 LD F245+R3 MULTD F0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2
Functional unit status:
Mult1 Mult2 Add Divide
Register result status:
Read Exec Write Issue Oper Comp Result
dest S1 S2
Yes Load F2 R3 Yes Yes Mult F0 F2 F4 Integer No Yes
F0 F2 F4 F6 F8 F10 F12 … F30
Mult1 Integer
Chapter 3: ILP and Its Exploitation
Scoreboard Example: Cycle 7
Instruction status:
Instruction j k LD F634+R2 LD F245+R3 MULTD F0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2
Functional unit status:
Mult1 Mult2 Add Divide
Register result status:
Read Exec Write Issue Oper Comp Result
dest S1 S2
Yes Load F2 Yes Mult F0
Yes Sub F8
• Read multiply operands?
F0 F2 F4 F6 F8 F10 F12 … F30
Mult1 Integer Add
Chapter 3: ILP and Its Exploitation
Scoreboard Example: Cycle 8a (First half of clock cycl
Instruction status:
Instruction j k LD F634+R2 LD F245+R3 MULTD F0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2
Functional unit status:
Mult1 Mult2 Add Divide
Register result status:
Read Exec Write Issue Oper Comp Result
dest S1 S2
Yes Load F2 R3 Yes Mult F0 F2 F4
Yes Sub F8 F6 F2 Yes Div F10 F0 F6
Integer Yes No No Yes
F0 F2 F4 F6 F8 F10 F12 … F30
Mult1 Integer Add Divide
Chapter 3: ILP and Its Exploitation
Scoreboard Example: Cycle 8b (Second half of clock c
Instruction status:
Instruction j k LD F634+R2 LD F245+R3 MULTD F0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2
Functional unit status:
Integer Mult1 Mult2 Add Divide
Register result status:
Read Exec Write Issue Oper Comp Result
dest S1 S2
No Yes No Yes Yes
Mult F0 F2 F4
Sub F8 F6 F2
Div F10 F0 F6
F0 F2 F4 F6 F8 F10 F12 … F30
Mult1 Add Divide
Chapter 3: ILP and Its Exploitation
Scoreboard Example: Cycle 9
Instruction status:
Instruction j k LD F634+R2 LD F245+R3 MULTD F0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2
Functional unit status:
Integer 10 Mult1 Mult2
2 Add Divide
Register result status:
Read Exec Write Issue Oper Comp Result
dest S1 S2
No Yes No Yes Yes
Mult F0 F2 F4
Sub F8 F6 F2
Div F10 F0 F6
F0 F2 F4 F6 F8 F10 F12 … F30
Mult1 Add Divide
• Read operands for MULT & SUB? Issue ADDD?
Chapter 3: ILP and Its Exploitation
Scoreboard Example: Cycle 10
Instruction status:
Instruction j k LD F634+R2 LD F245+R3 MULTD F0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2
Functional unit status:
Integer 9 Mult1 Mult2
1 Add Divide
Register result status:
Read Exec Write Issue Oper Comp Result
dest S1 S2
Yes Mult F0 F2 F4
No Yes Yes
Sub F8 F6 F2
Div F10 F0 F6
F0 F2 F4 F6 F8 F10 F12 … F30
Mult1 Add Divide
Chapter 3: ILP and Its Exploitation
Scoreboard Example: Cycle 11
Instruction status:
Instruction j k LD F634+R2 LD F245+R3 MULTD F0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2
Functional unit status:
Integer 8 Mult1 Mult2
0 Add Divide
Register result status:
Read Exec Write Issue Oper Comp Result
dest S1 S2
Yes Mult F0 F2 F4
No Yes Yes
Sub F8 F6 F2
Div F10 F0 F6
F0 F2 F4 F6 F8 F10 F12 … F30
Mult1 Add Divide
Chapter 3: ILP and Its Exploitation
Scoreboard Example: Cycle 12
Instruction status:
Instruction j k LD F634+R2 LD F245+R3 MULTD F0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2
Functional unit status:
Integer 7 Mult1 Mult2
Add Divide
Register result status:
Read Exec Write Issue Oper Comp Result
7 9 11 12 8
dest S1 S2
Yes Mult F0 F2 F4 No No
Yes Div F10 F0 F6 Mult1 No Yes
• Read operands for DIVD?
F0 F2 F4 F6 F8 F10 F12 … F30
Mult1 Divide
Chapter 3: ILP and Its Exploitation
Scoreboard Example: Cycle 13
Instruction status:
Instruction j k LD F634+R2 LD F245+R3 MULTD F0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2
Functional unit status:
Integer 6 Mult1 Mult2
Add Divide
Register result status:
Read Exec Write Issue Oper Comp Result
7 9 11 12 8
dest S1 S2
No Yes No Yes Yes
Mult F0 F2 F4
Add F6 F8 F2
Div F10 F0 F6
F0 F2 F4 F6 F8 F10 F12 … F30
Mult1 Add Divide
Chapter 3: ILP and Its Exploitation
Scoreboard Example: Cycle 14
Instruction status:
Instruction j k LD F634+R2 LD F245+R3 MULTD F0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2
Functional unit status:
Integer 5 Mult1 Mult2
2 Add Divide
Register result status:
Read Exec Write Issue Oper Comp Result
7 9 11 12 8
dest S1 S2
No Yes No Yes Yes
Mult F0 F2 F4
Add F6 F8 F2
Div F10 F0 F6
F0 F2 F4 F6 F8 F10 F12 … F30
Mult1 Add Divide
Chapter 3: ILP and Its Exploitation
Scoreboard Example: Cycle 15
Instruction status:
Instruction j k LD F634+R2 LD F245+R3 MULTD F0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2
Functional unit status:
Integer 4 Mult1 Mult2
1 Add Divide
Register result status:
Read Exec Write Issue Oper Comp Result
7 9 11 12 8
dest S1 S2
Yes Mult F0 F2 F4
No Yes Yes
Add F6 F8 F2
Div F10 F0 F6
F0 F2 F4 F6 F8 F10 F12 … F30
Mult1 Add Divide
Chapter 3: ILP and Its Exploitation
Scoreboard Example: Cycle 16
Instruction status:
Instruction j k LD F634+R2 LD F245+R3 MULTD F0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2
Functional unit status:
Integer 3 Mult1 Mult2
0 Add Divide
Register result status:
Read Exec Write Issue Oper Comp Result
7 9 11 12 8
dest S1 S2
Yes Mult F0 F2 F4
No Yes Yes
Add F6 F8 F2
Div F10 F0 F6
F0 F2 F4 F6 F8 F10 F12 … F30
Mult1 Add Divide
Chapter 3: ILP and Its Exploitation
Scoreboard Example: Cycle 17
Instruction status:
Instruction j k LD F634+R2 LD F245+R3 MULTD F0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2
Functional unit status:
Integer 2 Mult1 Mult2
Add Divide
Register result status:
Read Exec Write Issue Oper Comp Result
7 9 11 12 8
dest S1 S2
WAR Hazard!
Yes Mult F0 F2 F4
No Yes Yes
Add F6 F8 F2
Div F10 F0 F6
Mult1 Add Divide
• Why not write result of ADD???
F0 F2 F4 F6 F8 F10 F12 … F30
Chapter 3: ILP and Its Exploitation
Scoreboard Example: Cycle 18
Instruction status:
Instruction j k LD F634+R2 LD F245+R3 MULTD F0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2
Functional unit status:
Integer 1 Mult1 Mult2
Add Divide
Register result status:
Read Exec Write Issue Oper Comp Result
7 9 11 12 8
dest S1 S2
Yes Mult F0 F2 F4
No Yes Yes
Add F6 F8 F2
Div F10 F0 F6
F0 F2 F4 F6 F8 F10 F12 … F30
Mult1 Add Divide
Chapter 3: ILP and Its Exploitation
Scoreboard Example: Cycle 19
Instruction status:
Instruction j k LD F634+R2 LD F245+R3 MULTD F0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2
Functional unit status:
Integer 0 Mult1 Mult2
Add Divide
Register result status:
Read Exec Write Issue Oper Comp Result
5678 6 9 19
7 9 11 12 8
dest S1 S2
Yes Mult F0 F2 F4
No Yes Yes
Add F6 F8 F2
Div F10 F0 F6
F0 F2 F4 F6 F8 F10 F12 … F30
Mult1 Add Divide
Chapter 3: ILP and Its Exploitation
Scoreboard Example: Cycle 20
Instruction status:
Instruction j k LD F634+R2 LD F245+R3 MULTD F0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2
Functional unit status:
Integer Mult1 Mult2 Add Divide
Register result status:
Read Exec Write Issue Oper Comp Result
5678 6 9 19 20 7 9 11 12 8
dest S1 S2
Yes Add F6 F8 F2 No No Yes Div F10 F0 F6 Yes Yes
F0 F2 F4 F6 F8 F10 F12 … F30
Add Divide
Chapter 3: ILP and Its Exploitation
Scoreboard Example: Cycle 21
Instruction status:
Instruction j k LD F634+R2 LD F245+R3 MULTD F0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2
Functional unit status:
Integer Mult1 Mult2 Add Divide
Register result status:
Read Exec Write Issue Oper Comp Result
5678 6 9 19 20 7 9 11 12 8 21
dest S1 S2
Yes Add F6 F8 F2 No No Yes Div F10 F0 F6 Yes Yes
• WAR Hazard is now gone…
F0 F2 F4 F6 F8 F10 F12 … F30
Add Divide
Chapter 3: ILP and Its Exploitation
Scoreboard Example: Cycle 22
Instruction status:
Instruction j k LD F634+R2 LD F245+R3 MULTD F0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2
Functional unit status:
Time Name Busy
Read Exec Write Issue Oper CompResult
5678 6 9 19 20 7 9 11 12 8 21
13 14 16 22
dest S1 S2
No No No No
Yes Div F10 F0 F6 No No
Integer Mult1 Mult2 Add
Register result status:
F0 F2 F4 F6 F8 F10 F12 … F30
Chapter 3: ILP and Its Exploitation
Faster than light computation
(skip a couple of cycles)
Chapter 3: ILP and Its Exploitation
Scoreboard Example: Cycle 61
Instruction status:
Instruction j k LD F634+R2 LD F245+R3 MULTD F0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2
Functional unit status:
Integer Mult1 Mult2 Add
Register result status:
Read Exec Write Issue Oper Comp Result
5678 6 9 19 20 7 9 11 12 8 21 61
13 14 16 22
dest S1 S2
No No No No
Yes Div F10 F0 F6 No No
F0 F2 F4 F6 F8 F10 F12 … F30
Chapter 3: ILP and Its Exploitation
Scoreboard Example: Cycle 62
Instruction status:
Instruction j k LD F634+R2 LD F245+R3 MULTD F0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2
Functional unit status:
Integer Mult1 Mult2 Add Divide
Register result status:
Read Exec Write Issue Oper Comp Result
5678 6 9 19 20 7 9 11 12 8 21 61 62
13 14 16 22
dest S1 S2
No No No No No
F0 F2 F4 F6 F8 F10 F12 … F30
Chapter 3: ILP and Its Exploitation
Review: Scoreboard Example: Cycle 62
Instruction status:
Instruction j k LD F634+R2 LD F245+R3 MULTD F0 F2 F4 SUBD F8 F6 F2 DIVD F10 F0 F6 ADDD F6 F8 F2
Functional unit status:
Integer Mult1 Mult2 Add Divide
Register result status:
Read Exec Write Issue Oper Comp Result
5678 6 9 19 20 7 9 11 12 8 21 61 62
13 14 16 22
dest S1 S2
No No No No No
F0 F2 F4 F6 F8 F10 F12 … F30
• In-order issue; out-of-order execute & commit
Chapter 3: ILP and Its Exploitation
Limitations of 6600 scoreboard: p No forwarding hardware
p Limited to instructions in basic block (small window)
p Small number of functional units (structural hazards), especially integer/load store units
p Do not issue on structural hazards p Wait for WAR hazards
p Prevent WAW hazards
Chapter 3: ILP and Its Exploitation
p 3.1 Concepts and Challenges
p 3.2 Basic Compiler Techniques
p 3.3 Reducing Branch Costs with Advanced Branch Prediction
p 3.4 Overcoming Data Hazards with Dynamic Scheduling
m Scoreboarding Algorithm m Tomasulo’s Algorithm
p 3.5 Hardware-based Speculation
Chapter 3: ILP and Its Exploitation
Tomasulo’s Algorithm – Overview
p A technique for allowing instructions to execute out of order when there are sufficient resources and no data dependences
p Can be extended to handle antidependences and output dependences by effectively renaming the registers dynamically
p Can be extended to handle speculation
m A technique to reduce the effect of control dependences by
predicting the outcome of a branch
m IBM 360/91
m Invented by
Chapter 3: ILP and Its Exploitation
Dynamic Algorithm: Tomasulo’s Algorithm
p For IBM 360/91– 3 years after CDC
p Goal: High Performance without special compilers
p Small number of floating point registers (4 in 360) prevented interesting compiler scheduling of operations
mThis led Tomasulo to try to figure o
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com