CS2305: Computer Architecture
Pipelining
(Computer Architecture: Chapter 3 & Appendix C)
Copyright By PowCoder代写 加微信 powcoder
Department of Computer Science and Engineering
Shanghai University
Chapter 3: Pipelining
Parallel Processing
Chapter 3: Pipelining
pC.1 Introduction to Pipelining pHow Pipeline is Implemented pPipeline Hazards
p Exceptions
pHandling Multicycle Operations
Chapter 3: Pipelining
Processor Performance
p Performance of single-cycle processor is limited by the long critical path delay
m The critical path limits the operating clock frequency
p Can we do better?
m New semiconductor technology will reduce the critical path delay by manufacturing small- sized transistors
n Core 2 Duo is manufactured with 65nm technology n Core i7 is manufactured with 45nm technology
n Next semiconductor technology is 32nm technology
m Can we increase the processor performance with a different microarchitecture?
n Yes! Pipelining
Chapter 3: Pipelining
Revisiting Performance
• Laundry Example
§ Ann, Brian, Cathy, Dave each has one load of clothes to wash, dry, and fold
§ Washer takes 30 minutes § Dryer takes 40 minutes
§ Folder takes 20 minutes
Chapter 3: Pipelining
Sequential Laundry
6PM 7 8 9 10 11 Midnight
30 40 20 30 40 20 30 40 20 30 40 20 A
• Response time: 90 mins
• Throughput:
0.67 tasks / hr (= 90mins/task, 6 hours for 4 loads)
Chapter 3: Laundry
6PM 7 8 9 10 11 Midnight Time
30 40 40 40 40 20 A
• Pipelining doesn’t help latency (response time) of a single task
• Pipelining helps throughput of entire workload
• Multiple tasks operating simultaneously
Potential speedup = # of pipeline stages
Unbalanced lengths of pipeline stages reduce speedup
• Throughput:
Response time: 90 mins
1.14 tasks / hr (= 52.5 mins/task, 3.5 hours for 4 loads)
Chapter 3:
Instruction Fetch
Register File Access (Read)
ALU Operation
Data Access
Register Access (Write)
execution Time 2 4 6 8 10 12 14 16 18
Data access
Instruction fetch
Sequential
(in instructions)
lw $1, 100($0) lw $2, 200($0) lw $3, 300($0)
Program execution order
(in instructions)
lw $1, 100($0) lw $2, 200($0) lw $3, 300($0)
Data access
2 4 6 8 10 12 14
Instruction fetch
Instruction fetch
Instruction fetch
Data access
Instruction fetch
Data access
Instruction fetch
Data access
Improve performance by increasing instruction throughput
Chapter 3: (Cont.)
Program execution order
(in instructions)
lw $1, 100($0) lw $2, 200($0) lw $3, 300($0)
Multiple instructions are being executed simultaneously
2 4 6 8 10 12 14
Instruction fetch
Data access
Instruction fetch
Data access
Instruction fetch
Data access
Instruction fetch
Data access
Instruction fetch
Data access
Instruction fetch
Data access
Instruction fetch
Instruction fetch
Data access
Data access
Instruction fetch
Data access
Chapter 3: (Cont.)
Pipeline Speedup
Time to execute an instructionpipeline =
Time to execute an instructionsequential Number of stages
• If not balanced, speedup is less
• Speedup comes from increased throughput
• the latency of instruction does not decrease
Chapter 3: Pipelining
pIntroduction to Pipelining
pC.2 How Pipeline is Implemented pPipeline Hazards
p Exceptions
pHandling Multicycle Operations
Chapter 3: Pipelining
Basic Idea
IF: Instruction fetch
ID: Instruction decode/ register file read
EX: Execute/ address calculation
MEM: Memory access
WB: Write back
Shift left 2
Add Add result
ALU ALU result
Instruction memory
Instruction
Read register 1
Read register 2
Read data1
Write Registers Read
register data 2
Write data
Data memory
Write data
Sign extend
What do we have to add to actually split the datapath into stages?
Chapter 3: Pipelining
Basic Idea
IF: Instruction fetch
ID: Instruction decode/ register file read
EX: Execute/ address calculation
MEM: Memory access
WB: Write back
Shift left 2
Add Add result
ALU ALU result
Instruction memory
Instruction
Read register 1
register 2
Write Registers Read
register data 2
Write data
Read data 1
Data memory
Write data
Sign extend
Chapter 3: Pipelining
Graphically Representing Pipelines
2 4 6 8 10
p Shading indicates the unit is being used by the instruction
p Shading on the right half of the register file (ID or WB) or memory
means the element is being read in that stage
p Shading on the left half means the element is being written in that stage
Chapter 3: Datapath
ID/EX EX/MEM
Add Add result
Shift left 2
ALU ALU result
Instruction memory
Read register 1
Read data1
register 2
Write Registers Read
register data 2
Write data
16 Sign 32 extend
Data memory
Write data
Instruction
Chapter 3: Pipelining
lw: Instruction Fetch (IF) Instruction fetch
ID/EX EX/MEM
Add Add result
ALU ALU result
Shift left 2
Instruction memory
Read register 1
Read data1
register 2
Write Registers Read
register data 2
Write data
Data memory
Write data
16 Sign 32 extend
Instruction
Chapter 3: Pipelining
lw: Instruction Decode (ID) Instruction decode
ID/EX EX/MEM
Add Add result
ALU ALU result
Shift left 2
Instruction memory
Read register 1
Read data1
register 2
Write Registers Read
register data 2
Write data
Data memory
Write data
16 Sign 32 extend
Instruction
Chapter 3: Pipelining
lw: Execution (EX) Execution
ID/EX EX/MEM
Add Add result
ALU ALU result
Shift left 2
Instruction memory
Read register 1
Read data1
register 2
Write Registers Read
register data 2
Write data
Data memory
Write data
16 Sign 32 extend
Instruction
Chapter 3: Pipelining
lw: Memory (MEM)
Add Add result
ALU ALU result
Shift left 2
Instruction memory
Read register 1
Read data1
register 2
Write Registers Read
register data 2
Write data
Data memory
Write data
16 Sign 32 extend
Instruction
Chapter 3: Pipelining
lw: Writeback (WB)
ID/EX EX/MEM
Add Add result
ALU ALU result
Shift left 2
Instruction memory
Read register 1
Read data1
register 2
Write Registers Read
register data 2
Write data
Data memory
Write data
16 Sign 32 extend
Instruction
Chapter 3: Pipelining
sw: Writeback (WB): do nothing Writeback
ID/EX EX/MEM
Add Add result
ALU ALU result
Shift left 2
Instruction memory
Read register 1
Read data1
register 2
Write Registers Read
register data 2
Write data
Data memory
Write data
16 Sign 32 extend
Instruction
Chapter 3: Pipelining
Corrected Datapath (for lw)
ID/EX EX/MEM
Add Add result
ALU ALU result
Shift left 2
Instruction memory
Read register 1
Read register 2
Write Registers Read
register data 2
Write data
Read data 1
Data memory
Write data
16 Sign 32 extend
Instruction
Chapter 3: Example
add $14, $5, $6
lw $13, 24($1)
add $12, $3, $4
sub $11, $2, $3
lw $10, 20($1)
Add Add result
ALU ALU result
Shift left 2
Instruction memory
Read register 1
Read data1
register 2
Write Registers Read
register data 2
Write data
Data memory
Write data
Sign 32 extend
Instruction
Chapter 3: Pipelining
Pipeline Control
Note that in this implementation, the branch is resolved in the MEM stage
IF/ID ID/EX
Add Add result
Shift left 2
ALU ALU result
Read register 1
register 2
Write Registers Read
register data 2
Write data
Read data 1
ALU control
Data memory
Write data
Instruction [15– 11]
Instruction memory
Instruction
[15–0] 16 Sign 32
Instruction [20– 16]
Instruction
Chapter 3: Pipelining
Pipeline Control
• What needs to be controlled in each stage (IF, ID, EX, MEM, WB)?
m IF: Instruction fetch and PC increment
m ID: Instruction decode and operand fetch from register file and/or
m EX: Execution stage
n ALUop[1:0] n ALUSrc
m MA: Memory stage n Branch
n MemRead n MemWrite
m WB: Writeback n MemtoReg
n RegWrite (note that this signal is in ID stage)
Chapter 3: Pipelining
Pipeline Control
p Extend pipeline registers to include control information created in ID stage p Pass control signals along just like the data
Instruction
Execution/Address Calculation stage control lines
Instruction
Memory access stage control lines
Write-back stage control lines
Mem to Reg
ID/EX EX/MEM MEM/WB
Chapter 3: with Control
Control M WB
Read register 1
register 2
Write Registers Read
register data 2
Write data
Shift left 2
Add Add result
Read data 1
Instruction 16 32 [15– 0] Sign
Instruction [20– 16]
ALU ALU result
Write data
ALU control
Instruction [15– 11]
Instruction memory
Instruction
Chapter 3: with Control
IF: lw $10, 9($1)
Control M WB
Shift left 2
Add Add result
Read register 1
register 2
Write Registers Read
register data 2
Write data
Read data 1
Instruction 16 32 [15– 0] Sign
Instruction [20– 16]
ALU ALU result
Write data
ALU control
Instruction [15– 11]
Instruction memory
Instruction
Chapter 3: with Control
IF: sub $11, $2, $3
ID: lw $10, 9($1)
Control M WB
Shift left 2
Add Add result
Read register 1
register 2
Write Registers Read
register data 2
Write data
Instruction 16 32 [15– 0] Sign
Instruction [20– 16]
Read data 1
Write data
ALU ALU result
ALU control
Instruction [15– 11]
Instruction memory
Instruction
Chapter 3: with Control
IF: and $12, $4, $5
ID: sub $11, $2, $3
EX: lw $10, 9($1)
Shift left 2
Add Add result
Read register 1
register 2
Write Registers Read
register data 2
Write data
Instruction 16 32 [15– 0] Sign
Instruction [20– 16]
Read data 1
Write data
ALU ALU result
ALU control
Instruction [15– 11]
Instruction memory
Instruction
Chapter 3: with Control
IF: or $13, $6, $7
ID: and $12, $4, $5
EX: sub $11, $2, $3
MEM: lw $10, 9($1)
Control M 1 WB 0
Shift left 2
Add Add result
Read register 1
register 2
Write Registers Read
register data 2
Write data
Instruction 16 32 [15– 0] Sign
Instruction [20– 16]
Read data 1
Write data
ALU ALU result
ALU control
Instruction [15– 11]
Instruction memory
Instruction
Chapter 3: with Control
IF: add $14, $8, $9
ID: or $13, $6, $7
EX: and $12, $4, $5
MEM: sub $11, ..
WB: lw $10, 9($1)
MEM/WB 1 WB
10WB 10 000 000
Control M 1 1100EX 10
Shift left 2
Add Add result
Read register 1
register 2
Write Registers Read
register data 2
Write data
Instruction 16 32 [15– 0] Sign
Instruction [20– 16]
Read data 1
Write data
ALU ALU result
ALU control
Instruction [15– 11]
Instruction memory
Instruction
Chapter 3: with Control
ID:add$14,$8,$9
EX:or$13,$6,$7
MEM:and$12… WB:sub$11,..
MEM/WB 1 WB
Control M 1 WB 0
Shift left 2
Add Add result
Read register 1
register 2
Write Registers Read
register data 2
Write data
Read data 1
Instruction 16 32 [15– 0] Sign
Instruction [20– 16]
ALU ALU result
Write data
ALU control
Instruction [15– 11]
Instruction memory
Instruction
Chapter 3: with Control
EX: add $14, $8, $9
MEM: or $13, ..
WB: and $12…
MEM/WB 1 WB
M 000 WB 10 10
EX 10 M 00
Shift left 2
Add Add result
Read register 1
register 2
Write Registers Read
register data 2
Write data
Read data 1
Instruction 16 32 [15– 0] Sign
Instruction [20– 16]
ALU ALU result
Write data
ALU control
Instruction [15– 11]
Instruction memory
Instruction
Chapter 3: with Control
MEM: add $14, ..
WB: or $13…
MEM/WB 1 WB
Shift left 2
Add Add result
Read register 1
register 2
Write Registers Read
register data 2
Write data
Read data 1
Instruction 16 32 [15– 0] Sign
Instruction [20– 16]
ALU ALU result
Write data
ALU control
Instruction [15– 11]
Instruction memory
Instruction
Chapter 3: with Control
WB: add $14..
MEM/WB 1 WB
Control M WB
Shift left 2
Add Add result
Read register 1
register 2
Write Registers Read
register data 2
Write data
Read data 1
Instruction 16 32 [15– 0] Sign
Instruction [20– 16]
ALU ALU result
Write data
ALU control
Instruction [15– 11]
Instruction memory
Instruction
Chapter 3: Pipelining
pIntroduction to Pipelining pHow Pipeline is Implemented pC.3 Pipeline Hazards
p Exceptions
pHandling Multicycle Operations
Chapter 3: Pipelining
p It would be happy if we split the datapath into stages and the CPU works just fine
m But, things are not that simple as you may expect m There are hazards!
p Hazard is a situation that prevents starting the next instruction in the next cycle
m Structure hazard
n Conflict over the use of a resource at the same time
m Data hazard
n Data is not ready for the subsequent dependent instruction
m Control hazard
n Fetching the next instruction depends on the previous branch outcome
Chapter 3: Pipelining
Structure Hazards
p Structural hazard is a conflict over the use of a resource at the same time
p Suppose the MIPS CPU with a single memory
m Load/store requires data access in MEM stage
m Instruction fetch requires instruction access from the same memory n Instructionfetchwouldhavetostallforthatcycle
n Wouldcauseapipeline“bubble”
p Hence, pipelined datapaths require either separate ports to memory or separate memories for instruction and data
Address Bus Data Bus
Address Bus
Data Bus Address Bus
Chapter 3: Pipelining
2 4 6 8 10
Structure Hazards (Cont.)
Either provide separate ports to access memory or to provide instruction memory and data memory separately
Chapter 3: Pipelining
Data Hazards
Data is not ready for the subsequent dependent instruction
add $s0,$t0,$t1 EX sub $t2,$s0,$t3
• Tosolvethedatahazardproblem,thepipelineneedstobe stalled (typically referred to as “bubble”)
• Then, the performance is penalized
• Abettersolution?
• Forwarding (or Bypassing)
Bubble Bubble
Chapter 3: Pipelining
Forwarding
add $s0,$t0,$t1 IF
sub $t2,$s0,$t3
EX MEM Bubble
Chapter 3: Pipelining
Data Hazard – Load-Use Case
p Can’t always avoid stalls by forwarding m Can’t forward backward in time!
p Hardware interlock is needed for the pipeline stall
lw $s0, 8($t1)
sub $t2,$s0,$t3
• This bubble can be hidden by proper instruction scheduling
Chapter 3: Pipelining
Code Scheduling to Avoid Stalls
p Reorder code to avoid use of load result in the next instruction
A = B + E; // Bis loaded to$t1, E is loaded to$t2 C=B+F; // Fisloadedto$t4
lw $t1, 0($t0) lw $t2, 4($t0) add $t3, $t
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com