CS2305: Computer Architecture
Pipelining
(Computer Architecture: Chapter 3 & Appendix C)
Department of Computer Science and Engineering
Shanghai University
Chapter 3: Pipelining
Control Hazard
• A branch determines the flow of instructions
• Fetching the next instruction depends on the branch outcome
  - The pipeline can’t always fetch the correct instruction
  - The branch instruction is still in the ID stage when the next instruction is fetched
[Figure: pipeline diagram for the instruction sequence below. Labels: "Taken target address is known here", "Branch is resolved here", "Fetch the next instruction based on the comparison result".]
beq $1,$2,L1
add $1,$2,$3
sw $1, 4($2)
L1: sub $1,$2,$3
Reducing Control Hazard
• To reduce 2 bubbles to 1 bubble, add hardware in the ID stage to compare registers (and generate the branch condition)
[Figure: pipeline diagram for the instruction sequence below. Labels: "Branch is resolved here", "Taken target address is known here", "Fetch instruction based on the comparison result".]
beq $1,$2,L1
add $1,$2,$3
L1: sub $1,$2,$3
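The bubble counts above can be reproduced with a small model. The Python sketch below is illustrative only: it assumes the five-stage IF/ID/EX/MEM/WB pipeline, assumes fetch simply waits until the branch outcome is known, and uses a hypothetical 20% branch frequency that does not come from the slides. The function names are made up for this example.

```python
# Stage order of the five-stage pipeline (assumed for this sketch).
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def branch_bubbles(resolve_stage: str) -> int:
    """Bubbles inserted after a branch when fetch waits for the outcome.

    The branch itself occupies IF in its first cycle; every further stage
    it must pass through before being resolved delays the next correct
    fetch by one cycle.
    """
    return STAGES.index(resolve_stage)


def effective_cpi(base_cpi: float, branch_fraction: float, resolve_stage: str) -> float:
    """Base CPI plus the average stall cycles contributed by branches."""
    return base_cpi + branch_fraction * branch_bubbles(resolve_stage)


print(branch_bubbles("EX"))  # 2 bubbles when the branch is resolved in EX
print(branch_bubbles("ID"))  # 1 bubble when the comparison moves to ID

# Hypothetical workload: 20% of instructions are branches (an assumption).
print(round(effective_cpi(1.0, 0.2, "EX"), 2))
print(round(effective_cpi(1.0, 0.2, "ID"), 2))
```

Under these assumptions, moving the comparison from EX to ID halves the branch stall cost per branch.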
Delayed Branch
• Many CPUs adopt a technique called the delayed branch to further reduce the stall
  - A delayed branch always executes the next sequential instruction
    - The branch takes effect after that one-instruction delay
  - The delay slot is the slot right after a delayed branch instruction
[Figure: pipeline diagram for the instruction sequence below. Labels: "Taken target address is known here", "Branch is resolved here", "Fetch instruction based on the comparison result".]
beq $1,$2,L1
add $1,$2,$3 (delay slot)
L1: sub $1,$2,$3
Delay Slot (Cont.)
• The compiler needs to schedule a useful instruction in the delay slot, or fill it with a nop (no operation)
// $s1 = a, $s2 = b, $s3 = c
// $t0 = d, $t1 = f
a = b + c;
if (d == 0) {f = f + 1;}
f = f + 2;

    add  $s1, $s2, $s3
    bne  $t0, $zero, L1
    nop                    // delay slot
    addi $t1, $t1, 1
L1: addi $t1, $t1, 2

Can we do better?

    bne  $t0, $zero, L1
    add  $s1, $s2, $s3     // delay slot
    addi $t1, $t1, 1
L1: addi $t1, $t1, 2

Fill the delay slot with a useful and valid instruction
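To check that moving the add into the delay slot preserves the program's meaning, the two sequences can be run through a toy interpreter. The Python sketch below is a minimal model written for this example, not a real MIPS simulator: it supports only add, addi, bne and nop, models exactly one delay slot, and the register values in make_regs are arbitrary test inputs.

```python
def run(program, regs):
    """Execute a tiny MIPS-like program with one branch delay slot.

    program: list of (label, op, args) tuples; regs: dict of register values.
    """
    labels = {lab: i for i, (lab, _, _) in enumerate(program) if lab}
    pc = 0
    pending = None  # branch target that takes effect after the delay slot
    while pc < len(program):
        _, op, args = program[pc]
        next_pc = pc + 1
        if pending is not None:  # this instruction sits in the delay slot
            next_pc, pending = pending, None
        if op == "add":
            d, s, t = args
            regs[d] = regs[s] + regs[t]
        elif op == "addi":
            d, s, imm = args
            regs[d] = regs[s] + imm
        elif op == "bne":
            s, t, target = args
            if regs[s] != regs[t]:
                pending = labels[target]
        # "nop" does nothing
        pc = next_pc
    return regs


NOP_VERSION = [
    (None, "add", ("$s1", "$s2", "$s3")),
    (None, "bne", ("$t0", "$zero", "L1")),
    (None, "nop", ()),                      # delay slot
    (None, "addi", ("$t1", "$t1", 1)),
    ("L1", "addi", ("$t1", "$t1", 2)),
]

FILLED_VERSION = [
    (None, "bne", ("$t0", "$zero", "L1")),
    (None, "add", ("$s1", "$s2", "$s3")),   # delay slot
    (None, "addi", ("$t1", "$t1", 1)),
    ("L1", "addi", ("$t1", "$t1", 2)),
]


def make_regs(d):
    # Arbitrary inputs: b = 5, c = 7, f = 10.
    return {"$zero": 0, "$s1": 0, "$s2": 5, "$s3": 7, "$t0": d, "$t1": 10}


for d in (0, 1):
    assert run(NOP_VERSION, make_regs(d)) == run(FILLED_VERSION, make_regs(d))
print("both versions agree for d == 0 and d != 0")
```

The add is a valid filler here because the branch does not read $s1 and the add must execute on both paths anyway.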
Branch Prediction
• Longer pipelines (as in the Core 2 Duo, for example) can’t readily determine the branch outcome early
  - The stall penalty becomes unacceptable because branch instructions occur so frequently in programs
• Solution: branch prediction
  - Predict the branch outcome in hardware
  - If the prediction turns out to be wrong, flush the instructions in the pipeline that shouldn’t have been executed
  - Modern processors use sophisticated branch predictors
MIPS with Predict-Not-Taken
[Figure: pipeline diagrams for predict-not-taken. Labels: "Prediction" (twice), "Flush the instruction that shouldn’t be executed".]
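The cost of predict-not-taken can be estimated from a trace of branch outcomes: only taken branches are mispredicted and cause a flush. The Python sketch below is a rough model under stated assumptions: a flush costs one cycle (as when the branch is resolved in ID), and the outcome trace is a made-up example.

```python
def flush_cycles_predict_not_taken(outcomes, flush_penalty=1):
    """Total flush cycles for a trace of branch outcomes (True = taken).

    Predict-not-taken keeps fetching sequentially, so every taken branch
    is a misprediction and costs `flush_penalty` cycles.
    """
    mispredicted = sum(1 for taken in outcomes if taken)
    return mispredicted * flush_penalty


# Hypothetical trace: a loop branch taken three times, then falling through.
trace = [True, True, True, False]
print(flush_cycles_predict_not_taken(trace))  # 3
```

A loop-closing branch like this one is the worst case for predict-not-taken, which is one reason more elaborate predictors are used.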
Alleviate Branch Hazards
• Reduce the penalty to 1 cycle
  - Move the branch comparison to the ID stage of the pipeline
  - Add an adder to calculate the branch target in the ID stage
  - Add the IF.flush signal, which zeros (squashes) the instruction in the IF/ID pipeline register
[Figure: pipeline diagram for the instruction sequence below. Labels: "Branch is resolved here", "Taken target address is known here", "Bubble", "IF".]
beq $1,$2,L1
add $1,$2,$3
L1: sub $1,$2,$3
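The effect of IF.flush can be sketched in a few lines. This Python model is an assumption-laden illustration, not the actual control logic: it handles only beq, treats the IF/ID register as a single instruction word, and the helper names are invented for this example. It relies on the fact that the all-zero MIPS word encodes sll $0,$0,0, which acts as a nop.

```python
NOP_WORD = 0x00000000  # all-zero MIPS word: sll $0, $0, 0 (a nop)


def if_flush_signal(is_beq: bool, rs_value: int, rt_value: int) -> bool:
    """Assert IF.flush when a beq in ID is found to be taken."""
    return is_beq and rs_value == rt_value


def next_if_id(fetched_word: int, if_flush: bool) -> int:
    """Value latched into IF/ID: the fetched word, or zeros on a flush."""
    return NOP_WORD if if_flush else fetched_word


fetched = 0x00221820  # example instruction word fetched behind the branch
print(hex(next_if_id(fetched, if_flush_signal(True, 7, 7))))   # 0x0
print(hex(next_if_id(fetched, if_flush_signal(True, 7, 8))))   # 0x221820
```

The squashed word then flows down the pipeline as the single bubble shown on the slide.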
Flushing Instructions
[Figure: pipelined datapath showing the instruction memory, data memory, hazard detection unit, forwarding unit, and shift-left-2 unit.]
Flushing Instructions (cycle N)
[Figure: pipelined datapath in cycle N, showing the instruction memory, data memory, hazard detection unit, forwarding unit, and shift-left-2 unit. Instruction labels: and $12, $2, $5; beq $1, $3, L2; and $12, $2, $5; or $13, $12, $1; …; lw $4, 40($7).]
Flushing Instructions (cycle N+1)
[Figure: pipelined datapath in cycle N+1, showing the instruction memory, data memory, hazard detection unit, forwarding unit, and shift-left-2 unit. Instruction labels: lw $4, 40($7); beq $1, $3, L2; and $12, $2, $5; or $13, $12, $1; …; lw $4, 40($7).]
• Introduction to Pipelining
• How Pipeline is Implemented
• Pipeline Hazards
• C.4 Exceptions
• Handling Multicycle Operations
Exceptions
• Exceptions are situations in which the normal execution order of instructions is changed
  - They may force the CPU to abort instructions in the pipeline before they complete
• Other terms used for “exception”
  - Interrupt
  - Fault
• We use “exception”
Types of Exceptions
• I/O device request
• Invoking an OS service for a user program
• Tracing instruction execution
• Breakpoint (programmer-requested interrupt)
• Integer arithmetic overflow
• FP arithmetic anomaly
• Page fault (not in main memory)
• Misaligned memory accesses
• Memory protection violation
• Using an undefined instruction
• Hardware malfunctions
• Power failure
Requirements on Exceptions
Synchronous vs asynchronous
Within vs between instructions
User maskable vs user nonmaskable
Resume vs terminate
User requested vs coerced
Classifications
Stopping and Restarting Execution
• The most difficult exceptions have two properties
  - (1) They occur within instructions (i.e., in the middle of instruction execution, corresponding to the EX or MEM pipe stages)
  - (2) They must be restartable
Steps to Save Pipeline State
• (1) Force a trap instruction into the pipeline on the next IF
• (2) Until the trap is taken, turn off all writes for the faulting instruction and for all instructions that follow it in the pipeline
  - This can be done by placing zeros into the pipeline latches of all instructions, starting with the instruction that generates the exception, but not those that precede that instruction
• (3) After the exception-handling routine in the OS receives control, it immediately saves the PC of the faulting instruction
  - This value will be used to return from the exception later
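The three steps can be sketched as a function over the pipeline latches. This Python model is a simplification with several assumptions: each latch is a dictionary with hypothetical fields pc, reg_write and mem_write, "placing zeros" is modeled as clearing the write-enable bits, and the saved PC is returned directly rather than stored by an OS handler.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # youngest instruction first


def handle_exception(latches, faulting_stage):
    """Squash the faulting instruction and everything younger than it.

    Returns (new_latches, saved_pc, next_fetch), where next_fetch is the
    trap instruction forced into the pipeline on the next IF.
    """
    k = STAGES.index(faulting_stage)
    saved_pc = latches[faulting_stage]["pc"]  # PC used to restart later
    new_latches = {}
    for i, stage in enumerate(STAGES):
        instr = dict(latches[stage])
        if i <= k:  # faulting instruction and all that follow it
            instr["reg_write"] = False
            instr["mem_write"] = False
        new_latches[stage] = instr  # older instructions are left untouched
    return new_latches, saved_pc, "trap"


# Hypothetical pipeline contents: PCs 116 (IF) down to 100 (WB).
latches = {
    s: {"pc": 116 - 4 * i, "reg_write": True, "mem_write": False}
    for i, s in enumerate(STAGES)
}
after, saved_pc, next_fetch = handle_exception(latches, "EX")
print(saved_pc, next_fetch)          # 108 trap
print(after["MEM"]["reg_write"])     # True: precedes the faulting instruction
print(after["ID"]["reg_write"])      # False: follows the faulting instruction
```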
Precise vs Imprecise Exceptions
If the pipeline can be stopped so that the instructions just before the faulting instruction are completed and those after it can be restarted from scratch, the pipeline is said to have precise exceptions.
• Supporting precise exceptions is a requirement in many systems
• Any processor with demand paging or IEEE arithmetic trap handlers must make its exceptions precise
Exceptions in MIPS Pipeline
Exceptions may occur in different stages of a pipeline
• Introduction to Pipelining
• How Pipeline is Implemented
• Pipeline Hazards
• Exceptions
• C.5 Handling Multicycle Operations
Supporting Multiple FP Operations
[Figure: pipeline with multiple functional units. Labels: Integer unit; FP multiplier: 7 cycles; FP add: 4 cycles; FP divider (non-pipelined): 24 cycles.]
• Complicates bypassing and forwarding
• Potential structural hazards
• Multiple (FP) instructions can complete at the same time
  - The RF might need to be multi-ported
  - Ordering issue: who gets to update the register?
• Out-of-order completion/retirement: precise exception issue
Modified from Prof Sean Lee’s Slide
Bypassing & Forwarding
[Figure: pipeline timing diagram over clock cycles 1 to 18 for the instruction sequence below.]
L.D F4,0(R2)
MUL.D F0,F4,F6
ADD.D F2,F0,F8
S.D F2,0(R2)
Structural Hazards
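[Figure: pipeline timing diagram over clock cycles 1 to 11 for the instructions below.]
MUL.D F0,F4,F6
ADD.D F2,F4,F6
L.D F2,0(R2)
• All three write to the register file in the same cycle (cc11)
• Two of them write to the same register (WAW)
• All three are in MEM in cc10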
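The collision in cc10 and cc11 follows from the unit latencies. The Python sketch below is a back-of-the-envelope model, not taken from the slides: it assumes every instruction passes through IF, ID, its execution cycles, MEM and WB without stalling, assumes L.D spends one cycle in EX, and assumes fetch cycles 1, 4 and 7 for the three instructions (chosen so the timing matches the cycles named above).

```python
from collections import defaultdict

# Execution cycles per operation. MUL.D and ADD.D follow the latencies
# given for the FP units; the single EX cycle for L.D is an assumption.
EX_CYCLES = {"MUL.D": 7, "ADD.D": 4, "L.D": 1}


def mem_and_wb(fetch_cycle, op):
    """Return the (MEM, WB) cycles of an instruction that never stalls."""
    last_ex = fetch_cycle + 1 + EX_CYCLES[op]  # IF, ID, then the EX cycles
    return last_ex + 1, last_ex + 2


# (fetch cycle, operation, destination register); fetch cycles are assumed.
program = [(1, "MUL.D", "F0"), (4, "ADD.D", "F2"), (7, "L.D", "F2")]


def write_conflicts(program):
    """Group register writes by WB cycle and keep cycles with several writers."""
    by_cycle = defaultdict(list)
    for fetch, op, dest in program:
        _, wb = mem_and_wb(fetch, op)
        by_cycle[wb].append((op, dest))
    return {cycle: ws for cycle, ws in by_cycle.items() if len(ws) > 1}


print([mem_and_wb(f, op) for f, op, _ in program])
# [(10, 11), (10, 11), (10, 11)]
print(write_conflicts(program))
# {11: [('MUL.D', 'F0'), ('ADD.D', 'F2'), ('L.D', 'F2')]}
```

With a single write port, two of the three instructions would have to stall, and ADD.D and L.D additionally conflict on F2.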
Precise Exception Issue
DIV.D F0,F2,F4 (exception!)
ADD.D F3,F10,F8 (completed)
SUB.D F12,F12,F14 (completed)
• Precise exception: if the pipeline can (or must) be stopped
  - All the instructions before the faulting (or intended) instruction must be completed
  - All the instructions after it must not be completed
  - Execution restarts from the faulting (or intended) instruction
• State must be consistent with the original program order
• Not straightforward with out-of-order completion
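Why the three-instruction example breaks precision can be seen by computing when each instruction writes back. The Python sketch below is a rough model under stated assumptions: one instruction is fetched per cycle starting at cycle 1, nothing stalls, and each instruction spends one cycle each in IF, ID, MEM and WB around its execution cycles (24 for DIV.D and 4 for FP add; SUB.D is assumed to use the FP adder).

```python
# Execution cycles per operation; SUB.D using the 4-cycle adder is assumed.
EX_CYCLES = {"DIV.D": 24, "ADD.D": 4, "SUB.D": 4}


def wb_cycle(fetch_cycle, op):
    """WB cycle of a non-stalling instruction: IF + ID + EX cycles + MEM + WB."""
    return fetch_cycle + EX_CYCLES[op] + 3


program = ["DIV.D", "ADD.D", "SUB.D"]  # program order, one fetch per cycle


def completion_order(program):
    """Operations sorted by the cycle in which they write back."""
    timed = [(wb_cycle(i + 1, op), op) for i, op in enumerate(program)]
    return [op for _, op in sorted(timed)]


def younger_completed_first(program, index):
    """Later instructions that write back before program[index] does."""
    limit = wb_cycle(index + 1, program[index])
    return [
        op
        for j, op in enumerate(program)
        if j > index and wb_cycle(j + 1, op) < limit
    ]


print(completion_order(program))            # ['ADD.D', 'SUB.D', 'DIV.D']
print(younger_completed_first(program, 0))  # ['ADD.D', 'SUB.D']
```

If DIV.D raises its exception late in its execution, both younger instructions have already updated their registers, so the state no longer matches program order.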
Scalar Pipeline (Baseline)
[Figure: instruction sequence versus execution cycle for a scalar pipeline with stages IF, DE, EX, MEM, WB.]
Modified from Prof Sean Lee’s Slide
Superpipelining
• Deeper pipelining is called superpipelining
• A deeper pipeline allows higher clock rates
[Figure: instruction sequence versus execution cycle for a deeper pipeline derived from the IF, DE, EX, MEM, WB stages.]
Modified from Prof Sean Lee’s Slide
CS2305: Computer Architecture
End of Pipelining