CO101
Principle of Computer
Organization
Lecture 13: Pipelined MIPS
Processor 3
Liang Yanyan
澳門科技大學
Macau of University of Science and Technology
Pipelining: ideal example
• Ideal code segment.
• Start a new instruction every clock cycle.
• Complete one instruction every clock cycle.
• No dependency between instructions.
• Each instruction read/write distinct registers.
• Data required by current instruction is not produced by previous
instruction.
• Pipelined design can be generated easily.
2
Clock cycle
1 2 3 4 5 6 7 8 9
lw $8,4($29) IF ID EX MEM WB
sub $2,$4,$5 IF ID EX MEM WB
and $9,$10,$11 IF ID EX MEM WB
or $16,$17,$18 IF ID EX MEM WB
add $13,$14,$0 IF ID EX MEM WB
However
• There are many dependencies between instructions.
• Read after write dependency.
• E.g. instruction “and, or, add, sw” require (read) data $2, which
will be produced (written) by instruction “sub”.
• It is fine for single cycle processor.
• As an instruction will not start until the previous finish completely.
• What happen for a pipelined processor?
3
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
Instruction execution
• SW is fine
• When does SW read the data of $2?
• $2 has been written back to register when the ID stage of SW
starts.
4
IF ID EX MEM WB
1 2 3 4 5 6 7 8 9
Clock cycle
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
IF ID EX MEM WB
IF ID EX MEM WB
IF ID EX MEM WB
IF ID EX MEM WB
vagrant
However
• “and, or” instructions.
• $2 is not ready when they start reading $2.
• An old value of $2 will be read in this circumstance.
• Data hazard
• An instruction attempts to use data before it is ready.
• Back (red) arrows in time designate data hazard.
5
IF ID EX MEM WB
1 2 3 4 5 6 7 8 9
Clock cycle
IF ID EX MEM WB
IF ID EX MEM WB
IF ID EX MEM WB
IF ID EX MEM WB
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
How about
“add”?
Solutions for solving data hazard
• Pipeline stall
• Delay the execution of instructions such that they can read the
correct data.
• May reduce performance.
• We will address this later.
• Forwarding
• Forward the data from one stage to another stage that request
the data. In other word, the data bypass the pipeline register.
• E.g. forward the $2 from ALU output of “sub” to “and, or”.
• Question: How?
• Explained in the following slides
6
Observing the example
• When is the result ($2) produced?
• When is the result ($2) consumed?
7
IF ID EX MEM WB
1 2 3 4 5 6 7 8 9Clock cycle
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
IF ID EX MEM WB
IF ID EX MEM WB
IF ID EX MEM WB
IF ID EX MEM WB
Observing the example
• When is the result ($2) produced? EX of “sub”
• When is the result ($2) consumed? EX of “and, or”
8
IF ID EX MEM WB
1 2 3 4 5 6 7 8 9Clock cycle
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
IF ID EX MEM WB
IF ID EX MEM WB
IF ID EX MEM WB
IF ID EX MEM WB
Forwarding
• Forward the result from the ALU output of “sub” to “and”
instruction.
• Bypass the pipeline register and write back stage.
9
IF ID EX MEM WB
1 2 3 4 5 6 7 8 9Clock cycle
IF ID EX MEM WB
IF ID EX MEM WB
IF ID EX MEM WB
IF ID EX MEM WB
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
Forwarding
• Since at the beginning of clock cycle 4, pipeline register has already
contained the result of ALU ($2).
• At clock 4: forward ALU result from EX/MEM register to ALU input.
• At clock 5: forward ALU result from MEM/WB register to ALU input.
• As ALU result ($2) is now propagated to MEM/WB register.
10
1 2 3 4 5 6 7Clock cycle
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
Reg DM Reg
A
L
UIM
Reg DM Reg
A
L
UIM
Reg DM Reg
A
L
UIM
Forwarding hardware
• Add a control unit (forwarding unit) to select the correct ALU value
for the EX stage.
• If there is no hazard, the unit selects ALU’s operands from register file,
just like before.
• If there is a hazard, the unit selects operands from either the EX/MEM
or MEM/WB pipeline registers.
• As ALU has two inputs, add two new multiplexers to select the ALU
operands.
11
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
Reg DM Reg
A
L
UIM
Reg DM Reg
A
L
UIM
Reg DM Reg
A
L
UIM
Normal flow of ALU result
• The ALU result generated at the EX stage is normally
passed through the pipeline registers to the MEM and
WB stages, before it is finally written to the register file.
12
P
C Addr
Inst. Mem.
Read Reg #1
Reg #1
Read Reg #2
Write Reg
Reg #2
Write Data
Register File
Addr
Read Data
Write Data
Data Mem
0
1
ALU
0
1Inst[15-11]
Inst[20-16]
IF/ID ID/EX EX/MEM MEM/WB
Simplified datapath with two new multiplexers
13
P
C Addr
Inst. Mem.
Read Reg #1
Reg #1
Read Reg #2
Write Reg
Reg #2
Write Data
Register File
Addr
Read Data
Write Data
Data Mem
0
1ALU
0
1Inst[15-11]
Inst[20-16]
IF/ID ID/EX EX/MEM MEM/WB
0
2
1
0
2
1
ForwardA
ForwardB
Next step: add a forwarding unit (control unit) to
control the selection of the two multiplexers.
Question: how can the hardware know that there is
data hazard? → how to detect data hazard?
Forward Unit Output Signals
14
How to detect data hazard?
• 1. EX Forward Unit:
if (EX/MEM.RegWrite
and (EX/MEM.RegisterRd != 0)
and (EX/MEM.RegisterRd == ID/EX.RegisterRs))
ForwardA = 10
if (EX/MEM.RegWrite
and (EX/MEM.RegisterRd != 0)
and (EX/MEM.RegisterRd == ID/EX.RegisterRt))
ForwardB = 10
• 2. MEM Forward Unit:
if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd != 0)
and (MEM/WB.RegisterRd == ID/EX.RegisterRs))
ForwardA = 01
if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd != 0)
and (MEM/WB.RegisterRd == ID/EX.RegisterRt))
ForwardB = 01
15
EX/MEM hazard
• An EX/MEM hazard occurs between two adjacent
instructions and:
• The first instruction will write data to register file.
• The first instruction’s destination is the same as one of the
second instruction’s read operands, e.g. register $2 here.
16
Reg DM Reg
A
L
UIM
Reg DM Reg
A
L
UIM
sub $2, $1, $3
and $12, $2, $5
MEM/WB hazard
• Occur between current instruction and the instruction two
stages ahead.
17
1 2 3 4 5 6 7Clock cycle
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
Reg DM Reg
A
L
UIM
Reg DM Reg
A
L
UIM
Reg DM Reg
A
L
UIM
How to detect data hazard?
18
P
C Addr
Inst. Mem.
Read Reg #1
Reg #1
Read Reg #2
Write Reg
Reg #2
Write Data
Register File
Addr
Read Data
Write Data
Data Mem
0
1ALU
0
1Inst[15-11]
Inst[20-16]
IF/ID ID/EX EX/MEM MEM/WB
0
2
1
0
2
1
ForwardA
ForwardB
and $12, $2, $5 sub $2, $1, $3
EX/MEM.RegWr
EX/MEM.RegRd
ID/EX.RegRt
ID/EX.RegRd
ID/EX.RegRs
Datapath with forwarding unit (control unit)
19
P
C Addr
Inst. Mem.
Read Reg #1
Reg #1
Read Reg #2
Write Reg
Reg #2
Write Data
Register File
Addr
Read Data
Write Data
Data Mem
0
1ALU
0
1RD
RT
IF/ID ID/EX EX/MEM MEM/WB
0
2
1
0
2
1
ForwardA
ForwardB
Forwarding
Unit
RS
ID/EX.RegRT
ID/EX.RegRS
EX/MEM.RegRD
MEM/WB.RegRD
EX/MEM.RegWr
MEM/WB.RegWr
Exception: not MEM/WB hazard, it is EX/MEM hazard
• One new problem is if the same register is updated twice in previous
two instructions.
add $1, $2, $3
add $1, $1, $4
sub $5, $5, $1
• Register $1 is written by both of the previous two instructions: which
instruction will actually produce the result for SUB?
• Only the most recent result (from the second ADD) should be forwarded.
20
DMReg RegIM
DMReg RegIM
DMReg RegIM
add $1, $2, $3
add $1, $1, $4
sub $5, $5, $1
Forwarding example:
• Assume that each register contains a value which is its
number plus 100. For instance, register $2 contains 102,
register $6 contains 106, and so forth.
sub $2, $1, $3
add $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
21
Cycle 3
22
Cycle 4: forwarding $2 from EX/MEM
23
Cycle 5: forwarding $2 from MEM/WB
24
Forwarding for SW
25
CASE 1:
CASE 2:
DMReg RegIM
DMReg RegIM
add $1, $2, $3
sw $1, 0($4)
DMReg RegIM
DMReg RegIM
add $1, $2, $3
sw $4, 0($1)
1 2 3 4 5 6
1 2 3 4 5 6
CASE 1: forward to R[rs]
26
P
C Addr
Inst. Mem.
Read Reg #1
Reg #1
Read Reg #2
Write Reg
Reg #2
Write Data
Register File
Addr
Read Data
Write Data
Data Mem
0
1ALU
0
1RD
RT
0
2
1
0
2
1
ForwardA
ForwardB
Forwarding
Unit
RS
Sign
Extend
0
1
IF/ID ID/EX EX/MEM MEM/WB
EX: sw $4, 0($1) MEM: add $1, $2, $3
1
1
2
CASE 2: forward to R[rt]
27
P
C Addr
Inst. Mem.
Read Reg #1
Reg #1
Read Reg #2
Write Reg
Reg #2
Write Data
Register File
Addr
Read Data
Write Data
Data Mem
0
1ALU
0
1RD
RT
0
2
1
0
2
1
ForwardA
ForwardB
Forwarding
Unit
RS
Sign
Extend
0
1
IF/ID ID/EX EX/MEM MEM/WB
EX: sw $1, 0($4) MEM: add $1, $2, $3
11
2
CASE 3
28
DMReg RegIM
DMReg RegIM
lw $1, 0($2)
sw $1, 0($4)
1 2 3 4 5 6
Can the datapath handle case 3?
29
P
C Addr
Inst. Mem.
Read Reg #1
Reg #1
Read Reg #2
Write Reg
Reg #2
Write Data
Register File
Addr.
R. Data
W. Data
Data Mem
0
1ALU
0
1RD
RT
0
2
1
0
2
1
ForwardA
ForwardB
Forwarding
Unit
RS
Sign
Extend
0
1
IF/ID ID/EX EX/MEM MEM/WB
lw $1, 0($2)
sw $1, 0($4)
Modify the datapath
30
P
C Addr
Inst. Mem.
Read Reg #1
Reg #1
Read Reg #2
Write Reg
Reg #2
Write Data
Register File
Addr.
R. Data
W. Data
Data Mem
0
1ALU
0
1RD
RT
0
2
1
0
2
1
ForwardA
ForwardB
Forwardig
Unit
RS
Sign
Extend
0
1
IF/ID ID/EX EX/MEM MEM/WB
0
1
EX/MEM.MemWr
lw $1, 0($2)
sw $1, 0($4)
lw $1, 0($2)sw $1, 0($4)
$1
Handle case 3
31
lw $1,0($2)
A
LUIM Reg DM Reg
sw $1,0($4)
A
LUIM Reg DM Reg
Can forwarding resolve this example?
• $2 is only available at the beginning of cycle 5.
• $2 is needed by “and” at the beginning of cycle 4.
• Forwarding doesn’t work.
• Data is not available when another instruction needs it.
• Forwarding: data is already available in one of the pipeline
stages when another instruction needs it.
• “True” data hazard.
32
DMReg RegIM
DMReg RegIM
lw $2, 20($3)
and $12, $2, $5
Clock cycle
1 2 3 4 5 6
Resolve true data hazard
• By inserting NOP instructions at compile time (software approach).
• NOP is a dummy instruction doing nothing.
• A NOP can be: or $0, $0, $0
• Doesn’t change the value of any register and memory.
33
DMReg RegIM
RegIM
IM
lw $2, 20($3)
or $0, $0, $0
and $12, $2, $5
or $13, $12, $2
DMReg RegIM
DMReg Reg
Clock cycle
1 2 3 4 5 6 7 8
DM Reg
Resolve true data hazard
• By stall the pipeline + forwarding at run time (hardware approach).
• Delay the execution of instruction (adding delay slot).
• E.g. delay the EX stage of “and” by one cycle, and use forwarding to pass data from
MEM/WB register to ALU input of “and”.
• If there is no forwarding, we need to stall two cycles (adding two delay slots).
34
DMReg RegIM
DMReg RegIM
lw $2, 20($3)
and $12, $2, $5
Clock cycle
1 2 3 4 5 6 7
DMReg RegIM
DMReg RegIM
lw $2, 20($3)
and $12, $2, $5
Clock cycle
1 2 3 4 5 6 7 8
Stall
• If we stall one instruction, we need to stall the
subsequent instructions too.
• Prevent structural hazard.
• Can use stall to resolve any hazard
• But reduce performance.
35
DMReg RegIM
DMReg RegIM
DMReg RegIM
lw $2, 20($3)
and $12, $2, $5
or $13, $12, $2
Clock cycle
1 2 3 4 5 6 7 8
Adding hazard detection (stall detection)
36
PC Write: update PC or not. Used to
freeze PC.
IF/ID Write: update IF/ID register or
not. Used to freeze IF/ID register.
Hazard
Unit
P
C Addr
Inst. Mem.
Read Reg #1
Reg #1
Read Reg #2
Write Reg
Reg #2
Write Data
Register File
Addr
Read Data
Write Data
Data Mem
0
1ALU
0
1RD
RT
IF/ID
ID/EX
EX/MEM
MEM/WB
0
1
2
0
1
2
ForwardA
ForwardB
Forwarding
Unit
RS
Sign
Extend
0
1
Control
W
M
E
W
M W
0
1
ID/EX.RegRT
ID/EX.MemRead
RTRS
PC Write
IF/ID Write
0
Hazard detection unit (stall detection)
• Inputs of the hazard detection unit.
• IF/ID.RegRs and IF/ID.RegRt, the source registers for the
second instruction.
• ID/EX.MemRead and ID/EX.RegRt, to determine if the first
instruction is LW and, if so, to which register it will write.
• By inspecting these values, the detection unit generates
three outputs.
• Two new control signals PCWrite and IF/ID Write, which
determine whether the pipeline stalls (Freeze PC and IF/ID) or
continues.
• A mux select, which forces control signals of the EX and future
stages to 0 in case of a stall (NOP insertion).
37
Example: cycle 3
38
or $4, $4, $2 and $4, $2, $5 lw $2, 20($1)
lw $2, 20($1)
and $4, $2, $5
or $4, $4, $2
0
Control signal 0 means: don’t
write any register and memory.
Example: cycle 4
39
or $4, $4, $2 and $4, $2, $5 NOP
lw $2, 20($1)
and $4, $2, $5
or $4, $4, $2
0
Control signal 0 means: don’t
write any register and memory.
lw $2, 20($1)
0
Summary
• There is dependency between instruction.
• Attempt to read data before it is ready → data hazard.
• Three kinds of hazards.
• Data hazard: attempts to use data before it is ready.
• E.g. instruction depends on result of previous instructions.
• Solution:
• insert NOP at compile time; forwarding;
• stall for true hazard.
40