CO101
Principle of Computer
Organization
Lecture 10: Multi-Cycle
Processor 1
Liang Yanyan
澳門科技大學
Macau of University of Science and Technology
What’s wrong with our CPI=1 processor?
2
• Long Cycle Time.
• Real memory is not so nice as our idealized memory.
• cannot always get the job done in one (short) cycle.
PC Inst Memory mux ALU Data Mem mux
PC Reg FileInst Memory mux ALU mux
PC Inst Memory mux ALU Data Mem
PC Inst Memory cmp mux
Reg File
Reg File
Reg File
Arithmetic & Logical
Load
Store
Branch
Critical Path
setup
setup
Reducing Cycle Time
• Cut combinational dependency graph and insert register.
• Do same work in two short cycles, rather than one long cycle.
3
storage element
Datapath
storage element
storage element
Datapath (A)
storage element
storage element
Datapath (B)
=>
One long cycle Two short cycles
1st cycle
2nd cycle
Review: A Single Cycle Datapath
4
Read
Address
Instr[31-0]
Instruction
Memory
Add
PC
4
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read
Data 1
Read
Data 2
ALU
zero
RegWrite
Data
Memory
Address
Write Data
Read Data
MemWrite
MemRead
Sign
Extend16 32
MemtoReg
ALUsrc
Shift
left 2
Add
PCsrc
RegDst
1
1
1
0
0
0
0
1
Instr[15-0]
Instr[25-21]
Instr[20-16]
Instr[15-11]
Shift
left 2
0
1
32
Instr[25-0]
26
PC+4[31-28]
28
Jump
ALUctr
Partition the CPI=1 Datapath
• Add registers between partitions.
5
P
C
N
e
x
t
P
C
O
p
e
ra
n
d
F
e
tc
h
Exec R
e
g
.
F
ile
M
e
m
A
c
c
e
s
s
D
a
ta
M
e
mIn
s
tr
u
c
ti
o
n
F
e
tc
h
R
e
s
u
lt
S
to
re
A
L
U
ct
r
R
eg
D
st
A
L
U
S
rc
E
x
tO
p
M
em
W
r
n
P
C
_
se
l
R
eg
W
r
M
em
W
r
M
em
R
d
Partition the CPI=1 Datapath
• Instruction memory and data memory are combined into
one memory.
6
PC Address
Memory
Data
Instruction
or Data
Instruction
register
Memory
data
register
Data
Register #
Register #
Register #
Registers
A
B
ALU
ALUout
load PC Load
instruction
or data
Load data
from register
or store data
to register
Execute
register
Write
result
A more detailed view
• Instruction register is combined with instruction decode.
7
Shift
left 2
P
Memory
MemData
Write
data
M
u
x
0
1
Registers
Write
register
Write
data
Read
data 1
Read
data 2
Read
register 1
Read
register 2
M
u
x
0
1
M
u
x
0
1
4
Instruction
[15– 0]
Sign
extend
3216
Instruction
[25– 21]
Instruction
[20– 16]
Instruction
[15– 0]
Instruction
decode
1 M
u
x
0
3
2
M
u
x
Instruction
[15– 11]
Zero
0
1
Address
ALU
result
ALU
ALUout
C
Memory
data
register
A
B
How does it work?
• R type instruction: R[rd] <= R[rs] op R[rt] • Five cycles: • Instruction fetch • Instruction decode, read R[rs] and R[rt] • Execute instruction: R[rs] op R[rt] • Write result back to register R[rd] • Update PC: PC = PC + 4 8 R type: 1st cycle • Instruction fetch 9 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register A B R type: 2nd cycle • Instruction decode: read R[rs] and R[rt] 10 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register A B R type: 3rd cycle • Execute instruction 11 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register A B R type: 4th cycle • Write result back to register R[rd] 12 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register A B R type: 5th cycle • Update program counter: PC = PC + 4 • Instruction at (PC + 4) will be fetched next time 13 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register A B How does it work? • I type instruction: R[rt] <= R[rs] op Imm • Five cycles: • Instruction fetch • Instruction decode, read R[rs] • Execute instruction: R[rs] op Imm • Write result back to register R[rt] • Update PC: PC = PC + 4 14 I type: 1st cycle • Instruction fetch 15 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address L ALU result ALU ALUout C Memory data register A B I type: 2nd cycle • Instruction decode: read R[rs], decode Imm 16 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register A B I type: 3rd cycle • Execute instruction: R[rs] op Imm 17 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register A B I type: 4th cycle • Write result back to register R[rt] 18 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register A B I type: 5th cycle • Update program counter: PC = PC + 4 • Instruction at (PC + 4) will be fetch next time 19 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register A B How does it work? • Load word: lw R[rt], Imm[R[rs]] • Major cycles: • Instruction fetch • Instruction decode: read R[rs] • Calculate memory address: R[rs] + Imm • Load data from memory address: R[rs] + Imm • Write memory data to register R[rt] • Update PC: PC = PC + 4 20 Calculate memory address • address = R[rs] + Imm 21 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register A B Load data from memory 22 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register A B Write memory data to R[rt] 23 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register How does it work? • Store word: sw R[rt], Imm[R[rs]] • Major cycles: • Instruction fetch • Instruction decode: read R[rs] • Calculate memory address: R[rs] + Imm • Store data from R[rt] to memory • Update PC: PC = PC + 4 24 Calculate memory address • address = R[rs] + Imm • ALUout is used to store the calculated address which will be used in the store stage later. • Read R[rt] 25 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register A B Store data from register to memory 26 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register A B How does it work? • Branch instruction: beq R[rt], R[rs], Imm • Three major cycles • Instruction fetch • Instruction decode: read R[rs] and R[rt] • Calculate (PC + 4) • Calculate (PC + 4 + Imm × 4) • Calculate branch condition: e.g. R[rt] == R[rs] ?? • Update program counter as (PC + 4 + Imm × 4) if branch condition is true 27 Calculate PC+4 • Add a multiplexer • Calculate PC + 4, and write (PC+4) to register PC 28 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register M U X A B PC+4 Calculate branch address • Branch address = PC + 4 + Imm × 4 • Don’t write branch address to register PC, write branch address to register ALUout 29 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register M U X A B PC+4 PC+4+Imm×4 Calculate branch condition • Calculate R[rs] – R[rt] • Update register PC based on zero flag 30 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register M U X A B PC+4+Imm×4 PC+4 PC+4+Imm×4 Select (PC+4+Imm×4) if zero==1.Update PC if zero==1. Otherwise, PC remains unchanged. How does it work? • Jump instruction: J Label • One major cycle • Instruction fetch • Instruction decode • Calculate target address (2 steps) 31 Calculate (PC + 4) • Extend the multiplexer to 3 inputs • Add a shifter 32 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register M U X A B Inst[25-0] Shift left 2 PC[31-28] PC+4 PC+4 PC Jump address: (upper 4 bits of (PC + 4) : Inst[25-0]) << 2 Calculate target address • Calculate Jump address: (upper 4 bits of (PC + 4) : Inst[25-0]) << 2 33 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register M U X A B Inst[25-0] Shift left 2 PC+4[31-28] PC+4 Jump addr Jump addr Select jump address if it is jump instruction. Multi-Cycle datapath with control signals • Minimum Hardware: 1 memory, 1 ALU (instruction and data memories are combined, no adder to calculate branch address) 34 Designing control logic 35 Steps to execute instructions: R-type 36 • R-type, e.g. add $t0, $t1, $t2 • Instruction fetch • Instruction decode, read R[rs] and R[rt] • Execute instruction: R[rs] op R[rt] • Write result back to register R[rd] • Update PC: PC = PC + 4 • Can we design a finite state machine to represent these states? And generate different control signals at different states to control the operation of the datapath? • E.g. design a FSM with 5 states: IF, ID, EX, WB, PC. Each state outputs appropriate control signals to control the datapath to finish adequate operation in a certain state. Instruction fetch Instruction decode Execute Write back result MemWr = .. ALUop = .. ALUSelA = .. RegWr = … FSM Control signals Updata PC Example: control signals for ($t1 + $t2) at execute state • ALUOp = add, ALUSrcA = 1, ALUSrcB = 0, MemtoReg = x, RegWrite = 0 result Example: control signals for ($t1 + $t2) at write back state • ALUOp = x, ALUSrcA = x, ALUSrcB = x, MemtoReg = 0, RegWrite = 1 result Steps involved in instructions: LW • Similarly, we can design a FSM for LW instruction. • Load word • Instruction fetch • Instruction decode: read R[rs] • Calculate memory address: R[rs] + Imm • Load data from memory address: R[rs] + Imm • Write memory data to register R[rt] • Update PC: PC = PC + 4 • We can design FSMs for other instructions. Instruction fetch Instruction decode Calculate address Load data from mem IorD = .. ALUop = .. ALUSelA = .. RegWr = … FSM Control signals Write data to register Updata PC Control FSM for the multi-cycle processor Instruction fetch Instruction decodeFSM for R-type FSM for I-type FSM for LW FSM for SW FSM for branch • Combine all FSMs together. The first two states of all FSMs are the same: IF, ID. Control FSM for the multi-cycle processor: a detailed view 0 Start Instruction fetch MemRead=1 ALUSrcA=0 IorD=0 IRWrite=1 ALUSrcB=01 ALUOp=00 PCWrite=1 PCSource=00 1 Instruction decode/ register fetch ALUSrcA=0 ALUSrcB=11 ALUOp=00 7 R-type completion RegDst=1 RegWrite = 1 MemtoReg=0 4 Write-back step RegDst=0 RegWrite=1 MemtoReg=1 5 Memory access MemWrite=1 IorD=1 2 Memory address computation ALUSrcA=1 ALUSrcB=10 ALUOp=00 6 Execution ALUSrcA=1 ALUSrcB=00 ALUOp=10 8 Branch completion ALUSrcA=1 ALUSrcB=00 ALUOp=01ite PCSource=01 3 MemRead=1 IorD=1 Memory access (O p = ‘ L W ’) 9 Jump completion PCWrite=1 PCSource=10 Implement the FSM in hardware • State changes depends on opcode and current state. PCWrite ALUSrcB MemtoReg MemRead Outputs Inputs Control Logic PCWriteCon ALUOp IorD RegWrite RegDst NS3 NS2 NS1 NS0 S 3 S 2 S 1 S 0 State register O p 5 O p 4 O p 3 O p 2 O p 1 O p 0 Instruction Register Opcode field MemWrite IRWrite PCSource ALUSrcA Single cycle vs Multi-cycle processor • Example • A single cycle processor has a cycle time of 800ps • A multi-cycle processor: CPIR-type=4, CPIlw=5, CPIbeq=3. Cycle time = 200ps • A program has 40% R-type, 20% lw, 40% beq. Totally 10000 inst. • Single processor time = 10000 × 800 × 1 = 8M ps • Average CPI for multi-cycle processor = 4×40%+5×20%+3×40%=3.8 • Multi-cycle processor time = 10000 × 200 × 3.8 = 7.6M ps • Multi-cycle processor is better in this example, how about 40% of lw, and 20% of beq? • 4×40%+5×40%+3×20%=4.2 • 10000 × 200 × 4.2 = 8.4M ps • Which processor is better? • Depends on the instruction mix of programs → program dependent. • Execution time = Instruction count × CPI × cycle time 43 Summary • Two components of processors. • Datapath and control. • Design multi-cycle processor. • Partition datapath of single cycle processor (break an instruction execution into multiple steps). • Add registers into the partitioned datapath. • Design FSM to control the datapath. • Execute each step in one cycle, require multiple cycles to execute one instruction. • Advantages and disadvantages • Single cycle: CPI = 1, long cycle time. • Multi-cycle: shorter cycle time as each cycle just execute one step, different instructions have different CPI. As a result, simple instructions require less cycles (less time) to execute. • Performance comparison • Program dependent. • Execution time = Instruction count × CPI × cycle time 44 Control Datapath Memory Processor Input Output