程序代写代做代考 CO101

CO101
Principle of Computer

Organization

Lecture 10: Multi-Cycle
Processor 1

Liang Yanyan

澳門科技大學
Macau of University of Science and Technology

What’s wrong with our CPI=1 processor?

2

• Long Cycle Time.
• Real memory is not so nice as our idealized memory.

• cannot always get the job done in one (short) cycle.

PC Inst Memory mux ALU Data Mem mux

PC Reg FileInst Memory mux ALU mux

PC Inst Memory mux ALU Data Mem

PC Inst Memory cmp mux

Reg File

Reg File

Reg File

Arithmetic & Logical

Load

Store

Branch

Critical Path

setup

setup

Reducing Cycle Time
• Cut combinational dependency graph and insert register.

• Do same work in two short cycles, rather than one long cycle.

3

storage element

Datapath

storage element

storage element

Datapath (A)

storage element

storage element

Datapath (B)

=>

One long cycle Two short cycles

1st cycle

2nd cycle

Review: A Single Cycle Datapath

4

Read
Address

Instr[31-0]

Instruction
Memory

Add

PC

4

Write Data

Read Addr 1

Read Addr 2

Write Addr

Register

File

Read
Data 1

Read
Data 2

ALU

zero

RegWrite

Data
Memory

Address

Write Data

Read Data

MemWrite

MemRead

Sign
Extend16 32

MemtoReg

ALUsrc

Shift

left 2

Add

PCsrc

RegDst

1

1

1

0

0
0

0

1

Instr[15-0]

Instr[25-21]

Instr[20-16]

Instr[15-11]

Shift

left 2

0

1
32

Instr[25-0]

26
PC+4[31-28]

28

Jump

ALUctr

Partition the CPI=1 Datapath

• Add registers between partitions.

5

P
C

N
e
x
t

P
C

O
p
e
ra

n
d

F
e
tc

h
Exec R

e
g
.

F
ile

M
e
m

A
c
c
e
s
s

D
a
ta

M
e
mIn

s
tr

u
c
ti
o
n

F
e
tc

h

R
e
s
u
lt
S

to
re

A
L
U
ct
r

R
eg
D
st

A
L
U
S
rc

E
x
tO

p

M
em

W
r

n
P
C
_
se
l

R
eg
W
r

M
em

W
r

M
em

R
d

Partition the CPI=1 Datapath

• Instruction memory and data memory are combined into
one memory.

6

PC Address

Memory
Data

Instruction
or Data

Instruction
register

Memory
data

register

Data

Register #

Register #

Register #

Registers

A

B

ALU
ALUout

load PC Load
instruction
or data

Load data
from register
or store data
to register

Execute

register

Write
result

A more detailed view

• Instruction register is combined with instruction decode.

7

Shift

left 2

P

Memory

MemData

Write
data

M
u
x

0

1

Registers
Write
register

Write
data

Read
data 1

Read
data 2

Read
register 1

Read
register 2

M
u
x

0

1

M
u
x

0

1

4

Instruction
[15– 0]

Sign

extend

3216

Instruction
[25– 21]

Instruction
[20– 16]

Instruction
[15– 0]

Instruction

decode
1 M

u
x

0

3

2

M
u
x

Instruction
[15– 11]

Zero

0

1

Address

ALU
result

ALU

ALUout

C

Memory

data

register

A

B

How does it work?

• R type instruction: R[rd] <= R[rs] op R[rt] • Five cycles: • Instruction fetch • Instruction decode, read R[rs] and R[rt] • Execute instruction: R[rs] op R[rt] • Write result back to register R[rd] • Update PC: PC = PC + 4 8 R type: 1st cycle • Instruction fetch 9 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register A B R type: 2nd cycle • Instruction decode: read R[rs] and R[rt] 10 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register A B R type: 3rd cycle • Execute instruction 11 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register A B R type: 4th cycle • Write result back to register R[rd] 12 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register A B R type: 5th cycle • Update program counter: PC = PC + 4 • Instruction at (PC + 4) will be fetched next time 13 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register A B How does it work? • I type instruction: R[rt] <= R[rs] op Imm • Five cycles: • Instruction fetch • Instruction decode, read R[rs] • Execute instruction: R[rs] op Imm • Write result back to register R[rt] • Update PC: PC = PC + 4 14 I type: 1st cycle • Instruction fetch 15 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address L ALU result ALU ALUout C Memory data register A B I type: 2nd cycle • Instruction decode: read R[rs], decode Imm 16 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register A B I type: 3rd cycle • Execute instruction: R[rs] op Imm 17 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register A B I type: 4th cycle • Write result back to register R[rt] 18 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register A B I type: 5th cycle • Update program counter: PC = PC + 4 • Instruction at (PC + 4) will be fetch next time 19 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register A B How does it work? • Load word: lw R[rt], Imm[R[rs]] • Major cycles: • Instruction fetch • Instruction decode: read R[rs] • Calculate memory address: R[rs] + Imm • Load data from memory address: R[rs] + Imm • Write memory data to register R[rt] • Update PC: PC = PC + 4 20 Calculate memory address • address = R[rs] + Imm 21 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register A B Load data from memory 22 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register A B Write memory data to R[rt] 23 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register How does it work? • Store word: sw R[rt], Imm[R[rs]] • Major cycles: • Instruction fetch • Instruction decode: read R[rs] • Calculate memory address: R[rs] + Imm • Store data from R[rt] to memory • Update PC: PC = PC + 4 24 Calculate memory address • address = R[rs] + Imm • ALUout is used to store the calculated address which will be used in the store stage later. • Read R[rt] 25 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register A B Store data from register to memory 26 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register A B How does it work? • Branch instruction: beq R[rt], R[rs], Imm • Three major cycles • Instruction fetch • Instruction decode: read R[rs] and R[rt] • Calculate (PC + 4) • Calculate (PC + 4 + Imm × 4) • Calculate branch condition: e.g. R[rt] == R[rs] ?? • Update program counter as (PC + 4 + Imm × 4) if branch condition is true 27 Calculate PC+4 • Add a multiplexer • Calculate PC + 4, and write (PC+4) to register PC 28 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register M U X A B PC+4 Calculate branch address • Branch address = PC + 4 + Imm × 4 • Don’t write branch address to register PC, write branch address to register ALUout 29 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register M U X A B PC+4 PC+4+Imm×4 Calculate branch condition • Calculate R[rs] – R[rt] • Update register PC based on zero flag 30 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register M U X A B PC+4+Imm×4 PC+4 PC+4+Imm×4 Select (PC+4+Imm×4) if zero==1.Update PC if zero==1. Otherwise, PC remains unchanged. How does it work? • Jump instruction: J Label • One major cycle • Instruction fetch • Instruction decode • Calculate target address (2 steps) 31 Calculate (PC + 4) • Extend the multiplexer to 3 inputs • Add a shifter 32 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register M U X A B Inst[25-0] Shift left 2 PC[31-28] PC+4 PC+4 PC Jump address: (upper 4 bits of (PC + 4) : Inst[25-0]) << 2 Calculate target address • Calculate Jump address: (upper 4 bits of (PC + 4) : Inst[25-0]) << 2 33 Shift left 2 P Memory MemData Write data M u x 0 1 Registers Write register Write data Read data 1 Read data 2 Read register 1 Read register 2 M u x 0 1 M u x 0 1 4 Instruction [15– 0] Sign extend 3216 Instruction [25– 21] Instruction [20– 16] Instruction [15– 0] Instruction register 1 M u x 0 3 2 M u x Instruction [15– 11] Zero 0 1 Address ALU result ALU ALUout C Memory data register M U X A B Inst[25-0] Shift left 2 PC+4[31-28] PC+4 Jump addr Jump addr Select jump address if it is jump instruction. Multi-Cycle datapath with control signals • Minimum Hardware: 1 memory, 1 ALU (instruction and data memories are combined, no adder to calculate branch address) 34 Designing control logic 35 Steps to execute instructions: R-type 36 • R-type, e.g. add $t0, $t1, $t2 • Instruction fetch • Instruction decode, read R[rs] and R[rt] • Execute instruction: R[rs] op R[rt] • Write result back to register R[rd] • Update PC: PC = PC + 4 • Can we design a finite state machine to represent these states? And generate different control signals at different states to control the operation of the datapath? • E.g. design a FSM with 5 states: IF, ID, EX, WB, PC. Each state outputs appropriate control signals to control the datapath to finish adequate operation in a certain state. Instruction fetch Instruction decode Execute Write back result MemWr = .. ALUop = .. ALUSelA = .. RegWr = … FSM Control signals Updata PC Example: control signals for ($t1 + $t2) at execute state • ALUOp = add, ALUSrcA = 1, ALUSrcB = 0, MemtoReg = x, RegWrite = 0 result Example: control signals for ($t1 + $t2) at write back state • ALUOp = x, ALUSrcA = x, ALUSrcB = x, MemtoReg = 0, RegWrite = 1 result Steps involved in instructions: LW • Similarly, we can design a FSM for LW instruction. • Load word • Instruction fetch • Instruction decode: read R[rs] • Calculate memory address: R[rs] + Imm • Load data from memory address: R[rs] + Imm • Write memory data to register R[rt] • Update PC: PC = PC + 4 • We can design FSMs for other instructions. Instruction fetch Instruction decode Calculate address Load data from mem IorD = .. ALUop = .. ALUSelA = .. RegWr = … FSM Control signals Write data to register Updata PC Control FSM for the multi-cycle processor Instruction fetch Instruction decodeFSM for R-type FSM for I-type FSM for LW FSM for SW FSM for branch • Combine all FSMs together. The first two states of all FSMs are the same: IF, ID. Control FSM for the multi-cycle processor: a detailed view 0 Start Instruction fetch MemRead=1 ALUSrcA=0 IorD=0 IRWrite=1 ALUSrcB=01 ALUOp=00 PCWrite=1 PCSource=00 1 Instruction decode/ register fetch ALUSrcA=0 ALUSrcB=11 ALUOp=00 7 R-type completion RegDst=1 RegWrite = 1 MemtoReg=0 4 Write-back step RegDst=0 RegWrite=1 MemtoReg=1 5 Memory access MemWrite=1 IorD=1 2 Memory address computation ALUSrcA=1 ALUSrcB=10 ALUOp=00 6 Execution ALUSrcA=1 ALUSrcB=00 ALUOp=10 8 Branch completion ALUSrcA=1 ALUSrcB=00 ALUOp=01ite PCSource=01 3 MemRead=1 IorD=1 Memory access (O p = ‘ L W ’) 9 Jump completion PCWrite=1 PCSource=10 Implement the FSM in hardware • State changes depends on opcode and current state. PCWrite ALUSrcB MemtoReg MemRead Outputs Inputs Control Logic PCWriteCon ALUOp IorD RegWrite RegDst NS3 NS2 NS1 NS0 S 3 S 2 S 1 S 0 State register O p 5 O p 4 O p 3 O p 2 O p 1 O p 0 Instruction Register Opcode field MemWrite IRWrite PCSource ALUSrcA Single cycle vs Multi-cycle processor • Example • A single cycle processor has a cycle time of 800ps • A multi-cycle processor: CPIR-type=4, CPIlw=5, CPIbeq=3. Cycle time = 200ps • A program has 40% R-type, 20% lw, 40% beq. Totally 10000 inst. • Single processor time = 10000 × 800 × 1 = 8M ps • Average CPI for multi-cycle processor = 4×40%+5×20%+3×40%=3.8 • Multi-cycle processor time = 10000 × 200 × 3.8 = 7.6M ps • Multi-cycle processor is better in this example, how about 40% of lw, and 20% of beq? • 4×40%+5×40%+3×20%=4.2 • 10000 × 200 × 4.2 = 8.4M ps • Which processor is better? • Depends on the instruction mix of programs → program dependent. • Execution time = Instruction count × CPI × cycle time 43 Summary • Two components of processors. • Datapath and control. • Design multi-cycle processor. • Partition datapath of single cycle processor (break an instruction execution into multiple steps). • Add registers into the partitioned datapath. • Design FSM to control the datapath. • Execute each step in one cycle, require multiple cycles to execute one instruction. • Advantages and disadvantages • Single cycle: CPI = 1, long cycle time. • Multi-cycle: shorter cycle time as each cycle just execute one step, different instructions have different CPI. As a result, simple instructions require less cycles (less time) to execute. • Performance comparison • Program dependent. • Execution time = Instruction count × CPI × cycle time 44 Control Datapath Memory Processor Input Output