ECE437: Introduction to Digital Computer Design
Chapter 4a (single-cycle, 4.1-4.4) Spring 2021
Processor Implementation
• CPU aka processor
ECE437, S’21 © Vijaykumar and Thottethodi (2)
[Figure: the five classic components of a computer: Input, Output, Memory, and the CPU (Control + Datapath)]
2/1/2021
362 vs. 437
• 362 CPU is an embedded processor
– Low-cost, low-power emphasis
– On-chip, custom peripherals
– Transistors used for peripherals
• 437 CPU is a general-purpose processor
– High-performance emphasis
– Off-chip, generic peripherals
– Transistors used for performance
• We have already seen differences in the ISA
Outline
• Datapath – single cycle
– single instruction, 2’s complement, unsigned
• Control
Datapath for Instructions
• Single-cycle datapath
• Compose using well-understood pieces
– Mux, flip-flops and gates
– ALU
– Register file
Comb. Logic Elements
• Adder
– 32-bit inputs A and B, plus CarryIn
– Outputs: 32-bit Sum and Carry
• ALU
– 32-bit inputs A and B; OP selects the operation
– Output: 32-bit Result
• Mux
– 32-bit inputs A and B; Select chooses one of them
– Output: 32-bit Y
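As a sketch, these three combinational elements can be modeled as pure functions of their inputs, with no stored state. The function names, the op strings "add", "sub", and "or" (borrowed from the ALUOp values later in this chapter), and the mux select polarity are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF  # keep every value to 32 bits

def adder(a, b, carry_in=0):
    """32-bit adder: returns (Sum, Carry) for unsigned inputs A, B, CarryIn."""
    total = (a & MASK32) + (b & MASK32) + (carry_in & 1)
    return total & MASK32, total >> 32

def alu(a, b, op):
    """32-bit ALU limited to the operations used in this chapter."""
    if op == "add":
        return (a + b) & MASK32
    if op == "sub":
        return (a - b) & MASK32  # two's-complement wraparound
    if op == "or":
        return (a | b) & MASK32
    raise ValueError(f"unsupported ALU op: {op}")

def mux(a, b, select):
    """2-to-1 mux: Select = 0 passes A, Select = 1 passes B (assumed polarity)."""
    return b if select else a
```

For example, `adder(0xFFFFFFFF, 1)` wraps the Sum to 0 and sets Carry to 1.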
Storage Elements
• Register – for PC
– N-bit Data In and Data Out, Write Enable, Clk
• Register file – 32 registers (32 32-bit Registers)
– 2 read ports/buses: busA and busB (32 bits each), addressed by 5-bit RA and RB
– 1 write port/bus: busW (32 bits), addressed by 5-bit RW, gated by Write Enable, Clk
• Memory
– 1 input bus (DataIn, 32 bits)
– 1 output bus (DataOut, 32 bits)
– Not bidirectional
– Address input, Write Enable, Clk
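A minimal sketch of the register file's port structure, assuming reads are combinational and the single write happens only at a clock edge when Write Enable is set. The class and method names are hypothetical, and hardwiring register 0 to zero is the MIPS convention, not something stated on this slide.

```python
class RegisterFile:
    """32 x 32-bit registers with 2 read ports (busA, busB) and 1 write port (busW)."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra, rb):
        """Combinational read: returns (busA, busB) for 5-bit addresses RA, RB."""
        return self.regs[ra & 0x1F], self.regs[rb & 0x1F]

    def clock_edge(self, rw, bus_w, write_enable):
        """Sequential write: R[RW] <- busW at the clock edge, only if enabled.
        Register 0 stays zero (assumed MIPS convention)."""
        if write_enable and (rw & 0x1F) != 0:
            self.regs[rw & 0x1F] = bus_w & 0xFFFFFFFF
```

Usage: after `rf.clock_edge(3, 42, 1)`, `rf.read(3, 0)` returns `(42, 0)`; a write with Write Enable = 0 changes nothing.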
Computer as State Machine
• Storage elements
– Memory, Register file, PC
• Combinational elements
– ALUs, Adders, Muxes
[Figure: storage elements feed combinational logic, whose outputs are written back into storage]
Processor Implementation
• This Lecture: CPU Datapath
[Figure: Input, Output, Memory, and the CPU (Control + Datapath)]
Processor Implementation
Forward Pointer Alert
[Figure: the three performance factors: Instr. count, CPI, and Cycle time]
• Implementation determines
– How many Clocks Per Instruction (CPI)
– How long the clock cycle is (Cycle time)
• ISA and compiler determine
– How many instructions are in a program (Instr. count)
– We will cover these in Ch 1b (after Ch 4a)
• For now: the implementation determines execution time, which measures performance
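Looking ahead to Ch 1b, the three factors multiply: execution time = Instr. count x CPI x Cycle time. The sketch below also computes the CPI of the lab's "mostly single-cycle" design, assuming the lowest memory latency (1 cycle for most instructions, 2 for ld/st). The function names, the 25% ld/st mix, and the 10 ns cycle are made-up values for illustration.

```python
def cpi_mostly_single_cycle(ldst_fraction):
    """CPI when ld/st take 2 cycles and all other instructions take 1
    (assumes the lowest memory latency)."""
    return (1 - ldst_fraction) * 1 + ldst_fraction * 2

def execution_time_s(instr_count, cpi, cycle_time_ns):
    """Execution time = Instr. count x CPI x Cycle time, in seconds."""
    return instr_count * cpi * cycle_time_ns * 1e-9

cpi = cpi_mostly_single_cycle(0.25)          # 1.25
t = execution_time_s(1_000_000, cpi, 10)     # about 0.0125 s
```

Halving any one factor halves the execution time, which is why later chapters attack cycle time and CPI separately.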
Datapath – Single cycle
• Assumption: get one whole instruction done in one long clock cycle
– fetch, decode/read operands, execute, memory, writeback
– 5 steps you should NEVER forget!
• RTL: a useful way to represent the steps and identify the required datapath elements
– For a single instruction
• Put it together
[Figure: complete single-cycle datapath. The PC and next-address logic (Adders, PC Ext, nPC_sel, PC Mux) drive the Inst Memory; Instruction<31:0> supplies Rs <21:25>, Rt <16:20>, Rd <11:15>, and imm16 <0:15>. The register file (32 32-bit Registers, RegWr) reads busA and busB; a RegDst mux picks Rd or Rt as Rw; the Extender (ExtOp) and an ALUSrc mux feed the ALU (ALUctr); Equal compares busA and busB; the Data Memory (MemWr) and a MemtoReg mux drive busW.]
You will design this!!
Caveat
• The processor you will design in the lab will NOT be exactly the same as the processor described in lectures
• So blindly copying the design on the lecture slides WON'T work in the lab
• This helps me (and you) know whether you understand the material
Outline
• Datapath – single cycle
– single instruction, 2’s complement, unsigned
• Control
Register Transfer Language
• RTL gives the meaning of the instructions
• All start by fetching the instruction
op | rs | rt | rd | shamt | funct = MEM[ PC ]
op | rs | rt | Imm16 = MEM[ PC ]
inst    Register Transfers
ADDU    R[rd] <– R[rs] + R[rt];                    PC <– PC + 4
SUBU    R[rd] <– R[rs] – R[rt];                    PC <– PC + 4
ORi     R[rt] <– R[rs] OR zero_ext(Imm16);         PC <– PC + 4
LOAD    R[rt] <– MEM[ R[rs] + sign_ext(Imm16) ];   PC <– PC + 4
STORE   MEM[ R[rs] + sign_ext(Imm16) ] <– R[rt];   PC <– PC + 4
BEQ     if ( R[rs] == R[rt] ) then PC <– PC + 4 + (sign_ext(Imm16) || 00)
        else PC <– PC + 4
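To make the RTL concrete, here is a hedged sketch that applies these register transfers for one instruction. It assumes the fields are already decoded into a hypothetical `(name, rs, rt, rd, imm16)` tuple, models MEM as a dict of 32-bit words keyed by byte address, and implements `|| 00` as a left shift by 2. `execute` updates R and MEM in place and returns the next PC.

```python
MASK32 = 0xFFFFFFFF

def sign_ext(imm16):
    """Sign-extend a 16-bit value to a 32-bit unsigned pattern."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16

def zero_ext(imm16):
    """Zero-extend a 16-bit value to 32 bits."""
    return imm16 & 0xFFFF

def execute(inst, pc, R, MEM):
    """Apply one instruction's register transfers; return the next PC."""
    name, rs, rt, rd, imm16 = inst
    next_pc = (pc + 4) & MASK32
    if name == "ADDU":
        R[rd] = (R[rs] + R[rt]) & MASK32
    elif name == "SUBU":
        R[rd] = (R[rs] - R[rt]) & MASK32
    elif name == "ORI":
        R[rt] = R[rs] | zero_ext(imm16)
    elif name == "LOAD":
        R[rt] = MEM.get((R[rs] + sign_ext(imm16)) & MASK32, 0)
    elif name == "STORE":
        MEM[(R[rs] + sign_ext(imm16)) & MASK32] = R[rt]
    elif name == "BEQ":
        if R[rs] == R[rt]:
            # PC + 4 + (sign_ext(Imm16) || 00), wrapped to 32 bits
            next_pc = (next_pc + (sign_ext(imm16) << 2)) & MASK32
    else:
        raise ValueError(f"unknown instruction {name}")
    return next_pc
```

Note how every instruction shares the same PC + 4 default; only BEQ can override it, which matches the table.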
A Simple Implementation
• ADD and SUB
– addU rd, rs, rt
– subU rd, rs, rt
R-format:  op (bits 31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | rd (15-11, 5 bits) | shamt (10-6, 5 bits) | funct (5-0, 6 bits)
• OR Immediate:
– ori rt, rs, imm16
I-format:  op (bits 31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | immediate (15-0, 16 bits)
• LDW/STW
– lw rt, rs, imm16
– sw rt, rs, imm16
(same I-format: op | rs | rt | immediate)
• BRANCH:
– beq rs, rt, imm16
(same I-format: op | rs | rt | immediate)
• How does the hardware know the values for op, rs, rt, rd for a given instruction?
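The fields sit at fixed bit positions in every format, so the hardware can simply wire instruction bits to the register file and control; the control logic then decides which fields matter for a given op. A small sketch of that slicing follows; the `decode` name and the dict layout are assumptions.

```python
def decode(word):
    """Slice a 32-bit instruction word into its fixed-position fields.
    Every field is extracted for every word; control picks the relevant ones."""
    return {
        "op":     (word >> 26) & 0x3F,   # bits 31-26
        "rs":     (word >> 21) & 0x1F,   # bits 25-21
        "rt":     (word >> 16) & 0x1F,   # bits 20-16
        "rd":     (word >> 11) & 0x1F,   # bits 15-11
        "shamt":  (word >> 6) & 0x1F,    # bits 10-6
        "funct":  word & 0x3F,           # bits 5-0
        "imm16":  word & 0xFFFF,         # bits 15-0 (I-format)
        "target": word & 0x3FFFFFF,      # bits 25-0 (J-format)
    }
```

Under the standard MIPS encoding, `0x00221821` is `addu $3, $1, $2` (op 0, rs 1, rt 2, rd 3, funct 0x21), and `0x342200FF` is an ori with op 0x0D, rs 1, rt 2, imm16 0xFF.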
Fetch Instructions
• Fetch instruction, then update PC
• PC updated (at the end of) every cycle
– What if no branches or jumps?
[Figure: the PC (clocked by Clk) drives the Address input of the Instruction Memory, which returns the 32-bit Instruction Word; Next Address Logic (using the constant 4) computes the next PC]
ALU Instructions
• R[rd] <- R[rs] op R[rt]  Example: addU rd, rs, rt
– Ra, Rb, and Rw come from the instruction's rs, rt, and rd fields
– ALU Operation and RegWr: set by control logic after decoding the instruction
[Figure: R-format fields (op 6 bits | rs 5 | rt 5 | rd 5 | shamt 5 | funct 6). Rs, Rt, Rd (5 bits each) drive Ra, Rb, Rw of the 32 32-bit Registers; busA and busB (32 bits) feed the ALU (ALU Operation); the 32-bit Result returns on busW and is written at Clk when RegWr is set]
Logical operation with Immediate
• R[rt] <- R[rs] op ZeroExt[imm16]
• rd? The destination is rt, not rd, so a RegDst mux chooses Rw from Rd or Rt
• ZeroExt: bits 31-16 are filled with zeros (0000000000000000 | immediate)
[Figure: I-format fields (op 6 bits | rs 5 | rt 5 | immediate 16 bits). The RegDst mux selects Rw; Rs drives Ra; busA feeds the ALU; the ALUSrc mux selects busB or ZeroExt(imm16) as the second ALU input; the 32-bit Result returns on busW]
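A sketch of the 16-to-32-bit extender with its ExtOp control, using the encoding from the control slides later in this chapter (ExtOp = 0 zero-extends for ORI, ExtOp = 1 sign-extends for lw/sw). The function name is hypothetical.

```python
def extender(imm16, ext_op):
    """16->32-bit extender. ext_op = 0: zero-extend; ext_op = 1: sign-extend."""
    imm16 &= 0xFFFF
    if ext_op and (imm16 & 0x8000):
        return imm16 | 0xFFFF0000   # replicate bit 15 into bits 31-16
    return imm16                    # bits 31-16 are zero
```

The difference shows up whenever bit 15 is set: ORing `0x12340000` with `extender(0x8001, 0)` gives `0x12348001`, while sign extension would give `0xFFFF8001` and clobber the upper half, which is why ORI needs zero extension.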
Load Instruction
• R[rt] <- Mem[R[rs] + SignExt[imm16]]  Example: lw rt, imm16(rs)
[Figure: I-format fields (op | rs | rt | immediate). The RegDst mux selects Rt as Rw; busA and the extended imm16 (Extender with ExtOp, then the ALUSrc mux) feed the ALU, whose result is the Data Memory address (Adr); the MemtoReg mux puts the memory output on busW]
Store Instruction
• Mem[ R[rs] + SignExt[imm16] ] <- R[rt]  Example: sw rt, imm16(rs)
[Figure: the address is formed as for the load (busA + extended imm16 through the ALU to Adr); busB (R[rt]) drives Data In, and MemWr enables the write]
Single cycle timing - ideal
[Figure: CPU clock waveform. In each cycle the CPU fetches and executes one instruction (ith, i+1th, i+2th) and updates PC/reg/mem at the next edge]
• Like any old state machine
Memory
• Used for instruction fetch and for data access (ld/st)
• Memory is big and slow
– takes 300 CPU clocks today! (ch5)
• Memory can allow only one of instruction fetch or ld/st data access at any given time, so we need to arbitrate between I-fetch and D-access
– We will alleviate this in ch5 but will still need arbitration
Memory
• While one could do both the I-fetch and the D-access one after the other in one (slow) CPU clock, our specific FPGA memory needs two clocks: one each for I-fetch and D-access
– So our lab design is a "mostly single-cycle CPU": one cycle for all instructions except ld and st, and two cycles for ld/st
– Still, you need arbitration so that memory does either an I-fetch or a D-access at a time
Single-cycle timing – lab FPGA
• No ld/st among instructions
CPU clock: | I-request & PC, ith instruction | I-request & PC, i+1th instruction | I-request & PC, i+2th instruction |
• With a ld or st
CPU clock: | I-request & PC, ith instruction | D-request & data address, ith instruction | I-request & PC, i+1th instruction |
Memory arbitration
• How is this timing realized? Via a request-ready handshake + arbitration
• The CPU has a "request block" where it generates an I-request (for I-fetch) and a D-request (for ld/st of data): non-ld/st instructions generate only an I-request, and ld/st generate both
– Why both? It is simpler; otherwise the I-fetch would have to be delayed only for ld/st
• The requests go to the "Arbiter block"
Memory arbitration
• The arbiter block chooses between the I-request and the D-request if both are active (ld/st), OR the I-request if the D-request is not active (non-ld/st)
• If both are active,
– the D-request is for the current instruction and the I-request is for the next instruction
– the D-request gets priority, so the current instruction completes before going to the next
• The chosen request accesses memory
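The arbitration rule above is small enough to state as a combinational function. This is a sketch only; the function name and the "D"/"I"/None return values are assumptions, not the lab's interface.

```python
def arbitrate(i_request, d_request):
    """Combinational arbiter: the D-request (current instruction) wins over
    the I-request (next instruction); None means memory is idle."""
    if d_request:
        return "D"
    if i_request:
        return "I"
    return None
```

During a ld/st both requests are active, so `arbitrate(True, True)` returns `"D"`; for every other instruction only the I-request is up and the fetch proceeds.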
CPU-Memory Interface
[Figure: the CPU's Request block sends I-request and D-request to the Memory system, whose Arbiter/controller grants memory access and returns I-ready and D-ready]
• The request block is in the CPU and the arbiter is in the memory system: they are SEPARATE
• Do not break this interface by merging the request block and the arbiter, else you will suffer later
One-minute quiz
• During the execution of I-format instructions, what is done to the immediate field before being fed to the ALU?
• Previous quiz
– In stored program computers, what is the difference between how code and data are handled (i.e., read from/written to memory and manipulated)?
• None
Memory arbitration
• When an access completes (I or D), the arbiter/controller asserts the corresponding ready (I-ready or D-ready)
– This is how the CPU knows the access is complete
– This is important in 3 slides
• IMPORTANT: once a D-access is complete, de-assert the D-request
– Else the arbiter will keep choosing the D-request (which has priority) and will not let the next I-fetch happen
• The arbiter/controller is combinational, but the CPU-memory interface is sequential
– I-/D-request and ready are sensed only at the clock edge
Memory arbitration – Detail 1
• For ld/st, the instruction is latched within the memory system WHILE the data access occurs (else the instruction would vanish while the data access occurs!)
– During the middle clock of the earlier ld/st timing diagram
– This latching is done for you in the code given to you
Memory arbitration- Detail 2
• The lab has two clocks: the CPU clock and the RAM clock
– The CPU clock is the main clock
– The RAM clock is an internal detail to be ignored
– The RAM clock happens to be 2x the CPU clock
• Does not match reality
• Done to make our specific FPGA work for the mostly single-cycle CPU
• Fixed in later slides
Memory arbitration- Detail 3
• IGNORE any "coherence"-related signals (starting with "cc") and the "datomic" operation, which will come later for multicore
• "I-hit" and "D-hit" are the SAME as I-ready and D-ready
– The term "hit" will come later for caches
Memory arbitration
• In the lab, memory has variable latency, so your design should work at different latencies (important in the real world)
– Mostly single-cycle is at the lowest latency
– Longer latencies mean multiple clocks per I-fetch or D-access
– You MAY NOT assume memory will complete within one cycle; you MUST wait for I-ready or D-ready
– I-ready and D-ready arrive in one cycle at the lowest latency and in more cycles at longer latencies
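To see why waiting for ready (and de-asserting the D-request) matters, here is a cycle-level sketch under simplifying assumptions: instructions are abstracted to the hypothetical kinds `"alu"` (no data access) and `"ldst"`, every access takes the same `latency` cycles, and the request block always holds an I-request for the next fetch. It is a model of the handshake, not the lab's code.

```python
def simulate(program, latency):
    """Count CPU cycles for a program under the request/ready handshake.

    program: list of "alu" or "ldst"; latency: cycles per memory access (>= 1).
    The CPU advances only when ready is seen at a clock edge.
    """
    cycles = 0
    pc = 0              # index of the current instruction
    i_request = True    # request block always asks for the next I-fetch
    d_request = False   # raised only while a ld/st needs its data access
    busy = 0            # cycles spent so far on the access being served
    while pc < len(program):
        cycles += 1
        grant = "D" if d_request else ("I" if i_request else None)  # D has priority
        busy += 1
        if busy < latency:
            continue            # no ready yet: hold requests and wait
        busy = 0                # ready asserted at this clock edge
        if grant == "I":        # I-ready: the instruction has been fetched
            if program[pc] == "ldst":
                d_request = True    # data access for the current instruction
            else:
                pc += 1             # non-ld/st instruction completes
        elif grant == "D":      # D-ready: data access complete
            d_request = False       # de-assert, or D would keep winning
            pc += 1
    return cycles
```

At latency 1, `["alu", "ldst", "alu"]` takes 4 cycles (1 + 2 + 1), matching the mostly single-cycle timing; at latency 3 the same program takes 12. Removing the `d_request = False` line makes the loop never finish, which is the hang the slide warns about.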
Why?
• Variable latency checks whether the design is timing-independent, an important concept
• Single memory, so we don't change from two memories (in single cycle) to one memory (in pipelining): easier for you
• GENERAL: test and debug every block BOTH separately AND after merging it with the rest of the design BEFORE going to the next block; otherwise you will have a mess that can't be debugged
ORI
• ORI is not in the book
• ORI shows that some instructions need zero extension instead of sign extension
– Logical vs. arithmetic ops
• What will change in the datapath if ORI is absent?
Conditional Branch Instruction
I-format:  op (bits 31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | immediate (15-0, 16 bits)
• beq rs, rt, imm16
– IR = Mem[PC]  // Fetch the instruction from memory
– Equal <- R[rs] == R[rt]  // Calculate the branch condition
– if (Equal == 1)  // Calculate the next instruction's address
• PC <- PC + 4 + (SignExt(imm16) << 2)
– else
• PC <- PC + 4
What is this?
• Branches compute TWO things: the branch condition and the branch target
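A sketch of the two things a branch computes, the condition (Equal) and the target, followed by the PC select. The function names are assumptions. The `<< 2` appears because imm16 counts instructions (4-byte words), so it must be scaled to a byte offset.

```python
MASK32 = 0xFFFFFFFF

def sign_ext16(imm16):
    """Interpret a 16-bit field as a signed Python int."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def beq_next_pc(pc, rs_val, rt_val, imm16):
    """Next PC for beq: condition and target, then a 2-way select."""
    equal = rs_val == rt_val                                    # branch condition
    pc_plus_4 = (pc + 4) & MASK32
    target = (pc_plus_4 + (sign_ext16(imm16) << 2)) & MASK32    # branch target
    return target if equal else pc_plus_4
```

With PC = 0x1000, a taken branch with imm16 = 3 goes to 0x1010, and imm16 = 0xFFFF (-1) branches back to 0x1000 itself.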
Datapath for ‘beq’
• beq rs, rt, imm16
– Datapath generates branch condition (Equal)
[Figure: I-format fields (op | rs | rt | immediate). Rs and Rt read busA and busB from the 32 32-bit Registers; a comparator on busA and busB produces Equal; next-address logic adds 4 to the Inst Address and, for a taken branch, adds the sign-extended imm16 shifted left by 2; PCSrc selects the next PC]
[Figure: complete single-cycle datapath with branch support: the = comparator on busA and busB produces Equal; PCSrc selects PC + 4 or the branch target (PC + 4 plus the sign-extended imm16 shifted left 2); other controls as before (RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg)]
Summary
• To add instructions
– For a given instruction
• Describe the operation in RTL
• Use ALUs, registers, memory, and adders to achieve the required functionality
– Rinse and repeat; reuse components via muxes
– Not all blocks are used by all instructions, so you need to ensure unused blocks don't interfere
• Control: next
– Selection controls for muxes
– ALU controls for ALU ops
– Register address controls
– Write enables for registers/memory
Exercise
• Add the jump instruction to the single-cycle datapath
– j Addr
– RTL
• PC <- (PC+4)[31:28] || Addr || 00
J-type format:  op (bits 31-26) | target address (bits 25-0)
• See worksheet #1
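The jump RTL above is a pure bit concatenation. A sketch (the function name is an assumption) that forms the new PC from the upper 4 bits of PC + 4, the 26-bit Addr field, and two zero bits:

```python
def jump_target(pc, target26):
    """PC <- (PC+4)[31:28] || Addr || 00 for j Addr."""
    upper4 = ((pc + 4) & 0xFFFFFFFF) & 0xF0000000   # bits 31-28 of PC + 4
    return upper4 | ((target26 & 0x3FFFFFF) << 2)    # Addr, then 00
```

Because the upper bits come from PC + 4 rather than PC, a jump in the last word of a 256 MB region (e.g. PC = 0x1FFFFFFC) lands in the next region: `jump_target(0x1FFFFFFC, 1)` is `0x20000004`.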
Exercise
[Figure: next-address logic to extend for jump (J-type: op | target address): PC + 4 supplies bits 31-28, Inst bits 25-0 supply the target address, and PCSrc selects the next PC]
Exercise
• See worksheet (Fig 4.15)
• Highlight the active datapath for
– Add
– Beq
– Sw
– Lw
[Figure: single-cycle datapath with control points PCSrc, RegDst, RegWr, ExtOp, ALUSrc, ALUOp, MemWr, and MemtoReg]
Control for Datapath
[Figure: Instruction<31:0> from the Inst Memory supplies Op, Fun, Rs, Rt, Rd, and Imm16; Control decodes Op and Fun (plus Equal from the DATA PATH) into PCSrc, RegWr, RegDst, ExtOp, ALUSrc, ALUOp, MemWr, and MemtoReg]
Controls for Add Operation
• R[rd] = R[rs] + R[rt]
– PCSrc = +4
– RegDst = 1
– RegWr = 1
– ALUSrc = 0
– ALUOp = Add
– ExtOp = x
– MemWr = 0
– MemtoReg = 0
[Figure: Instruction Fetch Unit and datapath annotated with these control values]
Meaning of Control Signals
• rs, rt, rd and imm16 hardwired in datapath
• PCSrc: 0 => PC <– PC + 4; 1 => PC <– PC + 4 + (SignExt(Imm16) || 00)
• ExtOp: "zeroext", "signext"
• ALUSrc: 0 => regB; 1 => immed
• ALUOp: "add", "sub", "or"
• MemWr: 1 => write memory
• MemtoReg: 0 => ALU result; 1 => Mem
• RegDst: 0 => "rt"; 1 => "rd"
• RegWr: 1 => write dest register
[Figure: Instruction Fetch Unit and datapath labeled with these control points]
ORI Controls: Worksheet
• R[rt] <- R[rs] or ZeroExt[Imm16]
– PCSrc =
– RegDst =
– RegWr =
– ALUSrc =
– ALUOp =
– ExtOp =
– MemWr =
– MemtoReg =
[Figure: Instruction Fetch Unit and datapath to annotate]
ORI Controls: Solution
• R[rt] <- R[rs] or ZeroExt[Imm16]
– PCSrc = +4
– RegDst = 0
– RegWr = 1
– ALUSrc = 1
– ALUOp = Or
– ExtOp = 0
– MemWr = 0
– MemtoReg = 0
[Figure: Instruction Fetch Unit and datapath annotated with these control values]
LW Controls
• R[rt] <- Data Memory {R[rs] + SignExt[imm16]}
– PCSrc = +4
– RegDst = 0
– RegWr = 1
– ALUSrc = 1
– ALUOp = Add
– ExtOp = 1
– MemWr = 0
– MemtoReg = 1
[Figure: Instruction Fetch Unit and datapath annotated with these control values]
SW Controls: Worksheet
• R[rt] -> Data Memory {R[rs] + SignExt[imm16]}
– PCSrc =
– RegDst =
– RegWr =
– ALUSrc =
– ALUOp =
– ExtOp =
– MemWr =
– MemtoReg =
[Figure: Instruction Fetch Unit and datapath to annotate]
SW Controls: Solution
• R[rt] -> Data Memory {R[rs] + SignExt[imm16]}
– PCSrc = +4
– RegDst = x
– RegWr = 0
– ALUSrc = 1
– ALUOp = Add
– ExtOp = 1
– MemWr = 1
– MemtoReg = x
[Figure: Instruction Fetch Unit and datapath annotated with these control values]
BEQ Controls
• if (R[rs] - R[rt] == 0) then Equal <- 1; else Equal <- 0
– PCSrc = "Beq AND Equal"
– RegDst = x
– RegWr = 0
– ALUSrc = 0
– ALUOp = Subtract
– ExtOp = x
– MemWr = 0
– MemtoReg = x
[Figure: Instruction Fetch Unit and datapath annotated with these control values]
Summary of Control Signals
inst    Register Transfer
ADD     R[rd] <– R[rs] + R[rt]; PC <– PC + 4
        ALUsrc = RegB, ALUOp = "add", RegDst = rd, RegWr, PCSrc = "+4"
SUB     R[rd] <– R[rs] – R[rt]; PC <– PC + 4
        ALUsrc = RegB, ALUOp = "sub", RegDst = rd, RegWr, PCSrc = "+4"
ORi     R[rt] <– R[rs] OR zero_ext(Imm16); PC <– PC + 4
        ALUsrc = Im, Extop = "Z", ALUOp = "or", RegDst = rt, RegWr, PCSrc = "+4"
LOAD    R[rt] <– MEM[ R[rs] + sign_ext(Imm16) ]; PC <– PC + 4
        ALUsrc = Im, Extop = "Sign", ALUOp = "add", MemtoReg, RegDst = rt, RegWr, PCSrc = "+4"
STORE   MEM[ R[rs] + sign_ext(Imm16) ] <– R[rt]; PC <– PC + 4
        ALUsrc = Im, Extop = "Sign", ALUOp = "add", MemWr, PCSrc = "+4"
BEQ     if ( R[rs] == R[rt] ) then PC <– PC + 4 + (sign_ext(Imm16) || 00) else PC <– PC + 4
        ALUsrc = RegB, PCSrc = "Beq AND Equal", ALUOp = "sub"
Control Logic
• Logic must generate appropriate signals for all instructions
• Summary slide (previous)
– A way of representing the truth table
• Till now: instruction => signals; next: transpose to signal => instructions
– First: equations in terms of opcodes
– Next: equations in terms of instruction bits
Controls: Logic equations
• PCSrc <= if (OP == BEQ) then Equal else 0
• ALUsrc <= if (OP == "R-type") then "regB" elseif (OP == BEQ) then "regB" else "imm"
• ALUOp <= if (OP == "R-type") then funct elseif (OP == ORi) then "OR" elseif (OP == BEQ) then "sub" else "add"
• ExtOp <= _____________
• MemWr <= _____________
• MemtoReg <= _____________
• RegWr <= _____________
• RegDst <= _____________
Controls: Logic equations
• PCSrc <= if (OP == BEQ) then Equal else 0
• ALUsrc <= if (OP == "R-type") then "regB" elseif (OP == BEQ) then "regB" else "imm"
• ALUOp <= if (OP == "R-type") then funct elseif (OP == ORi) then "OR" elseif (OP == BEQ) then "sub" else "add"
• ExtOp <= if (OP == ORi) then "zeroext" else "signext"
• MemWr <= (OP == Store)
• MemtoReg <= (OP == Load)
• RegWr <= if ((OP == Store) || (OP == BEQ)) then 0 else 1
• RegDst <= if ((OP == Load) || (OP == ORi)) then 0 else 1
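These equations translate almost line for line into code. A sketch of the main control for the six instructions, using the opcodes from the truth-table slide; the function name and the symbolic values ("regB", "imm", "zeroext", ...) mirror the slides but are otherwise assumptions.

```python
# 6-bit op field values from the truth-table slide
R_TYPE, ORI, LW, SW, BEQ = 0b000000, 0b001101, 0b100011, 0b101011, 0b000100

def main_control(op, funct, equal):
    """Control signals for R-type, ori, lw, sw, beq, straight from the equations."""
    return {
        "PCSrc":    equal if op == BEQ else 0,
        "ALUSrc":   "regB" if op in (R_TYPE, BEQ) else "imm",
        "ALUOp":    funct if op == R_TYPE else "or" if op == ORI
                    else "sub" if op == BEQ else "add",
        "ExtOp":    "zeroext" if op == ORI else "signext",
        "MemWr":    int(op == SW),
        "MemtoReg": int(op == LW),
        "RegWr":    0 if op in (SW, BEQ) else 1,
        "RegDst":   0 if op in (LW, ORI) else 1,
    }
```

For lw this yields ALUSrc = "imm", ALUOp = "add", ExtOp = "signext", MemtoReg = 1, RegWr = 1, RegDst = 0, matching the LW controls slide. The equations only cover these six instructions: an unlisted opcode such as jump would get RegWr = 1, whereas the truth table that follows marks it 0.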
Truth Table summary
            add      sub      ori      lw       sw       beq      jump
func        10 0000  10 0010  We Don't Care 🙂
op          00 0000  00 0000  00 1101  10 0011  10 1011  00 0100  00 0010
RegDst      1        1        0        0        x        x        x
ALUSrc      0        0        1        1        1        0        x
MemtoReg    0        0        0        1        x        x        x
RegWrite    1        1        1        1        0        0        0
MemWrite    0        0        0        0        1        0        0
Beq         0        0        0        0        0        1        0
Jump        0        0        0        0        0        0        1
ExtOp       x        x        0        1        1        x        x
ALUOp<2:0>  Add      Subtract Or       Add      Add      Subtract xxx
See Appendix
R-type:  op (31-26) | rs (25-21) | rt (20-16) | rd (15-11) | shamt (10-6) | funct (5-0)   add, sub
I-type:  op (31-26) | rs (25-21) | rt (20-16) | immediate (15-0)                         ori, lw, sw, beq
J-type:  op (31-26) | target address (25-0)                                              jump
Local vs Global Control
• One more layer of abstraction
• ALUOp <= if (OP == "R-type") then funct elseif (OP == ORi) then "OR" elseif (OP == BEQ) then "sub" else "add"
R-type:  op (31-26) | rs (25-21) | rt (20-16) | rd (15-11) | shamt (10-6) | funct (5-0)
Global Control: Truth Table
op          00 0000  00 1101  10 0011  10 1011  00 0100  00 0010
            R-type   ori      lw       sw       beq      jump
RegDst      1        0        0        x        x        x
ALUSrc      0        1        1        1        0        x
MemtoReg    0        0        1        x        x        x
RegWrite    1        1        1        0        0        0
MemWrite    0        0        0        1        0        0
Beq         0        0        0        0        1        0
Jump        0        0        0        0        0        1
ExtOp       x        0        1        1        x        x
ALUg        "R-type" Or       Add      Add      Subtract xxx
[Figure: Main Control decodes op (6 bits) into these signals plus ALUg (N bits); ALU Control (Local) combines ALUg with func (6 bits) to produce ALUOp (3 bits)]
Encoding
• In this exercise, ALUg has to be 2 bits wide to represent:
– (1) "R-type" instructions
– "I-type" instructions that require the ALU to perform:
• (2) Or, (3) Add, and (4) Subtract
• To implement the full MIPS ISA, ALUg has to be 3 bits wide to represent:
– (1) "R-type" instructions
– "I-type" instructions that require the ALU to perform:
• (2) Or, (3) Add, (4) Subtract, and (5) And (Example: andi)
                   R-type   ori   lw    sw    beq       jump
ALUg (Symbolic)    "R-type" Or    Add   Add   Subtract  xxx
ALUg<2:0>          1 00     0 10  0 00  0 00  0 01      xxx
[Figure: Main Control (op, 6 bits) produces ALUg (N bits), which ALU Control (Local) combines with func (6 bits) to produce ALUOp (3 bits)]
Global Control: Truth Table
op          00 0000   00 1101   10 0011   10 1011   00 0100         00 0010
            R-type    ori       lw        sw        beq             jump
RegDst      1         0         0         x         x               x
ALUSrc      0         1         1         1         0               x
MemtoReg    0         0         1         x         x               x
RegWrite    1         1         1         0         0               0
MemWrite    0         0         0         1         0               0
Beq         0         0         0         0         1               0
Jump        0         0         0         0         0               1
ExtOp       x         0         1         1         x               x
ALUg<2:0>   "R-type"  Or        Add       Add       Subtract        xxx
            (100)     (010)     (000)     (000)     (001)
[Figure: Main Control decodes op (6 bits) into these signals plus ALUg (3 bits); ALU Control (Local) combines ALUg with func (6 bits) to produce ALUOp (3 bits)]
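A sketch of the local ALU control: main control only says "R-type" (ALUg = 100) or names the operation directly, and the local decoder consults the funct field only in the R-type case. The ALUg codes come from the table above; the add/sub funct codes come from the truth-table slide, while mapping addu/subu (0b100001, 0b100011, the standard MIPS codes) to the same operations is an assumption. ALUOp is returned symbolically because the slides do not give its bit encoding.

```python
# ALUg<2:0> encodings from the table above
ALUG_RTYPE, ALUG_OR, ALUG_ADD, ALUG_SUB = 0b100, 0b010, 0b000, 0b001

# funct -> ALU operation (add/sub from the slides; addu/subu assumed equivalent)
FUNCT_TO_OP = {0b100000: "add", 0b100001: "add", 0b100010: "sub", 0b100011: "sub"}

def alu_control(alug, funct):
    """Local ALU control: symbolic ALUOp from ALUg and Instr<5:0>."""
    if alug == ALUG_RTYPE:
        return FUNCT_TO_OP[funct]    # R-type: the funct field decides
    # I-type: main control already named the operation; funct is ignored
    return {ALUG_OR: "or", ALUG_ADD: "add", ALUG_SUB: "sub"}[alug]
```

The payoff of this split is that main control only needs the 6-bit op, and the wide funct decode stays local to the ALU.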
Truth Table for RegWrite
• RegWrite = R-type + ori + lw
= !op<5> & !op<4> & !op<3> & !op<2> & !op<1> & !op<0>   (R-type)
+ !op<5> & !op<4> & op<3> & op<2> & !op<1> & op<0>       (ori)
+ op<5> & !op<4> & !op<3> & !op<2> & op<1> & op<0>       (lw)
op          00 0000  00 1101  10 0011  10 1011  00 0100  00 0010
            R-type   ori      lw       sw       beq      jump
RegWrite    1        1        1        0        0        0
[Figure: one AND gate per instruction over op<5> ... op<0> (R-type, ori, lw, sw, beq, jump); the R-type, ori, and lw terms are ORed to form RegWrite]
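The sum-of-products above can be checked directly against the truth table. A sketch in which `b[i]` stands for op<i> and `n[i]` for its complement; the helper names are assumptions.

```python
def reg_write(op):
    """RegWrite = R-type + ori + lw as a sum of products over op<5:0>."""
    b = [(op >> i) & 1 for i in range(6)]   # b[i] == op<i>
    n = [1 - x for x in b]                  # complemented inputs (!op<i>)
    r_type = n[5] & n[4] & n[3] & n[2] & n[1] & n[0]   # 00 0000
    ori    = n[5] & n[4] & b[3] & b[2] & n[1] & b[0]   # 00 1101
    lw     = b[5] & n[4] & n[3] & n[2] & b[1] & b[0]   # 10 0011
    return r_type | ori | lw
```

Each product term is true for exactly one opcode, which is also how the AND plane of the PLA on the next slide is organized.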
PLA implementation
[Figure: the AND plane over op<5> ... op<0> forms one product term per instruction (R-type, ori, lw, sw, beq, jump); the OR plane combines them into RegWrite, ALUSrc, RegDst, MemtoReg, MemWrite, Branch, Jump, ExtOp, ALUop<2>, ALUop<1>, ALUop<0>]
PLA Representation
[Figure: PLA representation of the main control]
Putting it all together
[Figure: Main Control decodes op (Instr<31:26>, 6 bits) into RegDst, ALUSrc, MemtoReg, RegWr, MemWr, ExtOp, PCSrc (with Zero), and ALUg (3 bits); the local ALU Control combines ALUg with func (Instr<5:0>, 6 bits) to produce ALUOp (3 bits); these signals drive the datapath (Instruction Fetch Unit, 32 32-bit Registers, Extender on Instr<15:0>, ALU, Data Memory)]
[Figure: Storage -> Comb. Logic -> Storage; the cycle time spans the clk-to-Q of the first storage element, the comb. logic delay, and the setup time of the next]
• What should the clock period be?
– Enough to compute the next state values
• Propagation clk-to-Q (new state)
• Comb. logic delay
• Setup requirements
Timing: R-type inst
[Figure: timing diagram for an R-type instruction. After the Clk edge: PC gets its New Value after Clk-to-Q; Rs, Rt, Rd, Op, Func after the Instruction Memory Access Time; ALUOp and RegWr after the Delay through Control Logic; busA and busB after the Register File Access Time; busW (Result) after the ALU Delay. The register write occurs at the next Clk edge]
One-minute quiz
• What is the ALU operation for loads and stores?
• Previous quiz
– During execution of I-format instructions, what is done to the immediate field before being fed to the ALU?
• Zero- or sign-extension
“lw” Instruction
• Longer critical path
– lower bound on cycle time
Critical Path (Load Operation) =
PC's Clk-to-Q +
Instruction Memory's Access Time +
Register File's Access Time +
ALU to Perform a 32-bit Add +
Data Memory Access Time +
Setup Time for Register File Write +
Clock Skew
[Figure: PC -> Ideal Instruction Memory -> Rs, Rt, Rd, Imm16 -> 32 32-bit Registers (busA, busB) -> ALU -> Ideal Data Memory (Address, Data In) -> back to the register file, all clocked by Clk]
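With made-up component delays (NOT from the slides), the load's critical path is just the sum of the terms above. Because every instruction gets the same single clock, the cycle time must cover this slowest path even when an R-type instruction could finish sooner. The R-type path here is a simplification that just drops the data-memory term and ignores control and mux delays.

```python
# Hypothetical delays in picoseconds, chosen only for illustration
DELAYS_PS = {
    "clk_to_q": 100,   # PC's Clk-to-Q
    "imem": 2000,      # Instruction Memory's Access Time
    "regfile": 1000,   # Register File's Access Time
    "alu_add": 2000,   # ALU to Perform a 32-bit Add
    "dmem": 2000,      # Data Memory Access Time
    "setup": 100,      # Setup Time for Register File Write
    "skew": 100,       # Clock Skew
}

def load_path_ps(d):
    """Load critical path: the seven terms on the slide, in order."""
    return (d["clk_to_q"] + d["imem"] + d["regfile"] + d["alu_add"]
            + d["dmem"] + d["setup"] + d["skew"])

def rtype_path_ps(d):
    """Simplified R-type path: same as the load without the data memory."""
    return load_path_ps(d) - d["dmem"]

# Single cycle: one clock period for all instructions, set by the slowest
cycle_ps = max(load_path_ps(DELAYS_PS), rtype_path_ps(DELAYS_PS))
```

With these numbers the load needs 7300 ps and the R-type only 5300 ps, yet both run in a 7300 ps cycle.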
Worst case timing
[Figure: timing diagram for a load. After the Clk edge: PC gets its New Value after Clk-to-Q; Rs, Rt, Rd, Op, Func after the Instruction Memory Access Time; ALUOp, ExtOp, ALUSrc, MemtoReg, RegWr after the Delay through Control Logic; busA after the Register File Access Time; busB after the Delay through Extender & Mux; Address after the ALU Delay; busW after the Data Memory Access Time. The register write occurs at the next Clk edge]
What's wrong with our processor?
Arithmetic & Logical:  PC | Inst Memory | Reg File | mux | ALU | mux | setup
Load:                  PC | Inst Memory | Reg File | mux | ALU | Data Mem | mux | setup   (Critical Path)
Store:                 PC | Inst Memory | Reg File | mux | ALU | Data Mem
Branch:                PC | Inst Memory | Reg File | cmp | mux
• LOOOOONG cycle time
• ALL instructions take as much time as the slowest
• Real memory is MUCH slower than our idealized memory
– Today some 150ns (memory+bus+control) vs. a 0.5ns CPU clock: cannot finish in one (short) cycle
Notion of Performance
• We need to understand "performance" better because in the rest of the course we will improve the performance of our processor
• To understand performance, we will go to chapter 1b (1.4 onwards) before returning to chapter 4b (4.5 onwards)