ECE437: Introduction to Digital Computer Design
Chapter 4a (single-cycle, 4.1-4.4) Spring 2021
Processor Implementation
• CPU aka processor
ECE437, S’21 © Vijaykumar and Thottethodi (2)
[Figure: CPU (Control + Datapath) connected to Memory, Input, and Output]
2/1/2021
362 vs. 437
• 362 CPU is an embedded processor
– Low-cost, low-power emphasis
– On-chip, custom peripherals
– Transistors used for peripherals
• 437 CPU is a general-purpose processor
– High-performance emphasis
– Off-chip, generic peripherals
– Transistors used for performance
• Seen differences in ISA
Outline
• Datapath – single cycle
– single instruction, 2’s complement, unsigned
• Control

Datapath for Instructions
• Single-cycle datapath
• Compose using well-understood pieces
– Mux, flip-flops and gates
– ALU
– Register file
Comb. Logic Elements
• Adder: inputs A (32), B (32), CarryIn; outputs Sum (32), Carry
• Mux: inputs A (32), B (32), Select; output Y (32)
• ALU: inputs A (32), B (32), OP; output Result (32)
Storage Elements
• Register – for PC
– Data In (N bits), Data Out (N bits), Write Enable, Clk
• Register file – 32 32-bit registers
– 2 read ports/buses: busA, busB (32 bits each), selected by RA, RB (5 bits each)
– 1 write port/bus: busW (32 bits), selected by RW (5 bits), with Write Enable and Clk
• Memory
– 1 input bus: DataIn (32 bits)
– 1 output bus: DataOut (32 bits)
– Not bidirectional
– Address, Write Enable, Clk
Computer as State Machine
• Storage elements
– Memory, Register file, PC
• Combinational elements
– ALUs, Adders, Muxes
[Figure: Storage feeds Comb. Logic (Adder, ALU, MUX), whose outputs feed back into Storage]

Processor Implementation
• This Lecture: CPU Datapath
[Figure: CPU (Control + Datapath) connected to Memory, Input, and Output]
Forward Pointer Alert
• Implementation determines
– How many Clocks Per Instruction (CPI)
– How long the clock cycle is (Cycle time)
• ISA, compiler determine
– How many instructions in a program (Instr. count)
– We will cover these in Ch 1b (after Ch 4a)
• For now: implementation determines execution time, which measures performance
[Figure labels: Processor Implementation, CPI, Instr. Count, Cycle Time]
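The three quantities on this slide combine in the standard way: execution time = Instr. count × CPI × Cycle time. The sketch below only illustrates that product; the function name and all numbers are hypothetical, not taken from the course.

```python
def execution_time_ns(instr_count: int, cpi: float, cycle_time_ns: float) -> float:
    """Execution time = instruction count x clocks per instruction x cycle time."""
    return instr_count * cpi * cycle_time_ns


# Hypothetical numbers, chosen only for illustration.
# A single-cycle design has CPI = 1 but needs a long cycle.
single_cycle_ns = execution_time_ns(instr_count=1_000_000, cpi=1.0, cycle_time_ns=8.0)
print(single_cycle_ns)
```

The implementation (this chapter) sets the last two factors; the ISA and compiler set the first.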
Datapath – Single cycle
• Assumption: Get one whole instruction done in one long clock cycle
– fetch, decode/read operands, execute, memory, writeback
• 5 steps you should NEVER forget!
– useful way to represent steps and identify required datapath elements: RTL
• For single instruction
• Put it together
[Figure: single-cycle datapath. Instruction fetch: PC (Clk), Inst Memory (Adr), two Adders (+4 and branch offset via PC Ext on imm16, with 00 appended), PC Mux selected by nPC_sel. Instruction<31:0> fields: Rs <21:25>, Rt <16:20>, Rd <11:15>, Imm16 <0:15>. Register file: 32 32-bit Registers with Rw, Ra, Rb (5 bits each), busA, busB, busW (32 bits each), RegWr, Clk; RegDst Mux selects Rd or Rt. Extender (ExtOp) on imm16; ALUSrc Mux; ALU (ALUctr); Equal comparator (=); Data Memory (Data In, WrEn, Adr, Clk) with MemWr; MemtoReg Mux. Annotation on the register file: "You will design this!!"]
Caveat
• The processor you will design in the lab will NOT be exactly the same as the processor described in lectures
• So blindly copying the design on the lecture slides WON’T work in the lab
• Helps me (and you) know whether you understand the material
Register Transfer Language
• RTL gives the meaning of the instructions
• All start by fetching the instruction
op | rs | rt | rd | shamt | funct = MEM[ PC ]
op | rs | rt | Imm16 = MEM[ PC ]
inst    Register Transfers
ADDU    R[rd] <– R[rs] + R[rt]; PC <– PC + 4
SUBU    R[rd] <– R[rs] – R[rt]; PC <– PC + 4
ORi     R[rt] <– R[rs] OR zero_ext(Imm16); PC <– PC + 4
LOAD    R[rt] <– MEM[ R[rs] + sign_ext(Imm16) ]; PC <– PC + 4
STORE   MEM[ R[rs] + sign_ext(Imm16) ] <– R[rt]; PC <– PC + 4
BEQ     if ( R[rs] == R[rt] ) then PC <– PC + 4 + ( sign_ext(Imm16) || 00 )
        else PC <– PC + 4

A Simple Implementation
• ADD and SUB
– addU rd, rs, rt
– subU rd, rs, rt
– Format: op <31:26> (6 bits) | rs <25:21> (5 bits) | rt <20:16> (5 bits) | rd <15:11> (5 bits) | shamt <10:6> (5 bits) | funct <5:0> (6 bits)
• OR Immediate:
– ori rt, rs, imm16
– Format: op <31:26> (6 bits) | rs <25:21> (5 bits) | rt <20:16> (5 bits) | immediate <15:0> (16 bits)
• LDW/STW
– lw rt, rs, imm16
– sw rt, rs, imm16
– Format: op (6 bits) | rs (5 bits) | rt (5 bits) | immediate (16 bits)
• BRANCH:
– beq rs, rt, imm16
– Format: op (6 bits) | rs (5 bits) | rt (5 bits) | immediate (16 bits)
• How does the hardware know the values for op, rs, rt, rd for a given instruction?

Fetch Instructions
• Fetch instruction, then update PC
• PC updated (at the end of) every cycle
– What if no branches or jumps?
[Figure: PC (Clk) drives the Address of Instruction Memory, which outputs the 32-bit Instruction Word; Next Address Logic computes the next PC]

ALU Instructions
• R[rd] <- R[rs] op R[rt]
Example: addU rd, rs, rt
– Ra, Rb, and Rw come from instruction’s rs, rt, and rd fields
– ALUOperation and RegWr: control logic after decoding the instruction
[Figure: R-type format (op, rs, rt, rd, shamt, funct); 32 32-bit Registers with Rw = Rd, Ra = Rs, Rb = Rt (5 bits each), RegWr, Clk; busA and busB (32 bits) feed the ALU; Result (32 bits) returns on busW; ALU Operation selects the function]

Logical operation with Immediate
• R[rt] <- R[rs] op ZeroExt[imm16]
• rd?
• zeroext
[Figure: I-type format (op, rs, rt, immediate); the 16-bit immediate is extended with 16 zero bits in positions 31..16; RegDst Mux selects Rd or Rt as Rw; ZeroExt output and busB feed the ALUSrc Mux; ALU Result returns on busW]

Load Instruction
• R[rt] <- Mem[R[rs] + SignExt[imm16]]
Example: lw rt, imm16(rs)
[Figure: lw datapath. RegDst Mux selects Rt as Rw; Extender (ExtOp) extends imm16; ALUSrc Mux feeds the ALU; the ALU result is the Data Memory address (Adr); MemtoReg Mux selects the Data Memory output for busW; MemWr drives WrEn]

Store Instruction
• Mem[ R[rs] + SignExt[imm16] ] <- R[rt]
Example: sw rt, imm16(rs)
[Figure: sw datapath. Same as the lw datapath, with busB connected to Data In of Data Memory; control signals RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg]

Single cycle timing - ideal
[Timing diagram: at each CPU clock edge, fetch & execute the ith instruction and update PC/reg/mem at the next edge; then the same for the i+1th and i+2th instructions]
• Like any old state machine

Memory
• Used for instruction fetch and data access ld/st
• Memory is big and slow
– takes 300 CPU clocks today!
(ch5)
• Memory can allow only one of instruction fetch or ld/st data access at any given time => need to arbitrate between I-fetch and D-access
– We will alleviate this in ch5 but will still need arbitration

Memory
• While one could do both I-fetch and D-access one after the other in one (slow) CPU clock, our specific FPGA memory needs two clocks
– one each for I-fetch and D-access
– So our lab design is “mostly single-cycle CPU”: one cycle for all instructions except ld and st, and two cycles for ld/st
– Still you need arbitration so memory does either I-fetch or D-access at a time

Single-cycle timing – lab FPGA
• No ld/st among instructions
[Timing diagram: successive CPU clocks carry I-request & PC for the ith, i+1th, and i+2th instruction]
• With a ld or st
[Timing diagram: I-request & PC (ith instruction), then D-request & data address (ith instruction), then I-request & PC (i+1th instruction)]

Memory arbitration
• How is this timing realized? Via request-ready handshake + arbitration
• CPU has a “request block” where CPU generates I-request (for I-fetch) and D-request (for ld/st of data)
– non-ld/st instructions generate only I-request and ld/st generate both
– Why both? Simpler; else will need to delay I-fetch only for ld/st
• The requests go to the “Arbiter block”

Memory arbitration
• The arbiter block chooses one of I-request or D-request if both are active (ld/st) OR I-request if D-request is not active (non-ld/st)
• If both active,
– the D-request is for current instruction and I-request is for the next instruction
– D-request gets priority to complete the current instruction before going to next
• The chosen request accesses memory

CPU-Memory Interface
[Figure: the Request block inside the CPU sends I-request and D-request to the Arbiter/controller inside the Memory system and receives I-ready and D-ready; the Arbiter/controller performs the Memory access]
• Request is in CPU and arbiter is in memory system -- SEPARATE
• Do not break this interface by merging request & arbiter
– else you will suffer later

One-minute quiz
• During the execution of I-format instructions, what is done to the immediate field before being fed to the ALU?
• Previous quiz
– In stored program computers, what is the difference between how code and data are handled (i.e., read from/written to memory and manipulated)?
• None

Memory arbitration
• When access complete (I or D), the arbiter/controller asserts the corresponding ready (I-ready or D-ready)
– This is how CPU knows access is complete
– this is important in 3 slides
• IMPORTANT: once a D-access is complete, de-assert the D-request
– Else arbiter will keep choosing D-request (which has priority) and will not let next I-fetch happen
• Arbiter/controller combinational but CPU-memory interface sequential
– I-/D-request, ready sensed only at clock edge

Memory arbitration – Detail 1
• For ld/st, the instruction is latched within memory system WHILE data access occurs (else instruction would vanish while data access occurs!)
– During the middle clock of previous timing diagram
– This latching is done for you in the code given to you

Memory arbitration – Detail 2
• Lab has two clocks – CPU clock and RAM clock
– CPU clock is the main clock
– RAM clock is an internal detail to be ignored
– RAM clock happens to be 2x CPU clock
• Does not match reality
• Done to make our specific FPGA work for mostly single-cycle CPU
• Fixed in later slides

Memory arbitration – Detail 3
• IGNORE any “coherence”-related signals (starting with “cc”) and “datomic” operation, which will come later for multicore
• “I-hit” and “D-hit” are SAME as I-ready and D-ready
– the term hit will come later for caches

Memory arbitration
• In lab, memory is variable latency
– so your design should work at different latencies (important in real world)
– Mostly single-cycle is at the lowest latency
– Longer latencies => multiple clocks per I-fetch or D-access
– You MAY NOT assume memory will complete within one cycle => you MUST wait for I-ready or D-ready
– I-ready, D-ready in one cycle at lowest latency & in more cycles at longer latencies

Why?
• Variable latency checks if design is timing-independent – important concept
• Single memory, so we don’t change from two memories (in single cycle) to one memory (in pipelining) – easier for you
• GENERAL – you should test and debug every block BOTH separately AND after merging with rest of design BEFORE going to the next block, else you will have a mess that can’t be debugged

ORI
• ORI is not in the book
• ORI shows that some instructions need zero extension instead of sign extension
– Logical vs. arithmetic ops
• What will change in the datapath if ORI is absent?

Conditional Branch Instruction
• Format: op <31:26> (6 bits) | rs <25:21> (5 bits) | rt <20:16> (5 bits) | immediate <15:0> (16 bits)
• beq rs, rt, imm16
– IR = Mem[PC] // Fetch the instruction from memory
– Equal <- R[rs] == R[rt] // Calculate the branch condition
– if (Equal == 1) // Calculate the next instruction’s address
• PC <- PC + 4 + (SignExt(imm16) << 2)
– else
• PC <- PC + 4
What is this?
• Branches compute TWO things: branch condition and branch target

Datapath for ‘beq’
• beq rs, rt, imm16
– Datapath generates branch condition (Equal)
[Figure: beq datapath. PC (Clk) and Inst Address; Adder for PC + 4; imm16 extended and shifted to form the branch target; PCSrc selects the next PC; 32 32-bit Registers with Ra = Rs, Rb = Rt, RegWr; busA and busB compared to produce Equal]

[Figure: full single-cycle datapath with branch support. Instruction fetch: PC, Inst Memory (Adr), Adders, Extender and Shift Left 2 on imm16, PC Mux selected by PCSrc. Instruction<31:0> fields Rs <21:25>, Rt <16:20>, Rd <11:15>, Imm16 <0:15>. 32 32-bit Registers (Rw, Ra, Rb, busA, busB, busW, RegWr, Clk); RegDst Mux; Extender (ExtOp); ALUSrc Mux; ALU (ALUctr); Equal (=); Data Memory (Data In, WrEn, Adr, Clk) with MemWr; MemtoReg Mux]
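The two things a branch computes (condition and target) can be sketched in a few lines. This is a minimal illustration of the beq RTL above, not the lab design; the function names `sign_ext16` and `beq_next_pc` are made up for this example.

```python
MASK32 = 0xFFFFFFFF


def sign_ext16(imm16: int) -> int:
    """Interpret a 16-bit field as a signed (2's complement) integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def beq_next_pc(pc: int, rs_val: int, rt_val: int, imm16: int) -> int:
    """Next PC for beq: compute branch condition, then branch target."""
    equal = rs_val == rt_val                    # branch condition (Equal)
    pc_plus_4 = (pc + 4) & MASK32
    if equal:
        # branch target: PC + 4 + (SignExt(imm16) << 2)
        return (pc_plus_4 + (sign_ext16(imm16) << 2)) & MASK32
    return pc_plus_4


print(hex(beq_next_pc(0x100, 7, 7, 3)))        # taken, forward by 3 words
print(hex(beq_next_pc(0x100, 5, 5, 0xFFFF)))   # taken, offset -1 word
```

The shift by 2 is the "|| 00" in the RTL: the immediate counts instructions (words), while the PC counts bytes.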
Summary
• For a given instruction
– Describe operation in RTL
– Use ALUs, Registers, Memory, adders to achieve reqd. functionality
• To add instructions
– Rinse and repeat; Reuse components via muxes
– Not all blocks are used by all instrs so you need to ensure unused blocks don’t interfere
• Control: next
– Selection controls for muxes
– ALU controls for ALU ops
– Register address controls
– Write enables for registers/memory
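To make "describe operation in RTL" concrete, the sketch below executes one instruction per call, following the register transfers listed for the six instructions. It is an illustrative model under stated assumptions, not the lab processor: `step`, the dictionary-based memory keyed by byte address, and the shared instruction/data memory are choices made for this example. Opcode and funct values are the ones shown in the truth table slide.

```python
MASK32 = 0xFFFFFFFF


def sign_ext(imm16: int) -> int:
    """Sign-extend a 16-bit immediate."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def step(pc: int, regs: list, mem: dict) -> int:
    """Execute the instruction at mem[pc]; return the next PC.

    regs: list of 32 register values; mem: dict from byte address to 32-bit word
    (one memory for code and data, an assumption of this sketch).
    """
    inst = mem[pc]                       # fetch
    op = (inst >> 26) & 0x3F             # decode the fields
    rs = (inst >> 21) & 0x1F
    rt = (inst >> 16) & 0x1F
    rd = (inst >> 11) & 0x1F
    funct = inst & 0x3F
    imm16 = inst & 0xFFFF                # already zero-extended
    next_pc = (pc + 4) & MASK32

    if op == 0b000000 and funct == 0b100000:      # add
        regs[rd] = (regs[rs] + regs[rt]) & MASK32
    elif op == 0b000000 and funct == 0b100010:    # sub
        regs[rd] = (regs[rs] - regs[rt]) & MASK32
    elif op == 0b001101:                          # ori: zero extension
        regs[rt] = regs[rs] | imm16
    elif op == 0b100011:                          # lw: sign extension
        regs[rt] = mem[(regs[rs] + sign_ext(imm16)) & MASK32]
    elif op == 0b101011:                          # sw
        mem[(regs[rs] + sign_ext(imm16)) & MASK32] = regs[rt]
    elif op == 0b000100:                          # beq
        if regs[rs] == regs[rt]:
            next_pc = (next_pc + (sign_ext(imm16) << 2)) & MASK32
    else:
        raise ValueError(f"unsupported instruction {inst:#010x}")
    return next_pc
```

Each `elif` arm corresponds to one row of the RTL table; the muxes in the datapath play the role of the `if`/`elif` selection here.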

Exercise
• Add jump instruction to single cycle datapath
– j Addr
– RTL
• PC <- (PC+4)[31:28] || Addr || 00
[Figure: J-type format: op | target address (jump)]
• See worksheet #1

Exercise
[Figure: instruction fetch unit extended for jump. J-type format (op | target address); bits 31-28 of PC + 4 are concatenated with Inst 25-0 and 00; the existing +4 and imm16 branch paths remain; PCSrc selects the next PC (Clk, Inst Address)]

Exercise
• See worksheet (Fig 4.15)
• Highlight active datapath for
– Add
– Beq
– Sw
– Lw
[Figure: full single-cycle datapath with control signals PCSrc, RegDst, RegWr, ExtOp, ALUSrc, ALUOp, MemWr, MemtoReg and the Equal output]
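The jump RTL in the exercise (upper 4 bits of PC + 4, then the 26-bit Addr field, then 00) can be checked with a small sketch. The function name `jump_next_pc` is hypothetical; the worksheet asks for the datapath, which this does not replace.

```python
def jump_next_pc(pc: int, addr26: int) -> int:
    """Next PC for j: (PC+4)[31:28] concatenated with Addr and 00."""
    upper = (pc + 4) & 0xF0000000            # keep bits 31..28 of PC + 4
    return upper | ((addr26 & 0x03FFFFFF) << 2)   # Addr || 00


print(hex(jump_next_pc(0x10000000, 0x4)))
```

Unlike beq, no adder is needed for the target: the three pieces are simply wired side by side.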

Control for Datapath
[Figure: Inst Memory (Adr) outputs Instruction<31:0>. Op and Fun go to Control; Rs, Rt, Rd, Imm16 go to the DATA PATH. Control drives PCSrc, RegWr, RegDst, ExtOp, ALUSrc, ALUOp, MemWr, MemtoReg; the DATA PATH returns Equal]
Controls for Add Operation
• R[rd] = R[rs] + R[rt]
– PCSrc = +4
– RegDst = 1
– RegWr = 1
– ExtOp = x
– ALUSrc = 0
– ALUOp = Add
– MemWr = 0
– MemtoReg = 0
[Figure: single-cycle datapath (Instruction Fetch Unit, 32 32-bit Registers, Extender, ALU, Data Memory) annotated with the control values above]
Meaning of Control Signals
• rs, rt, rd and imm16 hardwired in datapath
• PCSrc: 0 => PC <– PC + 4; 1 => PC <– PC + 4 + (SignExt(Imm16) || 00)
[Figure: Instruction Fetch Unit: PC (Clk), Inst Memory (Addr), Adders (+4 and imm16 offset), PC Mux selected by PCSrc]

Meaning of Control Signals
• ExtOp: “zeroext”, “signext”
• ALUsrc: 0 => regB; 1 => immed
• ALUOp: “add”, “sub”, “or”
• MemWr: write memory
• MemtoReg: 1 => Mem
• RegDst: 0 => “rt”; 1 => “rd”
• RegWr: write dest register
[Figure: datapath with RegDst Mux, 32 32-bit Registers (Rw, Ra, Rb, busA, busB, busW), Extender, ALUSrc Mux, ALU, Equal (=), Data Memory (Data In, WrEn, Adr), and MemtoReg Mux]
ORI Controls: Worksheet
• R[rt] <- R[rs] or ZeroExt[Imm16]
– PCSrc =
– RegDst =
– RegWr =
– ExtOp =
– ALUSrc =
– ALUOp =
– MemWr =
– MemtoReg =
[Figure: single-cycle datapath with blank control values]

ORI Controls: Solution
• R[rt] <- R[rs] or ZeroExt[Imm16]
– PCSrc = +4
– RegDst = 0
– RegWr = 1
– ExtOp = 0
– ALUSrc = 1
– ALUOp = Or
– MemWr = 0
– MemtoReg = 0
[Figure: single-cycle datapath annotated with the control values above]

LW Controls
• R[rt] <- Data Memory {R[rs] + SignExt[imm16]}
– PCSrc = +4
– RegDst = 0
– RegWr = 1
– ExtOp = 1
– ALUSrc = 1
– ALUOp = Add
– MemWr = 0
– MemtoReg = 1
[Figure: single-cycle datapath annotated with the control values above]
SW Controls: Worksheet
• R[rt] -> Data Memory {R[rs] + SignExt[imm16]}
– PCSrc =
– RegDst =
– RegWr =
– ExtOp =
– ALUSrc =
– ALUOp =
– MemWr =
[Figure: single-cycle datapath with blank control values]

SW Controls: Solution
• R[rt] -> Data Memory {R[rs] + SignExt[imm16]}
– PCSrc = +4
– RegDst = x
– RegWr = 0
– ExtOp = 1
– ALUSrc = 1
– ALUOp = Add
– MemWr = 1
– MemtoReg = x
[Figure: single-cycle datapath annotated with the control values above]

BEQ Controls
• if (R[rs] - R[rt] == 0) then Equal <- 1; else Equal <- 0
– PCSrc = “Beq AND Equal”
– RegDst = x
– RegWr = 0
– ExtOp = x
– ALUSrc = 0
– ALUOp = Subtract
– MemWr = 0
– MemtoReg = x
[Figure: single-cycle datapath annotated with the control values above]
Summary of Control Signals
inst    Register Transfer
ADD     R[rd] <– R[rs] + R[rt]; PC <– PC + 4
        ALUsrc = RegB, ALUOp = “add”, RegDst = rd, RegWr, PCSrc = “+4”
SUB     R[rd] <– R[rs] – R[rt]; PC <– PC + 4
        ALUsrc = RegB, ALUOp = “sub”, RegDst = rd, RegWr, PCSrc = “+4”
ORi     R[rt] <– R[rs] OR zero_ext(Imm16); PC <– PC + 4
        ALUsrc = Im, Extop = “Z”, ALUOp = “or”, RegDst = rt, RegWr, PCSrc = “+4”
LOAD    R[rt] <– MEM[ R[rs] + sign_ext(Imm16) ]; PC <– PC + 4
        ALUsrc = Im, Extop = “Sign”, ALUOp = “add”, MemtoReg, RegDst = rt, RegWr, PCSrc = “+4”
STORE   MEM[ R[rs] + sign_ext(Imm16) ] <– R[rt]; PC <– PC + 4
        ALUsrc = Im, Extop = “Sign”, ALUOp = “add”, MemWr, PCSrc = “+4”
BEQ     if ( R[rs] == R[rt] ) then PC <– PC + 4 + ( sign_ext(Imm16) || 00 ) else PC <– PC + 4
        ALUsrc = RegB, PCSrc = “Beq AND Equal”, ALUOp = “sub”

Control Logic
• Logic must generate appropriate signals for all instructions
• Summary slide (previous)
– A way of representing the truth table
• Till now: Instr => signal, next: transpose
– First: Equations in terms of opcodes
– Next: Equations in terms of instruction bits

Controls: Logic equations
• PCSrc <= if (OP == BEQ) then Equal else 0
• ALUsrc <= if (OP == “R-type”) then “regB” elseif (OP == BEQ) then “regB” else “imm”
• ALUOp <= if (OP == “R-type”) then funct elseif (OP == ORi) then “OR” elseif (OP == BEQ) then “sub” else “add”
• ExtOp <= _____________
• MemWr <= _____________
• MemtoReg <= _____________
• RegWr: <= _____________
• RegDst: <= _____________

Controls: Logic equations
• PCSrc <= if (OP == BEQ) then Equal else 0
• ALUsrc <= if (OP == “R-type”) then “regB” elseif (OP == BEQ) then “regB” else “imm”
• ALUOp <= if (OP == “R-type”) then funct elseif (OP == ORi) then “OR” elseif (OP == BEQ) then “sub” else “add”
• ExtOp <= if (OP == ORi) then “zeroext” else “signext”
• MemWr <= (OP == Store)
• MemtoReg <= (OP == Load)
• RegWr: <= if ((OP == Store) || (OP == BEQ)) then 0 else 1
• RegDst: <= if ((OP == Load) || (OP == ORi)) then 0 else 1

Truth Table summary
See Appendix B
             add       sub       ori       lw        sw        beq       jump
func         10 0000   10 0010   (We Don’t Care for the rest)
op           00 0000   00 0000   00 1101   10 0011   10 1011   00 0100   00 0010
RegDst       1         1         0         0         x         x         x
ALUSrc       0         0         1         1         1         0         x
MemtoReg     0         0         0         1         x         x         x
RegWrite     1         1         1         1         0         0         0
MemWrite     0         0         0         0         1         0         0
Beq          0         0         0         0         0         1         0
Jump         0         0         0         0         0         0         1
ExtOp        x         x         0         1         1         x         x
ALUOp<2:0>   Add       Subtract  Or        Add       Add       Subtract  xxx

R-type: op <31:26> | rs <25:21> | rt <20:16> | rd <15:11> | shamt <10:6> | funct <5:0>   (add, sub)
I-type: op <31:26> | rs <25:21> | rt <20:16> | immediate <15:0>   (ori, lw, sw, beq)
J-type: op <31:26> | target address <25:0>   (jump)
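The logic equations translate directly into a function from opcode to control signals. The sketch below is one way to write them down for checking against the truth table; the function name `control`, the string opcode names, and the dictionary output are assumptions of this example. Don't-care entries (x) take whatever value the equations happen to produce.

```python
def control(op: str, equal: bool = False, funct: str = "add") -> dict:
    """Control signals as functions of the opcode (and Equal for beq).

    op is one of "R-type", "ori", "lw", "sw", "beq" (names assumed here).
    """
    if op == "R-type":
        alu_op = funct                 # R-type: ALU operation comes from funct
    elif op == "ori":
        alu_op = "or"
    elif op == "beq":
        alu_op = "sub"
    else:
        alu_op = "add"                 # lw and sw compute an address
    return {
        "PCSrc": int(op == "beq" and equal),
        "ALUSrc": "regB" if op in ("R-type", "beq") else "imm",
        "ALUOp": alu_op,
        "ExtOp": "zeroext" if op == "ori" else "signext",
        "MemWr": int(op == "sw"),
        "MemtoReg": int(op == "lw"),
        "RegWr": 0 if op in ("sw", "beq") else 1,
        "RegDst": 0 if op in ("lw", "ori") else 1,
    }


for name in ("R-type", "ori", "lw", "sw", "beq"):
    print(name, control(name))
```

Reading the output column by column reproduces the rows of the truth table, which is the "transpose" mentioned on the Control Logic slide.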
Local vs Global Control
• One more layer of abstraction
• ALUOp <= if (OP == “R-type”) then funct elseif (OP == ORi) then “OR” elseif (OP == BEQ) then “sub” else “add”
[Figure: R-type format (op, rs, rt, rd, shamt, funct). op (6 bits) feeds Main Control, which outputs ALUg (N bits); func (6 bits) and ALUg feed ALU Control (Local), which outputs ALUOp (3 bits)]

Global Control: Truth Table
op           00 0000    00 1101   10 0011   10 1011   00 0100   00 0010
             R-type     ori       lw        sw        beq       jump
RegDst       1          0         0         x         x         x
ALUSrc       0          1         1         1         0         x
MemtoReg     0          0         1         x         x         x
RegWrite     1          1         1         0         0         0
MemWrite     0          0         0         1         0         0
Beq          0          0         0         0         1         0
Jump         0          0         0         0         0         1
ExtOp        x          0         1         1         x         x
ALUg         “R-type”   Or        Add       Add       Subtract  xxx
Encoding
[Figure: op (6 bits) feeds Main Control, which outputs ALUg (N bits); func (6 bits) and ALUg feed ALU Control (Local), which outputs ALUOp (3 bits)]
• In this exercise, ALUg has to be 2 bits wide to represent:
– (1) “R-type” instructions
– “I-type” instructions that require the ALU to perform:
• (2) Or, (3) Add, and (4) Subtract
• To implement the full MIPS ISA, ALUg has to be 3 bits to represent:
– (1) “R-type” instructions
– “I-type” instructions that require the ALU to perform:
• (2) Or, (3) Add, (4) Subtract, and (5) And (Example: andi)

                  R-type     ori     lw      sw      beq       jump
ALUg (Symbolic)   “R-type”   Or      Add     Add     Subtract  xxx
ALUg<2:0>         1 00       0 10    0 00    0 00    0 01      xxx
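The split between main control and local ALU control can be sketched with the ALUg<2:0> encoding from the table: main control emits ALUg from the opcode alone, and the local block looks at funct only when ALUg says "R-type". The names `ALUG`, `FUNCT_TO_ALUOP`, and `alu_control` are hypothetical, and the symbolic ALUOp strings stand in for the 3-bit ALUOp code.

```python
# Main control output: ALUg<2:0> per instruction class, as in the table above.
ALUG = {"R-type": 0b100, "ori": 0b010, "lw": 0b000, "sw": 0b000, "beq": 0b001}

# funct values for the two R-type instructions covered in these slides.
FUNCT_TO_ALUOP = {0b100000: "Add", 0b100010: "Subtract"}

# ALUg codes for the non-R-type cases.
ALUG_TO_ALUOP = {0b000: "Add", 0b001: "Subtract", 0b010: "Or"}


def alu_control(alug: int, funct: int) -> str:
    """Local ALU control: use funct only when ALUg<2> marks an R-type."""
    if alug & 0b100:
        return FUNCT_TO_ALUOP[funct]
    return ALUG_TO_ALUOP[alug]


print(alu_control(ALUG["R-type"], 0b100010))
print(alu_control(ALUG["lw"], 0))
```

With this layering, main control never needs the funct bits, which keeps its truth table at six rows.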
Global Control: Truth Table
op           00 0000    00 1101   10 0011   10 1011   00 0100   00 0010
             R-type     ori       lw        sw        beq       jump
RegDst       1          0         0         x         x         x
ALUSrc       0          1         1         1         0         x
MemtoReg     0          0         1         x         x         x
RegWrite     1          1         1         0         0         0
MemWrite     0          0         0         1         0         0
Beq          0          0         0         0         1         0
Jump         0          0         0         0         0         1
ExtOp        x          0         1         1         x         x
ALUg<2:0>    “R-type”   Or        Add       Add       Subtract  xxx
             (100)      (010)     (000)     (000)     (001)
[Figure: op (6 bits) feeds Main Control, which outputs ALUg (N bits); func (6 bits) and ALUg feed ALU Control (Local), which outputs ALUOp (3 bits)]
Truth Table for RegWrite
op         00 0000   00 1101   10 0011   10 1011   00 0100   00 0010
           R-type    ori       lw        sw        beq       jump
RegWrite   1         1         1         0         0         0
• RegWrite = R-type + ori + lw
= !op<5> & !op<4> & !op<3> & !op<2> & !op<1> & !op<0> (R-type)
+ !op<5> & !op<4> & op<3> & op<2> & !op<1> & op<0> (ori)
+ op<5> & !op<4> & !op<3> & !op<2> & op<1> & op<0> (lw)
[Figure: six AND gates decode op<5>..op<0> into R-type, ori, lw, sw, beq, jump; an OR gate combines R-type, ori, and lw into RegWrite]
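The sum-of-products equation can be checked bit by bit against the six opcodes in the table. This is a small verification sketch; the function name `reg_write` and the list-based bit handling are choices made for this example.

```python
def reg_write(op: int) -> int:
    """RegWrite = R-type + ori + lw, written in terms of op<5>..op<0>."""
    b = [(op >> i) & 1 for i in range(6)]   # b[i] is op<i>
    n = [1 - bit for bit in b]              # n[i] is !op<i>
    r_type = n[5] & n[4] & n[3] & n[2] & n[1] & n[0]
    ori = n[5] & n[4] & b[3] & b[2] & n[1] & b[0]
    lw = b[5] & n[4] & n[3] & n[2] & b[1] & b[0]
    return r_type | ori | lw


for name, op in [("R-type", 0b000000), ("ori", 0b001101), ("lw", 0b100011),
                 ("sw", 0b101011), ("beq", 0b000100), ("jump", 0b000010)]:
    print(name, reg_write(op))
```

Each product term is one AND gate in the figure, and the final `|` is the OR gate.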

PLA implementation
[Figure: AND plane decodes op<5>..op<0> into R-type, ori, lw, sw, beq, jump; OR plane produces RegWrite, ALUSrc, RegDst, MemtoReg, MemWrite, Branch, Jump, ExtOp, ALUop<2>, ALUop<1>, ALUop<0>]

PLA Representation
[Figure: PLA representation (content not recoverable from the extracted text)]
Putting it all together
[Figure: complete single-cycle processor. Instruction Fetch Unit (PCSrc, Clk) outputs Instruction<31:0>. Main Control takes op = Instr<31:26> (6 bits) and drives RegDst, ALUSrc, RegWr, ExtOp, MemWr, MemtoReg, and ALUg (3 bits); ALU Control takes func = Instr<5:0> (6 bits) and ALUg and drives ALUOp (3 bits). Datapath: RegDst Mux (Rd or Rt), 32 32-bit Registers (Rw, Ra, Rb, busA, busB, busW, Clk), Extender on imm16 = Instr<15:0>, ALUSrc Mux, ALU with Zero output, Data Memory (Data In, WrEn, Adr, Clk), MemtoReg Mux]
Cycletime
• What should the clock period be?
– Enough to compute the next state values
• Propagation clk-to-Q (new state)
• Comb. Logic delay
• Setup requirements
[Figure: Storage feeds Comb. Logic, which feeds Storage; Setup time is marked before the capturing clock edge]
Timing: R-type inst
[Timing diagram, in order after the rising edge of Clk:
– PC: Old Value => New Value after Clk-to-Q
– Rs, Rt, Rd, Op, Func: Old Value => New Value after Instruction Memory Access Time
– ALUOp, RegWr: Old Value => New Value after Delay through Control Logic
– busA, busB: Old Value => New Value after Register File Access Time
– busW: Old Value => New Value after ALU Delay
– Register Write Occurs Here (at the next clock edge)]
[Figure: 32 32-bit Registers with Rw = Rd, Ra = Rs, Rb = Rt (5 bits each), RegWr, Clk; busA and busB (32 bits) feed the ALU (ALUOp); Result (32 bits) returns on busW]
One-minute quiz
• What is the ALU operation for loads and stores?
• Previous quiz
– During execution of I-format instructions, what is done to the immediate field before being fed to the ALU?
• Zero- or sign-extension
“lw” Instruction
• Longer critical path
– lower bound on cycletime
Critical Path (Load Operation) =
PC’s Clk-to-Q +
Instruction Memory’s Access Time +
Register File’s Access Time +
ALU to Perform a 32-bit Add +
Data Memory Access Time +
Setup Time for Register File Write +
Clock Skew
[Figure: Ideal Instruction Memory (Instruction Address in, 32-bit Instruction out) supplies Rd, Rs, Rt (5 bits each) and Imm (16 bits); 32 32-bit Registers (Rw, Ra, Rb, Clk) output A and B (32 bits); Ideal Data Memory (Data Address, Data In, Clk) returns 32-bit data]
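The critical-path sum above sets the lower bound on the cycle time. The sketch below just adds the listed terms; every delay value is a made-up placeholder in picoseconds, not a measurement from the lab FPGA, and mux and extender delays are ignored for simplicity.

```python
# Hypothetical component delays in picoseconds (illustrative values only).
LOAD_PATH_PS = {
    "pc_clk_to_q": 50,
    "instruction_memory_access": 200,
    "register_file_access": 100,
    "alu_32bit_add": 200,
    "data_memory_access": 200,
    "register_file_setup": 30,
    "clock_skew": 20,
}


def min_cycle_time_ps(path: dict) -> int:
    """Lower bound on the cycle time: the sum of delays along the path."""
    return sum(path.values())


# An R-type instruction follows the same path minus the data memory access.
R_TYPE_PATH_PS = {k: v for k, v in LOAD_PATH_PS.items() if k != "data_memory_access"}

print(min_cycle_time_ps(LOAD_PATH_PS), min_cycle_time_ps(R_TYPE_PATH_PS))
```

Because the clock must fit the slowest instruction, the load path sets the cycle time even for instructions that would finish sooner.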
Worst case timing
[Timing diagram, in order after the rising edge of Clk:
– PC: Old Value => New Value after Clk-to-Q
– Rs, Rt, Rd, Op, Func: Old Value => New Value after Instruction Memory Access Time
– ALUOp, ExtOp, ALUSrc, MemtoReg, RegWr: Old Value => New Value after Delay through Control Logic
– busA: Old Value => New Value after Register File Access Time
– busB: Old Value => New Value after Delay through Extender & Mux
– Address: Old Value => New Value after ALU Delay
– busW: Old Value => New Value after Data Memory Access Time
– Register Write Occurs (at the next clock edge)]
What’s wrong with our processor?
Arithmetic & Logical: PC | Inst Memory | Reg File | mux | ALU | mux | setup
Load:                 PC | Inst Memory | Reg File | mux | ALU | Data Mem | mux | setup   (Critical Path)
Store:                PC | Inst Memory | Reg File | mux | ALU | Data Mem
Branch:               PC | Inst Memory | Reg File | cmp | mux
• LOOOOONG Cycle Time
• ALL instructions take as much time as the slowest
• Real memory MUCH slower than our idealized memory
– Today some 150ns (memory+bus+control) vs. 0.5ns CPU clock
– cannot finish in one (short) cycle
Notion of Performance
• We need to understand “performance” better because in the rest of the course we will improve the performance of our processor
• To understand performance we will go to chapter 1b (1.4 onwards) before returning to chapter 4b (4.5 onwards)