CMPEN 331 Lecture 13
Chapter 3 — Arithmetic for Computers — 2
Review questions solved
Introduction
• CPU performance factors
• Instruction count
• Determined by ISA and compiler
• CPI and cycle time
• Determined by CPU hardware
• We will examine two MIPS implementations
• A simplified version
• A more realistic pipelined version
• Simple subset, shows most aspects
• Memory reference: lw, sw
• Arithmetic/logical: add, sub, and, or, slt
• Control transfer: beq, j
Chapter 4 — The Processor — 3
§4.1 Introduction
Instruction Execution
• Generic implementation
• use the program counter (PC) to supply the instruction address and fetch the instruction from memory (and update the PC)
• decode the instruction (and read registers)
• execute the instruction
[Figure: Fetch (PC = PC+4), Decode, Exec cycle]
• All instructions (except j) use the ALU after reading the registers
• Depending on instruction class
• Use ALU to calculate
• Arithmetic result
• Memory address for load/store
• Branch target address
• Access data memory for load/store
• PC ← target address or PC + 4
CPU Overview
• Can’t just join wires together
• Use multiplexers
Control
Adding the Control
• Selecting the operations to perform (ALU, Register File and Memory read/write)
• Controlling the flow of data (multiplexor inputs)
Instruction formats (bit positions):
R-type:  op (31-26) | rs (25-21) | rt (20-16) | rd (15-11) | shamt (10-6) | funct (5-0)
I-type:  op (31-26) | rs (25-21) | rt (20-16) | address offset (15-0)
J-type:  op (31-26) | target address (25-0)
• Observations
• op field always in bits 31-26
• addresses of registers to be read are always specified by the rs field (bits 25-21) and rt field (bits 20-16); for lw and sw, rs is the base register
• address of register to be written is in one of two places: in rt (bits 20-16) for lw; in rd (bits 15-11) for R-type instructions
• offset for beq, lw, and sw always in bits 15-0
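Because the fields sit at fixed bit positions, pulling them out of a fetched instruction is pure wiring. A minimal sketch follows; the module name instr_fields and its port names are hypothetical, and only the bit positions come from the slide.

```verilog
// Sketch: split a 32-bit MIPS instruction into its fields.
// Module and port names are illustrative assumptions.
module instr_fields (
    input  [31:0] instr,
    output [5:0]  op,      // bits 31-26, all formats
    output [4:0]  rs,      // bits 25-21, R-type and I-type
    output [4:0]  rt,      // bits 20-16, R-type and I-type
    output [4:0]  rd,      // bits 15-11, R-type only
    output [4:0]  shamt,   // bits 10-6,  R-type only
    output [5:0]  funct,   // bits 5-0,   R-type only
    output [15:0] offset,  // bits 15-0,  I-type (beq, lw, sw)
    output [25:0] target   // bits 25-0,  J-type
);
    assign op     = instr[31:26];
    assign rs     = instr[25:21];
    assign rt     = instr[20:16];
    assign rd     = instr[15:11];
    assign shamt  = instr[10:6];
    assign funct  = instr[5:0];
    assign offset = instr[15:0];
    assign target = instr[25:0];
endmodule
```

For example, the encoding 32'h02324020 (add $t0, $s1, $s2) splits into op = 0, rs = 17, rt = 18, rd = 8, shamt = 0, funct = 32. All outputs are always driven; the control unit decides which ones are meaningful for a given opcode.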
Logic Design Basics
• Information encoded in binary
• Low voltage = 0, High voltage = 1
• One wire per bit
• Multi-bit data encoded on multi-wire buses
• Combinational element
• Operate on data
• Output is a function of input
• State (sequential) elements
• Store information
§4.2 Logic Design Conventions
Combinational Elements
• AND-gate
• Y = A & B
• Adder
• Y = A + B
• Multiplexer
• Y = S ? I1 : I0
• Arithmetic/Logic Unit
• Y = F(A, B)
[Figure: symbols for the AND gate (A, B, Y), adder (A, B, Y), multiplexer (I0, I1, S, Y) and ALU (A, B, F, Y)]
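The multiplexer equation Y = S ? I1 : I0 maps directly onto a continuous assignment. A minimal sketch, with a hypothetical module name (mux2) and a WIDTH parameter added here for convenience:

```verilog
// Sketch: 2-to-1 multiplexer, Y = S ? I1 : I0.
// Name and WIDTH parameter are illustrative assumptions.
module mux2 #(parameter WIDTH = 32) (
    input  [WIDTH-1:0] i0,  // selected when s = 0
    input  [WIDTH-1:0] i1,  // selected when s = 1
    input              s,
    output [WIDTH-1:0] y
);
    assign y = s ? i1 : i0;
endmodule
```

The output depends only on the current inputs, which is what makes the element combinational.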
Sequential Elements
• Register: stores data in a circuit
• Uses a clock signal to determine when to update the stored value
• Edge-triggered: update when Clk changes from 0 to 1
[Figure: register symbol (D, Clk, Q) and timing diagram of Clk, D and Q]
Sequential Elements
• Register with write control
• Only updates on clock edge when write control input is 1
• Used when stored value is required later
[Figure: register symbol (D, Clk, Write, Q) and timing diagram of Clk, Write, D and Q]
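A register with write control can be sketched as an edge-triggered always block guarded by the write input. The module name reg_we and the WIDTH parameter are assumptions made for this example.

```verilog
// Sketch: positive-edge-triggered register with write control.
// Name and WIDTH parameter are illustrative assumptions.
module reg_we #(parameter WIDTH = 32) (
    input                  clk,
    input                  write,  // write control input
    input      [WIDTH-1:0] d,
    output reg [WIDTH-1:0] q
);
    // q changes only on a rising clock edge, and only when write is 1;
    // otherwise the stored value is held.
    always @(posedge clk)
        if (write)
            q <= d;
endmodule
```

Tying write to 1 gives the plain edge-triggered register of the previous slide.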
Clocking Methodology
• The clocking methodology defines when data in a state element is valid and stable relative to the clock
• State element – a memory element such as a register
• Edge-triggered – all state changes occur on a clock edge
• Combinational logic transforms data during clock cycles
• Between clock edges
• Input from state elements, output to state element
• Longest delay determines clock period
CMPEN 331 Lecture 14
Chapter 4
Building a Datapath
• Datapath
• Elements that process data and addresses in the CPU
• Registers, ALUs, muxes, memories, …
• We will build a MIPS datapath incrementally
• Refining the overview design
§4.3 Building a Datapath
Instruction Fetch
• Fetching instructions involves
• Reading the instruction from the Instruction Memory
• Updating the PC value to be the address of the next (sequential) instruction
• PC is updated every clock cycle, so it does not need an explicit write control signal, just a clock signal
• Reading from the Instruction Memory is a combinational activity, so it doesn’t need an explicit read control signal
[Figure: instruction fetch datapath: PC (32-bit register), Instruction Memory, and an adder that increments the PC by 4 for the next instruction]
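The point that reading the Instruction Memory is combinational can be illustrated with a memory that has no clock and no read enable. This is only a sketch: the module name instr_mem, the 256-word size, and the way the contents get loaded are all assumptions.

```verilog
// Sketch: combinational-read instruction memory.
// Name and 256-word size are illustrative assumptions.
module instr_mem (
    input  [31:0] addr,   // byte address supplied by the PC
    output [31:0] instr
);
    // 256 words of 32 bits; contents are left uninitialized here and
    // would be loaded by a testbench, e.g. with $readmemh.
    reg [31:0] mem [0:255];

    // No clock and no read control: the output follows the address.
    // Instructions are word aligned, so the two low address bits are dropped.
    assign instr = mem[addr[9:2]];
endmodule
```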
Basic Architecture
• To carry out each instruction, the control unit must:
• Fetch – Read instruction from inst. mem.
• Decode – Determine the operation and operands of the instruction
• Execute – Carry out the instruction’s operation using the datapath
[Figure: control unit (PC, IR, controller) driving a datapath (register file RF, n-bit 2×1 mux, ALU, data memory D). Controller states: Init (PC=0), Fetch (IR=I[PC], PC=PC+1), Decode, Execute.]
Instruction memory I:
0: RF[0]=D[0]
1: RF[1]=D[1]
2: RF[2]=RF[0]+RF[1]
3: D[9]=RF[2]
Instruction 0:
(a) Fetch: IR = RF[0]=D[0], PC 0 -> 1
(b) Decode: controller issues “load”
(c) Execute: D[0] = 99 is read, RF[0]: ?? -> 99
Basic Architecture
Instruction 1:
(a) Fetch: IR = RF[1]=D[1], PC 1 -> 2
(b) Decode: controller issues “load”
(c) Execute: D[1] = 102 is read, RF[1]: ?? -> 102
Basic Architecture
Instruction 2:
(a) Fetch: IR = RF[2]=RF[0]+RF[1], PC 2 -> 3
(b) Decode: controller issues “ALU (add)”
(c) Execute: the ALU adds RF[0] = 99 and RF[1] = 102, RF[2]: ?? -> 201
Basic Architecture
Instruction 3:
(a) Fetch: IR = D[9]=RF[2], PC 3 -> 4
(b) Decode: controller issues “store”
(c) Execute: RF[2] = 201 is written to the data memory, D[9]: ?? -> 201
Decoding Instructions
• Decoding instructions involves
• sending the fetched instruction’s opcode and function field bits to the control unit
and
• reading two values from the Register File
– Register File addresses are contained in the instruction
[Figure: the Instruction feeds the Control Unit and the Register File (inputs Read Addr 1, Read Addr 2, Write Addr, Write Data; outputs Read Data 1, Read Data 2); inset shows the Fetch (PC = PC+4), Decode, Exec cycle]
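The Register File in the figure has two read ports and one write port. A minimal sketch of such a block follows. The module and port names are hypothetical, the reg_write input anticipates the explicit write control discussed for R-format instructions, and treating register 0 as constant zero is the usual MIPS convention rather than something stated on this slide.

```verilog
// Sketch: 32 x 32-bit register file, two read ports, one write port.
// Names are illustrative assumptions.
module reg_file (
    input         clk,
    input         reg_write,    // explicit write control
    input  [4:0]  read_addr1,   // rs field
    input  [4:0]  read_addr2,   // rt field
    input  [4:0]  write_addr,   // rt or rd, chosen outside this module
    input  [31:0] write_data,
    output [31:0] read_data1,
    output [31:0] read_data2
);
    reg [31:0] regs [0:31];

    // Reads are combinational. Register 0 reads as zero
    // (assumption: MIPS $zero convention).
    assign read_data1 = (read_addr1 == 5'd0) ? 32'd0 : regs[read_addr1];
    assign read_data2 = (read_addr2 == 5'd0) ? 32'd0 : regs[read_addr2];

    // Writes happen on the clock edge, only when reg_write is 1.
    always @(posedge clk)
        if (reg_write && write_addr != 5'd0)
            regs[write_addr] <= write_data;
endmodule
```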
Fixed Program Counter
[Figure: PC register with an Update input; an adder adds 4 to PC to form New PC]
module program_counter (
    input update,
    input clk,
    input rst,
    output reg [31:0] pc
);
    parameter INCREMENT_AMOUNT = 32'd4;
    // Asynchronous reset; PC advances by 4 only when update is 1
    always @(posedge clk or posedge rst) begin
        if (rst)
            pc <= 0;
        else if (update)
            pc <= pc + INCREMENT_AMOUNT;
    end
endmodule
Variable Program Counter
[Figure: PC register with an Update input; an adder adds Instr_Size to PC to form New PC]
module program_counter (
    input update,
    input [31:0] instruction_size,
    input clk,
    input rst,
    output reg [31:0] pc
);
    // Asynchronous reset; PC advances by instruction_size when update is 1
    always @(posedge clk or posedge rst) begin
        if (rst)
            pc <= 0;
        else if (update)
            pc <= pc + instruction_size;
    end
endmodule
R-Format Instructions
• R format operations (add, sub, slt, and, or)
R-type:  op (31-26) | rs (25-21) | rt (20-16) | rd (15-11) | shamt (10-6) | funct (5-0)
• perform operation (op and funct) on values in rs and rt
• store the result back into the Register File (into location rd)
• Note that Register File is not written every cycle (e.g. sw), so we need an explicit write control signal for the Register File
Executing Load and Store Operations
• Load and store operations involve
• computing the memory address by adding the base register (read from the Register File during decode) to the 16-bit sign-extended offset field in the instruction
• store: the value (read from the Register File during decode) is written to the Data Memory
• load: the value, read from the Data Memory, is written to the Register File
[Figure: load/store datapath. Instruction fields address the Register File (Read Addr 1, Read Addr 2, Write Addr, Write Data; outputs Read Data 1, Read Data 2); the 16-bit offset passes through Sign Extend to 32 bits; the ALU (ALU control input; overflow and zero outputs) computes the address for the Data Memory (Address, Write Data, Read Data). Control signals: RegWrite, MemWrite, MemRead.]
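The address computation for lw and sw can be sketched as a sign extension followed by an add. In the full datapath the add is done by the ALU; this standalone module (hypothetical name mem_addr) only illustrates the arithmetic.

```verilog
// Sketch: effective address for lw/sw = base + sign-extended offset.
// Name is an illustrative assumption; the real datapath reuses the ALU.
module mem_addr (
    input  [31:0] base,    // value of register rs
    input  [15:0] offset,  // instruction bits 15-0
    output [31:0] addr
);
    // Sign extension: replicate bit 15 into the upper 16 bits
    wire [31:0] offset_ext = {{16{offset[15]}}, offset};

    assign addr = base + offset_ext;
endmodule
```

With base = 32'h00001000 and offset = 16'hFFFC (that is, -4), addr is 32'h00000FFC, so negative offsets reach below the base register.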
Data path
Branch Instructions
• Read register operands
• Compare operands
• Use ALU, subtract and check Zero output
• Calculate target address
• Sign-extend displacement
• Shift left 2 places (word displacement)
• Add to PC + 4
• Already calculated by instruction fetch
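The branch steps above can be sketched as one small combinational block. The module name branch_unit and its ports are hypothetical; the subtract-and-test stands in for the ALU and its Zero output.

```verilog
// Sketch: beq target address and taken decision.
// Names are illustrative assumptions.
module branch_unit (
    input  [31:0] pc_plus4,  // already calculated by instruction fetch
    input  [15:0] offset,    // displacement field, in words
    input  [31:0] a,         // value of register rs
    input  [31:0] b,         // value of register rt
    input         branch,    // 1 when the instruction is beq
    output [31:0] target,
    output        taken
);
    // Sign-extend the displacement
    wire [31:0] offset_ext = {{16{offset[15]}}, offset};
    // Compare operands: subtract and check for zero
    wire        zero = ((a - b) == 32'd0);

    // Shift left 2 places (words to bytes) and add to PC + 4
    assign target = pc_plus4 + (offset_ext << 2);
    assign taken  = branch & zero;
endmodule
```

The next PC is then target when taken is 1 and pc_plus4 otherwise, which is one of the places a multiplexer is needed.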
Branch Instructions
[Figure: branch datapath, including the PC and the adder that computes PC + 4]
Fixed Program Counter with Offset Branching
[Figure: PC register with an Update input; the Branch? signal selects Branch Offset or 4 as the value added to PC to form New PC]
module program_counter (
    input update,
    input branch,
    input [15:0] branch_offset,
    input clk,
    input rst,
    output reg [31:0] pc
);
    parameter INCREMENT_AMOUNT = 32'd4;
    always @(posedge clk or posedge rst) begin
        if (rst)
            pc <= 0;
        else if (update) begin
            if (branch)
                // offset is zero-extended here, so only forward branches are possible
                pc <= pc + {16'd0, branch_offset};
            else
                pc <= pc + INCREMENT_AMOUNT;
        end
    end
endmodule
Composing the Elements
• First-cut data path does an instruction in one clock cycle
• Each datapath element can only do one function at a time
• Hence, we need separate instruction and data memories
• Use multiplexers where alternate data sources are used for different instructions
R-Type/Load/Store Datapath
Implementing Jumps
Jump:  op (31:26) | address (25:0)
• Jump uses word address
• Update PC with concatenation of
• Top 4 bits of old PC
• 26-bit jump address
• 00
• Need an extra control signal decoded from opcode
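The jump target is a pure concatenation, so it needs no arithmetic at all. A minimal sketch with a hypothetical module name (jump_target); it assumes that the "old PC" on the slide is the already-incremented PC + 4, as in the standard MIPS definition.

```verilog
// Sketch: jump target = {top 4 bits of PC, 26-bit address, 00}.
// Name is an illustrative assumption; pc_plus4 is assumed to be the
// "old PC" the slide refers to.
module jump_target (
    input  [31:0] pc_plus4,
    input  [25:0] address,   // instruction bits 25-0 (word address)
    output [31:0] target
);
    assign target = {pc_plus4[31:28], address, 2'b00};
endmodule
```

Appending 00 turns the word address into a byte address, and keeping the top 4 bits of the PC means a jump stays within the current 256 MB region.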
Datapath With Jumps Added
ALU Control
• ALU used for
• Load/Store: F = add
• Branch: F = subtract
• R-type: F depends on funct field

ALU control   Function
0000          AND
0001          OR
0010          add
0110          subtract
0111          set-on-less-than
1100          NOR
§4.4 A Simple Implementation Scheme
ALU Control
• Assume 2-bit ALUOp derived from opcode
• Combinational logic derives ALU control

opcode   ALUOp   Operation          funct    ALU function       ALU control
lw       00      load word          XXXXXX   add                0010
sw       00      store word         XXXXXX   add                0010
beq      01      branch equal       XXXXXX   subtract           0110
R-type   10      add                100000   add                0010
                 subtract           100010   subtract           0110
                 AND                100100   AND                0000
                 OR                 100101   OR                 0001
                 set-on-less-than   101010   set-on-less-than   0111
MIPS ALU in Verilog
• The ALU has 7 ports.
A Verilog behavioral definition of a MIPS ALU
Pseudo code
module MIPSALU (ALUctl, A, B, ALUOut, Zero);
  input [3:0] ALUctl;
  input [31:0] A, B;
  output reg [31:0] ALUOut;
  output Zero;
  assign Zero = (ALUOut == 0); // Zero is true if ALUOut is 0
  always @(ALUctl, A, B)       // reevaluate if these change
  begin
    case (ALUctl)
      0: ALUOut <= A & B;
      1: ALUOut <= A | B;
      2: ALUOut <= A + B;
      6: ALUOut <= A - B;
      7: ALUOut <= A < B ? 1 : 0; // set on less than (A and B compared as unsigned here)
      12: ALUOut <= ~(A | B);     // result is nor
      default: ALUOut <= 0;
    endcase
  end
endmodule
The MIPS ALU control
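This is combinational control logic. (Pseudo code)
module ALUControl (ALUOp, FuncCode, ALUCtl);
  input [1:0] ALUOp;
  input [5:0] FuncCode;
  output reg [3:0] ALUCtl;
  // Following the ALU control table: ALUOp picks add for lw/sw and
  // subtract for beq; FuncCode is decoded only for R-type (ALUOp = 10).
  always @(*)
    case (ALUOp)
      2'b00: ALUCtl <= 2; // lw, sw: add
      2'b01: ALUCtl <= 6; // beq: subtract
      default:
        case (FuncCode)
          32: ALUCtl <= 2;  // add
          34: ALUCtl <= 6;  // subtract
          36: ALUCtl <= 0;  // and
          37: ALUCtl <= 1;  // or
          39: ALUCtl <= 12; // nor
          42: ALUCtl <= 7;  // slt
          default: ALUCtl <= 15; // should not happen
        endcase
    endcase
endmodule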
The Main Control Unit
• Control signals derived from instruction

R-type:      opcode 0 (31:26)        | rs (25:21) | rt (20:16) | rd (15:11) | shamt (10:6) | funct (5:0)
Load/Store:  opcode 35 or 43 (31:26) | rs (25:21) | rt (20:16) | address (15:0)
Branch:      opcode 4 (31:26)        | rs (25:21) | rt (20:16) | address (15:0)

• opcode: bits 31:26 in every format
• rs: always read
• rt: read, except for load
• rd (R-type) or rt (load): write for R-type and load
• address: sign-extend and add
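A rough sketch of how a main control unit could derive signals from the opcode is given below. The opcode values (0, 35, 43, 4), the 2-bit ALUOp encoding, and the signal names RegWrite, MemRead and MemWrite appear in these slides; the module name, the lowercase port names, the branch output, and the omission of the multiplexer select signals are assumptions made to keep the example short.

```verilog
// Sketch: opcode decoder for R-type, lw, sw and beq only.
// Module name, port naming and the reduced signal set are assumptions.
module main_control (
    input      [5:0] opcode,     // instruction bits 31:26
    output reg       reg_write,  // RegWrite: write for R-type and load
    output reg       mem_read,   // MemRead:  load
    output reg       mem_write,  // MemWrite: store
    output reg       branch,     // asserted for beq
    output reg [1:0] alu_op      // ALUOp: 00 add, 01 subtract, 10 use funct
);
    always @(*) begin
        // Defaults: change no state, ALU adds
        reg_write = 1'b0;
        mem_read  = 1'b0;
        mem_write = 1'b0;
        branch    = 1'b0;
        alu_op    = 2'b00;
        case (opcode)
            6'd0:  begin reg_write = 1'b1; alu_op = 2'b10; end   // R-type
            6'd35: begin reg_write = 1'b1; mem_read = 1'b1; end  // lw
            6'd43: begin mem_write = 1'b1; end                   // sw
            6'd4:  begin branch = 1'b1; alu_op = 2'b01; end      // beq
            default: ; // other opcodes are not handled in this sketch
        endcase
    end
endmodule
```

Assigning defaults at the top of the always block keeps the logic purely combinational, since every output gets a value on every path.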