Computer Architecture ELEC3441
Lecture 4 – Single Cycle Processor
Dr. Hayden Kwok-Hay So
Department of Electrical and Electronic Engineering
Overview
n First implementation of the RISC-V ISA in this course
• More variations to come…
n Single cycle processor:
• Each instruction takes 1 cycle to complete
n Idealized memory
• Instantaneous read
• Single cycle write
n Implements base RV32
HKU EEE ENGG3441 – HS 2
Full RISCV1Stage Datapath (HW1)
[Figure: full single-stage RISC-V datapath. Labels recoverable from the diagram: PC, PC+4, Instruction Mem, Decoder, Control Signals, Reg File (rs1 from ir[19:15], rs2 from ir[24:20], wa from ir[11:7], wd, en from rf_wen), IType Sign Extend (ir[31:20]), BType Sign Extend (ir[31], ir[7], ir[30:25], ir[11:8]), Op1Sel, Op2Sel, ALU (AluFun), Branch Cond Gen (br_eq?, br_lt?, br_ltu?), BrJmp TargGen, JumpReg TargGen, pc_sel (pc+4, br/jmp, jalr), Data Mem (addr, wdata, rdata, mem_rw, mem_val), wb_sel, control status registers (cpr_en, tohost, testrig_tohost), Execute Stage]
RV32I RISC-V v2.1 specification
Hardware Elements
n Combinational circuits
• Mux, Decoder, ALU, …
[Figure: a mux selecting one of inputs A0 … An-1 with a lg(n)-bit Sel; a decoder driving one of outputs O0 … On-1 from a lg(n)-bit input A; an ALU with inputs A and B, an OpSelect control (Add, Sub, …; And, Or, Xor, Not, …; GT, LT, EQ, Zero, …) and outputs Result and Comp?]
n Synchronous state elements
• Flipflop, Register, Register file, SRAM, DRAM
[Figure: D flip-flop (ff) with inputs D, En and Clk and output Q]
Edge-triggered: Data is sampled at the rising edge
Register
[Figure: n-bit register built from n flip-flops (ff) in parallel, with inputs D0 … Dn-1, outputs Q0 … Qn-1, and shared En and Clk]
n An n-bit register can be constructed by combining n DFFs in parallel
n Each DFF is responsible for read/write of 1 bit of the input
n Shared control signals:
• Clock, reset, enable, …
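The construction above can be sketched in software. This is only a behavioural model, not the course's hardware description: the class name `Register` and the method `rising_edge` are made up for illustration, and reset is left out.

```python
class Register:
    """n-bit register modelled as n D flip-flops sharing clock and enable."""

    def __init__(self, width):
        self.width = width
        self.q = [0] * width  # one stored bit per flip-flop

    def rising_edge(self, d, en):
        """Sample the input bits at a rising clock edge if enable is set."""
        if len(d) != self.width:
            raise ValueError("input must supply one bit per flip-flop")
        if en:
            self.q = list(d)  # every flip-flop captures its own bit
        return self.q
```

With `en` at 0 the stored value is held; with `en` at 1 all n bits are captured at once, which is what sharing the control signals means here.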
Register File
n Reads are combinational
• rd=regfile(ra) in the same cycle
n Writes take place at the rising clock edge
• Write only takes place if WE=1 at clock edge
n Read address (ra) and write address (wa) choose which register to read/write
n RISCV needs a register file with 2 read ports and 1 write port
n What happens in these cases?
• ra1=ra2
• ra1=wa
[Figure: register file (2R+1W) with inputs Clock, WE (we), ReadAddr1 (ra1), ReadAddr2 (ra2), WriteAddr (wa), WriteData (wd) and outputs ReadData1 (rd1), ReadData2 (rd2)]
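A minimal behavioural sketch of a 2R+1W register file, assuming the port names used on this slide. The class `RegFile` and its methods are hypothetical, and the special treatment of register x0 in RISC-V is deliberately left out.

```python
class RegFile:
    """Register file with two combinational read ports and one clocked write port."""

    def __init__(self, nregs=32, width=32):
        self.mask = (1 << width) - 1
        self.regs = [0] * nregs

    def read(self, ra1, ra2):
        """Combinational read: returns (rd1, rd2) for the current state."""
        return self.regs[ra1], self.regs[ra2]

    def rising_edge(self, we, wa, wd):
        """The write only takes effect at the clock edge, and only if we == 1."""
        if we:
            self.regs[wa] = wd & self.mask
```

In this model, reading with ra1 = ra2 simply returns the same value on both ports, and reading with ra1 = wa before the clock edge returns the old value, because the write has not happened yet.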
Register File Implementation
n 2R + 1W, 32 registers, each 32 bits wide
n Decoder selects 1 out of 32 registers depending on wa, ra1, ra2
n On writes:
• Only 1 of the 32 registers has WE=1
n On reads:
• Only 1 of the 32 registers may output to each rd{1,2} bus
• The same register may output to both rd1 and rd2 bus (e.g. rs1 = rs2)
[Figure: registers reg 0, reg 1, …, reg 31 sharing clk and the 32-bit wd input; a Decoder driven by the 5-bit wa and we selects the register to write; the 5-bit ra1 and ra2 select which register drives the 32-bit rd1 and rd2 buses]
A Simple Memory Model
[Figure: MAGIC RAM block with inputs Clock, WriteEnable, Address (32 bits), WriteData (32 bits) and output ReadData (32 bits)]
Reads and writes are always completed in one cycle
• a Read can be done any time (i.e. combinational)
• a Write is performed at the rising clock edge if it is enabled
⟹ the write address and data must be stable at the clock edge
Later in the course we will present a more realistic model of memory
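The idealized memory can be sketched in the same behavioural style. The class name `MagicRam` and the choice that unwritten locations read as 0 are assumptions made for this example only.

```python
class MagicRam:
    """Idealized memory: combinational read, write at the rising clock edge."""

    def __init__(self):
        self.words = {}  # sparse storage: address -> 32-bit value

    def read(self, addr):
        """Combinational read; unwritten locations read as 0 (an assumption)."""
        return self.words.get(addr, 0)

    def rising_edge(self, write_enable, addr, write_data):
        """The write happens only at the clock edge and only if enabled."""
        if write_enable:
            self.words[addr] = write_data & 0xFFFFFFFF
```

Since `rising_edge` uses whatever address and data it is given at that moment, the model reflects why both must be stable at the clock edge.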
Instruction Execution
Execution of a RISC-V instruction involves:
1. instruction fetch
2. decode and register fetch
3. ALU operation
4. memory operation (optional)
5. write back to register file (optional)
+ the computation of the next instruction address
Datapath: Reg-Reg ALU Instructions
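rd ← (rs1) func (rs2)
[Figure: datapath for register-register ALU instructions. PC (clk) feeds Inst. Memory (addr, inst) and an adder that adds 0x4; Inst<19:15> and Inst<24:20> drive the GPR read ports ra1 and ra2, Inst<11:7> drives wa; rd1 and rd2 feed the ALU, whose result goes to wd; ALU Control takes OpCode and Inst<31:25,14:12>; RegWriteEn drives we. Annotation: "RegWrite Timing?"]
R-type format:
  bits   31:25    24:20  19:15  14:12         11:7  6:0
  field  funct7   rs2    rs1    funct3        rd    opcode
  width  7        5      5      3             5     7
         0000000  src2   src1   ADD/SLT/SLTU  dest  OP
         0000000  src2   src1   AND/OR/XOR    dest  OP
         0000000  src2   src1   SLL/SRL       dest  OP
         0100000  src2   src1   SUB/SRA       dest  OP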
Datapath: Reg-Imm ALU Instructions
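rd ← (rs1) op imm
[Figure: datapath for register-immediate ALU instructions. PC (clk) feeds Inst. Memory (addr, inst) and an adder that adds 0x4; inst<19:15> drives GPR read port ra1, inst<11:7> drives wa; inst<31:20> passes through Sign Ext to the second ALU input; ALU Control takes OpCode and inst<14:12>; the ALU result goes to wd; RegWriteEn drives we]
I-type format:
  bits   31:20              19:15  14:12           11:7  6:0
  field  imm[11:0]          rs1    funct3          rd    opcode
  width  12                 5      3               5     7
         I-immediate[11:0]  src    ADDI/SLTI[U]    dest  OP-IMM
         I-immediate[11:0]  src    ANDI/ORI/XORI   dest  OP-IMM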
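The Sign Ext block on this slide can be sketched as follows. The helper names (`sign_extend`, `i_immediate`, `addi`) are hypothetical, and register values are modelled as unsigned 32-bit integers.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def i_immediate(inst):
    """I-type immediate: inst<31:20>, sign-extended."""
    return sign_extend(inst >> 20, 12)


def addi(rs1_value, inst):
    """rd <- (rs1) + imm, wrapped to 32 bits."""
    return (rs1_value + i_immediate(inst)) & MASK32
```

For example, the encoding 0xFFF00093 (ADDI x1, x0, -1) carries the immediate field 0xFFF, which sign-extends to -1.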
Conflicts in Merging Datapath
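Introduce muxes
[Figure: the Reg-Reg and Reg-Imm datapaths overlaid. PC (clk), adder with 0x4, Inst. Memory (addr, inst), GPRs (ra1 from <19:15>, ra2 from <24:20>, wa from <11:7>, we from RegWriteEn), Sign Ext on <31:20>, ALU, and ALU Control fed by OpCode with either <31:25,14:12> or <14:12>. The second ALU input and the ALU Control inputs conflict between the two datapaths]
Instruction formats (bit boundaries 31, 25 | 24, 20 | 19, 15 | 14, 12 | 11, 7 | 6, 0):
  R-type  funct7     rs2  rs1  funct3  rd        opcode
  I-type  imm[11:0]       rs1  funct3  rd        opcode
  S-type  imm[11:5]  rs2  rs1  funct3  imm[4:0]  opcode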
ALU Op2 Select
If OPCODE==“OP” then op2sel = ‘1’ else ‘0’
[Figure: merged ALU datapath with a mux on the second ALU input, controlled by op2Sel (Reg / Imm), choosing between rd2 and the sign-extended <31:20>. Other elements unchanged: PC (clk), adder with 0x4, Inst. Memory, GPRs (<19:15>, <24:20>, <11:7>, RegWriteEn), ALU Control (OpCode). The R-type, I-type and S-type formats are shown beneath the datapath]
Quick Quiz
If OPCODE==“OP” then op2sel = ‘1’ else ‘0’
n How do you implement op2sel in hardware?
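[Figure: two candidate circuits. (1) a mux with constant inputs 1 and 0 producing op2sel, selected by comparing opcode with OP; (2) an equality comparator (=?) on opcode and OP driving op2sel directly]
How do you implement this? (the =? comparator)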
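One possible answer to the quiz, sketched at bit level: a 7-bit equality comparator can be built from one XNOR per bit followed by an AND of all the results. The function below is an illustration of that idea, not the slide's official solution.

```python
OP = 0b0110011  # opcode of register-register ALU instructions


def op2sel(opcode):
    """Return 1 if opcode == OP, using XNOR per bit and an AND reduction."""
    equal = 1
    for i in range(7):
        a = (opcode >> i) & 1
        b = (OP >> i) & 1
        equal &= 1 ^ (a ^ b)  # XNOR of the two bits
    return equal
```

Because OP is a constant, each XNOR reduces to either a wire or an inverter, so in hardware this collapses to a single 7-input AND with some inputs inverted.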
Determining ALU functions
Encodings (fields listed from bit 31 down to bit 0):
  imm[11:5]  rs2    rs1  010  imm[4:0]  0100011   SW   rs1,rs2,imm
  imm[11:0]         rs1  000  rd        0010011   ADDI rd,rs1,imm
  imm[11:0]         rs1  110  rd        0010011   ORI  rd,rs1,imm
  imm[11:0]         rs1  010  rd        0010011   SLTI rd,rs1,imm
  imm[11:0]         rs1  111  rd        0010011   ANDI rd,rs1,imm
  000000     shamt  rs1  001  rd        0010011   SLLI rd,rs1,shamt
  000000     shamt  rs1  101  rd        0010011   SRLI rd,rs1,shamt
  010000     shamt  rs1  101  rd        0010011   SRAI rd,rs1,shamt
  0000000    rs2    rs1  000  rd        0110011   ADD  rd,rs1,rs2
  0100000    rs2    rs1  000  rd        0110011   SUB  rd,rs1,rs2
  0000000    rs2    rs1  001  rd        0110011   SLL  rd,rs1,rs2
n All basic integer R-R instructions have opcode = OP (“0110011”)
• Only funct3 and funct7 are needed to determine the ALU function
• E.g. funct3 000 → Add, 001 → ShiftLeft, 100 → XOR; funct7 0100000 → Sub, …
n Immediate instructions require the same ALU functions, but have a slightly different encoding
• ADDI is the same as ADD, except there is no need to check for Sub in funct7
• opcode = OP-IMM (“0010011”)
n Need opcode to help determine ALU function
• More cases like these come up later…
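A hedged sketch of the ALU Control decision described above. The function name `alu_func` and the string labels are invented for illustration; the funct3 values are those of the RV32I base ISA, and `bit30` stands for instruction bit 30, the bit that separates ADD from SUB and SRL from SRA.

```python
OP = 0b0110011
OP_IMM = 0b0010011

FUNCT3_TABLE = {
    0b000: "ADD", 0b001: "SLL", 0b010: "SLT", 0b011: "SLTU",
    0b100: "XOR", 0b101: "SRL", 0b110: "OR", 0b111: "AND",
}


def alu_func(opcode, funct3, bit30):
    """Pick the ALU function for OP and OP-IMM instructions."""
    if opcode not in (OP, OP_IMM):
        raise ValueError("only OP and OP-IMM are handled in this sketch")
    if funct3 == 0b101 and bit30:
        return "SRA"  # SRA and SRAI share this encoding bit
    if funct3 == 0b000 and bit30 and opcode == OP:
        return "SUB"  # for ADDI, bit 30 is part of the immediate
    return FUNCT3_TABLE[funct3]
```

The `opcode == OP` test in the SUB case is the reason the opcode is needed: for ADDI a negative immediate sets bit 30, and that must not turn the addition into a subtraction.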
ALU Instructions Datapath
[Figure: combined datapath for ALU instructions. PC (clk), adder with 0x4, Inst. Memory (addr, inst), GPRs (rs1 from <19:15>, rs2 from <24:20>, wa from <11:7>, we from RegWriteEn), Sign Ext on <31:20>, Op2Sel mux (Reg / Imm) on the second ALU input, ALU, and ALU Control fed by OpCode and <30,14:12,6:0>. The R-type, I-type and S-type formats are shown beneath the datapath]
Load Instructions
Load: (dest) ← M[(base) + offset]
I-type format:
  imm[11:0]     rs1   f3     rd    opcode
  offset[11:0]  base  width  dest  LOAD
n Use ALU for address calculation
n Mux to select data for regfile: mem or ALU
[Figure: ALU datapath extended with Data Memory (we from MemWrite, addr from the ALU, wdata, rdata) and a WBSel mux (ALU / Mem) in front of wd. base comes from rd1, offset from Sign Ext, dest from <11:7>. Other elements: PC (clk), adder with 0x4, Inst. Memory, GPRs (RegWriteEn), ALU Control (OpCode), Op2Sel]
Store Instructions
Store: M[(base) + offset] ← (src)
S-type format:
  imm[11:5]     rs2  rs1   f3     imm[4:0]     opcode
  offset[11:5]  src  base  width  offset[4:0]  STORE
n Also use ALU for address calculation
n No need to write back to regfile
n Need to tell memory it is a write ⇒ Set MemWrite to ‘1’
[Figure: same datapath as for loads. base comes from rd1, offset from Sign Ext, and rd2 drives the Data Memory wdata input; MemWrite drives the Data Memory we. Other elements: PC (clk), adder with 0x4, Inst. Memory, GPRs (RegWriteEn), ALU, ALU Control (OpCode), Op2Sel, WBSel (ALU / Mem)]
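The store path can be sketched as below, assuming word-sized stores only and a Python dict standing in for Data Memory. The function names are hypothetical. The split S-type immediate is reassembled from inst<31:25> and inst<11:7> before the ALU adds it to the base register.

```python
MASK32 = 0xFFFFFFFF


def s_immediate(inst):
    """S-type immediate: imm[11:5] = inst<31:25>, imm[4:0] = inst<11:7>."""
    imm = (((inst >> 25) & 0x7F) << 5) | ((inst >> 7) & 0x1F)
    return imm - (1 << 12) if imm & (1 << 11) else imm


def store_word(memory, regs, inst):
    """M[(base) + offset] <- (src); returns the effective address."""
    rs1 = (inst >> 15) & 0x1F  # base register
    rs2 = (inst >> 20) & 0x1F  # source register
    addr = (regs[rs1] + s_immediate(inst)) & MASK32  # ALU performs the add
    memory[addr] = regs[rs2] & MASK32  # MemWrite = 1, RegWriteEn = 0
    return addr
```

A load would use the same address calculation with the I-type immediate, then route rdata through the WBSel mux instead of writing memory.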
RISC-V Conditional Branches
SB-type format:
  imm[12]  imm[10:5]  rs2   rs1   funct3   imm[4:1]  imm[11]  opcode
  offset[12,10:5]     src2  src1  BEQ/BNE  offset[11,4:1]     BRANCH
                                  BLT[U]
                                  BGE[U]
n Requires:
if (rs1 BR_OP rs2) then jump to PC + branch_imm
• 1. Logic to compare register values (rs1 and rs2)
• 2. Datapath to calculate branch target address relative to PC
n Current implementation: dedicated logic for both 1 and 2
• Dedicated comparison logic (=, <, [≠, ≥])
• Dedicated adder for jump target calculation
n May use ALU for (2) above
• Performance tradeoff...
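The two requirements can be sketched as follows. The names `branch_taken`, `b_immediate` and `branch_next_pc` are hypothetical; the funct3 values are those of the RV32I base ISA; register values are modelled as unsigned 32-bit integers.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Reinterpret a 32-bit value as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def branch_taken(funct3, rs1, rs2):
    """Comparison logic: only =, < and unsigned < are computed; the rest are negations."""
    br_eq = (rs1 & MASK32) == (rs2 & MASK32)
    br_lt = to_signed(rs1) < to_signed(rs2)
    br_ltu = (rs1 & MASK32) < (rs2 & MASK32)
    return {
        0b000: br_eq, 0b001: not br_eq,    # BEQ, BNE
        0b100: br_lt, 0b101: not br_lt,    # BLT, BGE
        0b110: br_ltu, 0b111: not br_ltu,  # BLTU, BGEU
    }[funct3]


def b_immediate(inst):
    """Reassemble imm[12|10:5] and imm[4:1|11]; bit 0 is always zero."""
    imm = ((((inst >> 31) & 1) << 12) | (((inst >> 7) & 1) << 11)
           | (((inst >> 25) & 0x3F) << 5) | (((inst >> 8) & 0xF) << 1))
    return imm - (1 << 13) if imm & (1 << 12) else imm


def branch_next_pc(pc, inst, rs1, rs2):
    """Dedicated adder computes PC + branch_imm; otherwise fall through to PC + 4."""
    funct3 = (inst >> 12) & 0x7
    offset = b_immediate(inst) if branch_taken(funct3, rs1, rs2) else 4
    return (pc + offset) & MASK32
```

The three signals `br_eq`, `br_lt` and `br_ltu` match the comparator outputs named in the full datapath figure.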
Conditional Branches (BEQ/BNE/BLT/BGE/BLTU/BGEU)
[Figure: datapath with branch support. A PCSel mux in front of PC chooses between pc+4 and br; a dedicated adder (Add) computes PC + Branch Imm; Br Logic produces Bcomp? from the register outputs rd1 and rd2. Other elements: adder with 0x4, Inst. Memory, GPRs (RegWrEn), ALU, ALU Control (OpCode), Op2Sel, Data Memory (MemWrite), WBSel]
RISC-V Unconditional JAL
jump to PC + j_imm; rd ← PC+4
UJ-type format:
  imm[20]  imm[10:1]  imm[11]  imm[19:12]  rd    opcode
  offset[20:1]                             dest  JAL
[Figure: datapath with JAL support. The PCSel mux chooses between pc+4 and brjmp; the target adder (Add) takes PC plus either Branch Imm or Jump Imm. Other elements: adder with 0x4, Inst. Memory, GPRs (RegWrEn), Br Logic (Bcomp?), ALU, ALU Control (OpCode), Op2Sel, Data Memory (MemWrite), WBSel]
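A sketch of the JAL behaviour, assuming the scrambled immediate layout of the format row above. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def j_immediate(inst):
    """Reassemble imm[20|10:1|11|19:12]; bit 0 is always zero."""
    imm = ((((inst >> 31) & 1) << 20) | (((inst >> 12) & 0xFF) << 12)
           | (((inst >> 20) & 1) << 11) | (((inst >> 21) & 0x3FF) << 1))
    return imm - (1 << 21) if imm & (1 << 20) else imm


def jal(pc, inst):
    """Return (jump target, link value): jump to PC + j_imm; rd <- PC + 4."""
    target = (pc + j_immediate(inst)) & MASK32
    link = (pc + 4) & MASK32
    return target, link
```

The link value is why the control table lists PC+4 as a write-back source for JAL.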
JALR
jump to imm + (rs1); rd ← PC+4
I-type format:
  imm[11:0]     rs1   f3   rd    opcode
  offset[11:0]  base  000  dest  JALR
[Figure: datapath with JALR support. The PCSel mux chooses between pc+4, brjmp and jmpreg; the jmpreg target is (rs1) plus the sign-extended offset. Other elements: adder with 0x4, branch/jump target adder (Add), Inst. Memory, GPRs (RegWrEn), Br Logic (Bcomp?), Sign Ext, ALU, ALU Control (OpCode), Op2Sel, Data Memory (MemWrite), WBSel]
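The three-way PCSel mux can be sketched as a single function. The name `next_pc` and the string encoding of the select signal are assumptions for this example. Clearing the least-significant bit of the JALR target is not shown on the slide; it follows the RISC-V ISA definition of JALR.

```python
MASK32 = 0xFFFFFFFF


def next_pc(pc_sel, pc, brjmp_target, rs1_value, i_imm):
    """Model of the PCSel mux: pc+4, brjmp (branch or JAL), or jmpreg (JALR)."""
    if pc_sel == "pc+4":
        return (pc + 4) & MASK32
    if pc_sel == "brjmp":
        return brjmp_target & MASK32
    if pc_sel == "jmpreg":
        # jump to imm + (rs1), with bit 0 forced to zero per the ISA
        return (rs1_value + i_imm) & MASK32 & ~1
    raise ValueError("unknown pc_sel value")
```

All three candidate addresses are computed in parallel in hardware; the control logic only picks one.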
Hardwired Control is pure Combinational Logic
[Figure: a combinational logic block with inputs op code and Bcomp? and outputs Op2Sel, AluFunc, MemWrite, WBSel, RegWriteEn and PCSel]
n Decoding the instruction determines the settings of the various muxes and the ALU function
n Simple decoding helps to make faster hardware
Hardwired Control Table (Excerpt)
  Instruction  Op2Sel  AluFunc  WBSel  RegWriteEn  MemWrite  PCSel
  ADD          RS2     ADD      ALU    T           F         PC+4
  SUB          RS2     SUB      ALU    T           F         PC+4
  ADDI         IMI     ADD      ALU    T           F         PC+4
  SLL          RS2     SLL      ALU    T           F         PC+4
  LW           IMI     ADD      MEM    T           F         PC+4
  SW           IMS     ADD      X      F           T         PC+4
  BEQ          IMB     X        X      F           F         PC+4/BA
  JAL          IMJ     X        PC+4   T           F         JA
  JALR         IMI     X        PC+4   T           F         JRA
• Op2Sel: rs2, or {I,S,B,J}-type immediate IM{I, S, B, J}
• AluFunc: Add, Sub, Shift, XOR, etc
• WBSel: what values to write to rd
• X: don't care
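Because hardwired control is pure combinational logic, it can be modelled as a lookup table. The sketch below transcribes the rows of the excerpt; the `Control` tuple, the `decode` function and the use of instruction mnemonics as keys (instead of opcode bits) are simplifications made for this example.

```python
from typing import NamedTuple


class Control(NamedTuple):
    op2sel: str
    alufunc: str
    wbsel: str
    regwrite: bool
    memwrite: bool
    pcsel: str


CONTROL = {
    "ADD":  Control("RS2", "ADD", "ALU",  True,  False, "PC+4"),
    "SUB":  Control("RS2", "SUB", "ALU",  True,  False, "PC+4"),
    "ADDI": Control("IMI", "ADD", "ALU",  True,  False, "PC+4"),
    "SLL":  Control("RS2", "SLL", "ALU",  True,  False, "PC+4"),
    "LW":   Control("IMI", "ADD", "MEM",  True,  False, "PC+4"),
    "SW":   Control("IMS", "ADD", "X",    False, True,  "PC+4"),
    "BEQ":  Control("IMB", "X",   "X",    False, False, "PC+4/BA"),
    "JAL":  Control("IMJ", "X",   "PC+4", True,  False, "JA"),
    "JALR": Control("IMI", "X",   "PC+4", True,  False, "JRA"),
}


def decode(name, bcomp=False):
    """Look up the control signals; Bcomp? resolves the branch's PCSel."""
    ctrl = CONTROL[name]
    if name == "BEQ":
        ctrl = ctrl._replace(pcsel="BA" if bcomp else "PC+4")
    return ctrl
```

Only the branch row depends on a second input (Bcomp?); every other row is a function of the instruction alone.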
Single-Cycle Hardwired Control
We will assume clock period is sufficiently long for all of the following steps to be “completed”:
1. Instruction fetch
2. Decode and register fetch
3. ALU operation
4. Data fetch if required
5. Register write-back setup time
⟹ $t_C > t_{IFetch} + t_{RFetch} + t_{ALU} + t_{DMem} + t_{RWB}$
At the rising edge of the following clock, the PC, register file and memory are updated
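A small worked example of the clock period constraint. The delay values below are purely hypothetical numbers chosen for illustration; they do not come from the lecture or from any real implementation.

```python
def min_clock_period(t_ifetch, t_rfetch, t_alu, t_dmem, t_rwb):
    """Lower bound on the clock period: every step must fit in one cycle."""
    return t_ifetch + t_rfetch + t_alu + t_dmem + t_rwb


# Hypothetical delays in nanoseconds (assumed, not measured)
delays_ns = {"t_ifetch": 2, "t_rfetch": 1, "t_alu": 2, "t_dmem": 2, "t_rwb": 1}
bound_ns = min_clock_period(**delays_ns)
```

With these assumed values the clock period must exceed 8 ns, even for instructions such as ADD that never touch data memory, because a single-cycle design sizes the clock for the slowest instruction.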
Full RISCV1Stage Datapath (HW1)
[Figure: the full single-stage RISC-V datapath, repeated from the start of the lecture]
Acknowledgements
n These slides contain material developed and copyrighted by:
• Arvind (MIT)
• Krste Asanovic (MIT/UCB)
• Joel Emer (Intel/MIT)
• James Hoe (CMU)
• John Kubiatowicz (UCB)
• David Patterson (UCB)
n MIT material derived from course 6.823
n UCB material derived from course CS152, CS252