Memory Allocation III CSE 351 Autumn 2016
Roadmap
1
car *c = malloc(sizeof(car));
c->miles = 100;
c->gals = 17;
float mpg = get_mpg(c);
free(c);
Car c = new Car();
c.setMiles(100);
c.setGals(17);
float mpg =
c.getMPG();
Java:
C:
Assembly language:
Machine code:
0111010000011000
100011010000010000000010
1000100111000010
110000011111101000011111
Computer system:
OS:
Memory & data
Arrays & structs
Integers & floats
RISC V assembly
Procedures & stacks
Executables
Memory & caches
Processor Pipeline
Performance
Parallelism
CMPT 295
L27: Datapath
1
RISC-V CPU Datapath, Control Intro
CMPT 295
L27: Datapath
Podcast link: https://gimletmedia.com/tags/6dug/super-tech-support
2
Putting it together!
3
High level languages (ex. C) become machine language through compilation, assembly, and linking.
Registers, clock circuits, gates, and other logic devices are the fundamental building blocks of digital decision- making
CPU
CMPT 295
L27: Datapath
3
Today’s goal:
4
Create a “circuit” of logic elements that, when given an assembly instruction, perform the action the instruction describes
A bunch of logic stuff
010111010100010101..1
Some RISC-V Instruction in binary (let’s say the instruction is:
addi t0 x0 6)
Register t0 now has the value 6! The addi was performed!
CMPT 295
L27: Datapath
4
Today’s goal:
5
Create a “circuit” of logic elements that, when given an assembly instruction, perform the action the instruction describes
A bunch of logic stuff
010111010100010101..1
Some RISC-V Instruction in binary (let’s say the instruction is:
addi t0 x0 6)
Register t0 now has the value 6! The addi was performed!
We’ll call the logic stuff your CPU: Central Processing Unit
CMPT 295
L27: Datapath
5
Agenda
What’s a CPU?
Building from what we know
Our CPU
Processor Design Principles
6
CMPT 295
L27: Datapath
Your CPU in two parts
Central Processing Unit (CPU):
Datapath: contains the hardware necessary to perform operations required by the processor
Reacts to what the controller tells it! (ie. “I was told to do an add, so I”ll feed these arguments through an adder)
Control: decides what each piece of the datapath should do
What operation am I performing? Do I need to get info from memory? Should I write to a register? Which register?
Has to make decisions based on the input instruction only!
7
CMPT 295
L27: Datapath
Your CPU in two parts
8
010111010100010101..1
Some RISC-V Instruction in binary (let’s say the instruction is:
addi t0 x0 6)
Control
What’s our operation? What’s rs1/rs2/rd/imm? Is this a branch? …
Datapath
I was told our operands are the value in the 0 register, and the assembled immediate “6”. I was told to perform an add, so I’ll feed these two arguments into an adder!
I was told rd is t0, so I’ll store the adder’s output in register t0.
Done~!
CMPT 295
L27: Datapath
8
Designing our Datapath: Where to start?
Let’s start with a broad question:
What operations does our datapath need to be capable of performing?
And also maybe a more specific one:
How can we ensure, when we build this, that all RISC-V instructions will be supported?
9
CMPT 295
L27: Datapath
9
Designing our Datapath: Where to start?
6 different formats: R, I , S, SB, U, UJ
Arithmetic, Immediate, Store, Branch, Upper-immediate, JAL
Instructions are classified into these formats based on their behaviours, meaning each type does something a little different!
If we’re building a CPU to run /all/ instructions, we’ll need to figure out what functionalities each type needs → support them all!
10
CMPT 295
L27: Datapath
10
Agenda
What’s a CPU?
Building from what we know
Our CPU
Processor Design Principles
11
CMPT 295
L27: Datapath
Working with an ‘R-type’…
Work with the people around you. What needs to happen for an ‘R-type’ inst to execute?
Wanna work with an example? Use:
add t0 t2 t3
Come up with a list of actions…
12
CMPT 295
L27: Datapath
12
Working with an ‘R-type’
What needs to be done before our R type is executed or evaluated?
(1) Get the instruction
(2) Parse instruction fields (rd, rs1, rs2, operation…)
(3) Read data based on parsed operands
(4) Perform operation
(5) Write result to our destination register
13
CMPT 295
L27: Datapath
Working with an ‘R-type’
(1) Get the instruction
add t0 t2 t3
(2) Parse instruction fields (rd, rs1, rs2, operation…)
rd = t0 rs1 = t2 rs2 = t3
(3) Read data based on parsed operands
R[t2] R[t3]
(4) Perform operation
R[t2] + R[t3]
(5) Write result to our destination register
R[t0] = R[t2] + R[t3]
14
CMPT 295
L27: Datapath
14
Implementing R-Types
15
IMEM
+4
pc
Control
(1) Get the instruction
PC holds the address of our current instruction
Where are the bits making up our inst stored?
How does PC change after an R-Type is executed?
CMPT 295
L27: Datapath
15
Implementing R-Types
16
IMEM
+4
pc
inst[11:7]
inst[19:15]
inst[24:20]
Control
(2) Parse instruction fields (rd, rs1, rs2, operation…)
What registers are we operating on?
Where do they lie in our instruction format?
How big is each field?
CMPT 295
L27: Datapath
16
Implementing R-Types
17
IMEM
+4
pc
inst[11:7]
inst[19:15]
inst[24:20]
Control
(3) Read data based on parsed operands
New hardware: Register File
Abstraction for all our registers (x0..x31 minus PC) and some mux’ing
Reading, Writing happens here
Reg[]
AddrA
AddrB
DataA
AddrD
DataB
R[rs1]
R[rs2]
CMPT 295
L27: Datapath
17
Storage Element: Register File
Register File consists of 31 registers:
Output ports portA and portB
Input port portW
Register selection
Place data of register RA (number) onto portA
Place data of register RB (number) onto portB
Store data on portW into register RW (number) when Write Enable is 1
Clock input (CLK)
CLK is passed to all internal registers so they can be written to if they match RW and Write Enable is 1
18
Clk
portW
Write Enable
32
32
portA
32
portB
5
5
5
RW
RA
RB
32 x 32-bit
Registers
CMPT 295
L27: Datapath
Implementing R-Types
19
IMEM
+4
ALU
pc
inst[11:7]
inst[19:15]
inst[24:20]
Control
(4) Perform operation
New hardware: ALU (Arithmetic Logic Unit)
Abstraction for adders, multipliers, dividers, etc.
How do we know what operation to execute?
Our first control bit! ALUSel(ect)
Reg[]
AddrA
AddrB
DataA
AddrD
DataB
R[rs1]
R[rs2]
ALUSel
inst[31:0]
CMPT 295
L27: Datapath
19
But wait! There are many R-Type operations!
ALU == “Arithmetic Logic Unit”
ALUSel is a control bit which encodes the operation we should perform on the given operands
The value of ALUSel is a mapping from func3 and func7 values to operations
“if func3 == 000 and func7 == 0000000, perform an add”
Multiple func3, func7 combinations might lead to the same operation! (ie. add and addi)
20
CMPT 295
L27: Datapath
Implementing R-Types
21
IMEM
+4
ALU
pc
inst[11:7]
inst[19:15]
inst[24:20]
Control
(5) Write result to our destination register
The data we want to write is the result of computing operation on operands, ie. the output from our ALU
Send it back to the regfile for writing
Reg[]
AddrA
AddrB
DataA
AddrD
DataB
DataD
wb
R[rs1]
R[rs2]
ALUSel
inst[31:0]
CMPT 295
L27: Datapath
21
What changes with an arithmetic ‘I-type’?
A: Get the instruction
B: Parse instruction fields (rd, rs1, rs2)
C: Read data based on parsed operands
D: Perform operation
E: Write result to our destination register
Think: Do we need to add more hardware, more control, or more of both?
22
CMPT 295
L27: Datapath
22
A: Get the instruction
B: Parse instruction fields (rd, rs1, rs2, operation…)
We also need to parse (and reassemble) our immediate!
C: Read data based on parsed operands
Also an okay answer!
D: Perform operation
E: Write result to our destination register
23
What changes with an arithmetic ‘I-type’?
CMPT 295
L27: Datapath
23
Implementing the addi instruction
RISC-V Assembly Instruction:
addi x15,x1,-50
24
111111001110
00001
000
01111
0010011
OP-Imm
rd=15
ADD
imm=-50
rs1=1
CMPT 295
L27: Datapath
Control Logic
Datapath for add/sub
25
+4
pc
pc+4
inst[11:7]
inst[19:15]
inst[24:20]
IMEM
inst[31:0]
RegWEn
(1=write, 0=no write)
Reg[]
AddrA
AddrB
DataA
AddrD
DataB
DataD
Reg[rs1]
Reg[rs2]
alu
ALU
ALUSel
(Add=0/Sub=1)
CMPT 295
L27: Datapath
25
Control Logic
Adding addi to datapath
26
+4
pc
pc+4
inst[11:7]
inst[19:15]
inst[24:20]
IMEM
inst[31:0]
Reg[]
AddrA
AddrB
DataA
AddrD
DataB
DataD
Reg[rs1]
Reg[rs2]
alu
ALU
ALUSel=Add
Imm.
Gen
0
1
RegWEn=1
inst[31:20]
imm[31:0]
ImmSel=I
BSel=1
CMPT 295
L27: Datapath
26
I-Format immediates
27
inst[31:0]
——inst[31]-(sign-extension)——-
inst[30:20]
imm[31:0]
Imm.
Gen
inst[31:20]
imm[31:0]
ImmSel=I
High 12 bits of instruction (inst[31:20]) copied to low 12 bits of immediate (imm[11:0])
Immediate is sign-extended by copying value of inst[31] to fill the upper 20 bits of the immediate value (imm[31:12])
CMPT 295
L27: Datapath
27
Control Logic
Adding addi to datapath
28
+4
pc
pc+4
inst[11:7]
inst[19:15]
inst[24:20]
IMEM
inst[31:0]
Reg[]
AddrA
AddrB
DataA
AddrD
DataB
DataD
Reg[rs1]
Reg[rs2]
alu
ALU
ALUSel=Add
Imm.
Gen
0
1
RegWEn=1
inst[31:20]
imm[31:0]
ImmSel=I
BSel=1
Also works for all other I-format arithmetic instruction (slti,sltiu,andi,ori,xori,slli,srli,srai) just by changing ALUSel
CMPT 295
L27: Datapath
28
But wait… Loads are ‘I-type’s also?
We know we can parse the immediate in the load-word format, but…
What do we do with the immediate?
What operation should we perform?
Maybe a better question: How do we know what the instruction does?
Any ideas?
29
CMPT 295
L27: Datapath
29
Adding lw to datapath
30
IMEM
ALU
Imm.
Gen
+4
DMEM
Reg[]
AddrA
AddrB
DataA
AddrD
DataB
DataD
Addr
DataR
0
1
pc
0
1
inst[11:7]
inst[19:15]
inst[24:20]
inst[31:20]
ALU
mem
wb
imm[31:0]
inst[31:0]
ImmSel
BSel
ALUSel
MemRW
WBSel
wb
pc+4
Reg[rs1]
Reg[rs2]
CMPT 295
L27: Datapath
30
Adding lw to datapath
31
IMEM
ALU
Imm.
Gen
+4
DMEM
Reg[]
AddrA
AddrB
DataA
AddrD
DataB
DataD
Addr
DataR
0
1
pc
0
1
inst[11:7]
inst[19:15]
inst[24:20]
inst[31:20]
alu
mem
wb
pc+4
Reg[rs1]
imm[31:0]
Reg[rs2]
inst[31:0]
ImmSel=I
RegWEn=1
Bsel=1
ALUSel=Add
MemRW=Read
WBSel=0
wb
CMPT 295
L27: Datapath
31
Storage Element: Idealized Memory
Memory (idealized)
One input port: Data In
One output port: Data Out
Memory access:
Read: Write Enable = 0, data at Address is placed on Data Out
Write: Write Enable = 1, Data In written to Address
Clock input (CLK)
CLK input is a factor ONLY during write operation
During read, behaves as a combinational logic block: Address valid → Data Out valid after “access time”
32
CLK
Data In
Write Enable
32
32
DataOut
Address
CMPT 295
L27: Datapath
A few notes on our new datapath…
33
We have a lot of different components!
IMEM, Register file, ALU, DMEM
Does every instruction need every component?
No! We got through all of the R-Types (and some of the I-Types) without DMEM
Does any instruction need every component?
Yep! Loads!
This is the instruction which exercises our “critical path”
CMPT 295
L27: Datapath
33
All RV32 Load Instructions
Supporting the narrower loads requires additional circuits to extract the correct byte/halfword from the value loaded from memory, and sign- or zero-extend the result to 32 bits before writing back to register file.
We’ll assume these are implemented in the DMEM module (not our datapath) and won’t add them to our schematic
34
funct3 field encodes size and signedness of load data
CMPT 295
L27: Datapath
Agenda
Building from what we know
Our CPU
Processor Design Principles
35
CMPT 295
L27: Datapath
This lecture is long… what have we done again?
Added ‘R-type’ and ‘I-type’ instructions to our data path!
We still need to figure out how to do S, SB, U, and UJ types
Don’t worry, we actually have all the big pieces we need!
We talked about control bits and how they instruct our hardware to deal with many different kinds of instructions
Let’s keep going!
36
CMPT 295
L27: Datapath
36
Current Datapath
37
IMEM
ALU
Imm.
Gen
+4
DMEM
Reg[]
AddrA
AddrB
DataA
AddrD
DataB
DataD
Addr
DataR
0
1
pc
0
1
inst[11:7]
inst[19:15]
inst[24:20]
inst[31:20]
ALU
mem
wb
Reg[rs1]
imm[31:0]
Reg[rs2]
inst[31:0]
ImmSel
BSel
ALUSel
MemRW
WBSel
wb
pc+4
CMPT 295
L27: Datapath
37
Adding sw to datapath
38
IMEM
ALU
Imm.
Gen
+4
DMEM
Reg[]
AddrA
AddrB
DataA
AddrD
DataB
DataD
Addr
DataW
DataR
0
1
pc
0
1
inst[11:7]
inst[19:15]
inst[24:20]
inst[31:7]
alu
mem
wb
pc+4
Reg[rs1]
imm[31:0]
Reg[rs2]
inst[31:0]
ImmSel
RegWEn
Bsel
ALUSel
MemRW
WBSel=
wb
CMPT 295
L27: Datapath
38
Adding sw to datapath
39
IMEM
ALU
Imm.
Gen
+4
DMEM
Reg[]
AddrA
AddrB
DataA
AddrD
DataB
DataD
Addr
DataW
DataR
0
1
pc
0
1
inst[11:7]
inst[19:15]
inst[24:20]
inst[31:7]
alu
mem
wb
pc+4
Reg[rs1]
imm[31:0]
Reg[rs2]
inst[31:0]
ImmSel=S
RegWEn=0
Bsel=1
ALUSel=Add
MemRW=Write
WBSel=*
wb
*= “Don’t Care”
CMPT 295
L27: Datapath
39
I-Format immediates
40
inst[31:0]
——inst[31]-(sign-extension)——-
inst[30:20]
imm[31:0]
Imm.
Gen
inst[31:20]
imm[31:0]
ImmSel=I
High 12 bits of instruction (inst[31:20]) copied to low 12 bits of immediate (imm[11:0])
Immediate is sign-extended by copying value of inst[31] to fill the upper 20 bits of the immediate value (imm[31:12])
CMPT 295
L27: Datapath
40
I & S Immediate Generator
41
imm[11:5]
rs2
rs1
funct3
imm[4:0]
S-opcode
imm[11:0]
rs1
funct3
rd
I-opcode
inst[31](sign-extension)
inst[30:25]
imm[31:0]
inst[31:0]
inst[24:20]
S
I
inst[31](sign-extension)
inst[30:25]
inst[11:7]
0
6
7
11
12
14
15
19
20
24
25
31
0
4
5
10
11
31
1
6
5
5
S
I
Just need a 5-bit mux to select between two positions where low five bits of immediate can reside in instruction
Other bits in immediate are wired to fixed positions in instruction
CMPT 295
L27: Datapath
Implementing Branches
B-format is mostly same as S-Format, with two register sources (rs1/rs2) and a 12-bit immediate
But now immediate represents values -4096 to +4094 in 2-byte increments
The 12 immediate bits encode even 13-bit signed byte offsets (lowest bit of offset is always zero, so no need to store it)
42
CMPT 295
L27: Datapath
Adding sw to datapath
43
IMEM
ALU
Imm.
Gen
+4
DMEM
Reg[]
AddrA
AddrB
DataA
AddrD
DataB
DataD
Addr
DataW
DataR
0
1
pc
0
1
inst[11:7]
inst[19:15]
inst[24:20]
inst[31:7]
alu
mem
wb
pc+4
Reg[rs1]
imm[31:0]
Reg[rs2]
inst[31:0]
ImmSel
RegWEn
Bsel
ALUSel
MemRW
WBSel=
wb
CMPT 295
L27: Datapath
43
Adding branches to datapath
44
IMEM
ALU
Imm.
Gen
+4
DMEM
Branch Comp.
Reg[]
AddrA
AddrB
DataA
AddrD
DataB
DataD
Addr
DataW
DataR
1
0
0
1
1
0
pc
0
1
inst[11:7]
inst[19:15]
inst[24:20]
inst[31:7]
alu
mem
wb
alu
pc+4
Reg[rs1]
pc
imm[31:0]
Reg[rs2]
inst[31:0]
ImmSel
RegWEn
BrUn
BrEq
BrLT
ASel
BSel
ALUSel
MemRW
WBSel
PCSel
wb
CMPT 295
L27: Datapath
44
Adding branches to datapath
45
IMEM
ALU
Imm.
Gen
+4
DMEM
Branch Comp.
Reg[]
AddrA
AddrB
DataA
AddrD
DataB
DataD
Addr
DataW
DataR
1
0
0
1
1
0
pc
0
1
inst[11:7]
inst[19:15]
inst[24:20]
inst[31:7]
alu
mem
wb
alu
pc+4
Reg[rs1]
pc
imm[31:0]
Reg[rs2]
inst[31:0]
ImmSel=B
RegWEn=0
BrUn
BrEq
BrLT
ASel=1
Bsel=1
ALUSel=Add
MemRW=Read
WBSel=*
PCSel=taken/not-taken
wb
CMPT 295
L27: Datapath
45
Branch Comparator
BrEq = 1, if A=B
BrLT = 1, if A < B
BrUn =1 selects unsigned comparison for BrLT, 0=signed
BGE branch: A >= B, if !(A