x86 Programming III CSE 351 Autumn 2016
ACKNOWLEDGEMENT: These slides have been modified by your your CMPT 295 instructor and RISC-V ISA creators. However, please report all mistakes to your instructor.
1
car *c = malloc(sizeof(car));
c->miles = 100;
c->gals = 17;
float mpg = get_mpg(c);
free(c);
Car c = new Car();
c.setMiles(100);
c.setGals(17);
float mpg =
c.getMPG();
Java:
C:
Assembly language:
Machine code:
0111010000011000
100011010000010000000010
1000100111000010
110000011111101000011111
Computer system:
OS:
Memory & data
Arrays & structs
Integers & floats
RISC V assembly
Procedures & stacks
Executables
Memory & caches
Processor Pipeline
Performance
Parallelism
CS295
L07 – RISC V Final
1
Stored-Program Concept
R-Format
I-Format
S-Format
SB-Format
U-Format
UJ-Format
Agenda
2
CS295
L07 – RISC V Final
ENIAC (U.Penn., 1946)
First Electronic General-Purpose Computer
3
Blazingly fast (multiply in 2.8ms!)
10 decimal digits x 10 decimal digits
But needed 2-3 days to setup new program,
as programmed with patch cords and switches
CS295
L07 – RISC V Final
-Electronic Numerical Integrator and Computer
-Digital (electronic), capable of being reprogrammed to solve a variety of mathematical problems
-Dev began at the end of WWII, was designed and primarily used to calculate artillery firing tables for the US Army’s
-Very slow to program
-Story about 2 architects who designed it and Von Neumann, how it led to EDSAC
-Slow program entry, I/O
3
EDSAC (Cambridge, 1949)
First General Stored-Program Electronic Computer
Programs held as numbers in memory
35-bit binary 2’s complement words
CS295
L07 – RISC V Final
Programming EDSAC
4
So how do we represent instructions?
Remember: Computer only understands 1s and 0s, so assembler string “add x10,x11,x0” is meaningless to hardware
CS295
L07 – RISC V Final
Big Idea: Stored-Program Concept
programs can be stored in memory as numbers
Before: a number can mean anything
Now: make convention for interpreting numbers as instructions
6
INSTRUCTIONS ARE DATA
CS295
L07 – RISC V Final
Divide the 32 bits of an instruction into “fields”
regular field sizes → simpler hardware
will need some variation….
Instructions as Numbers
7
31
0
By convention, RISCV instructions are each
1 word = 4 bytes = 32 bits
CS295
L07 – RISC V Final
6 formats to memorize, but 4 formats to internalize (R with three registers, I with two registers and an immediate, S with two registers and an address, U with a big immediate/address)
7
Consequence #1: Everything Has a Memory Address
Since all instructions and data are stored in memory, everything has a memory address: instructions, data words
Both branches and jumps use these
C pointers are just memory addresses: they can point to anything in memory
Unconstrained use of addresses can lead to nasty bugs; avoiding errors up to you in C; limited in Java by language design
One register keeps address of instruction being executed: “Program Counter” (PC)
Basically a pointer to memory
Intel calls it Instruction Pointer (IP)
IBM 701, 1953 (Image source: Wikipedia)
CS295
L07 – RISC V Final
Consequence #2: Binary Compatibility
Programs are distributed in binary form
Programs bound to specific instruction set
Different version for phones and PCs
New machines want to run old programs (“binaries”) as well as programs compiled to new instructions
Leads to “backward-compatible” instruction set evolving over time
Selection of Intel 8088 in 1981 for 1st IBM PC is major reason latest PCs still use 80×86 instruction set; could still run program from 1981 PC today
CS295
L07 – RISC V Final
The 6 Instruction Formats
R-Format: instructions using 3 register inputs
add, xor, mul —arithmetic/logical ops
I-Format: instructions with immediates, loads
addi, lw, jalr, slli
S-Format: store instructions: sw,sb
SB-Format: branch instructions: beq, bge
U-Format: instructions with upper immediates
lui, auipc —upper immediate is 20-bits
UJ-Format: the jump instruction: jal
CS295
L07 – RISC V Final
lui: load upper immediate
auipc: adding an upper immediate to PC and putting the result in a register
The 6 Instruction Formats
CS295
L07 – RISC V Final
lui: load upper immediate
auipc: adding an upper immediate to PC and putting the result in a register
Stored-Program Concept
R-Format
I-Format
Administrivia
S-Format
SB-Format
U-Format
UJ-Format
Agenda
12
CS295
L07 – RISC V Final
R-Format Instructions (1/3)
Define “fields” of the following number of bits each: 7 + 5 + 5 + 3 + 5 + 7 = 32
Each field has a name:
Each field is viewed as its own unsigned int
5-bit fields can represent any number 0-31,
while 7-bit fields can represent any number 0-127, etc.
13
31
0
7
7
5
5
3
5
funct7
opcode
rs2
rs1
funct3
rd
31
0
CS295
L07 – RISC V Final
opcode (7): partially specifies operation
e.g. R-types have opcode = 0b0110011, SB (branch) types have opcode = 0b1100011
funct7+funct3 (10): combined with opcode, these two fields describe what operation to perform
How many R-format instructions can we encode?
with opcode fixed at 0b0110011, just funct varies:
(27) x (23)= (210) = 1024
R-Format Instructions (2/3)
14
funct7
opcode
rs2
rs1
funct3
rd
31
0
CS295
L07 – RISC V Final
R-Format Instructions (3/3)
rs1 (5): 1st operand (“source register 1”)
rs2 (5): 2nd operand (second source register)
rd (5): “destination register” — receives the result of computation
Recall: RISCV has 32 registers
A 5 bit field can represent exactly 25 = 32 things (interpret as the register numbers x0-x31)
15
funct7
opcode
rs2
rs1
funct3
rd
31
0
CS295
L07 – RISC V Final
rd = t0 = x5
rs1 = t1 = x6
rt2 = t2 = x7
Reading from the Green Sheet
16
31
0
???
???
???
???
???
???
funct7
opcode
rs2
rs1
funct3
rd
add t0 t1 t2
0
7
6
0
0x33
5
CS295
L07 – RISC V Final
RISCV Instruction: add x5,x6,x7
Field representation (decimal):
Field representation (binary):
R-Format Example
17
two
0
0x33
7
6
0
5
31
0
0000000
0110011
00111
00110
000
00101
31
0
hex representation: 0x 0073 02B3
decimal representation: 7,537,331
Called a Machine Language Instruction
CS295
L07 – RISC V Final
x5 = t0, x6 = t1, x7 = t2
All RV32 R-format instructions
18
CS295
L07 – RISC V Final
18
Stored-Program Concept
R-Format
I-Format
S-Format
SB-Format
U-Format
UJ-Format
Agenda
19
CS295
L07 – RISC V Final
I-Format Instructions (1/4)
What about instructions with immediates?
5-bit field too small for most immediates
Ideally, RISCV would have only one instruction format (for simplicity)
Unfortunately here we need to compromise
Define new instruction format that is mostly consistent with R-Format
First notice that, if instruction has immediate, then it uses at most 2 registers (1 src, 1 dst)
20
CS295
L07 – RISC V Final
I-Format Instructions (2/4)
Define “fields” of the following number of bits each: 12 + 5 + 3 + 5 + 7 = 32 bits
Field names:
Key Concept: Only imm field is different from R-format: rs2 and funct7 replaced by 12-bit signed immediate, imm[11:0]
21
12
3
5
7
31
0
5
imm[11:0]
func3
rd
opcode
31
0
rs1
CS295
L07 – RISC V Final
opcode (7): uniquely specifies the instruction
rs1 (5): specifies a register operand
rd (5): specifies destination register that receives result of computation
22
imm[11:0]
func3
rd
opcode
31
0
rs1
I-Format Instructions (3/4)
CS295
L07 – RISC V Final
I-Format Instructions (4/4)
immediate (12): 12 bit number
All computations done in words, so 12-bit immediate must be extended to 32 bits
always sign-extended to 32-bits before use in an arithmetic operation
23
imm[11:0]
func3
rd
opcode
31
0
rs1
Can represent 212 different immediates
imm[11:0] can hold values in range [-211 , +211)
CS295
L07 – RISC V Final
???
I-Format Example (1/2)
24
addi x15,x1,-50
rd = x15
rs1 = x1
???
?
???
???
31
0
00001
imm[11:0]
func3
rd
opcode
rs1
111111001110
01111
0
0x13
CS295
L07 – RISC V Final
I-Format Example (2/2)
RISCV Instruction: addi x15,x1,-50
Field representation (binary):
hex representation: 0x FCE0 8793
decimal representation: 4,242,573,203
Called a Machine Language Instruction
25
two
???
?
???
31
0
00001
111111001110
01111
000
0010011
CS295
L07 – RISC V Final
$21 = $s5, $22 = $s6
All RISCV I-Type Arithmatic Instructions
26
CS295
L07 – RISC V Final
26
5
Question: If the number of registers were halved,
which statement is true?
27
There must be more R-type instructions
(A)
There must be less I-type instructions
(B)
Shift amounts would change to 0-63
(C)
I-type instructions could have 2 more immediate bits
(D)
imm[11:0]
func3
rd
opcode
31
0
rs1
31
0
12
3
5
7
4
imm[13:0]
func3
rd
opcode
31
0
rs1
31
0
14
3
4
7
CS295
L07 – RISC V Final
27
Stored-Program Concept
R-Format
I-Format
S-Format
SB-Format
U-Format
UJ-Format
Agenda
28
CS295
L07 – RISC V Final
Load Instructions are also I-Type
The 12-bit signed immediate is added to the base address in register rs1 to form the memory address
This is very similar to the add-immediate operation but used to create address, not to create final result
Value loaded from memory is stored in rd
29
imm[11:0]
func3
rd
opcode
31
0
rs1
offset[11:0]
width
dst
LOAD
base
CS295
L07 – RISC V Final
29
I-Format Load Example
30
imm[11:0]
func3
rd
opcode
31
0
rs1
offset[11:0]
width
dst
LOAD
base
lw x14, 8(x2)
000000001000
010
01110
0000011
00010
imm=+8
LW
rd=14
LOAD
rs1=2
CS295
L07 – RISC V Final
30
All RV32 Load Instructions
LBU is “load unsigned byte”
LH is “load halfword”, which loads 16 bits (2 bytes) and sign-extends to fill destination 32-bit register
LHU is “load unsigned halfword”, which zero-extends 16 bits to fill destination 32-bit register
There is no LWU in RV32, because there is no sign/zero extension needed when copying 32 bits from a memory location into a 32-bit register
31
CS295
L07 – RISC V Final
31
Stored-Program Concept
R-Format
I-Format
S-Format
SB-Format
U-Format
UJ-Format
Agenda
32
CS295
L07 – RISC V Final
S-Format Used for Stores
Store needs to read two registers, rs1 for base memory address, and rs2 for data to be stored, as well as need immediate offset!
Can’t have both rs2 and immediate in same place as other instructions!
Note: stores don’t write a value to the register file, no rd!
RISC-V design decision is move low 5 bits of immediate to where rd field was in other instructions – keep rs1/rs2 fields in same place
register names more critical than immediate bits in hardware design
33
imm[11:5]
opcode
rs2
rs1
func3
imm[4:0]
31
0
CS295
L07 – RISC V Final
33
S-Format Example
sw x14, 8(x2)
34
imm[11:5]
opcode
rs2
rs1
func3
imm[4:0]
31
0
00000000
0100011
01110
00010
010
01000
off[11:5] = 0
STORE
rs2=14
rs1=2
SW
off[4:0] = 8
CS295
L07 – RISC V Final
34
All RV32 Store Instructions
35
CS295
L07 – RISC V Final
35
Stored-Program Concept
R-Format
I-Format
S-Format
SB-Format
U-Format
UJ-Format
Agenda
36
CS295
L07 – RISC V Final
Branching Instructions
beq, bne,bge,blt
Need to specify an address to go to
Also take two registers to compare
Doesn’t write into a register (similar to stores)
How to encode label, i.e., where to branch to?
37
CS295
L07 – RISC V Final
Branching Instruction Usage
Branches typically used for loops (if-else, while, for)
Loops are generally small (< 50 instructions)
Recall: Instructions stored in a localized area of memory (Code/Text)
Largest branch distance limited by size of code
Address of current instruction stored in the program counter (PC)
38
CS295
L07 – RISC V Final
PC-Relative Addressing
PC-Relative Addressing: Use the immediate field as a two’s complement offset to PC
Branches generally change the PC by a small amount
Can specify ± 211 addresses from the PC
Why not use byte address offset from PC as the immediate?
39
CS295
L07 – RISC V Final
Branching Reach
Recall: RISCV uses 32-bit addresses, and memory is byte-addressed
Instructions are “word-aligned”: Address is always a multiple of 4 (in bytes)
PC ALWAYS points to an instruction
PC is typed as a pointer to a word
can do C-like pointer arithmetic
Let immediate specify #words instead of #bytes
Instead of specifying ± 211 bytes from the PC,
we will now specify ± 211 words = ± 213 byte addresses around PC
40
CS295
L07 – RISC V Final
Locker room analogy
40
Branch Calculation
If we don’t take the branch:
PC = PC+4 = next instruction
If we do take the branch:
PC = PC + (immediate*4)
Observations:
immediate is number of instructions to move (remember, specifies words) either forward (+) or backwards (–)
41
CS295
L07 – RISC V Final
RISC-V Feature, n×16-bit instructions
Extensions to RISC-V base ISA support 16-bit compressed instructions and also variable-length instructions that are multiples of 16-bits in length
16-bit = half-word
To enable this, RISC-V scales the branch offset to be half-words even when there are no 16-bit instructions
Reduces branch reach by half and means that ½ of possible targets will be errors on RISC-V processors that only support 32-bit instructions (as used in this class)
RISC-V conditional branches can only reach ± 210 × 32-bit instructions either side of PC
42
CS295
L07 – RISC V Final
42
B-format is mostly same as S-Format, with two register sources (rs1/rs2) and a 12-bit immediate
But now immediate represents values -212 to +212-2 in 2-byte increments
The 12 immediate bits encode even 13-bit signed byte offsets (lowest bit of offset is always zero, so no need to store it)
RISC-V B-Format for Branches
43
imm[12|10:5]
opcode
rs2
rs1
func3
imm[4:1|11]
31
0
7
7
5
5
3
5
CS295
L07 – RISC V Final
43
Branch Example (1/2)
RISCV Code:
Loop: beq x19,x10,End
add x18,x18,x10
addi x19,x19,-1
j Loop
End:
Branch offset =
(Branch with offset of 0, branches to itself)
44
Start counting from instruction AFTER the branch
1
2
3
4
4×32-bit instructions = 16 bytes
CS295
L07 – RISC V Final
Branch Example (1/2)
RISCV Code:
Loop: beq x19,x10,End
add x18,x18,x10
addi x19,x19,-1
j Loop
End:
45
Start counting from instruction AFTER the branch
1
2
3
4
???????
1100011
01010
10011
000
?????
31
0
7
7
5
5
3
5
BRANCH
rs2=10
rs1=19
BEQ
CS295
L07 – RISC V Final
beq x19,x10,offset = 16 bytes
13-bit immediate, imm[12:0], with value 16
0000000010000
Branch Example (1/2)
46
0 000000
1100011
01010
10011
000
1000 0
31
0
imm[12|10:5]
imm[4:1|11]
imm[0] discarded, always zero
CS295
L07 – RISC V Final
RISC-V Immediate Encoding
Why is it so confusing?!?!
47
Upper bits sign-extended from inst[31] always
Only bit 7 of instruction changes role in immediate between S and B
CS295
L07 – RISC V Final
47
All RISC-V Branch Instructions
48
CS295
L07 – RISC V Final
48
Does the value in branch immediate field change if we move the code?
If moving individual lines of code, then yes
If moving all of code, then no (why?)
What do we do if destination is > 210 instructions away from branch?
Other instructions save us:
Questions on PC-addressing
49
beq x10,x0,far bne x10,x0,next
# next instr → j far
next: # next instr
CS295
L07 – RISC V Final
As we will see, jumping instructions can reach farther than branching
49
Agenda
Stored-Program Concept
R-Format
I-Format
S-Format
SB-Format
U-Format
UJ-Format
50
CS295
L07 – RISC V Final
How do we deal with 32-bit immediates?
Our I-type instructions only give us 12 bits
Solution: Need a new instruction format for dealing with the rest of the 20 bits.
This instruction should deal with:
a destination register to put the 20 bits into
the immediate of 20 bits
the instruction opcode
Dealing With Large Immediates
51
CS295
L07 – RISC V Final
U-Format for “Upper Immediate” instructions
Has 20-bit immediate in upper 20 bits of 32-bit instruction word
One destination register, rd
Used for two instructions
LUI – Load Upper Immediate
AUIPC – Add Upper Immediate to PC
52
imm[31:12]
opcode
rd
31
0
20
U-immediate[31:12]
7
LUI/AUIPC
5
dest
CS295
L07 – RISC V Final
52
lui writes the upper 20 bits of the destination with the immediate value, and clears the lower 12 bits
Together with an addi to set low 12 bits, can create any 32-bit value in a register using two instructions (lui/addi).
lui x10, 0x87654 # x10 = 0x87654000
addi x10, x10, 0x321 # x10 = 0x87654321
LUI to create long immediates
53
CS295
L07 – RISC V Final
How to set 0xDEADBEEF?
lui x10, 0xDEADB # x10 = 0xDEADB000
addi x10, x10,0xEEF # x10 = 0xDEADAEEF
addi 12-bit immediate is always sign-extended!
if top bit of the 12-bit immediate is a 1, it will subtract -1 from upper 20 bits
Corner Case
54
CS295
L07 – RISC V Final
How to set 0xDEADBEEF?
lui x10, 0xDEADC # x10 = 0xDEADC000
addi x10, x10,0xEEF # x10 = 0xDEADBEEF
Pre-increment value placed in upper 20 bits, if sign bit will be set on immediate in lower 12 bits.
Assembler pseudo-op handles all of this:
li x10, 0xDEADBEEF # Creates two instructions
Solution
55
CS295
L07 – RISC V Final
AUIPC
56
Adds upper immediate value to PC and places result in destination register
Used for PC-relative addressing
Label: auipc x10, 0
Puts address of label into x10
CS295
L07 – RISC V Final
56
Agenda
Stored-Program Concept
R-Format
I-Format
S-Format
SB-Format
U-Format
UJ-Format
57
CS295
L07 – RISC V Final
UJ-Format Instructions (1/3)
For branches, we assumed that we won’t want to branch too far, so we can specify a change in the PC
For general jumps (jal), we may jump to anywhere in code memory
Ideally, we would specify a 32-bit memory address to jump to
Unfortunately, we can’t fit both a 7-bit opcode and a 32-bit address into a single 32-bit word
Also, when linking we must write to an rd register
58
CS295
L07 – RISC V Final
We have lots of needs—we’ll do the best we can just like with U-instructions!
UJ-Format Instructions (2/3)
jal saves PC+4 in register rd (the return address)
Set PC = PC + offset (PC-relative jump)
Target somewhere within ±219 locations, 2 bytes apart
±218 32-bit instructions
Reminder: “j” jump is a pseudo-instruction—the assembler will instead use jal but sets rd=x0 to discard return address
Immediate encoding optimized similarly to branch instruction to reduce hardware cost
59
imm[20|10:1|11|19:12]
opcode
rd
31
0
20
offset[31:12]
7
JAL
5
dest
CS295
L07 – RISC V Final
UJ-Format Instructions (2/3)
# j pseudo-instruction
j Label = jal x0, Label # Discard return address
# Call function within 218 instructions of PC
jal ra, FuncName
Why is the immediate so funky?
Similar reasoning as for branch immediates
60
imm[20|10:1|11|19:12]
opcode
rd
31
0
20
offset[31:12]
7
JAL
5
dest
CS295
L07 – RISC V Final
jalr Instruction (I-Format)
jalr rd, rs1, offset
Writes PC+4 to rd (return address)
Sets PC = rs1 + offset
Uses same immediates as arithmetic & loads
no multiplication by 2 bytes
61
imm[11:0]
func3
rd
opcode
31
0
rs1
offset
0
dest
JALR
base
CS295
L07 – RISC V Final
61
Uses of jalr
# ret and jr psuedo-instructions
ret = jr ra = jalr x0, ra, 0
# Call function at any 32-bit absolute address
lui x1,
jalr ra, x1,
# Jump PC-relative with 32-bit offset
auipc x1,
jalr x0, x1,
62
imm[11:0]
func3
rd
opcode
31
0
rs1
offset
0
dest
JALR
base
CS295
L07 – RISC V Final
62
Summary of RISC-V Instruction Formats
63
CS295
L07 – RISC V Final
63