程序代写代做代考 assembler assembly cache RISC-V x86 Java x86 Programming III CSE 351 Autumn 2016

x86 Programming III CSE 351 Autumn 2016

Roadmap
1
car *c = malloc(sizeof(car));
c->miles = 100;
c->gals = 17;
float mpg = get_mpg(c);
free(c);
Car c = new Car();
c.setMiles(100);
c.setGals(17);
float mpg =
c.getMPG();

Java:
C:
Assembly language:
Machine code:
0111010000011000
100011010000010000000010
1000100111000010
110000011111101000011111
Computer system:
OS:

Memory & data
Arrays & structs
Integers & floats
RISC V assembly
Procedures & stacks
Executables
Memory & caches
Processor Pipeline
Performance
Parallelism

CS295
L07 – RISC V Final

1

Stored-Program Concept
R-Format
I-Format
S-Format
SB-Format
U-Format
UJ-Format
Agenda
2

CS295
L07 – RISC V Final

ENIAC (U.Penn., 1946)
First Electronic General-Purpose Computer
3
Blazingly fast (multiply in 2.8ms!)
10 decimal digits x 10 decimal digits
But needed 2-3 days to setup new program,
as programmed with patch cords and switches

CS295
L07 – RISC V Final
-Electronic Numerical Integrator and Computer
-Digital (electronic), capable of being reprogrammed to solve a variety of mathematical problems
-Dev began at the end of WWII, was designed and primarily used to calculate artillery firing tables for the US Army’s
-Very slow to program
-Story about 2 architects who designed it and Von Neumann, how it led to EDSAC
-Slow program entry, I/O

3

EDSAC (Cambridge, 1949)

First General Stored-Program Electronic Computer
Programs held as numbers in memory
35-bit binary 2’s complement words

CS295
L07 – RISC V Final
Programming EDSAC
4

So how do we represent instructions?
Remember: Computer only understands 1s and 0s, so assembler string “add x10,x11,x0” is meaningless to hardware

CS295
L07 – RISC V Final

Big Idea: Stored-Program Concept
programs can be stored in memory as numbers
Before: a number can mean anything
Now: make convention for interpreting numbers as instructions
6
INSTRUCTIONS ARE DATA

CS295
L07 – RISC V Final

Divide the 32 bits of an instruction into “fields”
regular field sizes → simpler hardware
will need some variation….

Instructions as Numbers
7
31
0

By convention, RISCV instructions are each
1 word = 4 bytes = 32 bits

CS295
L07 – RISC V Final
6 formats to memorize, but 4 formats to internalize (R with three registers, I with two registers and an immediate, S with two registers and an address, U with a big immediate/address)
7

Consequence #1: Everything Has a Memory Address
Since all instructions and data are stored in memory, everything has a memory address: instructions, data words
Both branches and jumps use these
C pointers are just memory addresses: they can point to anything in memory
Unconstrained use of addresses can lead to nasty bugs; avoiding errors up to you in C; limited in Java by language design
One register keeps address of instruction being executed: “Program Counter” (PC)
Basically a pointer to memory
Intel calls it Instruction Pointer (IP)

IBM 701, 1953 (Image source: Wikipedia)

CS295
L07 – RISC V Final

Consequence #2: Binary Compatibility
Programs are distributed in binary form
Programs bound to specific instruction set
Different version for phones and PCs
New machines want to run old programs (“binaries”) as well as programs compiled to new instructions
Leads to “backward-compatible” instruction set evolving over time
Selection of Intel 8088 in 1981 for 1st IBM PC is major reason latest PCs still use 80×86 instruction set; could still run program from 1981 PC today

CS295
L07 – RISC V Final

The 6 Instruction Formats
R-Format: instructions using 3 register inputs
add, xor, mul —arithmetic/logical ops
I-Format: instructions with immediates, loads
addi, lw, jalr, slli
S-Format: store instructions: sw,sb
SB-Format: branch instructions: beq, bge
U-Format: instructions with upper immediates
lui, auipc —upper immediate is 20-bits
UJ-Format: the jump instruction: jal

CS295
L07 – RISC V Final
lui: load upper immediate
auipc: adding an upper immediate to PC and putting the result in a register

The 6 Instruction Formats

CS295
L07 – RISC V Final
lui: load upper immediate
auipc: adding an upper immediate to PC and putting the result in a register

Stored-Program Concept
R-Format
I-Format
Administrivia
S-Format
SB-Format
U-Format
UJ-Format
Agenda
12

CS295
L07 – RISC V Final

R-Format Instructions (1/3)
Define “fields” of the following number of bits each: 7 + 5 + 5 + 3 + 5 + 7 = 32

Each field has a name:

Each field is viewed as its own unsigned int
5-bit fields can represent any number 0-31,
while 7-bit fields can represent any number 0-127, etc.

13
31
0
7
7
5
5
3
5
funct7
opcode
rs2
rs1
funct3
rd
31
0

CS295
L07 – RISC V Final

opcode (7): partially specifies operation
e.g. R-types have opcode = 0b0110011, SB (branch) types have opcode = 0b1100011
funct7+funct3 (10): combined with opcode, these two fields describe what operation to perform
How many R-format instructions can we encode?
with opcode fixed at 0b0110011, just funct varies:
(27) x (23)= (210) = 1024
R-Format Instructions (2/3)
14
funct7
opcode
rs2
rs1
funct3
rd
31
0

CS295
L07 – RISC V Final

R-Format Instructions (3/3)
rs1 (5): 1st operand (“source register 1”)
rs2 (5): 2nd operand (second source register)
rd (5): “destination register” — receives the result of computation
Recall: RISCV has 32 registers
A 5 bit field can represent exactly 25 = 32 things (interpret as the register numbers x0-x31)
15

funct7
opcode
rs2
rs1
funct3
rd
31
0

CS295
L07 – RISC V Final

rd = t0 = x5
rs1 = t1 = x6
rt2 = t2 = x7

Reading from the Green Sheet
16
31
0
???
???
???
???
???
???
funct7
opcode
rs2
rs1
funct3
rd
add t0 t1 t2

0
7
6
0
0x33

5

CS295
L07 – RISC V Final

RISCV Instruction: add x5,x6,x7
Field representation (decimal):

Field representation (binary):

R-Format Example
17
two
0
0x33
7
6
0
5
31
0
0000000
0110011
00111
00110
000
00101
31
0

hex representation: 0x 0073 02B3
decimal representation: 7,537,331
Called a Machine Language Instruction

CS295
L07 – RISC V Final
x5 = t0, x6 = t1, x7 = t2

All RV32 R-format instructions
18

CS295
L07 – RISC V Final

18

Stored-Program Concept
R-Format
I-Format
S-Format
SB-Format
U-Format
UJ-Format
Agenda
19

CS295
L07 – RISC V Final

I-Format Instructions (1/4)
What about instructions with immediates?
5-bit field too small for most immediates
Ideally, RISCV would have only one instruction format (for simplicity)
Unfortunately here we need to compromise
Define new instruction format that is mostly consistent with R-Format
First notice that, if instruction has immediate, then it uses at most 2 registers (1 src, 1 dst)
20

CS295
L07 – RISC V Final

I-Format Instructions (2/4)
Define “fields” of the following number of bits each: 12 + 5 + 3 + 5 + 7 = 32 bits

Field names:

Key Concept: Only imm field is different from R-format: rs2 and funct7 replaced by 12-bit signed immediate, imm[11:0]

21
12
3
5
7
31
0

5
imm[11:0]
func3
rd
opcode
31
0
rs1

CS295
L07 – RISC V Final

opcode (7): uniquely specifies the instruction
rs1 (5): specifies a register operand
rd (5): specifies destination register that receives result of computation
22

imm[11:0]
func3
rd
opcode
31
0
rs1
I-Format Instructions (3/4)

CS295
L07 – RISC V Final

I-Format Instructions (4/4)
immediate (12): 12 bit number
All computations done in words, so 12-bit immediate must be extended to 32 bits
always sign-extended to 32-bits before use in an arithmetic operation
23

imm[11:0]
func3
rd
opcode
31
0
rs1
Can represent 212 different immediates
imm[11:0] can hold values in range [-211 , +211)

CS295
L07 – RISC V Final

???

I-Format Example (1/2)
24

addi x15,x1,-50

rd = x15
rs1 = x1

???
?
???
???
31
0
00001
imm[11:0]
func3
rd
opcode
rs1
111111001110
01111
0
0x13

CS295
L07 – RISC V Final

I-Format Example (2/2)
RISCV Instruction: addi x15,x1,-50

Field representation (binary):

hex representation: 0x FCE0 8793
decimal representation: 4,242,573,203
Called a Machine Language Instruction

25
two

???
?
???
31
0
00001
111111001110
01111
000
0010011

CS295
L07 – RISC V Final
$21 = $s5, $22 = $s6

All RISCV I-Type Arithmatic Instructions
26

CS295
L07 – RISC V Final

26

5
Question: If the number of registers were halved,
which statement is true?
27

There must be more R-type instructions
(A)
There must be less I-type instructions
(B)
Shift amounts would change to 0-63
(C)
I-type instructions could have 2 more immediate bits
(D)
imm[11:0]
func3
rd
opcode
31
0
rs1
31
0
12
3
5
7
4
imm[13:0]
func3
rd
opcode
31
0
rs1
31
0
14
3
4
7

CS295
L07 – RISC V Final

27

Stored-Program Concept
R-Format
I-Format
S-Format
SB-Format
U-Format
UJ-Format
Agenda
28

CS295
L07 – RISC V Final

Load Instructions are also I-Type
The 12-bit signed immediate is added to the base address in register rs1 to form the memory address
This is very similar to the add-immediate operation but used to create address, not to create final result
Value loaded from memory is stored in rd
29
imm[11:0]
func3
rd
opcode
31
0
rs1
offset[11:0]
width
dst
LOAD
base

CS295
L07 – RISC V Final

29

I-Format Load Example
30
imm[11:0]
func3
rd
opcode
31
0
rs1
offset[11:0]
width
dst
LOAD
base
lw x14, 8(x2)
000000001000
010
01110
0000011
00010
imm=+8
LW
rd=14
LOAD
rs1=2

CS295
L07 – RISC V Final

30

All RV32 Load Instructions
LBU is “load unsigned byte”
LH is “load halfword”, which loads 16 bits (2 bytes) and sign-extends to fill destination 32-bit register
LHU is “load unsigned halfword”, which zero-extends 16 bits to fill destination 32-bit register
There is no LWU in RV32, because there is no sign/zero extension needed when copying 32 bits from a memory location into a 32-bit register
31

CS295
L07 – RISC V Final

31

Stored-Program Concept
R-Format
I-Format
S-Format
SB-Format
U-Format
UJ-Format
Agenda
32

CS295
L07 – RISC V Final

S-Format Used for Stores
Store needs to read two registers, rs1 for base memory address, and rs2 for data to be stored, as well as need immediate offset!
Can’t have both rs2 and immediate in same place as other instructions!
Note: stores don’t write a value to the register file, no rd!
RISC-V design decision is move low 5 bits of immediate to where rd field was in other instructions – keep rs1/rs2 fields in same place
register names more critical than immediate bits in hardware design
33
imm[11:5]
opcode
rs2
rs1
func3
imm[4:0]
31
0

CS295
L07 – RISC V Final

33

S-Format Example
sw x14, 8(x2)

34
imm[11:5]
opcode
rs2
rs1
func3
imm[4:0]
31
0
00000000
0100011
01110
00010
010
01000
off[11:5] = 0
STORE
rs2=14
rs1=2
SW
off[4:0] = 8

CS295
L07 – RISC V Final

34

All RV32 Store Instructions
35

CS295
L07 – RISC V Final

35

Stored-Program Concept
R-Format
I-Format
S-Format
SB-Format
U-Format
UJ-Format
Agenda
36

CS295
L07 – RISC V Final

Branching Instructions
beq, bne,bge,blt
Need to specify an address to go to
Also take two registers to compare
Doesn’t write into a register (similar to stores)
How to encode label, i.e., where to branch to?
37

CS295
L07 – RISC V Final

Branching Instruction Usage
Branches typically used for loops (if-else, while, for)
Loops are generally small (< 50 instructions) Recall: Instructions stored in a localized area of memory (Code/Text) Largest branch distance limited by size of code Address of current instruction stored in the program counter (PC) 38 CS295 L07 – RISC V Final PC-Relative Addressing PC-Relative Addressing: Use the immediate field as a two’s complement offset to PC Branches generally change the PC by a small amount Can specify ± 211 addresses from the PC Why not use byte address offset from PC as the immediate? 39 CS295 L07 – RISC V Final Branching Reach Recall: RISCV uses 32-bit addresses, and memory is byte-addressed Instructions are “word-aligned”: Address is always a multiple of 4 (in bytes) PC ALWAYS points to an instruction PC is typed as a pointer to a word can do C-like pointer arithmetic Let immediate specify #words instead of #bytes Instead of specifying ± 211 bytes from the PC, we will now specify ± 211 words = ± 213 byte addresses around PC 40 CS295 L07 – RISC V Final Locker room analogy 40 Branch Calculation If we don’t take the branch: PC = PC+4 = next instruction If we do take the branch: PC = PC + (immediate*4) Observations: immediate is number of instructions to move (remember, specifies words) either forward (+) or backwards (–) 41 CS295 L07 – RISC V Final RISC-V Feature, n×16-bit instructions Extensions to RISC-V base ISA support 16-bit compressed instructions and also variable-length instructions that are multiples of 16-bits in length 16-bit = half-word To enable this, RISC-V scales the branch offset to be half-words even when there are no 16-bit instructions Reduces branch reach by half and means that ½ of possible targets will be errors on RISC-V processors that only support 32-bit instructions (as used in this class) RISC-V conditional branches can only reach ± 210 × 32-bit instructions either side of PC 42 CS295 L07 – RISC V Final 42 B-format is mostly same as S-Format, with two register sources (rs1/rs2) and a 12-bit immediate But now immediate represents values -212 to +212-2 in 2-byte increments The 12 immediate bits encode even 13-bit signed byte offsets (lowest bit of offset is always zero, so no need to store it) RISC-V B-Format for Branches 43 imm[12|10:5] opcode rs2 rs1 func3 imm[4:1|11] 31 0 7 7 5 5 3 5 CS295 L07 – RISC V Final 43 Branch Example (1/2) RISCV Code: Loop: beq x19,x10,End add x18,x18,x10 addi x19,x19,-1 j Loop End:

Branch offset =
(Branch with offset of 0, branches to itself)
44
Start counting from instruction AFTER the branch

1
2
3
4

4×32-bit instructions = 16 bytes

CS295
L07 – RISC V Final

Branch Example (1/2)
RISCV Code:
Loop: beq x19,x10,End
add x18,x18,x10
addi x19,x19,-1
j Loop
End:

45
Start counting from instruction AFTER the branch

1
2
3
4

???????
1100011
01010
10011
000
?????
31
0
7
7
5
5
3
5

BRANCH
rs2=10
rs1=19
BEQ

CS295
L07 – RISC V Final

beq x19,x10,offset = 16 bytes
13-bit immediate, imm[12:0], with value 16
0000000010000
Branch Example (1/2)
46

0 000000
1100011
01010
10011
000
1000 0
31
0
imm[12|10:5]

imm[4:1|11]

imm[0] discarded, always zero

CS295
L07 – RISC V Final

RISC-V Immediate Encoding
Why is it so confusing?!?!
47

Upper bits sign-extended from inst[31] always

Only bit 7 of instruction changes role in immediate between S and B

CS295
L07 – RISC V Final

47

All RISC-V Branch Instructions

48

CS295
L07 – RISC V Final

48

Does the value in branch immediate field change if we move the code?
If moving individual lines of code, then yes
If moving all of code, then no (why?)
What do we do if destination is > 210 instructions away from branch?
Other instructions save us:
Questions on PC-addressing
49

beq x10,x0,far bne x10,x0,next
# next instr → j far
next: # next instr

CS295
L07 – RISC V Final
As we will see, jumping instructions can reach farther than branching
49

Agenda
Stored-Program Concept
R-Format
I-Format
S-Format
SB-Format
U-Format
UJ-Format
50

CS295
L07 – RISC V Final

How do we deal with 32-bit immediates?
Our I-type instructions only give us 12 bits
Solution: Need a new instruction format for dealing with the rest of the 20 bits.
This instruction should deal with:
a destination register to put the 20 bits into
the immediate of 20 bits
the instruction opcode
Dealing With Large Immediates
51

CS295
L07 – RISC V Final

U-Format for “Upper Immediate” instructions
Has 20-bit immediate in upper 20 bits of 32-bit instruction word
One destination register, rd
Used for two instructions
LUI – Load Upper Immediate
AUIPC – Add Upper Immediate to PC
52
imm[31:12]
opcode
rd
31
0
20
U-immediate[31:12]
7
LUI/AUIPC
5
dest

CS295
L07 – RISC V Final

52

lui writes the upper 20 bits of the destination with the immediate value, and clears the lower 12 bits
Together with an addi to set low 12 bits, can create any 32-bit value in a register using two instructions (lui/addi).

lui x10, 0x87654 # x10 = 0x87654000
addi x10, x10, 0x321 # x10 = 0x87654321
LUI to create long immediates
53

CS295
L07 – RISC V Final

How to set 0xDEADBEEF?

lui x10, 0xDEADB # x10 = 0xDEADB000
addi x10, x10,0xEEF # x10 = 0xDEADAEEF

addi 12-bit immediate is always sign-extended!
if top bit of the 12-bit immediate is a 1, it will subtract -1 from upper 20 bits

Corner Case
54

CS295
L07 – RISC V Final

How to set 0xDEADBEEF?

lui x10, 0xDEADC # x10 = 0xDEADC000
addi x10, x10,0xEEF # x10 = 0xDEADBEEF

Pre-increment value placed in upper 20 bits, if sign bit will be set on immediate in lower 12 bits.

Assembler pseudo-op handles all of this:
li x10, 0xDEADBEEF # Creates two instructions
Solution
55

CS295
L07 – RISC V Final

AUIPC
56
Adds upper immediate value to PC and places result in destination register
Used for PC-relative addressing

Label: auipc x10, 0
Puts address of label into x10

CS295
L07 – RISC V Final

56

Agenda
Stored-Program Concept
R-Format
I-Format
S-Format
SB-Format
U-Format
UJ-Format
57

CS295
L07 – RISC V Final

UJ-Format Instructions (1/3)
For branches, we assumed that we won’t want to branch too far, so we can specify a change in the PC
For general jumps (jal), we may jump to anywhere in code memory
Ideally, we would specify a 32-bit memory address to jump to
Unfortunately, we can’t fit both a 7-bit opcode and a 32-bit address into a single 32-bit word
Also, when linking we must write to an rd register
58

CS295
L07 – RISC V Final
We have lots of needs—we’ll do the best we can just like with U-instructions!

UJ-Format Instructions (2/3)
jal saves PC+4 in register rd (the return address)
Set PC = PC + offset (PC-relative jump)
Target somewhere within ±219 locations, 2 bytes apart
±218 32-bit instructions
Reminder: “j” jump is a pseudo-instruction—the assembler will instead use jal but sets rd=x0 to discard return address
Immediate encoding optimized similarly to branch instruction to reduce hardware cost
59

imm[20|10:1|11|19:12]
opcode
rd
31
0
20
offset[31:12]
7
JAL
5
dest

CS295
L07 – RISC V Final

UJ-Format Instructions (2/3)
# j pseudo-instruction
j Label = jal x0, Label # Discard return address
# Call function within 218 instructions of PC
jal ra, FuncName

Why is the immediate so funky?
Similar reasoning as for branch immediates
60

imm[20|10:1|11|19:12]
opcode
rd
31
0
20
offset[31:12]
7
JAL
5
dest

CS295
L07 – RISC V Final

jalr Instruction (I-Format)
jalr rd, rs1, offset
Writes PC+4 to rd (return address)
Sets PC = rs1 + offset
Uses same immediates as arithmetic & loads
no multiplication by 2 bytes
61
imm[11:0]
func3
rd
opcode
31
0
rs1
offset
0
dest
JALR
base

CS295
L07 – RISC V Final

61

Uses of jalr
# ret and jr psuedo-instructions
ret = jr ra = jalr x0, ra, 0
# Call function at any 32-bit absolute address
lui x1,
jalr ra, x1,
# Jump PC-relative with 32-bit offset
auipc x1,
jalr x0, x1,
62
imm[11:0]
func3
rd
opcode
31
0
rs1
offset
0
dest
JALR
base

CS295
L07 – RISC V Final

62

Summary of RISC-V Instruction Formats

63

CS295
L07 – RISC V Final

63