x86 Programming III CSE 351 Autumn 2016
Roadmap
1
car *c = malloc(sizeof(car));
c->miles = 100;
c->gals = 17;
float mpg = get_mpg(c);
free(c);
Car c = new Car();
c.setMiles(100);
c.setGals(17);
float mpg =
c.getMPG();
Java:
C:
Assembly language:
Machine code:
0111010000011000
100011010000010000000010
1000100111000010
110000011111101000011111
Computer system:
OS:
Memory & data
Arrays & structs
Integers & floats
RISC V assembly
Procedures & stacks
Executables
Memory & caches
Processor Pipeline
Performance
Parallelism
CMPT 295
L06 – RISC V – I
[Figure: sum.c (C source files) → Compiler → sum.s (assembly files) → Assembler → sum.o (obj files) → Linker → sum (executable program, exists on disk) → loader → process (Executing in Memory)]
From Writing to Running
When most people say “compile” they mean
the entire process:
compile + assemble + link
“It’s alive!”
gcc -S
gcc -c
gcc -o
C compiler produces assembly files (contain RISCV assembly, pseudo-instructions, directives, etc.)
RISCV assembler produces object files (contain RISCV machine code, missing symbols, some layout information, etc.)
RISCV linker produces executable file (contains RISCV machine code, no missing symbols, some layout information)
OS loader gets it into memory and jumps to first instruction (machine code)
Mainstream ISAs
Macbooks & PCs
(Core i3, i5, i7, M)
x86 Instruction Set
Smartphone-like devices
(iPhone, iPad, Raspberry Pi)
ARM Instruction Set
Versatile and open-source
Relatively new, designed for cloud computing, high-end phones, small embedded sys.
RISCV Instruction Set
We have many choices as to which assembly language we could pick:
Intel x86, which is the machine language used by most of our laptops.
Why don’t we?
SUPER long and complicated
ARM, which is used in our iPhones
ARM stands for Advanced RISC Machine
Complex/Reduced Instruction Set Computing
Early trend: add more and more instructions to do elaborate operations – Complex Instruction Set Computing (CISC)
difficult to learn and comprehend language
super-complicated (slow?) hardware
Opposite philosophy later began to dominate: Reduced Instruction Set Computing (RISC)
Simpler (and smaller) instruction set makes it easier to build fast hardware
Let software do the complicated operations by composing simpler ones
x86 fell victim to the “early trend” — every time we want to do something complicated, let’s just implement it directly in the hardware! Downsides…
no one person can know the whole thing
WHAT’S THE RULE ABOUT ASSEMBLY LANGUAGE?
New open-source, license-free ISA spec
Supported by growing shared software ecosystem
Appropriate for all levels of computing system, from microcontrollers to supercomputers
32-bit, 64-bit, and 128-bit variants (we’re using 32-bit in class, textbook uses 64-bit)
Why RISC-V instead of Intel 80x86?
RISC-V is simple, elegant. Don’t want to get bogged down in gritty details.
RISC-V adoption is growing rapidly
RISC-V Green Card
IBM 360 Green Card
RISC-V Architecture
Registers — Summary
In high-level languages, number of variables limited only by available memory
ISAs have a fixed, small number of operands called registers
Special locations built directly into hardware
Benefit: Registers are EXTREMELY FAST
(faster than 1 billionth of a second)
Drawback: Operations can only be performed on this predetermined number of registers
Registers visible to compiler (or RISCV programmer).
Aside: Registers are Inside the Processor
[Figure: Processor (Control; Datapath with PC, Registers, Arithmetic & Logic Unit (ALU)) connected to Memory (Program, Data; Bytes) through the Processor-Memory Interface (Enable?, Read/Write, Address, Write Data, Read Data); Input and Output attach through I/O-Memory Interfaces]
RISCV — How Many Registers?
Tradeoff between speed and availability
more registers → can house more variables simultaneously; all registers are slower.
RISCV has 32 registers (x0-x31)
Each register is 32 bits wide and holds a word
word = group of 32 bits (like how byte = group of 8 bits)
For the sake of comparison, ARM uses 16 registers.
How many bytes is a word?
what if we need more variables than that?!
RISC V Integer Registers – 32 bits wide
Names are largely arbitrary, reasons behind them are not super relevant to us
But there are conventions about how they are used
One is particularly important to not misuse: sp (x2)
Stack pointer; we’ll get to this when we talk about procedures
Each is 4 bytes wide
What if we only want 1 or 2 bytes?
There are no alternate names for parts of a register; narrower data is handled by dedicated loads and stores (e.g. lb, sb)
Memory vs. Registers
Addresses (0x7FFFD024C3DC) vs. Names (x0)
Big (~8 GiB) vs. Small (32 x 4 B = 128 B)
Slow (~50-100 ns) vs. Fast (sub-nanosecond timescale)
Dynamic (can “grow” as needed while program runs) vs. Static (fixed number in hardware)
RISCV Registers
Register denoted by ‘x’ can be referenced by number (x0-x31) or name:
Registers that hold programmer variables:
s0-s1 ⬌ x8-x9
s2-s11 ⬌ x18-x27
Registers that hold temporary variables:
t0-t2 ⬌ x5-x7
t3-t6 ⬌ x28-x31
You’ll learn about the other 13 registers later
Registers have no type (C concept); the operation being performed determines how register contents are treated
C, Java variables vs. registers
In C (and most High Level Languages) variables declared first and given a type. E.g.,
int fahr, celsius;
char a, b, c, d, e;
Each variable can ONLY represent a value of the type it was declared as (cannot mix and match int and char variables).
In Assembly Language, the registers have no type
Operation determines how register contents are treated
RISCV Agenda
Basic Arithmetic Instructions
Comments
x0 (zero)
Immediates
Data Transfer Instructions
Decision Making Instructions
Bonus: C to RISCV Practice
Bonus: Additional Instructions
RISCV Instructions (1/2)
Instruction Syntax is rigid:
op dst, src1, src2
1 operator, 3 operands
op = operation name (“operator”)
dst = register getting result (“destination”)
src1 = first register for operation (“source 1”)
src2 = second register for operation (“source 2”)
Keep hardware simple via regularity
RISCV Instructions (2/2)
One operation per instruction,
at most one instruction per line
Assembly instructions are related to C operations (=, +, -, *, /, &, |, etc.)
Must be, since C code decomposes into assembly!
A single line of C may break up into several lines of RISCV
RISCV Instructions Example
Your very first instructions!
(assume here that the variables a, b, and c are assigned to registers s1, s2, and s3, respectively)
Integer Addition (add)
C: a = b + c
RISCV: add s1, s2, s3
Integer Subtraction (sub)
C: a = b - c
RISCV: sub s1, s2, s3
RISCV Instructions Example
Ordering of instructions matters (must follow order of operations)
Utilize temporary registers
Suppose a → s0, b → s1, c → s2, d → s3 and e → s4. Convert the following C statement to RISCV:
a = (b + c) - (d + e);
add t1, s3, s4 # t1 = d + e
add t2, s1, s2 # t2 = b + c
sub s0, t2, t1 # a = (b + c) - (d + e)
Alternative using a single temporary register:
add s0, s1, s2 # a = b + c
add t0, s3, s4 # t0 = d + e
sub s0, s0, t0 # a = (b + c) - (d + e)
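The three-instruction translation of a = (b + c) - (d + e) can be mimicked in C by letting each local variable stand in for one register. This is only a sketch; the function names are made up for illustration, and unsigned 32-bit arithmetic is used so that overflow wraps the way register arithmetic does.

```c
#include <stdint.h>

/* Each local variable plays the role of one 32-bit register. */

/* Two temporaries: t1 and t2 hold the partial sums. */
uint32_t sub_of_sums_two_temps(uint32_t s1, uint32_t s2, uint32_t s3, uint32_t s4) {
    uint32_t t1 = s3 + s4;   /* add t1, s3, s4 */
    uint32_t t2 = s1 + s2;   /* add t2, s1, s2 */
    uint32_t s0 = t2 - t1;   /* sub s0, t2, t1 */
    return s0;
}

/* One temporary: the destination s0 doubles as scratch space. */
uint32_t sub_of_sums_one_temp(uint32_t s1, uint32_t s2, uint32_t s3, uint32_t s4) {
    uint32_t s0 = s1 + s2;   /* add s0, s1, s2 */
    uint32_t t0 = s3 + s4;   /* add t0, s3, s4 */
    s0 = s0 - t0;            /* sub s0, s0, t0 */
    return s0;
}
```

Both orderings give the same result because the two additions do not depend on each other; only the final sub must come last.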
Assembly Instructions
In assembly language, each statement (called an Instruction), executes exactly one of a short list of simple commands
Unlike in C (and most other High Level Languages), each line of assembly code contains at most 1 instruction
Instructions are related to operations (=, +, -, *, /) in C or Java
Ok, enough already…gimme my RV32!
RISC V Assembly “Data Types”
Integral data of 1, 2, 4, or 8 bytes (we focus on 4 bytes)
Data values
Addresses
Floating point data of 4 or 8 bytes (not covered in 295)
Different registers for those (e.g. f0, f31)
No aggregate types such as arrays or structures
Just contiguously allocated bytes in memory
Assembly program doesn’t actually have notion of “data types,” but since encoding is so different between integral and floating point data, there are actually different instructions built into assembly because the hardware needs to deal with these numbers differently when performing arithmetic.
Three Basic Kinds of Instructions
Transfer data between memory and register
Load data from memory into register
reg = Mem[address]
Store register data into memory
Mem[address] = reg
Perform arithmetic operation on register data
c = a + b; z = x << y; i = h & g;
Control flow: what instruction to execute next
Unconditional jumps to/from procedures
Conditional branches
Remember: Memory is indexed just like an array of bytes!
Operand types
Immediate: Constant integer data
Examples: 0x400, -533
Like a C literal, with no prefix
Encoded in a fixed-width field of the instruction (12 bits for addi)
Register: 22 integer registers
Examples: x9 … x31
But x0-x4 and x8 reserved for special use
Others have special uses for particular instructions
Memory: Consecutive bytes of memory at a computed address
Simplest example: 0(x18)
RISC-V Addition and Subtraction (1/4)
Syntax of Instructions:
One two, three, four
where:
One = operation by name
two = operand getting result (“destination”)
three = 1st operand for operation (“source1”)
four = 2nd operand for operation (“source2”)
Syntax is rigid:
1 operator, 3 operands
Why? Keep Hardware simple via regularity
add x1, x2, x3
Addition and Subtraction of Integers (2/4)
Addition in Assembly
Example: add x1,x2,x3 (in RISC-V)
Equivalent to: a = b + c (in C)
where C variables ⇔ RISC-V registers are:
a ⇔ x1, b ⇔ x2, c ⇔ x3
Subtraction in Assembly
Example: sub x3,x4,x5 (in RISC-V)
Equivalent to: d = e - f (in C)
where C variables ⇔ RISC-V registers are:
d ⇔ x3, e ⇔ x4, f ⇔ x5
Addition and Subtraction of Integers (3/4)
How to do the following C statement?
a = b + c + d - e;
Break into multiple instructions
add x10, x1, x2 # a_temp = b + c
add x10, x10, x3 # a_temp = a_temp + d
sub x10, x10, x4 # a = a_temp - e
Notice: A single line of C may break up into several lines of RISC-V.
Notice: Everything after the hash mark on each line is ignored (comments). Check Apollo-11 comments!
Addition and Subtraction of Integers (4/4)
How do we do this?
f = (g + h) - (i + j);
Use intermediate temporary register
add x5, x20, x21 # a_temp = g + h
add x6, x22, x23 # b_temp = i + j
sub x19, x5, x6 # f = (g + h)- (i + j)
Comments in Assembly
Another way to make your code more readable: comments!
Hash (#) is used for RISC-V comments
anything from hash mark to end of line is a comment and will be ignored
This is just like the C99 //
Note: Different from C.
C comments have format
/* comment */
so they can span many lines
The Zero Register
Zero appears so often in code and is so useful that it has its own register!
Register zero (x0 or zero) always has the value 0 and cannot be changed!
i.e. any instruction with x0 as dst has no effect
Example Uses:
add s3, x0, x0 # c=0
add s1, s2, x0 # a=b
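A small C model can make the hard-wired zero concrete. This is a sketch under assumptions: the `RegFile` type and the helper names are invented for illustration, and only the behavior described above (reads of x0 give 0, writes to x0 are dropped) is modeled.

```c
#include <stdint.h>

/* Hypothetical model of the 32 integer registers. */
typedef struct {
    uint32_t x[32];
} RegFile;

/* Reading x0 always yields 0, whatever the array happens to hold. */
uint32_t reg_read(const RegFile *rf, unsigned n) {
    unsigned idx = n & 31u;
    return idx == 0 ? 0u : rf->x[idx];
}

/* Writing x0 has no effect. */
void reg_write(RegFile *rf, unsigned n, uint32_t value) {
    unsigned idx = n & 31u;
    if (idx != 0) {
        rf->x[idx] = value;
    }
}

/* add dst, src1, src2 */
void exec_add(RegFile *rf, unsigned dst, unsigned src1, unsigned src2) {
    reg_write(rf, dst, reg_read(rf, src1) + reg_read(rf, src2));
}
```

With s1 = x9, s2 = x18 and s3 = x19, `exec_add(&rf, 9, 18, 0)` behaves like `add s1, s2, x0` (a copy), and `exec_add(&rf, 19, 0, 0)` like `add s3, x0, x0` (clear).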
Immediates
Numerical constants are called immediates
Separate instruction syntax for immediates:
opi dst, src, imm
Operation names end with ‘i’, replace 2nd source register with an immediate
Example Uses:
addi s1, s2, 5 # a=b+5
addi s3, s3, 1 # c++
Why no subi instruction?
Goal of RISC is to minimize instruction set. subi dst, src, imm = addi dst, src, -imm.
Immediates
Immediates are numerical constants.
They appear often in code, so there are special instructions for them.
Add Immediate:
addi x3,x4,10 (in RISC-V)
f = g + 10 (in C)
where RISC-V registers x3,x4 are associated with C variables f, g
Syntax similar to add instruction, except that last argument is a number instead of a register.
Immediates
There is no Subtract Immediate in RISC-V: Why?
There are add, sub, and addi, but no subi counterpart
Limit types of operations that can be done to absolute minimum
if an operation can be decomposed into a simpler operation, don’t include it
addi …, -X = subi …, X => so no subi
addi x3,x4,-10 (in RISC-V)
f = g - 10 (in C)
where RISC-V registers x3, x4 are associated with C variables f, g
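The argument that subi is redundant can be checked with a short C sketch. The function names are hypothetical; `subi` is not a real instruction and is defined here purely in terms of the `addi` model. The 12-bit signed immediate range (-2048 to 2047) comes from the RV32I encoding rather than from these slides.

```c
#include <stdint.h>
#include <stdbool.h>

/* addi carries a 12-bit signed immediate: -2048 .. 2047. */
bool fits_imm12(int32_t imm) {
    return imm >= -2048 && imm <= 2047;
}

/* Models addi dst, src, imm (wrapping 32-bit addition). */
int32_t addi(int32_t src, int32_t imm) {
    return (int32_t)((uint32_t)src + (uint32_t)imm);
}

/* A hypothetical subi is just addi with the immediate negated. */
int32_t subi(int32_t src, int32_t imm) {
    return addi(src, -imm);
}
```

For example, with g = 25, both `subi(25, 10)` and `addi(25, -10)` compute f = g - 10.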
Data Transfer: Load from and Store to memory
[Figure: Processor (Control; Datapath with PC, Registers, Arithmetic & Logic Unit (ALU)) connected to Memory (Program, Data) through the Processor-Memory Interface (Enable?, Read/Write, Address, Write Data = Store to memory, Read Data = Load from memory); Input and Output attach through I/O-Memory Interfaces]
Registers: fast but limited place to hold values
Memory: much larger place to hold values, but slower than registers!
Memory Addresses are in Bytes
Data typically smaller than 32 bits, but rarely smaller than 8 bits (e.g., char type); works fine if everything is a multiple of 8 bits
8 bit chunk is called a byte (1 word = 4 bytes)
Memory addresses are really in bytes, not words
Word addresses are 4 bytes apart
Word address is same as address of rightmost byte, the least-significant byte (i.e. Little-endian convention)
Byte addresses within each word (least-significant byte gets the smallest address):
Bits:   31-24  23-16  15-8  7-0
        3      2      1     0
        7      6      5     4
        11     10     9     8
        15     14     13    12
        …      …      …     …
Big-endian and little-endian derive from Jonathan Swift’s Gulliver’s Travels in which the Big Endians were a political faction that broke their eggs at the large end (“the primitive way”) and rebelled against the Lilliputian King who required his subjects (the Little Endians) to break their eggs at the small end.
Big Endian vs. Little Endian
The order in which BYTES are stored in memory
Bits always stored as usual. (E.g., 0xC2=0b 1100 0010)
Consider the number 1025 as we normally write it:
BYTE3    BYTE2    BYTE1    BYTE0
00000000 00000000 00000100 00000001
Big Endian
ADDR3    ADDR2    ADDR1    ADDR0
BYTE0    BYTE1    BYTE2    BYTE3
00000001 00000100 00000000 00000000
Little Endian
ADDR3    ADDR2    ADDR1    ADDR0
BYTE3    BYTE2    BYTE1    BYTE0
00000000 00000000 00000100 00000001
en.wikipedia.org/wiki/Big_endian
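The 1025 example can be reproduced with a short C sketch that writes a 32-bit value into four byte cells in each order. The function names are made up for illustration; `mem[0]` plays the role of ADDR0 and `mem[3]` of ADDR3.

```c
#include <stdint.h>

/* Little endian: least-significant byte goes to the smallest address. */
void store_little_endian(uint8_t mem[4], uint32_t value) {
    mem[0] = (uint8_t)(value & 0xFFu);          /* BYTE0 */
    mem[1] = (uint8_t)((value >> 8) & 0xFFu);   /* BYTE1 */
    mem[2] = (uint8_t)((value >> 16) & 0xFFu);  /* BYTE2 */
    mem[3] = (uint8_t)((value >> 24) & 0xFFu);  /* BYTE3 */
}

/* Big endian: most-significant byte goes to the smallest address. */
void store_big_endian(uint8_t mem[4], uint32_t value) {
    mem[0] = (uint8_t)((value >> 24) & 0xFFu);  /* BYTE3 */
    mem[1] = (uint8_t)((value >> 16) & 0xFFu);  /* BYTE2 */
    mem[2] = (uint8_t)((value >> 8) & 0xFFu);   /* BYTE1 */
    mem[3] = (uint8_t)(value & 0xFFu);          /* BYTE0 */
}
```

Storing 1025 (0x00000401) little-endian leaves 0x01 at ADDR0 and 0x04 at ADDR1; big-endian leaves 0x04 at ADDR2 and 0x01 at ADDR3.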
Great Idea #3: Principle of Locality / Memory Hierarchy
Speed of Registers vs. Memory
Given that
Registers: 32 words (128 Bytes)
Memory (DRAM): Billions of bytes (2 GB to 64 GB on laptop)
and physics dictates…
Smaller is faster
How much faster are registers than DRAM??
About 100-500 times faster! (in terms of latency of one access)
Load from Memory to Register
C code
int A[100];
g = h + A[3];
Using Load Word (lw) in RISC-V:
lw x10,12(x15) # Reg x10 gets A[3]
add x11,x12,x10 # g = h + A[3]
Note: x15 – base register (pointer to A[0])
12 – offset in bytes
Offset must be a constant known at assembly time
Store from Register to Memory
C code
int A[100];
A[10] = h + A[3];
Using Store Word (sw) in RISC-V:
lw x10,12(x15) # Temp reg x10 gets A[3]
add x10,x12,x10 # Temp reg x10 gets h + A[3]
sw x10,40(x15) # A[10] = h + A[3]
Note: x15 – base register (pointer)
12,40 – offsets in bytes
x15+12 and x15+40 must be multiples of 4
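To see why the offsets are 12 and 40 rather than 3 and 10, here is a rough C model of byte-addressed memory. The helper names and the `base` parameter (standing in for x15, the address of A[0]) are assumptions for illustration; words are stored little-endian.

```c
#include <stdint.h>

/* lw: read four consecutive bytes starting at base + offset. */
uint32_t load_word(const uint8_t *mem, uint32_t base, int32_t offset) {
    uint32_t addr = base + (uint32_t)offset;
    return (uint32_t)mem[addr]
         | ((uint32_t)mem[addr + 1] << 8)
         | ((uint32_t)mem[addr + 2] << 16)
         | ((uint32_t)mem[addr + 3] << 24);
}

/* sw: write four consecutive bytes starting at base + offset. */
void store_word(uint8_t *mem, uint32_t base, int32_t offset, uint32_t value) {
    uint32_t addr = base + (uint32_t)offset;
    mem[addr]     = (uint8_t)(value & 0xFFu);
    mem[addr + 1] = (uint8_t)((value >> 8) & 0xFFu);
    mem[addr + 2] = (uint8_t)((value >> 16) & 0xFFu);
    mem[addr + 3] = (uint8_t)((value >> 24) & 0xFFu);
}

/* A[10] = h + A[3]; each int is 4 bytes, so A[i] lives at base + 4*i. */
void store_sum(uint8_t *mem, uint32_t base, uint32_t h) {
    uint32_t x10 = load_word(mem, base, 12);   /* lw  x10,12(x15) */
    x10 = h + x10;                             /* add x10,x12,x10 */
    store_word(mem, base, 40, x10);            /* sw  x10,40(x15) */
}
```

The offsets are simply 4 * 3 and 4 * 10, the byte distances of A[3] and A[10] from A[0].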
Loading and Storing Bytes
In addition to word data transfers
(lw, sw), RISC-V has byte data transfers:
load byte: lb
store byte: sb
Same format as lw, sw
E.g., lb x10,3(x11)
The contents of the memory location at address 3 + (contents of register x11) are copied to the low byte position of register x10.
x10: xxxx xxxx xxxx xxxx xxxx xxxx xzzz zzzz
(xzzz zzzz is the byte loaded; its top bit x is copied into the upper 24 bits to “sign-extend”)
RISC-V also has “unsigned byte” loads (lbu), which zero-extend to fill the register. Why no unsigned store byte sbu?
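The difference between lb and lbu is easiest to see on a byte whose top bit is set. The sketch below models both loads in C; the function names are illustrative, not part of any library.

```c
#include <stdint.h>

/* lb: load one byte and copy its bit 7 into bits 31..8 (sign-extend). */
int32_t load_byte(const uint8_t *mem, uint32_t addr) {
    int32_t b = mem[addr];        /* 0 .. 255 */
    return (b ^ 0x80) - 0x80;     /* maps 128..255 onto -128..-1 */
}

/* lbu: load one byte and fill bits 31..8 with zeros (zero-extend). */
uint32_t load_byte_unsigned(const uint8_t *mem, uint32_t addr) {
    return mem[addr];
}
```

For the byte 0xFF, `load_byte` yields -1 (0xFFFFFFFF in the register) while `load_byte_unsigned` yields 255 (0x000000FF). A store only writes the low 8 bits, so the upper bits of the register never matter, which is why a separate unsigned store is not needed.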
Decision Making Instructions
Branch If Equal (beq)
beq reg1,reg2,label
If value in reg1 = value in reg2, go to label
Branch If Not Equal (bne)
bne reg1,reg2,label
If value in reg1 ≠ value in reg2, go to label
Jump (j)
j label
Unconditional jump to label
Types of Branches
Branch – change of control flow
Conditional Branch – change control flow depending on outcome of comparison
branch if equal (beq) or branch if not equal (bne)
Also branch if less than (blt) and branch if greater than or equal (bge)
Unconditional Branch – always branch
a RISC-V instruction for this: jump (j), as in j label
Breaking Down the If Else
C Code:
if(i==j) {
a = b; /* then */
} else {
a = -b; /* else */
}
In English:
If TRUE, execute the THEN block
If FALSE, execute the ELSE block
RISCV (beq):
# i→s0, j→s1
# a→s2, b→s3
beq s0,s1,then
else: # this label is unnecessary
sub s2, x0, s3 # a = -b
j end
then:
add s2, s3, x0 # a = b
end:
CS61C Su18 – Lecture 5
Breaking Down the If Else
C Code:
if(i==j) {
a = b; /* then */
} else {
a = -b; /* else */
}
In English:
If TRUE, execute the THEN block
If FALSE, execute the ELSE block
RISCV (bne):
# i→s0, j→s1
# a→s2, b→s3
bne s0,s1,else
then:
add s2, s3, x0 # a = b
j end
else:
sub s2, x0, s3 # a = -b
end:
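The bne version maps almost line for line onto C written with goto, which can help when reading the labels. This is a sketch only; the function name and the `_label` suffixes are made up because `else` is a C keyword.

```c
#include <stdint.h>

/* if (i == j) a = b; else a = -b;  written the way the bne version flows. */
int32_t if_else_bne(int32_t i, int32_t j, int32_t b) {
    int32_t a;
    if (i != j) goto else_label;   /* bne s0,s1,else */
    a = b;                         /* then: add s2, s3, x0 */
    goto end_label;                /* j end */
else_label:
    a = -b;                        /* sub s2, x0, s3 */
end_label:
    return a;
}
```

Note the inverted test: branching on "not equal" lets the THEN block come first in the instruction stream, matching the order of the C source.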
Branching on Conditions other than (Not) Equal
Set Less Than (slt)
slt dst, reg1,reg2
If value in reg1 < value in reg2, dst = 1, else 0
Set Less Than Immediate (slti)
slti dst, reg1,imm
If value in reg1 < imm, dst = 1, else 0
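One way slt combines with the branches already introduced is to compute the comparison into a temporary register and then branch on whether that register is zero. The sketch below models this in C; the `if (i < j)` example and all names are assumptions for illustration, not taken from the slides.

```c
#include <stdint.h>

/* slt dst, reg1, reg2: dst = 1 if reg1 < reg2 (signed), else 0. */
uint32_t slt(int32_t reg1, int32_t reg2) {
    return reg1 < reg2 ? 1u : 0u;
}

/* slti dst, reg1, imm: same comparison against a constant. */
uint32_t slti(int32_t reg1, int32_t imm) {
    return reg1 < imm ? 1u : 0u;
}

/* if (i < j) a = b; else a = -b;  using slt followed by bne against x0. */
int32_t if_less_than(int32_t i, int32_t j, int32_t b) {
    int32_t a;
    uint32_t t0 = slt(i, j);          /* slt t0, s0, s1 */
    if (t0 != 0) goto then_label;     /* bne t0, x0, then */
    a = -b;                           /* else block */
    goto end_label;                   /* j end */
then_label:
    a = b;                            /* then block */
end_label:
    return a;
}
```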
Breaking Down the If Else
C Code:
if(i
C to RISCV Practice
Fill in lines:
# copy String p to q
# p→s0, q→s1 (pointers)
Loop: lb t0,0(s0) # t0 = *p
sb t0,0(s1) # *q = t0
addi s0,s0,1 # p = p + 1
addi s1,s1,1 # q = q + 1
beq t0,x0,Exit # if *p==0, go to Exit
j Loop # go to Loop
Exit: # N chars in p => N*6 instructions
C to RISCV Practice
Finished code:
# copy String p to q
# p→s0, q→s1 (pointers)
Loop: lb t0,0(s0) # t0 = *p
sb t0,0(s1) # *q = t0
addi s0,s0,1 # p = p + 1
addi s1,s1,1 # q = q + 1
beq t0,x0,Exit # if *p==0, go to Exit
j Loop # go to Loop
Exit: # N chars in p => N*6 instructions
Is this the only way to write out this function? Of course not.
C to RISCV Practice
Alternate code using bne:
# copy String p to q
# p→s0, q→s1 (pointers)
Loop: lb t0,0(s0) # t0 = *p
sb t0,0(s1) # *q = t0
addi s0,s0,1 # p = p + 1
addi s1,s1,1 # q = q + 1
bne t0,x0,Loop # if *p!=0, go to Loop
# N chars in p => N*5 instructions
Used fewer instructions!
Instead of “exit when equal to zero”, we are now using “loop when not equal to zero.”
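The bne form of the loop corresponds to a do-while in C: copy first, then test the byte that was just copied. The sketch below is one possible rendering; the function name and the returned byte count are additions for illustration.

```c
#include <stddef.h>

/* Copies the string at p to q, including the terminating zero byte.
   Returns the number of bytes copied. */
size_t copy_string(const char *p, char *q) {
    size_t copied = 0;
    char t0;
    do {
        t0 = *p;          /* lb   t0,0(s0) */
        *q = t0;          /* sb   t0,0(s1) */
        p = p + 1;        /* addi s0,s0,1  */
        q = q + 1;        /* addi s1,s1,1  */
        copied++;
    } while (t0 != 0);    /* bne  t0,x0,Loop */
    return copied;
}
```

Because the test sits at the bottom of the loop, each iteration needs only one branch instead of a conditional branch plus an unconditional jump.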
RISCV Arithmetic Instructions
Multiplication (mul and mulh)
mul dst, src1, src2
mulh dst, src1, src2
src1*src2: lower 32-bits through mul, upper 32-bits in mulh
Division (div)
div dst, src1, src2
rem dst, src1, src2
src1/src2: quotient via div, remainder via rem
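The split between mul and mulh is easier to picture with a 64-bit product. The C sketch below models the four instructions; the `rv_` names are hypothetical (chosen to avoid clashing with the C library's `div`), and division by zero and the INT32_MIN / -1 case are deliberately left out.

```c
#include <stdint.h>

/* mul: lower 32 bits of the signed 64-bit product. */
uint32_t rv_mul(int32_t src1, int32_t src2) {
    uint64_t product = (uint64_t)((int64_t)src1 * (int64_t)src2);
    return (uint32_t)(product & 0xFFFFFFFFu);
}

/* mulh: upper 32 bits of the signed 64-bit product. */
uint32_t rv_mulh(int32_t src1, int32_t src2) {
    uint64_t product = (uint64_t)((int64_t)src1 * (int64_t)src2);
    return (uint32_t)(product >> 32);
}

/* div: quotient, rounded toward zero. Assumes src2 != 0. */
int32_t rv_div(int32_t src1, int32_t src2) {
    return src1 / src2;
}

/* rem: remainder, with the sign of the dividend. Assumes src2 != 0. */
int32_t rv_rem(int32_t src1, int32_t src2) {
    return src1 % src2;
}
```

For instance, 0x10000 * 0x10000 is 2^32, so mul gives 0 and mulh gives 1.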
RISCV Bitwise Instructions
Note: a→s1, b→s2, c→s3
Instruction              C                RISCV
And                      a = b & c;       and s1,s2,s3
And Immediate            a = b & 0x1;     andi s1,s2,0x1
Or                       a = b | c;       or s1,s2,s3
Or Immediate             a = b | 0x5;     ori s1,s2,0x5
Exclusive Or             a = b ^ c;       xor s1,s2,s3
Exclusive Or Immediate   a = b ^ 0xF;     xori s1,s2,0xF
Shifting Instructions
In binary, shifting an unsigned number left is the same as multiplying by the corresponding power of 2
Shifting operations are faster
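Shifting right is not always the same as dividing: an arithmetic right shift of a negative number rounds toward negative infinity, while division rounds toward zero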
Logical shift: Add zeros as you shift
Arithmetic shift: Sign-extend as you shift
Only applies when you shift right (preserves sign)
Shift by immediate or value in a register
Shifting Instructions
Instruction Name             RISCV
Shift Left Logical           sll s1,s2,s3
Shift Left Logical Imm       slli s1,s2,imm
Shift Right Logical          srl s1,s2,s3
Shift Right Logical Imm      srli s1,s2,imm
Shift Right Arithmetic       sra s1,s2,s3
Shift Right Arithmetic Imm   srai s1,s2,imm
When using immediate, only values 0-31 are practical
When using variable, only lowest 5 bits are used (read as unsigned)
Shifting Instructions
# sample calls to shift instructions
addi t0,x0 ,-256 # t0=0xFFFFFF00
slli s0,t0,3 # s0=0xFFFFF800
srli s1,t0,8 # s1=0x00FFFFFF
srai s2,t0,8 # s2=0xFFFFFFFF
addi t1,x0 ,-22 # t1=0xFFFFFFEA
# low 5: 0b01010
sll s3,t0,t1 # s3=0xFFFC0000
# same as slli s3,t0,10
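The values in the sample calls can be checked with a C model of the three shifts. This is a sketch: registers are represented as `uint32_t` bit patterns, the function names mirror the instruction names, and the arithmetic shift is built by hand because right-shifting a negative signed value is implementation-defined in C.

```c
#include <stdint.h>

/* sll/slli: shift left, filling with zeros; only the low 5 bits of amt count. */
uint32_t sll(uint32_t value, uint32_t amt) {
    return value << (amt & 31u);
}

/* srl/srli: shift right, filling with zeros. */
uint32_t srl(uint32_t value, uint32_t amt) {
    return value >> (amt & 31u);
}

/* sra/srai: shift right, filling with copies of the sign bit. */
uint32_t sra(uint32_t value, uint32_t amt) {
    uint32_t n = amt & 31u;
    uint32_t shifted = value >> n;
    if ((value & 0x80000000u) != 0 && n > 0) {
        shifted |= ~(0xFFFFFFFFu >> n);   /* set the top n bits */
    }
    return shifted;
}
```

With t0 = 0xFFFFFF00 (that is, -256), `sra(t0, 8)` gives 0xFFFFFFFF while `srl(t0, 8)` gives 0x00FFFFFF. The rounding difference from division also shows up here: `sra` applied to -7 with a shift of 1 gives -4, whereas -7 / 2 is -3.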
Shifting Instructions
Example 1:
# lb using lw: lb s1,1(s0)
lw s1,0(s0) # get word
slli s1,s1,16 # move 2nd byte into the top byte
srai s1,s1,24 # shift into lowest (sign-extends)
Shifting Instructions
Example 2:
# sb using sw: sb s1,3(s0)
lw t0,0(s0) # get current word
slli t0,t0,8 # shift out top byte
srli t0,t0,8 # zero top byte
slli t1,s1,24 # shift into highest
or t0,t0,t1 # combine
sw t0,0(s0) # store back
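Both byte tricks can be written in C using only shifts and an or. Masks such as 0xFF00 or 0xFFFFFF are wider than the 12-bit immediate that andi can carry, so this sketch isolates bytes with shift pairs instead. The function names are illustrative, and the word is assumed to have been loaded already.

```c
#include <stdint.h>

/* Like lb s1,1(s0): extract byte 1 (bits 15..8) of a word and sign-extend it. */
int32_t byte1_sign_extended(uint32_t word) {
    uint32_t top = word << 16;            /* byte 1 now sits in bits 31..24 */
    int32_t low = (int32_t)(top >> 24);   /* byte value 0 .. 255 */
    return (low ^ 0x80) - 0x80;           /* sign-extend, as an arithmetic shift would */
}

/* Like sb s1,3(s0): replace byte 3 (bits 31..24) of a word with the low byte of value. */
uint32_t replace_byte3(uint32_t word, uint32_t value) {
    uint32_t kept = (word << 8) >> 8;     /* shift left then right: top byte becomes zero */
    uint32_t moved = value << 24;         /* low byte of value moved to the top */
    return kept | moved;                  /* combine */
}
```

For the word 0x1234F678, byte 1 is 0xF6, which sign-extends to -10.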
Shifting Instructions
Extra for Experts:
Rewrite the two preceding examples to be more general
Assume that the byte offset (e.g. 1 and 3 in the examples, respectively) is contained in s2
Hint:
The variable shift instructions will come in handy
Remember, the offset can be negative
Switch Statement Example
Multiple case labels
Here: 5 & 6
Fall through cases
Here: 2
Missing cases
Here: 4
Implemented with:
Jump table
Indirect jump instruction
long switch_ex
(long x, long y, long z)
{
long w = 1;
switch (x) {
case 1:
w = y*z;
break;
case 2:
w = y/z;
/* Fall Through */
case 3:
w += z;
break;
case 5:
case 6:
w -= z;
break;
default:
w = 2;
}
return w;
}
Jump Table Structure
Switch Form
switch (x) {
case val_0:
Block 0
case val_1:
Block 1
• • •
case val_n-1:
Block n–1
}
Jump Table
JTab: Targ0, Targ1, Targ2, …, Targn-1
Jump Targets
Targ0: Code Block 0
Targ1: Code Block 1
Targ2: Code Block 2
•
•
•
Targn-1: Code Block n–1
Approximate Translation
target = JTab[x];
goto target;
Jump Table Structure
switch (x) {
case 1:
break;
case 2:
case 3:
break;
case 5:
case 6:
break;
default:
}
[Figure: jump table in memory with entries 0 to 6, each pointing to one of the code blocks]
Use the jump table when x ≤ 6:
C code:
if (x <= 6) {
target = JTab[x];
goto target;
} else
goto default;
Jump Table
.section .rodata # declaring data, not instructions (.rodata – read-only data section)
.align 8 # 8-byte memory alignment
.L4:
.quad .L8 # x = 0 (.quad: this data is 64-bits wide)
.quad .L3 # x = 1
.quad .L5 # x = 2
.quad .L9 # x = 3
.quad .L8 # x = 4
.quad .L7 # x = 5
.quad .L7 # x = 6
switch(x) {
case 1: // .L3
w = y*z;
break;
case 2: // .L5
w = y/z;
/* Fall Through */
case 3: // .L9
w += z;
break;
case 5:
case 6: // .L7
w -= z;
break;
default: // .L8
w = 2;
}
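Standard C has no indirect goto, so the sketch below stands in a table of function pointers for the table of code addresses. That is a swapped-in technique, not what the compiler emits: each hypothetical `case_*` function plays the role of one labeled code block, and `JTab` mirrors the seven table entries (x = 0 and x = 4 point at the default block).

```c
/* The switch statement from the slide, for comparison. */
long switch_ex(long x, long y, long z) {
    long w = 1;
    switch (x) {
    case 1:
        w = y * z;
        break;
    case 2:
        w = y / z;
        /* Fall Through */
    case 3:
        w += z;
        break;
    case 5:
    case 6:
        w -= z;
        break;
    default:
        w = 2;
    }
    return w;
}

typedef long (*case_fn)(long y, long z);

static long case_default(long y, long z) { (void)y; (void)z; return 2; }   /* .L8 */
static long case_1(long y, long z) { return y * z; }                       /* .L3 */
static long case_2(long y, long z) { return y / z + z; }                   /* .L5, then falls into .L9 */
static long case_3(long y, long z) { (void)y; return 1 + z; }              /* .L9, with w = 1 */
static long case_5_6(long y, long z) { (void)y; return 1 - z; }            /* .L7, with w = 1 */

/* One entry per value of x from 0 to 6. */
static const case_fn JTab[7] = {
    case_default, case_1, case_2, case_3, case_default, case_5_6, case_5_6
};

/* Table-driven equivalent: bounds check, then one indexed indirect call. */
long switch_ex_table(long x, long y, long z) {
    if (x < 0 || x > 6) {
        return case_default(y, z);
    }
    return JTab[x](y, z);
}
```

The cost of dispatch is the same for every case: one range check and one table lookup, regardless of how many case labels the switch has.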
Handling Fall-Through
long w = 1;
. . .
switch (x) {
. . .
case 2: // .L5
w = y/z;
/* Fall Through */
case 3: // .L9
w += z;
break;
. . .
}
case 2:
w = y/z;
goto merge;
case 3:
w = 1;
merge:
w += z;
More complicated choice than “just fall-through” forced by “migration” of w = 1;
Example compilation trade-off