Carnegie Mellon
Machine-Level Programming IV: Data
15-213/18-213/15-513: Introduction to Computer Systems
8th Lecture, June 4, 2020
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
Today
Arrays
▪ One-dimensional
▪ Multi-dimensional (nested)
▪ Multi-level
Structures
▪ Allocation
▪ Access
▪ Alignment
Floating Point
Array Allocation
Basic Principle
T A[L];
▪ Array of data type T and length L
▪ Contiguously allocated region of L * sizeof(T) bytes in memory
char string[12];   occupies x to x + 12
int val[5];        occupies x to x + 20 (elements at x, x + 4, x + 8, x + 12, x + 16)
double a[3];       occupies x to x + 24 (elements at x, x + 8, x + 16)
char *p[3];        occupies x to x + 24 (elements at x, x + 8, x + 16; 8-byte pointers on x86-64)
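The allocation rule (L * sizeof(T) contiguous bytes) can be checked at compile time with sizeof. A minimal sketch, assuming x86-64 sizes (4-byte int, 8-byte double and pointers); element_stride is a hypothetical helper used only for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* The four declarations from the slide, as globals. */
char string[12];
int val[5];
double a[3];
char *p[3];

/* T A[L] occupies L * sizeof(T) contiguous bytes. */
static_assert(sizeof string == 12 * sizeof(char), "12 bytes");
static_assert(sizeof val == 5 * sizeof(int), "20 bytes with 4-byte int");
static_assert(sizeof a == 3 * sizeof(double), "24 bytes");
static_assert(sizeof p == 3 * sizeof(char *), "24 bytes with 8-byte pointers");

/* Hypothetical helper: byte distance between consecutive elements of val. */
size_t element_stride(void) {
    return (size_t)((char *)&val[1] - (char *)&val[0]);
}
```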
Array Access
Basic Principle
T A[L];
▪ Array of data type T and length L
▪ Identifier A can be used as a pointer to array element 0: Type T*

int val[5];   contents 1 5 2 1 3 at addresses x, x+4, x+8, x+12, x+16 (end at x+20)

Reference   Type    Value
val[4]      int     3
val         int *   x
val+1       int *   x + 4
&val[2]     int *   x + 8
val[5]      int     ??
*(val+1)    int     5         //val[1]
val + i     int *   x + 4*i   //&val[i]
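The equivalences in the table (val + i is &val[i], *(val+1) is val[1]) come from C's scaled pointer arithmetic. A minimal sketch, assuming 4-byte int; byte_offset and deref_plus are hypothetical helper names. val[5] is deliberately never read: val + 5 is a valid one-past-the-end pointer, but dereferencing it is undefined behavior, which is why its value is shown as ??.

```c
#include <assert.h>
#include <stddef.h>

int val[5] = {1, 5, 2, 1, 3};

/* val + i is an int * to val[i]; the compiler scales i by sizeof(int). */
ptrdiff_t byte_offset(size_t i) {
    return (char *)(val + i) - (char *)val;   /* 4 * i with 4-byte int */
}

/* *(val + i) is, by definition, the same expression as val[i]. */
int deref_plus(size_t i) {
    return *(val + i);
}
```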
Array Example
#define ZLEN 5
typedef int zip_dig[ZLEN];
zip_dig cmu = { 1, 5, 2, 1, 3 };
zip_dig mit = { 0, 2, 1, 3, 9 };
zip_dig ucb = { 9, 4, 7, 2, 0 };

zip_dig cmu;   1 5 2 1 3   at addresses 16, 20, 24, 28, 32 (end at 36)
zip_dig mit;   0 2 1 3 9   at addresses 36, 40, 44, 48, 52 (end at 56)
zip_dig ucb;   9 4 7 2 0   at addresses 56, 60, 64, 68, 72 (end at 76)
Declaration “zip_dig cmu” equivalent to “int cmu[5]”
Example arrays were allocated in successive 20 byte blocks
▪ Not guaranteed to happen in general
Array Accessing Example
zip_dig cmu;   1 5 2 1 3   at addresses 16, 20, 24, 28, 32 (end at 36)

int get_digit
  (zip_dig z, int digit)
{
  return z[digit];
}

x86-64
◼ Register %rdi contains starting address of array
◼ Register %rsi contains array index
◼ Desired digit at %rdi + 4*%rsi
◼ Use memory reference (%rdi,%rsi,4)

# %rdi = z
# %rsi = digit
movl (%rdi,%rsi,4), %eax   # z[digit]
Array Loop Example
void zincr(zip_dig z) {
  size_t i;
  for (i = 0; i < ZLEN; i++)
    z[i]++;
}

  # %rdi = z
  movl $0, %eax            # i = 0
  jmp .L3                  # goto middle
.L4:                       # loop:
  addl $1, (%rdi,%rax,4)   # z[i]++
  addq $1, %rax            # i++
.L3:                       # middle
  cmpq $4, %rax            # i:4
  jbe .L4                  # if <=, goto loop
  rep; ret
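get_digit and zincr as compilable C. A sketch assuming the zip_dig typedef from the Array Example slide; the loop test i < ZLEN is what the compiler turned into the cmpq $4 / jbe pair (i <= 4).

```c
#include <assert.h>
#include <stddef.h>

#define ZLEN 5
typedef int zip_dig[ZLEN];

/* z decays to int *, so z[digit] reads Mem[z + 4*digit]. No bounds check. */
int get_digit(zip_dig z, int digit) {
    return z[digit];
}

/* Increment each digit in place; i < ZLEN compiles to i <= 4. */
void zincr(zip_dig z) {
    size_t i;
    for (i = 0; i < ZLEN; i++)
        z[i]++;
}

static_assert(sizeof(zip_dig) == ZLEN * sizeof(int), "zip_dig is int[5]");
```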
Understanding Pointers & Arrays #1
Decl         | A1, A2            | *A1, *A2
             | Comp  Bad  Size   | Comp  Bad  Size
int A1[3]    | Y     N    12     | Y     N    4
int *A2      | Y     N    8      | Y     Y    4

[Figure: memory diagrams of A1 and A2. Legend: allocated pointer, unallocated pointer, allocated int, unallocated int]

Comp: Compiles (Y/N)
Bad: Possible bad pointer reference (Y/N)
Size: Value returned by sizeof
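The Size column can be confirmed with sizeof, which is computed at compile time and does not evaluate its operand, so sizeof(*A2) is safe even though *A2 itself may be a bad reference. A sketch assuming x86-64 sizes:

```c
#include <assert.h>

int A1[3];   /* array: storage for 3 ints is allocated */
int *A2;     /* pointer: only the pointer itself is allocated (NULL as a global) */

static_assert(sizeof(A1) == 3 * sizeof(int), "A1: 12");
static_assert(sizeof(*A1) == sizeof(int), "*A1: 4");
static_assert(sizeof(A2) == sizeof(int *), "A2: 8 on x86-64");
static_assert(sizeof(*A2) == sizeof(int), "*A2: 4 (operand not evaluated)");
```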
Understanding Pointers & Arrays #2
Decl            | An               | *An              | **An
                | Cmp  Bad  Size   | Cmp  Bad  Size   | Cmp  Bad  Size
int A1[3]       | Y    N    12     | Y    N    4      | N    -    -
int *A2[3]      | Y    N    24     | Y    N    8      | Y    Y    4
int (*A3)[3]    | Y    N    8      | Y    Y    12     | Y    Y    4

[Figure: memory diagrams of A1, A2 and A3. Legend: allocated pointer, unallocated pointer, allocated int, unallocated int]
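The same sizeof check for the second table, again assuming x86-64 (8-byte pointers). The parentheses in int (*A3)[3] make A3 a single pointer to an array; without them, int *A2[3] is an array of three pointers. **A1 is left out because it does not compile: *A1 is an int, which cannot be dereferenced.

```c
#include <assert.h>

int *A2[3];     /* array of 3 pointers to int */
int (*A3)[3];   /* one pointer to an array of 3 ints */

static_assert(sizeof(A2) == 3 * sizeof(int *), "A2: 24");
static_assert(sizeof(*A2) == sizeof(int *), "*A2: 8");
static_assert(sizeof(**A2) == sizeof(int), "**A2: 4");
static_assert(sizeof(A3) == sizeof(int (*)[3]), "A3: 8");
static_assert(sizeof(*A3) == 3 * sizeof(int), "*A3: 12");
static_assert(sizeof(**A3) == sizeof(int), "**A3: 4");
```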
Multidimensional (Nested) Arrays
Declaration
T A[R][C];
▪ 2D array of data type T
▪ R rows, C columns
Array Size
▪ R * C * sizeof(T) bytes
Arrangement
▪ Row-Major Ordering

int A[R][C];
Logical view:
A[0][0]    • • •  A[0][C-1]
   •                  •
A[R-1][0]  • • •  A[R-1][C-1]
Memory layout (4*R*C bytes):
A[0][0] ••• A[0][C-1] | A[1][0] ••• A[1][C-1] | ••• | A[R-1][0] ••• A[R-1][C-1]
Nested Array Example
#define PCOUNT 4
typedef int zip_dig[5];
zip_dig pgh[PCOUNT] = {{1, 5, 2, 0, 6},
{1, 5, 2, 1, 3 },
{1, 5, 2, 1, 7 },
{1, 5, 2, 2, 1 }};
zip_dig pgh[4];   rows at addresses 76, 96, 116, 136 (end at 156)
   76:  1 5 2 0 6
   96:  1 5 2 1 3
  116:  1 5 2 1 7
  136:  1 5 2 2 1

“zip_dig pgh[4]” equivalent to “int pgh[4][5]”
▪ Variable pgh: array of 4 elements, allocated contiguously
▪ Each element is an array of 5 int’s, allocated contiguously
“Row-Major” ordering of all elements in memory
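Row-major order can be observed from the byte offset of pgh[i][j] relative to pgh, which should be 4*(5*i + j). A sketch assuming 4-byte int; elem_offset is a hypothetical helper that only compares addresses and reads no elements:

```c
#include <assert.h>
#include <stddef.h>

#define PCOUNT 4
typedef int zip_dig[5];

zip_dig pgh[PCOUNT] = {{1, 5, 2, 0, 6},
                       {1, 5, 2, 1, 3},
                       {1, 5, 2, 1, 7},
                       {1, 5, 2, 2, 1}};

/* Hypothetical helper: bytes from the start of pgh to pgh[i][j]. */
ptrdiff_t elem_offset(size_t i, size_t j) {
    return (char *)&pgh[i][j] - (char *)pgh;
}

static_assert(sizeof pgh == PCOUNT * sizeof(zip_dig), "4 contiguous rows");
```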
Nested Array Row Access
Row Vectors
▪ A[i] is array of C elements of type T
▪ Starting address A + i * (C * sizeof(T))
int A[R][C];
Memory layout, one row vector after another:
A[0]:    A[0][0] ••• A[0][C-1]           starts at A
•••
A[i]:    A[i][0] ••• A[i][C-1]           starts at A+(i*C*4)
•••
A[R-1]:  A[R-1][0] ••• A[R-1][C-1]       starts at A+((R-1)*C*4)
Nested Array Row Access Code
pgh:  {1 5 2 0 6} {1 5 2 1 3} {1 5 2 1 7} {1 5 2 2 1}
      pgh[2] is the third row, {1 5 2 1 7}

Row Vector
▪ pgh[index] is array of 5 int’s
▪ Starting address pgh+20*index
Machine Code
▪ Computes and returns address
▪ Compute as pgh + 4*(index+4*index)

int *get_pgh_zip(int index)
{
  return pgh[index];
}

# %rdi = index
leaq (%rdi,%rdi,4),%rax   # 5 * index
leaq pgh(,%rax,4),%rax    # pgh + (20 * index)
Nested Array Element Access
Array Elements
▪ A[i][j] is element of type T, which requires K bytes
▪ Address A + i * (C * K) + j * K = A + (i * C + j) * K

int A[R][C];
Memory layout:
A[0]:    A[0][0] ••• A[0][C-1]           starts at A
•••
A[i]:    ••• A[i][j] •••                 starts at A+(i*C*4); A[i][j] at A+(i*C*4)+(j*4)
•••
A[R-1]:  A[R-1][0] ••• A[R-1][C-1]       starts at A+((R-1)*C*4)
Nested Array Element Access Code
pgh:  {1 5 2 0 6} {1 5 2 1 3} {1 5 2 1 7} {1 5 2 2 1}
      pgh[1][1] is the second element of the second row (value 5)

Array Elements
▪ pgh[index][dig] is int
▪ Address: pgh + 20*index + 4*dig
  = pgh + 4*(5*index + dig)

int get_pgh_digit(int index, int dig)
{
  return pgh[index][dig];
}

leaq (%rdi,%rdi,4), %rax   # 5*index
addq %rax, %rsi            # 5*index+dig
movl pgh(,%rsi,4), %eax    # M[pgh + 4*(5*index+dig)]
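Both nested-array access functions as compilable C, with a check that the row pointer returned by get_pgh_zip sits 20*index bytes past pgh. A sketch assuming 4-byte int, reusing the pgh data from the Nested Array Example:

```c
#include <assert.h>

typedef int zip_dig[5];

zip_dig pgh[4] = {{1, 5, 2, 0, 6},
                  {1, 5, 2, 1, 3},
                  {1, 5, 2, 1, 7},
                  {1, 5, 2, 2, 1}};

static_assert(sizeof(zip_dig) == 20, "row stride is 20 bytes with 4-byte int");

/* Row access: pgh[index] decays to an int * at pgh + 20*index. */
int *get_pgh_zip(int index) {
    return pgh[index];
}

/* Element access: one memory read at pgh + 4*(5*index + dig). */
int get_pgh_digit(int index, int dig) {
    return pgh[index][dig];
}
```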
Multi-Level Array Example
zip_dig cmu = { 1, 5, 2, 1, 3 };
zip_dig mit = { 0, 2, 1, 3, 9 };
zip_dig ucb = { 9, 4, 7, 2, 0 };

Variable univ denotes array of 3 elements
Each element is a pointer
▪ 8 bytes
Each pointer points to array of int’s

#define UCOUNT 3
int *univ[UCOUNT] = {mit, cmu, ucb};

univ (at address 160):  36 | 16 | 56   (elements at 160, 168, 176)
  univ[0] = 36 → mit:  0 2 1 3 9   at 36, 40, 44, 48, 52 (end at 56)
  univ[1] = 16 → cmu:  1 5 2 1 3   at 16, 20, 24, 28, 32 (end at 36)
  univ[2] = 56 → ucb:  9 4 7 2 0   at 56, 60, 64, 68, 72 (end at 76)
Element Access in Multi-Level Array
int get_univ_digit
(size_t index, size_t digit)
{
return univ[index][digit];
}
salq $2, %rsi               # 4*digit
addq univ(,%rdi,8), %rsi    # p = univ[index] + 4*digit
movl (%rsi), %eax           # return *p
ret
Computation
▪ Element access Mem[Mem[univ+8*index]+4*digit]
▪ Must do two memory reads
▪ First get pointer to row array
▪ Then access element within array
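The two memory reads can be made explicit in C. A sketch assuming x86-64 (8-byte pointers, 4-byte int); the local variable row is introduced only to name the first read. Unlike the nested case, the rows of a multi-level array need not be adjacent in memory.

```c
#include <assert.h>
#include <stddef.h>

#define ZLEN 5
#define UCOUNT 3
typedef int zip_dig[ZLEN];

zip_dig cmu = {1, 5, 2, 1, 3};
zip_dig mit = {0, 2, 1, 3, 9};
zip_dig ucb = {9, 4, 7, 2, 0};
int *univ[UCOUNT] = {mit, cmu, ucb};

int get_univ_digit(size_t index, size_t digit) {
    int *row = univ[index];   /* read 1: Mem[univ + 8*index] */
    return row[digit];        /* read 2: Mem[row + 4*digit] */
}
```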
Array Element Accesses
Nested array
int get_pgh_digit
  (size_t index, size_t digit)
{
  return pgh[index][digit];
}

Multi-level array
int get_univ_digit
  (size_t index, size_t digit)
{
  return univ[index][digit];
}

Accesses look similar in C, but the address computations are very different:
▪ Nested array: Mem[pgh+20*index+4*digit]
▪ Multi-level array: Mem[Mem[univ+8*index]+4*digit]
N X N Matrix Code
Fixed dimensions
▪ Know value of N at compile time
#define N 16
typedef int fix_matrix[N][N];
/* Get element A[i][j] */
int fix_ele(fix_matrix A,
            size_t i, size_t j)
{
  return A[i][j];
}

Variable dimensions, explicit indexing
▪ Traditional way to implement dynamic arrays
#define IDX(n, i, j) ((i)*(n)+(j))
/* Get element A[i][j] */
int vec_ele(size_t n, int *A,
            size_t i, size_t j)
{
  return A[IDX(n,i,j)];
}

Variable dimensions, implicit indexing
▪ Now supported by gcc
/* Get element A[i][j] */
int var_ele(size_t n, int A[n][n],
            size_t i, size_t j) {
  return A[i][j];
}
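The three variants side by side, so they can be checked to agree. A sketch assuming a C99-or-later compiler (var_ele uses a variable-length array parameter). In the test, the matrix is filled so that A[i][j] == i*N + j, and vec_ele gets a separate flat array of N*N ints, matching the "traditional" dynamic layout.

```c
#include <assert.h>
#include <stddef.h>

#define N 16
typedef int fix_matrix[N][N];
#define IDX(n, i, j) ((i)*(n)+(j))

/* Fixed dimensions: the row stride N*4 = 64 bytes is a compile-time constant. */
int fix_ele(fix_matrix A, size_t i, size_t j) {
    return A[i][j];
}

/* Explicit indexing into a flat array of n*n ints. */
int vec_ele(size_t n, int *A, size_t i, size_t j) {
    return A[IDX(n, i, j)];
}

/* Implicit indexing: the row stride n*4 is computed at run time. */
int var_ele(size_t n, int A[n][n], size_t i, size_t j) {
    return A[i][j];
}

static_assert(sizeof(fix_matrix) == N * N * sizeof(int), "16 x 16 ints");
```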
16 X 16 Matrix Access
Array Elements
▪ int A[16][16];
▪ Address A + i*(C*K) + j*K
▪ C = 16, K = 4
/* Get element A[i][j] */
int fix_ele(fix_matrix A, size_t i, size_t j) {
  return A[i][j];
}
# A in %rdi, i in %rsi, j in %rdx
salq $6, %rsi              # 64*i
addq %rsi, %rdi            # A + 64*i
movl (%rdi,%rdx,4), %eax   # Mem[A + 64*i + 4*j]
ret
n X n Matrix Access
Array Elements
▪ size_t n;
▪ int A[n][n];
▪ Address A + i*(C*K) + j*K
▪ C = n, K = 4
▪ Must perform integer multiplication
/* Get element A[i][j] */
int var_ele(size_t n, int A[n][n], size_t i, size_t j) {
  return A[i][j];
}
# n in %rdi, A in %rsi, i in %rdx, j in %rcx
imulq %rdx, %rdi           # n*i
leaq (%rsi,%rdi,4), %rax   # A + 4*n*i
movl (%rax,%rcx,4), %eax   # A + 4*n*i + 4*j
ret
Example: Array Access
#include <stdio.h>
#define ZLEN 5
#define PCOUNT 4
typedef int zip_dig[ZLEN];

int main(int argc, char** argv) {
  zip_dig pgh[PCOUNT] =
    {{1, 5, 2, 0, 6}, {1, 5, 2, 1, 3}, {1, 5, 2, 1, 7}, {1, 5, 2, 2, 1}};
  int *linear_zip = (int *) pgh;
  int *zip2 = (int *) pgh[2];
  int result =
    pgh[0][0] + linear_zip[7] + *(linear_zip + 8) + zip2[1];
  printf("result: %d\n", result);
  return 0;
}
linux> ./array
result: 9
Structure Representation
Structure represented as block of memory
▪ Big enough to hold all of the fields
Fields ordered according to declaration
▪ Even if another ordering could yield a more compact representation
Compiler determines overall size + positions of fields
▪ Machine-level program has no understanding of the structures in the source code

struct rec {
  int a[4];
  size_t i;
  struct rec *next;
};

r:  a (offset 0) | i (offset 16) | next (offset 24) | end (offset 32)
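The field offsets in the diagram can be checked with offsetof. A sketch assuming x86-64 (4-byte int, 8-byte size_t and pointers); the compiler, not the machine code, owns this layout.

```c
#include <assert.h>
#include <stddef.h>

struct rec {
    int a[4];
    size_t i;
    struct rec *next;
};

static_assert(offsetof(struct rec, a) == 0, "a at offset 0");
static_assert(offsetof(struct rec, i) == 16, "i at offset 16");
static_assert(offsetof(struct rec, next) == 24, "next at offset 24");
static_assert(sizeof(struct rec) == 32, "total 32 bytes");
```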
Generating Pointer to Structure Member
struct rec {
  int a[4];
  size_t i;
  struct rec *next;
};

r:  a (offset 0) | i (offset 16) | next (offset 24) | end (offset 32)
    r+4*idx points to a[idx]

Generating Pointer to Array Element
▪ Offset of each structure member determined at compile time
▪ Compute as r + 4*idx

int *get_ap
  (struct rec *r, size_t idx)
{
  return &r->a[idx];
}

# r in %rdi, idx in %rsi
leaq (%rdi,%rsi,4), %rax
ret
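get_ap as compilable C. Because a starts at offset 0, &r->a[idx] is just r + 4*idx, which is why a single leaq suffices. A sketch assuming 4-byte int:

```c
#include <assert.h>
#include <stddef.h>

struct rec {
    int a[4];
    size_t i;
    struct rec *next;
};

static_assert(offsetof(struct rec, a) == 0, "a starts the struct");

/* Address arithmetic only: no memory is read. */
int *get_ap(struct rec *r, size_t idx) {
    return &r->a[idx];
}
```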
Following Linked List #1
C Code
long length(struct rec *r) {
  long len = 0L;
  while (r) {
    len ++;
    r = r->next;
  }
  return len;
}

struct rec {
  int a[4];
  size_t i;
  struct rec *next;
};

r:  a (offset 0) | i (offset 16) | next (offset 24) | end (offset 32)

Register   Value
%rdi       r
%rax       len

Loop assembly code
.L11:                   # loop:
  addq $1, %rax         # len ++
  movq 24(%rdi), %rdi   # r = Mem[r+24]
  testq %rdi, %rdi      # Test r
  jne .L11              # If != 0, goto loop
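length as compilable C, with the next offset that the movq 24(%rdi) instruction relies on pinned down. A sketch assuming x86-64 field sizes:

```c
#include <assert.h>
#include <stddef.h>

struct rec {
    int a[4];
    size_t i;
    struct rec *next;
};

static_assert(offsetof(struct rec, next) == 24, "movq 24(%rdi) loads r->next");

/* Count nodes by following next until NULL. */
long length(struct rec *r) {
    long len = 0L;
    while (r) {
        len++;
        r = r->next;
    }
    return len;
}
```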
Following Linked List #2
C Code
void set_val
  (struct rec *r, int val)
{
  while (r) {
    size_t i = r->i;
    // No bounds check
    r->a[i] = val;
    r = r->next;
  }
}

struct rec {
  int a[4];
  size_t i;
  struct rec *next;
};

r:  a (offset 0) | i (offset 16) | next (offset 24) | end (offset 32)
    Element i of a is at r + 4*i

Register   Value
%rdi       r
%rsi       val

.L11:                        # loop:
  movq 16(%rdi), %rax        # i = Mem[r+16]
  movl %esi, (%rdi,%rax,4)   # Mem[r+4*i] = val
  movq 24(%rdi), %rdi        # r = Mem[r+24]
  testq %rdi, %rdi           # Test r
  jne .L11                   # if !=0 goto loop
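set_val as compilable C. Like the original it does no bounds check, so every node's i must already be less than 4; the test only builds nodes that satisfy this. A sketch assuming x86-64 field sizes:

```c
#include <assert.h>
#include <stddef.h>

struct rec {
    int a[4];
    size_t i;
    struct rec *next;
};

static_assert(offsetof(struct rec, i) == 16, "movq 16(%rdi) loads r->i");

/* In each node, store val into a[i] using that node's own i. */
void set_val(struct rec *r, int val) {
    while (r) {
        size_t i = r->i;
        /* No bounds check: caller must guarantee i < 4. */
        r->a[i] = val;
        r = r->next;
    }
}
```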
Structures & Alignment
Unaligned Data
struct S1 {
  char c;
  int i[2];
  double v;
} *p;
c at p | i[0] at p+1 | i[1] at p+5 | v at p+9 | end at p+17

Aligned Data
▪ Primitive data type requires B bytes implies address must be multiple of B
c at p+0 | 3 bytes | i[0] at p+4 (multiple of 4) | i[1] at p+8 (multiple of 8) | 4 bytes | v at p+16 (multiple of 8) | end at p+24 (multiple of 8)
Alignment Principles
Aligned Data
▪ Primitive data type requires B bytes
▪ Address must be multiple of B
▪ Required on some machines; advised on x86-64
Motivation for Aligning Data
▪ Memory accessed by (aligned) chunks of 4 or 8 bytes (system dependent)
▪ Inefficient to load or store a datum that spans cache lines (64 bytes). Intel states that data should avoid crossing 16-byte boundaries.
[Cache lines will be discussed in Lecture 11.]
▪ Virtual memory trickier when a datum spans 2 pages (4 KB pages)
[Virtual memory pages will be discussed in Lecture 17.]
Compiler
▪ Inserts gaps in structure to ensure correct alignment of fields
Specific Cases of Alignment (x86-64)
1 byte: char, …
▪ no restrictions on address
2 bytes: short, …
▪ lowest 1 bit of address must be 0₂
4 bytes: int, float, …
▪ lowest 2 bits of address must be 00₂
8 bytes: double, long, char *, …
▪ lowest 3 bits of address must be 000₂
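The x86-64 cases can be expressed with C11 alignof, and "lowest k bits are zero" becomes a mask test. A sketch assuming the x86-64 Linux (System V) ABI, where long is 8 bytes; is_aligned is a hypothetical helper, valid only for power-of-two k.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

static_assert(alignof(char) == 1, "no restriction");
static_assert(alignof(short) == 2, "lowest 1 bit zero");
static_assert(alignof(int) == 4 && alignof(float) == 4, "lowest 2 bits zero");
static_assert(alignof(double) == 8, "lowest 3 bits zero");
static_assert(alignof(long) == 8 && alignof(char *) == 8, "lowest 3 bits zero");

/* Hypothetical helper: addr is a multiple of k iff its low log2(k) bits are 0. */
int is_aligned(uintptr_t addr, uintptr_t k) {
    return (addr & (k - 1)) == 0;
}
```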
Satisfying Alignment with Structures
Within structure:
▪ Must satisfy each element’s alignment requirement
Overall structure placement
▪ Each structure has alignment requirement K
▪ K = Largest alignment of any element
▪ Initial address & structure length must be multiples of K
Example:
▪ K = 8, due to double element
NOTE: K < sizeof(struct S1)
struct S1 {
  char c;
  int i[2];
  double v;
} *p;
c at p+0 | 3 bytes internal padding | i[0] at p+4 (multiple of 4) | i[1] at p+8 | 4 bytes internal padding | v at p+16 (multiple of 8) | end at p+24 (multiple of 8)
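The padded layout of S1 can be confirmed with offsetof and alignof. A sketch assuming x86-64 (4-byte int, 8-byte, 8-aligned double):

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

struct S1 {
    char c;
    int i[2];
    double v;
};

/* 3 bytes of padding after c, 4 bytes before v. */
static_assert(offsetof(struct S1, i) == 4, "i is 4-aligned");
static_assert(offsetof(struct S1, v) == 16, "v is 8-aligned");
static_assert(alignof(struct S1) == 8, "K = 8, from the double");
static_assert(sizeof(struct S1) == 24, "length is a multiple of K");
```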
Meeting Overall Alignment Requirement
For largest alignment requirement K
Overall structure must be multiple of K
struct S2 {
  double v;
  int i[2];
  char c;
} *p;
v at p+0 | i[0] at p+8 | i[1] at p+12 | c at p+16 | 7 bytes external padding | end at p+24 (multiple of K=8)
Arrays of Structures
Overall structure length multiple of K
Satisfy alignment requirement for every element
struct S2 {
  double v;
  int i[2];
  char c;
} a[10];
a[0] at a+0 | a[1] at a+24 | a[2] at a+48 | ••• (a+72)
Within a[1]: v at a+24 | i[0] at a+32 | i[1] at a+36 | c at a+40 | 7 bytes | a[2] at a+48
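Because sizeof(struct S2) already includes the 7 bytes of trailing padding, every element of a[10] starts on an 8-byte boundary. A sketch assuming x86-64; elem_offset is a hypothetical helper that only compares addresses.

```c
#include <assert.h>
#include <stddef.h>

struct S2 {
    double v;
    int i[2];
    char c;
};

struct S2 a[10];

/* 17 bytes of fields + 7 bytes of external padding = 24. */
static_assert(offsetof(struct S2, c) == 16, "c at offset 16");
static_assert(sizeof(struct S2) == 24, "length rounded up to K = 8");

/* Hypothetical helper: byte distance of a[idx] from the start of a. */
ptrdiff_t elem_offset(size_t idx) {
    return (char *)&a[idx] - (char *)a;
}
```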
Accessing Array Elements
Compute array offset 12*idx
▪ sizeof(S3), including alignment spacers
Element j is at offset 8 within structure
Assembler gives offset a+8
▪ Resolved during linking
struct S3 {
  short i;
  float v;
  short j;
} a[10];
a[0] at a+0 | a[1] at a+12 | ••• | a[idx] at a+12*idx | •••
Within a[idx]: i | 2 bytes | v | j at a+12*idx+8 | 2 bytes

short get_j(int idx)
{
  return a[idx].j;
}
# %rdi = idx
leaq (%rdi,%rdi,2),%rax     # 3*idx
movzwl a+8(,%rax,4),%eax
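get_j as compilable C, with the 12-byte element size and the offset 8 of j checked at compile time. A sketch assuming 2-byte short and 4-byte float:

```c
#include <assert.h>
#include <stddef.h>

struct S3 {
    short i;
    float v;
    short j;
};

struct S3 a[10];

static_assert(offsetof(struct S3, v) == 4, "2 bytes padding after i");
static_assert(offsetof(struct S3, j) == 8, "j at offset 8");
static_assert(sizeof(struct S3) == 12, "2 bytes trailing padding");

short get_j(int idx) {
    return a[idx].j;   /* Mem[a + 12*idx + 8] */
}
```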
Saving Space
Put large data types first
struct S4 {
  char c;
  int i;
  char d;
} *p;
struct S5 {
  int i;
  char c;
  char d;
} *p;
Effect (largest alignment requirement K=4)
S4: c | 3 bytes | i | d | 3 bytes   (12 bytes)
S5: i | c | d | 2 bytes             (8 bytes)
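The effect of reordering can be confirmed with sizeof. A sketch assuming 4-byte, 4-aligned int:

```c
#include <assert.h>
#include <stddef.h>

struct S4 { char c; int i; char d; };   /* small, large, small */
struct S5 { int i; char c; char d; };   /* large data type first */

static_assert(offsetof(struct S4, d) == 8, "c, 3 pad, i, then d");
static_assert(sizeof(struct S4) == 12, "3 more bytes of trailing padding");
static_assert(offsetof(struct S5, d) == 5, "c and d packed together");
static_assert(sizeof(struct S5) == 8, "2 bytes of trailing padding");
```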
Activity
Background
History
▪ x87 FP
  ▪ Legacy, very ugly
▪ SSE FP
  ▪ Supported by Shark machines
  ▪ Special case use of vector instructions
▪ AVX FP
  ▪ Newest version
  ▪ Similar to SSE (but registers are 32 bytes instead of 16)
  ▪ Documented in book
Programming with SSE4
XMM Registers
◼ 16 total, each 16 bytes
◼ Each register can hold:
  ◼ 16 single-byte integers
  ◼ 8 16-bit integers
  ◼ 4 32-bit integers
  ◼ 4 single-precision floats
  ◼ 2 double-precision floats
  ◼ 1 single-precision float
  ◼ 1 double-precision float
Scalar & SIMD Operations
◼ Scalar Operations: Single Precision
addss %xmm0,%xmm1   # one float addition in the low 4 bytes; result in %xmm1
◼ SIMD Operations: Single Precision
addps %xmm0,%xmm1   # four float additions, one per 4-byte lane
◼ Scalar Operations: Double Precision
addsd %xmm0,%xmm1   # one double addition in the low 8 bytes
FP Basics
Arguments passed in %xmm0, %xmm1, ...
Result returned in %xmm0
All XMM registers caller-saved

float fadd(float x, float y)
{
  return x + y;
}
# x in %xmm0, y in %xmm1
addss %xmm1, %xmm0
ret

double dadd(double x, double y)
{
  return x + y;
}
# x in %xmm0, y in %xmm1
addsd %xmm1, %xmm0
ret
FP Memory Referencing
Integer (and pointer) arguments passed in regular registers
FP values passed in XMM registers
Different mov instructions to move between XMM registers, and between memory and XMM registers
double dincr(double *p, double v) {
double x = *p;
*p = x + v;
return x;
}
# p in %rdi, v in %xmm0
movapd %xmm0, %xmm1   # Copy v
movsd (%rdi), %xmm0   # x = *p
addsd %xmm0, %xmm1    # t = x + v
movsd %xmm1, (%rdi)   # *p = t
ret
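dincr as compilable C. Note the mixed register use: the pointer p arrives in the integer register %rdi, while v and the return value use %xmm0. It returns the old value of *p. A sketch; the check on sizeof(double) just records the 8-byte movsd transfers seen above.

```c
#include <assert.h>

static_assert(sizeof(double) == 8, "movsd moves 8-byte doubles");

/* Add v to *p and return the value *p had before the update. */
double dincr(double *p, double v) {
    double x = *p;
    *p = x + v;
    return x;
}
```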
Other Aspects of FP Code
Lots of instructions
▪ Different operations, different formats, ...
Floating-point comparisons
▪ Instructions ucomiss and ucomisd
▪ Set condition codes ZF, PF and CF
▪ Zeros OF and SF

Result         ZF  PF  CF
UNORDERED      1   1   1
GREATER_THAN   0   0   0
LESS_THAN      0   0   1
EQUAL          1   0   0
(PF = Parity Flag)

Using constant values
▪ Set XMM0 register to 0 with instruction xorpd %xmm0, %xmm0
▪ Others loaded from memory
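At the C level, the UNORDERED outcome corresponds to a comparison involving a NaN, where <, > and == are all false. A sketch using the standard isunordered macro; the fp_cmp enum and compare function are hypothetical names that mirror the four flag outcomes of ucomisd.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical enum mirroring the four ucomisd outcomes. */
typedef enum { CMP_UNORDERED, CMP_GREATER, CMP_LESS, CMP_EQUAL } fp_cmp;

fp_cmp compare(double x, double y) {
    if (isunordered(x, y)) return CMP_UNORDERED;  /* ZF,PF,CF = 111 */
    if (x > y) return CMP_GREATER;                /* ZF,PF,CF = 000 */
    if (x < y) return CMP_LESS;                   /* ZF,PF,CF = 001 */
    return CMP_EQUAL;                             /* ZF,PF,CF = 100 */
}
```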
Summary
Arrays
▪ Elements packed into contiguous region of memory
▪ Use index arithmetic to locate individual elements
Structures
▪ Elements packed into single region of memory
▪ Access using offsets determined by compiler
▪ May require internal and external padding to ensure alignment
Combinations
▪ Can nest structure and array code arbitrarily
Floating Point
▪ Data held and operated on in XMM registers