程序代写代做代考 assembler assembly compiler C cache Carnegie Mellon

Carnegie Mellon
Machine-Level Programming IV: Data
15-213/18-213/15-513: Introduction to Computer Systems 8th Lecture, June 4, 2020
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
1

Carnegie Mellon
Today
 Arrays
▪ One-dimensional
▪ Multi-dimensional (nested) ▪ Multi-level
 Structures ▪ Allocation
▪ Access
▪ Alignment
 Floating Point
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
2

Carnegie Mellon
Array Allocation
 Basic Principle T A[L];
▪ Array of data type T and length L
▪ Contiguously allocated region of L * sizeof(T) bytes in memory
char string[12];
int val[5];
double a[3];
char *p[3];
x
x
x
x
x + 4
x + 8
x + 8
x + 12
x + 12
x + 16
x + 16
x + 16
x + 20
x + 8 Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
x + 24
x + 24
3

Carnegie Mellon
Array Access
 Basic Principle T A[L];
▪ Array of data type T and length L
▪ Identifier A can be used as a pointer to array element 0: Type T*
1
5
2
1
3
int val[5];
 Reference Type val[4] int
Value
val
val+1
&val[2]
val[5] int *(val+1) int
val + i
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
x x+4x+8x+12
x+16 x+20
int *
int *
int *
int *
4

Carnegie Mellon
Array Access
 Basic Principle T A[L];
▪ Array of data type T and length L
▪ Identifier A can be used as a pointer to array element 0: Type T*
1
5
2
1
3
int val[5];
 Reference Type Value val[4] int 3
val int *
val+1 int *
&val[2] int * val[5] int *(val+1) int
val + i
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
x x+4x+8x+12
x+16 x+20
int *
5

Carnegie Mellon
Array Access
 Basic Principle T A[L];
▪ Array of data type T and length L
▪ Identifier A can be used as a pointer to array element 0: Type T*
1
5
2
1
3
int val[5];
 Reference Type
val[4] int
val int *
val+1 int *
&val[2] int *
val[5] int
*(val+1) int
Value
val + i int *
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
6
x x+4x+8x+12
x+16 x+20
3
x
x+4
x +8
??
5 //val[1] x + 4 * i //&val[i]

Carnegie Mellon
Array Example
#define ZLEN 5
typedef int zip_dig[ZLEN];
zip_dig cmu = { 1, 5, 2, 1, 3 }; zip_dig mit = { 0, 2, 1, 3, 9 }; zip_dig ucb = { 9, 4, 7, 2, 0 };
1
5
2
1
3
zip_dig cmu;
zip_dig mit;
zip_dig ucb;
16 20 24 28 32 36 36 40 44 48 52 56
56 60 64 68 72 76
0
2
1
3
9
9
4
7
2
0
 Declaration “zip_dig cmu” equivalent to “int cmu[5]”
 Example arrays were allocated in successive 20 byte blocks
▪ Not guaranteed to happen in general Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
7

Carnegie Mellon
Array Accessing Example
1
5
2
1
3
zip_dig cmu;
int get_digit
(zip_dig z, int digit)
{
return z[digit];
}
x86-64
◼ Register %rdi contains starting address of array
◼ Register %rsi contains array index
◼ Desired digit at %rdi + 4*%rsi
◼ Use memory reference (%rdi,%rsi,4)
16 20 24 28 32 36
# %rdi = z
# %rsi = digit
movl (%rdi,%rsi,4), %eax # z[digit]
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
8

Carnegie Mellon
Array Loop Example
# %rdi = z
movl $0,%eax jmp .L3
.L4:
# i=0
# goto middle # loop:
addl $1, (%rdi,%rax,4) # z[i]++
addq $1,%rax .L3:
cmpq $4,%rax
jbe .L4 # if <=, goto loop rep; ret Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 9 void zincr(zip_dig z) { size_t i; for (i = 0; i < ZLEN; i++) z[i]++; } # i++ # middle # i:4 Carnegie Mellon Array Loop Example # %rdi = z movl $0,%eax jmp .L3 .L4: # i=0 # goto middle # loop: addl $1, (%rdi,%rax,4) # z[i]++ addq $1,%rax .L3: cmpq $4,%rax jbe .L4 # if <=, goto loop rep; ret Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 10 void zincr(zip_dig z) { size_t i; for (i = 0; i < ZLEN; i++) z[i]++; } # i++ # middle # i:4 Carnegie Mellon Understanding Pointers & Arrays #1 Decl A1 , A2 *A1 , *A2 Comp Bad Size Comp Bad Size int A1[3] int *A2  Comp: Compiles (Y/N)  Bad: Possible bad pointer reference (Y/N)  Size: Value returned by sizeof Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 11 Carnegie Mellon Understanding Pointers & Arrays #1 Decl A1 , A2 *A1 , *A2 Comp Bad Size Comp Bad Size int A1[3] int *A2 A1 A2 Allocated pointer Unallocated pointer Allocated int Unallocated int  Comp: Compiles (Y/N)  Bad: Possible bad pointer reference (Y/N)  Size: Value returned by sizeof Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 12 Carnegie Mellon Understanding Pointers & Arrays #1 Decl A1 , A2 *A1 , *A2 Comp Bad Size Comp Bad Size int A1[3] Y N 12 Y N 4 int *A2 Y N 8 Y Y 4 A1 A2 Allocated pointer Unallocated pointer Allocated int Unallocated int  Comp: Compiles (Y/N)  Bad: Possible bad pointer reference (Y/N)  Size: Value returned by sizeof Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 13 Carnegie Mellon Understanding Pointers & Arrays #2 Decl An *An **An Cmp Bad Size Cmp Bad Size Cmp Bad Size int A1[3] int *A2[3] int (*A3)[3] Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 14 Allocated pointer Unallocated pointer Allocated int Unallocated int Carnegie Mellon Understanding Pointers & Arrays #2 Decl An *An **An Cmp Bad Size Cmp Bad Size Cmp Bad Size int A1[3] int *A2[3] int (*A3)[3] A1 A2 A3 Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 15 Allocated pointer Unallocated pointer Allocated int Unallocated int Carnegie Mellon Understanding Pointers & Arrays #2 Decl An *An **An Cmp Bad Size Cmp Bad Size Cmp Bad Size int A1[3] Y N 12 Y N 4 N - - int *A2[3] Y N 24 Y N 8 Y Y 4 int (*A3)[3] Y N 8 Y Y 12 Y Y 4 A1 A2 A3 Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 16 Allocated pointer Unallocated pointer Allocated int Unallocated int Carnegie Mellon Multidimensional (Nested) Arrays  Declaration T A[R][C]; ▪ 2D array of data type T ▪ R rows, C columns  Array Size ▪ R * C * sizeof(T) bytes  Arrangement ▪ Row-Major Ordering int A[R][C]; A[0][0] • • • A[0][C-1] •• •• •• A[R-1][0] • • • A[R-1][C-1] A [0] [0] ••• A [0] [C-1] A [1] [0] ••• A [1] [C-1] ••• A [R-1] [0] ••• A [R-1] [C-1] 4*R*C Bytes Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 17 Carnegie Mellon Nested Array Example #define PCOUNT 4 typedef int zip_dig[5]; zip_dig pgh[PCOUNT] = {{1, 5, 2, 0, 6}, {1, 5, 2, 1, 3 }, {1, 5, 2, 1, 7 }, {1, 5, 2, 2, 1 }}; zip_dig pgh[4]; 76 96 116 136 156  “zip_dig pgh[4]” equivalent to “int pgh[4][5]” ▪ Variable pgh: array of 4 elements, allocated contiguously ▪ Each element is an array of 5 int’s, allocated contiguously  “Row-Major” ordering of all elements in memory Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 18 1 5 2 0 6 1 5 2 1 3 1 5 2 1 7 1 5 2 2 1 Carnegie Mellon Nested Array Row Access  Row Vectors ▪ A[i] is array of C elements of type T ▪ Starting address A + i * (C * sizeof(T)) int A[R][C]; A[0] A[i] A[R-1] ••• ••• A+(i*C*4) A+((R-1)*C*4) A [0] [0] ••• A [0] [C-1] A [i] [0] ••• A [i] [C-1] A [R-1] [0] ••• A [R-1] [C-1] A Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 19 Carnegie Mellon Nested Array Row Access Code pgh pgh[2] 1 5 2 0 6 1 5 2 1 3 1 5 2 1 7 1 5 2 2 1  Row Vector ▪ pgh[index] is array of 5 int’s ▪ Starting address pgh+20*index  Machine Code ▪ Computes and returns address ▪ Compute as pgh + 4*(index+4*index) Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 20 int *get_pgh_zip(int index) { return pgh[index]; } # %rdi = index leaq (%rdi,%rdi,4),%rax # 5 * index leaqpgh(,%rax,4),%rax #pgh+(20*index) Carnegie Mellon Nested Array Element Access  Array Elements ▪ A[i][j] is element of type T, which requires K bytes ▪AddressA+i*(C*K)+ j*K = A + (i * C + j) * K int A[R][C]; A[0] A[i] A[R-1] A [0] [0] ••• A [0] [C-1] ••• A [i] [j] ••• A [R-1] [0] ••• A [R-1] [C-1] A ••• ••• A+(i*C*4) A+((R-1)*C*4) A+(i*C*4)+(j*4) Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 21 Carnegie Mellon Nested Array Element Access Code pgh pgh[1][1] 1 5 2 0 6 1 5 2 1 3 1 5 2 1 7 1 5 2 2 1  Array Elements ▪ pgh[index][dig] is int ▪ Address: pgh + 20*index + 4*dig = pgh + 4*(5*index + dig) Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 22 int get_pgh_digit(int index, int dig) { return pgh[index][dig]; } leaq (%rdi,%rdi,4), %rax addl %rax, %rsi movl pgh(,%rsi,4), %eax # 5*index # 5*index+dig # M[pgh + 4*(5*index+dig)] Carnegie Mellon Multi-Level Array Example zip_dig cmu = { 1, 5, 2, 1, 3 }; zip_dig mit = { 0, 2, 1, 3, 9 }; zip_dig ucb = { 9, 4, 7, 2, 0 };    Variableunivdenotes array of 3 elements Each element is a pointer ▪ 8 bytes Eachpointerpointstoarray of int’s #define UCOUNT 3 int *univ[UCOUNT] = {mit, cmu, ucb}; 1 5 2 1 3 univ 160 168 176 cmu mit16 20 24 28 32 36 48 52 56 68 72 76 36 16 56 0 2 1 3 9 ucb 36 40 44 56 60 64 9 4 7 2 0 Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 23 Carnegie Mellon Element Access in Multi-Level Array int get_univ_digit (size_t index, size_t digit) { return univ[index][digit]; } salq $2, %rsi # 4*digit addq univ(,%rdi,8), %rsi # p = univ[index] + 4*digit movl (%rsi), %eax # return *p ret  Computation ▪ Element access Mem[Mem[univ+8*index]+4*digit] ▪ Must do two memory reads ▪ First get pointer to row array ▪ Then access element within array Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 24 Carnegie Mellon Array Element Accesses Nested array Multi-level array int get_pgh_digit (size_t index, size_t digit) { return pgh[index][digit]; } Accesses looks similar in C, but address computations very different: Mem[pgh+20*index+4*digit] Mem[Mem[univ+8*index]+4*digit] Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 25 int get_univ_digit (size_t index, size_t digit) { return univ[index][digit]; } Carnegie Mellon N X N Matrix Code  Fixed dimensions ▪ Know value of N at compile time  Variable dimensions, explicit indexing ▪ Traditional way to implement dynamic arrays  Variable dimensions, implicit indexing ▪ Now supported by gcc Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 26 #define N 16 typedef int fix_matrix[N][N]; /* Get element A[i][j] */ int fix_ele(fix_matrix A, size_t i, size_t j) { return A[i][j]; } #define IDX(n, i, j) ((i)*(n)+(j)) /* Get element A[i][j] */ int vec_ele(size_t n, int *A, size_t i, size_t j) { return A[IDX(n,i,j)]; } /* Get element A[i][j] */ int var_ele(size_t n, int A[n][n], size_t i, size_t j) { return A[i][j]; } Carnegie Mellon 16 X 16 Matrix Access  Array Elements ▪ int A[16][16]; ▪AddressA+i*(C*K)+ j*K ▪ C = 16, K = 4 /* Get element A[i][j] */ int fix_ele(fix_matrix A, size_t i, size_t j) { return A[i][j]; } # A in %rdi, i in %rsi, j in %rdx salq $6, %rsi # 64*i addq %rsi, %rdi # A + 64*i movl (%rdi,%rdx,4), %eax # Mem[A + 64*i + 4*j] ret Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 27 Carnegie Mellon n X n Matrix Access  Array Elements ▪ size_t n; ▪ int A[n][n]; ▪AddressA+i*(C*K)+ j*K ▪ C = n, K = 4 ▪ Must perform integer multiplication /* Get element A[i][j] */ int var_ele(size_t n, int A[n][n], size_t i, size_t j) { return A[i][j]; } # n in %rdi, A in %rsi, i in %rdx, j in %rcx imulq %rdx, %rdi # n*i leaq (%rsi,%rdi,4), %rax # A + 4*n*i movl (%rax,%rcx,4), %eax # A + 4*n*i + 4*j ret Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 28 Carnegie Mellon Example: Array Access #include #define ZLEN 5
#define PCOUNT 4
typedef int zip_dig[ZLEN];
int main(int argc, char** argv) { zip_dig pgh[PCOUNT] =
{{1, 5, 2, 0, 6}, {1, 5, 2, 1, 3 }, {1, 5, 2, 1, 7 }, {1, 5, 2, 2, 1 }};
int *linear_zip = (int *) pgh;
int *zip2 = (int *) pgh[2];
int result =
pgh[0][0] + linear_zip[7] + *(linear_zip + 8) + zip2[1];
printf(“result: %d\n”, result);
return 0; }
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
29
linux> ./array

Carnegie Mellon
Example: Array Access
#include #define ZLEN 5
#define PCOUNT 4
typedef int zip_dig[ZLEN];
int main(int argc, char** argv) { zip_dig pgh[PCOUNT] =
{{1, 5, 2, 0, 6}, {1, 5, 2, 1, 3 }, {1, 5, 2, 1, 7 }, {1, 5, 2, 2, 1 }};
int *linear_zip = (int *) pgh;
int *zip2 = (int *) pgh[2];
int result =
pgh[0][0] + linear_zip[7] + *(linear_zip + 8) + zip2[1];
printf(“result: %d\n”, result);
return 0; }
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
30
linux> ./array
result: 9

Carnegie Mellon
Today
 Arrays
▪ One-dimensional
▪ Multi-dimensional (nested) ▪ Multi-level
 Structures ▪ Allocation
▪ Access
▪ Alignment
 Floating Point
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
31

Carnegie Mellon
Structure Representation
r
0 16 24 32
 Structure represented as block of memory ▪ Big enough to hold all of the fields
 Fields ordered according to declaration
▪ Even if another ordering could yield a more compact
representation
 Compiler determines overall size + positions of fields
▪ Machine-level program has no understanding of the structures in the source code
struct rec { int a[4]; size_t i;
struct rec *next;
};
a
i
next
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
32

Carnegie Mellon
Generating Pointer to Structure Member
r r+4*idx
0 16 24 32
struct rec { int a[4]; size_t i;
struct rec *next;
};
a
i
next
 Generating Pointer to Array Element
▪ Offset of each structure member determined at compile time
▪ Compute as r + 4*idx
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
33
int *get_ap
(struct rec *r, size_t idx)
{
return &r->a[idx];
}
# r in %rdi, idx in %rsi leaq (%rdi,%rsi,4), %rax ret

Carnegie Mellon
Following Linked List #1
 C Code
r
0 16 24 32
long length(struct rec*r) {
long len = 0L;
while (r) { len ++;
r = r->next;
}
return len; }
struct rec {
int a[4];
size_t i;
struct rec *next;
};
a
i
next
Register
Value
%rdi
r
%rax
len
 Loop assembly code
.L11:
addq $1, %rax
movq 24(%rdi), %rdi
testq %rdi, %rdi
jne .L11
# loop:
# len ++
# r = Mem[r+24]
# Test r
# If != 0, goto loop
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
34

Carnegie Mellon
Following Linked List #2
 C Code
r
0 16 24 32
Element i
void set_val
(struct rec *r, int val)
{
while (r) {
size_t i = r->i; // No bounds check r->a[i] = val;
r = r->next;
} }
struct rec {
int a[4];
size_t i;
struct rec *next;
};
a
i
next
Register
Value
%rdi
r
%rsi
val
.L11: # loop:
movq 16(%rdi), %rax #
movl %esi, (%rdi,%rax,4) #
movq 24(%rdi), %rdi #
testq %rdi, %rdi #
jne .L11 #
i = Mem[r+16]
Mem[r+4*i] = val
r = Mem[r+24]
Test r
if !=0 goto loop
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
35

Carnegie Mellon
Structures & Alignment  Unaligned Data
p p+1 p+5 p+9 p+17
 Aligned Data
▪ Primitive data type requires B bytes implies
Address must be multiple of B
p+0 p+4 p+8 p+16
Multiple of 4 Multiple of 8 Multiple of 8
p+24
Multiple of 8
struct S1 {
char c;
int i[2];
double v; } *p;
c
i[0]
i[1]
v
c
3 bytes
i[0]
i[1]
4 bytes
v
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
36

Carnegie Mellon
Alignment Principles
 Aligned Data
▪ Primitive data type requires B bytes
▪ Address must be multiple of B
▪ Required on some machines; advised on x86-64
 Motivation for Aligning Data
▪ Memory accessed by (aligned) chunks of 4 or 8 bytes (system dependent)
▪ Inefficient to load or store datum that spans cache lines (64 bytes). Intel states should avoid crossing 16 byte boundaries.
[Cache lines will be discussed in Lecture 11.]
▪ Virtual memory trickier when datum spans 2 pages (4 KB pages) [Virtual memory pages will be discussed in Lecture 17.]
 Compiler
▪ Inserts gaps in structure to ensure correct alignment of fields
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
37

Carnegie Mellon
Specific Cases of Alignment (x86-64)
 1 byte: char, …
▪ no restrictions on address
 2 bytes: short, …
▪ lowest 1 bit of address must be 02
 4 bytes: int, float, …
▪ lowest 2 bits of address must be 002
 8 bytes: double, long, char *, … ▪ lowest 3 bits of address must be 0002
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
38

Carnegie Mellon
Satisfying Alignment with Structures  Within structure:
▪ Must satisfy each element’s alignment requirement
 Overall structure placement
▪ Each structure has alignment requirement K
▪ K = Largest alignment of any element
▪ Initial address & structure length must be multiples of K
 Example:
▪ K = 8, due to double element
NOTE: K < sizeof(struct S1) p+0 p+4 p+8 p+16 Multiple of 4 Multiple of 8 p+24 c 3 bytes i[0] i[1] 4 bytes v Multiple of 8 Internal padding Multiple of 8 Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 39 struct S1 { char c; int i[2]; double v; } *p; Carnegie Mellon Meeting Overall Alignment Requirement  For largest alignment requirement K  Overall structure must be multiple of K External padding p+0 p+8 p+16 p+24 Multiple of K=8 v i[0] i[1] c 7 bytes Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 40 struct S2 { double v; int i[2]; char c; } *p; Carnegie Mellon Arrays of Structures  Overall structure length multiple of K  Satisfy alignment requirement for every element a[0] a[1] a[2] ••• a+0 a+24 a+48 a+72 v i[0] i[1] c 7 bytes a+24 a+32 a+40 a+48 Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 41 struct S2 { double v; int i[2]; char c; } a[10]; Carnegie Mellon Accessing Array Elements  Compute array offset 12*idx ▪ sizeof(S3), including alignment spacers  Element j is at offset 8 within structure  Assembler gives offset a+8 ▪ Resolved during linking a+0 a+12 a+12*idx a+12*idx ••• a[0] ••• a[idx] i 2 bytes v j 2 bytes short get_j(int idx) { return a[idx].j; } a+12*idx+8 struct S3 { short i; float v; short j; } a[10]; # %rdi = idx leaq (%rdi,%rdi,2),%rax # 3*idx movzwl a+8(,%rax,4),%eax Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 42 Carnegie Mellon Saving Space  Put large data types first struct S4 { char c; int i; char d; } *p; struct S5 { int i; char c; char d; } *p; c 3 bytes i d 3 bytes  Effect (largest alignment requirement K=4) 8 bytes 12 bytes i c d 2 bytes Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 43 Carnegie Mellon Activity Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 44 44 Carnegie Mellon Today  Arrays ▪ One-dimensional ▪ Multi-dimensional (nested) ▪ Multi-level  Structures ▪ Allocation ▪ Access ▪ Alignment  Floating Point Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 45 Carnegie Mellon Background  History ▪ x87 FP ▪ Legacy, very ugly ▪ SSE FP ▪ Supported by Shark machines ▪ Special case use of vector instructions ▪ AVX FP ▪ Newest version ▪ Similar to SSE (but registers are 32 bytes instead of 16) ▪ Documented in book Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 46 Carnegie Mellon Programming with SSE4 XMM Registers ◼ 16 total, each 16 bytes ◼ 16 single-byte integers ◼ 8 16-bit integers ◼ 4 32-bit integers ◼ 4 single-precision floats ◼ 2 double-precision floats ◼ 1 single-precision float ◼ 1 double-precision float Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 47 Carnegie Mellon Scalar & SIMD Operations ◼ Scalar Operations: Single Precision + addss %xmm0,%xmm1 %xmm0 %xmm1 addps %xmm0,%xmm1 %xmm0 %xmm1 addsd %xmm0,%xmm1 %xmm0 ◼ SIMD Operations: Single Precision ++++ ◼ Scalar Operations: Double Precision + Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 48 %xmm1 Carnegie Mellon FP Basics  Arguments passed in %xmm0, %xmm1, ...  Result returned in %xmm0  All XMM registers caller-saved float fadd(float x, float y) { return x + y; } # x in %xmm0, y in %xmm1 addss %xmm1, %xmm0 ret # x in %xmm0, y in %xmm1 addsd %xmm1, %xmm0 ret Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 49 double dadd(double x, double y) { return x + y; } Carnegie Mellon FP Memory Referencing  Integer (and pointer) arguments passed in regular registers  FP values passed in XMM registers  Different mov instructions to move between XMM registers, and between memory and XMM registers double dincr(double *p, double v) { double x = *p; *p = x + v; return x; } # p in %rdi, v in %xmm0 movapd %xmm0, %xmm1 # Copy v movsd (%rdi), %xmm0 # x = *p addsd %xmm0, %xmm1 # t = x + v movsd %xmm1, (%rdi) # *p = t ret Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 50 Carnegie Mellon Other Aspects of FP Code  Lots of instructions ▪ Different operations, different formats, ...  Floating-point comparisons ▪ Instructions ucomiss and ucomisd ▪ Set condition codes ZF, PF and CF ▪ Zeros OF and SF Parity Flag  Using constant values ▪ Set XMM0 register to 0 with instruction xorpd %xmm0, %xmm0 ▪ Others loaded from memory Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 51 UNORDERED: ZF,PF,CF←111 GREATER_THAN: ZF,PF,CF←000 LESS_THAN: ZF,PF,CF←001 EQUAL: ZF,PF,CF←100 Carnegie Mellon Summary  Arrays ▪ Elements packed into contiguous region of memory ▪ Use index arithmetic to locate individual elements  Structures ▪ Elements packed into single region of memory ▪ Access using offsets determined by compiler ▪ Possible require internal and external padding to ensure alignment  Combinations ▪ Can nest structure and array code arbitrarily  Floating Point ▪ Data held and operated on in XMM registers Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 52