程序代写代做代考 assembler compiler cache C assembly Carnegie Mellon

Carnegie Mellon
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
1
14

513
18

613

Carnegie Mellon
Machine-Level Programming IV: Data
15-213/18-213/14-513/15-513/18-613: Introduction to Computer Systems 8th Lecture, September 24, 2020
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
2

Carnegie Mellon
Announcements
 Recitation Monday: C Review / C Bootcamp
 Lab 2 (bomblab)
▪ Due Tues, Sept. 29, 11:59pm ET
 Written Assignment 2 peer grading ▪ Due Wed, Sept. 30, 11:59pm ET
 Written Assignment 3 available on Canvas ▪ Due Wed, Sept. 30, 11:59pm ET
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
3

Carnegie Mellon
Today
 Arrays
▪ One-dimensional
▪ Multi-dimensional (nested) ▪ Multi-level
 Structures ▪ Allocation
▪ Access
▪ Alignment
 Floating Point
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
4

Carnegie Mellon
Array Allocation
 Basic Principle T A[L];
▪ Array of data type T and length L
▪ Contiguously allocated region of L * sizeof(T) bytes in memory
char string[12];
int val[5];
double a[3];
char *p[3];
x
x
x
x
x + 4
x + 8
x + 8
x + 12
x + 12
x + 16
x + 16
x + 16
x + 20
x + 8 Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
x + 24
x + 24
5

Carnegie Mellon
Array Access
 Basic Principle T A[L];
▪ Array of data type T and length L
▪ Identifier A can be used as a pointer to array element 0: Type T*
1
5
2
1
3
int val[5];
 Reference Type Value val[4] int 3
val int *
val+1 int *
&val[2] int * val[5] int *(val+1) int
val + i
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
x x+4x+8x+12
x+16 x+20
int *
6

Carnegie Mellon
Array Access
 Basic Principle T A[L];
▪ Array of data type T and length L
▪ Identifier A can be used as a pointer to array element 0: Type T*
1
5
2
1
3
int val[5];
 Reference Type
val[4] int
val int *
val+1 int *
&val[2] int *
val[5] int
*(val+1) int
Value
val + i int *
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
7
x x+4x+8x+12
x+16 x+20
3
x
x+4
x +8
??
5 //val[1] x + 4 * i //&val[i]

Carnegie Mellon
Array Example
#define ZLEN 5
typedef int zip_dig[ZLEN];
zip_dig cmu = { 1, 5, 2, 1, 3 }; zip_dig mit = { 0, 2, 1, 3, 9 }; zip_dig ucb = { 9, 4, 7, 2, 0 };
1
5
2
1
3
zip_dig cmu;
zip_dig mit;
zip_dig ucb;
16 20 24 28 32 36 36 40 44 48 52 56
56 60 64 68 72 76
0
2
1
3
9
9
4
7
2
0
 Declaration “zip_dig cmu” equivalent to “int cmu[5]”
 Example arrays were allocated in successive 20 byte blocks
▪ Not guaranteed to happen in general Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
8

Carnegie Mellon
Array Accessing Example
1
5
2
1
3
zip_dig cmu;
int get_digit
(zip_dig z, int digit)
{
return z[digit];
}
x86-64
◼ Register %rdi contains starting address of array
◼ Register %rsi contains array index
◼ Desired digit at %rdi + 4*%rsi
◼ Use memory reference (%rdi,%rsi,4)
16 20 24 28 32 36
# %rdi = z
# %rsi = digit
movl (%rdi,%rsi,4), %eax # z[digit]
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
9

Carnegie Mellon
Array Loop Example
# %rdi = z
movl $0,%eax jmp .L3
.L4:
# i=0
# goto middle # loop:
addl $1, (%rdi,%rax,4) # z[i]++
addq $1,%rax .L3:
cmpq $4,%rax
jbe .L4 # if <=, goto loop rep; ret Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 10 #define ZLEN 5 void zincr(zip_dig z) { size_t i; for (i = 0; i < ZLEN; i++) z[i]++; } # i++ # middle # i:4 Carnegie Mellon Understanding Pointers & Arrays #1 Decl A1 , A2 *A1 , *A2 Comp Bad Size Comp Bad Size int A1[3] int *A2 A1 A2 Allocated pointer Unallocated pointer Allocated int Unallocated int  Comp: Compiles (Y/N)  Bad: Possible bad pointer reference (Y/N)  Size: Value returned by sizeof Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 11 Carnegie Mellon Understanding Pointers & Arrays #1 Decl A1 , A2 *A1 , *A2 Comp Bad Size Comp Bad Size int A1[3] Y N 12 Y N 4 int *A2 Y N 8 Y Y 4 A1 A2 Allocated pointer Unallocated pointer Allocated int Unallocated int  Comp: Compiles (Y/N)  Bad: Possible bad pointer reference (Y/N)  Size: Value returned by sizeof Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 12 Carnegie Mellon Understanding Pointers & Arrays #2 Decl An *An **An Cmp Bad Size Cmp Bad Size Cmp Bad Size int A1[3] int *A2[3] int (*A3)[3] A1 A2 A3 Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 13 Allocated pointer Unallocated pointer Allocated int Unallocated int Carnegie Mellon Understanding Pointers & Arrays #2 Decl An *An **An Cmp Bad Size Cmp Bad Size Cmp Bad Size int A1[3] Y N 12 Y N 4 N - - int *A2[3] Y N 24 Y N 8 Y Y 4 int (*A3)[3] Y N 8 Y Y 12 Y Y 4 A1 A2 A3 Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 14 Allocated pointer Unallocated pointer Allocated int Unallocated int Carnegie Mellon Multidimensional (Nested) Arrays  Declaration T A[R][C]; ▪ 2D array of data type T ▪ R rows, C columns  Array Size ▪ R * C * sizeof(T) bytes  Arrangement ▪ Row-Major Ordering int A[R][C]; A[0][0] • • • A[0][C-1] •• •• •• A[R-1][0] • • • A[R-1][C-1] A [0] [0] ••• A [0] [C-1] A [1] [0] ••• A [1] [C-1] ••• A [R-1] [0] ••• A [R-1] [C-1] 4*R*C Bytes Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 15 Carnegie Mellon Nested Array Example #define PCOUNT 4 typedef int zip_dig[5]; zip_dig pgh[PCOUNT] = {{1, 5, 2, 0, 6}, {1, 5, 2, 1, 3 }, {1, 5, 2, 1, 7 }, {1, 5, 2, 2, 1 }}; zip_dig pgh[4]; 76 96 116 136 156  “zip_dig pgh[4]” equivalent to “int pgh[4][5]” ▪ Variable pgh: array of 4 elements, allocated contiguously ▪ Each element is an array of 5 int’s, allocated contiguously  “Row-Major” ordering of all elements in memory Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 16 1 5 2 0 6 1 5 2 1 3 1 5 2 1 7 1 5 2 2 1 Carnegie Mellon Nested Array Row Access  Row Vectors ▪ A[i] is array of C elements of type T ▪ Starting address A + i * (C * sizeof(T)) int A[R][C]; A[0] A[i] A[R-1] ••• ••• A+(i*C*4) A+((R-1)*C*4) A [0] [0] ••• A [0] [C-1] A [i] [0] ••• A [i] [C-1] A [R-1] [0] ••• A [R-1] [C-1] A Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 17 Carnegie Mellon Nested Array Row Access Code pgh pgh[2] 1 5 2 0 6 1 5 2 1 3 1 5 2 1 7 1 5 2 2 1  Row Vector ▪ pgh[index] is array of 5 int’s ▪ Starting address pgh+20*index  Machine Code ▪ Computes and returns address ▪ Compute as pgh + 4*(index+4*index) Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 18 int *get_pgh_zip(int index) { return pgh[index]; } # %rdi = index leaq (%rdi,%rdi,4),%rax # 5 * index leaqpgh(,%rax,4),%rax #pgh+(20*index) Carnegie Mellon Nested Array Element Access  Array Elements ▪ A[i][j] is element of type T, which requires K bytes ▪AddressA+i*(C*K)+ j*K = A + (i * C + j) * K int A[R][C]; A[0] A[i] A[R-1] A [0] [0] ••• A [0] [C-1] ••• A [i] [j] ••• A [R-1] [0] ••• A [R-1] [C-1] A ••• ••• A+(i*C*4) A+((R-1)*C*4) A+(i*C*4)+(j*4) Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 19 Carnegie Mellon Nested Array Element Access Code pgh pgh[1][1] 1 5 2 0 6 1 5 2 1 3 1 5 2 1 7 1 5 2 2 1  Array Elements ▪ pgh[index][dig] is int ▪ Address: pgh + 20*index + 4*dig = pgh + 4*(5*index + dig) Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 20 int get_pgh_digit(int index, int dig) { return pgh[index][dig]; } leaq (%rdi,%rdi,4), %rax addl %rax, %rsi movl pgh(,%rsi,4), %eax # 5*index # 5*index+dig # M[pgh + 4*(5*index+dig)] Carnegie Mellon Multi-Level Array Example zip_dig cmu = { 1, 5, 2, 1, 3 }; zip_dig mit = { 0, 2, 1, 3, 9 }; zip_dig ucb = { 9, 4, 7, 2, 0 };    Variableunivdenotes array of 3 elements Each element is a pointer ▪ 8 bytes Eachpointerpointstoarray of int’s #define UCOUNT 3 int *univ[UCOUNT] = {mit, cmu, ucb}; 1 5 2 1 3 univ 160 168 176 cmu mit16 20 24 28 32 36 48 52 56 68 72 76 36 16 56 0 2 1 3 9 ucb 36 40 44 56 60 64 9 4 7 2 0 Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 21 Carnegie Mellon Element Access in Multi-Level Array int get_univ_digit (size_t index, size_t digit) { return univ[index][digit]; } salq $2, %rsi # 4*digit addq univ(,%rdi,8), %rsi # p = univ[index] + 4*digit movl (%rsi), %eax # return *p ret  Computation ▪ Element access Mem[Mem[univ+8*index]+4*digit] ▪ Must do two memory reads ▪ First get pointer to row array ▪ Then access element within array Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 22 Carnegie Mellon Array Element Accesses Nested array Multi-level array int get_pgh_digit (size_t index, size_t digit) { return pgh[index][digit]; } Accesses looks similar in C, but address computations very different: Mem[pgh+20*index+4*digit] Mem[Mem[univ+8*index]+4*digit] Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 23 int get_univ_digit (size_t index, size_t digit) { return univ[index][digit]; } Carnegie Mellon N X N Matrix Code  Fixed dimensions ▪ Know value of N at compile time  Variable dimensions, explicit indexing ▪ Traditional way to implement dynamic arrays  Variable dimensions, implicit indexing ▪ Now supported by gcc Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 24 #define N 16 typedef int fix_matrix[N][N]; /* Get element A[i][j] */ int fix_ele(fix_matrix A, size_t i, size_t j) { return A[i][j]; } #define IDX(n, i, j) ((i)*(n)+(j)) /* Get element A[i][j] */ int vec_ele(size_t n, int *A, size_t i, size_t j) { return A[IDX(n,i,j)]; } /* Get element A[i][j] */ int var_ele(size_t n, int A[n][n], size_t i, size_t j) { return A[i][j]; } Carnegie Mellon 16 X 16 Matrix Access  Array Elements ▪ int A[16][16]; ▪AddressA+i*(C*K)+ j*K ▪ C = 16, K = 4 /* Get element A[i][j] */ int fix_ele(fix_matrix A, size_t i, size_t j) { return A[i][j]; } # A in %rdi, i in %rsi, j in %rdx salq $6, %rsi # 64*i addq %rsi, %rdi # A + 64*i movl (%rdi,%rdx,4), %eax # Mem[A + 64*i + 4*j] ret Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 25 Carnegie Mellon n X n Matrix Access  Array Elements ▪ size_t n; ▪ int A[n][n]; ▪AddressA+i*(C*K)+ j*K ▪ C = n, K = 4 ▪ Must perform integer multiplication /* Get element A[i][j] */ int var_ele(size_t n, int A[n][n], size_t i, size_t j) { return A[i][j]; } # n in %rdi, A in %rsi, i in %rdx, j in %rcx imulq %rdx, %rdi # n*i leaq (%rsi,%rdi,4), %rax # A + 4*n*i movl (%rax,%rcx,4), %eax # A + 4*n*i + 4*j ret Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 26 Carnegie Mellon Example: Array Access #include #define ZLEN 5
#define PCOUNT 4
typedef int zip_dig[ZLEN];
int main(int argc, char** argv) { zip_dig pgh[PCOUNT] =
{{1, 5, 2, 0, 6}, {1, 5, 2, 1, 3 }, {1, 5, 2, 1, 7 }, {1, 5, 2, 2, 1 }};
int *linear_zip = (int *) pgh;
int *zip2 = (int *) pgh[2];
int result =
pgh[0][0] + linear_zip[7] + *(linear_zip + 8) + zip2[1];
printf(“result: %d\n”, result);
return 0; }
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
27
linux> ./array
result: 9

Carnegie Mellon
Example: Array Access
#include #define ZLEN 5
#define PCOUNT 4
typedef int zip_dig[ZLEN];
int main(int argc, char** argv) { zip_dig pgh[PCOUNT] =
{{1, 5, 2, 0, 6}, {1, 5, 2, 1, 3 }, {1, 5, 2, 1, 7 }, {1, 5, 2, 2, 1 }};
int *linear_zip = (int *) pgh;
int *zip2 = (int *) pgh[2];
int result =
pgh[0][0] + linear_zip[7] + *(linear_zip + 8) + zip2[1];
printf(“result: %d\n”, result);
return 0; }
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
28
linux> ./array
result: 9

Carnegie Mellon
Quiz Time!
Check out:
https://canvas.cmu.edu/courses/17808
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
29

Carnegie Mellon
Today
 Arrays
▪ One-dimensional
▪ Multi-dimensional (nested) ▪ Multi-level
 Structures ▪ Allocation
▪ Access
▪ Alignment
 Floating Point
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
30

Carnegie Mellon
Structure Representation
r
0 16 24 32
 Structure represented as block of memory ▪ Big enough to hold all of the fields
 Fields ordered according to declaration
▪ Even if another ordering could yield a more compact
representation (due to alignment rules—coming soon)
 Compiler determines overall size + positions of fields
▪ Machine-level program has no understanding of the structures in the source code
struct rec { int a[4]; size_t i;
struct rec *next;
};
a
i
next
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
31

Carnegie Mellon
Generating Pointer to Structure Member
r r+4*idx
0 16 24 32
struct rec { int a[4]; size_t i;
struct rec *next;
};
a
i
next
 Generating Pointer to Array Element
▪ Offset of each structure member determined at compile time
▪ Compute as r + 4*idx
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
32
int *get_ap
(struct rec *r, size_t idx)
{
return &r->a[idx];
}
# r in %rdi, idx in %rsi leaq (%rdi,%rsi,4), %rax ret

Carnegie Mellon
Following Linked List #1
 C Code
r
0 16 24 32
long length(struct rec*r) {
long len = 0L;
while (r) { len ++;
r = r->next;
}
return len; }
struct rec {
int a[4];
size_t i;
struct rec *next;
};
a
i
next
Register
Value
%rdi
r
%rax
len
 Loop assembly code
.L11:
addq $1, %rax
movq 24(%rdi), %rdi
testq %rdi, %rdi
jne .L11
# loop:
# len ++
# r = Mem[r+24]
# Test r
# If != 0, goto loop
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
33

Carnegie Mellon
Following Linked List #2
 C Code
r
0 16 24 32
Element i
void set_val
(struct rec *r, int val)
{
while (r) {
size_t i = r->i; // No bounds check r->a[i] = val;
r = r->next;
} }
struct rec {
int a[4];
size_t i;
struct rec *next;
};
a
i
next
Register
Value
%rdi
r
%rsi
val
.L11: # loop:
movq 16(%rdi), %rax #
movl %esi, (%rdi,%rax,4) #
movq 24(%rdi), %rdi #
testq %rdi, %rdi #
jne .L11 #
i = Mem[r+16]
Mem[r+4*i] = val
r = Mem[r+24]
Test r
if !=0 goto loop
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
34

Carnegie Mellon
Structures & Alignment  Unaligned Data
p p+1 p+5 p+9 p+17
 Aligned Data
▪ Primitive data type requires B bytes implies
Address must be multiple of B
p+0 p+4 p+8 p+16
Multiple of 4 Multiple of 8 Multiple of 8
p+24
Multiple of 8
struct S1 {
char c;
int i[2];
double v; } *p;
c
i[0]
i[1]
v
c
3 bytes
i[0]
i[1]
4 bytes
v
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
35

Carnegie Mellon
Alignment Principles
 Aligned Data
▪ Primitive data type requires B bytes
▪ Address must be multiple of B
▪ Required on some machines; advised on x86-64
 Motivation for Aligning Data
▪ Memory accessed by (aligned) chunks of 4 or 8 bytes (system dependent)
▪ Inefficient to load or store datum that spans cache lines (64 bytes). Intel states should avoid crossing 16 byte boundaries.
[Cache lines will be discussed in Lecture 11.]
▪ Virtual memory trickier when datum spans 2 pages (4 KB pages) [Virtual memory pages will be discussed in Lecture 17.]
 Compiler
▪ Inserts gaps in structure to ensure correct alignment of fields
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
36

Carnegie Mellon
Specific Cases of Alignment (x86-64)
 1 byte: char, …
▪ no restrictions on address
 2 bytes: short, …
▪ lowest 1 bit of address must be 02
 4 bytes: int, float, …
▪ lowest 2 bits of address must be 002
 8 bytes: double, long, char *, … ▪ lowest 3 bits of address must be 0002
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
37

Carnegie Mellon
Satisfying Alignment with Structures  Within structure:
▪ Must satisfy each element’s alignment requirement
 Overall structure placement
▪ Each structure has alignment requirement K
▪ K = Largest alignment of any element
▪ Initial address & structure length must be multiples of K
 Example:
▪ K = 8, due to double element
p+0 p+4 p+8
Multiple of 4 Multiple of 8
Internal padding
p+16 p+24
c
3 bytes
i[0]
i[1]
4 bytes
v
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
38
Multiple of 8
struct S1 {
char c;
int i[2];
double v; } *p;
Multiple of 8

Carnegie Mellon
Meeting Overall Alignment Requirement  For largest alignment requirement K
 Overall structure must be multiple of K
External padding
p+0 p+8 p+16 p+24
Multiple of K=8
v
i[0]
i[1]
c
7 bytes
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
39
struct S2 { double v; int i[2]; char c;
} *p;

Carnegie Mellon
Arrays of Structures
 Overall structure length multiple of K
 Satisfy alignment requirement for every element
a[0]
a[1]
a[2]
•••
a+0 a+24 a+48
a+72
v
i[0]
i[1]
c
7 bytes
a+24
a+32 a+40
a+48
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
40
struct S2 { double v; int i[2]; char c;
} a[10];

Carnegie Mellon
Accessing Array Elements  Compute array offset 12*idx
▪ sizeof(S3), including alignment spacers
 Element j is at offset 8 within structure
 Assembler gives offset a+8 ▪ Resolved during linking
a+0 a+12 a+12*idx
a+12*idx
•••
a[0]
•••
a[idx]
i
2 bytes
v
j
2 bytes
a+12*idx+8
short get_j(int idx)
{
return a[idx].j;
}
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
41
# %rdi = idx
leaq (%rdi,%rdi,2),%rax # 3*idx movzwl a+8(,%rax,4),%eax
struct S3 {
short i;
float v;
short j;
} a[10];

Carnegie Mellon
Saving Space
 Put large data types first
struct S4 { char c;
int i;
char d; } *p;
struct S5 { int i;
char c;
char d; } *p;
c
3 bytes
i
d
3 bytes
 Effect (largest alignment requirement K=4) 8 bytes
12 bytes
i
c
d
2 bytes
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
42

Carnegie Mellon
Example Struct Exam Question
http://www.cs.cmu.edu/~213/oldexams/exam1-f12.pdf
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
43

Carnegie Mellon
Example Struct Exam Question
aXXXXXXXbbbbbbbb ccccdddXeeeeeeee ffffffff|
http://www.cs.cmu.edu/~213/oldexams/exam1-f12.pdf
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
44

Carnegie Mellon
Example Struct Exam Question (Cont’d)
http://www.cs.cmu.edu/~213/oldexams/exam1-f12.pdf
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
45

Carnegie Mellon
Example Struct Exam Question (Cont’d)
adddccccbbbbbbbb eeeeeeeeffffffff|
http://www.cs.cmu.edu/~213/oldexams/exam1-f12.pdf
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
46

Carnegie Mellon
Today
 Arrays
▪ One-dimensional
▪ Multi-dimensional (nested) ▪ Multi-level
 Structures ▪ Allocation
▪ Access
▪ Alignment
 Floating Point
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
47

Carnegie Mellon
Background
 History ▪ x87 FP
▪ Legacy, very ugly ▪ SSE FP
▪ Supported by Shark machines
▪ Special case use of vector instructions ▪ AVX FP
▪ Newest version
▪ Similar to SSE (but registers are 32 bytes instead of 16) ▪ Documented in book
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
48

Carnegie Mellon
Programming with SSE4
XMM Registers
◼ 16 total, each 16 bytes ◼ 16 single-byte integers
◼ 8 16-bit integers
◼ 4 32-bit integers
◼ 4 single-precision floats ◼ 2 double-precision floats ◼ 1 single-precision float
◼ 1 double-precision float
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
49

Carnegie Mellon
Scalar & SIMD Operations
◼ Scalar Operations: Single Precision +
addss %xmm0,%xmm1
%xmm0
%xmm1
addps %xmm0,%xmm1
%xmm0
%xmm1
addsd %xmm0,%xmm1 %xmm0
◼ SIMD Operations: Single Precision ++++
◼ Scalar Operations: Double Precision +
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
50
%xmm1

Carnegie Mellon
FP Basics
 Arguments passed in %xmm0, %xmm1, …  Result returned in %xmm0
 All XMM registers caller-saved
float fadd(float x, float y)
{
return x + y; }
# x in %xmm0, y in %xmm1
addss %xmm1, %xmm0
ret
# x in %xmm0, y in %xmm1
addsd %xmm1, %xmm0
ret
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
51
double dadd(double x, double y)
{
return x + y; }

Carnegie Mellon
FP Memory Referencing
 Integer (and pointer) arguments passed in regular registers
 FP values passed in XMM registers
 Different mov instructions to move between XMM registers, and between memory and XMM registers
double dincr(double *p, double v) {
double x = *p;
*p = x + v;
return x;
}
# p in %rdi, v in %xmm0
movapd %xmm0, %xmm1 # Copy v movsd (%rdi), %xmm0 # x = *p addsd %xmm0, %xmm1 # t = x + v movsd %xmm1, (%rdi) # *p = t ret
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
52

Carnegie Mellon
Other Aspects of FP Code  Lots of instructions
▪ Different operations, different formats, …
 Floating-point comparisons
▪ Instructions ucomiss and ucomisd
▪ Set condition codes ZF, PF and CF
▪ Zeros OF and SF
Parity Flag
 Using constant values
▪ Set XMM0 register to 0 with instruction xorpd %xmm0, %xmm0 ▪ Others loaded from memory
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
53
UNORDERED: ZF,PF,CF←111 GREATER_THAN: ZF,PF,CF←000 LESS_THAN: ZF,PF,CF←001 EQUAL: ZF,PF,CF←100

Carnegie Mellon
Summary
 Arrays
▪ Elements packed into contiguous region of memory ▪ Use index arithmetic to locate individual elements
 Structures
▪ Elements packed into single region of memory
▪ Access using offsets determined by compiler
▪ Possible require internal and external padding to ensure alignment
 Combinations
▪ Can nest structure and array code arbitrarily
 Floating Point
▪ Data held and operated on in XMM registers
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
54

Carnegie Mellon
Understanding Pointers & Arrays #3
int A1[3][5]
int *A2[3][5]
int (*A3)[3][5]
int *(A4[3][5])
int (*A5[3])[5]
Decl
 Cmp: Compiles (Y/N)  Bad: Possible bad
pointer reference (Y/N)  Size: Value returned by
sizeof
Cmp
An
Bad
Size
int A1[3][5]
int *A2[3][5]
Cmp
int (*A3)[3][5]
int *(A4[3][5])
int (*A5[3])[5]
Decl
*An
Bad
Size
Cmp
Cmp
**An
Bad
***An
Bad
Size
Size
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
55

Carnegie Mellon
Declaration
int A1[3][5]
int *A2[3][5]
int (*A3)[3][5]
int *(A4[3][5])
int (*A5[3])[5]
Allocated pointer Allocated pointer to unallocated int Unallocated pointer Allocated int Unallocated int
A1
A2/A4
A3 A5
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
56

Carnegie Mellon
Understanding Pointers & Arrays #3
int A1[3][5]
int *A2[3][5]
int (*A3)[3][5]
int *(A4[3][5])
int (*A5[3])[5]
Decl
Cmp
Y
Y
Y
Y
Y
An
Bad
N
N
N
N
N
Size
60
120
8
120
24
Cmp
Y
Y
Y
Y
Y
Decl
*An
Bad
N
N
Y
N
N
Size
20
40
60
40
8
Cmp
Y
Y
Y
Y
Y
**An
Bad
***An
N
N
Y
N
Y
Size
4
8
20
8
20
 Cmp: Compiles (Y/N)  Bad: Possible bad
pointer reference (Y/N)  Size: Value returned by
sizeof
int A1[3][5]
Cmp
N
Bad

Size

int *A2[3][5]
Y
Y
4
int (*A3)[3][5]
Y
Y
4
int *(A4[3][5])
Y
Y
4
int (*A5[3])[5]
Y
Y
4
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
57