Many of the following slides are taken with permission from
Complete Powerpoint Lecture Notes for
Computer Systems: A Programmer’s Perspective (CS:APP)
Randal E. Bryant and David R. O’Hallaron
http://csapp.cs.cmu.edu/public/lectures.html
The book is used explicitly in CS 2505 and CS 3214 and as a reference in CS 2506.
Many other slides were based on notes written by Dr Godmar Back for CS 3214.
CS 3114 Optimization 1
CS@VT Computer Organization II ©2005-2017 McQuain
“Big-O” O(…), Θ(…)
– Describes asymptotic behavior of time or space cost function of an algorithm as input size grows
– Subject of complexity analysis (CS 3114)
– Used to determine whether a problem is tractable or not
Example:
– Quicksort – O(n log n) average case
– Bubble Sort – O(n^2) average case
Actual cost may be C1 * N^2 + C2 * N + C3
These constants can matter, and optimization can reduce them (see the example below)
– Determine how big a problem a tractable algorithm can handle in a concrete implementation on a given machine
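A quick worked example (hypothetical constants): with cost C1 * N^2 and a fixed time budget, cutting C1 from 4 to 1 lets the same machine handle roughly twice the input size, since 1 * (2N)^2 = 4 * N^2. The asymptotic class is unchanged, but the feasible problem size doubles.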
Asymptotic Complexity Optimization 2
Programmer (high-level):
– Choice of algorithm, Big-O
– Manual application of some optimizations
– Choice of program structure that’s amenable to optimization
– Avoidance of “optimization blockers”
Optimizing Compiler (low-level):
– Applies transformations that preserve semantics, but reduce the amount of, or time spent in, computations
– Provides efficient mapping of code to machine:
  Selects and orders code
  Performs register allocation
– Usually consists of multiple stages
Roles of Programmer vs Compiler Optimization 3
-O0 (“O zero”)
– This is the default: minimal optimizations
-O1
– Apply optimizations that can be done quickly
-O2
– Apply more expensive optimizations. This is a reasonable default for running production code. The typical performance ratio between -O2 and -O0 code is 5-20.
-O3
– Apply even more expensive optimizations
-Os
– Optimize for code size
See ‘info gcc’ for a list of which optimizations are enabled at each level; note that -f switches may enable additional optimizations that are not included in any -O level
Note: the ability to debug code symbolically under gdb decreases with optimization level; usually use -O0 -g or -O1 -g or -ggdb3 (see the example below)
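For instance (illustrative file names), a development build and an optimized build might be compiled as:
   gcc -O0 -g -o prog prog.c      # easy to debug, little optimization
   gcc -O2 -g -o prog prog.c      # optimized; -g still emits debug symbols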
Controlling Optimization with gcc Optimization 4
Fundamentally, must emit code that implements specified semantics under all conditions
– Can’t apply optimizations even if they would only change behavior in a corner case the programmer may not have thought of
– Due to memory aliasing
– Due to unseen procedure side-effects (see the sketch below)
Do not see beyond current compilation unit
Intraprocedural analysis typically more extensive (since cheaper) than interprocedural analysis
Usually base decisions on static information
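As a sketch of how an unseen procedure and possible side-effects block optimization (the classic lower-case conversion example from CS:APP):

#include <string.h>

/* strlen may be re-evaluated on every iteration: in this compilation
   unit the compiler cannot see strlen's body, and it cannot prove that
   writing s[i] leaves the string's length unchanged. */
void lower(char *s)
{
    for (size_t i = 0; i < strlen(s); i++)
        if (s[i] >= 'A' && s[i] <= 'Z')
            s[i] -= ('A' - 'a');
}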
Limitations of Optimizing Compilers Optimization 5
Copy Propagation
Code Motion
Strength Reduction
Common Subexpression Elimination
Eliminating Memory Accesses
– Through use of registers
Inlining
Types of Optimizations Optimization 6
int arith1(int x, int y, int z) {
    int x_plus_y  = x + y;
    int x_minus_y = x - y;
    int x_plus_z  = x + z;
    int x_minus_z = x - z;
    int y_plus_z  = y + z;
    int y_minus_z = y - z;
    int xy_prod = x_plus_y * x_minus_y;
    int xz_prod = x_plus_z * x_minus_z;
    int yz_prod = y_plus_z * y_minus_z;
    return xy_prod + xz_prod + yz_prod;
}
Which produces faster code?
int arith2(int x, int y, int z)
{
    return (x + y) * (x - y) + (x + z) * (x - z)
         + (y + z) * (y - z);
}
Copy Propagation Optimization 7
arith1:
leal (%rdx,%rdi), %ecx
movl %edi, %eax
subl %edx, %eax
imull %ecx, %eax
movl %esi, %ecx
subl %edx, %ecx
addl %esi, %edx
imull %edx, %ecx
movl %edi, %edx
subl %esi, %edx
addl %edi, %esi
imull %esi, %edx
addl %ecx, %eax
addl %edx, %eax
ret
arith2:
leal (%rdx,%rdi), %ecx
movl %edi, %eax
subl %edx, %eax
imull %ecx, %eax
movl %esi, %ecx
subl %edx, %ecx
addl %esi, %edx
imull %edx, %ecx
movl %edi, %edx
subl %esi, %edx
addl %edi, %esi
imull %esi, %edx
addl %ecx, %eax
addl %edx, %eax
ret
Answer: neither; the two listings are identical, because copy propagation makes arith1's explicit temporaries free.
Copy Propagation Optimization 8
#include <stdio.h>

int sum(int a[], int n)
{
    int i, s = 0;
    for (i = 0; i < n; i++)
        s += a[i];
    return s;
}

int main() {
    int v[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
    int s = sum(v, 10);
    printf("Sum is %d\n", s);
}
.LC0:
    .string "Sum is %d\n"
main:
    ...
    movl $55, 4(%esp)
    movl $.LC0, (%esp)
    call printf
The compiler has evaluated the entire sum at compile time: the constant 55 is passed directly to printf.
Constant Propagation Optimization 9
Do not repeat computations if the result is already known
– Usually this means moving code out of loops (“code hoisting”)

Original: n*i is recomputed on every inner-loop iteration
for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
        a[n*i + j] = b[j];

After code motion: n*i is computed once per outer iteration
for (i = 0; i < n; i++) {
    int ni = n*i;
    for (j = 0; j < n; j++)
        a[ni + j] = b[j];
}
Code Motion Optimization 10
Substitute lower cost operation for more expensive one
– E.g., replace 48*x with (x << 6) - (x << 4)
– Often machine dependent (see the sketch below)
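A sketch of the related induction-variable form (illustrative code, not from the slides), where a multiply inside a loop is replaced by a running addition:

/* Illustrative only: strength reduction applied by hand. */
void fill_before(int *a, int n) {
    for (int i = 0; i < n; i++)
        a[i] = i * 3;          /* one multiply per iteration */
}

void fill_after(int *a, int n) {
    int i3 = 0;                /* maintains the value of i * 3 */
    for (int i = 0; i < n; i++) {
        a[i] = i3;
        i3 += 3;               /* multiply replaced by an addition */
    }
}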
Strength Reduction Optimization 11
Reuse already computed expressions
/* Sum neighbors of i,j */
up = val[(i-1)*n + j];
down = val[(i+1)*n + j];
left = val[i*n + j-1];
right = val[i*n + j+1];
sum = up + down + left + right;
3 multiplications: i*n, (i-1)*n, (i+1)*n
int inj = i*n + j;
up = val[inj - n];
down = val[inj + n];
left = val[inj - 1];
right = val[inj + 1];
sum = up + down + left + right;
1 multiplication: i*n
Common Subexpression Elimination Optimization 12
Registers are faster than memory
How many memory accesses?
double sp1(double *x, double *y) {
    double sum  = *x * *x + *y * *y;
    double diff = *x * *x - *y * *y;
    return sum * diff;
}
sp1:
    movsd  (%rdi), %xmm1     # xmm1 = *x
    movsd  (%rsi), %xmm2     # xmm2 = *y
    mulsd  %xmm1, %xmm1      # xmm1 = *x * *x
    mulsd  %xmm2, %xmm2      # xmm2 = *y * *y
    movapd %xmm1, %xmm0
    subsd  %xmm2, %xmm1      # xmm1 = diff
    addsd  %xmm2, %xmm0      # xmm0 = sum
    mulsd  %xmm1, %xmm0      # xmm0 = sum * diff
    ret
The number of memory accesses is not determined by how often pointer dereferences occur in the source code: here, eight dereferences compile to only two loads.
Eliminating Memory Accesses, Take 1 Optimization 13
void sp1(double *x, double *y,
         double *sum, double *prod)
{
    *sum  = *x + *y;
    *prod = *x * *y;
}
sp1:
    movsd (%rdi), %xmm0      # load *x
    addsd (%rsi), %xmm0      # xmm0 = *x + *y
    movsd %xmm0, (%rdx)      # *sum = xmm0
    movsd (%rdi), %xmm0      # reload *x
    mulsd (%rsi), %xmm0      # xmm0 = *x * *y (reload *y)
    movsd %xmm0, (%rcx)      # *prod = xmm0
    ret
Order of accesses matters: once *sum has been written, *x and *y must be reloaded.
How many memory accesses? (Here: 4 loads and 2 stores.)
Eliminating Memory Accesses, Take 2 Optimization 14
void sp2(double *x, double *y,
         double *sum, double *prod)
{
    double xlocal = *x;
    double ylocal = *y;
    *sum  = xlocal + ylocal;
    *prod = xlocal * ylocal;
}
sp2:
    movsd  (%rdi), %xmm0     # xmm0 = xlocal = *x
    movsd  (%rsi), %xmm2     # xmm2 = ylocal = *y
    movapd %xmm0, %xmm1      # xmm1 = xlocal
    mulsd  %xmm2, %xmm0      # xmm0 = xlocal * ylocal
    addsd  %xmm2, %xmm1      # xmm1 = xlocal + ylocal
    movsd  %xmm1, (%rdx)     # *sum = xmm1
    movsd  %xmm0, (%rcx)     # *prod = xmm0
    ret
How many memory accesses? (Here: only 2 loads and 2 stores.)
Compiler can't know that sum or prod will never point to the same location as x or y! Here the programmer has made the local copies explicitly, so the reloads disappear.
Eliminating Memory Accesses, Take 3 Optimization 15