
Many of the following slides are taken with permission from
Complete Powerpoint Lecture Notes for
Computer Systems: A Programmer’s Perspective (CS:APP)
Randal E. Bryant and David R. O’Hallaron
http://csapp.cs.cmu.edu/public/lectures.html
The book is used explicitly in CS 2505 and CS 3214 and as a reference in CS 2506.
Many other slides were based on notes written by Dr Godmar Back for CS 3214.
CS 3114 Optimization 1
CS@VT Computer Organization II ©2005-2017 McQuain

“Big-O” O(…), Θ(…)
– Describes asymptotic behavior of time or space cost function of an algorithm as input size grows
– Subject of complexity analysis (CS 3114)
– Determine if a problem is tractable or not
Example:
– Quicksort – O(n log n) average case
– Bubble Sort – O(n^2) average case
 Actual cost may be C1 * N^2 + C2 * N + C3
 These constants can matter, and optimization can reduce them
– Determine how big a problem a tractable algorithm can handle in a concrete implementation on a given machine
Asymptotic Complexity Optimization 2

Programmer (high-level):
– Choice of algorithm, Big-O
– Manual application of some optimizations
– Choice of program structure that's amenable to optimization
– Avoidance of "optimization blockers"
Optimizing Compiler (low-level):
– Applies transformations that preserve semantics, but reduce the amount of, or time spent in, computations
– Provides efficient mapping of code to machine:
 Selects and orders code
 Performs register allocation
– Usually consists of multiple stages
Roles of Programmer vs Compiler Optimization 3

-O0 ("O zero")
– This is the default: minimal optimization
-O1
– Apply optimizations that can be done quickly
-O2
– Apply more expensive optimizations; a reasonable default for running production code. The typical speedup ratio between -O2 and -O0 is 5-20.
-O3
– Apply even more expensive optimizations
-Os
– Optimize for code size
See 'info gcc' for a list of which optimizations are enabled at each level; note that -f switches may enable additional optimizations that are not included in any -O level.
Note: the ability to debug code symbolically under gdb decreases with optimization level; usually use -O0 -g, -O1 -g, or -ggdb3.
Controlling Optimization with gcc Optimization 4

Fundamentally, the compiler must emit code that implements the specified semantics under all conditions
– It can't apply an optimization if doing so would change behavior even in a corner case the programmer may not have thought of
– Due to memory aliasing
– Due to unseen procedure side-effects
Compilers do not see beyond the current compilation unit
Intraprocedural analysis is typically more extensive (since it is cheaper) than interprocedural analysis
Decisions are usually based on static information
Limitations of Optimizing Compilers Optimization 5
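A small C sketch makes the aliasing limitation concrete (twiddle1/twiddle2 are illustrative names in the spirit of the CS:APP aliasing discussion, not from these slides): the compiler cannot merge the two additions in twiddle1 into one, because the transformation is only valid when the pointers do not alias.

```c
#include <assert.h>

/* Adds *yp to *xp twice. The compiler cannot rewrite this as
   *xp += 2 * *yp, because xp and yp might point to the same int. */
void twiddle1(int *xp, int *yp)
{
    *xp += *yp;
    *xp += *yp;
}

/* The "optimized" form -- equivalent only when xp != yp. */
void twiddle2(int *xp, int *yp)
{
    *xp += 2 * *yp;
}
```

With distinct objects both produce the same result; with xp == yp, twiddle1 quadruples *xp (x → 2x → 4x) while twiddle2 only triples it (x → 3x), so a compiler that merged the additions would change observable behavior.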

Copy Propagation
Code Motion
Strength Reduction
Common Subexpression Elimination
Eliminating Memory Accesses
– Through use of registers
Inlining
Types of Optimizations Optimization 6

int arith1(int x, int y, int z) {
    int x_plus_y  = x + y;
    int x_minus_y = x - y;
    int x_plus_z  = x + z;
    int x_minus_z = x - z;
    int y_plus_z  = y + z;
    int y_minus_z = y - z;
    int xy_prod = x_plus_y * x_minus_y;
    int xz_prod = x_plus_z * x_minus_z;
    int yz_prod = y_plus_z * y_minus_z;
    return xy_prod + xz_prod + yz_prod;
}
Which produces faster code?
int arith2(int x, int y, int z)
{
    return (x + y) * (x - y) + (x + z) * (x - z)
         + (y + z) * (y - z);
}
Copy Propagation Optimization 7
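Copy propagation is only legal because the two variants are semantically identical; a quick sketch (functions transcribed from the slide) checking that they agree on sample inputs:

```c
#include <assert.h>

/* Version with named temporaries -- copy propagation lets the
   compiler treat these exactly like the inline expression below. */
int arith1(int x, int y, int z)
{
    int x_plus_y  = x + y;
    int x_minus_y = x - y;
    int x_plus_z  = x + z;
    int x_minus_z = x - z;
    int y_plus_z  = y + z;
    int y_minus_z = y - z;
    return x_plus_y * x_minus_y
         + x_plus_z * x_minus_z
         + y_plus_z * y_minus_z;
}

/* Single-expression version. */
int arith2(int x, int y, int z)
{
    return (x + y) * (x - y) + (x + z) * (x - z)
         + (y + z) * (y - z);
}
```

Algebraically both compute (x^2 - y^2) + (x^2 - z^2) + (y^2 - z^2) = 2x^2 - 2z^2, which is why the compiler can emit the same code for each.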

arith1:
leal (%rdx,%rdi), %ecx
movl %edi, %eax
subl %edx, %eax
imull %ecx, %eax
movl %esi, %ecx
subl %edx, %ecx
addl %esi, %edx
imull %edx, %ecx
movl %edi, %edx
subl %esi, %edx
addl %edi, %esi
imull %esi, %edx
addl %ecx, %eax
addl %edx, %eax
ret
arith2:
leal (%rdx,%rdi), %ecx
movl %edi, %eax
subl %edx, %eax
imull %ecx, %eax
movl %esi, %ecx
subl %edx, %ecx
addl %esi, %edx
imull %edx, %ecx
movl %edi, %edx
subl %esi, %edx
addl %edi, %esi
imull %esi, %edx
addl %ecx, %eax
addl %edx, %eax
ret
Copy Propagation Optimization 8

#include <stdio.h>

int sum(int a[], int n)
{
    int i, s = 0;
    for (i = 0; i < n; i++)
        s += a[i];
    return s;
}

int main()
{
    int v[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
    int s = sum(v, 10);
    printf("Sum is %d\n", s);
}

.LC0:
        .string "Sum is %d\n"
main:
        ...
        movl $55, 4(%esp)
        movl $.LC0, (%esp)
        call printf

Constant Propagation Optimization 9

Do not repeat computations if the result is known
Usually moved out of loops ("code hoisting")

for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
        a[n*i + j] = b[j];

becomes

for (i = 0; i < n; i++) {
    int ni = n*i;
    for (j = 0; j < n; j++)
        a[ni + j] = b[j];
}

Code Motion Optimization 10

Substitute a lower cost operation for a more expensive one
– E.g., replace 48*x with (x << 6) - (x << 4)
– Often machine dependent

Strength Reduction Optimization 11

Reuse already computed expressions

/* Sum neighbors of i,j */
up    = val[(i-1)*n + j];
down  = val[(i+1)*n + j];
left  = val[i*n + j-1];
right = val[i*n + j+1];
sum = up + down + left + right;

3 multiplications: i*n, (i-1)*n, (i+1)*n

int inj = i*n + j;
up    = val[inj - n];
down  = val[inj + n];
left  = val[inj - 1];
right = val[inj + 1];
sum = up + down + left + right;

1 multiplication: i*n

Common Subexpression Elimination Optimization 12

Registers are faster than memory
How many memory accesses?
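The strength-reduction example earlier (replacing 48*x with (x << 6) - (x << 4)) is easy to sanity-check; a small sketch with hypothetical helper names:

```c
#include <assert.h>

/* Straightforward multiply. */
int times48_mul(int x)
{
    return 48 * x;
}

/* Strength-reduced form: 48*x = 64*x - 16*x = (x << 6) - (x << 4).
   Restricted to non-negative x here, since left-shifting a negative
   int is not well defined in C. */
int times48_shift(int x)
{
    return (x << 6) - (x << 4);
}
```

On most machines two shifts and a subtract are cheaper than a general multiply, which is why compilers apply this machine-dependent rewrite automatically.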
double sp1(double *x, double *y)
{
    double sum  = *x * *x + *y * *y;
    double diff = *x * *x - *y * *y;
    return sum * diff;
}

sp1:
        movsd  (%rdi), %xmm1
        movsd  (%rsi), %xmm2
        mulsd  %xmm1, %xmm1
        mulsd  %xmm2, %xmm2
        movapd %xmm1, %xmm0
        subsd  %xmm2, %xmm1
        addsd  %xmm2, %xmm0
        mulsd  %xmm1, %xmm0
        ret

The number of memory accesses is not related to how often pointer dereferences occur in the source code

Eliminating Memory Accesses, Take 1 Optimization 13

How many memory accesses?

void sp1(double *x, double *y, double *sum, double *prod)
{
    *sum  = *x + *y;
    *prod = *x * *y;
}

sp1:
        movsd  (%rdi), %xmm0
        addsd  (%rsi), %xmm0
        movsd  %xmm0, (%rdx)
        movsd  (%rdi), %xmm0
        mulsd  (%rsi), %xmm0
        movsd  %xmm0, (%rcx)
        ret

Order of accesses matters
How many memory accesses?

Eliminating Memory Accesses, Take 2 Optimization 14

void sp2(double *x, double *y, double *sum, double *prod)
{
    double xlocal = *x;
    double ylocal = *y;
    *sum  = xlocal + ylocal;
    *prod = xlocal * ylocal;
}

sp2:
        movsd  (%rdi), %xmm0
        movsd  (%rsi), %xmm2
        movapd %xmm0, %xmm1
        mulsd  %xmm2, %xmm0
        addsd  %xmm2, %xmm1
        movsd  %xmm1, (%rdx)
        movsd  %xmm0, (%rcx)
        ret

The compiler can't know that sum or prod will never point to the same location as x or y!
How many memory accesses?

Eliminating Memory Accesses, Take 3 Optimization 15
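The hazard that forces Take 2's reloads can be demonstrated directly; a sketch (sp1_naive and sp2_local are hypothetical names mirroring the slide's Take 2 and Take 3) showing the two versions diverge when sum aliases x:

```c
#include <assert.h>

/* Take 2: rereads *x and *y after writing *sum, so the product
   sees the updated value when sum aliases x. */
void sp1_naive(double *x, double *y, double *sum, double *prod)
{
    *sum  = *x + *y;
    *prod = *x * *y;
}

/* Take 3: local copies let the compiler keep the inputs in
   registers; the stores can no longer disturb them. */
void sp2_local(double *x, double *y, double *sum, double *prod)
{
    double xlocal = *x;
    double ylocal = *y;
    *sum  = xlocal + ylocal;
    *prod = xlocal * ylocal;
}
```

With x = 2, y = 3, and sum pointing at x, sp1_naive stores 5 into x and then computes prod = 5 * 3 = 15, while sp2_local still computes prod = 2 * 3 = 6. Because the two give different answers under aliasing, the compiler cannot transform one into the other on its own; the programmer must introduce the locals.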