
COMP90045 Programming Language Implementation
Global Optimization
Harald Søndergaard Lecture 22
Semester 1, 2019
PLI (Sem 1, 2019) Global Optimization © University of Melbourne 1 / 37

Liveness Information
So far we have assumed that information about variable liveness is available for register allocation.
In fact, liveness information is valuable for all sorts of optimization as well, so we will discuss it in some detail, as a typical example of global dataflow analysis.
Recall: A use of a variable is an instruction whose outcome depends on the value of the variable.
A variable n is live at program point p if there is a path starting at p on which n is used before being (unambiguously) defined.

Reasoning about Liveness
Here are the basic rules: For each instruction i,
1 If i uses v then v is live just before i.
2 If i assigns a value to v, and v is not also used by i, then v is dead just before i.
3 If v is live just after i, and i does not assign a value to v, then v is also live just before i.
4 If v is live just before one of i's successors, then v is live just after i.
Rule 1 says how liveness is generated, rule 2 how liveness is killed, and rules 3 and 4 how it is propagated, including between instructions (rule 4).

Finding Successors for Instructions
Assume instructions are numbered consecutively.
The set of successors, succ[i], of the instruction numbered i, is found using these rules:
1 If the instruction is goto l then l is in succ[i].
2 If the instruction is if p goto l, then {i + 1, l} ⊆ succ[i].
3 Otherwise i + 1 is in succ[i].
Note that we assume that both outcomes of p are possible. (Whether they are is an undecidable problem.)
We conservatively over-approximate the set of successors.
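The three rules above can be sketched directly as code. This is a minimal sketch with an instruction encoding of my own (tuples tagged "goto", "if-goto", or anything else for straight-line instructions); instructions are numbered from 1, as on the slides.

```python
def successors(instrs):
    """Compute succ[i] for a 1-indexed list of instructions."""
    succ = {}
    for i, instr in enumerate(instrs, start=1):
        op = instr[0]
        if op == "goto":                  # rule 1: jump target only
            succ[i] = {instr[1]}
        elif op == "if-goto":             # rule 2: both outcomes possible
            succ[i] = {i + 1, instr[1]} if i < len(instrs) else {instr[1]}
        elif i < len(instrs):             # rule 3: fall through
            succ[i] = {i + 1}
        else:
            succ[i] = set()               # last instruction, no successor
    return succ

# A tiny four-instruction program: fall-through, conditional, back edge, exit.
s = successors([("assign",), ("if-goto", 4), ("goto", 2), ("assign",)])
```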

The gen and kill Sets for Instructions
We show the possible instructions schematically.

  Instruction i            gen[i]    kill[i]
  x := k                   ∅         {x}
  x := y                   {y}       {x}
  x := unop y              {y}       {x}
  x := y binop z           {y,z}     {x}
  x := a[y]                {y}       {x}
  a[x] := y                {x,y}     ∅
  goto l                   ∅         ∅
  if x relop y goto l      {x,y}     ∅
  x := call f(vs)          vs        {x}

For an instruction such as x:=y-x, x is in the gen and the kill set.
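The table can be transcribed as a function. This is a sketch with an instruction encoding of my own (e.g. ("binop", x, y, z) for x := y binop z, ("store", a, x, y) for a[x] := y); note that, as remarked above, the same variable can appear in both sets.

```python
def gen_kill(instr):
    """Return (gen, kill) for one three-address instruction."""
    op = instr[0]
    if op == "const":                       # x := k
        return set(), {instr[1]}
    if op in ("copy", "unop"):              # x := y  /  x := unop y
        return {instr[2]}, {instr[1]}
    if op == "binop":                       # x := y binop z
        return {instr[2], instr[3]}, {instr[1]}
    if op == "load":                        # x := a[y]
        return {instr[3]}, {instr[1]}
    if op == "store":                       # a[x] := y
        return {instr[2], instr[3]}, set()
    if op == "goto":                        # goto l
        return set(), set()
    if op == "ifrel":                       # if x relop y goto l
        return {instr[1], instr[2]}, set()
    if op == "call":                        # x := call f(vs)
        return set(instr[3]), {instr[1]}
    raise ValueError(op)

# x := y - x: x is used (gen) and assigned (kill) at once.
g, k = gen_kill(("binop", "x", "y", "x"))
```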

The Equations that Define Propagation
We want:
the set in[i] of variables that are live just before i, and the set out[i] of variables that are live just after i.
To find these, solve this set of recursive equations:

  in[i]  = gen[i] ∪ (out[i] \ kill[i])
  out[i] = ⋃_{k ∈ succ[i]} in[k]
To solve these, initialise all in[i] and out[i] to empty sets, and
repeatedly use the equations to update until no more changes occur.
This works under the assumption that all variables are dead at the end of the program. If not, start from out[ilast] = vs, where ilast is the last instruction and vs is the set of variables that are live after that.

Example: Liveness Analysis
  i    succ[i]   gen[i]   kill[i]
  1    2                  a
  2    3                  b
  3    4                  z
  4    5,11     n,z
  5    6        a,b      t
  6    7        b        a
  7    8        t        b
  8    9        n        n
  9    10                z
  10   4
  11

  1:  a := 0
  2:  b := 1
  3:  z := 0
  4:  if n = z goto 11
  5:  t := a + b
  6:  a := b
  7:  b := t
  8:  n := n − 1
  9:  z := 0
  10: goto 4
  11:
Let us assume that a is live at the end, that is, out[11] = {a}.
PLI (Sem 1, 2019) Global Optimization ⃝c University of Melbourne 7 / 37
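The iteration described earlier can be sketched as a small solver. The gen, kill and succ tables below are transcribed from the example above; the solver itself is generic (the encoding and function names are my own).

```python
# Tables for the 11-instruction example (instruction 11 is the exit).
succ = {1: {2}, 2: {3}, 3: {4}, 4: {5, 11}, 5: {6}, 6: {7},
        7: {8}, 8: {9}, 9: {10}, 10: {4}, 11: set()}
gen  = {4: {"n", "z"}, 5: {"a", "b"}, 6: {"b"}, 7: {"t"}, 8: {"n"}}
kill = {1: {"a"}, 2: {"b"}, 3: {"z"}, 5: {"t"}, 6: {"a"},
        7: {"b"}, 8: {"n"}, 9: {"z"}}

def liveness(succ, gen, kill, out_exit):
    """Iterate the in/out equations from empty sets to a fixed point."""
    ins = {i: set() for i in succ}
    outs = {i: set() for i in succ}
    changed = True
    while changed:
        changed = False
        for i in sorted(succ, reverse=True):   # backwards, as on the slide
            if succ[i]:
                outs[i] = set().union(*(ins[k] for k in succ[i]))
            else:
                outs[i] = set(out_exit)        # liveness at program exit
            new_in = gen.get(i, set()) | (outs[i] - kill.get(i, set()))
            if new_in != ins[i]:
                ins[i] = new_in
                changed = True
    return ins, outs

# As on the slide, assume a is live at the end: out[11] = {a}.
ins, outs = liveness(succ, gen, kill, {"a"})
```

Processing instructions in reverse order is not required for correctness, only for speed: for a backward analysis it lets information flow through the whole straight-line chain in a single pass.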

Example: Solving the Liveness Equations
Calculate in this order: out[11], in[11], out[10], in[10], out[9], in[9] . . .
  i    Iteration 1          Iteration 2
       out[i]     in[i]     out[i]     in[i]
  1    a,n        n         a,n        n
  2    a,b,n      a,n       a,b,n      a,n
  3    a,b,n,z    a,b,n     a,b,n,z    a,b,n
  4    a,b,n      a,b,n,z   a,b,n      a,b,n,z
  5    b,n,t      a,b,n     b,n,t      a,b,n
  6    n,t        b,n,t     a,n,t      b,n,t
  7    n          n,t       a,b,n      a,n,t
  8               n         a,b,n      a,b,n
  9                         a,b,n,z    a,b,n
  10                        a,b,n,z    a,b,n,z
  11   a          a         a          a

Example: Solving the Liveness Equations
If we do another iteration, we see that the result has not changed: We have found a fixed point for the process.
The final iteration shows what is live at each program point (or we should say: could be live).
We see that n is live at the start of the code. If the code is the body of a function which has n as a parameter, that is all well and good.

Example 2: Liveness Analysis
Here the flow graph has six basic blocks:

  B1:  read int x
  B2:  if x < 2 goto B6
  B3:  y := x/2
       if y < 4 goto B5
  B4:  x := x - y
  B5:  z := x - 4
       goto B2
  B6:  write int x

Find the smallest solution to:

  IN(B1)  = ∅ ∪ (OUT(B1) \ {x})
  IN(B2)  = {x} ∪ OUT(B2)
  IN(B3)  = {x} ∪ (OUT(B3) \ {y})
  IN(B4)  = {x,y} ∪ OUT(B4)
  IN(B5)  = {x} ∪ (OUT(B5) \ {z})
  IN(B6)  = {x} ∪ OUT(B6)
  OUT(B1) = IN(B2)
  OUT(B2) = IN(B3) ∪ IN(B6)
  OUT(B3) = IN(B4) ∪ IN(B5)
  OUT(B4) = IN(B5)
  OUT(B5) = IN(B2)
  OUT(B6) = ∅

Starting from empty sets, successive rounds of iteration discover first IN(B2) = {x}, then IN(B4) = {x,y} and the remaining sets, until nothing changes. The smallest solution is:

  IN(B1) = ∅    IN(B2) = {x}    IN(B3) = {x}
  IN(B4) = {x,y}    IN(B5) = {x}    IN(B6) = {x}

Dataflow Analysis
Liveness analysis, as presented, is an instance of a general framework for dataflow analysis.
We discussed subexpression elimination and copy elimination as optimizations within basic blocks, but like many other optimizations, they can be extended to cross basic block boundaries, using the same idea as we used for liveness analysis.
Dataflow analysis associates four sets of facts with each instruction i:

  gen[i]  = facts added by i
  kill[i] = facts removed by i
  in[i]   = facts before i
  out[i]  = facts after i

The in facts are usually defined exactly as we did for liveness.

Dataflow Analysis
However, where we had out[i] = ⋃_{k ∈ succ[i]} in[k], other analyses may differ.
Liveness is a backward analysis, propagating information against the flow of control. If we instead aggregate over k ∈ pred[i], we get forward analyses. And if we change the ⋃ to ⋂, we get analyses that reason about what must happen, rather than what may happen.
All four combinations have useful instances, for example:

          Forwards                 Backwards
  Must    Available expressions    Very busy expressions
  May     Reaching definitions     Liveness

Global Analysis
We shall return to dataflow analysis in the next lecture. For now, we will look at some example transformations, including some that use global information. We use this C fragment for a running example:

  for (i = 0; i != size/2; ++i)
      if (a[i] < a[i*2]) {
          x = a[i];
          a[i] = a[i*2];
          a[i*2] = x;
      }

Flow Graph for the Example

  B1:  i = 0
       t1 = size/2
  B2:  if i == t1 goto B6
  B3:  code for if a[i] >= a[i*2] goto B5
  B4:  code to exchange a[i] and a[i*2]
  B5:  i = i + 1
       goto B2
  B6:  (exit)

Common Subexpressions, Local
t7 = i * 4
x = a[t7]
t8 = i * 2
t9 = t8 * 4
t10 = a[t9]
t11 = i * 4
a[t11] = t10
t12 = i * 2
t13 = t12 * 4
a[t13] = x
The code above shows what B4 looks like in detail. Algebraic simplification (with next-use information) can eliminate t8 and t12. Local common subexpression elimination can discard t11 and t13.

Exploiting Algebraic Identities
The identity x ∗ 2 = x + x allows us to replace mul r1, r2, 2 with add r1, r2, r2, which will typically be faster. Implementing x ∗ 3 using two additions will typically also gain speed, although at the expense of code size.
These are examples of strength reduction, because a stronger operator (∗) is replaced by a weaker one (+).
When reordering operations, the compiler must be careful about overflows; as we have seen, some rules we learned in maths class may not hold for fixed-size integers or floating point numbers.
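A rewrite like this is a classic peephole step. Here is a minimal sketch, with an instruction encoding of my own; this particular identity is safe because x + x and x ∗ 2 agree bit-for-bit on wrapping machine integers, whereas many textbook identities (reassociation, distribution) do not survive fixed-size or floating-point arithmetic.

```python
def reduce_strength(instr):
    """Rewrite mul-by-2 as an addition; leave everything else alone."""
    op, dst, a, b = instr
    if op == "mul" and b == 2:
        return ("add", dst, a, a)      # x * 2  ==  x + x
    return instr

# mul r1, r2, 2  becomes  add r1, r2, r2
rewritten = reduce_strength(("mul", "r1", "r2", 2))
```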

Common Subexpressions
t7 = i * 4
x = a[t7]
t9 = i * 8
t10 = a[t9]
a[t7] = t10
a[t9] = x
For the running example, we are now left with the code shown here. However, better common subexpression elimination becomes possible if we take a global view and take the neighbouring basic blocks into account . . .

More Common Subexpressions Eliminated
Here is what we currently have.

  B3:  t2 = i * 4
       t3 = a[t2]
       t5 = i * 8
       t6 = a[t5]
       if t3 >= t6 goto B5

  B4:  t7 = i * 4
       x = a[t7]
       t9 = i * 8
       t10 = a[t9]
       a[t7] = t10
       a[t9] = x

But t7, t9, t10 are also redundant. t2 can replace t7, t5 can replace t9, and (more subtly) t6 can replace t10.
Global common subexpression elimination is an important instance of the dataflow analysis framework.
PLI (Sem 1, 2019) Global Optimization ⃝c University of Melbourne 21 / 37
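The underlying analysis is available expressions: a forward, "must" analysis. A sketch of the per-block transfer function, with an encoding of my own (each statement is a pair dst := expr, and an expression is killed as soon as one of its operands is redefined); at a join point one would intersect the predecessors' out-sets, but B4 here has the single predecessor B3. Stores into a would also kill a[...] expressions; that is omitted in this sketch.

```python
def avail_out(block, avail_in):
    """Available expressions at block exit, given those at entry."""
    avail = set(avail_in)
    for dst, expr in block:
        # Redefining dst invalidates any expression that mentions it.
        avail = {e for e in avail if dst not in e}
        if dst not in expr:
            avail.add(expr)
    return avail

# B3 from the running example; expressions are operand/operator tuples.
b3 = [("t2", ("i", "*", 4)), ("t3", ("a", "[]", "t2")),
      ("t5", ("i", "*", 8)), ("t6", ("a", "[]", "t5"))]
after_b3 = avail_out(b3, set())
# i*4 and i*8 are available on entry to B4, so t7 = i*4 and t9 = i*8
# there are redundant: they can reuse t2 and t5.
```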

Continuing the Running Example

  B3:  t2 = i * 4
       t3 = a[t2]
       t5 = i * 8
       t6 = a[t5]
       if t3 >= t6 goto B5

  B4:  x = t3
       a[t2] = t6
       a[t5] = x

Copy Propagation
Copy propagation allows us to remove the use of x in B4, and we have:

  B3:  t2 = i * 4
       t3 = a[t2]
       t5 = i * 8
       t6 = a[t5]
       if t3 >= t6 goto B5

  B4:  x = t3
       a[t2] = t6
       a[t5] = t3

Dead Code Elimination
For our example, a liveness analysis reveals that x is not live in its block, so the assignment to x can be removed.
This leaves a highly simplified basic block B4:
a[t2] = t6
a[t5] = t3
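Dead code elimination driven by liveness can be sketched as a single backward scan over a block. The encoding is my own: each statement is (dst, uses); dst None marks a statement with side effects (here the array stores), which must always be kept.

```python
def eliminate_dead(block, live_out):
    """Drop side-effect-free assignments whose target is dead afterwards."""
    live = set(live_out)
    kept = []
    for dst, uses in reversed(block):
        if dst is None or dst in live:     # needed: side effect or live target
            kept.append((dst, uses))
            live.discard(dst)
            live |= set(uses)              # its operands become live
    kept.reverse()
    return kept

# B4 after copy propagation: x = t3; a[t2] = t6; a[t5] = t3.
b4 = [("x", ["t3"]), (None, ["t2", "t6"]), (None, ["t5", "t3"])]
# x is not live anywhere after its assignment, so that statement goes.
b4_opt = eliminate_dead(b4, set())
```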

Loop Optimisations
A very important target for optimization is the loop.
Even small improvements to innermost loops can have a huge impact on runtime performance.

Loop Optimisations: Code Motion
The aim of code motion is to move invariant code outside loops.
In our example, t1 is invariant throughout the loop, so the assignment to t1 can be moved outside, as shown.
  B1:  i = 0
       t1 = size/2
  B2:  if i == t1 goto B6
  B3:  t2 = i * 4
       t3 = a[t2]
       t5 = i * 8
       t6 = a[t5]
       if t3 >= t6 goto B5
  B4:  a[t2] = t6
       a[t5] = t3
  B5:  i = i + 1
       goto B2
  B6:  (exit)

Loop Optimisations: Strength Reduction
Strength reduction replaces an operation with a cheaper one. For example, a compiler may look for opportunities to replace
multiplications with additions.
In our example, note that throughout the body of the loop, t2 is
equal to 4i.
As i is incremented by one in each iteration, we could simply add 4
to t2 at the same time, removing the multiplication ‘i*4’. Similarly for t5.
Of course we must then initialise t2 and t5 appropriately.
The next slide shows the result of this reduction in strength.
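The transformation rests on a simple loop invariant: if t2 starts at 0 and grows by 4 whenever i grows by 1, then t2 = 4i at the top of every iteration (and likewise t5 = 8i). A numeric sketch of that invariant:

```python
def check_invariant(size):
    """Maintain t2 = 4*i and t5 = 8*i incrementally, as the compiler would."""
    t2, t5 = 0, 0
    for i in range(size // 2):
        assert t2 == i * 4 and t5 == i * 8   # the multiplications are gone
        t2 += 4
        t5 += 8
    return t2, t5

check_invariant(10)
```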

After Strength Reduction
  B1:  i = 0
       t1 = size/2
       t2 = 0
       t5 = 0
  B2:  if i == t1 goto B6
  B3:  t3 = a[t2]
       t6 = a[t5]
       if t3 >= t6 goto B5
  B4:  a[t2] = t6
       a[t5] = t3
  B5:  i = i + 1
       t2 = t2 + 4
       t5 = t5 + 8
       goto B2
  B6:  (exit)

Loop Optimisations: Induction Variables
Here we wish to identify a set of variables that remain in a fixed relationship throughout the execution of a loop, and remove all but one.
In our example, i (the induction variable) is redundant, since we have that t2 = 4i.
So we express the loop condition in terms of t2 instead, and terminate when t2 reaches the value 4(size/2).
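A small sketch of why the rewritten test is equivalent: counting iterations of the original loop over i against a loop that uses only t2. Note the hedge in the prose above: the bound is 4·(size/2); the slide's code uses size ∗ 2, which coincides with 4·(size/2) only when size is even, so the check below assumes an even size.

```python
def iters_with_i(size):
    """Original loop shape: i = 0; while i != size/2: i += 1."""
    i, n = 0, 0
    while i != size // 2:
        i += 1
        n += 1
    return n

def iters_with_t2(size):
    """After induction variable elimination: test t2 against t14 = size*2."""
    t2, t14, n = 0, size * 2, 0
    while t2 != t14:
        t2 += 4
        n += 1
    return n
```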

Induction Variable Elimination
  B1:  t14 = size * 2
       t2 = 0
       t5 = 0
  B2:  if t2 == t14 goto B6
  B3:  t3 = a[t2]
       t6 = a[t5]
       if t3 >= t6 goto B5
  B4:  a[t2] = t6
       a[t5] = t3
  B5:  t2 = t2 + 4
       t5 = t5 + 8
       goto B2
  B6:  (exit)

Goto Reduction
Finally, a jump-over-jump analysis yields the code shown here.
Compare the result with the code we started from.
The loop now consists of 8 instructions, down from 19!
  B1:  t14 = size * 2
       t2 = 0
       t5 = 0
  B2:  if t2 == t14 goto B6
  B3:  t3 = a[t2]
       t6 = a[t5]
       if t3 >= t6 goto B5
  B4:  a[t2] = t6
       a[t5] = t3
  B5:  t2 = t2 + 4
       t5 = t5 + 8
       if t2 != t14 goto B3
  B6:  (exit)

Code Specialization
Sometimes dataflow analysis loses the information we want to exploit when it merges information from the predecessors of a basic block: here, in cases ONE and TWO, we know whether p == NULL will succeed.
We can replace the if-then-else after the switch with an if-then-else at the end of every switch arm.
Each arm’s copy of the if-then-else can then be optimized with the knowledge from that arm.
  switch (...) {
      case ONE:
          p = NULL;
          break;
      case TWO:
          p = &a[42];
          break;
      case THREE:
          p = q;
          break;
      ...
  }
  if (p == NULL) {
      ...
  } else {
      ...
  }

Syntax-Directed Optimization
Some optimizations can be done more conveniently on the abstract syntax tree (a representation of the source program) than on the IR (a representation of the target program), because the AST is higher level and ignores low-level details.
One can perform dataflow analysis on the AST as well as the IR. One application is bounds analysis, which tries to optimize away array bound checks by keeping track of whether the index values are guaranteed to be within bounds or not:
  float a[10];
  unsigned i = 0;       /* i is in [0..0]  */
  while (i < 10) {
      ... a[i] ...      /* i is in [0..9]  */
      i++;              /* i is in [1..10] */
  }

Interprocedural Optimization

  for (i = 0; i < n; i++) {
      if (a[i] < f(x)) {
          ++count;
      }
  }

You can move f(x) out of the loop only if you know that it doesn't have side effects. In an imperative language, this requires interprocedural analysis: transmitting information from one procedure to another. Many compilers can do that if the two procedures are in the same module. Few can do it well if the two procedures are in different modules.

Inlining
Most compilers can perform inlining, which involves replacing a call with a copy of the body of the called procedure. The copy has fresh names for all the locals and has all formal parameters replaced with the corresponding actual parameters (except with value-result, which requires copy-in/copy-out code).
Inlining avoids procedure call overheads, but is usually worthwhile only if the callee is small, or if the caller has information that can be used to optimize the callee.

  int max(int a, int b) {
      return (a > b) ? a : b;
  }
  ... z = max(x, y); ...
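What inlining produces for the max example, transcribed into Python as a sketch: the call site is replaced by the callee's body with a := x and b := y substituted (fresh names would be needed only if the callee had locals that clashed with the caller's).

```python
def max2(a, b):
    """The callee from the slide's max example."""
    return a if a > b else b

x, y = 3, 7
z_call = max2(x, y)              # original: a procedure call
z_inline = x if x > y else y    # after inlining: body with a:=x, b:=y
# Both compute the same value; the inlined form has no call overhead and
# is now exposed to the caller's other optimizations.
```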

High Level Optimizations
The optimizations we have discussed so far are low level: close to the machine. They can help speed up most programs, but typically only by small constant factors: a few percent to a few tens of percents.
If you want bigger speedups than that, the optimizer must change the algorithm.
  int fib(int n) {
      if (n < 2)
          return 1;
      else
          return fib(n-1) + fib(n-2);
  }

You can change the complexity of this code from exponential to linear by caching the value of fib(k), once computed.

Declarative Languages
High level optimizations are typically not feasible when the source language is imperative, because with such programs the compiler usually cannot be sure whether reordering or eliminating statements will change the output of the program.
When the source language is a pure functional or logic programming language, a call can be deleted if and only if its output is not needed, and two calls can be reordered if and only if neither call computes an input of the other call. No other analysis is needed.
Compilers for declarative languages can thus perform optimizations that can yield large speedups. Many of these optimizations are rarely applicable, but when they are, it is better for the compiler to do them than the programmer.
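Returning to the fib example above: the caching idea is a sketch away in Python, where functools.lru_cache supplies the cache, turning the exponential recursion linear in the number of distinct arguments.

```python
from functools import lru_cache

@lru_cache(maxsize=None)          # cache fib(k) once computed
def fib(n):
    if n < 2:
        return 1
    return fib(n - 1) + fib(n - 2)

# Same definition as the slide's C version: 1, 1, 2, 3, 5, 8, ...
fib(10)
```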