Static Program Analysis
Part 4 – flow sensitive analyses
http://cs.au.dk/~amoeller/spa/
Anders Møller & Michael I. Schwartzbach Computer Science, Aarhus University
Agenda
• Constant propagation analysis
• Live variables analysis
• Available expressions analysis
• Very busy expressions analysis
• Reaching definitions analysis
• Initialized variables analysis
2
Constant propagation optimization
var x,y,z;
x = 27;
y = input,
z = 2*x+y;
if (x<0) { y=z-3; } else { y=12 }
output y;
var x,y,z;
x = 27;
y = input;
z = 54+y;
if (0) { y=z-3; } else { y=12 }
output y;
var y;
y = input; output 12;
3
Constant propagation analysis
• Determinevariableswithaconstantvalue • Flatlattice:
⊤
-3 -2 -1 0 1
⊥
23
4
Constraints for constant propagation
• Essentially as for the Sign analysis...
• Abstract operator for addition:
⊥ ifn=⊥∨m=⊥ +(n,m) = ⊤ else if n=⊤ ∨ m=⊤
n+m otherwise
5
Agenda
• Constant propagation analysis
• Live variables analysis
• Available expressions analysis
• Very busy expressions analysis
• Reaching definitions analysis
• Initialized variables analysis
6
Liveness analysis
• A variable is live at a program point if its current value may be read in the remaining execution
• Thisisclearlyundecidable,butthepropertycanbe conservatively approximated
• Theanalysismustonlyanswer“dead” if the variable is really dead
– no need to store the values of dead variables
7
A lattice for liveness
A powerset lattice of program variables
var x,y,z;
x = input;
while (x>1) {
y = x/2;
if (y>3) x = x-y; z = x-4;
if (z>0) x = x/2; z = z-1;
}
output x;
L = (2{x,y,z}, )
the trivial answer
{x,y,z}
{x,y} {y,z} {x,z}
{x} {y} {z}
8
The control flow graph
x = input
x>1
y = x/2
y>3
x = x-y
var x,y,z
z = x-4
z>0
x = x/2
z = z-1
output x
9
Setting up
• For every CFG node, v, we have a variable ⟦v⟧: – the subset of program variables that are live at the
program point before v
• Sincetheanalysisisconservative,thecomputedsets
may be too large
• Auxiliarydefinition:
JOIN(v) = ⟦w⟧ wsucc(v)
w1
v
wk
w2
10
Liveness constraints
• For the exit node: ⟦exit⟧ =
• Forconditionsandoutput:
⟦if (E)⟧ = ⟦outputE ⟧ = JOIN(v) vars(E)
• Forassignments:
⟦ x=E ⟧ = JOIN(v) \ {x} vars(E)
• For variable declarations:
⟦ var x1, …, xn ⟧ = JOIN(v) \ {x1, …, xn}
• For all other nodes: ⟦v⟧ = JOIN(v)
vars(E) = variables occurring in E
right-hand sides are monotone since JOIN is monotone, and …
11
Generated constraints
⟦var x,y,z⟧ = ⟦z=input⟧ \ {x,y,z} ⟦x=input⟧ = ⟦x>1⟧ \ {x}
⟦x>1⟧ = (⟦y=x/2⟧ ⟦output x⟧) {x} ⟦y=x/2⟧ = (⟦y>3⟧ \ {y}) {x}
⟦y>3⟧ = ⟦x=x-y⟧ ⟦z=x-4⟧ {y} ⟦x=x-y⟧ = (⟦z=x-4⟧ \ {x}) {x,y} ⟦z=x-4⟧ = (⟦z>0⟧ \ {z}) {x} ⟦z>0⟧ = ⟦x=x/2⟧ ⟦z=z-1⟧ {z} ⟦x=x/2⟧ = (⟦z=z-1⟧ \ {x}) {x} ⟦z=z-1⟧ = (⟦x>1⟧ \ {z}) {z} ⟦output x⟧ = ⟦exit⟧ {x}
⟦exit⟧ =
12
Least solution
⟦entry⟧ =
⟦var x,y,z⟧ = ⟦x=input⟧ = ⟦x>1⟧ = {x} ⟦y=x/2⟧ = {x} ⟦y>3⟧ = {x,y} ⟦x=x-y⟧ = {x,y} ⟦z=x-4⟧ = {x}
⟦z>0⟧ = {x,z} ⟦x=x/2⟧ = {x,z} ⟦z=z-1⟧ = {x,z} ⟦outputx⟧ = {x} ⟦exit⟧ =
Many non-trivial answers!
13
Optimizations
• Variablesyandzareneversimultaneouslylive they can share the same variable location
• The value assigned in z=z-1 is never read the assignment can be skipped
var x,yz;
x = input;
while (x>1) {
yz = x/2;
if (yz>3) x = x-yz;
yz = x-4;
if (yz>0) x = x/2;
}
output x;
• better register allocation • a few clock cycles saved
14
Time complexity (for the naive algorithm)
• With n CFG nodes and k variables: – thelatticeLnhasheightkn
– so there are at most kn iterations
• Subsets of Vars (the variables in the program) can be represented as bitvectors:
– eachelementhassizek
– each , \, = operation takes time O(k)
• Each iteration uses O(n) bitvector operations: – soeachiterationtakestimeO(kn)
• Total time complexity: O(k2n2)
• Exercise: what is the complexity for the worklist algorithm?
15
Agenda
• Constant propagation analysis
• Live variables analysis
• Available expressions analysis
• Very busy expressions analysis
• Reaching definitions analysis
• Initialized variables analysis
16
Available expressions analysis
• A (nontrivial) expression is available at a program point if its current value has already been computed earlier in the execution
• Theapproximationgenerallyincludestoofew expressions
– the analysis can only report “available” if the expression is definitely available
– no need to re-compute available expressions (e.g. common subexpression elimination)
17
A lattice for available expressions
A reverse powerset lattice of nontrivial expressions
var x,y,z,a,b;
z = a+b;
y = a*b;
while (y > a+b) {
a = a+1;
x = a+b; }
L = (2{a+b, a*b, y>a+b, a+1}, )
18
Reverse powerset lattice
the trivial answer
{a+b, a*b}
{a+b} {a*b}
{a+b, y>a+b} {a+b, a+1}
{y>a+b} {a+1}
{a*b, y>a+b} {a*b, a+1} {y>a+b, a+1}
{a+b, a*b, y>a+b} {a+b, a*b, a+1}
{a+b, a*b, y>a+b, a+1}
{a+b, y>a+b, a+1} {a*b, y>a+b, a+1}
19
The flow graph
var x,y,z,a,b
z=a+b
y=a*b
y>a+b
a=a+1
x=a+b
20
Setting up
• For every CFG node, v, we have a variable ⟦v⟧:
– the subset of program variables that are available at the
program point after v
• Sincetheanalysisisconservative,thecomputedsets
may be too small
• Auxiliarydefinition:
JOIN(v) = ⟦w⟧ wpred(v)
w1
w2
wk
v
21
Auxiliary functions
• The function Xx removes all expressions from X that contain a reference to the variable x
• The function exps(E) is defined as: – exps(intconst) =
– exps(x) =
– exps(input) =
– exps(E1 op E2) = {E1 op E2} exps(E1) exps(E2)
but don’t include expressions containing input
22
Availability constraints
• For the entry node: ⟦entry⟧ =
• For conditions and output:
⟦if (E)⟧ = ⟦outputE ⟧ = JOIN(v) exps(E)
• For assignments:
⟦ x = E ⟧ = (JOIN(v) exps(E))x
• Foranyothernodev: ⟦v⟧ = JOIN(v)
23
Generated constraints
⟦entry⟧ =
⟦var x,y,z,a,b⟧ = ⟦entry⟧
⟦z=a+b⟧ = exps(a+b)z
⟦y=a*b⟧ = (⟦z=a+b⟧ exps(a*b))y
⟦y>a+b⟧ = (⟦y=a*b⟧ ⟦x=a+b⟧) exps(y>a+b) ⟦a=a+1⟧ = (⟦y>a+b⟧ exps(a+1))a
⟦x=a+b⟧ = (⟦a=a+1⟧ exps(a+b))x
⟦exit⟧ = ⟦y>a+b⟧
24
Least solution
Again, many nontrivial answers!
⟦entry⟧ =
⟦var x,y,z,a,b⟧ = ⟦z=a+b⟧ = {a+b} ⟦y=a*b⟧ = {a+b, a*b} ⟦y>a+b⟧ = {a+b, y>a+b} ⟦a=a+1⟧ =
⟦x=a+b⟧ = {a+b}
⟦exit⟧ = {a+b}
25
Optimizations
• We notice that a+b is available before the loop
• The program can be optimized (slightly):
var x,y,x,a,b,aplusb; aplusb = a+b;
z = aplusb;
y = a*b;
while (y > aplusb) { a = a+1;
aplusb = a+b;
x = aplusb;
}
26
Agenda
• Constant propagation analysis
• Live variables analysis
• Available expressions analysis
• Very busy expressions analysis
• Reaching definitions analysis
• Initialized variables analysis
27
Very busy expressions analysis
• A(nontrivial)expressionisverybusyifitwilldefinitely be evaluated before its value changes
• Theapproximationgenerallyincludestoofew expressions
– the answer “very busy” must be the true one – very busy expressions may be pre-computed
(e.g. loop hoisting)
• Same lattice as for available expressions
28
An example program
var x,a,b;
x = input;
a = x-1;
b = x-2;
while (x > 0) {
output a*b-x;
x = x-1; }
output a*b;
The analysis shows that a*b is very busy
29
var x,a,b;
x = input;
a = x-1;
b = x-2;
while (x > 0) {
output a*b-x;
x = x-1; }
output a*b;
Code hoisting
var x,a,b,atimesb;
x = input;
a = x-1;
b = x-2;
atimesb = a*b;
while (x > 0) {
output atimesb-x;
x = x-1; }
output atimesb;
30
Setting up
• For every CFG node, v, we have a variable ⟦v⟧:
– the subset of program variables that are very busy at the
program point before v
• Sincetheanalysisisconservative,thecomputedsets
may be too small
• Auxiliarydefinition:
JOIN(v) = ⟦w⟧ wsucc(v)
w1
v
wk
w2
31
Very busy constraints
• For the exit node: ⟦exit⟧ =
• For conditions and output:
⟦if (E)⟧ = ⟦outputE ⟧ = JOIN(v) exps(E)
• For assignments:
⟦ x=E ⟧ = JOIN(v)x exps(E)
• For all other nodes: ⟦v⟧ = JOIN(v)
32
Agenda
• Constant propagation analysis
• Live variables analysis
• Available expressions analysis
• Very busy expressions analysis
• Reaching definitions analysis
• Initialized variables analysis
33
Reaching definitions analysis
• The reaching definitions for a program point are those assignments that may define the current values of variables
• The conservative approximation may include too many possible assignments
34
A lattice for reaching definitions
The powerset lattice of assignments
L = (2{x=input, y=x/2, x=x-y, z=x-4, x=x/2, z=z-1},)
var x,y,z;
x = input;
while (x > 1) {
y = x/2;
if (y>3) x = x-y; z = x-4;
if (z>0) x = x/2; z = z-1;
}
output x;
35
Reaching definitions constraints
• Forassignments:
⟦ x = E ⟧ = JOIN(v)x { x = E }
• For all other nodes: ⟦v⟧ = JOIN(v)
• Auxiliarydefinition: JOIN(v) = ⟦w⟧
wpred(v)
• The function Xx removes assignments to x from X
w1
w2
wk
v
36
Def-use graph
Reaching definitions define the def-use graph: – like a CFG but with edges from def to use nodes – basis for dead code elimination and code motion
x=input
x=x-y
z=z-1
x>1
y=x/2
z=x-4
output x
y>3
z>0
x=x/2
37
Forward vs. backward
• Aforwardanalysis:
– computes information about the past behavior
– examples: available expressions, reaching definitions
• Abackwardanalysis:
– computes information about the future behavior – examples: liveness, very busy expressions
38
May vs. must
• Amayanalysis:
– describes information that is possibly true
– an over-approximation
– examples: liveness, reaching definitions
• Amustanalysis:
– describes information that is definitely true
– an under-approximation
– examples: available expressions, very busy expressions
39
Classifying analyses
forward
backward
may
example: reaching definitions ⟦v⟧ describes state after v
JOIN(v) = ⨆⟦w⟧ = ⟦w⟧ wpred(v) wpred(v)
example: liveness
⟦v⟧ describes state before v
JOIN(v) = ⨆⟦w⟧ = wsucc(v)
⟦w⟧ wsucc(v)
must
example: available expressions ⟦v⟧ describes state after v
JOIN(v) = ⨆⟦w⟧ = wpred(v)
⟦w⟧ wpred(v)
example: very busy expressions ⟦v⟧ describes state before v
JOIN(v) = ⨆⟦w⟧ = ⟦w⟧ wsucc(v) wsucc(v)
40
Agenda
• Constant propagation analysis
• Live variables analysis
• Available expressions analysis
• Very busy expressions analysis
• Reaching definitions analysis
• Initialized variables analysis
41
Initialized variables analysis
• Computeforeachprogrampointthosevariables that have definitely been initialized in the past
• (Called definite assignment analysis in Java and C#)
• forwardmustanalysis
• Reverse powerset lattice of all variables
JOIN(v) = ⟦w⟧ wpred(v)
• Forassignments:⟦x=E⟧=JOIN(v){x}
• For all others: ⟦v⟧ = JOIN(v)
42