Static Program Analysis
Part 3 – lattices and fixpoints
http://cs.au.dk/~amoeller/spa/
Anders Møller & Michael I. Schwartzbach Computer Science, Aarhus University
Flow-sensitivity
• Typecheckingis(usually)flow-insensitive:
– statements may be permuted without affecting typability – constraints are naturally generated from AST nodes
• Otheranalysesmustbeflow-sensitive:
– the order of statements affects the results
– constraints are naturally generated from control flow graph nodes
2
Sign analysis
• Determine the sign (+,-,0) of all expressions
• TheSignlattice: “any number”
⊤
+-0
⊥
• StatesaremodeledbythemaplatticeVarsSign where Vars is the set of variables in the program
“not of type number” (or, “unreachable code”)
Implementation: TIP/src/tip/analysis/SignAnalysis.scala
3
1 var a,b; 2 a=42;
3 4
b = a + input;
a = a – b;
3
4
1
2
x1 = [a ↦⊤,b ↦⊤]
x2 = x1[a ↦ +]
x3 = x2[b ↦ x2(a)+⊤]
x4 = x3[a ↦ x3(a)-x3(b)]
Generating constraints
var a,b
a = 42
b = a + input
a=a-b
4
Sign analysis constraints
• The variable ⟦v⟧ denotes a map that gives the sign value for all variables at the program point after node v
• For variable declarations:
⟦ var x1, …, xn ⟧ = JOIN(v)[x1 ↦⊤, …, xn ↦⊤]
• For assignments:
⟦ x = E ⟧ = JOIN(v)[x ↦ eval(JOIN(v),E)]
• For all other nodes: ⟦v⟧ = JOIN(v)
where JOIN(v) = ⨆ ⟦w⟧ wpred(v)
combines information from predecessors (explained later…)
5
Evaluating signs
• The eval function is an abstract evaluation: – eval(,x) = (x)
– eval(,intconst) = sign(intconst)
– eval(, E1 op E2) = op(eval(,E1),eval(,E2))
• : Vars Sign is an abstract state
• The sign function gives the sign of an integer
• The op function is an abstract evaluation of the given operator
6
Abstract operators
+
0
–
+
⊤
0
0
–
+
⊤
–
–
–
⊤
⊤
+
+
⊤
+
⊤
⊤
⊤
⊤
⊤
⊤
–
0
–
+
⊤
0
0
+
–
⊤
–
–
⊤
–
⊤
+
+
+
⊤
⊤
⊤
⊤
⊤
⊤
⊤
*
0
–
+
⊤
0
0
0
0
0
–
0
+
–
⊤
+
0
–
+
⊤
⊤
0
⊤
⊤
⊤
==
0
–
+
⊤
0
+
0
0
⊤
–
0
⊤
0
⊤
+
0
0
⊤
⊤
⊤
⊤
⊤
⊤
⊤
/
0
–
+
⊤
0
0
0
⊤
–
⊤
⊤
⊤
+
⊤
⊤
⊤
⊤
⊤
⊤
⊤
>
0
–
+
⊤
0
0
+
0
⊤
–
0
⊤
0
⊤
+
+
+
⊤
⊤
⊤
⊤
⊤
⊤
⊤
(assuming the subset of TIP with only integer values)
7
Increasing precision
• Some loss of information:
– (2>0)==1 is analyzed as ⊤
– +/+ is analyzed as ⊤, since e.g. 1⁄2 is rounded down
• Usearicherlatticeforbetterprecision: ⊤
+0 -0
+0- 1
⊥
• Abstractoperatorsarenow88tables
8
Partial orders
• Given a set S, a partial order ⊑ is a binary relation on S that satisfies:
– reflexivity:
– transitivity:
– anti-symmetry:
xS: x ⊑ x
x,y,zS: x ⊑ y y ⊑ z x ⊑ z x,yS: x ⊑ y y ⊑ x x = y
• Can be illustrated by a Hasse diagram (if finite)
9
Upper and lower bounds
• Let X S be a subset
• WesaythatySisanupperbound(X⊑y)when
xX: x ⊑ y
• We say that yS is a lower bound (y ⊑ X) when
xX: y ⊑ x
• Aleastupperbound⨆Xisdefinedby
X ⊑ ⨆X yS: X ⊑ y ⨆X ⊑ y
• Agreatestlowerbound⨅Xisdefinedby
⨅X ⊑ X yS: y ⊑ X y ⊑ ⨅X
10
Lattices
• A (complete) lattice is a partial order where ⨆X and ⨅X exist for all X S
• Alatticemusthave
– a unique largest element, ⊤ = ⨆S – a unique smallest element, ⊥ = ⨅S
(exercise)
• If S is a finite set, then it defines a lattice iff
– ⊤ and ⊥ exist in S
– x⊔y and x⊓y exist for all x,y S (x⊔y is notation for ⨆{x,y})
Implementation: TIP/src/tip/lattices/
11
These partial orders are lattices
12
These partial orders are not lattices
13
The powerset lattice
• Every finite set A defines a lattice (2A,) where –⊥=
–⊤=A
– x⊔y = xy – x⊓y = xy
{0,1,2,3}
{0,1}
{0,1,2} {0,1,3}
{0,2} {0,3} {0} {1}
{0,2,3} {1,2,3}
{1,2} {1,3} {2} {3}
{2,3}
{}
14
Lattice height
• The height of a lattice is the length of the longest path from ⊥ to ⊤
• The lattice (2A,) has height |A| {0,1,2,3}
{0,1}
{0,1,2} {0,1,3}
{0,2} {0,3}
{0} {1}
{0,2,3} {1,2,3}
{1,2} {1,3}
{2} {3}
{2,3}
{}
15
Map lattice
• If A is a set and L is a lattice, then we obtain the map lattice:
A L = { [a1↦x1, a2↦x2, …] | A={a1, a2, …} x1, x2 ,… L } ordered pointwise
• ⊔and⊓canbecomputedpointwise • height(A L) = |A|height(L)
Example: A L where
• A is the set of program variables
• L is the Sign lattice
16
Product lattice
• If L1, L2, …, Ln are lattices, then so is the product: L1L2 … Ln = { (x1,x2,…,xn) | xi Li }
where ⊑ is defined pointwise
• Notethat⊔and⊓canbecomputedpointwise
• height(L1L2 … Ln) = height(L1)+ … + height(Ln)
Example:
each Li is the map lattice A L from the previous slide, and n is the number of CFG nodes
17
Flat lattice
• If A is a set, then flat(A) is a lattice: ⊤
a1 a2 … an
⊥
• height(flat(A)) = 2
18
Lift lattice
• If L is a lattice, then so is lift(L), which is:
⊥
• height(lift(L)) = height(L)+1
19
Sign analysis constraints, revisited
• The variable ⟦v⟧ denotes a map that gives the sign value for all variables at the program point after node v
• ⟦v⟧States where States = Vars Sign
• For variable declarations:
⟦ var x1, …, xn ⟧ = JOIN(v)[x1 ↦⊤, …, xn ↦⊤]
• For assignments:
⟦ x = E ⟧ = JOIN(v)[x ↦ eval(JOIN(v),E)]
• For all other nodes:
⟦v⟧ = JOIN(v)
where JOIN(v) = ⨆ ⟦w⟧ wpred(v)
combines information from predecessors
20
var a,b,c; a = 42;
b = 87;
if (input) {
c = a + b;
} else {
c = a – b;
}
Generating constraints
⟦entry⟧ = ⊥
⟦var a,b,c⟧ = ⟦entry⟧[a ↦⊤,b ↦⊤,c ↦⊤]
⟦a = 42⟧ = ⟦var a,b,c⟧[a ↦ +]
⟦b = 87⟧ = ⟦a = 42⟧[b ↦ +]
⟦input⟧ = ⟦b = 87⟧
⟦c = a + b⟧ = ⟦input⟧[c ↦ ⟦input⟧(a)+⟦input⟧(b)] ⟦c = a – b⟧ = ⟦input⟧[c ↦ ⟦input⟧(a)-⟦input⟧(b)] ⟦exit⟧ = ⟦c = a + b⟧ ⊔ ⟦c = a – b⟧
using l.u.b.
21
Constraints
• From the program being analyzed, we have constraint variables x1, …, xnL and a collection of constraints:
x1 = f1(x1, …, xn) x2 = f2(x1, …, xn) …
xn = fn(x1, …, xn)
Note that Ln is a product lattice
• These can be collected into a single function f: LnLn: f(x1,…,xn) = (f1(x1,…,xn), …, fn(x1,…,xn))
• How do we find the least (i.e. most precise) value of x1,…,xn such that x1,…,xn = f(x1,…,xn) (if that exists)???
22
Monotone functions
• Afunctionf:LLismonotonewhen x,y L: x ⊑ y f(x) ⊑ f(y)
• A function with several arguments is monotone if it is monotone in each argument
• Monotonefunctionsareclosedundercomposition
• Asfunctions,⊔and⊓arebothmonotone
(exercises)
• x ⊑ y can be interpreted as “x is at least as precise as y”
• When f is monotone:
“more precise input cannot lead to less precise output”
23
Monotonicity for the sign analysis
• The⊔operatorandmap updates are monotone
• Compositionspreserve monotonicity
• Aretheabstractoperators monotone?
(exercises)
• Can be verified by a tedious inspection: – x,y,x’L: x ⊑ x’ xopy ⊑ x’opy
– x,y,y’L: y ⊑ y’ xopy ⊑ xopy’
Example, constraints for assignments: ⟦ x = E ⟧ = JOIN(v)[x↦eval(JOIN(v),E)]
24
Kleene’s fixed-point theorem
x L is a fixed-point of f: L L iff f(x)=x
In a lattice with finite height, every monotone function f has a unique least fixed-point:
fix(f) = ⨆ fi(⊥) i 0
25
Proof of existence
• Clearly,⊥⊑f(⊥)
• Since f is monotone, we also have f(⊥) ⊑ f2(⊥)
• Byinduction,fi(⊥)⊑fi+1(⊥)
• Thismeansthat
⊥ ⊑ f(⊥) ⊑ f2(⊥) ⊑ … fi(⊥) … is an increasing chain
• L has finite height, so for some k: fk(⊥) = fk+1(⊥)
• If x ⊑ y then x ⊔ y = y (exercise)
• Sofix(f)=fk(⊥)
26
Proof of unique least
• Assumethatxisanotherfixed-point:x=f(x)
• Clearly,⊥⊑x
• By induction, fi(⊥) ⊑ fi(x) = x
• In particular, fix(f) = fk(⊥) ⊑ x, i.e. fix(f) is least
• Uniquenessthenfollowsfromanti-symmetry
27
Computing fixed-points
The time complexity of fix(f) depends on: – the height of the lattice
– the cost of computing f
– the cost of testing equality
x = ⊥; do {
t = x;
x = f(x);
} while (xt);
Implementation: TIP/src/tip/solvers/FixpointSolvers.scala
28
Summary: lattice equations
• Let L be a lattice with finite height
• A equation system is of the form: x1 = f1(x1, …, xn)
x2 = f2(x1, …, xn) …
xn = fn(x1, …, xn)
where xi are variables and each fi: LnL is monotone
• Note that Ln is a product lattice
29
Solving equations
• Everyequationsystemhasauniqueleastsolution, which is the least fixed-point of the function f: LnLn defined by
f(x1,…,xn) = (f1(x1,…,xn), …, fn(x1,…,xn))
• A solution is always a fixed-point (for any kind of equation)
• The least one is the most precise
30
Solving inequations
• A inequation system is of the form
x1 ⊑ f1(x1, …, xn)
x2 ⊑ f2(x1, …, xn) or …
xn ⊑ fn(x1, …, xn)
x1 ⊒ f1(x1, …, xn) x2 ⊒ f2(x1, …, xn) …
xn ⊒ fn(x1, …, xn)
• Canbesolvedbyexploitingthefactsthat x⊑y ⇔ x = x⊓y
and
x⊒y ⇔ x = x⊔y
31
Monotone frameworks
John B. Kam, Jeffrey D. Ullman: Monotone Data Flow Analysis Frameworks. Acta Inf. 7: 305-317 (1977)
• ACFGtobeanalyzed,nodesNodes={v1,v2,…,vn}
• A finite-height lattice L of possible answers
– fixed or parametrized by the given program
• A constraint variable ⟦v⟧L for every CFG node v
• Adataflowconstraintforeachsyntacticconstruct – relates the value of ⟦v⟧ to the variables for other nodes
– typically a node is related to its neighbors
– the constraints must be monotone functions:
⟦vi⟧ = fi(⟦v1⟧, ⟦v2⟧, …, ⟦vn⟧)
32
Monotone frameworks
• Extract all constraints for the CFG
• Solveconstraintsusingthefixed-pointalgorithm:
– we work in the lattice Ln where L is a lattice describing abstract states
– computing the least fixed-point of the combined function: f(x1,…,xn) = (f1(x1,…,xn), …, fn(x1,…,xn))
• This solution gives an answer from L for each CFG node 33
Generating and solving constraints
fixed-point solver
CFG
solution
Conceptually, we separate constraint generation from constraint solving, but in implementations, the two stages are typically interleaved
34
constraints
⟦p⟧ = &int
⟦q⟧ = &int ⟦alloc 0⟧ = &int ⟦x⟧ =
⟦foo⟧ =
⟦&n⟧ = &int ⟦main⟧ = ()->int
Lattice points as answers
the trivial, useless answer safe answers
our answer (the least fixed-point)
unsafe answers
the true answer
Conservative approximation…
35
The naive algorithm
• Correctness ensured by the fixed point theorem
• Does not exploit any special structure of Ln or f
(i.e. xLn and f(x1,…,xn) = (f1(x1,…,xn), …, fn(x1,…,xn)))
x = (⊥, ⊥, …, ⊥); do {
t = x;
x = f(x);
} while (xt);
Implementation: SimpleFixpointSolver
36
Example: sign analysis
ite(n) { var f; f = 1;
while (n>0) {
f = f*n;
n = n-1;
}
return f; }
1 2
3
4
false
5 true
6
7
8
[ [ [ [ [ [ [ [ [ n n n n n n n n n ⊤ ⊤⊥⊤ ⊤ ⊤ ⊤ ⊤ ⊤ , , , , , , , , , f f f f f f f f f ⊥ ⊥ ⊥ ⊥ ⊥ ⊥ ⊥ ⊥ ⊥ ] ] ] ] ] ] ] ] ] [ [ [ [ [ [ [ [ [n n n n n n ⊤ ⊤⊥⊤ ⊤ ⊤ ⊤ ⊤ ⊤, , , , , , f f f f f f f f ⊤ ⊤⊥⊤ ⊤ ⊤ ⊤ ⊤ ⊤] ] ] ] ] ] ]
[ [ [ [ [ [ [ [ [ n n n n n n n n n ⊤ ⊤⊥⊤ ⊤ ⊤ ⊤⊥⊤ , , , , , , , , , f f f f f f f f f + +⊥+ + + + + + ] ] ] ] ] ] ] ] ] [ [ [ [ [ [ [ [ [n n n n n n ⊤⊥ ⊥⊤ ⊤ ⊤ ⊤⊥, , , , , , f f f f f f f f ⊤+⊥+⊤⊥+] ] ] ] ] ]
[ [ [ [ [ [ [ [ [ n n n n n n n n n ⊤⊥ ⊥⊤ ⊤ ⊤ ⊤⊥ ⊥ , , , , , , , , , f f f f f f f f f ⊤⊥ ⊥⊤ ⊤ ⊤ ⊤⊥ ⊥ ] ] ] ] ] ] ] ] ] [ [ [ [ [ [ [ [ [ n n n n n n n n n ⊤⊥ ⊥ ⊥⊤⊥⊤ ⊤⊥ , , , , , , , , , f f f f f f f f f ⊤⊥ ⊥⊤ ⊤⊥ ⊥⊤⊥ ] ] ] ] ] ] ] ] ]
[ [ [ [ [ [ [ [ [ n n n n n n n n n ⊤⊥ ⊥⊤ ⊤ ⊤⊥⊤⊥ , , , , , , , , , f f f f f f f f f ⊤+⊥+ + +⊤⊥ ⊥ ] ] ] ] ] ] ] ] ]
var f
f=1
n>0
f=f*n
n=n-1
return f
[ [ [ [ [ [ [ [ [ n n n n n n n n n ⊤⊥ ⊥ ⊥⊤⊥⊤ ⊤⊥ , , , , , , , , , f f f f f f f f f ⊤⊥ ⊥+ + + +⊥ ⊥ ] ] ] ] ] ] ] ] ] (We shall later see how to improve precision for the loop condition)
37
The naive algorithm
f0(⊥, ⊥, …, ⊥)
f1(⊥, ⊥, …, ⊥)
…
fk(⊥, ⊥, …, ⊥)
1
⊥
f11(⊥, ⊥, …, ⊥)
…
f1k(⊥, ⊥, …, ⊥)
2
⊥
f21(⊥, ⊥, …, ⊥)
…
f2k(⊥, ⊥, …, ⊥)
…
…
…
…
…
n
⊥
fn1(⊥, ⊥, …, ⊥)
…
fnk(⊥, ⊥, …, ⊥)
Computing each new entry is done using the previous column
• Without using the entries in the current column that have already been computed!
• And many entries are likely unchanged from one column to the next!
38
Chaotic iteration
Recall that f(x1,…,xn) = (f1(x1,…,xn), …, fn(x1,…,xn))
x1 = ⊥;…xn = ⊥;
while ((x1,…,xn) ≠ f(x1,…, xn)) {
pick i nondeterministically such that xi ≠ fi(x1, …, xn)
xi = fi(x1, …, xn); }
We now exploit the special structure of Ln
– may require a higher number of iterations,
but less work in each iteration
39
Correctness of chaotic iteration
• Let xj be the value of x=(x1, …, xn) in the j’th iteration of the naive algorithm
• Let xj be the value of x=(x1, …, xn) in the j’th iteration of the chaotic iteration algorithm
• By induction in j, show j: xj ⊑ xj
• Chaotic iteration eventually terminates at a fixed point
• Itmustbeidenticaltotheresultofthenaivealgorithm since that is the least fixed point
40
Towards a practical algorithm
• Computing i:… in chaotic iteration is not practical
• Idea: predict i from the analysis and the structure
of the program!
• Example:
In sign analysis, when we have processed a CFG node v, process succ(v) next
41
The worklist algorithm (1/2)
• Essentially a specialization of chaotic iteration that exploits the special structure of f
• Most right-hand sides of fi are quite sparse:
– constraints on CFG nodes do not involve all others
• Useamap:
dep: Nodes 2Nodes
that for vNodes gives the variables w where v occurs on the right-hand side of the constraint for w
42
The worklist algorithm (2/2)
x1 = ⊥;…xn = ⊥; W = {v1, …, vn}; while (W) {
vi = W.removeNext(); y = fi(x1, …, xn); if (yxi) {
for (vj dep(vi)) W.add(vj);
xi = y; }
}
Implementation: SimpleWorklistFixpointSolver
43
Further improvements
• Representtheworklistasapriorityqueue – find clever heuristics for priorities
• Look at the graph of dependency edges:
– build strongly-connected components
– solve constraints bottom-up in the resulting DAG
44
Transfer functions
• The constraint functions in dataflow analysis usually
have this structure:
⟦ v ⟧ = tv(JOIN(v))
where tv: States States is called the transfer function for v
• Example:
⟦ x = E ⟧ = JOIN(v)[x ↦ eval(JOIN(v),E)]
= tv(JOIN(v))
where
tv(s) = s[x ↦ eval(s,E)]
v
w1 …wn tv
45
Sign Analysis, continued…
• Another improvement of the worklist algorithm:
– only add the entry node to the worklist initially
– then let dataflow propagate through the program according to the constraints…
• Now, what if the constraint rule for variable declarations was: ⟦varx1,…,xn ⟧ = JOIN(v)[x1 ↦ ⊥, …, xn ↦ ⊥]
(would make sense if we treat “uninitialized” as “no value” instead of “any value”)
• Problem: iteration would stop before the fixpoint!
• Solution: replace Vars Sign by lift(Vars Sign)
(allows us to distinguish between “unreachable” and “all variables are non-integers”)
• This trick is also useful for context-sensitive analysis! (later…)
Implementation: WorklistFixpointSolverWithReachability, MapLiftLatticeSolver
46