Static Program Analysis
Part 3 – lattices and fixpoints
http://cs.au.dk/~amoeller/spa/
øller & . Schwartzbach Computer Science, Aarhus University
Copyright By PowCoder代写 加微信 powcoder
Flow-sensitivity
• Typecheckingis(usually)flow-insensitive:
– statements may be permuted without affecting typability – constraints are naturally generated from AST nodes
• Other analyses must be flow-sensitive:
– the order of statements affects the results
– constraints are naturally generated from control flow graph nodes
Sign analysis
• Determinethesign(+,-,0⊤)ofallexpressions • TheSignlattice:
“any number”
+ ⊥- 0 “not of type number”
(or, “unreachable code”)
• StatesaremodeledbythemaplatticeVars→Sign where Vars is the set of variables in the program
Implementation: TIP/src/tip/analysis/SignAnalysis.scala
Generating constraints
3 b=a+input; 4 a=a-b;
x = [a ↦⊤,b ↦⊤]
x1 = x [a ↦ +]
x2 = x1[b ↦ x (a)+⊤] 322
x = x [a ↦ x (a)-x (b)] 4333
b = a + input
Sign analysis constraints
• The variable ⟦v⟧ denotes a map that gives the sign value for all variables at the program point after CFG node v
• For assignments:
⟦ x = E ⟧ = JOIN(v)[x ↦ eval(JOIN(v),E)]
• For variable declarations:
⟦varx ,…,x ⟧ = JOIN(v)[x ↦⊤, …, x ↦⊤]
1n1n • For all other nodes:
⟦v⟧ = JOIN(v)
where JOIN(v) = ⨆ ⟦w⟧ w∈pred(v)
combines information from predecessors (explained later…)
Evaluating signs
• Theevalfunctionisanabstractevaluation: – eval(σ,x) = σ(x)
– eval(σ,intconst) = sign(intconst)
– eval(σ, E1 op E2) = op(eval(σ,E1),eval(σ,E2))
• σ:Vars→Signisanabstractstate
• Thesignfunctiongivesthesignofaninteger
• The op function is an abstract evaluation of the given operator op
Abstract operators
+⊤+⊤0 -⊤⊤ ⊤-⊤ ⊤ ++⊤+⊤+++⊤⊤+-+⊤ ⊤⊤⊤⊤⊤⊤⊤⊤⊤⊤⊤⊤⊤⊤
00⊤ +⊤ 0⊤ ⊤⊤⊤ ⊤0⊤ ⊤0⊤ + ⊤⊤⊤ + ++⊤⊤ + 00⊤⊤ ⊤ ⊤⊤⊤ ⊤ ⊤⊤⊤⊤ ⊤ ⊤⊤⊤⊤
(assuming the subset of TIP with only integer values)
Increasing precision
• Somelossofinformation:
– (2>0)==1 is analyzed as ⊤
– +/+ is analyzed as ⊤, since e.g. 1⁄2 is rounded down
• Usearicherlatticeforbetterprecision: ⊤
• Abstractoperatorsarenow8×8tables
Partial orders
• GivenasetS,apartialorder⊑isabinaryrelationonS that satisfies:
– reflexivity: ∀x∈S: x ⊑ x
– transitivity: ∀x,y,z∈S: x ⊑ y ∧ y ⊑ z ⇒ x ⊑ z – anti-symmetry: ∀x,y∈S: x ⊑ y ∧ y ⊑ x ⇒ x = y
• Can be illustrated by a Hasse diagram (if finite) ⊤ +-0
Upper and lower bounds
• Let X ⊆ S be a subset
• Wesaythaty∈Sisanupperbound(X⊑y)when
∀ x∈X: x ⊑ y
• Wesaythaty∈Sisalowerbound(y⊑X)when
∀ x∈X: y ⊑ x
• Aleastupperbound⨆Xisdefinedby
X ⊑ ⨆X ∧ ∀y∈S: X ⊑ y ⇒ ⨆X ⊑ y
• Agreatestlowerbound⨅Xisdefinedby ⨅X ⊑ X ∧ ∀y∈S: y ⊑ X ⇒ y ⊑ ⨅X
• A complete lattice must have
– auniquelargestelement,⊤=⨆S – auniquesmallestelement,⊥=⨅S
(exercise)
• A lattice is a partial order where
x⊔y and x⊓y exist for all x,y∈S (x⊔y is notation for ⨆{x,y})
• A complete lattice is a partial order where ⨆X and ⨅X exist for all X ⊆ S
• A finite lattice is complete if ⊤ and ⊥ exist
Implementation: TIP/src/tip/lattices/
These partial orders are lattices
These partial orders are not lattices
The powerset lattice
• EveryfinitesetAdefinesacompletelattice(P(A),⊆) where
–⊥=∅ –⊤=A –x⊔y=x∪y –x⊓y=x∩y
{0,1,2} {0,1,3}
{0,1} {0,2} {0,3}
{0,2,3} {1,2,3}
{1,2} {1,3}
for A = {0,1,2,3}
Lattice height
• The height of a lattice is the length of the longest path from ⊥ to ⊤
• Thelattice(P(A),⊆)hasheight|A| {0,1,2,3}
{0,1,2} {0,1,3}
{0,2} {0,3}
{0,2,3} {1,2,3}
{1,2} {1,3}
for A = {0,1,2,3}
Map lattice
• IfAisasetandLisacompletelattice,thenweobtain a complete lattice called a map lattice:
A → L = { [a1↦x1, a2↦x2, …] | A={a1, a2, …} ∧ x1, x2 ,… ∈ L } ordered pointwise
Example: A → L where
• A is the set of program variables
• L is the Sign lattice
• ⊔and⊓canbecomputedpointwise • height(A→L)=|A|⋅height(L)
Product lattice
• If L1, L2, …, Ln are complete lattices, then so is the product:
L ×L ×…×L ={(x ,x ,…,x )|x ∈L } 12n12nii
where ⊑ is defined pointwise
• Note that ⊔ and ⊓ can be computed pointwise
• height(L1×L2×…×Ln)=height(L1)+…+height(Ln)
each Li is the map lattice A → L from the previous slide, and n is the number of CFG nodes
Flat lattice
• IfAisaset,then⊤flat(A)isacompletelattice: a1 a2 … an
• height(flat(A))=⊥2
Lift lattice
• If L is a complete lattice, then so is lift(L), which is:
• height(lift(L))=height(L⊥)+1
Sign analysis constraints, revisited
• The variable ⟦v⟧ denotes a map that gives the sign value for all variables at the program point after CFG node v
• ⟦v⟧∈States where States = Vars → Sign
• For assignments:
⟦ x = E ⟧ = JOIN(v)[x ↦ eval(JOIN(v),E)] • For variable declarations:
⟦varx ,…,x ⟧ = JOIN(v)[x ↦⊤, …, x ↦⊤] 1n1n
• For all other nodes: ⟦v⟧ = JOIN(v)
where JOIN(v) = ⨆ ⟦w⟧ w∈pred(v)
combines information from predecessors
var a,b,c;
if (input) {
c = a + b;
Generating constraints
c = a – b;⟦entry⟧ = ⊥
⟦var a,b,c⟧ = ⟦entry⟧[a ↦⊤,b ↦⊤,c ↦⊤]
using l.u.b.
⟦a = 42⟧ = ⟦var a,b,c⟧[a ↦ +]
⟦b = 87⟧ = ⟦a = 42⟧[b ↦ +]
⟦input⟧ = ⟦b = 87⟧
⟦c = a + b⟧ = ⟦input⟧[c ↦ ⟦input⟧(a)+⟦input⟧(b)] ⟦c = a – b⟧ = ⟦input⟧[c ↦ ⟦input⟧(a)-⟦input⟧(b)] ⟦exit⟧ = ⟦c = a + b⟧ ⊔ ⟦c = a – b⟧
Constraints
• Fromtheprogrambeinganalyzed,wehaveconstraint variables x1, …, xn∈L and a collection of constraints:
x1 = f1(x1, …, xn) x2 = f2(x1, …, xn) …
xn = fn(x1, …, xn)
Note that Ln is a product lattice
• Thesecanbecollectedintoasinglefunctionf:Ln→Ln: f(x1,…,xn) = (f1(x1,…,xn), …, fn(x1,…,xn))
• How do we find the least (i.e. most precise) value of x1,…,xn such that (x1,…,xn) = f(x1,…,xn) (if that exists)???
Monotone functions
• Afunctionf:L→Lismonotonewhen ∀x,y ∈ L: x ⊑ y ⇒ f(x) ⊑ f(y)
• Afunctionwithseveralargumentsismonotoneif it is monotone in each argument
• Asfunctions,⊔and⊓arebothmonotone
• Monotonefunctionsareclosedundercomposition
• x ⊑ y can be interpreted as “x is at least as precise as y”
• Whenfismonotone:
“more precise input cannot lead to less precise output”
(exercises)
Monotonicity for the sign analysis
Example, constraints for assignments: ⟦ x = E ⟧ = JOIN(v)[x↦eval(JOIN(v),E)]
• The⊔operatorandmap updates are monotone
• Compositionspreserve monotonicity
• Aretheabstractoperators monotone?
• Canbeverifiedbyatediousinspection: – ∀x,y,x’∈L: x ⊑ x’ ⇒ xopy ⊑ x’opy
– ∀x,y,y’∈L: y ⊑ y’ ⇒ xopy ⊑ xopy’
(exercises)
Kleene’s fixed-point theorem
x ∈ L is a fixed point of f: L → L iff f(x)=x
In a complete lattice with finite height, every monotone function f has a unique least fixed-point:
lfp(f) = ⨆ fi(⊥) i ≥0
Proof of existence
• Clearly,⊥⊑f(⊥)
• Since f is monotone, we also have f(⊥) ⊑ f (⊥)
• By induction, f (⊥) ⊑ f (⊥)
• Thismeansthat
⊥⊑f(⊥)⊑f (⊥)⊑… f(⊥)…
is an increasing chain
• Lhasfiniteheight,soforsomek: f (⊥)=f (⊥)
• If x ⊑ y then x ⊔ y = y (exercise)
• Solfp(f)=fk(⊥)
Proof of unique least
• Clearly,⊥⊑x
• Byinductionandmonotonicity,f(⊥)⊑f(x)=x
• Assume that x is another fixed-point: x = f(x)
• Inparticular,lfp(f)=f(⊥)⊑x,i.e.lfp(f)isleast
• Uniquenessthenfollowsfromanti-symmetry
Computing fixed-points
The time complexity of lfp(f) depends on: – the height of the lattice
– the cost of computing f
– the cost of testing equality
x = ⊥; do {
} while (x≠t);
Implementation: TIP/src/tip/solvers/FixpointSolvers.scala
Summary: lattice equations
• Let L be a complete lattice with finite height
• An equation system is of the form: x1 = f1(x1, …, xn)
x2 = f2(x1, …, xn) …
xn = fn(x1, …, xn)
where xi are variables and each fi: Ln→L is monotone
• Note that Ln is a product lattice
Solving equations
• Everyequationsystemhasauniqueleastsolution, which is the least fixed-point of the function f: Ln→Ln defined by
f(x1,…,xn) = (f1(x1,…,xn), …, fn(x1,…,xn))
• Asolutionisalwaysafixed-point (for any kind of equation)
• The least one is the most precise
Solving inequations
x ⊑f(x,…,x) x ⊒f(x,…,x)
• An inequation system is of the form
x1 ⊑ f1(x1, …, xn) x1 ⊒ f1(x1, …, xn) 221n or221n
x ⊑f(x,…,x) x ⊒f(x,…,x) nn1n nn1n
• Canbesolvedbyexploitingthefactsthat x⊑y ⇔ x=x⊓y
andx⊒y ⇔ x=x⊔y
Monotone frameworks
. Kam, . Ullman: Monotone Data Flow Analysis Frameworks. Acta Inf. 7: 305-317 (1977)
• ACFGtobeanalyzed,nodesNodes={v1,v2,…,vn}
• Afinite-heightcompletelatticeLofpossibleanswers
– fixed or parametrized by the given program
• Aconstraintvariable⟦v⟧∈LforeveryCFGnodev
• Adataflowconstraintforeachsyntacticconstruct – relates the value of ⟦v⟧ to the variables for other nodes
– typically a node is related to its neighbors
– the constraints must be monotone functions:
⟦v⟧=f(⟦v ⟧,⟦v ⟧,…,⟦v ⟧) ii12n
Monotone frameworks
• ExtractallconstraintsfortheCFG
• Solveconstraintsusingthefixed-pointalgorithm:
– we work in the lattice Ln where L is a lattice describing abstract states
– computing the least fixed-point of the combined function: f(x1,…,xn) = (f1(x1,…,xn), …, fn(x1,…,xn))
• ThissolutiongivesananswerfromLforeachCFGnode 33
Generating and solving constraints
fixed-point solver
Conceptually, we separate constraint generation from constraint solving, but in implementations, the two stages are typically interleaved
constraints
⟦p⟧ = &int
⟦q⟧ = &int ⟦alloc 0⟧ = &int ⟦x⟧ = φ
⟦&n⟧ = &int ⟦main⟧ = ()->int
Lattice points as answers
the trivial, useless answer
safe answers
our answer (the least fixed-point)
unsafe answers
the true answer
Conservative approximation…
The naive algorithm
x = (⊥, ⊥, …, ⊥); do {
} while (x≠t);
• Correctnessensuredbythefixedpointtheorem • Does not exploit any special structure of Ln or f
(i.e. x∈Ln and f(x1,…,xn) = (f1(x1,…,xn), …, fn(x1,…,xn)))
Implementation: SimpleFixpointSolver
Example: sign analysis
[n → ⊤⊥⊤, f → ⊥] [n → ⊤⊥⊤, f → ⊤⊥⊤]
while (n>0) {
[n → ⊤⊥⊤⊥⊤, f → +⊥+] [n → ⊤⊥⊤⊥⊤⊥, f → ⊥⊤+⊤+⊥]
[n → ⊤⊥⊤⊥, f → ⊤⊥⊤⊥] [n → ⊥⊤⊥⊤⊥⊤⊥, f → ⊥⊤⊥⊤⊥]
return f; }
[n → ⊤⊥⊤⊥, f → +⊥⊤⊥+⊤⊥]
[n → ⊥⊤⊥⊤⊥⊤⊥, f → ⊥⊤+] (We shall later see how to improve precision for the loop condition)
The naive algorithm
f (⊥,⊥,…,⊥) f (⊥,⊥,…,⊥) 0 ⊥ f1(⊥, ⊥, …, ⊥) ⊥ f1(⊥, ⊥, …, ⊥)
f (⊥,⊥,…,⊥) k …
⊥ fn(⊥, ⊥, …, ⊥)
Computing each new entry is done using the previous column
• Without using the entries in the current column that have already been computed!
• And many entries are likely unchanged from one column to the next!
Chaotic iteration
Recall that f(x1,…,xn) = (f1(x1,…,xn), …, fn(x1,…,xn))
x = ⊥; … x = ⊥; 1n
while ((x ,…,x ) ≠ f(x ,…, x )) { 1n1n
pick i nondeterministically such that x ≠ f(x, …, x)
ii1n xi = fi(x1, …, xn);
We now exploit the special structure of Ln
– may require a higher number of iterations,
but less work in each iteration
Correctness of chaotic iteration
• Let xj be the value of x=(x1, …, xn) in the j’th iteration of the naive algorithm
• Let xj be the value of x=(x , …, x ) in the j’th iteration 1n
• Byinductioninj,show∀j:x ⊑x
of the chaotic iteration algorithm jj
• Chaoticiterationeventuallyterminatesatafixedpoint
• It must be identical to the result of the naive algorithm since that is the least fixed point
Towards a practical algorithm
• Computing∃i:…inchaoticiterationisnotpractical
• Idea:predictifromtheanalysisandthestructure
of the program!
• Example:
In sign analysis, when we have processed a CFG node v, process succ(v) next
The worklist algorithm (1/2)
• Essentially a specialization of chaotic iteration that exploits the special structure of f
• Most right-hand sides of fi are quite sparse: – constraintsonCFGnodesdonotinvolveallothers
• Use a map:
dep: Nodes → 2Nodes
that for v∈Nodes gives the set of nodes (i.e. constraint variables) w where v occurs on the right-hand side of the constraint for w
The worklist algorithm (2/2)
x1 = ⊥;…xn = ⊥; W = {v1, …, vn}; while (W≠∅) {
vi = W.removeNext(); y = fi(x1, …, xn); if (y≠xi) {
for (vj ∈ dep(vi)) W.add(vj);
Implementation: SimpleWorklistFixpointSolver
Further improvements
• Representtheworklistasapriorityqueue – find clever heuristics for priorities
• Look at the graph of dependency edges:
– build strongly-connected components
– solve constraints bottom-up in the resulting DAG
Transfer functions
• Theconstraintfunctionsindataflowanalysisusually
have this structure:
⟦ v ⟧ = t (JOIN(v))
w1 …wn tv
where tv: States → States is called
the transfer function for v • Example:
⟦ x = E ⟧ = JOIN(v)[x ↦ eval(JOIN(v),E)]
= tv(JOIN(v))
t (s) = s[x ↦ eval(s,E)] v
Sign Analysis, continued…
• Another improvement of the worklist algorithm:
– only add the entry node to the worklist initially
– then let dataflow propagate through the program according to the constraints…
• Now, what if the constraint rule for variable declarations was: ⟦varx ,…,x ⟧ = JOIN(v)[x ↦ ⊥, …, x ↦ ⊥]
(would make sense if we treat “uninitialized” as “no value” instead of “any value”)
• Problem: iteration would stop before the fixpoint!
• Solution: replace Vars → Sign by lift(Vars → Sign)
(allows us to distinguish between “unreachable” and “all variables are non-integers”)
• This trick is also useful for context-sensitive analysis! (later…)
Implementation: WorklistFixpointSolverWithReachability, MapLiftLatticeSolver
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com