Static Program Analysis
Part 7 – interprocedural analysis
http://cs.au.dk/~amoeller/spa/
Anders Møller & Michael I. Schwartzbach Computer Science, Aarhus University
Interprocedural analysis
• Analyzing the body of a single function: – intraprocedural analysis
• Analyzingthewholeprogramwithfunctioncalls: – interprocedural analysis
• For now, we consider TIP without function pointers and indirect calls
• A naive approach:
– analyze each function in isolation
– be maximally pessimistic about results of function calls – rarely sufficient precision…
2
CFG for whole programs
The idea:
• construct a CFG for each function
• thengluethemtogethertoreflectfunctioncalls and returns
We need to take care of:
• parameter passing
• returnvalues
• values of local variables across calls (including recursive functions, so not enough to assume unique variable names)
3
A simplifying assumption
• Assume that all function calls are of the form X = f(E1, …, En);
• This can always be obtained by normalization
4
Interprocedural CFGs (1/3)
Split each original call node
X = f(E1, …, En)
into two nodes:
a special edge that connects the call node with its after-call node
the “call node”
the “after-call node”
⬚ = f(E1, …, En)
X=⬚
5
Interprocedural CFGs (2/3)
Change each return node
into an assignment:
return E
result = E
(where result is a fresh variable)
6
Interprocedural CFGs (3/3)
Add call edges and return edges:
function g(a1, …, am)
function f(b1, …, bn)
⬚ = f(E1, …, En)
X=⬚
result = E
7
Constraints
• For call/entry nodes:
– be careful to model evaluation of all the actual parameters before binding them to the formal parameter names (otherwise, it may fail for recursive functions)
• For after-call/exit nodes:
– like an assignment: X = result
– but also restore local variables from before the call using the call↷after-call edge
• The details depend on the specific analysis…
8
Example: interprocedural sign analysis
• Recall the intraprocedural sign analysis…
• Lattice for abstract values:
⊤
+-0
⊥
• Lattice for abstract states: Vars Sign
Sign =
9
Example: interprocedural sign analysis
• Constraint for entry node v of function f(b1,…, bn): ww
⟦v⟧ = ⨆ ⊥[b eval(⟦w⟧,E ), …, b eval(⟦w⟧,E )] 11nn
wpred(v)
where E w is i’th argument at w i
• Constraint for after-call node v labeled X = ⬚, with call node v’:
⟦v⟧ = ⟦v’⟧[X⟦w⟧(result)] where wpred(v)
(Recall: no global variables, no heap, and no higher-order functions)
function f(b1, …, bn)
⬚ = f(E1, …, En)
X=⬚
result = E
10
Alternative formulations
w1 …wn
v
2) wsucc(v): tv(⟦v⟧) ⊑ ⟦w⟧
– recall ”solving inequations”
– may require fewer join operations
if there are many CFG edges
– more suitable for interprocedural flow
1) ⟦v⟧ = tv(⨆ ⟦w⟧) wpred(v)
tv
v
w1 …wn
tv
11
The worklist algorithm (original version)
w1 …wn tv
x1 = ⊥; … xn = ⊥ W = {v1, …, vn} while (W) {
vi = W.removeNext() y = fi(x1, …, xn) if (yxi) {
for (vj dep(vi)) { W.add(vj)
}
xi = y }
v
}
12
The worklist algorithm (alternative version)
v
w1 …wn
tv
13
x1 = ⊥; … xn = ⊥ W = {v1, …, vn} while (W) {
vi = W.removeNext() y = ti(xi)
for (vj dep(vi)) {
propagate(y,vj) }
}
propagate(y,vj) { z = xj ⊔ y
if (zxj) {
xj = z W.add(vj)
}
Implementation: WorklistFixpointPropagationSolver }
Agenda
• Interprocedural analysis
• Context-sensitive interprocedural analysis
14
Interprocedurally invalid paths
⬚ = f(E1, …, En)
⬚ = f(E’1, …, E’n)
X=⬚
X’ = ⬚
15
Example
What is the sign of the return value of g?
f(z) {
return z*42;
}
g() {
var x,y;
x = f(0);
y = f(87); return x + y;
}
Our current analysis says “⊤”
16
Function cloning (alternatively, function inlining)
• Clone functions such that each function has only one callee
• Can avoid interprocedurally invalid paths
• For high nesting depths, gives exponential blow-up
• Doesn’t work on (mutually) recursive functions
• Use heuristics to determine when to apply (trade-off between CFG size and precision)
17
Example, with cloning
What is the sign of the return value of g?
f1(z1) { return z1*42;
}
f2(z2) { return z2*42;
}
g() {
var x,y;
x = f1(0);
y = f2(87); return x + y;
} 18
Context sensitive analysis
• Function cloning provides a kind of context sensitivity (also called polyvariant analysis)
• Instead of physically copying the function CFGs, do it logically
• Replace the lattice for abstract states, States, by
where Contexts is a set of call contexts
– the contexts are abstractions of the state at function entry
– Contexts must be finite to ensure finite height of the lattice – the bottom element of lift(States) represents
“unreachable” contexts
• Different strategies for choosing the set Contexts…
19
Contexts → lift(States)
One-level cloning
• Let c1,…,cn be the call nodes in the program
• DefineContexts={c1,…,cn}{ε}
– each call node now defines its own “call context”
(using ε to represent the call context at the main function)
– the context is then like the return address of the top-most stack frame in the call stack
• Same effect as one-level cloning, but without actually copying the function CFGs
• Usually straightforward to generalize the constraints for a context insensitive analysis to this lattice
• (Example: context-sensitive sign analysis – later…)
20
The call string approach
• Let c1,…,cn be the call nodes in the program
• Define Contexts as the set of strings over {c1,…,cn} of length k
– such a string represents the top-most k call locations on the call stack
– the empty string ε again represents the call context at the main function
• For k=1 this amounts to one-level cloning
Implementation: CallStringSignAnalysis
21
Example:
interprocedural sign analysis with call strings (k=1)
Lattice for abstract states: where Contexts={ε,c1,c2}
f(z) {
var t1,t2; t1 = z*6; t2 = t1*7; return t2;
}
…
x = f(0); // c1 y = f(87); // c2 …
Contexts → lift(Vars → Sign)
[ε ↦ unreachable,
c1 ↦ ⊥[z↦0, t1↦0, t2↦0],
c2 ↦ ⊥[z↦+, t1↦+, t2↦+]]
What is an example program that requires k=2
to avoid loss of precision?
22
Context sensitivity with call strings function entry nodes, for k=1
Constraint for entry node v of function f(b1,…, bn): (if not ‘main’)
function f(b1, …, bn) v
if ⟦w⟧(c’) = unreachable
⊥[b1eval(⟦w⟧(c’),E1), …, bneval(⟦w⟧(c’),En)] otherwise 23
⟦v⟧(c)= ⨆ sc’
wpred(v) ∧ c= w ∧ c’∈ Contexts
w
w
c’
unreachable
s=ww
w
⬚ = f(E1, …, En)
X=⬚
result = E
Context sensitivity with call strings after-call nodes, for k=1
Constraint for after-call node v labeled X = ⬚, with call node v’ and exit node wpred(v):
unreachable if ⟦v’⟧(c) = unreachable ∨ ⟦w⟧(v’) = unreachable
⟦v⟧(c) =
⟦v’⟧(c)[X⟦w⟧(v’)(result)] v’
otherwise
function f(b1, …, bn)
w
⬚ = f(E1, …, En)
X=⬚
v
result = E
24
The functional approach
• The call string approach considers control flow
– but why distinguish between two different call sites if
their abstract states are the same?
• The functional approach instead considers data
• In the most general form, choose Contexts = States
(requires States to be finite)
• Each element of the lattice States → lift(States)
is now a map m that provides an element m(x) from States (or “unreachable”) for each possible x
where x describes the state at function entry
25
Example:
interprocedural sign analysis with the functional approach
Lattice for abstract states: where Contexts = Vars → Sign
f(z) {
var t1,t2; t1 = z*6; t2 = t1*7; return t2;
}
…
x = f(0); y = f(87); …
Contexts → lift(Vars → Sign)
[⊥[z↦0] ↦ ⊥[z↦0, t1↦0, t2↦0], ⊥[z↦+] ↦ ⊥[z↦+, t1↦+, t2↦+],
all other contexts ↦ unreachable ]
26
The functional approach
• The lattice element for a function exit node is thus a function summary that maps abstract function input to abstract function output
• This can be exploited at call nodes!
• When entering a function with abstract state x:
– consider the function summary s for that function
– if s(x) already has been computed, use that to model the entire function body, then proceed directly to the after-call node
• Avoidstheproblemwithinterprocedurallyinvalidpaths!
• …but may be expensive if States is large
Implementation: FunctionalSignAnalysis
27
Context sensitivity with the
functional approach function entry nodes
Constraint for entry node v of function f(b1,…, bn): (if not ‘main’)
⟦v⟧(c) = ⨆ s c’ function f(b1, …, bn)
wv
wpred(v) ∧ c=sc’ ∧
c’∈ Contexts
where s c’ is defined as before w
w
w
⬚ = f(E1, …, En)
X=⬚
result = E
28
Context sensitivity with the
functional approach after-call nodes
Constraint for after-call node v labeled X = ⬚, with call node v’ and exit node wpred(v):
c
⟦v⟧(c) =
unreachable if ⟦v’⟧(c) = unreachable ∨ ⟦w⟧(sv’) = unreachable
⟦v’⟧(c)[X⟦w⟧(sc )(result)] v’
v’ v
otherwise
function f(b1, …, bn)
⬚ = f(E1, …, En)
w
X=⬚
result = E
29