程序代写代做代考 Java go graph C Static Program Analysis

Static Program Analysis
Part 9 – pointer analysis
http://cs.au.dk/~amoeller/spa/
Anders Møller & Michael I. Schwartzbach Computer Science, Aarhus University

Agenda
• Introduction to points-to analysis • Andersen’s analysis
• Steensgaards’s analysis
• Interprocedural points-to analysis • Null pointer analysis
• Flow-sensitive points-to analysis
2

Analyzing programs with pointers
How do we perform e.g. constant propagation analysis when the programming language has pointers?
(or object references?)

*x = 42;
*y = -87;
z = *x;
// is z 42 or -87?
E  &X
| alloc E | *E
| null |…
S  *X=E; |…
3

Heap pointers
• For simplicity, we ignore records
– alloc then only allocates a single cell
– only linear structures can be built in the heap
• Let’s at first also ignore functions as values
• Westillhavemanyinterestinganalysischallenges…
x
y
z
4

Pointer targets
• The fundamental question about pointers: What cells can they point to?
• Weneedasuitableabstraction
• The set of (abstract) cells, Cells, contains – alloc-i for each allocation site with index i – X for each program variable named X
• Thisiscalledallocationsiteabstraction
• Eachabstractcellmaycorrespondtomany concrete memory cells at runtime
5

Points-to analysis
• DetermineforeachpointervariableXtheset
pt(X) of the cells X may point to
• Aconservative(“maypoints-to”)analysis:

*x = 42;
*y = -87;
z = *x;
// is z 42 or -87?
• We’llfocusonflow-insensitiveanalyses:
– take place on the AST
– before or together with the control-flow analysis
– the set may be too large
– can show absence of aliasing: pt(X)  pt(Y) = 
6

Obtaining points-to information
• Analmost-trivialanalysis(calledaddress-taken):
– include all alloc-i cells
– include the X cell if the expression &X occurs in the program
• Improvement for a typed language:
– eliminate those cells whose types do not match
• Thisissometimesgoodenough – and clearly very fast to compute
7

Pointer normalization
• Assume that all pointer usage is normalized:
• X = alloc P
• X=&Y
• X=Y
• X=*Y
• *X=Y
• X=null
where P is null or an integer constant
• Simplyintroducelotsoftemporaryvariables…
• Allsub-expressionsarenownamed
• We choose to ignore the fact that the cells created at variable declarations are uninitialized
8

Agenda
• Introduction to points-to analysis • Andersen’s analysis
• Steensgaards’s analysis
• Interprocedural points-to analysis • Null pointer analysis
• Flow-sensitive points-to analysis
9

Andersen’s analysis (1/2)
• For every cell c, introduce a constraint variable ⟦c⟧ ranging over sets of cells, i.e. ⟦∙⟧: Cells → 2Cells
• Generateconstraints:
• X = alloc P:
• X=&Y:
• X=Y:
• X=*Y:
• *X=Y:
• X = null:
alloc-i  ⟦X⟧
Y  ⟦X⟧
⟦Y⟧  ⟦X⟧
c  ⟦Y⟧  ⟦c⟧  ⟦X⟧ for each cCells c  ⟦X⟧  ⟦Y⟧  ⟦c⟧ for each cCells (no constraints)
10

Andersen’s analysis (2/2)
• The points-to map is defined as: pt(X) = ⟦X⟧
• Theconstraintsfitintothecubicframework
• UniqueminimalsolutionintimeO(n3)
• Inpractice,forJava:O(n2)
• The analysis is flow-insensitive but directional
– models the direction of the flow of values in assignments
11

Example program
var p,q,x,y,z; p = alloc null; x = y;
x = z;
*p = z;
p = q;
q = &y;
x = *p;
p = &z;
Cells = {p, q, x, y, z, alloc-1}
12

Applying Andersen
• Generatedconstraints:
• Smallestsolution:
pt(p) = { alloc-1, y, z }
pt(q) = {y}
alloc-1  ⟦p⟧
⟦y⟧  ⟦x⟧
⟦z⟧  ⟦x⟧
c  ⟦p⟧  ⟦z⟧  ⟦α⟧ for each cCells ⟦q⟧  ⟦p⟧
y⟦q⟧
c  ⟦p⟧  ⟦α⟧  ⟦x⟧ for each cCells z⟦p⟧
13

Agenda
• Introduction to points-to analysis • Andersen’s analysis
• Steensgaards’s analysis
• Interprocedural points-to analysis • Null pointer analysis
• Flow-sensitive points-to analysis
14

Steensgaard’s analysis
• Viewassignmentsasbeingbidirectional • Generateconstraints:
• X = alloc
• X=&Y:
• X=Y:
• X=*Y:
• *X=Y:
• Extraconstraints:
alloc-i  ⟦X⟧
Y⟦X⟧
⟦X⟧ = ⟦Y⟧
c  ⟦Y⟧  ⟦c⟧ = ⟦X⟧ for each cCells c  ⟦X⟧  ⟦Y⟧ = ⟦c⟧ for each cCells
P:
c1,c2⟦c⟧⟦c1⟧=⟦c2⟧ and ⟦c1⟧∩⟦c2⟧≠⟦c1⟧=⟦c2⟧ (whenever a cell may point to two cells, they are essentially merged into one)
• Steensgaard’s original formulation uses conditional unification for X = Y: c  ⟦Y⟧  ⟦X⟧ = ⟦Y⟧ for each cCells (avoids unifying if Y is never a pointer)
15

Steensgaard’s analysis
• Reformulate as term unification
• Generateconstraints: • X=alloc P:
• X=&Y:
• X=Y:
• X=*Y:
• *X=Y:
⟦X⟧ = &⟦alloc-i⟧
⟦X⟧ = &⟦Y⟧
⟦X⟧ = ⟦Y⟧
⟦Y⟧=&α  ⟦X⟧=α whereαisfresh ⟦X⟧=&α  ⟦Y⟧=α whereαisfresh
• Terms:
– term variables, e.g. ⟦X⟧, ⟦alloc-i⟧, α (each representing the possible values of a cell)
– a single (unary) term constructor &t (representing pointers)
– ⟦X⟧ is now a term variable, not a constraint variable holding a set of cells
• Fitswithourunificationsolver!(union-find…)
• The points-to map is defined as pt(X) = { cCells | ⟦X⟧ = &⟦c⟧ }
• Note that there is only one kind of term constructor, so unification never fails16

Applying Steensgaard
• Generatedconstraints:
• Smallestsolution:
pt(p) = { alloc-1, y, z } pt(q) = { alloc-1, y, z }
alloc-1  ⟦p⟧ ⟦y⟧ = ⟦x⟧
⟦z⟧ = ⟦x⟧
α  ⟦p⟧  ⟦z⟧ = ⟦α⟧ ⟦q⟧ = ⟦p⟧
y⟦q⟧
α  ⟦p⟧  ⟦α⟧ = ⟦x⟧ z⟦p⟧
+ the extra constraints
17

a1 = &b1;
b1 = &c1;
c1 = &d1;
a2 = &b2;
b2 = &c2;
c2 = &d2;
b1 = &c2;
Another example
Andersen:
a1 b1 c1 d1 a2 b2 c2 d2
Steensgaard:
a1 b1 c1 d1 a2 b2 c2 d2
18

Recall our type analysis…
• Focusing on pointers…
• Constraints:
• X=allocP:
• X=&Y: • X=Y:
• X=*Y: • *X=Y:
⟦X⟧=&⟦P⟧ ⟦X⟧=&⟦Y⟧ ⟦X⟧=⟦Y⟧ &⟦X⟧=⟦Y⟧ ⟦X⟧=&⟦Y⟧
• Implicit extra constraint for term equality: &t1 =&t2 t1 =t2
• Assuming the program type checks, is the solution for pointers the same as for Steensgaard’s analysis?
20

Agenda
• Introduction to points-to analysis • Andersen’s analysis
• Steensgaards’s analysis
• Interprocedural points-to analysis • Null pointer analysis
• Flow-sensitive points-to analysis
21

Interprocedural points-to analysis
• In TIP, function values and pointers may be mixed together:
(***x)(1,2,3)
• In this case the CFA and the points-to analysis must happen simultaneously!
• The idea: Treat function values as a kind of pointers
22

Function call normalization
• Assume that all function calls are of the form x = y(a1, …, an)
• y may be a variable whose value is a function pointer
• Assume that all return statements are of the form
return z;
• As usual, simply introduce lots of temporary variables…
• IncludeallfunctionnamesinCells
23

CFA with Andersen
• For the function call
x = y(a1, …, an)
and every occurrence of
f(x1, …, xn) { … return z; }
add these constraints:
f  ⟦f⟧
f  ⟦y⟧  (⟦ai⟧  ⟦xi⟧ for i=1,…,n  ⟦z⟧  ⟦x⟧)
• (Similarly for simple function calls)
• Fitsdirectlyintothecubicframework!
24

CFA with Steensgaard
• For the function call
x = y(a1, …, an)
and every occurrence of
f(x1, …, xn) { … return z; }
add these constraints:
f  ⟦f⟧
f  ⟦y⟧  (⟦ai⟧ = ⟦xi⟧ for i=1,…,n  ⟦z⟧ = ⟦x⟧)
• (Similarly for simple function calls)
• Fits into the unification framework, but requires a generalization of the ordinary union-find solver
25

Context-sensitive pointer analysis
• Generalize the abstract domain Cells → 2Cells to Contexts → Cells → 2Cells
(or equivalently: Cells × Contexts → 2Cells) where Contexts is a (finite) set of call contexts
• Asusual,manypossiblechoicesofContexts
– recall the call string approach and the functional approach
• We can also track the set of reachable contexts for each function (like the use of lifted lattices earlier):
Contexts → lift(Cells → 2Cells)
• Does this still fit into the cubic solver?
26

Context-sensitive pointer analysis
foo(a) {
return *a;
}
bar() { …
x = alloc null; // alloc-1 y = alloc null; // alloc-2 *x = alloc null; // alloc-3 *y = alloc null; // alloc-4 …
q = foo(x); w = foo(y); …
}
Are q and w aliases?
27

Context-sensitive pointer analysis
mk() {
return alloc null; // alloc-1
}
baz() {
var x,y;
x = mk();
y = mk();

}
Are x and y aliases?
28

Context-sensitive pointer analysis
• We can go one step further and introduce context-sensitive heap (a.k.a. heap cloning)
• Let each abstract cell be a pair of
– alloc-i (the alloc with index i) or X (a program variable)
– a heap context from a (finite) set HeapContexts
• Thisallowsabstractcellstobenamedby the source code allocation site
and (information from) the current context
• Onechoice:
– set HeapContexts = Contexts
– at alloc, use the entire current call context as heap context
29

Agenda
• Introduction to points-to analysis • Andersen’s analysis
• Steensgaards’s analysis
• Interprocedural points-to analysis • Null pointer analysis
• Flow-sensitive points-to analysis
30

Null pointer analysis
• Decideforeverydereference*p, is p different from null?
• (Why not just treat null as a special cell
in an Andersen or Steensgaard-style analysis?)
• Usethemonotoneframework
– assuming that a points-to map pt has been computed
• Let us consider an intraprocedural analysis (i.e. we ignore function calls)
31

A lattice for null analysis
• DefinethesimplelatticeNull: ?
NN
where NN represents “definitely not null” and ? represents “maybe null”
• Useforeveryprogrampointthemaplattice: Cells  Null
32

Setting up
• For every CFG node, v, we have a variable ⟦v⟧: – a map giving abstract values for all cells
at the program point after v • Auxiliarydefinition:
JOIN(v) = ⨆ ⟦w⟧ wpred(v)
(i.e. we make a forward analysis)
w1
w2
wk
v
33

Null analysis constraints
• For operations involving pointers:
• X=alloc P: • X=&Y:
• X=Y:
• X=*Y:
• *X=Y:
• X=null:
⟦v⟧ = ??? ⟦v⟧ = ??? ⟦v⟧ = ??? ⟦v⟧ = ??? ⟦v⟧ = ??? ⟦v⟧ = ???
where P is null or an integer constant
• For all other CFG nodes: • ⟦v⟧=JOIN(v)
34

Null analysis constraints
• For a heap store operation *X = Y we need to model the change of whatever X points to
• That may be multiple abstract cells (i.e. the cells pt(X))
• With the present abstraction, each abstract heap cell alloc-i may describe multiple concrete cells
• So we settle for weak update:
*X = Y: ⟦v⟧ = store(JOIN(v), X, Y)
where store(, X, Y) = [α ↦ (α) ⊔ (Y)] αpt(X)
35

Null analysis constraints
• For a heap load operation X = *Y we need to model the change of the program variable X
• OurabstractionhasasingleabstractcellforX
• Thatabstractcellrepresentsasingleconcretecell
• So we can use strong update:
X = *Y: ⟦v⟧ = load(JOIN(v), X, Y)
where load(, X, Y) = [X ↦ ⨆(α)] αpt(Y)
36

is c null here?
Strong and weak updates
mk() {
return alloc null; // alloc-1
}

a = mk();
b = mk();
*a = alloc null; // alloc-2
n = null;
*b = n; // strong update here would be unsound! c = *a;
The abstract cell alloc-1 corresponds to multiple concrete cells
37

is c null here?
a = alloc null; // alloc-1 b = alloc null; // alloc-2 *a = alloc null; // alloc-3 *b = alloc null; // alloc-4 if (…) {
x = a;
} else {
x = b;
}
n = null;
*x = n; // strong update here would be unsound! c = *x;
Strong and weak updates
The points-to set for x contains multiple abstract cells
38

• X = alloc P :
• X = &Y:
• X = Y:
• X = null:
⟦v⟧ = JOIN(v)[X ↦ NN, alloc-i ↦ ?]
Null analysis constraints
⟦v⟧ = JOIN(v)[X ↦ NN]
⟦v⟧ = JOIN(v)[X ↦ JOIN(v)(Y)] ⟦v⟧ = JOIN(v)[X ↦ ?]
could be improved…
• In each case, the assignment modifies a program variable
• So we can use strong updates, as for heap load operations
39


Strong and weak updates, revisited
Strong update: [c ↦ new-value]
– possible if c is known to refer to a single concrete cell
– works for assignments to local variables (as long as TIP doesn’t have e.g. nested functions)
Weak update: [c ↦ (c) ⊔ new-value]
– necessary if c may refer to multiple concrete cells
– bad for precision, we lose some of the power of flow-sensitivity
– required for assignments to heap cells (unless we extend the analysis abstraction!)

40

Interprocedural null analysis
• Context insensitive or context sensitive, as usual… – at the after-call node, use the heap from the callee
• Butbecareful!
Pointers to local variables may escape to the callee
– the abstract state at the after-call node cannot simply copy the abstract values for local variables from the abstract state
at the call node
function f(b1, …, bn)
⬚ = f(E1, …, En);
x=⬚
result = E;
41

Using the null analysis
• The pointer dereference *p is “safe” at entry of v if JOIN(v)(p) = NN
• The quality of the null analysis depends on the quality of the underlying points-to analysis
42

Example program
p = alloc null;
q = &p;
n = null;
*q = n;
*p = n;
Andersen generates:
pt(p) = {alloc-1} pt(q) = {p}
pt(n) = Ø
43

Generated constraints
⟦p=alloc null⟧ = [p ↦ NN , alloc-1 ↦ ?]
⟦q=&p⟧ = ⟦p=alloc null⟧[q ↦ NN]
⟦n=null⟧ = ⟦q=&p⟧[n ↦ ?]
⟦*q=n⟧ = ⟦n=null⟧[p ↦ ⟦n=null⟧(p) ⊔ ⟦n=null⟧(n)] ⟦*p=n⟧ = ⟦*q=n⟧[alloc-1 ↦ ⟦*q=n⟧(alloc-1) ⊔ ⟦*q=n⟧(n)]
44

Solution
⟦p=alloc null⟧ = [p↦NN,q↦NN,n↦NN,alloc-1↦?] ⟦q=&p⟧ = [p↦NN,q↦NN,n↦NN,alloc-1↦?]
⟦n=null⟧ = [p↦NN,q↦NN,n↦?,alloc-1↦?]
⟦*q=n⟧ = [p↦?,q↦NN,n↦?,alloc-1↦?]
⟦*p=n⟧ = [p↦?,q↦NN,n↦?,alloc-1↦?]
• At the program point before the statement *q=n
the analysis now knows that q is definitely non-null
• … and before *p=n, the pointer p is maybe null
• Due to the weak updates for all heap store operations, precision is bad for alloc-i cells
45

Agenda
• Introduction to points-to analysis • Andersen’s analysis
• Steensgaards’s analysis
• Interprocedural points-to analysis • Null pointer analysis
• Flow-sensitive points-to analysis
46

Points-to graphs
• Graphsthatdescribepossibleheaps:
– nodes are abstract cells
– edges are possible pointers between the cells
• The lattice of points-to graphs is 2CellsCells ordered under subset inclusion
(or alternatively, Cells → 2Cells)
• For every CFG node, v, we introduce a constraint
variable ⟦v⟧ describing the state after v
• Intraprocedural analysis (i.e. ignore function calls)
48

Constraints
• For pointer operations:
• X = alloc P: ⟦v⟧ = JOIN(v)X ∪ { (X, alloc-i) }
• X=&Y:
• X=Y:
• X=*Y:
• *X=Y:
• X = null:
⟦v⟧ = JOIN(v)X ∪ { (X, Y) }
⟦v⟧ = JOIN(v)X ∪ { (X, t) | (Y, t)JOIN(v)}
⟦v⟧ = JOIN(v)X ∪ { (X, t) | (Y, s), (s, t)JOIN(v)}
⟦v⟧ = JOIN(v) ∪ { (s, t) | (X, s)JOIN(v), (Y, t) JOIN(v)}
note: weak update!
JOIN(v) = ⋃⟦w⟧ wpred(v)
⟦v⟧ = JOIN(v)X where X = { (s,t) | s  X}
• For all other CFG nodes: • ⟦v⟧=JOIN(v)
49

Example program
var x,y,n,p,q;
x = alloc null; y = alloc null;
*x = null; *y = y;
n = input;
while (n>0) {
p = alloc null; q = alloc null; *p = x; *q = y;
x = p; y = q;
n = n-1;
}
50

Result of analysis
• Aftertheloopwehavethispoints-tograph:
p
alloc-3
x
alloc-1
• Weconcludethatxandywillalwaysbedisjoint
q
y
alloc-4
alloc-2
51

Points-to maps from points-to graphs
• Apoints-tomapforeachprogrampointv: pt(X) = { t | (X,t)  ⟦v⟧ }
• Moreexpensive,butmoreprecise: – Andersen: pt(x) = { y, z }
– flow-sensitive: pt(x) = { z }
x=&y; x = &z;
52

Improving precision with abstract counting
• The points-to graph is missing information:
– alloc-2 nodes always form a self-loop in the example
• We need a more detailed lattice: 2CellCell  (Cell → Count)
where we for each cell keep track of
how many concrete cells that abstract cell describes
?
Count=0 1 >1 • Thispermitsstrongupdatesonthose
that describe precisely 1 concrete cell

53

Constraints
•X=allocP: … • *X=Y: … •…
54

Better results
• Aftertheloopwehavethisextendedpoints-tograph: ??
1
p
alloc-3
x
alloc-1
• Thus, alloc-2 nodes form a self-loop
q
y
alloc-4
1
alloc-2
55

Escape analysis
• Perform a points-to analysis
• Lookatreturnexpression
• Checkreachabilityinthepoints-to graph to arguments or variables defined in the function itself
baz() { var x;
return &x; }
main() { var p;
p=baz();
*p=1;
return *p;
}
• Noneofthose 
no escaping stack cells
60