程序代写代做代考 ocaml C data structure Java algorithm Haskell Static Program Analysis

Static Program Analysis
Part 2 – type analysis and unification
http://cs.au.dk/~amoeller/spa/
Anders Møller & Michael I. Schwartzbach Computer Science, Aarhus University

Type errors
• Reasonable restrictions on operations: – arithmetic operators apply only to integers
– comparisons apply only to like values – only integers can be input and output – conditions must be integers
– only functions can be called
– the * operator applies only to pointers
– field lookup can only be performed on records
• Violationsresultinruntimeerrors
2

Type checking
• Can type errors occur during runtime?
• This is interesting, hence instantly undecidable
• Instead,weuseconservativeapproximation
– a program is typable if it satisfies some type constraints
– these are systematically derived from the syntax tree – if typable, then no runtime errors occur
– but some programs will be unfairly rejected (slack)
• What we shall see next is the essence of the
Damas–Hindley–Milner type inference technique,
which forms the basis of the type systems of e.g. ML, OCaml, and Haskell
3

no type errors
typable
Typability
slack
4

Fighting slack
• Make the type checker a bit more clever:
• Aneternalstruggle
5

Fighting slack
• Make the type checker a bit more clever:
• Aneternalstruggle
• And a great source of publications
6

Be careful out there
• The type checker may be unsound:
• Example: covariant arrays in Java – a deliberate pragmatic choice
7

Generating and solving constraints
AST
solver (unification)
⟦p⟧ = &int
⟦q⟧ = &int ⟦alloc 0⟧ = &int ⟦x⟧ = 
⟦foo⟧ = 
⟦&n⟧ = &int ⟦main⟧ = ()->int
constraints
solution
8

Types
• Types describe the possible values:
• These describe integers, pointers, functions, and records
• Typesaretermsgeneratedbythisgrammar – example: (int,&int)->&&int
int | &
| (,…,)-> | {X:,…,X:}
9

Type constraints
• We generate type constraints from an AST: – all constraints are equalities
– they can be solved using a unification algorithm
• Typevariables:
– for each identifier declaration X we have the variable ⟦X⟧
– for each non-identifier expression E we have the variable ⟦E⟧
• Recall that all identifiers are unique
• The expression E denotes an AST node, not syntax
• (Possible extensions: polymorphism, subtyping, …)
10

Generating constraints (1/3)
I:
E1 op E2:
E1 == E2:
input:
X = E:
output E:
if (E) {S}:
if (E) {S1} else {S2}: while (E) {S}:
⟦I⟧ = int
⟦E1⟧ = ⟦E2⟧ = ⟦E1 op E2⟧ = int ⟦E1⟧ = ⟦E2⟧  ⟦E1==E2⟧ = int ⟦input⟧ = int
⟦X⟧ = ⟦E⟧
⟦E⟧ = int
⟦E⟧ = int
⟦E⟧ = int
⟦E⟧ = int
11

Generating constraints (2/3)
X(X1,…,Xn){ … return E; }:
⟦X⟧ = (⟦X1⟧, …, ⟦Xn⟧) -> ⟦E⟧
E(E1, …, En):
⟦E⟧ = (⟦E1⟧, …, ⟦En⟧) -> ⟦E(E1, …, En)⟧
alloc E: &X: null: *E: *X=E:
⟦alloc E⟧ = &⟦E⟧ ⟦&X⟧ = &⟦X⟧ ⟦null⟧ = &
⟦E⟧ = &⟦*E⟧
⟦X⟧ =&⟦E⟧
(each  is a fresh type variable)
12

Generating constraints (3/3)
{X1:E1, …, Xn:En}:
⟦{X1:E1, …, Xn:En}⟧ = { X1:⟦E1⟧, …, Xn:⟦En⟧ }
E.X: ⟦E⟧={ …,X:⟦E.X⟧,…}
This is the idea, but not directly expressible in our language of types
13

Generating constraints (3/3)
Let {f1, f2, …, fm} be the set of field names that appear in the program
{X1:E1, …, Xn:En}: ⟦{X1:E1, …, Xn:En}⟧ = { f1:γ1, …, fm: γm }
where γi = E.X:
where γi =
⟦Ej⟧iffi=Xj forsomej αi otherwise
⟦E⟧= { f1:γ1, …, fm: γm }
⟦E.X⟧ if fi = X αi otherwise
14

Exercise
main() {
var x, y, z;
x = input;
y = alloc 8;
*y = x;
z = *y;
return x;
}
• Generate and solve the constraints
• Then try withy = alloc 8 replaced byy = 42
• Also try with the Scala implementation (when it’s completed)
15

General terms
Ex: int Ex: &
Ex: (1, 2)-> 3
Constructor symbols: • 0-ary: a, b, c
• 1-ary: d, e • 2-ary: f, g, h • 3-ary: i, j, k
Terms: •a
• d(a)
• h(a,g(d(a),b))
Terms with variables: • f(X,b)
• h(X,g(Y,Z))
16

The unification problem
• An equality between two terms with variables: k(X,b,Y) = k(f(Y,Z),Z,d(Z))
• A solution (a unifier) is an assignment from variables to closed terms that makes both sides equal:
X = f(d(b),b) Y = d(b) Z=b
Implicit constraint for term equality: c(t1,…,tk) = c(t1’,…,tk’)  ti = ti’ for all i
17

Unification errors
• Constructorerror: d(X) = e(X)
• Arityerror: a = a(X)
18

The linear unification algorithm
• Paterson and Wegman (1978)
• In time O(n):
– finds a most general unifier – or decides that none exists
• Can be used as a back-end for type checking
• … but only for finite terms
19

Recursive data structures
The program
var p;
p = alloc null;
*p = p;
creates these constraints
which have this “recursive solution” for p: ⟦p⟧= where=&
⟦null⟧ = &
⟦alloc null⟧ = &⟦null⟧ ⟦p⟧ = &⟦alloc null⟧
⟦p⟧ = &⟦p⟧
20

Regular terms
• Infinitebut(eventually)repeating:
– e(e(e(e(e(e(…))))))
– d(a,d(a,d(a, …)))
– f(f(f(f(…),f(…)),f(f(…),f(…))),f(f(f(…),f(…)),f(f(…),f(…))))
• Onlyfinitelymanydifferentsubtrees • A non-regular term:
– f(a,f(d(a),f(d(d(a)),f(d(d(d(a))),…))))
21

Regular unification
• Huet(1976)
• The unification problem for regular terms can be solved in O(n⋅ A(n))
using a union-find algorithm
• A(n) is the inverse Ackermann function:
– smallest k such that n  Ack(k,k)
– this is never bigger than 5 for any real value of n
• See the TIP implementation…
22

Union-Find
union(x, y) { xr := find(x) yr := find(y) if xr = yr
return
if xr.rank < yr.rank xr.parent := yr else yr.parent := xr if xr.rank = yr.rank xr.rank := xr.rank + 1 } makeset(x) { x.parent := x x.rank := 0 } find(x) { if x.parent != x x.parent := find(x.parent) return x.parent } 23 Union-Find (simplified) makeset(x) { x.parent := x } union(x, y) { xr := find(x) yr := find(y) if xr = yr return xr.parent := yr } find(x) { if x.parent != x x.parent := find(x.parent) return x.parent } Implement ‘unify’ procedure using union and find to unify terms... 24 Implementation strategy • Representation of the different kinds of types (including type variables) • Map from AST nodes to types • Union-Find • Traverse AST, generate constraints, unify on the fly – report type error if unification fails – when unifying a type variable with e.g. a function type, it is useful to pick the function type as representative – for outputting solution, assign names to type variables (that are roots), and be careful about recursive types 25 The complicated function foo(p,x) { var f,q; if (*p==0) { f=1; } else { q = alloc 0; *q = (*p)-1; f=(*p)*(x(q,x)); } return f; } main() { var n; n = input; return foo(&n,foo); } 26 Generated constraints ⟦*p==0⟧ = int ⟦f⟧ = ⟦1⟧ ⟦0⟧ = int ⟦q⟧ = ⟦alloc 0⟧ ⟦q⟧ = &⟦(*p)-1⟧ ⟦*p⟧ = int ⟦(*p)*(x(q,x))⟧ = int ⟦x⟧ = (⟦q⟧,⟦x⟧)->⟦x(q,x)⟧ ⟦main⟧ = ()->⟦foo(&n,foo)⟧ ⟦&n⟧ = &⟦n⟧
⟦(*p)-1⟧ = int ⟦*p⟧ = ⟦0⟧
⟦foo⟧ = (⟦p⟧,⟦x⟧)->⟦f⟧ ⟦*p⟧ = int
⟦1⟧ = int
⟦p⟧ = &⟦*p⟧
⟦alloc 0⟧ = &⟦0⟧
⟦q⟧ = &⟦*q⟧
⟦f⟧ = ⟦(*p)*(x(q,x))⟧
⟦x(q,x)⟧ = int
⟦input⟧ = int
⟦n⟧ = ⟦input⟧
⟦foo⟧ =(⟦&n⟧,⟦foo⟧)->⟦foo(&n,foo)⟧
27

Solutions
Here,  is the regular type that is the unfolding of  = (&int,)->int
which can also be written  = μ α.(&int,α)->int All other variables are assigned int
⟦p⟧ = &int
⟦q⟧ = &int ⟦alloc 0⟧ = &int ⟦x⟧ = 
⟦foo⟧ = 
⟦&n⟧ = &int ⟦main⟧ = ()->int
28

Infinitely many solutions
The function
poly(x) {
return *x;
}
has type (&)-> for any type 
(which is not expressible in our current type language)
29

Recursive and polymorphic types
• Extra notation for recursive and polymorphic types:
(not very useful unless we also add polymorphic expansion at calls,
but that makes complexity exponential, or even undecidable…)
…
| μα.

• Types are (finite) terms generated by this grammar
• μ α.  is the (potentially recursive) type  where
occurrences of α represent  itself
• α is a type variable (implicitly universally quantified
if not bound by an enclosing μ) 30

Slack
main() {
return bar(null,1)
}
This never has a type error at runtime – but it is not typable: int= ⟦r⟧ = ⟦g⟧ =&
bar(g,x) {
var r;
if (x==0) {
r=g;
} else { r=bar(2,0);
}
return r+1; }
31

• Not all errors are type errors: – dereference of null pointers
– reading of uninitialized variables – division by zero
– escaping stack cells
(why not?)
baz() { var x;
return &x; }
main() { var p;
p=baz();
*p=1;
return *p;
}
Other errors
• Other kinds of static analysis may catch these
32