CS考试辅导 Static Program Analysis

Static Program Analysis
Part 2 – type analysis and unification
http://cs.au.dk/~amoeller/spa/
øller & . Schwartzbach Computer Science, Aarhus University

Copyright By PowCoder代写 加微信 powcoder

Type errors
• Reasonablerestrictionsonoperations: – arithmetic operators apply only to integers
– comparisons apply only to like values – only integers can be input and output – conditions must be integers
– only functions can be called
– the * operator only applies to pointers
– field lookup can only be performed on records
– the fields being accessed are guaranteed to be present
• Violationsresultinruntimeerrors • Note:notypeannotationsinTIP

Type checking
• Cantypeerrorsoccurduringruntime?
• Thisisinteresting,henceinstantlyundecidable
• Instead,weuseconservativeapproximation
– a program is typable if it satisfies some type constraints
– these are systematically derived from the syntax tree – if typable, then no runtime errors occur
– but some programs will be unfairly rejected (slack)
• What we shall see next is the essence of the
Damas–Hindley–Milner type inference technique,
which forms the basis of the type systems of e.g. ML, OCaml, and Haskell

Typability
no type errors

Fighting slack
• Make the type checker a bit more clever:
• An eternal struggle

Fighting slack
• Make the type checker a bit more clever:
• An eternal struggle
• And a great source of publications

Be careful out there
• The type checker may be unsound:
• Example: covariant arrays in Java – a deliberate pragmatic choice

Generating and solving constraints
solver (unification)
constraints
⟦p⟧ = ⬆int
⟦q⟧ = ⬆int ⟦alloc 0⟧ = ⬆int ⟦x⟧ = 
⟦&n⟧ = ⬆int ⟦main⟧ = ()→int

• Typesdescribethepossiblevalues:
• Thesedescribeintegers,pointers,functions, and records
• Types are terms generated by this grammar – example: (int,⬆int)→⬆⬆int
Type  int | ⬆️ Type
| (Type, …, Type ) → Type | { Id: Type, …, Id: Type }

Type constraints
• We generate type constraints from an AST: – all constraints are equalities
– they can be solved using a unification algorithm
• Typevariables:
– for each identifier declaration X we have the variable ⟦X⟧
– for each non-identifier expression E we have the variable ⟦E⟧
• Recall that all identifiers are unique
• The expression E denotes an AST node, not syntax
• (Possibleextensions:polymorphism,subtyping,…)

Generating constraints (1/3)
if (E) {S}:
if (E) {S1} else {S2}: while (E) {S}:
⟦E1⟧=⟦E2⟧=⟦E1 opE2⟧=int ⟦E1⟧ = ⟦E2⟧  ⟦E1==E2⟧ = int ⟦input⟧ = int

Generating constraints (2/3)
X(X1,…,Xn){ … return E; }:
⟦X⟧ = (⟦X1⟧, …, ⟦Xn⟧) → ⟦E⟧
E(E1, …, En):
⟦E⟧ = (⟦E1⟧, …, ⟦En⟧) → ⟦E(E1, …, En)⟧
alloc E: &X: null: *E:
⟦alloc E⟧ = ⬆⟦E⟧ ⟦&X⟧ = ⬆⟦X⟧ ⟦null⟧ = ⬆
⟦E⟧ = ⬆⟦*E⟧
⟦E1⟧ = ⬆⟦E2⟧
(each  is a fresh type variable)
For each parameter X of the main function: ⟦X⟧ = int For the return expression E of the main function: ⟦E⟧ = int

var x, y, z; x = input;
y = alloc 8; *y = x;
z = *y; return x;
• Generate and solve the constraints
• Then try withy = alloc 8 replaced byy = 42
• Also try with the Scala implementation (when it’s completed)

Generating constraints (3/3)
{X1:E1, …, Xn:En}:
⟦{X1:E1, …, Xn:En}⟧ = { X1:⟦E1⟧, …, Xn:⟦En⟧ }
E.X: ⟦E⟧ = { …, X:⟦E.X⟧, … }
This is the idea, but not directly expressible in our language of types

Generating constraints (3/3)
Let {f1, f2, …, fm} be the set of field names that appear in the program
Extend Type  … |⋄ where ⋄ represents absent fields
{X1:E1, …, Xn:En}: ⟦{X1:E1, …, Xn:En}⟧ = { f1:γ1, …, fm:γm }
where γi = E.X:
where γi =
⟦Ej⟧iffi=Xj forsomej ⋄ otherwise
⟦E⟧= { f1:γ1, …, fm:γm } ∧ ⟦E.X⟧≠⋄
⟦E.X⟧ if fi = X αi otherwise
(Field write statements? Exercise…)

General terms
Ex: int Ex: &
Ex: (1, 2)→ 3
Constructor symbols: • 0-ary: a, b, c
• 1-ary: d, e • 2-ary: f, g, h • 3-ary: i, j, k
• h(a,g(d(a),b))
Terms with variables: • f(X,b)
• h(X,g(Y,Z))
X, Y, and Z here are type variables, like ⟦(*p)-1⟧ or ⟦p⟧,
not program variables

The unification problem
• Anequalitybetweentwotermswithvariables: k(X,b,Y) = k(f(Y,Z),Z,d(Z))
• A solution (a unifier) is an assignment from variables to terms that makes both sides equal:
X = f(d(b),b) Y = d(b) Z=b
Implicit constraint for term equality: c(t1,…,tk) = c(t1’,…,tk’)  ti = ti’ for all i

Unification errors
• Constructorerror: d(X) = e(X)
• Arityerror: a = a(X)

The linear unification algorithm
• Paterson and Wegman (1978) • IntimeO(n):
– finds a most general unifier – or decides that none exists
• Can be used as a back-end for type checking
• … but only for finite terms

Recursive data structures
The program
p = alloc null; *p = p;
creates these constraints
which have this “recursive solution” for p: ⟦p⟧=t wheret=⬆t
⟦null⟧ = ⬆t
⟦alloc null⟧ = ⬆⟦null⟧ ⟦p⟧ = ⟦alloc null⟧
⟦p⟧ = ⬆⟦p⟧

Regular terms
• Infinitebut(eventually)repeating:
– e(e(e(e(e(e(…))))))
– d(a,d(a,d(a,…)))
– f(f(f(f(…),f(…)),f(f(…),f(…))),f(f(f(…),f(…)),f(f(…),f(…))))
• Only finitely many different subtrees • Anon-regularterm:
– f(a,f(d(a),f(d(d(a)),f(d(d(d(a))),…))))

Regular unification
• Huet(1976)
• The unification problem for regular terms can be solved in O(n⋅ A(n))
using a union-find algorithm
• A(n) is the inverse Ackermann function:
– smallest k such that n  Ack(k,k)
– this is never bigger than 5 for any real value of n
• SeetheTIPimplementation…

Union-Find
union(x, y) { xr := find(x) yr := find(y) if xr = yr
if xr.rank < yr.rank xr.parent := yr else yr.parent := xr if xr.rank = yr.rank xr.rank := xr.rank + 1 } makeset(x) { x.parent := x x.rank := 0 if x.parent != x x.parent := find(x.parent) return x.parent Union-Find (simplified) makeset(x) { x.parent := x union(x, y) { xr := find(x) yr := find(y) if xr = yr return xr.parent := yr if x.parent != x x.parent := find(x.parent) return x.parent Implement ‘unify’ procedure using union and find to unify terms... Implementation strategy • Representationofthedifferentkindsoftypes (including type variables) • MapfromASTnodestotypevariables • Union-Find • ST, generate constraints, unify on the fly – report type error if unification fails – when unifying a type variable with e.g. a function type, it is useful to pick the function type as representative – for outputting solution, assign names to type variables (that are roots), and be careful about recursive types The complicated function foo(p,x) { var f,q; if (*p==0) { q = alloc 0; *q = (*p)-1; f=(*p)*(x(q,x)); return f; } main() { var n; n = input; return foo(&n,foo); Generated constraints ⟦*p==0⟧ = int ⟦q⟧ = ⟦alloc 0⟧ ⟦q⟧ = ⬆⟦(*p)-1⟧ ⟦*p⟧ = int ⟦(*p)*(x(q,x))⟧ = int ⟦x⟧ = (⟦q⟧,⟦x⟧)→⟦x(q,x)⟧ ⟦main⟧ = ()→⟦foo(&n,foo)⟧ ⟦&n⟧ = ⬆⟦n⟧ ⟦*p⟧ = ⟦0⟧ ⟦foo(&n,foo)⟧ = int ⟦foo⟧ = (⟦p⟧,⟦x⟧)→⟦f⟧ ⟦*p⟧ = int ⟦p⟧ = ⬆⟦*p⟧ ⟦alloc 0⟧ = ⬆⟦0⟧ ⟦q⟧ = ⬆⟦*q⟧ ⟦f⟧ = ⟦(*p)*(x(q,x))⟧ ⟦x(q,x)⟧ = int ⟦input⟧ = int ⟦n⟧ = ⟦input⟧ ⟦foo⟧ =(⟦&n⟧,⟦foo⟧)→⟦foo(&n,foo)⟧ ⟦(*p)-1⟧ = int Here,  is the regular type that is the unfolding of  = (⬆int,)→int which can also be written  = μ t.(⬆int, t)→int All other variables are assigned int ⟦p⟧ = ⬆int ⟦q⟧ = ⬆int ⟦alloc 0⟧ = ⬆int ⟦x⟧ =  ⟦&n⟧ = ⬆int ⟦main⟧ = ()→int Infinitely many solutions The function return *x; has type (⬆)→ for any type  (which is not expressible in our current type language) Recursive and polymorphic types • Extra notation for recursive and polymorphic types: (not very useful unless we also add polymorphic expansion at calls, but that makes complexity exponential, or even undecidable...) • A type  ∈ Type is a (finite) term generated by this grammar • μ α.  is the (potentially recursive) type  where occurrences of α represent  itself • α ∈ TypeVar is a type variable (implicitly universally quantified if not bound by an enclosing μ) Type  ... | μ TypeVar. Type | TypeVar TypeVar  t | u | ... Slack – let-polymorphism f(x) { return *x; return f(alloc 1) + *(f(alloc(alloc 2)); This never has a type error at runtime – but it is not typable ⬆int = ⟦x⟧ = ⬆⬆int But we could analyze f before main: ⟦f⟧ = (⬆t)→ t and then “instantiate” that type at each call to f in main Slack – let-polymorphism polyrec(g,x) { if (x==0) { r=polyrec(2,0); return r+1; } return polyrec(null,1) This never has a type error at runtime – but it is not typable And let-polymorphism doesn’t work here because bar is recursive Slack – flow-insensitivity f() { var x; x = alloc 17; return x + 87; This never has a type error at runtime – but it is not typable The type analysis is flow insensitive (it ignores the order of statements) 33 Other programming errors • Not all errors are type errors: – dereference of null pointers – reading of uninitialized variables – division by zero – escaping stack cells (why not?) baz() { var x; return &x; } p=baz(); *p=1; return *p; • Other kinds of static analysis may catch these 程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com