程序代写代做代考 distributed system algorithm graph compiler Static Program Analysis

Static Program Analysis
Part 1 – the TIP language
http://cs.au.dk/~amoeller/spa/
Anders Møller & Michael I. Schwartzbach Computer Science, Aarhus University

Questions about programs
• Doestheprogramterminateonallinputs?
• Howlargecantheheapbecomeduringexecution?
• Can sensitive information leak to non-trusted users?
• Can non-trusted users affect sensitive information?
• Are buffer-overruns possible?
• Dataraces?
• SQLinjections?
• XSS? •…
2

foo(p,x) {
var f,q;
if (*p==0) { f=1; }
else {
q = malloc;
*q = (*p)-1;
f=(*p)*(x(q,x));
}
return f; }
Invariants:
any point in the program = any value of the PC
Program points
A property holds at a program point if it holds in any such state for any execution with any input
3

Questions about program points
• Willthevalueofxbereadinthefuture?
• Canthepointerpbenull?
• Whichvariablescanppointto?
• Is the variable x initialized before it is read?
• What is a lower and upper bound on the value of the integer variable x?
• Atwhichprogrampointscouldxbeassignedits current value?
• Dopandqpointtodisjointstructuresintheheap?
• Can this assert statement fail?
4

Why are the answers interesting?
• Increaseefficiency
– resource usage
– compiler optimizations
• Ensurecorrectness – verify behavior
– catch bugs early
• Supportprogramunderstanding
• Enablerefactorings
5

Testing?
“Program testing can be used to show the presence of bugs, but never to show their absence.”
[Dijkstra, 1972]
• Testing often takes 50% of development cost
• Errors in concurrent/distributed systems are hard to (re)produce with testing (“Heisenbugs”)
6

Programs that reason about programs
a program analyzer A
P always works correctly
a program P
P fails for some inputs
7

Requirements to the perfect program analyzer
SOUNDNESS (don’t miss any errors) COMPLETENESS (don’t raise false alarms)
TERMINATION (always give an answer)

Rice’s theorem, 1953
9

Rice’s theorem
Any non-trivial property of the behavior of programs in a Turing-complete language is undecidable!
10

Reduction to the halting problem
• Can we decide if a variable has a constant value? x = 17; if (TM( j)) x = 18;
• Here, x is constant if and only if the j’th Turing machine does not halt on empty input
11

Undecidability of program correctness
M
P yes
Is the FAIL state
unreachable in
the given program
(for any input)?
e(T)
Build e(S from e(T)
e(ST)
ACCEPT
no
(ignoring input
) T
w
ST
Simulate T on input e(T)
• •
REJECT
w)
FAIL
Does M accept input e(M)?
If simulation reaches ACCEPT, then (without reaching FAIL)
goto Otherwise, just terminate or loop forever
12

Undecidability of program correctness
M
P yes
Is the FAIL state
unreachable in
the given program
(for any input)?
no
moves
goto
Does M accept input e(M)?
ACCEPT
e(T)
Build e(S from e(T)
e(ST)
) T
REJECT
w
ST
Simulate T on input e(T
) for |w|
• •
If simulation reaches ACCEPT, then Otherwise, just terminate
(without reaching FAIL)
FAIL
(Note: this proof works even if we only consider programs that always terminate!)
13

Approximation
• Approximate answers may be decidable!
• The approximation must be conservative:
– i.e. only err on “the safe side”
– which direction depends on the client application
• We’llfocusondecisionproblems
• Moresubtleapproximationsifnotonly“yes”/“no” – e.g. memory usage, pointer targets
14

False positives and false negatives
15

Example approximations
• Decide if a given function is ever called at runtime: – if “no”, remove the function from the code
– if “yes”, don’t do anything
– the “no” answer must always be correct if given
• Decide if a cast (A)x will always succeed: – if “yes”, don’t generate a runtime check
– if “no”, generate code for the cast
– the “yes” answer must always be correct if given
16

Beyond “yes”/“no” problems
• How much memory / time may be used in any execution?
• Which variables may be the targets of a pointer variable p?
17

The engineering challenge
• A correct but trivial approximation algorithm may just give the useless answer every time
• The engineering challenge is to give the useful answer often enough to fuel the client application
• …andtodosowithinreasonabletimeandspace
• This is the hard (and fun) part of static analysis!
18

Bug finding
int main() {
char *p,*q;
p = NULL; printf(“%s”,p);
q = (char *)malloc(100); p = q;
free(q);
*p = ‘x’;
free(p);
p = (char *)malloc(100); p = (char *)malloc(100); q = p;
strcat(p,q);
}
gcc –Wall foo.c
lint foo.c
No errors!
19

Does anyone use static program analysis?
For optimization:
• every optimizing compiler and modern JIT
For verification or error detection:
• •
• • • •
• • • •
20

A constraint-based approach
Conceptually separates the analysis specification from algorithmic aspects and implementation details
constraint solver
program to analyze
mathematical constraints
21
⟦p⟧ = &int ⟦q⟧ = &int ⟦alloc⟧ = &int ⟦x⟧ = 
⟦foo⟧ = 
⟦&n⟧ = &int ⟦main⟧ = ()->int
solution

Challenging features in modern programming language
• Higher-orderfunctions
• Mutable records or objects, arrays
• Integer or floating-point computations
• Dynamicdispatching
• Inheritance
• Exceptions
• Reflection •…
22

The TIP language
• TinyImperativeProgramminglanguage
• Example language used in this course: – minimal C-style syntax
– cut down as much as possible
– enough features to make static analysis challenging and fun
• Scalaimplementationavailable
23

Expressions
EI |X
| E+E | E–E | E*E | E/E | E>E | E==E | (E)
| input
• I represents an integer constant
• X represents an identifier (x, y, z,…)
• input expression reads an integer from the input stream
• comparison operators yield 0 (false) or 1 (true)
24

Statements
• In conditions, 0 is false, all other values are true
• The output statement writes an integer value to the output stream
S X=E;
| outputE; |SS
|
| if (E) {S} [else {S}]? | while (E) {S}
25

Functions
• Functions take any number of arguments and return a single value:
• The optional var block declares a collection of uninitialized variables
• Function calls are an extra kind of expressions:
F  X(X,…,X){
[var X, …, X;]? S
return E; }
E  X(E,…,E)
26

Records
E {X:E,…,X:E} | E.X
27

Pointers
(No pointer arithmetic)
E  alloc E | &X
| *E
| null
S  *X=E;
28

Functions as values
• Functions are first-class values
• The name of a function is like a variable that refers to
that function
• Generalizedfunctioncalls:
• Functionvaluessufficetoillustratethemainchallenges with methods (in object-oriented languages)
and higher-order functions (in functional languages)
E  E(E,…,E)
29

Programs
• Aprogramisacollectionoffunctions
• Thefunctionnamedmaininitiatesexecution – its arguments are taken from the input stream
– its result is placed on the output stream
• Weassumethatalldeclaredidentifiersareunique
P  F … F
30

An iterative factorial function
ite(n) { var f; f = 1;
while (n>0) {
f = f*n;
n = n-1; }
return f; }
31

A recursive factorial function
rec(n) {
var f;
if (n==0) { f=1;
} else {
f=n*rec(n-1);
}
return f; }
32

An unnecessarily complicated function
foo(p,x) { var f,q;
if (*p==0) {
f=1;
} else {
q = alloc 0;
*q = (*p)-1; f=(*p)*(x(q,x));
}
return f; }
main() { var n;
n = input;
return foo(&n,foo); }
33

Beyond TIP
Other common language features in mainstream languages:
• global variables • objects
• nestedfunctions •…
34

Control flow graphs
ite(n) {
var f;
f = 1;
while (n>0) { f = f*n;
n = n-1;
}
return f; }
false
var f
f=1
n>0
true
f=f*n
n=n-1
return f
35

Control flow graphs
• A control flow graph (CFG) is a directed graph:
– nodes correspond to program points
(either immediately before or after statements)
– edges represent possible flow of control
• ACFGalwayshas
– a single point of entry
– a single point of exit
(think of them as no-op statements)
• Let v be a node in a CFG
– pred(v) is the set of predecessor nodes – succ(v) is the set of successor nodes
36

CFG construction (1/3)
• For the simple while fragment of TIP, CFGs are constructed inductively
• CFGs for simple statements etc.:
X=E
output E
return E
var X
37

CFG construction (2/3)
For a statement sequence S1 S2:
– eliminate the exit node of S1 and the entry node of S2 – glue the statements together
S1
S1
S2
S2
38

CFG construction (3/3)
Similarly for the other control structures:
E
E
E
false
true
true false
false true
S
S1
S2
S
39

Normalization
• Sometimesconvenienttoensurethat
each CFG node performs only one operation

Normalization: flatten nested expressions, using fresh variables
x = f(y+3)*5;
t1 = y+3; t2 = f(t1); x = t2*5;
40