COMP90045 Programming Language Implementation
Dataflow Analysis Revisited
Harald Søndergaard Lecture 23
Semester 1, 2019
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 1 / 41
Monotone Framework
The data-flow analyses we’ve seen follow the same pattern:
assume the strongest possible information (no variables are live, all expressions are available, etc.) and weaken it until a “fixed point” is reached.
This is formalized as the monotone framework.
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 2 / 41
Lattices
Instances of the monotone framework are based on lattices.
A lattice is a set of objects L equipped with operators ⟨⊑, ⊔, ⊓⟩:
⊑ is a partial order
x ⊔ y (join) is the smallest element greater than both x and y x ⊓ y (meet) is the largest element smaller than both x and y
All our operations are monotone, that is:
x ⊑ y → f (x) ⊑ f (y).
This ensures each iteration moves upwards in the lattice.
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 3 / 41
The Liveness Lattice
Example program:
Lattice:
{x,y,z}
{x,y} {x,z} {y,z} {x} {y} {z} ∅
L = (P({x,y,z}),⊆)
var x,y,z;
x = input; while (x > 1) {
y = x/2;
if (y > 3) x = x − y ; z = x − 4;
if (z > 0) x = x/2;
z = z − 1;
}
output x;
PLI (Sem 1, 2019)
Dataflow Analysis Revisited ⃝c University of Melbourne 4 / 41
The Available Expressions Lattice
var x,y,z,a,b;
z = a + b;
y = a ∗ b;
while (y > a + b) {
a = a + 1;
x = a + b; }
we have 4 different nontrivial expressions, so our lattice is: L=(P({a+b,a∗b,y >a+b,a+1}),⊇)
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 5 / 41
The Available Expressions Lattice
L=(P({a+b,a∗b,y >a+b,a+1}),⊇) {}
{a+b} {a∗b}
{y >a+b} {a+1}
{a+b,a∗b} {a+b,y >a+b} {a∗b,y >a+b} {a+b,a+1} {a∗b,a+1} {y >a+b,a+1}
{a+b,a∗b,y >a+b} {a+b,a∗b,a+1} {a+b,y >a+b,a+1} {a∗b,y >a+b,a+1}
{a + b, a ∗ b, y > a + b, a + 1}
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 6 / 41
Naive Fixed Point Algorithm
If the CFG has nodes v1,v2,…,vn, then we work in the lattice Ln. Assuming that node vi generates the dataflow equation
[vi ] = Fi ([v1], . . . , [vn]), we construct the function F : Ln → Ln: F(x1,…,xn) = (F1(x1,…,xn),…,Fn(x1,…,xn))
The naive algorithm is this (⊥ is the smallest element):
x = (⊥,…,⊥);
do {t = x;x = F(x);} while (x ̸= t) ;
This produces the least fixed point x in finite time, provided the lattice has no infinite ascending chains.
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 7 / 41
Chaotic Iteration
A better way exploits the fact that our lattice has the structure Ln:
x1 = ⊥; . . . ; xn = ⊥; do {
t1 =x1;…;tn =xn; x1 = F1(x1,…,xn); …;
xn = Fn(x1,…,xn);
}while(x1 ̸=t1 ∨x2 ̸=t2 ∨···∨xn ̸=tn);
This chaotic iteration computes the least fixed point (x1, …, xn). Calculating ⊥, F(⊥), F(F(⊥)) and so on until the sequence is
stationary is sometimes called Kleene iteration.
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 8 / 41
Worklist Algorithm
Better yet, take node dependencies into account. Typically, each Fi will depend only on a few variables.
For each node v let the function influences : V → P(V) give the set of nodes u for which [v] occurs in a nontrivial manner in the right-hand side of u’s equation.
That is, influences(v) is the set of nodes whose information may depend on v’s information.
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 9 / 41
Worklist Algorithm
x1 =⊥;…;xn =⊥; q = [v1,…,vn]; while (q ̸= [ ]) {
vi = q.head();
y = Fi(x1,…,xn); q = q.tail();
if (y ̸= xi ) {
for (v ∈ influences(vi)) q.append(v);
xi =y; }
}
In practice this saves much time.
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 10 / 41
Example 1: Very Busy Expressions
In the last lecture we saw two examples of dataflow analysis, to discover liveness and available expressions. Let us go through some more examples.
An expression is very busy if it is definitely evaluated again before its value changes.
To approximate this property, we need the same lattice and auxiliary functions as for available expressions.
For every CFG node v the variable [v] denotes the set of very busy expressions at the program point before the node.
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 11 / 41
Very Busy Expressions: Dataflow Constraints
Define exps(e1 op e2) = {e1 op e2} ∪ exps(e1) ∪ exps(e2)—for all other expressions e, we have exps(e) = ∅.
JOIN(v) = [w] w∈succ(v)
The constraint for the exit node is: [exit] = {}
For conditions and output e we have: [v] = JOIN(v) ∪ exps(e) For an assignment x = e: [v] = JOIN(v) ↓ x ∪ exps(e)
For all other nodes: [v] = JOIN(v)
Here ↓ x removes every expression that refers to x.
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 12 / 41
Very Busy Expressions: Example Program
Moving the computation of a ∗ b to the earliest point where it is still very busy:
var x,a,b;
x = input;
a = x − 1;
b = x − 2; while (x > 0) {
output a∗b−x;
x = x − 1; }
output a ∗ b;
var x,a,b,atimesb; x = input;
a = x − 1;
b = x − 2;
atimesb = a ∗ b; while (x > 0) {
output atimesb − x;
x = x − 1; }
output atimesb;
Here the analysis shows a ∗ b to be very busy in the loop.
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 13 / 41
Example 2: Sign Analysis
To determine the sign (+, 0, −) of all expressions we do not need a powerset lattice, but just the small sign lattice Sign:
⊤ +0−
⊥
The full lattice for our analysis is the map lattice Vars → Sign, where Vars is the set of program variables.
For each CFG node v we assign a variable [v] that maps to each program variable its possible sign at that point.
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 14 / 41
Sign Analysis: Dataflow Constraints
JOIN(v) = [w] w∈pred(v)
For declaration var x1,…,xn: [v] = JOIN(v)[x1 → ⊤,…,xn → ⊤] For an assignment x = e: [v] = JOIN(v)[x → eval(JOIN(v),e)] For all other nodes: [v] = JOIN(v)
Here eval performs an abstract evaluation of expressions:
eval(σ,k) = sign(k)
eval(σ,x) = σ(x)
eval(σ,e1 ope2)=op(eval(σ,e1),eval(σ,e2))
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 15 / 41
Sign Analysis: op
+
⊥
⊥
⊥
0
⊥
−
⊥
+
⊥
⊤
⊥
0
⊥
0
−
+
⊤
−
⊥
−
−
⊤
⊤
+
⊥
+
⊤
+
⊤
⊤
⊥
⊤
⊤
⊤
⊤
*
⊥
⊥
⊥
0
0
−
⊥
+
⊥
⊤
⊥
0
0
0
0
0
0
−
⊥
0
+
−
⊤
+
⊥
0
−
+
⊤
⊤
⊥
0
⊤
⊤
⊤
Similarly for the other operations.
Note that these operations are monotone, as are ⊔ and ⊓. This guarantees that the analysis will terminate.
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 16 / 41
Example 3: More Precise Sign Analysis
The previous sign analysis is crude; much precision is lost.
For example, (3 − 2) == 1 yields ⊤, when we would hope for 1.
Similarly +/+ yields ⊤ because we cannot distinguish 8/4 from 4/8—the latter evaluating to 0.
This suggests we should enrich the sign lattice with elements 1, 0+, and 0- to keep track of more precise abstract values and describe the abstract operators by
8 × 8 tables.
⊤
0+ +
1
⊥
0
0− −
PLI (Sem 1, 2019) Dataflow Analysis Revisited
⃝c University of Melbourne
17 / 41
Example 4: Constant Propagation
For every program point, determine the variables that have a constant value.
A different lattice, but the constraints are as in the sign analysis. ⊤ (unknown value)
··· −3 −2 −1 0 1 2 3 ··· ⊥ (no value)
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne
18 / 41
Constant Propagation: Dataflow Constraints
JOIN(v) = [w] w∈pred(v)
For declaration var x1,…,xn: [v] = JOIN(v)[x1 → ⊤,…,xn → ⊤]
For an assignment x = e: [v] = JOIN(v)[x → eval(JOIN(v),e)]
For all other nodes: [v] = JOIN(v)
Here eval performs an abstract evaluation of expressions: eval(σ,k) = k
eval(σ,x) = σ(x)
⊥ i f e 1 = ⊥ o r e 2 = ⊥ eval(σ,e1ope2)= ⊤ ife1 =⊤ore2 =⊤
op(eval (σ, e1 ), eval (σ, e2 )) otherwise
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 19 / 41
Constant Folding
var x,y,z;
x = 27;
y = input;
z = 2 ∗ x + y; if (x < 0) {
y = z − 3; } else {
y = 12; }
output y;
var x,y,z; x = 27;
y = input; z = 54 + y ; if (0) {
y = z − 3; } else {
y = 12; }
output y;
Constant propagation justifies this transformation. The result can then be simplified to output 12;.
⇒
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 20 / 41
Example 5: Parity Analysis
read n
416 yes
halt
n == 1
no 2
7
yes
3 neven 5
Aside: This is the famous “3n + 1” program coming out of Collatz’s problem in number theory: Does the program halt for all n > 0?
no
n = n/2
n = 3*n+1
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 21 / 41
Parity Analysis: Lattice of Values
read n
416 yes
⊤
eo
⊥
halt
n == 1
no 2
n even
7
yes
no
n = n/2
3
5
n = 3*n+1
+
⊥eo⊤
⊥
e o ⊤
⊥⊥⊥⊥ ⊥eo⊤ ⊥oe⊤ ⊥⊤⊤⊤
∗
⊥eo⊤
⊥
e o ⊤
⊥⊥⊥⊥ ⊥eee ⊥eo⊤ ⊥e⊤⊤
halve (⊥) = ⊥ halve(o) = ⊤ halve(e) = ⊤ halve (⊤) = ⊤
PLI (Sem 1, 2019)
Dataflow Analysis Revisited
⃝c University of Melbourne
22 / 41
Parity Analysis: Flow Equations
read n
416 yes
Equations:
n1 = ⊤ ⊔ n4 ⊔ n6 n2 = n1 ⊓ ⊤
n3 = n2 ⊓ e
n4 = halve(n3) n5 = n2 ⊓ o
n6 =(o∗n5)+o n7 = n1 ⊓ o
halt
yes 3
n even
no
5
n == 1
no
7
2
n = n/2
n = 3*n+1
PLI (Sem 1, 2019)
Dataflow Analysis Revisited
⃝c University of Melbourne 23 / 41
Parity Analysis: Solving the Equations
We can think of the set of equations as defining a function F: n1 ⊤⊔n4⊔n6 ⊤
n2n1⊓⊤n1
n3 n2 ⊓ e n2 ⊓ e
F n4 n5 n6 n7
= halve(n3) = halve(n3) n2⊓o n2⊓o
(o∗n5)+o (o∗n5)+o n1 ⊓ o n1 ⊓ o
Starting from (⊥, ⊥, ⊥, ⊥, ⊥, ⊥, ⊥) and repeatedly applying F takes us to the solution (the least fixed point of F):
(n1,n2,n3,n4,n5,n6,n7) = (⊤,⊤,e,⊤,o,e,o)
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 24 / 41
Parity Analysis: Using the Information
The analysis allows us to improve the program, avoiding many tests:
read n
416 yes
halt
n == 1
no 2
7
yes
3 neven 5
no
n = n/2
n = 3*n+1
PLI (Sem 1, 2019)
Dataflow Analysis Revisited
⃝c University of Melbourne
25 / 41
Distributive Dataflow Problems
By monotonicity, we have f (x ∪ y) ≥ f (x) ∪ f (y) A function is distributive if f (x ∪ y) = f (x) ∪ f (y) Benefits of distributivity: joins lose no information!
f
h
g
k
k(h(f (⊥)∪g(⊥))) = k(h(f (⊥))∪h(g(⊥))) = k(h(f (⊥)))∪k(h(g(⊥)))
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 26 / 41
Accuracy of Dataflow Analyses
Ideally we would like to compute the join all over paths solution:
fs is the transfer function for statement s foreachpaths1,…,sn wecancomputefp =fsn ◦…◦fs1 let path(s) be the set of paths from the entry to s
JOP(s) = fp(⊥) p∈path(s)
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 27 / 41
Computing Join all Over Paths
Problems:
Loops results in an infinite number of paths Exponential blow up of number of paths
Solution:
Compute joins early at merge points rather than at the end This is our standard way of solving data flow equations and it is
called Minimal Fixed-Point (MFP)
If a dataflow analysis is distributive then JOP = MFP. Otherwise, JOP ≤ MFP.
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 28 / 41
Examples of Distributive Dataflow Analyses
Liveness
Available Expressions Very Busy Expressions Reaching Definitions Uninitialized Variables
PLI (Sem 1, 2019)
Dataflow Analysis Revisited
⃝c University of Melbourne
29 / 41
Non-Distributive Example: Constant Propagation
x=1
x=2
y=2
y =1
z=x+y
One possibility:
eval ([x → 1, y → 2] ∪ [x → 2, y → 1], z = x + y ) = ⊤ An alternative:
eval ([x → 1, y → 2], z = x + y ) = [x → 1, y → 2, z → 3] eval ([x → 2, y → 1], z = x + y ) = [x → 2, y → 1, z → 3] [x →2,y →1,z →3]∪[x →1,y →2,z →3]=[z →3]
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 30 / 41
Example 6: Interval Analysis
An interval analysis computes for every integer variable a lower and an upper bound for its possible values.
The lattice describing a single variable is defined as: Interval = {[lo, hi] | lo, hi ∈ N ∧ lo ≤ hi} ∪ {⊥}
where:
N = {−∞,…,−2,−1,0,1,2,…,∞}
is the set of integers extended with infinite endpoints and the order
on intervals is:
[lo1,hi1]⊑[lo2,hi2]⇔lo2 ≤lo1 ∧hi1 ≤hi2
corresponding to inclusion of points.
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 31 / 41
The Interval Lattice
[−∞, ∞]
[−∞, 1]
[−∞, 0] [−2, 2]
[−1, ∞]
[0, ∞]
[−∞, −1]
[−2, 1] [−1, 2] [−2, 0] [−1, 1] [0, 2]
[1, ∞]
[−2, −1] [−1, 0] [0, 1]
··· [−2,−2] [−1,−1] [0,0] [1,1] [2,2] ···
⊥
[1, 2]
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 32 / 41
Interval Analysis: Ascending Chains
The lattice has infinite ascending chains, for example, [0,0] ⊑ [0,1] ⊑ [0,2] ⊑ [0,3] ⊑ [0,4] ⊑ [0,5]…
This carries over to the lattice we would ultimately use, namely: L = Vars → Interval
We return shortly to how the associated termination problem can be solved.
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 33 / 41
Interval Analysis: Constraints
JOIN(v) = [w] w∈pred(v)
For the entry node: [entry ] = λx .[−∞, ∞]
For an assignment x = e: [v] = JOIN(v)[x → eval(JOIN(v),e)] Foraconditionalif x≤k:[v]=JOIN(v)⊓[x→[−∞,k]] Other comparisons are treated in similar ways.
For other nodes: [v] = JOIN(v)
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 34 / 41
Interval Analysis: Constraints
The function eval is defined:
eval(σ, k) = [k, k]
eval(σ,x) = σ(x)
eval(σ,e1 ope2)=op(eval(σ,e1),eval(σ,e2))
and the abstract operators are defined by:
op([l1,h1],[l2,h2]) = [ min x op y, max x op y]
x∈[l1,h1],y∈[l2,h2] x∈[l1,h1],y∈[l2,h2]
(Why so complicated? We will explore this in a tutorial exercise.)
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 35 / 41
Achieving Convergence through Widening
Since the interval lattice has infinite ascending chains, Kleene iteration may not terminate.
That is, for the lattice Ln, the sequence of approximants Fi(⊥,…,⊥) need never converge.
We may then resort to widening: Coarsening of information. We can introduce a function w : Ln → Ln so that the sequence:
(F ◦w)i(⊥,…,⊥)
now converges on a fixed-point that is larger than every F i (⊥, . . . , ⊥)
and thus represents sound information about the program.
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 36 / 41
Widening: Giving a Longer Leash
For our interval analysis, w is defined pointwise down to single intervals. It operates relative to a fixed finite subset B with {−∞, ∞} ⊆ B ⊂ N.
Typically, B will be seeded with all the integer constants found in the given program, but other heuristics can also be used.
On a single interval we have:
w([lo,hi])=[max{i ∈B |i ≤lo},min{i ∈B |hi ≤i}]
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 37 / 41
Narrowing: Pulling Back In Again
Widening shoots above the target, but subsequent narrowing may improve the result.
Narrowing means applying λz . z ⊓ F (z ).
This may refine our result and it cannot produce unsound
information.
Narrowing may be repeated as long as you please, and you may stop whenever you please.
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 38 / 41
Widening Example
Consider the program:
x = 90;
y = 0;
x = x + 1; while (input) {
x = 90;
x = x + 1; y = y + 1;
}
For the program point after the loop, without widening, the analysis will produce the diverging sequence of approximants shown on the right—not very useful.
[x → ⊥, y → ⊥]
[x → [91, 91], y → [0, 1]] [x → [91, 91], y → [0, 2]] [x → [91, 91], y → [0, 3]] .
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 39 / 41
Widening Example
If we apply widening, based on the set B = {−∞, 0, 1, 90, ∞}, we obtain a converging sequence:
[x → ⊥, y → ⊥]
[x → [90, ∞], y → [0, 1]] [x → [90, ∞], y → [0, 90]] [x → [90, ∞], y → [0, ∞]]
The result for x is disappointing, but a single application of narrowing refines the interval to the optimal result:
[x → [91, 91], y → [0, ∞]]
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 40 / 41
Further Reading
Parts of this lecture are based on Michael Schwartzbach’s Lecture Notes on Static Analysis, available out there somewhere on the inter-web.
PLI (Sem 1, 2019) Dataflow Analysis Revisited ⃝c University of Melbourne 41 / 41