CS314 Fall 2018
Assignment 8 Solutions
Problem 1 – Dependencies
Recall the following program (a sequence of instructions), with each instruction labeled as S
S1: a := 4;
S2: b := 2;
S3: c := 5;
S4: read(d);
S5: a := b + 3;
S6: b := a – 3;
S7: c := d * b;
S8: e := a + 6;
S9: print(c);
S10: print(e);
(a) The statement-level dependence graph is as follows:
S1 S2 S3 S4
S5 S6
S8 S7
S10 S9
output
true
true/anti
output
true
true
true
true
output
true
(b)
The sequential execution time is 10 cycles (1 per instruction). For parallel execution time, note
that a statement can be executed once all statements it depends on have been executed. You
can schedule statements based on the topological sort of their dependences. The critical path
in this graph is S1 → S5 → S6 → S7 → S9 for a total of 5 cycles. Another critical path is S2 →
S5 → S6 → S7 → S9. One way to separate statements into groups that can be executed parallel
is:
Cycle 1: {S1, S2, S3, S4}
Cycle 2: {S5}
Cycle 3: {S6, S8}
Cycle 4: {S7, S10}
Cycle 5: {S9}
1
Problem 2 – Dependence Analysis
Give the source and sink references, the type (whether a dependence is true, anti, or output), and
the distance vectors for all dependences in the following loops.
(a) do i = 3, 100
a(i) = a(i – 1) + a(i + 1) – a(i – 2)
enddo
Source Reference Sink reference Type Distance vector
a(i) a(i – 1) true (1)
a(i) a(i – 2) true (2)
a(i + 1) a(i) anti (1)
(b) do i = 2, 100
a(3 * i) = a(3 * i – 3) + a(3 * i + 3)
enddo
Source Reference Sink reference Type Distance vector
a(3i + 3) a(3i) anti (1)
a(3i) a(3i – 3) true (1)
(c) do i = 1, 10
a(i) = a(5) + a(i)
enddo
Source Reference Sink reference Type Distance vector
aW (i) a(5) true
a(5) aW (i) anti
Problem 3 – Loop Parallelization
Given the following nested loop:
do i = 2, 100
do j = 2, 100
S1: a(i, j) = b(i – 1, j – 1) + 2
S2: b(i, j) = i + j – 1
enddo
enddo
(a) The statement-level dependence graph with distance vectors, along with the iteration space
graph, are given below:
S2
S1
true RAW (1, 1)
2
S2
S1
S2
S1
S2
S1
S2
S1
S2
S1
S2
S1
S2
S1
S2
S1
S2
S1
S2
S1
S2
S1
S2
S1
S2
S1
S2
S1
S2
S1
1 2 3 4 5 i
j
3
2
1
S2
S1
S2
S1
S2
S1
S2
S1
S2
S1
S2
S1
S2
S1
S2
S1
S2
S1
S2
S1
4
5
(b) The given loop above, in its current form, is parallelizable by fixing i. By fixing i, the j-loop can
be parallelizable, as shown in the dependence graph for statement instances.
(c) First, note that for statements S1 and S2, using the distance vector found in 3(a):
i2 = i1 + 1 , j2 = j1 + 1
Following Lecture 19, we wish to find C11, C12, C21, C22, d1, and d2 such that:
p(S1) = C11 ∗ i + C12 ∗ j + d1
p(S2) = C21 ∗ i + C22 ∗ j + d2
and
C11 ∗ i1 + C12 ∗ j1 + d1 = C21 ∗ i2 + C22 ∗ j2 + d2
Substitution gives the following:
C11 ∗ i1 + C12 ∗ j1 + d1 = C21 ∗ (i1 + 1) + C22 ∗ (j1 + 1) + d2
In matrix form, this is:
(
C11 C12
)
∗
(
i1
j1
)
+ d1 =
(
C21 C22
)
∗
(
i1 + 1
j1 + 1
)
+ d2
Simplifying this results in the following:
(
C11 − C21 C12 − C22
)
∗
(
i1
j1
)
+ (d1 − d2 − (C21 + C22)) = 0
Solving for this gives:
C11 = C21, C12 = C22, (d1 − d2) = C21 + C22
3
Any values satisfying this relation will be correct. Since the hint given suggested d1 = d2 = 0,
set C11 = C21 = 1 and C12 = C22 = −1. Thus,
p(S1) = i− j
p(S2) = i− j
(d) We now generate the two-level loop for the affine schedule given above. First, we find the range
for p. Since i ranges from 2 to 100, and j ranges from 2 to 100, p ranges from -98 to 98. We
then have the following:
do p = -98, 98
do i = 2, 100
do j = 2, 100
if (i – j == p)
a(i, j) = b(i – 1, j – 1) + 2
b(i, j) = i + j – 1
endif
enddo
enddo
enddo
Now we can eliminate j by doing Fourier-Motzkin elimination: notice i − j = p. So, since j
ranges from 2 to 100,
2 ≤ i− p ≤ 100
Solving for this gives: i ≤ p + 100 and i ≥ p + 2. Moreover, note that i ranges from 2 to 100 as
well. Therefore, we have:
do p = -98, 98
do i = max(2, p + 2), min(100, p + 100)
a(i, i – p) = b(i – 1, i – p – 1) + 2
b(i, i – p) = i + (i – p) – 1
enddo
enddo
which is the necessary two-level loop code.
4