程序代写代做代考 CS314 Fall 2018

CS314 Fall 2018

Assignment 8 Solutions

Problem 1 – Dependencies

Recall the following program (a sequence of instructions), with each instruction labeled as S:

S1: a := 4;

S2: b := 2;

S3: c := 5;

S4: read(d);

S5: a := b + 3;

S6: b := a – 3;

S7: c := d * b;

S8: e := a + 6;

S9: print(c);

S10: print(e);

(a) The statement-level dependence graph is as follows:

S1 S2 S3 S4

S5 S6

S8 S7

S10 S9

output
true

true/anti

output

true

true

true

true

output

true

(b)

The sequential execution time is 10 cycles (1 per instruction). For parallel execution time, note
that a statement can be executed once all statements it depends on have been executed. You
can schedule statements based on the topological sort of their dependences. The critical path
in this graph is S1 → S5 → S6 → S7 → S9 for a total of 5 cycles. Another critical path is S2 →
S5 → S6 → S7 → S9. One way to separate statements into groups that can be executed parallel
is:
Cycle 1: {S1, S2, S3, S4}
Cycle 2: {S5}
Cycle 3: {S6, S8}
Cycle 4: {S7, S10}
Cycle 5: {S9}

1

Problem 2 – Dependence Analysis

Give the source and sink references, the type (whether a dependence is true, anti, or output), and
the distance vectors for all dependences in the following loops.

(a) do i = 3, 100
a(i) = a(i – 1) + a(i + 1) – a(i – 2)

enddo

Source Reference Sink reference Type Distance vector

a(i) a(i – 1) true (1)

a(i) a(i – 2) true (2)

a(i + 1) a(i) anti (1)

(b) do i = 2, 100
a(3 * i) = a(3 * i – 3) + a(3 * i + 3)

enddo

Source Reference Sink reference Type Distance vector

a(3i + 3) a(3i) anti (1)

a(3i) a(3i – 3) true (1)

(c) do i = 1, 10
a(i) = a(5) + a(i)

enddo

Source Reference Sink reference Type Distance vector

aW (i) a(5) true

a(5) aW (i) anti

Problem 3 – Loop Parallelization

Given the following nested loop:

do i = 2, 100

do j = 2, 100

S1: a(i, j) = b(i – 1, j – 1) + 2

S2: b(i, j) = i + j – 1

enddo

enddo

(a) The statement-level dependence graph with distance vectors, along with the iteration space
graph, are given below:

S2

S1

true RAW (1, 1)

2

S2

S1

S2

S1

S2

S1

S2

S1

S2

S1

S2

S1

S2

S1

S2

S1

S2

S1

S2

S1

S2

S1

S2

S1

S2

S1

S2

S1

S2

S1

1 2 3 4 5 i

j

3

2

1

S2

S1

S2

S1

S2

S1

S2

S1

S2

S1

S2

S1

S2

S1

S2

S1

S2

S1

S2

S1

4

5

(b) The given loop above, in its current form, is parallelizable by fixing i. By fixing i, the j-loop can
be parallelizable, as shown in the dependence graph for statement instances.

(c) First, note that for statements S1 and S2, using the distance vector found in 3(a):

i2 = i1 + 1 , j2 = j1 + 1

Following Lecture 19, we wish to find C11, C12, C21, C22, d1, and d2 such that:

p(S1) = C11 ∗ i + C12 ∗ j + d1

p(S2) = C21 ∗ i + C22 ∗ j + d2
and

C11 ∗ i1 + C12 ∗ j1 + d1 = C21 ∗ i2 + C22 ∗ j2 + d2
Substitution gives the following:

C11 ∗ i1 + C12 ∗ j1 + d1 = C21 ∗ (i1 + 1) + C22 ∗ (j1 + 1) + d2

In matrix form, this is:

(
C11 C12

)

(
i1
j1

)
+ d1 =

(
C21 C22

)

(
i1 + 1
j1 + 1

)
+ d2

Simplifying this results in the following:

(
C11 − C21 C12 − C22

)

(
i1
j1

)
+ (d1 − d2 − (C21 + C22)) = 0

Solving for this gives:

C11 = C21, C12 = C22, (d1 − d2) = C21 + C22

3

Any values satisfying this relation will be correct. Since the hint given suggested d1 = d2 = 0,
set C11 = C21 = 1 and C12 = C22 = −1. Thus,

p(S1) = i− j

p(S2) = i− j

(d) We now generate the two-level loop for the affine schedule given above. First, we find the range
for p. Since i ranges from 2 to 100, and j ranges from 2 to 100, p ranges from -98 to 98. We
then have the following:

do p = -98, 98

do i = 2, 100

do j = 2, 100

if (i – j == p)

a(i, j) = b(i – 1, j – 1) + 2

b(i, j) = i + j – 1

endif

enddo

enddo

enddo

Now we can eliminate j by doing Fourier-Motzkin elimination: notice i − j = p. So, since j
ranges from 2 to 100,

2 ≤ i− p ≤ 100

Solving for this gives: i ≤ p + 100 and i ≥ p + 2. Moreover, note that i ranges from 2 to 100 as
well. Therefore, we have:

do p = -98, 98

do i = max(2, p + 2), min(100, p + 100)

a(i, i – p) = b(i – 1, i – p – 1) + 2

b(i, i – p) = i + (i – p) – 1

enddo

enddo

which is the necessary two-level loop code.

4