CS 314 Principles of Programming Languages
Prof. Zheng Zhang
Rutgers University
Lecture 18: Parallelism and Dependence Analysis
November 9, 2018
Class Information
• Project 2 is released.
• Homework 7 will be released this weekend.
Programming with Concurrency
• A PROCESS or THREAD is a potentially-active execution context
• The classic von Neumann model of computing has a single thread of control; parallel programs have more than one
• A process or thread can be thought of as an abstraction of a physical PROCESSOR
• Processes/Threads can come from
‣ Multiple CPUs
‣ Kernel-level multiplexing of single physical machine
‣ Language or library level multiplexing of kernel-level abstraction
• They can run
‣ In true parallel
‣ Unpredictably interleaved
‣ Run-until-block
Dependence and Parallelization
Dependence relation: all task-to-task execution orderings that must be preserved if the meaning of the program is to remain the same.
The dependence relation can be modeled as a directed graph such that if A → B, the result of task A is required for the processing of task B.
Dependence analysis is fundamental to parallelization analysis.
Example:
S1: pi = 3.14
S2: R = 5
S3: Area = pi * R^2
[Figure: statement-level dependence graph — S1 → S3 and S2 → S3]
Dependence Graph
• Directed acyclic graph (DAG)
• A node represents a task
• A directed edge represents a precedence constraint
DAG example 1:
[Figure: a DAG with nodes start, a–i, and end; the edges are the precedence constraints]
Dependence Graph
• Directed acyclic graph (DAG)
• A node represents a task
• A directed edge represents a precedence constraint
DAG example 2: S = sum(A[1], A[2], …, A[N])
[Figure: a binary reduction tree of “+” nodes; A[1] … A[N] are added pairwise, level by level, down to the single result S]
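To make the example concrete, here is a minimal C sketch (not from the lecture; the name tree_sum is made up) that performs the additions in the same pairwise, level-by-level order as the DAG:

/* Sum A[0..n-1] in reduction-tree order, in place. Each round's
   additions are independent of one another, so each tree level
   could run in parallel. */
float tree_sum(float A[], int n) {
    for (int stride = 1; stride < n; stride *= 2)   /* one tree level per round */
        for (int i = 0; i + stride < n; i += 2 * stride)
            A[i] += A[i + stride];                  /* independent adds */
    return A[0];                                    /* the result S */
}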
Scheduling a DAG
Tp: time to perform the computation with p processors
• T1: work (the total # of operations)
• T∞: critical path, or span
[Figure: the DAG from example 1, assuming each task takes 1 unit of time]
T1 = ? (the 11 unit-time tasks give T1 = 11)
T∞ = ? (computed on the next slide: T∞ = 5)
Computing Critical Path
Compute the earliest start time of each node:
• Keep a value S(n) associated with each node n
• For each node n (taken in topological order),
  S(n) is the maximum of { S(p) + 1 }, for all p ∈ pred(n)
Assuming each task takes 1 unit of time:
[Figure: the example DAG annotated step by step — S(start) = 1; S(a) = S(c) = S(d) = 2; S(b) = S(e) = S(g) = S(h) = 3; S(f) = S(i) = 4; S(end) = 5]
The largest value, S(end) = 5, is the critical path length: T∞ = 5.
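A minimal C sketch of this computation (not the lecture's code): nodes are processed in topological order with Kahn's algorithm, and the edge set below is only an assumption consistent with the S(n) values above, since the figure itself is not reproduced here.

#include <stdio.h>

enum { START, A, B, C, D, E, F, G, H, I, END, N };  /* N = 11 nodes */

int main(void) {
    /* Assumed precedence edges for the example DAG. */
    int edges[][2] = {
        {START,A}, {START,C}, {START,D},
        {A,B}, {C,E}, {D,G}, {D,H}, {B,F}, {E,I},
        {F,END}, {G,END}, {H,END}, {I,END}
    };
    int nedges = (int)(sizeof edges / sizeof edges[0]);

    int indeg[N] = {0}, S[N] = {0};
    for (int e = 0; e < nedges; e++) indeg[edges[e][1]]++;

    /* Nodes with no predecessors start at time 1. */
    int queue[N], head = 0, tail = 0;
    for (int n = 0; n < N; n++)
        if (indeg[n] == 0) { S[n] = 1; queue[tail++] = n; }

    int span = 0;
    while (head < tail) {                       /* topological order */
        int p = queue[head++];
        if (S[p] > span) span = S[p];
        for (int e = 0; e < nedges; e++) {
            if (edges[e][0] != p) continue;
            int n = edges[e][1];
            if (S[p] + 1 > S[n]) S[n] = S[p] + 1;  /* S(n) = max{S(p)+1} */
            if (--indeg[n] == 0) queue[tail++] = n;
        }
    }
    printf("T_inf (span) = %d\n", span);        /* prints 5 for this DAG */
    return 0;
}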
List Scheduling
Schedule based on whether the dependence constraints have been resolved:
• Schedule the nodes that are ready at every time tick
• An operation completed at the end of one time step can make more operations ready at the next tick
Four threads T1–T4, assuming each task takes 1 unit of time:

tick   1      2    3    4    5
T1     start  a    b    f    end
T2            c    e    i
T3            d    g
T4                 h
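A greedy list scheduler can be sketched the same way (again an assumption, not the lecture's code, reusing the illustrative edge set from the critical-path sketch): ready tasks are selected at each tick before any completions are retired, so newly enabled tasks only become ready at the next tick.

#include <stdio.h>

enum { START, A, B, C, D, E, F, G, H, I, END, N };
#define P 4  /* four threads T1..T4 */

int main(void) {
    const char *name[N] = { "start","a","b","c","d","e",
                            "f","g","h","i","end" };
    /* Assumed precedence edges, as in the critical-path sketch. */
    int edges[][2] = {
        {START,A}, {START,C}, {START,D},
        {A,B}, {C,E}, {D,G}, {D,H}, {B,F}, {E,I},
        {F,END}, {G,END}, {H,END}, {I,END}
    };
    int nedges = (int)(sizeof edges / sizeof edges[0]);

    int indeg[N] = {0}, done[N] = {0}, remaining = N, tick = 0;
    for (int e = 0; e < nedges; e++) indeg[edges[e][1]]++;

    while (remaining > 0) {
        /* Select up to P ready tasks (all predecessors done)... */
        int batch[P], nb = 0;
        for (int n = 0; n < N && nb < P; n++)
            if (!done[n] && indeg[n] == 0) batch[nb++] = n;
        /* ...then run them; retirements take effect at the next tick. */
        printf("tick %d:", ++tick);
        for (int k = 0; k < nb; k++) {
            int n = batch[k];
            printf(" %s", name[n]);
            done[n] = 1; remaining--;
            for (int e = 0; e < nedges; e++)
                if (edges[e][0] == n) indeg[edges[e][1]]--;
        }
        printf("\n");
    }
    return 0;   /* finishes in 5 ticks, matching the table above */
}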
Automatic Parallelization
We will use loop analysis as an example to describe automatic dependence analysis and parallelization.
Assumptions:
1. We only have scalar and subscripted variables (no pointers and no control dependence) for loop dependence analysis.
2. We focus on affine loops: both loop bounds and memory references are affine functions of the loop induction variables.
A function f(x1, x2, …, xn) is affine if it can be written as
f = c0 + c1*x1 + c2*x2 + … + cn*xn, where the ci are all constants.
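As a quick illustration (the arrays and function name here are made up), these are the kinds of references the affine restriction admits and rejects:

/* Illustrative affine vs. non-affine subscripts; i and j are loop
   induction variables, the arrays are placeholders. */
void examples(int i, int j, int A[], int B[], int C[], const int X[]) {
    A[2*i + 3*j + 1] = 0;  /* affine: c0 = 1, c1 = 2, c2 = 3 */
    B[i * j] = 0;          /* NOT affine: product of induction variables */
    C[X[i]] = 0;           /* NOT affine: indirect subscript-of-subscript */
}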
Affine Loops
Three spaces
• Iteration space
‣ The set of dynamic execution instances
‣ i.e. the set of value vectors taken by loop indices
‣ A k-dimensional space for a k-level loop nest
• Data space
‣ The set of array elements accessed
‣ An n-dimensional space for an n-dimensional array
• Processor space
‣ The set of processors in the system
‣ In analysis, we may pretend there are unbounded # of virtual processors
Three Spaces
• Iteration space, data space, and processor space

float Z[100];
for (i = 0; i < 10; i++)
    Z[i+10] = Z[i];

[Figure: iteration space i = 0 … 9; data space: elements 0 … 19 of array Z[]; processor space: processors 0 … 9]
Assuming one task is one loop iteration, what is the maximum parallelism?
Maximum parallelism: T1 / T∞
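Because the reads touch Z[0..9] and the writes touch Z[10..19], all ten iterations are independent, so the maximum parallelism is T1 / T∞ = 10 / 1 = 10. A sketch of exploiting it (assuming a compiler with OpenMP enabled, e.g. -fopenmp; not from the slides):

float Z[100];

void copy_up(void) {
    /* The pragma is ignored without OpenMP and the loop runs
       sequentially; with it, each iteration can go to its own
       (virtual) processor. */
    #pragma omp parallel for
    for (int i = 0; i < 10; i++)
        Z[i + 10] = Z[i];   /* reads Z[0..9], writes Z[10..19] */
}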
Dependence Definition
Bernstein's Condition — there is a data dependence from statement (instance) S1 to statement (instance) S2 if
• both statements (instances) access the same memory location,
• one of them is a write, and
• there is a run-time execution path from S1 to S2.

float Z[100];
for (i = 0; i < 10; i++)
    Z[i+10] = Z[i];

No dependence across any two loop iterations!
Data Dependence Classifications
“S2 depends on S1” — (S1 δ S2)
True (flow) dependence
occurs when S1 writes a memory location that S2 later reads (RAW).
Anti dependence
occurs when S1 reads a memory location that S2 later writes (WAR).
Output dependence
occurs when S1 writes a memory location that S2 later writes (WAW).
Input dependence
occurs when S1 reads a memory location that S2 later reads (RAR).
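One assumed statement pair per class (illustrative, not the slides' examples), with S1 first and S2 second on each line:

void classes(int a[], int x) {
    int y;
    a[0] = x;  y = a[0];    /* flow (RAW): S1 writes a[0], S2 reads it  */
    y = a[1];  a[1] = x;    /* anti (WAR): S1 reads a[1], S2 writes it  */
    a[2] = x;  a[2] = y;    /* output (WAW): both statements write a[2] */
    x = a[3];  y = a[3];    /* input (RAR): both statements read a[3]   */
    (void)y;
}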
Simple Dependence Testing
• Examples:

for (i = 1; i <= 100; i++) {
S1:  A[i] = ...
S2:  ... = A[i - 1]
}

float Z[100];
for (i = 0; i < 12; i++) {
S:   Z[i+10] = Z[i];
}

1. Is there a dependence?
2. If so, what type of dependence?
3. From which statement (instance) to which statement (instance)?
Dependence Testing
Single Induction Variable (SIV) Test
• Single loop nest with constant lower bound (LB), constant upper bound (UB), and step 1:

for i = LB, UB, 1
    ...
endfor

• Two array references that are affine functions of the loop induction variable:

for i = LB, UB, 1
R1:  X(a*i + c1) = ...      \\ write
R2:  ... = X(a*i + c2) ...  \\ read
endfor

Question: is there a true dependence between R1 and R2?
Dependence Testing

for i = LB, UB, 1
R1:  X(a*i + c1) = ...      \\ write
R2:  ... = X(a*i + c2) ...  \\ read
endfor

There is a dependence between R1 and R2 iff
∃ i, i′: LB ≤ i ≤ i′ ≤ UB and (a*i + c1) = (a*i′ + c2),
where i and i′ represent two iterations in the iteration space. This means that both iterations access the same element of array X.
So let's just solve the equation:
(a*i + c1) = (a*i′ + c2)  ⟹  (c1 − c2)/a = i′ − i = ∆d
There is a dependence iff
• ∆d is an integer value
• UB − LB ≥ ∆d ≥ 0
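A minimal C sketch of this test (an assumed helper, not the lecture's code). It also accepts ∆d < 0, which the slide's condition excludes because it orders i ≤ i′: a negative ∆d means the dependence runs in the opposite direction (anti rather than true).

#include <stdio.h>

/* delta = i' - i = (c1 - c2)/a. A dependence exists iff delta is an
   integer and |delta| <= UB - LB. delta > 0: the write happens first
   (true dependence); delta < 0: the read happens first (anti);
   delta = 0: the same element within one iteration. */
int siv_test(int a, int c1, int c2, int LB, int UB) {
    if (a == 0) return c1 == c2;        /* loop-invariant subscripts */
    if ((c1 - c2) % a != 0) return 0;   /* delta must be an integer  */
    int delta = (c1 - c2) / a;
    if (delta < 0) delta = -delta;
    return delta <= UB - LB;            /* both iterations in bounds */
}

int main(void) {
    printf("%d\n", siv_test(1, 10, 0, 0, 11));  /* Z[i+10]=Z[i]: 1, delta = 10  */
    printf("%d\n", siv_test(2, 0, -1, 3, 15));  /* X[2i], X[2i-1]: 0 (delta = 1/2) */
    return 0;
}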
Simple Dependence Testing
• Examples, answered:

for (i = 1; i <= 100; i++) {
S1:  A[i] = ...
S2:  ... = A[i - 1]
}

True dependence (read after write): Wt: A[i] in S1 → Rd: A[i′ - 1] in S2, with i′ = i + 1, i.e. ∆d = 1.

float Z[100];
for (i = 0; i < 12; i++) {
S:   Z[i+10] = Z[i];
}

True dependence (read after write): Wt: Z[i+10] in S → Rd: Z[i′] in S, with i′ = i + 10, i.e. ∆d = 10.
Simple Dependence Testing
• More examples:

for (i = 1; i <= 100; i++) {
R1:  X[i] = ...
R2:  ... = X[i + 2]
}

Anti dependence (write after read): Rd: X[i+2] in R2 → Wt: X[i′] in R1, with i′ = i + 2. (Here ∆d = (0 − 2)/1 = −2 < 0, so there is no true dependence from R1 to R2; the element read in iteration i is overwritten two iterations later.)

for (i = 3; i <= 15; i++) {
S1:  X[2 * i] = ...
S2:  ... = X[2 * i - 1]
}

No dependence! (∆d = (0 − (−1))/2 = 1/2 is not an integer.)
Next Class
Reading:
• ALSU, Chapters 11.1–11.3