Dr Massoud Zolgharni
mzolgharni@lincoln.ac.uk
Room SLB1004, SLB
Dr Grzegorz Cielniak
gcielniak@lincoln.ac.uk
Room INB2221, INB
Week  W/C    Lecture                            Workshop
1     23/01  Introduction                       –
2     30/01  Architectures                      Tutorial-1
3     06/02  Patterns 1
4     13/02  Patterns 2                         Tutorial-2
5     20/02  Patterns 3
6     27/02  Patterns 4                         Tutorial-3
7     06/03  Communication & Synchronisation
8     13/03  Algorithms 1                       Assessment support
9     20/03  Algorithms 2
10    27/03  Algorithms 3
11    03/04  Performance & Optimisation         Tutorial-4 & Demonstration
12    24/04  Parallel Libraries
13    01/05  –
Data Reorganisation Patterns
Gather
Scatter
Sparse matrix in coordinate format:
values       = [ 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0 ]
column index = [ 1   1   2   2   3   3   4   4   5   6    7    8    ]
row index    = [ 1   6   2   8   1   4   5   7   3   6    2    8    ]
Perform a matrix operation (e.g. transpose) only on the non-zero elements
Given a mask, perform the image processing operation only on the foreground pixels
Data movement – major bottleneck of parallel applications, sometimes more important than the parallel computation itself
Data-intensive applications: reorganise data first, then perform the computation
Reorganisation does not always imply direct data reduction (e.g. sort)
Gather and scatter patterns arise from the combination of random (irregular) read and write, respectively, with the map pattern
[Diagram: gather and scatter – a mask (indices) selects elements so that operations are performed only on the reduced set]
Input: a collection of read locations (addresses or array indices) and a source array
Gather reads data from the source array at the given locations into the output
Output size = index collection size
Applications: sparse matrix operations, ray tracing, volume rendering, proximity queries, and collision detection
[Diagram: index collection + source array → output]
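A minimal serial sketch of gather in C++ (the function and type names are mine, not from the slides); a parallel version simply turns the loop into a map:
----------------------------------------
// Gather: out[i] = src[idx[i]] – random reads, regular writes
#include <cstddef>
#include <vector>

std::vector<int> gather(const std::vector<int>& src,
                        const std::vector<std::size_t>& idx) {
    std::vector<int> out(idx.size());   // output size = index collection size
    for (std::size_t i = 0; i < idx.size(); i++)
        out[i] = src[idx[i]];           // read location taken from idx
    return out;
}
----------------------------------------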
Given the following locations and source array, what values should go into the output collection?

index:     0  1  2  3  4  5  6  7  8  9 10 11
input:     3  7  0  1  4  0  0  4  5  3  1  0
locations: 3  1 10  8  0
output:    1  7  1  5  3
Shift – a special case of gather
Moves data to the left or right in memory
Data accesses are offset by fixed distances (efficient)
Shift requires "out of bounds" data
• different variants based on how the boundary conditions are handled
[Diagram: left and right shifts]
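A hedged sketch of a right shift with a "default value" boundary rule; the fill parameter and function name are illustrative (other variants clamp to the edge or rotate):
----------------------------------------
// Right shift by `offset`; out-of-bounds reads produce `fill`
#include <cstddef>
#include <vector>

std::vector<int> shift_right(const std::vector<int>& in,
                             std::size_t offset, int fill) {
    std::vector<int> out(in.size());
    for (std::size_t i = 0; i < in.size(); i++)
        out[i] = (i >= offset) ? in[i - offset]  // fixed-distance access
                               : fill;           // boundary handling
    return out;
}
----------------------------------------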
Stencil – regular data access
For each offset in the stencil, generate a new input vector by shifting the input by that offset
Good for one-dimensional stencils
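For instance, a 3-point averaging stencil (offsets -1, 0, +1) can be assembled from shifted copies of the input; the averaging operation and zero-fill boundary are illustrative choices, not prescribed by the slides:
----------------------------------------
// 3-point stencil from shifted copies: out[i] = (in[i-1]+in[i]+in[i+1])/3
#include <cstddef>
#include <vector>

static std::vector<int> shift_by(const std::vector<int>& in, int offset) {
    std::vector<int> out(in.size(), 0);              // zero fill at boundaries
    for (std::size_t i = 0; i < in.size(); i++) {
        long j = static_cast<long>(i) + offset;      // fixed-distance read
        if (j >= 0 && j < static_cast<long>(in.size()))
            out[i] = in[j];
    }
    return out;
}

std::vector<int> stencil3(const std::vector<int>& in) {
    std::vector<int> a = shift_by(in, -1), c = shift_by(in, +1);
    std::vector<int> out(in.size());
    for (std::size_t i = 0; i < in.size(); i++)
        out[i] = (a[i] + in[i] + c[i]) / 3;          // combine shifted copies
    return out;
}
----------------------------------------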
Input: a collection of write locations (addresses or array indices) and a source array
Scatter writes data into the output at the given locations – similar to gather, which uses the locations for reading
Input size = index collection size
The combination of a map with a random write
• writes to the same location are possible
• parallel writes to the same location are collisions
[Diagram: index collection + source array → output]
Gather (N – number of indices):
-------------------------------
par_for i = 1:N
    output[i] = input[index[i]]
-------------------------------

Scatter:
-------------------------------
par_for i = 1:N
    output[index[i]] = input[i]
-------------------------------
Given the following locations and source array, what values should go into the output collection?

index:     0  1  2  3  4  5  6  7  8  9 10 11
input:     3  7  0  1  4  0  0  4  5  3  1  0
locations: 2  4  1  5  5  0  4  2  1  2  1  4
output:    ?  ?  ?  ?  ?  ?
With a serial, last-write-wins scatter the output becomes:

index:     0  1  2  3  4  5  6  7  8  9 10 11
input:     3  7  0  1  4  0  0  4  5  3  1  0
locations: 2  4  1  5  5  0  4  2  1  2  1  4
output:    0  1  3  ·  0  4
(position 3 is never written; several locations receive more than one write)
Race condition: two (or more) values are written to the same location in the output collection. The result is undefined – rules are needed to resolve it!
Different strategies:
• Atomic scatter
• Permutation scatter
• Merge scatter
• Priority scatter
Atomic
Only one value is written – which one is non-deterministic in parallel execution
Useful with Boolean variables
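A sketch of an atomic scatter of Boolean flags (names are mine): all colliding writes store the same value, so the non-deterministic write order is harmless:
----------------------------------------
// Mark which output slots are hit; collisions all write `true`
#include <atomic>
#include <cstddef>
#include <vector>

void mark_hits(const std::vector<std::size_t>& idx,
               std::vector<std::atomic<bool>>& hit) {
    for (std::size_t i = 0; i < idx.size(); i++)   // imagine this loop is parallel
        hit[idx[i]].store(true, std::memory_order_relaxed);
}
----------------------------------------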
Permutation
Collisions are not allowed!
The output is a permutation of the input
Check for collisions in advance: turn the scatter into a gather
Examples: FFT scrambling, matrix/image transpose, unpacking
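One way to realise "turn scatter into gather": if the write locations form a permutation, invert the index map once and let every output element gather its value instead; a sketch with illustrative names:
----------------------------------------
// Permutation scatter as a gather: element i is destined for dest[i]
#include <cstddef>
#include <vector>

std::vector<int> permute(const std::vector<int>& in,
                         const std::vector<std::size_t>& dest) {
    std::vector<std::size_t> inv(dest.size());
    for (std::size_t i = 0; i < dest.size(); i++)
        inv[dest[i]] = i;                          // invert the index map
    std::vector<int> out(in.size());
    for (std::size_t j = 0; j < out.size(); j++)   // collision-free gather
        out[j] = in[inv[j]];
    return out;
}
----------------------------------------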
Merge
Associative and commutative operators are used to merge colliding elements
Commutativity is required since scatters to a particular location can occur in any order
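A minimal serial sketch of merge scatter with addition as the associative, commutative merge operator – the same pattern the histogram example below relies on:
----------------------------------------
// Merge scatter: colliding writes are combined with '+' instead of overwritten
#include <cstddef>
#include <vector>

void scatter_add(const std::vector<int>& in,
                 const std::vector<std::size_t>& idx,
                 std::vector<int>& out) {
    for (std::size_t i = 0; i < in.size(); i++)
        out[idx[i]] += in[i];   // merge on collision
}
----------------------------------------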
Priority
Input elements are assigned a priority based on their position
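A sketch of one possible convention – the element with the higher input position wins on collision, making the result deterministic (the slides do not fix the exact rule):
----------------------------------------
// Priority scatter: keep the write from the highest input position.
// `best` holds the winning position per output slot, initialised to -1.
#include <cstddef>
#include <vector>

void scatter_priority(const std::vector<int>& in,
                      const std::vector<std::size_t>& idx,
                      std::vector<int>& out, std::vector<long>& best) {
    for (std::size_t i = 0; i < in.size(); i++) {
        if (static_cast<long>(i) > best[idx[i]]) {   // higher priority wins
            best[idx[i]] = static_cast<long>(i);
            out[idx[i]] = in[i];
        }
    }
}
----------------------------------------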
Histogram – an estimator of the data distribution. Uses bins (value intervals) to count the number of values falling into each interval.
----------------------------------
vector<int> A;   // input values
vector<int> H;   // histogram bins

//zero all bins
for (size_t i = 0; i < H.size(); i++)
    H[i] = 0;

//fill in histogram bins
for (size_t i = 0; i < A.size(); i++)
    H[bin_index(A[i])]++;
----------------------------------
bin_index provides a list of write locations – scatter!
A = {1,4,3,2,0,2,1,3,2,7,5,6,3,8,1,3};
H = 8 bins

int bin_index_1(int value) {
    return value;
}
int bin_index_2(int value) {
    return value / 2;
}

H for bin_index_1: 1 3 3 4 1 1 1 1   (note: the value 8 falls outside the 8 bins – an out-of-range write!)
H for bin_index_2: 4 7 2 2 1 0 0 0
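A self-contained version assembled from the fragments above, runnable to check the counts (giving H1 nine bins to accommodate the out-of-range value 8 is my workaround):
----------------------------------------
// Serial histogram of A under both bin_index variants
#include <cstdio>
#include <vector>

static int bin_index_1(int value) { return value; }
static int bin_index_2(int value) { return value / 2; }

int main() {
    std::vector<int> A = {1,4,3,2,0,2,1,3,2,7,5,6,3,8,1,3};
    std::vector<int> H1(9, 0), H2(8, 0);    // 9 bins: A contains the value 8
    for (int a : A) H1[bin_index_1(a)]++;   // merge scatter with '+'
    for (int a : A) H2[bin_index_2(a)]++;
    for (int h : H1) std::printf("%d ", h);
    std::printf("\n");
    for (int h : H2) std::printf("%d ", h);
    std::printf("\n");
}
----------------------------------------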
Partial OpenCL implementation – naive!
----------------------------------------
__kernel void hist_naive(__global const int* A, __global int* H,
                         int nr_bins) {
    int id = get_global_id(0);
    //zero bin counts
    if (id < nr_bins)
        H[id] = 0;
    barrier(CLK_GLOBAL_MEM_FENCE);
    H[bin_index(A[id])]++;   // unguarded read-modify-write!
}
----------------------------------------
What if two threads want to increase the same bin at the same time?
One option is to wrap the update in a critical section (lock() … unlock()).
Atomic functions resolve the conflict by serialising accesses to a variable.
More on this in the following lecture on synchronisation techniques.
OpenCL Merge Scatter with atomic operations
----------------------------------------
__kernel void hist_atomic(__global const int* A, __global int* H,
                          int nr_bins) {
    int id = get_global_id(0);
    //zero bin counts
    if (id < nr_bins)
        H[id] = 0;
    barrier(CLK_GLOBAL_MEM_FENCE);
    atomic_inc(&H[bin_index(A[id])]);   // collision-safe increment
}
----------------------------------------
Complexity measures (for N indices) are the same as for the parallel map pattern:
• work – requires N operations, O(N)
• span – O(N/p) for p parallel processors
In reality, performance is affected by the random nature of the memory accesses:
• Gather: performance is lower due to random reads (not so bad)
• Scatter: performance is lower due to random writes and potential write conflicts (e.g. consider all indices being the same!)
Further reading: Structured Parallel Programming: Patterns for Efficient Computation (McCool, Robison & Reinders)
• Chapter 6, Data Reorganization