XJCO3221 Parallel Computation
University of Leeds
Lecture 10: Parallel data reorganisation
Previous lectures
In the last lecture we saw how to perform explicit point-to-point communication in a distributed memory system:
Send data (e.g. an array or sub-array) from one process to another.
In MPI (where processes are identified by their rank):
The sending process calls MPI_Send(); the receiving process calls MPI_Recv().
Both are blocking calls that do not return until their resources (e.g. the message buffer) can safely be modified.
Can result in deadlock, e.g. with a cyclic communication pattern.
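As a reminder, a minimal sketch of such an exchange (a single int passed from rank 0 to rank 1, inside the usual MPI_Init()/MPI_Finalize() boilerplate; variable names are illustrative):

int rank, value = 0;
MPI_Comm_rank( MPI_COMM_WORLD, &rank );
if( rank==0 )
{
    value = 42;
    MPI_Send( &value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD );    // Blocks until 'value' can safely be modified
}
else if( rank==1 )
{
    MPI_Recv( &value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE );    // Blocks until the message has arrived
}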
This lecture
In this lecture we are going to look at one of the key considerations for the performance of distributed memory systems: Data reorganisation
Often necessary for both shared and distributed memory systems.
For distributed systems, data reorganisation can result in a significant parallel overhead.
Performance can be improved by using collective communication routines.
We will go through a worked example of a simple distributed counting algorithm.
Data reorganisation
Many algorithms require some form of large-scale data reorganisation:
Adding or removing items from a container (e.g. vector, stack, queue) or a database.
Numerical algorithms, e.g. reordering the rows and columns of a matrix.
Compression (e.g. bzip, gzip etc.) …
Generalised scatter and gather
General scatter: each element of oldData is moved to a new position in newData given by an index array, e.g.

index[0] = 2; index[1] = 0; index[2] = 5; index[3] = 2; index[4] = 1; index[5] = 7;

In serial code:

for( i=0; i<N; i++ )
    newData[index[i]] = oldData[i];
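For comparison, a generalised gather reads from the indexed locations rather than writing to them; a minimal serial sketch using the same array names:

for( i=0; i<N; i++ )
    newData[i] = oldData[index[i]];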
For the worked example of a distributed count, note we assume only rank 0 knows the total data size, e.g. if rank 0 had loaded the data from a file.
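Rank 0 might then compute the per-process chunk size before communicating it; a minimal sketch (totalSize and numProcs are illustrative names, assuming the data divides evenly between processes):

int localSize;
if( rank==0 )
    localSize = totalSize / numProcs;    // totalSize known only to rank 0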
Step 1: Broadcasting: MPI_Bcast()
Sending the variable localSize to all processes can be performed using point-to-point communication:
if( rank==0 )
{
    for( p=1; p<numProcs; p++ )
        MPI_Send( &localSize, 1, MPI_INT, p, 0, MPI_COMM_WORLD );
}
else
    MPI_Recv( &localSize, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE );
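The collective routine MPI_Bcast() achieves the same result with a single call made by every process; for example, with rank 0 as the root:

MPI_Bcast( &localSize, 1, MPI_INT, 0, MPI_COMM_WORLD );    // All ranks make the same call; rank 0 sends, the others receive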
The name broadcast is slightly misleading, as it suggests only sending is involved, whereas in fact the same call also performs the receiving.
Step 2: Scattering: MPI_Scatter()
Need to break up an array into equal-sized chunks and send one chunk to each process [cf. vector addition last lecture]:
if( rank==0 )
{
    // Rank 0 keeps the first chunk and sends one chunk to each other process (array names illustrative)
    for( p=1; p<numProcs; p++ )
        MPI_Send( &fullData[p*localSize], localSize, MPI_INT, p, 0, MPI_COMM_WORLD );
}
else
    MPI_Recv( localData, localSize, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE );
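The collective routine MPI_Scatter() performs this distribution in a single call made by every process; a minimal sketch using the same illustrative array names (note the root also receives its own chunk into localData):

MPI_Scatter( fullData,  localSize, MPI_INT,     // Send buffer; only significant on the root
             localData, localSize, MPI_INT,     // Every rank receives localSize items
             0, MPI_COMM_WORLD );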