XJCO3221 Parallel Computation
University of Leeds
Lecture 10: Parallel data reorganisation
Previous lectures
In the last lecture we saw how to perform explicit point-to-point communication in a distributed memory system:
Send data (e.g. an array or sub-array) from one process to another.
In MPI (where processes are identified by their rank):
The sending process calls MPI_Send(); the receiving process calls MPI_Recv().
Both are blocking calls that do not return until their resources (e.g. the message buffer) can safely be modified.
Can result in deadlock, e.g. with a cyclic communication pattern.
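As a reminder, a minimal sketch of such an exchange (a single int passed from rank 0 to rank 1, inside the usual MPI_Init()/MPI_Finalize() boilerplate; variable names are illustrative):

int rank, value = 0;
MPI_Comm_rank( MPI_COMM_WORLD, &rank );
if( rank==0 )
{
    value = 42;
    MPI_Send( &value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD );    // Blocks until 'value' can safely be modified
}
else if( rank==1 )
{
    MPI_Recv( &value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE );    // Blocks until the message has arrived
}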
This lecture
In this lecture we are going to look at one of the key considerations for the performance of distributed memory systems: Data reorganisation
Often necessary for both shared and distributed memory systems.
For distributed systems, data reorganisation can result in a significant parallel overhead.
Performance can be improved by using collective communication routines.
We will go through a worked example of a simple distributed counting algorithm.
Data reorganisation
Many algorithms require some form of large-scale data reorganisation:
Adding or removing items from a container (e.g. vector, stack, queue) or a database.
Numerical algorithms, e.g. reordering the rows and columns of a matrix.
Compression (e.g. bzip, gzip etc.) …
Generalised scatter and gather
General scatter: each element of oldData is moved to a new position in newData given by an index array, e.g.

index[0] = 2; index[1] = 0; index[2] = 5; index[3] = 2; index[4] = 1; index[5] = 7;

In serial code:

for( i=0; i<N; i++ )
    newData[index[i]] = oldData[i];
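For comparison, a generalised gather reads from the indexed locations rather than writing to them; a minimal serial sketch using the same array names:

for( i=0; i<N; i++ )
    newData[i] = oldData[index[i]];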
For the worked example of a distributed count, note we assume only rank 0 knows the total data size, e.g. if rank 0 had loaded the data from a file.
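Rank 0 might then compute the per-process chunk size before communicating it; a minimal sketch (totalSize and numProcs are illustrative names, assuming the data divides evenly between processes):

int localSize;
if( rank==0 )
    localSize = totalSize / numProcs;    // totalSize known only to rank 0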
Step 1: Broadcasting: MPI_Bcast()
Sending the variable localSize to all processes can be performed using point-to-point communication:
if( rank==0 )
{
    for( p=1; p<numProcs; p++ )
        MPI_Send( &localSize, 1, MPI_INT, p, 0, MPI_COMM_WORLD );
}
else
    MPI_Recv( &localSize, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE );
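The collective routine MPI_Bcast() achieves the same result with a single call made by every process; for example, with rank 0 as the root:

MPI_Bcast( &localSize, 1, MPI_INT, 0, MPI_COMM_WORLD );    // All ranks make the same call; rank 0 sends, the others receive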
The name broadcast is slightly misleading, as it suggests only sending is involved, whereas in fact the same call also performs the receiving.
Step 2: Scattering: MPI_Scatter()
Need to break up an array into equal-sized chunks and send one chunk to each process [cf. vector addition last lecture]:
if( rank==0 )
{
    // Rank 0 keeps the first chunk and sends one chunk to each other process (array names illustrative)
    for( p=1; p<numProcs; p++ )
        MPI_Send( &fullData[p*localSize], localSize, MPI_INT, p, 0, MPI_COMM_WORLD );
}
else
    MPI_Recv( localData, localSize, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE );
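The collective routine MPI_Scatter() performs this distribution in a single call made by every process; a minimal sketch using the same illustrative array names (note the root also receives its own chunk into localData):

MPI_Scatter( fullData,  localSize, MPI_INT,     // Send buffer; only significant on the root
             localData, localSize, MPI_INT,     // Every rank receives localSize items
             0, MPI_COMM_WORLD );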