XJCO3221 Parallel Computation


University of Leeds


Lecture 12: Non-blocking communication

Previous lectures
So far we have only considered blocking communication in distributed memory systems:
Do not return until it is safe to re-use the resources, i.e. the memory allocated for the data.
This may happen once the data has been copied to a buffer.
If the buffer is too small, the call will wait until the data has been received.
Point-to-point communication: MPI_Send(), MPI_Recv().
Collective communication: MPI_Bcast(), MPI_Gather(), MPI_Scatter(), MPI_Reduce().
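As a reminder, a minimal blocking exchange between two ranks might look like the following sketch; the buffer size and tag are arbitrary choices for illustration.

#include <mpi.h>

int main( int argc, char **argv )
{
    int rank, i, data[100];

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    if( rank == 0 )
    {
        for( i=0; i<100; i++ ) data[i] = i;

        // Blocking send: does not return until 'data' is safe to re-use.
        MPI_Send( data, 100, MPI_INT, 1, 0, MPI_COMM_WORLD );
    }
    else if( rank == 1 )
    {
        // Blocking receive: does not return until the message is in 'data'.
        MPI_Recv( data, 100, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE );
    }

    MPI_Finalize();
    return 0;
}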

Today’s lecture
Today we will look at non-blocking communication:
Non-blocking calls return ‘immediately’.
Require extra coding to determine when it is safe to re-use the resources.
Can overlap communication with computation to improve performance: Latency hiding.
Useful in situations that require domain partitioning.
Briefly look at stencils, a graphical representation of calculation locality.

Blocking communication
Definition
A communication is blocking if return of control to the calling process only occurs after all resources are safe to re-use.
In MPI, resources primarily refers to the memory allocated for the message, such as the pointer data in this MPI_Send() example:
MPI_Send( data, size, MPI_INT, … );
Note this only refers to the viewpoint of the calling process; the
receiving process is not mentioned.

Synchronous communication
Definition
Communication is synchronous if the operation does not complete before both processes have started their communication operation.
On return, all resources can be re-used, and we know the destination process has started receiving1.
For instance, a blocking call may return once the data has been copied to the buffer, before it has even been sent to the network (and is therefore not synchronised with the receiver).
1MPI supports synchronised communication with MPI_Ssend(). A common use is debugging: If replacing MPI_Send() with MPI_Ssend() results in deadlock, the original code would have deadlocked when the data exceeded the buffer.
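As a concrete illustration of this debugging trick, the swap is a one-line change; data, size, dest and tag are assumed to be defined elsewhere.

// Original call: may return once the data has been buffered.
// MPI_Send ( data, size, MPI_INT, dest, tag, MPI_COMM_WORLD );

// Synchronous variant: does not complete until the matching receive has started.
MPI_Ssend( data, size, MPI_INT, dest, tag, MPI_COMM_WORLD );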

Non-blocking and asynchronous communication
Definition
A non-blocking operation may return before it is safe to re-use the resources. In particular, changing the data after the call returns may change the data being sent.
Essentially, such calls only start the communication.
Definition
Asynchronous communication does not require any co-operation between the sender(s) and the receiver(s).
e.g. a send that doesn’t expect a corresponding receive.
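A minimal sketch of why non-blocking calls need care (variables assumed; this is the mistake to avoid):

MPI_Request request;

// Start the send; it has only *started* when this call returns.
MPI_Isend( data, size, MPI_INT, dest, tag, MPI_COMM_WORLD, &request );

data[0] = -1;   // UNSAFE: the message may not have been sent yet, so the
                // modified value could be what is actually transmitted.

MPI_Wait( &request, MPI_STATUS_IGNORE );   // only now is 'data' safe to modify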

Blocking ≠ synchronous
Sometimes the terms blocking and synchronous, and non-blocking and asynchronous, are used interchangeably.
Blocking can act as a form of synchronisation.
e.g. MPI_Recv() will not return until the data has been received.
However, the distinction is more subtle:
Blocking and non-blocking refer to a single process’s view, i.e. ‘what the programmer needs to know.’
Synchronous and asynchronous refer to a more global view involving at least two processes.

Non-blocking communication in MPI
The key routines are:
MPI_Isend() : Start a non-blocking send.
MPI_Irecv() : Start a non-blocking receive.
MPI_Wait() : Will not return until the communication is complete.
MPI_Test() : Test to see if the communication is complete (but returns immediately).
The ‘I’ in MPI_Isend() and MPI_Irecv() stands for immediate, because they return (almost) immediately.
There are other routines, including non-blocking collective communication in MPI v3 [MPI_Ibcast(), . . . ], but these will not be covered here.
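For reference, the C prototypes of these four routines are (MPI-3 form, where the send buffer is const):

int MPI_Isend( const void *buf, int count, MPI_Datatype datatype, int dest,
               int tag, MPI_Comm comm, MPI_Request *request );
int MPI_Irecv( void *buf, int count, MPI_Datatype datatype, int source,
               int tag, MPI_Comm comm, MPI_Request *request );
int MPI_Wait ( MPI_Request *request, MPI_Status *status );
int MPI_Test ( MPI_Request *request, int *flag, MPI_Status *status );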

MPI_Request
To link each MPI_Isend() or MPI_Irecv() with its corresponding MPI_Wait() or MPI_Test(), MPI uses requests:

MPI_Request request;
MPI_Status  status;

// Start the communication.
MPI_Isend( data, size, …, &request );

// Do other things not involving ‘data’.

// Wait until the communication is complete.
// (Can replace &status with MPI_STATUS_IGNORE.)
MPI_Wait( &request, &status );

// Can now safely re-use ‘data’.
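The receiving side follows the same pattern; a sketch with the full argument list spelled out, where source and tag are assumed to match the sender:

MPI_Request request;

// Start the non-blocking receive into 'data'.
MPI_Irecv( data, size, MPI_INT, source, tag, MPI_COMM_WORLD, &request );

// Do other things that neither read nor write 'data'.

// Block until the message has fully arrived; 'data' is then safe to read.
MPI_Wait( &request, MPI_STATUS_IGNORE );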

Why use non-blocking communication?
Since a non-blocking communication call returns immediately, we can perform other useful calculations while the communication is going on – as long as they do not involve the resources.
So rather than performing calculations and communications sequentially, some may be performed concurrently.
Reduces total runtime, improving performance.
Latency hiding
The primary reason to use non-blocking communication is to overlap communication with computation or other communications. This is known as latency hiding.

Schematic (sending)
[Timeline schematic: with blocking MPI_Send(), control does not return from the call until the send is complete; with non-blocking MPI_Isend(), the call returns immediately and useful calculations can be performed while the send completes in the background.]

Testing for completion or lock availability
In Lecture 7 we saw how locks can be used to synchronise threads in a shared memory system:
regionLock.lock();
// Does not return until lock acquired
The lock() method is blocking – it does not return until the lock is available.
Using test() could allow useful calculations to be performed while waiting for the lock to become available:
while( !regionLock.test() )
{ … /* Do as many calculations as possible */ }
The MPI function MPI_Test() performs a similar role for non-blocking communication.
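A sketch of the equivalent polling pattern with MPI_Test(), assuming a request started earlier by MPI_Isend() or MPI_Irecv():

int flag = 0;

// Repeatedly test for completion, doing useful work between tests.
while( !flag )
{
    MPI_Test( &request, &flag, MPI_STATUS_IGNORE );

    if( !flag )
    {
        // Perform some work that does not touch the message buffer.
    }
}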

Potential applications of non-blocking communication
Many applications require large data sets to be modified according to some rules. Examples include1:
Signal processing: (1D data sets) Frequency analysis, noise filtering, . . .
Image processing: (2D data sets)
Colour filtering, blurring, edge detection, . . .
Scientific and engineering modelling: (1D, 2D or 3D)
Fluid dynamics, elasticity/mechanics, weather forecasting, . . .
1Wilkinson and Allen, Parallel programming (Pearson, 2005).

Domain partitioning
The standard way to parallelise such problems with distributed memory is to partition the domain between the processes, i.e.
Segments of a time series.
Regions of an image [next slide].
…
Domain partitioning
Each processing unit is responsible for transforming one partition. If the transformation only depends on each data point in isolation,
this is a map; also an embarrassingly parallel problem.

Map example: Colour transformation
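A sketch of such a colour transformation as a map: each rank converts its own partition of an RGB image to greyscale without any communication. The variables local (an unsigned char array holding this rank's pixels, three values per pixel) and localPixels are assumptions for illustration.

// Convert this rank's pixels to greyscale, each pixel independently of
// every other pixel (and of every other rank).
for( int p = 0; p < localPixels; p++ )
{
    unsigned char r = local[3*p], g = local[3*p+1], b = local[3*p+2];
    unsigned char grey = (unsigned char)( 0.299f*r + 0.587f*g + 0.114f*b );
    local[3*p] = local[3*p+1] = local[3*p+2] = grey;
}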

Local transformations
More commonly, however, the transformation depends on nearby information.
Blurring or edge detection in image processing.
Most scientific and engineering applications solve equations
with gradient terms (i.e. changes in quantities).
[Figure: an image partitioned into four regions, one per process (ranks 0 to 3).]
Need to communicate information lying at the edges of domains to perform the calculations correctly.

A stencil is a graphical representation of where the required data exists relative to the point being calculated.
This common stencil arises in many scientific applications:
[Figure: a simulation grid and its stencil; calculating the red (central) data point requires the values of the grey neighbouring cells.]
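For example, a stencil of this kind arises in a Jacobi-style update for the steady-state heat equation; a sketch with assumed array names u (old values), unew (new values) and grid dimensions nx by ny:

// Each new value depends on the old values at the four nearest
// neighbours: exactly the cells picked out by the stencil.
for( int i = 1; i < nx-1; i++ )
    for( int j = 1; j < ny-1; j++ )
        unew[i][j] = 0.25 * ( u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1] );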

Ghost cells
The standard way to communicate across boundaries is to use ghost cells, sometimes known as a halo.
Layer(s) of data points around each process’s domain.
Contain read-only copies of the corresponding points from neighbouring processes’ domains.
Updated after each iteration to match the values calculated by the neighbouring processes.
Updating performed using point-to-point communication.
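A sketch of such an update for a one-dimensional partitioning, assuming each rank stores n local values in u[1..n] with ghost cells u[0] and u[n+1], and that left and right hold the neighbouring ranks (or MPI_PROC_NULL at the domain ends). MPI_Waitall(), which completes a whole array of requests, is not listed above but is convenient here:

MPI_Request reqs[4];

// Receive the neighbours' edge values into my ghost cells...
MPI_Irecv( &u[0]  , 1, MPI_DOUBLE, left , 0, MPI_COMM_WORLD, &reqs[0] );
MPI_Irecv( &u[n+1], 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &reqs[1] );

// ...and send my edge values to the neighbours' ghost cells.
MPI_Isend( &u[1], 1, MPI_DOUBLE, left , 1, MPI_COMM_WORLD, &reqs[2] );
MPI_Isend( &u[n], 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[3] );

// Interior points could be updated here while the messages are in flight.

MPI_Waitall( 4, reqs, MPI_STATUSES_IGNORE );   // ghost cells now up to date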

[Figure: conceptual vs. implemented simulation domains for ranks 1 and 2. Each implemented domain is surrounded by a layer of ghost cells; edge values from rank 1’s domain are copied into rank 2’s ghost cells and vice versa, and interior and edge calculations are performed on each rank’s own domain.]

Implementation v1
Code on Minerva: heatEqn.c
In pseudocode the obvious implementation would be:
// Iterate multiple times.
for( iter=0; iter<… ) { … }
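A sketch of what this first, fully blocking version might look like; maxIters and the helper routines are hypothetical stand-ins for the details:

// Iterate multiple times.
for( iter=0; iter<maxIters; iter++ )
{
    // 1. Fill the ghost cells using blocking MPI_Send()/MPI_Recv(),
    //    with the ordering chosen so that the exchange cannot deadlock.
    exchangeGhostCells();

    // 2. Update every local grid point; the edge points use the
    //    freshly received ghost values.
    updateAllPoints();
}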