
High Performance Computing
Course Notes

Message Passing Programming II

Dr Ligang He


MPI functions

MPI is a complex system comprising numerous functions with various
parameters and variants

Six of them are indispensable; with these six alone you can already
write a large number of useful programs

The other functions add flexibility (datatypes), robustness (non-blocking
send/receive), efficiency (ready-mode communication), modularity
(communicators, groups) or convenience (collective operations, topology)

In the lectures, we are going to cover the most commonly
encountered functions


Modularity

• MPI supports modular programming via communicators

• Communicators provide information hiding by encapsulating local
communications and giving processes local namespaces

• All MPI communication operations specify a communicator (the group
of processes engaged in the communication)


Creating new communicators – Approach 1

MPI_Comm world, workers;
MPI_Group world_group, worker_group;
int ranks[1];
int numprocs, myid, server;

MPI_Init(&argc, &argv);

world = MPI_COMM_WORLD;
MPI_Comm_size(world, &numprocs);
MPI_Comm_rank(world, &myid);
server = numprocs - 1;                    // the last process acts as the server

MPI_Comm_group(world, &world_group);      // extract the group of MPI_COMM_WORLD
ranks[0] = server;
MPI_Group_excl(world_group, 1, ranks, &worker_group);  // group of all processes except the server
MPI_Comm_create(world, worker_group, &workers);        // new communicator for the workers
MPI_Group_free(&world_group);
…
MPI_Comm_free(&workers);

int MPI_Group_excl (MPI_Group group, int n, int *ranks,
MPI_Group *newgroup)


Creating new communicators – functions

int MPI_Comm_group(MPI_Comm comm, MPI_Group *group)

int MPI_Group_excl(MPI_Group group, int n, int *ranks, MPI_Group *newgroup)

int MPI_Group_incl(MPI_Group group, int n, int *ranks, MPI_Group *newgroup)

int MPI_Comm_create(MPI_Comm comm, MPI_Group group, MPI_Comm *newcomm)

int MPI_Group_free(MPI_Group *group)

int MPI_Comm_free(MPI_Comm *comm)


Creating new communicators – Approach 2

MPI_Comm_split (comm, colour, key, newcomm)

Creates one or more new communicators from the original
comm

comm communicator (handle)

colour controls subset assignment (processes with the same
colour are placed in the same new communicator)

key controls rank assignment within each new communicator

newcomm new communicator

It is a collective operation (it must be executed by all processes in comm)

It is used to (re-)allocate processes to communicators (groups)


Creating new communicators – Approach 2

MPI_Comm_split (comm, colour, key, newcomm)

The number of communicators created is the same as the number of
different values of colour

• The processes with the same value of colour are put in the same
communicator

• Within each new communicator, the processes are assigned new ranks
(starting at zero), with the order determined by the value of key


Creating new communicators – Approach 2

MPI_Comm_split (comm, colour, key, newcomm)

MPI_Comm comm, newcomm;
int myid, color;

MPI_Comm_rank(comm, &myid);        // id of current process
color = myid % 3;                  // three different colour values: 0, 1, 2
MPI_Comm_split(comm, color, myid, &newcomm);

This splits comm into three new communicators: one containing processes
0, 3, 6, …, one containing 1, 4, 7, …, and one containing 2, 5, 8, …


Communications

Point-to-point communications: involve exactly two
processes, one sender and one receiver

For example, MPI_Send() and MPI_Recv()

Collective communications: involving a group of
processes


Collective operations

• Coordinated communication operations involving multiple
processes

• The programmer could implement them by hand with point-to-point
messages (tedious); MPI provides specialized collective
communication routines:

barrier – synchronize all processes

reduction operations – sums, multiplies etc. over distributed data

broadcast – sends data from one process to all processes

gather – gathers data from all processes to one process

scatter – scatters data from one process to all processes

• All are executed collectively (by all processes in the group, at the
same time, with the same parameters)


Collective operations

MPI_Barrier (comm)

Global synchronization

comm is the communicator handle

No process returns from the function until all processes have
called it

A good way of separating one computation phase from the next
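For example, a minimal usage sketch that separates two phases of work with a
barrier (compute_phase_one/compute_phase_two are hypothetical stand-ins for
real computation):

#include <mpi.h>
#include <stdio.h>

/* Hypothetical phase functions standing in for real work */
static void compute_phase_one(int rank) { printf("phase 1 on %d\n", rank); }
static void compute_phase_two(int rank) { printf("phase 2 on %d\n", rank); }

int main(int argc, char *argv[])
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    compute_phase_one(rank);         // every process performs phase 1
    MPI_Barrier(MPI_COMM_WORLD);     // no process starts phase 2 until all
                                     // processes have finished phase 1
    compute_phase_two(rank);

    MPI_Finalize();
    return 0;
}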


Barrier synchronizations



Implementation of Barrier

– The exact implementation depends on the MPI library

– A possible implementation is to pass a token message around the
  processes, from process 0 to process n-1

– e.g., when a process calls MPI_Barrier:

– If it is process 0, it sends a token to process 1 and then waits for
  the token to come back from process n-1

– Any other process i waits to receive the token from process i-1 and
  then sends it on to process i+1 (process n-1 sends it back to process 0)

– No process can continue until the token has been passed back to
  process 0; in practice a second (release) pass of the token is needed
  so that the other processes know this has happened (see the sketch below)
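A sketch of such a token-ring barrier built from MPI_Send/MPI_Recv is shown
below. A single pass of the token only tells process 0 that everyone has
arrived, so this sketch circulates the token a second (release) time so that
no process leaves the barrier early; the function name ring_barrier and the
two-pass detail are assumptions of the sketch, not how any particular MPI
library actually implements MPI_Barrier:

#include <mpi.h>

void ring_barrier(MPI_Comm comm)
{
    int rank, size, token = 0;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    if (size == 1) return;

    if (rank == 0) {
        /* Arrival pass: 0 -> 1 -> ... -> n-1 -> 0 */
        MPI_Send(&token, 1, MPI_INT, 1, 0, comm);
        MPI_Recv(&token, 1, MPI_INT, size - 1, 0, comm, MPI_STATUS_IGNORE);
        /* Release pass: tell all other processes they may continue */
        MPI_Send(&token, 1, MPI_INT, 1, 0, comm);
        MPI_Recv(&token, 1, MPI_INT, size - 1, 0, comm, MPI_STATUS_IGNORE);
    } else {
        /* Forward the arrival token, then wait for the release token */
        MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, comm, MPI_STATUS_IGNORE);
        MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, comm);
        MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, comm, MPI_STATUS_IGNORE);
        MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, comm);
    }
}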


Collective operations

MPI_Bcast (buf, count, type, root, comm)

Broadcast data from root to all processes

buf starting address of the buffer (send buffer on the root, receive buffer on the other processes)
count no. of entries in buffer (>=0)
type datatype of buffer elements
root process id of root process
comm communicator


Example of MPI_Bcast

Broadcast 100 ints from process 0 to every process
in the group

MPI_Comm comm;
int array[100];
int root = 0;
…
MPI_Bcast (array, 100, MPI_INT, root, comm);


Implementation of MPI_Bcast

Naïve implementation: the root sends the data to each of the other n-1
processes in turn, so the broadcast takes n-1 sequential communication steps.

Smarter implementation: the data are forwarded along a tree of processes
rooted at the broadcast root, so the number of sequential communication
steps is reduced to O(log n).
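A sketch of the tree-structured approach, assuming the root is process 0 and
the data are ints (real MPI_Bcast implementations are more general and more
heavily optimised):

#include <mpi.h>

void tree_bcast(int *buf, int count, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    /* Round k: every process with rank < 2^k that already holds the data
       sends it to rank + 2^k (if that process exists). */
    for (int step = 1; step < size; step *= 2) {
        if (rank < step) {
            if (rank + step < size)
                MPI_Send(buf, count, MPI_INT, rank + step, 0, comm);
        } else if (rank < 2 * step) {
            MPI_Recv(buf, count, MPI_INT, rank - step, 0, comm,
                     MPI_STATUS_IGNORE);
        }
    }
}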


Collective operations

MPI_Gather (sendbuf, sendcount, sendtype, recvbuf, recvcount,
recvtype, root, comm)

Collective data movement function
sendbuf address of send buffer
sendcount no. of elements sent from each process (>=0)
sendtype datatype of send buffer elements
recvbuf address of receive buffer (significant only at root; it must be large enough to hold the data from all processes)
recvcount no. of elements received from each process
recvtype datatype of receive buffer elements
root process id of root process
comm communicator



MPI_Gather

– Collective: all processes call MPI_Gather at the same time.

MPI_Gather (sendbuf, sendcount, sendtype, recvbuf, recvcount,
recvtype, root, comm)

– Different processes interpret the parameters in different ways

– If the calling process is the root, it uses all of the parameters:
sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, root, comm

– If the calling process is not the root, it uses only sendbuf,
sendcount, sendtype, root and comm; the receive parameters are
ignored on non-root processes

One more thing about root: the root also contributes its own data,
moving it from its sendbuf into the corresponding block of recvbuf.


MPI_Gather example

Gather 100 ints from every process in the group to the root

MPI_Comm comm;
int gsize, sendarray[100];
int root, myrank, *rbuf;
…
MPI_Comm_rank(comm, &myrank);    // find proc. id

if (myrank == root) {
    MPI_Comm_size(comm, &gsize);                     // find group size
    rbuf = (int *) malloc(gsize*100*sizeof(int));    // allocate receive buffer (root only)
}

// MPI_Gather is run by all processes at the same time
MPI_Gather(sendarray, 100, MPI_INT, rbuf, 100, MPI_INT, root, comm);


How to use MPI_Send and MPI_Recv to achieve the equivalent
outcome as MPI_Gather?

All processes perform

MPI_Send(sendbuf, sendcount, sendtype, root, …)

and the root process calls MPI_Recv once for each process i (i=0, …, n-1):

MPI_Recv(recvbuf + i*recvcount*sizeof(recvtype), recvcount,
recvtype, i, …)

(see the sketch below)
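A sketch of this gather-by-hand for MPI_INT data is given below; the one
deviation from the recipe above is that the root copies its own contribution
locally instead of sending a message to itself (a blocking send-to-self would
rely on system buffering):

#include <mpi.h>
#include <string.h>

void gather_by_hand(int *sendbuf, int *recvbuf, int count,
                    int root, MPI_Comm comm)
{
    int rank, size, i;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    if (rank != root) {
        /* non-root processes just send their block to the root */
        MPI_Send(sendbuf, count, MPI_INT, root, 0, comm);
    } else {
        /* root copies its own block, then receives one block from each
           other process into the i-th slot of recvbuf */
        memcpy(recvbuf + root * count, sendbuf, count * sizeof(int));
        for (i = 0; i < size; i++) {
            if (i == root) continue;
            MPI_Recv(recvbuf + i * count, count, MPI_INT, i, 0, comm,
                     MPI_STATUS_IGNORE);
        }
    }
}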


MPI_Scatter(sendbuf, sendcount, sendtype, recvbuf, recvcount,
recvtype, root, comm)

Collective data movement function

sendbuf address of send buffer (significant only at root; note it must hold sendcount elements for every process)
sendcount no. of elements sent to each (>=0)
sendtype datatype of send buffer elements
recvbuf address of recv buffer
recvcount no. of elements received by each
recvtype datatype of recv buffer elements
root process id of root process
comm communicator

Collective operations

[Diagram: MPI_SCATTER – one-to-all scatter. Before the call the root
holds the data A0 A1 A2 A3; after the call process i holds Ai.]


Example of MPI_Scatter

MPI_Scatter is the reverse of MPI_Gather

MPI_Comm comm;
int gsize, *sendbuf;
int root, myrank, rbuf[100];

MPI_Comm_rank(comm, &myrank);    // find proc. id
if (myrank == root) {
    MPI_Comm_size(comm, &gsize);
    sendbuf = (int *) malloc(gsize*100*sizeof(int));  // send buffer only needed on root
}
…
MPI_Scatter(sendbuf, 100, MPI_INT, rbuf, 100, MPI_INT, root, comm);

It is as if the root sends the i-th block of sendbuf using

MPI_Send(sendbuf + i*sendcount*sizeof(int), sendcount, sendtype, pidi, …)

where pidi is the process id of the i-th process (see the sketch below)
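Correspondingly, a sketch of a scatter-by-hand for MPI_INT data, with the
root again copying its own block locally rather than sending to itself:

#include <mpi.h>
#include <string.h>

void scatter_by_hand(int *sendbuf, int *recvbuf, int count,
                     int root, MPI_Comm comm)
{
    int rank, size, i;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    if (rank == root) {
        for (i = 0; i < size; i++) {
            if (i == root)
                /* keep the root's own block with a local copy */
                memcpy(recvbuf, sendbuf + i * count, count * sizeof(int));
            else
                /* send the i-th block of sendbuf to process i */
                MPI_Send(sendbuf + i * count, count, MPI_INT, i, 0, comm);
        }
    } else {
        MPI_Recv(recvbuf, count, MPI_INT, root, 0, comm, MPI_STATUS_IGNORE);
    }
}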


MPI_Reduce(sendbuf, recvbuf, count, type, op, root, comm)

op is performed over the data in the sendbuf of all processes and the
result is put in the recvbuf on the root

sendbuf address of send buffer
recvbuf address of recv buffer
count no. of elements in send buffer (>=0)
type datatype of send buffer elements
op operation
root process id of root process
comm communicator

Collective operations

[Diagram: MPI_REDUCE with op = MPI_MIN and root = 0. The four processes
hold the data (2,4), (5,7), (0,3) and (6,2); after the call, process 0
holds the element-wise minimum (0,2).]


MPI_Allreduce(sendbuf, recvbuf, count, type, op, comm)

op is performed over the data in the sendbuf of all processes and the
result is put in the recvbuf of every process (note that, unlike
MPI_Reduce, there is no root parameter)

sendbuf address of send buffer
recvbuf address of recv buffer
count no. of elements in send buffer (>=0)
type datatype of send buffer elements
op operation
comm communicator

Collective operations

[Diagram: MPI_ALLREDUCE with op = MPI_MIN. The four processes hold the
data (2,4), (5,7), (0,3) and (6,2); after the call, every process holds
the element-wise minimum (0,2).]
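A minimal sketch: every process contributes one local value and every process
receives the global minimum (with MPI_Reduce only the root would receive it):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, local, global_min;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    local = (rank * 7) % 5;     /* some arbitrary local value */
    MPI_Allreduce(&local, &global_min, 1, MPI_INT, MPI_MIN, MPI_COMM_WORLD);

    printf("process %d: global minimum = %d\n", rank, global_min);
    MPI_Finalize();
    return 0;
}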


Operations performed in MPI_Reduce

sendbuf and recvbuf are arrays; the reduction is applied element by
element. For example, with three processes (Proc 0, Proc 1, Proc 2) and
a sum op:

recvbuf[0] = sum(proc0.sendbuf[0], proc1.sendbuf[0], proc2.sendbuf[0])

recvbuf[1] = sum(proc0.sendbuf[1], proc1.sendbuf[1], proc2.sendbuf[1])
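A small sketch of such an element-wise reduction over arrays, assuming
MPI_SUM as the op and process 0 as the root:

#include <mpi.h>

#define N 4

int main(int argc, char *argv[])
{
    int rank, i;
    int sendbuf[N], recvbuf[N];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < N; i++)
        sendbuf[i] = rank + i;    /* each process fills its own array */

    /* Only process 0 receives the element-wise sums in recvbuf */
    MPI_Reduce(sendbuf, recvbuf, N, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}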


Blocking and non-blocking communications

• Blocking send

MPI_Send(buf, count, datatype, dest, tag, comm)

• The sender does not return until the application buffer can be re-used (which
often means that the data have been copied from the application buffer to a
system buffer) // note: it does not mean that the data have been received


Buffering in MPI communications

MPI_Send(buf, count, datatype, dest, tag, comm)

• Application buffer: specified by the first parameter of the
MPI_Send/Recv functions

• System buffer:

  o hidden from the programmer and managed by the MPI library

  o limited in size and can easily be exhausted



Blocking and non-blocking communications

• Blocking receive

• The receiver does not return until the data are ready to be used by the
receiver (which often means that the data have been copied from the system
buffer to the application buffer)

• Non-blocking send/receive

MPI_Isend(buf, count, datatype, dest, tag, comm, request)

• The calling process returns immediately

• It just requests that the MPI library perform the communication; there is
no guarantee when this will happen

• It is unsafe to modify the application buffer until you can make sure the
requested operation has been performed (MPI provides routines to test this)

• Can be used to overlap computation with communication, with possible
performance gains


Testing non-blocking communications

• Completion tests come in two types:

  • WAIT type

  • TEST type

• WAIT type: the WAIT-type routines block until the
communication has been completed

  • A non-blocking communication immediately followed by a WAIT-
type test is equivalent to the corresponding blocking
communication

• TEST type: the TEST-type routines return immediately with
a TRUE or FALSE value

  • The process can perform other work if the
communication has not yet been completed


Testing non-blocking communications
for completion

The WAIT-type test is:

MPI_Wait(request, status)

This routine blocks until the communication specified by the
request handle has completed. The request handle will have been
returned by an earlier call to a non-blocking communication
routine.

The TEST-type test is:

MPI_Test (request, flag, status)

In this case the communication specified by the handle request is
simply queried to see whether it has completed, and the result of
the query (TRUE or FALSE) is returned in flag.
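A sketch of the typical usage pattern: start a non-blocking send, do some
computation that does not touch the application buffer, then call MPI_Wait
before reusing the buffer (do_independent_work is a hypothetical placeholder):

#include <mpi.h>

static void do_independent_work(void)
{
    /* hypothetical computation that does not touch sendbuf */
}

void overlapped_send(int *sendbuf, int count, int dest, MPI_Comm comm)
{
    MPI_Request request;
    MPI_Status  status;

    MPI_Isend(sendbuf, count, MPI_INT, dest, 0, comm, &request);  // returns immediately

    do_independent_work();        // safe: sendbuf is not modified here

    MPI_Wait(&request, &status);  // blocks until the send has completed;
                                  // only now may sendbuf be reused
}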


Testing non-blocking communications
for completion

Wait for all communications to complete:

MPI_Waitall (count, array_of_requests, array_of_statuses)

This routine blocks until all the communications specified by the
request handles in array_of_requests have completed. The statuses
of the communications are returned in array_of_statuses, and each
can be queried in the usual way for the source and tag if required.

Test whether all communications have completed:

MPI_Testall (count, array_of_requests, flag, array_of_statuses)

If all the communications have completed, flag is set to TRUE and
information about each of them is returned in array_of_statuses.
Otherwise flag is set to FALSE and array_of_statuses is undefined.
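For example, a sketch that posts a non-blocking receive from every other
process and then waits for all of them with MPI_Waitall (it assumes recvbuf
has room for count ints per process):

#include <mpi.h>
#include <stdlib.h>

void receive_from_all(int *recvbuf, int count, MPI_Comm comm)
{
    int rank, size, i, n = 0;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    MPI_Request *requests = (MPI_Request *) malloc((size - 1) * sizeof(MPI_Request));
    MPI_Status  *statuses = (MPI_Status *)  malloc((size - 1) * sizeof(MPI_Status));

    for (i = 0; i < size; i++) {
        if (i == rank) continue;
        /* receive process i's block into the i-th slot of recvbuf */
        MPI_Irecv(recvbuf + i * count, count, MPI_INT, i, 0, comm,
                  &requests[n++]);
    }

    MPI_Waitall(n, requests, statuses);   // blocks until every receive is done

    free(requests);
    free(statuses);
}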


Testing non-blocking communications
for completion

Query a number of communications at a time to find out whether any
of them have completed.

Wait: MPI_Waitany (count, array_of_requests, index, status)

MPI_Waitany blocks until one or more of the communications
associated with the array of request handles, array_of_requests,
has completed.

The index of the completed communication within array_of_requests
is returned in index, and its status is returned in status.

Should more than one communication have completed, the choice of
which one is returned is arbitrary.

Test: MPI_Testany (count, array_of_requests, index, flag, status)

• The result of the test (TRUE or FALSE) is returned immediately
in flag.