COMP528: Multi-core and Multi-Processor Programming
Lecture 10 (HAL): MPI Modes
Dr Michael K Bane, G14, Computer Science, University of Liverpool
m.k. .uk https://cgi.csc.liv.ac.uk/~mkbane/COMP528
DISTRIBUTED MEMORY
• Access memory of another node
• Latency & bandwidth issues
• InfiniBand (IB) vs. Gigabit Ethernet (GigE)
• Expandable (memory & nodes)
• Programming is almost always (99% of the time) MPI
• Message Passing Interface
• Library calls
• Different implementations of MPI standard
• Can use MPI on a shared memory node too
[Diagram: point-to-point communication between Core 0 and Core 1]
https://www.mpi-forum.org/
Sending and Receiving
• QUESTION: when is the instruction following a MPI_Send() or an MPI_Recv() executed?
[Diagram: four cores; one core calls MPI_Send(&x, …) then computes x = f(x,y); another core calls MPI_Recv(&x, …) then computes x = f(x,y)]
• for sender
• can we return from the call immediately
(irrespective of what is happening with ‘x’?)
• for receiver
• can we return from the call immediately
(irrespective of what is happening with ‘x’?)
• Options
• Blocking
• Non-blocking
• These options are key to performance and (non-)correctness
Key MPI terms from
the MPI Standard
Recall
MPI_Send(&myID, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);                              // executed by rank 0
MPI_Recv(&inputBuffer, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);    // executed by rank 1
[Diagram: rank 0 (Core 0) passes its ID to rank 1 (Core 1)]
A complete minimal pairing of these two calls is sketched below.
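To see how these two calls pair up inside a complete program, here is a minimal sketch (illustrative only; assumes the program is run with at least two processes):

/* sketch: rank 0 sends its ID to rank 1 */
#include <stdio.h>
#include <mpi.h>

int main(void) {
    int myID, inputBuffer;

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &myID);

    if (myID == 0) {
        // rank 0: destination is rank 1, tag 0
        MPI_Send(&myID, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (myID == 1) {
        // rank 1: source 0 and tag 0 must match the send
        MPI_Recv(&inputBuffer, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank %d received %d from rank 0\n", myID, inputBuffer);
    }

    MPI_Finalize();
    return 0;
}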
• So, if a send is blocking and a receive is blocking
• what situation is likely to occur?
DEADLOCK
• Very bad
• Neither the sender nor the receiver process will be able to continue (a sketch of a deadlock-prone exchange follows)
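As a minimal sketch (my own illustration, not from the slides) of how this arises in practice: in a ring exchange where every rank posts a blocking send before its receive, each send can end up waiting for a receive that is never reached.

/* sketch: deadlock-prone ring exchange with blocking calls */
#include <mpi.h>

#define N 100000   /* large enough that the message is unlikely to be buffered internally */

int main(void) {
    int myID, comm_sz;
    double out[N], in[N];

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &myID);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    int sendTo   = (myID + 1) % comm_sz;             /* next rank around the ring */
    int recvFrom = (myID + comm_sz - 1) % comm_sz;   /* previous rank around the ring */
    for (int i = 0; i < N; i++) out[i] = myID;       /* something to send */

    /* EVERY rank sends first: each blocking MPI_Send may wait for a matching
       receive, but no rank ever reaches its MPI_Recv -- so all ranks can hang here */
    MPI_Send(out, N, MPI_DOUBLE, sendTo, 0, MPI_COMM_WORLD);
    MPI_Recv(in, N, MPI_DOUBLE, recvFrom, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}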
Solution?
• Use “non-blocking” send
• Use “non-blocking” receive
• Usually, the send-receive isn’t just 2 processes
but generally part of a pattern
such as “everybody send to the next [rank+1] process”
• in which case, we can alternate the ordering, e.g.
• Even processes post their send and then their recv
• Odd processes post their receive and then their send
[Timeline diagram (time running downwards); each rank's ordering of calls:]
rank 0: send to rank 1, then recv from rank 2
rank 1: recv from rank 0, then send to rank 2
rank 2: send to rank 0, then recv from rank 1
(a code sketch of this even/odd ordering follows)
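A minimal code sketch (my own, illustrating the ordering in the diagram above) of the even/odd approach with blocking calls; it assumes at least two processes:

/* sketch: even/odd ordering of blocking send/recv in a ring */
#include <mpi.h>

#define N 100000

int main(void) {
    int myID, comm_sz;
    double out[N], in[N];

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &myID);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    int sendTo   = (myID + 1) % comm_sz;
    int recvFrom = (myID + comm_sz - 1) % comm_sz;
    for (int i = 0; i < N; i++) out[i] = myID;

    if (myID % 2 == 0) {
        /* even ranks: send first, then receive */
        MPI_Send(out, N, MPI_DOUBLE, sendTo, 0, MPI_COMM_WORLD);
        MPI_Recv(in, N, MPI_DOUBLE, recvFrom, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else {
        /* odd ranks: receive first, then send -- breaking the cycle of
           ranks all sitting in a blocking send */
        MPI_Recv(in, N, MPI_DOUBLE, recvFrom, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(out, N, MPI_DOUBLE, sendTo, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}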
• This is such a common pattern that MPI supports an efficient implementation via the combined point-to-point call MPI_Sendrecv (a sketch follows)
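A minimal sketch (my own; argument order as in the MPI standard) of the same ring exchange using MPI_Sendrecv, letting the library handle the ordering of the two halves:

/* sketch: ring exchange with the combined MPI_Sendrecv call */
#include <mpi.h>

#define N 100000

int main(void) {
    int myID, comm_sz;
    double out[N], in[N];

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &myID);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    int sendTo   = (myID + 1) % comm_sz;
    int recvFrom = (myID + comm_sz - 1) % comm_sz;
    for (int i = 0; i < N; i++) out[i] = myID;

    /* send 'out' to the next rank and receive 'in' from the previous rank;
       MPI orders the two halves internally so the exchange cannot deadlock */
    MPI_Sendrecv(out, N, MPI_DOUBLE, sendTo,   0,
                 in,  N, MPI_DOUBLE, recvFrom, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}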
Non-blocking communications
• The executing code initiates the MPI_X call and then immediately moves on to execute the following statements, even if the MPI_X call has not completed
• e.g. the send may still be sending
• e.g. the receive may still be receiving
NON-blocking
• A good thing, yes?
• Means that control immediately returns to the C program
• Higher efficiency than waiting
• BUT…?
• Non-blocking send
double myX[1000000];    // big buffer
MPI_Isend(myX, …);      // returns immediately
myX[k] = -myX[k];       // danger: myX may still be being sent

• Non-blocking recv
double myX[1000000], newX[1000000];
MPI_Irecv(newX, …);     // returns immediately
for (i=0; i<1000000; i++) {
   myX[i] = f(newX[i]); // danger: newX may not have (fully) arrived yet
}

• Non-blocking removes the wait / ‘synchronisation’
• for the sender: would we update values of ‘x’ before MPI has finished with x…
• for the receiver: would we read and use ‘x’ before a matching message has been (fully) received…
• Blocking & Non-blocking
• Blocking: the call waits until the operation has completed (as far as this process is concerned) before returning to the code & executing the next statement
• Non-blocking: the call starts the operation in the background and quickly returns to the code & executes the next statement
         Default      Non-blocking   Synchronous (user)   Buffered    Others…
Send     MPI_Send     MPI_Isend      MPI_Ssend            MPI_Bsend
Recv     MPI_Recv     MPI_Irecv

• Default: blocking (but with some implementation dependencies)
• Non-blocking: "I" for "immediate"; completion is checked with Wait or Test
• Synchronous: completes only once the message has been received
MPI_Send (buf, count, datatype, dest, tag, comm);
MPI_Isend(buf, count, datatype, dest, tag, comm, MPI_Request *request);
MPI_Recv (buf, count, datatype, source, tag, comm, MPI_Status *status);
MPI_Irecv(buf, count, datatype, source, tag, comm, MPI_Request *request);
• Each of the “immediate” (non-blocking) calls has an MPI_Request comms handle
• This is then used by MPI_Wait || MPI_Waitall || MPI_Test (a sketch using Waitall and Test follows the example below)
/* simple example for COMP528, (c) University of Liverpool */
#include <stdio.h>
#include <mpi.h>

int main(void) {
   const int vol=500000;
   int myID, comm_sz;
   double out[vol], in[vol];
   int sendTo, recvFrom;
   MPI_Request request;

   MPI_Init(NULL,NULL);
   MPI_Comm_rank(MPI_COMM_WORLD, &myID);
   MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

   if (myID>1) {
      // no op
   }
   else {
      if (myID==0) {
         // sender
         MPI_Isend(&out[0], vol, MPI_DOUBLE, 1, 99, MPI_COMM_WORLD, &request);
         // now wait on request: only safe to touch 'out' once the send has completed
         MPI_Wait(&request, MPI_STATUS_IGNORE);
         out[vol-1] = 0.0; // example operation
         printf("all sent by %d\n", myID);
      }
      else {
         // ID=1: recv-er
         MPI_Irecv(&in[0], vol, MPI_DOUBLE, 0, 99, MPI_COMM_WORLD, &request);
         // wait on request: only safe to read 'in' once the receive has completed
         MPI_Wait(&request, MPI_STATUS_IGNORE);
         out[vol-1] = in[vol-1]; // example operation
         printf("all recv-ed by %d\n", myID);
      }
   }
   MPI_Finalize();
   return 0;
}
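The example above completes each request with MPI_Wait. As a hedged sketch (my own, not from the slides) of the other completion calls mentioned, MPI_Test can be used to poll while doing other work, and MPI_Waitall completes several requests at once:

/* sketch: Isend/Irecv completed with MPI_Test (polling) and MPI_Waitall */
#include <stdio.h>
#include <mpi.h>

#define N 100000

int main(void) {
    int myID, comm_sz, done = 0;
    double out[N], in[N];
    MPI_Request request[2];

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &myID);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    int sendTo   = (myID + 1) % comm_sz;
    int recvFrom = (myID + comm_sz - 1) % comm_sz;
    for (int i = 0; i < N; i++) out[i] = myID;

    MPI_Irecv(in,  N, MPI_DOUBLE, recvFrom, 0, MPI_COMM_WORLD, &request[0]);
    MPI_Isend(out, N, MPI_DOUBLE, sendTo,   0, MPI_COMM_WORLD, &request[1]);

    /* ... useful work that touches neither 'out' nor 'in' could go here ... */

    /* MPI_Test: non-blocking check; sets 'done' non-zero if the receive has completed */
    MPI_Test(&request[0], &done, MPI_STATUS_IGNORE);

    /* MPI_Waitall: block until BOTH requests have completed
       (already-completed requests are simply skipped) */
    MPI_Waitall(2, request, MPI_STATUSES_IGNORE);

    printf("rank %d: exchange complete (Test had flagged %d)\n", myID, done);

    MPI_Finalize();
    return 0;
}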
Block or not to Block
• What about MPI_Send(…) ?
• It is known as a blocking send
• Call only returns once x has been copied to a buffer
• BUT which buffer? An internal MPI buffer or the receiver buffer…
• implementation dependent!
• Generally,
• Small messages will get copied to internal buffer: decouples
send & recv
• Large messages will only get copied to the receiver buffer
(to save overhead of setting up large system buffer)
• and this then requires a matching MPI_Recv in order for the MPI_Send to
complete and then be able to move on to the next statement
• Therefore:
Always treat MPI_Send as blocking (i.e. as if it waits on a matching Recv); a sketch of a buffering-dependence check follows
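One practical consequence (my own note, not from the slides): code that only runs because MPI_Send happened to use an internal buffer is not portable. A common check is to swap MPI_Send for the synchronous MPI_Ssend, which never completes until the matching receive has been posted, so any hidden reliance on buffering shows up as a hang. A deliberately broken sketch:

/* sketch: exposing reliance on internal buffering with MPI_Ssend */
#include <mpi.h>

#define N 1000   /* small message: a plain MPI_Send would likely be buffered and "work" */

int main(void) {
    int myID, comm_sz;
    double out[N], in[N];

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &myID);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    int sendTo   = (myID + 1) % comm_sz;
    int recvFrom = (myID + comm_sz - 1) % comm_sz;
    for (int i = 0; i < N; i++) out[i] = myID;

    /* every rank sends first; with MPI_Ssend no send can complete until the
       matching receive is posted, so the latent deadlock is now visible */
    MPI_Ssend(out, N, MPI_DOUBLE, sendTo, 0, MPI_COMM_WORLD);
    MPI_Recv(in, N, MPI_DOUBLE, recvFrom, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}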
MPI Labs
• Compile & run an MPI code DONE
• See the ‘local’ nature of vars DONE
• MPI point-to-point comms DONE / more to come!
• Weeks 5&6 labs
• Point-to-point comms for all processes to pass on some data
• And avoiding deadlock
• The combined MPI_Sendrecv call
• example of effect of #data items on performance
• MPI wall-clock timer (MPI_Wtime; a sketch follows this list)
• Use of some useful MPI collectives
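The labs use the MPI wall-clock timer; here is a minimal sketch (my own) of timing a region with MPI_Wtime:

/* sketch: timing a region with MPI_Wtime */
#include <stdio.h>
#include <mpi.h>

int main(void) {
    int myID;

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &myID);

    double t0 = MPI_Wtime();     /* wall-clock time in seconds, per process */
    /* ... region of interest (computation and/or communication) ... */
    double t1 = MPI_Wtime();

    printf("rank %d: region took %f s (timer resolution %g s)\n",
           myID, t1 - t0, MPI_Wtick());

    MPI_Finalize();
    return 0;
}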
MPI_Status
• Recall
MPI_Send(*buf, count, datatype, dest, tag, comm);
MPI_Recv(*buf, maxCount, datatype, source, tag, comm, MPI_Status *status);
• What is this for?
• Contains info on what is actually received
• Actual #items in buf (via MPI_Get_count)
• Actual source & actual tag (via members of status var returned)
• A receive can use “MPI_ANY_TAG” and “MPI_ANY_SOURCE” as wildcard matches
if (myID%2==0) {
   MPI_Send(&myID, 1, MPI_INT, sendTo, 0, MPI_COMM_WORLD);
   MPI_Recv(&inputBuffer, 1, MPI_INT, recvFrom, 0, MPI_COMM_WORLD, &stat);
   MPI_Get_count(&stat, MPI_INT, &numRecv);   // actual number of items received
   printf("Even %d has received %d of MPI_INT from %d using tag %d\n", myID, numRecv, stat.MPI_SOURCE, stat.MPI_TAG);
}
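As a minimal sketch (my own example) of the wildcard matching mentioned above: rank 0 receives one message from every other rank using MPI_ANY_SOURCE / MPI_ANY_TAG and then reads the actual source and tag from the status.

/* sketch: wildcard receives, in whatever order the messages arrive */
#include <stdio.h>
#include <mpi.h>

int main(void) {
    int myID, comm_sz, inputBuffer;
    MPI_Status stat;

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &myID);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    if (myID != 0) {
        /* every other rank sends its ID to rank 0, tagged with its own rank */
        MPI_Send(&myID, 1, MPI_INT, 0, myID, MPI_COMM_WORLD);
    } else {
        for (int i = 1; i < comm_sz; i++) {
            /* match a message from ANY rank with ANY tag */
            MPI_Recv(&inputBuffer, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &stat);
            printf("rank 0 got %d from rank %d (tag %d)\n",
                   inputBuffer, stat.MPI_SOURCE, stat.MPI_TAG);
        }
    }

    MPI_Finalize();
    return 0;
}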
Questions via MS Teams / email
Dr Michael K Bane, Computer Science, University of Liverpool
m.k. .uk https://cgi.csc.liv.ac.uk/~mkbane