Dr Michael K Bane, G14, Computer Science, University of Liverpool
m.k. .uk https://cgi.csc.liv.ac.uk/~mkbane/COMP528
COMP528: Multi-core and
Multi-Processor Programming
13 – HAL
NAME
MPI_Send – Performs a blocking send
SYNOPSIS
int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
INPUT PARAMETERS
buf – initial address of send buffer (choice)
count – number of elements in send buffer (non-negative integer)
datatype – datatype of each send buffer element (handle)
dest – rank of destination (integer)
tag – message tag (integer)
comm – communicator (handle)
NAME
MPI_Recv – Blocking receive for a message
SYNOPSIS
int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm,
MPI_Status *status)
OUTPUT PARAMETERS
buf – initial address of receive buffer (choice)
status – status object (Status)
INPUT PARAMETERS
count – maximum number of elements in receive buffer (integer)
datatype – datatype of each receive buffer element (handle)
source – rank of source (integer)
tag – message tag (integer)
comm – communicator (handle)
• Same “message data” syntax: what to send|recv, how many, what data type
• Requires a matching “message envelope” for the message to “complete”
• Nothing changes on the sender, BUT the receiver gets new data in “buf” and “status”
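A minimal sketch (not from the slides; the tag value 99 and the int payload are illustrative) of a matching MPI_Send / MPI_Recv pair:

#include <mpi.h>
#include <stdio.h>
int main(int argc, char *argv[]) {
    int myRank, x = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
    if (myRank == 1) {
        x = 42;
        MPI_Send(&x, 1, MPI_INT, 0, 99, MPI_COMM_WORLD);          /* dest=0, tag=99 */
    } else if (myRank == 0) {
        MPI_Status status;
        MPI_Recv(&x, 1, MPI_INT, 1, 99, MPI_COMM_WORLD, &status); /* src=1, tag=99: envelope matches */
        printf("rank 0 received %d\n", x);                        /* receiver now has new data in buf */
    }
    MPI_Finalize();
    return 0;
}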
Re-visit syntax
SYNOPSIS
int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
datatype – datatype of each send buffer element (handle)
• MPI_Datatype
• Pre-defined (via mpi.h), mapping C types to MPI datatypes:
C type      MPI_Datatype
int         MPI_INT
float       MPI_FLOAT
double      MPI_DOUBLE
…
Re-visit syntax
SYNOPSIS
int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
buf – initial address of send buffer (choice)
• “Pass by reference” – the buffer will be copied (fits the general MPI model)
• Pass the ADDRESS…
• Scalars: &x, &xyzzy – pass the variable’s address (x alone would be the value of the scalar)
• Vectors: by definition, if variable y is an array then “y” already acts as a pointer to its first element
• &y is thus the wrong thing to pass – wrong type (and, for a pointer variable, the address of the pointer): wrong
• &y[0] – pass the address of the variable’s first element to be copied: okay
• y is also okay (equivalent to &y[0] here)
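A small sketch of these buffer-argument rules (assumes at least 2 ranks; variable names are illustrative):

#include <mpi.h>
int main(int argc, char *argv[]) {
    int myRank;
    double x = 3.14, y[8] = {0};
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
    if (myRank == 0) {
        MPI_Send(&x, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);        /* scalar: pass &x           */
        MPI_Send(y, 8, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);         /* array: y is already a ptr */
    } else if (myRank == 1) {
        MPI_Status s;
        MPI_Recv(&x, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &s);
        MPI_Recv(&y[0], 8, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &s); /* &y[0] is equivalent to y  */
    }
    MPI_Finalize();
    return 0;
}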
Re-visit syntax
SYNOPSIS
int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
comm – communicator (handle)
• “comm” is the variable name of the required “communicator”
• “comm” is of type MPI_Comm (ie defined via mpi.h)
• We use the MPI_COMM_WORLD communicator, which is defined to be all processes (as launched via mpirun)
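A minimal sketch (not from the slides) of the communicator in use:

#include <mpi.h>
#include <stdio.h>
int main(int argc, char *argv[]) {
    int myRank, numProcs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);   /* this process’s rank in the communicator */
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs); /* total processes, as launched by mpirun  */
    printf("rank %d of %d\n", myRank, numProcs);
    MPI_Finalize();
    return 0;
}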
Syntax of Collectives similar to Syntax of Point to Point
NAME
MPI_Bcast – Broadcasts a message from the process with rank “root” to all other processes
of the communicator
SYNOPSIS
int MPI_Bcast( void *buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm )
INPUT/OUTPUT PARAMETER
buffer – starting address of buffer (choice)
INPUT PARAMETERS
count – number of entries in buffer (integer)
datatype – data type of buffer (handle)
root – rank of broadcast root (integer)
comm – communicator (handle)
• The MPI implementation may implement MPI_Bcast as a series of MPI_Send & MPI_Recv; if so, it has
to ensure no possible mix-up with the user’s point-to-point comms. There is no need for “dest”
since ALL processes are now involved (and “src” can be considered to be “root”)
• Same “message data” syntax: what to send|recv, how many, what data type
• “message envelope”: no tags (but… “root”)
• Upon completion, “buffer” on each process has a copy of the “buffer” values on “root”
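A minimal sketch of the broadcast (array size and contents are illustrative):

#include <mpi.h>
int main(int argc, char *argv[]) {
    int myRank;
    double params[4];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
    if (myRank == 0) {   /* only root needs meaningful initial values */
        params[0] = 1.0; params[1] = 2.0; params[2] = 3.0; params[3] = 4.0;
    }
    /* EVERY rank makes the same call: “buffer” is input on root, output elsewhere */
    MPI_Bcast(params, 4, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}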
Scatter: syntax like send+recv syntax
NAME
MPI_Scatter – Sends data from one process
to all other processes in a communicator
SYNOPSIS
int MPI_Scatter(void *sendbuf, int sendcount, MPI_Datatype sendtype,
void *recvbuf, int recvcount, MPI_Datatype recvtype,
int root,
MPI_Comm comm)
INPUT PARAMETERS
sendbuf – address of send buffer (choice, significant only at root)
sendcount – number of elements sent to each process (integer, significant only at root)
sendtype – data type of send buffer elements (significant only at root) (handle)
recvcount – number of elements in receive buffer (integer)
recvtype – data type of receive buffer elements (handle)
root – rank of sending process (integer)
comm – communicator (handle)
OUTPUT PARAMETER
recvbuf – address of receive buffer (choice)
• Same “message data” syntax for the send side: what to send, how many, what data type
• Same “message data” syntax for the recv side: what to recv, how many, what data type
• “message envelope”: no tags (but… “root”)
• Upon completion, “recvbuf” on each process has its share (a copy) of the “sendbuf” values on “root”
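A minimal sketch of the scatter (assumes exactly 4 processes; sizes are illustrative):

#include <mpi.h>
int main(int argc, char *argv[]) {
    int myRank;
    double all[12];   /* only meaningful on root    */
    double mine[3];   /* each rank’s share (a copy) */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
    if (myRank == 0)
        for (int i = 0; i < 12; i++) all[i] = (double) i;
    /* sendcount and recvcount are elements PER PROCESS, not the total */
    MPI_Scatter(all, 3, MPI_DOUBLE, mine, 3, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}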
Gather (inverse of scatter): syntax like send+recv syntax
NAME
MPI_Gather – Gathers together values
from a group of processes
SYNOPSIS
int MPI_Gather(void *sendbuf, int sendcount, MPI_Datatype sendtype,
void *recvbuf, int recvcount, MPI_Datatype recvtype,
int root, MPI_Comm comm)
INPUT PARAMETERS
sendbuf – starting address of send buffer (choice)
sendcount – number of elements in send buffer (integer)
sendtype – data type of send buffer elements (handle)
recvcount – number of elements for any single receive (integer, significant only at root)
recvtype – data type of recv buffer elements (significant only at root) (handle)
root – rank of receiving process (integer)
comm – communicator (handle)
OUTPUT PARAMETER
recvbuf – address of receive buffer (choice, significant only at root)
• Same “message data” syntax for the send side: what to send, how many, what data type
• Same “message data” syntax for the recv side: what to recv, how many, what data type
• “message envelope”: no tags (but… “root”)
• Upon completion, “recvbuf” on “root” has a rank-ordered collection of the copies of “sendbuf” from all processes
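A minimal sketch of the gather, reversing the scatter sketch above (assumes 4 processes):

#include <mpi.h>
int main(int argc, char *argv[]) {
    int myRank;
    double mine[3], all[12];   /* all[] only meaningful on root */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
    for (int i = 0; i < 3; i++) mine[i] = myRank * 10.0 + i;  /* illustrative values */
    /* recvcount is the number of elements received from EACH process;
       root stores them rank-ordered: ranks 0,1,2,3 fill all[0..2], all[3..5], ... */
    MPI_Gather(mine, 3, MPI_DOUBLE, all, 3, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}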
for (int i=myStartIter; i<=myFinishIter; i++) {
x = a + i*stepsize;
mySum += 0.5*stepsize*(func(x) + func(x+stepsize));
}
[Figure: trapezoids 0–5 distributed across processes – Rank 0: trapezoids 0,1; Rank 1: trapezoids 2,3; Rank 2: trapezoids 4,5]
So each MPI process sums its two trapezoids into “mySum”
Need to form globalSum on rank 0
Every process calls the MPI_Reduce collective
for (int i=myStartIter; i<=myFinishIter; i++) {
x = a + i*stepsize;
mySum += 0.5*stepsize*(func(x) + func(x+stepsize));
}
globalSum = 0.0;
MPI_Reduce(&mySum, &globalSum, 1, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD);
if (myRank==0) {
printf("TOTAL SUM: %f\n", globalSum )
}
Reminder of why we use collectives
• Simpler to write
• Clean code is good code
• Expectation that an MPI collective is more efficient than user-written functionality
• A good MPI implementation will make use of system knowledge
for (int i=myStartIter; i<=myFinishIter; i++) {
x = a + i*stepsize;
mySum += 0.5*stepsize*(func(x) + func(x+stepsize));
}
/* each rank passes its partial sum back to root for the global sum */
if (myRank != 0) {
MPI_Send(&mySum, 1, MPI_FLOAT, 0, 999, MPI_COMM_WORLD);
}
else { // myRank is 0
float inputSum;
for (int i=1; i<numProcs; i++) {   /* numProcs assumed set earlier via MPI_Comm_size */
MPI_Recv(&inputSum, 1, MPI_FLOAT, i, 999, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
mySum += inputSum;                 /* accumulate each rank’s contribution */
}
}
Element-wise Reduction
[Figure: four ranks, each holding a 5-element array “mySum”]
Rank 0 mySum: A1 A2 A3 A4 A5
Rank 1 mySum: B1 B2 B3 B4 B5
Rank 2 mySum: C1 C2 C3 C4 C5
Rank 3 mySum: D1 D2 D3 D4 D5
MPI_Reduce(mySum, globalSum, 3, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD);
With count=3, only the first three elements are reduced, element-wise.
Upon completion, globalSum on rank 0: A1+B1+C1+D1, A2+B2+C2+D2, A3+B3+C3+D3
“mySum” is unchanged on every rank; the trailing elements of “globalSum” remain undefined (??)
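A runnable sketch of the picture above (values and the 4-rank assumption are illustrative):

#include <mpi.h>
int main(int argc, char *argv[]) {
    int myRank;
    float mySum[5], globalSum[5];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
    for (int i = 0; i < 5; i++) mySum[i] = myRank + 0.1f * i;  /* illustrative values */
    /* count=3: element-wise sum of the first 3 elements only; result lands on rank 0 */
    MPI_Reduce(mySum, globalSum, 3, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}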
MPI_Reduce
NAME
MPI_Reduce – Reduces values on all processes to a single value
SYNOPSIS
int MPI_Reduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype,
MPI_Op op, int root, MPI_Comm comm)
INPUT PARAMETERS
sendbuf – address of send buffer (choice)
count – number of elements in send buffer (integer)
datatype – data type of elements of send buffer (handle)
op – reduce operation (handle)
root – rank of root process (integer)
comm – communicator (handle)
OUTPUT PARAMETER
recvbuf – address of receive buffer (choice, significant only at root)
• Reduce operation
• Commutative: x op y = y op x
• Presumed associative: x op y op z = (x op y) op z = x op (y op z), within rounding errors
MPI Reduction Operations
• Requirements
• commutative: x op y = y op x
• assumed to be associative: x op y op z = (x op y) op z = x op (y op z), within rounding errors
• Any more potential reduction operators…?
• A number are pre-defined
• Can also define your own (see the sketch after the tables below)
• Needs to meet the commutative & associative rules
Meets the rules (YES): +  *  Max  Min  MinLOC  MaxLOC
Does not (NO): /  –
Operation   MPI name
+           MPI_SUM
*           MPI_PROD
Min         MPI_MIN
Max         MPI_MAX
MinLOC      MPI_MINLOC
MaxLOC      MPI_MAXLOC
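A minimal sketch of defining your own operator with MPI_Op_create (this one simply re-implements MPI_MAX, purely for illustration; the function and variable names are assumptions):

#include <mpi.h>

/* user function: combine invec into inoutvec, element-wise; must be associative
   (and here commutative, which we declare to MPI_Op_create below) */
void myMax(void *invec, void *inoutvec, int *len, MPI_Datatype *dtype) {
    float *in = (float *) invec, *inout = (float *) inoutvec;
    for (int i = 0; i < *len; i++)
        if (in[i] > inout[i]) inout[i] = in[i];
}

int main(int argc, char *argv[]) {
    int myRank;
    float mine, best;
    MPI_Op MY_MAX;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
    mine = (float) myRank;             /* illustrative value     */
    MPI_Op_create(myMax, 1, &MY_MAX);  /* 1 => operator commutes */
    MPI_Reduce(&mine, &best, 1, MPI_FLOAT, MY_MAX, 0, MPI_COMM_WORLD);
    MPI_Op_free(&MY_MAX);
    MPI_Finalize();
    return 0;
}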
Questions via MS Teams / email
Dr Michael K Bane, Computer Science, University of Liverpool
m.k. .uk https://cgi.csc.liv.ac.uk/~mkbane