Dr Michael K Bane, G14, Computer Science, University of Liverpool
m.k. .uk https://cgi.csc.liv.ac.uk/~mkbane/COMP528
COMP528: Multi-core and Multi-Processor Programming
12 – HAL: Overview of MPI Collective Comms
If you were designing MPI…
• Any “common” patterns?
• Typical code might be
int main(void) {
   … declarations
   MPI_Init(NULL, NULL);
   MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
   if (myRank == 0) {
      … set up the problem            // including problem size
   }
   MPI_Bcast(&problem_size, …)
   MPI_Scatter(&inputArray, …)
   for (t=now; t<=then; t+=deltaT) {
      … do some of the work           // on my portion of inputArray
      … global update
      write_results_to_file()
   }
   … release memory
   MPI_Finalize();
}
Annotations on this skeleton:
• “Set up the problem” probably happens on the root process only, e.g. read from file|stdin
• “Share work” means distributing data to work on, so that each rank does parallel work:
   • tell each process how big its part of the problem is (why?) – a “broadcast” of the same data to all
   • data decomposition – “scatter” the input array so each MPI process has some of it, and that is what it will work on
• The global update might be MPI_Reduce(…), OR updating a global matrix by “gathering” the local inputArray from each process
• If only neighbour updates are needed: could use many pairs of MPI_Send & MPI_Recv, but more likely MPI_Sendrecv
• write_results_to_file(): any issues with everybody doing this? Any efficiency issues?
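A minimal compilable sketch of this skeleton, assuming a hypothetical array-summation problem: the names problem_size, inputArray and local, the invented data, and the choice of MPI_Reduce with MPI_SUM for the global update are illustrative only, and the time-stepping loop is omitted for brevity.

   #include <stdio.h>
   #include <stdlib.h>
   #include <mpi.h>

   int main(void) {
       int myRank, numProcs, problem_size = 0;
       double *inputArray = NULL, *local, localSum = 0.0, globalSum = 0.0;

       MPI_Init(NULL, NULL);
       MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
       MPI_Comm_size(MPI_COMM_WORLD, &numProcs);

       if (myRank == 0) {
           /* set up the problem on the root only, e.g. read from file|stdin;
              here it is invented data so the sketch is self-contained */
           problem_size = 1000 * numProcs;             /* divisible by numProcs */
           inputArray = malloc(problem_size * sizeof(double));
           for (int i = 0; i < problem_size; i++) inputArray[i] = 1.0;
       }

       /* tell every process how big its part of the problem is */
       MPI_Bcast(&problem_size, 1, MPI_INT, 0, MPI_COMM_WORLD);

       int localN = problem_size / numProcs;
       local = malloc(localN * sizeof(double));

       /* data decomposition: scatter the input array across the ranks */
       MPI_Scatter(inputArray, localN, MPI_DOUBLE,
                   local, localN, MPI_DOUBLE, 0, MPI_COMM_WORLD);

       /* do some of the work on my portion */
       for (int i = 0; i < localN; i++) localSum += local[i];

       /* global update: one option is a reduction onto the root */
       MPI_Reduce(&localSum, &globalSum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

       /* write results on the root only, so every rank is not writing the same file */
       if (myRank == 0) printf("global sum = %f\n", globalSum);

       free(local);
       if (myRank == 0) free(inputArray);
       MPI_Finalize();
       return 0;
   }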
[some] Available MPI COLLECTIVE Functions
• MPI_Bcast
   • Sending the same data to all ranks
• MPI_Scatter
   • Distributing data between all ranks
• MPI_Gather
   • Collecting together data from all ranks
• MPI_Reduce
   • Pulling data from all ranks and doing a “reduction operation” (summation, product, etc.)

MPI Collective Communications
• All MPI processes in the given communicator have to call the function
• MPI_COMM_WORLD is the name of the communicator that refers to all MPI processes; it is defined at the start of program execution
   • Possible to later divide this up into new communicators
• They will all participate and do something
• For some there is the concept of a “root process”
   • Nearly always, this ‘root process’ is the same rank throughout a program execution
   • Usually people have rank 0 as the root
• We only cover blocking collective communications

MPI_Bcast
• Diagram: MPI_Bcast(&x, 3, MPI_FLOAT, 0, MPI_COMM) – rank 0 holds A1 A2 A3 A4 A5 in x; after the call, ranks 1–3 each hold the first 3 elements A1 A2 A3 in their own variables (rank 0 keeps its full array)

MPI_Scatter
• Diagram: MPI_Scatter(&global_x, 3, MPI_FLOAT, &local_x, 3, MPI_FLOAT, 0, MPI_COMM) – rank 0 holds the 15 elements A…O in global_x; after the call, each of ranks 0–4 holds 3 consecutive elements in local_x (rank 0: A B C, rank 1: D E F, …, rank 4: M N O)

Scatter then Gather?
GATHER
• The “inverse” of Scatter
• Comparable to each rank sending its local data, in turn, to “root”, which assembles the global data in order
• MPI_Gather(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, root, comm)
SCATTER
• The “inverse” of Gather
• Comparable to “root” sending N elements to each rank (including itself) in ascending rank order
• MPI_Scatter(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, root, comm)

MPI_Gather
• Diagram: MPI_Gather(&local_x, 3, MPI_FLOAT, &global_x, 3, MPI_FLOAT, 0, MPI_COMM) – each of ranks 0–4 holds 3 elements in local_x; after the call, rank 0’s global_x contains all 15 elements A…O, assembled in rank order
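A sketch of the scatter-then-gather round trip pictured above, assuming the job is launched with exactly 5 processes (as in the diagrams) and keeping the slides’ global_x/local_x names; MPI_COMM in the diagrams is written here as MPI_COMM_WORLD, and the doubling of each element is just an illustrative local computation.

   #include <stdio.h>
   #include <mpi.h>

   int main(void) {
       int rank, size;
       float global_x[15], local_x[3];

       MPI_Init(NULL, NULL);
       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
       MPI_Comm_size(MPI_COMM_WORLD, &size);

       /* the diagrams assume exactly 5 ranks, 3 elements each */
       if (size != 5) {
           if (rank == 0) fprintf(stderr, "run with exactly 5 processes\n");
           MPI_Finalize();
           return 1;
       }

       if (rank == 0)
           for (int i = 0; i < 15; i++) global_x[i] = (float)i;   /* stand-ins for A..O */

       /* root sends 3 consecutive elements to each rank (including itself) */
       MPI_Scatter(global_x, 3, MPI_FLOAT,
                   local_x, 3, MPI_FLOAT, 0, MPI_COMM_WORLD);

       /* each rank works on its own portion */
       for (int i = 0; i < 3; i++) local_x[i] *= 2.0f;

       /* the inverse operation: reassemble the pieces on the root, in rank order */
       MPI_Gather(local_x, 3, MPI_FLOAT,
                  global_x, 3, MPI_FLOAT, 0, MPI_COMM_WORLD);

       if (rank == 0)
           printf("global_x[14] = %.1f\n", global_x[14]);         /* prints 28.0 */

       MPI_Finalize();
       return 0;
   }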
Scatter then Gather?
GATHER
• Send “sendcount” elements of “sendbuf” from all processes (within “comm”) to “root”, which receives into “recvbuf” (in the expected rank ordering)
• MPI_Gather(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, root, comm)
SCATTER
• “root” sends “sendcount” elements of “sendbuf” to all processes (within “comm”), which receive into their “recvbuf”
• MPI_Scatter(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, root, comm)
• sendcount & recvcount, AND sendtype & recvtype, will generally be the same in a given call

[some] Available MPI COLLECTIVE Functions – the “v” and “All” varieties
• MPI_Bcast
   • Sending the same data to all ranks
• MPI_Scatter / MPI_Scatterv
   • Distributing data between all ranks, with the same size of data chunks (Scatter) or with optionally varying sizes (Scatterv)
• MPI_Gather / MPI_Gatherv and MPI_Allgather / MPI_Allgatherv
   • Collecting together data from all ranks, with the same size of data chunks or with optionally varying sizes (the “v” versions)
• MPI_Reduce / MPI_Allreduce
   • Pulling data from all ranks and doing a “reduction operation” (summation, product, etc.)
• “All” varieties – the final result ends up not just on “root” but on all processes (a minimal sketch is given after the closing slide)

Three types of “Collective”
• Collective communication (data movement)
   • MPI_Bcast, MPI_Scatter, MPI_Gather
• Collective synchronisation
• Collective computation
   • MPI_Reduce – data movement with some math
• NEXT TIME: more detail & more collectives

Questions via MS Teams / email
Dr Michael K Bane, Computer Science, University of Liverpool
m.k. .uk https://cgi.csc.liv.ac.uk/~mkbane
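As noted under the “All” varieties above, a minimal sketch of MPI_Allreduce: every rank contributes a value (here its rank number, an illustrative choice) and, unlike MPI_Reduce, every rank receives the combined result.

   #include <stdio.h>
   #include <mpi.h>

   int main(void) {
       int rank, size;
       MPI_Init(NULL, NULL);
       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
       MPI_Comm_size(MPI_COMM_WORLD, &size);

       double mine = (double)rank;   /* each rank's local contribution */
       double total;

       /* unlike MPI_Reduce, the result lands on ALL processes, not just the root */
       MPI_Allreduce(&mine, &total, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

       /* every rank prints the same value: 0 + 1 + ... + (size-1) */
       printf("rank %d of %d sees total = %f\n", rank, size, total);

       MPI_Finalize();
       return 0;
   }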