CSC 367 Parallel Programming
The Message Passing Paradigm: MPI
University of Toronto Mississauga, Department of Mathematical and Computational Sciences
Message passing paradigm
• Key principles:
• Partitioned address space (aimed at "shared-nothing" infrastructures)
• Requires explicit parallelization (more programming effort)
• Can achieve great scalability if done right
• Logical view:
  [Diagram: a submission host (process) dispatching work to multiple execution hosts (processes)]
• What are the implications? What advantages aside from scaling?
Message passing model
• Single Program Multiple Data (SPMD)
• All processes execute the same code, communicate via messages
• Technically, it does support executing different programs on each host/process
Why message passing?
• Heterogeneous environments, computations that need massive scaling
• Build a parallel multi-computer from lots and lots of cheap nodes
  • Nodes are connected with point-to-point links
  • Partition the data across the nodes, computation in parallel
  • If local data is needed on a remote node, send it over the interconnect
  • Computation is done collectively
• Can always add more and more nodes => bottleneck is not the number of cores or memory on a single node
  • Scale out instead of scale up
• Downside: harder to program
  • Every communication has to be hand-coded by the developer
Message passing – logical & physical view
• Logical view: each process is a separate entity with its own memory
  • Each process runs a separate instance of the same (parallel) program
  • A process cannot access another process's memory except via explicit messages
• Recall the distinction between shared address space and message passing
  • Keep in mind shared address space != shared memory
  • Message passing implies a distributed address space
• Physical view: the underlying architecture could be either distributed memory or shared memory
  • Message passing on a physical shared memory architecture: messages can be simulated by copying (or mapping) data between different processes' memory space
Important observations
• Let the processes compute independently and coordinate only rarely to exchange information
  • Communication adds overheads, so it should be done rarely
• Need to structure the program differently, to incorporate the message passing operations for exchanging data
• Need to partition data such that we maximize locality and minimize transfers
  • Recall the concepts discussed a few lectures ago
Building Blocks
• Interactions are carried out via passing messages between processes
• Building blocks: send and receive primitives – general form:
• send(void *sendbuf, int nelems, int dest)
• receive(void *recvbuf, int nelems, int source)
• Complexity lies in how the operations are carried out internally
• Example (pseudocode):

  P0:  msg = 5;
       send(&msg, 1, 1);
       msg = 200;

  P1:  recv(&msg, 1, 0);
       printf("%d", msg);

• What will P1 receive?
• Send operation may be implemented to return before the receipt is confirmed
• Supporting this kind of send is not a bad idea – why?
Blocking operations
• Only return from an operation once it's safe to do so
• Not necessarily when the msg has been received, just guarantee semantics
• Two possibilities:
  • Blocking non-buffered send/receive
  • Blocking buffered send/receive
Blocking non-buffered send/recv
• Send operation does not return until the matching receive is encountered at the receiver and the communication operation is completed
• Non-buffered handshake protocol – idling overheads:
  [Handshake timing diagrams; RTS = request to send, OTS = ok to send]
  1. Sender is first: idling at the sender
  2. Both arrive at about the same time: idling is minimized
  3. Receiver is first: idling at the receiver
Deadlocks in blocking non-buffered comm
• Deadlocks can occur with certain orderings of operations, due to blocking
• Example – can you see the deadlock? How can we fix it?

  P0:  send(&m1, 1, 1);
       recv(&m2, 1, 1);

  P1:  send(&m1, 1, 0);
       recv(&m2, 1, 0);

• Solution: switch the order in one of the processes (see the sketch below)
• But, more difficult to write code this way, and could create bugs
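One possible fix, sketched with the same send/recv pseudocode primitives (this reordering is an illustration of the idea, not part of the original slide):

  P0:  send(&m1, 1, 1);
       recv(&m2, 1, 1);

  P1:  recv(&m2, 1, 0);   // receive first ...
       send(&m1, 1, 0);   // ... then send, so P0's send can complete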
Blocking buffered send/recv
• Sender copies data into a buffer and returns once the copy to the buffer is completed
• Receiver must also store the data into a buffer until it reaches the matching recv
• Buffered transfer protocol – with or without hardware support:
  [Timing diagrams; CTB = copy to buffer, CFB = copy from buffer]
  1. Use a buffer at both sender and receiver; communication handled by hardware
  2. Buffer only on one side, e.g., sender interrupts receiver and deposits the data in a buffer (or vice-versa)
• In both cases, no idling overheads! But, now there are buffer management overheads!
Problems with blocking buffered comm
• 1. Potential problems with finite buffers
• Example – see the problem?

  P0 (producer):
  for (i = 0; i < 1000000; i++) {
      create_message(&m);
      send(&m, 1, 1);
  }

  P1 (consumer):
  for (i = 0; i < 1000000; i++) {
      recv(&m, 1, 0);
      digest_message(&m);
  }
• 2. Deadlocks still possible
• Example – see the problem?

  P0:  recv(&m1, 1, 1);
       send(&m2, 1, 1);

  P1:  recv(&m1, 1, 0);
       send(&m2, 1, 0);
• Solution is similar: break circular waits
• Unlike previously, in this protocol, deadlocks can only be caused by waits on recv
Non-blocking operations
• Why? Performance!
• User is responsible to ensure that data is not changed until it's safe
• Typically a check-status operation indicates if correctness could be violated by a previous transfer which is still in flight
• Can also be buffered or non-buffered
• Can be implemented with or without hardware support
• Summary of the send/recv protocol space:
  • Blocking, buffered: sending process returns after data has been copied into the communication buffer
  • Blocking, non-buffered: sending process blocks until the matching receive operation has been encountered
  • Blocking: send and recv semantics ensured by the corresponding operation
  • Non-blocking, buffered: sending process returns after initiating the DMA transfer to the buffer; this operation may not be completed on return
  • Non-blocking: programmer must explicitly ensure semantics by polling to verify completion
[Timing diagram: when the receive is encountered, communication is initiated; the data is unsafe to update until the transfer completes]
• Advantage: very little communication overhead, or completely masked!
Example: non-blocking non-buffered
• Sender issues a request to send and returns immediately
  [Handshake timing diagrams; RTS = request to send, OTS = ok to send; the data being sent/received is unsafe to update until the transfer completes]
  1. Without hardware support: when recv is encountered, the transfer is handled by interrupting the sender
  2. With hardware support: when recv is found, the communication hardware handles the transfer and the receiver can continue doing other work
Take-aways
• Carefully consider the implementation guarantees
• Communication protocol and hardware support
• Blocking vs. non-blocking, buffered vs. non-buffered
• Tradeoffs in terms of correctness and performance
• Need automatic correctness guarantees => might not hide communication overhead that well
• Need performance => user is responsible for correctness via polling
The Message Passing Interface (MPI)
MPI standard
• Standard library for message passing
• Write portable message passing algorithms, mostly using C or Fortran
• Rich API (over 100 routines, but only a handful are fundamental)
• Must install OpenMPI or MPICH2 (labs already have them)
• Include the mpi.h header
• Example run command: mpirun -np 8 ./myapp arg1 arg2
• Basic routines:
  • MPI_Init: initialize the MPI environment
  • MPI_Finalize: terminate the MPI environment
  • MPI_Comm_size: get the number of processes
  • MPI_Comm_rank: get the process ID of the caller
  • MPI_Send: send a message
  • MPI_Recv: receive a message
MPI basics
• MPI_Init: only called once at start by one thread, to initialize the MPI environment
  • int MPI_Init(int *argc, char ***argv);
  • Extracts and removes the MPI parts of the command line (e.g., mpirun -np 8) from argv
  • Process your application's command line arguments only after the MPI_Init
  • On success => MPI_SUCCESS, otherwise error code
• MPI_Finalize: called at the end, to do cleanup and terminate the MPI environment
  • int MPI_Finalize();
  • On success => MPI_SUCCESS, otherwise error code
  • No MPI calls allowed after this, not even a new MPI_Init!
• These calls are made by all participating processes, otherwise results in undefined behaviour
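A minimal sketch of the typical program structure (error handling simplified; this skeleton is an illustration, not from the slides):

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char *argv[]) {
      // MPI_Init strips the mpirun-related options from argc/argv,
      // so parse your own arguments only after this call
      if (MPI_Init(&argc, &argv) != MPI_SUCCESS) {
          fprintf(stderr, "MPI_Init failed\n");
          return 1;
      }

      /* ... application code (including argument parsing) goes here ... */

      MPI_Finalize();   // no MPI calls are allowed after this point
      return 0;
  }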
MPI Communication domains
• MPI communication domain = set of processes which are allowed to communicate with each other
• Communicators (MPI_Comm variables) store info about communication domains
• Common case: all processes need to communicate with all other processes
  • Default communicator: MPI_COMM_WORLD includes all processes
• In special cases, we may want to perform tasks in separate (or overlapping) groups of processes => define custom communicators
  • No messages for a given group will be received by processes in other groups
• Communicator size and id of the current process can be retrieved with:
  • int MPI_Comm_size(MPI_Comm comm, int *size);
  • int MPI_Comm_rank(MPI_Comm comm, int *rank);
  • The process calling these routines must be in the communicator comm
Some basic examples
• Use mpicc to compile MPI programs
• Hello world (a minimal program sketch follows below):
  • mpirun -np 8 ./hello_world
• Any program can be run using mpirun, even non-MPI ones:
  • mpirun -np 8 -hostfile hostfile.txt echo "Hello world"
  • mpirun -np 8 ls
• hostfile (optional): specify one host per line, but each host may use several processors
  • In OpenMPI, e.g. local node only: localhost slots=8
  • In MPICH2, e.g. local node only: localhost:8
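A minimal sketch of what hello_world.c might look like (the file name and printed message are assumptions for illustration, not from the slides); compile with mpicc hello_world.c -o hello_world and launch with the mpirun command above:

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char *argv[]) {
      int rank, size;

      MPI_Init(&argc, &argv);
      MPI_Comm_size(MPI_COMM_WORLD, &size);   // number of processes started by mpirun
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);   // this process's id within the communicator

      printf("Hello world from process %d of %d\n", rank, size);

      MPI_Finalize();
      return 0;
  }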
Timing measurements
• Can use MPI_Wtime()
• Example:

  double t1, t2;
  t1 = MPI_Wtime();
  /* ... code to be timed ... */
  t2 = MPI_Wtime();
  printf("Elapsed time: %f\n", t2 - t1);
MPI data types
• Equivalent to built-in C types, except for MPI_BYTE and MPI_PACKED

  MPI datatype           C datatype
  MPI_CHAR               signed char
  MPI_SHORT              signed short int
  MPI_INT                signed int
  MPI_LONG               signed long int
  MPI_UNSIGNED_CHAR      unsigned char
  MPI_UNSIGNED_SHORT     unsigned short int
  MPI_UNSIGNED           unsigned int
  MPI_UNSIGNED_LONG      unsigned long int
  MPI_FLOAT              float
  MPI_DOUBLE             double
  MPI_LONG_DOUBLE        long double
  MPI_BYTE               (no direct equivalent)
  MPI_PACKED             (no direct equivalent)
Sending and receiving messages
int MPI_Send(void *buf, int count, MPI_Datatype datatype,
int dest, int tag, MPI_Comm comm);
int MPI_Recv(void *buf, int count, MPI_Datatype datatype,
int source, int tag, MPI_Comm comm, MPI_Status *status);
• Each message has an associated tag to distinguish it from other messages
• The source and tag can be MPI_ANY_SOURCE / MPI_ANY_TAG
• The status can be used to get info about the MPI_Recv operation:

  typedef struct MPI_Status {
      int MPI_SOURCE;   // source of the received message
      int MPI_TAG;      // tag of the received message
      int MPI_ERROR;    // a potential error code
  } MPI_Status;

• The length of the received message can be retrieved using:
int MPI_Get_count(MPI_Status *status, MPI_Datatype datatype, int *count)
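A hedged sketch of how these pieces fit together (the message length of 42, tag 7, and buffer sizes are arbitrary choices for illustration): rank 0 sends part of an array, and rank 1 receives with wildcards and inspects the status afterwards:

  int data[100], recvbuf[100], count, myrank;
  MPI_Status status;
  MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
  if (myrank == 0) {
      MPI_Send(data, 42, MPI_INT, 1, 7, MPI_COMM_WORLD);        // dest = 1, tag = 7
  } else if (myrank == 1) {
      MPI_Recv(recvbuf, 100, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
               MPI_COMM_WORLD, &status);
      MPI_Get_count(&status, MPI_INT, &count);                  // count == 42 here
      printf("Got %d ints from rank %d with tag %d\n",
             count, status.MPI_SOURCE, status.MPI_TAG);
  }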
Blocking vs non-blocking
• MPI_Recv is blocking
  • It returns only after the message is received and copied into the buffer!
  • The buffer can be safely reused right after MPI_Recv
• MPI_Send can be implemented either blocking or non-blocking
  • Blocking: returns only after the matching MPI_Recv is executed and the message was sent
  • Non-blocking: copy the msg into a buffer and return, without waiting for MPI_Recv
  • In both, the buffer can be safely reused right after MPI_Send
• Depending on the MPI_Send implementation, the program semantics might need to be carefully analyzed for potential problems!
• For now, think of MPI_Send as blocking
Deadlock avoidance
• Restrictions on MPI_Send/MPI_Recv, in order to avoid deadlocks
• Example: behaviour is MPI_Send implementation-dependent
int a[20], b[20], myrank;
MPI_Status status;
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
if (myrank == 0) {
MPI_Send(a, 20, MPI_INT, 1, 1, MPI_COMM_WORLD);
MPI_Send(b, 20, MPI_INT, 1, 2, MPI_COMM_WORLD); }
else if (myrank == 1) {
MPI_Recv(b, 20, MPI_INT, 0, 2, MPI_COMM_WORLD, &status);
MPI_Recv(a, 20, MPI_INT, 0, 1, MPI_COMM_WORLD, &status); }
• What's the problem in this code?
• Fix by matching the order of the send and recv operations
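A minimal sketch of one such fix (reordering rank 1's receives to match rank 0's sends; this rewrite is an illustration, not part of the original slide):

  else if (myrank == 1) {
      MPI_Recv(a, 20, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);   // matches the first send
      MPI_Recv(b, 20, MPI_INT, 0, 2, MPI_COMM_WORLD, &status);   // matches the second send
  }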
• We want to write "safe" programs which are not implementation dependent!
Deadlock avoidance
• Another example: circular chain of send/recv operations – what can happen?
int a[20], b[20], myrank, np;
MPI_Status status;
MPI_Comm_size(MPI_COMM_WORLD, &np);
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
MPI_Send(a, 20, MPI_INT, (myrank+1)%np, 1, MPI_COMM_WORLD);
MPI_Recv(b, 20, MPI_INT, (myrank-1+np)%np, 1, MPI_COMM_WORLD, &status);
• Works fine if send is buffered non-blocking, but deadlocks if send is blocking
• Must rewrite the code to make it safe:
int a[20], b[20], myrank, np;
MPI_Status status;
MPI_Comm_size(MPI_COMM_WORLD, &np);
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
if (myrank % 2 == 1) {
    MPI_Send(a, 20, MPI_INT, (myrank+1)%np, 1, MPI_COMM_WORLD);
    MPI_Recv(b, 20, MPI_INT, (myrank-1+np)%np, 1, MPI_COMM_WORLD, &status);
}
else if (myrank % 2 == 0) {
    MPI_Recv(b, 20, MPI_INT, (myrank-1+np)%np, 1, MPI_COMM_WORLD, &status);
    MPI_Send(a, 20, MPI_INT, (myrank+1)%np, 1, MPI_COMM_WORLD);
}
Send/recv simultaneously
• The previous example is a common pattern
  => Combine the MPI_Send and MPI_Recv primitives into MPI_Sendrecv
int MPI_Sendrecv(void *sendbuf, int sendcount, MPI_Datatype sendtype,
int dest, int sendtag,
void *recvbuf, int recvcount, MPI_Datatype recvtype,
int source, int recvtag,
MPI_Comm comm, MPI_Status *status);
• The previous program becomes easier to write:

  int a[20], b[20], myrank, np;
  MPI_Status status;
  MPI_Comm_size(MPI_COMM_WORLD, &np);
  MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
  MPI_Sendrecv(a, 20, MPI_INT, (myrank+1)%np, 1,
               b, 20, MPI_INT, (myrank-1+np)%np, 1,
               MPI_COMM_WORLD, &status);
• Restriction: send and receive buffers must be disjoint, otherwise use:
int MPI_Sendrecv_replace(void *buf, int count, MPI_Datatype datatype,
int dest, int sendtag,
int source, int recvtag,
MPI_Comm comm, MPI_Status *status);
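For instance, a hedged sketch of an in-place ring rotation (the variable name token is an assumption; np, myrank and status are set up as in the earlier examples):

  int token = myrank;   // each rank starts with its own id
  MPI_Sendrecv_replace(&token, 1, MPI_INT,
                       (myrank + 1) % np, 0,       // send to the next rank, tag 0
                       (myrank - 1 + np) % np, 0,  // receive from the previous rank, tag 0
                       MPI_COMM_WORLD, &status);
  // token now holds the id received from the previous rank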
Overlap communication with computation
• MPI_Sendrecv_replace – blocking operation!
• Could overlap some data transmission with computations that don't depend on the data being transmitted yet
• MPI provides non-blocking primitives
  • MPI_Isend starts a send operation, and returns before the data is copied out of the buffer
  • MPI_Irecv starts a recv operation, and returns before the data is received into the buffer
int MPI_Isend(void *buf, int count, MPI_Datatype datatype,
int dest, int tag, MPI_Comm comm, MPI_Request *request)
int MPI_Irecv(void *buf, int count, MPI_Datatype datatype,
int src, int tag, MPI_Comm comm, MPI_Request *request)
• Allocate a request object and return a pointer to it in the request argument (details later)
• Non-blocking operation can be matched with a corresponding blocking operation
Overlap communication with computation
• Any problems with using these non-blocking primitives?
  • At some point, we need the data to be guaranteed to have been sent/received, otherwise correctness is violated
• Use MPI_Test and/or MPI_Wait to determine whether a non-blocking operation has finished, and/or to wait (block) until the non-blocking operation is completed
  • Use the request object provided by MPI_Isend/MPI_Irecv to test/wait on the completion

int MPI_Test(MPI_Request *request, int *flag, MPI_Status *status)
  • Sets flag to non-zero if completed (if so, the request is deallocated and set to MPI_REQUEST_NULL)

int MPI_Wait(MPI_Request *request, MPI_Status *status)
  • Blocks until the request completes, then deallocates the request and sets it to MPI_REQUEST_NULL
  • Status is similar to the one for the blocking Send/Recv
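A hedged sketch of the overlap pattern (buffer names, counts, src, TAG and the compute function are placeholders, not from the slides):

  MPI_Request request;
  MPI_Status status;

  // Start receiving the next chunk of data asynchronously
  MPI_Irecv(next_chunk, CHUNK_SIZE, MPI_DOUBLE, src, TAG, MPI_COMM_WORLD, &request);

  // Do useful work that does not touch next_chunk
  compute_on(current_chunk);

  // Block until the transfer has completed; only now is next_chunk safe to read
  MPI_Wait(&request, &status);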
Avoiding deadlocks
• Recall the deadlock example for blocking operations
• Implementation dependent – may cause deadlocks
int a[20], b[20], myrank;
MPI_Status status;
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
if (myrank == 0) {
MPI_Send(a, 20, MPI_INT, 1, 1, MPI_COMM_WORLD);
MPI_Send(b, 20, MPI_INT, 1, 2, MPI_COMM_WORLD); }
else if (myrank == 1) {
MPI_Recv(b, 20, MPI_INT, 0, 2, MPI_COMM_WORLD, &status);
MPI_Recv(a, 20, MPI_INT, 0, 1, MPI_COMM_WORLD, &status); }
• If we replace either MPI_Send or MPI_Recv operations with non-blocking versions, the code will be safe, regardless of MPI implementation, e.g.:
MPI_Irecv(b, 20, MPI_INT, 0, 2, MPI_COMM_WORLD, &requests[0]);
MPI_Irecv(a, 20, MPI_INT, 0, 1, MPI_COMM_WORLD, &requests[1]);
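The non-blocking receives must still be completed before a and b are used; a sketch of how rank 1's side might look in full (the requests/statuses arrays and the MPI_Waitall call are my addition for illustration):

  MPI_Request requests[2];
  MPI_Status statuses[2];

  MPI_Irecv(b, 20, MPI_INT, 0, 2, MPI_COMM_WORLD, &requests[0]);
  MPI_Irecv(a, 20, MPI_INT, 0, 1, MPI_COMM_WORLD, &requests[1]);

  /* ... other work that does not touch a or b ... */

  MPI_Waitall(2, requests, statuses);   // a and b are safe to use only after this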
MPI collective operations
Collective communication / computation
• Common collective operations:
  • Barrier
  • Broadcast
  • Reduction
  • Prefix sum
  • Scatter / Gather
  • All-to-all
• All processes in the communicator or group have to participate!
Barrier
[Diagram: processes rank 0, rank 1, rank 2, ..., rank n in communicator comm all call MPI_Barrier(comm)]
• Blocks until all processes in the given communicator hit the barrier

int MPI_Barrier(MPI_Comm comm)

• Warning 1: Careful with potential deadlocks!

  if (my_rank % 2 == 0) {
      // do stuff
      MPI_Barrier(MPI_COMM_WORLD);   // odd ranks never call the barrier => even ranks block forever
  }

• Warning 2: Barrier does not magically wait for pending non-blocking operations!
Broadcast
• One-to-all: send buf of source to all other processes in the group (into their buf)

int MPI_Bcast(void *buf, int count, MPI_Datatype datatype, int source, MPI_Comm comm);

• Example: MPI_Bcast(foo, 20, MPI_INT, src, comm);
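A hedged usage sketch (the array contents and the choice of rank 0 as the source are assumptions): every process, including the source, calls MPI_Bcast with the same arguments:

  int foo[20], myrank, i;
  MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
  if (myrank == 0) {
      for (i = 0; i < 20; i++) foo[i] = i;   // only the source has meaningful data initially
  }
  MPI_Bcast(foo, 20, MPI_INT, 0, MPI_COMM_WORLD);
  // After the call, every rank's foo holds the values filled in by rank 0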