Message Passing Programming
Modes, Tags and Communicators
Reusing this material
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_US
This means you are free to copy and redistribute the material and adapt and build on the
material under the following terms: You must give appropriate credit, provide a link to the license and indicate if changes were made. If you adapt or build on the material you must distribute your work under the same license as the original.
Acknowledge EPCC as follows: EPCC, The University of Edinburgh, www.epcc.ed.ac.uk
Note that this presentation contains images owned by others. Please seek their permission before reusing these images.
Overview
Lecture will cover
– explanation of MPI modes (Ssend, Bsend and Send)
– meaning and use of message tags
– rationale for MPI communicators
These are all commonly misunderstood
– essential for all programmers to understand modes
– often useful to use tags
– certain cases benefit from exploiting different communicators
Modes
MPI_Ssend (Synchronous Send)
guaranteed to be synchronous
routine will not return until message has been delivered
MPI_Bsend (Buffered Send)
guaranteed to be asynchronous
routine returns before the message is delivered
system copies data into a buffer and sends it later on
MPI_Send (standard Send)
may be implemented as synchronous or asynchronous send
this causes a lot of confusion (see later)
MPI_Ssend

Process A                          Process B
---------                          ---------
Ssend(x,B)                         Running other non-MPI code
Wait in Ssend                      Recv(y,A)
           --- Data Transfer: x -> y ---
Ssend returns                      Recv returns
x can be overwritten by A          y can now be read by B
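The timeline above can be sketched as a complete C program. This is a minimal illustration (not from the original slides); run it with two processes, e.g. mpirun -np 2:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    double x = 3.14, y = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Process A: blocks here until process B has received x */
        MPI_Ssend(&x, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        /* Ssend has returned: x may now safely be overwritten */
    } else if (rank == 1) {
        /* Process B: may run other non-MPI code before this point */
        MPI_Recv(&y, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* Recv has returned: y can now be read */
        printf("received y = %f\n", y);
    }

    MPI_Finalize();
    return 0;
}
```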
MPI_Bsend

Process A                          Process B
---------                          ---------
Bsend(x,B)                         Running other non-MPI code
x copied into system buffer
Bsend returns
x can be overwritten by A          Recv(y,A)
           --- Data Transfer: buffer -> y ---
                                   Recv returns
                                   y can now be read by B
Notes
Recv is always synchronous
if process B issued Recv before the Bsend from process A, then B
would wait in the Recv until Bsend was issued
Where does the buffer space come from?
for Bsend, the user provides a single large block of memory
make this available to MPI using MPI_Buffer_attach
If A issues another Bsend before the Recv
system tries to store the message in free space in the buffer
if there is not enough space then Bsend will FAIL!
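Attaching and detaching the user-supplied buffer looks like this in C (an illustrative sketch; the buffer is sized for a single message of one double, using the MPI_BSEND_OVERHEAD constant for the per-message bookkeeping):

```c
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, bufsize;
    double x = 1.0;
    void *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Room for one double plus the per-message overhead */
    bufsize = sizeof(double) + MPI_BSEND_OVERHEAD;
    buf = malloc(bufsize);
    MPI_Buffer_attach(buf, bufsize);

    if (rank == 0) {
        /* Returns immediately: data is copied into the attached buffer */
        MPI_Bsend(&x, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&x, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    /* Detach blocks until all buffered messages have been delivered */
    MPI_Buffer_detach(&buf, &bufsize);
    free(buf);

    MPI_Finalize();
    return 0;
}
```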
Send
Problems
– Ssend runs the risk of deadlock
– Bsend less likely to deadlock, and your code may run faster, but
the user must supply the buffer space
the routine will FAIL if this buffer space is exhausted
MPI_Send tries to solve these problems
– buffer space is provided by the system
– Send will normally be asynchronous (like Bsend)
– if buffer is full, Send becomes synchronous (like Ssend)
MPI_Send routine is unlikely to fail
– but could cause your program to deadlock if buffering runs out
MPI_Send
Process A                      Process B
---------                      ---------
Send(x,B)                      Send(y,A)
Recv(x,B)                      Recv(y,A)

This code is NOT guaranteed to work
– will deadlock if Send is synchronous
– is guaranteed to deadlock if you use Ssend!
Solutions
To avoid deadlock
– either match sends and receives explicitly
– e.g. for ping-pong
process A sends then receives
process B receives then sends
For a more general solution use non-blocking communications (see later)
For this course you should program with Ssend
– more likely to pick up bugs such as deadlock than Send
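The explicit matching described above can be sketched as a two-process exchange in C (an illustrative sketch, not from the slides): rank 0 sends then receives, rank 1 receives then sends, so even a fully synchronous send cannot deadlock:

```c
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    double x = 1.0, y = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Process A: send first, then receive */
        MPI_Ssend(&x, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(&y, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else if (rank == 1) {
        /* Process B: receive first, then send */
        MPI_Recv(&y, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Ssend(&x, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```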
Checking for Messages
MPI allows you to check if any messages have arrived
– you can probe for matching messages
– same syntax as receive except no receive buffer specified
e.g. in C:
int MPI_Probe(int source, int tag,
MPI_Comm comm, MPI_Status *status)
Status is set as if the receive took place
– e.g. you can find out the size of the message and allocate space prior to receive
Be careful with wildcards
– you can use, e.g., MPI_ANY_SOURCE in call to probe
– but must use specific source in receive to guarantee matching same message
– e.g. MPI_Recv(buff, count, datatype, status.MPI_SOURCE, …)
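Putting these pieces together, probing with a wildcard source, sizing the buffer from the status, and then receiving from the specific source might look like this (an illustrative sketch for two processes):

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int data[100];
        for (int i = 0; i < 100; i++) data[i] = i;
        MPI_Send(data, 100, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Status status;
        int count;

        /* Probe with a wildcard source ... */
        MPI_Probe(MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);

        /* ... find the message size and allocate space before receiving */
        MPI_Get_count(&status, MPI_INT, &count);
        int *buff = malloc(count * sizeof(int));

        /* ... but receive from the specific source reported in status,
           to guarantee matching the same message */
        MPI_Recv(buff, count, MPI_INT, status.MPI_SOURCE, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("received %d ints from rank %d\n", count, status.MPI_SOURCE);
        free(buff);
    }

    MPI_Finalize();
    return 0;
}
```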
Tags
Every message can have a tag – this is a non-negative integer value
– maximum value can be queried using MPI_TAG_UB attribute
– MPI guarantees to support tags of at least 32767
– not everyone uses them; many MPI programs set all tags to zero
Tags can be useful in some situations
– can choose to receive messages only of a given tag
Most commonly used with MPI_ANY_TAG
– receives the next matching message regardless of its tag
– the user then finds out the actual tag by looking at the status
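The debugging convention of tagging each message with the sender's rank, combined with an MPI_ANY_TAG receive, can be sketched as follows (an illustrative sketch for two processes):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        double x = 2.5;
        /* Tag the message with the sender's rank */
        MPI_Ssend(&x, 1, MPI_DOUBLE, 1, rank, MPI_COMM_WORLD);
    } else if (rank == 1) {
        double y;
        MPI_Status status;
        /* Accept any tag ... */
        MPI_Recv(&y, 1, MPI_DOUBLE, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
        /* ... and read the actual tag value from the status */
        printf("message arrived with tag %d\n", status.MPI_TAG);
    }

    MPI_Finalize();
    return 0;
}
```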
Communicators
All MPI communications take place within a communicator – a communicator is fundamentally a group of processes
– there is a pre-defined communicator: MPI_COMM_WORLD which contains ALL the processes
also MPI_COMM_SELF which contains only one process
A message can ONLY be received within the same communicator from which it was sent
– unlike tags, it is not possible to wildcard on comm
Uses of Communicators (i)
Can split MPI_COMM_WORLD into pieces
– each process has a new rank within each sub-communicator
– guarantees messages from the different pieces do not interact
can attempt to do this using tags but there are no guarantees
MPI_COMM_WORLD (size=7):  rank=0  rank=1  rank=2  rank=3  rank=4  rank=5  rank=6

                          MPI_Comm_split()

comm1 (size=4):  rank=0  rank=1  rank=2  rank=3
comm2 (size=3):  rank=0  rank=1  rank=2
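A split like the one above can be written as follows (an illustrative sketch: the colour argument puts even and odd world ranks into separate sub-communicators, and the key argument orders ranks within each one):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, newrank, newsize;
    MPI_Comm subcomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* colour 0 -> even world ranks, colour 1 -> odd world ranks;
       key = rank preserves the original ordering within each piece */
    MPI_Comm_split(MPI_COMM_WORLD, rank % 2, rank, &subcomm);

    /* Each process has a new rank within its sub-communicator */
    MPI_Comm_rank(subcomm, &newrank);
    MPI_Comm_size(subcomm, &newsize);
    printf("world rank %d -> sub-communicator rank %d of %d\n",
           rank, newrank, newsize);

    MPI_Comm_free(&subcomm);
    MPI_Finalize();
    return 0;
}
```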
Uses of Communicators (ii)
Can make a copy of MPI_COMM_WORLD
– e.g. call the MPI_Comm_dup routine
– containing all the same processes but in a new communicator
Enables processes to communicate with each other safely within a piece of code
– guaranteed that messages cannot be received by other code
– this is essential for people writing parallel libraries (e.g. a Fast Fourier Transform) to stop library messages becoming mixed up with user messages
the user cannot intercept the library messages if the library keeps the identity of the new communicator a secret
it is not safe simply to try and reserve tag values, due to wildcarding
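A library might apply this as sketched below. Note that library_init and library_comm are hypothetical names invented for this illustration; only MPI_Comm_dup itself comes from the slides:

```c
#include <mpi.h>

/* Hypothetical library-private communicator: its identity is never
   exposed to the user, so user receives can never match library sends */
static MPI_Comm library_comm;

/* Hypothetical initialisation routine: duplicate the user's
   communicator so all library traffic is isolated from user traffic */
void library_init(MPI_Comm usercomm)
{
    MPI_Comm_dup(usercomm, &library_comm);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    library_init(MPI_COMM_WORLD);

    /* User messages on MPI_COMM_WORLD and library messages on
       library_comm cannot be confused, whatever tags either side uses */

    MPI_Comm_free(&library_comm);
    MPI_Finalize();
    return 0;
}
```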
Summary (i)
Question: Why bother with all these send modes?
Answer
– it is a little complicated, but you should make sure you understand
– Ssend and Bsend are clear
map directly onto synchronous and asynchronous sends
– Send can be either synchronous or asynchronous
MPI is trying to be helpful here, giving you the benefits of Bsend if there is sufficient system memory available, but not failing completely if buffer space runs out
in practice this leads to endless confusion!
The amount of system buffer space is variable
– programs that run on one machine may deadlock on another
– you should NEVER assume that Send is asynchronous!
Summary (ii)
Question: What are the tags for?
Answer
– if you don't need them, don't use them!
perfectly acceptable to set all tags to zero
– can be useful for debugging
e.g. always tag messages with the rank of the sender
Summary (iii)
Question: Can I just use MPI_COMM_WORLD?
Answer
– yes: many people never need to create new communicators in their MPI programs
– however, it is probably bad practice to specify MPI_COMM_WORLD explicitly in your routines
using a variable will allow for greater flexibility later on, e.g.:
MPI_Comm comm;   /* or INTEGER in Fortran */

comm = MPI_COMM_WORLD;
…
MPI_Comm_rank(comm, &rank);
MPI_Comm_size(comm, &size);
…