Dr Michael K Bane, G14, Computer Science, University of Liverpool
m.k. .uk https://cgi.csc.liv.ac.uk/~mkbane/COMP528
COMP528: Multi-core and
Multi-Processor Programming
8 – HAL
SHARED MEMORY
• Memory on node
• Faster access
• Limited to that memory
• … and to those cores
• Programming typically OpenMP (or another threaded model) – see the sketch below
• Directives based
• Incremental changes
• Portable to single core / non-OpenMP
• Single code base
• Can use MPI too
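A minimal sketch (not from the slides) of the directives-based OpenMP model mentioned above; a compiler that does not support OpenMP simply ignores the pragma, so the same source also builds and runs as serial code:

#include <stdio.h>
int main(void) {
    double sum = 0.0;
    /* directive: split the loop iterations across threads, combining the partial sums;
       a non-OpenMP compiler ignores the pragma and runs the loop serially */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < 1000000; i++) {
        sum += (double)i;
    }
    printf("sum = %f\n", sum);
    return 0;
}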
DISTRIBUTED MEMORY
• Access memory of another node
• Latency & bandwidth issues
• InfiniBand (IB) vs. Gigabit Ethernet (gigE)
• Expandable (memory & nodes)
• Programming almost always MPI (99% of the time)
• Message Passing Interface
• Library calls
• More intrusive
• Different implementations of MPI standard
• Non-portable to non-MPI (without effort)
Parallel hardware
• Of interest
– SIMD (Single Instruction, Multiple Data)
• Vectorisation, some OpenMP (see the sketch below)
• (SIMT: single instruction, multiple threads) = SIMD + multithreading
– Particularly for GPUs
– MIMD (Multiple Instruction, Multiple Data)
• Options: shared-memory || distributed-memory || hybrid
• SPMD programs (single program, multiple data)
– single programs that utilize parallelism (branching, data decomposition, etc)
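As a minimal sketch of SIMD (one instruction applied to several data elements at once), here using the OpenMP simd directive as one way to request vectorisation; the directive and loop bounds are illustrative assumptions, not from the slides:

#include <stdio.h>
#define N 1024
int main(void) {
    float a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = 1.0f; b[i] = 2.0f; }
    /* hint that iterations are independent, so the compiler may emit
       vector (SIMD) instructions operating on several elements per instruction */
    #pragma omp simd
    for (int i = 0; i < N; i++) {
        c[i] = a[i] + b[i];
    }
    printf("c[0] = %f\n", c[0]);
    return 0;
}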
SPMD for Distributed Memory
• Distributed memory
• Require some way to interact between nodes
• Run same program on each physical core of each node
• SPMD: single program, multiple data
Core 0 Core 1 Core 2 Core 3
#include <stdio.h>
#include <mpi.h>
int main(void) {
    int myID;
    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &myID);
    printf("Hi from %d\n", myID);
    MPI_Finalize();
}
SPMD for Distributed Memory
• Distributed memory
• Require some way to interact between nodes
• Run same program on each physical core of each node
• SPMD: single program, multiple data
• Each program then runs the source code with its data
• Actual values of variables may differ, giving differing computations
Core 0 Core 1 Core 2 Core 3
#include <stdio.h>
#include <mpi.h>
int main(void) {
    int myID;
    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &myID);
    printf("Hi from %d\n", myID);
    MPI_Finalize();
}
SPMD for Distributed Memory
• Distributed memory
• Require some way to interact between nodes
• Run same program on each physical core of each node
• Each program then runs the SAME source code with its data
• Actual values of variables may differ, giving differing computations
Core 0 Core 1 Core 2 Core 3
#include <stdio.h>
#include <mpi.h>
int main(void) {
    int myID;
    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &myID);
    printf("Hi from %d\n", myID);
    MPI_Finalize();
}
[Diagram: each of the four cores holds its own copy of the variable myID]
This example would be launched by:
mpirun -np 4 ./a.out
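One possible output from this launch (the interleaving of lines from the 4 processes is not guaranteed, so the order may differ from run to run):

Hi from 1
Hi from 0
Hi from 3
Hi from 2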
SPMD for Distributed Memory
• Distributed memory
• Require some way to interact between nodes
• Run same program on each physical core of each node
• Each program then runs the SAME source code with its data
• Actual values of variables may differ, giving differing computations
Core 0 Core 1 Core 2 Core 3
#include <stdio.h>
#include <mpi.h>
int main(void) {
    int myID;
    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &myID);
    printf("Hi from %d\n", myID);
    MPI_Finalize();
}
[Diagram: the four processes receive the values 0, 1, 2, 3 in myID – each process's “rank”]
SPMD for Distributed Memory
• Distributed memory
• Require some way to interact between nodes – e.g. to share data
• Run same program on each physical core of each node
Core 0 Core 1 Core 2 Core 3
SPMD for Distributed Memory
• Distributed memory
• Require some way to interact between nodes
• Run same program on each physical core of each node
• “Message Passing” of data between processes: MPI
Core 0 Core 1 Core 2 Core 3
Message Passing Interface
https://www.mpi-forum.org/
• Specifically v3.1
• MPI standard vs. an implementation of a version of the standard
MPI: message-passing interface
• MPI: Message-Passing Interface (1991-…)
• MPI defines a library of functions which can be called from C, C++, Fortran…
Barkla MPI implementations
• for this course we use the Intel MPI implementation
• loading this provides the Intel MPI compiler wrapper…
• … but we also need to load the Intel compiler (example commands below)
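A minimal sketch of the compile-and-run steps; the module names below are assumptions and should be checked with module avail on Barkla:

module load intel-mpi          # assumed name of the Intel MPI module
module load intel-compiler     # assumed name of the Intel compiler module
mpiicc hello_mpi.c             # mpiicc: Intel MPI wrapper around the Intel C compiler
mpirun -np 4 ./a.out           # launch 4 MPI processes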
MPI Vocabulary
• Process: each instance of the code runs as an MPI process,
typically with 1 MPI process per physical core. Each
process has a numbered “rank”
• Communicator: in MPI a communicator is a collection of
processes that can send messages to each other. The default
communicator is defined, via MPI_Init(), as
MPI_COMM_WORLD
• Rank: a numerical ID of a process within a communicator.
Processes are ranked 0, 1, 2, … (see the sketch below)
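A minimal sketch (not from the slides) tying the vocabulary together: MPI_Comm_rank returns this process's rank and MPI_Comm_size returns the number of processes in the communicator:

#include <stdio.h>
#include <mpi.h>
int main(void) {
    int rank, numProcs;
    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);       /* this process's rank: 0, 1, 2, ... */
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);   /* total number of processes in the communicator */
    printf("Process %d of %d\n", rank, numProcs);
    MPI_Finalize();
    return 0;
}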
Getting Started, I
• If we are to use MPI in our C programs
– we need to call the MPI_Init function before any other MPI functions: it does all necessary setup
(RECOMMENDED to not do anything before calling MPI_Init())
– we need to call the MPI_Finalize function after all MPI function calls
(RECOMMENDED to do nothing other than return a return code after MPI_Finalize())
man MPI_Init
man MPI_Finalize
Typical C program using MPI
. . .
#include <mpi.h>
int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);
    work_per_MPI_process();   /* placeholder for the per-process work */
    MPI_Finalize();
    return 0;
}
Typical C program using MPI
not using any command line args
. . .
#include <mpi.h>
int main(void) {
    MPI_Init(NULL, NULL);
    work_per_MPI_process();   /* placeholder for the per-process work */
    MPI_Finalize();
    return 0;
}
MPI by Analysis of Simple Program
• Sending a message from one process to another:
POINT TO POINT COMMUNICATION
• Typically:
• “send” – one process will call MPI_Send(…)
• “receive” – another process will call MPI_Recv(…)
Core 0 Core 1 Core 2 Core 3
NB for “C” programming, all MPI
functions are named
MPI_Capitalised_function(…)
e.g.
MPI_Send
MPI_Isend
MPI_Comm_rank
One Send, One Recv – But only One Prog?
/* simple example for COMP528, (c) University of Liverpool */
#include <stdio.h>
#include <mpi.h>
int main(void) {
    int myID, inputBuffer;
    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &myID);
    if (myID == 0) {
        /* rank 0 sends its rank to rank 1 */
        MPI_Send(&myID, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    }
    else {
        /* every other rank waits for a message from rank 0
           (note: only rank 1 will ever get one, so run with 2 processes) */
        MPI_Recv(&inputBuffer, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("%d received %d\n", myID, inputBuffer);
    }
    MPI_Finalize();
}
We have already seen that each MPI process has a unique rank – we save this into the memory location for variable “myID” on each MPI process
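For reference, the arguments in the two calls above follow the standard MPI point-to-point pattern of (buffer, count, datatype, rank, tag, communicator):

MPI_Send(&myID, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    /* send 1 MPI_INT starting at &myID to the process of rank 1,
       with message tag 0, within the MPI_COMM_WORLD communicator */
MPI_Recv(&inputBuffer, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    /* receive 1 MPI_INT from the process of rank 0, tag 0, into inputBuffer;
       MPI_STATUS_IGNORE says we do not need the returned status information */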
MPI by Analysis of Simple Program
• Sending a message from one process to another:
POINT TO POINT COMMUNICATION
Core 0 Core 1 Core 2 Core 3
More precisely…
MPI_Send & MPI_Recv library calls execute on a processor core;
the “send” process will at some point load the data from memory; the “receive” process will at some point save it to memory.
ADVANCED MPI: remote memory access, AKA one-sided communications (see the sketch below)
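A minimal sketch (not part of these slides) of one-sided communication with MPI_Put: rank 0 writes directly into a window of memory exposed by rank 1, with no matching receive call; run with at least 2 processes:

#include <stdio.h>
#include <mpi.h>
int main(void) {
    int rank, buf = -1, value = 42;
    MPI_Win win;
    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    /* every process exposes its own buf as a window of remotely accessible memory */
    MPI_Win_create(&buf, sizeof(int), sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD, &win);
    MPI_Win_fence(0, win);                              /* open an access epoch */
    if (rank == 0) {
        /* put 1 int into rank 1's window at displacement 0 */
        MPI_Put(&value, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
    }
    MPI_Win_fence(0, win);                              /* close the epoch; the data is now visible */
    if (rank == 1) printf("rank 1 now holds %d\n", buf);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}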
MPI by Analysis of Simple Program
• Later we will discuss
COLLECTIVE COMMUNICATIONS
• Where every MPI process participates
• e.g. to broadcast a value to all processes: MPI_Bcast() (see the sketch below)
Core 0 Core 1 Core 2 Core 3
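As a taster of the collective communications to come, a minimal MPI_Bcast sketch in which rank 0's value of n is copied to every process in MPI_COMM_WORLD (the variable names are illustrative):

#include <stdio.h>
#include <mpi.h>
int main(void) {
    int myID, n = 0;
    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &myID);
    if (myID == 0) n = 100;                        /* only the root holds the value initially */
    /* every process calls MPI_Bcast; afterwards all processes hold the root's value of n */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("rank %d has n = %d\n", myID, n);
    MPI_Finalize();
    return 0;
}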
Questions via MS Teams / email
Dr Michael K Bane, Computer Science, University of Liverpool
m.k. .uk https://cgi.csc.liv.ac.uk/~mkbane