
Microsoft PowerPoint – COMP528 HAL08 MPI intro.pptx

Dr Michael K Bane, G14, Computer Science, University of Liverpool
m.k. .uk https://cgi.csc.liv.ac.uk/~mkbane/COMP528

COMP528: Multi-core and
Multi-Processor Programming

8 – HAL

SHARED MEMORY

• Memory on node
• Faster access

• Limited to that memory

• … and to those cores

• Programming typically OpenMP (or another threaded model) – see the sketch after this list
• Directives based

• Incremental changes

• Portable to single core / non-OpenMP
• Single code base

• Can use MPI too
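
A minimal sketch of the directives-based style (not from the slides; it assumes any OpenMP-capable C compiler): one pragma added to an ordinary loop shares the iterations among the node's threads, and the same source still compiles and runs serially if OpenMP is not enabled.

#include <stdio.h>

int main(void) {
    const int N = 1000000;
    double sum = 0.0;

    /* directive: split the loop iterations across the threads on this node;
       the reduction clause combines each thread's partial sum safely */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        sum += (double) i;
    }

    printf("sum = %f\n", sum);
    return 0;
}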

DISTRIBUTED MEMORY

• Access memory of another node
• Latency & bandwidth issues
• InfiniBand (IB) vs. gigabit Ethernet (gigE)
• Expandable (memory & nodes)

• Programming is almost always (99% of the time) MPI
• Message Passing Interface
• Library calls
• More intrusive
• Different implementations of MPI standard
• Non-portable to non-MPI (without effort)

Parallel hardware

• Of interest
– SIMD (Single Instruction, Multiple Data)

• Vectorisation, some OpenMP

• (SIMT: single instruction, multiple threads) = SIMD + multithreading
– Particularly for GPUs

– MIMD (Multiple Instruction, Multiple Data)
• Options: shared-memory || distributed-memory || hybrid

• SPMD programs (single program, multiple data)
– single programs that utilize parallelism (branching, data decomposition, etc)

SPMD for Distributed Memory
• Distributed memory

• Require some way to interact between nodes

• Run same program on each physical core of each node

• SPMD: single program, multiple data

Core 0 Core 1 Core 2 Core 3

#include <stdio.h>
#include <mpi.h>
int main(void) {
    int myID;
    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &myID);
    printf("Hi from %d\n", myID);
    MPI_Finalize();
}

SPMD for Distributed Memory
• Distributed memory

• Require some way to interact between nodes

• Run same program on each physical core of each node

• SPMD: single program, multiple data

• Each program then runs the source code with its data
• Actual values of the variables may differ → differing computations

Core 0 Core 1 Core 2 Core 3

(same hello-world code as on the previous slide)

SPMD for Distributed Memory
• Distributed memory

• Require some way to interact between nodes

• Run same program on each physical core of each node

• Each program then runs the SAME source code with its data
• Actual values of the variables may differ → differing computations

Core 0 Core 1 Core 2 Core 3

(same hello-world code as above)

[Diagram: each of the four processes holds its own copy of the variable myID]

This example would
be launched by
mpirun -np 4 ./a.out
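
With 4 processes, one possible output would be the four greeting lines in an arbitrary order (the processes run independently, so the ordering is not guaranteed), e.g.:

Hi from 2
Hi from 0
Hi from 3
Hi from 1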

SPMD for Distributed Memory
• Distributed memory

• Require some way to interact between nodes

• Run same program on each physical core of each node

• Each program then runs the SAME source code with its data
• Actual values of the variables may differ → differing computations

Core 0 Core 1 Core 2 Core 3

(same hello-world code as above)

[Diagram: the four processes are assigned ranks 0, 1, 2 and 3 – each process's "rank"]

SPMD for Distributed Memory
• Distributed memory

• Require some way to interact between nodes – e.g. to share data

• Run same program on each physical core of each node

Core 0 Core 1 Core 2 Core 3

SPMD for Distributed Memory
• Distributed memory

• Require some way to interact between nodes

• Run same program on each physical core of each node

• “Message Passing” of data between processes: MPI

Core 0 Core 1 Core 2 Core 3

Message Passing Interface

https://www.mpi-forum.org/
• Specifically v3.1

• Note the distinction between the MPI standard and an implementation of a (version of the)
standard

MPI: message-passing interface

• MPI: Message-Passing Interface (1991-…)

• MPI defines a library of functions
which can be called from C, C++, Fortran…

Barkla MPI implementations

• for this course we use the
Intel MPI implementation

• this loads the Intel MPI
compiler wrapper…

• … but we also need to
load the Intel compiler
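
As an illustrative sketch only (the module names below are placeholders – the exact names on Barkla may differ), the workflow is to load the two modules, compile via the Intel MPI C wrapper mpiicc, and launch with mpirun:

module load <intel-compiler-module> <intel-mpi-module>
mpiicc hello_mpi.c -o hello_mpi
mpirun -np 4 ./hello_mpi

(hello_mpi.c is a hypothetical file name used here for illustration.)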

MPI Vocabulary

• Process: each instance of the code runs as an MPI process,
typically with 1 MPI process per physical core. Each
process has a numbered “rank”

• Communicator: in MPI a communicator is a collection of
processes that can send messages to each other. The default
communicator is defined, via MPI_Init(), as
MPI_COMM_WORLD

• Rank: a numerical ID of a process within a communicator.
Processes are ranked 0, 1, 2, … (illustrated in the sketch below)
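
To tie these terms together, here is a minimal sketch (assuming any standard MPI installation) in which every process queries its rank and the size of the default communicator:

#include <stdio.h>
#include <mpi.h>

int main(void) {
    int myRank, numProcs;
    MPI_Init(NULL, NULL);                        /* set up the MPI environment */
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);      /* this process's rank within MPI_COMM_WORLD */
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);    /* number of processes in MPI_COMM_WORLD */
    printf("process %d of %d\n", myRank, numProcs);
    MPI_Finalize();
    return 0;
}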

Getting Started, I

• If we are to use MPI in our C programs
– we need to call the MPI_Init function before any other MPI function:
it does all the necessary setup
(RECOMMENDED: do nothing before calling MPI_Init())

– we need to call the MPI_Finalize function after all MPI function calls
(RECOMMENDED: do nothing other than return a return code (rc) after
MPI_Finalize())

man MPI_Init

man MPI_Finalize

Typical C program using MPI
. . .

#include <mpi.h>

int main(int argc, char* argv[]) {

MPI_Init(&argc, &argv);

work_per_MPI_process();   /* placeholder for this process's share of the work */

MPI_Finalize();

return 0;

}

Typical C program using MPI
not using any command line args
. . .

#include <mpi.h>

int main(void) {

MPI_Init(NULL, NULL);

work_per_MPI_process();

MPI_Finalize();

return 0;

}

MPI by Analysis of Simple Program

• Sending a message from one process to another:
POINT TO POINT COMMUNICATION

• Typically:
• “send” – one process will call MPI_Send(…)

• “receive” – another process will call MPI_Recv(…)

Core 0 Core 1 Core 2 Core 3

NB for “C” programming, all MPI
functions are named
MPI_Capitalised_function(…)

e.g.
MPI_Send
MPI_Isend
MPI_Comm_rank

COMP328/COMP528 (c) mkbane, university of liverpool

One Send, One Recv – But only One Prog?
/* simple example for COMP528, (c) University of Liverpool */
#include <stdio.h>
#include <mpi.h>
int main(void) {
    int myID, inputBuffer;
    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &myID);
    if (myID == 0) {
        /* rank 0 sends one int (its own rank) to rank 1, using message tag 0 */
        MPI_Send(&myID, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    }
    else {
        /* every other rank waits for one int from rank 0, tag 0
           (note: with more than 2 processes, ranks above 1 would block here,
           since rank 0 only sends to rank 1) */
        MPI_Recv(&inputBuffer, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("%d received %d\n", myID, inputBuffer);
    }
    MPI_Finalize();
}

We have already seen that each MPI process has a unique
rank – we save this into the memory location of the
variable "myID" on each MPI process

MPI by Analysis of Simple Program

• Sending a message from one process to another:
POINT TO POINT COMMUNICATION

Core 0 Core 1 Core 2 Core 3

More precisely…
the MPI_Send & MPI_Recv library calls execute on a
processor core;
the "send" process will at some point load the data
from memory; the "receive" process will at
some point save it to memory

ADVANCED MPI: remote memory access, a.k.a.
one-sided communications (a minimal sketch follows below)
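
One-sided communication is covered later in the module; purely as a hedged illustration of the idea, the sketch below (using the standard MPI_Win_create / MPI_Win_fence / MPI_Get calls) lets rank 1 read a value directly out of rank 0's exposed memory window, without rank 0 making any matching MPI_Send call:

#include <stdio.h>
#include <mpi.h>

int main(void) {
    int myRank, numProcs, localValue, remoteCopy = -1;
    MPI_Win win;

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);

    localValue = 100 + myRank;

    /* expose localValue as a window that other processes may access */
    MPI_Win_create(&localValue, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);                     /* open an access epoch */
    if (myRank == 1 && numProcs > 1) {
        /* rank 1 reads rank 0's localValue; rank 0 makes no per-message call */
        MPI_Get(&remoteCopy, 1, MPI_INT, 0, 0, 1, MPI_INT, win);
    }
    MPI_Win_fence(0, win);                     /* close the epoch; data now valid */

    if (myRank == 1 && numProcs > 1) {
        printf("rank 1 read %d from rank 0\n", remoteCopy);
    }

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}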

MPI by Analysis of Simple Program

• Later we will discuss
COLLECTIVE COMMUNICATIONS

• Where every MPI process participates
• e.g. to broadcast a value to all processes: MPI_Bcast() – see the sketch below

Core 0 Core 1 Core 2 Core 3
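
As a preview only – a minimal sketch, not the worked example from the later lecture – MPI_Bcast sends one value from a chosen root rank to every process in the communicator; note that every rank makes the same call:

#include <stdio.h>
#include <mpi.h>

int main(void) {
    int myRank, value = 0;

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);

    if (myRank == 0) value = 528;        /* only the root holds the value initially */

    /* all processes call MPI_Bcast; afterwards every rank holds the root's value */
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("rank %d now has value %d\n", myRank, value);

    MPI_Finalize();
    return 0;
}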

Questions via MS Teams / email
Dr Michael K Bane, Computer Science, University of Liverpool
m.k. .uk https://cgi.csc.liv.ac.uk/~mkbane