High Performance Computing
Course Notes
OpenMP
Dr Ligang He
OpenMP
Stands for Open specification for Multi-processing
An approach to writing parallel programs
• write a serial program and use the compiler to
automatically parallelize it
• OpenMP is used to assist compilers in understanding the
serial program
Used for shared-memory parallelism; supports Fortran, C and C++ programs
OpenMP is a specification for
• a set of compiler directives,
• run-time library routines,
• environment variables
Computer Science, University of Warwick
OpenMP
Stands for Open specification for Multi-processing
An approach to writing parallel programs
• Computers are of the von Neumann architecture,
which is designed to run sequential programs
• It is natural for us to write sequential programs, not
parallel programs
• This approach allows us to write sequential programs
and use the compiler to automatically parallelize the
programs
• OpenMP is used to assist the compiler to understand
and parallelize the sequential program
History of OpenMP
OpenMP emerged as an industry standard
OpenMP Architecture Review Board: Compaq, HP, IBM, Intel, SGI, SUN
OpenMP versions:
• OpenMP 1.0 for Fortran, 1997; OpenMP 1.0 for C/C++, 1998
• OpenMP 2.0 for Fortran, 2000; OpenMP 2.0 for C/C++, 2002
• OpenMP 2.5 for C/C++ and Fortran, 2005
• OpenMP 3.0 for C/C++ and Fortran, 2008; OpenMP 3.1, 2011
• OpenMP 4.0, 2013; OpenMP 4.5, 2015
• OpenMP 5.0, Nov 2018; OpenMP 5.1, Nov 2020
OpenMP programming model
An implementation of the thread model: multiple threads running in parallel
Used for shared-memory architectures
Fork-join model
How a new process is created
Use the fork function
All three segments (code, data and stack) and the program counter are duplicated
How are Threads Created?
Only the stack segment and the program counter are duplicated
Features of Threads
Split a program into separate tasks, each to be run by a thread concurrently
Multiple threads exist within the context of a single process, sharing the process’s code, global information, other resources
Threads usually communicate by accessing shared global data values
Global shared space – a single global address space (heap segment) shared among the threads in the scope of a process
Local private space – each thread also has its own local private data (stack segment) that is not shared
OpenMP code structure in C
#include <omp.h>

int main() {
    int var1, var2, var3;

    Serial code ...

    /* Beginning of parallel section. Fork a team of threads.
       Specify variable scoping */
    #pragma omp parallel private(var1, var2) shared(var3)
    {
        Parallel section executed by all threads
        ...
        All threads join master thread and disband
    }

    Resume serial code ...
}
OpenMP code structure in Fortran
PROGRAM HELLO
INTEGER VAR1, VAR2, VAR3
Serial code …
!Beginning of parallel section. Fork a team of threads. Specify variable scoping
!$OMP PARALLEL
Parallel section executed by all threads
…
All threads join master thread and disband
!$OMP END PARALLEL
Resume serial code …
END
OpenMP Directives Format
C/C++: #pragma omp directive-name [clause, ...]
Fortran: !$OMP directive-name [clause, ...]
General Features of OpenMP Directives
OpenMP directives are ignored by compilers that do not support OpenMP; in that case, the code runs as sequential code
Compiler directives are used to specify:
• sections of code that can be executed in parallel
  - mainly used to parallelize loops, e.g. different threads handle separate iterations of the loop
• critical sections
• the scope of variables (private or shared)
Parallel Directive and Fork-Join Model
• Parallel directive (construct) and parallel region:

#pragma omp parallel /* for C or C++ */
{
    ... do stuff /* parallel region */
}
• Multiple threads are created using the parallel directive
• Fork-Join model
How many threads are generated
The number of threads in a parallel region is determined by the following, in order of precedence:
• the omp_set_num_threads() library function
• the OMP_NUM_THREADS environment variable
• by default – the number of CPUs on a node
Threads are numbered from 0 (master thread) to N-1
Parallelizing loops in OpenMP
Compiler directive specifies that the iterations of a loop can be processed in parallel

For C and C++:

#pragma omp parallel for
for (i = 0; i < N; i++) {
    ...
}

Example of Critical Section in OpenMP

#include <omp.h>

int main() {
    int x;
    x = 0;
    #pragma omp parallel shared(x)
    {
        ...
        #pragma omp critical
        x = x + 1;
    } /* end of parallel section */
}
Example of Barrier in OpenMP
#include <omp.h>
#include <stdio.h>

int main (int argc, char *argv[]) {
    int th_id, nthreads;
    #pragma omp parallel private(th_id)
    {
        th_id = omp_get_thread_num();
        printf("Hello World from thread %d\n", th_id);
        #pragma omp barrier
        if ( th_id == 0 ) {
            nthreads = omp_get_num_threads();
            printf("There are %d threads\n", nthreads);
        }
    }
    return 0;
}
Data Scope Clauses in OpenMP
Used to explicitly define how variables should be viewed by threads
Used in conjunction with other directives (e.g. parallel)
Three often-used clauses:
• shared
• private
• reduction
Shared and private data in OpenMP
shared(var) states that var is a global variable to be shared among threads
private(var) creates a local copy of var for each thread
Reduction Clause
Reduction – reduction(operation : var)
Example operations: addition, logical OR (associative and commutative operations)
A local copy of the variable is made for each thread
The local values of the variable can be updated by the threads.
At the end of parallel region, the local values are combined to generate a global value through the reduction operation
An Example of Reduction Clause
double ZZ, res = 0.0;

#pragma omp parallel for reduction(+:res) private(ZZ)
for (i = 1; i <= N; i++) {
    ZZ = i;
    res = res + ZZ;
}
Run-Time Library Routines
A run-time library contains several routines. They can:
• query the number of threads / the thread ID
• set the number of threads to be generated
• query the number of processors in the computer
• change the number of threads
Run-Time Library Routines
Query routines allow you to get the number of threads and the ID of a specific thread:

id = omp_get_thread_num();        /* thread no. */
Nthreads = omp_get_num_threads(); /* number of threads */

The number of threads can also be specified at runtime:

omp_set_num_threads(Nthreads);
Environment Variables
Controlling the execution of parallel code
Four environment variables:
OMP_SCHEDULE: how iterations of a loop are scheduled
OMP_NUM_THREADS: maximum number of threads
OMP_DYNAMIC: enable or disable dynamic adjustment of the number of threads
OMP_NESTED: enable or disable nested parallelism
Lab session this week
- Practice OpenMP programming
- Download lab instructions and code from here: https://warwick.ac.uk/fac/sci/dcs/teaching/material/cs402/openmp.pdf