
High Performance Computing
Course Notes
OpenMP
Dr Ligang He

OpenMP
Stands for Open specification for Multi-processing
An approach to writing parallel programs
• write a serial program and use the compiler to automatically parallelize it
• OpenMP is used to assist the compiler in understanding the serial program
Used for shared-memory parallelism; supports Fortran, C and C++ programs
OpenMP is a specification for
• a set of compiler directives,
• run-time library routines,
• environment variables

OpenMP
Stands for Open specification for Multi-processing
An approach to writing parallel programs
• Computers are of the von Neumann architecture, which is designed to run sequential programs
• It is natural for us to write sequential programs, not parallel programs
• This approach allows us to write sequential programs and use the compiler to automatically parallelize them
• OpenMP is used to assist the compiler in understanding and parallelizing the sequential program

History of OpenMP
• OpenMP emerges as an industry standard
• OpenMP Architecture Review Board: Compaq, HP, IBM, Intel, SGI, SUN
• OpenMP versions:
  • OpenMP 1.0 for Fortran, 1997; OpenMP 1.0 for C/C++, 1998
  • OpenMP 2.0 for Fortran, 2000; OpenMP 2.0 for C/C++, 2002
  • OpenMP 2.5 for C/C++ and Fortran, 2005
  • OpenMP 3.0 for C/C++ and Fortran, 2008; OpenMP 3.1, 2011
  • OpenMP 4.0, 2013; OpenMP 4.5, 2015
  • OpenMP 5.0, Nov 2018; OpenMP 5.1, Nov 2020

OpenMP programming model
An implementation of the thread model
• Multiple threads running in parallel
Used for shared-memory architectures
Fork-join model

How a new process is created
Use the fork function
All three segments (code, data/heap and stack) and the program counter are duplicated
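
To make this concrete, here is a minimal C sketch (not part of the original slides; names are illustrative) showing that after fork() both parent and child continue from the same program counter, each with its own copy of the segments:

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    pid_t pid = fork();   /* duplicates code, data/heap and stack segments */
    if (pid == 0) {
        /* child: runs with a full copy of the parent's address space */
        printf("child %d: a full copy of the parent\n", (int)getpid());
    } else {
        wait(NULL);       /* parent waits for the child to finish */
        printf("parent %d: created child %d\n", (int)getpid(), (int)pid);
    }
    return 0;
}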

How Are Threads Created?
Only the stack segment and the program counter are duplicated
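
By contrast, a minimal POSIX-threads sketch (an illustration, not from the slide): pthread_create() gives the new thread only its own stack and program counter, while code, data and heap remain shared with the creating thread. Compile with the -pthread flag.

#include <stdio.h>
#include <pthread.h>

/* worker runs on the new thread's own (newly created) stack */
void *worker(void *arg) {
    (void)arg;
    printf("hello from the new thread\n");
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);  /* code/data/heap stay shared */
    pthread_join(t, NULL);                   /* wait for the thread to finish */
    return 0;
}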

Features of Threads
• Split a program into separate tasks, each of which is run concurrently by a thread
• Multiple threads exist within the context of a single process, sharing the process’s code, global information and other resources
• Threads usually communicate by accessing shared global data values
Global shared space – a single global address space (heap segment) shared among the threads in the scope of a process
Local private space – each thread also has its own local private data (stack segment) that is not shared, as the sketch below illustrates
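
A small POSIX-threads sketch of the two spaces (illustrative only; variable names are made up): counter lives in the global shared space and must be protected, while local lives on each thread's private stack:

#include <stdio.h>
#include <pthread.h>

int counter = 0;                            /* global shared space */
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg) {
    (void)arg;
    int local = 1;                          /* local private space (per-thread stack) */
    pthread_mutex_lock(&m);                 /* shared data needs synchronization */
    counter += local;
    int snapshot = counter;
    pthread_mutex_unlock(&m);
    printf("local=%d counter=%d\n", local, snapshot);
    return NULL;
}

int main(void) {
    pthread_t t[2];
    for (int i = 0; i < 2; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 2; i++) pthread_join(t[i], NULL);
    return 0;
}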

OpenMP code structure in C
#include <omp.h>

main () {
int var1, var2, var3;
Serial code …
/* Beginning of parallel section. Fork a team of threads. Specify variable scoping */
#pragma omp parallel
{
Parallel section executed by all threads
…
All threads join master thread and disband
}
Resume serial code…
}
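(With GCC, for example, such a program is compiled with the OpenMP flag enabled, e.g. gcc -fopenmp prog.c -o prog; without the flag the pragmas are ignored and the program runs serially.)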

OpenMP code structure in Fortran
PROGRAM HELLO
INTEGER VAR1, VAR2, VAR3
Serial code …
!Beginning of parallel section. Fork a team of threads. Specify variable scoping
!$OMP PARALLEL
Parallel section executed by all threads

All threads join master thread and disband
!$OMP END PARALLEL
Resume serial code …
END

OpenMP Directives Format
C/C++: #pragma omp directive-name [clause, ...] followed by a newline
Fortran: !$OMP directive-name [clause, ...] (fixed-form source also accepts the sentinels C$OMP and *$OMP)

General Features of OpenMP Directives
• OpenMP directives are ignored by compilers that do not support OpenMP; in this case the code runs sequentially
• Compiler directives are used to specify:
  • sections of code that can be executed in parallel
    • mainly used to parallelize loops, e.g. different threads handle separate iterations of the loop
  • critical sections
  • scope of variables (private or shared)

Parallel Directive and Fork-Join Model
• Parallel directive (construct) and parallel region:
#pragma omp parallel // for C or C++
{
    … do stuff // parallel region
}
• Multiple threads are created using the parallel directive
• Fork-Join model

How many threads are generated
The number of threads in a parallel region is determined in the following ways, in order of precedence:
• use the omp_set_num_threads() library function
• set the OMP_NUM_THREADS environment variable
• by default – the number of CPUs on a node
Threads are numbered from 0 (the master thread) to N-1
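
A small sketch (illustrative, not from the slides) of the precedence above: the library call overrides the environment variable:

#include <stdio.h>
#include <omp.h>

int main(void) {
    omp_set_num_threads(4);               /* overrides OMP_NUM_THREADS */
    #pragma omp parallel
    {
        if (omp_get_thread_num() == 0)    /* master thread is number 0 */
            printf("running with %d threads\n", omp_get_num_threads());
    }
    return 0;
}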

Parallelizing loops in OpenMP
Compiler directive specifies that the iterations of a loop can be processed in parallel
For C and C++:
#pragma omp parallel for
for (i = 0; i < N; i++) {
    …
}
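
As a concrete illustration (not from the slides; the array and bounds are made up), the iterations below are divided among the threads of the team:

#include <stdio.h>
#include <omp.h>

int main(void) {
    int i, a[8];
    #pragma omp parallel for              /* iterations are shared out among threads */
    for (i = 0; i < 8; i++) {
        a[i] = i * i;
        printf("iteration %d done by thread %d\n", i, omp_get_thread_num());
    }
    return 0;
}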

Example of the Critical Directive

#include <omp.h>
main() {
int x;
x = 0;
#pragma omp parallel shared(x)
{
#pragma omp critical
x = x+1;
} /* end of parallel section */
}

Example of Barrier in OpenMP
#include <stdio.h>
#include <omp.h>

int main (int argc, char *argv[]) {
    int th_id, nthreads;
    #pragma omp parallel private(th_id)
    {
        th_id = omp_get_thread_num();
        printf("Hello World from thread %d\n", th_id);
        #pragma omp barrier
        if ( th_id == 0 ) {
            nthreads = omp_get_num_threads();
            printf("There are %d threads\n", nthreads);
        }
    }
    return 0;
}

Data Scope Clauses in OpenMP
Used to explicitly define how variables should be viewed by threads
Used in conjunction with other directives (e.g. parallel)
Three often-used clauses:
• Shared
• Private
• Reduction

Shared and private data in OpenMP
• shared(var) states that var is a global variable to be shared among threads
• private(var) creates a local copy of var for each thread
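
A minimal sketch combining both clauses (illustrative variable names): each thread gets its own copy of tid, while total is the single shared copy and is updated inside a critical section:

#include <stdio.h>
#include <omp.h>

int main(void) {
    int total = 0;   /* shared: one copy visible to all threads */
    int tid;         /* private: each thread gets its own copy */
    #pragma omp parallel private(tid) shared(total)
    {
        tid = omp_get_thread_num();
        #pragma omp critical             /* serialize updates to the shared variable */
        total += tid;
    }
    printf("sum of thread ids = %d\n", total);
    return 0;
}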

Reduction Clause
• Reduction – reduction (operation : var)
• Example operations: add, logical OR (associative and commutative operations)
• A local copy of the variable is made for each thread
• The local values of the variable can be updated by the threads
• At the end of the parallel region, the local values are combined into a global value through the reduction operation

An Example of Reduction Clause
double ZZ, res = 0.0;
#pragma omp parallel for reduction(+:res) private(ZZ)
for (i = 1; i <= N; i++) {
    ZZ = i;
    res = res + ZZ;
}

Run-Time Library Routines

A run-time library contains several routines. They can:
• query the number of threads / the current thread number
• set the number of threads to be generated
• query the number of processors in the computer
• change the number of threads

Run-Time Library Routines

Query routines allow you to get the number of threads and the ID of a specific thread:
id = omp_get_thread_num(); // thread no.
Nthreads = omp_get_num_threads(); // number of threads

Can specify the number of threads at runtime:
omp_set_num_threads(Nthreads);

Environment Variables

Controlling the execution of parallel code
Four environment variables:
• OMP_SCHEDULE: how iterations of a loop are scheduled
• OMP_NUM_THREADS: maximum number of threads
• OMP_DYNAMIC: enable or disable dynamic adjustment of the number of threads
• OMP_NESTED: enable or disable nested parallelism

Lab session this week

- Practice OpenMP programming
- Download lab instructions and code from here:
https://warwick.ac.uk/fac/sci/dcs/teaching/material/cs402/openmp.pdf