OpenMP: An Implementation of Thread Level Parallelism
aka “Real-world Multithreading!!”
CS402/922 High Performance Computing ● 18/01/2022
https://warwick.ac.uk/fac/sci/dcs/teaching/material/cs402/
“Previously, on the HPC module…”
• Thread → A small section of code that is split into multiple copies within a single process
• Threads often share code, global information and other resources
• Multiprocessing vs. Multithreading
• Allocate 1 thread (or more, if hyperthreading) to each processor core
Parallelism made easy!
• OpenMP (shortened to OMP) is a pragma-based multithreading library
• The compiler manages the threads; programmers write specialised comments (pragmas)
• Supports FORTRAN, C and C++
• Version 1.0 came out for FORTRAN in 1997, with C/C++ following the next year
• Version 3.0 (the most widely used) came out in 2008
Fork-Join Model
Not to be confused with forks and spoons…
• Different ways threads can be managed:
• Thread Pool → A collection of persistent threads that work can be allocated to
• Fork-Join → Threads are created (forked) and destroyed (joined) when required
• OMP uses the fork-join model
[Diagram: a master thread forks a team of threads for each parallel region, then joins back to a single thread]
Building programs with OpenMP
• OMP has been built into many compilers
• Careful! → Different compilers require different flags to enable OpenMP
• Need to include the OpenMP header file (omp.h)
• Apple Clang requires libomp to be installed separately (e.g. through Homebrew)

Compiler ● OpenMP Flag(s) ● OpenMP Support
GCC ● -fopenmp
• GCC 6 onwards supports OMP 4.5
• GCC 9 has initial support for OMP 5.0
Clang (LLVM) ● -fopenmp
• Fully supports OMP 4.5
• Working on OMP 5.0 and 5.1
Clang (Apple) ● -Xpreprocessor -fopenmp -lomp
• See Clang (LLVM)
Intel ● -qopenmp
• Intel 17.0 onwards supports OMP 4.5
• Intel oneAPI supports part of OMP 5.1
Parallelising Loops
Finally, let’s parallelise!
• OMP is most often utilised through pragma comments
• #pragma omp parallel
  • Creates OMP threads and executes the following region in parallel
• #pragma omp parallel for
  • Specifies a for loop to be run in parallel over all OMP threads

void driver1(int N, int* a, int* b, int* c) {
    #pragma omp parallel
    {
        kernel1(N, a, b, c);
    }
}

void kernel2(int N, int* a, int* b, int* c) {
    int i;
    #pragma omp parallel for
    for (i = 0; i < N; i++) {
        c[i] = a[i] + b[i];
    }
}

Other pragma commands
• parallel do
  • The Fortran equivalent of parallel for (Fortran loops use DO; this is not for C’s do … while loops)
• #pragma omp parallel loop
  • Asserts that the iterations of a loop may be run concurrently, in any order
• #pragma omp simd
  • Indicates a loop can be transformed into a SIMD (vectorised) loop
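The simd construct can be sketched as follows; dot is a hypothetical kernel, and the reduction(+:sum) clause keeps the vectorised partial sums correct:

```c
#include <omp.h>

// simd hints that the loop body can be vectorised, i.e. several
// iterations executed per SIMD instruction on one thread.
float dot(int n, const float *x, const float *y) {
    float sum = 0.0f;
    #pragma omp simd reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += x[i] * y[i];   // each lane accumulates a partial sum
    }
    return sum;
}
```

Unlike parallel for, simd on its own creates no threads; it only affects code generation for the calling thread.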
Private variables
• Specifies a list of variables that are local to each thread
• The private copies can be initialised and copied back in different ways:
• private → the private copy is not given an initial value
• firstprivate → the private copy’s initial value is set to the variable’s value before the region
• lastprivate → after the region, the original variable is set to the value from the sequentially last iteration
[Diagram: per-thread values of a under private, firstprivate and lastprivate]
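A minimal sketch of lastprivate (the function name last_square is illustrative): each thread writes its own private copy, and the original variable receives the value from the sequentially last iteration:

```c
#include <omp.h>

// Each thread repeatedly overwrites its private copy of a;
// lastprivate copies the value from iteration i == n-1 back out.
int last_square(int n) {
    int i, a = 0;
    #pragma omp parallel for lastprivate(a)
    for (i = 0; i < n; i++) {
        a = i * i;        // written to the thread's private copy
    }
    return a;             // value from the last iteration: (n-1)*(n-1)
}
```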
Scheduling
So who’s taking what thing again?
• We can specify how the work is split up between threads with the schedule clause
• The most commonly used kinds are:
• static → the workload is split evenly between threads before compute begins
• dynamic → the workload is split into equally sized chunks; threads request chunks when required
• guided → same as dynamic, but successive chunks get smaller
• Great for load balancing and/or reducing overhead
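A hedged sketch of the schedule clause (scale is a hypothetical kernel); schedule(dynamic, 4) hands out chunks of four iterations to threads on demand, which helps when iteration costs vary:

```c
#include <omp.h>

// Scales an array in place; dynamic scheduling lets faster threads
// pick up more chunks instead of waiting on a fixed partition.
void scale(int n, double *x, double s) {
    int i;
    #pragma omp parallel for schedule(dynamic, 4)
    for (i = 0; i < n; i++) {
        x[i] *= s;
    }
}
```

Swapping in schedule(static) or schedule(guided, 4) changes only how iterations are assigned, not the result.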
Syncing OpenMP Threads
Even threads need to coordinate sometimes!
• Synchronisation between threads is sometimes required as well
• #pragma omp critical
  • The following block is executed by only one thread at a time
• #pragma omp atomic
  • Ensures a memory location is updated without conflict
• Differences between these operations:
  • atomic has a lower overhead
  • critical allows for multi-line statements

int sumCritical(int N) {
    int i;
    int resultShared = 0;
    #pragma omp parallel
    {
        int resultLocal = 0;
        #pragma omp for
        for (i = 0; i < N; i++) {
            resultLocal += i;
        }
        #pragma omp critical
        resultShared += resultLocal;
    }
    return resultShared;
}

int sumAtomic(int N) {
    int i;
    int resultShared = 0;
    #pragma omp parallel for
    for (i = 0; i < N; i++) {
        #pragma omp atomic
        resultShared += i;
    }
    return resultShared;
}
Reductions
Making dependencies easier one step at a time!
• Allows for the same operation to be applied to the same variable over multiple threads
• Often faster than atomic or critical
• Limited to a certain set of operations:
  • +, -, *, &, |, ^, &&, ||
  • Identifier functions/expressions (min, max)

int sumReduction(int N) {
    int i;
    int resultShared = 0;
    #pragma omp parallel for reduction(+:resultShared)
    for (i = 0; i < N; i++) {
        resultShared += i;
    }
    return resultShared;
}
OpenMP functions
What’s a library without functions!
• Some aspects of the OMP environment can be set or retrieved within the program
• Key examples include:
• omp_get_num_threads() → Gets the number of threads in the current team
• omp_get_thread_num() → Gets the ID of the calling thread
• omp_set_num_threads(int) → Sets the number of threads that can be utilised
• omp_get_wtime() → Gets the wall clock time (thread safe)
Environment Variables
Why recompile when we can alter the environment‽
• Allows us to change key elements without changes to the code
• Often used examples:
• OMP_NUM_THREADS → The number of threads to be utilised in the program
• OMP_SCHEDULE → The ordering with which the threads iterate through a loop (when schedule(runtime) is used)
• OMP_PROC_BIND → Controls if and how threads can move between cores
What’s next for OpenMP?
Onwards and upwards!
• OMP 4.5, 5.0 and 5.1:
• Target offload → Specify where the compute should occur (CPU/GPU/accelerator etc.)
• Memory management → Specify where the data should be stored and how
• A new version (5.2) is now out
Interesting related reads
Some of this might even be fun...
• International Workshop on OpenMP (IWOMP) → https://www.iwomp.org/
• OpenMP Quick Reference Guide (Version 5.2) → https://www.openmp.org/wp-content/uploads/OpenMPRefCard-5-2-web.pdf
• OpenMP Examples (Version 5.1) → https://www.openmp.org/wp-content/uploads/openmp-examples-5.1.pdf
Next lecture: Intro to Coursework 1