Dr Michael K Bane, G14, Computer Science, University of Liverpool
m.k. .uk https://cgi.csc.liv.ac.uk/~mkbane/COMP528
COMP528: Multi-core and
Multi-Processor Programming
19 – HAL: OpenMP recap, scope, threads
• Thread based
• Shared Memory
• Fork-Join model
• Parallelisation via
WORK-SHARING and
TASKS (see the sketch after this list)
• FORTRAN, C, C++
• Directives +
Environment Variables +
Run Time
• OpenMP version 4.5
parallel regions
work sharing constructs
data clauses
synchronisation
tasks
accelerators (sort of!)
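For illustration (a sketch, not from the slides), a minimal C program contrasting the two parallelisation routes, work-sharing versus tasks:

#include <stdio.h>
#include <omp.h>

int main(void) {
    int i;
    // WORK-SHARING: the loop iterations are divided among the team of threads
    #pragma omp parallel for
    for (i = 0; i < 8; i++)
        printf("worksharing: iteration %d on thread %d\n", i, omp_get_thread_num());

    // TASKS: one thread creates the tasks; any thread in the team may execute them
    #pragma omp parallel
    #pragma omp single
    {
        for (i = 0; i < 8; i++) {
            #pragma omp task firstprivate(i)
            printf("task %d run by thread %d\n", i, omp_get_thread_num());
        }
    }
    return 0;
}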
Background Reading
• “Using OpenMP – The Next Step: Affinity, Accelerators,
Tasking and SIMD”, van der Pas et al. MIT Press (2017)
https://ieeexplore-ieee-org.liverpool.idm.oclc.org/xpl/bkabstractplus.jsp?bkn=8169743
– Homework: read Chapter 1 (a nice recap of v2.5 of OpenMP)
• “Using OpenMP: Portable Shared Memory Parallel Programming”
Chapman et al. MIT Press (2007)
– https://ebookcentral.proquest.com/lib/liverpool/reader.action?docID=3338748&ppg=60
– Based on v2.5, so it does not cover tasks, accelerators, and some other refinements
OPENMP RECAP
3 Key elements
• Directives
#pragma omp …
• Run time functions
int n = omp_get_num_threads();
double t0 = omp_get_wtime();
• Environment variables
export OMP_NUM_THREADS=17
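Putting the three elements together, a minimal sketch (the file name is assumed for illustration):

// three_elements.c
#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel   // directive: fork a team of threads
    {
        // run time functions: query thread id and team size
        printf("thread %d of %d\n", omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}

Compile and run, with the team size taken from the environment variable:

gcc -fopenmp three_elements.c -o three_elements
export OMP_NUM_THREADS=4
./three_elements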
Example
• Approximate the integral as the
sum of the areas under the line
• Each area approximated
by a rectangle
• Calculate these in parallel…
[Figure: curve plotted for x from 0 to 3.5; each strip from x to x+h is approximated by a rectangle of height 0.5*[f(x)+f(x+h)]]
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
double func(double);   // integrand; defined elsewhere
int main(void) {
  double a=0.0, b=6000.0;
  int num=10000000;    // num of traps to sum
  double stepsize=(b-a)/(double) num;
  double x, sum=0.0;   // x-space and local summation
  int i;
  double t0, t1;       // timers
  t0 = omp_get_wtime();
  #pragma omp parallel for default(none) \
  shared(num, a, stepsize) private(i,x) reduction(+:sum)
  for (i=0; i<num; i++) {
    x = a + i*stepsize;
    sum += 0.5*(func(x)+func(x+stepsize))*stepsize;   // area of one trapezoid
  }
  t1 = omp_get_wtime();
  printf("integral: %f (%f seconds)\n", sum, t1-t0);
  return 0;
}
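What reduction(+:sum) buys: a hand-written equivalent (a sketch reusing the variables of the example above, not the slide's code) would give each thread a private partial sum and combine the partials once at the end:

#pragma omp parallel default(none) shared(num, a, stepsize, sum) private(i, x)
{
    double partial = 0.0;                 // per-thread accumulator
    #pragma omp for
    for (i = 0; i < num; i++) {
        x = a + i*stepsize;
        partial += 0.5*(func(x)+func(x+stepsize))*stepsize;
    }
    #pragma omp critical                  // serialise only the final combine
    sum += partial;
}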
Scope = structured block
#pragma omp parallel [data clauses]
{
a = b + c;
w = func(a);
}
x = x + 1;
#pragma omp parallel [data clauses]
for (i=x; i<y; i++) {
a[i] = b[i] + c[i];
w += func(a[i]);
}
x = x + 1;
WHAT ABOUT…
#pragma omp parallel [data clauses]
a[i] = b[i] + c[i];
w += func(a[i]);
x = x + 1;
(colour key on the original slide: green = parallel; red = serial)
ANSWER…
Without an explicit structured block (e.g. a pair of braces or a 'for' construct), the scope of the parallel region is the next statement ONLY: here only a[i] = b[i] + c[i]; runs in parallel, and the remaining statements run serially.
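A compilable sketch (illustrative, not from the slides) that makes the rule visible; with no braces only the first printf is inside the parallel region:

#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel
    printf("parallel: thread %d of %d\n",
           omp_get_thread_num(), omp_get_num_threads());  // next statement only: every thread prints

    printf("serial: thread %d of %d\n",
           omp_get_thread_num(), omp_get_num_threads());  // after the region: prints 0 of 1
    return 0;
}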
HOW MANY THREADS
Remember, remember…
• OpenMP comprises
– directives
– environment variables
– run time functions
• Each of these can determine the number of threads
• There is likely to be an optimum number of threads
– too many threads => too little work per thread (& more overheads)
– too few threads => could go faster by sharing work between more threads
#threads
OpenMP standard states that the number of threads is…
• FIXED through-out a parallel region
• DEFINED at entry (to the parallel region)
• may VARY between different parallel regions
• AND… a parallel region could comprise only
a single thread (i.e. the “master” thread)
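An illustrative sketch of these rules (not from the slides; note the run time may grant fewer threads than requested):

#include <stdio.h>
#include <omp.h>

int main(void) {
    omp_set_num_threads(4);               // request for subsequent regions
    #pragma omp parallel
    {
        #pragma omp single
        printf("region 1: %d threads\n", omp_get_num_threads());  // fixed throughout this region
    }
    #pragma omp parallel num_threads(2)   // clause overrides the request, for this region only
    {
        #pragma omp single
        printf("region 2: %d threads\n", omp_get_num_threads());
    }
    return 0;
}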
which is directive? run time? env var? other?
a) OMP_NUM_THREADS
(set this to a reasonable number)
b) #pragma omp parallel [data clauses] \
num_threads(flops_in_work/ideal_flops_per_thread)
c) #pragma omp parallel [data clauses] if(boolean_expr)
d) omp_set_num_threads(12);
Answers: a) environment variable; b) directive; c) directive; d) run time function
Determine during code execution…
How many threads in region?
• run time function:
n = omp_get_num_threads();
• will be 1 if called in a non-OpenMP (serial) part of the code
– which includes code serialised at run time
Timing
• omp_get_wtime()
– returns a double: the wall-clock time in seconds since some arbitrary point in the past
– call it before & after a code segment to get the wallclock time spent
• can be called in serial or parallel regions
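The before & after pattern as a minimal sketch (the segment being timed is a placeholder):

#include <stdio.h>
#include <omp.h>

int main(void) {
    double t0 = omp_get_wtime();    // wall-clock time at start
    /* ... code segment to be timed ... */
    double t1 = omp_get_wtime();    // wall-clock time at end
    printf("elapsed: %f seconds\n", t1 - t0);
    return 0;
}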
Questions via MS Teams / email
Dr Michael K Bane, Computer Science, University of Liverpool
m.k. .uk https://cgi.csc.liv.ac.uk/~mkbane