COMP528 HAL19: OpenMP recap, scope, threads

Dr Michael K Bane, G14, Computer Science, University of Liverpool
m.k. .uk https://cgi.csc.liv.ac.uk/~mkbane/COMP528

COMP528: Multi-core and
Multi-Processor Programming

19 – HAL

• Thread based

• Shared memory

• Fork-join model

• Parallelisation via WORK-SHARING and TASKS

• Fortran, C, C++

• Directives + environment variables + run-time functions

• OpenMP version 4.5:
– parallel regions
– work-sharing constructs
– data clauses
– synchronisation
– tasks
– accelerators (sort of!)


COMP328/COMP528 (c) mkbane, University of Liverpool

Background Reading
• “Using OpenMP – The Next Step: Affinity, Accelerators, Tasking and SIMD”, van der Pas et al. MIT Press (2017)
https://ieeexplore-ieee-org.liverpool.idm.oclc.org/xpl/bkabstractplus.jsp?bkn=8169743

– Homework: read Chapter 1 (a nice recap of v2.5 of OpenMP)

• “Using OpenMP: Portable Shared Memory Parallel Programming”, Chapman et al. MIT Press (2007)
– https://ebookcentral.proquest.com/lib/liverpool/reader.action?docID=3338748&ppg=60

– Based on v2.5 so it does not cover: tasks, accelerators, some other refinements

OPENMP RECAP

Three key elements

• Directives
#pragma omp …

• Run time functions
int n = omp_get_num_threads();
double t0 = omp_get_wtime();

• Environment variables
export OMP_NUM_THREADS=17
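A minimal sketch of the three elements working together, assuming a compiler flag such as GCC's `-fopenmp`. The names `count_threads` and `report` are illustrative only, not part of OpenMP. The `#ifdef _OPENMP` guard lets the same file also build serially, in which case the pragma is ignored and the "region" runs on one thread.

```c
#include <stdio.h>
#include <stdlib.h>              /* getenv() */
#ifdef _OPENMP
#include <omp.h>                 /* OpenMP run-time library */
#endif

/* Counts the threads in a parallel region: each thread adds 1 to its
   private copy of count, and the reduction sums the copies on exit. */
int count_threads(void) {
    int count = 0;
#pragma omp parallel reduction(+:count)   /* 1. directive */
    count += 1;
    return count;
}

/* Prints what each of the three elements reports. */
void report(void) {
    const char *env = getenv("OMP_NUM_THREADS");   /* 3. environment variable */
    printf("OMP_NUM_THREADS       = %s\n", env ? env : "(not set)");
#ifdef _OPENMP
    printf("omp_get_max_threads() = %d\n", omp_get_max_threads()); /* 2. run-time function */
#endif
    printf("threads in region     = %d\n", count_threads());
}
```

Calling `report()` after `export OMP_NUM_THREADS=4` would typically show 4 threads in the region; the exact count is left to the implementation.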

Example

• Approximate the integral as the sum of the areas of strips under the curve

• Each strip's area is approximated by a trapezium

• Calculate these in parallel…

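[Figure: a curve f(x) plotted for x from 0 to 3.5 (y axis 0 to 12), with one strip from x to x+h labelled 0.5*[f(x)+f(x+h)], its mean height; the strip's area is h × 0.5*[f(x)+f(x+h)]]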

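#include <stdio.h>
#include <math.h>
#include <omp.h>

double func(double);

int main(void) {
  double a=0.0, b=6000.0;
  int num=10000000;                 // num of traps to sum
  double stepsize=(b-a)/(double) num;
  double x, sum=0.0;                // x-space and local summation
  int i;
  double t0, t1;                    // timers

  t0 = omp_get_wtime();

#pragma omp parallel for default(none) \
  shared(num, a, stepsize) private(i,x) reduction(+:sum)
  for (i=0; i<num; i++) {
    x = a + i*stepsize;                                 // left edge of this trapezium
    sum += 0.5*stepsize*(func(x) + func(x+stepsize));   // area = h * mean height
  }

  t1 = omp_get_wtime();
  printf("Integral from %g to %g is approx %g (%d trapezia, %f s)\n",
         a, b, sum, num, t1-t0);
  return 0;
}

// Hypothetical integrand: the body of func() is not shown on the slide
double func(double x) {
  return 1.0 + sin(x);
}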
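To check that the parallel trapezium rule gives the right answer, a sketch like the following can be tried. It uses the same `parallel for` + `reduction(+:sum)` pattern as the program above, but on a hypothetical integrand `square` whose exact integral over [0, 3] is 9; the names `square` and `trapezium` are illustrative, not from the slide.

```c
/* Hypothetical integrand, chosen because the exact answer is known:
   the integral of x*x from 0 to 3 is 9. */
double square(double x) { return x * x; }

/* Trapezium rule: sum of num strips of width h, each with area
   h * 0.5 * [f(x) + f(x+h)]. Iterations are shared among the threads
   and the per-thread partial sums are combined by the reduction. */
double trapezium(double (*f)(double), double a, double b, int num) {
    double h = (b - a) / (double) num;
    double sum = 0.0;
    int i;
#pragma omp parallel for default(none) shared(f, a, h, num) reduction(+:sum)
    for (i = 0; i < num; i++) {
        double x = a + i * h;          /* declared in the loop, so private */
        sum += 0.5 * h * (f(x) + f(x + h));
    }
    return sum;
}
```

`trapezium(square, 0.0, 3.0, 1000000)` should agree with 9 to many decimal places; the last few digits may change between runs because the reduction can combine partial sums in a different order.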

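SCOPE

Scope of a parallel region = the structured block that follows the directive

#pragma omp parallel [data clauses]
{
a = b + c;       // parallel: every thread executes this
w = func(a);     // parallel: every thread executes this
}
x = x + 1;       // serial: master thread only

#pragma omp parallel [data clauses]
for (i=x; i<y; i++) {    // parallel: the whole 'for' statement is the block
a[i] = b[i] + c[i];      // (with no 'for' work-sharing directive, every
w += func(a[i]);         //  thread executes ALL the iterations)
}
x = x + 1;       // serial: master thread only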
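The braced case can be observed directly by counting how often each statement runs. In this sketch (counter and function names are hypothetical), both statements inside the braces run once per thread, while the statement after the closing brace runs once.

```c
/* Updated atomically because every thread in the team executes the
   statements inside the braces. */
int inside_a = 0, inside_b = 0, after_block = 0;

/* Runs a braced parallel region and returns the team size. */
int run_braced_region(void) {
    int team = 0;
#pragma omp parallel reduction(+:team)
    {
        team += 1;               /* each thread contributes 1 */
#pragma omp atomic
        inside_a += 1;           /* in scope: runs once per thread */
#pragma omp atomic
        inside_b += 1;           /* in scope: runs once per thread */
    }
    after_block += 1;            /* out of scope: runs once, serially */
    return team;
}
```

With 4 threads one would expect `inside_a == inside_b == 4` and `after_block == 1`.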

WHAT ABOUT…

#pragma omp parallel [data clauses]
a[i] = b[i] + c[i];
w += func(a[i]);
x = x + 1;


ANSWER…

#pragma omp parallel [data clauses]
a[i] = b[i] + c[i];   // parallel: the ONLY statement in the region
w += func(a[i]);      // serial
x = x + 1;            // serial

Without an explicit structured block (e.g. a set of braces { }, or a ‘for’ construct, which is a single statement), the scope is the next statement ONLY.
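A small experiment confirms this answer. In the sketch below (counter and function names are hypothetical), the parallel directive is followed by one statement with no braces, so only that statement runs on every thread; the statement after it is serial.

```c
int region_runs = 0;   /* executions of the statement right after the directive */
int next_runs = 0;     /* executions of the statement after that */

void run_unbraced_region(void) {
#pragma omp parallel
#pragma omp atomic
    region_runs += 1;  /* the ONLY statement in the parallel region */
    next_runs += 1;    /* serial: runs once, on the master thread */
}
```

On 4 threads `region_runs` would be expected to reach 4 while `next_runs` stays at 1.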


HOW MANY THREADS

Remember, remember…

• OpenMP comprises
– directives

– environment variables

– run time functions

• Each of these can set the number of threads

• There is likely to be an optimum number of threads
– too many threads => too little work per thread (& more overheads)

– too few threads => could go faster by sharing work between more threads

#threads

The OpenMP standard states that the number of threads is…

• FIXED throughout a parallel region

• DEFINED on entry to the parallel region

• may VARY between different parallel regions

• AND… a parallel region could comprise only a single thread (i.e. the “master” thread)
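A sketch of two consecutive parallel regions with different team sizes, one of them a single thread. It assumes the `num_threads` clause to request the sizes; `two_regions` is a hypothetical name. Each thread adds 1 to a reduction variable, so each total equals that region's team size.

```c
/* Records the team size of two consecutive parallel regions. */
void two_regions(int *first, int *second) {
    int n1 = 0, n2 = 0;
#pragma omp parallel num_threads(1) reduction(+:n1)
    n1 += 1;            /* team of one thread: the master only */
#pragma omp parallel num_threads(3) reduction(+:n2)
    n2 += 1;            /* a different region, a different team size */
    *first = n1;
    *second = n2;
}
```

On a machine that allows 3 threads one would typically see 1 and then 3. The standard only guarantees that a region gets no more threads than requested.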

Which is a directive? a run time function? an environment variable? other?

a) OMP_NUM_THREADS
– set this to a reasonable number

b) #pragma omp parallel [data clauses] \
num_threads(flops_in_work/ideal_flops_per_thread)

c) #pragma omp parallel [data clauses] if(boolean_expr)

d) omp_set_num_threads(12);

ANSWER…

a) OMP_NUM_THREADS (set this to a reasonable number) – environment variable

b) #pragma omp parallel [data clauses] \
num_threads(flops_in_work/ideal_flops_per_thread) – directive (num_threads clause)

c) #pragma omp parallel [data clauses] if(boolean_expr) – directive (if clause: when the expression is false, the region runs on a single thread)

d) omp_set_num_threads(12); – run time function
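These mechanisms have a precedence: a false `if` clause gives one thread; otherwise a `num_threads` clause wins for its own region; otherwise `omp_set_num_threads()` overrides `OMP_NUM_THREADS` for later regions. A sketch, with the hypothetical helper names `request_threads`, `region_default` and `region_with_clause`:

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Team size of a region with no num_threads clause. */
int region_default(void) {
    int count = 0;
#pragma omp parallel reduction(+:count)
    count += 1;
    return count;
}

/* Team size of a region whose clause requests k threads. */
int region_with_clause(int k) {
    int count = 0;
#pragma omp parallel num_threads(k) reduction(+:count)
    count += 1;
    return count;
}

/* Wraps omp_set_num_threads() so the file also builds without OpenMP. */
void request_threads(int k) {
#ifdef _OPENMP
    omp_set_num_threads(k);      /* overrides OMP_NUM_THREADS from now on */
#else
    (void) k;
#endif
}
```

After `request_threads(2)`, `region_default()` would typically give 2, `region_with_clause(1)` gives 1, and a following `region_default()` is back to 2, since the clause affects only its own region.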

Determine during code execution…
How many threads in region?

• run time function:
n = omp_get_num_threads();

• returns 1 if called from serial code (outside any parallel region)
– which includes a parallel region serialised at run time, e.g. by an if clause that evaluates to false
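A sketch of both cases where `omp_get_num_threads()` returns 1, with hypothetical function names. The `#else` branches cover a build without OpenMP, where all code is serial anyway.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Outside any parallel region the current team is just one thread. */
int threads_in_serial_code(void) {
#ifdef _OPENMP
    return omp_get_num_threads();
#else
    return 1;                        /* built without OpenMP: serial */
#endif
}

/* A parallel region whose if clause is false is serialised at run time. */
int threads_in_serialised_region(void) {
    int n = 0;
#pragma omp parallel if(0)
    {
#ifdef _OPENMP
        n = omp_get_num_threads();   /* team of one thread, so no race on n */
#else
        n = 1;
#endif
    }
    return n;
}
```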

Timing

• omp_get_wtime()
– returns a double: elapsed wall-clock time in seconds since some arbitrary point in the past

– call it before & after a code segment; the difference is the wall-clock time spent

• can be called in serial or parallel regions
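A sketch of the before-and-after pattern, timing the same reduction loop with different thread counts, which is one way to look for the optimum number of threads. The workload (a harmonic sum) and the names `now_seconds` and `time_harmonic` are assumptions for illustration. Without OpenMP the sketch falls back to `clock()`, which measures CPU time rather than wall-clock time.

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Seconds from omp_get_wtime(); clock() is only a serial-build fallback. */
static double now_seconds(void) {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double) clock() / CLOCKS_PER_SEC;
#endif
}

/* Times a reduction loop on nthreads threads and returns elapsed seconds;
   the loop's result goes to *result so the work is not optimised away. */
double time_harmonic(long n, int nthreads, double *result) {
    double sum = 0.0;
    double t0 = now_seconds();           /* before */
#pragma omp parallel for num_threads(nthreads) reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += 1.0 / (double)(i + 1);
    double t1 = now_seconds();           /* after */
    *result = sum;
    return t1 - t0;
}
```

Comparing `time_harmonic(n, 1, &r)` with `time_harmonic(n, 4, &r)` for a large `n` gives a rough speed-up; for small `n` the thread overheads can make more threads slower.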

Questions via MS Teams / email
Dr Michael K Bane, Computer Science, University of Liverpool
m.k. .uk https://cgi.csc.liv.ac.uk/~mkbane