Microsoft PowerPoint – COMP528 HAL20 OpenMP performance matters.pptx
Dr Michael K Bane, G14, Computer Science, University of Liverpool
m.k. .uk https://cgi.csc.liv.ac.uk/~mkbane/COMP528
COMP528: Multi-core and
Multi-Processor Programming
20 – HAL
• Thread-based
• Shared memory
• Fork–join model
• Parallelisation via
WORK-SHARING and
TASKS
• FORTRAN, C, C++
• Directives +
Environment Variables +
Run Time
• OpenMP version 4.5
parallel regions
work sharing constructs
data clauses
synchronisation
tasks
accelerators (sort of!)
Background Reading
• “Using OpenMP – The Next Step: Affinity, Accelerators, Tasking and SIMD”, van der Pas et al., MIT Press (2017)
https://ieeexplore-ieee-org.liverpool.idm.oclc.org/xpl/bkabstractplus.jsp?bkn=8169743
– Homework: read Chapter 1 (a nice recap of v2.5 of OpenMP)
• “Using OpenMP: Portable Shared Memory Parallel Programming”, Chapman et al., MIT Press (2007)
– https://ebookcentral.proquest.com/lib/liverpool/reader.action?docID=3338748&ppg=60
– Based on v2.5, so it does not cover tasks, accelerators, or some other refinements
PERFORMANCE MATTERS FOR
SHARED MEMORY PROGRAMMING
Performant OpenMP
• Granularity
• Load imbalance
– Scheduling
– (and not waiting…)
• First Touch
• Affinity
• False Sharing
Performant OpenMP
• Granularity
– Fine-grained OpenMP: many small parallel regions (e.g. one per loop) => repeated fork–join overhead
– Coarse-grained OpenMP: one large parallel region enclosing several work-sharing loops => less overhead
COMP328/COMP528 (c) mkbane, univ of
liverpool (2018-2020)
Performant OpenMP
• Granularity
• Load imbalance
– Scheduling
– (and not waiting…)
• Without scheduling, iterations are assigned in “blocks” (each thread gets one contiguous chunk)
• For some examples this leads to load imbalance
– more work => longer time
=> other threads just waiting (rather than doing something useful)
• With appropriate scheduling, such as “round robin” (or “cyclic”) assignment
– could aid load balance
– more equal sharing of work => all threads doing something useful
=> all finish quicker
• We have seen that load imbalance can be an issue & that the optional “schedule” clause can help
• schedule(type, chunksize)
– type: static | dynamic | guided | auto | runtime
– chunksize: int (or int expression) – optional
• schedule(runtime) – uses the value of the environment variable OMP_SCHEDULE, e.g.
export OMP_SCHEDULE="guided,10"
Scheduling of Loops
schedule(type, chunksize)
– type: static | dynamic | guided | runtime
– chunksize: int (or int expr) – optional
• (static): iterations divided into ~equal blocks, with the 1st block on the 1st thread, the 2nd block on the 2nd thread, …
• (static, N): blocks of N iterations assigned in round-robin fashion
• (dynamic): chunks dynamically assigned to threads as they become free
• (guided): chunks of decreasing size dynamically assigned to threads as they become available; chunksize is the minimum number of iterations handed out
[diagram: iteration-to-thread assignment under (static) versus (static,1)]
• (auto): leave it to the compiler and/or run-time system to determine what is best, with the presumption that after a few passes through a given for loop it will determine the best scheduling… (i.e. the default if no explicit schedule is given)
Performant OpenMP
• Granularity
• Load imbalance
– Scheduling
– (and not waiting…)
Performance / Load Imbalance / Not waiting
• implicit barriers at
– the end of worksharing constructs: “omp for”, “omp single” (& others)
• but a barrier is not always necessary
• it can be removed with the “nowait” clause on the worksharing construct
• it is then up to the user to ensure the program remains correct
nowait examples…
• Consider 2 independent loops within a par region
– independent: we could do either loop first, or second
– so why should second loop wait for first to finish?
• Potential performance enhancement by using “nowait”
– a clause on “omp for”, e.g. #pragma omp for nowait

#pragma omp parallel default(none) shared(NUM,A,x,y,res) private(i,j,k) private(cksum)
{
   #pragma omp for nowait   /* no barrier: 2nd loop is independent of the 1st */
   for (i = 0; i < NUM; i++) { /* ... 1st loop body (truncated in source) ... */ }

   #pragma omp for
   for (i = 0; i < NUM; i++) { /* ... 2nd, independent loop body (truncated in source) ... */ }
}
First Touch
• the thread that first touches (writes) a var dictates its location in memory (which NUMA node)
• consider arrays where the work happens in parallel – initialise them in parallel too, e.g.

#pragma omp parallel for
for (int i = m; i < n; i++)   /* upper bound reconstructed; truncated in source */
{ /* ... touch / initialise the array ... */ }
Not covered in this course but… “SCOPE”
• It’s all about the dynamic scope
– you can set up a parallel region within one C function
– but use worksharing within another C function that is called later
https://computing.llnl.gov/tutorials/openMP/#Scoping
Questions via MS Teams / email
Dr Michael K Bane, Computer Science, University of Liverpool
m.k. .uk https://cgi.csc.liv.ac.uk/~mkbane