
Dr Michael K Bane, G14, Computer Science, University of Liverpool
m.k. .uk https://cgi.csc.liv.ac.uk/~mkbane/COMP528

COMP528: Multi-core and
Multi-Processor Programming

23 – HAL

OPENMP TASKS (OVERVIEW)

Background Reading
• “Using OpenMP – The Next Step: Affinity, Accelerators, Tasking and SIMD”, van der Pas et al., MIT Press (2017)
  – https://ieeexplore-ieee-org.liverpool.idm.oclc.org/xpl/bkabstractplus.jsp?bkn=8169743

  – Homework: read Chapter 1 (a nice recap of v2.5 of OpenMP)

  – NEW homework: read Chapter 3 (tasking); optional: Chapter 4 (thread affinity)

• “Using OpenMP: Portable Shared Memory Parallel Programming”, Chapman et al., MIT Press (2007)
  – https://ebookcentral.proquest.com/lib/liverpool/reader.action?docID=3338748&ppg=60

  – covers v2.5, so does not cover: tasks, accelerators, some later refinements

Tasks
• Rather than considering the workflow in an imperative style

– stmt then stmt then stmt

• … and work-sharing across a for loop

• Can we identify specific “tasks”, push them to a queue, and let
the system run the next appropriate task when resources are available?
– closer to data flow

– “appropriate” => the user defines some dependencies;
the run-time system (RTS) runs a task once they are resolved

Tasks / Tasking

• think tasks (as in bundles of work), not threads

• can be used to exploit parallelism in workloads that are not
a set of for loops

• tasks also support ‘while’ loops & recursion, & some dynamic
load balancing

• “task parallelism” (using data flow)

Tasks: Concept

• create task

• add to task pool

• a little later we may have more tasks created & added to the pool

BUT we can also run tasks on available resources
[figure: time runs down the slide; queued tasks are assigned to threads as they become free]

• a task runs on an available thread

• available resource => assign task, & repeat

• start new tasks when resources become available

• aha – a dependency: the purple task depends on all the yellow tasks having finished

Outline of syntax

• task creation
– done in a parallel region

– each task is created on a single thread
• often (in textbooks) within a single-threaded region

– master | single (maybe critical, so long as the same task is not created twice)

• but can be the individual iterations of a parallelised “omp for” loop

#pragma omp task
{
   /* this block becomes a task */
}
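A minimal sketch of that canonical pattern (illustrative only, not taken from the course demos): the parallel region creates the team, one thread creates the tasks inside single, and any thread in the team may execute them.

#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel                     /* create the team of threads */
    {
        #pragma omp single                   /* only one thread creates the tasks */
        for (int i = 0; i < 4; i++) {
            #pragma omp task firstprivate(i) /* each encounter generates a new task */
            printf("task %d executed by thread %d\n", i, omp_get_thread_num());
        }
    }   /* implicit barriers (end of single, end of parallel): all tasks complete here */
    return 0;
}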

Running the task

• the run-time decides!
– might be immediate

– might be deferred

• BUT the programmer can use synchronisation to ensure when a task
must have run by (or rather, where to wait until the task(s) have run), as sketched below
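A sketch of the two usual places to wait (illustrative, not from the demos): an explicit “taskwait”, or any barrier of the enclosing team (here the implicit barrier at the end of single).

#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task
        printf("child task: immediate or deferred, the run-time decides\n");

        #pragma omp taskwait    /* wait here for the child tasks created so far */
        printf("past taskwait: the child has definitely run\n");

        #pragma omp task
        printf("another child task\n");
    }   /* implicit barrier at end of single: ALL outstanding tasks complete */
    return 0;
}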

Definitions

• task construct – the “#pragma omp task” directive together with the structured block it applies to

• task – the actual instructions (& data created) when a thread
runs (“encounters”) the “task construct”
– different encounters of the same “task construct” generate different tasks

• task region – the code encountered during execution of a task

Two Examples…

• DEMO SUB DIR:
~mkbane/HPC_DEMOS/openmp-tasks

1. Simple, non-recursive…
– print out adjectives, in any order: storyTelling.c

– (you should also see what happens if you comment out the
“taskwait” directive)

Storytelling (task example)

• 4 tasks; each can run on a different OpenMP thread

• No ordering of tasks; we don’t know (a priori) which thread a task will run on

• But all tasks have to complete at the ‘taskwait’ before any thread can continue past that point (c.f. “barrier”)
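storyTelling.c itself is not reproduced here; the following minimal sketch (the adjectives and framing sentence are made up for illustration) shows the behaviour described above:

#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel
    #pragma omp single              /* one thread creates the four tasks */
    {
        #pragma omp task
        printf("fast ");            /* each adjective is an independent task... */
        #pragma omp task
        printf("cheap ");
        #pragma omp task
        printf("shiny ");
        #pragma omp task
        printf("loud ");            /* ...so they may print in any order */

        #pragma omp taskwait        /* all four tasks must finish before this point */
        printf("... the end\n");
    }
    return 0;
}

Comment out the taskwait and “... the end” may be printed before some of the adjectives.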


Two Examples…

2. Fibonacci series:
fib[i] = 1, 1, 2, 3, 5, 8, 13, …
e.g. fib[6] is the 6th entry, i.e. 8


compile with -DMKB_PRINT=printf to produce output, or -DMKB_PRINT=”//” to suppress it
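The demo sources are not reproduced here; below is a minimal sketch of the usual recursive tasking pattern for Fibonacci (NN and the variable names are illustrative assumptions), matching the output format of the timings that follow.

#include <stdio.h>
#include <omp.h>

/* recursive Fibonacci: each call spawns two child tasks, then waits for them */
long fib(int n) {
    long x, y;
    if (n < 2) return n;                 /* fib(0)=0, fib(1)=1 */

    #pragma omp task shared(x)           /* x must be shared so the child can write back */
    x = fib(n - 1);

    #pragma omp task shared(y)
    y = fib(n - 2);

    #pragma omp taskwait                 /* wait for both children before summing */
    return x + y;
}

int main(void) {
    int NN = 40;
    double t0 = omp_get_wtime();
    long result = 0;

    #pragma omp parallel
    #pragma omp single                   /* one thread starts the recursion; the team runs the tasks */
    result = fib(NN);

    double t1 = omp_get_wtime();
    printf("fib %d = %ld; time taken:%f on %d threads\n",
           NN, result, t1 - t0, omp_get_max_threads());
    return 0;
}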


Fibonacci Recursive Tasks Example: PERFORMANCE

3. ex_fib_tasks_LARGE.c for fib(NN=40), so we can time it
(best of 10 runs on an (exclusive) ‘course’ node (i.e. Intel Skylake chips),
with OMP_DYNAMIC & OMP_PROC_BIND set as recommended,
using the Intel compiler at zero optimisation level)

fib 40 = 102334155; time taken:19.359815 on 1 threads

fib 40 = 102334155; time taken:93.365958 on 2 threads

fib 40 = 102334155; time taken:63.541324 on 3 threads

fib 40 = 102334155; time taken:48.035683 on 4 threads

fib 40 = 102334155; time taken:25.048191 on 8 threads

fib 40 = 102334155; time taken:17.036860 on 12 threads

WHAT DO YOU THINK OF THE PERFORMANCE?



• overheads of tasks can be high
– if “omp parallel for” fits, use it
– a cutoff can tame the overheads of fine-grained recursion (see the sketch below)

• can arrange for a task to run on a team of threads

• can nest tasks

• can use tasks for dynamic load balancing of irregular
problems
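For instance, a common way to tame the Fibonacci overheads seen above is a manual cutoff (a sketch under assumptions, not the course demo; the threshold of 25 is illustrative and machine-dependent): below the cutoff, recurse serially rather than creating ever smaller tasks. The task construct’s if() and final() clauses offer a related mechanism.

#include <stdio.h>
#include <omp.h>

#define CUTOFF 25                        /* illustrative threshold; tune per machine */

long fib(int n) {
    long x, y;
    if (n < 2) return n;
    if (n < CUTOFF)                      /* small subproblem: plain serial recursion, no task overhead */
        return fib(n - 1) + fib(n - 2);

    #pragma omp task shared(x)
    x = fib(n - 1);
    #pragma omp task shared(y)
    y = fib(n - 2);
    #pragma omp taskwait
    return x + y;
}

int main(void) {
    long result;
    #pragma omp parallel
    #pragma omp single
    result = fib(40);
    printf("fib 40 = %ld\n", result);
    return 0;
}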

Tasks

• useful to know

• worth consideration…

• in-depth further reading:
– chpt 3: “Using OpenMP: The Next Step…”
– https://hpc-forge.cineca.it/files/ScuolaCalcoloParallelo_WebDAV/public/anno-2016/12_Advanced_School/OpenMP_4_tasks.pdf (Cineca)


RECAP

SHARED MEMORY

• Memory on chip
– Faster access

– Limited to that memory

– … and to those nodes

• Programming typically OpenMP
– Directive-based + environment vars
+ run-time functions

– Incremental changes to code
(e.g. loop by loop)

– Portable to single core / non-OpenMP
• Single code base (or use of “stubs”)

DISTRIBUTED MEMORY

• Access memory of another node
– Latency & bandwidth issues

– Which interconnect: IB .v. gigE (.v. etc)

– Expandable (memory & nodes)

• Programming is 99% of the time MPI
– Message Passing Interface

– Library calls => more intrusive

– Different MPI libs / implementations

– Non-portable to non-MPI (without effort)

OpenMP

• Only for “globally addressable” shared memory
– (generally) only a single node

• Threads

• Fork-join model: parallel regions
– Work-sharing

– Tasks

• Need to think about private(x) .v. shared(x)

MPI

• For distributed memory
– Includes a subset of a single node

• Processes

• Each process on a different core

• Need to think message-passing in order to share information

OpenMP .v. MPI

• Intel modules on Barkla:
– OpenMP: intel
– MPI: intel intel-mpi

• compile (with no optimisation):
– OpenMP: icc -qopenmp -O0 myOMP.c -o myOMP.exe
– MPI: mpiicc -O0 myMPI.c -o myMPI.exe

• run on 7 cores:
– OpenMP: export OMP_NUM_THREADS=7
          ./myOMP.exe
– MPI: mpirun -np 7 ./myMPI.exe

• OpenMP for Accelerators…