More Advanced
OpenMP
This is an abbreviated form of Tim Mattson's and Larry
Meadows's (both at Intel) SC '08 tutorial located at
http://openmp.org/mp-documents/omp-hands-on-SC08.pdf
All errors are my responsibility
Saturday, January 30, 16
Topics
• Creating Threads
• Synchronization
• Runtime library calls
• Data environment
• Scheduling of the for and sections constructs
• Memory Model
• OpenMP 3.0 and Tasks
OpenMP 4
• Extensions to tasking
• User-defined reduction operators
• Construct cancellation
• Portable SIMD directives
• Thread affinity
Creating threads
• We already know about
• parallel regions (omp parallel)
• parallel sections (omp parallel sections)
• parallel for (omp parallel for) or omp for
when in a parallel region
• We will now talk about Tasks
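The familiar constructs above can be sketched in a few lines. This dot product is an illustrative example (not from the tutorial) using omp parallel for with a reduction; without -fopenmp the pragma is ignored and the code runs serially with the same result.

```c
/* Hypothetical example: divide the iterations of a dot product
 * among the threads of a parallel region.  The reduction clause
 * gives each thread a private copy of sum and combines the
 * copies at the implicit barrier at the end of the loop. */
double dot(const double *a, const double *b, int n) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```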
Tasks
• OpenMP has always had tasks, even before OpenMP 3.0
• A parallel construct creates implicit tasks, one per thread
• A team of threads is created to execute the tasks
• Each thread in the team is assigned (and tied) to one task
• The barrier holds the original master thread until all tasks are
finished (note that the master may also execute a task)
• OpenMP 3.0 allows us to explicitly create tasks
• Every part of an OpenMP program is part of some task, with the
master task executing the program even if there is no explicit
task
task construct syntax
#pragma omp task [clause[[,]clause] …]
structured-block
clauses:
if (expression)
untied
shared (list)
private (list)
firstprivate (list)
default( shared | none )
if (false) says the task is executed immediately by the
spawning thread
• still a different task with respect to
synchronization
• its data environment is local to the spawning thread
• a user optimization for cache affinity and for avoiding the
cost of executing on a different thread
untied says the task need not be resumed by the thread that
started it, i.e., after a suspension a different thread may
execute the rest of the task
The shared, private, firstprivate, and default clauses are as
before and control whether storage is shared or private
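A minimal sketch of these clauses in use (mark_all and its arguments are invented for illustration): one thread creates all the tasks, and firstprivate(i) captures the loop index by value at task-creation time, so each task writes its own slot.

```c
/* Hypothetical example: one thread spawns a task per array slot.
 * firstprivate(i) gives each task the value i had when the task
 * was created; shared(done) lets every task see the same array. */
void mark_all(int *done, int n) {
    #pragma omp parallel
    #pragma omp single
    {
        for (int i = 0; i < n; i++) {
            #pragma omp task firstprivate(i) shared(done)
            done[i] = 1;   /* i has its task-creation-time value */
        }
    }   /* implicit barrier: all tasks have finished here */
}
```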
When do we know a task is
finished?
• At explicit or implicit thread barriers
• All tasks generated in the current parallel region are
finished when the barrier for that parallel region finishes
• Matches what you expect, i.e., when a barrier is reached the
work preceding the barrier is finished
• At task barriers
• Wait until all tasks defined in the current task are finished
#pragma omp taskwait
• Applies to tasks T generated in the current task, not to
tasks generated by those tasks T
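The taskwait semantics can be sketched with the classic recursive Fibonacci (a standard illustration; the usual serial cutoff is omitted for brevity). The taskwait waits only for the two child tasks created here, not for their descendants, but since each child issues its own taskwait the whole subtree is complete when the parent resumes.

```c
/* Each call spawns its two recursive calls as tasks and waits
 * for them before combining the results.  shared(x)/shared(y)
 * let the parent read the children's results. */
long fib(int n) {
    if (n < 2) return n;
    long x, y;
    #pragma omp task shared(x)
    x = fib(n - 1);
    #pragma omp task shared(y)
    y = fib(n - 2);
    #pragma omp taskwait   /* both children are finished past this point */
    return x + y;
}
```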
Example: parallel pointer
chasing
#pragma omp parallel
{
#pragma omp single private(p)
{
p = listhead ;
while (p) {
#pragma omp task
process (p) ;
p=next (p) ;
}
}
}
the value of p passed is the
value of p at the time the
task is created (p is firstprivate
by default inside the task), saved
like an argument in a function call
process is an ordinary
user function.
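A self-contained version of the slide's pattern (the node type, the summing body of process, and the sum_list wrapper are assumptions added so the sketch compiles): one thread walks the list and spawns a task per node, and the implicit barrier at the end of the parallel region guarantees every task has run.

```c
#include <stddef.h>

typedef struct node { int value; struct node *next; } node;

static int total;   /* updated atomically by the tasks */

/* process is an ordinary user function; here it just sums values */
static void process(node *p) {
    #pragma omp atomic
    total += p->value;
}

int sum_list(node *listhead) {
    node *p;
    total = 0;
    #pragma omp parallel
    #pragma omp single private(p)
    {
        p = listhead;
        while (p) {
            #pragma omp task      /* p is firstprivate by default */
            process(p);
            p = p->next;
        }
    }   /* implicit barrier: every task is finished here */
    return total;
}
```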
Example: parallel pointer
chasing on multiple lists
#pragma omp parallel
{
#pragma omp for private(p)
for ( int i = 0; i < numlists ; i++ ) {
p = listheads [ i ] ;
while (p) {
#pragma omp task
process (p) ;
p = next (p) ;
}
}
}
One list per for iteration; each
node is processed as a task.
The task construct is a task
scheduling point
Example: postorder tree
traversal
void postorder(node *p) { // p is initially the root
if (p->left)
#pragma omp task
postorder(p->left);
if (p->right)
#pragma omp task
postorder(p->right);
#pragma omp taskwait // wait for descendants
process(p->data);
} Code is within an omp parallel region
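A compilable sketch of this traversal (the tree type, a counting process, and the count_postorder wrapper are assumptions): the taskwait ensures both subtrees are done before the parent's data is processed, which is exactly the postorder ordering.

```c
#include <stddef.h>

typedef struct tnode { int data; struct tnode *left, *right; } tnode;

static int visited;   /* counts processed nodes, atomically */

static void process(int data) {
    (void)data;
    #pragma omp atomic
    visited++;
}

static void postorder(tnode *p) {
    if (p->left) {
        #pragma omp task
        postorder(p->left);
    }
    if (p->right) {
        #pragma omp task
        postorder(p->right);
    }
    #pragma omp taskwait   /* wait for descendants */
    process(p->data);
}

int count_postorder(tnode *root) {
    visited = 0;
    #pragma omp parallel
    #pragma omp single
    postorder(root);       /* the recursion runs inside the parallel region */
    return visited;
}
```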
Task scheduling points
• Certain constructs have task scheduling points in
them: task constructs, taskwait constructs, taskyield
(#pragma omp taskyield) constructs, barriers
(implicit and explicit), and the end of a task region
• At a task scheduling point a thread can suspend its
current task and begin executing another task in the task
pool (task switching)
• At the completion of that task, or at another task
scheduling point within it, the thread can resume executing
the original task
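taskyield makes the scheduling point explicit. In this sketch (invented for illustration) each task offers to be suspended before doing its work; the result is the same whether or not a task switch actually happens, and without -fopenmp the pragmas are ignored entirely.

```c
/* Hypothetical example: sum 1..n with one task per term.  The
 * taskyield directive is an explicit task scheduling point where
 * the executing thread may switch to another task in the pool. */
int yielding_sum(int n) {
    int total = 0;
    #pragma omp parallel
    #pragma omp single
    {
        for (int i = 1; i <= n; i++) {
            #pragma omp task shared(total)
            {
                #pragma omp taskyield   /* explicit task scheduling point */
                #pragma omp atomic
                total += i;
            }
        }
    }   /* implicit barrier at the end of single: all tasks finished */
    return total;
}
```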
Example: task switching
#pragma omp single
{
for (i=0; i