
More Advanced OpenMP

This is an abbreviated form of Tim Mattson's and Larry Meadows'
(both at Intel) SC '08 tutorial, located at
http://openmp.org/mp-documents/omp-hands-on-SC08.pdf
All errors are my responsibility.



Topics

• Creating Threads
• Synchronization
• Runtime library calls
• Data environment
• Scheduling for and sections
• Memory Model
• OpenMP 3.0 and Tasks

OpenMP 4

• Extensions to tasking
• User defined reduction operators
• Construct cancellation
• Portable SIMD directives
• Thread affinity


Creating threads

• We already know about (see the recap sketch below):
  • parallel regions (omp parallel)
  • parallel sections (omp parallel sections)
  • parallel for (omp parallel for), or omp for when already in a parallel region
• We will now talk about Tasks
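
A quick recap sketch of those constructs before moving on. The work functions task_a() and task_b() and the recap() wrapper are hypothetical, not from the slides; only standard OpenMP directives are used.

#include <omp.h>

void task_a(void);
void task_b(void);

void recap(double *a, int n)
{
    #pragma omp parallel                /* parallel region: create a team of threads */
    {
        #pragma omp for                 /* omp for: work-sharing loop inside the region */
        for (int i = 0; i < n; i++)
            a[i] = 2.0 * a[i];

        #pragma omp sections            /* independent sections, each run by one thread */
        {
            #pragma omp section
            task_a();
            #pragma omp section
            task_b();
        }
    }
}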


Tasks

• OpenMP has always had tasks, even before OpenMP 3.0
  • A parallel construct creates implicit tasks, one per thread
  • A team of threads is created to execute the tasks
  • Each thread in the team is assigned (and tied) to one implicit task
  • The barrier holds the original master thread until all tasks are finished
    (note that the master may also execute a task)
• OpenMP 3.0 allows us to explicitly create tasks
• Every part of an OpenMP program is part of some task, with the master task
  executing the program even if there is no explicit task
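
A minimal sketch of these implicit tasks, using only the standard omp_get_thread_num() runtime call: each thread in the team executes one implicit task, and the implicit barrier at the end of the parallel region holds the master until every task has finished.

#include <omp.h>
#include <stdio.h>

int main(void)
{
    #pragma omp parallel               /* one implicit task per thread in the team */
    {
        printf("implicit task executed by thread %d\n", omp_get_thread_num());
    }                                  /* implicit barrier: all implicit tasks finished */
    return 0;
}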


task construct syntax

#pragma omp task [clause[[,] clause] ...]
    structured-block

clauses:
    if (expression)
    untied
    shared (list)
    private (list)
    firstprivate (list)
    default (shared | none)

if (false) says the task is executed immediately by the spawning thread
• it is still a different task with respect to synchronization
• its data environment is local to the spawning thread
• a user optimization for cache affinity and for avoiding the cost of
  executing a small task on a different thread

untied says the task need not be executed by a single thread, i.e., after a
suspension it may be resumed by a different thread, so different threads may
execute different parts of the task

The data-scoping clauses (shared, private, firstprivate, default) are as
before and control whether storage is shared or private
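
A minimal sketch of these clauses in use. The compute() function, cost[] array, and threshold are hypothetical, not from the slides: the if clause makes cheap tasks run immediately in the spawning thread, firstprivate(i) captures the value of i at task creation, and untied lets a suspended task be resumed by a different thread.

#include <omp.h>

void compute(int i);                      /* hypothetical work function */

void spawn_tasks(const double *cost, int n, double threshold)
{
    #pragma omp parallel
    {
        #pragma omp single
        {
            for (int i = 0; i < n; i++) {
                /* if (...) evaluating to false => task runs at once on this thread */
                #pragma omp task if (cost[i] > threshold) firstprivate(i) untied
                compute(i);
            }
        }
    }
}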


When do we know a task is finished?

• At explicit or implicit thread barriers
  • All tasks generated in the current parallel region are finished when the
    barrier for that parallel region finishes
  • Matches what you expect, i.e., when a barrier is reached the work
    preceding the barrier is finished

• At task barriers
  • Wait until all tasks defined in the current task are finished:
    #pragma omp taskwait
  • Applies only to tasks T generated in the current task, not to tasks
    generated by those tasks T
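
A minimal sketch of that last point, with hypothetical work functions: the taskwait waits only for the two child tasks created directly in the single region, not for the grandchild task created inside child A; the grandchild is only guaranteed finished at the implicit barrier of the parallel region.

#include <omp.h>

void work_a(void);                  /* hypothetical work functions */
void work_b(void);
void work_grandchild(void);

void taskwait_scope(void)
{
    #pragma omp parallel
    {
        #pragma omp single
        {
            #pragma omp task                    /* child A */
            {
                #pragma omp task                /* grandchild: not covered by the taskwait */
                work_grandchild();
                work_a();
            }

            #pragma omp task                    /* child B */
            work_b();

            #pragma omp taskwait                /* waits for children A and B only */
        }
    }                                           /* implicit barrier: waits for everything */
}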


Example: parallel pointer chasing

#pragma omp parallel
{
    #pragma omp single private(p)
    {
        p = listhead;
        while (p) {
            #pragma omp task
            process(p);
            p = next(p);
        }
    }
}

The value of p seen by each task is the value of p at the time the task is
created, saved like an argument in an ordinary function call (p is private
in the enclosing single region, so it becomes firstprivate in the task).

process is an ordinary user function.
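
A self-contained sketch of the same pattern, assuming a hypothetical singly linked list type (node, next_node() and this process() are illustrative, not from the slides): the single thread walks the list and creates one task per element, while the rest of the team executes those tasks.

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct node { int data; struct node *next; } node;

void process(node *p)     { printf("processing %d\n", p->data); }
node *next_node(node *p)  { return p->next; }

int main(void)
{
    node *listhead = NULL;
    for (int i = 3; i >= 0; i--) {             /* build the list 0 -> 1 -> 2 -> 3 */
        node *n = malloc(sizeof *n);
        n->data = i;
        n->next = listhead;
        listhead = n;
    }

    node *p;
    #pragma omp parallel
    {
        #pragma omp single private(p)          /* one thread creates the tasks */
        {
            p = listhead;
            while (p) {
                #pragma omp task               /* p is firstprivate in the task */
                process(p);
                p = next_node(p);
            }
        }
    }
    return 0;
}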


Example: parallel pointer chasing on multiple lists

#pragma omp parallel
{
    #pragma omp for private(p)
    for (int i = 0; i < numlists; i++) {
        p = listheads[i];
        while (p) {
            #pragma omp task
            process(p);
            p = next(p);
        }
    }
}

As before, one task is created per list element, but now the omp for divides
the lists among the threads of the team, so several threads create tasks.


Example: postorder tree traversal

void postorder(node *p) {
    if (p->left)
        #pragma omp task
        postorder(p->left);
    if (p->right)
        #pragma omp task
        postorder(p->right);
    #pragma omp taskwait    // wait for descendants
    process(p->data);
}

This code is called from within an omp parallel region. The taskwait is a
task scheduling point: the calling task waits until its child tasks (and,
recursively, their descendants) have finished before processing its own node.


[Figure: a sequence of slides animates the traversal on an example tree,
showing the child nodes being processed before their parents and finally
the root.]
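
A self-contained sketch of the traversal, again assuming a hypothetical node type, make() helper and process() function (illustrative only): one thread starts the recursion from inside a single construct, and the child tasks spread the subtrees across the team. The braces around the task constructs inside the if statements are there because some compilers only accept OpenMP pragmas inside compound statements.

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct node { int data; struct node *left, *right; } node;

void process(int data) { printf("visited %d\n", data); }

void postorder(node *p)
{
    if (p->left) {
        #pragma omp task
        postorder(p->left);
    }
    if (p->right) {
        #pragma omp task
        postorder(p->right);
    }
    #pragma omp taskwait              /* children done before this node is processed */
    process(p->data);
}

node *make(int data, node *l, node *r)
{
    node *n = malloc(sizeof *n);
    n->data = data; n->left = l; n->right = r;
    return n;
}

int main(void)
{
    node *root = make(1, make(2, NULL, NULL), make(3, NULL, NULL));
    #pragma omp parallel
    {
        #pragma omp single            /* one thread starts the traversal */
        postorder(root);
    }
    return 0;
}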

Task scheduling points

• Certain constructs contain task scheduling points: task constructs,
  taskwait constructs, taskyield constructs (#pragma omp taskyield),
  implicit and explicit barriers, and the end of a task region

• At a task scheduling point a thread can suspend its current task and
  begin executing another task from the task pool (task switching)

• At the completion of that task, or at another task scheduling point, the
  thread can resume executing the original task
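
A minimal sketch of a task scheduling point used deliberately, adapted from the common taskyield idiom; something_useful() and something_critical() are hypothetical. While a task spins on a lock, the taskyield gives the thread a chance to switch to another ready task instead of wasting the wait time.

#include <omp.h>

void something_useful(void);         /* hypothetical work functions */
void something_critical(void);

void wait_politely(omp_lock_t *lock, int n)
{
    for (int i = 0; i < n; i++) {
        #pragma omp task
        {
            something_useful();
            while (!omp_test_lock(lock)) {
                #pragma omp taskyield   /* scheduling point: thread may run another task */
            }
            something_critical();
            omp_unset_lock(lock);
        }
    }
}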


Example: task switching

#pragma omp single
{
    for (i = 0; i < ONEZILLION; i++) {   /* generate a very large number of tasks */
        #pragma omp task
        process(item[i]);
    }
}