OpenMP 4.0 API C/C++ Syntax Quick Reference Card
OpenMP Application Program Interface (API) is
a portable, scalable model that gives parallel programmers a simple and flexible interface
for developing portable parallel applications. OpenMP supports multi-platform shared-memory
parallel programming in C/C++ and Fortran on all architectures, including Unix platforms and Windows platforms. See www.openmp.org for specifications.
4.0 marks functionality new in version 4.0. The first [n.n.n] after an entry refers to sections in the OpenMP API specification version 4.0; the second [n.n.n] refers to version 3.1.
Directives
An OpenMP executable directive applies to the succeeding structured block or an OpenMP construct. Each directive starts with #pragma omp. The remainder of the directive follows the conventions of the C and C++ standards for compiler directives. A structured-block is a single statement or a compound statement with a single entry at the top and a single exit at the bottom.
parallel [2.5] [2.4]
Forms a team of threads and starts parallel execution.
#pragma omp parallel [clause[ [, ]clause] …] structured-block
clause:
if(scalar-expression)
num_threads(integer-expression)
default(shared | none)
private(list)
firstprivate(list)
shared(list)
copyin(list)
reduction(reduction-identifier: list)
4.0 proc_bind(master | close | spread)
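Example (a minimal sketch, not from the card; the thread count of 4 is an arbitrary choice):

#include <omp.h>
#include <stdio.h>

int main(void) {
    /* Fork a team of 4 threads; each executes the block. */
    #pragma omp parallel num_threads(4)
    {
        printf("thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}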
for [2.7.1] [2.5.1]
Specifies that the iterations of associated loops will be executed in parallel by threads in the team in the context of their implicit tasks.
#pragma omp for [clause[ [, ]clause] …] for-loops
clause:
private(list)
firstprivate(list)
lastprivate(list)
reduction(reduction-identifier: list)
schedule(kind[, chunk_size])
collapse(n)
ordered
nowait
kind:
• static: Iterations are divided into chunks of size chunk_size and assigned to threads in the team in round-robin fashion in order of thread number.
• dynamic: Each thread executes a chunk of iterations of size chunk_size, then requests another chunk until none remain.
• guided: Each thread executes a chunk of iterations, then requests another chunk until no chunks remain to be assigned; the size of each successive chunk decreases.
• auto: The decision regarding scheduling is delegated to the compiler and/or runtime system.
• runtime: The schedule and chunk size are taken from the run-sched-var ICV.
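Example (a minimal sketch, not from the card; the array size and chunk size are arbitrary):

int main(void) {
    double a[1000];
    /* Chunks of 100 iterations are dealt to the team's
       threads round-robin in order of thread number. */
    #pragma omp parallel
    #pragma omp for schedule(static, 100)
    for (int i = 0; i < 1000; i++)
        a[i] = 2.0 * i;
    return 0;
}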
sections [2.7.2] [2.5.2]
A noniterative worksharing construct that contains a set
of structured blocks that are to be distributed among and executed by the threads in a team.
#pragma omp sections [clause[ [, ]clause] …]
{
[#pragma omp section]
structured-block
[#pragma omp section
structured-block]
…
}
clause:
private(list)
firstprivate(list)
lastprivate(list)
reduction(reduction-identifier: list)
nowait
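Example (a minimal sketch, not from the card):

#include <stdio.h>

int main(void) {
    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            printf("section A\n");  /* one thread runs this */
            #pragma omp section
            printf("section B\n");  /* possibly another thread */
        }
    }
    return 0;
}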
single [2.7.3] [2.5.3]
Specifies that the associated structured block is executed by only one of the threads in the team.
#pragma omp single [clause[ [, ]clause] …] structured-block
clause:
private(list)
firstprivate(list)
copyprivate(list)
nowait
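Example (a minimal sketch, not from the card): one thread initializes val, and copyprivate broadcasts the value to the private copies of the other threads.

#include <stdio.h>

int main(void) {
    int val;
    #pragma omp parallel private(val)
    {
        #pragma omp single copyprivate(val)
        val = 42;                   /* executed by one thread */
        printf("val = %d\n", val);  /* every thread prints 42 */
    }
    return 0;
}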
4.0 target update [2.9.3]
Makes the corresponding list items in the device data
environment consistent with their original list items, according to the specified motion clauses.
#pragma omp target update clause[ [, ]clause] …
clause is motion-clause or one of:
device(integer-expression)
if(scalar-expression)
motion-clause:
to(list)
from(list)
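Example (a minimal sketch, not from the card; assumes a target device is available, otherwise the regions execute on the host):

void scale(double *v, int n) {
    #pragma omp target data map(tofrom: v[0:n])
    {
        #pragma omp target
        for (int i = 0; i < n; i++) v[i] *= 2.0;  /* on device */

        /* Refresh the host copy from the device copy: */
        #pragma omp target update from(v[0:n])

        for (int i = 0; i < n; i++) v[i] += 1.0;  /* on host */

        /* Push the host changes back to the device: */
        #pragma omp target update to(v[0:n])

        #pragma omp target
        for (int i = 0; i < n; i++) v[i] *= 3.0;  /* sees updates */
    }
}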
4.0 simd [2.8.1]
Applied to a loop to indicate that the loop can be transformed into a SIMD loop.
#pragma omp simd [clause[ [, ]clause] …] for-loops
clause:
safelen(length)
linear(list[:linear-step])
aligned(list[:alignment])
private(list)
lastprivate(list)
reduction(reduction-identifier: list)
collapse(n)
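Example (a minimal sketch, not from the card):

float dot(const float *a, const float *b, int n) {
    float sum = 0.0f;
    /* Hint that the loop may be vectorized; per-lane partial
       sums are combined by the reduction. */
    #pragma omp simd reduction(+: sum)
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}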
4.0 declare simd [2.8.2]
Enables the creation of one or more versions that can process multiple arguments using SIMD instructions from a single invocation from a SIMD loop.
#pragma omp declare simd [clause[ [, ]clause] …]
[#pragma omp declare simd [clause[ [, ]clause] …]]
[…]
function definition or declaration
clause:
simdlen(length)
linear(argument-list[:constant-linear-step])
aligned(argument-list[:alignment])
uniform(argument-list)
inbranch
notinbranch
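Example (a minimal sketch, not from the card; add_sq is a hypothetical function):

/* Generates SIMD versions of add_sq in addition to the scalar
   one, so a simd loop can call it once per vector lane. */
#pragma omp declare simd notinbranch
float add_sq(float x, float y) { return x * x + y * y; }

void kernel(float *out, const float *a, const float *b, int n) {
    #pragma omp simd
    for (int i = 0; i < n; i++)
        out[i] = add_sq(a[i], b[i]);
}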
4.0 for simd [2.8.3]
Specifies a loop that can be executed concurrently using SIMD instructions and whose iterations will also be executed in parallel by threads in the team.
#pragma omp for simd [clause[ [, ]clause] …] for-loops
clause: Any accepted by the simd or for directives with identical meanings and restrictions.
4.0 target [data] [2.9.1, 2.9.2]
These constructs create a device data environment for the extent of the region. target also starts execution on the device.
#pragma omp target data [clause[ [, ]clause] …] structured-block
#pragma omp target [clause[ [, ]clause] …] structured-block
clause:
device(integer-expression)
map([map-type:] list)
if(scalar-expression)
map-type: alloc | to | from | tofrom
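Example (a minimal sketch, not from the card): a saxpy kernel whose input is mapped to the device and whose result is mapped back.

void saxpy(float a, const float *x, float *y, int n) {
    /* x is copied to the device; y is copied in and back out. */
    #pragma omp target map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}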
4.0 declare target [2.9.4]
A declarative directive that specifies that variables and functions are mapped to a device.
#pragma omp declare target
declarations-definition-seq
#pragma omp end declare target
4.0 teams [2.9.5]
Creates a league of thread teams where the master thread of each team executes the region.
#pragma omp teams [clause[ [, ]clause] …] structured-block
clause:
num_teams(integer-expression)
thread_limit(integer-expression)
default(shared | none)
private(list)
firstprivate(list)
shared(list)
reduction(reduction-identifier: list)
4.0 distribute [simd] [2.9.6, 2.9.7]
distribute specifies loops which are executed by the
thread teams. distribute simd specifies loops which are executed concurrently using SIMD instructions.
#pragma omp distribute [clause[ [, ]clause] …] for-loops
#pragma omp distribute simd [clause[ [, ]clause] …] for-loops
clause: private(list)
firstprivate(list)
collapse(n)
dist_schedule(kind[, chunk_size])
4.0 distribute parallel for [simd] [2.9.8, 2.9.9]
These constructs specify a loop that can be executed in parallel [using SIMD semantics in the simd case] by multiple threads that are members of multiple teams.
#pragma omp distribute parallel for [clause[ [, ]clause] …] for-loops
#pragma omp distribute parallel for simd [clause[ [, ]clause] …] for-loops
clause: See clause for distribute
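Example (a minimal sketch, not from the card): the common offload pattern pairing these constructs with target and teams, as in the combined shortcuts listed further below.

void vadd(const float *a, const float *b, float *c, int n) {
    #pragma omp target map(to: a[0:n], b[0:n]) map(from: c[0:n])
    #pragma omp teams distribute parallel for
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}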
parallel for [2.10.1] [2.6.1]
Shortcut for specifying a parallel construct containing one or more associated loops and no other statements.
#pragma omp parallel for [clause[ [, ]clause] …] for-loop
clause: Any accepted by the parallel or for directives, except the nowait clause, with identical meanings and restrictions.
parallel sections [2.10.2] [2.6.2]
Shortcut for specifying a parallel construct containing one sections construct and no other statements.
#pragma omp parallel sections [clause[ [, ]clause] …]
{
[#pragma omp section]
structured-block
[#pragma omp section
structured-block]
…
}
clause: Any of the clauses accepted by the parallel or sections directives, except the nowait clause, with identical meanings and restrictions.
task [2.11.1] [2.7.1]
Defines an explicit task. The data environment of the task is created according to the data-sharing attribute clauses on the task construct and any defaults that apply.
#pragma omp task [clause[ [, ]clause] …] structured-block
clause:
if(scalar-expression)
final(scalar-expression)
untied
default(shared | none)
mergeable
private(list)
firstprivate(list)
shared(list)
4.0 depend(dependence-type: list)
The list items that appear in the depend clause may
include array sections.
dependence-type: The generated task will be a dependent task of all previously generated sibling tasks that reference at least one of the list items…
• in: …in an out or inout clause.
• out and inout: …in an in, out, or inout clause.
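Example (a minimal sketch, not from the card): the second task becomes dependent on the first because both reference x.

#include <stdio.h>

int main(void) {
    int x = 0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(out: x)
        x = 42;                    /* producer */
        #pragma omp task depend(in: x)
        printf("x = %d\n", x);     /* runs after the producer */
    }
    return 0;
}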
taskyield [2.11.2] [2.7.2]
Specifies that the current task can be suspended in favor of execution of a different task.
#pragma omp taskyield
master [2.12.1] [2.8.1]
Specifies a structured block that is executed by the master thread of the team.
#pragma omp master
structured-block
critical [2.12.2] [2.8.2]
Restricts execution of the associated structured block to a single thread at a time.
#pragma omp critical [(name)] structured-block
barrier [2.12.3] [2.8.3]
Specifies an explicit barrier at the point at which the construct appears.
#pragma omp barrier
taskwait [2.12.4] [2.8.4], 4.0 taskgroup [2.12.5]
These constructs each specify a wait on the completion of child tasks of the current task. taskgroup also waits for descendant tasks.
#pragma omp taskwait
#pragma omp taskgroup
structured-block
atomic [2.12.6] [2.8.5]
Ensures that a specific storage location is accessed atomically. (seq_cst is 4.0.)
#pragma omp atomic [read | write | update | capture] [seq_cst]
expression-stmt
#pragma omp atomic capture [seq_cst]
structured-block
where expression-stmt depends on the clause:
• read: v = x;
• write: x = expr;
• update (or no clause): x++; x--; ++x; --x; x binop= expr; x = x binop expr; x = expr binop x;
• capture: v = x++; v = x--; v = ++x; v = --x; v = x binop= expr; v = x = x binop expr; v = x = expr binop x;
and where structured-block may be one of the following forms:
{v = x; x binop= expr;} {v = x; x = x binop expr;} {x = x binop expr; v = x;} {v = x; x = expr binop x;} {x = expr binop x; v = x;} {v = x; x = expr;}
{v = x; x++;} {v = x; ++x;} {++x; v = x;} {x++; v = x;} {v = x; x--;} {v = x; --x;} {--x; v = x;} {x--; v = x;}
{x binop= expr; v = x;}
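Example (a minimal sketch, not from the card): an atomic fetch-and-add using the capture form {v = x; x binop= expr;}.

int fetch_and_add(int *x, int inc) {
    int old;
    #pragma omp atomic capture
    { old = *x; *x += inc; }   /* read and update atomically */
    return old;
}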
flush [2.12.7] [2.8.6]
Executes the OpenMP flush operation, which makes
a thread’s temporary view of memory consistent with memory, and enforces an order on the memory operations of the variables.
#pragma omp flush [(list)]
ordered [2.12.8] [2.8.7]
Specifies a structured block in a loop region that will be executed in the order of the loop iterations.
#pragma omp ordered
structured-block
4.0 cancel [2.13.1]
Requests cancellation of the innermost enclosing region of
the type specified. The cancel directive may not be used in place of the statement following an if, while, do, switch, or label.
#pragma omp cancel construct-type-clause[ [, ]if-clause]
construct-type-clause:
parallel
sections
for
taskgroup
if-clause: if(scalar-expression)
4.0 cancellation point [2.13.2]
Introduces a user-defined cancellation point at which tasks
check if cancellation of the innermost enclosing region of the type specified has been requested.
#pragma omp cancellation point construct-type-clause
construct-type-clause:
parallel
sections
for
taskgroup
threadprivate [2.14.2] [2.9.2]
Specifies that variables are replicated, with each thread having its own copy. Each copy of a threadprivate variable is initialized once prior to the first reference to that copy.
#pragma omp threadprivate(list)
list:
A comma-separated list of file-scope, namespace-scope, or static block-scope variables that do not have incomplete types.
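Example (a minimal sketch, not from the card): each thread increments its own copy of counter.

#include <stdio.h>

static int counter;                /* file-scope variable */
#pragma omp threadprivate(counter)

int main(void) {
    #pragma omp parallel
    {
        counter++;
        counter++;
        printf("my copy: %d\n", counter);  /* 2 in every thread */
    }
    return 0;
}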
4.0 declare reduction [2.15]
Declares a reduction-identifier that can be used in a reduction clause.
#pragma omp declare reduction(reduction-identifier : typename-list : combiner) [initializer-clause]
reduction-identifier: A base language identifier or one of the following operators: +, -, *, &, |, ^, &&, or ||. In C++, this may also be an operator-function-id.
typename-list: A list of type names
combiner: An expression
initializer-clause: initializer(omp_priv = initializer | function-name(argument-list))
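Example (a minimal sketch, not from the card): a user-defined '+' reduction for a complex type; omp_out, omp_in, and omp_priv are the special variable names defined by the specification.

typedef struct { double re, im; } cplx;

#pragma omp declare reduction(cadd : cplx :                \
        omp_out.re += omp_in.re, omp_out.im += omp_in.im)  \
        initializer(omp_priv = { 0.0, 0.0 })

cplx csum(const cplx *a, int n) {
    cplx s = { 0.0, 0.0 };
    #pragma omp parallel for reduction(cadd : s)
    for (int i = 0; i < n; i++) {
        s.re += a[i].re;
        s.im += a[i].im;
    }
    return s;
}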
4.0 parallel for simd [2.10.4]
Shortcut for specifying a parallel construct containing one loop SIMD construct and no other statements.
#pragma omp parallel for simd [clause[ [, ]clause] …] for-loops
clause: Any accepted by the parallel, for or simd directives, except the nowait clause, with identical meanings and restrictions.
4.0 target teams [2.10.5]
Shortcut for specifying a target construct containing a teams construct.
#pragma omp target teams [clause[ [, ]clause] …] structured-block
clause: See clause for target or teams
4.0 teams distribute [simd] [2.10.6, 2.10.7]
Shortcuts for specifying a teams construct containing a distribute [simd] construct.
#pragma omp teams distribute [clause[ [, ]clause] …] for-loops
#pragma omp teams distribute simd [clause[ [, ]clause] …] for-loops
clause: Any clause used for teams or distribute [simd]
4.0 target teams distribute [simd] [2.10.8, 2.10.9]
Shortcuts for specifying a target construct containing a teams distribute [simd] construct.
#pragma omp target teams distribute [clause[ [, ]clause] …] for-loops
#pragma omp target teams distribute simd [clause[ [, ]clause] …] for-loops
clause: Any clause used for target or teams distribute [simd]
4.0 teams distribute parallel for [simd] [2.10.10, 2.10.12]
Shortcuts for specifying a teams construct containing a distribute parallel for [simd] construct.
#pragma omp teams distribute parallel for [clause[ [, ]clause] …] for-loops
#pragma omp teams distribute parallel for simd [clause[ [, ]clause] …] for-loops
clause: Any clause used for teams or distribute parallel for [simd]
4.0 target teams distribute parallel for [simd] [2.10.11, 2.10.13]
Shortcut for specifying a target construct containing a teams distribute parallel for [simd] construct.
#pragma omp target teams distribute parallel for \
[clause[ [, ]clause] …] for-loops
#pragma omp target teams distribute parallel for simd \
[clause[ [, ]clause] …] for-loops
clause: Any clause used for target or teams distribute parallel for [simd]
Runtime Library Routines
Execution environment routines affect and monitor threads, processors, and the parallel environment. The library routines are external functions with "C" linkage.
Execution Environment Routines
omp_set_num_threads [3.2.1] [3.2.1]
Affects the number of threads used for subsequent parallel regions not specifying a num_threads clause, by setting the value of the first element of the nthreads-var ICV of the current task to num_threads.
void omp_set_num_threads(int num_threads);
omp_get_num_threads [3.2.2] [3.2.2]
Returns the number of threads in the current team. The
binding region for an omp_get_num_threads region is the innermost enclosing parallel region.
int omp_get_num_threads(void);
omp_get_max_threads [3.2.3] [3.2.3]
Returns an upper bound on the number of threads that could be used to form a new team if a parallel construct without a num_threads clause were encountered after execution returns from this routine.
int omp_get_max_threads(void);
omp_get_thread_num [3.2.4] [3.2.4]
Returns the thread number of the calling thread within the current team.
int omp_get_thread_num(void);
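Example (a minimal sketch, not from the card):

#include <omp.h>
#include <stdio.h>

int main(void) {
    omp_set_num_threads(3);  /* request 3 threads for the next region */
    printf("at most %d threads\n", omp_get_max_threads());
    #pragma omp parallel
    printf("thread %d of %d\n",
           omp_get_thread_num(), omp_get_num_threads());
    return 0;
}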
omp_get_num_procs [3.2.5] [3.2.5]
Returns the number of processors that are available to the device at the time the routine is called.
int omp_get_num_procs(void);
omp_in_parallel [3.2.6] [3.2.6]
Returns true if the active-levels-var ICV is greater than zero; otherwise it returns false.
int omp_in_parallel(void);
omp_set_dynamic [3.2.7] [3.2.7]
Enables or disables dynamic adjustment of the number of threads available, by setting the value of the dyn-var ICV.
void omp_set_dynamic(int dynamic_threads);
omp_get_dynamic [3.2.8] [3.2.8]
This routine returns the value of the dyn-var ICV, which
is true if dynamic adjustment of the number of threads is enabled for the current task.
int omp_get_dynamic(void);
4.0 omp_get_cancellation [3.2.9]
Returns the value of the cancel-var ICV, which controls the behavior of the cancel construct and cancellation points.
int omp_get_cancellation(void);
omp_set_nested [3.2.10] [3.2.9]
Enables or disables nested parallelism, by setting the nest-var ICV.
void omp_set_nested(int nested);
omp_get_nested [3.2.11] [3.2.10]
Returns the value of the nest-var ICV, which indicates if nested parallelism is enabled or disabled.
int omp_get_nested(void);
omp_set_schedule [3.2.12] [3.2.11]
Affects the schedule that is applied when runtime is used as the schedule kind.
void omp_set_schedule(omp_sched_t kind, int modifier);
kind: one of the following, or an implementation-defined schedule:
omp_sched_static = 1
omp_sched_dynamic = 2
omp_sched_guided = 3
omp_sched_auto = 4
omp_get_schedule [3.2.13] [3.2.12]
Returns the value of run-sched-var ICV, which is the schedule applied when runtime schedule is used.
void omp_get_schedule( omp_sched_t *kind, int *modifier);
See kind above.
omp_get_thread_limit [3.2.14] [3.2.13]
Returns the value of the thread-limit-var ICV, which is the maximum number of OpenMP threads available.
int omp_get_thread_limit(void);
omp_set_max_active_levels [3.2.15] [3.2.14]
Limits the number of nested active parallel regions, by setting the max-active-levels-var ICV.
void omp_set_max_active_levels(int max_levels);
omp_get_max_active_levels [3.2.16] [3.2.15]
Returns the value of the max-active-levels-var ICV, which determines the maximum number of nested active parallel regions.
int omp_get_max_active_levels(void);
omp_get_level [3.2.17] [3.2.16]
For the enclosing device region, returns the levels-var ICV, which is the number of nested parallel regions that enclose the task containing the call.
int omp_get_level(void);
omp_get_ancestor_thread_num [3.2.18] [3.2.17]
Returns, for a given nested level of the current thread, the thread number of the ancestor of the current thread.
int omp_get_ancestor_thread_num(int level);
omp_get_team_size [3.2.19] [3.2.18]
Returns, for a given nested level of the current thread,
the size of the thread team to which the ancestor or the current thread belongs.
int omp_get_team_size(int level);
omp_get_active_level [3.2.20] [3.2.19]
Returns the value of the active-levels-var ICV, which determines the number of active, nested parallel regions enclosing the task that contains the call.
int omp_get_active_level(void);
omp_in_final [3.2.21] [3.2.20]
Returns true if the routine is executed in a final task region; otherwise, it returns false.
int omp_in_final(void);
4.0 omp_get_proc_bind [3.2.22]
Returns the thread affinity policy to be used for the
subsequent nested parallel regions that do not specify a proc_bind clause.
omp_proc_bind_t omp_get_proc_bind(void);
Returns one of:
omp_proc_bind_false = 0
omp_proc_bind_true = 1
omp_proc_bind_master = 2
omp_proc_bind_close = 3
omp_proc_bind_spread = 4
4.0 omp_set_default_device [3.2.23]
Controls the default target device by assigning the value of the default-device-var ICV.
void omp_set_default_device(int device_num);
4.0 omp_get_default_device [3.2.24]
Returns the value of the default-device-var ICV, which determines the default target device.
int omp_get_default_device(void);
4.0 omp_get_num_devices [3.2.25]
Returns the number of target devices.
int omp_get_num_devices(void);
4.0 omp_get_num_teams [3.2.26]
Returns the number of teams in the current teams region, or 1 if called from outside of a teams region.
int omp_get_num_teams(void);
4.0 omp_get_team_num [3.2.27]
Returns the team number of the calling thread. The
team number is an integer between 0 and one less than the value returned by omp_get_num_teams, inclusive.
int omp_get_team_num(void);
4.0 omp_is_initial_device [3.2.28]
Returns true if the current task is executing on the host device; otherwise, it returns false.
int omp_is_initial_device(void);
Lock Routines
General-purpose lock routines. Two types of locks are supported: simple locks and nestable locks. A nestable lock can be set multiple times by the same task before being unset; a simple lock cannot be set if it is already owned by the task trying to set it.
Initialize lock [3.3.1] [3.3.1]
Initialize an OpenMP lock.
void omp_init_lock(omp_lock_t *lock);
void omp_init_nest_lock(omp_nest_lock_t *lock);
Destroy lock [3.3.2] [3.3.2]
Ensure that the OpenMP lock is uninitialized.
void omp_destroy_lock(omp_lock_t *lock);
void omp_destroy_nest_lock(omp_nest_lock_t *lock);
Set lock [3.3.3] [3.3.3]
Sets an OpenMP lock. The calling task region is suspended until the lock is set.
void omp_set_lock(omp_lock_t *lock);
void omp_set_nest_lock(omp_nest_lock_t *lock);
Unset lock [3.3.4] [3.3.4]
Unsets an OpenMP lock.
void omp_unset_lock(omp_lock_t *lock);
void omp_unset_nest_lock(omp_nest_lock_t *lock);
Test lock [3.3.5] [3.3.5]
Attempt to set an OpenMP lock but do not suspend execution of the task executing the routine.
int omp_test_lock(omp_lock_t *lock);
int omp_test_nest_lock(omp_nest_lock_t *lock);
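Example (a minimal sketch, not from the card): a simple lock serializing updates to a shared counter.

#include <omp.h>
#include <stdio.h>

int main(void) {
    omp_lock_t lk;
    int total = 0;
    omp_init_lock(&lk);
    #pragma omp parallel
    {
        omp_set_lock(&lk);    /* blocks until the lock is acquired */
        total += 1;
        omp_unset_lock(&lk);
    }
    omp_destroy_lock(&lk);
    printf("total = %d\n", total);
    return 0;
}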
Timing Routines
Timing routines support a portable wall clock timer. These record elapsed time per thread, and the times are not guaranteed to be globally consistent across all the threads participating in an application.
omp_get_wtime [3.4.1] [3.4.1]
Returns elapsed wall clock time in seconds.
double omp_get_wtime(void);
omp_get_wtick [3.4.2] [3.4.2]
Returns the precision of the timer used by omp_get_wtime.
double omp_get_wtick(void);