OpenMP Application Programming Interface
Version 5.0 November 2018
Copyright © 1997-2018 OpenMP Architecture Review Board.
Permission to copy without fee all or part of this material is granted, provided the OpenMP Architecture Review Board copyright notice and the title of this document appear. Notice is given that copying is by permission of the OpenMP Architecture Review Board.
Contents
1 Introduction
1.1 Scope
1.2 Glossary
1.2.1 Threading Concepts
1.2.2 OpenMP Language Terminology
1.2.3 Loop Terminology
1.2.4 Synchronization Terminology
1.2.5 Tasking Terminology
1.2.6 Data Terminology
1.2.7 Implementation Terminology
1.2.8 Tool Terminology
1.3 Execution Model
1.4 Memory Model
1.4.1 Structure of the OpenMP Memory Model
1.4.2 Device Data Environments
1.4.3 Memory Management
1.4.4 The Flush Operation
1.4.5 Flush Synchronization and Happens Before
1.4.6 OpenMP Memory Consistency
1.5 Tool Interfaces
1.5.1 OMPT
1.5.2 OMPD
1.6 OpenMP Compliance
1.7 Normative References
1.8 Organization of this Document
2 Directives 37 2.1 DirectiveFormat…………………………….. 38
2.1.1 FixedSourceFormDirectives……………………. 41
2.1.2 FreeSourceFormDirectives ……………………. 41
2.1.3 Stand-AloneDirectives ………………………. 42
2.1.4 ArrayShaping…………………………… 43
2.1.5 ArraySections…………………………… 44
2.1.6 Iterators ……………………………… 47
2.2 ConditionalCompilation…………………………. 49
2.2.1 Fixed Source Form Conditional Compilation Sentinels . . . . . . . . . . . . 50
2.2.2 Free Source Form Conditional Compilation Sentinel . . . . . . . . . . . . . 50
2.3 VariantDirectives ……………………………. 51
2.3.1 OpenMPContext………………………….. 51
2.3.2 ContextSelectors …………………………. 53
2.3.3 MatchingandScoringContextSelectors ………………. 55
2.3.4 Metadirectives…………………………… 56
2.3.5 declarevariantDirective …………………… 58
2.4 requiresDirective ………………………….. 60 2.5 InternalControlVariables………………………… 63
2.5.1 ICVDescriptions …………………………. 64
2.5.2 ICVInitialization …………………………. 66
2.5.3 ModifyingandRetrievingICVValues ……………….. 68
2.5.4 HowICVsareScoped……………………….. 70
2.5.4.1 HowthePer-DataEnvironmentICVsWork . . . . . . . . . . . . . . . 72 2.5.5 ICVOverrideRelationships…………………….. 72 2.6 parallelConstruct………………………….. 74 2.6.1 Determining the Number of Threads for a parallel Region . . . . . . . . 78 2.6.2 ControllingOpenMPThreadAffinity………………… 80 2.7 teamsConstruct ……………………………. 82
2.8 WorksharingConstructs…………………………. 86 2.8.1 sectionsConstruct……………………….. 86 2.8.2 singleConstruct…………………………. 89 2.8.3 workshareConstruct ………………………. 92
2.9 Loop-RelatedDirectives…………………………. 95 2.9.1 CanonicalLoopForm ……………………….. 95 2.9.2 Worksharing-LoopConstruct ……………………. 101
2.9.2.1 Determining the Schedule of a Worksharing-Loop . . . . . . . . . . . . 109 2.9.3 SIMDDirectives………………………….. 110 2.9.3.1 simdConstruct………………………… 110 2.9.3.2 Worksharing-LoopSIMDConstruct ………………. 114 2.9.3.3 declaresimdDirective …………………… 116 2.9.4 distributeLoopConstructs…………………… 120
2.9.4.1 distributeConstruct ……………………. 120
2.9.4.2 distributesimdConstruct…………………. 123
2.9.4.3 Distribute Parallel Worksharing-Loop Construct . . . . . . . . . . . . . 125
2.9.4.4 Distribute Parallel Worksharing-Loop SIMD Construct . . . . . . . . . 126
2.9.5 loopConstruct ………………………….. 128 2.9.6 scanDirective ………………………….. 132 2.10TaskingConstructs……………………………. 135 2.10.1 taskConstruct ………………………….. 135 2.10.2 taskloopConstruct……………………….. 140 2.10.3 taskloopsimdConstruct ……………………. 146 2.10.4 taskyieldConstruct ………………………. 147 2.10.5 InitialTask…………………………….. 148 2.10.6 TaskScheduling ………………………….. 149 2.11MemoryManagementDirectives …………………….. 152 2.11.1 MemorySpaces ………………………….. 152 2.11.2 MemoryAllocators ………………………… 152 2.11.3 allocateDirective ……………………….. 156 2.11.4 allocateClause ………………………… 158 2.12DeviceDirectives ……………………………. 160 2.12.1 DeviceInitialization………………………… 160
2.12.2 targetdataConstruct ……………………… 161 2.12.3 targetenterdataConstruct………………….. 164 2.12.4 targetexitdataConstruct ………………….. 166 2.12.5 targetConstruct…………………………. 170 2.12.6 targetupdateConstruct ……………………. 176 2.12.7 declaretargetDirective……………………. 180
2.13CombinedConstructs ………………………….. 185 2.13.1 ParallelWorksharing-LoopConstruct………………… 185 2.13.2 parallelloopConstruct ……………………. 186 2.13.3 parallelsectionsConstruct …………………. 188 2.13.4 parallelworkshareConstruct ………………… 189 2.13.5 ParallelWorksharing-LoopSIMDConstruct . . . . . . . . . . . . . . . . . 190 2.13.6 parallelmasterConstruct…………………… 191 2.13.7 mastertaskloopConstruct…………………… 192 2.13.8 mastertaskloopsimdConstruct ……………….. 194 2.13.9 parallelmastertaskloopConstruct …………….. 195 2.13.10 parallelmastertaskloopsimdConstruct. . . . . . . . . . . . . . 196 2.13.11teamsdistributeConstruct ………………….. 197 2.13.12teamsdistributesimdConstruct ………………. 198 2.13.13 Teams Distribute Parallel Worksharing-Loop Construct . . . . . . . . . . . 200 2.13.14 Teams Distribute Parallel Worksharing-Loop SIMD Construct . . . . . . . . 201 2.13.15teamsloopConstruct………………………. 202 2.13.16targetparallelConstruct…………………… 203 2.13.17 TargetParallelWorksharing-LoopConstruct . . . . . . . . . . . . . . . . . 205 2.13.18 Target Parallel Worksharing-Loop SIMD Construct . . . . . . . . . . . . . 206 2.13.19targetparallelloopConstruct ……………….. 208 2.13.20targetsimdConstruct ……………………… 209 2.13.21targetteamsConstruct …………………….. 210 2.13.22targetteamsdistributeConstruct……………… 211 2.13.23 targetteamsdistributesimdConstruct . . . . . . . . . . . . . . 213 2.13.24targetteamsloopConstruct………………….. 214 2.13.25 Target Teams Distribute Parallel Worksharing-Loop Construct . . . . . . . . 215 2.13.26 Target Teams Distribute Parallel Worksharing-Loop SIMD Construct . . . . 216
2.14 ClausesonCombinedandCompositeConstructs . . . . . . . . . . . . . . . . . . 218 2.15ifClause ……………………………….. 220 2.16masterConstruct……………………………. 221 2.17SynchronizationConstructsandClauses …………………. 223
2.17.1 criticalConstruct……………………….. 223 2.17.2 barrierConstruct………………………… 226 2.17.3 ImplicitBarriers………………………….. 228 2.17.4 Implementation-SpecificBarriers………………….. 230 2.17.5 taskwaitConstruct……………………….. 230 2.17.6 taskgroupConstruct ………………………. 232 2.17.7 atomicConstruct…………………………. 234 2.17.8 flushConstruct …………………………. 242
2.17.8.1 ImplicitFlushes………………………… 246 2.17.9 orderedConstruct………………………… 250 2.17.10DependObjects ………………………….. 254
2.17.10.1depobjConstruct ………………………. 254 2.17.11dependClause ………………………….. 255 2.17.12SynchronizationHints……………………….. 260
2.18CancellationConstructs …………………………. 263 2.18.1 cancelConstruct…………………………. 263 2.18.2 cancellationpointConstruct ………………… 267
2.19DataEnvironment ……………………………. 269 2.19.1 Data-SharingAttributeRules ……………………. 269 2.19.1.1 VariablesReferencedinaConstruct ………………. 270 2.19.1.2 Variables Referenced in a Region but not in a Construct . . . . . . . . . 273 2.19.2 threadprivateDirective ……………………. 274 2.19.3 ListItemPrivatization……………………….. 279 2.19.4 Data-SharingAttributeClauses …………………… 282 2.19.4.1 defaultClause ……………………….. 282 2.19.4.2 sharedClause………………………… 283 2.19.4.3 privateClause ……………………….. 285 2.19.4.4 firstprivateClause ……………………. 286 2.19.4.5 lastprivateClause…………………….. 288
2.19.4.6 linearClause………………………… 290 2.19.5 ReductionClausesandDirectives………………….. 293 2.19.5.1 PropertiesCommonToAllReductionClauses . . . . . . . . . . . . . . 294 2.19.5.2 ReductionScopingClauses…………………… 299 2.19.5.3 ReductionParticipatingClauses ………………… 300 2.19.5.4 reductionClause ……………………… 300 2.19.5.5 task_reductionClause ………………….. 303 2.19.5.6 in_reductionClause ……………………. 303 2.19.5.7 declarereductionDirective ……………….. 304 2.19.6 DataCopyingClauses……………………….. 309 2.19.6.1 copyinClause………………………… 310 2.19.6.2 copyprivateClause…………………….. 312 2.19.7 Data-Mapping Attribute Rules, Clauses, and Directives . . . . . . . . . . . 314 2.19.7.1 mapClause ………………………….. 315 2.19.7.2 defaultmapClause……………………… 324 2.19.7.3 declaremapperDirective………………….. 326 2.20NestingofRegions……………………………. 328
3 Runtime Library Routines 331 3.1 RuntimeLibraryDefinitions……………………….. 332 3.2 ExecutionEnvironmentRoutines …………………….. 334
3.2.1 omp_set_num_threads…………………….. 334 3.2.2 omp_get_num_threads…………………….. 335 3.2.3 omp_get_max_threads…………………….. 336 3.2.4 omp_get_thread_num …………………….. 337 3.2.5 omp_get_num_procs ……………………… 338 3.2.6 omp_in_parallel……………………….. 339 3.2.7 omp_set_dynamic……………………….. 340 3.2.8 omp_get_dynamic……………………….. 341 3.2.9 omp_get_cancellation……………………. 342 3.2.10 omp_set_nested………………………… 343 3.2.11 omp_get_nested………………………… 344 3.2.12 omp_set_schedule ………………………. 345 3.2.13 omp_get_schedule ………………………. 347
3.2.14 omp_get_thread_limit……………………. 348 3.2.15 omp_get_supported_active_levels . . . . . . . . . . . . . . . . 349 3.2.16 omp_set_max_active_levels………………… 350 3.2.17 omp_get_max_active_levels………………… 351 3.2.18 omp_get_level ………………………… 352 3.2.19 omp_get_ancestor_thread_num ………………. 353 3.2.20 omp_get_team_size ……………………… 354 3.2.21 omp_get_active_level……………………. 355 3.2.22 omp_in_final …………………………. 356 3.2.23 omp_get_proc_bind ……………………… 357 3.2.24 omp_get_num_places …………………….. 358 3.2.25 omp_get_place_num_procs …………………. 359 3.2.26 omp_get_place_proc_ids ………………….. 360 3.2.27 omp_get_place_num ……………………… 362 3.2.28 omp_get_partition_num_places ……………… 362 3.2.29 omp_get_partition_place_nums ……………… 363 3.2.30 omp_set_affinity_format …………………. 364 3.2.31 omp_get_affinity_format …………………. 366 3.2.32 omp_display_affinity……………………. 367 3.2.33 omp_capture_affinity……………………. 368 3.2.34 omp_set_default_device ………………….. 369 3.2.35 omp_get_default_device ………………….. 370 3.2.36 omp_get_num_devices…………………….. 371 3.2.37 omp_get_device_num …………………….. 372 3.2.38 omp_get_num_teams ……………………… 373 3.2.39 omp_get_team_num ………………………. 374 3.2.40 omp_is_initial_device …………………… 375 3.2.41 omp_get_initial_device ………………….. 376 3.2.42 omp_get_max_task_priority………………… 377 3.2.43 omp_pause_resource …………………….. 378 3.2.44 omp_pause_resource_all ………………….. 380
3.3 LockRoutines ……………………………… 381 3.3.1 omp_init_lock and omp_init_nest_lock . . . . . . . . . . . . . 384
3.3.2 omp_init_lock_with_hint and
omp_init_nest_lock_with_hint ……………… 385
3.3.3 omp_destroy_lock and omp_destroy_nest_lock . . . . . . . . . 387
3.3.4 omp_set_lockandomp_set_nest_lock . . . . . . . . . . . . . . . 388
3.3.5 omp_unset_lock and omp_unset_nest_lock . . . . . . . . . . . . 390
3.3.6 omp_test_lock and omp_test_nest_lock . . . . . . . . . . . . . 392
3.4 TimingRoutines…………………………….. 394 3.4.1 omp_get_wtime ………………………… 394 3.4.2 omp_get_wtick ………………………… 395
3.5 EventRoutine ……………………………… 396 3.5.1 omp_fulfill_event ……………………… 396 3.6 DeviceMemoryRoutines ………………………… 397 3.6.1 omp_target_alloc ………………………. 397 3.6.2 omp_target_free……………………….. 399 3.6.3 omp_target_is_present …………………… 400 3.6.4 omp_target_memcpy ……………………… 400 3.6.5 omp_target_memcpy_rect ………………….. 402 3.6.6 omp_target_associate_ptr…………………. 403 3.6.7 omp_target_disassociate_ptr ………………. 405 3.7 MemoryManagementRoutines……………………… 406 3.7.1 MemoryManagementTypes ……………………. 406 3.7.2 omp_init_allocator …………………….. 409 3.7.3 omp_destroy_allocator …………………… 410 3.7.4 omp_set_default_allocator………………… 411 3.7.5 omp_get_default_allocator………………… 412 3.7.6 omp_alloc……………………………. 413 3.7.7 omp_free ……………………………. 414 3.8 ToolControlRoutine…………………………… 415
4 OMPT Interface 419 4.1 OMPTInterfacesDefinitions ………………………. 419 4.2 ActivatingaFirst-PartyTool……………………….. 420
4.2.1 ompt_start_tool……………………….. 420 4.2.2 Determining Whether a First-Party Tool Should be Initialized . . . . . . . . 421
4.2.3 InitializingaFirst-PartyTool ……………………. 423 4.2.3.1 Binding Entry Points in the OMPT Callback Interface . . . . . . . . . . 424 4.2.4 MonitoringActivityontheHostwithOMPT . . . . . . . . . . . . . . . . . 425 4.2.5 TracingActivityonTargetDeviceswithOMPT . . . . . . . . . . . . . . . 427 4.3 FinalizingaFirst-PartyTool……………………….. 432 4.4 OMPTDataTypes……………………………. 433 4.4.1 ToolInitializationandFinalization …………………. 433 4.4.2 Callbacks……………………………… 434 4.4.3 Tracing………………………………. 435 4.4.3.1 RecordType………………………….. 435 4.4.3.2 NativeRecordKind………………………. 435 4.4.3.3 NativeRecordAbstractType ………………….. 436 4.4.3.4 RecordType………………………….. 436 4.4.4 MiscellaneousTypeDefinitions…………………… 438 4.4.4.1 ompt_callback_t……………………… 438 4.4.4.2 ompt_set_result_t……………………. 438 4.4.4.3 ompt_id_t …………………………. 439 4.4.4.4 ompt_data_t………………………… 440 4.4.4.5 ompt_device_t ………………………. 441 4.4.4.6 ompt_device_time_t …………………… 441 4.4.4.7 ompt_buffer_t ………………………. 441 4.4.4.8 ompt_buffer_cursor_t………………….. 442 4.4.4.9 ompt_dependence_t……………………. 442 4.4.4.10 ompt_thread_t ………………………. 443 4.4.4.11 ompt_scope_endpoint_t…………………. 443 4.4.4.12 ompt_dispatch_t……………………… 444 4.4.4.13 ompt_sync_region_t …………………… 444 4.4.4.14 ompt_target_data_op_t…………………. 444 4.4.4.15 ompt_work_t………………………… 445 4.4.4.16 ompt_mutex_t……………………….. 445 4.4.4.17 ompt_native_mon_flag_t………………… 446 4.4.4.18 ompt_task_flag_t…………………….. 446 4.4.4.19 ompt_task_status_t …………………… 447
4.4.4.20 ompt_target_t ………………………. 448 4.4.4.21 ompt_parallel_flag_t………………….. 448 4.4.4.22 ompt_target_map_flag_t………………… 449 4.4.4.23 ompt_dependence_type_t………………… 450 4.4.4.24 ompt_cancel_flag_t …………………… 450 4.4.4.25 ompt_hwid_t………………………… 451 4.4.4.26 ompt_state_t……………………….. 452 4.4.4.27 ompt_frame_t……………………….. 454 4.4.4.28 ompt_frame_flag_t……………………. 455 4.4.4.29 ompt_wait_id_t ……………………… 456
4.5 OMPTToolCallbackSignaturesandTraceRecords . . . . . . . . . . . . . . . . 457 4.5.1 Initialization and Finalization Callback Signature . . . . . . . . . . . . . . . 457 4.5.1.1 ompt_initialize_t……………………. 457 4.5.1.2 ompt_finalize_t……………………… 458 4.5.2 EventCallbackSignaturesandTraceRecords. . . . . . . . . . . . . . . . . 459 4.5.2.1 ompt_callback_thread_begin_t ……………. 459 4.5.2.2 ompt_callback_thread_end_t……………… 460 4.5.2.3 ompt_callback_parallel_begin_t . . . . . . . . . . . . . . . 461 4.5.2.4 ompt_callback_parallel_end_t ……………. 463 4.5.2.5 ompt_callback_work_t………………….. 464 4.5.2.6 ompt_callback_dispatch_t ………………. 465 4.5.2.7 ompt_callback_task_create_t…………….. 467 4.5.2.8 ompt_callback_dependences_t…………….. 468 4.5.2.9 ompt_callback_task_dependence_t . . . . . . . . . . . . . . 470 4.5.2.10 ompt_callback_task_schedule_t . . . . . . . . . . . . . . . 470 4.5.2.11 ompt_callback_implicit_task_t . . . . . . . . . . . . . . . 471 4.5.2.12 ompt_callback_master_t………………… 473 4.5.2.13 ompt_callback_sync_region_t…………….. 474 4.5.2.14 ompt_callback_mutex_acquire_t . . . . . . . . . . . . . . . 476 4.5.2.15 ompt_callback_mutex_t…………………. 477 4.5.2.16 ompt_callback_nest_lock_t………………. 479 4.5.2.17 ompt_callback_flush_t…………………. 480 4.5.2.18 ompt_callback_cancel_t………………… 481
4.5.2.19 ompt_callback_device_initialize_t . . . . . . . . . . . . 482 4.5.2.20 ompt_callback_device_finalize_t . . . . . . . . . . . . . . 484 4.5.2.21 ompt_callback_device_load_t…………….. 484 4.5.2.22 ompt_callback_device_unload_t . . . . . . . . . . . . . . . 486 4.5.2.23 ompt_callback_buffer_request_t . . . . . . . . . . . . . . . 486 4.5.2.24 ompt_callback_buffer_complete_t . . . . . . . . . . . . . . 487 4.5.2.25 ompt_callback_target_data_op_t . . . . . . . . . . . . . . . 488 4.5.2.26 ompt_callback_target_t………………… 490 4.5.2.27 ompt_callback_target_map_t……………… 492 4.5.2.28 ompt_callback_target_submit_t . . . . . . . . . . . . . . . 494 4.5.2.29 ompt_callback_control_tool_t . . . . . . . . . . . . . . . . 495
4.6 OMPTRuntimeEntryPointsforTools ………………….. 497 4.6.1 EntryPointsintheOMPTCallbackInterface . . . . . . . . . . . . . . . . . 497 4.6.1.1 ompt_enumerate_states_t ……………….. 498 4.6.1.2 ompt_enumerate_mutex_impls_t ……………. 499 4.6.1.3 ompt_set_callback_t ………………….. 500 4.6.1.4 ompt_get_callback_t ………………….. 502 4.6.1.5 ompt_get_thread_data_t………………… 503 4.6.1.6 ompt_get_num_procs_t………………….. 503 4.6.1.7 ompt_get_num_places_t…………………. 504 4.6.1.8 ompt_get_place_proc_ids_t………………. 505 4.6.1.9 ompt_get_place_num_t………………….. 506 4.6.1.10 ompt_get_partition_place_nums_t . . . . . . . . . . . . . . 507 4.6.1.11 ompt_get_proc_id_t …………………… 508 4.6.1.12 ompt_get_state_t…………………….. 508 4.6.1.13 ompt_get_parallel_info_t ………………. 510 4.6.1.14 ompt_get_task_info_t………………….. 512 4.6.1.15 ompt_get_task_memory_t………………… 514 4.6.1.16 ompt_get_target_info_t………………… 515 4.6.1.17 ompt_get_num_devices_t………………… 516 4.6.1.18 ompt_get_unique_id_t………………….. 517 4.6.1.19 ompt_finalize_tool_t………………….. 517
4.6.2 EntryPointsintheOMPTDeviceTracingInterface . . . . . . . . . . . . . 518 4.6.2.1 ompt_get_device_num_procs_t…………….. 518 4.6.2.2 ompt_get_device_time_t………………… 519 4.6.2.3 ompt_translate_time_t…………………. 520 4.6.2.4 ompt_set_trace_ompt_t…………………. 521 4.6.2.5 ompt_set_trace_native_t ……………….. 522 4.6.2.6 ompt_start_trace_t …………………… 523 4.6.2.7 ompt_pause_trace_t …………………… 524 4.6.2.8 ompt_flush_trace_t …………………… 525 4.6.2.9 ompt_stop_trace_t……………………. 526 4.6.2.10 ompt_advance_buffer_cursor_t . . . . . . . . . . . . . . . . 527 4.6.2.11 ompt_get_record_type_t………………… 528 4.6.2.12 ompt_get_record_ompt_t………………… 529 4.6.2.13 ompt_get_record_native_t ………………. 530 4.6.2.14 ompt_get_record_abstract_t……………… 531
4.6.3 Lookup Entry Points: ompt_function_lookup_t . . . . . . . . . . . 531
5 OMPD Interface 533 5.1 OMPDInterfacesDefinitions ………………………. 534 5.2 ActivatinganOMPDTool………………………… 534
5.2.1 EnablingtheRuntimeforOMPD ………………….. 534 5.2.2 ompd_dll_locations …………………….. 535 5.2.3 ompd_dll_locations_valid…………………. 536
5.3 OMPDDataTypes……………………………. 536 5.3.1 SizeType……………………………… 536 5.3.2 WaitIDType……………………………. 537 5.3.3 BasicValueTypes …………………………. 537 5.3.4 AddressType……………………………. 538 5.3.5 FrameInformationType………………………. 538 5.3.6 SystemDeviceIdentifiers ……………………… 539 5.3.7 NativeThreadIdentifiers………………………. 539 5.3.8 OMPDHandleTypes ……………………….. 540 5.3.9 OMPDScopeTypes………………………… 541 5.3.10 ICVIDType……………………………. 542
5.3.11 ToolContextTypes ………………………… 542 5.3.12 ReturnCodeTypes…………………………. 543 5.3.13 PrimitiveTypeSizes………………………… 544
5.4 OMPDToolCallbackInterface ……………………… 545 5.4.1 MemoryManagementofOMPDLibrary………………. 545 5.4.1.1 ompd_callback_memory_alloc_fn_t . . . . . . . . . . . . . . 546 5.4.1.2 ompd_callback_memory_free_fn_t . . . . . . . . . . . . . . . 546 5.4.2 ContextManagementandNavigation ………………… 547
5.4.2.1 ompd_callback_get_thread_context_for_thread_id _fn_t…………………………….. 547
5.4.2.2 ompd_callback_sizeof_fn_t………………. 549
5.4.3 AccessingMemoryintheOpenMPProgramorRuntime . . . . . . . . . . . 549 5.4.3.1 ompd_callback_symbol_addr_fn_t . . . . . . . . . . . . . . . 550 5.4.3.2 ompd_callback_memory_read_fn_t . . . . . . . . . . . . . . . 551 5.4.3.3 ompd_callback_memory_write_fn_t . . . . . . . . . . . . . . 553
5.4.4 Data Format Conversion: ompd_callback_device_host_fn_t . . . 554
5.4.5 Output: ompd_callback_print_string_fn_t . . . . . . . . . . . . 556
5.4.6 TheCallbackInterface……………………….. 556
5.5 OMPDToolInterfaceRoutines ……………………… 558 5.5.1 Per OMPD Library Initialization and Finalization . . . . . . . . . . . . . . 558 5.5.1.1 ompd_initialize……………………… 558 5.5.1.2 ompd_get_api_version………………….. 559 5.5.1.3 ompd_get_version_string ……………….. 560 5.5.1.4 ompd_finalize ………………………. 561 5.5.2 Per OpenMP Process Initialization and Finalization . . . . . . . . . . . . . 562 5.5.2.1 ompd_process_initialize ……………….. 562 5.5.2.2 ompd_device_initialize………………… 563 5.5.2.3 ompd_rel_address_space_handle . . . . . . . . . . . . . . . 564 5.5.3 ThreadandSignalSafety ……………………… 565 5.5.4 AddressSpaceInformation …………………….. 565 5.5.4.1 ompd_get_omp_version………………….. 565 5.5.4.2 ompd_get_omp_version_string…………….. 566
5.5.5 ThreadHandles ………………………….. 567 5.5.5.1 ompd_get_thread_in_parallel…………….. 567 5.5.5.2 ompd_get_thread_handle………………… 568 5.5.5.3 ompd_rel_thread_handle………………… 569 5.5.5.4 ompd_thread_handle_compare……………… 570 5.5.5.5 ompd_get_thread_id …………………… 570
5.5.6 ParallelRegionHandles………………………. 571 5.5.6.1 ompd_get_curr_parallel_handle . . . . . . . . . . . . . . . 571 5.5.6.2 ompd_get_enclosing_parallel_handle . . . . . . . . . . . 572 5.5.6.3 ompd_get_task_parallel_handle . . . . . . . . . . . . . . . 573 5.5.6.4 ompd_rel_parallel_handle ………………. 574 5.5.6.5 ompd_parallel_handle_compare ……………. 575
5.5.7 TaskHandles……………………………. 576 5.5.7.1 ompd_get_curr_task_handle………………. 576 5.5.7.2 ompd_get_generating_task_handle . . . . . . . . . . . . . . 577 5.5.7.3 ompd_get_scheduling_task_handle . . . . . . . . . . . . . . 578 5.5.7.4 ompd_get_task_in_parallel………………. 579 5.5.7.5 ompd_rel_task_handle………………….. 580 5.5.7.6 ompd_task_handle_compare ………………. 580 5.5.7.7 ompd_get_task_function………………… 581 5.5.7.8 ompd_get_task_frame ………………….. 582 5.5.7.9 ompd_enumerate_states…………………. 583 5.5.7.10 ompd_get_state ……………………… 585
5.5.8 DisplayControlVariables ……………………… 586 5.5.8.1 ompd_get_display_control_vars . . . . . . . . . . . . . . . 586 5.5.8.2 ompd_rel_display_control_vars . . . . . . . . . . . . . . . 587
5.5.9 AccessingScope-SpecificInformation ……………….. 588 5.5.9.1 ompd_enumerate_icvs ………………….. 588 5.5.9.2 ompd_get_icv_from_scope ……………….. 590 5.5.9.3 ompd_get_icv_string_from_scope . . . . . . . . . . . . . . . 591 5.5.9.4 ompd_get_tool_data …………………… 592
5.6 RuntimeEntryPointsforOMPD …………………….. 594 5.6.1 BeginningParallelRegions …………………….. 594
5.6.2 EndingParallelRegions………………………. 595 5.6.3 BeginningTaskRegions………………………. 595 5.6.4 EndingTaskRegions ……………………….. 596 5.6.5 BeginningOpenMPThreads…………………….. 597 5.6.6 EndingOpenMPThreads ……………………… 597 5.6.7 InitializingOpenMPDevices ……………………. 598 5.6.8 FinalizingOpenMPDevices…………………….. 599
6 Environment Variables
6.1 OMP_SCHEDULE
6.2 OMP_NUM_THREADS
6.3 OMP_DYNAMIC
6.4 OMP_PROC_BIND
6.5 OMP_PLACES
6.6 OMP_STACKSIZE
6.7 OMP_WAIT_POLICY
6.8 OMP_MAX_ACTIVE_LEVELS
6.9 OMP_NESTED
6.10 OMP_THREAD_LIMIT
6.11 OMP_CANCELLATION
6.12 OMP_DISPLAY_ENV
6.13 OMP_DISPLAY_AFFINITY
6.14 OMP_AFFINITY_FORMAT
6.15 OMP_DEFAULT_DEVICE
6.16 OMP_MAX_TASK_PRIORITY
6.17 OMP_TARGET_OFFLOAD
6.18 OMP_TOOL
6.19 OMP_TOOL_LIBRARIES
6.20 OMP_DEBUG
6.21 OMP_ALLOCATOR
A OpenMP Implementation-Defined Behaviors
B Features History
B.1 Deprecated Features
B.2 Version 4.5 to 5.0 Differences
B.3 Version 4.0 to 4.5 Differences
B.4 Version 3.1 to 4.0 Differences
B.5 Version 3.0 to 3.1 Differences
B.6 Version 2.5 to 3.0 Differences
Index
List of Figures
2.1 Determining the schedule for a Worksharing-Loop
4.1 First-Party Tool Activation Flow Chart
List of Tables
1.1 Map-Type Decay of Map Type Combinations
2.1 ICV Initial Values
2.2 Ways to Modify and to Retrieve ICV Values
2.3 Scopes of ICVs
2.4 ICV Override Relationships
2.5 schedule Clause kind Values
2.6 schedule Clause modifier Values
2.7 ompt_callback_task_create callback flags evaluation
2.8 Predefined Memory Spaces
2.9 Allocator Traits
2.10 Predefined Allocators
2.11 Implicitly Declared C/C++ reduction-identifiers
2.12 Implicitly Declared Fortran reduction-identifiers
3.1 Standard Tool Control Commands
4.1 OMPT Callback Interface Runtime Entry Point Names and Their Type Signatures
4.2 Valid Return Codes of ompt_set_callback for Each Callback
4.3 OMPT Tracing Interface Runtime Entry Point Names and Their Type Signatures
5.1 Mapping of Scope Type and OMPD Handles
5.2 OMPD-specific ICVs
6.1 Defined Abstract Names for OMP_PLACES
6.2 Available Field Types for Formatting OpenMP Thread Affinity Information
CHAPTER 1
Introduction
The compiler directives, library routines, and environment variables described in this document collectively define the specification of the OpenMP Application Programming Interface (OpenMP API) for parallelism in C, C++ and Fortran programs.
This specification provides a model for parallel programming that is portable across architectures from different vendors. Compilers from numerous vendors support the OpenMP API. More information about the OpenMP API can be found on the OpenMP web site.
The directives, library routines, environment variables, and tool support defined in this document allow users to create, to manage, to debug and to analyze parallel programs while permitting portability. The directives extend the C, C++ and Fortran base languages with single program multiple data (SPMD) constructs, tasking constructs, device constructs, worksharing constructs, and synchronization constructs, and they provide support for sharing, mapping and privatizing data. The functionality to control the runtime environment is provided by library routines and environment variables. Compilers that support the OpenMP API often provide a command-line option that activates and allows interpretation of all OpenMP directives.
1.1 Scope
The OpenMP API covers only user-directed parallelization, wherein the programmer explicitly specifies the actions to be taken by the compiler and runtime system in order to execute the program in parallel. OpenMP-compliant implementations are not required to check for data dependencies, data conflicts, race conditions, or deadlocks, any of which may occur in conforming programs. In addition, compliant implementations are not required to check for code sequences that cause a program to be classified as non-conforming. Application developers are responsible for correctly using the OpenMP API to produce a conforming program. The OpenMP API does not cover compiler-generated automatic parallelization.
1.2 Glossary
1.2.1 Threading Concepts

thread
An execution entity with a stack and associated static memory, called threadprivate memory.

OpenMP thread
A thread that is managed by the OpenMP implementation.

thread number
A number that the OpenMP implementation assigns to an OpenMP thread. For threads within the same team, zero identifies the master thread and consecutive numbers identify the other threads of this team.

idle thread
An OpenMP thread that is not currently part of any parallel region.

thread-safe routine
A routine that performs the intended function even when executed concurrently (by more than one thread).

processor
Implementation-defined hardware unit on which one or more OpenMP threads can execute.

device
An implementation-defined logical execution engine.
COMMENT: A device could have one or more processors.

host device
The device on which the OpenMP program begins execution.

target device
A device onto which code and data may be offloaded from the host device.

parent device
For a given target region, the device on which the corresponding target construct was encountered.

1.2.2 OpenMP Language Terminology

base language
A programming language that serves as the foundation of the OpenMP specification.
COMMENT: See Section 1.7 for a listing of current base languages for the OpenMP API.
base program
A program written in a base language.

program order
An ordering of operations performed by the same thread as determined by the execution sequence of operations specified by the base language.
COMMENT: For C11 and C++11, program order corresponds to the sequenced before relation between operations performed by the same thread.

structured block
For C/C++, an executable statement, possibly compound, with a single entry at the top and a single exit at the bottom, or an OpenMP construct.
For Fortran, a block of executable statements with a single entry at the top and a single exit at the bottom, or an OpenMP construct.
COMMENT: See Section 2.1 for restrictions on structured blocks.

compilation unit
For C/C++, a translation unit.
For Fortran, a program unit.

enclosing context
For C/C++, the innermost scope enclosing an OpenMP directive.
For Fortran, the innermost scoping unit enclosing an OpenMP directive.

directive
For C/C++, a #pragma, and for Fortran, a comment, that specifies OpenMP program behavior.
COMMENT: See Section 2.1 for a description of OpenMP directive syntax.

metadirective
A directive that conditionally resolves to another directive at compile time.

white space
A non-empty sequence of space and/or horizontal tab characters.

OpenMP program
A program that consists of a base program that is annotated with OpenMP directives or that calls OpenMP API runtime library routines.

conforming program
An OpenMP program that follows all rules and restrictions of the OpenMP specification.

declarative directive
An OpenMP directive that may only be placed in a declarative context. A declarative directive results in one or more declarations only; it is not associated with the immediate execution of any user code.

executable directive
An OpenMP directive that is not declarative. That is, it may be placed in an executable context.

stand-alone directive
An OpenMP executable directive that has no associated user code except for that which appears in clauses in the directive.
1 construct 2
3
4 combined construct 5
6
7
8 composite construct
9 10 11
12 combined target
13 construct
An OpenMP executable directive (and for Fortran, the paired end directive, if any) and the associated statement, loop or structured block, if any, not including the code in any called routines. That is, the lexical extent of an executable directive.
A construct that is a shortcut for specifying one construct immediately nested inside another construct. A combined construct is semantically identical to that of explicitly specifying the first construct containing one instance of the second construct and no other statements.
A construct that is composed of two constructs but does not have identical semantics to specifying one of the constructs immediately nested inside the other. A composite construct either adds semantics not included in the constructs from which it is composed or the nesting of the one construct inside the other is not conforming.
A combined construct that is composed of a target construct along with another construct.
region
All code encountered during a specific instance of the execution of a given construct or of an OpenMP library routine. A region includes any code in called routines as well as any implicit code introduced by the OpenMP implementation. The generation of a task at the point where a task generating construct is encountered is a part of the region of the encountering thread. However, an explicit task region corresponding to a task generating construct is not part of the region of the encountering thread unless it is an included task region. The point where a target or teams directive is encountered is a part of the region of the encountering thread, but the region corresponding to the target or teams directive is not.
COMMENTS:
A region may also be thought of as the dynamic or runtime extent of a construct or of an OpenMP library routine.
During the execution of an OpenMP program, a construct may give rise to many regions.
active parallel region
A parallel region that is executed by a team consisting of more than one thread.
inactive parallel region
A parallel region that is executed by a team of only one thread.
active target region
A target region that is executed on a device other than the device that encountered the target construct.
inactive target region
A target region that is executed on the same device that encountered the target construct.
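As a hedged illustration of the construct/region distinction and of active versus inactive parallel regions, the following C sketch uses one parallel construct that gives rise to a new region on every call. The helper name `team_size` is hypothetical. It assumes the `if` clause is used to request a one-thread team.

```c
/* Hypothetical helper: returns the number of threads in the team that
   executed the parallel region. Every thread adds 1 to the reduction. */
int team_size(int want_active)
{
    int count = 0;
    /* When the if clause evaluates to false, the region is executed by a
       team of one thread, that is, it is an inactive parallel region. */
    #pragma omp parallel if(want_active) reduction(+:count)
    {
        count += 1;
    }
    return count;
}
```

Calling `team_size(0)` and then `team_size(1)` executes the same construct twice and therefore produces two distinct regions. The second region is active only if the implementation actually provides more than one thread.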
sequential part
All code encountered during the execution of an initial task region that is not part of a parallel region corresponding to a parallel construct or a task region corresponding to a task construct.
COMMENTS:
A sequential part is enclosed by an implicit parallel region.
Executable statements in called routines may be in both a sequential part and any number of explicit parallel regions at different points in the program execution.
master thread
An OpenMP thread that has thread number 0. A master thread may be an initial thread or the thread that encounters a parallel construct, creates a team, generates a set of implicit tasks, and then executes one of those tasks as thread number 0.
parent thread
The thread that encountered the parallel construct and generated a parallel region is the parent thread of each of the threads in the team of that parallel region. The master thread of a parallel region is the same thread as its parent thread with respect to any resources associated with an OpenMP thread.
child thread
When a thread encounters a parallel construct, each of the threads in the generated parallel region’s team are child threads of the encountering thread. The target or teams region’s initial thread is not a child thread of the thread that encountered the target or teams construct.
ancestor thread
For a given thread, its parent thread or one of its parent thread’s ancestor threads.
descendent thread
For a given thread, one of its child threads or one of its child threads’ descendent threads.
team
A set of one or more threads participating in the execution of a parallel region.
COMMENTS:
For an active parallel region, the team comprises the master thread and at least one additional thread.
For an inactive parallel region, the team comprises only the master thread.
league
The set of teams created by a teams construct.
contention group
An initial thread and its descendent threads.
implicit parallel region
An inactive parallel region that is not generated from a parallel construct. Implicit parallel regions surround the whole OpenMP program, all target regions, and all teams regions.
initial thread
The thread that executes an implicit parallel region.
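The statement that the master thread is the thread with thread number 0 can be checked with a small C sketch. The helper name `masters_in_team` is hypothetical, and the fallback definition of `omp_get_thread_num` is an assumption added so that the sketch also builds when OpenMP support is disabled.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Fallback so the sketch also builds without OpenMP support. */
static int omp_get_thread_num(void) { return 0; }
#endif

/* Hypothetical helper: counts the threads of one team whose thread
   number is 0. Only the master thread of the team qualifies. */
int masters_in_team(void)
{
    int masters = 0;
    #pragma omp parallel reduction(+:masters)
    {
        if (omp_get_thread_num() == 0)
            masters += 1;
    }
    return masters;
}
```

Whatever the team size, the function is expected to return 1, because each team has exactly one thread numbered 0.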
initial team
The team that comprises an initial thread executing an implicit parallel region.
nested construct
A construct (lexically) enclosed by another construct.
closely nested construct
A construct nested inside another construct with no other construct nested between them.
nested region
A region (dynamically) enclosed by another region. That is, a region generated from the execution of another region or one of its nested regions.
COMMENT: Some nestings are conforming and some are not. See Section 2.20 on page 328 for the restrictions on nesting.
closely nested region
A region nested inside another region with no parallel region nested between them.
strictly nested region
A region nested inside another region with no other region nested between them.
all threads
All OpenMP threads participating in the OpenMP program.
current team
All threads in the team executing the innermost enclosing parallel region.
encountering thread
For a given region, the thread that encounters the corresponding construct.
all tasks
All tasks participating in the OpenMP program.
current team tasks
All tasks encountered by the corresponding team. The implicit tasks constituting the parallel region and any descendent tasks encountered during the execution of these implicit tasks are included in this set of tasks.
generating task
For a given region, the task for which execution by a thread generated the region.
binding thread set
The set of threads that are affected by, or provide the context for, the execution of a region.
The binding thread set for a given region can be all threads on a device, all threads in a contention group, all master threads executing an enclosing teams region, the current team, or the encountering thread.
COMMENT: The binding thread set for a particular region is described in its corresponding subsection of this specification.
binding task set
The set of tasks that are affected by, or provide the context for, the execution of a region.
The binding task set for a given region can be all tasks, the current team tasks, all tasks of the current team that are generated in the region, the binding implicit task, or the generating task.
COMMENT: The binding task set for a particular region (if applicable) is described in its corresponding subsection of this specification.
binding region
The enclosing region that determines the execution context and limits the scope of the effects of the bound region is called the binding region.
Binding region is not defined for regions for which the binding thread set is all threads or the encountering thread, nor is it defined for regions for which the binding task set is all tasks.
COMMENTS:
The binding region for an ordered region is the innermost enclosing loop region.
The binding region for a taskwait region is the innermost enclosing task region.
The binding region for a cancel region is the innermost enclosing region corresponding to the construct-type-clause of the cancel construct.
The binding region for a cancellation point region is the innermost enclosing region corresponding to the construct-type-clause of the cancellation point construct.
For all other regions for which the binding thread set is the current team or the binding task set is the current team tasks, the binding region is the innermost enclosing parallel region.
For regions for which the binding task set is the generating task, the binding region is the region of the generating task.
A parallel region need not be active nor explicit to be a binding region.
A task region need not be explicit to be a binding region.
A region never binds to any region outside of the innermost enclosing parallel region.
orphaned construct
A construct that gives rise to a region for which the binding thread set is the current team, but is not nested within another construct giving rise to the binding region.
worksharing construct
A construct that defines units of work, each of which is executed exactly once by one of the threads in the team executing the construct.
For C/C++, worksharing constructs are for, sections, and single.
For Fortran, worksharing constructs are do, sections, single and workshare.
device construct
An OpenMP construct that accepts the device clause.
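The notions of an orphaned construct and a worksharing construct can be sketched together in C. The function names `scale` and `scale_in_parallel` are hypothetical. The sketch assumes the `for` directive in `scale` is reached from a parallel region created elsewhere.

```c
/* Hypothetical helper containing an orphaned worksharing-loop construct:
   the for directive is not lexically nested inside a parallel construct. */
void scale(double *v, int n, double factor)
{
    #pragma omp for
    for (int i = 0; i < n; i++)
        v[i] *= factor;   /* each iteration is executed exactly once */
}

void scale_in_parallel(double *v, int n, double factor)
{
    /* At run time the for region binds to this parallel region. */
    #pragma omp parallel
    {
        scale(v, n, factor);
    }
}
```

When `scale` is called directly from sequential code, the loop region binds to the enclosing implicit parallel region, so a single thread executes all iterations and the result is the same.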
device routine
A function (for C/C++ and Fortran) or subroutine (for Fortran) that can be executed on a target device, as part of a target region.
place
An unordered set of processors on a device.
place list
The ordered list that describes all OpenMP places available to the execution environment.
place partition
An ordered list that corresponds to a contiguous interval in the OpenMP place list. It describes the places currently available to the execution environment for a given parallel region.
place number
A number that uniquely identifies a place in the place list, with zero identifying the first place in the place list, and each consecutive whole number identifying the next place in the place list.
thread affinity
A binding of threads to places within the current place partition.
SIMD instruction
A single machine instruction that can operate on multiple data elements.
SIMD lane
A software or hardware mechanism capable of processing one data element from a SIMD instruction.
SIMD chunk
A set of iterations executed concurrently, each by a SIMD lane, by a single thread by means of SIMD instructions.
memory
A storage resource to store and to retrieve variables accessible by OpenMP threads.
memory space
A representation of storage resources from which memory can be allocated or deallocated. More than one memory space may exist.
memory allocator
An OpenMP object that fulfills requests to allocate and to deallocate memory for program variables from the storage resources of its associated memory space.
handle
An opaque reference that uniquely identifies an abstraction.
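For the SIMD terms defined above (SIMD instruction, SIMD lane, SIMD chunk), a minimal C sketch may help. The function name `dot` is hypothetical. Whether the loop is actually executed with SIMD instructions, and how many iterations form a SIMD chunk, depends on the compiler and target.

```c
/* Hypothetical example: a dot product that the simd construct allows
   the compiler to execute in SIMD chunks, one iteration per SIMD lane. */
float dot(const float *x, const float *y, int n)
{
    float acc = 0.0f;
    #pragma omp simd reduction(+:acc)
    for (int i = 0; i < n; i++)
        acc += x[i] * y[i];
    return acc;
}
```

The reduction clause gives each SIMD lane its own partial sum, which is combined into `acc` after the loop.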
1.2.3 Loop Terminology
loop-associated directive
An OpenMP executable directive for which the associated user code must be a loop nest that is a structured block.
associated loop(s)
The loop(s) controlled by a loop-associated directive.
COMMENT: If the loop-associated directive contains a collapse or an ordered(n) clause then it may have more than one associated loop.
sequential loop
A loop that is not associated with any OpenMP loop-associated directive.
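A short C sketch of a loop-associated directive with more than one associated loop follows. The function name `fill_table` and the row-major layout of `t` are assumptions made for the illustration.

```c
/* Hypothetical example: with collapse(2), both the i loop and the j loop
   are associated loops of the directive. */
void fill_table(int rows, int cols, int *t)
{
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            t[i * cols + j] = i * j;   /* t is stored row by row */
}
```

Without the collapse clause only the i loop would be an associated loop, and the j loop would be a sequential loop in the sense defined above.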
SIMD loop
A loop that includes at least one SIMD chunk.
non-rectangular loop nest
A loop nest for which the iteration count of a loop inside the loop nest is not the same for all occurrences of the loop in the loop nest.
doacross loop nest
A loop nest that has cross-iteration dependence. An iteration is dependent on one or more lexicographically earlier iterations.
COMMENT: The ordered clause parameter on a worksharing-loop directive identifies the loop(s) associated with the doacross loop nest.
1.2.4 Synchronization Terminology
barrier
A point in the execution of a program encountered by a team of threads, beyond which no thread in the team may execute until all threads in the team have reached the barrier and all explicit tasks generated by the team have executed to completion. If cancellation has been requested, threads may proceed to the end of the canceled region even if some threads in the team have not reached the barrier.
cancellation
An action that cancels (that is, aborts) an OpenMP region and causes executing implicit or explicit tasks to proceed to the end of the canceled region.
cancellation point
A point at which implicit and explicit tasks check if cancellation has been requested. If cancellation has been observed, they perform the cancellation.
COMMENT: For a list of cancellation points, see Section 2.18.1 on page 263.
flush
An operation that a thread performs to enforce consistency between its view and other threads’ view of memory.
flush property
Properties that determine the manner in which a flush operation enforces memory consistency. These properties are:
• strong: flushes a set of variables from the current thread’s temporary view of the memory to the memory;
• release: orders memory operations that precede the flush before memory operations performed by a different thread with which it synchronizes;
• acquire: orders memory operations that follow the flush after memory operations performed by a different thread that synchronizes with it.
COMMENT: Any flush operation has one or more flush properties.
strong flush
A flush operation that has the strong flush property.
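The barrier definition above can be illustrated with a two-phase computation in C. The function name `double_then_reverse` and the scratch buffer `tmp` are hypothetical. The sketch assumes `tmp` and `out` each have room for `n` elements.

```c
/* Hypothetical two-phase computation separated by an explicit barrier. */
void double_then_reverse(const int *in, int *tmp, int *out, int n)
{
    #pragma omp parallel
    {
        /* Phase 1: nowait removes the implicit barrier of this loop. */
        #pragma omp for nowait
        for (int i = 0; i < n; i++)
            tmp[i] = in[i] * 2;

        /* No thread continues until all threads of the team arrive here,
           so every tmp element is written before phase 2 reads it. */
        #pragma omp barrier

        /* Phase 2: reads elements that other threads may have written. */
        #pragma omp for
        for (int i = 0; i < n; i++)
            out[i] = tmp[n - 1 - i];
    }
}
```

Removing the barrier while keeping `nowait` would allow a thread to read a `tmp` element before another thread has written it.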
release flush
A flush operation that has the release flush property.
acquire flush
A flush operation that has the acquire flush property.
atomic operation
An operation that is specified by an atomic construct and atomically accesses and/or modifies a specific storage location.
atomic read
An atomic operation that is specified by an atomic construct on which the read clause is present.
atomic write
An atomic operation that is specified by an atomic construct on which the write clause is present.
atomic update
An atomic operation that is specified by an atomic construct on which the update clause is present.
atomic captured update
An atomic operation that is specified by an atomic construct on which the capture clause is present.
read-modify-write
An atomic operation that reads and writes to a given storage location.
COMMENT: All atomic update and atomic captured update operations are read-modify-write operations.
sequentially consistent atomic construct
An atomic construct for which the seq_cst clause is specified.
non-sequentially consistent atomic construct
An atomic construct for which the seq_cst clause is not specified.
sequentially consistent atomic operation
An atomic operation that is specified by a sequentially consistent atomic construct.
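The atomic terms above can be illustrated with a hedged C sketch of an atomic update and an atomic captured update. The function names `count_even` and `ticket_sum` are hypothetical.

```c
/* Hypothetical counter using an atomic update. */
int count_even(const int *a, int n)
{
    int hits = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        if (a[i] % 2 == 0) {
            /* atomic update: a read-modify-write of hits */
            #pragma omp atomic update
            hits++;
        }
    }
    return hits;
}

/* Each iteration draws a distinct ticket in 0..n-1 from a shared counter;
   the function returns the sum of all tickets drawn. */
long ticket_sum(int n)
{
    int next = 0;
    long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        int mine;
        /* atomic captured update on a sequentially consistent atomic
           construct: capture the old value of next, then increment it */
        #pragma omp atomic capture seq_cst
        mine = next++;
        sum += mine;
    }
    return sum;
}
```

Because every ticket value from 0 to n-1 is handed out exactly once, `ticket_sum(n)` is expected to equal n(n-1)/2 regardless of how iterations are distributed over threads.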
1.2.5 Tasking Terminology
task
A specific instance of executable code and its data environment that the OpenMP implementation can schedule for execution by threads.
task region
A region consisting of all code encountered during the execution of a task.
COMMENT: A parallel region consists of one or more implicit task regions.
implicit task
A task generated by an implicit parallel region or generated when a parallel construct is encountered during execution.
binding implicit task
The implicit task of the current thread team assigned to the encountering thread.
explicit task
A task that is not an implicit task.
initial task
An implicit task associated with an implicit parallel region.
current task
For a given thread, the task corresponding to the task region in which it is executing.
child task
A task is a child task of its generating task region. A child task region is not part of its generating task region.
sibling tasks
Tasks that are child tasks of the same task region.
descendent task
A task that is the child task of a task region or of one of its descendent task regions.
task completion
Task completion occurs when the end of the structured block associated with the construct that generated the task is reached.
COMMENT: Completion of the initial task that is generated when the program begins occurs at program exit.
task scheduling point
A point during the execution of the current task region at which it can be suspended to be resumed later; or the point of task completion, after which the executing thread may switch to a different task region.
COMMENT: For a list of task scheduling points, see Section 2.10.6 on page 149.
task switching
The act of a thread switching from the execution of one task to another task.
tied task
A task that, when its task region is suspended, can be resumed only by the same thread that suspended it. That is, the task is tied to that thread.
untied task
A task that, when its task region is suspended, can be resumed by any thread in the team. That is, the task is not tied to any thread.
undeferred task
A task for which execution is not deferred with respect to its generating task region. That is, its generating task region is suspended until execution of the structured block associated with the undeferred task is completed.
included task
A task for which execution is sequentially included in the generating task region. That is, an included task is undeferred and executed by the encountering thread.
merged task
A task for which the data environment, inclusive of ICVs, is the same as that of its generating task region.
mergeable task
A task that may be a merged task if it is an undeferred task or an included task.
final task
A task that forces all of its child tasks to become final and included tasks.
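Several of the tasking terms above (explicit task, child task, sibling tasks, final task, included task) appear together in the following C sketch of a recursive Fibonacci computation. The function names and the cutoff value 10 are illustrative assumptions, not recommendations.

```c
/* Hypothetical recursive Fibonacci using explicit tasks. */
long fib_task(int n)
{
    long a, b;
    if (n < 2)
        return n;
    /* The two tasks below are sibling tasks and child tasks of the
       current task. When n < 10 each generated task is a final task,
       so every task it generates in turn is an included task that the
       encountering thread executes immediately. */
    #pragma omp task shared(a) final(n < 10)
    a = fib_task(n - 1);
    #pragma omp task shared(b) final(n < 10)
    b = fib_task(n - 2);
    /* Wait for completion of the child tasks of the current task. */
    #pragma omp taskwait
    return a + b;
}

long fib(int n)
{
    long result = 0;
    #pragma omp parallel
    {
        /* One thread generates the root of the task tree; the other
           threads of the team can execute the generated tasks. */
        #pragma omp single
        result = fib_task(n);
    }
    return result;
}
```

The `final` clause is used here as a cutoff: below it, no further deferred tasks are created, which limits task-management overhead for small subproblems.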
CHAPTER1. INTRODUCTION 11
1 2 3
4 5
6
7 8
9
10 11
12
13
14 15
16 17
18
19 20
21
22 23
24
25 26
task dependence
An ordering relation between two sibling tasks: the dependent task and a previously generated predecessor task. The task dependence is fulfilled when the predecessor task has completed.
dependent task
A task that, because of a task dependence, cannot be executed until its predecessor tasks have completed.
mutually exclusive tasks
Tasks that may be executed in any order, but not at the same time.
predecessor task
A task that must complete before its dependent tasks can be executed.
task synchronization construct
A taskwait, taskgroup, or a barrier construct.
task generating construct
A construct that generates one or more explicit tasks.
target task
A mergeable and untied task that is generated by a target, target enter data, target exit data, or target update construct.
taskgroup set
A set of tasks that are logically grouped by a taskgroup region.
1.2.6 Data Terminology
variable
A named data storage block, for which the value can be defined and redefined during the execution of a program.
COMMENT: An array element or structure element is a variable that is part of another variable.
scalar variable
For C/C++, a scalar variable, as defined by the base language.
For Fortran, a scalar variable with intrinsic type, as defined by the base language, excluding character type.
aggregate variable
A variable, such as an array or structure, composed of other variables.
array section
A designated subset of the elements of an array that is specified using a subscript notation that can select more than one element.
array item
An array, an array section, or an array element.
shape-operator
For C/C++, an array shaping operator that reinterprets a pointer expression as an array with one or more specified dimensions.
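As a hedged illustration of array sections in C, the sketch below uses the `[lower-bound : length]` notation in `depend` clauses, which also shows a task dependence between sibling tasks. The function name `two_stage` is hypothetical, and the sketch assumes `n >= 2` so that neither array section is empty.

```c
/* Hypothetical two-stage computation over the two halves of an array.
   The array sections a[0:h] and a[h:n-h] name the elements that each
   task writes. Assumes n >= 2. */
long two_stage(int *a, int n)
{
    int h = n / 2;
    long total = 0;
    #pragma omp parallel
    #pragma omp single
    {
        /* Predecessor task for the first half. */
        #pragma omp task depend(out: a[0:h])
        for (int i = 0; i < h; i++)
            a[i] = i;

        /* Predecessor task for the second half. */
        #pragma omp task depend(out: a[h:n-h])
        for (int i = h; i < n; i++)
            a[i] = 2 * i;

        /* Dependent task: runs after both predecessor tasks complete. */
        #pragma omp task depend(in: a[0:h], a[h:n-h]) shared(total)
        for (int i = 0; i < n; i++)
            total += a[i];

        /* Wait for all three child tasks before leaving the region. */
        #pragma omp taskwait
    }
    return total;
}
```

For `n = 4` the array becomes {0, 1, 4, 6}. The two array sections are disjoint, so the first two tasks have no dependence on each other and may run in either order or concurrently.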
implicit array
For C/C++, the set of array elements of non-array type T that may be accessed by applying a sequence of [] operators to a given pointer that is either a pointer to type T or a pointer to a multidimensional array of elements of type T.
For Fortran, the set of array elements for a given array pointer.
COMMENT: For C/C++, the implicit array for pointer p with type T (*)[10] consists of all accessible elements p[i][j], for all i and j=0..9.
base pointer
For C/C++, an lvalue pointer expression that is used by a given lvalue expression or array section to refer indirectly to its storage, where the lvalue expression or array section is part of the implicit array for that lvalue pointer expression.
For Fortran, a data pointer that appears last in the designator for a given variable or array section, where the variable or array section is part of the pointer target for that data pointer.
COMMENT: For the array section (*p0).x0[k1].p1->p2[k2].x1[k3].x2[4][0:n], where identifiers pi have a pointer type declaration and identifiers xi have an array type declaration, the base pointer is: (*p0).x0[k1].p1->p2.
named pointer
For C/C++, the base pointer of a given lvalue expression or array section, or the base pointer of one of its named pointers.
For Fortran, the base pointer of a given variable or array section, or the base pointer of one of its named pointers.
COMMENT: For the array section (*p0).x0[k1].p1->p2[k2].x1[k3].x2[4][0:n], where identifiers pi have a pointer type declaration and identifiers xi have an array type declaration, the named pointers are: p0, (*p0).x0[k1].p1, and (*p0).x0[k1].p1->p2.
containing array
For C/C++, a non-subscripted array (a containing array) that appears in a given lvalue expression or array section, where the lvalue expression or array section is part of that containing array.
For Fortran, an array (a containing array) without the POINTER attribute and without a subscript list that appears in the designator of a given variable or array section, where the variable or array section is part of that containing array.
COMMENT: For the array section (*p0).x0[k1].p1->p2[k2].x1[k3].x2[4][0:n], where identifiers pi have a pointer type declaration and identifiers xi have an array type declaration, the containing arrays are: (*p0).x0[k1].p1->p2[k2].x1 and (*p0).x0[k1].p1->p2[k2].x1[k3].x2.
base array
For C/C++, a containing array of a given lvalue expression or array section that does not appear in the expression of any of its other containing arrays.
For Fortran, a containing array of a given variable or array section that does not appear in the designator of any of its other containing arrays.
COMMENT: For the array section (*p0).x0[k1].p1->p2[k2].x1[k3].x2[4][0:n], where identifiers pi have a pointer type declaration and identifiers xi have an array type declaration, the base array is: (*p0).x0[k1].p1->p2[k2].x1[k3].x2.
named array
For C/C++, a containing array of a given lvalue expression or array section, or a containing array of one of its named pointers.
For Fortran, a containing array of a given variable or array section, or a containing array of one of its named pointers.
COMMENT: For the array section (*p0).x0[k1].p1->p2[k2].x1[k3].x2[4][0:n], where identifiers pi have a pointer type declaration and identifiers xi have an array type declaration, the named arrays are: (*p0).x0, (*p0).x0[k1].p1->p2[k2].x1, and (*p0).x0[k1].p1->p2[k2].x1[k3].x2.
base expression
The base array of a given array section or array element, if it exists; otherwise, the base pointer of the array section or array element.
COMMENT: For the array section (*p0).x0[k1].p1->p2[k2].x1[k3].x2[4][0:n], where identifiers pi have a pointer type declaration and identifiers xi have an array type declaration, the base expression is: (*p0).x0[k1].p1->p2[k2].x1[k3].x2.
More examples for C/C++:
• The base expression for x[i] and for x[i:n] is x, if x is an array or pointer.
• The base expression for x[5][i] and for x[5][i:n] is x, if x is a pointer to an array or x is a 2-dimensional array.
• The base expression for y[5][i] and for y[5][i:n] is y[5], if y is an array of pointers or y is a pointer to a pointer.
Examples for Fortran:
• The base expression for x(i) and for x(i:j) is x.
attached pointer
A pointer variable in a device data environment to which the effect of a map clause assigns the address of an object, minus some offset, that is created in the device data environment. The pointer is an attached pointer for the remainder of its lifetime in the device data environment.
simply contiguous array section
An array section that statically can be determined to have contiguous storage or that, in Fortran, has the CONTIGUOUS attribute.
structure
A structure is a variable that contains one or more variables.
For C/C++: Implemented using struct types.
For C++: Implemented using class types.
For Fortran: Implemented using derived types.
private variable
With respect to a given set of task regions or SIMD lanes that bind to the same parallel region, a variable for which the name provides access to a different block of storage for each task region or SIMD lane.
A variable that is part of another variable (as an array or structure element) cannot be made private independently of other components.
shared variable
With respect to a given set of task regions that bind to the same parallel region, a variable for which the name provides access to the same block of storage for each task region.
A variable that is part of another variable (as an array or structure element) cannot be shared independently of the other components, except for static data members of C++ classes.
threadprivate variable
A variable that is replicated, one instance per thread, by the OpenMP implementation. Its name then provides access to a different block of storage for each thread.
A variable that is part of another variable (as an array or structure element) cannot be made threadprivate independently of the other components, except for static data members of C++ classes.
threadprivate memory
The set of threadprivate variables associated with each thread.
data environment
The variables associated with the execution of a given region.
device data environment
The initial data environment associated with a device.
device address
An implementation-defined reference to an address in a device data environment.
device pointer
A variable that contains a device address.
mapped variable
An original variable in a data environment with a corresponding variable in a device data environment.
COMMENT: The original and corresponding variables may share storage.
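The difference between shared, private and threadprivate variables can be sketched in C as follows. All names (`calls`, `local`, `total`, `sum_squares`) are hypothetical and serve only the illustration.

```c
/* Hypothetical example contrasting shared, private and threadprivate. */
static int calls = 0;               /* replicated, one instance per thread */
#pragma omp threadprivate(calls)

long sum_squares(int n)
{
    long total = 0;                 /* shared: one block of storage */
    long local = 0;                 /* made private by the clause below */
    #pragma omp parallel private(local)
    {
        local = 0;                  /* private copies start uninitialized */
        calls++;                    /* updates this thread's own instance */
        #pragma omp for
        for (int i = 1; i <= n; i++)
            local += (long)i * i;
        /* Only the update of the shared variable needs protection. */
        #pragma omp atomic update
        total += local;
    }
    return total;
}
```

Inside the region the name `local` refers to a different block of storage in each implicit task, while the name `total` refers to the same storage in all of them.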
map-type decay
The process used to determine the final map type when mapping a variable with a user defined mapper. Table 1.1 shows the final map type that the combination of the two map types determines.
TABLE 1.1: Map-Type Decay of Map Type Combinations
         alloc    to       from     tofrom   release  delete
alloc    alloc    alloc    alloc    alloc    release  delete
to       alloc    to       alloc    to       release  delete
from     alloc    alloc    from     from     release  delete
tofrom   alloc    to       from     tofrom   release  delete
mappable type
A type that is valid for a mapped variable. If a type is composed from other types (such as the type of an array or structure element) and any of the other types are not mappable then the type is not mappable.
COMMENT: Pointer types are mappable but the memory block to which the pointer refers is not mapped.
For C, the type must be a complete type.
For C++, the type must be a complete type.
In addition, for class types:
• All member functions accessed in any target region must appear in a declare target directive.
For Fortran, no restrictions on the type except that for derived types:
• All type-bound procedures accessed in any target region must appear in a declare target directive.
defined
For variables, the property of having a valid value.
For C, for the contents of variables, the property of having a valid value.
For C++, for the contents of variables of POD (plain old data) type, the property of having a valid value.
For variables of non-POD class type, the property of having been constructed but not subsequently destructed.
For Fortran, for the contents of variables, the property of having a valid value. For the allocation or association status of variables, the property of having a valid status.
COMMENT: Programs that rely upon variables that are not defined are non-conforming programs.
class type
For C++, variables declared with one of the class, struct, or union keywords.
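Table 1.1 (map-type decay) can be read as a two-key lookup. The C sketch below encodes the table as given. The enum and function names are hypothetical, and because this section does not state which of the two combined map types selects the row and which selects the column, the parameters are simply called `row` and `col`.

```c
/* Map types in the column order of Table 1.1. Names are illustrative. */
enum map_type { MT_ALLOC, MT_TO, MT_FROM, MT_TOFROM, MT_RELEASE, MT_DELETE };

/* Returns the final map type for a (row, column) pair of Table 1.1.
   The table has rows only for alloc, to, from and tofrom; any other
   row value is reported as -1. */
int map_type_decay(enum map_type row, enum map_type col)
{
    static const enum map_type table[4][6] = {
        /* alloc  */ { MT_ALLOC, MT_ALLOC, MT_ALLOC, MT_ALLOC,  MT_RELEASE, MT_DELETE },
        /* to     */ { MT_ALLOC, MT_TO,    MT_ALLOC, MT_TO,     MT_RELEASE, MT_DELETE },
        /* from   */ { MT_ALLOC, MT_ALLOC, MT_FROM,  MT_FROM,   MT_RELEASE, MT_DELETE },
        /* tofrom */ { MT_ALLOC, MT_TO,    MT_FROM,  MT_TOFROM, MT_RELEASE, MT_DELETE },
    };
    if (row > MT_TOFROM || col > MT_DELETE)
        return -1;
    return (int)table[row][col];
}
```

For example, the pair (to, tofrom) decays to `to`, and the pair (from, to) decays to `alloc`, matching the corresponding cells of the table.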
1.2.7 Implementation Terminology
supporting n active levels of parallelism
Implies allowing an active parallel region to be enclosed by n-1 active parallel regions.
supporting the OpenMP API
Supporting at least one active level of parallelism.
supporting nested parallelism
Supporting more than one active level of parallelism.
internal control variable
A conceptual variable that specifies runtime behavior of a set of threads or tasks in an OpenMP program.
COMMENT: The acronym ICV is used interchangeably with the term internal control variable in the remainder of this specification.
compliant implementation
An implementation of the OpenMP specification that compiles and executes any conforming program as defined by the specification.
COMMENT: A compliant implementation may exhibit unspecified behavior when compiling or executing a non-conforming program.
unspecified behavior
A behavior or result that is not specified by the OpenMP specification or not known prior to the compilation or execution of an OpenMP program.
Such unspecified behavior may result from:
• Issues documented by the OpenMP specification as having unspecified behavior.
• A non-conforming program.
• A conforming program exhibiting an implementation-defined behavior.
implementation defined
Behavior that must be documented by the implementation, and is allowed to vary among different compliant implementations. An implementation is allowed to define this behavior as unspecified.
COMMENT: All features that have implementation-defined behavior are documented in Appendix A.
deprecated
For a construct, clause, or other feature, the property that it is normative in the current specification but is considered obsolescent and will be removed in the future.
1.2.8 Tool Terminology
tool
Executable code, distinct from application or runtime code, that can observe and/or modify the execution of an application.
first-party tool
A tool that executes in the address space of the program that it is monitoring.
third-party tool
A tool that executes as a separate process from the process that it is monitoring and potentially controlling.
activated tool
A first-party tool that successfully completed its initialization.
event
A point of interest in the execution of a thread.
native thread
A thread defined by an underlying thread implementation.
tool callback
A function that a tool provides to an OpenMP implementation to invoke when an associated event occurs.
registering a callback
Providing a tool callback to an OpenMP implementation.
dispatching a callback at an event
Processing a callback when an associated event occurs in a manner consistent with the return code provided when a first-party tool registered the callback.
thread state
An enumeration type that describes the current OpenMP activity of a thread. A thread can be in only one state at any time.
wait identifier
A unique opaque handle associated with each data object (for example, a lock) used by the OpenMP runtime to enforce mutual exclusion that may cause a thread to wait actively or passively.
frame
A storage area on a thread’s stack associated with a procedure invocation. A frame includes space for one or more saved registers and often also includes space for saved arguments, local variables, and padding for alignment.
canonical frame address
An address associated with a procedure frame on a call stack that was the value of the stack pointer immediately prior to calling the procedure for which the invocation is represented by the frame.
runtime entry point
A function interface provided by an OpenMP runtime for use by a tool. A runtime entry point is typically not associated with a global function symbol.
trace record
A data structure in which to store information associated with an occurrence of an event.
native trace record
A trace record for an OpenMP device that is in a device-specific format.
signal
A software interrupt delivered to a thread.
signal handler
A function called asynchronously when a signal is delivered to a thread.
async signal safe
The guarantee that interruption by signal delivery will not interfere with a set of operations. An async signal safe runtime entry point is safe to call from a signal handler.
code block
A contiguous region of memory that contains code of an OpenMP program to be executed on a device.
OMPT
An interface that helps a first-party tool monitor the execution of an OpenMP program.
OMPT interface state
A state that indicates the permitted interactions between a first-party tool and the OpenMP implementation.
OMPT active
An OMPT interface state in which the OpenMP implementation is prepared to accept runtime calls from a first party tool and it dispatches any registered callbacks and in which a first-party tool can invoke runtime entry points if not otherwise restricted.
OMPT pending
An OMPT interface state in which the OpenMP implementation can only call functions to initialize a first party tool and in which a first-party tool cannot invoke runtime entry points.
OMPT inactive
An OMPT interface state in which the OpenMP implementation will not make any callbacks and in which a first-party tool cannot invoke runtime entry points.
OMPD
An interface that helps a third-party tool inspect the OpenMP state of a program that has begun execution.
OMPD library
A dynamically loadable library that implements the OMPD interface.
image file
An executable or shared library.
address space
A collection of logical, virtual, or physical memory address ranges that contain code, stack, and/or data. Address ranges within an address space need not be contiguous. An address space consists of one or more segments.
segment
A portion of an address space associated with a set of address ranges.
OpenMP architecture
The architecture on which an OpenMP region executes.
tool architecture
The architecture on which an OMPD tool executes.
OpenMP process
A collection of one or more threads and address spaces. A process may contain threads and address spaces for multiple OpenMP architectures. At least one thread in an OpenMP process is an OpenMP thread. A process may be live or a core file.
address space handle
A handle that refers to an address space within an OpenMP process.
thread handle
A handle that refers to an OpenMP thread.
parallel handle
A handle that refers to an OpenMP parallel region.
task handle
A handle that refers to an OpenMP task region.
descendent handle
An output handle that is returned from the OMPD library in a function that accepts an input handle: the output handle is a descendent of the input handle.
ancestor handle
An input handle that is passed to the OMPD library in a function that returns an output handle: the input handle is an ancestor of the output handle. For a given handle, the ancestors of the handle are also the ancestors of the handle’s descendent.
COMMENT: A handle cannot be used by the tool in an OMPD call if any ancestor of the handle has been released, except for OMPD calls that release the handle.
tool context
An opaque reference provided by a tool to an OMPD library. A tool context uniquely identifies an abstraction.
address space context
A tool context that refers to an address space within a process.
thread context
A tool context that refers to a native thread.
native thread identifier
An identifier for a native thread defined by a thread implementation.
1.3 Execution Model
The OpenMP API uses the fork-join model of parallel execution. Multiple threads of execution perform tasks defined implicitly or explicitly by OpenMP directives. The OpenMP API is intended to support programs that will execute correctly both as parallel programs (multiple threads of execution and a full OpenMP support library) and as sequential programs (directives ignored and a simple OpenMP stubs library). However, it is possible and permitted to develop a program that executes correctly as a parallel program but not as a sequential program, or that produces different results when executed as a parallel program compared to when it is executed as a sequential program. Furthermore, using different numbers of threads may result in different numeric results because of changes in the association of numeric operations. For example, a serial addition reduction may have a different pattern of addition associations than a parallel reduction. These different associations may change the results of floating-point addition.
An OpenMP program begins as a single thread of execution, called an initial thread. An initial thread executes sequentially, as if the code encountered is part of an implicit task region, called an initial task region, that is generated by the implicit parallel region surrounding the whole program.
The thread that executes the implicit parallel region that surrounds the whole program executes on the host device. An implementation may support other target devices. If supported, one or more devices are available to the host device for offloading code and data. Each device has its own threads that are distinct from threads that execute on another device. Threads cannot migrate from one device to another device. The execution model is host-centric such that the host device offloads target regions to target devices.
When a target construct is encountered, a new target task is generated. The target task region encloses the target region. The target task is complete after the execution of the target region is complete.
When a target task executes, the enclosed target region is executed by an initial thread. The initial thread may execute on a target device. The initial thread executes sequentially, as if the target region is part of an initial task region that is generated by an implicit parallel region. If the target device does not exist or the implementation does not support the target device, all target regions associated with that device execute on the host device.
The implementation must ensure that the target region executes as if it were executed in the data environment of the target device unless an if clause is present and the if clause expression evaluates to false.
The teams construct creates a league of teams, where each team is an initial team that comprises an initial thread that executes the teams region. Each initial thread executes sequentially, as if the code encountered is part of an initial task region that is generated by an implicit parallel region associated with each team.
If a construct creates a data environment, the data environment is created at the time the construct is encountered. The description of a construct defines whether it creates a data environment.
When any thread encounters a parallel construct, the thread creates a team of itself and zero or more additional threads and becomes the master of the new team. A set of implicit tasks, one per thread, is generated. The code for each task is defined by the code inside the parallel construct. Each task is assigned to a different thread in the team and becomes tied; that is, it is always executed by the thread to which it is initially assigned. The task region of the task being executed by the encountering thread is suspended, and each member of the new team executes its implicit task. There is an implicit barrier at the end of the parallel construct. Only the master thread resumes execution beyond the end of the parallel construct, resuming the task region that was suspended upon encountering the parallel construct. Any number of parallel constructs can be specified in a single program.
parallel regions may be arbitrarily nested inside each other. If nested parallelism is disabled, or is not supported by the OpenMP implementation, then the new team that is created by a thread encountering a parallel construct inside a parallel region will consist only of the encountering thread. However, if nested parallelism is supported and enabled, then the new team can consist of more than one thread. A parallel construct may include a proc_bind clause to specify the places to use for the threads in the team within the parallel region.
When any team encounters a worksharing construct, the work inside the construct is divided among the members of the team, and executed cooperatively instead of being executed by every thread. There is a default barrier at the end of each worksharing construct unless the nowait clause is present. Redundant execution of code by every thread in the team resumes after the end of the worksharing construct.
When any thread encounters a task generating construct, one or more explicit tasks are generated. Execution of explicitly generated tasks is assigned to one of the threads in the current team, subject to the thread’s availability to execute work. Thus, execution of the new task could be immediate, or deferred until later according to task scheduling constraints and thread availability. Threads are allowed to suspend the current task region at a task scheduling point in order to execute a different task. If the suspended task region is for a tied task, the initially assigned thread later resumes execution of the suspended task region. If the suspended task region is for an untied task, then any thread may resume its execution. Completion of all explicit tasks bound to a given parallel region is guaranteed before the master thread leaves the implicit barrier at the end of the region. Completion of a subset of all explicit tasks bound to a given parallel region may be specified through the use of task synchronization constructs. Completion of all explicit tasks bound to the implicit parallel region is guaranteed by the time the program exits.
When any thread encounters a simd construct, the iterations of the loop associated with the construct may be executed concurrently using the SIMD lanes that are available to the thread.
When a loop construct is encountered, the iterations of the loop associated with the construct are executed in the context of its encountering thread(s), as determined according to its binding region. If the loop region binds to a teams region, the region is encountered by the set of master threads that execute the teams region. If the loop region binds to a parallel region, the region is encountered by the team of threads executing the parallel region. Otherwise, the region is encountered by a single thread.
If the loop region binds to a teams region, the encountering threads may continue execution after the loop region without waiting for all iterations to complete; the iterations are guaranteed to complete before the end of the teams region. Otherwise, all iterations must complete before the encountering thread(s) continue execution after the loop region. All threads that encounter the loop construct may participate in the execution of the iterations. Only one of these threads may execute any given iteration.
The cancel construct can alter the previously described flow of execution in an OpenMP region. The effect of the cancel construct depends on its construct-type-clause. If a task encounters a cancel construct with a taskgroup construct-type-clause, then the task activates cancellation and continues execution at the end of its task region, which implies completion of that task. Any other task in that taskgroup that has begun executing completes execution unless it encounters a cancellation point construct, in which case it continues execution at the end of its task region, which implies its completion. Other tasks in that taskgroup region that have not begun execution are aborted, which implies their completion.
For all other construct-type-clause values, if a thread encounters a cancel construct, it activates cancellation of the innermost enclosing region of the type specified and the thread continues execution at the end of that region. Threads check if cancellation has been activated for their region at cancellation points and, if so, also resume execution at the end of the canceled region.
If cancellation has been activated regardless of construct-type-clause, threads that are waiting inside a barrier other than an implicit barrier at the end of the canceled region exit the barrier and resume execution at the end of the canceled region. This action can occur before the other threads reach that barrier.
Synchronization constructs and library routines are available in the OpenMP API to coordinate tasks and data access in parallel regions. In addition, library routines and environment variables are available to control or to query the runtime environment of OpenMP programs.
The OpenMP specification makes no guarantee that input or output to the same file is synchronous when executed in parallel. In this case, the programmer is responsible for synchronizing input and output processing with the assistance of OpenMP synchronization constructs or library routines. For the case where each thread accesses a different file, no synchronization by the programmer is necessary.
1.4 Memory Model
1.4.1 Structure of the OpenMP Memory Model
The OpenMP API provides a relaxed-consistency, shared-memory model. All OpenMP threads have access to a place to store and to retrieve variables, called the memory. In addition, each thread is allowed to have its own temporary view of the memory. The temporary view of memory for each thread is not a required part of the OpenMP memory model, but can represent any kind of intervening structure, such as machine registers, cache, or other local storage, between the thread and the memory. The temporary view of memory allows the thread to cache variables and thereby to avoid going to memory for every reference to a variable. Each thread also has access to another type of memory that must not be accessed by other threads, called threadprivate memory.
A directive that accepts data-sharing attribute clauses determines two kinds of access to variables used in the directive’s associated structured block: shared and private. Each variable referenced in the structured block has an original variable, which is the variable by the same name that exists in the program immediately outside the construct. Each reference to a shared variable in the structured block becomes a reference to the original variable. For each private variable referenced in the structured block, a new version of the original variable (of the same type and size) is created in memory for each task or SIMD lane that contains code associated with the directive. Creation of the new version does not alter the value of the original variable. However, the impact of attempts to access the original variable during the region corresponding to the directive is unspecified; see Section 2.19.4.3 on page 285 for additional details. References to a private variable in the structured block refer to the private version of the original variable for the current task or SIMD lane. The relationship between the value of the original variable and the initial or final value of the private version depends on the exact clause that specifies it. Details of this issue, as well as other issues with privatization, are provided in Section 2.19 on page 269.
The minimum size at which a memory update may also read and write back adjacent variables that are part of another variable (as array or structure elements) is implementation defined but is no larger than required by the base language.
A single access to a variable may be implemented with multiple load or store instructions and, thus, is not guaranteed to be atomic with respect to other accesses to the same variable. Accesses to variables smaller than the implementation defined minimum size or to C or C++ bit-fields may be implemented by reading, modifying, and rewriting a larger unit of memory, and may thus interfere with updates of variables or fields in the same unit of memory.
If multiple threads write without synchronization to the same memory unit, including cases due to atomicity considerations as described above, then a data race occurs. Similarly, if at least one thread reads from a memory unit and at least one thread writes without synchronization to that same memory unit, including cases due to atomicity considerations as described above, then a data race occurs. If a data race occurs then the result of the program is unspecified.
A private variable in a task region that subsequently generates an inner nested parallel region is permitted to be made shared by implicit tasks in the inner parallel region. A private variable in a task region can also be shared by an explicit task region generated during its execution. However, it is the programmer’s responsibility to ensure through synchronization that the lifetime of the variable does not end before completion of the explicit task region sharing it. Any other access by one task to the private variables of another task results in unspecified behavior.
1.4.2 Device Data Environments
When an OpenMP program begins, an implicit target data region for each device surrounds the whole program. Each device has a device data environment that is defined by its implicit target data region. Any declare target directives and the directives that accept data-mapping attribute clauses determine how an original variable in a data environment is mapped to a corresponding variable in a device data environment.
When an original variable is mapped to a device data environment and a corresponding variable is not present in the device data environment, a new corresponding variable (of the same type and size as the original variable) is created in the device data environment. Conversely, the original variable becomes the new variable’s corresponding variable in the device data environment of the device that performs the mapping operation.
The corresponding variable in the device data environment may share storage with the original variable. Writes to the corresponding variable may alter the value of the original variable. The impact of this possibility on memory consistency is discussed in Section 1.4.6 on page 28. When a task executes in the context of a device data environment, references to the original variable refer to the corresponding variable in the device data environment. If an original variable is not currently mapped and a corresponding variable does not exist in the device data environment then accesses to the original variable result in unspecified behavior unless the unified_shared_memory clause is specified on a requires directive for the compilation unit.
The relationship between the value of the original variable and the initial or final value of the corresponding variable depends on the map-type. Details of this issue, as well as other issues with mapping a variable, are provided in Section 2.19.7.1 on page 315.
The original variable in a data environment and the corresponding variable(s) in one or more device data environments may share storage. Without intervening synchronization data races can occur.
1.4.3 Memory Management
The host device, and target devices that an implementation may support, have attached storage resources where program variables are stored. These resources can have different traits. A memory space in an OpenMP program represents a set of these storage resources. Memory spaces are defined according to a set of traits, and a single resource may be exposed as multiple memory spaces with different traits or may be part of multiple memory spaces. In any device, at least one memory space is guaranteed to exist.
An OpenMP program can use a memory allocator to allocate memory in which to store variables. This memory will be allocated from the storage resources of the memory space associated with the memory allocator. Memory allocators are also used to deallocate previously allocated memory. When an OpenMP memory allocator is not used to allocate memory, OpenMP does not prescribe the storage resource for the allocation; the memory for the variables may be allocated in any storage resource.
1.4.4 The Flush Operation
The memory model has relaxed-consistency because a thread’s temporary view of memory is not required to be consistent with memory at all times. A value written to a variable can remain in the thread’s temporary view until it is forced to memory at a later time. Likewise, a read from a variable may retrieve the value from the thread’s temporary view, unless it is forced to read from memory. OpenMP flush operations are used to enforce consistency between a thread’s temporary view of memory and memory, or between multiple threads’ view of memory.
If a flush operation is a strong flush, it enforces consistency between a thread’s temporary view and memory. A strong flush operation is applied to a set of variables called the flush-set. A strong flush restricts reordering of memory operations that an implementation might otherwise do. Implementations must not reorder the code for a memory operation for a given variable, or the code for a flush operation for the variable, with respect to a strong flush operation that refers to the same variable.
If a thread has performed a write to its temporary view of a shared variable since its last strong flush of that variable, then when it executes another strong flush of the variable, the strong flush does not complete until the value of the variable has been written to the variable in memory. If a thread performs multiple writes to the same variable between two strong flushes of that variable, the strong flush ensures that the value of the last write is written to the variable in memory. A strong flush of a variable executed by a thread also causes its temporary view of the variable to be discarded, so that if its next memory operation for that variable is a read, then the thread will read from memory and capture the value in its temporary view. When a thread executes a strong flush, no later memory operation by that thread for a variable involved in that strong flush is allowed to start until the strong flush completes. The completion of a strong flush executed by a thread is defined as the point at which all writes to the flush-set performed by the thread before the strong flush are visible in memory to all other threads, and at which that thread’s temporary view of the flush-set is discarded.
A strong flush operation provides a guarantee of consistency between a thread’s temporary view and memory. Therefore, a strong flush can be used to guarantee that a value written to a variable by one thread may be read by a second thread. To accomplish this, the programmer must ensure that the second thread has not written to the variable since its last strong flush of the variable, and that the following sequence of events are completed in this specific order:
1. The value is written to the variable by the first thread;
2. The variable is flushed, with a strong flush, by the first thread;
3. The variable is flushed, with a strong flush, by the second thread; and
4. The value is read from the variable by the second thread.
If a flush operation is a release flush or acquire flush, it can enforce consistency between the views of memory of two synchronizing threads. A release flush guarantees that any prior operation that writes or reads a shared variable will appear to be completed before any operation that writes or reads the same shared variable and follows an acquire flush with which the release flush synchronizes (see Section 1.4.5 on page 27 for more details on flush synchronization). A release flush will propagate the values of all shared variables in its temporary view to memory prior to the thread performing any subsequent atomic operation that may establish a synchronization. An acquire flush will discard any value of a shared variable in its temporary view to which the thread has not written since last performing a release flush, so that it may subsequently read a value propagated by a release flush that synchronizes with it. Therefore, release and acquire flushes may also be used to guarantee that a value written to a variable by one thread may be read by a second thread. To accomplish this, the programmer must ensure that the second thread has not written to the variable since its last acquire flush, and that the following sequence of events happen in this specific order:
1. The value is written to the variable by the first thread;
2. The first thread performs a release flush;
3. The second thread performs an acquire flush; and
4. The value is read from the variable by the second thread.
Note – OpenMP synchronization operations, described in Section 2.17 on page 223 and in Section 3.3 on page 381, are recommended for enforcing this order. Synchronization through variables is possible but is not recommended because the proper timing of flushes is difficult.
The flush properties that define whether a flush operation is a strong flush, a release flush, or an acquire flush are not mutually disjoint. A flush operation may be a strong flush and a release flush; it may be a strong flush and an acquire flush; it may be a release flush and an acquire flush; or it may be all three.
1.4.5 Flush Synchronization and Happens Before
OpenMP supports thread synchronization with the use of release flushes and acquire flushes. For any such synchronization, a release flush is the source of the synchronization and an acquire flush is the sink of the synchronization, such that the release flush synchronizes with the acquire flush.
A release flush has one or more associated release sequences that define the set of modifications that may be used to establish a synchronization. A release sequence starts with an atomic operation that follows the release flush and modifies a shared variable and additionally includes any read-modify-write atomic operations that read a value taken from some modification in the release sequence. The following rules determine the atomic operation that starts an associated release sequence.
• If a release flush is performed on entry to an atomic operation, that atomic operation starts its release sequence.
• If a release flush is performed in an implicit flush region, an atomic operation that is provided by the implementation and that modifies an internal synchronization variable, starts its release sequence.
• If a release flush is performed by an explicit flush region, any atomic operation that modifies a shared variable and follows the flush region in its thread’s program order starts an associated release sequence.
An acquire flush is associated with one or more prior atomic operations that read a shared variable and that may be used to establish a synchronization. The following rules determine the associated atomic operation that may establish a synchronization.
• If an acquire flush is performed on exit from an atomic operation, that atomic operation is its associated atomic operation.
• If an acquire flush is performed in an implicit flush region, an atomic operation that is provided by the implementation and that reads an internal synchronization variable is its associated atomic operation.
• If an acquire flush is performed by an explicit flush region, any atomic operation that reads a shared variable and precedes the flush region in its thread’s program order is an associated atomic operation.
A release flush synchronizes with an acquire flush if an atomic operation associated with the acquire flush reads a value written by a modification from a release sequence associated with the release flush.
An operation X simply happens before an operation Y if any of the following conditions are satisfied:
1. X and Y are performed by the same thread, and X precedes Y in the thread’s program order;
2. X synchronizes with Y according to the flush synchronization conditions explained above or according to the base language’s definition of synchronizes with, if such a definition exists; or
3. There exists another operation Z, such that X simply happens before Z and Z simply happens before Y.
An operation X happens before an operation Y if any of the following conditions are satisfied:
1. X happens before Y according to the base language’s definition of happens before, if such a definition exists; or
2. X simply happens before Y.
A variable with an initial value is treated as if the value is stored to the variable by an operation that happens before all operations that access or modify the variable in the program.
1.4.6 OpenMP Memory Consistency
The following rules guarantee the observable completion order of memory operations, as seen by all threads.
• If two operations performed by different threads are sequentially consistent atomic operations or they are strong flushes that flush the same variable, then they must be completed as if in some sequential order, seen by all threads.
• If two operations performed by the same thread are sequentially consistent atomic operations or they access, modify, or, with a strong flush, flush the same variable, then they must be completed as if in that thread’s program order, as seen by all threads.
• If two operations are performed by different threads and one happens before the other, then they must be completed as if in that happens before order, as seen by all threads, if:
  – both operations access or modify the same variable;
  – both operations are strong flushes that flush the same variable; or
  – both operations are sequentially consistent atomic operations.
• Any two atomic memory operations from different atomic regions must be completed as if in the same order as the strong flushes implied in their respective regions, as seen by all threads.
The flush operation can be specified using the flush directive, and is also implied at various locations in an OpenMP program: see Section 2.17.8 on page 242 for details.
Note – Since flush operations by themselves cannot prevent data races, explicit flush operations are only useful in combination with non-sequentially consistent atomic directives.
OpenMP programs that:
• Do not use non-sequentially consistent atomic directives;
• Do not rely on the accuracy of a false result from omp_test_lock and omp_test_nest_lock; and
• Correctly avoid data races as required in Section 1.4.1 on page 23,
behave as though operations on shared variables were simply interleaved in an order consistent with the order in which they are performed by each thread. The relaxed consistency model is invisible for such programs, and any explicit flush operations in such programs are redundant.
1.5 Tool Interfaces
The OpenMP API includes two tool interfaces, OMPT and OMPD, to enable development of high-quality, portable tools that support monitoring, performance, or correctness analysis and debugging of OpenMP programs developed using any implementation of the OpenMP API.
1.5.1 OMPT
The OMPT interface, which is intended for first-party tools, provides the following:
• A mechanism to initialize a first-party tool;
• Routines that enable a tool to determine the capabilities of an OpenMP implementation;
• Routines that enable a tool to examine OpenMP state information associated with a thread;
• Mechanisms that enable a tool to map implementation-level calling contexts back to their source-level representations;
• A callback interface that enables a tool to receive notification of OpenMP events;
• A tracing interface that enables a tool to trace activity on OpenMP target devices; and
• A runtime library routine that an application can use to control a tool.
OpenMP implementations may differ with respect to the thread states that they support, the mutual exclusion implementations that they employ, and the OpenMP events for which tool callbacks are invoked. For some OpenMP events, OpenMP implementations must guarantee that a registered callback will be invoked for each occurrence of the event. For other OpenMP events, OpenMP implementations are permitted to invoke a registered callback for some or no occurrences of the event; for such OpenMP events, however, OpenMP implementations are encouraged to invoke tool callbacks on as many occurrences of the event as is practical. Section 4.2.4 specifies the subset of OMPT callbacks that an OpenMP implementation must support for a minimal implementation of the OMPT interface.
An implementation of the OpenMP API may differ from the abstract execution model described by its specification. The ability of tools that use the OMPT interface to observe such differences does not constrain implementations of the OpenMP API in any way.
With the exception of the omp_control_tool runtime library routine for tool control, all other routines in the OMPT interface are intended for use only by tools and are not visible to applications. For that reason, a Fortran binding is provided only for omp_control_tool; all other OMPT functionality is described with C syntax only.
1.5.2 OMPD
The OMPD interface is intended for third-party tools, which run as separate processes. An OpenMP implementation must provide an OMPD library that can be dynamically loaded and used by a third-party tool. A third-party tool, such as a debugger, uses the OMPD library to access OpenMP state of a program that has begun execution. OMPD defines the following:
• An interface that an OMPD library exports, which a tool can use to access OpenMP state of a program that has begun execution;
• A callback interface that a tool provides to the OMPD library so that the library can use it to access the OpenMP state of a program that has begun execution; and
• A small number of symbols that must be defined by an OpenMP implementation to help the tool find the correct OMPD library to use for that OpenMP implementation and to facilitate notification of events.
Section 5 describes OMPD in detail.
1.6 OpenMP Compliance
The OpenMP API defines constructs that operate in the context of the base language that is supported by an implementation. If the implementation of the base language does not support a language construct that appears in this document, a compliant OpenMP implementation is not required to support it, with the exception that for Fortran, the implementation must allow case insensitivity for directive and API routine names, and must allow identifiers of more than six characters. An implementation of the OpenMP API is compliant if and only if it compiles and executes all other conforming programs, and supports the tool interface, according to the syntax and semantics laid out in Chapters 1, 2, 3, 4 and 5. Appendices A, B, C, and D, as well as sections designated as Notes (see Section 1.8 on page 34) are for information purposes only and are not part of the specification.
All library, intrinsic and built-in routines provided by the base language must be thread-safe in a compliant implementation. In addition, the implementation of the base language must also be thread-safe. For example, ALLOCATE and DEALLOCATE statements must be thread-safe in Fortran. Unsynchronized concurrent use of such routines by different threads must produce correct results (although not necessarily the same as serial execution results, as in the case of random number generation routines).
Starting with Fortran 90, variables with explicit initialization have the SAVE attribute implicitly. This is not the case in Fortran 77. However, a compliant OpenMP Fortran implementation must give such a variable the SAVE attribute, regardless of the underlying base language version.
Appendix A lists certain aspects of the OpenMP API that are implementation defined. A compliant implementation must define and document its behavior for each of the items in Appendix A.
1.7 Normative References
• ISO/IEC 9899:1990, Information Technology – Programming Languages – C. This OpenMP API specification refers to ISO/IEC 9899:1990 as C90.
• ISO/IEC 9899:1999, Information Technology – Programming Languages – C. This OpenMP API specification refers to ISO/IEC 9899:1999 as C99.
• ISO/IEC 9899:2011, Information Technology – Programming Languages – C. This OpenMP API specification refers to ISO/IEC 9899:2011 as C11. While future versions of the OpenMP specification are expected to address the following features, currently their use may result in unspecified behavior.
– Supporting the noreturn property
– Adding alignment support
– Creation of complex value
– Threads for the C standard library
– Thread-local storage
– Parallel memory sequencing model
– Atomic
• ISO/IEC 14882:1998, Information Technology – Programming Languages – C++. This OpenMP API specification refers to ISO/IEC 14882:1998 as C++98.
• ISO/IEC 14882:2011, Information Technology – Programming Languages – C++. This OpenMP API specification refers to ISO/IEC 14882:2011 as C++11. While future versions of the OpenMP specification are expected to address the following features, currently their use may result in unspecified behavior.
– Alignment support
– Standard layout types
– Allowing move constructs to throw
– Defining move special member functions
– Concurrency
– Data-dependency ordering: atomics and memory model
– Additions to the standard library
– Thread-local storage
– Dynamic initialization and destruction with concurrency
– C++11 library
• ISO/IEC 14882:2014, Information Technology – Programming Languages – C++. This OpenMP API specification refers to ISO/IEC 14882:2014 as C++14. While future versions of the OpenMP specification are expected to address the following features, currently their use may result in unspecified behavior.
– Sized deallocation
– What signal handlers can do
• ISO/IEC 14882:2017, Information Technology – Programming Languages – C++. This OpenMP API specification refers to ISO/IEC 14882:2017 as C++17.
• ISO/IEC 1539:1980, Information Technology – Programming Languages – Fortran. This OpenMP API specification refers to ISO/IEC 1539:1980 as Fortran 77.
• ISO/IEC 1539:1991, Information Technology – Programming Languages – Fortran. This OpenMP API specification refers to ISO/IEC 1539:1991 as Fortran 90.
• ISO/IEC 1539-1:1997, Information Technology – Programming Languages – Fortran. This OpenMP API specification refers to ISO/IEC 1539-1:1997 as Fortran 95.
• ISO/IEC 1539-1:2004, Information Technology – Programming Languages – Fortran. This OpenMP API specification refers to ISO/IEC 1539-1:2004 as Fortran 2003.
• ISO/IEC 1539-1:2010, Information Technology – Programming Languages – Fortran. This OpenMP API specification refers to ISO/IEC 1539-1:2010 as Fortran 2008. While future versions of the OpenMP specification are expected to address the following features, currently their use may result in unspecified behavior.
– Submodules
– Coarrays
– DO CONCURRENT
– Allocatable components of recursive type
– Pointer initialization
– Value attribute is permitted for any nonallocatable nonpointer nonarray
– Simply contiguous arrays rank remapping to rank>1 target
– Polymorphic assignment
– Accessing real and imaginary parts
– Pointer function reference is a variable
– Recursive I/O
– The BLOCK construct
– EXIT statement (to terminate a non-DO construct)
– ERROR STOP
– Internal procedure as an actual argument
– Generic resolution by procedureness
– Generic resolution by pointer vs. allocatable
– Impure elemental procedures
Where this OpenMP API specification refers to C, C++ or Fortran, reference is made to the base language supported by the implementation.
1.8 Organization of this Document
The remainder of this document is structured as follows:
• Chapter 2 “Directives”
• Chapter 3 “Runtime Library Routines”
• Chapter 4 “OMPT Interface”
• Chapter 5 “OMPD Interface”
• Chapter 6 “Environment Variables”
• Appendix A “OpenMP Implementation-Defined Behaviors”
• Appendix B “Features History”
Some sections of this document only apply to programs written in a certain base language. Text that applies only to programs for which the base language is C or C++ is shown as follows:
C / C++
C/C++ specific text…
C / C++
Text that applies only to programs for which the base language is C only is shown as follows:
C
C specific text…
C
Text that applies only to programs for which the base language is C90 only is shown as follows:
C90
C90 specific text…
C90
Text that applies only to programs for which the base language is C99 only is shown as follows:
C99
C99 specific text…
C99
Text that applies only to programs for which the base language is C++ only is shown as follows:
C++
C++ specific text…
C++
Text that applies only to programs for which the base language is Fortran is shown as follows:
Fortran
Fortran specific text…
Fortran
Where an entire page consists of base language specific text, a marker is shown at the top of the page. For Fortran-specific text, the marker is:
Fortran (cont.)
For C/C++-specific text, the marker is:
C/C++ (cont.)
Some text is for information only, and is not part of the normative specification. Such text is designated as a note, like this:
Note – Non-normative text…
CHAPTER 2
Directives
This chapter describes the syntax and behavior of OpenMP directives.
C / C++
In C/C++, OpenMP directives are specified by using the #pragma mechanism provided by the C and C++ standards.
C / C++
Fortran
In Fortran, OpenMP directives are specified by using special comments that are identified by unique sentinels. Also, a special comment form is available for conditional compilation.
Fortran
Compilers can therefore ignore OpenMP directives and conditionally compiled code if support of the OpenMP API is not provided or enabled. A compliant implementation must provide an option or interface that ensures that underlying support of all OpenMP directives and OpenMP conditional compilation mechanisms is enabled. In the remainder of this document, the phrase OpenMP compilation is used to mean a compilation with these OpenMP features enabled.
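The following non-normative sketch shows C source that is valid both in an OpenMP compilation and in a compilation that ignores the directives. The function names max_team_size and scale are hypothetical. The _OPENMP macro, which is defined only in an OpenMP compilation, guards the runtime library header and call.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>   /* only needed, and only assumed present, in an OpenMP compilation */
#endif

/* Upper bound on the team size of a parallel region started here,
   or 1 when the compiler ignored the OpenMP directives. */
int max_team_size(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* Scale every element of x by factor; correct in both compilation modes. */
void scale(double *x, int n, double factor)
{
    assert(n >= 0);
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        x[i] *= factor;
    }
}
```

When the directive is ignored, the loop simply runs on the encountering thread and produces the same values.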
Fortran
Restrictions
The following restrictions apply to all OpenMP directives:
• OpenMP directives, except simd and any declarative directive, may not appear in pure procedures.
• OpenMP directives may not appear in the WHERE and FORALL constructs.
Fortran
2.1 Directive Format
C / C++
OpenMP directives for C/C++ are specified with #pragma directives. The syntax of an OpenMP directive is as follows:
#pragma omp directive-name [clause[ [,] clause]…] new-line
Each directive starts with #pragma omp. The remainder of the directive follows the conventions of the C and C++ standards for compiler directives. In particular, white space can be used before and after the #, and sometimes white space must be used to separate the words in a directive. Preprocessing tokens following #pragma omp are subject to macro replacement.
Some OpenMP directives may be composed of consecutive #pragma directives if specified in their syntax.
Directives are case-sensitive.
Each of the expressions used in the OpenMP syntax inside of the clauses must be a valid assignment-expression of the base language unless otherwise specified.
C / C++
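A non-normative sketch of the C/C++ directive format follows. The macros TEAM_SIZE and CHUNK and the function count_even are hypothetical names chosen for illustration. The directive uses white space after the #, a mix of comma-separated and space-separated clauses, and clause arguments that are subject to macro replacement.

```c
#include <assert.h>

#define TEAM_SIZE 4   /* hypothetical macro, replaced inside the directive */
#define CHUNK     8   /* hypothetical macro used as a clause argument */

/* Count the even numbers in [0, n). */
int count_even(int n)
{
    assert(n >= 0);
    int count = 0;
    /* White space may follow '#'; clauses may be separated by commas
       or by white space alone. */
    #  pragma omp parallel for num_threads(TEAM_SIZE), schedule(static, CHUNK) reduction(+ : count)
    for (int i = 0; i < n; i++) {
        if (i % 2 == 0) {
            count++;
        }
    }
    return count;
}
```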
C++
Directives may not appear in constexpr functions or in constant expressions. Variadic parameter packs cannot be expanded into a directive or its clauses except as part of an expression argument to be evaluated by the base language, such as into a function call inside an if clause.
C++
Fortran
OpenMP directives for Fortran are specified as follows:
sentinel directive-name [clause[ [,] clause]…]
All OpenMP compiler directives must begin with a directive sentinel. The format of a sentinel differs between fixed form and free form source files, as described in Section 2.1.1 on page 41 and Section 2.1.2 on page 41.
Directives are case insensitive. Directives cannot be embedded within continued statements, and statements cannot be embedded within directives.
Each of the expressions used in the OpenMP syntax inside of the clauses must be a valid expression of the base language unless otherwise specified.
In order to simplify the presentation, free form is used for the syntax of OpenMP directives for Fortran in the remainder of this document, except as noted.
Fortran
Only one directive-name can be specified per directive (note that this includes combined directives, see Section 2.13 on page 185). The order in which clauses appear on directives is not significant. Clauses on directives may be repeated as needed, subject to the restrictions listed in the description of each clause.
Some clauses accept a list, an extended-list, or a locator-list. A list consists of a comma-separated collection of one or more list items. An extended-list consists of a comma-separated collection of one or more extended list items. A locator-list consists of a comma-separated collection of one or more locator list items.
C / C++
A list item is a variable or an array section. An extended list item is a list item or a function name. A locator list item is any lvalue expression, including variables, or an array section.
C / C++
Fortran
A list item is a variable, array section or common block name (enclosed in slashes). An extended list item is a list item or a procedure name. A locator list item is a list item.
When a named common block appears in a list, it has the same meaning as if every explicit member of the common block appeared in the list. An explicit member of a common block is a variable that is named in a COMMON statement that specifies the common block name and is declared in the same scoping unit in which the clause appears.
Although variables in common blocks can be accessed by use association or host association, common block names cannot. As a result, a common block name specified in a data-sharing attribute, a data copying or a data-mapping attribute clause must be declared to be a common block in the same scoping unit in which the clause appears.
If a list item that appears in a directive or clause is an optional dummy argument that is not present, the directive or clause for that list item is ignored.
If the variable referenced inside a construct is an optional dummy argument that is not present, any explicitly determined, implicitly determined, or predetermined data-sharing and data-mapping attribute rules for that variable are ignored. Otherwise, if the variable is an optional dummy argument that is present, it is present inside the construct.
Fortran
For all base languages, a list item, an extended list item, or a locator list item is subject to the restrictions specified in Section 2.1.5 on page 44 and in each of the sections describing clauses and directives for which the list, the extended-list, or the locator-list appears.
Some executable directives include a structured block. A structured block:
• may contain infinite loops where the point of exit is never reached;
• may halt due to an IEEE exception;
C / C++
• may contain calls to exit(), _Exit(), quick_exit(), abort() or functions with a _Noreturn specifier (in C) or a noreturn attribute (in C/C++);
• may be an expression statement, iteration statement, selection statement, or try block, provided that the corresponding compound statement obtained by enclosing it in { and } would be a structured block; and
C / C++
Fortran
• may contain STOP statements.
Fortran
Restrictions
Restrictions to structured blocks are as follows:
• Entry to a structured block must not be the result of a branch.
• The point of exit cannot be a branch out of the structured block.
C / C++
• The point of entry to a structured block must not be a call to setjmp().
• longjmp() and throw() must not violate the entry/exit criteria.
C / C++
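The entry and exit restrictions can be illustrated with a non-normative C sketch. The function find_first is a hypothetical example. Because a return or goto that leaves the compound statement of the parallel construct would branch out of the structured block, the sketch records the result in a reduction variable and returns only after the block has been left at its bottom.

```c
#include <assert.h>

/* Return the smallest index i with a[i] == key, or n if key is absent. */
int find_first(const int *a, int n, int key)
{
    assert(n >= 0);
    int first = n;
    #pragma omp parallel
    {
        /* This compound statement is the structured block: it is entered
           at the top and left at the bottom. A `return i;` placed here
           would be a branch out of the block. */
        #pragma omp for reduction(min : first)
        for (int i = 0; i < n; i++) {
            if (a[i] == key && i < first) {
                first = i;
            }
        }
    }
    return first;   /* outside the structured block, so this is permitted */
}
```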
Fortran
2.1.1 Fixed Source Form Directives
The following sentinels are recognized in fixed form source files:
!$omp | c$omp | *$omp
Sentinels must start in column 1 and appear as a single word with no intervening characters. Fortran fixed form line length, white space, continuation, and column rules apply to the directive line. Initial directive lines must have a space or a zero in column 6, and continuation directive lines must have a character other than a space or a zero in column 6.
Comments may appear on the same line as a directive. The exclamation point initiates a comment when it appears after column 6. The comment extends to the end of the source line and is ignored. If the first non-blank character after the directive sentinel of an initial or continuation directive line is an exclamation point, the line is ignored.
Note – In the following example, the three formats for specifying the directive are equivalent (the first line represents the position of the first 9 columns):
c23456789
!$omp parallel do shared(a,b,c)
c$omp parallel do
c$omp+shared(a,b,c)
c$omp paralleldoshared(a,b,c)
2.1.2 Free Source Form Directives
The following sentinel is recognized in free form source files:
!$omp
The sentinel can appear in any column as long as it is preceded only by white space. It must appear as a single word with no intervening white space. Fortran free form line length, white space, and continuation rules apply to the directive line. Initial directive lines must have a space after the sentinel. Continued directive lines must have an ampersand (&) as the last non-blank character on the line, prior to any comment placed inside the directive. Continuation directive lines can have an ampersand after the directive sentinel with optional white space before and after the ampersand.
Comments may appear on the same line as a directive. The exclamation point (!) initiates a comment. The comment extends to the end of the source line and is ignored. If the first non-blank character after the directive sentinel is an exclamation point, the line is ignored.
One or more blanks or horizontal tabs are optional to separate adjacent keywords in directive-names unless otherwise specified.
Note – In the following example the three formats for specifying the directive are equivalent (the first line represents the position of the first 9 columns):
!23456789
!$omp parallel do &
!$omp shared(a,b,c)
!$omp parallel &
!$omp&do shared(a,b,c)
!$omp paralleldo shared(a,b,c)
Fortran
2.1.3 Stand-Alone Directives
Summary
Stand-alone directives are executable directives that have no associated user code.
Description
Stand-alone directives do not have any associated executable user code. Instead, they represent executable statements that typically do not have succinct equivalent statements in the base language. There are some restrictions on the placement of a stand-alone directive within a program. A stand-alone directive may be placed only at a point where a base language executable statement is allowed.
Restrictions
C / C++
• A stand-alone directive may not be used in place of the statement following an if, while, do, switch, or label.
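The C/C++ restriction can be illustrated with a non-normative sketch that uses the barrier directive, which is a stand-alone directive. The function two_phase is a hypothetical example. The barrier is enclosed in braces so that the directive appears inside a compound statement rather than in place of the statement that follows the if.

```c
#include <assert.h>

/* Store out[i] = 2*i in a first phase, then sum the stored values. */
long two_phase(int *out, int n)
{
    assert(n >= 0);
    long total = 0;
    #pragma omp parallel
    {
        #pragma omp for nowait
        for (int i = 0; i < n; i++) {
            out[i] = 2 * i;
        }

        /* `if (n > 0)` followed directly by the directive would be
           non-conforming; the braces make it conforming. The condition
           has the same value in every thread, so either all threads
           reach the barrier or none does. */
        if (n > 0) {
            #pragma omp barrier
        }

        #pragma omp for reduction(+ : total)
        for (int i = 0; i < n; i++) {
            total += out[i];
        }
    }
    return total;
}
```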
C / C++
Fortran
• A stand-alone directive may not be used as the action statement in an if statement or as the executable statement following a label if the label is referenced in the program.
Fortran
2.1.4 Array Shaping
C / C++
If an expression has a type of pointer to T, then a shape-operator can be used to specify the extent of that pointer. In other words, the shape-operator is used to reinterpret, as an n-dimensional array, the region of memory to which that expression points.
Formally, the syntax of the shape-operator is as follows:
shaped-expression := ([s1][s2]…[sn])cast-expression
The result of applying the shape-operator to an expression is an lvalue expression with an n-dimensional array type with dimensions s1 × s2 × … × sn and element type T. The precedence of the shape-operator is the same as a type cast.
Each si is an integral type expression that must evaluate to a positive integer.
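The reinterpretation performed by the shape-operator can be pictured with a base-language analogy. The sketch below is non-normative and does not use the shape-operator itself, which is valid only inside clauses that allow it. Instead it assumes a C99 cast to a pointer to a variable-length array, which views the same memory as a 2-dimensional array in the way ([n][m])p would. The function name shaped_at is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Read element (i, j) of the n-by-m array that p is reinterpreted as. */
double shaped_at(double *p, size_t n, size_t m, size_t i, size_t j)
{
    assert(n > 0 && m > 0);   /* each extent must be positive */
    assert(i < n && j < m);   /* stay inside the shaped region */
    /* View the memory that p points to as rows of m doubles. */
    double (*a)[m] = (double (*)[m])p;
    return a[i][j];           /* same storage as p[i * m + j] */
}
```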
Restrictions
Restrictions to the shape-operator are as follows:
• The type T must be a complete type.
• The shape-operator can appear only in clauses where it is explicitly allowed.
• The result of a shape-operator must be a named array of a list item.
• The type of the expression upon which a shape-operator is applied must be a pointer type.
C++
• If the type T is a reference to a type T’, then the type will be considered to be T’ for all purposes of the designated array.
C++
C / C++
2.1.5 Array Sections
An array section designates a subset of the elements in an array.
C / C++
To specify an array section in an OpenMP construct, array subscript expressions are extended with the following syntax:
[ lower-bound : length : stride ] or
[ lower-bound : length : ] or
[ lower-bound : length ] or
[ lower-bound : : stride ] or
[ lower-bound : : ] or
[ lower-bound : ] or
[ : length : stride ] or
[ : length : ] or
[ : length ] or
[ : : stride ]
[ : : ]
[ : ]
The array section must be a subset of the original array.
Array sections are allowed on multidimensional arrays. Base language array subscript expressions can be used to specify length-one dimensions of multidimensional array sections.
Each of the lower-bound, length, and stride expressions if specified must be an integral type expression of the base language. When evaluated they represent a set of integer values as follows:
{ lower-bound, lower-bound + stride, lower-bound + 2 * stride,… , lower-bound + ((length – 1) * stride) }
The length must evaluate to a non-negative integer.
The stride must evaluate to a positive integer.
When the size of the array dimension is not known, the length must be specified explicitly.
When the stride is absent it defaults to 1.
When the length is absent it defaults to ⌈(size − lower-bound)/stride⌉, where size is the size of the array dimension.
When the lower-bound is absent it defaults to 0.
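The index set and the default length above can be written out as a short non-normative C sketch. The function names section_default_length and section_indices are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Default length when it is omitted: ceil((size - lower) / stride). */
size_t section_default_length(size_t size, size_t lower, size_t stride)
{
    assert(stride > 0);      /* the stride must be positive */
    assert(lower <= size);
    return (size - lower + stride - 1) / stride;
}

/* Write the indices selected by [lower:length:stride] into out, which
   must have room for `length` entries, and return that count. */
size_t section_indices(size_t lower, size_t length, size_t stride, size_t *out)
{
    assert(stride > 0);
    for (size_t k = 0; k < length; k++) {
        out[k] = lower + k * stride;   /* lower, lower+stride, ... */
    }
    return length;
}
```

For an array dimension of size 11, section_default_length(11, 1, 1) yields 10, which is why [1:] and [1:10] select the same elements.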
The precedence of a subscript operator that uses the array section syntax is the same as the precedence of a subscript operator that does not use the array section syntax.
Note – The following are examples of array sections:
a[0:6]
a[0:6:1]
a[1:10]
a[1:]
a[:10:2]
b[10][:][:]
b[10][:][:0]
c[42][0:6][:]
c[42][0:6:2][:]
c[1:10][42][0:6]
S.c[:100]
p->y[:10]
this->a[:N]
(p+10)[:N]
Assume a is declared to be a 1-dimensional array with dimension size 11. The first two examples are equivalent, and the third and fourth examples are equivalent. The fifth example specifies a stride of 2 and therefore is not contiguous.
Assume b is declared to be a pointer to a 2-dimensional array with dimension sizes 10 and 10. The sixth example refers to all elements of the 2-dimensional array given by b[10]. The seventh example is a zero-length array section.
Assume c is declared to be a 3-dimensional array with dimension sizes 50, 50, and 50. The eighth example is contiguous, while the ninth and tenth examples are not contiguous.
The final four examples show array sections that are formed from more general base expressions.
The following are examples that are non-conforming array sections:
s[:10].x
p[:10]->y
*(xp[:10])
For all three examples, a base language operator is applied in an undefined manner to an array section. The only operator that may be applied to an array section is a subscript operator for which the array section appears as the postfix expression.
C / C++
Fortran
Fortran has built-in support for array sections although some restrictions apply to their use, as enumerated in the following section.
Fortran
Restrictions
Restrictions to array sections are as follows:
• An array section can appear only in clauses where it is explicitly allowed.
• A stride expression may not be specified unless otherwise stated.
C / C++
• An element of an array section with a non-zero size must have a complete type.
• The base expression of an array section must have an array or pointer type.
• If a consecutive sequence of array subscript expressions appears in an array section, and the first subscript expression in the sequence uses the extended array section syntax defined in this section, then only the last subscript expression in the sequence may select array elements that have a pointer type.
C / C++
C++
• If the type of the base expression of an array section is a reference to a type T, then the type will be considered to be T for all purposes of the array section.
• An array section cannot be used in an overloaded [] operator.
C++
Fortran
• If a stride expression is specified, it must be positive.
• The upper bound for the last dimension of an assumed-size dummy array must be specified.
• If a list item is an array section with vector subscripts, the first array element must be the lowest in the array element order of the array section.
• If a list item is an array section, the last part-ref of the list item must have a section subscript list.
Fortran
2.1.6 Iterators
Iterators are identifiers that expand to multiple values in the clause on which they appear.
The syntax of the iterator modifier is as follows:
iterator(iterators-definition)
where iterators-definition is one of the following:
iterator-specifier[, iterators-definition]
where iterator-specifier is one of the following:
[ iterator-type ] identifier = range-specification
where:
• identifier is a base language identifier.
C / C++
• iterator-type is a type name.
C / C++
Fortran
• iterator-type is a type specifier.
Fortran
• range-specification is of the form begin:end[:step], where begin and end are expressions for which their types can be converted to iterator-type and step is an integral expression.
C / C++
In an iterator-specifier, if the iterator-type is not specified then the type of that iterator is of int type.
C / C++
Fortran
In an iterator-specifier, if the iterator-type is not specified then the type of that iterator is default integer.
Fortran
In a range-specification, if the step is not specified its value is implicitly defined to be 1.
An iterator only exists in the context of the clause in which it appears. An iterator also hides all accessible symbols with the same name in the context of the clause.
The use of a variable in an expression that appears in the range-specification causes an implicit reference to the variable in all enclosing constructs.
C / C++
The values of the iterator are the set of values i_0, …, i_{N−1} where:
• i_0 = (iterator-type) begin,
• i_j = (iterator-type) (i_{j−1} + step), and
• if step > 0,
– i_0 < (iterator-type) end,
– i_{N−1} < (iterator-type) end, and
– (iterator-type) (i_{N−1} + step) ≥ (iterator-type) end;
• if step < 0,
– i_0 > (iterator-type) end,
– i_{N−1} > (iterator-type) end, and
– (iterator-type) (i_{N−1} + step) ≤ (iterator-type) end.
C / C++
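The C/C++ rules above amount to a half-open range that excludes end. The following non-normative sketch enumerates the values for an iterator of int type. The function name iterator_values is hypothetical, and two assumptions are made that the text above does not state: step is nonzero, and the range is small enough that the additions do not overflow int.

```c
#include <assert.h>
#include <stddef.h>

/* Enumerate the values of an iterator with range begin:end:step and
   iterator-type int. Up to `cap` values are stored in out; the return
   value is N, the total number of values. */
size_t iterator_values(int begin, int end, int step, int *out, size_t cap)
{
    assert(step != 0);   /* ASSUMPTION: a zero step is not handled here */
    size_t n = 0;
    /* step > 0: continue while the value is < end;
       step < 0: continue while the value is > end. */
    for (int v = begin; step > 0 ? v < end : v > end; v += step) {
        if (n < cap) {
            out[n] = v;
        }
        n++;
    }
    return n;
}
```

With the range 0:10:3 the values are 0, 3, 6 and 9, and with 10:0:-4 they are 10, 6 and 2; in both cases end itself is never produced.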
Fortran
The values of the iterator are the set of values i_1, …, i_N where:
• i_1 = begin,
• i_j = i_{j−1} + step, and
• if step > 0,
– i_1 ≤ end,
– i_N ≤ end, and
– i_N + step > end;
• if step < 0,
– i_1 ≥ end,
– i_N ≥ end, and
– i_N + step
then 1 ≤ number of threads ≤ ThreadsAvailable;
else if (dyn-var = false) and (ThreadsRequested ≤ ThreadsAvailable)
then number of threads = ThreadsRequested;
else if (dyn-var = false) and (ThreadsRequested > ThreadsAvailable)
then behavior is implementation defined;
Note – Since the initial value of the dyn-var ICV is implementation defined, programs that depend on a specific number of threads for correct execution should explicitly disable dynamic adjustment of the number of threads.
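A non-normative sketch of this advice follows. The function team_size_with_dynamic_off is a hypothetical name. It disables dynamic adjustment with omp_set_dynamic before requesting a team size with the num_threads clause, and reports the team size that was actually obtained. The runtime library calls are guarded by the _OPENMP macro so that the sketch also compiles when the directives are ignored.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Request `requested` threads with dynamic adjustment disabled and
   return the team size actually observed (1 without OpenMP). */
int team_size_with_dynamic_off(int requested)
{
    assert(requested > 0);
    int observed = 1;
#ifdef _OPENMP
    omp_set_dynamic(0);   /* set dyn-var to false */
#endif
    #pragma omp parallel num_threads(requested)
    {
        /* Exactly one thread records the team size. */
        #pragma omp single
        {
#ifdef _OPENMP
            observed = omp_get_num_threads();
#endif
        }
    }
    return observed;
}
```

If more threads are requested than are available, the behavior is implementation defined, so a program that needs an exact team size should check the returned value.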
Cross References
• nthreads-var, dyn-var, thread-limit-var, and max-active-levels-var ICVs, see Section 2.5 on page 63.
• parallel construct, see Section 2.6 on page 74.
• num_threads clause, see Section 2.6 on page 74.
• if clause, see Section 2.15 on page 220.
2.6.2 Controlling OpenMP Thread Affinity
When a thread encounters a parallel directive without a proc_bind clause, the bind-var ICV is used to determine the policy for assigning OpenMP threads to places within the current place partition, that is, within the places listed in the place-partition-var ICV for the implicit task of the encountering thread. If the parallel directive has a proc_bind clause then the binding policy specified by the proc_bind clause overrides the policy specified by the first element of the bind-var ICV. Once a thread in the team is assigned to a place, the OpenMP implementation should not move it to another place.
The master thread affinity policy instructs the execution environment to assign every thread in the team to the same place as the master thread. The place partition is not changed by this policy, and each implicit task inherits the place-partition-var ICV of the parent implicit task.
The close thread affinity policy instructs the execution environment to assign the threads in the team to places close to the place of the parent thread. The place partition is not changed by this policy, and each implicit task inherits the place-partition-var ICV of the parent implicit task. If T is the number of threads in the team, and P is the number of places in the parent’s place partition, then the assignment of threads in the team to places is as follows:
• T ≤ P : The master thread executes on the place of the parent thread. The thread with the next smallest thread number executes on the next place in the place partition, and so on, with wrap around with respect to the place partition of the master thread.
• T > P : Each place p will contain Sp threads with consecutive thread numbers where ⌊T/P⌋ ≤ Sp ≤ ⌈T/P⌉. The first S0 threads (including the master thread) are assigned to the place of the parent thread. The next S1 threads are assigned to the next place in the place partition, and so on, with wrap around with respect to the place partition of the master thread. When P does not divide T evenly, the exact number of threads in a particular place is implementation defined.
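The two cases of the close policy can be modeled with a short non-normative C sketch. Places are numbered 0 to P−1 within the place partition, and the function name close_place is hypothetical. When P does not divide T evenly the distribution is implementation defined; the sketch assumes, for illustration only, that the first T mod P places receive the extra thread.

```c
#include <assert.h>

/* Place index (0..P-1) of thread `tid` in a team of T threads under the
   close policy, where the master thread runs on `parent_place`. */
int close_place(int tid, int T, int P, int parent_place)
{
    assert(T > 0 && P > 0);
    assert(tid >= 0 && tid < T);
    assert(parent_place >= 0 && parent_place < P);

    int offset;   /* number of places past the parent's place */
    if (T <= P) {
        offset = tid;                   /* one thread per consecutive place */
    } else {
        int base  = T / P;              /* floor(T/P) threads on every place */
        int extra = T % P;              /* ASSUMPTION: first `extra` places get one more */
        int cut   = extra * (base + 1); /* threads held by the larger groups */
        offset = tid < cut ? tid / (base + 1)
                           : extra + (tid - cut) / base;
    }
    return (parent_place + offset) % P; /* wrap around the place partition */
}
```

For example, with T = 3, P = 4 and the master thread on place 2, threads 0, 1 and 2 land on places 2, 3 and 0, which shows the wrap around.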
The purpose of the spread thread affinity policy is to create a sparse distribution for a team of T threads among the P places of the parent’s place partition. A sparse distribution is achieved by first subdividing the parent partition into T subpartitions if T ≤ P , or P subpartitions if T > P . Then one thread (T ≤ P ) or a set of threads (T > P ) is assigned to each subpartition. The place-partition-var ICV of each implicit task is set to its subpartition. The subpartitioning is not only a mechanism for achieving a sparse distribution, it also defines a subset of places for a thread to use when creating a nested parallel region. The assignment of threads to places is as follows:
• T ≤ P : The parent thread’s place partition is split into T subpartitions, where each subpartition contains ⌊P/T⌋ or ⌈P/T⌉ consecutive places. A single thread is assigned to each subpartition. The master thread executes on the place of the parent thread and is assigned to the subpartition that includes that place. The thread with the next smallest thread number is assigned to the first place in the next subpartition, and so on, with wrap around with respect to the original place partition of the master thread.
• T > P : The parent thread’s place partition is split into P subpartitions, each consisting of a single place. Each subpartition is assigned Sp threads with consecutive thread numbers, where ⌊T/P⌋ ≤ Sp ≤ ⌈T/P⌉. The first S0 threads (including the master thread) are assigned to the subpartition containing the place of the parent thread. The next S1 threads are assigned to the next subpartition, and so on, with wrap around with respect to the original place partition of the master thread. When P does not divide T evenly, the exact number of threads in a particular subpartition is implementation defined.
The determination of whether the affinity request can be fulfilled is implementation defined. If the affinity request cannot be fulfilled, then the affinity of threads in the team is implementation defined.
Note – Wrap around is needed if the end of a place partition is reached before all thread assignments are done. For example, wrap around may be needed in the case of close and T ≤ P , if the master thread is assigned to a place other than the first place in the place partition. In this case, thread 1 is assigned to the place after the place of the master thread, thread 2 is assigned to the place after that, and so on. The end of the place partition may be reached before all threads are assigned. In this case, assignment of threads is resumed with the first place in the place partition.
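The close policy and its wrap around can be illustrated with a small sketch in C. The function close_place and its parameters are hypothetical names introduced only for this illustration and are not part of the OpenMP API. For T > P with an uneven division, the distribution is implementation defined; the sketch assumes one possible choice, in which the first T mod P places visited receive ⌈T/P⌉ threads and the remaining places receive ⌊T/P⌋ threads.

```c
#include <assert.h>

/* Hypothetical helper (not an OpenMP routine): returns the index, in the
 * range 0..P-1, of the place to which thread `tid` is assigned under the
 * close policy, given T threads, P places in the parent's place partition,
 * and the index of the place of the parent (master) thread. */
int close_place(int tid, int T, int P, int master_place)
{
    assert(T > 0 && P > 0);
    assert(tid >= 0 && tid < T);
    assert(master_place >= 0 && master_place < P);

    if (T <= P) {
        /* One thread per place, starting at the master's place,
         * with wrap around at the end of the place partition. */
        return (master_place + tid) % P;
    }

    /* T > P. Assumption for this sketch only: the first (T % P) places
     * visited hold ceil(T/P) threads, the others hold floor(T/P). */
    int base = T / P;               /* floor(T/P), at least 1 here */
    int extra = T % P;              /* places that hold one more thread */
    int big = extra * (base + 1);   /* threads covered by the larger places */
    int offset = (tid < big) ? tid / (base + 1)
                             : extra + (tid - big) / base;
    return (master_place + offset) % P;
}
```

For example, with T = 4, P = 6 and the master thread on place 4, the sketch assigns threads 0 to 3 to places 4, 5, 0 and 1, which is the wrap-around case described in the note.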
2.7 teams Construct
Summary
The teams construct creates a league of initial teams and the initial thread in each team executes the region.
Syntax
C / C++
The syntax of the teams construct is as follows:
#pragma omp teams [clause[ [,] clause] … ] new-line
    structured-block
where clause is one of the following:
num_teams(integer-expression)
thread_limit(integer-expression)
default(shared | none)
private(list)
firstprivate(list)
shared(list)
reduction([default ,] reduction-identifier : list)
allocate([allocator :] list)
C / C++
Fortran
The syntax of the teams construct is as follows:
!$omp teams [clause[ [,] clause] … ]
    structured-block
!$omp end teams
where clause is one of the following:
num_teams(scalar-integer-expression)
thread_limit(scalar-integer-expression)
default(shared | firstprivate | private | none)
private(list)
firstprivate(list)
shared(list)
reduction([default ,] reduction-identifier : list)
allocate([allocator :] list)
Fortran
Binding
The binding thread set for a teams region is the encountering thread.
Description
When a thread encounters a teams construct, a league of teams is created. Each team is an initial team, and the initial thread in each team executes the teams region.
The number of teams created is implementation defined, but is less than or equal to the value specified in the num_teams clause. A thread may obtain the number of initial teams created by the construct by a call to the omp_get_num_teams routine.
The maximum number of threads participating in the contention group that each team initiates is implementation defined, but is less than or equal to the value specified in the thread_limit clause.
On a combined or composite construct that includes target and teams constructs, the expressions in num_teams and thread_limit clauses are evaluated on the host device on entry to the target construct.
Once the teams are created, the number of initial teams remains constant for the duration of the teams region.
Within a teams region, initial team numbers uniquely identify each initial team. Initial team numbers are consecutive whole numbers ranging from zero to one less than the number of initial teams. A thread may obtain its own initial team number by a call to the omp_get_team_num library routine. The policy for assigning the initial threads to places is implementation defined. The teams construct sets the place-partition-var and default-device-var ICVs for each initial thread to an implementation-defined value.
After the teams have completed execution of the teams region, the encountering task resumes execution of the enclosing task region.
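The relationship between num_teams, omp_get_num_teams and omp_get_team_num can be illustrated with a minimal sketch in C. The function run_league and the constant MAX_TEAMS are hypothetical names chosen for this illustration. The sketch assumes an implementation that supports a teams construct nested directly in the implicit parallel region; the _OPENMP guards are an assumption added so that the same source also builds, with a single team, when OpenMP is not enabled.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define MAX_TEAMS 4   /* illustrative upper bound, not mandated by the API */

/* Hypothetical helper: creates a league of at most MAX_TEAMS teams,
 * records in seen[] which initial team numbers executed the region,
 * and returns the number of teams that were actually created. */
int run_league(int seen[MAX_TEAMS])
{
    int created = 1;
    assert(seen != 0);
    for (int i = 0; i < MAX_TEAMS; i++) {
        seen[i] = 0;
    }

    #pragma omp teams num_teams(MAX_TEAMS) shared(seen, created)
    {
#ifdef _OPENMP
        int team = omp_get_team_num();   /* 0 .. number of teams - 1 */
        int n = omp_get_num_teams();     /* at most MAX_TEAMS */
#else
        int team = 0;                    /* serial fallback: one team */
        int n = 1;
#endif
        seen[team] = 1;                  /* each team writes its own slot */
        if (team == 0) {
            created = n;                 /* only one team stores the count */
        }
    }
    return created;
}
```

After the call, the returned count is between 1 and MAX_TEAMS, and exactly the entries of seen[] with indices below that count are set, which reflects the rule that initial team numbers are consecutive and start at zero.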
Execution Model Events
The teams-begin event occurs in a thread that encounters a teams construct before any initial task is created for the corresponding teams region.
Upon creation of each initial task, an initial-task-begin event occurs in the thread that executes the initial task after the initial task is fully initialized but before the thread begins to execute the structured block of the teams construct.
If the teams region creates a native thread, a native-thread-begin event occurs as the first event in the context of the new thread prior to the initial-task-begin event.
When a thread finishes an initial task, an initial-task-end event occurs in the thread.
The teams-end event occurs in the thread that encounters the teams construct after the thread executes its initial-task-end event but before it resumes execution of the encountering task.
If a native thread is destroyed at the end of a teams region, a native-thread-end event occurs in the thread as the last event prior to destruction of the thread.
Tool Callbacks
A thread dispatches a registered ompt_callback_parallel_begin callback for each occurrence of a teams-begin event in that thread. The callback occurs in the task that encounters the teams construct. This callback has the type signature ompt_callback_parallel_begin_t. In the dispatched callback, (flags & ompt_parallel_league) evaluates to true.
A thread dispatches a registered ompt_callback_implicit_task callback with ompt_scope_begin as its endpoint argument for each occurrence of an initial-task-begin event in that thread. Similarly, a thread dispatches a registered ompt_callback_implicit_task callback with ompt_scope_end as its endpoint argument for each occurrence of an initial-task-end event in that thread. The callbacks occur in the context of the initial task and have type signature ompt_callback_implicit_task_t. In the dispatched callback, (flags & ompt_task_initial) evaluates to true.
A thread dispatches a registered ompt_callback_parallel_end callback for each occurrence of a teams-end event in that thread. The callback occurs in the task that encounters the teams construct. This callback has the type signature ompt_callback_parallel_end_t.
A thread dispatches a registered ompt_callback_thread_begin callback for the native-thread-begin event in that thread. The callback occurs in the context of the thread. The callback has type signature ompt_callback_thread_begin_t.
A thread dispatches a registered ompt_callback_thread_end callback for the native-thread-end event in that thread. The callback occurs in the context of the thread. The callback has type signature ompt_callback_thread_end_t.
Restrictions
Restrictions to the teams construct are as follows:
• A program that branches into or out of a teams region is non-conforming.
• A program must not depend on any ordering of the evaluations of the clauses of the teams directive, or on any side effects of the evaluation of the clauses.
• At most one thread_limit clause can appear on the directive. The thread_limit expression must evaluate to a positive integer value.
• At most one num_teams clause can appear on the directive. The num_teams expression must evaluate to a positive integer value.
• A teams region can only be strictly nested within the implicit parallel region or a target region. If a teams construct is nested within a target construct, that target construct must contain no statements, declarations or directives outside of the teams construct.
• distribute, distribute simd, distribute parallel worksharing-loop, distribute parallel worksharing-loop SIMD, parallel regions, including any parallel regions arising from combined constructs, omp_get_num_teams() regions, and omp_get_team_num() regions are the only OpenMP regions that may be strictly nested inside the teams region.
Cross References
• parallel construct, see Section 2.6 on page 74.
• distribute construct, see Section 2.9.4.1 on page 120.
• distribute simd construct, see Section 2.9.4.2 on page 123.
• allocate clause, see Section 2.11.4 on page 158.
• target construct, see Section 2.12.5 on page 170.
• default, shared, private, firstprivate, and reduction clauses, see Section 2.19.4 on page 282.
• omp_get_num_teams routine, see Section 3.2.38 on page 373.
• omp_get_team_num routine, see Section 3.2.39 on page 374.
• ompt_callback_thread_begin_t, see Section 4.5.2.1 on page 459.
• ompt_callback_thread_end_t, see Section 4.5.2.2 on page 460.
• ompt_callback_parallel_begin_t, see Section 4.5.2.3 on page 461.
• ompt_callback_parallel_end_t, see Section 4.5.2.4 on page 463.
• ompt_callback_implicit_task_t, see Section 4.5.2.11 on page 471.
2.8 Worksharing Constructs
A worksharing construct distributes the execution of the corresponding region among the members of the team that encounters it. Threads execute portions of the region in the context of the implicit tasks that each one is executing. If the team consists of only one thread then the worksharing region is not executed in parallel.
A worksharing region has no barrier on entry; however, an implied barrier exists at the end of the worksharing region, unless a nowait clause is specified. If a nowait clause is present, an implementation may omit the barrier at the end of the worksharing region. In this case, threads that finish early may proceed straight to the instructions that follow the worksharing region without waiting for the other members of the team to finish the worksharing region, and without performing a flush operation.
The OpenMP API defines the worksharing constructs that are described in this section as well as the worksharing-loop construct, which is described in Section 2.9.2 on page 101.
Restrictions
The following restrictions apply to worksharing constructs:
• Each worksharing region must be encountered by all threads in a team or by none at all, unless cancellation has been requested for the innermost enclosing parallel region.
• The sequence of worksharing regions and barrier regions encountered must be the same for every thread in a team.
2.8.1 sections Construct
Summary
The sections construct is a non-iterative worksharing construct that contains a set of structured blocks that are to be distributed among and executed by the threads in a team. Each structured block is executed once by one of the threads in the team in the context of its implicit task.
Syntax
C / C++
The syntax of the sections construct is as follows:
#pragma omp sections [clause[ [,] clause] … ] new-line
{
[#pragma omp section new-line]
    structured-block
[#pragma omp section new-line
    structured-block]
…
}
where clause is one of the following:
private(list)
firstprivate(list)
lastprivate([ lastprivate-modifier:] list)
reduction([reduction-modifier ,] reduction-identifier : list)
allocate([allocator :] list)
nowait
C / C++
Fortran
The syntax of the sections construct is as follows:
!$omp sections [clause[ [,] clause] … ]
[!$omp section]
    structured-block
[!$omp section
    structured-block]
…
!$omp end sections [nowait]
where clause is one of the following:
private(list)
firstprivate(list)
lastprivate([ lastprivate-modifier:] list)
reduction([reduction-modifier ,] reduction-identifier : list)
allocate([allocator :] list)
Fortran
Binding
The binding thread set for a sections region is the current team. A sections region binds to the innermost enclosing parallel region. Only the threads of the team that executes the binding parallel region participate in the execution of the structured blocks and the implied barrier of the sections region if the barrier is not eliminated by a nowait clause.
Description
Each structured block in the sections construct is preceded by a section directive except possibly the first block, for which a preceding section directive is optional.
The method of scheduling the structured blocks among the threads in the team is implementation defined.
There is an implicit barrier at the end of a sections construct unless a nowait clause is specified.
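The following C sketch illustrates the description above with two independent structured blocks, each preceded by a section directive. The function fill_in_sections and its parameters are hypothetical names introduced for this illustration. Which thread executes which block is implementation defined, so the sketch only relies on each block being executed exactly once.

```c
#include <assert.h>

/* Hypothetical helper: fills two independent arrays of length n,
 * one per structured block of the sections construct. */
void fill_in_sections(int n, int *squares, int *cubes)
{
    assert(n >= 0);

    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            {
                /* First block: executed once by one thread of the team. */
                for (int i = 0; i < n; i++) {
                    squares[i] = i * i;
                }
            }
            #pragma omp section
            {
                /* Second block: may run on a different thread. */
                for (int i = 0; i < n; i++) {
                    cubes[i] = i * i * i;
                }
            }
        }   /* implicit barrier, since no nowait clause is specified */
    }
}
```

Because the two blocks write to disjoint arrays, no further synchronization is needed inside the sections region. If the directives are ignored, the blocks simply execute one after the other and produce the same result.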
Execution Model Events
The sections-begin event occurs after an implicit task encounters a sections construct but before the task executes any structured block of the sections region.
The sections-end event occurs after an implicit task finishes execution of a sections region but before it resumes execution of the enclosing context.
The section-begin event occurs before an implicit task starts to execute a structured block in the sections construct for each of those structured blocks that the task executes.
Tool Callbacks
A thread dispatches a registered ompt_callback_work callback with ompt_scope_begin as its endpoint argument and ompt_work_sections as its wstype argument for each occurrence of a sections-begin event in that thread. Similarly, a thread dispatches a registered ompt_callback_work callback with ompt_scope_end as its endpoint argument and ompt_work_sections as its wstype argument for each occurrence of a sections-end event in that thread. The callbacks occur in the context of the implicit task. The callbacks have type signature ompt_callback_work_t.
A thread dispatches a registered ompt_callback_dispatch callback for each occurrence of a section-begin event in that thread. The callback occurs in the context of the implicit task. The callback has type signature ompt_callback_dispatch_t.
Restrictions
Restrictions to the sections construct are as follows:
• Orphaned section directives are prohibited. That is, the section directives must appear within the sections construct and must not be encountered elsewhere in the sections region.
• The code enclosed in a sections construct must be a structured block.
• Only a single nowait clause can appear on a sections directive.
C++
• A throw executed inside a sections region must cause execution to resume within the same section of the sections region, and the same thread that threw the exception must catch it.
C++
Cross References
• allocate clause, see Section 2.11.4 on page 158.
• private, firstprivate, lastprivate, and reduction clauses, see Section 2.19.4 on page 282.
• ompt_scope_begin and ompt_scope_end, see Section 4.4.4.11 on page 443.
• ompt_work_sections, see Section 4.4.4.15 on page 445.
• ompt_callback_work_t, see Section 4.5.2.5 on page 464.
• ompt_callback_dispatch_t, see Section 4.5.2.6 on page 465.
2.8.2 single Construct
Summary
The single construct specifies that the associated structured block is executed by only one of the threads in the team (not necessarily the master thread), in the context of its implicit task. The other threads in the team, which do not execute the block, wait at an implicit barrier at the end of the single construct unless a nowait clause is specified.
Syntax
C / C++
The syntax of the single construct is as follows:
#pragma omp single [clause[ [,] clause] … ] new-line
    structured-block
where clause is one of the following:
private(list)
firstprivate(list)
copyprivate(list)
allocate([allocator :] list)
nowait
C / C++
Fortran
The syntax of the single construct is as follows:
!$omp single [clause[ [,] clause] … ]
    structured-block
!$omp end single [end_clause[ [,] end_clause] … ]
where clause is one of the following:
private(list)
firstprivate(list)
allocate([allocator :] list)
and end_clause is one of the following:
copyprivate(list)
nowait
Fortran
Binding
The binding thread set for a single region is the current team. A single region binds to the innermost enclosing parallel region. Only the threads of the team that executes the binding parallel region participate in the execution of the structured block and the implied barrier of the single region if the barrier is not eliminated by a nowait clause.
Description
Only one of the encountering threads will execute the structured block associated with the single construct. The method of choosing a thread to execute the structured block each time the team encounters the construct is implementation defined. There is an implicit barrier at the end of the single construct unless a nowait clause is specified.
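A minimal C sketch of this behavior follows. The function count_single_executions is a hypothetical name used only for this illustration. It counts how often the structured block of a single construct runs inside a parallel region, which is once regardless of the team size or of which thread is chosen.

```c
#include <assert.h>

/* Hypothetical helper: returns the number of times the structured block
 * of the single construct was executed by the team. */
int count_single_executions(void)
{
    int executions = 0;

    #pragma omp parallel shared(executions)
    {
        #pragma omp single
        {
            /* Executed by exactly one thread of the team; the choice of
             * thread is implementation defined. */
            executions++;
        }   /* implicit barrier: the other threads wait here */
    }

    assert(executions >= 0);
    return executions;
}
```

Since no nowait clause is specified, the update to executions is complete before any thread proceeds past the end of the single construct.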
Execution Model Events
The single-begin event occurs after an implicit task encounters a single construct but before the task starts to execute the structured block of the single region.
The single-end event occurs after an implicit task finishes execution of a single region but before it resumes execution of the enclosing region.
Tool Callbacks
A thread dispatches a registered ompt_callback_work callback with ompt_scope_begin as its endpoint argument for each occurrence of a single-begin event in that thread. Similarly, a thread dispatches a registered ompt_callback_work callback with ompt_scope_end as its endpoint argument for each occurrence of a single-end event in that thread. For each of these callbacks, the wstype argument is ompt_work_single_executor if the thread executes the structured block associated with the single region; otherwise, the wstype argument is ompt_work_single_other. The callback has type signature ompt_callback_work_t.
Restrictions
Restrictions to the single construct are as follows:
• The copyprivate clause must not be used with the nowait clause.
• At most one nowait clause can appear on a single construct.
C++
• A throw executed inside a single region must cause execution to resume within the same single region, and the same thread that threw the exception must catch it.
C++
Cross References
• allocate clause, see Section 2.11.4 on page 158.
• private and firstprivate clauses, see Section 2.19.4 on page 282.
• copyprivate clause, see Section 2.19.6.2 on page 312.
• ompt_scope_begin and ompt_scope_end, see Section 4.4.4.11 on page 443.
• ompt_work_single_executor and ompt_work_single_other, see Section 4.4.4.15 on page 445.
• ompt_callback_work_t, see Section 4.5.2.5 on page 464.
Fortran
2.8.3 workshare Construct
Summary
The workshare construct divides the execution of the enclosed structured block into separate units of work, and causes the threads of the team to share the work such that each unit is executed only once by one thread, in the context of its implicit task.
Syntax
The syntax of the workshare construct is as follows:
!$omp workshare
    structured-block
!$omp end workshare [nowait]
Binding
The binding thread set for a workshare region is the current team. A workshare region binds to the innermost enclosing parallel region. Only the threads of the team that executes the binding parallel region participate in the execution of the units of work and the implied barrier of the workshare region if the barrier is not eliminated by a nowait clause.
Description
There is an implicit barrier at the end of a workshare construct unless a nowait clause is specified.
An implementation of the workshare construct must insert any synchronization that is required to maintain standard Fortran semantics. For example, the effects of one statement within the structured block must appear to occur before the execution of succeeding statements, and the evaluation of the right hand side of an assignment must appear to complete prior to the effects of assigning to the left hand side.
The statements in the workshare construct are divided into units of work as follows:
• For array expressions within each statement, including transformational array intrinsic functions that compute scalar values from arrays:
  – Evaluation of each element of the array expression, including any references to ELEMENTAL functions, is a unit of work.
  – Evaluation of transformational array intrinsic functions may be freely subdivided into any number of units of work.
• For an array assignment statement, the assignment of each element is a unit of work.
• For a scalar assignment statement, the assignment operation is a unit of work.
• For a WHERE statement or construct, the evaluation of the mask expression and the masked assignments are each a unit of work.
• For a FORALL statement or construct, the evaluation of the mask expression, expressions occurring in the specification of the iteration space, and the masked assignments are each a unit of work.
• For an atomic construct, the atomic operation on the storage location designated as x is a unit of work.
• For a critical construct, the construct is a single unit of work.
• For a parallel construct, the construct is a unit of work with respect to the workshare construct. The statements contained in the parallel construct are executed by a new thread team.
• If none of the rules above apply to a portion of a statement in the structured block, then that portion is a unit of work.
The transformational array intrinsic functions are MATMUL, DOT_PRODUCT, SUM, PRODUCT, MAXVAL, MINVAL, COUNT, ANY, ALL, SPREAD, PACK, UNPACK, RESHAPE, TRANSPOSE, EOSHIFT, CSHIFT, MINLOC, and MAXLOC.
It is unspecified how the units of work are assigned to the threads executing a workshare region.
If an array expression in the block references the value, association status, or allocation status of private variables, the value of the expression is undefined, unless the same value would be computed by every thread.
If an array assignment, a scalar assignment, a masked array assignment, or a FORALL assignment assigns to a private variable in the block, the result is unspecified.
The workshare directive causes the sharing of work to occur only in the workshare construct, and not in the remainder of the workshare region.
Execution Model Events
The workshare-begin event occurs after an implicit task encounters a workshare construct but before the task starts to execute the structured block of the workshare region.
The workshare-end event occurs after an implicit task finishes execution of a workshare region but before it resumes execution of the enclosing context.
Tool Callbacks
A thread dispatches a registered ompt_callback_work callback with ompt_scope_begin as its endpoint argument and ompt_work_workshare as its wstype argument for each occurrence of a workshare-begin event in that thread. Similarly, a thread dispatches a registered ompt_callback_work callback with ompt_scope_end as its endpoint argument and ompt_work_workshare as its wstype argument for each occurrence of a workshare-end event in that thread. The callbacks occur in the context of the implicit task. The callbacks have type signature ompt_callback_work_t.
Restrictions
The following restrictions apply to the workshare construct:
• The only OpenMP constructs that may be closely nested inside a workshare construct are the atomic, critical, and parallel constructs.
• Base language statements that are encountered inside a workshare construct but that are not enclosed within a parallel construct that is nested inside the workshare construct must consist of only the following:
  – array assignments
  – scalar assignments
  – FORALL statements
  – FORALL constructs
  – WHERE statements
  – WHERE constructs
• All array assignments, scalar assignments, and masked array assignments that are encountered inside a workshare construct but are not nested inside a parallel construct that is nested inside the workshare construct must be intrinsic assignments.
• The construct must not contain any user defined function calls unless the function is ELEMENTAL or the function call is contained inside a parallel construct that is nested inside the workshare construct.
Cross References
• parallel construct, see Section 2.6 on page 74.
• critical construct, see Section 2.17.1 on page 223.
• atomic construct, see Section 2.17.7 on page 234.
• ompt_scope_begin and ompt_scope_end, see Section 4.4.4.11 on page 443.
• ompt_work_workshare, see Section 4.4.4.15 on page 445.
• ompt_callback_work_t, see Section 4.5.2.5 on page 464.
Fortran
2.9 Loop-Related Directives
2.9.1 Canonical Loop Form
C / C++
The loops associated with a loop-associated directive have canonical loop form if they conform to the following:
for (init-expr; test-expr; incr-expr) structured-block

init-expr       One of the following:
                var = lb
                integer-type var = lb
                random-access-iterator-type var = lb
                pointer-type var = lb

test-expr       One of the following:
                var relational-op b
                b relational-op var

incr-expr       One of the following:
                ++var
                var++
                --var
                var--
                var += incr
                var -= incr
                var = var + incr
                var = incr + var
                var = var - incr

var             One of the following:
                A variable of a signed or unsigned integer type.
                For C++, a variable of a random access iterator type.
                For C, a variable of a pointer type.
                This variable must not be modified during the execution of the for-loop other than in incr-expr.

relational-op   One of the following:
                <
                <=
                >
                >=
                !=

lb and b        Expressions of a type compatible with the type of var that are loop invariant with respect to the outermost associated loop or are one of the following (where var-outer, a1, and a2 have a type compatible with the type of var, var-outer is var from an outer associated loop, and a1 and a2 are loop invariant integer expressions with respect to the outermost loop):
                var-outer
                var-outer + a2
                a2 + var-outer
                var-outer - a2
                a2 - var-outer
                a1 * var-outer
                a1 * var-outer + a2
                a2 + a1 * var-outer
                a1 * var-outer - a2
                a2 - a1 * var-outer
                var-outer * a1
                var-outer * a1 + a2
                a2 + var-outer * a1
                var-outer * a1 - a2
                a2 - var-outer * a1

incr            An integer expression that is loop invariant with respect to the outermost associated loop.
C / C++
Fortran
The loops associated with a loop-associated directive have canonical loop form if each of them is a do-loop that is a do-construct or an inner-shared-do-construct as defined by the Fortran standard. If an end do directive follows a do-construct in which several loop statements share a DO termination statement, then the directive can only be specified for the outermost of these DO statements.
The do-stmt for any do-loop must conform to the following:
DO [ label ] var = lb , b [ , incr ]

var             A variable of integer type.

lb and b        Expressions of a type compatible with the type of var that are loop invariant with respect to the outermost associated loop or are one of the following (where var-outer, a1, and a2 have a type compatible with the type of var, var-outer is var from an outer associated loop, and a1 and a2 are loop invariant integer expressions with respect to the outermost loop):
                var-outer
                var-outer + a2
                a2 + var-outer
                var-outer - a2
                a2 - var-outer
                a1 * var-outer
                a1 * var-outer + a2
                a2 + a1 * var-outer
                a1 * var-outer - a2
                a2 - a1 * var-outer
                var-outer * a1
                var-outer * a1 + a2
                a2 + var-outer * a1
                var-outer * a1 - a2
                a2 - var-outer * a1

incr            An integer expression that is loop invariant with respect to the outermost associated loop. If it is not explicitly specified, its value is assumed to be 1.
Fortran
The canonical form allows the iteration count of all associated loops to be computed before executing the outermost loop. The incr and range-expr are evaluated before executing the loop-associated construct. If b or lb is loop invariant with respect to the outermost associated loop, it is evaluated before executing the loop-associated construct. If b or lb is not loop invariant with respect to the outermost associated loop, a1 and/or a2 are evaluated before executing the loop-associated construct. The computation is performed for each loop in an integer type. This type is derived from the type of var as follows:
• If var is of an integer type, then the type is the type of var.
C++
• If var is of a random access iterator type, then the type is the type that would be used by std::distance applied to variables of the type of var.
C++
C
• If var is of a pointer type, then the type is ptrdiff_t.
C
The behavior is unspecified if any intermediate result required to compute the iteration count cannot be represented in the type determined above.
There is no implied synchronization during the evaluation of the lb, b, or incr expressions. It is unspecified whether, in what order, or how many times any side effects within the lb, b, or incr expressions occur.
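The way an iteration count can be derived from lb, b, and incr before the loop runs can be sketched in C as follows. The function iteration_count is a hypothetical helper for this illustration. It assumes a loop of the form for (var = lb; var < b; var += incr) when incr is positive, or for (var = lb; var > b; var += incr) when incr is negative, and it uses long as a stand-in for the integer type derived from var. It is one possible formula, not the method that any particular implementation is required to use.

```c
#include <assert.h>

/* Hypothetical helper: number of iterations of a canonical loop with
 * lower bound lb, bound b, and loop-invariant increment incr, for the
 * test forms var < b (incr > 0) and var > b (incr < 0). */
long iteration_count(long lb, long b, long incr)
{
    assert(incr != 0);

    if (incr > 0) {
        /* ceil((b - lb) / incr), or 0 if the loop body never executes. */
        return (b > lb) ? (b - lb + incr - 1) / incr : 0;
    }

    /* Negative increment: ceil((lb - b) / -incr), or 0. */
    return (lb > b) ? (lb - b - incr - 1) / (-incr) : 0;
}
```

Intermediate results such as b - lb + incr - 1 may not be representable in the chosen type for extreme bounds, which is the situation in which the behavior is unspecified.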
Note – Random access iterators are required to support random access to elements in constant time. Other iterators are precluded by the restrictions since they can take linear time or offer limited functionality. The use of tasks to parallelize those cases is therefore advisable.
C++
A range-based for loop that is valid in the base language and has a begin value that satisfies the random access iterator requirement has canonical loop form. Range-based for loops are of the following form:
for (range-decl: range-expr) structured-block
The begin-expr and end-expr expressions are derived from range-expr by the base language and assigned to variables to which this specification refers as __begin and __end respectively. Both __begin and __end are privatized. For the purpose of the rest of the standard __begin is the iteration variable of the range-for loop.
C++
Restrictions
The following restrictions also apply:
C / C++
• If test-expr is of the form var relational-op b and relational-op is < or <= then incr-expr must cause var to increase on each iteration of the loop. If test-expr is of the form var relational-op b and relational-op is > or >= then incr-expr must cause var to decrease on each iteration of the loop.
• If test-expr is of the form b relational-op var and relational-op is < or <= then incr-expr must cause var to decrease on each iteration of the loop. If test-expr is of the form b relational-op var and relational-op is > or >= then incr-expr must cause var to increase on each iteration of the loop.
• If test-expr is of the form b != var or var != b then incr-expr must cause var either to increase on each iteration of the loop or to decrease on each iteration of the loop.
• If relational-op is != and incr-expr is of the form that has incr then incr must be a constant expression and evaluate to -1 or 1.
C / C++
C++
• In the simd construct the only random access iterator types that are allowed for var are pointer types.
• The range-expr of a range-for loop must be loop invariant with respect to the outermost associated loop, and must not reference iteration variables of any associated loops.
• The loops associated with an ordered clause with a parameter may not include range-for loops.
C++
• The b, lb, incr, and range-expr expressions may not reference any var or member of the range-decl of any enclosed associated loop.
• For any associated loop where the b or lb expression is not loop invariant with respect to the outermost loop, the var-outer that appears in the expression may not have a random access iterator type.
• For any associated loop where b or lb is not loop invariant with respect to the outermost loop, the expression b − lb will have the form c ∗ var-outer + d, where c and d are loop invariant integer expressions. Let incr-outer be the incr expression of the outer loop referred to by var-outer. The value of c ∗ incr-outer mod incr must be 0.
Cross References
• simd construct, see Section 2.9.3.1 on page 110.
• lastprivate clause, see Section 2.19.4.5 on page 288.
• linear clause, see Section 2.19.4.6 on page 290.
2.9.2 Worksharing-Loop Construct
Summary
The worksharing-loop construct specifies that the iterations of one or more associated loops will be executed in parallel by threads in the team in the context of their implicit tasks. The iterations are distributed across threads that already exist in the team that is executing the parallel region to which the worksharing-loop region binds.
Syntax
C / C++
The syntax of the worksharing-loop construct is as follows:
#pragma omp for [clause[ [,] clause] ... ] new-line
    for-loops
where clause is one of the following:
private(list)
firstprivate(list)
lastprivate([ lastprivate-modifier:] list)
linear(list[ : linear-step])
reduction([ reduction-modifier,]reduction-identifier : list)
schedule([modifier [, modifier]:]kind[, chunk_size])
collapse(n)
ordered[(n)]
nowait
allocate([allocator :]list)
order(concurrent)
The for directive places restrictions on the structure of all associated for-loops. Specifically, all associated for-loops must have canonical loop form (see Section 2.9.1 on page 95).
C / C++
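As a minimal sketch of the C syntax above, the following function places a worksharing-loop inside a parallel region so that a team exists to share the iterations. The function name scale_array and its parameters are illustrative assumptions, not part of the specification. If the code is compiled without OpenMP support, the pragmas are ignored and the loop runs sequentially with the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the specification):
   multiplies each element of a[0..n-1] by factor. */
void scale_array(double *a, int n, double factor)
{
    assert(a != NULL && n >= 0);

    #pragma omp parallel
    {
        /* The loop has canonical loop form: i is initialized once,
           tested with <, and incremented by a loop-invariant step. */
        #pragma omp for
        for (int i = 0; i < n; i++) {
            a[i] *= factor;
        }
        /* No nowait clause is present, so an implicit barrier
           follows the loop. */
    }
}
```

Calling scale_array(a, 4, 2.0) on an array holding 1, 2, 3, 4 would leave 2, 4, 6, 8 in place, regardless of how many threads the team contains.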
Fortran
The syntax of the worksharing-loop construct is as follows:
!$omp do [clause[ [,] clause] ... ]
    do-loops
[!$omp end do [nowait]]
where clause is one of the following:
private(list)
firstprivate(list)
lastprivate([ lastprivate-modifier:] list)
linear(list[ : linear-step])
reduction([ reduction-modifier,]reduction-identifier : list)
schedule([modifier [, modifier]:]kind[, chunk_size])
collapse(n)
ordered[(n)]
allocate([allocator :]list)
order(concurrent)
If an end do directive is not specified, an end do directive is assumed at the end of the do-loops.
The do directive places restrictions on the structure of all associated do-loops. Specifically, all associated do-loops must have canonical loop form (see Section 2.9.1 on page 95).
Fortran
Binding
The binding thread set for a worksharing-loop region is the current team. A worksharing-loop region binds to the innermost enclosing parallel region. Only the threads of the team executing the binding parallel region participate in the execution of the loop iterations and the implied barrier of the worksharing-loop region if the barrier is not eliminated by a nowait clause.
Description
The worksharing-loop construct is associated with a loop nest that consists of one or more loops that follow the directive.
There is an implicit barrier at the end of a worksharing-loop construct unless a nowait clause is specified.
The collapse clause may be used to specify how many loops are associated with the worksharing-loop construct. The parameter of the collapse clause must be a constant positive integer expression. If a collapse clause is specified with a parameter value greater than 1, then the iterations of the associated loops to which the clause applies are collapsed into one larger iteration space that is then divided according to the schedule clause. The sequential execution of the iterations in these associated loops determines the order of the iterations in the collapsed iteration space. If no collapse clause is present or its parameter is 1, the only loop that is associated with the worksharing-loop construct for the purposes of determining how the iteration space is divided according to the schedule clause is the one that immediately follows the worksharing-loop directive.
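The effect of the collapse clause can be sketched with a small, perfectly nested loop pair. The names fill_grid, ROWS, and COLS are assumptions made for this illustration. With collapse(2), the ROWS ∗ COLS iterations form a single iteration space that is divided among the threads, rather than only the ROWS iterations of the outer loop.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 4
#define COLS 6

/* Hypothetical helper (not from the specification):
   fills m[i][j] with i * COLS + j. */
void fill_grid(int m[ROWS][COLS])
{
    assert(m != NULL);

    #pragma omp parallel
    {
        /* collapse(2) associates both loops with the construct.
           There is no intervening code between the two loops, and
           the inner bounds do not depend on i. */
        #pragma omp for collapse(2)
        for (int i = 0; i < ROWS; i++) {
            for (int j = 0; j < COLS; j++) {
                m[i][j] = i * COLS + j;
            }
        }
    }
}
```

Each element is written by exactly one logical iteration, so the result does not depend on how the collapsed iteration space is scheduled.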
If more than one loop is associated with the worksharing-loop construct then the number of times that any intervening code between any two associated loops will be executed is unspecified but will be at least once per iteration of the loop enclosing the intervening code and at most once per iteration of the innermost loop associated with the construct. If the iteration count of any loop that is associated with the worksharing-loop construct is zero and that loop does not enclose the intervening code, the behavior is unspecified.
The integer type (or kind, for Fortran) used to compute the iteration count for the collapsed loop is implementation defined.
A worksharing-loop has logical iterations numbered 0,1,…,N-1 where N is the number of loop iterations, and the logical numbering denotes the sequence in which the iterations would be executed if a set of associated loop(s) were executed sequentially. At the beginning of each logical iteration, the loop iteration variable of each associated loop has the value that it would have if the set of the associated loop(s) were executed sequentially. The schedule clause specifies how iterations of these associated loops are divided into contiguous non-empty subsets, called chunks, and how these chunks are distributed among threads of the team. Each thread executes its assigned chunk(s) in the context of its implicit task. The iterations of a given chunk are executed in sequential order by the assigned thread. The chunk_size expression is evaluated using the original list items of any variables that are made private in the worksharing-loop construct. It is unspecified whether, in what order, or how many times, any side effects of the evaluation of this expression occur. The use of a variable in a schedule clause expression of a worksharing-loop construct causes an implicit reference to the variable in all enclosing constructs.
Different worksharing-loop regions with the same schedule and iteration count, even if they occur in the same parallel region, can distribute iterations among threads differently. The only exception is for the static schedule as specified in Table 2.5. Programs that depend on which thread executes a particular iteration under any other circumstances are non-conforming.
See Section 2.9.2.1 on page 109 for details of how the schedule for a worksharing-loop region is determined.
The schedule kind can be one of those specified in Table 2.5.
The schedule modifier can be one of those specified in Table 2.6. If the static schedule kind is specified or if the ordered clause is specified, and if the nonmonotonic modifier is not specified, the effect is as if the monotonic modifier is specified. Otherwise, unless the monotonic modifier is specified, the effect is as if the nonmonotonic modifier is specified. If a schedule clause specifies a modifier then that modifier overrides any modifier that is specified in the run-sched-var ICV.
The ordered clause with the parameter may also be used to specify how many loops are associated with the worksharing-loop construct. The parameter of the ordered clause must be a constant positive integer expression if specified. The parameter of the ordered clause does not affect how the logical iteration space is then divided. If an ordered clause with the parameter is specified for the worksharing-loop construct, then those associated loops form a doacross loop nest.
If the value of the parameter in the collapse or ordered clause is larger than the number of nested loops following the construct, the behavior is unspecified.
If an order(concurrent) clause is present, then after assigning the iterations of the associated loops to their respective threads, as specified in Table 2.5, the iterations may be executed in any order, including concurrently.
TABLE 2.5: schedule Clause kind Values
static
When kind is static, iterations are divided into chunks of size chunk_size, and the chunks are assigned to the threads in the team in a round-robin fashion in the order of the thread number. Each chunk contains chunk_size iterations, except for the chunk that contains the sequentially last iteration, which may have fewer iterations.
When no chunk_size is specified, the iteration space is divided into chunks that are approximately equal in size, and at most one chunk is distributed to each thread. The size of the chunks is unspecified in this case.
A compliant implementation of the static schedule must ensure that the same assignment of logical iteration numbers to threads will be used in two worksharing-loop regions if the following conditions are satisfied: 1) both worksharing-loop regions have the same number of loop iterations, 2) both worksharing-loop regions have the same value of chunk_size specified, or both worksharing-loop regions have no chunk_size specified, 3) both worksharing-loop regions bind to the same parallel region, and 4) neither loop is associated with a SIMD construct. A data dependence between the same logical iterations in two such loops is guaranteed to be satisfied allowing safe use of the nowait clause.
dynamic
When kind is dynamic, the iterations are distributed to threads in the team in chunks. Each thread executes a chunk of iterations, then requests another chunk, until no chunks remain to be distributed.
Each chunk contains chunk_size iterations, except for the chunk that contains the sequentially last iteration, which may have fewer iterations.
When no chunk_size is specified, it defaults to 1.
guided
When kind is guided, the iterations are assigned to threads in the team in chunks. Each thread executes a chunk of iterations, then requests another chunk, until no chunks remain to be assigned.
For a chunk_size of 1, the size of each chunk is proportional to the number of unassigned iterations divided by the number of threads in the team, decreasing to 1. For a chunk_size with value k (greater than 1), the size of each chunk is determined in the same way, with the restriction that the chunks do not contain fewer than k iterations (except for the chunk that contains the sequentially last iteration, which may have fewer than k iterations).
When no chunk_size is specified, it defaults to 1.
auto
When kind is auto, the decision regarding scheduling is delegated to the compiler and/or runtime system. The programmer gives the implementation the freedom to choose any possible mapping of iterations to threads in the team.
runtime
When kind is runtime, the decision regarding scheduling is deferred until run time, and the schedule and chunk size are taken from the run-sched-var ICV. If the ICV is set to auto, the schedule is implementation defined.
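The guarantee stated for the static schedule in Table 2.5 can be illustrated with two consecutive loops in one parallel region. In this sketch the function two_phase and its parameters are assumed names. Both loops have n iterations, specify no chunk_size, bind to the same parallel region, and are not associated with a SIMD construct, so logical iteration i should be executed by the same thread in both loops and the nowait clause on the first loop is safe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the specification):
   computes b[i] = 2 * a[i], then c[i] = b[i] + 1. */
void two_phase(const int *a, int *b, int *c, int n)
{
    assert(a != NULL && b != NULL && c != NULL && n >= 0);

    #pragma omp parallel
    {
        /* nowait removes the barrier after this loop. */
        #pragma omp for schedule(static) nowait
        for (int i = 0; i < n; i++) {
            b[i] = 2 * a[i];
        }

        /* Iteration i reads only b[i], which the same thread wrote
           in iteration i of the previous loop. */
        #pragma omp for schedule(static)
        for (int i = 0; i < n; i++) {
            c[i] = b[i] + 1;
        }
    }
}
```

If the second loop read b[i + 1] instead, or used a different schedule kind, the dependence would no longer be between the same logical iterations and the nowait clause would have to be removed.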
Note – For a team of p threads and a loop of n iterations, let ⌈n/p⌉ be the integer q that satisfies n = p ∗ q − r, with 0 <= r < p. One compliant implementation of the static schedule (with no specified chunk_size) would behave as though chunk_size had been specified with value q. Another compliant implementation would assign q iterations to the first p − r threads, and q − 1 iterations to the remaining r threads. This illustrates why a conforming program must not rely on the details of a particular implementation.
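The two static distributions described in the note can be compared with a short calculation in plain C. The function names below are assumptions for this sketch, and neither function represents what any particular OpenMP runtime does.

```c
#include <assert.h>

/* q = ceil(n / p) for p > 0 and n >= 0. */
int ceil_div(int n, int p)
{
    assert(p > 0 && n >= 0);
    return (n + p - 1) / p;
}

/* First variant in the note: behave as if chunk_size were q, so
   thread t (0-based) gets iterations [t*q, min(n, (t+1)*q)).
   Returns the number of iterations given to thread t. */
int static_count_blocked(int n, int p, int t)
{
    assert(t >= 0 && t < p);
    int q = ceil_div(n, p);
    int lo = t * q;
    int hi = lo + q;
    if (lo > n) lo = n;
    if (hi > n) hi = n;
    return hi - lo;
}

/* Second variant in the note: the first p - r threads get q
   iterations and the remaining r threads get q - 1, where
   n = p * q - r and 0 <= r < p. */
int static_count_balanced(int n, int p, int t)
{
    assert(t >= 0 && t < p);
    int q = ceil_div(n, p);
    int r = p * q - n;
    return (t < p - r) ? q : q - 1;
}
```

For n = 10 and p = 4 (so q = 3 and r = 2), the first variant gives per-thread counts 3, 3, 3, 1 while the second gives 3, 3, 2, 2. Both cover all 10 iterations, which is why a program cannot assume either mapping.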
A compliant implementation of the guided schedule with a chunk_size value of k would assign q = ⌈n/p⌉ iterations to the first available thread and set n to the larger of n − q and p ∗ k. It would then repeat this process until q is greater than or equal to the number of remaining iterations, at which time the remaining iterations form the final chunk. Another compliant implementation could use the same method, except with q = ⌈n/(2p)⌉, and set n to the larger of n − q and 2 ∗ p ∗ k.
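The first guided method in the note can be simulated as follows. This is one reading of the note, under the assumption that the running value of n used to compute q is tracked separately from the true number of unassigned iterations. The function name guided_chunks and its output-array interface are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simulates successive chunk sizes for n iterations, p threads and
   chunk_size k. Writes at most max_out sizes to out[] and returns
   the number of chunks written. Hypothetical helper for illustration. */
int guided_chunks(int n, int p, int k, int *out, int max_out)
{
    assert(n >= 0 && p > 0 && k > 0 && out != NULL && max_out > 0);

    int remaining = n;  /* iterations not yet assigned */
    int virt = n;       /* the note's running value of n */
    int count = 0;

    while (remaining > 0 && count < max_out) {
        int q = (virt + p - 1) / p;          /* q = ceil(n / p) */
        if (q >= remaining) {
            out[count++] = remaining;        /* final chunk */
            break;
        }
        out[count++] = q;
        remaining -= q;
        /* set n to the larger of n - q and p * k */
        virt = (virt - q > p * k) ? virt - q : p * k;
    }
    return count;
}
```

Under this reading, n = 100, p = 4 and k = 7 yields chunk sizes 25, 19, 14, 11, 8, 7, 7, 7, 2: the sizes shrink until they reach k, and only the final chunk is smaller than k.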
TABLE 2.6: schedule Clause modifier Values
monotonic
When the monotonic modifier is specified then each thread executes the chunks that it is assigned in increasing logical iteration order.
nonmonotonic
When the nonmonotonic modifier is specified then chunks are assigned to threads in any order and the behavior of an application that depends on any execution order of the chunks is unspecified.
simd
When the simd modifier is specified and the loop is associated with a SIMD construct, the chunk_size for all chunks except the first and last chunks is new_chunk_size = ⌈chunk_size/simd_width⌉ ∗ simd_width where simd_width is an implementation-defined value. The first chunk will have at least new_chunk_size iterations except if it is also the last chunk. The last chunk may have fewer iterations than new_chunk_size. If the simd modifier is specified and the loop is not associated with a SIMD construct, the modifier is ignored.
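The rounding applied by the simd modifier can be written as a one-line integer computation. In this sketch simd_width is passed as a parameter only for illustration; in an implementation it is an implementation-defined value that the program does not choose.

```c
#include <assert.h>

/* new_chunk_size = ceil(chunk_size / simd_width) * simd_width.
   Hypothetical helper; simd_width is implementation defined. */
int simd_chunk_size(int chunk_size, int simd_width)
{
    assert(chunk_size > 0 && simd_width > 0);
    return ((chunk_size + simd_width - 1) / simd_width) * simd_width;
}
```

For example, a chunk_size of 10 with an assumed simd_width of 8 would be rounded up to 16, while a chunk_size of 16 would be left unchanged.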
Execution Model Events
The ws-loop-begin event occurs after an implicit task encounters a worksharing-loop construct but before the task starts execution of the structured block of the worksharing-loop region.
The ws-loop-end event occurs after a worksharing-loop region finishes execution but before resuming execution of the encountering task.
The ws-loop-iteration-begin event occurs once for each iteration of a worksharing-loop before the iteration is executed by an implicit task.
Tool Callbacks
A thread dispatches a registered ompt_callback_work callback with ompt_scope_begin as its endpoint argument and work_loop as its wstype argument for each occurrence of a ws-loop-begin event in that thread. Similarly, a thread dispatches a registered ompt_callback_work callback with ompt_scope_end as its endpoint argument and work_loop as its wstype argument for each occurrence of a ws-loop-end event in that thread. The callbacks occur in the context of the implicit task. The callbacks have type signature ompt_callback_work_t.
A thread dispatches a registered ompt_callback_dispatch callback for each occurrence of a ws-loop-iteration-begin event in that thread. The callback occurs in the context of the implicit task. The callback has type signature ompt_callback_dispatch_t.
Restrictions
Restrictions to the worksharing-loop construct are as follows:
• No OpenMP directive may appear in the region between any associated loops.
• If a collapse clause is specified, exactly one loop must occur in the region at each nesting level up to the number of loops specified by the parameter of the collapse clause.
• If the ordered clause is present, all loops associated with the construct must be perfectly nested; that is there must be no intervening code between any two loops.
• If a reduction clause with the inscan modifier is specified, neither the ordered nor schedule clause may appear on the worksharing-loop directive.
• The values of the loop control expressions of the loops associated with the worksharing-loop construct must be the same for all threads in the team.
• Only one schedule clause can appear on a worksharing-loop directive.
• The schedule clause must not appear on the worksharing-loop directive if the associated loop(s) form a non-rectangular loop nest.
• The ordered clause must not appear on the worksharing-loop directive if the associated loop(s) form a non-rectangular loop nest.
• Only one collapse clause can appear on a worksharing-loop directive.
• chunk_size must be a loop invariant integer expression with a positive value.
• The value of the chunk_size expression must be the same for all threads in the team.
• The value of the run-sched-var ICV must be the same for all threads in the team.
• When schedule(runtime) or schedule(auto) is specified, chunk_size must not be specified.
• A modifier may not be specified on a linear clause.
• Only one ordered clause can appear on a worksharing-loop directive.
• The ordered clause must be present on the worksharing-loop construct if any ordered region ever binds to a worksharing-loop region arising from the worksharing-loop construct.
• The nonmonotonic modifier cannot be specified if an ordered clause is specified.
• Either the monotonic modifier or the nonmonotonic modifier can be specified but not both.
• The loop iteration variable may not appear in a threadprivate directive.
• If both the collapse and ordered clause with a parameter are specified, the parameter of the ordered clause must be greater than or equal to the parameter of the collapse clause.
• A linear clause or an ordered clause with a parameter can be specified on a worksharing-loop directive but not both.
• If an order(concurrent) clause is present, all restrictions from the loop construct with an order(concurrent) clause also apply.
• If an order(concurrent) clause is present, an ordered clause may not appear on the same directive.
C / C++
• The associated for-loops must be structured blocks.
• Only an iteration of the innermost associated loop may be curtailed by a continue statement.
• No statement can branch to any associated for statement.
• Only one nowait clause can appear on a for directive.
• A throw executed inside a worksharing-loop region must cause execution to resume within the same iteration of the worksharing-loop region, and the same thread that threw the exception must catch it.
C / C++
Fortran
• The associated do-loops must be structured blocks.
• Only an iteration of the innermost associated loop may be curtailed by a CYCLE statement.
• No statement in the associated loops other than the DO statements can cause a branch out of the loops.
• The do-loop iteration variable must be of type integer.
• The do-loop cannot be a DO WHILE or a DO loop without loop control.
Fortran
Cross References
• order(concurrent) clause, see Section 2.9.5 on page 128.
• ordered construct, see Section 2.17.9 on page 250.
• depend clause, see Section 2.17.11 on page 255.
• private, firstprivate, lastprivate, linear, and reduction clauses, see Section 2.19.4 on page 282.
• ompt_scope_begin and ompt_scope_end, see Section 4.4.4.11 on page 443.
• ompt_work_loop, see Section 4.4.4.15 on page 445.
• ompt_callback_work_t, see Section 4.5.2.5 on page 464.
• OMP_SCHEDULE environment variable, see Section 6.1 on page 601.
[Flowchart: START. If no schedule clause is present, use the def-sched-var schedule kind. Otherwise, if the schedule kind value is runtime, use the run-sched-var schedule kind. Otherwise, use the schedule kind specified in the schedule clause.]
FIGURE 2.1: Determining the schedule for a Worksharing-Loop
2.9.2.1 Determining the Schedule of a Worksharing-Loop
When execution encounters a worksharing-loop directive, the schedule clause (if any) on the directive, and the run-sched-var and def-sched-var ICVs are used to determine how loop iterations are assigned to threads. See Section 2.5 on page 63 for details of how the values of the ICVs are determined. If the worksharing-loop directive does not have a schedule clause then the current value of the def-sched-var ICV determines the schedule. If the worksharing-loop directive has a schedule clause that specifies the runtime schedule kind then the current value of the run-sched-var ICV determines the schedule. Otherwise, the value of the schedule clause determines the schedule. Figure 2.1 describes how the schedule for a worksharing-loop is determined.
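The decision procedure described above can be modeled as a small function. The enumeration and function names are assumptions made for this sketch and do not correspond to any OpenMP API. The sketch also assumes that run-sched-var never holds the value runtime itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of schedule kinds, for illustration only. */
typedef enum {
    KIND_STATIC, KIND_DYNAMIC, KIND_GUIDED, KIND_AUTO, KIND_RUNTIME
} sched_kind;

/* Returns the schedule kind that applies to a worksharing-loop:
   has_clause     whether a schedule clause is present
   clause_kind    kind named in that clause (ignored if absent)
   def_sched_var  current value of the def-sched-var ICV
   run_sched_var  current value of the run-sched-var ICV */
sched_kind effective_schedule(bool has_clause, sched_kind clause_kind,
                              sched_kind def_sched_var,
                              sched_kind run_sched_var)
{
    /* Assumption of this sketch: the ICV holds a concrete kind. */
    assert(run_sched_var != KIND_RUNTIME);

    if (!has_clause) {
        return def_sched_var;       /* no schedule clause */
    }
    if (clause_kind == KIND_RUNTIME) {
        return run_sched_var;       /* schedule(runtime) */
    }
    return clause_kind;             /* kind given in the clause */
}
```

With def-sched-var set to static and run-sched-var set to dynamic, a loop without a schedule clause would use static, a loop with schedule(runtime) would use dynamic, and a loop with schedule(guided) would use guided.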
Cross References
• ICVs, see Section 2.5 on page 63.
2.9.3 SIMD Directives
2.9.3.1 simd Construct
Summary
The simd construct can be applied to a loop to indicate that the loop can be transformed into a SIMD loop (that is, multiple iterations of the loop can be executed concurrently using SIMD instructions).
Syntax
The syntax of the simd construct is as follows:
C / C++
#pragma omp simd [clause[ [,] clause] ... ] new-line
    for-loops
where clause is one of the following:
if([ simd :] scalar-expression)
safelen(length)
simdlen(length)
linear(list[ : linear-step])
aligned(list[ : alignment])
nontemporal(list)
private(list)
lastprivate([ lastprivate-modifier:] list)
reduction([ reduction-modifier,]reduction-identifier : list)
collapse(n)
order(concurrent)
The simd directive places restrictions on the structure of the associated for-loops. Specifically, all associated for-loops must have canonical loop form (Section 2.9.1 on page 95).
C / C++
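A minimal sketch of the C syntax above applies the simd construct to a loop whose iterations are independent. The function name saxpy and its parameters are illustrative assumptions, and the sketch assumes that x and y do not overlap. Without OpenMP support the pragma is ignored and the loop runs as ordinary sequential code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the specification):
   computes y[i] = a * x[i] + y[i] for i in [0, n).
   Assumes x and y do not overlap. */
void saxpy(int n, float a, const float *x, float *y)
{
    assert(n >= 0 && x != NULL && y != NULL);

    /* The loop has canonical loop form, and no iteration reads a
       value written by another iteration. */
    #pragma omp simd
    for (int i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

The construct does not create threads; it only permits the implementation to execute several iterations concurrently with SIMD instructions on the encountering thread.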
Fortran
!$omp simd [clause[ [,] clause] ... ]
    do-loops
[!$omp end simd]
where clause is one of the following:
if([ simd :] scalar-logical-expression)
safelen(length)
simdlen(length)
linear(list[ : linear-step])
aligned(list[ : alignment])
nontemporal(list)
private(list)
lastprivate([ lastprivate-modifier:] list)
reduction([ reduction-modifier,]reduction-identifier : list)
collapse(n)
order(concurrent)
If an end simd directive is not specified, an end simd directive is assumed at the end of the do-loops.
The simd directive places restrictions on the structure of all associated do-loops. Specifically, all associated do-loops must have canonical loop form (see Section 2.9.1 on page 95).
Fortran
Binding
A simd region binds to the current task region. The binding thread set of the simd region is the current team.
Description
The simd construct enables the execution of multiple iterations of the associated loops concurrently by means of SIMD instructions.
The collapse clause may be used to specify how many loops are associated with the construct. The parameter of the collapse clause must be a constant positive integer expression. If no collapse clause is present, the only loop that is associated with the simd construct is the one that immediately follows the directive.
If more than one loop is associated with the simd construct, then the iterations of all associated loops are collapsed into one larger iteration space that is then executed with SIMD instructions. The sequential execution of the iterations in all associated loops determines the order of the iterations in the collapsed iteration space.
If more than one loop is associated with the simd construct then the number of times that any intervening code between any two associated loops will be executed is unspecified but will be at least once per iteration of the loop enclosing the intervening code and at most once per iteration of the innermost loop associated with the construct. If the iteration count of any loop that is associated with the simd construct is zero and that loop does not enclose the intervening code, the behavior is unspecified.
The integer type (or kind, for Fortran) used to compute the iteration count for the collapsed loop is implementation defined.
A SIMD loop has logical iterations numbered 0,1,...,N-1 where N is the number of loop iterations, and the logical numbering denotes the sequence in which the iterations would be executed if the associated loop(s) were executed with no SIMD instructions. At the beginning of each logical iteration, the loop iteration variable of each associated loop has the value that it would have if the set of the associated loop(s) were executed sequentially. The number of iterations that are executed concurrently at any given time is implementation defined. Each concurrent iteration will be executed by a different SIMD lane. Each set of concurrent iterations is a SIMD chunk. Lexical forward dependencies in the iterations of the original loop must be preserved within each SIMD chunk.
The safelen clause specifies that no two concurrent iterations within a SIMD chunk can have a distance in the logical iteration space that is greater than or equal to the value given in the clause. The parameter of the safelen clause must be a constant positive integer expression. The simdlen clause specifies the preferred number of iterations to be executed concurrently unless an if clause is present and evaluates to false, in which case the preferred number of iterations to be executed concurrently is one. The parameter of the simdlen clause must be a constant positive integer expression.
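The role of safelen can be sketched with a loop that carries a dependence at a fixed distance. In the following example, whose names shifted_accumulate and DIST are assumptions, iteration i reads the element written by iteration i − DIST. Because safelen(DIST) states that no two concurrently executed iterations are DIST or more apart, those two iterations should never fall into the same SIMD chunk.

```c
#include <assert.h>
#include <stddef.h>

#define DIST 8

/* Hypothetical helper (not from the specification):
   a[i] += a[i - DIST] for i in [DIST, n). */
void shifted_accumulate(int *a, int n)
{
    assert(a != NULL && n >= 0);

    /* Concurrent iterations are fewer than DIST apart, so the
       value a[i - DIST] is complete before iteration i runs. */
    #pragma omp simd safelen(DIST)
    for (int i = DIST; i < n; i++) {
        a[i] += a[i - DIST];
    }
}
```

Starting from an array of 20 ones, the result is 1 for indices 0 to 7, 2 for indices 8 to 15, and 3 for indices 16 to 19, the same as sequential execution. A larger safelen value would not be justified by this dependence distance.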
C / C++
The aligned clause declares that the object to which each list item points is aligned to the number of bytes expressed in the optional parameter of the aligned clause.
C / C++
Fortran
The aligned clause declares that the location of each list item is aligned to the number of bytes expressed in the optional parameter of the aligned clause.
Fortran
The optional parameter of the aligned clause, alignment, must be a constant positive integer expression. If no optional parameter is specified, implementation-defined default alignments for SIMD instructions on the target platforms are assumed.
The nontemporal clause specifies that accesses to the storage locations to which the list items refer have low temporal locality across the iterations in which those storage locations are accessed.
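The aligned clause described above is a promise made by the program, so the storage has to be allocated with the stated alignment. The following sketch uses the C11 function aligned_alloc for that purpose. The names aligned_sum and ALIGNMENT, and the choice of 64 bytes, are assumptions for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define ALIGNMENT 64

/* Hypothetical helper (not from the specification): returns
   0 + 1 + ... + (n - 1), computed through a buffer aligned to
   ALIGNMENT bytes, or -1 if the allocation fails. */
long aligned_sum(int n)
{
    assert(n > 0);

    /* aligned_alloc expects the size to be a multiple of the
       alignment, so round the byte count up. */
    size_t bytes = (size_t)n * sizeof(int);
    bytes = (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;

    int *buf = aligned_alloc(ALIGNMENT, bytes);
    if (buf == NULL) {
        return -1;
    }

    /* The object that buf points to is aligned to ALIGNMENT bytes. */
    #pragma omp simd aligned(buf : ALIGNMENT)
    for (int i = 0; i < n; i++) {
        buf[i] = i;
    }

    long sum = 0;
    #pragma omp simd aligned(buf : ALIGNMENT) reduction(+ : sum)
    for (int i = 0; i < n; i++) {
        sum += buf[i];
    }

    free(buf);
    return sum;
}
```

If the pointer were obtained from an allocation without that alignment, the clause would state something false about the program and the behavior could not be relied upon.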
Restrictions
• No OpenMP directive may appear in the region between any associated loops.
• If a collapse clause is specified, exactly one loop must occur in the region at each nesting level up to the number of loops specified by the parameter of the collapse clause.
• The associated loops must be structured blocks.
• A program that branches into or out of a simd region is non-conforming.
• Only one collapse clause can appear on a simd directive.
• A list-item cannot appear in more than one aligned clause.
• A list-item cannot appear in more than one nontemporal clause.
• Only one safelen clause can appear on a simd directive.
• Only one simdlen clause can appear on a simd directive.
• If both simdlen and safelen clauses are specified, the value of the simdlen parameter must be less than or equal to the value of the safelen parameter.
• A modifier may not be specified on a linear clause.
• The only OpenMP constructs that can be encountered during execution of a simd region are the atomic construct, the loop construct, the simd construct and the ordered construct with the simd clause.
• If an order(concurrent) clause is present, all restrictions from the loop construct with an order(concurrent) clause also apply.
C / C++
• The simd region cannot contain calls to the longjmp or setjmp functions.
C / C++
C
• The type of list items appearing in the aligned clause must be array or pointer.
C
C++
• The type of list items appearing in the aligned clause must be array, pointer, reference to array, or reference to pointer.
• No exception can be raised in the simd region.
C++
Fortran
• The do-loop iteration variable must be of type integer.
• The do-loop cannot be a DO WHILE or a DO loop without loop control.
• If a list item on the aligned clause has the ALLOCATABLE attribute, the allocation status must be allocated.
• If a list item on the aligned clause has the POINTER attribute, the association status must be associated.
• If the type of a list item on the aligned clause is either C_PTR or Cray pointer, the list item must be defined.
Fortran
Cross References
• order(concurrent) clause, see Section 2.9.5 on page 128.
• if Clause, see Section 2.15 on page 220.
• private, lastprivate, linear and reduction clauses, see Section 2.19.4 on page 282.
2.9.3.2 Worksharing-Loop SIMD Construct
Summary
The worksharing-loop SIMD construct specifies that the iterations of one or more associated loops will be distributed across threads that already exist in the team and that the iterations executed by each thread can also be executed concurrently using SIMD instructions. The worksharing-loop SIMD construct is a composite construct.
Syntax
C / C++
#pragma omp for simd [clause[ [,] clause] ... ] new-line
    for-loops
where clause can be any of the clauses accepted by the for or simd directives with identical meanings and restrictions.
C / C++
Fortran
!$omp do simd [clause[ [,] clause] ... ]
    do-loops
[!$omp end do simd [nowait]]
where clause can be any of the clauses accepted by the simd or do directives, with identical meanings and restrictions.
If an end do simd directive is not specified, an end do simd directive is assumed at the end of the do-loops.
Fortran
Description
The worksharing-loop SIMD construct will first distribute the iterations of the associated loop(s) across the implicit tasks of the parallel region in a manner consistent with any clauses that apply to the worksharing-loop construct. The resulting chunks of iterations will then be converted to a SIMD loop in a manner consistent with any clauses that apply to the simd construct.
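The two-level behavior described above can be sketched with an integer dot product. The function name dot_product is an assumption for this example. The schedule clause applies to the worksharing-loop part of the composite construct, and the reduction clause combines the partial sums from the SIMD lanes and from the threads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the specification):
   returns the sum of x[i] * y[i] for i in [0, n). */
long dot_product(const int *x, const int *y, int n)
{
    assert(x != NULL && y != NULL && n >= 0);

    long sum = 0;

    #pragma omp parallel
    {
        /* Iterations are first divided among the threads of the
           team; each thread's chunk may then run as a SIMD loop. */
        #pragma omp for simd schedule(static) reduction(+ : sum)
        for (int i = 0; i < n; i++) {
            sum += (long)x[i] * y[i];
        }
    }
    return sum;
}
```

Integer arithmetic is used here so that the result does not depend on the order in which partial sums are combined.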
Execution Model Events
This composite construct generates the same events as the worksharing-loop construct.
Tool Callbacks
This composite construct dispatches the same callbacks as the worksharing-loop construct.
Restrictions
All restrictions to the worksharing-loop construct and the simd construct apply to the worksharing-loop SIMD construct. In addition, the following restrictions apply:
• No ordered clause with a parameter can be specified.
• A list item may appear in a linear or firstprivate clause but not both.
Cross References
• worksharing-loop construct, see Section 2.9.2 on page 101.
• simd construct, see Section 2.9.3.1 on page 110.
• Data attribute clauses, see Section 2.19.4 on page 282.
2.9.3.3 declare simd Directive
Summary
The declare simd directive can be applied to a function (C, C++ and Fortran) or a subroutine (Fortran) to enable the creation of one or more versions that can process multiple arguments using SIMD instructions from a single invocation in a SIMD loop. The declare simd directive is a declarative directive. There may be multiple declare simd directives for a function (C, C++, Fortran) or subroutine (Fortran).
Syntax
The syntax of the declare simd directive is as follows:
C / C++
#pragma omp declare simd [clause[ [,] clause] ... ] new-line
[#pragma omp declare simd [clause[ [,] clause] ... ] new-line]
[ ... ]
    function definition or declaration
where clause is one of the following:
simdlen(length)
linear(linear-list[ : linear-step])
aligned(argument-list[ : alignment])
uniform(argument-list)
inbranch
notinbranch
C / C++
116
OpenMP API – Version 5.0 November 2018
Fortran
1 !$omp declare simd [(proc-name)][clause[[,]clause]...]
2 where clause is one of the following:
3 4 5 6 7 8
9 Description
Fortran
C / C++
simdlen(length) linear(linear-list[ : linear-step]) aligned(argument-list[ : alignment])
uniform(argument-list) inbranch notinbranch
The use of one or more declare simd directives immediately prior to a function declaration or definition enables the creation of corresponding SIMD versions of the associated function that can be used to process multiple arguments from a single invocation in a SIMD loop concurrently.
The expressions appearing in the clauses of each directive are evaluated in the scope of the arguments of the function declaration or definition.
C / C++
Fortran
The use of one or more declare simd directives for a specified subroutine or function enables the creation of corresponding SIMD versions of the subroutine or function that can be used to process multiple arguments from a single invocation in a SIMD loop concurrently.
Fortran
If a SIMD version is created, the number of concurrent arguments for the function is determined by the simdlen clause. If the simdlen clause is used, its value corresponds to the number of concurrent arguments of the function; otherwise, the number of concurrent arguments for the function is implementation defined. The parameter of the simdlen clause must be a constant positive integer expression.
C++
The special this pointer can be used as if it were one of the arguments to the function in any of the linear, aligned, or uniform clauses.
C++
The uniform clause declares one or more arguments to have an invariant value for all concurrent invocations of the function in the execution of a single SIMD loop.
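As an informal illustration of the simdlen and uniform clauses, the following C sketch declares a SIMD version of a small function and calls it from a SIMD loop. The names scale_add and scale_add_all and the value simdlen(4) are assumptions chosen for this example; whether a SIMD version is actually created and used remains up to the implementation.

```c
/* Request a SIMD version that handles 4 concurrent arguments per call.
   'scale' is uniform: it has the same value for every concurrent
   invocation within one SIMD loop. */
#pragma omp declare simd simdlen(4) uniform(scale)
static float scale_add(float x, float y, float scale)
{
    return scale * x + y;
}

/* Hypothetical caller: the SIMD loop may invoke a SIMD version of
   scale_add. The arrays x, y and out are assumed not to overlap. */
void scale_add_all(int n, float scale, const float *x, const float *y,
                   float *out)
{
    #pragma omp simd
    for (int i = 0; i < n; i++) {
        out[i] = scale_add(x[i], y[i], scale);
    }
}
```

If the pragmas are ignored (for example, when OpenMP support is not enabled), the code still computes the same values sequentially.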
C / C++
The aligned clause declares that the object to which each list item points is aligned to the number of bytes expressed in the optional parameter of the aligned clause.
C / C++
Fortran
The aligned clause declares that the target of each list item is aligned to the number of bytes expressed in the optional parameter of the aligned clause.
Fortran
The optional parameter of the aligned clause, alignment, must be a constant positive integer expression. If no optional parameter is specified, implementation-defined default alignments for SIMD instructions on the target platforms are assumed.
The inbranch clause specifies that the SIMD version of the function will always be called from inside a conditional statement of a SIMD loop. The notinbranch clause specifies that the SIMD version of the function will never be called from inside a conditional statement of a SIMD loop. If neither clause is specified, then the SIMD version of the function may or may not be called from inside a conditional statement of a SIMD loop.
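The inbranch clause can be sketched with a function that is only ever called under a condition inside a SIMD loop. The names negate and abs_in_place are hypothetical and serve only this illustration; how an implementation exploits the clause is not specified here.

```c
/* inbranch: the SIMD version is always called from inside a conditional
   statement of a SIMD loop. */
#pragma omp declare simd inbranch
static int negate(int v)
{
    return -v;
}

/* Hypothetical caller: replaces each negative element by its magnitude.
   The call to negate() is always guarded by the 'if'. */
void abs_in_place(int n, int *v)
{
    #pragma omp simd
    for (int i = 0; i < n; i++) {
        if (v[i] < 0) {
            v[i] = negate(v[i]);
        }
    }
}
```

Had the function also been called unconditionally from a SIMD loop, neither clause (or a second directive with notinbranch) would be the appropriate choice.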
Restrictions
• Each argument can appear in at most one uniform or linear clause.
• At most one simdlen clause can appear in a declare simd directive.
• Either inbranch or notinbranch may be specified, but not both.
• When a linear-step expression is specified in a linear clause it must be either a constant integer expression or an integer-typed parameter that is specified in a uniform clause on the directive.
• The function or subroutine body must be a structured block.
• The execution of the function or subroutine, when called from a SIMD loop, cannot result in the execution of an OpenMP construct except for an ordered construct with the simd clause or an atomic construct.
• The execution of the function or subroutine cannot have any side effects that would alter its execution for concurrent iterations of a SIMD chunk.
• A program that branches into or out of the function is non-conforming.
C / C++
• If the function has any declarations, then the declare simd construct for any declaration that has one must be equivalent to the one specified for the definition. Otherwise, the result is unspecified.
• The function cannot contain calls to the longjmp or setjmp functions.
C / C++
C
• The type of list items appearing in the aligned clause must be array or pointer.
C
C++
• The function cannot contain any calls to throw.
• The type of list items appearing in the aligned clause must be array, pointer, reference to array, or reference to pointer.
C++
Fortran
• proc-name must not be a generic name, procedure pointer or entry name.
• If proc-name is omitted, the declare simd directive must appear in the specification part of a subroutine subprogram or a function subprogram for which creation of the SIMD versions is enabled.
• Any declare simd directive must appear in the specification part of a subroutine subprogram, function subprogram or interface body to which it applies.
• If a declare simd directive is specified in an interface block for a procedure, it must match a declare simd directive in the definition of the procedure.
• If a procedure is declared via a procedure declaration statement, the procedure proc-name should appear in the same specification.
• If a declare simd directive is specified for a procedure name with explicit interface and a declare simd directive is also specified for the definition of the procedure then the two declare simd directives must match. Otherwise the result is unspecified.
• Procedure pointers may not be used to access versions created by the declare simd directive.
• The type of list items appearing in the aligned clause must be C_PTR or Cray pointer, or the list item must have the POINTER or ALLOCATABLE attribute.
Fortran
Cross References
• linear clause, see Section 2.19.4.6 on page 290.
• reduction clause, see Section 2.19.5.4 on page 300.
2.9.4 distribute Loop Constructs
2.9.4.1 distribute Construct
Summary
The distribute construct specifies that the iterations of one or more loops will be executed by the initial teams in the context of their implicit tasks. The iterations are distributed across the initial threads of all initial teams that execute the teams region to which the distribute region binds.
Syntax
C / C++
The syntax of the distribute construct is as follows:
#pragma omp distribute [clause[[,]clause]...] new-line
for-loops
Where clause is one of the following:
private(list)
firstprivate(list)
lastprivate(list)
collapse(n)
dist_schedule(kind[, chunk_size])
allocate([allocator :]list)
The distribute directive places restrictions on the structure of all associated for-loops. Specifically, all associated for-loops must have canonical loop form (see Section 2.9.1 on page 95).
C / C++
Fortran
The syntax of the distribute construct is as follows:
!$omp distribute [clause[[,]clause]...]
do-loops
[!$omp end distribute]
Where clause is one of the following:
private(list)
firstprivate(list)
lastprivate(list)
collapse(n)
dist_schedule(kind[, chunk_size])
allocate([allocator :]list)
If an end distribute directive is not specified, an end distribute directive is assumed at the end of the do-loops.
The distribute directive places restrictions on the structure of all associated do-loops. Specifically, all associated do-loops must have canonical loop form (see Section 2.9.1 on page 95).
Fortran
Binding
The binding thread set for a distribute region is the set of initial threads executing an enclosing teams region. A distribute region binds to this teams region.
Description
The distribute construct is associated with a loop nest consisting of one or more loops that follow the directive.
There is no implicit barrier at the end of a distribute construct. To avoid data races, the original list items modified due to lastprivate or linear clauses should not be accessed between the end of the distribute construct and the end of the teams region to which the distribute binds.
The collapse clause may be used to specify how many loops are associated with the distribute construct. The parameter of the collapse clause must be a constant positive integer expression. If no collapse clause is present or its parameter is 1, the only loop that is associated with the distribute construct is the one that immediately follows the distribute construct. If a collapse clause is specified with a parameter value greater than 1 and more than one loop is associated with the distribute construct, then the iterations of all associated loops are collapsed into one larger iteration space. The sequential execution of the iterations in all associated loops determines the order of the iterations in the collapsed iteration space.
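A minimal C sketch of a distribute construct with a collapse clause follows. The function name fill_matrix and the row-major layout are assumptions for illustration, and the function is assumed to be called from outside any explicit parallel region so that the teams region is properly nested.

```c
/* Fill a rows x cols matrix stored in row-major order.
   collapse(2) associates both loops with the distribute construct, so
   the rows*cols iterations form one iteration space that is divided
   among the initial teams. */
void fill_matrix(int rows, int cols, int *m)
{
    #pragma omp teams
    #pragma omp distribute collapse(2)
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            m[i * cols + j] = i * cols + j;
        }
    }
}
```

Each element is written by exactly one iteration, so the result does not depend on how the iterations are divided among teams.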
A distribute loop has logical iterations numbered 0,1,...,N-1 where N is the number of loop iterations, and the logical numbering denotes the sequence in which the iterations would be executed if the set of associated loop(s) were executed sequentially. At the beginning of each logical iteration, the loop iteration variable of each associated loop has the value that it would have if the set of the associated loop(s) were executed sequentially.
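The logical numbering can be made concrete for two collapsed, rectangular loops of the form for (i = 0; i < rows; i++) for (j = 0; j < cols; j++). The helper below is a hypothetical illustration only (an implementation may compute the loop variables in any way it likes); it assumes cols > 0.

```c
/* Map logical iteration k (0 <= k < rows*cols) of the two collapsed
   loops to the values that i and j would have at that point of a
   sequential execution. */
void logical_to_indices(int k, int cols, int *i, int *j)
{
    *i = k / cols;   /* outer loop variable */
    *j = k % cols;   /* inner loop variable */
}
```

For example, with cols equal to 3, logical iteration 4 corresponds to i = 1 and j = 1.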
If more than one loop is associated with the distribute construct then the number of times that any intervening code between any two associated loops will be executed is unspecified but will be at least once per iteration of the loop enclosing the intervening code and at most once per iteration of the innermost loop associated with the construct. If the iteration count of any loop that is associated with the distribute construct is zero and that loop does not enclose the intervening code, the behavior is unspecified.
The integer type (or kind, for Fortran) used to compute the iteration count for the collapsed loop is implementation defined.
If dist_schedule is specified, kind must be static. If chunk_size is specified, iterations are divided into chunks of size chunk_size, and the chunks are assigned to the initial teams of the league in a round-robin fashion in the order of the initial team number. When no chunk_size is specified, the iteration space is divided into chunks that are approximately equal in size, and at most one chunk is distributed to each initial team of the league. The size of the chunks is unspecified in this case.
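The round-robin assignment for dist_schedule(static, chunk_size) can be sketched as a small helper. The function team_for_iteration is hypothetical, and it assumes that the assignment starts with team number 0, that chunk_size > 0 and that num_teams > 0.

```c
/* For dist_schedule(static, chunk_size): chunk c holds the logical
   iterations [c*chunk_size, (c+1)*chunk_size), and chunks are handed to
   teams 0, 1, ..., num_teams-1, 0, 1, ... in that order. */
int team_for_iteration(long iter, long chunk_size, int num_teams)
{
    long chunk = iter / chunk_size;      /* index of the enclosing chunk */
    return (int)(chunk % num_teams);     /* round-robin over the teams */
}
```

With chunk_size 4 and 3 teams, iterations 0 to 3 go to team 0, iterations 4 to 7 to team 1, iterations 8 to 11 to team 2, and iteration 12 wraps around to team 0.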
When no dist_schedule clause is specified, the schedule is implementation defined.
Execution Model Events
The distribute-begin event occurs after an implicit task encounters a distribute construct but before the task starts to execute the structured block of the distribute region.
The distribute-end event occurs after an implicit task finishes execution of a distribute region but before it resumes execution of the enclosing context.
Tool Callbacks
A thread dispatches a registered ompt_callback_work callback with ompt_scope_begin as its endpoint argument and ompt_work_distribute as its wstype argument for each occurrence of a distribute-begin event in that thread. Similarly, a thread dispatches a registered ompt_callback_work callback with ompt_scope_end as its endpoint argument and ompt_work_distribute as its wstype argument for each occurrence of a distribute-end event in that thread. The callbacks occur in the context of the implicit task. The callbacks have type signature ompt_callback_work_t.
Restrictions
Restrictions to the distribute construct are as follows:
• The distribute construct inherits the restrictions of the worksharing-loop construct.
• Each distribute region must be encountered by the initial threads of all initial teams in a league or by none at all.
• The sequence of the distribute regions encountered must be the same for every initial thread of every initial team in a league.
• The region corresponding to the distribute construct must be strictly nested inside a teams region.
• A list item may appear in a firstprivate or lastprivate clause but not both.
• The dist_schedule clause must not appear on the distribute directive if the associated loop(s) form a non-rectangular loop nest.
Cross References
• teams construct, see Section 2.7 on page 82.
• worksharing-loop construct, see Section 2.9.2 on page 101.
• ompt_work_distribute, see Section 4.4.4.15 on page 445.
• ompt_callback_work_t, see Section 4.5.2.5 on page 464.
2.9.4.2 distribute simd Construct
Summary
The distribute simd construct specifies a loop that will be distributed across the master threads of the teams region and executed concurrently using SIMD instructions. The distribute simd construct is a composite construct.
Syntax
The syntax of the distribute simd construct is as follows:
C / C++
#pragma omp distribute simd [clause[[,]clause]...] new-line
for-loops
where clause can be any of the clauses accepted by the distribute or simd directives with identical meanings and restrictions.
C / C++
Fortran
!$omp distribute simd [clause[[,]clause]...]
do-loops
[!$omp end distribute simd]
where clause can be any of the clauses accepted by the distribute or simd directives with identical meanings and restrictions.
If an end distribute simd directive is not specified, an end distribute simd directive is assumed at the end of the do-loops.
Fortran
Description
The distribute simd construct will first distribute the iterations of the associated loop(s) according to the semantics of the distribute construct and any clauses that apply to the distribute construct. The resulting chunks of iterations will then be converted to a SIMD loop in a manner consistent with any clauses that apply to the simd construct.
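The two-step behavior described above can be illustrated with a short C sketch. The name vector_add is an assumption for this example, the arrays are assumed not to overlap, and the function is assumed to be called from outside any explicit parallel region.

```c
/* Element-wise addition: iterations are first distributed across the
   teams, and each resulting chunk is then executed as a SIMD loop. */
void vector_add(int n, const float *a, const float *b, float *c)
{
    #pragma omp teams
    #pragma omp distribute simd
    for (int i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}
```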
Execution Model Events
This composite construct generates the same events as the distribute construct.
Tool Callbacks
This composite construct dispatches the same callbacks as the distribute construct.
Restrictions
• The restrictions for the distribute and simd constructs apply.
• A list item may not appear in a linear clause unless it is the loop iteration variable of a loop that is associated with the construct.
• The conditional modifier may not appear in a lastprivate clause.
Cross References
• simd construct, see Section 2.9.3.1 on page 110.
• distribute construct, see Section 2.9.4.1 on page 120.
• Data attribute clauses, see Section 2.19.4 on page 282.
2.9.4.3 Distribute Parallel Worksharing-Loop Construct
Summary
The distribute parallel worksharing-loop construct specifies a loop that can be executed in parallel by multiple threads that are members of multiple teams. The distribute parallel worksharing-loop construct is a composite construct.
Syntax
The syntax of the distribute parallel worksharing-loop construct is as follows:
C / C++
#pragma omp distribute parallel for [clause[[,]clause]...] new-line
for-loops
where clause can be any of the clauses accepted by the distribute or parallel worksharing-loop directives with identical meanings and restrictions.
C / C++
Fortran
!$omp distribute parallel do [clause[[,]clause]...]
do-loops
[!$omp end distribute parallel do]
where clause can be any of the clauses accepted by the distribute or parallel worksharing-loop directives with identical meanings and restrictions.
If an end distribute parallel do directive is not specified, an end distribute parallel do directive is assumed at the end of the do-loops.
Fortran
Description
The distribute parallel worksharing-loop construct will first distribute the iterations of the associated loop(s) into chunks according to the semantics of the distribute construct and any clauses that apply to the distribute construct. Each of these chunks will form a loop. Each resulting loop will then be distributed across the threads within the teams region to which the distribute construct binds in a manner consistent with any clauses that apply to the parallel worksharing-loop construct.
Execution Model Events
This composite construct generates the same events as the distribute and parallel worksharing-loop constructs.
Tool Callbacks
This composite construct dispatches the same callbacks as the distribute and parallel worksharing-loop constructs.
Restrictions
• The restrictions for the distribute and parallel worksharing-loop constructs apply.
• No ordered clause can be specified.
• No linear clause can be specified.
• The conditional modifier may not appear in a lastprivate clause.
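As an informal example of the distribute parallel worksharing-loop construct, the C sketch below updates a vector in place. The name saxpy is an assumption for this example, x and y are assumed not to overlap, and the function is assumed to be called from outside any explicit parallel region.

```c
/* y = alpha * x + y. The iterations are first divided into chunks among
   the teams; each chunk is then shared among the threads of a parallel
   region created within the corresponding team. */
void saxpy(int n, float alpha, const float *x, float *y)
{
    #pragma omp teams
    #pragma omp distribute parallel for
    for (int i = 0; i < n; i++) {
        y[i] = alpha * x[i] + y[i];
    }
}
```

Note that the example uses neither an ordered nor a linear clause, in line with the restrictions listed above.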
Cross References
• distribute construct, see Section 2.9.4.1 on page 120.
• Parallel worksharing-loop construct, see Section 2.13.1 on page 185.
• Data attribute clauses, see Section 2.19.4 on page 282.
2.9.4.4 Distribute Parallel Worksharing-Loop SIMD Construct
Summary
The distribute parallel worksharing-loop SIMD construct specifies a loop that can be executed concurrently using SIMD instructions in parallel by multiple threads that are members of multiple teams. The distribute parallel worksharing-loop SIMD construct is a composite construct.
Syntax
C / C++
The syntax of the distribute parallel worksharing-loop SIMD construct is as follows:
#pragma omp distribute parallel for simd \
[clause[ [,] clause] ... ] new-line
for-loops
where clause can be any of the clauses accepted by the distribute or parallel worksharing-loop SIMD directives with identical meanings and restrictions.
C / C++
Fortran
The syntax of the distribute parallel worksharing-loop SIMD construct is as follows:
!$omp distribute parallel do simd [clause[[,]clause]...]
do-loops
[!$omp end distribute parallel do simd]
where clause can be any of the clauses accepted by the distribute or parallel worksharing-loop SIMD directives with identical meanings and restrictions.
If an end distribute parallel do simd directive is not specified, an end distribute parallel do simd directive is assumed at the end of the do-loops.
Fortran
Description
The distribute parallel worksharing-loop SIMD construct will first distribute the iterations of the associated loop(s) according to the semantics of the distribute construct and any clauses that apply to the distribute construct. The resulting loops will then be distributed across the threads contained within the teams region to which the distribute construct binds in a manner consistent with any clauses that apply to the parallel worksharing-loop construct. The resulting chunks of iterations will then be converted to a SIMD loop in a manner consistent with any clauses that apply to the simd construct.
Execution Model Events
This composite construct generates the same events as the distribute and parallel worksharing-loop SIMD constructs.
Tool Callbacks
This composite construct dispatches the same callbacks as the distribute and parallel worksharing-loop SIMD constructs.
Restrictions
• The restrictions for the distribute and parallel worksharing-loop SIMD constructs apply.
• No ordered clause can be specified.
• A list item may not appear in a linear clause unless it is the loop iteration variable of a loop that is associated with the construct.
• The conditional modifier may not appear in a lastprivate clause.
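The three levels of the distribute parallel worksharing-loop SIMD construct can be seen in the following C sketch. The name square_all and the chunk size 256 are assumptions for illustration; in and out are assumed not to overlap, and the function is assumed to be called from outside any explicit parallel region.

```c
/* out[i] = in[i]^2.
   dist_schedule applies to the distribute part (chunks of 256
   iterations per team), schedule applies to the parallel
   worksharing-loop part, and the per-thread chunks are finally
   executed as SIMD loops. */
void square_all(int n, const double *in, double *out)
{
    #pragma omp teams
    #pragma omp distribute parallel for simd \
            dist_schedule(static, 256) schedule(static)
    for (int i = 0; i < n; i++) {
        out[i] = in[i] * in[i];
    }
}
```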
Cross References
• distribute construct, see Section 2.9.4.1 on page 120.
• Parallel worksharing-loop SIMD construct, see Section 2.13.5 on page 190.
• Data attribute clauses, see Section 2.19.4 on page 282.
2.9.5 loop Construct
Summary
A loop construct specifies that the iterations of the associated loops may execute concurrently and permits the encountering thread(s) to execute the loop accordingly.
Syntax
C / C++
The syntax of the loop construct is as follows:
#pragma omp loop [clause[[,]clause]...] new-line
for-loops
where clause is one of the following:
bind(binding)
collapse(n)
order(concurrent)
private(list)
lastprivate(list)
reduction([default ,]reduction-identifier:list)
where binding is one of the following:
teams
parallel
thread
The loop directive places restrictions on the structure of all associated for-loops. Specifically, all associated for-loops must have canonical loop form (see Section 2.9.1 on page 95).
C / C++
Fortran
The syntax of the loop construct is as follows:
!$omp loop [clause[[,]clause]...]
do-loops
[!$omp end loop]
where clause is one of the following:
bind(binding)
collapse(n)
order(concurrent)
private(list)
lastprivate(list)
reduction([default ,]reduction-identifier:list)
where binding is one of the following:
teams
parallel
thread
If an end loop directive is not specified, an end loop directive is assumed at the end of the do-loops.
The loop directive places restrictions on the structure of all associated do-loops. Specifically, all associated do-loops must have canonical loop form (see Section 2.9.1 on page 95).
Fortran
Binding
If the bind clause is present on the construct, the binding region is determined by binding. Specifically, if binding is teams and there exists an innermost enclosing teams region then the binding region is that teams region; if binding is parallel then the binding region is the innermost enclosing parallel region, which may be an implicit parallel region; and if binding is thread then the binding region is not defined. If the bind clause is not present on the construct and the loop construct is closely nested inside a teams or parallel construct, the binding region is the corresponding teams or parallel region. If none of those conditions hold, the binding region is not defined.
If the binding region is a teams region, then the binding thread set is the set of master threads that are executing that region. If the binding region is a parallel region, then the binding thread set is the team of threads that are executing that region. If the binding region is not defined, then the binding thread set is the encountering thread.
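The effect of the bind clause on an orphaned loop construct can be sketched in C as follows. The function names increment_all and increment_in_parallel are hypothetical and are used only for this illustration.

```c
/* Orphaned loop construct: bind(parallel) selects the innermost
   enclosing parallel region (possibly the implicit one) as the binding
   region, so the binding thread set is the team executing that region. */
void increment_all(int n, int *v)
{
    #pragma omp loop bind(parallel)
    for (int i = 0; i < n; i++) {
        v[i] += 1;
    }
}

/* Every thread of the team encounters the loop region, and the team as
   a whole executes each logical iteration once. */
void increment_in_parallel(int n, int *v)
{
    #pragma omp parallel
    {
        increment_all(n, v);
    }
}
```

Because each logical iteration is executed once per encountered instance of the loop region, every element is incremented exactly once regardless of the number of threads in the team.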
Description
The loop construct is associated with a loop nest that consists of one or more loops that follow the directive. The directive asserts that the iterations may execute in any order, including concurrently.
The collapse clause may be used to specify how many loops are associated with the loop construct. The parameter of the collapse clause must be a constant positive integer expression. If a collapse clause is specified with a parameter value greater than 1, then the iterations of the associated loops to which the clause applies are collapsed into one larger iteration space with unspecified ordering. If no collapse clause is present or its parameter is 1, the only loop that is associated with the loop construct is the one that immediately follows the loop directive.
If more than one loop is associated with the loop construct then the number of times that any intervening code between any two associated loops will be executed is unspecified but will be at least once per iteration of the loop enclosing the intervening code and at most once per iteration of the innermost loop associated with the construct. If the iteration count of any loop that is associated with the loop construct is zero and that loop does not enclose the intervening code, the behavior is unspecified.
The iteration space of the associated loops corresponds to logical iterations numbered 0,1,...,N-1 where N is the number of loop iterations, and the logical numbering denotes the sequence in which the iterations would be executed if a set of associated loop(s) were executed sequentially. At the beginning of each logical iteration, the loop iteration variable of each associated loop has the value that it would have if the set of the associated loop(s) were executed sequentially.
Each logical iteration is executed once per instance of the loop region that is encountered by the binding thread set.
If the order(concurrent) clause appears on the loop construct, the iterations of the associated loops may execute in any order, including concurrently. If the order clause is not present, the behavior is as if the order(concurrent) clause appeared on the construct.
The set of threads that may execute the iterations of the loop region is the binding thread set. Each iteration is executed by one thread from this set.
If the loop region binds to a teams region, the threads in the binding thread set may continue execution after the loop region without waiting for all iterations of the associated loop(s) to complete. The iterations are guaranteed to complete before the end of the teams region.
If the loop region does not bind to a teams region, all iterations of the associated loop(s) must complete before the encountering thread(s) continue execution after the loop region.
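A small C sketch of a loop construct with collapse and reduction clauses is given below. The name sum_matrix and the row-major layout are assumptions for this example. The loop construct is closely nested inside a parallel construct, so the loop region binds to that parallel region.

```c
/* Sum all elements of a rows x cols matrix stored in row-major order.
   The iterations of both loops may execute in any order, including
   concurrently; the reduction clause combines the partial sums. */
long sum_matrix(int rows, int cols, const int *m)
{
    long sum = 0;
    #pragma omp parallel
    {
        #pragma omp loop collapse(2) reduction(+:sum)
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                sum += m[i * cols + j];
            }
        }
    }
    return sum;
}
```

Integer addition is used so that the result does not depend on the order in which the iterations are combined.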
Restrictions
Restrictions to the loop construct are as follows:
• If the collapse clause is specified then there may be no intervening OpenMP directives between the associated loops.
• At most one collapse clause can appear on a loop directive.
• A list item may not appear in a lastprivate clause unless it is the loop iteration variable of a loop that is associated with the construct.
• If a loop construct is not nested inside another OpenMP construct and it appears in a procedure, the bind clause must be present.
• If a loop region binds to a teams or parallel region, it must be encountered by all threads in the binding thread set or by none of them.
• If the bind clause is present and binding is teams, the loop region corresponding to the loop construct must be strictly nested inside a teams region.
• If the bind clause is present and binding is parallel, the behavior is unspecified if the loop region corresponding to a loop construct is closely nested inside a simd region.
• The only constructs that may be nested inside a loop region are the loop construct, the parallel construct, the simd construct, and combined constructs for which the first construct is a parallel construct.
• A loop region corresponding to a loop construct may not contain calls to procedures that contain OpenMP directives.
• A loop region corresponding to a loop construct may not contain calls to the OpenMP Runtime API.
• If a threadprivate variable is referenced inside a loop region, the behavior is unspecified.
C / C++
• The associated for-loops must be structured blocks.
• No statement can branch to any associated for statement.
C / C++
Fortran
• The associated do-loops must be structured blocks.
• No statement in the associated loops other than the DO statements can cause a branch out of the loops.
Fortran
Cross References
• The single construct, see Section 2.8.2 on page 89.
• The Worksharing-Loop construct, see Section 2.9.2 on page 101.
• SIMD directives, see Section 2.9.3 on page 110.
• distribute construct, see Section 2.9.4.1 on page 120.
2.9.6 scan Directive
Summary
The scan directive specifies that scan computations update the list items on each iteration.
Syntax
C / C++
The syntax of the scan directive is as follows:
loop-associated-directive
for-loop-headers
{
    structured-block
    #pragma omp scan clause new-line
    structured-block
}
where clause is one of the following:
inclusive(list)
exclusive(list)
and where loop-associated-directive is a for, for simd, or simd directive.
C / C++
Fortran
The syntax of the scan directive is as follows:
loop-associated-directive
do-loop-headers
    structured-block
    !$omp scan clause
    structured-block
do-termination-stmt(s)
[end-loop-associated-directive]
where clause is one of the following:
inclusive(list)
exclusive(list)
and where loop-associated-directive (end-loop-associated-directive) is a do (end do), do simd (end do simd), or simd (end simd) directive.
Fortran
Description
The scan directive may appear in the body of a loop or loop nest associated with an enclosing worksharing-loop, worksharing-loop SIMD, or simd construct, to specify that a scan computation updates each list item on each loop iteration. The directive specifies that either an inclusive scan computation is to be performed for each list item that appears in an inclusive clause on the directive, or an exclusive scan computation is to be performed for each list item that appears in an exclusive clause on the directive. For each list item for which a scan computation is specified, statements that lexically precede or follow the directive constitute one of two phases for a given logical iteration of the loop: an input phase or a scan phase.
If the list item appears in an inclusive clause, all statements in the structured block that lexically precede the directive constitute the input phase and all statements in the structured block that lexically follow the directive constitute the scan phase. If the list item appears in an exclusive clause and the iteration is not the last iteration, all statements in the structured block that lexically precede the directive constitute the scan phase and all statements in the structured block that lexically follow the directive constitute the input phase. If the list item appears in an exclusive clause and the iteration is the last iteration, the iteration does not have an input phase and all statements that lexically precede or follow the directive constitute the scan phase for the iteration. The input phase contains all computations that update the list item in the iteration, and the scan phase ensures that any statement that reads the list item uses the result of the scan computation for that iteration.
The result of a scan computation for a given iteration is calculated according to the last generalized prefix sum (PRESUM_last) applied over the sequence of values given by the original value of the list item prior to the loop and all preceding updates to the list item in the logical iteration space of the loop. The operation PRESUM_last(op, a_1, ..., a_N) is defined for a given binary operator op and a sequence of N values a_1, ..., a_N as follows:
• if N = 1, a_1
• if N > 1, op(PRESUM_last(op, a_1, ..., a_K), PRESUM_last(op, a_L, ..., a_N)), where 1 ≤ K + 1 = L ≤ N.
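The recursive definition of the last generalized prefix sum can be sketched in C. The names binop_fn, presum_last and add_long are hypothetical, and splitting at the midpoint is only one of the splits the definition permits; it is chosen here so that both parts are non-empty. For an associative operator every such split yields the same value.

```c
typedef long (*binop_fn)(long, long);

/* Last generalized prefix sum over a[lo..hi] (inclusive bounds,
   lo <= hi), following the two cases of the definition. */
long presum_last(binop_fn op, const long *a, int lo, int hi)
{
    if (lo == hi) {
        return a[lo];                     /* case N = 1 */
    }
    int k = lo + (hi - lo) / 2;           /* last index of the left part */
    return op(presum_last(op, a, lo, k),  /* case N > 1 */
              presum_last(op, a, k + 1, hi));
}

/* Example operator: integer addition. */
long add_long(long x, long y)
{
    return x + y;
}
```

With add_long as the operator, the result is simply the sum of all values in the sequence.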
At the beginning of the input phase of each iteration, the list item is initialized with the initializer value of the reduction-identifier specified by the reduction clause on the innermost enclosing construct. The update value of a list item is, for a given iteration, the value of the list item on completion of its input phase.
Let orig-val be the value of the original list item on entry to the enclosing worksharing-loop, worksharing-loop SIMD, or simd construct. Let combiner be the combiner for the reduction-identifier specified by the reduction clause on the construct. And let u_I be the update value of a list item for iteration I. For list items appearing in an inclusive clause on the scan directive, at the beginning of the scan phase for iteration I the list item is assigned the result of the operation PRESUM_last(combiner, orig-val, u_0, ..., u_I). For list items appearing in an exclusive clause on the scan directive, at the beginning of the scan phase for iteration I = 0 the list item is assigned the value orig-val, and at the beginning of the scan phase for iteration I > 0 the list item is assigned the result of the operation PRESUM_last(combiner, orig-val, u_0, ..., u_{I-1}).
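The difference between the inclusive and exclusive forms can be illustrated with two prefix-sum loops in C. The function names are assumptions for this example, a and out are assumed not to overlap, and orig-val is 0 in both cases because sum is initialized to 0 before the loop.

```c
/* Inclusive scan: out[i] = a[0] + ... + a[i]. */
void inclusive_prefix_sum(int n, const int *a, int *out)
{
    int sum = 0;
    #pragma omp simd reduction(inscan, +:sum)
    for (int i = 0; i < n; i++) {
        sum += a[i];                     /* input phase */
        #pragma omp scan inclusive(sum)
        out[i] = sum;                    /* scan phase */
    }
}

/* Exclusive scan: out[0] = 0 and out[i] = a[0] + ... + a[i-1]. */
void exclusive_prefix_sum(int n, const int *a, int *out)
{
    int sum = 0;
    #pragma omp simd reduction(inscan, +:sum)
    for (int i = 0; i < n; i++) {
        out[i] = sum;                    /* scan phase */
        #pragma omp scan exclusive(sum)
        sum += a[i];                     /* input phase */
    }
}
```

For the input 1, 2, 3, 4 the inclusive form produces 1, 3, 6, 10 and the exclusive form produces 0, 1, 3, 6.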
Restrictions
Restrictions to the scan directive are as follows:
• Exactly one scan directive must appear in the loop body of an enclosing worksharing-loop, worksharing-loop SIMD, or simd construct on which a reduction clause with the inscan modifier is present.
• A list item that appears in the inclusive or exclusive clause must appear in a reduction clause with the inscan modifier on the enclosing worksharing-loop, worksharing-loop SIMD, or simd construct.
• Cross-iteration dependences across different logical iterations must not exist, except for dependences for the list items specified in an inclusive or exclusive clause.
• Intra-iteration dependences from a statement in the structured block preceding a scan directive to a statement in the structured block following a scan directive must not exist, except for dependences for the list items specified in an inclusive or exclusive clause.
Cross References
• worksharing-loop construct, see Section 2.9.2 on page 101.
• simd construct, see Section 2.9.3.1 on page 110.
• worksharing-loop SIMD construct, see Section 2.9.3.2 on page 114.
• reduction clause, see Section 2.19.5.4 on page 300.
2.10 Tasking Constructs
2.10.1 task Construct
Summary
The task construct defines an explicit task.
Syntax
C / C++
The syntax of the task construct is as follows:
    #pragma omp task [clause[ [,] clause] ... ] new-line
        structured-block
where clause is one of the following:
    if([ task :] scalar-expression)
    final(scalar-expression)
    untied
    default(shared | none)
    mergeable
    private(list)
    firstprivate(list)
    shared(list)
    in_reduction(reduction-identifier : list)
    depend([depend-modifier,] dependence-type : locator-list)
    priority(priority-value)
    allocate([allocator :] list)
    affinity([aff-modifier :] locator-list)
    detach(event-handle)
where aff-modifier is one of the following:
    iterator(iterators-definition)
where event-handle is a variable of the omp_event_handle_t type.
C / C++
Fortran
The syntax of the task construct is as follows:
    !$omp task [clause[ [,] clause] ... ]
        structured-block
    !$omp end task
where clause is one of the following:
    if([ task :] scalar-logical-expression)
    final(scalar-logical-expression)
    untied
    default(private | firstprivate | shared | none)
    mergeable
    private(list)
    firstprivate(list)
    shared(list)
    in_reduction(reduction-identifier : list)
    depend([depend-modifier,] dependence-type : locator-list)
    priority(priority-value)
    allocate([allocator :] list)
    affinity([aff-modifier :] locator-list)
    detach(event-handle)
where aff-modifier is one of the following:
    iterator(iterators-definition)
where event-handle is an integer variable of omp_event_handle_kind kind.
Fortran
Binding
The binding thread set of the task region is the current team. A task region binds to the innermost enclosing parallel region.
Description
The task construct is a task generating construct. When a thread encounters a task construct, an explicit task is generated from the code for the associated structured-block. The data environment of the task is created according to the data-sharing attribute clauses on the task construct, per-data environment ICVs, and any defaults that apply. The data environment of the task is destroyed when the execution code of the associated structured-block is completed.
The encountering thread may immediately execute the task, or defer its execution. In the latter case, any thread in the team may be assigned the task. Completion of the task can be guaranteed using task synchronization constructs. If a task construct is encountered during execution of an outer task, the generated task region corresponding to this construct is not a part of the outer task region unless the generated task is an included task.
If a detach clause is present on a task construct, a new event allow-completion-event is created. The allow-completion-event is connected to the completion of the associated task region. The original event-handle will be updated to represent the allow-completion-event event before the task data environment is created. The event-handle will be considered as if it was specified on a firstprivate clause. The use of a variable in a detach clause expression of a task construct causes an implicit reference to the variable in all enclosing constructs.
If no detach clause is present on a task construct, the generated task is completed when the execution of its associated structured-block is completed. If a detach clause is present on a task construct, the task is completed when the execution of its associated structured-block is completed and the allow-completion-event is fulfilled.
When an if clause is present on a task construct, and the if clause expression evaluates to false, an undeferred task is generated, and the encountering thread must suspend the current task region, for which execution cannot be resumed until execution of the structured block that is associated with the generated task is completed. The use of a variable in an if clause expression of a task construct causes an implicit reference to the variable in all enclosing constructs.
When a final clause is present on a task construct and the final clause expression evaluates to true, the generated task will be a final task. All task constructs encountered during execution of a final task will generate final and included tasks. The use of a variable in a final clause expression of a task construct causes an implicit reference to the variable in all enclosing constructs. Encountering a task construct with the detach clause during the execution of a final task results in unspecified behavior.
The if clause expression and the final clause expression are evaluated in the context outside of the task construct, and no ordering of those evaluations is specified.
A thread that encounters a task scheduling point within the task region may temporarily suspend the task region. By default, a task is tied and its suspended task region can only be resumed by the thread that started its execution. If the untied clause is present on a task construct, any thread in the team can resume the task region after a suspension. The untied clause is ignored if a final clause is present on the same task construct and the final clause expression evaluates to true, or if a task is an included task.
The task construct includes a task scheduling point in the task region of its generating task, immediately following the generation of the explicit task. Each explicit task region includes a task scheduling point at the end of its associated structured-block.
When the mergeable clause is present on a task construct, the generated task is a mergeable task.
The priority clause is a hint for the priority of the generated task. The priority-value is a non-negative integer expression that provides a hint for task execution order. Among all tasks ready to be executed, higher priority tasks (those with a higher numerical value in the priority clause expression) are recommended to execute before lower priority ones. The default priority-value when no priority clause is specified is zero (the lowest priority). If a value is specified in the priority clause that is higher than the max-task-priority-var ICV then the implementation will use the value of that ICV. A program that relies on task execution order being determined by this priority-value may have unspecified behavior.
The affinity clause is a hint to indicate data affinity of the generated task. The task is recommended to execute closely to the location of the list items. A program that relies on the task execution location being determined by this list may have unspecified behavior.
The list items that appear in the affinity clause may reference iterators defined by an iterators-definition appearing in the same clause. The list items that appear in the affinity clause may include array sections.
C / C++
The list items that appear in the affinity clause may use shape-operators.
C / C++
If a list item appears in an affinity clause then data affinity refers to the original list item.
Note – When storage is shared by an explicit task region, the programmer must ensure, by adding proper synchronization, that the storage does not reach the end of its lifetime before the explicit task region completes its execution.
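As an illustration of the clauses described above, the following sketch computes Fibonacci numbers with explicit tasks. It is not a normative example: the names fib, fib_parallel and TASK_CUTOFF are hypothetical, and the cutoff value is an arbitrary assumption. The variables x and y are shared with the child tasks, so the taskwait is what keeps their storage alive until both tasks complete, as the note above requires. The final clause turns small subproblems into final tasks, whose descendants are included tasks.

```c
#include <assert.h>

/* Hypothetical cutoff below which no further deferred tasks are created. */
enum { TASK_CUTOFF = 16 };

/* Recursive Fibonacci; each call generates two explicit tasks. */
long fib(int n)
{
    assert(n >= 0);
    if (n < 2)
        return n;

    long x, y;
    /* x and y are shared so that the tasks write the parent's variables;
     * n is firstprivate by default in the generated tasks. */
    #pragma omp task shared(x) final(n <= TASK_CUTOFF)
    x = fib(n - 1);
    #pragma omp task shared(y) final(n <= TASK_CUTOFF)
    y = fib(n - 2);
    /* Wait for both child tasks before x and y go out of scope. */
    #pragma omp taskwait
    return x + y;
}

/* One thread of the team generates the root of the task tree;
 * any thread in the team may execute the deferred tasks. */
long fib_parallel(int n)
{
    long result = 0;
    #pragma omp parallel
    #pragma omp single
    result = fib(n);
    return result;
}
```

If the pragmas are ignored, the code degenerates to the ordinary sequential recursion and returns the same value.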
Execution Model Events
The task-create event occurs when a thread encounters a construct that causes a new task to be created. The event occurs after the task is initialized but before it begins execution or is deferred.
Tool Callbacks
A thread dispatches a registered ompt_callback_task_create callback for each occurrence of a task-create event in the context of the encountering task. This callback has the type signature ompt_callback_task_create_t and the flags argument indicates the task types shown in Table 2.7.
TABLE 2.7: ompt_callback_task_create callback flags evaluation
Operation                          Evaluates to true
(flags & ompt_task_explicit)       Always in the dispatched callback
(flags & ompt_task_undeferred)     If the task is an undeferred task
(flags & ompt_task_final)          If the task is a final task
(flags & ompt_task_untied)         If the task is an untied task
(flags & ompt_task_mergeable)      If the task is a mergeable task
(flags & ompt_task_merged)         If the task is a merged task
Restrictions
Restrictions to the task construct are as follows:
• A program that branches into or out of a task region is non-conforming.
• A program must not depend on any ordering of the evaluations of the clauses of the task directive, or on any side effects of the evaluations of the clauses.
• At most one if clause can appear on the directive.
• At most one final clause can appear on the directive.
• At most one priority clause can appear on the directive.
• At most one detach clause can appear on the directive.
• If a detach clause appears on the directive, then a mergeable clause cannot appear on the same directive.
C / C++
• A throw executed inside a task region must cause execution to resume within the same task region, and the same thread that threw the exception must catch it.
C / C++
Cross References
• Task scheduling constraints, see Section 2.10.6 on page 149.
• allocate clause, see Section 2.11.4 on page 158.
• if clause, see Section 2.15 on page 220.
• depend clause, see Section 2.17.11 on page 255.
• Data-sharing attribute clauses, Section 2.19.4 on page 282.
• default clause, see Section 2.19.4.1 on page 282.
• in_reduction clause, see Section 2.19.5.6 on page 303.
• omp_fulfill_event, see Section 3.5.1 on page 396.
• ompt_callback_task_create_t, see Section 4.5.2.7 on page 467.
2.10.2 taskloop Construct
Summary
The taskloop construct specifies that the iterations of one or more associated loops will be executed in parallel using explicit tasks. The iterations are distributed across tasks generated by the construct and scheduled to be executed.
Syntax
C / C++
The syntax of the taskloop construct is as follows:
    #pragma omp taskloop [clause[ [,] clause] ... ] new-line
        for-loops
where clause is one of the following:
    if([ taskloop :] scalar-expression)
    shared(list)
    private(list)
    firstprivate(list)
    lastprivate(list)
    reduction([default ,] reduction-identifier : list)
    in_reduction(reduction-identifier : list)
    default(shared | none)
    grainsize(grain-size)
    num_tasks(num-tasks)
    collapse(n)
    final(scalar-expr)
    priority(priority-value)
    untied
    mergeable
    nogroup
    allocate([allocator :] list)
The taskloop directive places restrictions on the structure of all associated for-loops. Specifically, all associated for-loops must have canonical loop form (see Section 2.9.1 on page 95).
C / C++
Fortran
The syntax of the taskloop construct is as follows:
    !$omp taskloop [clause[ [,] clause] ... ]
        do-loops
    [!$omp end taskloop]
where clause is one of the following:
    if([ taskloop :] scalar-logical-expression)
    shared(list)
    private(list)
    firstprivate(list)
    lastprivate(list)
    reduction([default ,] reduction-identifier : list)
    in_reduction(reduction-identifier : list)
    default(private | firstprivate | shared | none)
    grainsize(grain-size)
    num_tasks(num-tasks)
    collapse(n)
    final(scalar-logical-expr)
    priority(priority-value)
    untied
    mergeable
    nogroup
    allocate([allocator :] list)
If an end taskloop directive is not specified, an end taskloop directive is assumed at the end of the do-loops.
The taskloop directive places restrictions on the structure of all associated do-loops. Specifically, all associated do-loops must have canonical loop form (see Section 2.9.1 on page 95).
Fortran
Binding
The binding thread set of the taskloop region is the current team. A taskloop region binds to the innermost enclosing parallel region.
Description
The taskloop construct is a task generating construct. When a thread encounters a taskloop construct, the construct partitions the iterations of the associated loops into explicit tasks for parallel execution. The data environment of each generated task is created according to the data-sharing attribute clauses on the taskloop construct, per-data environment ICVs, and any defaults that apply. The order of the creation of the loop tasks is unspecified. Programs that rely on any execution order of the logical loop iterations are non-conforming.
By default, the taskloop construct executes as if it was enclosed in a taskgroup construct with no statements or directives outside of the taskloop construct. Thus, the taskloop construct creates an implicit taskgroup region. If the nogroup clause is present, no implicit taskgroup region is created.
If a reduction clause is present on the taskloop construct, the behavior is as if a task_reduction clause with the same reduction operator and list items was applied to the implicit taskgroup construct enclosing the taskloop construct. The taskloop construct executes as if each generated task was defined by a task construct on which an in_reduction clause with the same reduction operator and list items is present. Thus, the generated tasks are participants of the reduction defined by the task_reduction clause that was applied to the implicit taskgroup construct.
If an in_reduction clause is present on the taskloop construct, the behavior is as if each generated task was defined by a task construct on which an in_reduction clause with the same reduction operator and list items is present. Thus, the generated tasks are participants of a reduction previously defined by a reduction scoping clause.
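The reduction behavior described above can be sketched with a simple array sum. This is an illustrative assumption rather than a normative example, and the function name sum_array is hypothetical. Because the taskloop construct creates an implicit taskgroup region by default, the reduction is complete when the construct ends, so total can be read immediately afterwards.

```c
#include <assert.h>

/* Hypothetical helper: sums n integers using tasks generated by taskloop. */
long sum_array(const int *a, int n)
{
    assert(n >= 0);
    long total = 0;
    #pragma omp parallel
    #pragma omp single
    {
        /* Behaves as if task_reduction(+: total) were applied to the
         * implicit taskgroup and in_reduction(+: total) to each task. */
        #pragma omp taskloop reduction(+: total)
        for (int i = 0; i < n; i++)
            total += a[i];
    }
    return total;
}
```

Adding nogroup to this directive would be non-conforming, since a reduction clause on taskloop relies on the implicit taskgroup.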
If a grainsize clause is present on the taskloop construct, the number of logical loop iterations assigned to each generated task is greater than or equal to the minimum of the value of the grain-size expression and the number of logical loop iterations, but less than two times the value of the grain-size expression.
The parameter of the grainsize clause must be a positive integer expression. If num_tasks is specified, the taskloop construct creates as many tasks as the minimum of the num-tasks expression and the number of logical loop iterations. Each task must have at least one logical loop iteration. The parameter of the num_tasks clause must be a positive integer expression. If neither a grainsize nor num_tasks clause is present, the number of loop tasks generated and the number of logical loop iterations assigned to these tasks is implementation defined.
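The bounds stated above can be written out as arithmetic. The sketch below is only an illustration: grainsize_permits and num_tasks_generated are hypothetical helper names that restate the two rules, and scale shows a taskloop that uses grainsize with an arbitrarily chosen value of 64.

```c
#include <assert.h>

/* Hypothetical check: may a task receive `chunk` iterations when
 * grainsize(grain) is applied to a loop with `iterations` iterations?
 * Rule: min(grain, iterations) <= chunk < 2 * grain. */
int grainsize_permits(long grain, long iterations, long chunk)
{
    assert(grain > 0 && iterations >= 0);
    long lower = grain < iterations ? grain : iterations;
    return chunk >= lower && chunk < 2 * grain;
}

/* Hypothetical helper: number of tasks created for num_tasks(num_tasks),
 * namely min(num_tasks, iterations). */
long num_tasks_generated(long num_tasks, long iterations)
{
    assert(num_tasks > 0 && iterations >= 0);
    return num_tasks < iterations ? num_tasks : iterations;
}

/* Each generated task scales between 64 and 127 elements when n >= 64;
 * grainsize and num_tasks may not be combined on one directive. */
void scale(double *x, int n, double factor)
{
    assert(n >= 0);
    #pragma omp taskloop grainsize(64)
    for (int i = 0; i < n; i++)
        x[i] *= factor;
}
```

For example, with grainsize(64) and 1000 iterations a task may hold 64 to 127 iterations, while with only 10 iterations a single task holding all 10 satisfies the rule.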
The collapse clause may be used to specify how many loops are associated with the taskloop construct. The parameter of the collapse clause must be a constant positive integer expression. If no collapse clause is present or its parameter is 1, the only loop that is associated with the taskloop construct is the one that immediately follows the taskloop directive. If a collapse clause is specified with a parameter value greater than 1 and more than one loop is associated with the taskloop construct, then the iterations of all associated loops are collapsed into one larger iteration space that is then divided according to the grainsize and num_tasks clauses. The sequential execution of the iterations in all associated loops determines the order of the iterations in the collapsed iteration space.
If more than one loop is associated with the taskloop construct then the number of times that any intervening code between any two associated loops will be executed is unspecified but will be at least once per iteration of the loop enclosing the intervening code and at most once per iteration of the innermost loop associated with the construct. If the iteration count of any loop that is associated with the taskloop construct is zero and that loop does not enclose intervening code, the behavior is unspecified.
A taskloop loop has logical iterations numbered 0,1,…,N-1 where N is the number of loop iterations, and the logical numbering denotes the sequence in which the iterations would be executed if the set of associated loop(s) were executed sequentially. At the beginning of each logical iteration, the loop iteration variable of each associated loop has the value that it would have if the set of the associated loop(s) were executed sequentially.
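For a rectangular nest of two loops that both start at zero with unit stride, the logical numbering described above corresponds to row-major order. The following sketch rests on that assumption: logical_to_indices and fill_matrix are hypothetical names, and num_tasks(4) is an arbitrary choice.

```c
#include <assert.h>

/* Hypothetical helper: maps logical iteration k of a collapsed
 * rows x cols nest to the loop variables (i, j) it would have
 * if the nest were executed sequentially. */
void logical_to_indices(int k, int cols, int *i, int *j)
{
    assert(cols > 0 && k >= 0);
    *i = k / cols;
    *j = k % cols;
}

/* Stores in each matrix element the logical iteration number
 * of the iteration that writes it. */
void fill_matrix(int rows, int cols, int *m)
{
    assert(rows >= 0 && cols >= 0);
    /* Both loops are associated with the construct; their iterations
     * form one iteration space of rows * cols logical iterations. */
    #pragma omp taskloop collapse(2) num_tasks(4)
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            m[i * cols + j] = i * cols + j;
}
```

The order in which the generated tasks run is unspecified, so the sketch writes each element independently and does not rely on any execution order of the logical iterations.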
The iteration count for each associated loop is computed before entry to the outermost loop. If execution of any associated loop changes any of the values used to compute any of the iteration counts, then the behavior is unspecified.
The integer type (or kind, for Fortran) used to compute the iteration count for the collapsed loop is implementation defined.
When an if clause is present on a taskloop construct, and if the if clause expression evaluates to false, undeferred tasks are generated. The use of a variable in an if clause expression of a taskloop construct causes an implicit reference to the variable in all enclosing constructs.
When a final clause is present on a taskloop construct and the final clause expression evaluates to true, the generated tasks will be final tasks. The use of a variable in a final clause expression of a taskloop construct causes an implicit reference to the variable in all enclosing constructs.
When a priority clause is present on a taskloop construct, the generated tasks use the priority-value as if it was specified for each individual task. If the priority clause is not specified, tasks generated by the taskloop construct have the default task priority (zero).
If the untied clause is specified, all tasks generated by the taskloop construct are untied tasks.
When the mergeable clause is present on a taskloop construct, each generated task is a mergeable task.
C++
For firstprivate variables of class type, the number of invocations of copy constructors to perform the initialization is implementation-defined.
C++
Note – When storage is shared by a taskloop region, the programmer must ensure, by adding proper synchronization, that the storage does not reach the end of its lifetime before the taskloop region and its descendant tasks complete their execution.
Execution Model Events
The taskloop-begin event occurs after a task encounters a taskloop construct but before any other events that may trigger as a consequence of executing the taskloop. Specifically, a taskloop-begin event for a taskloop will precede the taskgroup-begin that occurs unless a nogroup clause is present. Regardless of whether an implicit taskgroup is present, a taskloop-begin will always precede any task-create events for generated tasks.
The taskloop-end event occurs after a taskloop region finishes execution but before resuming execution of the encountering task.
The taskloop-iteration-begin event occurs before an explicit task executes each iteration of a taskloop.
Tool Callbacks
A thread dispatches a registered ompt_callback_work callback for each occurrence of a taskloop-begin and taskloop-end event in that thread. The callback occurs in the context of the encountering task. The callback has type signature ompt_callback_work_t. The callback receives ompt_scope_begin or ompt_scope_end as its endpoint argument, as appropriate, and ompt_work_taskloop as its wstype argument.
A thread dispatches a registered ompt_callback_dispatch callback for each occurrence of a taskloop-iteration-begin event in that thread. The callback occurs in the context of the encountering task. The callback has type signature ompt_callback_dispatch_t.
Restrictions
The restrictions of the taskloop construct are as follows:
• A program that branches into or out of a taskloop region is non-conforming.
• No OpenMP directive may appear in the region between any associated loops.
• If a collapse clause is specified, exactly one loop must occur in the region at each nesting level up to the number of loops specified by the parameter of the collapse clause.
• If a reduction clause is present on the taskloop directive, the nogroup clause must not be specified.
• The same list item cannot appear in both a reduction and an in_reduction clause.
• At most one grainsize clause can appear on a taskloop directive.
• At most one num_tasks clause can appear on a taskloop directive.
• The grainsize clause and num_tasks clause are mutually exclusive and may not appear on the same taskloop directive.
• At most one collapse clause can appear on a taskloop directive.
• At most one if clause can appear on the directive.
• At most one final clause can appear on the directive.
• At most one priority clause can appear on the directive.
Cross References
• task construct, Section 2.10.1 on page 135.
• if clause, see Section 2.15 on page 220.
• taskgroup construct, Section 2.17.6 on page 232.
• Data-sharing attribute clauses, Section 2.19.4 on page 282.
• default clause, see Section 2.19.4.1 on page 282.
• ompt_scope_begin and ompt_scope_end, see Section 4.4.4.11 on page 443.
• ompt_work_taskloop, see Section 4.4.4.15 on page 445.
• ompt_callback_work_t, see Section 4.5.2.5 on page 464.
• ompt_callback_dispatch_t, see Section 4.5.2.6 on page 465.
2.10.3 taskloop simd Construct
Summary
The taskloop simd construct specifies a loop that can be executed concurrently using SIMD instructions and that those iterations will also be executed in parallel using explicit tasks. The taskloop simd construct is a composite construct.
Syntax
C / C++
The syntax of the taskloop simd construct is as follows:
    #pragma omp taskloop simd [clause[ [,] clause] ... ] new-line
        for-loops
where clause can be any of the clauses accepted by the taskloop or simd directives with identical meanings and restrictions.
C / C++
Fortran
The syntax of the taskloop simd construct is as follows:
    !$omp taskloop simd [clause[ [,] clause] ... ]
        do-loops
    [!$omp end taskloop simd]
where clause can be any of the clauses accepted by the taskloop or simd directives with identical meanings and restrictions.
If an end taskloop simd directive is not specified, an end taskloop simd directive is assumed at the end of the do-loops.
Fortran
Binding
The binding thread set of the taskloop simd region is the current team. A taskloop simd region binds to the innermost enclosing parallel region.
Description
The taskloop simd construct will first distribute the iterations of the associated loop(s) across tasks in a manner consistent with any clauses that apply to the taskloop construct. The resulting tasks will then be converted to a SIMD loop in a manner consistent with any clauses that apply to the simd construct, except for the collapse clause. For the purposes of each task’s conversion to a SIMD loop, the collapse clause is ignored and the effect of any in_reduction clause is as if a reduction clause with the same reduction operator and list items is present on the construct.
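A minimal sketch of the two-stage behavior described above, assuming a simple element-wise update: the function name saxpy_tasks and the grainsize value are illustrative assumptions. The iterations are first divided among tasks according to the grainsize clause, and the chunk of iterations in each task is then treated as a SIMD loop.

```c
#include <assert.h>

/* Hypothetical helper: y[i] = a * x[i] + y[i] for 0 <= i < n. */
void saxpy_tasks(int n, float a, const float *x, float *y)
{
    assert(n >= 0);
    /* grainsize applies to the taskloop part of the composite construct;
     * the simd part vectorizes the iterations inside each task. */
    #pragma omp taskloop simd grainsize(1024)
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

The iterations are independent of one another, which is what makes both the distribution across tasks and the SIMD execution safe in this sketch.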
Execution Model Events
This composite construct generates the same events as the taskloop construct.
Tool Callbacks
This composite construct dispatches the same callbacks as the taskloop construct.
Restrictions
• The restrictions for the taskloop and simd constructs apply.
• The conditional modifier may not appear in a lastprivate clause.
Cross References
• simd construct, see Section 2.9.3.1 on page 110.
• taskloop construct, see Section 2.10.2 on page 140.
• Data-sharing attribute clauses, see Section 2.19.4 on page 282.
2.10.4 taskyield Construct
Summary
The taskyield construct specifies that the current task can be suspended in favor of execution of a different task. The taskyield construct is a stand-alone directive.
Syntax
C / C++
The syntax of the taskyield construct is as follows:
    #pragma omp taskyield new-line
C / C++
Fortran
The syntax of the taskyield construct is as follows:
    !$omp taskyield
Fortran
Binding
A taskyield region binds to the current task region. The binding thread set of the taskyield region is the current team.
Description
The taskyield region includes an explicit task scheduling point in the current task region.
Cross References
• Task scheduling, see Section 2.10.6 on page 149.
2.10.5 Initial Task
Execution Model Events
No events are associated with the implicit parallel region in each initial thread.
The initial-thread-begin event occurs in an initial thread after the OpenMP runtime invokes the tool initializer but before the initial thread begins to execute the first OpenMP region in the initial task.
The initial-task-begin event occurs after an initial-thread-begin event but before the first OpenMP region in the initial task begins to execute.
The initial-task-end event occurs before an initial-thread-end event but after the last OpenMP region in the initial task finishes executing.
The initial-thread-end event occurs as the final event in an initial thread at the end of an initial task immediately prior to invocation of the tool finalizer.
Tool Callbacks
A thread dispatches a registered ompt_callback_thread_begin callback for the initial-thread-begin event in an initial thread. The callback occurs in the context of the initial thread. The callback has type signature ompt_callback_thread_begin_t. The callback receives ompt_thread_initial as its thread_type argument.
A thread dispatches a registered ompt_callback_implicit_task callback with ompt_scope_begin as its endpoint argument for each occurrence of an initial-task-begin event in that thread. Similarly, a thread dispatches a registered ompt_callback_implicit_task callback with ompt_scope_end as its endpoint argument for each occurrence of an initial-task-end event in that thread. The callbacks occur in the context of the initial task and have type signature ompt_callback_implicit_task_t. In the dispatched callback, (flag & ompt_task_initial) always evaluates to true.
A thread dispatches a registered ompt_callback_thread_end callback for the initial-thread-end event in that thread. The callback occurs in the context of the thread. The callback has type signature ompt_callback_thread_end_t. The implicit parallel region does not dispatch an ompt_callback_parallel_end callback; however, the implicit parallel region can be finalized within this ompt_callback_thread_end callback.
Cross References
• ompt_thread_initial, see Section 4.4.4.10 on page 443.
• ompt_task_initial, see Section 4.4.4.18 on page 446.
• ompt_callback_thread_begin_t, see Section 4.5.2.1 on page 459.
• ompt_callback_thread_end_t, see Section 4.5.2.2 on page 460.
• ompt_callback_parallel_begin_t, see Section 4.5.2.3 on page 461.
• ompt_callback_parallel_end_t, see Section 4.5.2.4 on page 463.
• ompt_callback_implicit_task_t, see Section 4.5.2.11 on page 471.
2.10.6 Task Scheduling
Whenever a thread reaches a task scheduling point, the implementation may cause it to perform a task switch, beginning or resuming execution of a different task bound to the current team. Task scheduling points are implied at the following locations:
• during the generation of an explicit task;
• the point immediately following the generation of an explicit task;
• after the point of completion of the structured block associated with a task;
• in a taskyield region;
• in a taskwait region;
• at the end of a taskgroup region;
• in an implicit barrier region;
• in an explicit barrier region;
• during the generation of a target region;
• the point immediately following the generation of a target region;
• at the beginning and end of a target data region;
• in a target update region;
• in a target enter data region;
• in a target exit data region;
• in the omp_target_memcpy routine;
• in the omp_target_memcpy_rect routine;
When a thread encounters a task scheduling point it may do one of the following, subject to the Task Scheduling Constraints (below):
• begin execution of a tied task bound to the current team;
• resume any suspended task region, bound to the current team, to which it is tied;
• begin execution of an untied task bound to the current team; or
• resume any suspended untied task region bound to the current team.
If more than one of the above choices is available, it is unspecified as to which will be chosen.
Task Scheduling Constraints are as follows:
1. Scheduling of new tied tasks is constrained by the set of task regions that are currently tied to the thread and that are not suspended in a barrier region. If this set is empty, any new tied task may be scheduled. Otherwise, a new tied task may be scheduled only if it is a descendent task of every task in the set.
2. A dependent task shall not start its execution until its task dependences are fulfilled.
3. A task shall not be scheduled while any task with which it is mutually exclusive has been scheduled, but has not yet completed.
4. When an explicit task is generated by a construct containing an if clause for which the expression evaluated to false, and the previous constraints are already met, the task is executed immediately after generation of the task.
A program relying on any other assumption about task scheduling is non-conforming.
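Constraint 4 can be observed with a task whose if clause expression is false. The sketch below is an illustration under simple assumptions: the function name undeferred_runs_first is hypothetical, and the task has no dependences and no mutual exclusion with other tasks, so constraints 1 to 3 are trivially met.

```c
#include <assert.h>

/* Hypothetical demonstration: returns 1 if the undeferred task's write
 * is ordered before the statement that follows the construct. */
int undeferred_runs_first(void)
{
    int order[2] = {0, 0};
    int pos = 0;

    /* if(0) generates an undeferred task: the generating task region is
     * suspended until the structured block below has completed. */
    #pragma omp task if(0) shared(order, pos)
    {
        order[pos++] = 1;
    }

    /* The undeferred task has already finished at this point. */
    assert(pos == 1);
    order[pos++] = 2;
    return order[0] == 1 && order[1] == 2;
}
```

Without the if(0) clause the task could be deferred, and reading pos after the construct without a taskwait would be a data race.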
5
6 Note – Task scheduling points dynamically divide task regions into parts. Each part is executed
7 uninterrupted from start to end. Different parts of the same task region are executed in the order in
8 which they are encountered. In the absence of task synchronization constructs, the order in which a
9 thread executes parts of different schedulable tasks is unspecified.
10 A program must behave correctly and consistently with all conceivable scheduling sequences that
11 are compatible with the rules above.
12 For example, if threadprivate storage is accessed (explicitly in the source code or implicitly
13 in calls to library routines) in one part of a task region, its value cannot be assumed to be preserved
14 into the next part of the same task region if another schedulable task exists that modifies it.
15 As another example, if a lock acquire and release happen in different parts of a task region, no
16 attempt should be made to acquire the same lock in any part of another task that the executing
17 thread may schedule. Otherwise, a deadlock is possible. A similar situation can occur when a
18 critical region spans multiple parts of a task and another schedulable task contains a
19 critical region with the same name.
20 The use of threadprivate variables and the use of locks or critical sections in an explicit task with an
21 if clause must take into account that when the if clause evaluates to false, the task is executed
22 immediately, without regard to Task Scheduling Constraint 2.
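The effect of an if clause that evaluates to false can be observed with a small C sketch. It is illustrative only and is not part of the specification; the function name undeferred_task_runs_first and its variables are hypothetical. Because the generating task cannot resume until the undeferred task completes, the task body is expected to run before the statement that follows the construct.

```c
#include <assert.h>

/* Hypothetical helper: records the order in which an if(0) task and the
 * code that follows it run. Returns 1 when the task body ran first. */
int undeferred_task_runs_first(void)
{
    int order = 0;       /* counter of completed steps */
    int task_step = 0;   /* step at which the task body ran */
    int after_step = 0;  /* step at which the code after the task ran */

    #pragma omp parallel num_threads(2) shared(order, task_step, after_step)
    #pragma omp single
    {
        /* if(0): the task is executed immediately after it is generated,
         * and the generating task is suspended until it completes. */
        #pragma omp task if(0) shared(order, task_step)
        {
            task_step = ++order;
        }
        after_step = ++order;
    }

    assert(order == 2);
    return task_step == 1 && after_step == 2;
}
```

The same source also compiles without OpenMP support, in which case the pragmas are ignored and the statements run in program order.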

Execution Model Events
The task-schedule event occurs in a thread when the thread switches tasks at a task scheduling point; no event occurs when switching to or from a merged task.
Tool Callbacks
A thread dispatches a registered ompt_callback_task_schedule callback for each occurrence of a task-schedule event in the context of the task that begins or resumes. This callback has the type signature ompt_callback_task_schedule_t. The argument prior_task_status is used to indicate the cause for suspending the prior task. This cause may be the completion of the prior task region, the encountering of a taskyield construct, or the encountering of an active cancellation point.
Cross References
• ompt_callback_task_schedule_t, see Section 4.5.2.10 on page 470.
2.11 Memory Management Directives
2.11.1 Memory Spaces
OpenMP memory spaces represent storage resources where variables can be stored and retrieved. Table 2.8 shows the list of predefined memory spaces. The selection of a given memory space expresses an intent to use storage with certain traits for the allocations. The actual storage resources that each memory space represents are implementation defined.
TABLE 2.8: Predefined Memory Spaces
Memory space name          Storage selection intent
omp_default_mem_space      Represents the system default storage.
omp_large_cap_mem_space    Represents storage with large capacity.
omp_const_mem_space        Represents storage optimized for variables with constant values. The result of writing to this storage is unspecified.
omp_high_bw_mem_space      Represents storage with high bandwidth.
omp_low_lat_mem_space      Represents storage with low latency.
Note – For variables allocated in the omp_const_mem_space memory space OpenMP supports initializing constant memory either by means of the firstprivate clause or through initialization with compile time constants for static and constant variables. Implementation-defined mechanisms to provide the constant value of these variables may also be supported.
Cross References
• omp_init_allocator routine, see Section 3.7.2 on page 409.
2.11.2 Memory Allocators
OpenMP memory allocators can be used by a program to make allocation requests. When a memory allocator receives a request to allocate storage of a certain size, an allocation of logically consecutive memory in the resources of its associated memory space of at least the size that was requested will be returned if possible. This allocation will not overlap with any other existing allocation from an OpenMP memory allocator.
The behavior of the allocation process can be affected by the allocator traits that the user specifies. Table 2.9 shows the allowed allocator traits, their possible values and the default value of each trait.
TABLE 2.9: Allocator Traits
Allocator trait   Allowed values                                     Default value
sync_hint         contended, uncontended, serialized, private        contended
alignment         A positive integer value that is a power of 2      1 byte
access            all, cgroup, pteam, thread                         all
pool_size         Positive integer value                             Implementation defined
fallback          default_mem_fb, null_fb, abort_fb, allocator_fb    default_mem_fb
fb_data           an allocator handle                                (none)
pinned            true, false                                        false
partition         environment, nearest, blocked, interleaved         environment
The sync_hint trait describes the expected manner in which multiple threads may use the allocator. The values and their description are:
• contended: high contention is expected on the allocator; that is, many threads are expected to request allocations simultaneously.
• uncontended: low contention is expected on the allocator; that is, few threads are expected to request allocations simultaneously.
• serialized: only one thread at a time will request allocations with the allocator. Requesting two allocations simultaneously when specifying serialized results in unspecified behavior.
• private: the same thread will request allocations with the allocator every time. Requesting an allocation from different threads, simultaneously or not, when specifying private results in unspecified behavior.
Allocated memory will be byte aligned to at least the value specified for the alignment trait of the allocator. Some directives and API routines can specify additional requirements on alignment beyond those described in this section.
Memory allocated by allocators with the access trait defined to be all must be accessible by all threads in the device where the allocation was requested. Memory allocated by allocators with the access trait defined to be cgroup will be memory accessible by all threads in the same contention group as the thread that requested the allocation. Attempts to access the memory returned by an allocator with the access trait defined to be cgroup from a thread that is not part of the same contention group as the thread that allocated the memory result in unspecified behavior. Memory allocated by allocators with the access trait defined to be pteam will be memory accessible by all threads that bind to the same parallel region of the thread that requested the allocation. Attempts to access the memory returned by an allocator with the access trait defined to be pteam from a thread that does not bind to the same parallel region as the thread that allocated the memory result in unspecified behavior. Memory allocated by allocators with the access trait defined to be thread will be memory accessible by the thread that requested the allocation. Attempts to access the memory returned by an allocator with the access trait defined to be thread from a thread other than the one that allocated the memory result in unspecified behavior.
The total amount of storage in bytes that an allocator can use is limited by the pool_size trait. For allocators with the access trait defined to be all, this limit refers to allocations from all threads that access the allocator. For allocators with the access trait defined to be cgroup, this limit refers to allocations from threads that access the allocator from the same contention group. For allocators with the access trait defined to be pteam, this limit refers to allocations from threads that access the allocator from the same parallel team. For allocators with the access trait defined to be thread, this limit refers to allocations from each thread that accesses the allocator. Requests that would result in using more storage than pool_size will not be fulfilled by the allocator.
The fallback trait specifies how the allocator behaves when it cannot fulfill an allocation request. If the fallback trait is set to null_fb, the allocator returns the value zero if it fails to allocate the memory. If the fallback trait is set to abort_fb, program execution will be terminated if the allocation fails. If the fallback trait is set to allocator_fb, then when an allocation fails the request will be delegated to the allocator specified in the fb_data trait. If the fallback trait is set to default_mem_fb, then when an allocation fails another allocation will be tried in the omp_default_mem_space memory space, which assumes all allocator traits to be set to their default values except for the fallback trait, which will be set to null_fb.
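The alignment and fallback traits can be illustrated with the allocator routines that Section 3.7 defines. The following C sketch is an assumption-laden illustration rather than normative text: the helper name aligned_sum and the macro USE_OMP_ALLOCATORS are hypothetical, and the preprocessor guard assumes that only compilers reporting OpenMP 5.0 or later through _OPENMP provide these routines. When the guard is not satisfied, the C11 aligned_alloc function stands in so that the sketch still builds.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: trust the allocator routines only when the compiler
 * reports OpenMP 5.0 (201811) or later. */
#if defined(_OPENMP) && _OPENMP >= 201811
#include <omp.h>
#define USE_OMP_ALLOCATORS 1
#else
#define USE_OMP_ALLOCATORS 0
#endif

/* Hypothetical helper: allocates n doubles aligned to 64 bytes, fills and
 * sums them, then frees the storage. Returns -1.0 if allocation fails. */
double aligned_sum(size_t n)
{
    assert(n > 0);
    size_t bytes = n * sizeof(double);
    double *buf;

#if USE_OMP_ALLOCATORS
    /* alignment: at least 64 bytes; fallback: return zero on failure */
    omp_alloctrait_t traits[] = {
        { omp_atk_alignment, 64 },
        { omp_atk_fallback, omp_atv_null_fb }
    };
    omp_allocator_handle_t al =
        omp_init_allocator(omp_default_mem_space, 2, traits);
    if (al == omp_null_allocator)
        return -1.0;
    buf = (double *)omp_alloc(bytes, al);
#else
    /* C11 stand-in: the size must be a multiple of the alignment */
    buf = (double *)aligned_alloc(64, (bytes + 63) / 64 * 64);
#endif

    double sum = -1.0;
    if (buf != NULL) {
        assert((uintptr_t)buf % 64 == 0);
        sum = 0.0;
        for (size_t i = 0; i < n; i++) {
            buf[i] = (double)i;
            sum += buf[i];
        }
    }

#if USE_OMP_ALLOCATORS
    if (buf != NULL)
        omp_free(buf, al);
    omp_destroy_allocator(al);
#else
    free(buf);
#endif
    return sum;
}
```

With null_fb the caller must check the returned pointer; with the default default_mem_fb value a second attempt in omp_default_mem_space would be made before zero is returned.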
Allocators with the pinned trait defined to be true ensure that their allocations remain in the same storage resource at the same location for their entire lifetime.
The partition trait describes the partitioning of allocated memory over the storage resources represented by the memory space associated with the allocator. The partitioning will be done in parts with a minimum size that is implementation defined. The values are:
• environment: the placement of allocated memory is determined by the execution environment.
• nearest: allocated memory is placed in the storage resource that is nearest to the thread that requests the allocation.
• blocked: allocated memory is partitioned into parts of approximately the same size with at most one part per storage resource.
• interleaved: allocated memory parts are distributed in a round-robin fashion across the storage resources.
Table 2.10 shows the list of predefined memory allocators and their associated memory spaces. The predefined memory allocators have default values for their allocator traits unless otherwise specified.
TABLE 2.10: Predefined Allocators
Allocator name             Associated memory space    Non-default trait values
omp_default_mem_alloc      omp_default_mem_space      (none)
omp_large_cap_mem_alloc    omp_large_cap_mem_space    (none)
omp_const_mem_alloc        omp_const_mem_space        (none)
omp_high_bw_mem_alloc      omp_high_bw_mem_space      (none)
omp_low_lat_mem_alloc      omp_low_lat_mem_space      (none)
omp_cgroup_mem_alloc       Implementation defined     access:cgroup
omp_pteam_mem_alloc        Implementation defined     access:pteam
omp_thread_mem_alloc       Implementation defined     access:thread
Fortran
If any operation of the base language causes a reallocation of an array that is allocated with a memory allocator then that memory allocator will be used to release the current memory and to allocate the new memory.
Fortran
Cross References
• omp_init_allocator routine, see Section 3.7.2 on page 409.
• omp_destroy_allocator routine, see Section 3.7.3 on page 410.
• omp_set_default_allocator routine, see Section 3.7.4 on page 411.
• omp_get_default_allocator routine, see Section 3.7.5 on page 412.
• OMP_ALLOCATOR environment variable, see Section 6.21 on page 618.
2.11.3 allocate Directive
Summary
The allocate directive specifies how a set of variables are allocated. The allocate directive is a declarative directive if it is not associated with an allocation statement.
Syntax
C / C++
The syntax of the allocate directive is as follows:
#pragma omp allocate(list) [clause] new-line
where clause is one of the following:
allocator(allocator)
where allocator is an expression of omp_allocator_handle_t type.
C / C++
Fortran
The syntax of the allocate directive is as follows:
!$omp allocate(list) [clause]
or
!$omp allocate[(list)] clause
[!$omp allocate(list) clause
[…]]
allocate statement
where clause is one of the following:
allocator(allocator)
where allocator is an integer expression of omp_allocator_handle_kind kind.
Fortran
Description
If the directive is not associated with a statement, the storage for each list item that appears in the directive will be provided by an allocation through a memory allocator. If no clause is specified then the memory allocator specified by the def-allocator-var ICV will be used. If the allocator clause is specified, the memory allocator specified in the clause will be used. The allocation of each list item will be byte aligned to at least the alignment required by the base language for the type of that list item.
The scope of this allocation is that of the list item in the base language. At the end of the scope for a given list item the memory allocator used to allocate that list item deallocates the storage.
Fortran
If the directive is associated with an allocate statement, the same list items appearing in the directive list and the allocate statement list are allocated with the memory allocator of the directive. If no list items are specified then all variables listed in the allocate statement are allocated with the memory allocator of the directive.
Fortran
For allocations that arise from this directive the null_fb value of the fallback allocator trait will behave as if the abort_fb had been specified.
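As an informal illustration of the declarative form in C, the sketch below requests the storage of a local array from a predefined allocator. The function name sum_via_scratch is hypothetical, and the preprocessor guard is an assumption: it enables the directive only for compilers that report OpenMP 5.0 or later, since support for the allocate directive varies between implementations. Another predefined allocator, such as omp_high_bw_mem_alloc, could be named instead.

```c
#include <assert.h>

/* Assumption: the allocate directive is used only when the compiler
 * reports OpenMP 5.0 (201811) or later. */
#if defined(_OPENMP) && _OPENMP >= 201811
#include <omp.h>
#endif

/* Hypothetical helper: sums 0..n-1 through a scratch array whose storage
 * is requested from a predefined memory allocator. */
int sum_via_scratch(int n)
{
    assert(n > 0 && n <= 64);

    int scratch[64];
#if defined(_OPENMP) && _OPENMP >= 201811
    /* Appears in the same scope as the declaration and after it. The
     * storage is released by the allocator at the end of this scope. */
    #pragma omp allocate(scratch) allocator(omp_default_mem_alloc)
#endif

    for (int i = 0; i < n; i++)
        scratch[i] = i;

    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += scratch[i];
    return sum;
}
```

Omitting the allocator clause would select the allocator named by the def-allocator-var ICV instead.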
Restrictions
• A variable that is part of another variable (as an array or structure element) cannot appear in an allocate directive.
• The allocate directive must appear in the same scope as the declarations of each of its list items and must follow all such declarations.
• At most one allocator clause can appear on the allocate directive.
• allocate directives that appear in a target region must specify an allocator clause unless a requires directive with the dynamic_allocators clause is present in the same compilation unit.
C / C++
• If a list item has a static storage type, the allocator expression in the allocator clause must be a constant expression that evaluates to one of the predefined memory allocator values.
• After a list item has been allocated, the scope that contains the allocate directive must not end abnormally, such as through a call to the longjmp function, other than through C++ exceptions.
C / C++
Fortran
• List items specified in the allocate directive must not have the ALLOCATABLE attribute unless the directive is associated with an allocate statement.
• List items specified in an allocate directive that is associated with an allocate statement must be variables that are allocated by the allocate statement.
• Multiple directives can only be associated with an allocate statement if list items are specified on each allocate directive.
• If a list item has the SAVE attribute, is a common block name, or is declared in the scope of a module, then only predefined memory allocator parameters can be used in the allocator clause.
• A type parameter inquiry cannot appear in an allocate directive.
Fortran
Cross References
• def-allocator-var ICV, see Section 2.5.1 on page 64.
• Memory allocators, see Section 2.11.2 on page 152.
• omp_allocator_handle_t and omp_allocator_handle_kind, see Section 3.7.1 on page 406.
2.11.4 allocate Clause
Summary
The allocate clause specifies the memory allocator to be used to obtain storage for private variables of a directive.
Syntax
The syntax of the allocate clause is as follows:
allocate([allocator:] list)
C / C++
where allocator is an expression of the omp_allocator_handle_t type.
C / C++
Fortran
where allocator is an integer expression of the omp_allocator_handle_kind kind.
Fortran
Description
The storage for new list items that arise from list items that appear in the directive will be provided through a memory allocator. If an allocator is specified in the clause, that allocator will be used for allocations. For all directives except the target directive, if no allocator is specified in the clause then the memory allocator that is specified by the def-allocator-var ICV will be used for the list items that are specified in the allocate clause. The allocation of each list item will be byte aligned to at least the alignment required by the base language for the type of that list item.
For allocations that arise from this clause the null_fb value of the fallback allocator trait will behave as if the abort_fb had been specified.
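A brief C sketch of the clause follows, given as an illustration under stated assumptions rather than as normative text. The function name count_iterations is hypothetical, and the preprocessor guard assumes that the clause is available only when the compiler reports OpenMP 5.0 or later. The private copies of local that the firstprivate clause creates are the new list items whose storage the named allocator provides.

```c
#include <assert.h>

/* Assumption: the allocate clause is used only when the compiler
 * reports OpenMP 5.0 (201811) or later. */
#if defined(_OPENMP) && _OPENMP >= 201811
#include <omp.h>
#endif

/* Hypothetical helper: counts loop iterations. Each thread counts into
 * its private copy of local and then adds that count to total. */
int count_iterations(int n)
{
    assert(n >= 0);
    int total = 0;
    int local = 0;

#if defined(_OPENMP) && _OPENMP >= 201811
    /* firstprivate creates the private copies; allocate names the
     * allocator that provides their storage. */
    #pragma omp parallel firstprivate(local) shared(total) \
            allocate(omp_default_mem_alloc: local)
#else
    #pragma omp parallel firstprivate(local) shared(total)
#endif
    {
        #pragma omp for
        for (int i = 0; i < n; i++)
            local++;

        #pragma omp atomic
        total += local;
    }
    return total;
}
```

Naming local in the allocate clause without also naming it in a privatizing clause on the same directive would violate the first restriction listed for the clause.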
Restrictions
• For any list item that is specified in the allocate clause on a directive, a data-sharing attribute clause that may create a private copy of that list item must be specified on the same directive.
• For task, taskloop or target directives, allocation requests to memory allocators with the trait access set to thread result in unspecified behavior.
• allocate clauses that appear on a target construct or on constructs in a target region must specify an allocator expression unless a requires directive with the dynamic_allocators clause is present in the same compilation unit.
Cross References
• def-allocator-var ICV, see Section 2.5.1 on page 64.
• Memory allocators, see Section 2.11.2 on page 152.
• omp_allocator_handle_t and omp_allocator_handle_kind, see Section 3.7.1 on page 406.
2.12 Device Directives
2.12.1 Device Initialization
Execution Model Events
The device-initialize event occurs in a thread that encounters the first target, target data, or target enter data construct or a device memory routine that is associated with a particular target device after the thread initiates initialization of OpenMP on the device and the device’s OpenMP initialization, which may include device-side tool initialization, completes.
The device-load event for a code block for a target device occurs in some thread before any thread executes code from that code block on that target device.
The device-unload event for a target device occurs in some thread whenever a code block is unloaded from the device.
The device-finalize event for a target device that has been initialized occurs in some thread before an OpenMP implementation shuts down.
Tool Callbacks
A thread dispatches a registered ompt_callback_device_initialize callback for each occurrence of a device-initialize event in that thread. This callback has type signature ompt_callback_device_initialize_t.
A thread dispatches a registered ompt_callback_device_load callback for each occurrence of a device-load event in that thread. This callback has type signature ompt_callback_device_load_t.
A thread dispatches a registered ompt_callback_device_unload callback for each occurrence of a device-unload event in that thread. This callback has type signature ompt_callback_device_unload_t.
A thread dispatches a registered ompt_callback_device_finalize callback for each occurrence of a device-finalize event in that thread. This callback has type signature ompt_callback_device_finalize_t.
Restrictions
No thread may offload execution of an OpenMP construct to a device until a dispatched ompt_callback_device_initialize callback completes.
No thread may offload execution of an OpenMP construct to a device after a dispatched ompt_callback_device_finalize callback occurs.
Cross References
• ompt_callback_device_load_t, see Section 4.5.2.21 on page 484.
• ompt_callback_device_unload_t, see Section 4.5.2.22 on page 486.
• ompt_callback_device_initialize_t, see Section 4.5.2.19 on page 482.
• ompt_callback_device_finalize_t, see Section 4.5.2.20 on page 484.
2.12.2 target data Construct
Summary
Map variables to a device data environment for the extent of the region.
Syntax
C / C++
The syntax of the target data construct is as follows:
#pragma omp target data clause[[[,]clause]…] new-line
structured-block
where clause is one of the following:
if([ target data :]scalar-expression)
device(integer-expression)
map([[map-type-modifier[,] [map-type-modifier[,] …] map-type: ] locator-list)
use_device_ptr(ptr-list)
use_device_addr(list)
C / C++
Fortran
The syntax of the target data construct is as follows:
!$omp target data clause[[[,]clause]…]
structured-block
!$omp end target data
where clause is one of the following:
if([ target data :]scalar-logical-expression)
device(scalar-integer-expression)
map([[map-type-modifier[,] [map-type-modifier[,] …] map-type: ] locator-list)
use_device_ptr(ptr-list)
use_device_addr(list)
Fortran
Binding
The binding task set for a target data region is the generating task. The target data region binds to the region of the generating task.
Description
When a target data construct is encountered, the encountering task executes the region. If there is no device clause, the default device is determined by the default-device-var ICV. When an if clause is present and the if clause expression evaluates to false, the device is the host. Variables are mapped for the extent of the region, according to any data-mapping attribute clauses, from the data environment of the encountering task to the device data environment.
Pointers that appear in a use_device_ptr clause are privatized and the device pointers to the corresponding list items in the device data environment are assigned into the private versions.
List items that appear in a use_device_addr clause have the address of the corresponding object in the device data environment inside the construct. For objects, any reference to the value of the object will be to the corresponding object on the device, while references to the address will result in a valid device address that points to that object. Array sections privatize the base of the array section and assign the private copy to the address of the corresponding array section in the device data environment.
If one or more of the use_device_ptr or use_device_addr clauses and one or more map clauses are present on the same construct, the address conversions of use_device_addr and use_device_ptr clauses will occur as if performed after all variables are mapped according to those map clauses.
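The mapping behavior described above can be sketched in C as follows. The example is illustrative only; the function name saxpy_twice is hypothetical. Two target regions execute inside one target data region, so the arrays are mapped once for the extent of the enclosing region rather than once per target region.

```c
#include <assert.h>

/* Hypothetical helper: applies y[i] = a * x[i] + y[i] twice, in two
 * target regions that share one device data environment. */
void saxpy_twice(int n, float a, const float *x, float *y)
{
    assert(n > 0);

    /* x is copied to the device on entry; y is copied to the device on
     * entry and back to the host on exit from the region. */
    #pragma omp target data map(to: x[0:n]) map(tofrom: y[0:n])
    {
        /* Both sections are already present, so these map clauses
         * cause no additional transfers. */
        #pragma omp target map(to: x[0:n]) map(tofrom: y[0:n])
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];

        #pragma omp target map(to: x[0:n]) map(tofrom: y[0:n])
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }
}
```

If no device is available, or the source is compiled without OpenMP support, the loops run on the host and produce the same values.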
Execution Model Events
The events associated with entering a target data region are the same events as associated with a target enter data construct, described in Section 2.12.3 on page 164.
The events associated with exiting a target data region are the same events as associated with a target exit data construct, described in Section 2.12.4 on page 166.
Tool Callbacks
The tool callbacks dispatched when entering a target data region are the same as the tool callbacks dispatched when encountering a target enter data construct, described in Section 2.12.3 on page 164.
The tool callbacks dispatched when exiting a target data region are the same as the tool callbacks dispatched when encountering a target exit data construct, described in Section 2.12.4 on page 166.
Restrictions
• A program must not depend on any ordering of the evaluations of the clauses of the target data directive, except as explicitly stated for map clauses relative to use_device_ptr and use_device_addr clauses, or on any side effects of the evaluations of the clauses.
• At most one device clause can appear on the directive. The device clause expression must evaluate to a non-negative integer value less than the value of omp_get_num_devices() or to the value of omp_get_initial_device().
• At most one if clause can appear on the directive.
• A map-type in a map clause must be to, from, tofrom or alloc.
• At least one map, use_device_addr or use_device_ptr clause must appear on the directive.
• A list item in a use_device_ptr clause must hold the address of an object that has a corresponding list item in the device data environment.
• A list item in a use_device_addr clause must have a corresponding list item in the device data environment.
• A list item that specifies a given variable may not appear in more than one use_device_ptr clause.
• A reference to a list item in a use_device_addr clause must be to the address of the list item.
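The restriction on the device clause expression can be checked before a construct is encountered. The C sketch below is one possible way to do so and is not part of the specification; host_device_num, is_valid_device_num and pick_device are hypothetical helper names. When the source is compiled without OpenMP support, the sketch assumes that the host is the only device and numbers it 0.

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: the device number that denotes the host. */
int host_device_num(void)
{
#ifdef _OPENMP
    return omp_get_initial_device();
#else
    return 0;  /* assumption: without OpenMP, the host is device 0 */
#endif
}

/* Hypothetical helper: nonzero if dev satisfies the restriction on the
 * device clause expression. */
int is_valid_device_num(int dev)
{
#ifdef _OPENMP
    return (dev >= 0 && dev < omp_get_num_devices())
        || dev == omp_get_initial_device();
#else
    return dev == 0;
#endif
}

/* Hypothetical helper: returns requested if it is a valid device number
 * and the host device number otherwise. */
int pick_device(int requested)
{
    int dev = is_valid_device_num(requested) ? requested : host_device_num();
    assert(is_valid_device_num(dev));
    return dev;
}
```

The value returned by pick_device could then be passed to a device clause, for example device(pick_device(d)) for some requested device number d.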
Cross References
• default-device-var, see Section 2.5 on page 63.
• if Clause, see Section 2.15 on page 220.
• map clause, see Section 2.19.7.1 on page 315.
• omp_get_num_devices routine, see Section 3.2.36 on page 371.
• ompt_callback_target_t, see Section 4.5.2.26 on page 490.
2.12.3 target enter data Construct
Summary
The target enter data directive specifies that variables are mapped to a device data environment. The target enter data directive is a stand-alone directive.
Syntax
C / C++
The syntax of the target enter data construct is as follows:
#pragma omp target enter data [clause[[,]clause]…] new-line
where clause is one of the following:
if([ target enter data :]scalar-expression)
device(integer-expression)
map([map-type-modifier[,] [map-type-modifier[,] …] map-type: locator-list)
depend([depend-modifier,] dependence-type : locator-list)
nowait
C / C++
Fortran
The syntax of the target enter data construct is as follows:
!$omp target enter data [clause[[,]clause]…]
where clause is one of the following:
if([ target enter data :]scalar-logical-expression)
device(scalar-integer-expression)
map([map-type-modifier[,] [map-type-modifier[,] …] map-type: locator-list)
depend([depend-modifier,] dependence-type : locator-list)
nowait
Fortran
Binding
The binding task set for a target enter data region is the generating task, which is the target task generated by the target enter data construct. The target enter data region binds to the corresponding target task region.
Description
When a target enter data construct is encountered, the list items are mapped to the device data environment according to the map clause semantics.
The target enter data construct is a task generating construct. The generated task is a target task. The generated task region encloses the target enter data region.
All clauses are evaluated when the target enter data construct is encountered. The data environment of the target task is created according to the data-sharing attribute clauses on the target enter data construct, per-data environment ICVs, and any default data-sharing attribute rules that apply to the target enter data construct. A variable that is mapped in the target enter data construct has a default data-sharing attribute of shared in the data environment of the target task.
Assignment operations associated with mapping a variable (see Section 2.19.7.1 on page 315) occur when the target task executes.
If the nowait clause is present, execution of the target task may be deferred. If the nowait clause is not present, the target task is an included task.
If a depend clause is present, it is associated with the target task.
If no device clause is present, the default device is determined by the default-device-var ICV.
When an if clause is present and the if clause expression evaluates to false, the device is the host.
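The interaction of the nowait and depend clauses with the generated target task can be sketched in C as follows. The example is illustrative only and the function name double_async is hypothetical. Three target tasks are generated; each may be deferred, and their order is established only through the dependences on the same array section.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: doubles every element of a[0:n] through three
 * target tasks that are ordered only by their depend clauses. */
void double_async(double *a, int n)
{
    assert(a != NULL && n > 0);

    /* Map a to the device; with nowait the target task may be deferred. */
    #pragma omp target enter data map(to: a[0:n]) nowait depend(out: a[0:n])

    /* Runs after the mapping task; a is already present on the device. */
    #pragma omp target map(tofrom: a[0:n]) nowait depend(inout: a[0:n])
    for (int i = 0; i < n; i++)
        a[i] *= 2.0;

    /* Copies the result back and unmaps a after the computation. */
    #pragma omp target exit data map(from: a[0:n]) nowait depend(inout: a[0:n])

    /* Wait for the three child target tasks before returning. */
    #pragma omp taskwait
}
```

Without the nowait clauses each target task would be an included task, and the depend clauses and the taskwait construct would not be needed to obtain the same result.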
Execution Model Events
Events associated with a target task are the same as for the task construct defined in Section 2.10.1 on page 135.
The target-enter-data-begin event occurs when a thread enters a target enter data region.
The target-enter-data-end event occurs when a thread exits a target enter data region.
Tool Callbacks
Callbacks associated with events for target tasks are the same as for the task construct defined in Section 2.10.1 on page 135; (flags & ompt_task_target) always evaluates to true in the dispatched callback.
A thread dispatches a registered ompt_callback_target callback with ompt_scope_begin as its endpoint argument and ompt_target_enter_data as its kind argument for each occurrence of a target-enter-data-begin event in that thread in the context of the target task on the host. Similarly, a thread dispatches a registered ompt_callback_target callback with ompt_scope_end as its endpoint argument and ompt_target_enter_data as its kind argument for each occurrence of a target-enter-data-end event in that thread in the context of the target task on the host. These callbacks have type signature ompt_callback_target_t.
Restrictions
• A program must not depend on any ordering of the evaluations of the clauses of the target enter data directive, or on any side effects of the evaluations of the clauses.
• At least one map clause must appear on the directive.
• At most one device clause can appear on the directive. The device clause expression must evaluate to a non-negative integer value less than the value of omp_get_num_devices() or to the value of omp_get_initial_device().
• At most one if clause can appear on the directive.
• A map-type must be specified in all map clauses and must be either to or alloc.
• At most one nowait clause can appear on the directive.
Cross References
• default-device-var, see Section 2.5.1 on page 64.
• task, see Section 2.10.1 on page 135.
• task scheduling constraints, see Section 2.10.6 on page 149.
• target data, see Section 2.12.2 on page 161.
• target exit data, see Section 2.12.4 on page 166.
• if Clause, see Section 2.15 on page 220.
• map clause, see Section 2.19.7.1 on page 315.
• omp_get_num_devices routine, see Section 3.2.36 on page 371.
• ompt_callback_target_t, see Section 4.5.2.26 on page 490.
2.12.4 target exit data Construct
Summary
The target exit data directive specifies that list items are unmapped from a device data environment. The target exit data directive is a stand-alone directive.
Syntax
C / C++
The syntax of the target exit data construct is as follows:
#pragma omp target exit data [clause[[,]clause]…] new-line
where clause is one of the following:
if([ target exit data :]scalar-expression)
device(integer-expression)
map([map-type-modifier[,] [map-type-modifier[,] …] map-type: locator-list)
depend([depend-modifier,] dependence-type : locator-list)
nowait
C / C++
Fortran
The syntax of the target exit data construct is as follows:
!$omp target exit data [clause[[,]clause]…]
where clause is one of the following:
if([ target exit data :]scalar-logical-expression)
device(scalar-integer-expression)
map([map-type-modifier[,] [map-type-modifier[,] …] map-type: locator-list)
depend([depend-modifier,] dependence-type : locator-list)
nowait
Fortran
Binding
The binding task set for a target exit data region is the generating task, which is the target task generated by the target exit data construct. The target exit data region binds to the corresponding target task region.
Description
When a target exit data construct is encountered, the list items in the map clauses are unmapped from the device data environment according to the map clause semantics.
The target exit data construct is a task generating construct. The generated task is a target task. The generated task region encloses the target exit data region.
All clauses are evaluated when the target exit data construct is encountered. The data environment of the target task is created according to the data-sharing attribute clauses on the target exit data construct, per-data environment ICVs, and any default data-sharing attribute rules that apply to the target exit data construct. A variable that is mapped in the target exit data construct has a default data-sharing attribute of shared in the data environment of the target task.
Assignment operations associated with mapping a variable (see Section 2.19.7.1 on page 315) occur when the target task executes.
If the nowait clause is present, execution of the target task may be deferred. If the nowait clause is not present, the target task is an included task.
If a depend clause is present, it is associated with the target task.
If no device clause is present, the default device is determined by the default-device-var ICV. When an if clause is present and the if clause expression evaluates to false, the device is the host.
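The map types that are permitted on the target exit data construct can be illustrated with a short C sketch. It is given for illustration only; the function name square_on_device and its parameters are hypothetical. The from map type copies a result back to the host before unmapping, while the release map type unmaps data whose host copy does not need to be updated.

```c
#include <assert.h>

/* Hypothetical helper: computes out[i] = in[i] * in[i] with the help of
 * a scratch buffer whose contents are never copied back to the host.
 * The contents of the host scratch array after the call are left
 * unspecified by this sketch. */
void square_on_device(const double *in, double *out, double *scratch, int n)
{
    assert(n > 0);

    /* in is copied to the device; out and scratch only receive
     * device storage. */
    #pragma omp target enter data map(to: in[0:n]) \
            map(alloc: out[0:n], scratch[0:n])

    /* All three sections are already present on the device. */
    #pragma omp target map(to: in[0:n]) map(alloc: out[0:n], scratch[0:n])
    for (int i = 0; i < n; i++) {
        scratch[i] = in[i];
        out[i] = scratch[i] * scratch[i];
    }

    /* from: copy out back to the host; release: unmap in and scratch
     * without copying them. */
    #pragma omp target exit data map(from: out[0:n]) \
            map(release: in[0:n], scratch[0:n])
}
```

The delete map type could replace release when the data must be removed from the device data environment regardless of how many times it was mapped.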
Execution Model Events
Events associated with a target task are the same as for the task construct defined in Section 2.10.1 on page 135.
The target-exit-data-begin event occurs when a thread enters a target exit data region. The target-exit-data-end event occurs when a thread exits a target exit data region.
Tool Callbacks
Callbacks associated with events for target tasks are the same as for the task construct defined in Section 2.10.1 on page 135; (flags & ompt_task_target) always evaluates to true in the dispatched callback.
A thread dispatches a registered ompt_callback_target callback with ompt_scope_begin as its endpoint argument and ompt_target_exit_data as its kind argument for each occurrence of a target-exit-data-begin event in that thread in the context of the target task on the host. Similarly, a thread dispatches a registered ompt_callback_target callback with ompt_scope_end as its endpoint argument and ompt_target_exit_data as its kind argument for each occurrence of a target-exit-data-end event in that thread in the context of the target task on the host. These callbacks have type signature ompt_callback_target_t.
Restrictions
• A program must not depend on any ordering of the evaluations of the clauses of the target exit data directive, or on any side effects of the evaluations of the clauses.
• At least one map clause must appear on the directive.
• At most one device clause can appear on the directive. The device clause expression must evaluate to a non-negative integer value less than the value of omp_get_num_devices() or to the value of omp_get_initial_device().
• At most one if clause can appear on the directive.
• A map-type must be specified in all map clauses and must be either from, release, or delete.
• At most one nowait clause can appear on the directive.
Cross References
• default-device-var, see Section 2.5.1 on page 64.
• task, see Section 2.10.1 on page 135.
• task scheduling constraints, see Section 2.10.6 on page 149.
• target data, see Section 2.12.2 on page 161.
• target enter data, see Section 2.12.3 on page 164.
• if Clause, see Section 2.15 on page 220.
• map clause, see Section 2.19.7.1 on page 315.
• omp_get_num_devices routine, see Section 3.2.36 on page 371.
• ompt_callback_target_t, see Section 4.5.2.26 on page 490.
2.12.5 target Construct
Summary
Map variables to a device data environment and execute the construct on that device.
Syntax
C / C++
The syntax of the target construct is as follows:
#pragma omp target [clause[[,]clause]…]new-line
structured-block
where clause is one of the following:
if([ target :]scalar-expression)
device([device-modifier :]integer-expression)
private(list)
firstprivate(list)
in_reduction(reduction-identifier : list)
map([[map-type-modifier[,] [map-type-modifier[,] …] map-type: ] locator-list)
is_device_ptr(list)
defaultmap(implicit-behavior[:variable-category])
nowait
depend([depend-modifier,] dependence-type : locator-list)
allocate([allocator :] list)
uses_allocators(allocator[(allocator-traits-array)]
[,allocator[(allocator-traits-array)] …])
and where device-modifier is one of the following:
ancestor
device_num
and where allocator is an identifier of omp_allocator_handle_t type and allocator-traits-array is an identifier of const omp_alloctrait_t * type.
C / C++
Fortran
The syntax of the target construct is as follows:
!$omp target [clause[[,]clause]…]
structured-block
!$omp end target
where clause is one of the following:
if([ target :]scalar-logical-expression)
device([device-modifier :]scalar-integer-expression)
private(list)
firstprivate(list)
in_reduction(reduction-identifier : list)
map([[map-type-modifier[,] [map-type-modifier[,] …] map-type: ] locator-list)
is_device_ptr(list)
defaultmap(implicit-behavior[:variable-category])
nowait
depend([depend-modifier,] dependence-type : locator-list)
allocate([allocator:]list)
uses_allocators(allocator[(allocator-traits-array)] [,allocator[(allocator-traits-array)] …])
and where device-modifier is one of the following:
ancestor
device_num
and where allocator is an integer expression of omp_allocator_handle_kind kind and allocator-traits-array is an array of type(omp_alloctrait) type.
Fortran
Binding
The binding task set for a target region is the generating task, which is the target task generated by the target construct. The target region binds to the corresponding target task region.
Description
The target construct provides a superset of the functionality provided by the target data directive, except for the use_device_ptr and use_device_addr clauses.
The functionality added to the target directive is the inclusion of an executable region to be executed by a device. That is, the target directive is an executable directive.
The target construct is a task generating construct. The generated task is a target task. The generated task region encloses the target region.
All clauses are evaluated when the target construct is encountered. The data environment of the target task is created according to the data-sharing attribute clauses on the target construct, per-data environment ICVs, and any default data-sharing attribute rules that apply to the target construct. If a variable or part of a variable is mapped by the target construct and does not appear as a list item in an in_reduction clause on the construct, the variable has a default data-sharing attribute of shared in the data environment of the target task.
Assignment operations associated with mapping a variable (see Section 2.19.7.1 on page 315) occur when the target task executes.
If a device clause in which the device_num device-modifier appears is present on the construct, the device clause expression specifies the device number of the target device. If device-modifier does not appear in the clause, the behavior of the clause is as if device-modifier is device_num.
If a device clause in which the ancestor device-modifier appears is present on the target construct and the device clause expression evaluates to 1, execution of the target region occurs on the parent device of the enclosing target region. If the target construct is not encountered in a target region, the current device is treated as the parent device. The encountering thread waits for completion of the target region on the parent device before resuming. For any list item that appears in a map clause on the same construct, if the corresponding list item exists in the device data environment of the parent device, it is treated as if it has a reference count of positive infinity.
If the nowait clause is present, execution of the target task may be deferred. If the nowait clause is not present, the target task is an included task.
If a depend clause is present, it is associated with the target task.
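As a minimal sketch of how nowait and depend interact on the target construct, the following C function chains two target tasks through a dependence on x. The name chained_offload is hypothetical; the sketch assumes the default device and falls back to sequential host execution when OpenMP is not enabled.

```c
/* Hypothetical helper (not part of the OpenMP API). */
int chained_offload(int x)
{
    int y = 0;

    /* First target task: nowait allows its execution to be deferred. */
    #pragma omp target map(tofrom: x) nowait depend(out: x)
    x = x + 1;

    /* Second target task: the in dependence on x orders it after the first. */
    #pragma omp target map(to: x) map(from: y) nowait depend(in: x) depend(out: y)
    y = x * 10;

    /* Wait for both deferred target tasks before the host reads y. */
    #pragma omp taskwait
    return y;
}
```

Without the taskwait, the host could read y before the second target task has executed, because neither task is an included task once nowait is present.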
When an if clause is present and the if clause expression evaluates to false, the target region is executed by the host device in the host data environment.
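A common use of this rule is to skip offloading for small inputs. The sketch below assumes a hypothetical threshold OFFLOAD_THRESHOLD and a hypothetical function sum_squares; neither comes from the specification.

```c
#include <stddef.h>

/* Hypothetical cut-off below which offloading is assumed not to pay off. */
#define OFFLOAD_THRESHOLD 1000

/* Hypothetical helper: returns the sum of v[i] * v[i] over v[0:n]. */
long sum_squares(const int *v, size_t n)
{
    long total = 0;

    /* When the if expression is false, the region is executed by the
       host device in the host data environment. */
    #pragma omp target if(target: n >= OFFLOAD_THRESHOLD) \
            map(to: v[0:n]) map(tofrom: total)
    for (size_t i = 0; i < n; ++i)
        total += (long)v[i] * v[i];

    return total;
}
```

The result is the same on either path; only the place of execution changes.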
The is_device_ptr clause is used to indicate that a list item is a device pointer already in the device data environment and that it should be used directly. Support for device pointers created outside of OpenMP, specifically outside of the omp_target_alloc routine and the use_device_ptr clause, is implementation defined.
If a function (C, C++, Fortran) or subroutine (Fortran) is referenced in a target construct then that function or subroutine is treated as if its name had appeared in a to clause on a declare target directive.
Each memory allocator specified in the uses_allocators clause will be made available in the target region. For each non-predefined allocator that is specified, a new allocator handle will be associated with an allocator that is created with the specified traits as if by a call to omp_init_allocator at the beginning of the target region. Each non-predefined allocator will be destroyed as if by a call to omp_destroy_allocator at the end of the target region.
C / C++
If a list item in a map clause has a base pointer and it is a scalar variable with a predetermined data-sharing attribute of firstprivate (see Section 2.19.1.1 on page 270), then on entry to the target region:
• If the list item is not a zero-length array section, the corresponding private variable is initialized such that the corresponding list item in the device data environment can be accessed through the pointer in the target region.
• If the list item is a zero-length array section, the corresponding private variable is initialized such that the corresponding storage location of the array section can be referenced through the pointer in the target region. If the corresponding storage location is not present in the device data environment, the corresponding private variable is initialized to NULL.
C / C++
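The base-pointer rule above is what makes the usual pointer-plus-array-section idiom work. A minimal sketch, using a hypothetical function add_one and assuming the default device:

```c
#include <stddef.h>

/* Hypothetical helper (not part of the OpenMP API): increments p[0:n]. */
void add_one(int *p, size_t n)
{
    /* p is a scalar pointer and is firstprivate in the target region.
       Because p[0:n] is mapped and is not a zero-length section, the
       private copy of p is initialized so that p[i] accesses the
       corresponding list item in the device data environment. */
    #pragma omp target map(tofrom: p[0:n])
    for (size_t i = 0; i < n; ++i)
        p[i] += 1;
}
```

The tofrom map-type copies the section to the device on entry and back to the host on exit, so the caller observes the incremented values.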
Execution Model Events
Events associated with a target task are the same as for the task construct defined in Section 2.10.1 on page 135.
Events associated with the initial task that executes the target region are defined in Section 2.10.5 on page 148.
The target-begin event occurs when a thread enters a target region.
The target-end event occurs when a thread exits a target region.
The target-submit event occurs prior to creating an initial task on a target device for a target region.
Tool Callbacks
Callbacks associated with events for target tasks are the same as for the task construct defined in Section 2.10.1 on page 135; (flags & ompt_task_target) always evaluates to true in the dispatched callback.
A thread dispatches a registered ompt_callback_target callback with ompt_scope_begin as its endpoint argument and ompt_target as its kind argument for each occurrence of a target-begin event in that thread in the context of the target task on the host. Similarly, a thread dispatches a registered ompt_callback_target callback with ompt_scope_end as its endpoint argument and ompt_target as its kind argument for each occurrence of a target-end event in that thread in the context of the target task on the host. These callbacks have type signature ompt_callback_target_t.
A thread dispatches a registered ompt_callback_target_submit callback for each occurrence of a target-submit event in that thread. The callback has type signature ompt_callback_target_submit_t.
Restrictions
• If a target update, target data, target enter data, or target exit data construct is encountered during execution of a target region, the behavior is unspecified.
• The result of an omp_set_default_device, omp_get_default_device, or omp_get_num_devices routine called within a target region is unspecified.
• The effect of an access to a threadprivate variable in a target region is unspecified.
• If a list item in a map clause is a structure element, any other element of that structure that is referenced in the target construct must also appear as a list item in a map clause.
• A variable referenced in a target region but not the target construct that is not declared in the target region must appear in a declare target directive.
• At most one defaultmap clause for each category can appear on the directive.
• At most one nowait clause can appear on the directive.
• A map-type in a map clause must be to, from, tofrom or alloc.
• A list item that appears in an is_device_ptr clause must be a valid device pointer in the device data environment.
• At most one device clause can appear on the directive. The device clause expression must evaluate to a non-negative integer value less than the value of omp_get_num_devices() or to the value of omp_get_initial_device().
• If a device clause in which the ancestor device-modifier appears is present on the construct, then the following restrictions apply:
– A requires directive with the reverse_offload clause must be specified;
– The device clause expression must evaluate to 1;
– Only the device, firstprivate, private, defaultmap, and map clauses may appear on the construct;
– No OpenMP constructs or calls to OpenMP API runtime routines are allowed inside the corresponding target region.
• Memory allocators that do not appear in a uses_allocators clause cannot appear as an allocator in an allocate clause or be used in the target region unless a requires directive with the dynamic_allocators clause is present in the same compilation unit.
• Memory allocators that appear in a uses_allocators clause cannot appear in other data-sharing attribute clauses or data-mapping attribute clauses in the same construct.
• Predefined allocators appearing in a uses_allocators clause cannot have traits specified.
• Non-predefined allocators appearing in a uses_allocators clause must have traits specified.
• Arrays that contain allocator traits that appear in a uses_allocators clause must be constant arrays, have constant values and be defined in the same scope as the construct in which the clause appears.
• Any IEEE floating-point exception status flag, halting mode, or rounding mode set prior to a target region is unspecified in the region.
• Any IEEE floating-point exception status flag, halting mode, or rounding mode set in a target region is unspecified upon exiting the region.
C / C++
• An attached pointer must not be modified in a target region.
C / C++
C
• A list item that appears in an is_device_ptr clause must have a type of pointer or array.
C
C++
• A list item that appears in an is_device_ptr clause must have a type of pointer, array, reference to pointer or reference to array.
• The effect of invoking a virtual member function of an object on a device other than the device on which the object was constructed is implementation defined.
• A throw executed inside a target region must cause execution to resume within the same target region, and the same thread that threw the exception must catch it.
C++
Fortran
• An attached pointer that is associated with a given pointer target must not become associated with a different pointer target in a target region.
• A list item that appears in an is_device_ptr clause must be a dummy argument that does not have the ALLOCATABLE, POINTER or VALUE attribute.
• If a list item in a map clause is an array section, and the array section is derived from a variable with a POINTER or ALLOCATABLE attribute then the behavior is unspecified if the corresponding list item’s variable is modified in the region.
Fortran
Cross References
• default-device-var, see Section 2.5 on page 63.
• task construct, see Section 2.10.1 on page 135.
• task scheduling constraints, see Section 2.10.6 on page 149.
• Memory allocators, see Section 2.11.2 on page 152.
• target data construct, see Section 2.12.2 on page 161.
• if Clause, see Section 2.15 on page 220.
• private and firstprivate clauses, see Section 2.19.4 on page 282.
• Data-Mapping Attribute Rules and Clauses, see Section 2.19.7 on page 314.
• omp_get_num_devices routine, see Section 3.2.36 on page 371.
• omp_alloctrait_t and omp_alloctrait types, see Section 3.7.1 on page 406.
• omp_set_default_allocator routine, see Section 3.7.4 on page 411.
• omp_get_default_allocator routine, see Section 3.7.5 on page 412.
• ompt_callback_target_t, see Section 4.5.2.26 on page 490.
• ompt_callback_target_submit_t, see Section 4.5.2.28 on page 494.
2.12.6 target update Construct
Summary
The target update directive makes the corresponding list items in the device data environment consistent with their original list items, according to the specified motion clauses. The target update construct is a stand-alone directive.
Syntax
C / C++
The syntax of the target update construct is as follows:
#pragma omp target update clause[[[,]clause]…]new-line
where clause is either motion-clause or one of the following:
if([ target update :]scalar-expression)
device(integer-expression)
nowait
depend([depend-modifier,] dependence-type : locator-list)
and motion-clause is one of the following:
to([mapper(mapper-identifier):]locator-list)
from([mapper(mapper-identifier):]locator-list)
C / C++
Fortran
The syntax of the target update construct is as follows:
!$omp target update clause[[[,]clause]…]
where clause is either motion-clause or one of the following:
if([target update :]scalar-logical-expression)
device(scalar-integer-expression)
nowait
depend([depend-modifier,] dependence-type : locator-list)
and motion-clause is one of the following:
to([mapper(mapper-identifier):]locator-list)
from([mapper(mapper-identifier):]locator-list)
Fortran
Binding
The binding task set for a target update region is the generating task, which is the target task generated by the target update construct. The target update region binds to the corresponding target task region.
Description
For each list item in a to or from clause there is a corresponding list item and an original list item. If the corresponding list item is not present in the device data environment then no assignment occurs to or from the original list item. Otherwise, each corresponding list item in the device data environment has an original list item in the current task’s data environment. If a mapper() modifier appears in a to clause, each list item is replaced with the list items that the given mapper specifies are to be mapped with a to or tofrom map-type. If a mapper() modifier appears in a from clause, each list item is replaced with the list items that the given mapper specifies are to be mapped with a from or tofrom map-type.
For each list item in a from or a to clause:
• For each part of the list item that is an attached pointer:
C / C++
– On exit from the region that part of the original list item will have the value it had on entry to the region;
– On exit from the region that part of the corresponding list item will have the value it had on entry to the region;
C / C++
Fortran
– On exit from the region that part of the original list item, if associated, will be associated with the same pointer target with which it was associated on entry to the region;
– On exit from the region that part of the corresponding list item, if associated, will be associated with the same pointer target with which it was associated on entry to the region.
Fortran
• For each part of the list item that is not an attached pointer:
– If the clause is from, the value of that part of the corresponding list item is assigned to that part of the original list item;
– If the clause is to, the value of that part of the original list item is assigned to that part of the corresponding list item.
• To avoid data races:
– Concurrent reads or updates of any part of the original list item must be synchronized with the update of the original list item that occurs as a result of the from clause;
– Concurrent reads or updates of any part of the corresponding list item must be synchronized with the update of the corresponding list item that occurs as a result of the to clause.
C / C++
The list items that appear in the to or from clauses may use shape-operators.
C / C++
The list items that appear in the to or from clauses may include array sections with stride expressions.
The target update construct is a task generating construct. The generated task is a target task. The generated task region encloses the target update region.
All clauses are evaluated when the target update construct is encountered. The data environment of the target task is created according to the data-sharing attribute clauses on the target update construct, per-data environment ICVs, and any default data-sharing attribute rules that apply to the target update construct. A variable that is mapped in the target update construct has a default data-sharing attribute of shared in the data environment of the target task.
Assignment operations associated with mapping a variable (see Section 2.19.7.1 on page 315) occur when the target task executes.
If the nowait clause is present, execution of the target task may be deferred. If the nowait clause is not present, the target task is an included task.
If a depend clause is present, it is associated with the target task.
The device is specified in the device clause. If there is no device clause, the device is determined by the default-device-var ICV. When an if clause is present and the if clause expression evaluates to false then no assignments occur.
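The to and from motion clauses can be sketched in C as follows. The function update_round_trip is hypothetical; the enclosing target data region with map(alloc: ...) is one assumed way of making the list item present in the device data environment so that the updates actually perform assignments.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the OpenMP API): adds 1.0 to a[0:n]. */
void update_round_trip(double *a, size_t n)
{
    /* Allocate device storage for a[0:n] without transferring data. */
    #pragma omp target data map(alloc: a[0:n])
    {
        /* to: original (host) values are assigned to the device copy. */
        #pragma omp target update to(a[0:n])

        /* The section is already present, so no transfer happens here. */
        #pragma omp target map(alloc: a[0:n])
        for (size_t i = 0; i < n; ++i)
            a[i] += 1.0;

        /* from: device values are assigned back to the original list item
           while the mapping itself stays in place. */
        #pragma omp target update from(a[0:n])
    }
}
```

Since the alloc map-type never copies data, the host only sees the new values because of the explicit target update from.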
Execution Model Events
Events associated with a target task are the same as for the task construct defined in Section 2.10.1 on page 135.
The target-update-begin event occurs when a thread enters a target update region.
The target-update-end event occurs when a thread exits a target update region.
Tool Callbacks
Callbacks associated with events for target tasks are the same as for the task construct defined in Section 2.10.1 on page 135; (flags & ompt_task_target) always evaluates to true in the dispatched callback.
A thread dispatches a registered ompt_callback_target callback with ompt_scope_begin as its endpoint argument and ompt_target_update as its kind argument for each occurrence of a target-update-begin event in that thread in the context of the target task on the host. Similarly, a thread dispatches a registered ompt_callback_target callback with ompt_scope_end as its endpoint argument and ompt_target_update as its kind argument for each occurrence of a target-update-end event in that thread in the context of the target task on the host. These callbacks have type signature ompt_callback_target_t.
Restrictions
• A program must not depend on any ordering of the evaluations of the clauses of the target update directive, or on any side effects of the evaluations of the clauses.
• At least one motion-clause must be specified.
• A list item can only appear in a to or from clause, but not both.
• A list item in a to or from clause must have a mappable type.
• At most one device clause can appear on the directive. The device clause expression must evaluate to a non-negative integer value less than the value of omp_get_num_devices() or to the value of omp_get_initial_device().
• At most one if clause can appear on the directive.
• At most one nowait clause can appear on the directive.
Cross References
• Array shaping, see Section 2.1.4 on page 43.
• Array sections, see Section 2.1.5 on page 44.
• default-device-var, see Section 2.5 on page 63.
• task construct, see Section 2.10.1 on page 135.
• task scheduling constraints, see Section 2.10.6 on page 149.
• target data, see Section 2.12.2 on page 161.
• if Clause, see Section 2.15 on page 220.
• omp_get_num_devices routine, see Section 3.2.36 on page 371.
• ompt_callback_task_create_t, see Section 4.5.2.7 on page 467.
• ompt_callback_target_t, see Section 4.5.2.26 on page 490.
2.12.7 declare target Directive
Summary
The declare target directive specifies that variables, functions (C, C++ and Fortran), and subroutines (Fortran) are mapped to a device. The declare target directive is a declarative directive.
Syntax
C / C++
The syntax of the declare target directive takes either of the following forms:
#pragma omp declare target new-line
declaration-definition-seq
#pragma omp end declare target new-line
or
#pragma omp declare target (extended-list) new-line
or
#pragma omp declare target clause[[,]clause…]new-line
where clause is one of the following:
to(extended-list)
link(list)
device_type(host | nohost | any)
C / C++
Fortran
The syntax of the declare target directive is as follows:
!$omp declare target (extended-list)
or
!$omp declare target [clause[[,]clause]…]
where clause is one of the following:
to(extended-list)
link(list)
device_type(host | nohost | any)
Fortran
Description
The declare target directive ensures that procedures and global variables can be executed or accessed on a device. Variables are mapped for all device executions, or for specific device executions through a link clause.
If an extended-list is present with no clause then the to clause is assumed.
The device_type clause specifies if a version of the procedure should be made available on host, device or both. If host is specified only a host version of the procedure is made available. If nohost is specified then only a device version of the procedure is made available. If any is specified then both device and host versions of the procedure are made available.
C / C++
If a function appears in a to clause in the same translation unit in which the definition of the function occurs then a device-specific version of the function is created.
If a variable appears in a to clause in the same translation unit in which the definition of the variable occurs then the original list item is allocated a corresponding list item in the device data environment of all devices.
C / C++ Fortran
If an internal procedure appears in a to clause then a device-specific version of the procedure is created.
If a variable that is host associated appears in a to clause then the original list item is allocated a corresponding list item in the device data environment of all devices.
Fortran
If a variable appears in a to clause then the corresponding list item in the device data environment of each device is initialized once, in the manner specified by the program, but at an unspecified point in the program prior to the first reference to that list item. The list item is never removed from those device data environments as if its reference count is initialized to positive infinity.
Including list items in a link clause supports compilation of functions called in a target region that refer to the list items. The list items are not mapped by the declare target directive. Instead, they are mapped according to the data mapping rules described in Section 2.19.7 on page 314.
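A minimal C sketch of the link clause follows. The names table, set_entry and fill_table are hypothetical. The sketch assumes that a file-scope array is named in a link clause and is then mapped explicitly by a map clause on the target construct that calls the function referring to it.

```c
#define N 8

/* Hypothetical file-scope array: named in a link clause, so it is NOT
   mapped by the declare target directive itself. */
int table[N];
#pragma omp declare target link(table)

/* A device version of set_entry can be compiled even though it refers
   to table, because table appears in a link clause. */
#pragma omp declare target
void set_entry(int i) { table[i] = i * i; }
#pragma omp end declare target

/* Hypothetical driver: returns the last entry written. */
int fill_table(void)
{
    /* table is mapped here, by the usual data-mapping rules. */
    #pragma omp target map(from: table)
    for (int i = 0; i < N; ++i)
        set_entry(i);

    return table[N - 1];
}
```

Compared with a to clause, this keeps table out of the device data environment except while a construct maps it.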
C / C++
If a function is referenced in a function that appears as a list item in a to clause on a declare target directive then the name of the referenced function is treated as if it had appeared in a to clause on a declare target directive.
If a variable with static storage duration or a function (except lambda for C++) is referenced in the initializer expression list of a variable with static storage duration that appears as a list item in a to clause on a declare target directive then the name of the referenced variable or function is treated as if it had appeared in a to clause on a declare target directive.
The form of the declare target directive that has no clauses and requires a matching end declare target directive defines an implicit extended-list to an implicit to clause. The implicit extended-list consists of the variable names of any variable declarations at file or namespace scope that appear between the two directives and of the function names of any function declarations at file, namespace or class scope that appear between the two directives.
The declaration-definition-seq defined by a declare target directive and an end declare target directive may contain declare target directives. If a device_type clause is present on the contained declare target directive, then its argument determines which versions are made available. If a list item appears both in an implicit and explicit list, the explicit list determines which versions are made available.
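The clause-free form with a matching end declare target can be sketched in C as below. The names scale, scaled and run_scaled are hypothetical and chosen only for illustration.

```c
/* Everything declared between the two directives forms the implicit
   extended-list of an implicit to clause. */
#pragma omp declare target
int scale = 3;                           /* file-scope variable: present on every device */
int scaled(int x) { return scale * x; }  /* function: a device version is created */
#pragma omp end declare target

/* Hypothetical driver: evaluates scaled(x) inside a target region. */
int run_scaled(int x)
{
    int r = 0;

    /* scaled and scale need no map clause here; x is firstprivate. */
    #pragma omp target map(from: r)
    r = scaled(x);

    return r;
}
```

The device copy of scale is initialized once from its initializer. If the host later changed scale, a target update would be needed to make the device copy consistent; this sketch never modifies it.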
C / C++ Fortran
If a procedure is referenced in a procedure that appears as a list item in a to clause on a declare target directive then the name of the procedure is treated as if it had appeared in a to clause on a declare target directive.
If a declare target does not have any clauses then an implicit extended-list to an implicit to clause of one item is formed from the name of the enclosing subroutine subprogram, function subprogram or interface body to which it applies.
If a declare target directive has a device_type clause then any enclosed internal procedures cannot contain any declare target directives. The enclosing device_type clause implicitly applies to internal procedures.
Fortran
Restrictions
• A threadprivate variable cannot appear in a declare target directive.
• A variable declared in a declare target directive must have a mappable type.
• The same list item must not appear multiple times in clauses on the same directive.
• The same list item must not explicitly appear in both a to clause on one declare target directive and a link clause on another declare target directive.
C++
The function names of overloaded functions or template functions may only be specified within an implicit extended-list.
If a lambda declaration and definition appears between a declare target directive and the matching end declare target directive, all variables that are captured by the lambda expression must also appear in a to clause.
C++ Fortran
If a list item is a procedure name, it must not be a generic name, procedure pointer or entry name.
Any declare target directive with clauses must appear in a specification part of a subroutine subprogram, function subprogram, program or module.
Any declare target directive without clauses must appear in a specification part of a subroutine subprogram, function subprogram or interface body to which it applies.
If a declare target directive is specified in an interface block for a procedure, it must match a declare target directive in the definition of the procedure.
If an external procedure is a type-bound procedure of a derived type and a declare target directive is specified in the definition of the external procedure, such a directive must appear in the interface block that is accessible to the derived type definition.
If any procedure is declared via a procedure declaration statement that is not in the type-bound procedure part of a derived-type definition, any declare target with the procedure name must appear in the same specification part.
A variable that is part of another variable (as an array, structure element or type parameter inquiry) cannot appear in a declare target directive.
The declare target directive must appear in the declaration section of a scoping unit in which the common block or variable is declared.
If a declare target directive that specifies a common block name appears in one program unit, then such a directive must also appear in every other program unit that contains a COMMON statement that specifies the same name, after the last such COMMON statement in the program unit.
If a list item is declared with the BIND attribute, the corresponding C entities must also be specified in a declare target directive in the C program.
A blank common block cannot appear in a declare target directive.
A variable can only appear in a declare target directive in the scope in which it is declared.
It must not be an element of a common block or appear in an EQUIVALENCE statement.
A variable that appears in a declare target directive must be declared in the Fortran scope of a module or have the SAVE attribute, either explicitly or implicitly.
Fortran
Cross References
• target data construct, see Section 2.12.2 on page 161.
• target construct, see Section 2.12.5 on page 170.
2.13 Combined Constructs
Combined constructs are shortcuts for specifying one construct immediately nested inside another construct. The semantics of the combined constructs are identical to those of explicitly specifying the first construct containing one instance of the second construct and no other statements.
For combined constructs, tool callbacks are invoked as if the constructs were explicitly nested.
2.13.1 Parallel Worksharing-Loop Construct
Summary
The parallel worksharing-loop construct is a shortcut for specifying a parallel construct containing a worksharing-loop construct with one or more associated loops and no other statements.
Syntax
C / C++
The syntax of the parallel worksharing-loop construct is as follows:
#pragma omp parallel for [clause[ [,] clause] ... ] new-line
    for-loops
where clause can be any of the clauses accepted by the parallel or for directives, except the nowait clause, with identical meanings and restrictions.
C / C++
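As an illustration of the shortcut described in the Summary, the following C sketch computes the same reduction twice: once with the combined parallel for directive and once with an explicit parallel construct containing a for construct. The function names and the sum-of-squares workload are assumptions made for this example and do not come from the specification.

```c
/* Hypothetical example: sum of i*i for i in [0, n). */

/* Combined form: a single directive. */
long sum_squares_combined(int n) {
    long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += (long)i * i;
    return sum;
}

/* Explicitly nested form: a parallel construct that contains
 * one worksharing-loop construct and no other statements. */
long sum_squares_nested(int n) {
    long sum = 0;
    #pragma omp parallel
    {
        #pragma omp for reduction(+:sum)
        for (int i = 0; i < n; i++)
            sum += (long)i * i;
    }
    return sum;
}
```

Both functions are expected to return the same value for any non-negative n. If the code is compiled without OpenMP support, the pragmas are ignored and the loops run serially.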
Fortran
The syntax of the parallel worksharing-loop construct is as follows:
!$omp parallel do [clause[ [,] clause] ... ]
    do-loops
[!$omp end parallel do]
where clause can be any of the clauses accepted by the parallel or do directives, with identical meanings and restrictions.
If an end parallel do directive is not specified, an end parallel do directive is assumed at the end of the do-loops. nowait may not be specified on an end parallel do directive.
Fortran
Description
The semantics are identical to explicitly specifying a parallel directive immediately followed by a worksharing-loop directive.
Restrictions
• The restrictions for the parallel construct and the worksharing-loop construct apply.
Cross References
• parallel construct, see Section 2.6 on page 74.
• Worksharing-loop construct, see Section 2.9.2 on page 101.
• Data attribute clauses, see Section 2.19.4 on page 282.
2.13.2 parallel loop Construct
Summary
The parallel loop construct is a shortcut for specifying a parallel construct containing a loop construct with one or more associated loops and no other statements.
Syntax
C / C++
The syntax of the parallel loop construct is as follows:
#pragma omp parallel loop [clause[ [,] clause] ... ] new-line
    for-loops
where clause can be any of the clauses accepted by the parallel or loop directives, with identical meanings and restrictions.
C / C++
Fortran
The syntax of the parallel loop construct is as follows:
!$omp parallel loop [clause[ [,] clause] ... ]
    do-loops
[!$omp end parallel loop]
where clause can be any of the clauses accepted by the parallel or loop directives, with identical meanings and restrictions.
If an end parallel loop directive is not specified, an end parallel loop directive is assumed at the end of the do-loops. nowait may not be specified on an end parallel loop directive.
Fortran
Description
The semantics are identical to explicitly specifying a parallel directive immediately followed by a loop directive.
Restrictions
• The restrictions for the parallel construct and the loop construct apply.
Cross References
• parallel construct, see Section 2.6 on page 74.
• loop construct, see Section 2.9.5 on page 128.
• Data attribute clauses, see Section 2.19.4 on page 282.
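A minimal C sketch of the parallel loop construct is given below. The function name and the workload are hypothetical, and the use of a reduction clause assumes that the compiler accepts that clause on the combined directive (the clause list above permits any clause accepted by parallel or loop).

```c
/* Hypothetical example: sums 2*i for i in [0, n). */
long sum_of_doubles(int n) {
    long total = 0;
    /* Combined form of a parallel construct containing a loop construct. */
    #pragma omp parallel loop reduction(+:total)
    for (int i = 0; i < n; i++)
        total += 2L * i;
    return total;
}
```

The loop body is kept free of OpenMP runtime calls and nested directives so that the restrictions of the loop construct are not at issue in this sketch.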
2.13.3 parallel sections Construct
Summary
The parallel sections construct is a shortcut for specifying a parallel construct containing a sections construct and no other statements.
Syntax
C / C++
The syntax of the parallel sections construct is as follows:
#pragma omp parallel sections [clause[ [,] clause] ... ] new-line
{
    [#pragma omp section new-line]
        structured-block
    [#pragma omp section new-line
        structured-block]
    ...
}
where clause can be any of the clauses accepted by the parallel or sections directives, except the nowait clause, with identical meanings and restrictions.
C / C++
Fortran
The syntax of the parallel sections construct is as follows:
!$omp parallel sections [clause[ [,] clause] ... ]
    [!$omp section]
        structured-block
    [!$omp section
        structured-block]
    ...
!$omp end parallel sections
where clause can be any of the clauses accepted by the parallel or sections directives, with identical meanings and restrictions.
The last section ends at the end parallel sections directive. nowait cannot be specified on an end parallel sections directive.
Fortran
Description
C / C++
The semantics are identical to explicitly specifying a parallel directive immediately followed by a sections directive.
C / C++
Fortran
The semantics are identical to explicitly specifying a parallel directive immediately followed by a sections directive, and an end sections directive immediately followed by an end parallel directive.
Fortran
Restrictions
The restrictions for the parallel construct and the sections construct apply.
Cross References
• parallel construct, see Section 2.6 on page 74.
• sections construct, see Section 2.8.1 on page 86.
• Data attribute clauses, see Section 2.19.4 on page 282.
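The parallel sections construct described above can be sketched in C as follows. The function name and the even/odd split are assumptions chosen for illustration: each section writes only its own accumulator, so the two structured blocks can run on different threads without a data race.

```c
/* Hypothetical example: (sum of even i) - (sum of odd i) for i in [0, n). */
long diff_even_odd(int n) {
    long even_sum = 0, odd_sum = 0;
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            /* First section: only this block writes even_sum. */
            for (int i = 0; i < n; i += 2)
                even_sum += i;
        }
        #pragma omp section
        {
            /* Second section: only this block writes odd_sum. */
            for (int i = 1; i < n; i += 2)
                odd_sum += i;
        }
    }
    /* The implicit barrier at the end of the region has completed here. */
    return even_sum - odd_sum;
}
```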
Fortran
2.13.4 parallel workshare Construct
Summary
The parallel workshare construct is a shortcut for specifying a parallel construct containing a workshare construct and no other statements.
Syntax
The syntax of the parallel workshare construct is as follows:
!$omp parallel workshare [clause[ [,] clause] ... ]
    structured-block
!$omp end parallel workshare
where clause can be any of the clauses accepted by the parallel directive, with identical meanings and restrictions. nowait may not be specified on an end parallel workshare directive.
Description
The semantics are identical to explicitly specifying a parallel directive immediately followed by a workshare directive, and an end workshare directive immediately followed by an end parallel directive.
Restrictions
The restrictions for the parallel construct and the workshare construct apply.
Cross References
• parallel construct, see Section 2.6 on page 74.
• workshare construct, see Section 2.8.3 on page 92.
• Data attribute clauses, see Section 2.19.4 on page 282.
Fortran
2.13.5 Parallel Worksharing-Loop SIMD Construct
Summary
The parallel worksharing-loop SIMD construct is a shortcut for specifying a parallel construct containing a worksharing-loop SIMD construct and no other statements.
Syntax
C / C++
The syntax of the parallel worksharing-loop SIMD construct is as follows:
#pragma omp parallel for simd [clause[ [,] clause] ... ] new-line
    for-loops
where clause can be any of the clauses accepted by the parallel or for simd directives, except the nowait clause, with identical meanings and restrictions.
C / C++
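As a sketch of the C/C++ form above, the following function applies the combined parallel for simd directive to a dot product. The function name and signature are hypothetical and are not taken from the specification.

```c
/* Hypothetical example: dot product of two arrays of length n. */
double dot_product(const double *x, const double *y, int n) {
    double acc = 0.0;
    /* Iterations are divided among threads, and each thread's
     * chunk may additionally be executed with SIMD instructions. */
    #pragma omp parallel for simd reduction(+:acc)
    for (int i = 0; i < n; i++)
        acc += x[i] * y[i];
    return acc;
}
```

Because the reduction may combine partial sums in a different order than a serial loop, results for general floating-point inputs can differ in the last bits from a serial computation.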
Fortran
The syntax of the parallel worksharing-loop SIMD construct is as follows:
!$omp parallel do simd [clause[ [,] clause] ... ]
    do-loops
[!$omp end parallel do simd]
where clause can be any of the clauses accepted by the parallel or do simd directives, with identical meanings and restrictions.
If an end parallel do simd directive is not specified, an end parallel do simd directive is assumed at the end of the do-loops. nowait may not be specified on an end parallel do simd directive.
Fortran
Description
The semantics of the parallel worksharing-loop SIMD construct are identical to explicitly specifying a parallel directive immediately followed by a worksharing-loop SIMD directive.
Restrictions
The restrictions for the parallel construct and the worksharing-loop SIMD construct apply.
Cross References
• parallel construct, see Section 2.6 on page 74.
• Worksharing-loop SIMD construct, see Section 2.9.3.2 on page 114.
• Data attribute clauses, see Section 2.19.4 on page 282.
2.13.6 parallel master Construct
Summary
The parallel master construct is a shortcut for specifying a parallel construct containing a master construct and no other statements.
Syntax
C / C++
The syntax of the parallel master construct is as follows:
#pragma omp parallel master [clause[ [,] clause] ... ] new-line
    structured-block
where clause can be any of the clauses accepted by the parallel or master directives, with identical meanings and restrictions.
C / C++
Fortran
The syntax of the parallel master construct is as follows:
!$omp parallel master [clause[ [,] clause] ... ]
    structured-block
!$omp end parallel master
where clause can be any of the clauses accepted by the parallel or master directives, with identical meanings and restrictions.
Fortran
Description
The semantics are identical to explicitly specifying a parallel directive immediately followed by a master directive.
Restrictions
The restrictions for the parallel construct and the master construct apply.
Cross References
• parallel construct, see Section 2.6 on page 74.
• master construct, see Section 2.16 on page 221.
• Data attribute clauses, see Section 2.19.4 on page 282.
2.13.7 master taskloop Construct
Summary
The master taskloop construct is a shortcut for specifying a master construct containing a taskloop construct and no other statements.
Syntax
C / C++
The syntax of the master taskloop construct is as follows:
#pragma omp master taskloop [clause[ [,] clause] ... ] new-line
    for-loops
where clause can be any of the clauses accepted by the master or taskloop directives with identical meanings and restrictions.
C / C++
Fortran
The syntax of the master taskloop construct is as follows:
!$omp master taskloop [clause[ [,] clause] ... ]
    do-loops
[!$omp end master taskloop]
where clause can be any of the clauses accepted by the master or taskloop directives with identical meanings and restrictions.
If an end master taskloop directive is not specified, an end master taskloop directive is assumed at the end of the do-loops.
Fortran
Description
The semantics are identical to explicitly specifying a master directive immediately followed by a taskloop directive.
Restrictions
The restrictions for the master and taskloop constructs apply.
Cross References
• taskloop construct, see Section 2.10.2 on page 140.
• master construct, see Section 2.16 on page 221.
• Data attribute clauses, see Section 2.19.4 on page 282.
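The equivalence stated in the Description can be sketched in C as shown below. Both functions are hypothetical examples: inside a parallel region, the master thread generates the tasks for the loop and any thread of the team may execute them. An atomic update is used for the shared accumulator, which is a choice made for this sketch rather than something the construct requires.

```c
/* Hypothetical example: 1 + 2 + ... + n, using the combined directive. */
long triangular_combined(int n) {
    long total = 0;
    #pragma omp parallel
    #pragma omp master taskloop
    for (int i = 1; i <= n; i++) {
        #pragma omp atomic
        total += i;
    }
    return total;
}

/* The same computation with the master and taskloop
 * constructs nested explicitly. */
long triangular_nested(int n) {
    long total = 0;
    #pragma omp parallel
    {
        #pragma omp master
        {
            #pragma omp taskloop
            for (int i = 1; i <= n; i++) {
                #pragma omp atomic
                total += i;
            }
        }
    }
    return total;
}
```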
2.13.8 master taskloop simd Construct
Summary
The master taskloop simd construct is a shortcut for specifying a master construct containing a taskloop simd construct and no other statements.
Syntax
C / C++
The syntax of the master taskloop simd construct is as follows:
#pragma omp master taskloop simd [clause[ [,] clause] ... ] new-line
    for-loops
where clause can be any of the clauses accepted by the master or taskloop simd directives with identical meanings and restrictions.
C / C++
Fortran
The syntax of the master taskloop simd construct is as follows:
!$omp master taskloop simd [clause[ [,] clause] ... ]
    do-loops
[!$omp end master taskloop simd]
where clause can be any of the clauses accepted by the master or taskloop simd directives with identical meanings and restrictions.
If an end master taskloop simd directive is not specified, an end master taskloop simd directive is assumed at the end of the do-loops.
Fortran
Description
The semantics are identical to explicitly specifying a master directive immediately followed by a taskloop simd directive.
Restrictions
The restrictions for the master and taskloop simd constructs apply.
Cross References
• taskloop simd construct, see Section 2.10.3 on page 146.
• master construct, see Section 2.16 on page 221.
• Data attribute clauses, see Section 2.19.4 on page 282.
2.13.9 parallel master taskloop Construct
Summary
The parallel master taskloop construct is a shortcut for specifying a parallel construct containing a master taskloop construct and no other statements.
Syntax
C / C++
The syntax of the parallel master taskloop construct is as follows:
#pragma omp parallel master taskloop [clause[ [,] clause] ... ] new-line
    for-loops
where clause can be any of the clauses accepted by the parallel or master taskloop directives, except the in_reduction clause, with identical meanings and restrictions.
C / C++
Fortran
The syntax of the parallel master taskloop construct is as follows:
!$omp parallel master taskloop [clause[ [,] clause] ... ]
    do-loops
[!$omp end parallel master taskloop]
where clause can be any of the clauses accepted by the parallel or master taskloop directives, except the in_reduction clause, with identical meanings and restrictions.
If an end parallel master taskloop directive is not specified, an end parallel master taskloop directive is assumed at the end of the do-loops.
Fortran
Description
The semantics are identical to explicitly specifying a parallel directive immediately followed by a master taskloop directive.
Restrictions
The restrictions for the parallel construct and the master taskloop construct apply.
Cross References
• parallel construct, see Section 2.6 on page 74.
• master taskloop construct, see Section 2.13.7 on page 192.
• Data attribute clauses, see Section 2.19.4 on page 282.
2.13.10 parallel master taskloop simd Construct
Summary
The parallel master taskloop simd construct is a shortcut for specifying a parallel construct containing a master taskloop simd construct and no other statements.
Syntax
C / C++
The syntax of the parallel master taskloop simd construct is as follows:
#pragma omp parallel master taskloop simd [clause[ [,] clause] ... ] new-line
    for-loops
where clause can be any of the clauses accepted by the parallel or master taskloop simd directives, except the in_reduction clause, with identical meanings and restrictions.
C / C++
Fortran
The syntax of the parallel master taskloop simd construct is as follows:
!$omp parallel master taskloop simd [clause[ [,] clause] ... ]
    do-loops
[!$omp end parallel master taskloop simd]
where clause can be any of the clauses accepted by the parallel or master taskloop simd directives, except the in_reduction clause, with identical meanings and restrictions.
If an end parallel master taskloop simd directive is not specified, an end parallel master taskloop simd directive is assumed at the end of the do-loops.
Fortran
Description
The semantics are identical to explicitly specifying a parallel directive immediately followed by a master taskloop simd directive.
Restrictions
The restrictions for the parallel construct and the master taskloop simd construct apply.
Cross References
• parallel construct, see Section 2.6 on page 74.
• master taskloop simd construct, see Section 2.13.8 on page 194.
• Data attribute clauses, see Section 2.19.4 on page 282.
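A possible use of the parallel master taskloop simd construct is sketched below in C. The macro COUNT, the function name, and the grainsize value are assumptions chosen for illustration. Each iteration writes a distinct array element, so no synchronization is needed inside the loop.

```c
#define COUNT 100  /* hypothetical problem size */

/* Hypothetical example: fills a table of cubes with tasks whose
 * chunks may use SIMD instructions, then sums the table serially. */
long cubes_total(void) {
    long cube[COUNT];

    /* One directive creates the team, restricts task generation to the
     * master thread, and splits the loop into tasks of about 10 iterations. */
    #pragma omp parallel master taskloop simd grainsize(10)
    for (int i = 0; i < COUNT; i++)
        cube[i] = (long)i * i * i;

    /* All tasks have completed once the parallel region has ended. */
    long total = 0;
    for (int i = 0; i < COUNT; i++)
        total += cube[i];
    return total;
}
```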
2.13.11 teams distribute Construct
Summary
The teams distribute construct is a shortcut for specifying a teams construct containing a distribute construct and no other statements.
Syntax
C / C++
The syntax of the teams distribute construct is as follows:
#pragma omp teams distribute [clause[ [,] clause] ... ] new-line
    for-loops
where clause can be any of the clauses accepted by the teams or distribute directives with identical meanings and restrictions.
C / C++
Fortran
The syntax of the teams distribute construct is as follows:
!$omp teams distribute [clause[ [,] clause] ... ]
    do-loops
[!$omp end teams distribute]
where clause can be any of the clauses accepted by the teams or distribute directives with identical meanings and restrictions.
If an end teams distribute directive is not specified, an end teams distribute directive is assumed at the end of the do-loops.
Fortran
Description
The semantics are identical to explicitly specifying a teams directive immediately followed by a distribute directive.
Restrictions
The restrictions for the teams and distribute constructs apply.
Cross References
• teams construct, see Section 2.7 on page 82.
• distribute construct, see Section 2.9.4.1 on page 120.
• Data attribute clauses, see Section 2.19.4 on page 282.
2.13.12 teams distribute simd Construct
Summary
The teams distribute simd construct is a shortcut for specifying a teams construct containing a distribute simd construct and no other statements.
Syntax
C / C++
The syntax of the teams distribute simd construct is as follows:
#pragma omp teams distribute simd [clause[ [,] clause] ... ] new-line
    for-loops
where clause can be any of the clauses accepted by the teams or distribute simd directives with identical meanings and restrictions.
C / C++
Fortran
The syntax of the teams distribute simd construct is as follows:
!$omp teams distribute simd [clause[ [,] clause] ... ]
    do-loops
[!$omp end teams distribute simd]
where clause can be any of the clauses accepted by the teams or distribute simd directives with identical meanings and restrictions.
If an end teams distribute simd directive is not specified, an end teams distribute simd directive is assumed at the end of the do-loops.
Fortran
Description
The semantics are identical to explicitly specifying a teams directive immediately followed by a distribute simd directive.
Restrictions
The restrictions for the teams and distribute simd constructs apply.
Cross References
• teams construct, see Section 2.7 on page 82.
• distribute simd construct, see Section 2.9.4.2 on page 123.
• Data attribute clauses, see Section 2.19.4 on page 282.
2.13.13 Teams Distribute Parallel Worksharing-Loop Construct
Summary
The teams distribute parallel worksharing-loop construct is a shortcut for specifying a teams construct containing a distribute parallel worksharing-loop construct and no other statements.
Syntax
C / C++
The syntax of the teams distribute parallel worksharing-loop construct is as follows:
#pragma omp teams distribute parallel for \
    [clause[ [,] clause] ... ] new-line
    for-loops
where clause can be any of the clauses accepted by the teams or distribute parallel for directives with identical meanings and restrictions.
C / C++
Fortran
The syntax of the teams distribute parallel worksharing-loop construct is as follows:
!$omp teams distribute parallel do [clause[ [,] clause] ... ]
    do-loops
[!$omp end teams distribute parallel do]
where clause can be any of the clauses accepted by the teams or distribute parallel do directives with identical meanings and restrictions.
If an end teams distribute parallel do directive is not specified, an end teams distribute parallel do directive is assumed at the end of the do-loops.
Fortran
Description
The semantics are identical to explicitly specifying a teams directive immediately followed by a distribute parallel worksharing-loop directive.
Restrictions
The restrictions for the teams and distribute parallel worksharing-loop constructs apply.
Cross References
• teams construct, see Section 2.7 on page 82.
• Distribute parallel worksharing-loop construct, see Section 2.9.4.3 on page 125.
• Data attribute clauses, see Section 2.19.4 on page 282.
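The following C sketch shows one way the teams distribute parallel worksharing-loop construct might be used. Placing it directly inside a target construct, the macro N, the function name, and the map clauses are all assumptions made for this example; an implementation without an available target device is expected to run the region on the host.

```c
#define N 256  /* hypothetical problem size */

/* Hypothetical example: y = a*x + y over N elements.
 * Returns y[N-1] as a simple check value. */
float saxpy_last(float a) {
    float x[N], y[N];
    for (int i = 0; i < N; i++) {
        x[i] = (float)i;
        y[i] = 1.0f;
    }

    /* The target construct contains only the teams construct.
     * Iterations are distributed across the teams, and each team's
     * share is divided among the threads of that team. */
    #pragma omp target map(to: x[0:N]) map(tofrom: y[0:N])
    #pragma omp teams distribute parallel for
    for (int i = 0; i < N; i++)
        y[i] = a * x[i] + y[i];

    return y[N - 1];
}
```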
2.13.14 Teams Distribute Parallel Worksharing-Loop SIMD Construct
Summary
The teams distribute parallel worksharing-loop SIMD construct is a shortcut for specifying a teams construct containing a distribute parallel worksharing-loop SIMD construct and no other statements.
Syntax
C / C++
The syntax of the teams distribute parallel worksharing-loop SIMD construct is as follows:
#pragma omp teams distribute parallel for simd \
    [clause[ [,] clause] ... ] new-line
    for-loops
where clause can be any of the clauses accepted by the teams or distribute parallel for simd directives with identical meanings and restrictions.
C / C++
Fortran
The syntax of the teams distribute parallel worksharing-loop SIMD construct is as follows:
!$omp teams distribute parallel do simd [clause[ [,] clause] ... ]
    do-loops
[!$omp end teams distribute parallel do simd]
where clause can be any of the clauses accepted by the teams or distribute parallel do simd directives with identical meanings and restrictions.
If an end teams distribute parallel do simd directive is not specified, an end teams distribute parallel do simd directive is assumed at the end of the do-loops.
Fortran
Description
The semantics are identical to explicitly specifying a teams directive immediately followed by a distribute parallel worksharing-loop SIMD directive.
Restrictions
The restrictions for the teams and distribute parallel worksharing-loop SIMD constructs apply.
Cross References
• teams construct, see Section 2.7 on page 82.
• Distribute parallel worksharing-loop SIMD construct, see Section 2.9.4.4 on page 126.
• Data attribute clauses, see Section 2.19.4 on page 282.
2.13.15 teams loop Construct
Summary
The teams loop construct is a shortcut for specifying a teams construct containing a loop construct and no other statements.
Syntax
C / C++
The syntax of the teams loop construct is as follows:
#pragma omp teams loop [clause[ [,] clause] ... ] new-line
    for-loops
where clause can be any of the clauses accepted by the teams or loop directives with identical meanings and restrictions.
C / C++
Fortran
The syntax of the teams loop construct is as follows:
!$omp teams loop [clause[ [,] clause] ... ]
    do-loops
[!$omp end teams loop]
where clause can be any of the clauses accepted by the teams or loop directives with identical meanings and restrictions.
If an end teams loop directive is not specified, an end teams loop directive is assumed at the end of the do-loops.
Fortran
Description
The semantics are identical to explicitly specifying a teams directive immediately followed by a loop directive.
Restrictions
The restrictions for the teams and loop constructs apply.
Cross References
• teams construct, see Section 2.7 on page 82.
• loop construct, see Section 2.9.5 on page 128.
• Data attribute clauses, see Section 2.19.4 on page 282.
2.13.16 target parallel Construct
Summary
The target parallel construct is a shortcut for specifying a target construct containing a parallel construct and no other statements.
Syntax
C / C++
The syntax of the target parallel construct is as follows:
#pragma omp target parallel [clause[ [,] clause] ... ] new-line
    structured-block
where clause can be any of the clauses accepted by the target or parallel directives, except for copyin, with identical meanings and restrictions.
C / C++
Fortran
The syntax of the target parallel construct is as follows:
!$omp target parallel [clause[ [,] clause] ... ]
    structured-block
!$omp end target parallel
where clause can be any of the clauses accepted by the target or parallel directives, except for copyin, with identical meanings and restrictions.
Fortran
Description
The semantics are identical to explicitly specifying a target directive immediately followed by a parallel directive.
Restrictions
The restrictions for the target and parallel constructs apply except for the following explicit modifications:
• If any if clause on the directive includes a directive-name-modifier then all if clauses on the directive must include a directive-name-modifier.
• At most one if clause without a directive-name-modifier can appear on the directive.
• At most one if clause with the parallel directive-name-modifier can appear on the directive.
• At most one if clause with the target directive-name-modifier can appear on the directive.
Cross References
• parallel construct, see Section 2.6 on page 74.
• target construct, see Section 2.12.5 on page 170.
• if Clause, see Section 2.15 on page 220.
• Data attribute clauses, see Section 2.19.4 on page 282.
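The restrictions on if clauses for the target parallel construct can be illustrated with the C sketch below, which uses one if clause per directive-name-modifier. The function name, the thresholds 1000 and 100, and the explicit map clause are assumptions chosen for this example.

```c
/* Hypothetical example: 1 + 2 + ... + n computed in a target parallel region. */
long target_parallel_sum(int n) {
    long sum = 0;

    /* Both if clauses carry a directive-name-modifier, and each modifier
     * appears once: if(target: ...) controls whether the region is offloaded,
     * if(parallel: ...) controls whether the parallel region is active. */
    #pragma omp target parallel if(target: n > 1000) if(parallel: n > 100) \
            map(tofrom: sum)
    {
        /* sum is shared in the parallel region, so the worksharing-loop
         * reduction combines the per-thread partial sums into it. */
        #pragma omp for reduction(+:sum)
        for (int i = 1; i <= n; i++)
            sum += i;
    }
    return sum;
}
```

Mixing these with an additional if clause that has no modifier would violate the first restriction listed above.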
2.13.17 Target Parallel Worksharing-Loop Construct
Summary
The target parallel worksharing-loop construct is a shortcut for specifying a target construct containing a parallel worksharing-loop construct and no other statements.
Syntax
C / C++
The syntax of the target parallel worksharing-loop construct is as follows:
#pragma omp target parallel for [clause[ [,] clause] ... ] new-line
    for-loops
where clause can be any of the clauses accepted by the target or parallel for directives, except for copyin, with identical meanings and restrictions.
C / C++
Fortran
The syntax of the target parallel worksharing-loop construct is as follows:
!$omp target parallel do [clause[ [,] clause] ... ]
    do-loops
[!$omp end target parallel do]
where clause can be any of the clauses accepted by the target or parallel do directives, except for copyin, with identical meanings and restrictions.
If an end target parallel do directive is not specified, an end target parallel do directive is assumed at the end of the do-loops.
Fortran
Description
The semantics are identical to explicitly specifying a target directive immediately followed by a parallel worksharing-loop directive.
Restrictions
The restrictions for the target and parallel worksharing-loop constructs apply except for the following explicit modifications:
• If any if clause on the directive includes a directive-name-modifier then all if clauses on the directive must include a directive-name-modifier.
• At most one if clause without a directive-name-modifier can appear on the directive.
• At most one if clause with the parallel directive-name-modifier can appear on the directive.
• At most one if clause with the target directive-name-modifier can appear on the directive.
Cross References
• target construct, see Section 2.12.5 on page 170.
• Parallel Worksharing-Loop construct, see Section 2.13.1 on page 185.
• if Clause, see Section 2.15 on page 220.
• Data attribute clauses, see Section 2.19.4 on page 282.
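A minimal C sketch of the target parallel worksharing-loop construct follows. The macro LEN, the function name, and the choice of map clauses are hypothetical; the example adds two arrays in a target region and then verifies the result on the host.

```c
#define LEN 128  /* hypothetical array length */

/* Hypothetical example: c = a + b in a target region.
 * Returns the number of elements that differ from the expected value. */
int vector_add_mismatches(void) {
    int a[LEN], b[LEN], c[LEN];
    for (int i = 0; i < LEN; i++) {
        a[i] = i;
        b[i] = 2 * i;
        c[i] = 0;
    }

    /* Inputs are copied to the device data environment, the result is
     * copied back, and the iterations are shared among the threads of
     * the parallel region created in the target region. */
    #pragma omp target parallel for map(to: a[0:LEN], b[0:LEN]) map(from: c[0:LEN])
    for (int i = 0; i < LEN; i++)
        c[i] = a[i] + b[i];

    int mismatches = 0;
    for (int i = 0; i < LEN; i++)
        if (c[i] != 3 * i)
            mismatches++;
    return mismatches;
}
```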
2.13.18 Target Parallel Worksharing-Loop SIMD Construct
Summary
The target parallel worksharing-loop SIMD construct is a shortcut for specifying a target construct containing a parallel worksharing-loop SIMD construct and no other statements.
Syntax
C / C++
The syntax of the target parallel worksharing-loop SIMD construct is as follows:
#pragma omp target parallel for simd \
    [clause[ [,] clause] ... ] new-line
    for-loops
where clause can be any of the clauses accepted by the target or parallel for simd directives, except for copyin, with identical meanings and restrictions.
C / C++
Fortran
The syntax of the target parallel worksharing-loop SIMD construct is as follows:
!$omp target parallel do simd [clause[ [,] clause] ... ]
    do-loops
[!$omp end target parallel do simd]
where clause can be any of the clauses accepted by the target or parallel do simd directives, except for copyin, with identical meanings and restrictions.
If an end target parallel do simd directive is not specified, an end target parallel do simd directive is assumed at the end of the do-loops.
Fortran
Description
The semantics are identical to explicitly specifying a target directive immediately followed by a parallel worksharing-loop SIMD directive.
Restrictions
The restrictions for the target and parallel worksharing-loop SIMD constructs apply except for the following explicit modifications:
• If any if clause on the directive includes a directive-name-modifier then all if clauses on the directive must include a directive-name-modifier.
• At most one if clause without a directive-name-modifier can appear on the directive.
• At most one if clause with the parallel directive-name-modifier can appear on the directive.
• At most one if clause with the target directive-name-modifier can appear on the directive.
Cross References
• target construct, see Section 2.12.5 on page 170.
• Parallel worksharing-loop SIMD construct, see Section 2.13.5 on page 190.
• if Clause, see Section 2.15 on page 220.
• Data attribute clauses, see Section 2.19.4 on page 282.
2.13.19 target parallel loop Construct
Summary
The target parallel loop construct is a shortcut for specifying a target construct containing a parallel loop construct and no other statements.
Syntax
C / C++
The syntax of the target parallel loop construct is as follows:
#pragma omp target parallel loop [clause[ [,] clause] ... ] new-line
    for-loops
where clause can be any of the clauses accepted by the target or parallel loop directives with identical meanings and restrictions.
C / C++
Fortran
The syntax of the target parallel loop construct is as follows:
!$omp target parallel loop [clause[ [,] clause] ... ]
    do-loops
[!$omp end target parallel loop]
where clause can be any of the clauses accepted by the target or parallel loop directives with identical meanings and restrictions.
If an end target parallel loop directive is not specified, an end target parallel loop directive is assumed at the end of the do-loops. nowait may not be specified on an end target parallel loop directive.
Fortran
Description
The semantics are identical to explicitly specifying a target directive immediately followed by a parallel loop directive.
Restrictions
The restrictions for the target and parallel loop constructs apply.
Cross References
• target construct, see Section 2.12.5 on page 170.
• parallel loop construct, see Section 2.13.2 on page 186.
• Data attribute clauses, see Section 2.19.4 on page 282.
2.13.20 target simd Construct
Summary
The target simd construct is a shortcut for specifying a target construct containing a simd construct and no other statements.
Syntax
C / C++
The syntax of the target simd construct is as follows:
#pragma omp target simd [clause[ [,] clause] ... ] new-line
    for-loops
where clause can be any of the clauses accepted by the target or simd directives with identical meanings and restrictions.
C / C++
Fortran
The syntax of the target simd construct is as follows:
!$omp target simd [clause[ [,] clause] ... ]
    do-loops
[!$omp end target simd]
where clause can be any of the clauses accepted by the target or simd directives with identical meanings and restrictions.
If an end target simd directive is not specified, an end target simd directive is assumed at the end of the do-loops.
Fortran
Description
The semantics are identical to explicitly specifying a target directive immediately followed by a simd directive.
Restrictions
The restrictions for the target and simd constructs apply.
Cross References
• simd construct, see Section 2.9.3.1 on page 110.
• target construct, see Section 2.12.5 on page 170.
• Data attribute clauses, see Section 2.19.4 on page 282.
2.13.21 target teams Construct
Summary
The target teams construct is a shortcut for specifying a target construct containing a teams construct and no other statements.
1 Syntax
2 The syntax of the target teams construct is as follows:
3 4
5 where clause can be any of the clauses accepted by the target or teams directives with identical
6 meanings and restrictions.
C / C++ Fortran
7 The syntax of the target teams construct is as follows:
8
9 10
11 where clause can be any of the clauses accepted by the target or teams directives with identical
12 meanings and restrictions.
Fortran
13 Description
14 The semantics are identical to explicitly specifying a target directive immediately followed by a
15 teams directive.
16 Restrictions
17 The restrictions for the target and teams constructs apply.
18 Cross References
19 • teams construct, see Section 2.7 on page 82.
20 • target construct, see Section 2.12.5 on page 170.
21 • Data attribute clauses, see Section 2.19.4 on page 282.
22 2.13.22 target teams distribute Construct
23 Summary
24 The target teams distribute construct is a shortcut for specifying a target construct
25 containing a teams distribute construct and no other statements.
C / C++
#pragma omp target teams [clause[[,]clause]…]new-line structured-block
!$omp target teams [clause[[,]clause]…] structured-block
!$omp end target teams
CHAPTER2. DIRECTIVES 211
2
3 4
5 6
7
8
9 10
11 12
13 14
15
16 17
18 19
20 21 22 23
C / C++
The syntax of the target teams distribute construct is as follows:
where clause can be any of the clauses accepted by the target or teams distribute directives with identical meanings and restrictions.
C / C++ Fortran
The syntax of the target teams distribute construct is as follows:
where clause can be any of the clauses accepted by the target or teams distribute directives with identical meanings and restrictions.
If an end target teams distribute directive is not specified, an end target teams distribute directive is assumed at the end of the do-loops.
Fortran
Description
The semantics are identical to explicitly specifying a target directive immediately followed by a teams distribute directive.
Restrictions
The restrictions for the target and teams distribute constructs.
Cross References
• target construct, see Section 2.12.2 on page 161.
• teams distribute construct, see Section 2.13.11 on page 197. • Data attribute clauses, see Section 2.19.4 on page 282.
1
Syntax
#pragma omp target teams distribute [clause[[,]clause]…]new-line for-loops
!$omp target teams distribute [clause[[,]clause]…]
do-loops
[!$omp end target teams distribute]
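The following minimal sketch shows how two of the combined target constructs described above might be used in C. The function names, array arguments, and map clauses are assumptions made for illustration; scalars such as n, s, and a are left to the default data-sharing rules of the target construct. Without OpenMP support the pragmas are ignored and the loops run sequentially.

```c
/* Hypothetical helper: target simd offloads a loop and asks for
   SIMD execution of its iterations. */
void scale_simd(float *x, int n, float s)
{
    #pragma omp target simd map(tofrom: x[0:n])
    for (int i = 0; i < n; ++i)
        x[i] *= s;
}

/* Hypothetical helper: target teams distribute offloads a loop and
   distributes its iterations across the master threads of a league
   of teams. */
void saxpy_teams(float a, const float *x, float *y, int n)
{
    #pragma omp target teams distribute map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

In each case the combined directive could be replaced by a target directive immediately followed by the corresponding simd or teams distribute directive, with each clause placed on the constituent directive that accepts it.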
2.13.23 target teams distribute simd Construct
Summary
The target teams distribute simd construct is a shortcut for specifying a target construct containing a teams distribute simd construct and no other statements.
Syntax
C / C++
The syntax of the target teams distribute simd construct is as follows:
#pragma omp target teams distribute simd \
    [clause[ [,] clause] … ] new-line
    for-loops
where clause can be any of the clauses accepted by the target or teams distribute simd directives with identical meanings and restrictions.
C / C++
Fortran
The syntax of the target teams distribute simd construct is as follows:
!$omp target teams distribute simd [clause[ [,] clause] … ]
    do-loops
[!$omp end target teams distribute simd]
where clause can be any of the clauses accepted by the target or teams distribute simd directives with identical meanings and restrictions.
If an end target teams distribute simd directive is not specified, an end target teams distribute simd directive is assumed at the end of the do-loops.
Fortran
Description
The semantics are identical to explicitly specifying a target directive immediately followed by a teams distribute simd directive.
Restrictions
The restrictions for the target and teams distribute simd constructs apply.
Cross References
• target construct, see Section 2.12.5 on page 170.
• teams distribute simd construct, see Section 2.13.12 on page 198.
• Data attribute clauses, see Section 2.19.4 on page 282.
2.13.24 target teams loop Construct
Summary
The target teams loop construct is a shortcut for specifying a target construct containing a teams loop construct and no other statements.
Syntax
C / C++
The syntax of the target teams loop construct is as follows:
#pragma omp target teams loop [clause[ [,] clause] … ] new-line
    for-loops
where clause can be any of the clauses accepted by the target or teams loop directives with identical meanings and restrictions.
C / C++
Fortran
The syntax of the target teams loop construct is as follows:
!$omp target teams loop [clause[ [,] clause] … ]
    do-loops
[!$omp end target teams loop]
where clause can be any of the clauses accepted by the target or teams loop directives with identical meanings and restrictions.
If an end target teams loop directive is not specified, an end target teams loop directive is assumed at the end of the do-loops.
Fortran
Description
The semantics are identical to explicitly specifying a target directive immediately followed by a teams loop directive.
Restrictions
The restrictions for the target and teams loop constructs apply.
Cross References
• target construct, see Section 2.12.5 on page 170.
• Teams loop construct, see Section 2.13.15 on page 202.
• Data attribute clauses, see Section 2.19.4 on page 282.
2.13.25 Target Teams Distribute Parallel Worksharing-Loop Construct
Summary
The target teams distribute parallel worksharing-loop construct is a shortcut for specifying a target construct containing a teams distribute parallel worksharing-loop construct and no other statements.
Syntax
C / C++
The syntax of the target teams distribute parallel worksharing-loop construct is as follows:
#pragma omp target teams distribute parallel for \
    [clause[ [,] clause] … ] new-line
    for-loops
where clause can be any of the clauses accepted by the target or teams distribute parallel for directives with identical meanings and restrictions.
C / C++
Fortran
The syntax of the target teams distribute parallel worksharing-loop construct is as follows:
!$omp target teams distribute parallel do [clause[ [,] clause] … ]
    do-loops
[!$omp end target teams distribute parallel do]
where clause can be any of the clauses accepted by the target or teams distribute parallel do directives with identical meanings and restrictions.
If an end target teams distribute parallel do directive is not specified, an end target teams distribute parallel do directive is assumed at the end of the do-loops.
Fortran
Description
The semantics are identical to explicitly specifying a target directive immediately followed by a teams distribute parallel worksharing-loop directive.
Restrictions
The restrictions for the target and teams distribute parallel worksharing-loop constructs apply except for the following explicit modifications:
• If any if clause on the directive includes a directive-name-modifier then all if clauses on the directive must include a directive-name-modifier.
• At most one if clause without a directive-name-modifier can appear on the directive.
• At most one if clause with the parallel directive-name-modifier can appear on the directive.
• At most one if clause with the target directive-name-modifier can appear on the directive.
Cross References
• target construct, see Section 2.12.5 on page 170.
• Teams distribute parallel worksharing-loop construct, see Section 2.13.13 on page 200.
• if Clause, see Section 2.15 on page 220.
• Data attribute clauses, see Section 2.19.4 on page 282.
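The restrictions on if clauses can be illustrated with a small sketch. The dot-product function below is hypothetical, and the two thresholds are arbitrary assumptions; the point is only that both if clauses carry a directive-name-modifier and that each modifier appears at most once. The explicit map(tofrom: sum) is a conservative choice so that the reduction result is copied back to the host.

```c
/* Hypothetical helper: dot product with one if clause per
   constituent construct. */
double dot(const double *x, const double *y, int n)
{
    double sum = 0.0;

    /* if(target: ...) decides whether the region is offloaded;
       if(parallel: ...) decides whether the parallel region is active. */
    #pragma omp target teams distribute parallel for \
        if(target: n > 100000) if(parallel: n > 1000) \
        map(to: x[0:n], y[0:n]) map(tofrom: sum) reduction(+: sum)
    for (int i = 0; i < n; ++i)
        sum += x[i] * y[i];

    return sum;
}
```

Writing if(n > 1000) alongside if(target: n > 100000) on the same directive would violate the first restriction, because one clause would have a modifier and the other would not.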
2.13.26 Target Teams Distribute Parallel Worksharing-Loop SIMD Construct
Summary
The target teams distribute parallel worksharing-loop SIMD construct is a shortcut for specifying a target construct containing a teams distribute parallel worksharing-loop SIMD construct and no other statements.
Syntax
C / C++
The syntax of the target teams distribute parallel worksharing-loop SIMD construct is as follows:
#pragma omp target teams distribute parallel for simd \
    [clause[ [,] clause] … ] new-line
    for-loops
where clause can be any of the clauses accepted by the target or teams distribute parallel for simd directives with identical meanings and restrictions.
C / C++
Fortran
The syntax of the target teams distribute parallel worksharing-loop SIMD construct is as follows:
!$omp target teams distribute parallel do simd [clause[ [,] clause] … ]
    do-loops
[!$omp end target teams distribute parallel do simd]
where clause can be any of the clauses accepted by the target or teams distribute parallel do simd directives with identical meanings and restrictions.
If an end target teams distribute parallel do simd directive is not specified, an end target teams distribute parallel do simd directive is assumed at the end of the do-loops.
Fortran
Description
The semantics are identical to explicitly specifying a target directive immediately followed by a teams distribute parallel worksharing-loop SIMD directive.
Restrictions
The restrictions for the target and teams distribute parallel worksharing-loop SIMD constructs apply except for the following explicit modifications:
• If any if clause on the directive includes a directive-name-modifier then all if clauses on the directive must include a directive-name-modifier.
• At most one if clause without a directive-name-modifier can appear on the directive.
• At most one if clause with the parallel directive-name-modifier can appear on the directive.
• At most one if clause with the target directive-name-modifier can appear on the directive.
Cross References
• target construct, see Section 2.12.5 on page 170.
• Teams distribute parallel worksharing-loop SIMD construct, see Section 2.13.14 on page 201.
• if Clause, see Section 2.15 on page 220.
• Data attribute clauses, see Section 2.19.4 on page 282.
2.14 Clauses on Combined and Composite Constructs
This section specifies the handling of clauses on combined or composite constructs and the handling of implicit clauses from variables with predetermined data sharing if they are not predetermined only on a particular construct. Some clauses are permitted only on a single construct of the constructs that constitute the combined or composite construct, in which case the effect is as if the clause is applied to that specific construct. As detailed in this section, other clauses have the effect as if they are applied to one or more constituent constructs.
The collapse clause is applied once to the combined or composite construct.
The effect of the private clause is as if it is applied only to the innermost constituent construct
that permits it.
The effect of the firstprivate clause is as if it is applied to one or more constructs as follows:
• To the distribute construct if it is among the constituent constructs;
• To the teams construct if it is among the constituent constructs and the distribute construct is not;
• To the worksharing-loop construct if it is among the constituent constructs;
• To the taskloop construct if it is among the constituent constructs;
• To the parallel construct if it is among the constituent constructs and the worksharing-loop construct or the taskloop construct is not;
• To the outermost constituent construct if not already applied to it by the above rules and the outermost constituent construct is not a teams construct, a parallel construct, a master construct, or a target construct; and
• To the target construct if it is among the constituent constructs and the same list item does not appear in a lastprivate or map clause.
If the parallel construct is among the constituent constructs and the effect is not as if the firstprivate clause is applied to it by the above rules, then the effect is as if the shared clause with the same list item is applied to the parallel construct. If the teams construct is among the constituent constructs and the effect is not as if the firstprivate clause is applied to it by the above rules, then the effect is as if the shared clause with the same list item is applied to the teams construct.
The effect of the lastprivate clause is as if it is applied to one or more constructs as follows:
• To the worksharing-loop construct if it is among the constituent constructs;
• To the taskloop construct if it is among the constituent constructs;
• To the distribute construct if it is among the constituent constructs; and
• To the innermost constituent construct that permits it unless it is a worksharing-loop or distribute construct.
If the parallel construct is among the constituent constructs and the list item is not also specified in the firstprivate clause, then the effect of the lastprivate clause is as if the shared clause with the same list item is applied to the parallel construct. If the teams construct is among the constituent constructs and the list item is not also specified in the firstprivate clause, then the effect of the lastprivate clause is as if the shared clause with the same list item is applied to the teams construct. If the target construct is among the constituent constructs and the list item is not specified in a map clause, the effect of the lastprivate clause is as if the same list item appears in a map clause with a map-type of tofrom.
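To make the firstprivate and lastprivate rules concrete, the sketch below applies both clauses to a combined parallel worksharing-loop construct. The function name and arguments are hypothetical. Under the rules above, both clauses take effect on the worksharing-loop construct, and the same list items are treated as shared on the parallel construct; the comment shows the assumed equivalent split form. The sketch assumes n > 0, because a lastprivate list item that is never assigned has an unspecified value after the construct.

```c
/* Hypothetical helper: returns v[n-1] * scale, assuming n > 0. */
int last_scaled(const int *v, int n, int scale)
{
    int last = -1;

    /* Behaves as if written:
         #pragma omp parallel shared(scale, last)
         #pragma omp for firstprivate(scale) lastprivate(last)        */
    #pragma omp parallel for firstprivate(scale) lastprivate(last)
    for (int i = 0; i < n; ++i)
        last = v[i] * scale;   /* value from the sequentially last
                                  iteration is copied out */

    return last;
}
```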
The effect of the shared, default, order, or allocate clause is as if it is applied to all constituent constructs that permit the clause.
The effect of the reduction clause is as if it is applied to all constructs that permit the clause, except for the following constructs:
• The parallel construct, when combined with the sections, worksharing-loop, loop, or taskloop construct; and
• The teams construct, when combined with the loop construct.
For the parallel and teams constructs above, the effect of the reduction clause instead is as if each list item or, for any list item that is an array item, its corresponding base array or base pointer appears in a shared clause for the construct. If the task reduction-modifier is specified, the effect is as if it only modifies the behavior of the reduction clause on the innermost construct that constitutes the combined construct and that accepts the modifier (see Section 2.19.5.4 on page 300). If the inscan reduction-modifier is specified, the effect is as if it modifies the behavior of the reduction clause on all constructs of the combined construct to which the clause is applied and that accept the modifier. If a construct to which the inscan reduction-modifier is applied is combined with the target construct, the effect is as if the same list item also appears in a map clause with a map-type of tofrom.
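The reduction rule for a parallel construct combined with a worksharing-loop construct can be sketched as follows. Both function names are hypothetical. The second function spells out the split form that the rule describes: the list item is shared on the parallel construct and the reduction clause is applied only to the worksharing-loop construct.

```c
/* Hypothetical helper: reduction clause on the combined construct. */
long sum_combined(const int *v, int n)
{
    long sum = 0;
    #pragma omp parallel for reduction(+: sum)
    for (int i = 0; i < n; ++i)
        sum += v[i];
    return sum;
}

/* Hypothetical helper: the split form the rule above describes. */
long sum_split(const int *v, int n)
{
    long sum = 0;
    #pragma omp parallel shared(sum)
    {
        #pragma omp for reduction(+: sum)
        for (int i = 0; i < n; ++i)
            sum += v[i];
    }
    return sum;
}
```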
The in_reduction clause is permitted on a single construct among those that constitute the combined or composite construct and the effect is as if the clause is applied to that construct, but if that construct is a target construct, the effect is also as if the same list item appears in a map clause with a map-type of tofrom and a map-type-modifier of always.
The effect of the if clause is described in Section 2.15 on page 220.
The effect of the linear clause is as if it is applied to the innermost constituent construct. Additionally, if the list item is not the iteration variable of a simd or worksharing-loop SIMD construct, the effect on the outer constituent constructs is as if the list item was specified in firstprivate and lastprivate clauses on the combined or composite construct, with the rules specified above applied. If a list item of the linear clause is the iteration variable of a simd or worksharing-loop SIMD construct and it is not declared in the construct, the effect on the outer constituent constructs is as if the list item was specified in a lastprivate clause on the combined or composite construct with the rules specified above applied.
The effect of the nowait clause is as if it is applied to the outermost constituent construct that permits it.
If the clauses have expressions on them, such as for various clauses where the argument of the clause is an expression, or lower-bound, length, or stride expressions inside array sections (or subscript and stride expressions in subscript-triplet for Fortran), or linear-step or alignment expressions, the expressions are evaluated immediately before the construct to which the clause has been split or duplicated per the above rules (therefore inside of the outer constituent constructs). However, the expressions inside the num_teams and thread_limit clauses are always evaluated before the outermost constituent construct.
The restriction that a list item may not appear in more than one data sharing clause with the exception of specifying a variable in both firstprivate and lastprivate clauses applies after the clauses are split or duplicated per the above rules.
2.15 if Clause
Summary
The semantics of an if clause are described in the section on the construct to which it applies. The if clause directive-name-modifier names the associated construct to which an expression applies, and is particularly useful for composite and combined constructs.
Syntax
C / C++
The syntax of the if clause is as follows:
if([ directive-name-modifier :] scalar-expression)
C / C++
Fortran
The syntax of the if clause is as follows:
if([ directive-name-modifier :] scalar-logical-expression)
Fortran
Description
The effect of the if clause depends on the construct to which it is applied. For combined or composite constructs, the if clause only applies to the semantics of the construct named in the directive-name-modifier if one is specified. If no directive-name-modifier is specified for a combined or composite construct then the if clause applies to all constructs to which an if clause can apply.
2.16 master Construct
Summary
The master construct specifies a structured block that is executed by the master thread of the team.
Syntax
C / C++
The syntax of the master construct is as follows:
#pragma omp master new-line
    structured-block
C / C++
Fortran
The syntax of the master construct is as follows:
!$omp master
    structured-block
!$omp end master
Fortran
Binding
The binding thread set for a master region is the current team. A master region binds to the innermost enclosing parallel region.
Description
Only the master thread of the team that executes the binding parallel region participates in the execution of the structured block of the master region. Other threads in the team do not execute the associated structured block. There is no implied barrier either on entry to, or exit from, the master construct.
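A minimal sketch of the behavior described above, using a hypothetical function name: the structured block of the master construct runs once per team, on the master thread only, regardless of how many threads execute the enclosing parallel region. Because the master construct implies no barrier, other threads may run ahead; the sketch relies on the implicit barrier at the end of the parallel region before the counter is read.

```c
/* Hypothetical helper: counts how often the master block runs. */
int count_master_runs(void)
{
    int runs = 0;   /* shared by all threads of the team */

    #pragma omp parallel shared(runs)
    {
        /* Only the master thread executes this statement, so no
           further synchronization is needed for the update. */
        #pragma omp master
        runs += 1;

        /* No barrier is implied here: the other threads do not wait
           for the master thread to finish the block above. */
    }

    return runs;
}
```

A thread that needs to observe the result of the master block inside the parallel region would have to synchronize explicitly, for example with a barrier construct after the master construct.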
Execution Model Events
The master-begin event occurs in the master thread of a team that encounters the master construct on entry to the master region.
The master-end event occurs in the master thread of a team that encounters the master construct on exit from the master region.
Tool Callbacks
A thread dispatches a registered ompt_callback_master callback with ompt_scope_begin as its endpoint argument for each occurrence of a master-begin event in that thread. Similarly, a thread dispatches a registered ompt_callback_master callback with ompt_scope_end as its endpoint argument for each occurrence of a master-end event in that thread. These callbacks occur in the context of the task executed by the master thread and have the type signature ompt_callback_master_t.
Restrictions
C++
• A throw executed inside a master region must cause execution to resume within the same master region, and the same thread that threw the exception must catch it.
C++
Cross References
• parallel construct, see Section 2.6 on page 74.
• ompt_scope_begin and ompt_scope_end, see Section 4.4.4.11 on page 443.
• ompt_callback_master_t, see Section 4.5.2.12 on page 473.
2.17 Synchronization Constructs and Clauses
A synchronization construct orders the completion of code executed by different threads. This ordering is imposed by synchronizing flush operations that are executed as part of the region that corresponds to the construct.
Synchronization through the use of synchronizing flush operations and atomic operations is described in Section 1.4.4 on page 25 and Section 1.4.6 on page 28. Section 2.17.8.1 on page 246 defines the behavior of synchronizing flush operations that are implied at various other locations in an OpenMP program.
2.17.1 critical Construct
Summary
The critical construct restricts execution of the associated structured block to a single thread at a time.
Syntax
C / C++
The syntax of the critical construct is as follows:
#pragma omp critical [(name) [[,] hint(hint-expression)]] new-line
    structured-block
where hint-expression is an integer constant expression that evaluates to a valid synchronization hint (as described in Section 2.17.12 on page 260).
C / C++
Fortran
The syntax of the critical construct is as follows:
!$omp critical [(name) [[,] hint(hint-expression)]]
    structured-block
!$omp end critical [(name)]
where hint-expression is a constant expression that evaluates to a scalar value with kind omp_sync_hint_kind and a value that is a valid synchronization hint (as described in Section 2.17.12 on page 260).
Fortran
Binding
The binding thread set for a critical region is all threads in the contention group.
Description
The region that corresponds to a critical construct is executed as if only a single thread at a time among all threads in the contention group enters the region for execution, without regard to the team(s) to which the threads belong. An optional name may be used to identify the critical construct. All critical constructs without a name are considered to have the same unspecified name.
C / C++
Identifiers used to identify a critical construct have external linkage and are in a name space that is separate from the name spaces used by labels, tags, members, and ordinary identifiers.
C / C++
Fortran
The names of critical constructs are global entities of the program. If a name conflicts with any other entity, the behavior of the program is unspecified.
Fortran
The threads of a contention group execute the critical region as if only one thread of the contention group executes the critical region at a time. The critical construct enforces these execution semantics with respect to all critical constructs with the same name in all threads in the contention group.
If present, the hint clause gives the implementation additional information about the expected runtime properties of the critical region that can optionally be used to optimize the implementation. The presence of a hint clause does not affect the isolation guarantees provided by the critical construct. If no hint clause is specified, the effect is as if hint(omp_sync_hint_none) had been specified.
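The mutual exclusion described above can be sketched with a named critical construct. The function name and the name even_counter are hypothetical. A reduction clause would usually be the more efficient way to compute such a count; the critical construct is used here only to show that updates to the shared variable are serialized. No hint clause is given, so the effect is as if hint(omp_sync_hint_none) had been specified.

```c
/* Hypothetical helper: counts even values, serializing the update. */
int count_even(const int *v, int n)
{
    int count = 0;   /* shared by all threads of the team */

    #pragma omp parallel for shared(count)
    for (int i = 0; i < n; ++i) {
        if (v[i] % 2 == 0) {
            /* Only one thread of the contention group at a time
               executes a critical region with this name. */
            #pragma omp critical (even_counter)
            count += 1;
        }
    }

    return count;
}
```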
Execution Model Events
The critical-acquiring event occurs in a thread that encounters the critical construct on entry to the critical region before initiating synchronization for the region.
The critical-acquired event occurs in a thread that encounters the critical construct after it enters the region, but before it executes the structured block of the critical region.
The critical-released event occurs in a thread that encounters the critical construct after it completes any synchronization on exit from the critical region.
Tool Callbacks
A thread dispatches a registered ompt_callback_mutex_acquire callback for each occurrence of a critical-acquiring event in that thread. This callback has the type signature ompt_callback_mutex_acquire_t.
A thread dispatches a registered ompt_callback_mutex_acquired callback for each occurrence of a critical-acquired event in that thread. This callback has the type signature ompt_callback_mutex_t.
A thread dispatches a registered ompt_callback_mutex_released callback for each occurrence of a critical-released event in that thread. This callback has the type signature ompt_callback_mutex_t.
The callbacks occur in the task that encounters the critical construct. The callbacks should receive ompt_mutex_critical as their kind argument if practical, but a less specific kind is acceptable.
Restrictions
The following restrictions apply to the critical construct:
• Unless the effect is as if hint(omp_sync_hint_none) was specified, the critical construct must specify a name.
• If the hint clause is specified, each of the critical constructs with the same name must have a hint clause for which the hint-expression evaluates to the same value.
C++
• A throw executed inside a critical region must cause execution to resume within the same critical region, and the same thread that threw the exception must catch it.
C++
Fortran
• If a name is specified on a critical directive, the same name must also be specified on the end critical directive.
• If no name appears on the critical directive, no name can appear on the end critical directive.
Fortran
Cross References
• Synchronization Hints, see Section 2.17.12 on page 260.
• ompt_mutex_critical, see Section 4.4.4.16 on page 445.
• ompt_callback_mutex_acquire_t, see Section 4.5.2.14 on page 476.
• ompt_callback_mutex_t, see Section 4.5.2.15 on page 477.
2.17.2 barrier Construct
Summary
The barrier construct specifies an explicit barrier at the point at which the construct appears. The barrier construct is a stand-alone directive.
Syntax
C / C++
The syntax of the barrier construct is as follows:
#pragma omp barrier new-line
C / C++
Fortran
The syntax of the barrier construct is as follows:
!$omp barrier
Fortran
Binding
The binding thread set for a barrier region is the current team. A barrier region binds to the innermost enclosing parallel region.
Description
All threads of the team that is executing the binding parallel region must execute the barrier region and complete execution of all explicit tasks bound to this parallel region before any are allowed to continue execution beyond the barrier.
The barrier region includes an implicit task scheduling point in the current task region.
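The following two-phase sketch illustrates why an explicit barrier may be needed. Each thread publishes a value in its own slot, waits at the barrier, and then reads the slot of its neighbor. The function name, the MAX_THREADS bound, and the request for four threads are assumptions made for the example. The fallback definitions let the code build without OpenMP, in which case a single thread runs both phases.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Sequential fallbacks so the sketch builds without OpenMP. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

#define MAX_THREADS 64   /* assumed upper bound on the team size */

/* Hypothetical helper: returns 1 if every thread saw the value its
   neighbor published before the barrier, and 0 otherwise. */
int neighbor_exchange_ok(void)
{
    int slot[MAX_THREADS];
    int ok = 1;

    #pragma omp parallel num_threads(4) shared(slot, ok)
    {
        int me = omp_get_thread_num();
        int nt = omp_get_num_threads();
        int right;

        slot[me] = me + 1;            /* phase 1: publish */

        /* All threads of the team must arrive here before any thread
           continues, so every slot is written before it is read. */
        #pragma omp barrier

        right = slot[(me + 1) % nt];  /* phase 2: read the neighbor */
        if (right != (me + 1) % nt + 1) {
            #pragma omp atomic write
            ok = 0;
        }
    }

    return ok;
}
```

Without the barrier, a thread could read its neighbor's slot before the neighbor has written it, which would be a data race.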
Execution Model Events
The explicit-barrier-begin event occurs in each thread that encounters the barrier construct on entry to the barrier region.
The explicit-barrier-wait-begin event occurs when a task begins an interval of active or passive waiting in a barrier region.
The explicit-barrier-wait-end event occurs when a task ends an interval of active or passive waiting and resumes execution in a barrier region.
The explicit-barrier-end event occurs in each thread that encounters the barrier construct after the barrier synchronization on exit from the barrier region.
A cancellation event occurs if cancellation is activated at an implicit cancellation point in a barrier region.
Tool Callbacks
A thread dispatches a registered ompt_callback_sync_region callback with ompt_sync_region_barrier_explicit (or ompt_sync_region_barrier, if the implementation cannot make a distinction) as its kind argument and ompt_scope_begin as its endpoint argument for each occurrence of an explicit-barrier-begin event in the task that encounters the barrier construct. Similarly, a thread dispatches a registered ompt_callback_sync_region callback with ompt_sync_region_barrier_explicit (or ompt_sync_region_barrier, if the implementation cannot make a distinction) as its kind argument and ompt_scope_end as its endpoint argument for each occurrence of an explicit-barrier-end event in the task that encounters the barrier construct. These callbacks occur in the task that encounters the barrier construct and have the type signature ompt_callback_sync_region_t.
A thread dispatches a registered ompt_callback_sync_region_wait callback with ompt_sync_region_barrier_explicit (or ompt_sync_region_barrier, if the implementation cannot make a distinction) as its kind argument and ompt_scope_begin as its endpoint argument for each occurrence of an explicit-barrier-wait-begin event. Similarly, a thread dispatches a registered ompt_callback_sync_region_wait callback with ompt_sync_region_barrier_explicit (or ompt_sync_region_barrier, if the implementation cannot make a distinction) as its kind argument and ompt_scope_end as its endpoint argument for each occurrence of an explicit-barrier-wait-end event. These callbacks occur in the context of the task that encountered the barrier construct and have type signature ompt_callback_sync_region_t.
A thread dispatches a registered ompt_callback_cancel callback with ompt_cancel_detected as its flags argument for each occurrence of a cancellation event in that thread. The callback occurs in the context of the encountering task. The callback has type signature ompt_callback_cancel_t.
Restrictions
The following restrictions apply to the barrier construct:
• Each barrier region must be encountered by all threads in a team or by none at all, unless cancellation has been requested for the innermost enclosing parallel region.
• The sequence of worksharing regions and barrier regions encountered must be the same for every thread in a team.
Cross References
• ompt_scope_begin and ompt_scope_end, see Section 4.4.4.11 on page 443.
• ompt_sync_region_barrier, see Section 4.4.4.13 on page 444.
• ompt_callback_sync_region_t, see Section 4.5.2.13 on page 474.
• ompt_callback_cancel_t, see Section 4.5.2.18 on page 481.
2.17.3 Implicit Barriers
This section describes the OMPT events and tool callbacks associated with implicit barriers, which occur at the end of various regions as defined in the description of the constructs to which they correspond. Implicit barriers are task scheduling points. For a description of task scheduling points, associated events, and tool callbacks, see Section 2.10.6 on page 149.
Execution Model Events
The implicit-barrier-begin event occurs in each implicit task at the beginning of an implicit barrier region.
The implicit-barrier-wait-begin event occurs when a task begins an interval of active or passive waiting in an implicit barrier region.
The implicit-barrier-wait-end event occurs when a task ends an interval of active or passive waiting and resumes execution of an implicit barrier region.
The implicit-barrier-end event occurs in each implicit task after the barrier synchronization on exit from an implicit barrier region.
A cancellation event occurs if cancellation is activated at an implicit cancellation point in an implicit barrier region.
Tool Callbacks
A thread dispatches a registered ompt_callback_sync_region callback with ompt_sync_region_barrier_implicit (or ompt_sync_region_barrier, if the implementation cannot make a distinction) as its kind argument and ompt_scope_begin as its endpoint argument for each occurrence of an implicit-barrier-begin event in that thread. Similarly, a thread dispatches a registered ompt_callback_sync_region callback with ompt_sync_region_barrier_implicit (or ompt_sync_region_barrier, if the implementation cannot make a distinction) as its kind argument and ompt_scope_end as its endpoint argument for each occurrence of an implicit-barrier-end event in that thread. These callbacks occur in the implicit task that executes the parallel region and have the type signature ompt_callback_sync_region_t.
A thread dispatches a registered ompt_callback_sync_region_wait callback with ompt_sync_region_barrier_implicit (or ompt_sync_region_barrier, if the implementation cannot make a distinction) as its kind argument and ompt_scope_begin as its endpoint argument for each occurrence of an implicit-barrier-wait-begin event in that thread. Similarly, a thread dispatches a registered ompt_callback_sync_region_wait callback with ompt_sync_region_barrier_implicit (or ompt_sync_region_barrier, if the implementation cannot make a distinction) as its kind argument and ompt_scope_end as its endpoint argument for each occurrence of an implicit-barrier-wait-end event in that thread. These callbacks occur in the implicit task that executes the parallel region and have type signature ompt_callback_sync_region_t.
A thread dispatches a registered ompt_callback_cancel callback with ompt_cancel_detected as its flags argument for each occurrence of a cancellation event in that thread. The callback occurs in the context of the encountering task. The callback has type signature ompt_callback_cancel_t.
Restrictions
If a thread is in the state ompt_state_wait_barrier_implicit_parallel, a call to ompt_get_parallel_info may return a pointer to a copy of the data object associated with the parallel region rather than a pointer to the associated data object itself. Writing to the data object returned by ompt_get_parallel_info when a thread is in the ompt_state_wait_barrier_implicit_parallel state results in unspecified behavior.
Cross References
• ompt_scope_begin and ompt_scope_end, see Section 4.4.4.11 on page 443.
• ompt_sync_region_barrier, see Section 4.4.4.13 on page 444.
• ompt_cancel_detected, see Section 4.4.4.24 on page 450.
• ompt_callback_sync_region_t, see Section 4.5.2.13 on page 474.
• ompt_callback_cancel_t, see Section 4.5.2.18 on page 481.
2.17.4 Implementation-Specific Barriers
An OpenMP implementation can execute implementation-specific barriers that are not implied by the OpenMP specification; therefore, no execution model events are bound to these barriers. The implementation can handle these barriers like implicit barriers and dispatch all events as for implicit barriers. These callbacks are dispatched with ompt_sync_region_barrier_implementation (or ompt_sync_region_barrier, if the implementation cannot make a distinction) as the kind argument.
2.17.5 taskwait Construct
Summary
The taskwait construct specifies a wait on the completion of child tasks of the current task. The taskwait construct is a stand-alone directive.
Syntax
C / C++
The syntax of the taskwait construct is as follows:
#pragma omp taskwait [clause[ [,] clause] … ] new-line
where clause is one of the following:
depend([depend-modifier,]dependence-type : locator-list)
C / C++
Fortran
The syntax of the taskwait construct is as follows:
!$omp taskwait [clause[ [,] clause] … ]
where clause is one of the following:
depend([depend-modifier,]dependence-type : locator-list)
Fortran
Binding
The taskwait region binds to the current task region. The binding thread set of the taskwait region is the current team.
Description
If no depend clause is present on the taskwait construct, the current task region is suspended at an implicit task scheduling point associated with the construct. The current task region remains suspended until all child tasks that it generated before the taskwait region complete execution.
Otherwise, if one or more depend clauses are present on the taskwait construct, the behavior is as if these clauses were applied to a task construct with an empty associated structured block that generates a mergeable and included task. Thus, the current task region is suspended until the predecessor tasks of this task complete execution.
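As an illustration of the first case (no depend clause), the following C sketch generates two child tasks and waits for both before combining their results. The function name sum_in_two_tasks is hypothetical. The sketch assumes only the task, single and taskwait constructs and produces the same result when the pragmas are ignored.

```c
#include <assert.h>

/* Hypothetical helper: sums data[0..n) using two child tasks. */
long sum_in_two_tasks(const int *data, int n)
{
    assert(n >= 0);
    long left = 0, right = 0, total = 0;
    int mid = n / 2;

    #pragma omp parallel
    #pragma omp single
    {
        /* Child task 1: sums the first half. */
        #pragma omp task shared(left)
        for (int i = 0; i < mid; i++)
            left += data[i];

        /* Child task 2: sums the second half. */
        #pragma omp task shared(right)
        for (int i = mid; i < n; i++)
            right += data[i];

        /* The generating task is suspended here until both child
         * tasks generated above have completed. */
        #pragma omp taskwait
        total = left + right;
    }
    return total;
}
```

Without the taskwait, the read of left and right could race with the child tasks. Note that taskwait waits only for child tasks, not for tasks that those children generate.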
Execution Model Events
The taskwait-begin event occurs in each thread that encounters the taskwait construct on entry to the taskwait region.
The taskwait-wait-begin event occurs when a task begins an interval of active or passive waiting in a taskwait region.
The taskwait-wait-end event occurs when a task ends an interval of active or passive waiting and resumes execution in a taskwait region.
The taskwait-end event occurs in each thread that encounters the taskwait construct after the taskwait synchronization on exit from the taskwait region.
Tool Callbacks
A thread dispatches a registered ompt_callback_sync_region callback with ompt_sync_region_taskwait as its kind argument and ompt_scope_begin as its endpoint argument for each occurrence of a taskwait-begin event in the task that encounters the taskwait construct. Similarly, a thread dispatches a registered ompt_callback_sync_region callback with ompt_sync_region_taskwait as its kind argument and ompt_scope_end as its endpoint argument for each occurrence of a taskwait-end event in the task that encounters the taskwait construct. These callbacks occur in the task that encounters the taskwait construct and have the type signature ompt_callback_sync_region_t.
A thread dispatches a registered ompt_callback_sync_region_wait callback with ompt_sync_region_taskwait as its kind argument and ompt_scope_begin as its endpoint argument for each occurrence of a taskwait-wait-begin event. Similarly, a thread dispatches a registered ompt_callback_sync_region_wait callback with ompt_sync_region_taskwait as its kind argument and ompt_scope_end as its endpoint argument for each occurrence of a taskwait-wait-end event. These callbacks occur in the context of the task that encounters the taskwait construct and have type signature ompt_callback_sync_region_t.
Restrictions
The following restrictions apply to the taskwait construct:
• The mutexinoutset dependence-type may not appear in a depend clause on a taskwait
construct.
• If the dependence-type of a depend clause is depobj then the dependence objects cannot
represent dependences of the mutexinoutset dependence type.
Cross References
• task construct, see Section 2.10.1 on page 135.
• Task scheduling, see Section 2.10.6 on page 149.
• depend clause, see Section 2.17.11 on page 255.
• ompt_scope_begin and ompt_scope_end, see Section 4.4.4.11 on page 443.
• ompt_sync_region_taskwait, see Section 4.4.4.13 on page 444.
• ompt_callback_sync_region_t, see Section 4.5.2.13 on page 474.
2.17.6 taskgroup Construct
Summary
The taskgroup construct specifies a wait on completion of child tasks of the current task and their descendent tasks.
Syntax
C / C++
The syntax of the taskgroup construct is as follows:
#pragma omp taskgroup [clause[ [,] clause] … ] new-line
structured-block
where clause is one of the following:
task_reduction(reduction-identifier : list)
allocate([allocator: ]list)
C / C++
Fortran
The syntax of the taskgroup construct is as follows:
!$omp taskgroup [clause[ [,] clause] … ]
structured-block
!$omp end taskgroup
where clause is one of the following:
task_reduction(reduction-identifier : list)
allocate([allocator: ]list)
Fortran
Binding
The binding task set of a taskgroup region is all tasks of the current team that are generated in the region. A taskgroup region binds to the innermost enclosing parallel region.
Description
When a thread encounters a taskgroup construct, it starts executing the region. All child tasks generated in the taskgroup region and all of their descendants that bind to the same parallel region as the taskgroup region are part of the taskgroup set associated with the taskgroup region.
There is an implicit task scheduling point at the end of the taskgroup region. The current task is suspended at the task scheduling point until all tasks in the taskgroup set complete execution.
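The difference from taskwait can be sketched in C as follows. The function name taskgroup_waits_for_descendants is hypothetical. A child task generates a further task, and the end of the taskgroup region waits for both, whereas a taskwait at the same position would only be guaranteed to wait for the child.

```c
#include <assert.h>

/* Hypothetical helper: returns child + grandchild after the taskgroup
 * region has waited for both tasks. */
int taskgroup_waits_for_descendants(void)
{
    int child = 0, grandchild = 0, total = 0;

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp taskgroup
        {
            /* Child task of the task executing the single region. */
            #pragma omp task shared(child, grandchild)
            {
                /* Descendant task: also part of the taskgroup set. */
                #pragma omp task shared(grandchild)
                grandchild = 2;

                child = 1;
            }
        }
        /* End of the taskgroup region: both tasks have completed. */
        total = child + grandchild;
    }
    assert(total == 3);
    return total;
}
```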
Execution Model Events
The taskgroup-begin event occurs in each thread that encounters the taskgroup construct on entry to the taskgroup region.
The taskgroup-wait-begin event occurs when a task begins an interval of active or passive waiting in a taskgroup region.
The taskgroup-wait-end event occurs when a task ends an interval of active or passive waiting and resumes execution in a taskgroup region.
The taskgroup-end event occurs in each thread that encounters the taskgroup construct after the taskgroup synchronization on exit from the taskgroup region.
Tool Callbacks
A thread dispatches a registered ompt_callback_sync_region callback with ompt_sync_region_taskgroup as its kind argument and ompt_scope_begin as its endpoint argument for each occurrence of a taskgroup-begin event in the task that encounters the taskgroup construct. Similarly, a thread dispatches a registered ompt_callback_sync_region callback with ompt_sync_region_taskgroup as its kind argument and ompt_scope_end as its endpoint argument for each occurrence of a taskgroup-end event in the task that encounters the taskgroup construct. These callbacks occur in the task that encounters the taskgroup construct and have the type signature ompt_callback_sync_region_t.
A thread dispatches a registered ompt_callback_sync_region_wait callback with ompt_sync_region_taskgroup as its kind argument and ompt_scope_begin as its endpoint argument for each occurrence of a taskgroup-wait-begin event. Similarly, a thread dispatches a registered ompt_callback_sync_region_wait callback with ompt_sync_region_taskgroup as its kind argument and ompt_scope_end as its endpoint argument for each occurrence of a taskgroup-wait-end event. These callbacks occur in the context of the task that encounters the taskgroup construct and have type signature ompt_callback_sync_region_t.
Cross References
• Task scheduling, see Section 2.10.6 on page 149.
• task_reduction Clause, see Section 2.19.5.5 on page 303.
• ompt_scope_begin and ompt_scope_end, see Section 4.4.4.11 on page 443.
• ompt_sync_region_taskgroup, see Section 4.4.4.13 on page 444.
• ompt_callback_sync_region_t, see Section 4.5.2.13 on page 474.
2.17.7 atomic Construct
Summary
The atomic construct ensures that a specific storage location is accessed atomically, rather than exposing it to the possibility of multiple, simultaneous reading and writing threads that may result in indeterminate values.
Syntax
In the following syntax, atomic-clause is a clause that indicates the semantics for which atomicity is enforced, memory-order-clause is a clause that indicates the memory ordering behavior of the construct and clause is a clause other than atomic-clause. Specifically, atomic-clause is one of the following:
read
write
update
capture
memory-order-clause is one of the following:
seq_cst
acq_rel
release
acquire
relaxed
and clause is either memory-order-clause or one of the following:
hint(hint-expression)
C / C++
The syntax of the atomic construct takes one of the following forms:
#pragma omp atomic [clause[[[,] clause] … ] [,]] atomic-clause [[,] clause [[[,] clause] … ]] new-line
expression-stmt
or
#pragma omp atomic [clause[ [,] clause] … ] new-line
expression-stmt
or
#pragma omp atomic [clause[[[,] clause] … ] [,]] capture [[,] clause [[[,] clause] … ]] new-line
structured-block
where expression-stmt is an expression statement with one of the following forms:
• If atomic-clause is read:
v = x;
• If atomic-clause is write:
x = expr;
• If atomic-clause is update or not present:
x++;
x--;
++x;
--x;
x binop= expr;
x = x binop expr;
x = expr binop x;
• If atomic-clause is capture:
v = x++;
v = x--;
v = ++x;
v = --x;
v = x binop= expr;
v = x = x binop expr;
v = x = expr binop x;
and where structured-block is a structured block with one of the following forms:
{ v = x; x binop= expr; }
{ x binop= expr; v = x; }
{ v = x; x = x binop expr; }
{ v = x; x = expr binop x; }
{ x = x binop expr; v = x; }
{ x = expr binop x; v = x; }
{ v = x; x = expr; }
{ v = x; x++; }
{ v = x; ++x; }
{ ++x; v = x; }
{ x++; v = x; }
{ v = x; x--; }
{ v = x; --x; }
{ --x; v = x; }
{ x--; v = x; }
In the preceding expressions:
• x and v (as applicable) are both l-value expressions with scalar type.
• During the execution of an atomic region, multiple syntactic occurrences of x must designate the same storage location.
• Neither of v and expr (as applicable) may access the storage location designated by x.
• Neither of x and expr (as applicable) may access the storage location designated by v.
• expr is an expression with scalar type.
• binop is one of +, *, -, /, &, ^, |, <<, or >>.
• binop, binop=, ++, and -- are not overloaded operators.
• The expression x binop expr must be numerically equivalent to x binop (expr). This requirement is satisfied if the operators in expr have precedence greater than binop, or by using parentheses around expr or subexpressions of expr.
• The expression expr binop x must be numerically equivalent to (expr) binop x. This requirement is satisfied if the operators in expr have precedence equal to or greater than binop, or by using parentheses around expr or subexpressions of expr.
• For forms that allow multiple occurrences of x, the number of times that x is evaluated is unspecified.
• hint-expression is a constant integer expression that evaluates to a valid synchronization hint.
C / C++
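The C/C++ forms listed above can be exercised with a short sketch. The function name atomic_forms_demo and the result layout are hypothetical. The sketch runs on a single thread only to show which statement shapes each atomic-clause accepts, so the values in the comments follow directly from the base language.

```c
#include <assert.h>

/* Hypothetical helper: applies one statement per atomic-clause and
 * stores the final x, v and old values in result[0..2]. */
void atomic_forms_demo(int result[3])
{
    int x = 0, v = 0, old = 0;
    assert(result != 0);

    #pragma omp atomic write
    x = 10;                     /* x = expr;            x is 10 */

    #pragma omp atomic update
    x += 5;                     /* x binop= expr;       x is 15 */

    #pragma omp atomic          /* no atomic-clause: same as update */
    x++;                        /* x++;                 x is 16 */

    #pragma omp atomic read
    v = x;                      /* v = x;               v is 16 */

    #pragma omp atomic capture
    old = x++;                  /* v = x++;  old is 16, x is 17 */

    #pragma omp atomic capture
    { v = x; x = x * 2; }       /* block form: v is 17, x is 34 */

    result[0] = x;
    result[1] = v;
    result[2] = old;
}
```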
Fortran
The syntax of the atomic construct takes any of the following forms:
!$omp atomic [clause[[[,] clause] … ] [,]] read [[,] clause [[[,] clause] … ]]
capture-statement
[!$omp end atomic]
or
!$omp atomic [clause[[[,] clause] … ] [,]] write [[,] clause [[[,] clause] … ]]
write-statement
[!$omp end atomic]
or
!$omp atomic [clause[[[,] clause] … ] [,]] update [[,] clause [[[,] clause] … ]]
update-statement
[!$omp end atomic]
or
!$omp atomic [clause[ [,] clause] … ]
update-statement
[!$omp end atomic]
or
!$omp atomic [clause[[[,] clause] … ] [,]] capture [[,] clause [[[,] clause] … ]]
update-statement
capture-statement
!$omp end atomic
or
!$omp atomic [clause[[[,] clause] … ] [,]] capture [[,] clause [[[,] clause] … ]]
capture-statement
update-statement
!$omp end atomic
or
!$omp atomic [clause[[[,] clause] … ] [,]] capture [[,] clause [[[,] clause] … ]]
capture-statement
write-statement
!$omp end atomic
where write-statement has the following form (if atomic-clause is capture or write):
x = expr
where capture-statement has the following form (if atomic-clause is capture or read):
v = x
and where update-statement has one of the following forms (if atomic-clause is update, capture, or not present):
x = x operator expr
x = expr operator x
x = intrinsic_procedure_name (x, expr_list)
x = intrinsic_procedure_name (expr_list, x)
In the preceding statements:
• x and v (as applicable) are both scalar variables of intrinsic type.
• x must not have the ALLOCATABLE attribute.
• During the execution of an atomic region, multiple syntactic occurrences of x must designate the same storage location.
• None of v, expr, and expr_list (as applicable) may access the same storage location as x.
• None of x, expr, and expr_list (as applicable) may access the same storage location as v.
• expr is a scalar expression.
• expr_list is a comma-separated, non-empty list of scalar expressions. If intrinsic_procedure_name refers to IAND, IOR, or IEOR, exactly one expression must appear in expr_list.
• intrinsic_procedure_name is one of MAX, MIN, IAND, IOR, or IEOR.
• operator is one of +, *, -, /, .AND., .OR., .EQV., or .NEQV..
• The expression x operator expr must be numerically equivalent to x operator (expr). This requirement is satisfied if the operators in expr have precedence greater than operator, or by using parentheses around expr or subexpressions of expr.
• The expression expr operator x must be numerically equivalent to (expr) operator x. This requirement is satisfied if the operators in expr have precedence equal to or greater than operator, or by using parentheses around expr or subexpressions of expr.
• intrinsic_procedure_name must refer to the intrinsic procedure name and not to other program entities.
• operator must refer to the intrinsic operator and not to a user-defined operator.
• All assignments must be intrinsic assignments.
• For forms that allow multiple occurrences of x, the number of times that x is evaluated is unspecified.
• hint-expression is a constant expression that evaluates to a scalar value with kind omp_sync_hint_kind and a value that is a valid synchronization hint.
Fortran
Binding
If the size of x is 8, 16, 32, or 64 bits and x is aligned to a multiple of its size, the binding thread set for the atomic region is all threads on the device. Otherwise, the binding thread set for the atomic region is all threads in the contention group. atomic regions enforce exclusive access with respect to other atomic regions that access the same storage location x among all threads in the binding thread set without regard to the teams to which the threads belong.
Description
If atomic-clause is not present on the construct, the behavior is as if the update clause is specified.
The atomic construct with the read clause results in an atomic read of the location designated by x regardless of the native machine word size.
The atomic construct with the write clause results in an atomic write of the location designated by x regardless of the native machine word size.
The atomic construct with the update clause results in an atomic update of the location designated by x using the designated operator or intrinsic. Only the read and write of the location designated by x are performed mutually atomically. The evaluation of expr or expr_list need not be atomic with respect to the read or write of the location designated by x. No task scheduling points are allowed between the read and the write of the location designated by x.
The atomic construct with the capture clause results in an atomic captured update — an atomic update of the location designated by x using the designated operator or intrinsic while also capturing the original or final value of the location designated by x with respect to the atomic update. The original or final value of the location designated by x is written in the location designated by v based on the base language semantics of structured block or statements of the atomic construct. Only the read and write of the location designated by x are performed mutually atomically. Neither the evaluation of expr or expr_list, nor the write to the location designated by v, need be atomic with respect to the read or write of the location designated by x. No task scheduling points are allowed between the read and the write of the location designated by x.
The atomic construct may be used to enforce memory consistency between threads, based on the guarantees provided by Section 1.4.6 on page 28. A strong flush on the location designated by x is performed on entry to and exit from the atomic operation, ensuring that the set of all atomic operations in the program applied to the same location has a total completion order. If the write, update, or capture clause is specified and the release, acq_rel, or seq_cst clause is specified then the strong flush on entry to the atomic operation is also a release flush. If the read or capture clause is specified and the acquire, acq_rel, or seq_cst clause is specified then the strong flush on exit from the atomic operation is also an acquire flush. Therefore, if memory-order-clause is specified and is not relaxed, release and/or acquire flush operations are implied and permit synchronization between the threads without the use of explicit flush directives.
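The release and acquire behavior described above can be sketched in C as a one-shot publish and observe pattern. The function name publish_and_observe is hypothetical, and the sketch assumes a compiler that accepts the OpenMP 5.0 memory-order clauses on the atomic construct. The consumer checks the flag once instead of spinning, so the function returns either -1 (flag not yet observed) or the published value.

```c
#include <assert.h>

/* Hypothetical helper: returns 42 if the consumer observed the flag,
 * or -1 if it ran before the producer published. */
int publish_and_observe(void)
{
    int data = 0, flag = 0, seen = -1;

    #pragma omp parallel sections num_threads(2) shared(data, flag, seen)
    {
        #pragma omp section
        {
            data = 42;                       /* ordinary write */
            /* Release flush on entry: data is made visible before
             * the flag is. */
            #pragma omp atomic write release
            flag = 1;
        }

        #pragma omp section
        {
            int ready;
            /* Acquire flush on exit: later reads are ordered after
             * the read of flag. */
            #pragma omp atomic read acquire
            ready = flag;
            if (ready)
                seen = data;                 /* must observe 42 */
        }
    }
    assert(seen == -1 || seen == 42);
    return seen;
}
```

No explicit flush directive is needed in this sketch because the memory-order clauses imply the release and acquire flushes.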
For all forms of the atomic construct, any combination of two or more of these atomic constructs enforces mutually exclusive access to the locations designated by x among threads in the binding thread set. To avoid data races, all accesses of the locations designated by x that could potentially occur in parallel must be protected with an atomic construct.
atomic regions do not guarantee exclusive access with respect to any accesses outside of atomic regions to the same storage location x even if those accesses occur during a critical or ordered region, while an OpenMP lock is owned by the executing task, or during the execution of a reduction clause.
However, other OpenMP synchronization can ensure the desired exclusive access. For example, a barrier that follows a series of atomic updates to x guarantees that subsequent accesses do not form a race with the atomic accesses.
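A minimal C sketch of this pattern follows. The function name count_even is hypothetical. All concurrent updates of count go through an atomic construct, and a barrier separates them from the later non-atomic read.

```c
#include <assert.h>

/* Hypothetical helper: counts the even elements of data[0..n). */
int count_even(const int *data, int n)
{
    assert(n >= 0);
    int count = 0;
    int snapshot = 0;

    #pragma omp parallel shared(count, snapshot)
    {
        #pragma omp for nowait
        for (int i = 0; i < n; i++) {
            if (data[i] % 2 == 0) {
                /* Every potentially concurrent access to count is atomic. */
                #pragma omp atomic update
                count++;
            }
        }

        /* The barrier orders all atomic updates before the read below. */
        #pragma omp barrier

        #pragma omp single
        snapshot = count;       /* plain read, no race after the barrier */
    }
    return snapshot;
}
```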
A compliant implementation may enforce exclusive access between atomic regions that update different storage locations. The circumstances under which this occurs are implementation defined.
If the storage location designated by x is not size-aligned (that is, if the byte alignment of x is not a multiple of the size of x), then the behavior of the atomic region is implementation defined.
If present, the hint clause gives the implementation additional information about the expected properties of the atomic operation that can optionally be used to optimize the implementation. The presence of a hint clause does not affect the semantics of the atomic construct, and all hints may be ignored. If no hint clause is specified, the effect is as if hint(omp_sync_hint_none) had been specified.
Execution Model Events
The atomic-acquiring event occurs in the thread that encounters the atomic construct on entry to the atomic region before initiating synchronization for the region.
The atomic-acquired event occurs in the thread that encounters the atomic construct after it enters the region, but before it executes the structured block of the atomic region.
The atomic-released event occurs in the thread that encounters the atomic construct after it completes any synchronization on exit from the atomic region.
Tool Callbacks
A thread dispatches a registered ompt_callback_mutex_acquire callback for each occurrence of an atomic-acquiring event in that thread. This callback has the type signature ompt_callback_mutex_acquire_t.
A thread dispatches a registered ompt_callback_mutex_acquired callback for each occurrence of an atomic-acquired event in that thread. This callback has the type signature ompt_callback_mutex_t.
A thread dispatches a registered ompt_callback_mutex_released callback with ompt_mutex_atomic as the kind argument if practical, although a less specific kind may be used, for each occurrence of an atomic-released event in that thread. This callback has the type signature ompt_callback_mutex_t and occurs in the task that encounters the atomic construct.
Restrictions
The following restrictions apply to the atomic construct:
• OpenMP constructs may not be encountered during execution of an atomic region.
• At most one memory-order-clause may appear on the construct.
• At most one hint clause may appear on the construct.
• If atomic-clause is read then memory-order-clause must not be acq_rel or release.
• If atomic-clause is write then memory-order-clause must not be acq_rel or acquire.
• If atomic-clause is update or not present then memory-order-clause must not be acq_rel or acquire.
C / C++
• All atomic accesses to the storage locations designated by x throughout the program are required to have a compatible type.
C / C++
Fortran
• All atomic accesses to the storage locations designated by x throughout the program are required to have the same type and type parameters.
Fortran
Cross References
• critical construct, see Section 2.17.1 on page 223.
• barrier construct, see Section 2.17.2 on page 226.
• flush construct, see Section 2.17.8 on page 242.
• ordered construct, see Section 2.17.9 on page 250.
• Synchronization Hints, see Section 2.17.12 on page 260.
• reduction clause, see Section 2.19.5.4 on page 300.
• lock routines, see Section 3.3 on page 381.
• ompt_mutex_atomic, see Section 4.4.4.16 on page 445.
• ompt_callback_mutex_acquire_t, see Section 4.5.2.14 on page 476.
• ompt_callback_mutex_t, see Section 4.5.2.15 on page 477.
2.17.8 flush Construct
Summary
The flush construct executes the OpenMP flush operation. This operation makes a thread’s temporary view of memory consistent with memory and enforces an order on the memory operations of the variables explicitly specified or implied. See the memory model description in Section 1.4 on page 23 for more details. The flush construct is a stand-alone directive.
Syntax
C / C++
The syntax of the flush construct is as follows:
#pragma omp flush [memory-order-clause] [(list)] new-line
where memory-order-clause is one of the following:
acq_rel
release
acquire
C / C++
Fortran
The syntax of the flush construct is as follows:
!$omp flush [memory-order-clause] [(list)]
where memory-order-clause is one of the following:
acq_rel
release
acquire
Fortran
Binding
The binding thread set for a flush region is the encountering thread. Execution of a flush region affects the memory and the temporary view of memory of only the thread that executes the region. It does not affect the temporary view of other threads. Other threads must themselves execute a flush operation in order to be guaranteed to observe the effects of the flush operation of the encountering thread.
Description
If memory-order-clause is not specified then the flush construct results in a strong flush operation with the following behavior. A flush construct without a list, executed on a given thread, operates as if the whole thread-visible data state of the program, as defined by the base language, is flushed. A flush construct with a list applies the flush operation to the items in the list, and the flush operation does not complete until the operation is complete for all specified list items. An implementation may implement a flush with a list by ignoring the list, and treating it the same as a flush without a list.
If no list items are specified, the flush operation has the release and/or acquire flush properties:
• If memory-order-clause is not specified or is acq_rel, the flush operation is both a release flush and an acquire flush.
• If memory-order-clause is release, the flush operation is a release flush.
• If memory-order-clause is acquire, the flush operation is an acquire flush.
C / C++
If a pointer is present in the list, the pointer itself is flushed, not the memory block to which the pointer refers.
C / C++
Fortran
If the list item or a subobject of the list item has the POINTER attribute, the allocation or association status of the POINTER item is flushed, but the pointer target is not. If the list item is a Cray pointer, the pointer is flushed, but the object to which it points is not. If the list item is of type C_PTR, the variable is flushed, but the storage that corresponds to that address is not flushed. If the list item or the subobject of the list item has the ALLOCATABLE attribute and has an allocation status of allocated, the allocated variable is flushed; otherwise the allocation status is flushed.
Fortran
Note – Use of a flush construct with a list is extremely error prone and users are strongly discouraged from attempting it. The following examples illustrate the ordering properties of the flush operation. In the following incorrect pseudocode example, the programmer intends to prevent simultaneous execution of the protected section by the two threads, but the program does not work properly because it does not enforce the proper ordering of the operations on variables a and b. Any shared data accessed in the protected section is not guaranteed to be current or consistent during or after the protected section. The atomic notation in the pseudocode in the following two examples indicates that the accesses to a and b are atomic write and atomic read operations. Otherwise both examples would contain data races and automatically result in unspecified behavior. The flush operations are strong flushes that are applied to the specified flush lists.
Incorrect example:
                    a = b = 0
thread 1                        thread 2
atomic(b = 1)                   atomic(a = 1)
flush(b)                        flush(a)
flush(a)                        flush(b)
atomic(tmp = a)                 atomic(tmp = b)
if (tmp == 0) then              if (tmp == 0) then
    protected section               protected section
end if                          end if
The problem with this example is that operations on variables a and b are not ordered with respect to each other. For instance, nothing prevents the compiler from moving the flush of b on thread 1 or the flush of a on thread 2 to a position completely after the protected section (assuming that the protected section on thread 1 does not reference b and the protected section on thread 2 does not reference a). If either re-ordering happens, both threads can simultaneously execute the protected section.
The following pseudocode example correctly ensures that the protected section is executed by not more than one of the two threads at any one time. Execution of the protected section by neither thread is considered correct in this example. This occurs if both flushes complete prior to either thread executing its if statement.
Correct example:
                    a = b = 0
thread 1                        thread 2
atomic(b = 1)                   atomic(a = 1)
flush(a,b)                      flush(a,b)
atomic(tmp = a)                 atomic(tmp = b)
if (tmp == 0) then              if (tmp == 0) then
    protected section               protected section
end if                          end if
The compiler is prohibited from moving the flush at all for either thread, ensuring that the respective assignment is complete and the data is flushed before the if statement is executed.
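For readers who prefer base-language code, the correct pseudocode example can be rendered in C roughly as follows. This is a sketch under stated assumptions and not text from the specification: the function name flush_exclusion_demo and the entered1 and entered2 markers are hypothetical stand-ins for the protected section, and the two threads are modeled as two sections. The function returns how many threads entered the protected section, which is 0 or 1.

```c
#include <assert.h>

/* Hypothetical helper: returns the number of threads (0 or 1) that
 * entered the protected section. */
int flush_exclusion_demo(void)
{
    int a = 0, b = 0;                 /* a = b = 0 */
    int entered1 = 0, entered2 = 0;   /* stand-ins for the protected section */

    #pragma omp parallel sections num_threads(2) shared(a, b, entered1, entered2)
    {
        #pragma omp section           /* thread 1 */
        {
            int tmp;
            #pragma omp atomic write
            b = 1;
            #pragma omp flush(a, b)
            #pragma omp atomic read
            tmp = a;
            if (tmp == 0)
                entered1 = 1;
        }

        #pragma omp section           /* thread 2 */
        {
            int tmp;
            #pragma omp atomic write
            a = 1;
            #pragma omp flush(a, b)
            #pragma omp atomic read
            tmp = b;
            if (tmp == 0)
                entered2 = 1;
        }
    }
    assert(entered1 + entered2 <= 1);
    return entered1 + entered2;
}
```

As the note says, flush with a list remains error prone. A critical construct or a lock is usually the simpler way to obtain mutual exclusion.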
Execution Model Events
The flush event occurs in a thread that encounters the flush construct.
Tool Callbacks
A thread dispatches a registered ompt_callback_flush callback for each occurrence of a flush event in that thread. This callback has the type signature ompt_callback_flush_t.
Restrictions
The following restrictions apply to the flush construct:
• If memory-order-clause is release, acquire, or acq_rel, list items must not be specified on the flush directive.
Cross References
• ompt_callback_flush_t, see Section 4.5.2.17 on page 480.
16 2.17.8.1 Implicit Flushes
17
18 19
20
21
22
23
24
25
26
Flush operations implied when executing an atomic region are described in Section 2.17.7.
A flush region that corresponds to a flush directive with the release clause present is implied at the following locations:
• During a barrier region;
• At entry to a parallel region;
• At entry to a teams region;
• At exit from a critical region;
• During an omp_unset_lock region;
• During an omp_unset_nest_lock region;
• Immediately before every task scheduling point;
• At exit from the task region of each implicit task;
• At exit from an ordered region, if a threads clause or a depend clause with a source dependence type is present, or if no clauses are present; and
• During a cancel region, if the cancel-var ICV is true.
A flush region that corresponds to a flush directive with the acquire clause present is implied at the following locations:
• During a barrier region;
• At exit from a teams region;
• At entry to a critical region;
• If the region causes the lock to be set, during:
  – an omp_set_lock region;
  – an omp_test_lock region;
  – an omp_set_nest_lock region; and
  – an omp_test_nest_lock region;
• Immediately after every task scheduling point;
• At entry to the task region of each implicit task;
• At entry to an ordered region, if a threads clause or a depend clause with a sink dependence type is present, or if no clauses are present; and
• Immediately before a cancellation point, if the cancel-var ICV is true and cancellation has been activated.
Note – A flush region is not implied at the following locations:
• At entry to worksharing regions; and
• At entry to or exit from master regions.
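The locations listed above can be illustrated with a small C sketch in which a value written by one thread is read by the whole team. The helper name `count_stale_reads_after_barrier` is hypothetical. The `single` construct with `nowait` is a worksharing construct and so implies no flush by itself; the sketch relies on the release and acquire flushes implied during the explicit barrier.

```c
/* Hypothetical helper: one thread publishes `value`, every thread reads it
 * after a barrier. Returns the number of threads that saw a stale value. */
int count_stale_reads_after_barrier(int value)
{
    int published = 0;
    int stale = 0;

    #pragma omp parallel num_threads(4) shared(published) reduction(+:stale)
    {
        #pragma omp single nowait
        published = value;          /* written by exactly one thread */

        /* Flushes implied during the barrier make the write above
         * visible to all threads that continue past this point. */
        #pragma omp barrier

        if (published != value)
            stale++;
    }
    return stale;
}
```

Under these assumptions the function is expected to return 0 for any argument.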
The synchronization behavior of implicit flushes is as follows:
• When a thread executes an atomic region for which the corresponding construct has the release, acq_rel, or seq_cst clause and specifies an atomic operation that starts a given release sequence, the release flush that is performed on entry to the atomic operation synchronizes with an acquire flush that is performed by a different thread and has an associated atomic operation that reads a value written by a modification in the release sequence.
• When a thread executes an atomic region for which the corresponding construct has the acquire, acq_rel, or seq_cst clause and specifies an atomic operation that reads a value written by a given modification, a release flush that is performed by a different thread and has an associated release sequence that contains that modification synchronizes with the acquire flush that is performed on exit from the atomic operation.
• When a thread executes a critical region that has a given name, the behavior is as if the release flush performed on exit from the region synchronizes with the acquire flush performed on entry to the next critical region with the same name that is performed by a different thread, if it exists.
• When a thread team executes a barrier region, the behavior is as if the release flush performed by each thread within the region synchronizes with the acquire flush performed by all other threads within the region.
• When a thread executes a taskwait region that does not result in the creation of a dependent task, the behavior is as if each thread that executes a remaining child task performs a release flush upon completion of the child task that synchronizes with an acquire flush performed in the taskwait region.
• When a thread executes a taskgroup region, the behavior is as if each thread that executes a remaining descendant task performs a release flush upon completion of the descendant task that synchronizes with an acquire flush performed on exit from the taskgroup region.
• When a thread executes an ordered region that does not arise from a stand-alone ordered directive, the behavior is as if the release flush performed on exit from the region synchronizes with the acquire flush performed on entry to an ordered region encountered in the next logical iteration to be executed by a different thread, if it exists.
• When a thread executes an ordered region that arises from a stand-alone ordered directive, the behavior is as if the release flush performed in the ordered region from a given source iteration synchronizes with the acquire flush performed in all ordered regions executed by a different thread that are waiting for dependences on that iteration to be satisfied.
• When a thread team begins execution of a parallel region, the behavior is as if the release flush performed by the master thread on entry to the parallel region synchronizes with the acquire flush performed on entry to each implicit task that is assigned to a different thread.
• When an initial thread begins execution of a target region that is generated by a different thread from a target task, the behavior is as if the release flush performed by the generating thread in the target task synchronizes with the acquire flush performed by the initial thread on entry to its initial task region.
• When an initial thread completes execution of a target region that is generated by a different thread from a target task, the behavior is as if the release flush performed by the initial thread on exit from its initial task region synchronizes with the acquire flush performed by the generating thread in the target task.
• When a thread encounters a teams construct, the behavior is as if the release flush performed by the thread on entry to the teams region synchronizes with the acquire flush performed on entry to each initial task that is executed by a different initial thread that participates in the execution of the teams region.
• When a thread that encounters a teams construct reaches the end of the teams region, the behavior is as if the release flush performed by each different participating initial thread at exit from its initial task synchronizes with the acquire flush performed by the thread at exit from the teams region.
• When a task generates an explicit task that begins execution on a different thread, the behavior is as if the thread that is executing the generating task performs a release flush that synchronizes with the acquire flush performed by the thread that begins to execute the explicit task.
• When an undeferred task completes execution on a given thread that is different from the thread on which its generating task is suspended, the behavior is as if a release flush performed by the thread that completes execution of the undeferred task synchronizes with an acquire flush performed by the thread that resumes execution of the generating task.
• When a dependent task with one or more predecessor tasks begins execution on a given thread, the behavior is as if each release flush performed by a different thread on completion of a predecessor task synchronizes with the acquire flush performed by the thread that begins to execute the dependent task.
• When a task begins execution on a given thread and it is mutually exclusive with respect to another sibling task that is executed by a different thread, the behavior is as if each release flush performed on completion of the sibling task synchronizes with the acquire flush performed by the thread that begins to execute the task.
• When a thread executes a cancel region, the cancel-var ICV is true, and cancellation is not already activated for the specified region, the behavior is as if the release flush performed during the cancel region synchronizes with the acquire flush performed by a different thread immediately before a cancellation point in which that thread observes cancellation was activated for the region.
• When a thread executes an omp_unset_lock region that causes the specified lock to be unset, the behavior is as if a release flush is performed during the omp_unset_lock region that synchronizes with an acquire flush that is performed during the next omp_set_lock or omp_test_lock region to be executed by a different thread that causes the specified lock to be set.
• When a thread executes an omp_unset_nest_lock region that causes the specified nested lock to be unset, the behavior is as if a release flush is performed during the omp_unset_nest_lock region that synchronizes with an acquire flush that is performed during the next omp_set_nest_lock or omp_test_nest_lock region to be executed by a different thread that causes the specified nested lock to be set.
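As one concrete case of the synchronization rules listed here, the following C sketch relies on the release flush at exit from a named critical region synchronizing with the acquire flush at entry to the next critical region of the same name. The helper `count_with_critical` and the region name `counter_update` are assumptions made for this example.

```c
/* Hypothetical helper: increments a shared counter n times, once per
 * iteration, inside a named critical region. */
long count_with_critical(int n)
{
    long counter = 0;

    #pragma omp parallel for shared(counter)
    for (int i = 0; i < n; i++) {
        /* Each thread entering the region observes the value written by
         * the previous thread that left it, without any explicit flush. */
        #pragma omp critical (counter_update)
        counter++;
    }
    return counter;
}
```

No explicit flush directive is needed; the expected result is n for any non-negative n.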
2.17.9 ordered Construct
Summary
The ordered construct either specifies a structured block in a worksharing-loop, simd, or worksharing-loop SIMD region that will be executed in the order of the loop iterations, or it is a stand-alone directive that specifies cross-iteration dependences in a doacross loop nest. The ordered construct sequentializes and orders the execution of ordered regions while allowing code outside the region to run in parallel.
Syntax
C / C++
The syntax of the ordered construct is as follows:
#pragma omp ordered [clause[ [,] clause] ] new-line
    structured-block
where clause is one of the following:
threads
simd
or
#pragma omp ordered clause [[[,] clause] … ] new-line
where clause is one of the following:
depend(source)
depend(sink : vec)
C / C++
Fortran
The syntax of the ordered construct is as follows:
!$omp ordered [clause[ [,] clause] ]
    structured-block
!$omp end ordered
where clause is one of the following:
threads
simd
or
!$omp ordered clause [[[,] clause] … ]
where clause is one of the following:
depend(source)
depend(sink : vec)
Fortran
If the depend clause is specified, the ordered construct is a stand-alone directive.
Binding
The binding thread set for an ordered region is the current team. An ordered region binds to the innermost enclosing simd or worksharing-loop SIMD region if the simd clause is present, and otherwise it binds to the innermost enclosing worksharing-loop region. ordered regions that bind to different regions execute independently of each other.
Description
If no clause is specified, the ordered construct behaves as if the threads clause had been specified. If the threads clause is specified, the threads in the team that is executing the worksharing-loop region execute ordered regions sequentially in the order of the loop iterations. If any depend clauses are specified then those clauses specify the order in which the threads in the team execute ordered regions. If the simd clause is specified, the ordered regions encountered by any thread will execute one at a time in the order of the loop iterations.
When the thread that is executing the first iteration of the loop encounters an ordered construct, it can enter the ordered region without waiting. When a thread that is executing any subsequent iteration encounters an ordered construct without a depend clause, it waits at the beginning of the ordered region until execution of all ordered regions belonging to all previous iterations has completed. When a thread that is executing any subsequent iteration encounters an ordered construct with one or more depend(sink:vec) clauses, it waits until its dependences on all valid iterations specified by the depend clauses are satisfied before it completes execution of the ordered region. A specific dependence is satisfied when a thread that is executing the corresponding iteration encounters an ordered construct with a depend(source) clause.
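The block form of the construct can be sketched in C as follows. The helper `squares_in_order` is hypothetical: the squaring may run in parallel, while the ordered region appends results in loop-iteration order. The worksharing-loop directive carries an ordered clause without a parameter, as the restrictions for this form require.

```c
/* Hypothetical helper: writes 0*0, 1*1, ..., (n-1)*(n-1) to out in that
 * order and returns the number of values written. */
int squares_in_order(int n, int *out)
{
    int next = 0;   /* next free slot; only touched inside ordered regions */

    #pragma omp parallel for ordered schedule(dynamic) shared(next)
    for (int i = 0; i < n; i++) {
        int sq = i * i;             /* may execute concurrently */

        #pragma omp ordered
        {
            out[next] = sq;         /* executes in iteration order */
            next++;
        }
    }
    return next;
}
```

The caller is assumed to supply a buffer of at least n elements.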
Execution Model Events
The ordered-acquiring event occurs in the task that encounters the ordered construct on entry to the ordered region before it initiates synchronization for the region.
The ordered-acquired event occurs in the task that encounters the ordered construct after it enters the region, but before it executes the structured block of the ordered region.
The ordered-released event occurs in the task that encounters the ordered construct after it completes any synchronization on exit from the ordered region.
The doacross-sink event occurs in the task that encounters an ordered construct for each depend(sink:vec) clause after the dependence is fulfilled.
The doacross-source event occurs in the task that encounters an ordered construct with a depend(source) clause before signaling the dependence to be fulfilled.
Tool Callbacks
A thread dispatches a registered ompt_callback_mutex_acquire callback for each occurrence of an ordered-acquiring event in that thread. This callback has the type signature ompt_callback_mutex_acquire_t.
A thread dispatches a registered ompt_callback_mutex_acquired callback for each occurrence of an ordered-acquired event in that thread. This callback has the type signature ompt_callback_mutex_t.
A thread dispatches a registered ompt_callback_mutex_released callback with ompt_mutex_ordered as the kind argument if practical, although a less specific kind may be used, for each occurrence of an ordered-released event in that thread. This callback has the type signature ompt_callback_mutex_t and occurs in the task that encounters the ordered construct.
A thread dispatches a registered ompt_callback_dependences callback with all vector entries listed as ompt_dependence_type_sink in the deps argument for each occurrence of a doacross-sink event in that thread. A thread dispatches a registered ompt_callback_dependences callback with all vector entries listed as ompt_dependence_type_source in the deps argument for each occurrence of a doacross-source event in that thread. These callbacks have the type signature ompt_callback_dependences_t.
Restrictions
Restrictions to the ordered construct are as follows:
• At most one threads clause can appear on an ordered construct.
• At most one simd clause can appear on an ordered construct.
• At most one depend(source) clause can appear on an ordered construct.
• The construct corresponding to the binding region of an ordered region must not specify a reduction clause with the inscan modifier.
• Either depend(sink:vec) clauses or depend(source) clauses may appear on an ordered construct, but not both.
• The worksharing-loop or worksharing-loop SIMD region to which an ordered region corresponding to an ordered construct without a depend clause binds must have an ordered clause without the parameter specified on the corresponding worksharing-loop or worksharing-loop SIMD directive.
• The worksharing-loop region to which an ordered region corresponding to an ordered construct with any depend clauses binds must have an ordered clause with the parameter specified on the corresponding worksharing-loop directive.
• An ordered construct with the depend clause specified must be closely nested inside a worksharing-loop (or parallel worksharing-loop) construct.
• An ordered region corresponding to an ordered construct without the simd clause specified must be closely nested inside a loop region.
• An ordered region corresponding to an ordered construct with the simd clause specified must be closely nested inside a simd or worksharing-loop SIMD region.
• An ordered region corresponding to an ordered construct with both the simd and threads clauses must be closely nested inside a worksharing-loop SIMD region or must be closely nested inside a worksharing-loop and simd region.
• During execution of an iteration of a worksharing-loop or a loop nest within a worksharing-loop, simd, or worksharing-loop SIMD region, a thread must not execute more than one ordered region corresponding to an ordered construct without a depend clause.
C++
• A throw executed inside an ordered region must cause execution to resume within the same ordered region, and the same thread that threw the exception must catch it.
C++
Cross References
• worksharing-loop construct, see Section 2.9.2 on page 101.
• simd construct, see Section 2.9.3.1 on page 110.
• parallel worksharing-loop construct, see Section 2.13.1 on page 185.
• depend clause, see Section 2.17.11 on page 255.
• ompt_mutex_ordered, see Section 4.4.4.16 on page 445.
• ompt_callback_mutex_acquire_t, see Section 4.5.2.14 on page 476.
• ompt_callback_mutex_t, see Section 4.5.2.15 on page 477.
2.17.10 Depend Objects
This section describes constructs that support OpenMP depend objects that can be used to supply user-computed dependences to depend clauses. OpenMP depend objects must be accessed only through the depobj construct or through the depend clause; programs that otherwise access OpenMP depend objects are non-conforming.
An OpenMP depend object can be in one of the following states: uninitialized or initialized. Initially OpenMP depend objects are in the uninitialized state.
2.17.10.1 depobj Construct
Summary
The depobj construct initializes, updates or destroys an OpenMP depend object. The depobj construct is a stand-alone directive.
Syntax
C / C++
The syntax of the depobj construct is as follows:
#pragma omp depobj(depobj) clause new-line
where depobj is an lvalue expression of type omp_depend_t.
where clause is one of the following:
depend(dependence-type : locator)
destroy
update(dependence-type)
C / C++
Fortran
The syntax of the depobj construct is as follows:
!$omp depobj(depobj) clause
where depobj is a scalar integer variable of the omp_depend_kind kind.
where clause is one of the following:
depend(dependence-type : locator)
destroy
update(dependence-type)
Fortran
Binding
The binding thread set for depobj regions is the encountering thread.
Description
A depobj construct with a depend clause present sets the state of depobj to initialized. The depobj is initialized to represent the dependence that the depend clause specifies.
A depobj construct with a destroy clause present changes the state of the depobj to uninitialized.
A depobj construct with an update clause present changes the dependence type of the dependence represented by depobj to the one specified by the update clause.
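The three clauses can be seen together in the following C sketch, which walks one depend object through its states. The helper `depobj_chain` is hypothetical, and the sketch assumes a compiler that implements the OpenMP 5.0 depobj construct and that task dependences are taken from the depend object at the point where each task is generated.

```c
#include <omp.h>   /* omp_depend_t */

/* Hypothetical helper: a writer task and a reader task are ordered through
 * a single depend object; returns the value seen by the reader. */
int depobj_chain(void)
{
    int x = 0, seen = -1;
    omp_depend_t dep;                     /* starts out uninitialized */

    /* uninitialized -> initialized: represents an out dependence on x */
    #pragma omp depobj(dep) depend(out: x)

    #pragma omp parallel num_threads(2) shared(x, seen, dep)
    #pragma omp single
    {
        #pragma omp task depend(depobj: dep) shared(x)
        x = 42;                           /* writer */

        /* Same object, same locator, new dependence type. */
        #pragma omp depobj(dep) update(in)

        #pragma omp task depend(depobj: dep) shared(x, seen)
        seen = x;                         /* reader, ordered after writer */

        #pragma omp taskwait
    }

    /* initialized -> uninitialized */
    #pragma omp depobj(dep) destroy
    return seen;
}
```

Under these assumptions the reader task is a dependent task of the writer, so the function is expected to return 42.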
Restrictions
• A depend clause on a depobj construct must not have source, sink or depobj as dependence-type.
• A depend clause on a depobj construct can only specify one locator.
• The depobj of a depobj construct with the depend clause present must be in the uninitialized state.
• The depobj of a depobj construct with the destroy clause present must be in the initialized state.
• The depobj of a depobj construct with the update clause present must be in the initialized state.
Cross References
• depend clause, see Section 2.17.11 on page 255.
2.17.11 depend Clause
Summary
The depend clause enforces additional constraints on the scheduling of tasks or loop iterations. These constraints establish dependences only between sibling tasks or between loop iterations.
Syntax
The syntax of the depend clause is as follows:
depend([depend-modifier,]dependence-type : locator-list)
where dependence-type is one of the following:
in
out
inout
mutexinoutset
depobj
where depend-modifier is one of the following:
iterator(iterators-definition)
or
depend(dependence-type)
where dependence-type is:
source
or
depend(dependence-type : vec)
where dependence-type is:
sink
and where vec is the iteration vector, which has the form:
x1 [± d1], x2 [± d2], …, xn [± dn]
where n is the value specified by the ordered clause in the worksharing-loop directive, xi denotes the loop iteration variable of the i-th nested loop associated with the worksharing-loop directive, and di is a constant non-negative integer.
Description
Task dependences are derived from the dependence-type of a depend clause and its list items when dependence-type is in, out, inout, or mutexinoutset. When the dependence-type is depobj, the task dependences are derived from the dependences represented by the depend objects specified in the depend clause as if the depend clauses of the depobj constructs were specified in the current construct.
For the in dependence-type, if the storage location of at least one of the list items is the same as the storage location of a list item appearing in a depend clause with an out, inout, or mutexinoutset dependence-type on a construct from which a sibling task was previously generated, then the generated task will be a dependent task of that sibling task.
For the out and inout dependence-types, if the storage location of at least one of the list items is the same as the storage location of a list item appearing in a depend clause with an in, out, inout, or mutexinoutset dependence-type on a construct from which a sibling task was previously generated, then the generated task will be a dependent task of that sibling task.
For the mutexinoutset dependence-type, if the storage location of at least one of the list items is the same as the storage location of a list item appearing in a depend clause with an in, out, or inout dependence-type on a construct from which a sibling task was previously generated, then the generated task will be a dependent task of that sibling task.
If a list item appearing in a depend clause with a mutexinoutset dependence-type on a task-generating construct has the same storage location as a list item appearing in a depend clause with a mutexinoutset dependence-type on a different task generating construct, and both constructs generate sibling tasks, the sibling tasks will be mutually exclusive tasks.
The list items that appear in the depend clause may reference iterators defined by an iterators-definition appearing on an iterator modifier.
The list items that appear in the depend clause may include array sections.
Fortran
If a list item has the ALLOCATABLE attribute and its allocation status is unallocated, the behavior is unspecified. If a list item has the POINTER attribute and its association status is disassociated or undefined, the behavior is unspecified.
Fortran
C / C++
The list items that appear in a depend clause may use shape-operators.
C / C++
Note – The enforced task dependence establishes a synchronization of memory accesses performed by a dependent task with respect to accesses performed by the predecessor tasks. However, it is the responsibility of the programmer to synchronize properly with respect to other concurrent accesses that occur outside of those tasks.
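The in and out rules can be illustrated with a small diamond of sibling tasks. This is a sketch only; the helper `diamond_with_depend` and its variable names are assumptions made for the example. The two middle tasks both read x and may run concurrently, while the last task waits for both.

```c
/* Hypothetical helper: computes (input + 1) + (input * 2) using four
 * sibling tasks ordered only by their depend clauses. */
int diamond_with_depend(int input)
{
    int x = 0, a = 0, b = 0, result = 0;

    #pragma omp parallel num_threads(4) shared(x, a, b, result)
    #pragma omp single
    {
        #pragma omp task depend(out: x) shared(x) firstprivate(input)
        x = input;                      /* predecessor of both readers */

        #pragma omp task depend(in: x) depend(out: a) shared(x, a)
        a = x + 1;                      /* dependent task of the writer */

        #pragma omp task depend(in: x) depend(out: b) shared(x, b)
        b = x * 2;                      /* independent of the task above */

        #pragma omp task depend(in: a, b) depend(out: result) shared(a, b, result)
        result = a + b;                 /* waits for both middle tasks */

        #pragma omp taskwait
    }
    return result;
}
```

Because the dependences also order the memory accesses of the tasks involved, no further synchronization is needed between them in this sketch.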
The source dependence-type specifies the satisfaction of cross-iteration dependences that arise from the current iteration.
The sink dependence-type specifies a cross-iteration dependence, where the iteration vector vec indicates the iteration that satisfies the dependence.
If the iteration vector vec does not occur in the iteration space, the depend clause is ignored. If all depend clauses on an ordered construct are ignored then the construct is ignored.
Note – An iteration vector vec that does not indicate a lexicographically earlier iteration may cause a deadlock.
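A minimal doacross sketch in C may help with the source and sink dependence types. The helper `prefix_sum_doacross` is hypothetical. Each iteration waits for the lexicographically earlier iteration i - 1; for the first iteration that sink lies outside the iteration space, so the clause is ignored.

```c
/* Hypothetical helper: in-place inclusive prefix sum over v[0..n-1]. */
void prefix_sum_doacross(int n, long *v)
{
    #pragma omp parallel for ordered(1) schedule(static, 1)
    for (int i = 1; i < n; i++) {
        /* Wait until iteration i - 1 has executed its source directive. */
        #pragma omp ordered depend(sink: i - 1)
        v[i] += v[i - 1];
        /* Signal that dependences on this iteration are satisfied. */
        #pragma omp ordered depend(source)
    }
}
```

The ordered clause on the loop directive carries the parameter 1, matching the single-element iteration vector used in the sink clause.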
Execution Model Events
The task-dependences event occurs in a thread that encounters a task generating construct or a taskwait construct with a depend clause immediately after the task-create event for the new task or the taskwait-begin event.
The task-dependence event indicates an unfulfilled dependence for the generated task. This event occurs in a thread that observes the unfulfilled dependence before it is satisfied.
Tool Callbacks
A thread dispatches the ompt_callback_dependences callback for each occurrence of the task-dependences event to announce its dependences with respect to the list items in the depend clause. This callback has type signature ompt_callback_dependences_t.
A thread dispatches the ompt_callback_task_dependence callback for a task-dependence event to report a dependence between a predecessor task (src_task_data) and a dependent task (sink_task_data). This callback has type signature ompt_callback_task_dependence_t.
Restrictions
Restrictions to the depend clause are as follows:
• List items used in depend clauses of the same task or sibling tasks must indicate identical storage locations or disjoint storage locations.
• List items used in depend clauses cannot be zero-length array sections.
• Array sections cannot be specified in depend clauses with the depobj dependence type.
• List items used in depend clauses with the depobj dependence type must be depend objects in the initialized state.
C / C++
• List items used in depend clauses with the depobj dependence type must be expressions of the omp_depend_t type.
• List items used in depend clauses with the in, out, inout or mutexinoutset dependence types cannot be expressions of the omp_depend_t type.
C / C++
Fortran
• A common block name cannot appear in a depend clause.
• List items used in depend clauses with the depobj dependence type must be integer expressions of the omp_depend_kind kind.
Fortran
• For a vec element of sink dependence-type of the form xi + di or xi − di if the loop iteration variable xi has an integral or pointer type, the expression xi + di or xi − di for any value of the loop iteration variable xi that can encounter the ordered construct must be computable without overflow in the type of the loop iteration variable.
C++
• For a vec element of sink dependence-type of the form xi + di or xi − di if the loop iteration variable xi is of a random access iterator type other than pointer type, the expression (xi − lbi) + di or (xi − lbi) − di for any value of the loop iteration variable xi that can encounter the ordered construct must be computable without overflow in the type that would be used by std::distance applied to variables of the type of xi.
C++
C / C++
• A bit-field cannot appear in a depend clause.
C / C++
Cross References
• Array sections, see Section 2.1.5 on page 44.
• Iterators, see Section 2.1.6 on page 47.
• task construct, see Section 2.10.1 on page 135.
• Task scheduling constraints, see Section 2.10.6 on page 149.
• target enter data construct, see Section 2.12.3 on page 164.
• target exit data construct, see Section 2.12.4 on page 166.
• target construct, see Section 2.12.5 on page 170.
• target update construct, see Section 2.12.6 on page 176.
• ordered construct, see Section 2.17.9 on page 250.
• depobj construct, see Section 2.17.10.1 on page 254.
• ompt_callback_dependences_t, see Section 4.5.2.8 on page 468.
• ompt_callback_task_dependence_t, see Section 4.5.2.9 on page 470.
2.17.12 Synchronization Hints
Hints about the expected dynamic behavior or suggested implementation can be provided by the programmer to locks (by using the omp_init_lock_with_hint or omp_init_nest_lock_with_hint functions to initialize the lock), and to atomic and critical directives by using the hint clause. The effect of a hint does not change the semantics of the associated construct; if ignoring the hint changes the program semantics, the result is unspecified.
The C/C++ header file (omp.h) and the Fortran include file (omp_lib.h) and/or Fortran 90 module file (omp_lib) define the valid hint constants. The valid constants must include the following, which can be extended with implementation-defined values:
C / C++
typedef enum omp_sync_hint_t {
  omp_sync_hint_none = 0x0,
  omp_lock_hint_none = omp_sync_hint_none,
  omp_sync_hint_uncontended = 0x1,
  omp_lock_hint_uncontended = omp_sync_hint_uncontended,
  omp_sync_hint_contended = 0x2,
  omp_lock_hint_contended = omp_sync_hint_contended,
  omp_sync_hint_nonspeculative = 0x4,
  omp_lock_hint_nonspeculative = omp_sync_hint_nonspeculative,
  omp_sync_hint_speculative = 0x8,
  omp_lock_hint_speculative = omp_sync_hint_speculative
} omp_sync_hint_t;
typedef omp_sync_hint_t omp_lock_hint_t;
C / C++
Fortran
integer, parameter :: omp_lock_hint_kind = omp_sync_hint_kind
integer (kind=omp_sync_hint_kind), &
  parameter :: omp_sync_hint_none = &
  int(Z'0', kind=omp_sync_hint_kind)
integer (kind=omp_lock_hint_kind), &
  parameter :: omp_lock_hint_none = omp_sync_hint_none
integer (kind=omp_sync_hint_kind), &
  parameter :: omp_sync_hint_uncontended = &
  int(Z'1', kind=omp_sync_hint_kind)
integer (kind=omp_lock_hint_kind), &
  parameter :: omp_lock_hint_uncontended = &
  omp_sync_hint_uncontended
integer (kind=omp_sync_hint_kind), &
  parameter :: omp_sync_hint_contended = &
  int(Z'2', kind=omp_sync_hint_kind)
integer (kind=omp_lock_hint_kind), &
  parameter :: omp_lock_hint_contended = &
  omp_sync_hint_contended
integer (kind=omp_sync_hint_kind), &
  parameter :: omp_sync_hint_nonspeculative = &
  int(Z'4', kind=omp_sync_hint_kind)
integer (kind=omp_lock_hint_kind), &
  parameter :: omp_lock_hint_nonspeculative = &
  omp_sync_hint_nonspeculative
integer (kind=omp_sync_hint_kind), &
  parameter :: omp_sync_hint_speculative = &
  int(Z'8', kind=omp_sync_hint_kind)
integer (kind=omp_lock_hint_kind), &
  parameter :: omp_lock_hint_speculative = &
  omp_sync_hint_speculative
Fortran
The hints can be combined by using the + or | operators in C/C++ or the + operator in Fortran. Combining omp_sync_hint_none with any other hint is equivalent to specifying the other hint.
The intended meaning of each hint is:
• omp_sync_hint_uncontended: low contention is expected in this operation, that is, few threads are expected to perform the operation simultaneously in a manner that requires synchronization;
• omp_sync_hint_contended: high contention is expected in this operation, that is, many threads are expected to perform the operation simultaneously in a manner that requires synchronization;
• omp_sync_hint_speculative: the programmer suggests that the operation should be implemented using speculative techniques such as transactional memory; and
• omp_sync_hint_nonspeculative: the programmer suggests that the operation should not be implemented using speculative techniques such as transactional memory.
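A hint might be attached to a critical construct as in the following C sketch. The helper `contended_sum` and the region name `total_update` are assumptions made for the example; the hint only expresses the expectation that many threads reach the region at the same time and does not change the result.

```c
#include <omp.h>   /* defines the omp_sync_hint_* constants */

/* Hypothetical helper: sums 1..n with every update funneled through one
 * named critical region that is expected to be heavily contended. */
long contended_sum(int n)
{
    long total = 0;

    #pragma omp parallel for shared(total)
    for (int i = 1; i <= n; i++) {
        #pragma omp critical (total_update) hint(omp_sync_hint_contended)
        total += i;
    }
    return total;
}
```

An implementation is free to ignore the hint, so the function is expected to return n(n+1)/2 either way.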
Note – Future OpenMP specifications may add additional hints to the omp_sync_hint_t type and the omp_sync_hint_kind kind. Implementers are advised to add implementation-defined hints starting from the most significant bit of the omp_sync_hint_t type and omp_sync_hint_kind kind and to include the name of the implementation in the name of the added hint to avoid name conflicts with other OpenMP implementations.
The omp_sync_hint_t and omp_lock_hint_t enumeration types and the equivalent types in Fortran are synonyms for each other. The type omp_lock_hint_t has been deprecated.
Restrictions
Restrictions to the synchronization hints are as follows:
• The hints omp_sync_hint_uncontended and omp_sync_hint_contended cannot be combined.
• The hints omp_sync_hint_nonspeculative and omp_sync_hint_speculative cannot be combined.
The restrictions for combining multiple values of omp_sync_hint apply equally to the corresponding values of omp_lock_hint, and expressions that mix the two types.
Cross References
• critical construct, see Section 2.17.1 on page 223.
• atomic construct, see Section 2.17.7 on page 234.
• omp_init_lock_with_hint and omp_init_nest_lock_with_hint, see Section 3.3.2 on page 385.
2.18 Cancellation Constructs
2.18.1 cancel Construct
Summary
The cancel construct activates cancellation of the innermost enclosing region of the type specified. The cancel construct is a stand-alone directive.
Syntax
C / C++
The syntax of the cancel construct is as follows:
#pragma omp cancel construct-type-clause[ [,] if-clause] new-line
where construct-type-clause is one of the following:
parallel
sections
for
taskgroup
and if-clause is
if ([ cancel :] scalar-expression)
C / C++
Fortran
The syntax of the cancel construct is as follows:
!$omp cancel construct-type-clause[ [,] if-clause]
where construct-type-clause is one of the following:
parallel
sections
do
taskgroup
and if-clause is
if ([ cancel :] scalar-logical-expression)
Fortran
Binding
The binding thread set of the cancel region is the current team. The binding region of the cancel region is the innermost enclosing region of the type corresponding to the construct-type-clause specified in the directive (that is, the innermost parallel, sections, worksharing-loop, or taskgroup region).
Description
The cancel construct activates cancellation of the binding region only if the cancel-var ICV is true, in which case the cancel construct causes the encountering task to continue execution at the end of the binding region if construct-type-clause is parallel, for, do, or sections. If the cancel-var ICV is true and construct-type-clause is taskgroup, the encountering task continues execution at the end of the current task region. If the cancel-var ICV is false, the cancel construct is ignored.
Threads check for active cancellation only at cancellation points that are implied at the following locations:
• cancel regions;
• cancellation point regions;
• barrier regions;
• implicit barrier regions.
When a thread reaches one of the above cancellation points and if the cancel-var ICV is true, then:
• If the thread is at a cancel or cancellation point region and construct-type-clause is parallel, for, do, or sections, the thread continues execution at the end of the canceled region if cancellation has been activated for the innermost enclosing region of the type specified.
• If the thread is at a cancel or cancellation point region and construct-type-clause is taskgroup, the encountering task checks for active cancellation of all of the taskgroup sets to which the encountering task belongs, and continues execution at the end of the current task region if cancellation has been activated for any of the taskgroup sets.
• If the encountering task is at a barrier region, the encountering task checks for active cancellation of the innermost enclosing parallel region. If cancellation has been activated, then the encountering task continues execution at the end of the canceled region.
Note – If one thread activates cancellation and another thread encounters a cancellation point, the order of execution between the two threads is non-deterministic. Whether the thread that encounters a cancellation point detects the activated cancellation depends on the underlying hardware and operating system.
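The interaction between a cancel region and the other cancellation points of a worksharing loop can be sketched as follows. This is an illustrative example only: the function name find_any is hypothetical, and cancellation takes effect only when the cancel-var ICV is true (for instance because the OMP_CANCELLATION environment variable was set to true before the program started). When cancel-var is false the cancel construct is ignored and every iteration runs, so the function returns a valid answer either way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the index of an element equal to `key`,
 * or -1 if no element matches. */
int find_any(const int *a, int n, int key)
{
    assert(a != NULL || n == 0);
    int found = -1;

    #pragma omp parallel shared(found)
    {
        #pragma omp for
        for (int i = 0; i < n; i++) {
            if (a[i] == key) {
                #pragma omp atomic write
                found = i;
                /* Activates cancellation of the innermost enclosing
                 * worksharing-loop region (only if cancel-var is true). */
                #pragma omp cancel for
            }
            /* Other threads check for active cancellation here and, if it
             * is active, continue at the end of the loop region. */
            #pragma omp cancellation point for
        }
    }
    return found;
}
```

If several elements match, which index is returned depends on thread timing, in line with the note above on non-deterministic ordering.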
When cancellation of tasks is activated through a cancel construct with the taskgroup construct-type-clause, the tasks that belong to the taskgroup set of the innermost enclosing taskgroup region will be canceled. The task that encountered that construct continues execution at the end of its task region, which implies completion of that task. Any task that belongs to the innermost enclosing taskgroup and has already begun execution must run to completion or until a cancellation point is reached. Upon reaching a cancellation point and if cancellation is active, the task continues execution at the end of its task region, which implies the task’s completion. Any task that belongs to the innermost enclosing taskgroup and that has not begun execution may be discarded, which implies its completion.
When cancellation is active for a parallel, sections, or worksharing-loop region, each thread of the binding thread set resumes execution at the end of the canceled region if a cancellation point is encountered. If the canceled region is a parallel region, any tasks that have been created by a task or a taskloop construct and their descendent tasks are canceled according to the above taskgroup cancellation semantics. If the canceled region is a sections or worksharing-loop region, no task cancellation occurs.
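The taskgroup cancellation semantics described above can be sketched as follows. The function name contains_task is hypothetical, and one task per array element is used purely for illustration. If cancel-var is true, tasks that have not yet started may be discarded once a match is found; if it is false, all tasks run. The returned value is the same in both cases.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `key` occurs in a[0..n-1], else 0. */
int contains_task(const int *a, int n, int key)
{
    assert(a != NULL || n == 0);
    int hit = 0;

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp taskgroup
        {
            for (int i = 0; i < n; i++) {
                #pragma omp task shared(hit) firstprivate(i)
                {
                    if (a[i] == key) {
                        #pragma omp atomic write
                        hit = 1;
                        /* Cancels the tasks of the innermost enclosing
                         * taskgroup; this task then continues at the end
                         * of its own task region. */
                        #pragma omp cancel taskgroup
                    }
                }
            }
        }
    }
    return hit;
}
```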
C++
The usual C++ rules for object destruction are followed when cancellation is performed.
C++
Fortran
All private objects or subobjects with ALLOCATABLE attribute that are allocated inside the canceled construct are deallocated.
Fortran
If the canceled construct contains a reduction, task_reduction or lastprivate clause, the final values of the list items that appeared in those clauses are undefined.
When an if clause is present on a cancel construct and the if expression evaluates to false, the cancel construct does not activate cancellation. The cancellation point associated with the cancel construct is always encountered regardless of the value of the if expression.
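The effect of the if clause can be sketched as follows. The function name saw_negative and the parameter allow_cancel are hypothetical. The cancel construct is encountered on every iteration, so it acts as a cancellation point each time, but it activates cancellation only when its if expression is true.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if a[0..n-1] contains a negative value.
 * When `allow_cancel` is nonzero, the loop may be abandoned early. */
int saw_negative(const int *a, int n, int allow_cancel)
{
    assert(a != NULL || n == 0);
    int bad = 0;

    #pragma omp parallel shared(bad)
    {
        #pragma omp for
        for (int i = 0; i < n; i++) {
            if (a[i] < 0) {
                #pragma omp atomic write
                bad = 1;
            }
            /* Activates cancellation only if the expression is true, but is
             * a cancellation point regardless of the expression's value. */
            #pragma omp cancel for if(allow_cancel && a[i] < 0)
        }
    }
    return bad;
}
```

Because the flag is written before cancellation can be activated, the result does not depend on whether cancellation actually takes place.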
Note – The programmer is responsible for releasing locks and other synchronization data structures that might cause a deadlock when a cancel construct is encountered and blocked threads cannot be canceled. The programmer is also responsible for ensuring proper synchronizations to avoid deadlocks that might arise from cancellation of OpenMP regions that contain OpenMP synchronization constructs.
Execution Model Events
If a task encounters a cancel construct that will activate cancellation then a cancel event occurs.
A discarded-task event occurs for any discarded tasks.
Tool Callbacks
A thread dispatches a registered ompt_callback_cancel callback for each occurrence of a cancel event in the context of the encountering task. This callback has type signature ompt_callback_cancel_t; (flags & ompt_cancel_activated) always evaluates to true in the dispatched callback; (flags & ompt_cancel_parallel) evaluates to true in the dispatched callback if construct-type-clause is parallel; (flags & ompt_cancel_sections) evaluates to true in the dispatched callback if construct-type-clause is sections; (flags & ompt_cancel_loop) evaluates to true in the dispatched callback if construct-type-clause is for or do; and (flags & ompt_cancel_taskgroup) evaluates to true in the dispatched callback if construct-type-clause is taskgroup.
A thread dispatches a registered ompt_callback_cancel callback with the ompt_data_t associated with the discarded task as its task_data argument and ompt_cancel_discarded_task as its flags argument for each occurrence of a discarded-task event. The callback occurs in the context of the task that discards the task and has type signature ompt_callback_cancel_t.
Restrictions
The restrictions to the cancel construct are as follows:
• The behavior for concurrent cancellation of a region and a region nested within it is unspecified.
• If construct-type-clause is taskgroup, the cancel construct must be closely nested inside a task or a taskloop construct and the cancel region must be closely nested inside a taskgroup region. If construct-type-clause is sections, the cancel construct must be closely nested inside a sections or section construct. Otherwise, the cancel construct must be closely nested inside an OpenMP construct that matches the type specified in construct-type-clause of the cancel construct.
• A worksharing construct that is canceled must not have a nowait clause.
• A worksharing-loop construct that is canceled must not have an ordered clause.
• During execution of a construct that may be subject to cancellation, a thread must not encounter an orphaned cancellation point. That is, a cancellation point must only be encountered within that construct and must not be encountered elsewhere in its region.
Cross References
• cancel-var ICV, see Section 2.5.1 on page 64.
• if clause, see Section 2.15 on page 220.
• cancellation point construct, see Section 2.18.2 on page 267.
• omp_get_cancellation routine, see Section 3.2.9 on page 342.
• omp_cancel_flag_t enumeration type, see Section 4.4.4.24 on page 450.
• ompt_callback_cancel_t, see Section 4.5.2.18 on page 481.
2.18.2 cancellation point Construct
Summary
The cancellation point construct introduces a user-defined cancellation point at which implicit or explicit tasks check if cancellation of the innermost enclosing region of the type specified has been activated. The cancellation point construct is a stand-alone directive.
Syntax
C / C++
The syntax of the cancellation point construct is as follows:
#pragma omp cancellation point construct-type-clause new-line
where construct-type-clause is one of the following:
parallel
sections
for
taskgroup
C / C++
Fortran
The syntax of the cancellation point construct is as follows:
!$omp cancellation point construct-type-clause
where construct-type-clause is one of the following:
parallel
sections
do
taskgroup
Fortran
Binding
The binding thread set of the cancellation point construct is the current team. The binding region of the cancellation point region is the innermost enclosing region of the type corresponding to the construct-type-clause specified in the directive (that is, the innermost parallel, sections, worksharing-loop, or taskgroup region).
Description
This directive introduces a user-defined cancellation point at which an implicit or explicit task must check if cancellation of the innermost enclosing region of the type specified in the clause has been requested. This construct does not implement any synchronization between threads or tasks.
When an implicit or explicit task reaches a user-defined cancellation point and if the cancel-var ICV is true, then:
• If the construct-type-clause of the encountered cancellation point construct is parallel, for, do, or sections, the thread continues execution at the end of the canceled region if cancellation has been activated for the innermost enclosing region of the type specified.
• If the construct-type-clause of the encountered cancellation point construct is taskgroup, the encountering task checks for active cancellation of all taskgroup sets to which the encountering task belongs and continues execution at the end of the current task region if cancellation has been activated for any of them.
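A user-defined cancellation point for a parallel region can be sketched as follows. The function name has_zero and the shared work counter are hypothetical choices made for this illustration. Threads fetch array indices from the counter, one thread requests cancellation of the parallel region when it sees a zero, and the others notice the request at their next cancellation point. The construct performs no synchronization itself, so the shared variables are accessed with atomic constructs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if a[0..n-1] contains a zero, else 0. */
int has_zero(const int *a, int n)
{
    assert(a != NULL || n == 0);
    int next = 0;   /* shared counter that hands out indices */
    int stop = 0;   /* set once a zero has been seen */

    #pragma omp parallel shared(next, stop)
    {
        for (;;) {
            int i;
            #pragma omp atomic capture
            i = next++;
            if (i >= n)
                break;
            if (a[i] == 0) {
                #pragma omp atomic write
                stop = 1;
                /* Request cancellation of the enclosing parallel region. */
                #pragma omp cancel parallel
            }
            /* If cancellation is active, continue at the end of the
             * parallel region instead of fetching more work. */
            #pragma omp cancellation point parallel
        }
    }
    return stop;
}
```

With cancel-var false, or without OpenMP support, every index is still visited and the same value is returned.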
Execution Model Events
The cancellation event occurs if a task encounters a cancellation point and detects the activation of cancellation.
Tool Callbacks
A thread dispatches a registered ompt_callback_cancel callback for each occurrence of a cancel event in the context of the encountering task. This callback has type signature ompt_callback_cancel_t; (flags & ompt_cancel_detected) always evaluates to true in the dispatched callback; (flags & ompt_cancel_parallel) evaluates to true in the dispatched callback if construct-type-clause of the encountered cancellation point construct is parallel; (flags & ompt_cancel_sections) evaluates to true in the dispatched callback if construct-type-clause of the encountered cancellation point construct is sections; (flags & ompt_cancel_loop) evaluates to true in the dispatched callback if construct-type-clause of the encountered cancellation point construct is for or do; and (flags & ompt_cancel_taskgroup) evaluates to true in the dispatched callback if construct-type-clause of the encountered cancellation point construct is taskgroup.
Restrictions
• A cancellation point construct for which construct-type-clause is taskgroup must be closely nested inside a task or taskloop construct, and the cancellation point region must be closely nested inside a taskgroup region.
• A cancellation point construct for which construct-type-clause is sections must be closely nested inside a sections or section construct.
• A cancellation point construct for which construct-type-clause is neither sections nor taskgroup must be closely nested inside an OpenMP construct that matches the type specified in construct-type-clause.
Cross References
• cancel-var ICV, see Section 2.5.1 on page 64.
• cancel construct, see Section 2.18.1 on page 263.
• omp_get_cancellation routine, see Section 3.2.9 on page 342.
• ompt_callback_cancel_t, see Section 4.5.2.18 on page 481.
2.19 Data Environment
This section presents directives and clauses for controlling data environments.
2.19.1 Data-Sharing Attribute Rules
This section describes how the data-sharing attributes of variables referenced in data environments are determined. The following two cases are described separately:
• Section 2.19.1.1 on page 270 describes the data-sharing attribute rules for variables referenced in a construct.
• Section 2.19.1.2 on page 273 describes the data-sharing attribute rules for variables referenced in a region, but outside any construct.
2.19.1.1 Variables Referenced in a Construct
The data-sharing attributes of variables that are referenced in a construct can be predetermined, explicitly determined, or implicitly determined, according to the rules outlined in this section.
Specifying a variable in a data-sharing attribute clause, except for the private clause, or copyprivate clause of an enclosed construct causes an implicit reference to the variable in the enclosing construct. Specifying a variable in a map clause of an enclosed construct may cause an implicit reference to the variable in the enclosing construct. Such implicit references are also subject to the data-sharing attribute rules outlined in this section.
Certain variables and objects have predetermined data-sharing attributes as follows:
C / C++
• Variables that appear in threadprivate directives are threadprivate.
• Variables with automatic storage duration that are declared in a scope inside the construct are private.
• Objects with dynamic storage duration are shared.
• Static data members are shared.
• The loop iteration variable(s) in the associated for-loop(s) of a for, parallel for, taskloop, or distribute construct is (are) private.
• The loop iteration variable in the associated for-loop of a simd construct with just one associated for-loop is linear with a linear-step that is the increment of the associated for-loop.
• The loop iteration variables in the associated for-loops of a simd construct with multiple associated for-loops are lastprivate.
• The loop iteration variable(s) in the associated for-loop(s) of a loop construct is (are) lastprivate.
• Variables with static storage duration that are declared in a scope inside the construct are shared.
• If a list item in a map clause on the target construct has a base pointer, and the base pointer is a scalar variable that does not appear in a map clause on the construct, the base pointer is firstprivate.
• If a list item in a reduction or in_reduction clause on a construct has a base pointer then the base pointer is private.
C / C++
Fortran
• Variables and common blocks that appear in threadprivate directives are threadprivate.
• The loop iteration variable(s) in the associated do-loop(s) of a do, parallel do, taskloop, or distribute construct is (are) private.
• The loop iteration variable in the associated do-loop of a simd construct with just one associated do-loop is linear with a linear-step that is the increment of the associated do-loop.
• The loop iteration variables in the associated do-loops of a simd construct with multiple associated do-loops are lastprivate.
• The loop iteration variable(s) in the associated do-loop(s) of a loop construct is (are) lastprivate.
• A loop iteration variable for a sequential loop in a parallel or task generating construct is private in the innermost such construct that encloses the loop.
• Implied-do indices and forall indices are private.
• Cray pointees have the same data-sharing attribute as the storage with which their Cray pointers are associated.
• Assumed-size arrays are shared.
• An associate name preserves the association with the selector established at the ASSOCIATE or SELECT TYPE statement.
Fortran
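Two of the predetermined attributes listed above for C/C++ can be seen in a short sketch. The function name sum_squares is hypothetical. The loop iteration variable i and the block-local variable sq are private without any clause naming them, while total, which is declared outside the construct, is handled by an explicit reduction clause.

```c
#include <assert.h>

/* Hypothetical helper: returns 0*0 + 1*1 + ... + (n-1)*(n-1). */
long sum_squares(int n)
{
    assert(n >= 0);
    long total = 0;   /* declared outside the construct: not predetermined */
    int i;            /* iteration variable of the loop: predetermined private */

    #pragma omp parallel for reduction(+:total)
    for (i = 0; i < n; i++) {
        /* Automatic variable declared inside the construct: private. */
        long sq = (long)i * i;
        total += sq;
    }
    return total;
}
```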
Variables with predetermined data-sharing attributes may not be listed in data-sharing attribute clauses, except for the cases listed below. For these exceptions only, listing a predetermined variable in a data-sharing attribute clause is allowed and overrides the variable’s predetermined data-sharing attributes.
C / C++
• The loop iteration variable(s) in the associated for-loop(s) of a for, parallel for, taskloop, distribute, or loop construct may be listed in a private or lastprivate clause.
• The loop iteration variable in the associated for-loop of a simd construct with just one associated for-loop may be listed in a private, lastprivate, or linear clause with a linear-step that is the increment of the associated for-loop.
• The loop iteration variables in the associated for-loops of a simd construct with multiple associated for-loops may be listed in a private or lastprivate clause.
• Variables with const-qualified type with no mutable members may be listed in a firstprivate clause, even if they are static data members.
C / C++
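The first exception above can be sketched as follows. The function name final_loop_value is hypothetical. Listing the loop iteration variable in a lastprivate clause overrides its predetermined private attribute, so the original variable receives the value it would hold after sequential execution of the loop. The sketch requires at least one iteration so that the final value is well defined.

```c
#include <assert.h>

/* Hypothetical helper: returns the value of the iteration variable after
 * the loop has finished. */
int final_loop_value(int n)
{
    assert(n > 0);
    int i;

    /* Without the clause, i would be private and its value after the
     * construct could not be relied upon. */
    #pragma omp parallel for lastprivate(i)
    for (i = 0; i < n; i++) {
        /* loop body omitted; only the iteration variable matters here */
    }
    return i;
}
```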
Fortran
• The loop iteration variable(s) in the associated do-loop(s) of a do, parallel do, taskloop, distribute, or loop construct may be listed in a private or lastprivate clause.
• The loop iteration variable in the associated do-loop of a simd construct with just one associated do-loop may be listed in a private, lastprivate, or linear clause with a linear-step that is the increment of the associated loop.
• The loop iteration variables in the associated do-loops of a simd construct with multiple associated do-loops may be listed in a private or lastprivate clause.
• Variables used as loop iteration variables in sequential loops in a parallel or task generating construct may be listed in data-sharing attribute clauses on the construct itself, and on enclosed constructs, subject to other restrictions.
• Assumed-size arrays may be listed in a shared clause.
Fortran
Additional restrictions on the variables that may appear in individual clauses are described with each clause in Section 2.19.4 on page 282.
Variables with explicitly determined data-sharing attributes are those that are referenced in a given construct and are listed in a data-sharing attribute clause on the construct.
Variables with implicitly determined data-sharing attributes are those that are referenced in a given construct, do not have predetermined data-sharing attributes, and are not listed in a data-sharing attribute clause on the construct.
Rules for variables with implicitly determined data-sharing attributes are as follows:
• In a parallel, teams, or task generating construct, the data-sharing attributes of these variables are determined by the default clause, if present (see Section 2.19.4.1 on page 282).
• In a parallel construct, if no default clause is present, these variables are shared.
• For constructs other than task generating constructs, if no default clause is present, these variables reference the variables with the same names that exist in the enclosing context.
• In a target construct, variables that are not mapped after applying data-mapping attribute rules (see Section 2.19.7 on page 314) are firstprivate.
C++
• In an orphaned task generating construct, if no default clause is present, formal arguments passed by reference are firstprivate.
C++
Fortran
• In an orphaned task generating construct, if no default clause is present, dummy arguments are firstprivate.
Fortran
• In a task generating construct, if no default clause is present, a variable for which the data-sharing attribute is not determined by the rules above and that in the enclosing context is determined to be shared by all implicit tasks bound to the current team is shared.
• In a task generating construct, if no default clause is present, a variable for which the data-sharing attribute is not determined by the rules above is firstprivate.
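The implicit rules for parallel and task generating constructs can be sketched as follows. The function name doubled_successor is hypothetical. No default clause appears anywhere, so result is shared in the parallel region and, because it is shared by all implicit tasks of the team, also shared in the task. The variable local is private to the implicit task that executes the single region, so it is implicitly firstprivate in the explicit task.

```c
#include <assert.h>

/* Hypothetical helper: returns (base + 1) * 2, computed in an explicit task. */
int doubled_successor(int base)
{
    int result = 0;   /* implicitly shared in the parallel region and the task */

    #pragma omp parallel
    #pragma omp single
    {
        int local = base + 1;   /* declared inside the construct: private */

        /* local is captured by value when the task is created (implicitly
         * firstprivate); result refers to the shared variable. */
        #pragma omp task
        result = local * 2;

        #pragma omp taskwait
    }
    assert(result == (base + 1) * 2);
    return result;
}
```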
Additional restrictions on the variables for which data-sharing attributes cannot be implicitly determined in a task generating construct are described in Section 2.19.4.4 on page 286.
2.19.1.2 Variables Referenced in a Region but not in a Construct
The data-sharing attributes of variables that are referenced in a region, but not in a construct, are determined as follows:
C / C++
• Variables with static storage duration that are declared in called routines in the region are shared.
• File-scope or namespace-scope variables referenced in called routines in the region are shared unless they appear in a threadprivate directive.
• Objects with dynamic storage duration are shared.
• Static data members are shared unless they appear in a threadprivate directive.
• In C++, formal arguments of called routines in the region that are passed by reference have the same data-sharing attributes as the associated actual arguments.
• Other variables declared in called routines in the region are private.
C / C++
Fortran
• Local variables declared in called routines in the region and that have the save attribute, or that are data initialized, are shared unless they appear in a threadprivate directive.
• Variables belonging to common blocks, or accessed by host or use association, and referenced in called routines in the region are shared unless they appear in a threadprivate directive.
• Dummy arguments of called routines in the region that have the VALUE attribute are private.
• Dummy arguments of called routines in the region that do not have the VALUE attribute are private if the associated actual argument is not shared.
• Dummy arguments of called routines in the region that do not have the VALUE attribute are shared if the actual argument is shared and it is a scalar variable, structure, an array that is not a pointer or assumed-shape array, or a simply contiguous array section. Otherwise, the data-sharing attribute of the dummy argument is implementation-defined if the associated actual argument is shared.
• Cray pointees have the same data-sharing attribute as the storage with which their Cray pointers are associated.
• Implied-do indices, forall indices, and other local variables declared in called routines in the region are private.
Fortran
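The C/C++ rules above for variables referenced in a region but outside any construct can be sketched as follows. The names hits, record_hit, and count_hits are hypothetical. The routine record_hit is called from a parallel region but contains no construct that names its variables: the file-scope variable is shared, so its update is synchronized, while the ordinary local variable is private to each call.

```c
#include <assert.h>

static int hits;   /* file-scope and not threadprivate: shared */

/* Called from inside the parallel region below, but lexically outside
 * any construct. */
static void record_hit(int weight)
{
    int scaled = weight * 2;   /* other local variable of a called routine: private */

    #pragma omp atomic update
    hits += scaled;            /* shared, so concurrent updates are synchronized */
}

/* Hypothetical driver: calls record_hit(1) n times and returns the total. */
int count_hits(int n)
{
    assert(n >= 0);
    hits = 0;

    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        record_hit(1);

    return hits;
}
```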
2.19.2 threadprivate Directive
Summary
The threadprivate directive specifies that variables are replicated, with each thread having its own copy. The threadprivate directive is a declarative directive.
Syntax
C / C++
The syntax of the threadprivate directive is as follows:
#pragma omp threadprivate(list) new-line
where list is a comma-separated list of file-scope, namespace-scope, or static block-scope variables that do not have incomplete types.
C / C++
Fortran
The syntax of the threadprivate directive is as follows:
!$omp threadprivate(list)
where list is a comma-separated list of named variables and named common blocks. Common block names must appear between slashes.
Fortran
Description
Each copy of a threadprivate variable is initialized once, in the manner specified by the program, but at an unspecified point in the program prior to the first reference to that copy. The storage of all copies of a threadprivate variable is freed according to how static variables are handled in the base language, but at an unspecified point in the program.
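A minimal sketch of a file-scope threadprivate variable in C follows. The names my_count and total_iterations are hypothetical. Each thread updates only its own copy of my_count, so the increments need no synchronization; only the final accumulation into the shared total does.

```c
#include <assert.h>

static int my_count = 0;              /* replicated: one copy per thread */
#pragma omp threadprivate(my_count)   /* precedes every reference to my_count */

/* Hypothetical helper: counts loop iterations per thread and returns the
 * sum of the per-thread counts, which equals n. */
int total_iterations(int n)
{
    assert(n >= 0);
    int total = 0;

    #pragma omp parallel shared(total)
    {
        my_count = 0;                 /* reset this thread's own copy */

        #pragma omp for
        for (int i = 0; i < n; i++)
            my_count++;               /* touches only the private copy */

        #pragma omp atomic update
        total += my_count;
    }
    return total;
}
```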
A program in which a thread references another thread’s copy of a threadprivate variable is non-conforming.
The content of a threadprivate variable can change across a task scheduling point if the executing thread switches to another task that modifies the variable. For more details on task scheduling, see Section 1.3 on page 20 and Section 2.10 on page 135.
In parallel regions, references by the master thread will be to the copy of the variable in the thread that encountered the parallel region.
During a sequential part, references will be to the initial thread’s copy of the variable. The values of data in the initial thread’s copy of a threadprivate variable are guaranteed to persist between any two consecutive references to the variable in the program provided that no teams construct that is not nested inside of a target construct is encountered between the references and that the initial thread is not nested inside of a teams region. For initial threads nested inside of a teams region, the values of data in the copies of a threadprivate variable of those initial threads are guaranteed to persist between any two consecutive references to the variable inside of that teams region.
The values of data in the threadprivate variables of threads that are not initial threads are guaranteed to persist between two consecutive active parallel regions only if all of the following conditions hold:
• Neither parallel region is nested inside another explicit parallel region;
• The number of threads used to execute both parallel regions is the same;
• The thread affinity policies used to execute both parallel regions are the same;
• The value of the dyn-var internal control variable in the enclosing task region is false at entry to both parallel regions;
• No teams construct that is not nested inside of a target construct is encountered between both parallel regions; and
• Neither the omp_pause_resource nor omp_pause_resource_all routine is called.
If these conditions all hold, and if a threadprivate variable is referenced in both regions, then threads with the same thread number in their respective regions will reference the same copy of that variable.
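The persistence guarantee can be sketched as follows, assuming the function is called from the sequential part of the program so that neither region is nested. The names saved, thread_id, and persistence_mismatches are hypothetical. Both regions request the same number of threads and dyn-var is set to false beforehand, so each thread should find in the second region the value it stored in the first.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

static int saved;                  /* one copy per thread */
#pragma omp threadprivate(saved)

/* Thread number inside a parallel region; 0 when built without OpenMP. */
static int thread_id(void)
{
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/* Hypothetical check: returns how many threads did not see their own
 * value again in the second region (0 if the guarantee applies). */
int persistence_mismatches(int nthreads)
{
    assert(nthreads > 0);
    int mismatches = 0;
#ifdef _OPENMP
    omp_set_dynamic(0);            /* dyn-var is false at entry to both regions */
#endif

    #pragma omp parallel num_threads(nthreads)
    saved = 100 + thread_id();

    #pragma omp parallel num_threads(nthreads) shared(mismatches)
    {
        if (saved != 100 + thread_id()) {
            #pragma omp atomic update
            mismatches++;
        }
    }
    return mismatches;
}
```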
C / C++
If the above conditions hold, the storage duration, lifetime, and value of a thread’s copy of a threadprivate variable that does not appear in any copyin clause on the second region will be retained. Otherwise, the storage duration, lifetime, and value of a thread’s copy of the variable in the second region are unspecified.
C / C++
Fortran
If the above conditions hold, the definition, association, or allocation status of a thread’s copy of a threadprivate variable or a variable in a threadprivate common block that is not affected by any copyin clause that appears on the second region (a variable is affected by a copyin clause if the variable appears in the copyin clause or it is in a common block that appears in the copyin clause) will be retained. Otherwise, the definition and association status of a thread’s copy of the variable in the second region are undefined, and the allocation status of an allocatable variable will be implementation defined.
If a threadprivate variable or a variable in a threadprivate common block is not affected by any copyin clause that appears on the first parallel region in which it is referenced, the thread’s copy of the variable inherits the declared type parameter and the default parameter values from the original variable. The variable or any subobject of the variable is initially defined or undefined according to the following rules:
• If it has the ALLOCATABLE attribute, each copy created will have an initial allocation status of unallocated;
• If it has the POINTER attribute:
– If it has an initial association status of disassociated, either through explicit initialization or default initialization, each copy created will have an association status of disassociated;
– Otherwise, each copy created will have an association status of undefined.
• If it does not have either the POINTER or the ALLOCATABLE attribute:
– If it is initially defined, either through explicit initialization or default initialization, each copy created is so defined;
– Otherwise, each copy created is undefined.
Fortran
C / C++
The address of a threadprivate variable is not an address constant.
C / C++
C++
The order in which any constructors for different threadprivate variables of class type are called is unspecified. The order in which any destructors for different threadprivate variables of class type are called is unspecified.
C++
Restrictions
The restrictions to the threadprivate directive are as follows:
• A threadprivate variable must not appear in any clause except the copyin, copyprivate, schedule, num_threads, thread_limit, and if clauses.
• A program in which an untied task accesses threadprivate storage is non-conforming.
C / C++
• If the value of a variable referenced in an explicit initializer of a threadprivate variable is modified prior to the first reference to any instance of the threadprivate variable, then the behavior is unspecified.
• A variable that is part of another variable (as an array or structure element) cannot appear in a threadprivate directive unless it is a static data member of a C++ class.
• A threadprivate directive for file-scope variables must appear outside any definition or declaration, and must lexically precede all references to any of the variables in its list.
• A threadprivate directive for namespace-scope variables must appear outside any definition or declaration other than the namespace definition itself, and must lexically precede all references to any of the variables in its list.
• Each variable in the list of a threadprivate directive at file, namespace, or class scope must refer to a variable declaration at file, namespace, or class scope that lexically precedes the directive.
• A threadprivate directive for static block-scope variables must appear in the scope of the variable and not in a nested scope. The directive must lexically precede all references to any of the variables in its list.
• Each variable in the list of a threadprivate directive in block scope must refer to a variable declaration in the same scope that lexically precedes the directive. The variable declaration must use the static storage-class specifier.
• If a variable is specified in a threadprivate directive in one translation unit, it must be specified in a threadprivate directive in every translation unit in which it is declared.
C / C++
C++
• A threadprivate directive for static class member variables must appear in the class definition, in the same scope in which the member variables are declared, and must lexically precede all references to any of the variables in its list.
• A threadprivate variable must not have an incomplete type or a reference type.
• A threadprivate variable with class type must have:
– An accessible, unambiguous default constructor in the case of default initialization without a given initializer;
– An accessible, unambiguous constructor that accepts the given argument in the case of direct initialization; and
– An accessible, unambiguous copy constructor in the case of copy initialization with an explicit initializer.
C++
Fortran
• A variable that is part of another variable (as an array, structure element or type parameter inquiry) cannot appear in a threadprivate clause.
• The threadprivate directive must appear in the declaration section of a scoping unit in which the common block or variable is declared.
• If a threadprivate directive that specifies a common block name appears in one program unit, then such a directive must also appear in every other program unit that contains a COMMON statement that specifies the same name. It must appear after the last such COMMON statement in the program unit.
• If a threadprivate variable or a threadprivate common block is declared with the BIND attribute, the corresponding C entities must also be specified in a threadprivate directive in the C program.
• A blank common block cannot appear in a threadprivate directive.
• A variable can only appear in a threadprivate directive in the scope in which it is declared. It must not be an element of a common block or appear in an EQUIVALENCE statement.
• A variable that appears in a threadprivate directive must be declared in the scope of a module or have the SAVE attribute, either explicitly or implicitly.
Fortran
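As a hedged illustration of the C placement rules in the restrictions above, the following sketch declares one file-scope and one static block-scope threadprivate variable. The names request_count, next_request_id, calls, and count_calls are hypothetical and are not part of the specification. In both cases the directive follows the declaration, sits in the variable's own scope, and lexically precedes every reference.

```c
/* File-scope variable: the directive appears outside any definition,
   after the declaration and before the first reference. */
static int request_count = 0;
#pragma omp threadprivate(request_count)

/* Hypothetical helper: increments the calling thread's own copy. */
int next_request_id(void)
{
    return ++request_count;
}

/* Block-scope variable: it must be declared static, and the directive
   must appear in the same scope (not a nested one) before any reference. */
int count_calls(void)
{
    static int calls = 0;
    #pragma omp threadprivate(calls)
    return ++calls;
}
```

Each thread that calls these functions advances its own counter, so no synchronization is needed for the increments.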
Cross References
• dyn-var ICV, see Section 2.5 on page 63.
• Number of threads used to execute a parallel region, see Section 2.6.1 on page 78.
• copyin clause, see Section 2.19.6.1 on page 310.
2.19.3 List Item Privatization
For any construct, a list item that appears in a data-sharing attribute clause, including a reduction clause, may be privatized. Each task that references a privatized list item in any statement in the construct receives at least one new list item if the construct has one or more associated loops, and otherwise each such task receives one new list item. Each SIMD lane used in a simd construct that references a privatized list item in any statement in the construct receives at least one new list item. Language-specific attributes for new list items are derived from the corresponding original list item. Inside the construct, all references to the original list item are replaced by references to a new list item received by the task or SIMD lane.
If the construct has one or more associated loops, within the same logical iteration of the loop(s) the same new list item replaces all references to the original list item. For any two logical iterations, if the references to the original list item are replaced by the same list item then the logical iterations must execute in some sequential order.
In the rest of the region, it is unspecified whether references are to a new list item or the original list item. Therefore, if an attempt is made to reference the original item, its value after the region is also unspecified. If a task or a SIMD lane does not reference a privatized list item, it is unspecified whether the task or SIMD lane receives a new list item.
The value and/or allocation status of the original list item will change only:
• If accessed and modified via pointer;
• If possibly accessed in the region but outside of the construct;
• As a side effect of directives or clauses; or
Fortran
• If accessed and modified via construct association.
Fortran
C++
If the construct is contained in a member function, it is unspecified anywhere in the region if accesses through the implicit this pointer refer to the new list item or the original list item.
C++
C / C++
A new list item of the same type, with automatic storage duration, is allocated for the construct. The storage and thus lifetime of these list items last until the block in which they are created exits. The size and alignment of the new list item are determined by the type of the variable. This allocation occurs once for each task generated by the construct and once for each SIMD lane used by the construct.
The new list item is initialized, or has an undefined initial value, as if it had been locally declared without an initializer.
C / C++
C++
If the type of a list item is a reference to a type T then the type will be considered to be T for all purposes of this clause.
The order in which any default constructors for different private variables of class type are called is unspecified. The order in which any destructors for different private variables of class type are called is unspecified.
C++ Fortran
If any statement of the construct references a list item, a new list item of the same type and type parameters is allocated. This allocation occurs once for each task generated by the construct and once for each SIMD lane used by the construct. The initial value of the new list item is undefined. The initial status of a private pointer is undefined.
For a list item or the subobject of a list item with the ALLOCATABLE attribute:
• If the allocation status is unallocated, the new list item or the subobject of the new list item will have an initial allocation status of unallocated;
• If the allocation status is allocated, the new list item or the subobject of the new list item will have an initial allocation status of allocated; and
• If the new list item or the subobject of the new list item is an array, its bounds will be the same as those of the original list item or the subobject of the original list item.
A privatized list item may be storage-associated with other variables when the data-sharing attribute clause is encountered. Storage association may exist because of constructs such as EQUIVALENCE or COMMON. If A is a variable that is privatized by a construct and B is a variable that is storage-associated with A, then:
• The contents, allocation, and association status of B are undefined on entry to the region;
• Any definition of A, or of its allocation or association status, causes the contents, allocation, and association status of B to become undefined; and
• Any definition of B, or of its allocation or association status, causes the contents, allocation, and association status of A to become undefined.
A privatized list item clause may be a selector of an ASSOCIATE or SELECT TYPE construct. If the construct association is established prior to a parallel region, the association between the associate name and the original list item will be retained in the region.
Finalization of a list item of a finalizable type or subobjects of a list item of a finalizable type occurs at the end of the region. The order in which any final subroutines for different variables of a finalizable type are called is unspecified.
Fortran
If a list item appears in both firstprivate and lastprivate clauses, the update required for the lastprivate clause occurs after all initializations for the firstprivate clause.
Restrictions
The following restrictions apply to any list item that is privatized unless otherwise stated for a given data-sharing attribute clause:
C
• A variable that is part of another variable (as an array or structure element) cannot be privatized.
C
C++
• A variable that is part of another variable (as an array or structure element) cannot be privatized except if the data-sharing attribute clause is associated with a construct within a class non-static member function and the variable is an accessible data member of the object for which the non-static member function is invoked.
• A variable of class type (or array thereof) that is privatized requires an accessible, unambiguous default constructor for the class type.
C++
C / C++
• A variable that is privatized must not have a const-qualified type unless it is of class type with a mutable member. This restriction does not apply to the firstprivate clause.
• A variable that is privatized must not have an incomplete type or be a reference to an incomplete type.
C / C++
Fortran
• A variable that is part of another variable (as an array or structure element) cannot be privatized.
• A variable that is privatized must either be definable, or an allocatable variable. This restriction does not apply to the firstprivate clause.
• Variables that appear in namelist statements, in variable format expressions, and in expressions for statement function definitions, may not be privatized.
• Pointers with the INTENT(IN) attribute may not be privatized. This restriction does not apply to the firstprivate clause.
• Assumed-size arrays may not be privatized in a target, teams, or distribute construct.
Fortran
2.19.4 Data-Sharing Attribute Clauses
Several constructs accept clauses that allow a user to control the data-sharing attributes of variables referenced in the construct. Not all of the clauses listed in this section are valid on all directives. The set of clauses that is valid on a particular directive is described with the directive.
Most of the clauses accept a comma-separated list of list items (see Section 2.1 on page 38). All list items that appear in a clause must be visible, according to the scoping rules of the base language. With the exception of the default clause, clauses may be repeated as needed. A list item may not appear in more than one clause on the same directive, except that it may be specified in both firstprivate and lastprivate clauses.
The reduction data-sharing attribute clauses are explained in Section 2.19.5 on page 293.
C++
If a variable referenced in a data-sharing attribute clause has a type derived from a template, and the program does not otherwise reference that variable then any behavior related to that variable is unspecified.
C++
Fortran
When a named common block appears in a private, firstprivate, lastprivate, or shared clause of a directive, none of its members may be declared in another data-sharing attribute clause in that directive. When individual members of a common block appear in a private, firstprivate, lastprivate, reduction, or linear clause of a directive, the storage of the specified variables is no longer Fortran associated with the storage of the common block itself.
Fortran
2.19.4.1 default Clause
Summary
The default clause explicitly determines the data-sharing attributes of variables that are referenced in a parallel, teams, or task generating construct and would otherwise be implicitly determined (see Section 2.19.1.1 on page 270).
Syntax
C / C++
The syntax of the default clause is as follows:
default(shared | none)
C / C++
Fortran
The syntax of the default clause is as follows:
default(private | firstprivate | shared | none)
Fortran
Description
The default(shared) clause causes all variables referenced in the construct that have implicitly determined data-sharing attributes to be shared.
Fortran
The default(firstprivate) clause causes all variables in the construct that have implicitly determined data-sharing attributes to be firstprivate.
The default(private) clause causes all variables referenced in the construct that have implicitly determined data-sharing attributes to be private.
Fortran
The default(none) clause requires that each variable that is referenced in the construct, and that does not have a predetermined data-sharing attribute, must have its data-sharing attribute explicitly determined by being listed in a data-sharing attribute clause.
Restrictions
The restrictions to the default clause are as follows:
• Only a single default clause may be specified on a parallel, task, taskloop or teams directive.
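The effect of default(none) can be sketched in C as follows. This is an illustrative example, not text from the specification, and the function name dot is hypothetical. Because default(none) is present, the variables a, b, n, and sum must each be listed in a data-sharing attribute clause. The loop iteration variable i needs no clause because its data-sharing attribute is predetermined.

```c
/* Hypothetical helper: dot product of two integer arrays of length n. */
long dot(const int *a, const int *b, int n)
{
    long sum = 0;

    /* Every referenced variable without a predetermined attribute is
       listed explicitly; omitting one of them would be rejected. */
    #pragma omp parallel for default(none) shared(a, b, n) reduction(+ : sum)
    for (int i = 0; i < n; i++) {
        sum += (long)a[i] * b[i];
    }
    return sum;
}
```

Listing every variable this way is more verbose than relying on the implicit rules, but it makes accidental sharing visible at compile time.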
2.19.4.2 shared Clause
Summary
The shared clause declares one or more list items to be shared by tasks generated by a parallel, teams, or task generating construct.
Syntax
The syntax of the shared clause is as follows:
shared(list)
Description
All references to a list item within a task refer to the storage area of the original variable at the point the directive was encountered.
The programmer must ensure, by adding proper synchronization, that storage shared by an explicit task region does not reach the end of its lifetime before the explicit task region completes its execution.
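A minimal C sketch of the lifetime requirement above, under the assumption of a single explicit task writing to a shared automatic variable. The function name compute_in_task is hypothetical. The taskwait guarantees that the task has completed before result is read and before its storage can go out of scope.

```c
/* Hypothetical helper: doubles its input inside an explicit task. */
int compute_in_task(int input)
{
    int result = 0;   /* automatic storage, shared with the explicit task */

    #pragma omp parallel
    #pragma omp single
    {
        /* The task writes directly to the original variable's storage. */
        #pragma omp task shared(result) firstprivate(input)
        result = input * 2;

        /* Wait for the child task so that result is still alive, and
           fully written, when it is read after the region. */
        #pragma omp taskwait
    }
    return result;
}
```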
Fortran
The association status of a shared pointer becomes undefined upon entry to and exit from the parallel, teams, or task generating construct if it is associated with a target or a subobject of a target that appears as a privatized list item in a data-sharing attribute clause on the construct.
Note – Passing a shared variable to a procedure may result in the use of temporary storage in place of the actual argument when the corresponding dummy argument does not have the VALUE or CONTIGUOUS attribute and its data-sharing attribute is implementation-defined as per the rules in Section 2.19.1.2 on page 273. These conditions effectively result in references to, and definitions of, the temporary storage during the procedure reference. Furthermore, the value of the shared variable is copied into the intervening temporary storage before the procedure reference when the dummy argument does not have the INTENT(OUT) attribute, and is copied out of the temporary storage into the shared variable when the dummy argument does not have the INTENT(IN) attribute. Any references to (or definitions of) the shared storage that is associated with the dummy argument by any other task must be synchronized with the procedure reference to avoid possible data races.
Fortran
Restrictions
The restrictions for the shared clause are as follows:
C
• A variable that is part of another variable (as an array or structure element) cannot appear in a shared clause.
C
C++
• A variable that is part of another variable (as an array or structure element) cannot appear in a shared clause except if the shared clause is associated with a construct within a class non-static member function and the variable is an accessible data member of the object for which the non-static member function is invoked.
C++
Fortran
• A variable that is part of another variable (as an array, structure element or type parameter inquiry) cannot appear in a shared clause.
Fortran
2.19.4.3 private Clause
Summary
The private clause declares one or more list items to be private to a task or to a SIMD lane.
Syntax
The syntax of the private clause is as follows:
private(list)
Description
The private clause specifies that its list items are to be privatized according to Section 2.19.3 on page 279. Each task or SIMD lane that references a list item in the construct receives only one new list item, unless the construct has one or more associated loops and the order(concurrent) clause is also present.
List items that appear in a private, firstprivate, or reduction clause in a parallel construct may also appear in a private clause in an enclosed parallel, worksharing, loop, task, taskloop, simd, or target construct.
List items that appear in a private or firstprivate clause in a task or taskloop construct may also appear in a private clause in an enclosed parallel, loop, task, taskloop, simd, or target construct.
List items that appear in a private, firstprivate, lastprivate, or reduction clause in a worksharing construct may also appear in a private clause in an enclosed parallel, loop, task, simd, or target construct.
List items that appear in a private clause on a loop construct may also appear in a private clause in an enclosed loop, parallel, or simd construct.
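As an illustrative C sketch (the names sum_of_squares and scratch are hypothetical), the private clause can give each thread its own scratch variable. The new list item has an undefined initial value, so the code assigns to it before reading it in every iteration.

```c
/* Hypothetical helper: returns 0*0 + 1*1 + ... + (n-1)*(n-1). */
long sum_of_squares(int n)
{
    long total = 0;
    long scratch;   /* original list item; each thread gets its own copy */

    #pragma omp parallel for private(scratch) reduction(+ : total)
    for (int i = 0; i < n; i++) {
        scratch = (long)i * i;   /* written before it is read */
        total += scratch;
    }
    return total;
}
```

Without the private clause, scratch would be shared and the concurrent writes would form a data race.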
Restrictions
The restrictions to the private clause are as specified in Section 2.19.3.
Cross References
• List Item Privatization, see Section 2.19.3 on page 279.
2.19.4.4 firstprivate Clause
Summary
The firstprivate clause declares one or more list items to be private to a task, and initializes each of them with the value that the corresponding original item has when the construct is encountered.
Syntax
The syntax of the firstprivate clause is as follows:
firstprivate(list)
Description
The firstprivate clause provides a superset of the functionality provided by the private clause.
A list item that appears in a firstprivate clause is subject to the private clause semantics described in Section 2.19.4.3 on page 285, except as noted. In addition, the new list item is initialized from the original list item existing before the construct. The initialization of the new list item is done once for each task that references the list item in any statement in the construct. The initialization is done prior to the execution of the construct.
For a firstprivate clause on a parallel, task, taskloop, target, or teams construct, the initial value of the new list item is the value of the original list item that exists immediately prior to the construct in the task region where the construct is encountered unless otherwise specified. For a firstprivate clause on a worksharing construct, the initial value of the new list item for each implicit task of the threads that execute the worksharing construct is the value of the original list item that exists in the implicit task immediately prior to the point in time that the worksharing construct is encountered unless otherwise specified.
To avoid data races, concurrent updates of the original list item must be synchronized with the read of the original list item that occurs as a result of the firstprivate clause.
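The capture-at-encounter behavior of firstprivate on a task construct can be sketched in C as follows. This is an illustration under stated assumptions, and the function name fill_squares is hypothetical. Each explicit task receives its own copy of i, initialized with the value i had when the task construct was encountered, so later changes to the loop variable do not affect tasks that are already created.

```c
/* Hypothetical helper: stores i*i into out[i] for 0 <= i < n,
   using one explicit task per element. */
void fill_squares(int *out, int n)
{
    #pragma omp parallel
    #pragma omp single
    for (int i = 0; i < n; i++) {
        /* i is copied into the task when the construct is encountered. */
        #pragma omp task firstprivate(i) shared(out)
        out[i] = i * i;
    }
    /* The implicit barrier at the end of the region ensures that all
       tasks have completed before the function returns. */
}
```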
C / C++
For variables of non-array type, the initialization occurs by copy assignment. For an array of elements of non-array type, each element is initialized as if by assignment from an element of the original array to the corresponding element of the new array.
C / C++
C++
For each variable of class type:
• If the firstprivate clause is not on a target construct then a copy constructor is invoked to perform the initialization; and
• If the firstprivate clause is on a target construct then it is unspecified how many copy constructors, if any, are invoked.
If copy constructors are called, the order in which copy constructors for different variables of class type are called is unspecified.
C++
Fortran
If the original list item does not have the POINTER attribute, initialization of the new list items occurs as if by intrinsic assignment unless the list item has a type bound procedure as a defined assignment. If the original list item that does not have the POINTER attribute has the allocation status of unallocated, the new list items will have the same status.
If the original list item has the POINTER attribute, the new list items receive the same association status of the original list item as if by pointer assignment.
Fortran
Restrictions
The restrictions to the firstprivate clause are as follows:
• A list item that is private within a parallel region must not appear in a firstprivate clause on a worksharing construct if any of the worksharing regions arising from the worksharing construct ever bind to any of the parallel regions arising from the parallel construct.
• A list item that is private within a teams region must not appear in a firstprivate clause on a distribute construct if any of the distribute regions arising from the distribute construct ever bind to any of the teams regions arising from the teams construct.
• A list item that appears in a reduction clause of a parallel construct must not appear in a firstprivate clause on a worksharing, task, or taskloop construct if any of the worksharing or task regions arising from the worksharing, task, or taskloop construct ever bind to any of the parallel regions arising from the parallel construct.
• A list item that appears in a reduction clause of a teams construct must not appear in a firstprivate clause on a distribute construct if any of the distribute regions arising from the distribute construct ever bind to any of the teams regions arising from the teams construct.
• A list item that appears in a reduction clause of a worksharing construct must not appear in a firstprivate clause in a task construct encountered during execution of any of the worksharing regions arising from the worksharing construct.
C++
• A variable of class type (or array thereof) that appears in a firstprivate clause requires an accessible, unambiguous copy constructor for the class type.
C++
C / C++
• If a list item in a firstprivate clause on a worksharing construct has a reference type then it must bind to the same object for all threads of the team.
C / C++
Fortran
• If the list item is a polymorphic variable with the ALLOCATABLE attribute, the behavior is unspecified.
Fortran
2.19.4.5 lastprivate Clause
Summary
The lastprivate clause declares one or more list items to be private to an implicit task or to a SIMD lane, and causes the corresponding original list item to be updated after the end of the region.
Syntax
The syntax of the lastprivate clause is as follows:
lastprivate([ lastprivate-modifier:] list)
where lastprivate-modifier is:
conditional
Description
The lastprivate clause provides a superset of the functionality provided by the private clause.
A list item that appears in a lastprivate clause is subject to the private clause semantics described in Section 2.19.4.3 on page 285. In addition, when a lastprivate clause without the conditional modifier appears on a directive, the value of each new list item from the sequentially last iteration of the associated loops, or the lexically last section construct, is assigned to the original list item. When the conditional modifier appears on the clause, if an assignment to a list item is encountered in the construct then the original list item is assigned the value that is assigned to the new list item in the sequentially last iteration or lexically last section in which such an assignment is encountered.
C / C++
For an array of elements of non-array type, each element is assigned to the corresponding element of the original array.
C / C++
Fortran
If the original list item does not have the POINTER attribute, its update occurs as if by intrinsic assignment unless it has a type bound procedure as a defined assignment.
If the original list item has the POINTER attribute, its update occurs as if by pointer assignment.
Fortran
When the conditional modifier does not appear on the lastprivate clause, list items that are not assigned a value by the sequentially last iteration of the loops, or by the lexically last section construct, have unspecified values after the construct. Unassigned subcomponents also have unspecified values after the construct.
If the lastprivate clause is used on a construct to which neither the nowait nor the nogroup clauses are applied, the original list item becomes defined at the end of the construct. To avoid data races, concurrent reads or updates of the original list item must be synchronized with the update of the original list item that occurs as a result of the lastprivate clause.
Otherwise, if the lastprivate clause is used on a construct to which the nowait or the nogroup clauses are applied, accesses to the original list item may create a data race. To avoid this data race, if an assignment to the original list item occurs then synchronization must be inserted to ensure that the assignment completes and the original list item is flushed to memory.
If a list item that appears in a lastprivate clause with the conditional modifier is modified in the region by an assignment outside the construct or not to the list item then the value assigned to the original list item is unspecified.
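A small C sketch of the unconditional form described above, with hypothetical names last_doubled_index and last. Every iteration assigns the list item, so after the loop the original variable holds the value written by the sequentially last iteration, regardless of which thread executed it. The sketch assumes n is at least 1, since with no iterations nothing is assigned.

```c
/* Hypothetical helper: returns 2*(n-1), the value assigned to the
   list item by the sequentially last iteration. Assumes n >= 1. */
int last_doubled_index(int n)
{
    int last = -1;

    #pragma omp parallel for lastprivate(last)
    for (int i = 0; i < n; i++) {
        last = 2 * i;   /* assigned in every iteration */
    }
    return last;        /* value from iteration i == n - 1 */
}
```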
Restrictions
The restrictions to the lastprivate clause are as follows:
• A list item that is private within a parallel region, or that appears in the reduction clause of a parallel construct, must not appear in a lastprivate clause on a worksharing construct if any of the corresponding worksharing regions ever binds to any of the corresponding parallel regions.
• A list item that appears in a lastprivate clause with the conditional modifier must be a scalar variable.
C++
• A variable of class type (or array thereof) that appears in a lastprivate clause requires an accessible, unambiguous default constructor for the class type, unless the list item is also specified in a firstprivate clause.
• A variable of class type (or array thereof) that appears in a lastprivate clause requires an accessible, unambiguous copy assignment operator for the class type. The order in which copy assignment operators for different variables of class type are called is unspecified.
C++
C / C++
• If a list item in a lastprivate clause on a worksharing construct has a reference type then it must bind to the same object for all threads of the team.
C / C++
Fortran
• A variable that appears in a lastprivate clause must be definable.
• If the original list item has the ALLOCATABLE attribute, the corresponding list item whose value is assigned to the original list item must have an allocation status of allocated upon exit from the sequentially last iteration or lexically last section construct.
• If the list item is a polymorphic variable with the ALLOCATABLE attribute, the behavior is unspecified.
Fortran
2.19.4.6 linear Clause
Summary
The linear clause declares one or more list items to be private and to have a linear relationship with respect to the iteration space of a loop associated with the construct on which the clause appears.
Syntax
C
The syntax of the linear clause is as follows:
linear(linear-list[ : linear-step])
where linear-list is one of the following
list
modifier(list)
where modifier is one of the following:
val
C
C++
The syntax of the linear clause is as follows:
linear(linear-list[ : linear-step])
where linear-list is one of the following
list
modifier(list)
where modifier is one of the following:
ref
val
uval
C++
Fortran
The syntax of the linear clause is as follows:
linear(linear-list[ : linear-step])
where linear-list is one of the following
list
modifier(list)
where modifier is one of the following:
ref
val
uval
Fortran
Description
The linear clause provides a superset of the functionality provided by the private clause. A list item that appears in a linear clause is subject to the private clause semantics described in Section 2.19.4.3 on page 285 except as noted. If linear-step is not specified, it is assumed to be 1.
When a linear clause is specified on a construct, the value of the new list item on each iteration of the associated loop(s) corresponds to the value of the original list item before entering the construct plus the logical number of the iteration times linear-step. The value corresponding to the sequentially last iteration of the associated loop(s) is assigned to the original list item.
When a linear clause is specified on a declarative directive, all list items must be formal parameters (or, in Fortran, dummy arguments) of a function that will be invoked concurrently on each SIMD lane. If no modifier is specified or the val or uval modifier is specified, the value of each list item on each lane corresponds to the value of the list item upon entry to the function plus the logical number of the lane times linear-step. If the uval modifier is specified, each invocation uses the same storage location for each SIMD lane; this storage location is updated with the final value of the logically last lane. If the ref modifier is specified, the storage location of each list item on each lane corresponds to an array at the storage location upon entry to the function indexed by the logical number of the lane times linear-step.
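For the construct form of the clause, a hedged C sketch follows. The function name pick_every_other is hypothetical. The index j is declared linear with a linear-step of 2, so in logical iteration i its value is the value before the loop (0) plus i times 2, and the loop body advances j by exactly that step.

```c
/* Hypothetical helper: copies src[0], src[2], src[4], ... into dst[0..n-1].
   src must hold at least 2*n - 1 elements when n > 0. */
void pick_every_other(int *dst, const int *src, int n)
{
    int j = 0;   /* linear in the loop: j == 2 * i at the top of iteration i */

    #pragma omp simd linear(j : 2)
    for (int i = 0; i < n; i++) {
        dst[i] = src[j];
        j += 2;   /* must match the declared linear-step */
    }
}
```

Declaring the relationship lets an implementation compute j for each SIMD lane directly instead of treating the increment as a loop-carried dependence.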
Restrictions
• The linear-step expression must be invariant during the execution of the region that corresponds to the construct. Otherwise, the execution results in unspecified behavior.
• Only a loop iteration variable of a loop that is associated with the construct may appear as a list-item in a linear clause if a reduction clause with the inscan modifier also appears on the construct.
C
• A list-item that appears in a linear clause must be of integral or pointer type.
C
C++
• A list-item that appears in a linear clause without the ref modifier must be of integral or pointer type, or must be a reference to an integral or pointer type.
• The ref or uval modifier can only be used if the list-item is of a reference type.
• If a list item in a linear clause on a worksharing construct has a reference type then it must bind to the same object for all threads of the team.
• If the list item is of a reference type and the ref modifier is not specified and if any write to the list item occurs before any read of the list item then the result is unspecified.
C++
Fortran
• A list-item that appears in a linear clause without the ref modifier must be of type integer.
• The ref or uval modifier can only be used if the list-item is a dummy argument without the VALUE attribute.
• Variables that have the POINTER attribute and Cray pointers may not appear in a linear clause.
• If the list item has the ALLOCATABLE attribute and the ref modifier is not specified, the allocation status of the list item in the sequentially last iteration must be allocated upon exit from that iteration.
• If the ref modifier is specified, variables with the ALLOCATABLE attribute, assumed-shape arrays and polymorphic variables may not appear in the linear clause.
• If the list item is a dummy argument without the VALUE attribute and the ref modifier is not specified and if any write to the list item occurs before any read of the list item then the result is unspecified.
• A common block name cannot appear in a linear clause.
Fortran
2.19.5 Reduction Clauses and Directives
The reduction clauses are data-sharing attribute clauses that can be used to perform some forms of recurrence calculations in parallel. Reduction clauses include reduction scoping clauses and reduction participating clauses. Reduction scoping clauses define the region in which a reduction is computed. Reduction participating clauses define the participants in the reduction. Reduction clauses specify a reduction-identifier and one or more list items.
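As a minimal C sketch of such recurrence calculations (the function names array_sum and array_max are hypothetical), the following uses the implicitly declared + and max reduction-identifiers. Each thread accumulates into a private copy, and the private copies are combined with the original list item at the end of the region. The max example assumes n is at least 1.

```c
/* Hypothetical helper: sum of the n elements of v. */
long array_sum(const int *v, int n)
{
    long sum = 0;

    #pragma omp parallel for reduction(+ : sum)
    for (int i = 0; i < n; i++) {
        sum += v[i];
    }
    return sum;
}

/* Hypothetical helper: largest of the n elements of v. Assumes n >= 1. */
int array_max(const int *v, int n)
{
    int best = v[0];

    #pragma omp parallel for reduction(max : best)
    for (int i = 1; i < n; i++) {
        if (v[i] > best) {
            best = v[i];
        }
    }
    return best;
}
```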
2.19.5.1 Properties Common To All Reduction Clauses
Syntax
The syntax of a reduction-identifier is defined as follows:
C
A reduction-identifier is either an identifier or one of the following operators: +, -, *, &, |, ^, && and ||.
C
C++
A reduction-identifier is either an id-expression or one of the following operators: +, -, *, &, |, ^, && and ||.
C++ Fortran
A reduction-identifier is either a base language identifier, or a user-defined operator, or one of the following operators: +, -, *, .and., .or., .eqv., .neqv., or one of the following intrinsic procedure names: max, min, iand, ior, ieor.
Fortran C / C++
Table 2.11 lists each reduction-identifier that is implicitly declared at every scope for arithmetic types and its semantic initializer value. The actual initializer value is that value as expressed in the data type of the reduction list item.
TABLE 2.11: Implicitly Declared C/C++ reduction-identifiers
Identifier   Initializer                                       Combiner
+            omp_priv = 0                                      omp_out += omp_in
-            omp_priv = 0                                      omp_out += omp_in
*            omp_priv = 1                                      omp_out *= omp_in
&            omp_priv = ~ 0                                    omp_out &= omp_in
|            omp_priv = 0                                      omp_out |= omp_in
^            omp_priv = 0                                      omp_out ^= omp_in
&&           omp_priv = 1                                      omp_out = omp_in && omp_out
||           omp_priv = 0                                      omp_out = omp_in || omp_out
max          omp_priv = Least representable number in the      omp_out = omp_in > omp_out ? omp_in : omp_out
             reduction list item type
min          omp_priv = Largest representable number in the    omp_out = omp_in < omp_out ? omp_in : omp_out
             reduction list item type
C / C++ Fortran
Table 2.12 lists each reduction-identifier that is implicitly declared for numeric and logical types and its semantic initializer value. The actual initializer value is that value as expressed in the data type of the reduction list item.
TABLE 2.12: Implicitly Declared Fortran reduction-identifiers
Identifier   Initializer                                       Combiner
+            omp_priv = 0                                      omp_out = omp_in + omp_out
-            omp_priv = 0                                      omp_out = omp_in + omp_out
*            omp_priv = 1                                      omp_out = omp_in * omp_out
.and.        omp_priv = .true.                                 omp_out = omp_in .and. omp_out
.or.         omp_priv = .false.                                omp_out = omp_in .or. omp_out
.eqv.        omp_priv = .true.                                 omp_out = omp_in .eqv. omp_out
.neqv.       omp_priv = .false.                                omp_out = omp_in .neqv. omp_out
max          omp_priv = Least representable number in the      omp_out = max(omp_in, omp_out)
             reduction list item type
min          omp_priv = Largest representable number in the    omp_out = min(omp_in, omp_out)
             reduction list item type
iand         omp_priv = All bits on                            omp_out = iand(omp_in, omp_out)
ior          omp_priv = 0                                      omp_out = ior(omp_in, omp_out)
ieor         omp_priv = 0                                      omp_out = ieor(omp_in, omp_out)
Fortran
In the above tables, omp_in and omp_out correspond to two identifiers that refer to storage of the type of the list item. omp_out holds the final value of the combiner operation.
Any reduction-identifier that is defined with the declare reduction directive is also valid. In that case, the initializer and combiner of the reduction-identifier are specified by the initializer-clause and the combiner in the declare reduction directive.
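As an illustration of the implicitly declared identifiers in Table 2.11, the following C sketch applies the + and max reduction-identifiers to a combined parallel worksharing-loop construct. The function names sum_of_squares and max_element are hypothetical and exist only for this example. If the code is compiled without OpenMP support, the pragmas are ignored and the loops run sequentially with the same result.

```c
#include <limits.h>

/* Sum of i*i for i in [1, n]. Each private copy of `sum` starts at 0
   (the initializer for +); copies are combined as omp_out += omp_in. */
long sum_of_squares(int n)
{
    long sum = 0;
    #pragma omp parallel for reduction(+ : sum)
    for (int i = 1; i <= n; i++)
        sum += (long)i * i;
    return sum;
}

/* Largest element of a[0..n-1], or INT_MIN when n <= 0. Each private
   copy of `best` starts at the least representable int (the
   initializer for max). */
int max_element(const int *a, int n)
{
    int best = INT_MIN;
    #pragma omp parallel for reduction(max : best)
    for (int i = 0; i < n; i++)
        if (a[i] > best)
            best = a[i];
    return best;
}
```

For example, sum_of_squares(10) yields 385 regardless of the number of threads, because integer addition is associative and commutative.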
Description
A reduction clause specifies a reduction-identifier and one or more list items.
The reduction-identifier specified in a reduction clause must match a previously declared reduction-identifier of the same name and type for each of the list items. This match is done by means of a name lookup in the base language.
The list items that appear in a reduction clause may include array sections.
C++
If the type is a derived class, then any reduction-identifier that matches its base classes is also a match, if there is no specific match for the type.
If the reduction-identifier is not an id-expression, then it is implicitly converted to one by prepending the keyword operator (for example, + becomes operator+).
If the reduction-identifier is qualified then a qualified name lookup is used to find the declaration. If the reduction-identifier is unqualified then an argument-dependent name lookup must be performed using the type of each list item.
C++
If the list item is an array or array section, it will be treated as if a reduction clause would be applied to each separate element of the array section.
If the list item is an array section, the elements of any copy of the array section will be allocated contiguously.
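The element-wise treatment of array sections can be sketched in C as follows. The macro NBINS and the function histogram are hypothetical names for this example; the array section hist[0:NBINS] is reduced as if each of its elements appeared in a separate + reduction.

```c
#define NBINS 4

/* Counts how many of the n values fall into each bin (value % NBINS).
   `hist` must point to at least NBINS ints. */
void histogram(const unsigned *values, int n, int *hist)
{
    for (int b = 0; b < NBINS; b++)
        hist[b] = 0;

    /* Each thread works on a private, contiguous copy of hist[0:NBINS];
       the copies are combined element by element at the end. */
    #pragma omp parallel for reduction(+ : hist[0:NBINS])
    for (int i = 0; i < n; i++)
        hist[values[i] % NBINS]++;
}
```

Because each thread works on its own copy of the section, the increments need no atomic or critical protection.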
Fortran
If the original list item has the POINTER attribute, any copies of the list item are associated with private targets.
Fortran
Any copies associated with the reduction are initialized with the initializer value of the reduction-identifier.
Any copies are combined using the combiner associated with the reduction-identifier.
Execution Model Events
The reduction-begin event occurs before a task begins to perform loads and stores that belong to the implementation of a reduction and the reduction-end event occurs after the task has completed loads and stores associated with the reduction. If a task participates in multiple reductions, each reduction may be bracketed by its own pair of reduction-begin/reduction-end events or multiple reductions may be bracketed by a single pair of events. The interval defined by a pair of reduction-begin/reduction-end events may not contain a task scheduling point.
Tool Callbacks
A thread dispatches a registered ompt_callback_reduction with ompt_sync_region_reduction in its kind argument and ompt_scope_begin as its endpoint argument for each occurrence of a reduction-begin event in that thread. Similarly, a thread dispatches a registered ompt_callback_reduction with ompt_sync_region_reduction in its kind argument and ompt_scope_end as its endpoint argument for each occurrence of a reduction-end event in that thread. These callbacks occur in the context of the task that performs the reduction and have the type signature ompt_callback_sync_region_t.
Restrictions
The restrictions common to reduction clauses are as follows:
• Any number of reduction clauses can be specified on the directive, but a list item (or any array element in an array section) can appear only once in reduction clauses for that directive.
• For a reduction-identifier declared with the declare reduction construct, the directive must appear before its use in a reduction clause.
• If a list item is an array section or an array element, its base expression must be a base language identifier.
• If a list item is an array section, it must specify contiguous storage and it cannot be a zero-length array section.
• If a list item is an array section or an array element, accesses to the elements of the array outside the specified array section or array element result in unspecified behavior.
C
• A variable that is part of another variable, with the exception of array elements, cannot appear in a reduction clause.
C C++
• A variable that is part of another variable, with the exception of array elements, cannot appear in a reduction clause except if the reduction clause is associated with a construct within a class non-static member function and the variable is an accessible data member of the object for which the non-static member function is invoked.
C++ C / C++
• The type of a list item that appears in a reduction clause must be valid for the reduction-identifier. For a max or min reduction in C, the type of the list item must be an allowed arithmetic data type: char, int, float, double, or _Bool, possibly modified with long, short, signed, or unsigned. For a max or min reduction in C++, the type of the list item must be an allowed arithmetic data type: char, wchar_t, int, float, double, or bool, possibly modified with long, short, signed, or unsigned.
• A list item that appears in a reduction clause must not be const-qualified.
• The reduction-identifier for any list item must be unambiguous and accessible.
C / C++
Fortran
• A variable that is part of another variable, with the exception of array elements, cannot appear in a reduction clause.
• A type parameter inquiry cannot appear in a reduction clause.
• The type, type parameters and rank of a list item that appears in a reduction clause must be valid for the combiner and initializer.
• A list item that appears in a reduction clause must be definable.
• A procedure pointer may not appear in a reduction clause.
• A pointer with the INTENT(IN) attribute may not appear in the reduction clause.
• An original list item with the POINTER attribute or any pointer component of an original list item that is referenced in the combiner must be associated at entry to the construct that contains the reduction clause. Additionally, the list item or the pointer component of the list item must not be deallocated, allocated, or pointer assigned within the region.
• An original list item with the ALLOCATABLE attribute or any allocatable component of an original list item that corresponds to the special variable identifier in the combiner or the initializer must be in the allocated state at entry to the construct that contains the reduction clause. Additionally, the list item or the allocatable component of the list item must be neither deallocated nor allocated, explicitly or implicitly, within the region.
• If the reduction-identifier is defined in a declare reduction directive, the declare reduction directive must be in the same subprogram, or accessible by host or use association.
• If the reduction-identifier is a user-defined operator, the same explicit interface for that operator must be accessible as at the declare reduction directive.
• If the reduction-identifier is defined in a declare reduction directive, any subroutine or function referenced in the initializer clause or combiner expression must be an intrinsic function, or must have an explicit interface where the same explicit interface is accessible as at the declare reduction directive.
Fortran
Cross References
• ompt_scope_begin and ompt_scope_end, see Section 4.4.4.11 on page 443.
• ompt_sync_region_reduction, see Section 4.4.4.13 on page 444.
• ompt_callback_sync_region_t, see Section 4.5.2.13 on page 474.
2.19.5.2 Reduction Scoping Clauses
Reduction scoping clauses define the region in which a reduction is computed by tasks or SIMD lanes. All properties common to all reduction clauses, which are defined in Section 2.19.5.1 on page 294, apply to reduction scoping clauses.
The number of copies created for each list item and the time at which those copies are initialized are determined by the particular reduction scoping clause that appears on the construct.
The time at which the original list item contains the result of the reduction is determined by the particular reduction scoping clause.
The location in the OpenMP program at which values are combined and the order in which values are combined are unspecified. Therefore, when comparing sequential and parallel runs, or when comparing one parallel run to another (even if the number of threads used is the same), there is no guarantee that bitwise-identical results will be obtained or that side effects (such as floating-point exceptions) will be identical or take place at the same location in the OpenMP program.
To avoid data races, concurrent reads or updates of the original list item must be synchronized with the update of the original list item that occurs as a result of the reduction computation.
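The effect of an unspecified combination order on floating-point results can be reproduced without any OpenMP construct. The following C sketch uses a hypothetical helper, sum_in_order, that adds the same values in two different orders. It assumes IEEE 754 double arithmetic and no value-changing compiler optimizations.

```c
/* Adds v[0..n-1] left to right, or right to left when `reverse`
   is nonzero. */
double sum_in_order(const double *v, int n, int reverse)
{
    double sum = 0.0;
    for (int k = 0; k < n; k++)
        sum += reverse ? v[n - 1 - k] : v[k];
    return sum;
}
```

For the values 1e30, -1e30 and 1.0, the left-to-right sum is 1.0, while the right-to-left sum is 0.0, because 1.0 is absorbed when it is added to -1e30. A + reduction over the same values may combine them in either order.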
2.19.5.3 Reduction Participating Clauses
A reduction participating clause specifies a task or a SIMD lane as a participant in a reduction defined by a reduction scoping clause. All properties common to all reduction clauses, which are defined in Section 2.19.5.1 on page 294, apply to reduction participating clauses.
Accesses to the original list item may be replaced by accesses to copies of the original list item created by a region that corresponds to a construct with a reduction scoping clause.
In any case, the final value of the reduction must be determined as if all tasks or SIMD lanes that participate in the reduction are executed sequentially in some arbitrary order.
2.19.5.4 reduction Clause
Summary
The reduction clause specifies a reduction-identifier and one or more list items. For each list item, a private copy is created in each implicit task or SIMD lane and is initialized with the initializer value of the reduction-identifier. After the end of the region, the original list item is updated with the values of the private copies using the combiner associated with the reduction-identifier.
Syntax
reduction([ reduction-modifier,]reduction-identifier : list)
Where reduction-identifier is defined in Section 2.19.5.1 on page 294, and reduction-modifier is one of the following:
inscan
task
default
Description
The reduction clause is a reduction scoping clause and a reduction participating clause, as described in Section 2.19.5.2 on page 299 and Section 2.19.5.3 on page 300.
If reduction-modifier is not present or the default reduction-modifier is present, the behavior is as follows. For parallel and worksharing constructs, one or more private copies of each list item are created for each implicit task, as if the private clause had been used. For the simd construct, one or more private copies of each list item are created for each SIMD lane, as if the private clause had been used. For the taskloop construct, private copies are created according to the rules of the reduction scoping clauses. For the teams construct, one or more private copies of each list item are created for the initial task of each team in the league, as if the private clause had been used. For the loop construct, private copies are created and used in the construct according to the description and restrictions in Section 2.19.3 on page 279. At the end of a region that corresponds to a construct for which the reduction clause was specified, the original list item is updated by combining its original value with the final value of each of the private copies, using the combiner of the specified reduction-identifier.
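The role of the original value can be shown with a short C sketch. The function accumulate_from is a hypothetical name for this example. The value held by the list item before the construct takes part in the result, while the private copies start from the initializer value 0.

```c
/* Returns start + (1 + 2 + ... + n). Private copies of `total` start
   at 0; the value of `total` before the loop is combined with them at
   the end of the region. */
int accumulate_from(int start, int n)
{
    int total = start;
    #pragma omp parallel for reduction(+ : total)
    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}
```

With start equal to 100 and n equal to 4, the result is 110, not 10: the private copies are not initialized from the original list item, yet its value is not lost.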
If the inscan reduction-modifier is present, a scan computation is performed over updates to the list item performed in each logical iteration of the loop associated with the worksharing-loop, worksharing-loop SIMD, or simd construct (see Section 2.9.6 on page 132). The list items are privatized in the construct according to the description and restrictions in Section 2.19.3 on page 279. At the end of the region, each original list item is assigned the value of the private copy from the last logical iteration of the loops associated with the construct.
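A minimal C sketch of an inclusive scan follows. The function name inclusive_prefix_sum is hypothetical, and the example assumes a compiler that implements the OpenMP 5.0 scan directive. The statement before the scan directive updates the list item, and the statement after it reads the prefix value for the current iteration.

```c
/* Writes the inclusive prefix sums of in[0..n-1] to out[0..n-1] and,
   for n > 0, returns the sum of all elements. */
int inclusive_prefix_sum(const int *in, int *out, int n)
{
    int running = 0;
    #pragma omp parallel for reduction(inscan, + : running)
    for (int i = 0; i < n; i++) {
        running += in[i];      /* input phase: update the list item */
        #pragma omp scan inclusive(running)
        out[i] = running;      /* scan phase: read the prefix value */
    }
    return running;
}
```

For the input 1, 2, 3, 4 the output array holds 1, 3, 6, 10, and the original list item holds 10 after the region, which is the value from the last logical iteration.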
If the task reduction-modifier is present for a parallel or worksharing construct, then each list item is privatized according to the description and restrictions in Section 2.19.3 on page 279, and an unspecified number of additional private copies are created to support task reductions. Any copies associated with the reduction are initialized before they are accessed by the tasks that participate in the reduction, which include all implicit tasks in the corresponding region and all participating explicit tasks that specify an in_reduction clause (see Section 2.19.5.6 on page 303). After the end of the region, the original list item contains the result of the reduction.
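The task reduction-modifier can be sketched in C as follows. The function name sum_with_tasks is hypothetical, and the example assumes a compiler that implements the OpenMP 5.0 task modifier. The parallel construct scopes the reduction, and each explicit task joins it through an in_reduction clause.

```c
/* Returns 1 + 2 + ... + n, with one explicit task per term. */
int sum_with_tasks(int n)
{
    int sum = 0;
    #pragma omp parallel reduction(task, + : sum)
    {
        /* One thread generates the tasks; any thread may run them. */
        #pragma omp single
        for (int i = 1; i <= n; i++) {
            #pragma omp task in_reduction(+ : sum)
            sum += i;
        }
    }
    /* The parallel region has ended, so `sum` holds the result. */
    return sum;
}
```

Without the in_reduction clause on the task construct, the explicit tasks would not be participants and their accesses to sum would violate the restrictions on the clause.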
If nowait is not specified for the construct, the reduction computation will be complete at the end of the construct; however, if the reduction clause is used on a construct to which nowait is also applied, accesses to the original list item will create a race and, thus, have unspecified effect unless synchronization ensures that they occur after all threads have executed all of their iterations or section constructs, and the reduction computation has completed and stored the computed value of that list item. This can most simply be ensured through a barrier synchronization.
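The interaction with nowait can be sketched in C as follows. The function name doubled_sum is hypothetical. The first loop carries a reduction with nowait, the second loop does independent work that does not touch the list item, and an explicit barrier precedes the first read of the reduced value.

```c
/* Stores 2 * a[i] in out[i] and returns 2 * (a[0] + ... + a[n-1]). */
long doubled_sum(const int *a, int *out, int n)
{
    long sum = 0;
    long result = 0;
    #pragma omp parallel shared(sum, result)
    {
        #pragma omp for reduction(+ : sum) nowait
        for (int i = 0; i < n; i++)
            sum += a[i];

        /* Work that does not access `sum` may overlap the reduction. */
        #pragma omp for nowait
        for (int i = 0; i < n; i++)
            out[i] = 2 * a[i];

        /* After the barrier the reduction has completed, so reading
           `sum` no longer races with its update. */
        #pragma omp barrier
        #pragma omp single
        result = 2 * sum;
    }
    return result;
}
```

Removing the barrier would let a thread read sum before the reduction computation has stored its value, which is the race described above.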
Restrictions
The restrictions to the reduction clause are as follows:
• All restrictions common to all reduction clauses, which are listed in Section 2.19.5.1 on page 294, apply to this clause.
• A list item that appears in a reduction clause of a worksharing construct must be shared in the parallel region to which a corresponding worksharing region binds.
• If a list item that appears in a reduction clause of a worksharing construct or loop construct for which the corresponding region binds to a parallel region is an array section or an array element, all threads that participate in the reduction must specify the same storage location.
• A list item that appears in a reduction clause with the inscan reduction-modifier must appear as a list item in an inclusive or exclusive clause on a scan directive enclosed by the construct.
• A reduction clause without the inscan reduction-modifier may not appear on a construct on which a reduction clause with the inscan reduction-modifier appears.
• A reduction clause with the task reduction-modifier may only appear on a parallel construct, a worksharing construct or a combined or composite construct for which any of the aforementioned constructs is a constituent construct and simd or loop are not constituent constructs.
• A reduction clause with the inscan reduction-modifier may only appear on a worksharing-loop construct, a worksharing-loop SIMD construct, a simd construct, a parallel worksharing-loop construct or a parallel worksharing-loop SIMD construct.
• A list item that appears in a reduction clause of the innermost enclosing worksharing or parallel construct may not be accessed in an explicit task generated by a construct for which an in_reduction clause over the same list item does not appear.
• The task reduction-modifier may not appear in a reduction clause if the nowait clause is specified on the same construct.
C / C++
• If a list item in a reduction clause on a worksharing construct or loop construct for which the corresponding region binds to a parallel region has a reference type then it must bind to the same object for all threads of the team.
• If a list item in a reduction clause on a worksharing construct or loop construct for which the corresponding region binds to a parallel region is an array section or an array element then the base pointer must point to the same variable for all threads of the team.
• A variable of class type (or array thereof) that appears in a reduction clause with the inscan reduction-modifier requires an accessible, unambiguous default constructor for the class type. The number of calls to the default constructor while performing the scan computation is unspecified.
• A variable of class type (or array thereof) that appears in a reduction clause with the inscan reduction-modifier requires an accessible, unambiguous copy assignment operator for the class type. The number of calls to the copy assignment operator while performing the scan computation is unspecified.
C / C++
Cross References
• scan directive, see Section 2.9.6 on page 132.
• List Item Privatization, see Section 2.19.3 on page 279.
• private clause, see Section 2.19.4.3 on page 285.
2.19.5.5 task_reduction Clause
Summary
The task_reduction clause specifies a reduction among tasks.
Syntax
task_reduction(reduction-identifier : list)
Where reduction-identifier is defined in Section 2.19.5.1.
Description
The task_reduction clause is a reduction scoping clause, as described in 2.19.5.2.
For each list item, the number of copies is unspecified. Any copies associated with the reduction are initialized before they are accessed by the tasks participating in the reduction. After the end of the region, the original list item contains the result of the reduction.
Restrictions
The restrictions to the task_reduction clause are as follows:
• All restrictions common to all reduction clauses, which are listed in Section 2.19.5.1 on page 294, apply to this clause.
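A common use of the task_reduction clause is a reduction over an irregular data structure. The following C sketch sums a singly linked list with one explicit task per node. The type struct node and the function list_sum are hypothetical names for this example, which places the clause on a taskgroup construct.

```c
#include <stddef.h>

struct node {
    int value;
    struct node *next;
};

/* Sums the values of a singly linked list, one explicit task per node. */
int list_sum(struct node *head)
{
    int sum = 0;
    #pragma omp parallel shared(sum)
    #pragma omp single
    {
        #pragma omp taskgroup task_reduction(+ : sum)
        {
            for (struct node *p = head; p != NULL; p = p->next) {
                #pragma omp task in_reduction(+ : sum) firstprivate(p)
                sum += p->value;
            }
        }
        /* The taskgroup region has ended, so `sum` holds the result. */
    }
    return sum;
}
```

The taskgroup construct scopes the reduction and each task participates through in_reduction, so the original list item is well defined as soon as the taskgroup region ends.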
2.19.5.6 in_reduction Clause
Summary
The in_reduction clause specifies that a task participates in a reduction.
Syntax
in_reduction(reduction-identifier : list)
where reduction-identifier is defined in Section 2.19.5.1 on page 294.
Description
The in_reduction clause is a reduction participating clause, as described in Section 2.19.5.3 on page 300. For a given list item, the in_reduction clause defines a task to be a participant in a task reduction that is defined by an enclosing region for a matching list item that appears in a task_reduction clause or a reduction clause with the task modifier, where either:
1. The matching list item has the same storage location as the list item in the in_reduction clause; or
2. A private copy, derived from the matching list item, that is used to perform the task reduction has the same storage location as the list item in the in_reduction clause.
For the task construct, the generated task becomes the participating task. For each list item, a private copy may be created as if the private clause had been used.
For the target construct, the target task becomes the participating task. For each list item, a private copy will be created in the data environment of the target task as if the private clause had been used, and this private copy will be implicitly mapped into the device data environment of the target device.
At the end of the task region, if a private copy was created its value is combined with a copy created by a reduction scoping clause or with the original list item.
Restrictions
The restrictions to the in_reduction clause are as follows:
• All restrictions common to all reduction clauses, which are listed in Section 2.19.5.1 on page 294, apply to this clause.
• A list item that appears in a task_reduction clause or a reduction clause with the task modifier that is specified on a construct that corresponds to a region in which the region of the participating task is closely nested must match each list item. The construct that corresponds to the innermost enclosing region that meets this condition must specify the same reduction-identifier for the matching list item as the in_reduction clause.
2.19.5.7 declare reduction Directive
Summary
The following section describes the directive for declaring user-defined reductions. The declare reduction directive declares a reduction-identifier that can be used in a reduction clause. The declare reduction directive is a declarative directive.
Syntax
C
#pragma omp declare reduction(reduction-identifier : typename-list : combiner) [initializer-clause] new-line
where:
• reduction-identifier is either a base language identifier or one of the following operators: +, -, *, &, |, ^, && and ||
• typename-list is a list of type names
• combiner is an expression
• initializer-clause is initializer(initializer-expr) where initializer-expr is omp_priv = initializer or function-name(argument-list)
C C++
#pragma omp declare reduction(reduction-identifier : typename-list : combiner) [initializer-clause] new-line
where:
• reduction-identifier is either an id-expression or one of the following operators: +, -, *, &, |, ^, && or ||
• typename-list is a list of type names
• combiner is an expression
• initializer-clause is initializer(initializer-expr) where initializer-expr is omp_priv initializer or function-name(argument-list)
C++ Fortran
!$omp declare reduction(reduction-identifier : type-list : combiner) [initializer-clause]
where:
• reduction-identifier is either a base language identifier, or a user-defined operator, or one of the following operators: +, -, *, .and., .or., .eqv., .neqv., or one of the following intrinsic procedure names: max, min, iand, ior, ieor.
• type-list is a list of type specifiers that must not be CLASS(*) and abstract type
• combiner is either an assignment statement or a subroutine name followed by an argument list
• initializer-clause is initializer(initializer-expr), where initializer-expr is omp_priv = expression or subroutine-name(argument-list)
Fortran
Description
Custom reductions can be defined using the declare reduction directive; the reduction-identifier and the type identify the declare reduction directive. The reduction-identifier can later be used in a reduction clause that uses variables of the type or types specified in the declare reduction directive. If the directive applies to several types then it is considered as if there were multiple declare reduction directives, one for each type.
Fortran
If a type with deferred or assumed length type parameter is specified in a declare reduction directive, the reduction-identifier of that directive can be used in a reduction clause with any variable of the same type and the same kind parameter, regardless of the length type parameters with which the variable is declared.
Fortran
The visibility and accessibility of this declaration are the same as those of a variable declared at the same point in the program. The enclosing context of the combiner and of the initializer-expr is that of the declare reduction directive. The combiner and the initializer-expr must be correct in the base language as if they were the body of a function defined at the same point in the program.
Fortran
If the reduction-identifier is the same as the name of a user-defined operator or an extended operator, or the same as a generic name that is one of the allowed intrinsic procedures, and if the operator or procedure name appears in an accessibility statement in the same module, the accessibility of the corresponding declare reduction directive is determined by the accessibility attribute of the statement.
If the reduction-identifier is the same as a generic name that is one of the allowed intrinsic procedures and is accessible, and if it has the same name as a derived type in the same module, the accessibility of the corresponding declare reduction directive is determined by the accessibility of the generic name according to the base language.
Fortran C++
The declare reduction directive can also appear at points in the program at which a static data member could be declared. In this case, the visibility and accessibility of the declaration are the same as those of a static data member declared at the same point in the program.
C++
The combiner specifies how partial results can be combined into a single value. The combiner can use the special variable identifiers omp_in and omp_out that are of the type of the variables that this reduction-identifier reduces. Each of them will denote one of the values to be combined before executing the combiner. The special omp_out identifier refers to the storage that holds the resulting combined value after executing the combiner.
The number of times that the combiner is executed, and the order of these executions, for any reduction clause is unspecified.
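A user-defined reduction over a structure type can be sketched in C as follows. The type vec2, the helper vec2_plus, the reduction-identifier vec2_add and the function vec2_sum are all hypothetical names for this example. The combiner refers only to omp_in and omp_out, and the initializer-clause gives each private copy the value (0, 0).

```c
typedef struct {
    double x;
    double y;
} vec2;

/* Component-wise addition; used by the combiner and by the loop body. */
vec2 vec2_plus(vec2 a, vec2 b)
{
    vec2 r = { a.x + b.x, a.y + b.y };
    return r;
}

/* Combiner: fold omp_in into omp_out. Initializer: start at (0, 0). */
#pragma omp declare reduction(vec2_add : vec2 : \
        omp_out = vec2_plus(omp_out, omp_in)) \
    initializer(omp_priv = { 0.0, 0.0 })

/* Component-wise sum of pts[0..n-1]. */
vec2 vec2_sum(const vec2 *pts, int n)
{
    vec2 acc = { 0.0, 0.0 };
    #pragma omp parallel for reduction(vec2_add : acc)
    for (int i = 0; i < n; i++)
        acc = vec2_plus(acc, pts[i]);
    return acc;
}
```

The directive appears before the reduction clause that uses vec2_add, as required for reduction-identifiers declared with declare reduction.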
Fortran
If the combiner is a subroutine name with an argument list, the combiner is evaluated by calling the subroutine with the specified argument list.
If the combiner is an assignment statement, the combiner is evaluated by executing the assignment statement.
Fortran
As the initializer-expr value of a user-defined reduction is not known a priori, the initializer-clause can be used to specify one. Then the contents of the initializer-clause will be used as the initializer for private copies of reduction list items where the omp_priv identifier will refer to the storage to be initialized. The special identifier omp_orig can also appear in the initializer-clause and it will refer to the storage of the original variable to be reduced.
The number of times that the initializer-expr is evaluated, and the order of these evaluations, is unspecified.
C / C++
If the initializer-expr is a function name with an argument list, the initializer-expr is evaluated by calling the function with the specified argument list. Otherwise, the initializer-expr specifies how omp_priv is declared and initialized.
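The function form of the initializer-expr can be sketched in C with a reduction that finds the smallest value together with its position. The type minloc, the functions minloc_init, minloc_combine and find_min, and the reduction-identifier argmin are hypothetical names for this example. The initializer function receives the address of omp_priv.

```c
#include <float.h>

typedef struct {
    double value;
    int index;      /* -1 means "no element seen yet" */
} minloc;

/* Initializer function: receives the address of omp_priv. */
void minloc_init(minloc *m)
{
    m->value = DBL_MAX;
    m->index = -1;
}

/* Keeps the smaller value in *out. Ties go to the lower index, so the
   result does not depend on the order of combination. */
void minloc_combine(minloc *out, const minloc *in)
{
    if (in->index < 0)
        return;                     /* nothing to merge */
    if (out->index < 0 || in->value < out->value ||
        (in->value == out->value && in->index < out->index))
        *out = *in;
}

#pragma omp declare reduction(argmin : minloc : \
        minloc_combine(&omp_out, &omp_in)) \
    initializer(minloc_init(&omp_priv))

/* Smallest element of a[0..n-1] and its index (index -1 if n <= 0). */
minloc find_min(const double *a, int n)
{
    minloc best;
    minloc_init(&best);
    #pragma omp parallel for reduction(argmin : best)
    for (int i = 0; i < n; i++) {
        minloc candidate = { a[i], i };
        minloc_combine(&best, &candidate);
    }
    return best;
}
```

The tie-breaking rule is a design choice of this sketch: because the number and order of combiner executions are unspecified, a combiner that preferred whichever operand arrived first could return different indices from run to run.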
C / C++ C
If no initializer-clause is specified, the private variables will be initialized following the rules for initialization of objects with static storage duration.
C C++
If no initializer-expr is specified, the private variables will be initialized following the rules for default-initialization.
C++ Fortran
If the initializer-expr is a subroutine name with an argument list, the initializer-expr is evaluated by calling the subroutine with the specified argument list.
If the initializer-expr is an assignment statement, the initializer-expr is evaluated by executing the assignment statement.
If no initializer-clause is specified, the private variables will be initialized as follows:
• For complex, real, or integer types, the value 0 will be used.
• For logical types, the value .false. will be used.
• For derived types for which default initialization is specified, default initialization will be used.
• Otherwise, not specifying an initializer-clause results in unspecified behavior.
Fortran
C / C++
If reduction-identifier is used in a target region then a declare target construct must be specified for any function that can be accessed through the combiner and initializer-expr.
C / C++ Fortran
If reduction-identifier is used in a target region then a declare target construct must be specified for any function or subroutine that can be accessed through the combiner and initializer-expr.
Fortran
Restrictions
• The only variables allowed in the combiner are omp_in and omp_out.
• The only variables allowed in the initializer-clause are omp_priv and omp_orig.
• If the variable omp_orig is modified in the initializer-clause, the behavior is unspecified.
• If execution of the combiner or the initializer-expr results in the execution of an OpenMP construct or an OpenMP API call, then the behavior is unspecified.
• A reduction-identifier may not be re-declared in the current scope for the same type or for a type that is compatible according to the base language rules.
• At most one initializer-clause can be specified.
• The typename-list must not declare new types.
C / C++
• A type name in a declare reduction directive cannot be a function type, an array type, a reference type, or a type qualified with const, volatile or restrict.
C / C++ C
• If the initializer-expr is a function name with an argument list, then one of the arguments must be the address of omp_priv.
C C++
• If the initializer-expr is a function name with an argument list, then one of the arguments must be omp_priv or the address of omp_priv.
C++
28 2.19.6
29 30 31
• •
•
• • •
• •
•
Fortran
• If the initializer-expr is a subroutine name with an argument list, then one of the arguments must be omp_priv.
• If the declare reduction directive appears in the specification part of a module and the corresponding reduction clause does not appear in the same module, the reduction-identifier must be the same as the name of a user-defined operator, one of the allowed operators that is extended or a generic name that is the same as the name of one of the allowed intrinsic procedures.
• If the declare reduction directive appears in the specification of a module, if the corresponding reduction clause does not appear in the same module, and if the reduction-identifier is the same as the name of a user-defined operator or an extended operator, or the same as a generic name that is the same as one of the allowed intrinsic procedures then the interface for that operator or the generic name must be defined in the specification of the same module, or must be accessible by use association.
• Any subroutine or function used in the initializer clause or combiner expression must be an intrinsic function, or must have an accessible interface.
• Any user-defined operator, defined assignment or extended operator used in the initializer clause or combiner expression must have an accessible interface.
• If any subroutine, function, user-defined operator, defined assignment or extended operator is used in the initializer clause or combiner expression, it must be accessible to the subprogram in which the corresponding reduction clause is specified.
• If the length type parameter is specified for a type, it must be a constant, a colon or an *.
• If a type with deferred or assumed length parameter is specified in a declare reduction directive, no other declare reduction directive with the same type, the same kind parameters and the same reduction-identifier is allowed in the same scope.
• Any subroutine used in the initializer clause or combiner expression must not have any alternate returns appear in the argument list.
Fortran
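The following non-normative C sketch illustrates a declare reduction directive that respects the restrictions above. The type range_t, the helpers range_init and range_merge, the reduction-identifier rmerge and the function find_range are hypothetical names chosen for this illustration; they are not defined by this specification. The combiner refers only to omp_out and omp_in, and the initializer-expr is a function call that receives the address of omp_priv, as required for C.

```c
#include <float.h>

/* Hypothetical type for this sketch: running minimum and maximum. */
typedef struct {
    double lo;
    double hi;
} range_t;

/* Initializer helper. In C, an initializer-expr that is a function call
   must take the address of omp_priv as one of its arguments. */
static void range_init(range_t *r)
{
    r->lo = DBL_MAX;
    r->hi = -DBL_MAX;
}

/* Combiner helper: folds the partial result *in into *out. */
static void range_merge(range_t *out, const range_t *in)
{
    if (in->lo < out->lo) out->lo = in->lo;
    if (in->hi > out->hi) out->hi = in->hi;
}

/* The combiner names only omp_out and omp_in; the initializer-clause
   names only omp_priv. Neither executes an OpenMP construct or API call. */
#pragma omp declare reduction(rmerge : range_t : range_merge(&omp_out, &omp_in)) \
    initializer(range_init(&omp_priv))

/* Returns the smallest and largest value in data[0..n-1]. */
range_t find_range(const double *data, int n)
{
    range_t r;
    range_init(&r);  /* start the original list item at the identity value */

    #pragma omp parallel for reduction(rmerge : r)
    for (int i = 0; i < n; ++i) {
        if (data[i] < r.lo) r.lo = data[i];
        if (data[i] > r.hi) r.hi = data[i];
    }
    return r;
}
```

Because the original list item starts at the identity value, the result is the same whether the loop runs with one thread or many.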
Cross References
• Properties Common To All Reduction Clauses, see Section 2.19.5.1 on page 294.
2.19.6 Data Copying Clauses
This section describes the copyin clause (allowed on the parallel construct and combined parallel worksharing constructs) and the copyprivate clause (allowed on the single construct).
These clauses support the copying of data values from private or threadprivate variables on one implicit task or thread to the corresponding variables on other implicit tasks or threads in the team.
The clauses accept a comma-separated list of list items (see Section 2.1 on page 38). All list items appearing in a clause must be visible, according to the scoping rules of the base language. Clauses may be repeated as needed, but a list item that specifies a given variable may not appear in more than one clause on the same directive.
Fortran
An associate name preserves the association with the selector established at the ASSOCIATE statement. A list item that appears in a data copying clause may be a selector of an ASSOCIATE construct. If the construct association is established prior to a parallel region, the association between the associate name and the original list item will be retained in the region.
Fortran
2.19.6.1 copyin Clause
Summary
The copyin clause provides a mechanism to copy the value of a threadprivate variable of the master thread to the threadprivate variable of each other member of the team that is executing the parallel region.
Syntax
The syntax of the copyin clause is as follows:
copyin(list)
Description
C / C++
The copy is done after the team is formed and prior to the start of execution of the associated structured block. For variables of non-array type, the copy occurs by copy assignment. For an array of elements of non-array type, each element is copied as if by assignment from an element of the array of the master thread to the corresponding element of the array of the other thread.
C / C++
C++
For class types, the copy assignment operator is invoked. The order in which copy assignment operators for different variables of class type are called is unspecified.
C++
Fortran
The copy is done, as if by assignment, after the team is formed and prior to the start of execution of the associated structured block.
On entry to any parallel region, each thread’s copy of a variable that is affected by a copyin clause for the parallel region will acquire the type parameters, allocation, association, and definition status of the copy of the master thread, according to the following rules:
• If the original list item has the POINTER attribute, each copy receives the same association status as that of the copy of the master thread as if by pointer assignment.
• If the original list item does not have the POINTER attribute, each copy becomes defined with the value of the copy of the master thread as if by intrinsic assignment unless the list item has a type bound procedure as a defined assignment. If the original list item that does not have the POINTER attribute has the allocation status of unallocated, each copy will have the same status.
• If the original list item is unallocated or unassociated, the copy of the other thread inherits the declared type parameters and the default type parameter values from the original list item.
Fortran
Restrictions
The restrictions to the copyin clause are as follows:
C / C++
• A list item that appears in a copyin clause must be threadprivate.
• A variable of class type (or array thereof) that appears in a copyin clause requires an accessible, unambiguous copy assignment operator for the class type.
C / C++
Fortran
• A list item that appears in a copyin clause must be threadprivate. Named variables that appear in a threadprivate common block may be specified: it is not necessary to specify the whole common block.
• A common block name that appears in a copyin clause must be declared to be a common block in the same scoping unit in which the copyin clause appears.
• If the list item is a polymorphic variable with the ALLOCATABLE attribute, the behavior is unspecified.
Fortran
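As a non-normative illustration of the copyin clause in C, the sketch below sets a threadprivate variable on the master thread and then checks that every thread of the team observes the same value inside the parallel region. The variable seed and the function all_threads_see_master_seed are hypothetical names used only for this example.

```c
#include <stdbool.h>

/* File-scope variable; each thread has its own copy. */
static int seed = 0;
#pragma omp threadprivate(seed)

/* Returns true if every thread in the team saw the master thread's value. */
bool all_threads_see_master_seed(int value)
{
    int mismatches = 0;

    seed = value;  /* assigns only the copy of the encountering (master) thread */

    /* copyin copies the master thread's seed into the copy of every other
       thread after the team is formed and before the block starts. */
    #pragma omp parallel copyin(seed) reduction(+ : mismatches)
    {
        if (seed != value)
            mismatches += 1;
    }
    return mismatches == 0;
}
```

Without the copyin clause, the copies of seed that belong to the other threads would keep their previous values (initially 0), so the check could fail for any nonzero argument.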
Cross References
• parallel construct, see Section 2.6 on page 74.
• threadprivate directive, see Section 2.19.2 on page 274.
2.19.6.2 copyprivate Clause
Summary
The copyprivate clause provides a mechanism to use a private variable to broadcast a value from the data environment of one implicit task to the data environments of the other implicit tasks that belong to the parallel region.
To avoid data races, concurrent reads or updates of the list item must be synchronized with the update of the list item that occurs as a result of the copyprivate clause.
Syntax
The syntax of the copyprivate clause is as follows:
copyprivate(list)
Description
The effect of the copyprivate clause on the specified list items occurs after the execution of the structured block associated with the single construct (see Section 2.8.2 on page 89), and before any of the threads in the team have left the barrier at the end of the construct.
C / C++
In all other implicit tasks that belong to the parallel region, each specified list item becomes defined with the value of the corresponding list item in the implicit task associated with the thread that executed the structured block. For variables of non-array type, the definition occurs by copy assignment. For an array of elements of non-array type, each element is copied by copy assignment from an element of the array in the data environment of the implicit task that is associated with the thread that executed the structured block to the corresponding element of the array in the data environment of the other implicit tasks.
C / C++
C++
For class types, a copy assignment operator is invoked. The order in which copy assignment operators for different variables of class type are called is unspecified.
C++
Fortran
If a list item does not have the POINTER attribute, then in all other implicit tasks that belong to the parallel region, the list item becomes defined as if by intrinsic assignment with the value of the corresponding list item in the implicit task that is associated with the thread that executed the structured block. If the list item has a type bound procedure as a defined assignment, the assignment is performed by the defined assignment.
If the list item has the POINTER attribute, then, in all other implicit tasks that belong to the parallel region, the list item receives, as if by pointer assignment, the same association status of the corresponding list item in the implicit task that is associated with the thread that executed the structured block.
The order in which any final subroutines for different variables of a finalizable type are called is unspecified.
Fortran
Note – The copyprivate clause is an alternative to using a shared variable for the value when providing such a shared variable would be difficult (for example, in a recursion requiring a different variable at each level).
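The broadcast described above can be sketched in C as follows. This is a non-normative illustration; the function broadcast_with_copyprivate and the variables token and missed are hypothetical names. One thread assigns a private variable inside a single construct, and the copyprivate clause copies that value to the private variable of every other implicit task before any thread leaves the barrier at the end of the construct.

```c
/* Returns the number of implicit tasks whose private copy did NOT
   receive the broadcast value (0 is the expected result). */
int broadcast_with_copyprivate(int value)
{
    int missed = 0;

    #pragma omp parallel reduction(+ : missed)
    {
        int token = value - 1;  /* private: declared inside the region */

        /* Exactly one thread executes the block. Afterwards, its token
           is copied into the token of all other implicit tasks. */
        #pragma omp single copyprivate(token)
        {
            token = value;
        }

        if (token != value)
            missed += 1;
    }
    return missed;
}
```

The variable token is private in the enclosing context because it is declared inside the parallel region, which satisfies the first restriction listed for this clause.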
Restrictions
The restrictions to the copyprivate clause are as follows:
• All list items that appear in the copyprivate clause must be either threadprivate or private in the enclosing context.
• A list item that appears in a copyprivate clause may not appear in a private or firstprivate clause on the single construct.
C++
• A variable of class type (or array thereof) that appears in a copyprivate clause requires an accessible unambiguous copy assignment operator for the class type.
C++
Fortran
• A common block that appears in a copyprivate clause must be threadprivate.
• Pointers with the INTENT(IN) attribute may not appear in the copyprivate clause.
• The list item with the ALLOCATABLE attribute must have the allocation status of allocated when the intrinsic assignment is performed.
• If the list item is a polymorphic variable with the ALLOCATABLE attribute, the behavior is unspecified.
Fortran
Cross References
• parallel construct, see Section 2.6 on page 74.
• threadprivate directive, see Section 2.19.2 on page 274.
• private clause, see Section 2.19.4.3 on page 285.
2.19.7 Data-Mapping Attribute Rules, Clauses, and Directives
This section describes how the data-mapping and data-sharing attributes of any variable referenced in a target region are determined. When specified, explicit data-sharing attributes, map or is_device_ptr clauses on target directives determine these attributes. Otherwise, the first matching rule from the following implicit data-mapping rules applies for variables referenced in a target construct that are not declared in the construct and do not appear in data-sharing attribute, map or is_device_ptr clauses.
• If a variable appears in a to or link clause on a declare target directive then it is treated as if it had appeared in a map clause with a map-type of tofrom.
• If a list item appears in a reduction, lastprivate or linear clause on a combined target construct then it is treated as if it also appears in a map clause with a map-type of tofrom.
• If a list item appears in an in_reduction clause on a target construct then it is treated as if it also appears in a map clause with a map-type of tofrom and a map-type-modifier of always.
• If a defaultmap clause is present for the category of the variable and specifies an implicit behavior other than default, the data-mapping attribute is determined by that clause.
C++
• If the target construct is within a class non-static member function, and a variable is an accessible data member of the object for which the non-static member function is invoked, the variable is treated as if the this[:1] expression had appeared in a map clause with a map-type of tofrom. Additionally, if the variable is of a type pointer or reference to pointer, it is also treated as if it has appeared in a map clause as a zero-length array section.
• If the this keyword is referenced inside a target construct within a class non-static member function, it is treated as if the this[:1] expression had appeared in a map clause with a map-type of tofrom.
C++
C / C++
• A variable that is of type pointer is treated as if it is the base pointer of a zero-length array section that appeared as a list item in a map clause.
C / C++
C++
• A variable that is of type reference to pointer is treated as if it had appeared in a map clause as a zero-length array section.
C++
• If a variable is not a scalar then it is treated as if it had appeared in a map clause with a map-type of tofrom.
Fortran
• If a scalar variable has the TARGET, ALLOCATABLE or POINTER attribute then it is treated as if it has appeared in a map clause with a map-type of tofrom.
Fortran
• If none of the above rules applies then a scalar variable is not mapped, but instead has an implicit data-sharing attribute of firstprivate (see Section 2.19.1.1 on page 270).
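Two of the implicit rules above can be seen in the following non-normative C sketch, in which the target construct has no map or data-sharing attribute clauses. The names SCALED_LEN and sum_scaled_indices are hypothetical. The array a is not a scalar, so it is treated as if it appeared in a map clause with a map-type of tofrom; the scalar scale matches none of the earlier rules, so it is firstprivate.

```c
enum { SCALED_LEN = 8 };

/* Fills a local array with scale * i inside a target region and
   returns the sum of its elements as seen by the host afterwards. */
int sum_scaled_indices(int scale)
{
    int a[SCALED_LEN] = {0};  /* non-scalar: implicitly map(tofrom: a) */
    int sum = 0;

    /* No clauses: scale is implicitly firstprivate, a is implicitly
       mapped tofrom, so the device writes are copied back on exit. */
    #pragma omp target
    for (int i = 0; i < SCALED_LEN; ++i)
        a[i] = scale * i;

    for (int i = 0; i < SCALED_LEN; ++i)
        sum += a[i];
    return sum;
}
```

The sketch only reads scale inside the region. A write to it there would modify the private copy and would not be visible to the host after the region.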
2.19.7.1 map Clause
Summary
The map clause specifies how an original list item is mapped from the current task’s data environment to a corresponding list item in the device data environment of the device identified by the construct.
Syntax
The syntax of the map clause is as follows:
map([[map-type-modifier[,] [map-type-modifier[,] ...]] map-type : ] locator-list)
where map-type is one of the following:
to
from
tofrom
alloc
release
delete
and map-type-modifier is one of the following:
always
close
mapper(mapper-identifier)
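As a non-normative illustration of the syntax, the C sketch below uses two map clauses with explicit map-types and array sections as the locator-list. The function name saxpy_on_device is hypothetical. Because x and y are pointers, the array sections x[0:n] and y[0:n] supply the extent of the storage to map.

```c
/* Computes y[i] = alpha * x[i] + y[i] for i in [0, n) in a target region. */
void saxpy_on_device(int n, float alpha, const float *x, float *y)
{
    /* to:     x is only read in the region, so it is copied to the
               device data environment on entry and not copied back.
       tofrom: y is read and written, so it is copied on entry and
               copied back on exit. */
    #pragma omp target map(to : x[0:n]) map(tofrom : y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = alpha * x[i] + y[i];
}
```

A call such as saxpy_on_device(4, 2.0f, x, y) with x = {1, 2, 3, 4} and y = {10, 20, 30, 40} leaves y holding {12, 24, 36, 48} on the host.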
Description
The list items that appear in a map clause may include array sections and structure elements. The map-type and map-type-modifier specify the effect of the map clause, as described below.
For a given construct, the effect of a map clause with the to, from, or tofrom map-type is ordered before the effect of a map clause with the alloc, release, or delete map-type. If a mapper is specified for the type being mapped, or explicitly specified with the mapper map-type-modifier, then the effective map-type of a list item will be determined according to the rules of map-type decay.
If a mapper is specified for the type being mapped, or explicitly specified with the mapper map-type-modifier, then all map clauses that appear on the declare mapper directive are treated as though they appeared on the construct with the map clause. Array sections of a mapper type are mapped as normal, then each element in the array section is mapped according to the rules of the mapper.
C / C++
If a list item in a map clause is a variable of structure type then it is treated as if each structure element contained in the variable is a list item in the clause.
C / C++
Fortran
If a list item in a map clause is a derived type variable then it is treated as if each component is a list item in the clause.
Each pointer component that is a list item that results from a mapped derived type variable is treated as if its association status is undefined, unless the pointer component appears as another list item or as the base pointer of another list item in a map clause on the same construct.
Fortran
If a list item in a map clause is a structure element then all other structure elements of the containing structure variable form a structure sibling list. The map clause and the structure sibling list are associated with the same construct. If a corresponding list item of the structure sibling list item is present in the device data environment when the construct is encountered then:
• If the structure sibling list item does not appear in a map clause on the construct then:
– If the construct is a target, target data, or target enter data construct then the structure sibling list item is treated as if it is a list item in a map clause on the construct with a map-type of alloc.
– If the construct is a target exit data construct, then the structure sibling list item is treated as if it is a list item in a map clause on the construct with a map-type of release.
Fortran
– If the structure sibling list item is a pointer then it is treated as if its association status is undefined, unless it appears as the base pointer of another list item in a map clause on the same construct.
Fortran
• If the map clause in which the structure element appears as a list item has a map-type of delete and the structure sibling list item does not appear as a list item in a map clause on the construct with a map-type of delete then the structure sibling list item is treated as if it is a list item in a map clause on the construct with a map-type of delete.
If item1 is a list item in a map clause, and item2 is another list item in a map clause on the same construct that has a base pointer that is, or is part of, item1, then:
• If the map clause(s) appear on a target, target data, or target enter data construct, then on entry to the corresponding region the effect of the map clause on item1 is ordered to occur before the effect of the map clause on item2.
• If the map clause(s) appear on a target, target data, or target exit data construct then on exit from the corresponding region the effect of the map clause on item2 is ordered to occur before the effect of the map clause on item1.
Fortran
If a list item in a map clause is an associated pointer and the pointer is not the base pointer of another list item in a map clause on the same construct, then it is treated as if its pointer target is implicitly mapped in the same clause. For the purposes of the map clause, the mapped pointer target is treated as if its base pointer is the associated pointer.
Fortran
If a list item in a map clause has a base pointer, and a pointer variable is present in the device data environment that corresponds to the base pointer when the effect of the map clause occurs, then if the corresponding pointer or the corresponding list item is created in the device data environment on entry to the construct, then:
C / C++
1. The corresponding pointer variable is assigned an address such that the corresponding list item can be accessed through the pointer in a target region.
C / C++
Fortran
1. The corresponding pointer variable is associated with a pointer target that has the same rank and bounds as the pointer target of the original pointer, such that the corresponding list item can be accessed through the pointer in a target region.
Fortran
2. The corresponding pointer variable becomes an attached pointer for the corresponding list item.
3. If the original base pointer and the corresponding attached pointer share storage, then the original list item and the corresponding list item must share storage.
C++
If a lambda is mapped explicitly or implicitly, variables that are captured by the lambda behave as follows:
• the variables that are of pointer type are treated as if they had appeared in a map clause as zero-length array sections; and
• the variables that are of reference type are treated as if they had appeared in a map clause.
If a member variable is captured by a lambda in class scope, and the lambda is later mapped explicitly or implicitly with its full static type, the this pointer is treated as if it had appeared on a map clause.
C++
The original and corresponding list items may share storage such that writes to either item by one task followed by a read or write of the other item by another task without intervening synchronization can result in data races.
If the map clause appears on a target, target data, or target enter data construct then on entry to the region the following sequence of steps occurs as if performed as a single atomic operation:
1. If a corresponding list item of the original list item is not present in the device data environment, then:
a) A new list item with language-specific attributes is derived from the original list item and created in the device data environment;
b) The new list item becomes the corresponding list item of the original list item in the device data environment;
c) The corresponding list item has a reference count that is initialized to zero; and
d) The value of the corresponding list item is undefined;
2. If the corresponding list item’s reference count was not already incremented because of the effect of a map clause on the construct then:
a) The corresponding list item’s reference count is incremented by one;
3. If the corresponding list item’s reference count is one or the always map-type-modifier is present, and if the map-type is to or tofrom, then:
C / C++
a) For each part of the list item that is an attached pointer, that part of the corresponding list item will have the value that it had immediately prior to the effect of the map clause; and
C / C++
Fortran
a) For each part of the list item that is an attached pointer, that part of the corresponding list item, if associated, will be associated with the same pointer target that it was associated with immediately prior to the effect of the map clause.
Fortran
b) For each part of the list item that is not an attached pointer, the value of that part of the original list item is assigned to that part of the corresponding list item.
Note – If the effect of the map clauses on a construct would assign the value of an original list item to a corresponding list item more than once, then an implementation is allowed to ignore additional assignments of the same value to the corresponding list item.
In all cases on entry to the region, concurrent reads or updates of any part of the corresponding list item must be synchronized with any update of the corresponding list item that occurs as a result of the map clause to avoid data races.
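The entry sequence above can be traced in the following non-normative C sketch; the names REFRESH_LEN and sum_after_host_update are hypothetical. The array is mapped by an enclosing target data construct, updated on the host, and then mapped again on a nested target construct. On the nested construct the corresponding list item is already present and its reference count becomes two, so a plain to map-type would not assign the host values again; the always map-type-modifier requests the assignment regardless of the reference count.

```c
enum { REFRESH_LEN = 4 };

/* Returns the sum of the array as seen inside the nested target region. */
int sum_after_host_update(void)
{
    int a[REFRESH_LEN] = {1, 1, 1, 1};
    int sum = 0;

    /* Entry: a is not present, so a corresponding list item is created,
       its reference count becomes 1, and the host values are assigned. */
    #pragma omp target data map(to : a[0:REFRESH_LEN])
    {
        for (int i = 0; i < REFRESH_LEN; ++i)
            a[i] = 10;  /* host-side update inside the data region */

        /* Entry: a is present and its count goes from 1 to 2. The count
           is not one, but always still triggers the to assignment. */
        #pragma omp target map(always, to : a[0:REFRESH_LEN]) map(tofrom : sum)
        for (int i = 0; i < REFRESH_LEN; ++i)
            sum += a[i];
    }
    return sum;
}
```

With the always modifier the function returns 40 whether or not the original and corresponding list items share storage. Without it, an implementation with separate device memory would be expected to see the stale values.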
If the map clause appears on a target, target data, or target exit data construct and a corresponding list item of the original list item is not present in the device data environment on exit from the region then the list item is ignored. Alternatively, if the map clause appears on a target, target data, or target exit data construct and a corresponding list item of the original list item is present in the device data environment on exit from the region, then the following sequence of steps occurs as if performed as a single atomic operation:
1. If the map-type is not delete and the corresponding list item’s reference count is finite and was not already decremented because of the effect of a map clause on the construct then:
a) The corresponding list item’s reference count is decremented by one;
2. If the map-type is delete and the corresponding list item’s reference count is finite then:
a) The corresponding list item’s reference count is set to zero;
3. If the map-type is from or tofrom and if the corresponding list item’s reference count is zero or the always map-type-modifier is present then:
C / C++
a) For each part of the list item that is an attached pointer, that part of the original list item will have the value that it had immediately prior to the effect of the map clause;
C / C++
Fortran
a) For each part of the list item that is an attached pointer, that part of the corresponding list item, if associated, will be associated with the same pointer target with which it was associated immediately prior to the effect of the map clause; and
Fortran
b) For each part of the list item that is not an attached pointer, the value of that part of the corresponding list item is assigned to that part of the original list item; and
4. If the corresponding list item’s reference count is zero then the corresponding list item is removed from the device data environment.
Note – If the effect of the map clauses on a construct would assign the value of a corresponding list item to an original list item more than once, then an implementation is allowed to ignore additional assignments of the same value to the original list item.
In all cases on exit from the region, concurrent reads or updates of any part of the original list item must be synchronized with any update of the original list item that occurs as a result of the map clause to avoid data races.
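The exit sequence can be followed in this non-normative C sketch, which pairs target enter data and target exit data constructs around a target region. The names DOUBLED_LEN and sum_doubled are hypothetical. The comments track the reference count of the corresponding list item under the rules above.

```c
enum { DOUBLED_LEN = 4 };

/* Doubles a local array in a target region and returns the host-side sum
   after the data has been copied back and unmapped. */
int sum_doubled(void)
{
    int a[DOUBLED_LEN] = {1, 2, 3, 4};
    int sum = 0;

    /* Entry: the corresponding list item is created, its reference
       count goes from 0 to 1, and the host values are assigned. */
    #pragma omp target enter data map(to : a[0:DOUBLED_LEN])

    /* Entry: count 1 -> 2, so no assignment to the device copy.
       Exit:  count 2 -> 1, not zero, so tofrom does not copy back. */
    #pragma omp target map(tofrom : a[0:DOUBLED_LEN])
    for (int i = 0; i < DOUBLED_LEN; ++i)
        a[i] *= 2;

    /* Exit: count 1 -> 0, so the device values are assigned to the
       original list item and the corresponding list item is removed. */
    #pragma omp target exit data map(from : a[0:DOUBLED_LEN])

    for (int i = 0; i < DOUBLED_LEN; ++i)
        sum += a[i];
    return sum;
}
```

The host is only guaranteed to observe the doubled values after the target exit data construct, which is why the sum is computed last.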
If a single contiguous part of the original storage of a list item with an implicit data-mapping attribute has corresponding storage in the device data environment prior to a task encountering the construct that is associated with the map clause, only that part of the original storage will have corresponding storage in the device data environment as a result of the map clause.
If a list item with an implicit data-mapping attribute does not have any corresponding storage in the device data environment prior to a task encountering the construct associated with the map clause, and one or more contiguous parts of the original storage are either list items or base pointers to list items that are explicitly mapped on the construct, only those parts of the original storage will have corresponding storage in the device data environment as a result of the map clauses on the construct.
C / C++
If a new list item is created then a new list item of the same type, with automatic storage duration, is allocated for the construct. The size and alignment of the new list item are determined by the static type of the variable. This allocation occurs if the region references the list item in any statement. Initialization and assignment of the new list item are through bitwise copy.
C / C++
Fortran
If a new list item is created then a new list item of the same type, type parameter, and rank is allocated. The new list item inherits all default values for the type parameters from the original list item. The value of the new list item becomes that of the original list item in the map initialization and assignment.
If the allocation status of the original list item with the ALLOCATABLE attribute is changed in the host device data environment and the corresponding list item is already present in the device data environment, the allocation status of the corresponding list item is unspecified until a mapping operation is performed with a map clause on entry to a target, target data, or target enter data region.
Fortran
The map-type determines how the new list item is initialized.
If a map-type is not specified, the map-type defaults to tofrom.
The close map-type-modifier is a hint to the runtime to allocate memory close to the target device.
Execution Model Events
The target-map event occurs when a thread maps data to or from a target device.
The target-data-op event occurs when a thread initiates a data operation on a target device.
Tool Callbacks
A thread dispatches a registered ompt_callback_target_map callback for each occurrence of a target-map event in that thread. The callback occurs in the context of the target task and has type signature ompt_callback_target_map_t.
A thread dispatches a registered ompt_callback_target_data_op callback for each occurrence of a target-data-op event in that thread. The callback occurs in the context of the target task and has type signature ompt_callback_target_data_op_t.
Restrictions
The restrictions to the map clause are as follows:
• A list item cannot appear in both a map clause and a data-sharing attribute clause on the same construct unless the construct is a combined construct.
• Each of the map-type-modifier modifiers can appear at most once on the map clause.
C / C++
• List items of the map clauses on the same construct must not share original storage unless they are the same lvalue expression or array section.
C / C++
• If a list item is an array section, it must specify contiguous storage.
• If multiple list items are explicitly mapped on the same construct and have the same containing array or have base pointers that share original storage, and if any of the list items do not have corresponding list items that are present in the device data environment prior to a task encountering the construct, then the list items must refer to the same array elements of either the containing array or the implicit array of the base pointers.
• If any part of the original storage of a list item with an explicit data-mapping attribute has corresponding storage in the device data environment prior to a task encountering the construct associated with the map clause, all of the original storage must have corresponding storage in the device data environment prior to the task encountering the construct.
• If a list item is an element of a structure, and a different element of the structure has a corresponding list item in the device data environment prior to a task encountering the construct associated with the map clause, then the list item must also have a corresponding list item in the device data environment prior to the task encountering the construct.
• A list item must have a mappable type.
• threadprivate variables cannot appear in a map clause.
• If a mapper map-type-modifier is specified, its type must match the type of the list-items passed to that map clause.
• Memory spaces and memory allocators cannot appear as a list item in a map clause.
C++
• If the type of a list item is a reference to a type T then the reference in the device data environment is initialized to refer to the object in the device data environment that corresponds to the object referenced by the list item. If mapping occurs, it occurs as though the object were mapped through a pointer with an array section of type T and length one.
• No type mapped through a reference can contain a reference to its own type, or any references to types that could produce a cycle of references.
• If the list item is a lambda, any pointers and references captured by the lambda must have the corresponding list item in the device data environment prior to the task encountering the construct.
C++
C / C++
• A list item cannot be a variable that is a member of a structure with a union type.
• A bit-field cannot appear in a map clause.
• A pointer that has a corresponding attached pointer must not be modified for the duration of the lifetime of the list item to which the corresponding pointer is attached in the device data environment.
C / C++
Fortran
• List items of the map clauses on the same construct must not share original storage unless they are the same variable or array section.
• A pointer that has a corresponding attached pointer and is associated with a given pointer target must not become associated with a different pointer target for the duration of the lifetime of the list item to which the corresponding pointer is attached in the device data environment.
• If the allocation status of a list item or any subobject of the list item with the ALLOCATABLE attribute is unallocated upon entry to a target region, the list item or any subobject of the corresponding list item must be unallocated upon exit from the region.
• If the allocation status of a list item or any subobject of the list item with the ALLOCATABLE attribute is allocated upon entry to a target region, the allocation status of the corresponding list item or any subobject of the corresponding list item must not be changed and must not be reshaped in the region.
• If an array section is mapped and the size of the section is smaller than that of the whole array, the behavior of referencing the whole array in the target region is unspecified.
• A list item must not be a whole array of an assumed-size array.
• If the association status of a list item with the POINTER attribute is associated upon entry to a target region, the list item must be associated with the same pointer target upon exit from the region.
• If the association status of a list item with the POINTER attribute is disassociated upon entry to a target region, the list item must be disassociated upon exit from the region.
• If the association status of a list item with the POINTER attribute is undefined upon entry to a target region, the list item must be undefined upon exit from the region.
• If the association status of a list item with the POINTER attribute is disassociated or undefined on entry and if the list item is associated with a pointer target inside a target region, then the pointer association status must become disassociated before the end of the region.
Fortran
Cross References
• ompt_callback_target_data_op_t, see Section 4.5.2.25 on page 488.
• ompt_callback_target_map_t, see Section 4.5.2.27 on page 492.
2.19.7.2 defaultmap Clause
Summary
The defaultmap clause explicitly determines the data-mapping attributes of variables that are referenced in a target construct for which the data-mapping attributes would otherwise be implicitly determined (see Section 2.19.7 on page 314).
Syntax
The syntax of the defaultmap clause is as follows:
defaultmap(implicit-behavior[:variable-category])
Where implicit-behavior is one of:
alloc
to
from
tofrom
firstprivate
none
default
and variable-category is one of:
C / C++
scalar
aggregate
pointer
C / C++
and variable-category is one of:
Fortran
scalar
aggregate
allocatable
pointer
Fortran
Description
The defaultmap clause sets the implicit data-mapping attribute for all variables referenced in the construct. If variable-category is specified, the effect of the defaultmap clause is as follows:
• If variable-category is scalar, all scalar variables of non-pointer type or all non-pointer non-allocatable scalar variables that have an implicitly determined data-mapping or data-sharing attribute will have a data-mapping or data-sharing attribute specified by implicit-behavior.
• If variable-category is aggregate or allocatable, all aggregate or allocatable variables that have an implicitly determined data-mapping or data-sharing attribute will have a data-mapping or data-sharing attribute specified by implicit-behavior.
• If variable-category is pointer, all variables of pointer type or with the POINTER attribute that have implicitly determined data-mapping or data-sharing attributes will have a data-mapping or data-sharing attribute specified by implicit-behavior. The zero-length array section and attachment that are otherwise applied to an implicitly mapped pointer are only provided for the default behavior.
If no variable-category is specified in the clause then implicit-behavior specifies the implicitly determined data-mapping or data-sharing attribute for all variables referenced in the construct. If implicit-behavior is none, each variable referenced in the construct that does not have a predetermined data-sharing attribute and does not appear in a to or link clause on a declare target directive must be listed in a data-mapping attribute clause, a data-sharing attribute clause (including a data-sharing attribute clause on a combined construct where target is one of the constituent constructs), or an is_device_ptr clause. If implicit-behavior is default, then the clause has no effect for the variables in the category specified by variable-category.
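The effect of the scalar category can be sketched in C as follows. The function name sum_on_device is hypothetical, and the sketch assumes a compiler that either implements the target construct (possibly falling back to the host) or ignores the pragma altogether. Without the clause, the scalar sum would be implicitly firstprivate in the target region and the accumulated value would not be copied back; defaultmap(tofrom: scalar) gives it a tofrom mapping instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums a[0..n-1] inside a target region. */
int sum_on_device(const int *a, int n)
{
    assert(a != NULL && n >= 0);
    int sum = 0;

    /* Implicitly mapped scalars (sum, n) are treated as tofrom, so the
       value of sum computed in the region is visible after it ends. */
    #pragma omp target defaultmap(tofrom: scalar) map(to: a[0:n])
    {
        for (int i = 0; i < n; ++i)
            sum += a[i];
    }
    return sum;
}
```

The array is still mapped explicitly because the clause in this sketch only changes the behavior of the scalar category.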
2.19.7.3 declare mapper Directive
Summary
The declare mapper directive declares a user-defined mapper for a given type, and may define a mapper-identifier that can be used in a map clause. The declare mapper directive is a declarative directive.
Syntax
C / C++
The syntax of the declare mapper directive is as follows:
#pragma omp declare mapper([mapper-identifier:]type var) \
    [clause[ [,] clause] ... ] new-line
C / C++
Fortran
The syntax of the declare mapper directive is as follows:
!$omp declare mapper([mapper-identifier:]type::var) &
    [clause[ [,] clause] ... ]
Fortran
where:
• mapper-identifier is a base-language identifier or default
• type is a valid type in scope
• var is a valid base-language identifier
• clause is map([[map-type-modifier[,] [map-type-modifier[,] ...]] map-type: ] list), where map-type is one of the following:
– alloc
– to
– from
– tofrom
and where map-type-modifier is one of the following:
– always
– close
Description
User-defined mappers can be defined using the declare mapper directive. The type and the mapper-identifier uniquely identify the mapper for use in a map clause later in the program. If the mapper-identifier is not specified, then default is used. The visibility and accessibility of this declaration are the same as those of a variable declared at the same point in the program.
The variable declared by var is available for use in all map clauses on the directive, and no part of the variable to be mapped is mapped by default.
The default mapper for all types T, designated by the pre-defined mapper-identifier default, is as follows unless a user-defined mapper is specified for that type.
declare mapper(T v) map(tofrom: v)
Using the default mapper-identifier overrides the pre-defined default mapper for the given type, making it the default for all variables of type. All map clauses with this construct in scope that map a list item of type will use this mapper unless another is explicitly specified.
All map clauses on the directive are expanded into corresponding map clauses wherever this mapper is invoked, either by matching type or by being explicitly named in a map clause. A map clause with list item var maps var as though no mapper were specified.
C++
The declare mapper directive can also appear at points in the program at which a static data member could be declared. In this case, the visibility and accessibility of the declaration are the same as those of a static data member declared at the same point in the program.
C++
Restrictions
The restrictions to the declare mapper directive are as follows:
• No instance of type can be mapped as part of the mapper, either directly or indirectly through another type, except the instance passed as the list item. If a set of declare mapper directives results in a cyclic definition then the behavior is unspecified.
• The type must be of struct, union or class type in C and C++ or a non-intrinsic type in Fortran.
• The type must not declare a new type.
• At least one map clause that maps var or at least one element of var is required.
• List-items in map clauses on this construct may only refer to the declared variable var and entities that could be referenced by a procedure defined at the same location.
• Each map-type-modifier can appear at most once on the map clause.
• A mapper-identifier may not be redeclared in the current scope for the same type or for a type that is compatible according to the base language rules.
Fortran
• type must not be an abstract type.
Fortran
2.20 Nesting of Regions
This section describes a set of restrictions on the nesting of regions. The restrictions on nesting are as follows:
• A worksharing region may not be closely nested inside a worksharing, loop, task, taskloop, critical, ordered, atomic, or master region.
• A barrier region may not be closely nested inside a worksharing, loop, task, taskloop, critical, ordered, atomic, or master region.
• A master region may not be closely nested inside a worksharing, loop, atomic, task, or taskloop region.
• An ordered region corresponding to an ordered construct without any clause or with the threads or depend clause may not be closely nested inside a critical, ordered, loop, atomic, task, or taskloop region.
• An ordered region corresponding to an ordered construct without the simd clause specified must be closely nested inside a worksharing-loop region.
• An ordered region corresponding to an ordered construct with the simd clause specified must be closely nested inside a simd or worksharing-loop SIMD region.
• An ordered region corresponding to an ordered construct with both the simd and threads clauses must be closely nested inside a worksharing-loop SIMD region or closely nested inside a worksharing-loop and simd region.
• A critical region may not be nested (closely or otherwise) inside a critical region with the same name. This restriction is not sufficient to prevent deadlock.
• OpenMP constructs may not be encountered during execution of an atomic region.
• The only OpenMP constructs that can be encountered during execution of a simd (or worksharing-loop SIMD) region are the atomic construct, the loop construct, the simd construct and the ordered construct with the simd clause.
• If a target update, target data, target enter data, or target exit data construct is encountered during execution of a target region, the behavior is unspecified.
• If a target construct is encountered during execution of a target region and a device clause in which the ancestor device-modifier appears is not present on the construct, the behavior is unspecified.
• A teams region can only be strictly nested within the implicit parallel region or a target region. If a teams construct is nested within a target construct, that target construct must contain no statements, declarations or directives outside of the teams construct.
• distribute, distribute simd, distribute parallel worksharing-loop, distribute parallel worksharing-loop SIMD, loop, parallel regions, including any parallel regions arising from combined constructs, omp_get_num_teams() regions, and omp_get_team_num() regions are the only OpenMP regions that may be strictly nested inside the teams region.
• The region corresponding to the distribute construct must be strictly nested inside a teams region.
• If construct-type-clause is taskgroup, the cancel construct must be closely nested inside a task construct and the cancel region must be closely nested inside a taskgroup region. If construct-type-clause is sections, the cancel construct must be closely nested inside a sections or section construct. Otherwise, the cancel construct must be closely nested inside an OpenMP construct that matches the type specified in construct-type-clause of the cancel construct.
• A cancellation point construct for which construct-type-clause is taskgroup must be closely nested inside a task construct, and the cancellation point region must be closely nested inside a taskgroup region. A cancellation point construct for which construct-type-clause is sections must be closely nested inside a sections or section construct. Otherwise, a cancellation point construct must be closely nested inside an OpenMP construct that matches the type specified in construct-type-clause.
• The only constructs that may be nested inside a loop region are the loop construct, the parallel construct, the simd construct, and combined constructs for which the first construct is a parallel construct.
• A loop region may not contain calls to procedures that contain OpenMP directives or calls to the OpenMP Runtime API.
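One of the permitted nestings above, an ordered region without the simd clause closely nested inside a worksharing-loop region, can be sketched in C as follows. The function name squares_in_order is hypothetical, and the sketch assumes that the pragmas are either honored or ignored as a whole (a serial build produces the same result).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: stores i*i for i in [0, n) into out, in iteration
   order, and returns the number of values written. */
int squares_in_order(int *out, int n)
{
    assert(out != NULL && n >= 0);
    int pos = 0;

    /* The ordered clause on the worksharing-loop permits the closely
       nested ordered region in the loop body. */
    #pragma omp parallel for ordered schedule(static, 1)
    for (int i = 0; i < n; ++i) {
        int v = i * i;          /* may execute concurrently */
        #pragma omp ordered
        {
            out[pos++] = v;     /* executes in sequential iteration order */
        }
    }
    return pos;
}
```

Placing the same ordered region inside a critical or task region would violate the restrictions listed above.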
CHAPTER 3
Runtime Library Routines
This chapter describes the OpenMP API runtime library routines and queryable runtime states. In this chapter, true and false are used as generic terms to simplify the description of the routines.
C / C++
true means a nonzero integer value and false means an integer value of zero.
C / C++
Fortran
true means a logical value of .TRUE. and false means a logical value of .FALSE..
Fortran
Fortran
Restrictions
The following restriction applies to all OpenMP runtime library routines:
• OpenMP runtime library routines may not be called from PURE or ELEMENTAL procedures.
Fortran
3.1 Runtime Library Definitions
For each base language, a compliant implementation must supply a set of definitions for the OpenMP API runtime library routines and the special data types of their parameters. The set of definitions must contain a declaration for each OpenMP API runtime library routine and variable and a definition of each required data type listed below. In addition, each set of definitions may specify other implementation specific values.
C / C++
The library routines are external functions with “C” linkage.
Prototypes for the C/C++ runtime library routines described in this chapter shall be provided in a
header file named omp.h. This file also defines the following:
• The type omp_lock_t;
• The type omp_nest_lock_t;
• The type omp_sync_hint_t;
• The type omp_lock_hint_t (deprecated);
• The type omp_sched_t;
• The type omp_proc_bind_t;
• The type omp_control_tool_t;
• The type omp_control_tool_result_t;
• The type omp_depend_t;
• The type omp_memspace_handle_t, which must be an implementation-defined enum type with an enumerator for at least each predefined memory space in Table 2.8 on page 152;
• The type omp_allocator_handle_t, which must be an implementation-defined enum type with at least the omp_null_allocator enumerator with the value zero and an enumerator for each predefined memory allocator in Table 2.10 on page 155;
• The type omp_uintptr_t, which is an unsigned integer type capable of holding a pointer on any device;
• The type omp_pause_resource_t; and
• The type omp_event_handle_t, which must be an implementation-defined enum type.
C / C++
C++
The omp.h header file also defines a class template that models the Allocator concept in the omp::allocator namespace for each predefined memory allocator in Table 2.10 on page 155 for which the name includes neither the omp_ prefix nor the _alloc suffix.
C++ Fortran
The OpenMP Fortran API runtime library routines are external procedures. The return values of these routines are of default kind, unless otherwise specified.
Interface declarations for the OpenMP Fortran runtime library routines described in this chapter shall be provided in the form of a Fortran include file named omp_lib.h or a Fortran 90 module named omp_lib. It is implementation defined whether the include file or the module file (or both) is provided.
These files also define the following:
• The integer parameter omp_lock_kind;
• The integer parameter omp_nest_lock_kind;
• The integer parameter omp_sync_hint_kind;
• The integer parameter omp_lock_hint_kind (deprecated);
• The integer parameter omp_sched_kind;
• The integer parameter omp_proc_bind_kind;
• The integer parameter omp_control_tool_kind;
• The integer parameter omp_control_tool_result_kind;
• The integer parameter omp_depend_kind;
• The integer parameter omp_memspace_handle_kind;
• The integer parameter omp_allocator_handle_kind;
• The integer parameter omp_alloctrait_key_kind;
• The integer parameter omp_alloctrait_val_kind;
• An integer parameter of kind omp_memspace_handle_kind for each predefined memory space in Table 2.8 on page 152;
• An integer parameter of kind omp_allocator_handle_kind for each predefined memory allocator in Table 2.10 on page 155;
• The integer parameter omp_pause_resource_kind;
• The integer parameter omp_event_handle_kind; and
• The integer parameter openmp_version with a value yyyymm where yyyy and mm are the year and month designations of the version of the OpenMP Fortran API that the implementation supports; this value matches that of the C preprocessor macro _OPENMP, when a macro preprocessor is supported (see Section 2.2 on page 49).
It is implementation defined whether any of the OpenMP runtime library routines that take an argument are extended with a generic interface so arguments of different KIND type can be accommodated.
Fortran
3.2 Execution Environment Routines
This section describes routines that affect and monitor threads, processors, and the parallel environment.
3.2.1 omp_set_num_threads
Summary
The omp_set_num_threads routine affects the number of threads to be used for subsequent parallel regions that do not specify a num_threads clause, by setting the value of the first element of the nthreads-var ICV of the current task.
Format
C / C++
void omp_set_num_threads(int num_threads);
C / C++
Fortran
subroutine omp_set_num_threads(num_threads)
integer num_threads
Fortran
Constraints on Arguments
The value of the argument passed to this routine must evaluate to a positive integer, or else the behavior of this routine is implementation defined.
Binding
The binding task set for an omp_set_num_threads region is the generating task.
Effect
The effect of this routine is to set the value of the first element of the nthreads-var ICV of the current task to the value specified in the argument.
Cross References
• nthreads-var ICV, see Section 2.5 on page 63.
• parallel construct and num_threads clause, see Section 2.6 on page 74.
• Determining the number of threads for a parallel region, see Section 2.6.1 on page 78.
• omp_get_num_threads routine, see Section 3.2.2 on page 335.
• omp_get_max_threads routine, see Section 3.2.3 on page 336.
• OMP_NUM_THREADS environment variable, see Section 6.2 on page 602.
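A minimal C sketch of the routine in use follows. The helper name team_size_after_request and the serial fallback definitions are assumptions of this sketch, added so that it also builds when OpenMP is disabled. The team that is actually formed may be smaller than the request, so the caller should treat the argument as an upper bound.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (an assumption of this sketch) for non-OpenMP builds. */
static void omp_set_num_threads(int num_threads) { (void)num_threads; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: requests `requested` threads for later parallel
   regions and reports the size of the team that was actually formed. */
int team_size_after_request(int requested)
{
    assert(requested > 0);  /* other values are implementation defined */
    int team = 0;

    omp_set_num_threads(requested);

    #pragma omp parallel    /* no num_threads clause: nthreads-var applies */
    {
        #pragma omp single
        team = omp_get_num_threads();
    }
    return team;
}
```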
3.2.2 omp_get_num_threads
Summary
The omp_get_num_threads routine returns the number of threads in the current team.
Format
C / C++
int omp_get_num_threads(void);
C / C++
Fortran
integer function omp_get_num_threads()
Fortran
Binding
The binding region for an omp_get_num_threads region is the innermost enclosing parallel region.
Effect
The omp_get_num_threads routine returns the number of threads in the team that is executing the parallel region to which the routine region binds. If called from the sequential part of a program, this routine returns 1.
Cross References
• nthreads-var ICV, see Section 2.5 on page 63.
• parallel construct and num_threads clause, see Section 2.6 on page 74.
• Determining the number of threads for a parallel region, see Section 2.6.1 on page 78.
• omp_set_num_threads routine, see Section 3.2.1 on page 334.
• OMP_NUM_THREADS environment variable, see Section 6.2 on page 602.
3.2.3 omp_get_max_threads
Summary
The omp_get_max_threads routine returns an upper bound on the number of threads that could be used to form a new team if a parallel construct without a num_threads clause were encountered after execution returns from this routine.
Format
C / C++
int omp_get_max_threads(void);
C / C++
Fortran
integer function omp_get_max_threads()
Fortran
Binding
The binding task set for an omp_get_max_threads region is the generating task.
Effect
The value returned by omp_get_max_threads is the value of the first element of the nthreads-var ICV of the current task. This value is also an upper bound on the number of threads that could be used to form a new team if a parallel region without a num_threads clause were encountered after execution returns from this routine.

Note – The return value of the omp_get_max_threads routine can be used to allocate sufficient storage dynamically for all threads in the team formed at the subsequent active parallel region.

Cross References
• nthreads-var ICV, see Section 2.5 on page 63.
• parallel construct and num_threads clause, see Section 2.6 on page 74.
• Determining the number of threads for a parallel region, see Section 2.6.1 on page 78.
• omp_set_num_threads routine, see Section 3.2.1 on page 334.
• omp_get_num_threads routine, see Section 3.2.2 on page 335.
• omp_get_thread_num routine, see Section 3.2.4 on page 337.
• OMP_NUM_THREADS environment variable, see Section 6.2 on page 602.
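The storage-allocation use described in the note can be sketched in C as follows. The helper name sum_with_partials and the serial fallback definitions are assumptions of this sketch. One slot is allocated per possible thread before the parallel region, and each thread accumulates into the slot indexed by its thread number.

```c
#include <assert.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (an assumption of this sketch) for non-OpenMP builds. */
static int omp_get_max_threads(void) { return 1; }
static int omp_get_thread_num(void) { return 0; }
#endif

/* Hypothetical helper: sums 0..n-1 with one partial sum per thread.
   Returns -1 if the per-thread storage cannot be allocated. */
long sum_with_partials(int n)
{
    assert(n >= 0);
    int slots = omp_get_max_threads();      /* upper bound on the team size */
    long *partial = calloc((size_t)slots, sizeof *partial);
    if (partial == NULL)
        return -1;

    #pragma omp parallel
    {
        int me = omp_get_thread_num();      /* 0 <= me < slots */
        #pragma omp for
        for (int i = 0; i < n; ++i)
            partial[me] += i;
    }

    long total = 0;
    for (int t = 0; t < slots; ++t)
        total += partial[t];
    free(partial);
    return total;
}
```

A reduction clause would usually be the more idiomatic way to compute this particular sum; the explicit array is used here only to show the sizing pattern.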
3.2.4 omp_get_thread_num
Summary
The omp_get_thread_num routine returns the thread number, within the current team, of the calling thread.
Format
C / C++
int omp_get_thread_num(void);
C / C++
Fortran
integer function omp_get_thread_num()
Fortran
Binding
The binding thread set for an omp_get_thread_num region is the current team. The binding region for an omp_get_thread_num region is the innermost enclosing parallel region.
Effect
The omp_get_thread_num routine returns the thread number of the calling thread, within the team that is executing the parallel region to which the routine region binds. The thread number is an integer between 0 and one less than the value returned by omp_get_num_threads, inclusive. The thread number of the master thread of the team is 0. The routine returns 0 if it is called from the sequential part of a program.

Note – The thread number may change during the execution of an untied task. The value returned by omp_get_thread_num is not generally useful during the execution of such a task region.

Cross References
• nthreads-var ICV, see Section 2.5 on page 63.
• parallel construct and num_threads clause, see Section 2.6 on page 74.
• Determining the number of threads for a parallel region, see Section 2.6.1 on page 78.
• omp_set_num_threads routine, see Section 3.2.1 on page 334.
• omp_get_num_threads routine, see Section 3.2.2 on page 335.
• OMP_NUM_THREADS environment variable, see Section 6.2 on page 602.
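The numbering rule above (thread numbers run from 0 to one less than the team size) can be checked with a short C sketch. The helper name thread_numbers_are_dense and the serial fallback definitions are assumptions of this sketch, and the helper is assumed to be called from the sequential part of the program.

```c
#include <assert.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (an assumption of this sketch) for non-OpenMP builds. */
static int omp_get_max_threads(void) { return 1; }
static int omp_get_num_threads(void) { return 1; }
static int omp_get_thread_num(void) { return 0; }
#endif

/* Hypothetical check: returns 1 if the thread numbers observed in one
   parallel region are exactly 0..team-1, each seen once; otherwise 0. */
int thread_numbers_are_dense(void)
{
    assert(omp_get_thread_num() == 0);      /* sequential part: always 0 */

    int bound = omp_get_max_threads();
    int *seen = calloc((size_t)bound, sizeof *seen);
    if (seen == NULL)
        return 0;
    int team = 0;

    #pragma omp parallel
    {
        seen[omp_get_thread_num()] += 1;    /* each thread owns one slot */
        #pragma omp single
        team = omp_get_num_threads();
    }

    int ok = (team >= 1 && team <= bound);
    for (int t = 0; ok && t < team; ++t)
        ok = (seen[t] == 1);
    free(seen);
    return ok;
}
```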
3.2.5 omp_get_num_procs
Summary
The omp_get_num_procs routine returns the number of processors available to the device.
Format
C / C++
int omp_get_num_procs(void);
C / C++
Fortran
integer function omp_get_num_procs()
Fortran
Binding
The binding thread set for an omp_get_num_procs region is all threads on a device. The effect of executing this routine is not related to any specific region corresponding to any construct or API routine.
Effect
The omp_get_num_procs routine returns the number of processors that are available to the device at the time the routine is called. This value may change between the time that it is determined by the omp_get_num_procs routine and the time that it is read in the calling context due to system actions outside the control of the OpenMP implementation.
Cross References
• omp_get_num_places routine, see Section 3.2.24 on page 358.
• omp_get_place_num_procs routine, see Section 3.2.25 on page 359.
• omp_get_place_proc_ids routine, see Section 3.2.26 on page 360.
• omp_get_place_num routine, see Section 3.2.27 on page 362.
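Because the returned value is only a snapshot, it is best treated as a hint. The C sketch below shows one possible policy, one thread per available processor up to a caller-supplied cap. The helper name suggested_thread_count, the policy itself and the serial fallback definition are assumptions of this sketch, not something the specification prescribes.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback (an assumption of this sketch) for non-OpenMP builds. */
static int omp_get_num_procs(void) { return 1; }
#endif

/* Hypothetical policy: one thread per available processor, at most cap. */
int suggested_thread_count(int cap)
{
    assert(cap >= 1);
    int procs = omp_get_num_procs();    /* snapshot; may change afterwards */
    if (procs < 1)
        procs = 1;                      /* defensive lower bound */
    return procs < cap ? procs : cap;
}
```

The result could then be passed to omp_set_num_threads before the next parallel region.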
3.2.6 omp_in_parallel
Summary
The omp_in_parallel routine returns true if the active-levels-var ICV is greater than zero; otherwise, it returns false.
Format
C / C++
int omp_in_parallel(void);
C / C++
Fortran
logical function omp_in_parallel()
Fortran
Binding
The binding task set for an omp_in_parallel region is the generating task.
Effect
The effect of the omp_in_parallel routine is to return true if the current task is enclosed by an active parallel region, and the parallel region is enclosed by the outermost initial task region on the device; otherwise it returns false.
Cross References
• active-levels-var, see Section 2.5 on page 63.
• parallel construct, see Section 2.6 on page 74.
• omp_get_num_threads routine, see Section 3.2.2 on page 335.
• omp_get_active_level routine, see Section 3.2.21 on page 355.
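The distinction between an active and an inactive parallel region can be sketched in C as follows. A parallel region is active only when its team has more than one thread, so the routine may return false inside a parallel construct that was given a single thread. The helper name in_parallel_matches_team_size and the serial fallback definitions are assumptions of this sketch, and the helper is assumed to be called from the sequential part of the program.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (an assumption of this sketch) for non-OpenMP builds. */
static int omp_in_parallel(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical check: returns 1 if omp_in_parallel is false outside any
   parallel region and, inside one, true exactly when the team has more
   than one thread. */
int in_parallel_matches_team_size(void)
{
    int outside = omp_in_parallel();
    int inside = 0;
    int team = 1;

    #pragma omp parallel num_threads(2)
    {
        #pragma omp single
        {
            inside = omp_in_parallel();
            team = omp_get_num_threads();
        }
    }
    assert(team >= 1);
    return !outside && ((inside != 0) == (team > 1));
}
```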
3.2.7 omp_set_dynamic
Summary
The omp_set_dynamic routine enables or disables dynamic adjustment of the number of threads available for the execution of subsequent parallel regions by setting the value of the dyn-var ICV.
Format
C / C++
void omp_set_dynamic(int dynamic_threads);
C / C++
Fortran
subroutine omp_set_dynamic(dynamic_threads)
logical dynamic_threads
Fortran
Binding
The binding task set for an omp_set_dynamic region is the generating task.
Effect
For implementations that support dynamic adjustment of the number of threads, if the argument to omp_set_dynamic evaluates to true, dynamic adjustment is enabled for the current task; otherwise, dynamic adjustment is disabled for the current task. For implementations that do not support dynamic adjustment of the number of threads, this routine has no effect: the value of dyn-var remains false.
Cross References
• dyn-var ICV, see Section 2.5 on page 63.
• Determining the number of threads for a parallel region, see Section 2.6.1 on page 78.
• omp_get_num_threads routine, see Section 3.2.2 on page 335.
• omp_get_dynamic routine, see Section 3.2.8 on page 341.
• OMP_DYNAMIC environment variable, see Section 6.3 on page 603.
3.2.8 omp_get_dynamic
Summary
The omp_get_dynamic routine returns the value of the dyn-var ICV, which determines whether dynamic adjustment of the number of threads is enabled or disabled.
Format
C / C++
int omp_get_dynamic(void);
C / C++
Fortran
logical function omp_get_dynamic()
Fortran
Binding
The binding task set for an omp_get_dynamic region is the generating task.
Effect
This routine returns true if dynamic adjustment of the number of threads is enabled for the current task; it returns false, otherwise. If an implementation does not support dynamic adjustment of the number of threads, then this routine always returns false.
Cross References
• dyn-var ICV, see Section 2.5 on page 63.
• Determining the number of threads for a parallel region, see Section 2.6.1 on page 78.
• omp_set_dynamic routine, see Section 3.2.7 on page 340.
• OMP_DYNAMIC environment variable, see Section 6.3 on page 603.
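A common use of omp_set_dynamic and omp_get_dynamic together is to save the current setting, disable dynamic adjustment for a section of code, and restore the setting afterwards. The C sketch below illustrates this pattern; the helper name run_without_dynamic_adjustment and the serial fallback definitions are assumptions of this sketch.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (an assumption of this sketch) for non-OpenMP builds. */
static void omp_set_dynamic(int dynamic_threads) { (void)dynamic_threads; }
static int omp_get_dynamic(void) { return 0; }
#endif

/* Hypothetical helper: disables dynamic adjustment, then restores the
   previous setting. Returns 1 if the restored value matches the saved one. */
int run_without_dynamic_adjustment(void)
{
    int saved = omp_get_dynamic();

    omp_set_dynamic(0);
    assert(!omp_get_dynamic());     /* false whether disabled or unsupported */

    /* ... parallel regions that should not have their team size adjusted ... */

    omp_set_dynamic(saved);
    return (omp_get_dynamic() != 0) == (saved != 0);
}
```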
3.2.9 omp_get_cancellation
Summary
The omp_get_cancellation routine returns the value of the cancel-var ICV, which determines if cancellation is enabled or disabled.
Format
C / C++
int omp_get_cancellation(void);
C / C++
Fortran
logical function omp_get_cancellation()
Fortran
Binding
The binding task set for an omp_get_cancellation region is the whole program.
Effect
This routine returns true if cancellation is enabled. It returns false otherwise.
Cross References
• cancel-var ICV, see Section 2.5.1 on page 64.
• cancel construct, see Section 2.18.1 on page 263.
• OMP_CANCELLATION environment variable, see Section 6.11 on page 610.
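The C sketch below shows where the return value matters: a search loop that uses the cancel construct to stop early. When cancel-var is false the cancel and cancellation point constructs have no effect and the loop simply runs to completion, so the result is the same either way. The function names contains and early_exit_active and the serial fallback definition are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback (an assumption of this sketch) for non-OpenMP builds. */
static int omp_get_cancellation(void) { return 0; }
#endif

/* Hypothetical search: returns 1 if key occurs in a[0..n-1], otherwise 0. */
int contains(const int *a, int n, int key)
{
    assert(a != NULL && n >= 0);
    int found = 0;

    #pragma omp parallel shared(found)
    {
        #pragma omp for
        for (int i = 0; i < n; ++i) {
            if (a[i] == key) {
                #pragma omp atomic write
                found = 1;
                /* Ends the loop early only when cancellation is enabled. */
                #pragma omp cancel for
            }
            #pragma omp cancellation point for
        }
    }
    return found;
}

/* Reports whether the cancel construct above can take effect. */
int early_exit_active(void)
{
    return omp_get_cancellation() != 0;
}
```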
3.2.10 omp_set_nested
Summary
The deprecated omp_set_nested routine enables or disables nested parallelism by setting the max-active-levels-var ICV.
Format
C / C++
void omp_set_nested(int nested);
C / C++
Fortran
subroutine omp_set_nested(nested)
logical nested
Fortran
Binding
The binding task set for an omp_set_nested region is the generating task.
Effect
If the argument to omp_set_nested evaluates to true, the value of the max-active-levels-var ICV is set to the number of active levels of parallelism that the implementation supports; otherwise, if the value of max-active-levels-var is greater than 1 then it is set to 1. This routine has been deprecated.
Cross References
• max-active-levels-var ICV, see Section 2.5 on page 63.
• Determining the number of threads for a parallel region, see Section 2.6.1 on page 78.
• omp_get_nested routine, see Section 3.2.11 on page 344.
• omp_set_max_active_levels routine, see Section 3.2.16 on page 350.
• omp_get_max_active_levels routine, see Section 3.2.17 on page 351.
• OMP_NESTED environment variable, see Section 6.9 on page 609.
3.2.11 omp_get_nested
Summary
The deprecated omp_get_nested routine returns whether nested parallelism is enabled or disabled, according to the value of the max-active-levels-var ICV.
Format
C / C++
int omp_get_nested(void);
C / C++
Fortran
logical function omp_get_nested()
Fortran
Binding
The binding task set for an omp_get_nested region is the generating task.
Effect
This routine returns true if max-active-levels-var is greater than 1 for the current task; it returns false, otherwise. If an implementation does not support nested parallelism, this routine always returns false. This routine has been deprecated.
Cross References
• max-active-levels-var ICV, see Section 2.5 on page 63.
• Determining the number of threads for a parallel region, see Section 2.6.1 on page 78.
• omp_set_nested routine, see Section 3.2.10 on page 343.
• omp_set_max_active_levels routine, see Section 3.2.16 on page 350.
• omp_get_max_active_levels routine, see Section 3.2.17 on page 351.
• OMP_NESTED environment variable, see Section 6.9 on page 609.
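Since both nesting routines are deprecated, new code would normally use the max-active-levels routines cited in the cross references instead. The C sketch below expresses "disable nested parallelism, then query it" in those terms. The helper name nested_enabled_after_disable and the serial fallback definitions are assumptions of this sketch, and it differs slightly from omp_set_nested with a false argument in that it sets the ICV to 1 unconditionally.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (an assumption of this sketch) for non-OpenMP builds. */
static void omp_set_max_active_levels(int max_levels) { (void)max_levels; }
static int omp_get_max_active_levels(void) { return 1; }
#endif

/* Hypothetical helper: limits execution to one active level of parallelism
   and returns the answer omp_get_nested would give afterwards. */
int nested_enabled_after_disable(void)
{
    omp_set_max_active_levels(1);

    int levels = omp_get_max_active_levels();
    assert(levels <= 1);
    return levels > 1;      /* the test that omp_get_nested applies */
}
```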
3.2.12 omp_set_schedule
Summary
The omp_set_schedule routine affects the schedule that is applied when runtime is used as schedule kind, by setting the value of the run-sched-var ICV.
Format
C / C++
void omp_set_schedule(omp_sched_t kind, int chunk_size);
C / C++
Fortran
subroutine omp_set_schedule(kind, chunk_size)
integer (kind=omp_sched_kind) kind
integer chunk_size
Fortran
Constraints on Arguments
The first argument passed to this routine can be one of the valid OpenMP schedule kinds (except for runtime) or any implementation specific schedule. The C/C++ header file (omp.h) and the Fortran include file (omp_lib.h) and/or Fortran 90 module file (omp_lib) define the valid constants. The valid constants must include the following, which can be extended with implementation specific values:
C / C++
typedef enum omp_sched_t {
  // schedule kinds
  omp_sched_static = 0x1,
  omp_sched_dynamic = 0x2,
  omp_sched_guided = 0x3,
  omp_sched_auto = 0x4,
  // schedule modifier
  omp_sched_monotonic = 0x80000000u
} omp_sched_t;
C / C++
Fortran
! schedule kinds
integer(kind=omp_sched_kind), &
  parameter :: omp_sched_static = &
  int(Z'1', kind=omp_sched_kind)
integer(kind=omp_sched_kind), &
  parameter :: omp_sched_dynamic = &
  int(Z'2', kind=omp_sched_kind)
integer(kind=omp_sched_kind), &
  parameter :: omp_sched_guided = &
  int(Z'3', kind=omp_sched_kind)
integer(kind=omp_sched_kind), &
  parameter :: omp_sched_auto = &
  int(Z'4', kind=omp_sched_kind)
! schedule modifier
integer(kind=omp_sched_kind), &
  parameter :: omp_sched_monotonic = &
  int(Z'80000000', kind=omp_sched_kind)
Fortran
Binding
The binding task set for an omp_set_schedule region is the generating task.
Effect
The effect of this routine is to set the value of the run-sched-var ICV of the current task to the values specified in the two arguments. The schedule is set to the schedule kind that is specified by the first argument kind. It can be any of the standard schedule kinds or any other implementation specific one. For the schedule kinds static, dynamic, and guided the chunk_size is set to the value of the second argument, or to the default chunk_size if the value of the second argument is less than 1; for the schedule kind auto the second argument has no meaning; for implementation specific schedule kinds, the values and associated meanings of the second argument are implementation defined.
Each of the schedule kinds can be combined with the omp_sched_monotonic modifier by using the + or | operators in C/C++ or the + operator in Fortran. If the schedule kind is combined with the omp_sched_monotonic modifier, the schedule is modified as if the monotonic schedule modifier was specified. Otherwise, the schedule modifier is nonmonotonic.
Cross References
• run-sched-var ICV, see Section 2.5 on page 63.
• Determining the schedule of a worksharing-loop, see Section 2.9.2.1 on page 109.
• omp_set_schedule routine, see Section 3.2.12 on page 345.
• omp_get_schedule routine, see Section 3.2.13 on page 347.
• OMP_SCHEDULE environment variable, see Section 6.1 on page 601.
3.2.13 omp_get_schedule
Summary
The omp_get_schedule routine returns the schedule that is applied when the runtime schedule
is used.
Format
C / C++
void omp_get_schedule(omp_sched_t *kind, int *chunk_size);
C / C++
Fortran
subroutine omp_get_schedule(kind, chunk_size)
integer (kind=omp_sched_kind) kind
integer chunk_size
Fortran
Binding
The binding task set for an omp_get_schedule region is the generating task.
Effect
This routine returns the run-sched-var ICV in the task to which the routine binds. The first
argument kind returns the schedule to be used. It can be any of the standard schedule kinds as
defined in Section 3.2.12 on page 345, or any implementation specific schedule kind. The second
argument chunk_size returns the chunk size to be used, or a value less than 1 if the default chunk
size is to be used, if the returned schedule kind is static, dynamic, or guided. The value
returned by the second argument is implementation defined for any other schedule kinds.
Cross References
• run-sched-var ICV, see Section 2.5 on page 63.
• Determining the schedule of a worksharing-loop, see Section 2.9.2.1 on page 109.
• omp_set_schedule routine, see Section 3.2.12 on page 345.
• OMP_SCHEDULE environment variable, see Section 6.1 on page 601.
3.2.14 omp_get_thread_limit
Summary
The omp_get_thread_limit routine returns the maximum number of OpenMP threads available to participate in the current contention group.
Format
C / C++
int omp_get_thread_limit(void);
C / C++
Fortran
integer function omp_get_thread_limit()
Fortran
Binding
The binding thread set for an omp_get_thread_limit region is all threads on the device. The
effect of executing this routine is not related to any specific region corresponding to any construct
or API routine.
Effect
The omp_get_thread_limit routine returns the value of the thread-limit-var ICV.
Cross References
• thread-limit-var ICV, see Section 2.5 on page 63.
• omp_get_num_threads routine, see Section 3.2.2 on page 335.
• OMP_THREAD_LIMIT environment variable, see Section 6.10 on page 610.
• OMP_NUM_THREADS environment variable, see Section 6.2 on page 602.
3.2.15 omp_get_supported_active_levels
Summary
The omp_get_supported_active_levels routine returns the number of active levels of
parallelism supported by the implementation.
Format
C / C++
int omp_get_supported_active_levels(void);
C / C++
Fortran
integer function omp_get_supported_active_levels()
Fortran
Binding
The binding task set for an omp_get_supported_active_levels region is the generating
task.
Effect
The omp_get_supported_active_levels routine returns the number of active levels of
parallelism supported by the implementation. The max-active-levels-var ICV may not have a value
that is greater than this number. The value returned by the
omp_get_supported_active_levels routine is implementation defined, but it must be
greater than 0.
Cross References
• max-active-levels-var ICV, see Section 2.5 on page 63.
• omp_get_max_active_levels routine, see Section 3.2.17 on page 351.
• omp_set_max_active_levels routine, see Section 3.2.16 on page 350.
3.2.16 omp_set_max_active_levels
Summary
The omp_set_max_active_levels routine limits the number of nested active parallel regions on the device, by setting the max-active-levels-var ICV.
Format
C / C++
void omp_set_max_active_levels(int max_levels);
C / C++
Fortran
subroutine omp_set_max_active_levels(max_levels)
integer max_levels
Fortran
Constraints on Arguments
The value of the argument passed to this routine must evaluate to a non-negative integer; otherwise, the behavior of this routine is implementation defined.
Binding
When called from a sequential part of the program, the binding thread set for an
omp_set_max_active_levels region is the encountering thread. When called from within
any parallel or teams region, the binding thread set (and binding region, if required) for the
omp_set_max_active_levels region is implementation defined.
Effect
The effect of this routine is to set the value of the max-active-levels-var ICV to the value specified
in the argument.
If the number of active levels requested exceeds the number of active levels of parallelism
supported by the implementation, the value of the max-active-levels-var ICV will be set to the
number of active levels supported by the implementation.
This routine has the described effect only when called from a sequential part of the program. When
called from within a parallel or teams region, the effect of this routine is implementation
defined.
Cross References
• max-active-levels-var ICV, see Section 2.5 on page 63.
• parallel construct, see Section 2.6 on page 74.
• omp_get_supported_active_levels routine, see Section 3.2.15 on page 349.
• omp_get_max_active_levels routine, see Section 3.2.17 on page 351.
• OMP_MAX_ACTIVE_LEVELS environment variable, see Section 6.8 on page 608.
3.2.17 omp_get_max_active_levels
Summary
The omp_get_max_active_levels routine returns the value of the max-active-levels-var
ICV, which determines the maximum number of nested active parallel regions on the device.
Format
C / C++
int omp_get_max_active_levels(void);
C / C++
Fortran
integer function omp_get_max_active_levels()
Fortran
Binding
When called from a sequential part of the program, the binding thread set for an
omp_get_max_active_levels region is the encountering thread. When called from within
any parallel or teams region, the binding thread set (and binding region, if required) for the
omp_get_max_active_levels region is implementation defined.
Effect
The omp_get_max_active_levels routine returns the value of the max-active-levels-var
ICV, which determines the maximum number of nested active parallel regions on the device.
Cross References
• max-active-levels-var ICV, see Section 2.5 on page 63.
• parallel construct, see Section 2.6 on page 74.
• omp_get_supported_active_levels routine, see Section 3.2.15 on page 349.
• omp_set_max_active_levels routine, see Section 3.2.16 on page 350.
• OMP_MAX_ACTIVE_LEVELS environment variable, see Section 6.8 on page 608.
3.2.18 omp_get_level
Summary
The omp_get_level routine returns the value of the levels-var ICV.
Format
C / C++
int omp_get_level(void);
C / C++
Fortran
integer function omp_get_level()
Fortran
Binding
The binding task set for an omp_get_level region is the generating task.
Effect
The effect of the omp_get_level routine is to return the number of nested parallel regions
(whether active or inactive) that enclose the current task such that all of the parallel regions are
enclosed by the outermost initial task region on the current device.
Cross References
• levels-var ICV, see Section 2.5 on page 63.
• parallel construct, see Section 2.6 on page 74.
• omp_get_active_level routine, see Section 3.2.21 on page 355.
• OMP_MAX_ACTIVE_LEVELS environment variable, see Section 6.8 on page 608.
3.2.19 omp_get_ancestor_thread_num
Summary
The omp_get_ancestor_thread_num routine returns, for a given nested level of the current
thread, the thread number of the ancestor of the current thread.
Format
C / C++
int omp_get_ancestor_thread_num(int level);
C / C++
Fortran
integer function omp_get_ancestor_thread_num(level)
integer level
Fortran
Binding
The binding thread set for an omp_get_ancestor_thread_num region is the encountering
thread. The binding region for an omp_get_ancestor_thread_num region is the innermost
enclosing parallel region.
Effect
The omp_get_ancestor_thread_num routine returns the thread number of the ancestor at a
given nest level of the current thread or the thread number of the current thread. If the requested
nest level is outside the range of 0 and the nest level of the current thread, as returned by the
omp_get_level routine, the routine returns -1.
Note – When the omp_get_ancestor_thread_num routine is called with a value of
level=0, the routine always returns 0. If level=omp_get_level(), the routine has the
same effect as the omp_get_thread_num routine.
Cross References
• parallel construct, see Section 2.6 on page 74.
• omp_get_num_threads routine, see Section 3.2.2 on page 335.
• omp_get_thread_num routine, see Section 3.2.4 on page 337.
• omp_get_level routine, see Section 3.2.18 on page 352.
• omp_get_team_size routine, see Section 3.2.20 on page 354.
3.2.20 omp_get_team_size
Summary
The omp_get_team_size routine returns, for a given nested level of the current thread, the size of the thread team to which the ancestor or the current thread belongs.
Format
C / C++
int omp_get_team_size(int level);
C / C++
Fortran
integer function omp_get_team_size(level)
integer level
Fortran
Binding
The binding thread set for an omp_get_team_size region is the encountering thread. The
binding region for an omp_get_team_size region is the innermost enclosing parallel
region.
Effect
The omp_get_team_size routine returns the size of the thread team to which the ancestor or
the current thread belongs. If the requested nested level is outside the range of 0 and the nested
level of the current thread, as returned by the omp_get_level routine, the routine returns -1.
Inactive parallel regions are regarded like active parallel regions executed with one thread.
Note – When the omp_get_team_size routine is called with a value of level=0, the routine
always returns 1. If level=omp_get_level(), the routine has the same effect as the
omp_get_num_threads routine.
Cross References
• omp_get_num_threads routine, see Section 3.2.2 on page 335.
• omp_get_level routine, see Section 3.2.18 on page 352.
• omp_get_ancestor_thread_num routine, see Section 3.2.19 on page 353.
3.2.21 omp_get_active_level
Summary
The omp_get_active_level routine returns the value of the active-levels-var ICV.
Format
C / C++
int omp_get_active_level(void);
C / C++
Fortran
integer function omp_get_active_level()
Fortran
Binding
The binding task set for an omp_get_active_level region is the generating task.
Effect
The effect of the omp_get_active_level routine is to return the number of nested active
parallel regions enclosing the current task such that all of the parallel regions are enclosed
by the outermost initial task region on the current device.
Cross References
• active-levels-var ICV, see Section 2.5 on page 63.
• omp_get_level routine, see Section 3.2.18 on page 352.
• omp_set_max_active_levels routine, see Section 3.2.16 on page 350.
• omp_get_max_active_levels routine, see Section 3.2.17 on page 351.
• OMP_MAX_ACTIVE_LEVELS environment variable, see Section 6.8 on page 608.
3.2.22 omp_in_final
Summary
The omp_in_final routine returns true if the routine is executed in a final task region; otherwise, it returns false.
Format
C / C++
int omp_in_final(void);
C / C++
Fortran
logical function omp_in_final()
Fortran
Binding
The binding task set for an omp_in_final region is the generating task.
Effect
omp_in_final returns true if the enclosing task region is final. Otherwise, it returns false.
Cross References
• task construct, see Section 2.10.1 on page 135.
3.2.23 omp_get_proc_bind
Summary
The omp_get_proc_bind routine returns the thread affinity policy to be used for the
subsequent nested parallel regions that do not specify a proc_bind clause.
Format
C / C++
omp_proc_bind_t omp_get_proc_bind(void);
C / C++
Fortran
integer (kind=omp_proc_bind_kind) function omp_get_proc_bind()
Fortran
Constraints on Arguments
The value returned by this routine must be one of the valid affinity policy kinds. The C/C++ header
file (omp.h) and the Fortran include file (omp_lib.h) and/or Fortran 90 module file (omp_lib)
define the valid constants. The valid constants must include the following:
C / C++
typedef enum omp_proc_bind_t {
  omp_proc_bind_false = 0,
  omp_proc_bind_true = 1,
  omp_proc_bind_master = 2,
  omp_proc_bind_close = 3,
  omp_proc_bind_spread = 4
} omp_proc_bind_t;
C / C++
Fortran
integer (kind=omp_proc_bind_kind), &
  parameter :: omp_proc_bind_false = 0
integer (kind=omp_proc_bind_kind), &
  parameter :: omp_proc_bind_true = 1
integer (kind=omp_proc_bind_kind), &
  parameter :: omp_proc_bind_master = 2
integer (kind=omp_proc_bind_kind), &
  parameter :: omp_proc_bind_close = 3
integer (kind=omp_proc_bind_kind), &
  parameter :: omp_proc_bind_spread = 4
Fortran
Binding
The binding task set for an omp_get_proc_bind region is the generating task.
Effect
The effect of this routine is to return the value of the first element of the bind-var ICV of the current
task. See Section 2.6.2 on page 80 for the rules that govern the thread affinity policy.
Cross References
• bind-var ICV, see Section 2.5 on page 63.
• Controlling OpenMP thread affinity, see Section 2.6.2 on page 80.
• omp_get_num_places routine, see Section 3.2.24 on page 358.
• OMP_PROC_BIND environment variable, see Section 6.4 on page 604.
• OMP_PLACES environment variable, see Section 6.5 on page 605.
3.2.24 omp_get_num_places
Summary
The omp_get_num_places routine returns the number of places available to the execution environment in the place list.
Format
C / C++
int omp_get_num_places(void);
C / C++
Fortran
integer function omp_get_num_places()
Fortran
Binding
The binding thread set for an omp_get_num_places region is all threads on a device. The
effect of executing this routine is not related to any specific region corresponding to any construct
or API routine.
Effect
The omp_get_num_places routine returns the number of places in the place list. This value is
equivalent to the number of places in the place-partition-var ICV in the execution environment of
the initial task.
Cross References
• place-partition-var ICV, see Section 2.5 on page 63.
• Controlling OpenMP thread affinity, see Section 2.6.2 on page 80.
• omp_get_place_num routine, see Section 3.2.27 on page 362.
• OMP_PLACES environment variable, see Section 6.5 on page 605.
3.2.25 omp_get_place_num_procs
Summary
The omp_get_place_num_procs routine returns the number of processors available to the
execution environment in the specified place.
Format
C / C++
int omp_get_place_num_procs(int place_num);
C / C++
Fortran
integer function omp_get_place_num_procs(place_num)
integer place_num
Fortran
Binding
The binding thread set for an omp_get_place_num_procs region is all threads on a device.
The effect of executing this routine is not related to any specific region corresponding to any
construct or API routine.
Effect
The omp_get_place_num_procs routine returns the number of processors associated with
the place numbered place_num. The routine returns zero when place_num is negative, or is greater
than or equal to the value returned by omp_get_num_places().
Cross References
• place-partition-var ICV, see Section 2.5 on page 63.
• Controlling OpenMP thread affinity, see Section 2.6.2 on page 80.
• omp_get_num_places routine, see Section 3.2.24 on page 358.
• omp_get_place_proc_ids routine, see Section 3.2.26 on page 360.
• OMP_PLACES environment variable, see Section 6.5 on page 605.
3.2.26 omp_get_place_proc_ids
Summary
The omp_get_place_proc_ids routine returns the numerical identifiers of the processors available to the execution environment in the specified place.
Format
C / C++
void omp_get_place_proc_ids(int place_num, int *ids);
C / C++
Fortran
subroutine omp_get_place_proc_ids(place_num, ids)
integer place_num
integer ids(*)
Fortran
Binding
The binding thread set for an omp_get_place_proc_ids region is all threads on a device.
The effect of executing this routine is not related to any specific region corresponding to any
construct or API routine.
Effect
The omp_get_place_proc_ids routine returns the numerical identifiers of each processor
associated with the place numbered place_num. The numerical identifiers are non-negative, and
their meaning is implementation defined. The numerical identifiers are returned in the array ids and
their order in the array is implementation defined. The array must be sufficiently large to contain
omp_get_place_num_procs(place_num) integers; otherwise, the behavior is unspecified.
The routine has no effect when place_num has a negative value, or a value greater than or equal to
omp_get_num_places().
Cross References
• place-partition-var ICV, see Section 2.5 on page 63.
• Controlling OpenMP thread affinity, see Section 2.6.2 on page 80.
• omp_get_num_places routine, see Section 3.2.24 on page 358.
• omp_get_place_num_procs routine, see Section 3.2.25 on page 359.
• OMP_PLACES environment variable, see Section 6.5 on page 605.
3.2.27 omp_get_place_num
Summary
The omp_get_place_num routine returns the place number of the place to which the
encountering thread is bound.
Format
C / C++
int omp_get_place_num(void);
C / C++
Fortran
integer function omp_get_place_num()
Fortran
Binding
The binding thread set for an omp_get_place_num region is the encountering thread.
Effect
When the encountering thread is bound to a place, the omp_get_place_num routine returns the
place number associated with the thread. The returned value is between 0 and one less than the
value returned by omp_get_num_places(), inclusive. When the encountering thread is not
bound to a place, the routine returns -1.
Cross References
• place-partition-var ICV, see Section 2.5 on page 63.
• Controlling OpenMP thread affinity, see Section 2.6.2 on page 80.
• omp_get_num_places routine, see Section 3.2.24 on page 358.
• OMP_PLACES environment variable, see Section 6.5 on page 605.
3.2.28 omp_get_partition_num_places
Summary
The omp_get_partition_num_places routine returns the number of places in the place partition of the innermost implicit task.
Format
C / C++
int omp_get_partition_num_places(void);
C / C++
Fortran
integer function omp_get_partition_num_places()
Fortran
Binding
The binding task set for an omp_get_partition_num_places region is the encountering
implicit task.
Effect
The omp_get_partition_num_places routine returns the number of places in the
place-partition-var ICV.
Cross References
• place-partition-var ICV, see Section 2.5 on page 63.
• Controlling OpenMP thread affinity, see Section 2.6.2 on page 80.
• omp_get_num_places routine, see Section 3.2.24 on page 358.
• OMP_PLACES environment variable, see Section 6.5 on page 605.
3.2.29 omp_get_partition_place_nums
Summary
The omp_get_partition_place_nums routine returns the list of place numbers
corresponding to the places in the place-partition-var ICV of the innermost implicit task.
Format
C / C++
void omp_get_partition_place_nums(int *place_nums);
C / C++
Fortran
subroutine omp_get_partition_place_nums(place_nums)
integer place_nums(*)
Fortran
Binding
The binding task set for an omp_get_partition_place_nums region is the encountering
implicit task.
Effect
The omp_get_partition_place_nums routine returns the list of place numbers that
correspond to the places in the place-partition-var ICV of the innermost implicit task. The array
must be sufficiently large to contain omp_get_partition_num_places() integers;
otherwise, the behavior is unspecified.
Cross References
• place-partition-var ICV, see Section 2.5 on page 63.
• Controlling OpenMP thread affinity, see Section 2.6.2 on page 80.
• omp_get_partition_num_places routine, see Section 3.2.28 on page 362.
• OMP_PLACES environment variable, see Section 6.5 on page 605.
3.2.30 omp_set_affinity_format
Summary
The omp_set_affinity_format routine sets the affinity format to be used on the device by setting the value of the affinity-format-var ICV.
Format
C / C++
void omp_set_affinity_format(const char *format);
C / C++
Fortran
subroutine omp_set_affinity_format(format)
character(len=*),intent(in) :: format
Fortran
Binding
When called from a sequential part of the program, the binding thread set for an
omp_set_affinity_format region is the encountering thread. When called from within any
parallel or teams region, the binding thread set (and binding region, if required) for the
omp_set_affinity_format region is implementation defined.
Effect
The effect of the omp_set_affinity_format routine is to copy the character string specified by
the format argument into the affinity-format-var ICV on the current device.
This routine has the described effect only when called from a sequential part of the program. When
called from within a parallel or teams region, the effect of this routine is implementation
defined.
Cross References
• Controlling OpenMP thread affinity, see Section 2.6.2 on page 80.
• omp_get_affinity_format routine, see Section 3.2.31 on page 366.
• omp_display_affinity routine, see Section 3.2.32 on page 367.
• omp_capture_affinity routine, see Section 3.2.33 on page 368.
• OMP_DISPLAY_AFFINITY environment variable, see Section 6.13 on page 612.
• OMP_AFFINITY_FORMAT environment variable, see Section 6.14 on page 613.
3.2.31 omp_get_affinity_format
Summary
The omp_get_affinity_format routine returns the value of the affinity-format-var ICV on the device.
Format
C / C++
size_t omp_get_affinity_format(char *buffer, size_t size);
C / C++
Fortran
integer function omp_get_affinity_format(buffer)
character(len=*),intent(out) :: buffer
Fortran
Binding
When called from a sequential part of the program, the binding thread set for an omp_get_affinity_format region is the encountering thread. When called from within any parallel or teams region, the binding thread set (and binding region, if required) for the omp_get_affinity_format region is implementation defined.
Effect
C / C++
The omp_get_affinity_format routine returns the number of characters in the affinity-format-var ICV on the current device, excluding the terminating null byte ('\0'), and, if size is non-zero, writes the value of the affinity-format-var ICV on the current device to buffer followed by a null byte. If the return value is larger than or equal to size, the affinity format specification is truncated, with the terminating null byte stored to buffer[size-1]. If size is zero, nothing is stored and buffer may be NULL.
C / C++
Fortran
The omp_get_affinity_format routine returns the number of characters that are required to hold the affinity-format-var ICV on the current device and writes the value of the affinity-format-var ICV on the current device to buffer. If the return value is larger than len(buffer), the affinity format specification is truncated.
Fortran
If the buffer argument does not conform to the specified format then the result is implementation defined.
Cross References
• Controlling OpenMP thread affinity, see Section 2.6.2 on page 80.
• omp_set_affinity_format routine, see Section 3.2.30 on page 364.
• omp_display_affinity routine, see Section 3.2.32 on page 367.
• omp_capture_affinity routine, see Section 3.2.33 on page 368.
• OMP_DISPLAY_AFFINITY environment variable, see Section 6.13 on page 612.
• OMP_AFFINITY_FORMAT environment variable, see Section 6.14 on page 613.
3.2.32 omp_display_affinity
Summary
The omp_display_affinity routine prints the OpenMP thread affinity information using the
format specification provided.
Format
C / C++
void omp_display_affinity(const char *format);
C / C++
Fortran
subroutine omp_display_affinity(format)
character(len=*),intent(in) :: format
Fortran
Binding
The binding thread set for an omp_display_affinity region is the encountering thread.
Effect
The omp_display_affinity routine prints the thread affinity information of the current
thread in the format specified by the format argument, followed by a new-line. If the format is
NULL (for C/C++) or a zero-length string (for Fortran and C/C++), the value of the
affinity-format-var ICV is used. If the format argument does not conform to the specified format
then the result is implementation defined.
Cross References
• Controlling OpenMP thread affinity, see Section 2.6.2 on page 80.
• omp_set_affinity_format routine, see Section 3.2.30 on page 364.
• omp_get_affinity_format routine, see Section 3.2.31 on page 366.
• omp_capture_affinity routine, see Section 3.2.33 on page 368.
• OMP_DISPLAY_AFFINITY environment variable, see Section 6.13 on page 612.
• OMP_AFFINITY_FORMAT environment variable, see Section 6.14 on page 613.
3.2.33 omp_capture_affinity
Summary
The omp_capture_affinity routine prints the OpenMP thread affinity information into a buffer using the format specification provided.
Format
C / C++
size_t omp_capture_affinity(
  char *buffer, size_t size,
  const char *format
);
C / C++
Fortran
integer function omp_capture_affinity(buffer,format)
character(len=*),intent(out) :: buffer
character(len=*),intent(in) :: format
Fortran
Binding
The binding thread set for an omp_capture_affinity region is the encountering thread.
Effect
C / C++
The omp_capture_affinity routine returns the number of characters in the entire thread
affinity information string excluding the terminating null byte ('\0'), and, if size is non-zero, writes
the thread affinity information of the current thread in the format specified by the format argument
into the character string buffer followed by a null byte. If the return value is larger than or equal to
size, the thread affinity information string is truncated, with the terminating null byte stored to
buffer[size-1]. If size is zero, nothing is stored and buffer may be NULL. If the format is NULL or
a zero-length string, the value of the affinity-format-var ICV is used.
C / C++
Fortran
The omp_capture_affinity routine returns the number of characters required to hold the
entire thread affinity information string and prints the thread affinity information of the current
thread into the character string buffer with the size of len(buffer) in the format specified by
the format argument. If the format is a zero-length string, the value of the affinity-format-var ICV
is used. If the return value is larger than len(buffer), the thread affinity information string is
truncated.
Fortran
If the format argument does not conform to the specified format then the result is implementation
defined.
Cross References
• Controlling OpenMP thread affinity, see Section 2.6.2 on page 80.
• omp_set_affinity_format routine, see Section 3.2.30 on page 364.
• omp_get_affinity_format routine, see Section 3.2.31 on page 366.
• omp_display_affinity routine, see Section 3.2.32 on page 367.
• OMP_DISPLAY_AFFINITY environment variable, see Section 6.13 on page 612.
• OMP_AFFINITY_FORMAT environment variable, see Section 6.14 on page 613.
3.2.34 omp_set_default_device
Summary
The omp_set_default_device routine controls the default target device by assigning the
value of the default-device-var ICV.
Format
C / C++
void omp_set_default_device(int device_num);
C / C++
Fortran
subroutine omp_set_default_device(device_num)
integer device_num
Fortran
Binding
The binding task set for an omp_set_default_device region is the generating task.
Effect
The effect of this routine is to set the value of the default-device-var ICV of the current task to the
value specified in the argument. When called from within a target region the effect of this
routine is unspecified.
Cross References
• default-device-var, see Section 2.5 on page 63.
• target construct, see Section 2.12.5 on page 170.
• omp_get_default_device, see Section 3.2.35 on page 370.
• OMP_DEFAULT_DEVICE environment variable, see Section 6.15 on page 615.
3.2.35 omp_get_default_device
Summary
The omp_get_default_device routine returns the default target device.
Format
C / C++
int omp_get_default_device(void);
C / C++
Fortran
integer function omp_get_default_device()
Fortran
Binding
The binding task set for an omp_get_default_device region is the generating task.
Effect
The omp_get_default_device routine returns the value of the default-device-var ICV of the
current task. When called from within a target region the effect of this routine is unspecified.
Cross References
• default-device-var, see Section 2.5 on page 63.
• target construct, see Section 2.12.5 on page 170.
• omp_set_default_device, see Section 3.2.34 on page 369.
• OMP_DEFAULT_DEVICE environment variable, see Section 6.15 on page 615.
3.2.36 omp_get_num_devices
Summary
The omp_get_num_devices routine returns the number of target devices.
Format
C / C++
int omp_get_num_devices(void);
C / C++
Fortran
integer function omp_get_num_devices()
Fortran
Binding
The binding task set for an omp_get_num_devices region is the generating task.
Effect
The omp_get_num_devices routine returns the number of available target devices. When
called from within a target region the effect of this routine is unspecified.
Cross References
• target construct, see Section 2.12.5 on page 170.
• omp_get_default_device, see Section 3.2.35 on page 370.
• omp_get_device_num, see Section 3.2.37 on page 372.
3.2.37 omp_get_device_num
Summary
The omp_get_device_num routine returns the device number of the device on which the calling thread is executing.
Format
C / C++
int omp_get_device_num(void);
C / C++
Fortran
integer function omp_get_device_num()
Fortran
Binding
The binding task set for an omp_get_device_num region is the generating task.
Effect
The omp_get_device_num routine returns the device number of the device on which the calling thread is executing. When called on the host device, it will return the same value as the omp_get_initial_device routine.
1 Cross References
2 • target construct, see Section 2.12.5 on page 170
3 • omp_get_default_device, see Section 3.2.35 on page 370.
4 • omp_get_num_devices, see Section 3.2.36 on page 371.
5 • omp_get_initial_device routine, see Section 3.2.41 on page 376.
6 3.2.38 omp_get_num_teams
7 Summary
8 The omp_get_num_teams routine returns the number of initial teams in the current teams
9 region.
10 Format
C / C++
11 int omp_get_num_teams(void);
C / C++
Fortran
12 integer function omp_get_num_teams()
Fortran
13 Binding
14 The binding task set for an omp_get_num_teams region is the generating task.
15 Effect
16 The effect of this routine is to return the number of initial teams in the current teams region. The
17 routine returns 1 if it is called from outside of a teams region.
18 Cross References
19 • teams construct, see Section 2.7 on page 82.
20 • target construct, see Section 2.12.5 on page 170.
21 • omp_get_team_num routine, see Section 3.2.39 on page 374.
3.2.39 omp_get_team_num
Summary
The omp_get_team_num routine returns the initial team number of the calling thread.
Format
C / C++
int omp_get_team_num(void);
C / C++
Fortran
integer function omp_get_team_num()
Fortran
Binding
The binding task set for an omp_get_team_num region is the generating task.
Effect
The omp_get_team_num routine returns the initial team number of the calling thread. The initial team number is an integer between 0 and one less than the value returned by omp_get_num_teams(), inclusive. The routine returns 0 if it is called outside of a teams region.
Cross References
• teams construct, see Section 2.7 on page 82.
• target construct, see Section 2.12.5 on page 170
• omp_get_num_teams routine, see Section 3.2.38 on page 373.
1 3.2.40 omp_is_initial_device
2 Summary
3 The omp_is_initial_device routine returns true if the current task is executing on the host
4 device; otherwise, it returns false.
5 Format
C / C++
6 int omp_is_initial_device(void);
C / C++
Fortran
7 logical function omp_is_initial_device()
Fortran
8 Binding
9 The binding task set for an omp_is_initial_device region is the generating task.
10 Effect
11 The effect of this routine is to return true if the current task is executing on the host device;
12 otherwise, it returns false.
13 Cross References
14 • omp_get_initial_device routine, see Section 3.2.41 on page 376.
15 • Device memory routines, see Section 3.6 on page 397.
3.2.41 omp_get_initial_device
Summary
The omp_get_initial_device routine returns a device number that represents the host device.
Format
C / C++
int omp_get_initial_device(void);
C / C++
Fortran
integer function omp_get_initial_device()
Fortran
Binding
The binding task set for an omp_get_initial_device region is the generating task.
Effect
The effect of this routine is to return the device number of the host device. The value of the device number is implementation defined. When called from within a target region the effect of this routine is unspecified.
Cross References
• target construct, see Section 2.12.5 on page 170.
• omp_is_initial_device routine, see Section 3.2.40 on page 375.
• Device memory routines, see Section 3.6 on page 397.
1 3.2.42 omp_get_max_task_priority
2 Summary
3 The omp_get_max_task_priority routine returns the maximum value that can be specified
4 in the priority clause.
5 Format
C / C++
6 int omp_get_max_task_priority(void);
C / C++
Fortran
7 integer function omp_get_max_task_priority()
Fortran
8 Binding
9 The binding thread set for an omp_get_max_task_priority region is all threads on the
10 device. The effect of executing this routine is not related to any specific region that corresponds to
11 any construct or API routine.
12 Effect
13 The omp_get_max_task_priority routine returns the value of the max-task-priority-var
14 ICV, which determines the maximum value that can be specified in the priority clause.
15 Cross References
16 • max-task-priority-var, see Section 2.5 on page 63.
17 • task construct, see Section 2.10.1 on page 135.
3.2.43 omp_pause_resource
Summary
The omp_pause_resource routine allows the runtime to relinquish resources used by OpenMP on the specified device.
Format
C / C++
int omp_pause_resource(
  omp_pause_resource_t kind,
  int device_num
);
C / C++
Fortran
integer function omp_pause_resource(kind, device_num)
integer (kind=omp_pause_resource_kind) kind
integer device_num
Fortran
Constraints on Arguments
The first argument passed to this routine can be one of the valid OpenMP pause kinds, or any implementation specific pause kind. The C/C++ header file (omp.h) and the Fortran include file (omp_lib.h) and/or Fortran 90 module file (omp_lib) define the valid constants. The valid constants must include the following, which can be extended with implementation specific values:
Format
C / C++
typedef enum omp_pause_resource_t {
  omp_pause_soft = 1,
  omp_pause_hard = 2
} omp_pause_resource_t;
C / C++
Fortran
integer (kind=omp_pause_resource_kind), parameter :: &
  omp_pause_soft = 1
integer (kind=omp_pause_resource_kind), parameter :: &
  omp_pause_hard = 2
Fortran
1 The second argument passed to this routine indicates the device that will be paused. The
2 device_num parameter must be greater than or equal to zero and less than the result of
3 omp_get_num_devices() or equal to the result of a call to
4 omp_get_initial_device().
5 Binding
6 The binding task set for an omp_pause_resource region is the whole program.
7 Effect
8 The omp_pause_resource routine allows the runtime to relinquish resources used by OpenMP
9 on the specified device.
10 If successful, the omp_pause_hard value results in a hard pause for which the OpenMP state is
11 not guaranteed to persist across the omp_pause_resource call. A hard pause may relinquish
12 any data allocated by OpenMP on a given device, including data allocated by memory routines for
13 that device as well as data present on the device as a result of a declare target or target
14 data construct. A hard pause may also relinquish any data associated with a threadprivate
15 directive. When relinquished and when applicable, base language appropriate
16 deallocation/finalization is performed. When relinquished and when applicable, mapped data on a
17 device will not be copied back from the device to the host.
18 If successful, the omp_pause_soft value results in a soft pause for which the OpenMP state is
19 guaranteed to persist across the call, with the exception of any data associated with a
20 threadprivate directive, which may be relinquished across the call. When relinquished and
21 when applicable, base language appropriate deallocation/finalization is performed.
22
23 Note – A hard pause may relinquish more resources, but may resume processing OpenMP regions
24 more slowly. A soft pause allows OpenMP regions to restart more quickly, but may relinquish fewer
25 resources. An OpenMP implementation will reclaim resources as needed for OpenMP regions
26 encountered after the omp_pause_resource region. Since a hard pause may unmap data on the
27 specified device, appropriate data mapping is required before using data on the specified device
28 after the omp_pause_resource region.
29
30 The routine returns zero in case of success, and nonzero otherwise.
31 Tool Callbacks
32 If the tool is not allowed to interact with the specified device after encountering this call, then the
33 runtime must call the tool finalizer for that device.
Restrictions
The omp_pause_resource routine has the following restrictions:
• The omp_pause_resource region may not be nested in any explicit OpenMP region.
• The routine may only be called when all explicit tasks have finalized execution. Calling the routine in any other circumstances may result in unspecified behavior.
6 Cross References
7 • target construct, see Section 2.12.5 on page 170
8 • declare target directive, see Section 2.12.7 on page 180
9 • threadprivate directives, see Section 2.19.2 on page 274.
10 • omp_get_num_devices, see Section 3.2.36 on page 371.
11 • omp_get_initial_device routine, see Section 3.2.41 on page 376.
12 • To pause resources on all devices at once, see Section 3.2.44 on page 380.
3.2.44 omp_pause_resource_all
Summary
The omp_pause_resource_all routine allows the runtime to relinquish resources used by OpenMP on all devices.
Format
C / C++
int omp_pause_resource_all(omp_pause_resource_t kind);
C / C++
Fortran
integer function omp_pause_resource_all(kind)
integer (kind=omp_pause_resource_kind) kind
Fortran
Binding
The binding task set for an omp_pause_resource_all region is the whole program.
Effect
The omp_pause_resource_all routine allows the runtime to relinquish resources used by OpenMP on all devices. It is equivalent to calling the omp_pause_resource routine once for each available device, including the host device.
The argument kind passed to this routine can be one of the valid OpenMP pause kinds as defined in Section 3.2.43 on page 378, or any implementation specific pause kind.
Tool Callbacks
If the tool is not allowed to interact with a given device after encountering this call, then the runtime must call the tool finalizer for that device.
Restrictions
The omp_pause_resource_all routine has the following restrictions:
• The omp_pause_resource_all region may not be nested in any explicit OpenMP region.
• The routine may only be called when all explicit tasks have finalized execution. Calling the routine in any other circumstances may result in unspecified behavior.
Cross References
• target construct, see Section 2.12.5 on page 170
• declare target directive, see Section 2.12.7 on page 180
• omp_get_num_devices, see Section 3.2.36 on page 371.
• omp_get_initial_device routine, see Section 3.2.41 on page 376.
• To pause resources on a specific device only, see Section 3.2.43 on page 378.
3.3 Lock Routines
The OpenMP runtime library includes a set of general-purpose lock routines that can be used for synchronization. These general-purpose lock routines operate on OpenMP locks that are represented by OpenMP lock variables. OpenMP lock variables must be accessed only through the routines described in this section; programs that otherwise access OpenMP lock variables are non-conforming.
An OpenMP lock can be in one of the following states: uninitialized; unlocked; or locked. If a lock is in the unlocked state, a task can set the lock, which changes its state to locked. The task that sets the lock is then said to own the lock. A task that owns a lock can unset that lock, returning it to the unlocked state. A program in which a task unsets a lock that is owned by another task is non-conforming.
Two types of locks are supported: simple locks and nestable locks. A nestable lock can be set multiple times by the same task before being unset; a simple lock cannot be set if it is already owned by the task trying to set it. Simple lock variables are associated with simple locks and can only be passed to simple lock routines. Nestable lock variables are associated with nestable locks and can only be passed to nestable lock routines.
Each type of lock can also have a synchronization hint that contains information about the intended usage of the lock by the application code. The effect of the hint is implementation defined. An OpenMP implementation can use this hint to select a usage-specific lock, but hints do not change the mutual exclusion semantics of locks. A conforming implementation can safely ignore the hint.
Constraints on the state and ownership of the lock accessed by each of the lock routines are described with the routine. If these constraints are not met, the behavior of the routine is unspecified.
The OpenMP lock routines access a lock variable such that they always read and update the most current value of the lock variable. It is not necessary for an OpenMP program to include explicit flush directives to ensure that the lock variable’s value is consistent among different tasks.
Binding
The binding thread set for all lock routine regions is all threads in the contention group. As a consequence, for each OpenMP lock, the lock routine effects relate to all tasks that call the routines, without regard to which teams the threads in the contention group that are executing the tasks belong.
Simple Lock Routines
C / C++
The type omp_lock_t represents a simple lock. For the following routines, a simple lock variable must be of omp_lock_t type. All simple lock routines require an argument that is a pointer to a variable of type omp_lock_t.
C / C++ Fortran
For the following routines, a simple lock variable must be an integer variable of kind=omp_lock_kind.
Fortran
1 The simple lock routines are as follows:
2 • The omp_init_lock routine initializes a simple lock;
3 • The omp_init_lock_with_hint routine initializes a simple lock and attaches a hint to it;
4 • The omp_destroy_lock routine uninitializes a simple lock;
5 • The omp_set_lock routine waits until a simple lock is available and then sets it;
6 • The omp_unset_lock routine unsets a simple lock; and
7 • The omp_test_lock routine tests a simple lock and sets it if it is available.
8 Nestable Lock Routines
C / C++
9 The type omp_nest_lock_t represents a nestable lock. For the following routines, a nestable
10 lock variable must be of omp_nest_lock_t type. All nestable lock routines require an
11 argument that is a pointer to a variable of type omp_nest_lock_t.
C / C++ Fortran
12 For the following routines, a nestable lock variable must be an integer variable of
13 kind=omp_nest_lock_kind.
Fortran
14 The nestable lock routines are as follows:
• The omp_init_nest_lock routine initializes a nestable lock;
• The omp_init_nest_lock_with_hint routine initializes a nestable lock and attaches a hint to it;
• The omp_destroy_nest_lock routine uninitializes a nestable lock;
• The omp_set_nest_lock routine waits until a nestable lock is available and then sets it;
• The omp_unset_nest_lock routine unsets a nestable lock; and
• The omp_test_nest_lock routine tests a nestable lock and sets it if it is available.
22 Restrictions
23 OpenMP lock routines have the following restriction:
24 • The use of the same OpenMP lock in different contention groups results in unspecified behavior.
3.3.1 omp_init_lock and omp_init_nest_lock
Summary
These routines initialize an OpenMP lock without a hint.
Format
C / C++
void omp_init_lock(omp_lock_t *lock);
void omp_init_nest_lock(omp_nest_lock_t *lock);
C / C++
Fortran
subroutine omp_init_lock(svar)
integer (kind=omp_lock_kind) svar
subroutine omp_init_nest_lock(nvar)
integer (kind=omp_nest_lock_kind) nvar
Fortran
Constraints on Arguments
A program that accesses a lock that is not in the uninitialized state through either routine is non-conforming.
Effect
The effect of these routines is to initialize the lock to the unlocked state; that is, no task owns the lock. In addition, the nesting count for a nestable lock is set to zero.
Execution Model Events
The lock-init event occurs in a thread that executes an omp_init_lock region after initialization of the lock, but before it finishes the region. The nest-lock-init event occurs in a thread that executes an omp_init_nest_lock region after initialization of the lock, but before it finishes the region.
Tool Callbacks
A thread dispatches a registered ompt_callback_lock_init callback with omp_sync_hint_none as the hint argument and ompt_mutex_lock as the kind argument for each occurrence of a lock-init event in that thread. Similarly, a thread dispatches a registered ompt_callback_lock_init callback with omp_sync_hint_none as the hint argument and ompt_mutex_nest_lock as the kind argument for each occurrence of a nest-lock-init event in that thread. These callbacks have the type signature ompt_callback_mutex_acquire_t and occur in the task that encounters the routine.
Cross References
• ompt_callback_mutex_acquire_t, see Section 4.5.2.14 on page 476.
3.3.2 omp_init_lock_with_hint and omp_init_nest_lock_with_hint
Summary
These routines initialize an OpenMP lock with a hint. The effect of the hint is implementation-defined. The OpenMP implementation can ignore the hint without changing program semantics.
Format
C / C++
void omp_init_lock_with_hint(
  omp_lock_t *lock,
  omp_sync_hint_t hint
);
void omp_init_nest_lock_with_hint(
  omp_nest_lock_t *lock,
  omp_sync_hint_t hint
);
C / C++
Fortran
subroutine omp_init_lock_with_hint(svar, hint)
integer (kind=omp_lock_kind) svar
integer (kind=omp_sync_hint_kind) hint
subroutine omp_init_nest_lock_with_hint(nvar, hint)
integer (kind=omp_nest_lock_kind) nvar
integer (kind=omp_sync_hint_kind) hint
Fortran
Constraints on Arguments
A program that accesses a lock that is not in the uninitialized state through either routine is non-conforming.
The second argument passed to these routines (hint) is a hint as described in Section 2.17.12 on page 260.
Effect
The effect of these routines is to initialize the lock to the unlocked state and, optionally, to choose a specific lock implementation based on the hint. After initialization no task owns the lock. In addition, the nesting count for a nestable lock is set to zero.
Execution Model Events
The lock-init event occurs in a thread that executes an omp_init_lock_with_hint region after initialization of the lock, but before it finishes the region. The nest-lock-init event occurs in a thread that executes an omp_init_nest_lock_with_hint region after initialization of the lock, but before it finishes the region.
Tool Callbacks
A thread dispatches a registered ompt_callback_lock_init callback with the same value for its hint argument as the hint argument of the call to omp_init_lock_with_hint and ompt_mutex_lock as the kind argument for each occurrence of a lock-init event in that thread. Similarly, a thread dispatches a registered ompt_callback_lock_init callback with the same value for its hint argument as the hint argument of the call to omp_init_nest_lock_with_hint and ompt_mutex_nest_lock as the kind argument for each occurrence of a nest-lock-init event in that thread. These callbacks have the type signature ompt_callback_mutex_acquire_t and occur in the task that encounters the routine.
Cross References
• Synchronization Hints, see Section 2.17.12 on page 260.
• ompt_callback_mutex_acquire_t, see Section 4.5.2.14 on page 476.
3.3.3 omp_destroy_lock and omp_destroy_nest_lock
Summary
These routines ensure that the OpenMP lock is uninitialized.
Format
C / C++
void omp_destroy_lock(omp_lock_t *lock);
void omp_destroy_nest_lock(omp_nest_lock_t *lock);
C / C++
Fortran
subroutine omp_destroy_lock(svar)
integer (kind=omp_lock_kind) svar
subroutine omp_destroy_nest_lock(nvar)
integer (kind=omp_nest_lock_kind) nvar
Fortran
Constraints on Arguments
A program that accesses a lock that is not in the unlocked state through either routine is non-conforming.
Effect
The effect of these routines is to change the state of the lock to uninitialized.
Execution Model Events
The lock-destroy event occurs in a thread that executes an omp_destroy_lock region before it finishes the region. The nest-lock-destroy event occurs in a thread that executes an omp_destroy_nest_lock region before it finishes the region.
Tool Callbacks
A thread dispatches a registered ompt_callback_lock_destroy callback with ompt_mutex_lock as the kind argument for each occurrence of a lock-destroy event in that thread. Similarly, a thread dispatches a registered ompt_callback_lock_destroy callback with ompt_mutex_nest_lock as the kind argument for each occurrence of a nest-lock-destroy event in that thread. These callbacks have the type signature ompt_callback_mutex_t and occur in the task that encounters the routine.
Cross References
• ompt_callback_mutex_t, see Section 4.5.2.15 on page 477.
3.3.4 omp_set_lock and omp_set_nest_lock
Summary
These routines provide a means of setting an OpenMP lock. The calling task region behaves as if it was suspended until the lock can be set by this task.
Format
C / C++
void omp_set_lock(omp_lock_t *lock);
void omp_set_nest_lock(omp_nest_lock_t *lock);
C / C++
Fortran
subroutine omp_set_lock(svar)
integer (kind=omp_lock_kind) svar
subroutine omp_set_nest_lock(nvar)
integer (kind=omp_nest_lock_kind) nvar
Fortran
Constraints on Arguments
A program that accesses a lock that is in the uninitialized state through either routine is non-conforming. A simple lock accessed by omp_set_lock that is in the locked state must not be owned by the task that contains the call or deadlock will result.
1 Effect
2 Each of these routines has an effect equivalent to suspension of the task that is executing the routine
3 until the specified lock is available.
4
5 Note – The semantics of these routines is specified as if they serialize execution of the region
6 guarded by the lock. However, implementations may implement them in other ways provided that
7 the isolation properties are respected so that the actual execution delivers a result that could arise
8 from some serialization.
9
10 A simple lock is available if it is unlocked. Ownership of the lock is granted to the task that
11 executes the routine.
12 A nestable lock is available if it is unlocked or if it is already owned by the task that executes the
13 routine. The task that executes the routine is granted, or retains, ownership of the lock, and the
14 nesting count for the lock is incremented.
15 Execution Model Events
16 The lock-acquire event occurs in a thread that executes an omp_set_lock region before the
17 associated lock is requested. The nest-lock-acquire event occurs in a thread that executes an
18 omp_set_nest_lock region before the associated lock is requested.
19 The lock-acquired event occurs in a thread that executes an omp_set_lock region after it
20 acquires the associated lock but before it finishes the region. The nest-lock-acquired event occurs in
21 a thread that executes an omp_set_nest_lock region if the thread did not already own the
22 lock, after it acquires the associated lock but before it finishes the region.
23 The nest-lock-owned event occurs in a thread when it already owns the lock and executes an
24 omp_set_nest_lock region. The event occurs after the nesting count is incremented but
25 before the thread finishes the region.
26 Tool Callbacks
27 A thread dispatches a registered ompt_callback_mutex_acquire callback for each
28 occurrence of a lock-acquire or nest-lock-acquire event in that thread. This callback has the type
29 signature ompt_callback_mutex_acquire_t.
30 A thread dispatches a registered ompt_callback_mutex_acquired callback for each
31 occurrence of a lock-acquired or nest-lock-acquired event in that thread. This callback has the type
32 signature ompt_callback_mutex_t.
33 A thread dispatches a registered ompt_callback_nest_lock callback with
34 ompt_scope_begin as its endpoint argument for each occurrence of a nest-lock-owned event in
35 that thread. This callback has the type signature ompt_callback_nest_lock_t.
The above callbacks occur in the task that encounters the lock function. The kind argument of these callbacks is ompt_mutex_lock when the events arise from an omp_set_lock region while it is ompt_mutex_nest_lock when the events arise from an omp_set_nest_lock region.
Cross References
• ompt_callback_mutex_acquire_t, see Section 4.5.2.14 on page 476.
• ompt_callback_mutex_t, see Section 4.5.2.15 on page 477.
• ompt_callback_nest_lock_t, see Section 4.5.2.16 on page 479.
3.3.5 omp_unset_lock and omp_unset_nest_lock
Summary
These routines provide the means of unsetting an OpenMP lock.
Format
C / C++
void omp_unset_lock(omp_lock_t *lock);
void omp_unset_nest_lock(omp_nest_lock_t *lock);
C / C++
Fortran
subroutine omp_unset_lock(svar)
integer (kind=omp_lock_kind) svar
subroutine omp_unset_nest_lock(nvar)
integer (kind=omp_nest_lock_kind) nvar
Fortran
Constraints on Arguments
A program that accesses a lock that is not in the locked state or that is not owned by the task that contains the call through either routine is non-conforming.
1 Effect
2 For a simple lock, the omp_unset_lock routine causes the lock to become unlocked.
3 For a nestable lock, the omp_unset_nest_lock routine decrements the nesting count, and
4 causes the lock to become unlocked if the resulting nesting count is zero.
5 For either routine, if the lock becomes unlocked, and if one or more task regions were effectively
6 suspended because the lock was unavailable, the effect is that one task is chosen and given
7 ownership of the lock.
8 Execution Model Events
9 The lock-release event occurs in a thread that executes an omp_unset_lock region after it
10 releases the associated lock but before it finishes the region. The nest-lock-release event occurs in a
11 thread that executes an omp_unset_nest_lock region after it releases the associated lock but
12 before it finishes the region.
13 The nest-lock-held event occurs in a thread that executes an omp_unset_nest_lock region
14 before it finishes the region when the thread still owns the lock after the nesting count is
15 decremented.
16 Tool Callbacks
17 A thread dispatches a registered ompt_callback_mutex_released callback with
18 ompt_mutex_lock as the kind argument for each occurrence of a lock-release event in that
19 thread. Similarly, a thread dispatches a registered ompt_callback_mutex_released
20 callback with ompt_mutex_nest_lock as the kind argument for each occurrence of a
21 nest-lock-release event in that thread. These callbacks have the type signature
22 ompt_callback_mutex_t and occur in the task that encounters the routine.
23 A thread dispatches a registered ompt_callback_nest_lock callback with
24 ompt_scope_end as its endpoint argument for each occurrence of a nest-lock-held event in that
25 thread. This callback has the type signature ompt_callback_nest_lock_t.
26 Cross References
27 • ompt_callback_mutex_t, see Section 4.5.2.15 on page 477.
28 • ompt_callback_nest_lock_t, see Section 4.5.2.16 on page 479.
3.3.6 omp_test_lock and omp_test_nest_lock
Summary
These routines attempt to set an OpenMP lock but do not suspend execution of the task that executes the routine.
Format
C / C++
int omp_test_lock(omp_lock_t *lock);
int omp_test_nest_lock(omp_nest_lock_t *lock);
C / C++
Fortran
logical function omp_test_lock(svar)
integer (kind=omp_lock_kind) svar
integer function omp_test_nest_lock(nvar)
integer (kind=omp_nest_lock_kind) nvar
Fortran
Constraints on Arguments
A program that accesses a lock that is in the uninitialized state through either routine is non-conforming. The behavior is unspecified if a simple lock accessed by omp_test_lock is in the locked state and is owned by the task that contains the call.
Effect
These routines attempt to set a lock in the same manner as omp_set_lock and omp_set_nest_lock, except that they do not suspend execution of the task that executes the routine.
For a simple lock, the omp_test_lock routine returns true if the lock is successfully set; otherwise, it returns false.
For a nestable lock, the omp_test_nest_lock routine returns the new nesting count if the lock is successfully set; otherwise, it returns zero.
1 Execution Model Events
2 The lock-test event occurs in a thread that executes an omp_test_lock region before the
3 associated lock is tested. The nest-lock-test event occurs in a thread that executes an
4 omp_test_nest_lock region before the associated lock is tested.
5 The lock-test-acquired event occurs in a thread that executes an omp_test_lock region before it
6 finishes the region if the associated lock was acquired. The nest-lock-test-acquired event occurs in a
7 thread that executes an omp_test_nest_lock region before it finishes the region if the
8 associated lock was acquired and the thread did not already own the lock.
9 The nest-lock-owned event occurs in a thread that executes an omp_test_nest_lock region
10 before it finishes the region after the nesting count is incremented if the thread already owned the
11 lock.
12 Tool Callbacks
13 A thread dispatches a registered ompt_callback_mutex_acquire callback for each
14 occurrence of a lock-test or nest-lock-test event in that thread. This callback has the type signature
15 ompt_callback_mutex_acquire_t.
16 A thread dispatches a registered ompt_callback_mutex_acquired callback for each
17 occurrence of a lock-test-acquired or nest-lock-test-acquired event in that thread. This callback has
18 the type signature ompt_callback_mutex_t.
19 A thread dispatches a registered ompt_callback_nest_lock callback with
20 ompt_scope_begin as its endpoint argument for each occurrence of a nest-lock-owned event in
21 that thread. This callback has the type signature ompt_callback_nest_lock_t.
22 The above callbacks occur in the task that encounters the lock function. The kind argument of these
23 callbacks is ompt_mutex_test_lock when the events arise from an omp_test_lock
24 region while it is ompt_mutex_test_nest_lock when the events arise from an
25 omp_test_nest_lock region.
26 Cross References
27 • ompt_callback_mutex_acquire_t, see Section 4.5.2.14 on page 476.
28 • ompt_callback_mutex_t, see Section 4.5.2.15 on page 477.
29 • ompt_callback_nest_lock_t, see Section 4.5.2.16 on page 479.
3.4 Timing Routines
This section describes routines that support a portable wall clock timer.
3.4.1 omp_get_wtime
Summary
The omp_get_wtime routine returns elapsed wall clock time in seconds.
Format
C / C++
double omp_get_wtime(void);
C / C++
Fortran
double precision function omp_get_wtime()
Fortran
Binding
The binding thread set for an omp_get_wtime region is the encountering thread. The routine’s return value is not guaranteed to be consistent across any set of threads.
Effect
The omp_get_wtime routine returns a value equal to the elapsed wall clock time in seconds since some time-in-the-past. The actual time-in-the-past is arbitrary, but it is guaranteed not to change during the execution of the application program. The time returned is a per-thread time, so it is not required to be globally consistent across all threads that participate in an application.
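As a minimal sketch of how an elapsed time might be checked against the timer resolution, the following C fragment wraps omp_get_wtime together with its companion routine omp_get_wtick (Section 3.4.2). The helper names wall_time, wall_tick and elapsed_is_resolvable, the min_ticks threshold, and the timespec_get fallback used when the file is built without OpenMP are assumptions of this sketch and are not part of the OpenMP API.

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall clock time in seconds. Because the value is a per-thread time,
   both readings of an interval should be taken on the same thread. */
double wall_time(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    /* Assumed fallback for builds without OpenMP (C11 timespec_get). */
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
#endif
}

/* Seconds between successive ticks of the timer behind wall_time(). */
double wall_tick(void)
{
#ifdef _OPENMP
    return omp_get_wtick();
#else
    return 1e-9; /* assumed nanosecond granularity of the fallback */
#endif
}

/* Returns 1 if elapsed spans at least min_ticks timer ticks, else 0. */
int elapsed_is_resolvable(double elapsed, double tick, double min_ticks)
{
    return elapsed >= min_ticks * tick;
}
```

A caller might record t0 = wall_time(), run the work to be timed, and then pass wall_time() - t0 and wall_tick() to elapsed_is_resolvable before trusting a very short measurement.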
Note – The routine is anticipated to be used to measure elapsed times as shown in the following example:
C / C++
double start;
double end;
start = omp_get_wtime();
... work to be timed ...
end = omp_get_wtime();
printf("Work took %f seconds\n", end - start);
C / C++
Fortran
DOUBLE PRECISION START, END
START = omp_get_wtime()
... work to be timed ...
END = omp_get_wtime()
PRINT *, "Work took", END - START, "seconds"
Fortran
3.4.2 omp_get_wtick
Summary
The omp_get_wtick routine returns the precision of the timer used by omp_get_wtime.
Format
C / C++
double omp_get_wtick(void);
C / C++
Fortran
double precision function omp_get_wtick()
Fortran
Binding
The binding thread set for an omp_get_wtick region is the encountering thread. The routine’s return value is not guaranteed to be consistent across any set of threads.
Effect
The omp_get_wtick routine returns a value equal to the number of seconds between successive clock ticks of the timer used by omp_get_wtime.
3.5 Event Routine
This section describes a routine that supports OpenMP event objects.
Binding
The binding thread set for all event routine regions is the encountering thread.
3.5.1 omp_fulfill_event
Summary
This routine fulfills and destroys an OpenMP event.
Format
C / C++
void omp_fulfill_event(omp_event_handle_t event);
C / C++
Fortran
subroutine omp_fulfill_event(event)
integer (kind=omp_event_handle_kind) event
Fortran
Constraints on Arguments
A program that calls this routine on an event that was already fulfilled is non-conforming. A program that calls this routine with an event handle that was not created by the detach clause is non-conforming.
Effect
The effect of this routine is to fulfill the event associated with the event handle argument. The effect of fulfilling the event will depend on how the event was created. The event is destroyed and cannot be accessed after calling this routine, and the event handle becomes unassociated with any event.
Execution Model Events
The task-fulfill event occurs in a thread that executes an omp_fulfill_event region before the event is fulfilled if the OpenMP event object was created by a detach clause on a task.
Tool Callbacks
A thread dispatches a registered ompt_callback_task_schedule callback with NULL as its next_task_data argument while the argument prior_task_data binds to the detached task for each occurrence of a task-fulfill event. If the task-fulfill event occurs before the detached task finished the execution of the associated structured-block, the callback has ompt_task_early_fulfill as its prior_task_status argument; otherwise the callback has ompt_task_late_fulfill as its prior_task_status argument. This callback has type signature ompt_callback_task_schedule_t.
Cross References
• detach clause, see Section 2.10.1 on page 135.
• ompt_callback_task_schedule_t, see Section 4.5.2.10 on page 470.
C / C++
3.6 Device Memory Routines
This section describes routines that support allocation of memory and management of pointers in the data environments of target devices.
3.6.1 omp_target_alloc
Summary
The omp_target_alloc routine allocates memory in a device data environment.
Format
void* omp_target_alloc(size_t size, int device_num);
Effect
The omp_target_alloc routine returns the device address of a storage location of size bytes. The storage location is dynamically allocated in the device data environment of the device specified by device_num, which must be greater than or equal to zero and less than the result of omp_get_num_devices() or the result of a call to omp_get_initial_device(). When called from within a target region the effect of this routine is unspecified.
The omp_target_alloc routine returns NULL if it cannot dynamically allocate the memory in the device data environment.
The device address returned by omp_target_alloc can be used in an is_device_ptr clause, Section 2.12.5 on page 170.
Unless the unified_address clause appears on a requires directive in the compilation unit, pointer arithmetic is not supported on the device address returned by omp_target_alloc.
Freeing the storage returned by omp_target_alloc with any routine other than omp_target_free results in unspecified behavior.
Execution Model Events
The target-data-allocation event occurs when a thread allocates data on a target device.
Tool Callbacks
A thread invokes a registered ompt_callback_target_data_op callback for each occurrence of a target-data-allocation event in that thread. The callback occurs in the context of the target task and has type signature ompt_callback_target_data_op_t.
Cross References
• target construct, see Section 2.12.5 on page 170
• omp_get_num_devices routine, see Section 3.2.36 on page 371
• omp_get_initial_device routine, see Section 3.2.41 on page 376
• omp_target_free routine, see Section 3.6.2 on page 399
• ompt_callback_target_data_op_t, see Section 4.5.2.25 on page 488.
3.6.2 omp_target_free
Summary
The omp_target_free routine frees the device memory allocated by the omp_target_alloc routine.
Format
void omp_target_free(void *device_ptr, int device_num);
Constraints on Arguments
A program that calls omp_target_free with a non-null pointer that does not have a value returned from omp_target_alloc is non-conforming. The device_num must be greater than or equal to zero and less than the result of omp_get_num_devices() or the result of a call to omp_get_initial_device().
Effect
The omp_target_free routine frees the memory in the device data environment associated with device_ptr. If device_ptr is NULL, the operation is ignored.
Synchronization must be inserted to ensure that all accesses to device_ptr are completed before the call to omp_target_free.
When called from within a target region the effect of this routine is unspecified.
Execution Model Events
The target-data-free event occurs when a thread frees data on a target device.
Tool Callbacks
A thread invokes a registered ompt_callback_target_data_op callback for each occurrence of a target-data-free event in that thread. The callback occurs in the context of the target task and has type signature ompt_callback_target_data_op_t.
Cross References
• target construct, see Section 2.12.5 on page 170
• omp_get_num_devices routine, see Section 3.2.36 on page 371
• omp_get_initial_device routine, see Section 3.2.41 on page 376
• omp_target_alloc routine, see Section 3.6.1 on page 397
• ompt_callback_target_data_op_t, see Section 4.5.2.25 on page 488.
3.6.3 omp_target_is_present
Summary
The omp_target_is_present routine tests whether a host pointer has corresponding storage on a given device.
Format
int omp_target_is_present(const void *ptr, int device_num);
Constraints on Arguments
The value of ptr must be a valid host pointer or NULL. The device_num must be greater than or equal to zero and less than the result of omp_get_num_devices() or the result of a call to omp_get_initial_device().
Effect
This routine returns non-zero if the specified pointer would be found present on device device_num by a map clause; otherwise, it returns zero.
When called from within a target region the effect of this routine is unspecified.
Cross References
• target construct, see Section 2.12.5 on page 170.
• map clause, see Section 2.19.7.1 on page 315.
• omp_get_num_devices routine, see Section 3.2.36 on page 371
• omp_get_initial_device routine, see Section 3.2.41 on page 376
3.6.4 omp_target_memcpy
Summary
The omp_target_memcpy routine copies memory between any combination of host and device pointers.
Format
int omp_target_memcpy(
  void *dst,
  const void *src,
  size_t length,
  size_t dst_offset,
  size_t src_offset,
  int dst_device_num,
  int src_device_num
);
Constraints on Arguments
Each device must be compatible with the device pointer specified on the same side of the copy. The dst_device_num and src_device_num must be greater than or equal to zero and less than the result of omp_get_num_devices() or equal to the result of a call to omp_get_initial_device().
Effect
length bytes of memory at offset src_offset from src in the device data environment of device src_device_num are copied to dst starting at offset dst_offset in the device data environment of device dst_device_num. The return value is zero on success and non-zero on failure. The host device and host device data environment can be referenced with the device number returned by omp_get_initial_device. This routine contains a task scheduling point.
When called from within a target region the effect of this routine is unspecified.
Execution Model Events
The target-data-op event occurs when a thread transfers data on a target device.
Tool Callbacks
A thread invokes a registered ompt_callback_target_data_op callback for each occurrence of a target-data-op event in that thread. The callback occurs in the context of the target task and has type signature ompt_callback_target_data_op_t.
Cross References
• target construct, see Section 2.12.5 on page 170.
• omp_get_initial_device routine, see Section 3.2.41 on page 376
• omp_target_alloc routine, see Section 3.6.1 on page 397.
• ompt_callback_target_data_op_t, see Section 4.5.2.25 on page 488.
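The following C sketch illustrates how omp_target_alloc, omp_target_memcpy and omp_target_free might be combined to stage data through a device buffer and read it back. The function name round_trip is hypothetical. The host-only stand-ins in the #else branch are an assumption of this sketch so that the file also builds without OpenMP; they are not how an implementation provides these routines.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Host-only stand-ins (assumption of this sketch) for builds without OpenMP. */
static int omp_get_initial_device(void) { return 0; }
static void *omp_target_alloc(size_t size, int device_num)
{
    (void)device_num;
    return malloc(size);
}
static void omp_target_free(void *device_ptr, int device_num)
{
    (void)device_num;
    free(device_ptr);
}
static int omp_target_memcpy(void *dst, const void *src, size_t length,
                             size_t dst_offset, size_t src_offset,
                             int dst_device_num, int src_device_num)
{
    (void)dst_device_num;
    (void)src_device_num;
    memcpy((char *)dst + dst_offset, (const char *)src + src_offset, length);
    return 0;
}
#endif

/* Copies n doubles from in to a buffer on device dev and back into out.
   Returns 0 on success and non-zero on failure. */
int round_trip(const double *in, double *out, size_t n, int dev)
{
    int host = omp_get_initial_device();
    size_t bytes = n * sizeof *in;
    double *d_buf = omp_target_alloc(bytes, dev);
    if (d_buf == NULL)
        return -1;                      /* allocation failed */

    /* Destination arguments come first: host to device, then device to host. */
    int rc = omp_target_memcpy(d_buf, in, bytes, 0, 0, dev, host);
    if (rc == 0)
        rc = omp_target_memcpy(out, d_buf, bytes, 0, 0, host, dev);

    omp_target_free(d_buf, dev);        /* release with the matching routine */
    return rc;
}
```

Passing the value of omp_get_initial_device() as dev exercises the same calls on the host, which can be convenient on systems without an attached target device.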
3.6.5 omp_target_memcpy_rect
Summary
The omp_target_memcpy_rect routine copies a rectangular subvolume from a multi-dimensional array to another multi-dimensional array. The copies can use any combination of host and device pointers.
Format
int omp_target_memcpy_rect(
  void *dst,
  const void *src,
  size_t element_size,
  int num_dims,
  const size_t *volume,
  const size_t *dst_offsets,
  const size_t *src_offsets,
  const size_t *dst_dimensions,
  const size_t *src_dimensions,
  int dst_device_num,
  int src_device_num
);
Constraints on Arguments
The length of the offset and dimension arrays must be at least the value of num_dims. The dst_device_num and src_device_num must be greater than or equal to zero and less than the result of omp_get_num_devices() or equal to the result of a call to omp_get_initial_device().
The value of num_dims must be between 1 and the implementation-defined limit, which must be at least three.
Effect
This routine copies a rectangular subvolume of src, in the device data environment of device src_device_num, to dst, in the device data environment of device dst_device_num. The volume is specified in terms of the size of an element, number of dimensions, and constant arrays of length num_dims. The maximum number of dimensions supported is at least three; support for higher dimensionality is implementation defined. The volume array specifies the length, in number of elements, to copy in each dimension from src to dst. The dst_offsets (src_offsets) parameter specifies the offset, in number of elements, from the origin of dst (src) in each dimension. The dst_dimensions (src_dimensions) parameter specifies the length of each dimension of dst (src).
The routine returns zero if successful. Otherwise, it returns a non-zero value. If both dst and src are NULL pointers, the routine returns the number of dimensions supported by the implementation for the specified device numbers. The host device and host device data environment can be referenced with the device number returned by omp_get_initial_device. The routine contains a task scheduling point.
When called from within a target region the effect of this routine is unspecified.
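To make the roles of volume, the offset arrays and the dimension arrays concrete, the following host-only C sketch spells out the element indexing that a two-dimensional rectangular copy implies. It does not call omp_target_memcpy_rect. The function name copy_rect_2d is hypothetical, and row-major storage (the usual C array layout) is an assumption of this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Host-only reference for num_dims == 2: copies volume[0] rows of
   volume[1] elements each from src to dst. Offsets and dimensions are
   counted in elements, and both arrays are assumed to be row-major. */
void copy_rect_2d(void *dst, const void *src, size_t element_size,
                  const size_t volume[2],
                  const size_t dst_offsets[2], const size_t src_offsets[2],
                  const size_t dst_dimensions[2],
                  const size_t src_dimensions[2])
{
    char *d = dst;
    const char *s = src;
    for (size_t row = 0; row < volume[0]; ++row) {
        /* Linear element index of the first element copied in this row. */
        size_t d_index = (dst_offsets[0] + row) * dst_dimensions[1]
                         + dst_offsets[1];
        size_t s_index = (src_offsets[0] + row) * src_dimensions[1]
                         + src_offsets[1];
        memcpy(d + d_index * element_size,
               s + s_index * element_size,
               volume[1] * element_size);
    }
}
```

For example, with a 3 x 4 source, a 2 x 3 destination, volume {2, 2}, src_offsets {1, 1} and dst_offsets {0, 1}, the sketch copies source elements [1][1], [1][2], [2][1] and [2][2] into destination elements [0][1], [0][2], [1][1] and [1][2].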
Execution Model Events
The target-data-op event occurs when a thread transfers data on a target device.
Tool Callbacks
A thread invokes a registered ompt_callback_target_data_op callback for each occurrence of a target-data-op event in that thread. The callback occurs in the context of the target task and has type signature ompt_callback_target_data_op_t.
Cross References
• target construct, see Section 2.12.5 on page 170.
• omp_get_initial_device routine, see Section 3.2.41 on page 376
• omp_target_alloc routine, see Section 3.6.1 on page 397.
• ompt_callback_target_data_op_t, see Section 4.5.2.25 on page 488.
3.6.6 omp_target_associate_ptr
Summary
The omp_target_associate_ptr routine maps a device pointer, which may be returned from omp_target_alloc or implementation-defined runtime routines, to a host pointer.
Format
int omp_target_associate_ptr(
  const void *host_ptr,
  const void *device_ptr,
  size_t size,
  size_t device_offset,
  int device_num
);
Constraints on Arguments
The value of device_ptr must be a valid pointer to device memory for the device denoted by the value of device_num. The device_num argument must be greater than or equal to zero and less than the result of omp_get_num_devices() or equal to the result of a call to omp_get_initial_device().
Effect
The omp_target_associate_ptr routine associates a device pointer in the device data environment of device device_num with a host pointer such that when the host pointer appears in a subsequent map clause, the associated device pointer is used as the target for data motion associated with that host pointer. The device_offset parameter specifies the offset into device_ptr that is used as the base address for the device side of the mapping. The reference count of the resulting mapping will be infinite. After being successfully associated, the buffer to which the device pointer points is invalidated and accessing data directly through the device pointer results in unspecified behavior. The pointer can be retrieved for other uses by disassociating it. When called from within a target region the effect of this routine is unspecified.
The routine returns zero if successful. Otherwise it returns a non-zero value.
Only one device buffer can be associated with a given host pointer value and device number pair. Attempting to associate a second buffer will return non-zero. Associating the same pair of pointers on the same device with the same offset has no effect and returns zero. Associating pointers that share underlying storage will result in unspecified behavior. The omp_target_is_present function can be used to test whether a given host pointer has a corresponding variable in the device data environment.
Execution Model Events
The target-data-associate event occurs when a thread associates data on a target device.
Tool Callbacks
A thread invokes a registered ompt_callback_target_data_op callback for each occurrence of a target-data-associate event in that thread. The callback occurs in the context of the target task and has type signature ompt_callback_target_data_op_t.
Cross References
• target construct, see Section 2.12.5 on page 170.
• map clause, see Section 2.19.7.1 on page 315.
• omp_target_alloc routine, see Section 3.6.1 on page 397.
• omp_target_disassociate_ptr routine, see Section 3.6.7 on page 405
• ompt_callback_target_data_op_t, see Section 4.5.2.25 on page 488.
3.6.7 omp_target_disassociate_ptr
Summary
The omp_target_disassociate_ptr removes the associated pointer for a given device from a host pointer.
Format
int omp_target_disassociate_ptr(const void *ptr, int device_num);
Constraints on Arguments
The device_num must be greater than or equal to zero and less than the result of omp_get_num_devices() or equal to the result of a call to omp_get_initial_device().
Effect
The omp_target_disassociate_ptr removes the associated device data on device device_num from the presence table for host pointer ptr. A call to this routine on a pointer that is not NULL and does not have associated data on the given device results in unspecified behavior. The reference count of the mapping is reduced to zero, regardless of its current value.
When called from within a target region the effect of this routine is unspecified.
The routine returns zero if successful. Otherwise it returns a non-zero value.
After a call to omp_target_disassociate_ptr, the contents of the device buffer are invalidated.
Execution Model Events
The target-data-disassociate event occurs when a thread disassociates data on a target device.
Tool Callbacks
A thread invokes a registered ompt_callback_target_data_op callback for each occurrence of a target-data-disassociate event in that thread. The callback occurs in the context of the target task and has type signature ompt_callback_target_data_op_t.
Cross References
• target construct, see Section 2.12.5 on page 170
• omp_target_associate_ptr routine, see Section 3.6.6 on page 403
• ompt_callback_target_data_op_t, see Section 4.5.2.25 on page 488.
C / C++
3.7 Memory Management Routines
This section describes routines that support memory management on the current device.
Instances of memory management types must be accessed only through the routines described in this section; programs that otherwise access instances of these types are non-conforming.
3.7.1 Memory Management Types
The following type definitions are used by the memory management routines:
C / C++
typedef enum omp_alloctrait_key_t {
  omp_atk_sync_hint = 1,
  omp_atk_alignment = 2,
  omp_atk_access = 3,
  omp_atk_pool_size = 4,
  omp_atk_fallback = 5,
  omp_atk_fb_data = 6,
  omp_atk_pinned = 7,
  omp_atk_partition = 8
} omp_alloctrait_key_t;
typedef enum omp_alloctrait_value_t {
  omp_atv_false = 0,
  omp_atv_true = 1,
  omp_atv_default = 2,
  omp_atv_contended = 3,
  omp_atv_uncontended = 4,
  omp_atv_sequential = 5,
  omp_atv_private = 6,
  omp_atv_all = 7,
  omp_atv_thread = 8,
  omp_atv_pteam = 9,
  omp_atv_cgroup = 10,
  omp_atv_default_mem_fb = 11,
  omp_atv_null_fb = 12,
  omp_atv_abort_fb = 13,
  omp_atv_allocator_fb = 14,
  omp_atv_environment = 15,
  omp_atv_nearest = 16,
  omp_atv_blocked = 17,
  omp_atv_interleaved = 18
} omp_alloctrait_value_t;
typedef struct omp_alloctrait_t {
  omp_alloctrait_key_t key;
  omp_uintptr_t value;
} omp_alloctrait_t;
C / C++
Fortran
integer(kind=omp_alloctrait_key_kind), &
parameter :: omp_atk_sync_hint = 1
integer(kind=omp_alloctrait_key_kind), &
parameter :: omp_atk_alignment = 2
integer(kind=omp_alloctrait_key_kind), &
parameter :: omp_atk_access = 3
integer(kind=omp_alloctrait_key_kind), &
parameter :: omp_atk_pool_size = 4
integer(kind=omp_alloctrait_key_kind), &
parameter :: omp_atk_fallback = 5
integer(kind=omp_alloctrait_key_kind), &
parameter :: omp_atk_fb_data = 6
integer(kind=omp_alloctrait_key_kind), &
parameter :: omp_atk_pinned = 7
integer(kind=omp_alloctrait_key_kind), &
parameter :: omp_atk_partition = 8
integer(kind=omp_alloctrait_val_kind), &
parameter :: omp_atv_false = 0
integer(kind=omp_alloctrait_val_kind), &
parameter :: omp_atv_true = 1
integer(kind=omp_alloctrait_val_kind), &
parameter :: omp_atv_default = 2
integer(kind=omp_alloctrait_val_kind), &
parameter :: omp_atv_contended = 3
integer(kind=omp_alloctrait_val_kind), &
parameter :: omp_atv_uncontended = 4
integer(kind=omp_alloctrait_val_kind), &
parameter :: omp_atv_sequential = 5
integer(kind=omp_alloctrait_val_kind), &
parameter :: omp_atv_private = 6
integer(kind=omp_alloctrait_val_kind), &
parameter :: omp_atv_all = 7
integer(kind=omp_alloctrait_val_kind), &
parameter :: omp_atv_thread = 8
integer(kind=omp_alloctrait_val_kind), &
parameter :: omp_atv_pteam = 9
integer(kind=omp_alloctrait_val_kind), &
parameter :: omp_atv_cgroup = 10
integer(kind=omp_alloctrait_val_kind), &
parameter :: omp_atv_default_mem_fb = 11
integer(kind=omp_alloctrait_val_kind), &
parameter :: omp_atv_null_fb = 12
integer(kind=omp_alloctrait_val_kind), &
parameter :: omp_atv_abort_fb = 13
integer(kind=omp_alloctrait_val_kind), &
parameter :: omp_atv_allocator_fb = 14
integer(kind=omp_alloctrait_val_kind), &
parameter :: omp_atv_environment = 15
integer(kind=omp_alloctrait_val_kind), &
parameter :: omp_atv_nearest = 16
integer(kind=omp_alloctrait_val_kind), &
parameter :: omp_atv_blocked = 17
integer(kind=omp_alloctrait_val_kind), &
parameter :: omp_atv_interleaved = 18
type omp_alloctrait
integer(kind=omp_alloctrait_key_kind) key
integer(kind=omp_alloctrait_val_kind) value
end type omp_alloctrait
integer(kind=omp_allocator_handle_kind), &
parameter :: omp_null_allocator = 0
Fortran
3.7.2 omp_init_allocator
Summary
The omp_init_allocator routine initializes an allocator and associates it with a memory space.
Format
C / C++
omp_allocator_handle_t omp_init_allocator (
  omp_memspace_handle_t memspace,
  int ntraits,
  const omp_alloctrait_t traits[]
);
C / C++
Fortran
integer(kind=omp_allocator_handle_kind) &
function omp_init_allocator ( memspace, ntraits, traits )
integer(kind=omp_memspace_handle_kind),intent(in) :: memspace
integer,intent(in) :: ntraits
type(omp_alloctrait),intent(in) :: traits(*)
Fortran
Constraints on Arguments
The memspace argument must be one of the predefined memory spaces defined in Table 2.8.
If the ntraits argument is greater than zero then the traits argument must specify at least that many traits. If it specifies fewer than ntraits traits the behavior is unspecified.
Unless a requires directive with the dynamic_allocators clause is present in the same compilation unit, using this routine in a target region results in unspecified behavior.
Binding
The binding thread set for an omp_init_allocator region is all threads on a device. The effect of executing this routine is not related to any specific region that corresponds to any construct or API routine.
Effect
The omp_init_allocator routine creates a new allocator that is associated with the memspace memory space and returns a handle to it. All allocations through the created allocator will behave according to the allocator traits specified in the traits argument. The number of traits in the traits argument is specified by the ntraits argument. Specifying the same allocator trait more than once results in unspecified behavior. The routine returns a handle for the created allocator. If the special omp_atv_default value is used for a given trait, then its value will be the default value specified in Table 2.9 for that given trait.
If memspace is omp_default_mem_space and the traits argument is an empty set this routine will always return a handle to an allocator. Otherwise if an allocator based on the requirements cannot be created then the special omp_null_allocator handle is returned.
The use of an allocator returned by this routine on a device other than the one on which it was created results in unspecified behavior.
Cross References
• Memory Spaces, see Section 2.11.1 on page 152.
• Memory Allocators, see Section 2.11.2 on page 152.
3.7.3 omp_destroy_allocator
Summary
The omp_destroy_allocator routine releases all resources used by the allocator handle.
Format
C / C++
void omp_destroy_allocator (omp_allocator_handle_t allocator);
C / C++
Fortran
subroutine omp_destroy_allocator ( allocator )
integer(kind=omp_allocator_handle_kind),intent(in) :: allocator
Fortran
Constraints on Arguments
The allocator argument must not represent a predefined memory allocator.
Unless a requires directive with the dynamic_allocators clause is present in the same compilation unit, using this routine in a target region results in unspecified behavior.
Binding
The binding thread set for an omp_destroy_allocator region is all threads on a device. The effect of executing this routine is not related to any specific region that corresponds to any construct or API routine.
Effect
The omp_destroy_allocator routine releases all resources used to implement the allocator handle. Accessing any memory allocated by the allocator after this call results in unspecified behavior.
If allocator is omp_null_allocator then this routine will have no effect.
Cross References
• Memory Allocators, see Section 2.11.2 on page 152.
3.7.4 omp_set_default_allocator
Summary
The omp_set_default_allocator routine sets the default memory allocator to be used by allocation calls, allocate directives and allocate clauses that do not specify an allocator.
Format
C / C++
void omp_set_default_allocator (omp_allocator_handle_t allocator);
C / C++
Fortran
subroutine omp_set_default_allocator ( allocator )
integer(kind=omp_allocator_handle_kind),intent(in) :: allocator
Fortran
Constraints on Arguments
The allocator argument must be a valid memory allocator handle.
Binding
The binding task set for an omp_set_default_allocator region is the binding implicit task.
Effect
The effect of this routine is to set the value of the def-allocator-var ICV of the binding implicit task to the value specified in the allocator argument.
Cross References
• def-allocator-var ICV, see Section 2.5 on page 63.
• Memory Allocators, see Section 2.11.2 on page 152.
• omp_alloc routine, see Section 3.7.6 on page 413.
3.7.5 omp_get_default_allocator
Summary
The omp_get_default_allocator routine returns a handle to the memory allocator to be used by allocation calls, allocate directives and allocate clauses that do not specify an allocator.
Format
C / C++
omp_allocator_handle_t omp_get_default_allocator (void);
C / C++
Fortran
integer(kind=omp_allocator_handle_kind)&
function omp_get_default_allocator ()
Fortran
Binding
The binding task set for an omp_get_default_allocator region is the binding implicit task.
Effect
The effect of this routine is to return the value of the def-allocator-var ICV of the binding implicit task.
Cross References
• def-allocator-var ICV, see Section 2.5 on page 63.
• Memory Allocators, see Section 2.11.2 on page 152.
• omp_alloc routine, see Section 3.7.6 on page 413.
C / C++
3.7.6 omp_alloc
Summary
The omp_alloc routine requests a memory allocation from a memory allocator.
Format
C
void *omp_alloc (size_t size, omp_allocator_handle_t allocator);
C
C++
void *omp_alloc(
  size_t size,
  omp_allocator_handle_t allocator=omp_null_allocator
);
C++
Constraints on Arguments
Unless dynamic_allocators appears on a requires directive in the same compilation unit, omp_alloc invocations that appear in target regions must not pass omp_null_allocator as the allocator argument, which must be a constant expression that evaluates to one of the predefined memory allocator values.
Effect
The omp_alloc routine requests a memory allocation of size bytes from the specified memory allocator. If the allocator argument is omp_null_allocator the memory allocator used by the routine will be the one specified by the def-allocator-var ICV of the binding implicit task. Upon success it returns a pointer to the allocated memory. Otherwise, the behavior specified by the fallback trait will be followed.
Allocated memory will be byte aligned to at least the alignment required by malloc.
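The interaction between omp_alloc and the def-allocator-var ICV can be sketched in C as follows. The function name alloc_from_default is hypothetical, omp_large_cap_mem_alloc is one of the predefined allocators described in Section 2.11.2, and the malloc-based branch is an assumed stand-in so that the file also builds without OpenMP.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#else
#include <stdlib.h>
#endif

/* Allocates and releases size bytes through the default allocator.
   Returns 1 if the allocation succeeded and 0 otherwise. */
int alloc_from_default(size_t size)
{
#ifdef _OPENMP
    /* Remember the current def-allocator-var value so it can be restored. */
    omp_allocator_handle_t saved = omp_get_default_allocator();
    omp_set_default_allocator(omp_large_cap_mem_alloc);

    /* omp_null_allocator selects the allocator held in def-allocator-var. */
    void *p = omp_alloc(size, omp_null_allocator);
    int ok = (p != NULL);
    if (ok)
        omp_free(p, omp_null_allocator); /* allocator determined automatically */

    omp_set_default_allocator(saved);
    return ok;
#else
    /* Assumed stand-in for builds without OpenMP. */
    void *p = malloc(size);
    int ok = (p != NULL);
    free(p);
    return ok;
#endif
}
```

Restoring the saved handle keeps the change local to the helper, which may matter when other code in the same implicit task relies on the previous default.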
Cross References
• Memory allocators, see Section 2.11.2 on page 152.
3.7.7 omp_free
Summary
The omp_free routine deallocates previously allocated memory.
Format
C
void omp_free (void *ptr, omp_allocator_handle_t allocator);
C
C++
void omp_free(
  void *ptr,
  omp_allocator_handle_t allocator=omp_null_allocator
);
C++
Effect
The omp_free routine deallocates the memory to which ptr points. The ptr argument must point to memory previously allocated with a memory allocator. If the allocator argument is specified it must be the memory allocator to which the allocation request was made. If the allocator argument is omp_null_allocator the implementation will determine that value automatically. Using omp_free on memory that was already deallocated or that was allocated by an allocator that has already been destroyed with omp_destroy_allocator results in unspecified behavior.
Cross References
• Memory allocators, see Section 2.11.2 on page 152.
C / C++
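The routines of this section are typically used together over the lifetime of an allocator. The following C sketch creates an allocator with two traits, allocates from it, frees the memory through the same allocator, and then destroys the allocator. The function name sum_with_scratch, the 64-byte alignment value and the choice of omp_atv_null_fb as fallback are illustrative assumptions, and the malloc-based branch is an assumed stand-in so that the file also builds without OpenMP.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#else
#include <stdlib.h>
#endif

/* Sums 0..n-1 through a scratch buffer taken from a custom allocator.
   Returns -1.0 if the allocator or the buffer cannot be obtained. */
double sum_with_scratch(size_t n)
{
    double *buf;
    double sum = -1.0;
#ifdef _OPENMP
    /* Ask for 64-byte alignment and for NULL on allocation failure. */
    omp_alloctrait_t traits[] = {
        { omp_atk_alignment, 64 },
        { omp_atk_fallback, omp_atv_null_fb }
    };
    omp_allocator_handle_t al =
        omp_init_allocator(omp_default_mem_space, 2, traits);
    if (al == omp_null_allocator)
        return -1.0;
    buf = omp_alloc(n * sizeof *buf, al);
#else
    /* Assumed stand-in for builds without OpenMP. */
    buf = malloc(n * sizeof *buf);
#endif
    if (buf != NULL) {
        sum = 0.0;
        for (size_t i = 0; i < n; ++i)
            buf[i] = (double)i;
        for (size_t i = 0; i < n; ++i)
            sum += buf[i];
#ifdef _OPENMP
        omp_free(buf, al);          /* same allocator that served the request */
#else
        free(buf);
#endif
    }
#ifdef _OPENMP
    omp_destroy_allocator(al);      /* only after its memory has been freed */
#endif
    return sum;
}
```

Freeing the buffer before calling omp_destroy_allocator follows the ordering that the Effect text of omp_free and omp_destroy_allocator implies, since memory from a destroyed allocator must not be accessed or freed.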
3.8 Tool Control Routine
Summary
The omp_control_tool routine enables a program to pass commands to an active tool.
Format
C / C++
int omp_control_tool(int command, int modifier, void *arg);
C / C++
Fortran
integer function omp_control_tool(command, modifier)
integer (kind=omp_control_tool_kind) command
integer modifier
Fortran
Description
An OpenMP program may use omp_control_tool to pass commands to a tool. An application can use omp_control_tool to request that a tool starts or restarts data collection when a code region of interest is encountered, that a tool pauses data collection when leaving the region of interest, that a tool flushes any data that it has collected so far, or that a tool ends data collection. Additionally, omp_control_tool can be used to pass tool-specific commands to a particular tool.
The following types correspond to return values from omp_control_tool:
C / C++
typedef enum omp_control_tool_result_t {
  omp_control_tool_notool = -2,
  omp_control_tool_nocallback = -1,
  omp_control_tool_success = 0,
  omp_control_tool_ignored = 1
} omp_control_tool_result_t;
C / C++
Fortran
integer (kind=omp_control_tool_result_kind), &
parameter :: omp_control_tool_notool = -2
integer (kind=omp_control_tool_result_kind), &
parameter :: omp_control_tool_nocallback = -1
integer (kind=omp_control_tool_result_kind), &
parameter :: omp_control_tool_success = 0
integer (kind=omp_control_tool_result_kind), &
parameter :: omp_control_tool_ignored = 1
Fortran
If the OMPT interface state is inactive, the OpenMP implementation returns omp_control_tool_notool. If the OMPT interface state is active, but no callback is registered for the tool-control event, the OpenMP implementation returns omp_control_tool_nocallback. An OpenMP implementation may return other implementation-defined negative values strictly smaller than -64; an application may assume that any negative return value indicates that a tool has not received the command. A return value of omp_control_tool_success indicates that the tool has performed the specified command. A return value of omp_control_tool_ignored indicates that the tool has ignored the specified command. A tool may return other positive values strictly greater than 64 that are tool-defined.
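An application might interpret these return values along the lines of the following C sketch. It does not call omp_control_tool itself. The CT_* constants mirror the omp_control_tool_result_t values listed above, and the ct_outcome type and the function name classify_control_tool_result are hypothetical names introduced for this illustration.

```c
/* Values mirror omp_control_tool_result_t as listed above. */
enum {
    CT_NOTOOL = -2,
    CT_NOCALLBACK = -1,
    CT_SUCCESS = 0,
    CT_IGNORED = 1
};

typedef enum {
    CT_NOT_RECEIVED,  /* any negative value: the tool did not get the command */
    CT_PERFORMED,     /* the tool performed the command */
    CT_SKIPPED,       /* the tool ignored the command */
    CT_TOOL_DEFINED   /* other positive values; those above 64 are tool-defined */
} ct_outcome;

/* Maps a return value of omp_control_tool to a coarse outcome. */
ct_outcome classify_control_tool_result(int rc)
{
    if (rc < 0)
        return CT_NOT_RECEIVED;
    if (rc == CT_SUCCESS)
        return CT_PERFORMED;
    if (rc == CT_IGNORED)
        return CT_SKIPPED;
    return CT_TOOL_DEFINED;
}
```

In a program built against a runtime that provides the routine, the argument would typically be the value returned by a call such as omp_control_tool(omp_control_tool_start, 0, NULL).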
Constraints on Arguments
The following enumeration type defines four standard commands. Table 3.1 describes the actions that these commands request from a tool.
C / C++
typedef enum omp_control_tool_t {
  omp_control_tool_start = 1,
  omp_control_tool_pause = 2,
  omp_control_tool_flush = 3,
  omp_control_tool_end = 4
} omp_control_tool_t;
C / C++
Fortran
integer (kind=omp_control_tool_kind), &
parameter :: omp_control_tool_start = 1
integer (kind=omp_control_tool_kind), &
parameter :: omp_control_tool_pause = 2
integer (kind=omp_control_tool_kind), &
parameter :: omp_control_tool_flush = 3
integer (kind=omp_control_tool_kind), &
parameter :: omp_control_tool_end = 4
Fortran
Tool-specific values for command must be greater than or equal to 64. Tools must ignore command values that they are not explicitly designed to handle. Other values accepted by a tool for command, and any values for modifier and arg are tool-defined.
TABLE 3.1: Standard Tool Control Commands
Command                   Action
omp_control_tool_start    Start or restart monitoring if it is off. If monitoring is already on, this command is idempotent. If monitoring has already been turned off permanently, this command will have no effect.
omp_control_tool_pause    Temporarily turn monitoring off. If monitoring is already off, it is idempotent.
omp_control_tool_flush    Flush any data buffered by a tool. This command may be applied whether monitoring is on or off.
omp_control_tool_end      Turn monitoring off permanently; the tool finalizes itself and flushes all output.
Execution Model Events
The tool-control event occurs in the thread that encounters a call to omp_control_tool at a point inside its corresponding OpenMP region.
Tool Callbacks
A thread dispatches a registered ompt_callback_control_tool callback for each occurrence of a tool-control event. The callback executes in the context of the call that occurs in the user program and has type signature ompt_callback_control_tool_t. The callback may return any non-negative value, which will be returned to the application by the OpenMP implementation as the return value of the omp_control_tool call that triggered the callback.
Arguments passed to the callback are those passed by the user to omp_control_tool. If the call is made in Fortran, the tool will be passed NULL as the third argument to the callback. If any of the four standard commands is presented to a tool, the tool will ignore the modifier and arg argument values.
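The following non-normative sketch shows one way a tool might implement such a callback for the four standard commands. The function has the argument shape of ompt_callback_control_tool_t; the names on_control_tool, monitor_state, and flush_count are hypothetical, the numeric case labels mirror omp_control_tool_t, and returning 1 (ignored) for a start or pause request after monitoring has ended is a choice made for this sketch rather than a requirement.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical monitoring state kept by the tool. */
typedef enum { MON_OFF, MON_ON, MON_ENDED } monitor_state_t;
static monitor_state_t monitor_state = MON_OFF;
static unsigned flush_count = 0;  /* stands in for real buffer flushing */

/* Same argument shape as ompt_callback_control_tool_t. Returns 0 (success)
 * or 1 (ignored), mirroring omp_control_tool_success and _ignored. */
int on_control_tool(uint64_t command, uint64_t modifier, void *arg,
                    const void *codeptr_ra) {
  /* modifier and arg are ignored for the standard commands. */
  (void)modifier; (void)arg; (void)codeptr_ra;
  switch (command) {
  case 1:  /* omp_control_tool_start */
    if (monitor_state == MON_ENDED) return 1;  /* permanently off: no effect */
    monitor_state = MON_ON;                    /* idempotent if already on */
    return 0;
  case 2:  /* omp_control_tool_pause */
    if (monitor_state == MON_ENDED) return 1;
    monitor_state = MON_OFF;                   /* idempotent if already off */
    return 0;
  case 3:  /* omp_control_tool_flush: allowed whether monitoring is on or off */
    flush_count++;
    return 0;
  case 4:  /* omp_control_tool_end */
    if (monitor_state != MON_ENDED) {
      flush_count++;                           /* final flush of all output */
      monitor_state = MON_ENDED;
    }
    return 0;
  default: /* commands this tool does not handle must be ignored */
    return 1;
  }
}
```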
Cross References
• OMPT Interface, see Chapter 4 on page 419.
• ompt_callback_control_tool_t, see Section 4.5.2.29 on page 495.
CHAPTER 4
OMPT Interface
This chapter describes OMPT, which is an interface for first-party tools. First-party tools are linked or loaded directly into the OpenMP program. OMPT defines mechanisms to initialize a tool, to examine OpenMP state associated with an OpenMP thread, to interpret the call stack of an OpenMP thread, to receive notification about OpenMP events, to trace activity on OpenMP target devices, to assess implementation-dependent details of an OpenMP implementation (such as supported states and mutual exclusion implementations), and to control a tool from an OpenMP application.
4.1 OMPT Interfaces Definitions
C / C++
A compliant implementation must supply a set of definitions for the OMPT runtime entry points, OMPT callback signatures, and the special data types of their parameters and return values. These definitions, which are listed throughout this chapter, and their associated declarations shall be provided in a header file named omp-tools.h. In addition, the set of definitions may specify other implementation-specific values.
The ompt_start_tool function is an external function with C linkage.
C / C++
4.2 Activating a First-Party Tool
To activate a tool, an OpenMP implementation first determines whether the tool should be initialized. If so, the OpenMP implementation invokes the initializer of the tool, which enables the tool to prepare to monitor execution on the host. The tool may then also arrange to monitor computation that executes on target devices. This section explains how the tool and an OpenMP implementation interact to accomplish these tasks.
4.2.1 ompt_start_tool
Summary
In order to use the OMPT interface provided by an OpenMP implementation, a tool must implement the ompt_start_tool function, through which the OpenMP implementation initializes the tool.
Format
C
ompt_start_tool_result_t *ompt_start_tool(
  unsigned int omp_version,
  const char *runtime_version
);
C
Description
For a tool to use the OMPT interface that an OpenMP implementation provides, the tool must define a globally-visible implementation of the function ompt_start_tool. The tool indicates that it will use the OMPT interface that an OpenMP implementation provides by returning a non-null pointer to an ompt_start_tool_result_t structure from the ompt_start_tool implementation that it provides. The ompt_start_tool_result_t structure contains pointers to tool initialization and finalization callbacks as well as a tool data word that an OpenMP implementation must pass by reference to these callbacks. A tool may return NULL from ompt_start_tool to indicate that it will not use the OMPT interface in a particular execution.
A tool may use the omp_version argument to determine if it is compatible with the OMPT interface that the OpenMP implementation provides.
Description of Arguments
The argument omp_version is the value of the _OPENMP version macro associated with the OpenMP API implementation. This value identifies the OpenMP API version that an OpenMP implementation supports, which specifies the version of the OMPT interface that it supports.
The argument runtime_version is a version string that unambiguously identifies the OpenMP implementation.
Constraints on Arguments
The argument runtime_version must be an immutable string that is defined for the lifetime of a program execution.
Effect
If a tool returns a non-null pointer to an ompt_start_tool_result_t structure, an OpenMP implementation will call the tool initializer specified by the initialize field in this structure before beginning execution of any OpenMP construct or completing execution of any environment routine invocation; the OpenMP implementation will call the tool finalizer specified by the finalize field in this structure when the OpenMP implementation shuts down.
Cross References
• ompt_start_tool_result_t, see Section 4.4.1 on page 433.
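The following non-normative sketch shows the general shape of a tool's ompt_start_tool definition as described in this section. The typedefs are mirrored from the definitions in this chapter only so that the sketch compiles stand-alone; a real tool would include omp-tools.h instead. The names my_initialize, my_finalize, and MIN_OMP_VERSION are hypothetical, and 201811 is assumed to be the _OPENMP value that corresponds to this version of the specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrored from this chapter for illustration; normally from omp-tools.h. */
typedef union ompt_data_t { uint64_t value; void *ptr; } ompt_data_t;
typedef void (*ompt_interface_fn_t)(void);
typedef ompt_interface_fn_t (*ompt_function_lookup_t)(const char *name);
typedef int (*ompt_initialize_t)(ompt_function_lookup_t lookup,
                                 int initial_device_num,
                                 ompt_data_t *tool_data);
typedef void (*ompt_finalize_t)(ompt_data_t *tool_data);
typedef struct ompt_start_tool_result_t {
  ompt_initialize_t initialize;
  ompt_finalize_t finalize;
  ompt_data_t tool_data;
} ompt_start_tool_result_t;

#define MIN_OMP_VERSION 201811u  /* assumed _OPENMP value for OpenMP 5.0 */

/* Tool initializer: a non-zero return keeps the OMPT interface active. */
static int my_initialize(ompt_function_lookup_t lookup,
                         int initial_device_num, ompt_data_t *tool_data) {
  (void)lookup; (void)initial_device_num;
  tool_data->value = 1;  /* hypothetical per-tool state */
  return 1;
}

/* Tool finalizer: called when the OpenMP implementation shuts down. */
static void my_finalize(ompt_data_t *tool_data) {
  tool_data->value = 0;
}

/* Globally visible entry point that the OpenMP implementation looks for. */
ompt_start_tool_result_t *ompt_start_tool(unsigned int omp_version,
                                          const char *runtime_version) {
  static ompt_start_tool_result_t result = { my_initialize, my_finalize, {0} };
  (void)runtime_version;
  if (omp_version < MIN_OMP_VERSION)
    return NULL;  /* decline: the tool will not use OMPT in this execution */
  return &result;
}
```

The result structure has static storage duration because the OpenMP implementation uses the returned pointer after ompt_start_tool has returned.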
4.2.2 Determining Whether a First-Party Tool Should be Initialized
An OpenMP implementation examines the tool-var ICV as one of its first initialization steps. If the value of tool-var is disabled, the initialization continues without a check for the presence of a tool and the functionality of the OMPT interface will be unavailable as the program executes. In this case, the OMPT interface state remains inactive.
Otherwise, the OMPT interface state changes to pending and the OpenMP implementation activates any first-party tool that it finds. A tool can provide a definition of ompt_start_tool to an OpenMP implementation in three ways:
• By statically-linking its definition of ompt_start_tool into an OpenMP application;
• By introducing a dynamically-linked library that includes its definition of ompt_start_tool into the application’s address space; or
• By providing, in the tool-libraries-var ICV, the name of a dynamically-linked library that is appropriate for the architecture and operating system used by the application and that includes a definition of ompt_start_tool.
If the value of tool-var is enabled, the OpenMP implementation must check if a tool has provided an implementation of ompt_start_tool. The OpenMP implementation first checks if a tool-provided implementation of ompt_start_tool is available in the address space, either statically-linked into the application or in a dynamically-linked library loaded in the address space. If multiple implementations of ompt_start_tool are available, the OpenMP implementation will use the first tool-provided implementation of ompt_start_tool that it finds.
If the implementation does not find a tool-provided implementation of ompt_start_tool in the address space, it consults the tool-libraries-var ICV, which contains a (possibly empty) list of dynamically-linked libraries. As described in detail in Section 6.19 on page 617, the libraries in tool-libraries-var are then searched for the first usable implementation of ompt_start_tool that one of the libraries in the list provides.
If the implementation finds a tool-provided definition of ompt_start_tool, it invokes that function; if a NULL pointer is returned, the OMPT interface state remains pending and the implementation continues to look for implementations of ompt_start_tool; otherwise a non-null pointer to an ompt_start_tool_result_t structure is returned, the OMPT interface state changes to active and the OpenMP implementation makes the OMPT interface available as the program executes. In this case, as the OpenMP implementation completes its initialization, it initializes the OMPT interface.
FIGURE 4.1: First-Party Tool Activation Flow Chart
If no tool can be found, the OMPT interface state changes to inactive.
Cross References
• tool-libraries-var ICV, see Section 2.5 on page 63.
• tool-var ICV, see Section 2.5 on page 63.
• ompt_start_tool function, see Section 4.2.1 on page 420.
• ompt_start_tool_result_t type, see Section 4.4.1 on page 433.
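The activation logic of this section can be modeled with the following non-normative sketch. It is a simplified simulation, not runtime code: the type and function names are hypothetical, each candidate function stands in for one discovered ompt_start_tool implementation, and the search order among candidates is assumed to be the array order.

```c
#include <stddef.h>

/* Hypothetical model of the OMPT interface states. */
typedef enum { MODEL_INACTIVE, MODEL_PENDING, MODEL_ACTIVE } model_state_t;

/* Stand-in for one candidate ompt_start_tool; returns NULL to decline. */
typedef void *(*candidate_start_tool_t)(void);

/* Models the search: returns the final state and, when a tool accepts,
 * stores the index of the chosen candidate in *chosen. */
model_state_t activate_first_party_tool(int tool_var_enabled,
                                        const candidate_start_tool_t *cands,
                                        size_t n, size_t *chosen) {
  model_state_t state = MODEL_INACTIVE;
  if (!tool_var_enabled)
    return state;                 /* tool-var disabled: no search at all */
  state = MODEL_PENDING;          /* pending while the search is running */
  for (size_t i = 0; i < n && state == MODEL_PENDING; i++) {
    if (cands[i]() != NULL) {     /* non-null result: this tool is used */
      if (chosen != NULL) *chosen = i;
      state = MODEL_ACTIVE;
    }                             /* NULL result: keep looking */
  }
  if (state == MODEL_PENDING)
    state = MODEL_INACTIVE;       /* no tool could be found */
  return state;
}

/* Two sample candidates for demonstration. */
static void *declining_tool(void) { return NULL; }
static void *accepting_tool(void) { static int token; return &token; }
```

The model stops at the first candidate that returns a non-null pointer, which corresponds to the state change from pending to active described above.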
4.2.3 Initializing a First-Party Tool
To initialize the OMPT interface, the OpenMP implementation invokes the tool initializer that is specified in the ompt_start_tool_result_t structure that is indicated by the non-null pointer that ompt_start_tool returns. The initializer is invoked prior to the occurrence of any OpenMP event.
A tool initializer, described in Section 4.5.1.1 on page 457, uses the function specified in its lookup argument to look up pointers to OMPT interface runtime entry points that the OpenMP implementation provides; this process is described in Section 4.2.3.1 on page 424. Typically, a tool initializer obtains a pointer to the ompt_set_callback runtime entry point with type signature ompt_set_callback_t and then uses this runtime entry point to register tool callbacks for OpenMP events, as described in Section 4.2.4 on page 425.
A tool initializer may use the ompt_enumerate_states runtime entry point, which has type signature ompt_enumerate_states_t, to determine the thread states that an OpenMP implementation employs. Similarly, it may use the ompt_enumerate_mutex_impls runtime entry point, which has type signature ompt_enumerate_mutex_impls_t, to determine the mutual exclusion implementations that the OpenMP implementation employs.
If a tool initializer returns a non-zero value, the OMPT interface state remains active for the execution; otherwise, the OMPT interface state changes to inactive.
Cross References
• ompt_start_tool function, see Section 4.2.1 on page 420.
• ompt_start_tool_result_t type, see Section 4.4.1 on page 433.
• ompt_initialize_t type, see Section 4.5.1.1 on page 457.
• ompt_callback_thread_begin_t type, see Section 4.5.2.1 on page 459.
• ompt_enumerate_states_t type, see Section 4.6.1.1 on page 498.
• ompt_enumerate_mutex_impls_t type, see Section 4.6.1.2 on page 499.
• ompt_set_callback_t type, see Section 4.6.1.3 on page 500.
• ompt_function_lookup_t type, see Section 4.6.3 on page 531.
4.2.3.1 Binding Entry Points in the OMPT Callback Interface
Functions that an OpenMP implementation provides to support the OMPT interface are not defined as global function symbols. Instead, they are defined as runtime entry points that a tool can only identify through the lookup function that is provided as an argument with type signature ompt_function_lookup_t to the tool initializer. A tool can use this function to obtain a pointer to each of the runtime entry points that an OpenMP implementation provides to support the OMPT interface. Once a tool has obtained a lookup function, it may employ it at any point in the future.
For each runtime entry point in the OMPT interface for the host device, Table 4.1 provides the string name by which it is known and its associated type signature. Implementations can provide additional implementation-specific names and corresponding entry points. Any names that begin with ompt_ are reserved names.
During initialization, a tool should look up each runtime entry point in the OMPT interface by name and bind a pointer maintained by the tool that can later be used to invoke the entry point. The entry points described in Table 4.1 enable a tool to assess the thread states and mutual exclusion implementations that an OpenMP implementation supports, to register tool callbacks, to inspect registered callbacks, to introspect OpenMP state associated with threads, and to use tracing to monitor computations that execute on target devices.
Detailed information about each runtime entry point listed in Table 4.1 is included as part of the description of its type signature.
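The following non-normative sketch illustrates binding one runtime entry point by name through the lookup function. The typedefs are mirrored from this chapter so that the sketch compiles without omp-tools.h; bind_entry_points and tool_get_num_procs are hypothetical tool-side names, and fake_lookup with fake_get_num_procs is a stand-in for an OpenMP implementation that exists only to exercise the sketch.

```c
#include <stddef.h>
#include <string.h>

/* Mirrored from this chapter for illustration; normally from omp-tools.h. */
typedef void (*ompt_interface_fn_t)(void);
typedef ompt_interface_fn_t (*ompt_function_lookup_t)(const char *name);
typedef int (*ompt_get_num_procs_t)(void);

/* Pointer maintained by the tool for later calls to the entry point. */
static ompt_get_num_procs_t tool_get_num_procs = NULL;

/* Looks up the entry point by its string name and binds it.
 * Returns 1 if the entry point is available, 0 otherwise. */
int bind_entry_points(ompt_function_lookup_t lookup) {
  if (lookup == NULL)
    return 0;
  tool_get_num_procs = (ompt_get_num_procs_t)lookup("ompt_get_num_procs");
  return tool_get_num_procs != NULL;
}

/* Stand-in runtime used only to demonstrate the binding. */
static int fake_get_num_procs(void) { return 4; }
static ompt_interface_fn_t fake_lookup(const char *name) {
  if (strcmp(name, "ompt_get_num_procs") == 0)
    return (ompt_interface_fn_t)fake_get_num_procs;
  return NULL;  /* unknown entry point name */
}
```

A tool would typically repeat the lookup and cast for each entry point in Table 4.1 that it intends to use, and check each returned pointer before calling through it.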
Cross References
• ompt_enumerate_states_t type, see Section 4.6.1.1 on page 498.
• ompt_enumerate_mutex_impls_t type, see Section 4.6.1.2 on page 499.
• ompt_set_callback_t type, see Section 4.6.1.3 on page 500.
• ompt_get_callback_t type, see Section 4.6.1.4 on page 502.
• ompt_get_thread_data_t type, see Section 4.6.1.5 on page 503.
• ompt_get_num_procs_t type, see Section 4.6.1.6 on page 503.
• ompt_get_num_places_t type, see Section 4.6.1.7 on page 504.
• ompt_get_place_proc_ids_t type, see Section 4.6.1.8 on page 505.
• ompt_get_place_num_t type, see Section 4.6.1.9 on page 506.
• ompt_get_partition_place_nums_t type, see Section 4.6.1.10 on page 507.
• ompt_get_proc_id_t type, see Section 4.6.1.11 on page 508.
• ompt_get_state_t type, see Section 4.6.1.12 on page 508.
• ompt_get_parallel_info_t type, see Section 4.6.1.13 on page 510.
• ompt_get_task_info_t type, see Section 4.6.1.14 on page 512.
• ompt_get_task_memory_t type, see Section 4.6.1.15 on page 514.
• ompt_get_target_info_t type, see Section 4.6.1.16 on page 515.
• ompt_get_num_devices_t type, see Section 4.6.1.17 on page 516.
• ompt_get_unique_id_t type, see Section 4.6.1.18 on page 517.
• ompt_finalize_tool_t type, see Section 4.6.1.19 on page 517.
• ompt_function_lookup_t type, see Section 4.6.3 on page 531.
4.2.4 Monitoring Activity on the Host with OMPT
To monitor the execution of an OpenMP program on the host device, a tool initializer must register to receive notification of events that occur as an OpenMP program executes. A tool can use the ompt_set_callback runtime entry point to register callbacks for OpenMP events. The return codes for ompt_set_callback use the ompt_set_result_t enumeration type. If the ompt_set_callback runtime entry point is called outside a tool initializer, registration of supported callbacks may fail with a return value of ompt_set_error.
All callbacks registered with ompt_set_callback or returned by ompt_get_callback use the dummy type signature ompt_callback_t.
TABLE 4.1: OMPT Callback Interface Runtime Entry Point Names and Their Type Signatures
Entry Point String Name            Type signature
"ompt_enumerate_states"            ompt_enumerate_states_t
"ompt_enumerate_mutex_impls"       ompt_enumerate_mutex_impls_t
"ompt_set_callback"                ompt_set_callback_t
"ompt_get_callback"                ompt_get_callback_t
"ompt_get_thread_data"             ompt_get_thread_data_t
"ompt_get_num_places"              ompt_get_num_places_t
"ompt_get_place_proc_ids"          ompt_get_place_proc_ids_t
"ompt_get_place_num"               ompt_get_place_num_t
"ompt_get_partition_place_nums"    ompt_get_partition_place_nums_t
"ompt_get_proc_id"                 ompt_get_proc_id_t
"ompt_get_state"                   ompt_get_state_t
"ompt_get_parallel_info"           ompt_get_parallel_info_t
"ompt_get_task_info"               ompt_get_task_info_t
"ompt_get_task_memory"             ompt_get_task_memory_t
"ompt_get_num_devices"             ompt_get_num_devices_t
"ompt_get_num_procs"               ompt_get_num_procs_t
"ompt_get_target_info"             ompt_get_target_info_t
"ompt_get_unique_id"               ompt_get_unique_id_t
"ompt_finalize_tool"               ompt_finalize_tool_t
Table 4.2 shows the valid registration return codes of the ompt_set_callback runtime entry point with specific values of its event argument. For callbacks for which ompt_set_always is the only registration return code that is allowed, an OpenMP implementation must guarantee that the callback will be invoked every time that a runtime event that is associated with it occurs. Support for such callbacks is required in a minimal implementation of the OMPT interface. For callbacks for which the ompt_set_callback runtime entry point may return values other than ompt_set_always, whether an OpenMP implementation invokes a registered callback never, sometimes, or always is implementation-defined. If registration for a callback allows a return code of ompt_set_never, support for invoking such a callback may not be present in a minimal implementation of the OMPT interface. The return code from registering a callback indicates the implementation-defined level of support for the callback.
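The following non-normative sketch shows how a tool might interpret a registration return code. The enumerator values are assumed to match those of ompt_set_result_t in Section 4.4.4.2 and are mirrored under hypothetical MODEL_* names so that the sketch compiles without omp-tools.h; the two helper functions are likewise hypothetical.

```c
/* Values assumed to mirror ompt_set_result_t (Section 4.4.4.2). */
typedef enum {
  MODEL_SET_ERROR = 0,
  MODEL_SET_NEVER = 1,
  MODEL_SET_IMPOSSIBLE = 2,
  MODEL_SET_SOMETIMES = 3,
  MODEL_SET_SOMETIMES_PAIRED = 4,
  MODEL_SET_ALWAYS = 5
} model_set_result_t;

/* Returns 1 if the callback is invoked for every occurrence of its event. */
int callback_is_guaranteed(model_set_result_t rc) {
  return rc == MODEL_SET_ALWAYS;
}

/* Returns 1 if the callback may be invoked for at least some occurrences. */
int callback_may_be_invoked(model_set_result_t rc) {
  return rc == MODEL_SET_SOMETIMES ||
         rc == MODEL_SET_SOMETIMES_PAIRED ||
         rc == MODEL_SET_ALWAYS;
}
```

A tool that depends on complete event coverage could use a check like callback_is_guaranteed after each registration and fall back to a reduced analysis when the check fails.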
Two techniques reduce the size of the OMPT interface. First, in cases where events are naturally paired, for example, the beginning and end of a region, and the arguments needed by the callback at each endpoint are identical, a tool registers a single callback for the pair of events, with ompt_scope_begin or ompt_scope_end provided as an argument to identify for which endpoint the callback is invoked. Second, when a class of events is amenable to uniform treatment, OMPT provides a single callback for that class of events, for example, an ompt_callback_sync_region_wait callback is used for multiple kinds of synchronization regions, such as barrier, taskwait, and taskgroup regions. Some events, for example, ompt_callback_sync_region_wait, use both techniques.
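The first technique, a single callback for a pair of events, can be sketched as follows. This non-normative example uses a deliberately simplified callback signature with only the endpoint argument; real OMPT callbacks take additional arguments. The enumerator values are assumed to mirror ompt_scope_begin and ompt_scope_end, and the names on_region, region_depth, and max_depth are hypothetical.

```c
/* Values assumed to mirror ompt_scope_begin and ompt_scope_end. */
typedef enum { MODEL_SCOPE_BEGIN = 1, MODEL_SCOPE_END = 2 } model_endpoint_t;

static int region_depth = 0;  /* current nesting depth of open regions */
static int max_depth = 0;     /* deepest nesting observed so far */

/* One callback serves both endpoints; the argument tells them apart. */
void on_region(model_endpoint_t endpoint) {
  if (endpoint == MODEL_SCOPE_BEGIN) {
    region_depth++;
    if (region_depth > max_depth)
      max_depth = region_depth;
  } else if (endpoint == MODEL_SCOPE_END && region_depth > 0) {
    region_depth--;
  }
}
```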
TABLE 4.2: Valid Return Codes of ompt_set_callback for Each Callback
N = ompt_set_never, S = ompt_set_sometimes, P = ompt_set_sometimes_paired, A = ompt_set_always
Callback                          Valid return codes
ompt_callback_thread_begin        A
ompt_callback_thread_end          A
ompt_callback_parallel_begin      A
ompt_callback_parallel_end        A
ompt_callback_task_create         A
ompt_callback_task_schedule       A
ompt_callback_implicit_task       A
ompt_callback_target              A
ompt_callback_target_data_op      A
ompt_callback_target_submit       A
ompt_callback_control_tool        A
ompt_callback_device_initialize   A
ompt_callback_device_finalize     A
ompt_callback_device_load         A
ompt_callback_device_unload       A
ompt_callback_sync_region_wait    N, S/P, A
ompt_callback_mutex_released      N, S/P, A
ompt_callback_dependences         N, S/P, A
ompt_callback_task_dependence     N, S/P, A
ompt_callback_work                N, S/P, A
ompt_callback_master              N, S/P, A
ompt_callback_target_map          N, S/P, A
ompt_callback_sync_region         N, S/P, A
ompt_callback_reduction           N, S/P, A
ompt_callback_lock_init           N, S/P, A
ompt_callback_lock_destroy        N, S/P, A
ompt_callback_mutex_acquire       N, S/P, A
ompt_callback_mutex_acquired      N, S/P, A
ompt_callback_nest_lock           N, S/P, A
ompt_callback_flush               N, S/P, A
ompt_callback_cancel              N, S/P, A
ompt_callback_dispatch            N, S/P, A
Cross References
• ompt_set_result_t type, see Section 4.4.4.2 on page 438.
• ompt_set_callback_t type, see Section 4.6.1.3 on page 500.
• ompt_get_callback_t type, see Section 4.6.1.4 on page 502.
4.2.5 Tracing Activity on Target Devices with OMPT
A target device may or may not initialize a full OpenMP runtime system. Unless it does, it may not be possible to monitor activity on a device using a tool interface based on callbacks. To accommodate such cases, the OMPT interface defines a monitoring interface for tracing activity on target devices. Tracing activity on a target device involves the following steps:
• To prepare to trace activity on a target device, a tool must register for an ompt_callback_device_initialize callback. A tool may also register for an ompt_callback_device_load callback to be notified when code is loaded onto a target device or an ompt_callback_device_unload callback to be notified when code is unloaded from a target device. A tool may also optionally register an ompt_callback_device_finalize callback.
• When an OpenMP implementation initializes a target device, the OpenMP implementation dispatches the device initialization callback of the tool on the host device. If the OpenMP implementation or target device does not support tracing, the OpenMP implementation passes NULL to the device initializer of the tool for its lookup argument; otherwise, the OpenMP implementation passes a pointer to a device-specific runtime entry point with type signature ompt_function_lookup_t to the device initializer of the tool.
• If a non-null lookup pointer is provided to the device initializer of the tool, the tool may use it to determine the runtime entry points in the tracing interface that are available for the device and may bind the returned function pointers to tool variables. Table 4.3 indicates the names of runtime entry points that may be available for a device; an implementation may provide additional implementation-defined names and corresponding entry points. The driver for the device provides the runtime entry points that enable a tool to control the trace collection interface of the device. The native trace format that the interface uses may be device specific and the available kinds of trace records are implementation-defined. Some devices may allow a tool to collect traces of records in a standard format known as OMPT trace records. Each OMPT trace record serves as a substitute for an OMPT callback that cannot be made on the device. The fields in each trace record type are defined in the description of the callback that the record represents. If this type of record is provided then the lookup function returns values for the runtime entry points ompt_set_trace_ompt and ompt_get_record_ompt, which support collecting and decoding OMPT traces. If the native tracing format for a device is the OMPT format then tracing can be controlled using the runtime entry points for native or OMPT tracing.
• The tool uses the ompt_set_trace_native and/or the ompt_set_trace_ompt runtime entry point to specify what types of events or activities to monitor on the device. The return codes for ompt_set_trace_ompt and ompt_set_trace_native use the ompt_set_result_t enumeration type. If the ompt_set_trace_native or the ompt_set_trace_ompt runtime entry point is called outside a device initializer, registration of supported callbacks may fail with a return code of ompt_set_error.
• The tool initiates tracing on the device by invoking ompt_start_trace. Arguments to ompt_start_trace include two tool callbacks through which the OpenMP implementation can manage traces associated with the device. One allocates a buffer in which the device can deposit trace events. The second callback processes a buffer of trace events from the device.
• If the device requires a trace buffer, the OpenMP implementation invokes the tool-supplied callback function on the host device to request a new buffer.
• The OpenMP implementation monitors the execution of OpenMP constructs on the device and records a trace of events or activities into a trace buffer. If possible, device trace records are marked with a host_op_id, an identifier that associates device activities with the target operation that the host initiated to cause these activities. To correlate activities on the host with activities on a device, a tool can register an ompt_callback_target_submit callback. Before the host initiates each distinct activity associated with a structured block for a target construct on a device, the OpenMP implementation dispatches the ompt_callback_target_submit callback on the host in the thread that is executing the task that encounters the target construct.
  Examples of activities that could cause an ompt_callback_target_submit callback to be dispatched include an explicit data copy between a host and target device or execution of a computation. This callback provides the tool with a pair of identifiers: one that identifies the target region and a second that uniquely identifies an activity associated with that region. These identifiers help the tool correlate activities on the target device with their target region.
• When appropriate, for example, when a trace buffer fills or needs to be flushed, the OpenMP implementation invokes the tool-supplied buffer completion callback to process a non-empty sequence of records in a trace buffer that is associated with the device.
• The tool-supplied buffer completion callback may return immediately, ignoring records in the trace buffer, or it may iterate through them using the ompt_advance_buffer_cursor entry point to inspect each record. A tool may use the ompt_get_record_type runtime entry point to inspect the type of the record at the current cursor position. Three runtime entry points (ompt_get_record_ompt, ompt_get_record_native, and ompt_get_record_abstract) allow tools to inspect the contents of some or all records in a trace buffer. The ompt_get_record_native runtime entry point uses the native trace format of the device. The ompt_get_record_abstract runtime entry point decodes the contents of a native trace record and summarizes them as an ompt_record_abstract_t record. The ompt_get_record_ompt runtime entry point can only be used to retrieve records in OMPT format.
• Once tracing has been started on a device, a tool may pause or resume tracing on the device at any time by invoking ompt_pause_trace with an appropriate flag value as an argument.
• A tool may invoke the ompt_flush_trace runtime entry point for a device at any time between device initialization and finalization to cause the device to flush pending trace records.
• At any time, a tool may use the ompt_start_trace runtime entry point to start tracing or the ompt_stop_trace runtime entry point to stop tracing on a device. When tracing is stopped on a device, the OpenMP implementation eventually gathers all trace records already collected on the device and presents them to the tool using the buffer completion callback.
• An OpenMP implementation can be shut down while device tracing is in progress.
• When an OpenMP implementation is shut down, it finalizes each device. Device finalization occurs in three steps. First, the OpenMP implementation halts any tracing in progress for the device. Second, the OpenMP implementation flushes all trace records collected for the device and uses the buffer completion callback associated with that device to present them to the tool. Finally, the OpenMP implementation dispatches any ompt_callback_device_finalize callback registered for the device.
TABLE 4.3: OMPT Tracing Interface Runtime Entry Point Names and Their Type Signatures
Entry Point String Name         Type Signature
"ompt_get_device_num_procs"     ompt_get_device_num_procs_t
"ompt_get_device_time"          ompt_get_device_time_t
"ompt_translate_time"           ompt_translate_time_t
"ompt_set_trace_ompt"           ompt_set_trace_ompt_t
"ompt_set_trace_native"         ompt_set_trace_native_t
"ompt_start_trace"              ompt_start_trace_t
"ompt_pause_trace"              ompt_pause_trace_t
"ompt_flush_trace"              ompt_flush_trace_t
"ompt_stop_trace"               ompt_stop_trace_t
"ompt_advance_buffer_cursor"    ompt_advance_buffer_cursor_t
"ompt_get_record_type"          ompt_get_record_type_t
"ompt_get_record_ompt"          ompt_get_record_ompt_t
"ompt_get_record_native"        ompt_get_record_native_t
"ompt_get_record_abstract"      ompt_get_record_abstract_t
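The buffer-handling steps of the tracing procedure can be illustrated with the following non-normative sketch of the two tool callbacks that are passed to ompt_start_trace. The typedefs are assumed to mirror the corresponding definitions later in this chapter and are repeated only so that the sketch compiles without omp-tools.h. The names on_buffer_request, on_buffer_complete, advance_cursor, traced_device, records_seen, and TRACE_BUFFER_BYTES are hypothetical, and fake_advance is a stand-in for a device-provided ompt_advance_buffer_cursor entry point that pretends each buffer holds three records.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed to mirror definitions from omp-tools.h, for illustration only. */
typedef void ompt_device_t;
typedef void ompt_buffer_t;
typedef uint64_t ompt_buffer_cursor_t;
typedef int (*ompt_advance_buffer_cursor_t)(ompt_device_t *device,
                                            ompt_buffer_t *buffer, size_t size,
                                            ompt_buffer_cursor_t current,
                                            ompt_buffer_cursor_t *next);

#define TRACE_BUFFER_BYTES 4096  /* hypothetical buffer size */

/* Bound by the tool in its device initializer through the device lookup. */
static ompt_advance_buffer_cursor_t advance_cursor = NULL;
static ompt_device_t *traced_device = NULL;
static size_t records_seen = 0;

/* Buffer request callback: hands a fresh buffer to the implementation. */
void on_buffer_request(int device_num, ompt_buffer_t **buffer, size_t *bytes) {
  (void)device_num;
  *buffer = malloc(TRACE_BUFFER_BYTES);
  *bytes = (*buffer != NULL) ? TRACE_BUFFER_BYTES : 0;
}

/* Buffer completion callback: walks the records, then releases the buffer
 * if ownership was passed back to the tool. */
void on_buffer_complete(int device_num, ompt_buffer_t *buffer, size_t bytes,
                        ompt_buffer_cursor_t begin, int buffer_owned) {
  (void)device_num;
  if (advance_cursor != NULL && bytes > 0) {
    ompt_buffer_cursor_t cur = begin;
    int more = 1;
    while (more) {
      records_seen++;  /* a real tool would decode the record at cur here */
      more = advance_cursor(traced_device, buffer, bytes, cur, &cur);
    }
  }
  if (buffer_owned)
    free(buffer);
}

/* Stand-in entry point: treats the cursor as an index over three records. */
static int fake_advance(ompt_device_t *device, ompt_buffer_t *buffer,
                        size_t size, ompt_buffer_cursor_t current,
                        ompt_buffer_cursor_t *next) {
  (void)device; (void)buffer; (void)size;
  *next = current + 1;
  return *next < 3;
}
```

The completion callback may also return without inspecting any record; the loop is only needed when the tool wants to examine the trace contents.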
Restrictions
Tracing activity on devices has the following restriction:
• Implementation-defined names must not start with the prefix ompt_, which is reserved for the OpenMP specification.
Cross References
• ompt_callback_device_initialize_t callback type, see Section 4.5.2.19 on page 482.
• ompt_callback_device_finalize_t callback type, see Section 4.5.2.20 on page 484.
• ompt_get_device_num_procs runtime entry point, see Section 4.6.2.1 on page 518.
• ompt_get_device_time runtime entry point, see Section 4.6.2.2 on page 519.
• ompt_translate_time runtime entry point, see Section 4.6.2.3 on page 520.
• ompt_set_trace_ompt runtime entry point, see Section 4.6.2.4 on page 521.
• ompt_set_trace_native runtime entry point, see Section 4.6.2.5 on page 522.
• ompt_start_trace runtime entry point, see Section 4.6.2.6 on page 523.
• ompt_pause_trace runtime entry point, see Section 4.6.2.7 on page 524.
• ompt_flush_trace runtime entry point, see Section 4.6.2.8 on page 525.
• ompt_stop_trace runtime entry point, see Section 4.6.2.9 on page 526.
• ompt_advance_buffer_cursor runtime entry point, see Section 4.6.2.10 on page 527.
• ompt_get_record_type runtime entry point, see Section 4.6.2.11 on page 528.
• ompt_get_record_ompt runtime entry point, see Section 4.6.2.12 on page 529.
• ompt_get_record_native runtime entry point, see Section 4.6.2.13 on page 530.
• ompt_get_record_abstract runtime entry point, see Section 4.6.2.14 on page 531.
4.3 Finalizing a First-Party Tool
If the OMPT interface state is active, the tool finalizer, which has type signature ompt_finalize_t and is specified by the finalize field in the ompt_start_tool_result_t structure returned from the ompt_start_tool function, is called when the OpenMP implementation shuts down.
Cross References
• ompt_finalize_t callback type, see Section 4.5.1.2 on page 458
4.4 OMPT Data Types
The C/C++ header file (omp-tools.h) provides the definitions of the types that are specified throughout this subsection.
4.4.1 Tool Initialization and Finalization
Summary
A tool’s implementation of ompt_start_tool returns a pointer to an ompt_start_tool_result_t structure, which contains pointers to the tool’s initialization and finalization callbacks as well as an ompt_data_t object for use by the tool.
Format
C / C++
typedef struct ompt_start_tool_result_t {
  ompt_initialize_t initialize;
  ompt_finalize_t finalize;
  ompt_data_t tool_data;
} ompt_start_tool_result_t;
C / C++
Restrictions
The ompt_start_tool_result_t type has the following restriction:
• The initialize and finalize callback pointer values in an ompt_start_tool_result_t structure that ompt_start_tool returns must be non-null.
Cross References
• ompt_start_tool function, see Section 4.2.1 on page 420.
• ompt_data_t type, see Section 4.4.4.4 on page 440.
• ompt_initialize_t callback type, see Section 4.5.1.1 on page 457.
• ompt_finalize_t callback type, see Section 4.5.1.2 on page 458.
Summary
The ompt_callbacks_t enumeration type indicates the integer codes used to identify OpenMP callbacks when registering or querying them.
1 4.4.2 Callbacks
Format
C / C++
typedef enum ompt_callbacks_t {
ompt_callback_thread_begin
ompt_callback_thread_end
ompt_callback_parallel_begin
ompt_callback_parallel_end
= 1,
= 2,
= 3,
= 4,
ompt_callback_task_create
ompt_callback_task_schedule
ompt_callback_implicit_task
ompt_callback_target
ompt_callback_target_data_op
ompt_callback_target_submit
=5, =6, =7, =8, =9, = 10,
ompt_callback_control_tool
ompt_callback_device_initialize
ompt_callback_device_finalize
ompt_callback_device_load
ompt_callback_device_unload
ompt_callback_sync_region_wait
= 11,
= 12,
= 13,
= 14,
= 15,
= 16,
ompt_callback_mutex_released
ompt_callback_dependences
ompt_callback_task_dependence
ompt_callback_work
ompt_callback_master
ompt_callback_target_map
= 17,
= 18,
= 19,
= 20,
= 21,
= 22,
ompt_callback_sync_region
ompt_callback_lock_init
ompt_callback_lock_destroy
ompt_callback_mutex_acquire
ompt_callback_mutex_acquired
ompt_callback_nest_lock
= 23,
= 24,
= 25,
= 26,
= 27,
= 28,
ompt_callback_flush
ompt_callback_cancel
ompt_callback_reduction
ompt_callback_dispatch
} ompt_callbacks_t;
= 29,
= 30,
= 31,
= 32
C / C++
434
OpenMP API – Version 5.0 November 2018
4.4.3 Tracing
OpenMP provides type definitions that support tracing with OMPT.
4.4.3.1 Record Type
Summary
The ompt_record_t enumeration type indicates the integer codes used to identify OpenMP trace record formats.
Format
C / C++
typedef enum ompt_record_t {
  ompt_record_ompt = 1,
  ompt_record_native = 2,
  ompt_record_invalid = 3
} ompt_record_t;
C / C++
4.4.3.2 Native Record Kind
Summary
The ompt_record_native_t enumeration type indicates the integer codes used to identify OpenMP native trace record contents.
Format
C / C++
typedef enum ompt_record_native_t {
  ompt_record_native_info = 1,
  ompt_record_native_event = 2
} ompt_record_native_t;
C / C++
4.4.3.3 Native Record Abstract Type
Summary
The ompt_record_abstract_t type provides an abstract trace record format that is used to summarize native device trace records.
Format
C / C++
typedef struct ompt_record_abstract_t {
  ompt_record_native_t rclass;
  const char *type;
  ompt_device_time_t start_time;
  ompt_device_time_t end_time;
  ompt_hwid_t hwid;
} ompt_record_abstract_t;
C / C++
Description
An ompt_record_abstract_t record contains information that a tool can use to process a native record that it may not fully understand. The rclass field indicates that the record is informational or that it represents an event; this information can help a tool determine how to present the record. The record type field points to a statically-allocated, immutable character string that provides a meaningful name that a tool can use to describe the event to a user. The start_time and end_time fields are used to place an event in time. The times are relative to the device clock. If an event does not have an associated start_time (end_time), the value of the start_time (end_time) field is ompt_time_none. The hardware identifier field, hwid, indicates the location on the device where the event occurred. A hwid may represent a hardware abstraction such as a core or a hardware thread identifier. The meaning of a hwid value for a device is implementation defined. If no hardware abstraction is associated with the record then the value of hwid is ompt_hwid_none.
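The handling of missing times and hardware identifiers described above can be sketched as follows. This is a minimal illustration, not part of the interface: the helper names format_or_na and describe_abstract_record and the output layout are hypothetical, and the typedefs and the two *_none macros are local mirrors so that the sketch compiles without omp-tools.h.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Local mirrors of the omp-tools.h declarations used by this sketch. */
typedef enum ompt_record_native_t {
  ompt_record_native_info = 1,
  ompt_record_native_event = 2
} ompt_record_native_t;
typedef uint64_t ompt_device_time_t;
typedef uint64_t ompt_hwid_t;
#define ompt_time_none 0
#define ompt_hwid_none 0
typedef struct ompt_record_abstract_t {
  ompt_record_native_t rclass;
  const char *type;
  ompt_device_time_t start_time;
  ompt_device_time_t end_time;
  ompt_hwid_t hwid;
} ompt_record_abstract_t;

/* Hypothetical helper: prints v, or "n/a" when v equals the none value. */
static void format_or_na(char *out, size_t len, uint64_t v, uint64_t none) {
  if (v == none)
    snprintf(out, len, "n/a");
  else
    snprintf(out, len, "%" PRIu64, v);
}

/* Hypothetical helper: writes a one-line summary of rec into buf and returns
 * the snprintf result (the length of the complete summary). */
int describe_abstract_record(const ompt_record_abstract_t *rec, char *buf,
                             size_t len) {
  char start[24], end[24], hwid[24];
  assert(rec != NULL && buf != NULL);
  format_or_na(start, sizeof start, rec->start_time, ompt_time_none);
  format_or_na(end, sizeof end, rec->end_time, ompt_time_none);
  format_or_na(hwid, sizeof hwid, rec->hwid, ompt_hwid_none);
  return snprintf(buf, len, "%s %s start=%s end=%s hwid=%s",
                  rec->rclass == ompt_record_native_event ? "event" : "info",
                  rec->type, start, end, hwid);
}
```

Because the type string is documented as statically allocated and immutable, the sketch prints it directly without copying it.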
4.4.3.4 Record Type
Summary
The ompt_record_ompt_t type provides a standard complete trace record format.
Format
C / C++
typedef struct ompt_record_ompt_t {
  ompt_callbacks_t type;
  ompt_device_time_t time;
  ompt_id_t thread_id;
  ompt_id_t target_id;
  union {
    ompt_record_thread_begin_t thread_begin;
    ompt_record_parallel_begin_t parallel_begin;
    ompt_record_parallel_end_t parallel_end;
    ompt_record_work_t work;
    ompt_record_dispatch_t dispatch;
    ompt_record_task_create_t task_create;
    ompt_record_dependences_t dependences;
    ompt_record_task_dependence_t task_dependence;
    ompt_record_task_schedule_t task_schedule;
    ompt_record_implicit_task_t implicit_task;
    ompt_record_master_t master;
    ompt_record_sync_region_t sync_region;
    ompt_record_mutex_acquire_t mutex_acquire;
    ompt_record_mutex_t mutex;
    ompt_record_nest_lock_t nest_lock;
    ompt_record_flush_t flush;
    ompt_record_cancel_t cancel;
    ompt_record_target_t target;
    ompt_record_target_data_op_t target_data_op;
    ompt_record_target_map_t target_map;
    ompt_record_target_kernel_t target_kernel;
    ompt_record_control_tool_t control_tool;
  } record;
} ompt_record_ompt_t;
C / C++
Description
The field type specifies the type of record provided by this structure. According to the type, event-specific information is stored in the matching record entry.
Restrictions
The ompt_record_ompt_t type has the following restriction:
• If type is set to ompt_callback_thread_end then the value of record is undefined.
4.4.4 Miscellaneous Type Definitions
This section describes miscellaneous types and enumerations used by the tool interface.
4.4.4.1 ompt_callback_t
Summary
Pointers to tool callback functions with different type signatures are passed to the ompt_set_callback runtime entry point and returned by the ompt_get_callback runtime entry point. For convenience, these runtime entry points expect all type signatures to be cast to a dummy type ompt_callback_t.
Format
C / C++
typedef void (*ompt_callback_t) (void);
C / C++
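The cast to the dummy type can be sketched as follows. The slot registered_thread_begin and the functions store_callback and dispatch_thread_begin are hypothetical stand-ins for what a runtime does internally, and the thread-begin signature is assumed to match the one defined later in this chapter. The point of the sketch is that the pointer is converted back to its real signature before it is called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of the omp-tools.h declarations used by this sketch. */
typedef void (*ompt_callback_t)(void);
typedef union ompt_data_t { uint64_t value; void *ptr; } ompt_data_t;
typedef enum ompt_thread_t {
  ompt_thread_initial = 1,
  ompt_thread_worker = 2
} ompt_thread_t;
/* Assumed to match the thread-begin callback signature. */
typedef void (*ompt_callback_thread_begin_t)(ompt_thread_t thread_type,
                                             ompt_data_t *thread_data);

static int threads_seen = 0;

/* Hypothetical tool callback: numbers the threads as they begin. */
static void on_thread_begin(ompt_thread_t thread_type,
                            ompt_data_t *thread_data) {
  (void)thread_type;
  thread_data->value = (uint64_t)++threads_seen;
}

/* Hypothetical stand-in for the slot that ompt_set_callback would fill. */
static ompt_callback_t registered_thread_begin = NULL;

void store_callback(ompt_callback_t cb) { registered_thread_begin = cb; }

void dispatch_thread_begin(ompt_thread_t type, ompt_data_t *data) {
  assert(registered_thread_begin != NULL);
  /* Convert back to the real signature before calling; calling through the
   * dummy type itself would be undefined behavior in C. */
  ((ompt_callback_thread_begin_t)registered_thread_begin)(type, data);
}
```

On the tool side the registration would then read store_callback((ompt_callback_t)on_thread_begin), which mirrors the cast that ompt_set_callback expects.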
4.4.4.2 ompt_set_result_t
Summary
The ompt_set_result_t enumeration type corresponds to values that the ompt_set_callback, ompt_set_trace_ompt and ompt_set_trace_native runtime entry points return.
Format
C / C++
typedef enum ompt_set_result_t {
  ompt_set_error = 0,
  ompt_set_never = 1,
  ompt_set_impossible = 2,
  ompt_set_sometimes = 3,
  ompt_set_sometimes_paired = 4,
  ompt_set_always = 5
} ompt_set_result_t;
C / C++
Description
Values of ompt_set_result_t may indicate several possible outcomes. The ompt_set_error value indicates that the associated call failed. Otherwise, the value indicates when an event may occur and, when appropriate, dispatching a callback event leads to the invocation of the callback. The ompt_set_never value indicates that the event will never occur or that the callback will never be invoked at runtime. The ompt_set_impossible value indicates that the event may occur but that tracing of it is not possible. The ompt_set_sometimes value indicates that the event may occur and, for an implementation-defined subset of associated event occurrences, will be traced or the callback will be invoked at runtime. The ompt_set_sometimes_paired value indicates the same result as ompt_set_sometimes and, in addition, that a callback with an endpoint value of ompt_scope_begin will be invoked if and only if the same callback with an endpoint value of ompt_scope_end will also be invoked sometime in the future. The ompt_set_always value indicates that, whenever an associated event occurs, it will be traced or the callback will be invoked.
Cross References
• Monitoring activity on the host with OMPT, see Section 4.2.4 on page 425.
• Tracing activity on target devices with OMPT, see Section 4.2.5 on page 427.
• ompt_set_callback runtime entry point, see Section 4.6.1.3 on page 500.
• ompt_set_trace_ompt runtime entry point, see Section 4.6.2.4 on page 521.
• ompt_set_trace_native runtime entry point, see Section 4.6.2.5 on page 522.
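A tool typically inspects the returned value to decide how much it can rely on a registration. The following sketch shows one way to do that; the helper names set_result_name, callback_may_fire and covers_every_event are hypothetical, and the grouping of values is one reading of the description above rather than something the interface prescribes.

```c
#include <assert.h>

/* Local mirror of the enumeration shown above. */
typedef enum ompt_set_result_t {
  ompt_set_error = 0,
  ompt_set_never = 1,
  ompt_set_impossible = 2,
  ompt_set_sometimes = 3,
  ompt_set_sometimes_paired = 4,
  ompt_set_always = 5
} ompt_set_result_t;

/* Hypothetical helper: printable name for a result value. */
const char *set_result_name(ompt_set_result_t r) {
  static const char *const names[] = {"error",     "never",
                                      "impossible", "sometimes",
                                      "sometimes_paired", "always"};
  assert((int)r >= ompt_set_error && (int)r <= ompt_set_always);
  return names[(int)r];
}

/* Hypothetical helper: non-zero if at least some occurrences of the event
 * are delivered to the tool (callback invoked or record traced). */
int callback_may_fire(ompt_set_result_t r) {
  return r == ompt_set_sometimes || r == ompt_set_sometimes_paired ||
         r == ompt_set_always;
}

/* Hypothetical helper: non-zero if every occurrence is delivered. */
int covers_every_event(ompt_set_result_t r) { return r == ompt_set_always; }
```

A profiler that needs complete coverage might, for instance, warn the user whenever covers_every_event returns zero for a callback it depends on.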
4.4.4.3 ompt_id_t
Summary
The ompt_id_t type is used to provide various identifiers to tools.
Format
C / C++
typedef uint64_t ompt_id_t;
C / C++
Description
When tracing asynchronous activity on devices, identifiers enable tools to correlate target regions and operations that the host initiates with associated activities on a target device. In addition, OMPT provides identifiers to refer to parallel regions and tasks that execute on a device. These various identifiers are of type ompt_id_t.
ompt_id_none is defined as an instance of type ompt_id_t with the value 0.
Restrictions
The ompt_id_t type has the following restriction:
• Identifiers created on each device must be unique from the time an OpenMP implementation is initialized until it is shut down. Identifiers for each target region and target operation instance that the host device initiates must be unique over time on the host. Identifiers for parallel and task region instances that execute on a device must be unique over time within that device.
4.4.4.4 ompt_data_t
Summary
The ompt_data_t type represents data associated with threads and with parallel and task regions.
Format
C / C++
typedef union ompt_data_t {
  uint64_t value;
  void *ptr;
} ompt_data_t;
C / C++
Description
The ompt_data_t type represents data that is reserved for tool use and that is related to a thread or to a parallel or task region. When an OpenMP implementation creates a thread or an instance of a parallel or task region, it initializes the associated ompt_data_t object with the value ompt_data_none, which is an instance of the type with the data and pointer fields equal to 0.
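One common use of the ptr field is to attach tool-private state to a thread or region on first use. The sketch below illustrates that pattern under stated assumptions: thread_stats, get_or_create_stats and release_stats are hypothetical names, the union and ompt_data_none are local mirrors, and the test for an unset object assumes that a null pointer is represented by all-zero bits, which holds on mainstream platforms.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Local mirrors of the omp-tools.h declarations used by this sketch. */
typedef union ompt_data_t { uint64_t value; void *ptr; } ompt_data_t;
static const ompt_data_t ompt_data_none = {0};

/* Hypothetical per-thread state that a tool wants to keep. */
typedef struct thread_stats {
  uint64_t events;
} thread_stats;

/* Returns the state attached to data, allocating it on first use.
 * Returns NULL only if the allocation fails. */
thread_stats *get_or_create_stats(ompt_data_t *data) {
  assert(data != NULL);
  if (data->ptr == NULL) /* still ompt_data_none: nothing attached yet */
    data->ptr = calloc(1, sizeof(thread_stats));
  return (thread_stats *)data->ptr;
}

/* Frees the attached state and resets the object to ompt_data_none. */
void release_stats(ompt_data_t *data) {
  assert(data != NULL);
  free(data->ptr);
  *data = ompt_data_none;
}
```

A tool would call get_or_create_stats from a begin callback and release_stats from the matching end callback, so that the runtime-owned ompt_data_t object never outlives the memory it points to.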
4.4.4.5 ompt_device_t
Summary
The ompt_device_t opaque object type represents a device.
Format
C / C++
typedef void ompt_device_t;
C / C++
4.4.4.6 ompt_device_time_t
Summary
The ompt_device_time_t type represents raw device time values.
Format
C / C++
typedef uint64_t ompt_device_time_t;
C / C++
Description
The ompt_device_time_t opaque object type represents raw device time values. ompt_time_none refers to an unknown or unspecified time and is defined as an instance of type ompt_device_time_t with the value 0.
4.4.4.7 ompt_buffer_t
Summary
The ompt_buffer_t opaque object type is a handle for a target buffer.
Format
C / C++
typedef void ompt_buffer_t;
C / C++
4.4.4.8 ompt_buffer_cursor_t
Summary
The ompt_buffer_cursor_t opaque type is a handle for a position in a target buffer.
Format
C / C++
typedef uint64_t ompt_buffer_cursor_t;
C / C++
4.4.4.9 ompt_dependence_t
Summary
The ompt_dependence_t type represents a task dependence.
Format
C / C++
typedef struct ompt_dependence_t {
  ompt_data_t variable;
  ompt_dependence_type_t dependence_type;
} ompt_dependence_t;
C / C++
Description
The ompt_dependence_t type is a structure that holds information about a depend clause. For task dependences, the variable field points to the storage location of the dependence. For doacross dependences, the variable field contains the value of a vector element that describes the dependence. The dependence_type field indicates the type of the dependence.
Cross References
• ompt_dependence_type_t type, see Section 4.4.4.23 on page 450.
4.4.4.10 ompt_thread_t
Summary
The ompt_thread_t enumeration type defines the valid thread type values.
Format
C / C++
typedef enum ompt_thread_t {
  ompt_thread_initial = 1,
  ompt_thread_worker = 2,
  ompt_thread_other = 3,
  ompt_thread_unknown = 4
} ompt_thread_t;
C / C++
Description
Any initial thread has thread type ompt_thread_initial. All OpenMP threads that are not initial threads have thread type ompt_thread_worker. A thread that an OpenMP implementation uses but that does not execute user code has thread type ompt_thread_other. Any thread that is created outside an OpenMP implementation and that is not an initial thread has thread type ompt_thread_unknown.
4.4.4.11 ompt_scope_endpoint_t
Summary
The ompt_scope_endpoint_t enumeration type defines valid scope endpoint values.
Format
C / C++
typedef enum ompt_scope_endpoint_t {
  ompt_scope_begin = 1,
  ompt_scope_end = 2
} ompt_scope_endpoint_t;
C / C++
4.4.4.12 ompt_dispatch_t
Summary
The ompt_dispatch_t enumeration type defines the valid dispatch kind values.
Format
C / C++
typedef enum ompt_dispatch_t {
  ompt_dispatch_iteration = 1,
  ompt_dispatch_section = 2
} ompt_dispatch_t;
C / C++
4.4.4.13 ompt_sync_region_t
Summary
The ompt_sync_region_t enumeration type defines the valid synchronization region kind values.
Format
C / C++
typedef enum ompt_sync_region_t {
  ompt_sync_region_barrier = 1,
  ompt_sync_region_barrier_implicit = 2,
  ompt_sync_region_barrier_explicit = 3,
  ompt_sync_region_barrier_implementation = 4,
  ompt_sync_region_taskwait = 5,
  ompt_sync_region_taskgroup = 6,
  ompt_sync_region_reduction = 7
} ompt_sync_region_t;
C / C++
4.4.4.14 ompt_target_data_op_t
Summary
The ompt_target_data_op_t enumeration type defines the valid target data operation values.
Format
C / C++
typedef enum ompt_target_data_op_t {
  ompt_target_data_alloc = 1,
  ompt_target_data_transfer_to_device = 2,
  ompt_target_data_transfer_from_device = 3,
  ompt_target_data_delete = 4,
  ompt_target_data_associate = 5,
  ompt_target_data_disassociate = 6
} ompt_target_data_op_t;
C / C++
4.4.4.15 ompt_work_t
Summary
The ompt_work_t enumeration type defines the valid work type values.
Format
C / C++
typedef enum ompt_work_t {
  ompt_work_loop = 1,
  ompt_work_sections = 2,
  ompt_work_single_executor = 3,
  ompt_work_single_other = 4,
  ompt_work_workshare = 5,
  ompt_work_distribute = 6,
  ompt_work_taskloop = 7
} ompt_work_t;
C / C++
4.4.4.16 ompt_mutex_t
Summary
The ompt_mutex_t enumeration type defines the valid mutex kind values.
Format
C / C++
typedef enum ompt_mutex_t {
  ompt_mutex_lock = 1,
  ompt_mutex_test_lock = 2,
  ompt_mutex_nest_lock = 3,
  ompt_mutex_test_nest_lock = 4,
  ompt_mutex_critical = 5,
  ompt_mutex_atomic = 6,
  ompt_mutex_ordered = 7
} ompt_mutex_t;
C / C++
4.4.4.17 ompt_native_mon_flag_t
Summary
The ompt_native_mon_flag_t enumeration type defines the valid native monitoring flag values.
Format
C / C++
typedef enum ompt_native_mon_flag_t {
  ompt_native_data_motion_explicit = 0x01,
  ompt_native_data_motion_implicit = 0x02,
  ompt_native_kernel_invocation = 0x04,
  ompt_native_kernel_execution = 0x08,
  ompt_native_driver = 0x10,
  ompt_native_runtime = 0x20,
  ompt_native_overhead = 0x40,
  ompt_native_idleness = 0x80
} ompt_native_mon_flag_t;
C / C++
4.4.4.18 ompt_task_flag_t
Summary
The ompt_task_flag_t enumeration type defines valid task types.
Format
C / C++
typedef enum ompt_task_flag_t {
  ompt_task_initial = 0x00000001,
  ompt_task_implicit = 0x00000002,
  ompt_task_explicit = 0x00000004,
  ompt_task_target = 0x00000008,
  ompt_task_undeferred = 0x08000000,
  ompt_task_untied = 0x10000000,
  ompt_task_final = 0x20000000,
  ompt_task_mergeable = 0x40000000,
  ompt_task_merged = 0x80000000
} ompt_task_flag_t;
C / C++
Description
The ompt_task_flag_t enumeration type defines valid task type values. The least significant byte provides information about the general classification of the task. The other bits represent properties of the task.
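The split between the classification byte and the property bits can be sketched as follows. The helper names task_kind and task_has_property and the mask TASK_KIND_MASK are hypothetical, and the local enumeration mirrors only the subset of values that the sketch uses.

```c
#include <assert.h>

/* Local mirror of a subset of the enumeration shown above. */
typedef enum ompt_task_flag_t {
  ompt_task_initial = 0x00000001,
  ompt_task_implicit = 0x00000002,
  ompt_task_explicit = 0x00000004,
  ompt_task_target = 0x00000008,
  ompt_task_undeferred = 0x08000000,
  ompt_task_untied = 0x10000000,
  ompt_task_final = 0x20000000
} ompt_task_flag_t;

/* Hypothetical mask for the least significant byte (the classification). */
#define TASK_KIND_MASK 0xFF

/* Extracts the general classification of the task from a flags word. */
int task_kind(int flags) { return flags & TASK_KIND_MASK; }

/* Tests one property bit; prop must lie outside the classification byte. */
int task_has_property(int flags, ompt_task_flag_t prop) {
  assert(((int)prop & TASK_KIND_MASK) == 0);
  return (flags & (int)prop) != 0;
}
```

For a flags word of ompt_task_explicit | ompt_task_untied, task_kind yields ompt_task_explicit and task_has_property reports the untied property but not the final one.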
4.4.4.19 ompt_task_status_t
Summary
The ompt_task_status_t enumeration type indicates the reason that a task was switched when it reached a task scheduling point.
Format
C / C++
typedef enum ompt_task_status_t {
  ompt_task_complete = 1,
  ompt_task_yield = 2,
  ompt_task_cancel = 3,
  ompt_task_detach = 4,
  ompt_task_early_fulfill = 5,
  ompt_task_late_fulfill = 6,
  ompt_task_switch = 7
} ompt_task_status_t;
C / C++
Description
The value ompt_task_complete of the ompt_task_status_t type indicates that the task that encountered the task scheduling point completed execution of the associated structured-block and an associated allow-completion-event was fulfilled. The value ompt_task_yield indicates that the task encountered a taskyield construct. The value ompt_task_cancel indicates that the task was canceled when it encountered an active cancellation point. The value ompt_task_detach indicates that a task with detach clause completed execution of the associated structured-block and is waiting for an allow-completion-event to be fulfilled. The value ompt_task_early_fulfill indicates that the allow-completion-event of the task is fulfilled before the task completed execution of the associated structured-block. The value ompt_task_late_fulfill indicates that the allow-completion-event of the task is fulfilled after the task completed execution of the associated structured-block. The value ompt_task_switch is used for all other cases that a task was switched.
4.4.4.20 ompt_target_t
Summary
The ompt_target_t enumeration type defines the valid target type values.
Format
C / C++
typedef enum ompt_target_t {
  ompt_target = 1,
  ompt_target_enter_data = 2,
  ompt_target_exit_data = 3,
  ompt_target_update = 4
} ompt_target_t;
C / C++
4.4.4.21 ompt_parallel_flag_t
Summary
The ompt_parallel_flag_t enumeration type defines valid invoker values.
Format
C / C++
typedef enum ompt_parallel_flag_t {
  ompt_parallel_invoker_program = 0x00000001,
  ompt_parallel_invoker_runtime = 0x00000002,
  ompt_parallel_league = 0x40000000,
  ompt_parallel_team = 0x80000000
} ompt_parallel_flag_t;
C / C++
Description
The ompt_parallel_flag_t enumeration type defines valid invoker values, which indicate how an outlined function is invoked.
The value ompt_parallel_invoker_program indicates that the outlined function associated with implicit tasks for the region is invoked directly by the application on the master thread for a parallel region.
The value ompt_parallel_invoker_runtime indicates that the outlined function associated with implicit tasks for the region is invoked by the runtime on the master thread for a parallel region.
The value ompt_parallel_league indicates that the callback is invoked due to the creation of a league of teams by a teams construct.
The value ompt_parallel_team indicates that the callback is invoked due to the creation of a team of threads by a parallel construct.
4.4.4.22 ompt_target_map_flag_t
Summary
The ompt_target_map_flag_t enumeration type defines the valid target map flag values.
Format
C / C++
typedef enum ompt_target_map_flag_t {
  ompt_target_map_flag_to = 0x01,
  ompt_target_map_flag_from = 0x02,
  ompt_target_map_flag_alloc = 0x04,
  ompt_target_map_flag_release = 0x08,
  ompt_target_map_flag_delete = 0x10,
  ompt_target_map_flag_implicit = 0x20
} ompt_target_map_flag_t;
C / C++
4.4.4.23 ompt_dependence_type_t
Summary
The ompt_dependence_type_t enumeration type defines the valid task dependence type values.
Format
C / C++
typedef enum ompt_dependence_type_t {
  ompt_dependence_type_in = 1,
  ompt_dependence_type_out = 2,
  ompt_dependence_type_inout = 3,
  ompt_dependence_type_mutexinoutset = 4,
  ompt_dependence_type_source = 5,
  ompt_dependence_type_sink = 6
} ompt_dependence_type_t;
C / C++
4.4.4.24 ompt_cancel_flag_t
Summary
The ompt_cancel_flag_t enumeration type defines the valid cancel flag values.
Format
C / C++
typedef enum ompt_cancel_flag_t {
  ompt_cancel_parallel = 0x01,
  ompt_cancel_sections = 0x02,
  ompt_cancel_loop = 0x04,
  ompt_cancel_taskgroup = 0x08,
  ompt_cancel_activated = 0x10,
  ompt_cancel_detected = 0x20,
  ompt_cancel_discarded_task = 0x40
} ompt_cancel_flag_t;
C / C++
4.4.4.25 ompt_hwid_t
Summary
The ompt_hwid_t opaque type is a handle for a hardware identifier for a target device.
Format
C / C++
typedef uint64_t ompt_hwid_t;
C / C++
Description
The ompt_hwid_t opaque type is a handle for a hardware identifier for a target device. ompt_hwid_none is an instance of the type that refers to an unknown or unspecified hardware identifier and that has the value 0. If no hwid is associated with an ompt_record_abstract_t then the value of hwid is ompt_hwid_none.
Cross References
• ompt_record_abstract_t type, see Section 4.4.3.3 on page 436.
4.4.4.26 ompt_state_t
Summary
If the OMPT interface is in the active state then an OpenMP implementation must maintain thread state information for each thread. The thread state maintained is an approximation of the instantaneous state of a thread.
Format
A thread state must be one of the values of the enumeration type ompt_state_t or an implementation-defined state value of 512 or higher.
C / C++
typedef enum ompt_state_t {
  ompt_state_work_serial = 0x000,
  ompt_state_work_parallel = 0x001,
  ompt_state_work_reduction = 0x002,
  ompt_state_wait_barrier = 0x010,
  ompt_state_wait_barrier_implicit_parallel = 0x011,
  ompt_state_wait_barrier_implicit_workshare = 0x012,
  ompt_state_wait_barrier_implicit = 0x013,
  ompt_state_wait_barrier_explicit = 0x014,
  ompt_state_wait_taskwait = 0x020,
  ompt_state_wait_taskgroup = 0x021,
  ompt_state_wait_mutex = 0x040,
  ompt_state_wait_lock = 0x041,
  ompt_state_wait_critical = 0x042,
  ompt_state_wait_atomic = 0x043,
  ompt_state_wait_ordered = 0x044,
  ompt_state_wait_target = 0x080,
  ompt_state_wait_target_map = 0x081,
  ompt_state_wait_target_update = 0x082,
  ompt_state_idle = 0x100,
  ompt_state_overhead = 0x101,
  ompt_state_undefined = 0x102
} ompt_state_t;
C / C++
Description
A tool can query the OpenMP state of a thread at any time. If a tool queries the state of a thread that is not associated with OpenMP then the implementation reports the state as ompt_state_undefined.
The value ompt_state_work_serial indicates that the thread is executing code outside all parallel regions.
The value ompt_state_work_parallel indicates that the thread is executing code within the scope of a parallel region.
The value ompt_state_work_reduction indicates that the thread is combining partial reduction results from threads in its team. An OpenMP implementation may never report a thread in this state; a thread that is combining partial reduction results may have its state reported as ompt_state_work_parallel or ompt_state_overhead.
The value ompt_state_wait_barrier indicates that the thread is waiting at either an implicit or explicit barrier. An implementation may never report a thread in this state; instead, a thread may have its state reported as ompt_state_wait_barrier_implicit or ompt_state_wait_barrier_explicit, as appropriate.
The value ompt_state_wait_barrier_implicit indicates that the thread is waiting at an implicit barrier in a parallel region. An OpenMP implementation may report ompt_state_wait_barrier for implicit barriers.
The value ompt_state_wait_barrier_implicit_parallel indicates that the thread is waiting at an implicit barrier at the end of a parallel region. An OpenMP implementation may report ompt_state_wait_barrier or ompt_state_wait_barrier_implicit for these barriers.
The value ompt_state_wait_barrier_implicit_workshare indicates that the thread is waiting at an implicit barrier at the end of a worksharing construct. An OpenMP implementation may report ompt_state_wait_barrier or ompt_state_wait_barrier_implicit for these barriers.
The value ompt_state_wait_barrier_explicit indicates that the thread is waiting in a barrier region. An OpenMP implementation may report ompt_state_wait_barrier for these barriers.
The value ompt_state_wait_taskwait indicates that the thread is waiting at a taskwait construct.
The value ompt_state_wait_taskgroup indicates that the thread is waiting at the end of a taskgroup construct.
The value ompt_state_wait_mutex indicates that the thread is waiting for a mutex of an unspecified type.
The value ompt_state_wait_lock indicates that the thread is waiting for a lock or nestable lock.
The value ompt_state_wait_critical indicates that the thread is waiting to enter a critical region.
The value ompt_state_wait_atomic indicates that the thread is waiting to enter an atomic region.
The value ompt_state_wait_ordered indicates that the thread is waiting to enter an ordered region.
The value ompt_state_wait_target indicates that the thread is waiting for a target region to complete.
The value ompt_state_wait_target_map indicates that the thread is waiting for a target data mapping operation to complete. An implementation may report ompt_state_wait_target for target data constructs.
The value ompt_state_wait_target_update indicates that the thread is waiting for a target update operation to complete. An implementation may report ompt_state_wait_target for target update constructs.
The value ompt_state_idle indicates that the thread is idle, that is, it is not part of an OpenMP team.
The value ompt_state_overhead indicates that the thread is in the overhead state at any point while executing within the OpenMP runtime, except while waiting at a synchronization point.
The value ompt_state_undefined indicates that the native thread is not created by the OpenMP implementation.
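Because an implementation may report a generic state in place of a more specific one, a tool that aggregates samples may want to fold the specific states into their generic counterparts. The sketch below shows one way to do so. The helper names coarsen_state and is_implementation_defined_state are hypothetical, the folding policy is a choice of this example rather than a requirement, and the local enumeration mirrors only the values that the sketch uses.

```c
#include <assert.h>

/* Local mirror of a subset of the enumeration shown above. */
typedef enum ompt_state_t {
  ompt_state_work_parallel = 0x001,
  ompt_state_wait_barrier = 0x010,
  ompt_state_wait_barrier_implicit_parallel = 0x011,
  ompt_state_wait_barrier_implicit_workshare = 0x012,
  ompt_state_wait_barrier_implicit = 0x013,
  ompt_state_wait_barrier_explicit = 0x014,
  ompt_state_wait_target = 0x080,
  ompt_state_wait_target_map = 0x081,
  ompt_state_wait_target_update = 0x082,
  ompt_state_idle = 0x100,
  ompt_state_undefined = 0x102
} ompt_state_t;

/* Non-zero for implementation-defined states, which start at 512. */
int is_implementation_defined_state(int state) {
  assert(state >= 0);
  return state >= 512;
}

/* Folds each specific barrier or target wait state into the generic state
 * that an implementation is allowed to report in its place. Any other
 * state, including implementation-defined ones, is returned unchanged. */
int coarsen_state(int state) {
  switch (state) {
  case ompt_state_wait_barrier_implicit_parallel:
  case ompt_state_wait_barrier_implicit_workshare:
  case ompt_state_wait_barrier_implicit:
  case ompt_state_wait_barrier_explicit:
    return ompt_state_wait_barrier;
  case ompt_state_wait_target_map:
  case ompt_state_wait_target_update:
    return ompt_state_wait_target;
  default:
    return state;
  }
}
```

Folding before aggregation keeps profiles comparable across implementations that differ in how precisely they report barrier and target waits.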
4.4.4.27 ompt_frame_t
Summary
The ompt_frame_t type describes procedure frame information for an OpenMP task.
Format
C / C++
typedef struct ompt_frame_t {
  ompt_data_t exit_frame;
  ompt_data_t enter_frame;
  int exit_frame_flags;
  int enter_frame_flags;
} ompt_frame_t;
C / C++
Description
Each ompt_frame_t object is associated with the task to which the procedure frames belong. Each non-merged initial, implicit, explicit, or target task with one or more frames on the stack of a native thread has an associated ompt_frame_t object.
The exit_frame field of an ompt_frame_t object contains information to identify the first procedure frame executing the task region. The exit_frame for the ompt_frame_t object associated with the initial task that is not nested inside any OpenMP construct is NULL.
The enter_frame field of an ompt_frame_t object contains information to identify the latest still active procedure frame executing the task region before entering the OpenMP runtime implementation or before executing a different task. If a task with frames on the stack has not been suspended, the value of enter_frame for the ompt_frame_t object associated with the task may contain NULL.
For exit_frame, the exit_frame_flags and, for enter_frame, the enter_frame_flags field indicates that the provided frame information points to a runtime or an application frame address. The same fields also specify the kind of information that is provided to identify the frame. These fields are a disjunction of values in the ompt_frame_flag_t enumeration type.
The lifetime of an ompt_frame_t object begins when a task is created and ends when the task is destroyed. Tools should not assume that a frame structure remains at a constant location in memory throughout the lifetime of the task. A pointer to an ompt_frame_t object is passed to some callbacks; a pointer to the ompt_frame_t object of a task can also be retrieved by a tool at any time, including in a signal handler, by invoking the ompt_get_task_info runtime entry point (described in Section 4.6.1.14). A pointer to an ompt_frame_t object that a tool retrieved is valid as long as the tool does not pass back control to the OpenMP implementation.
Note – A monitoring tool that uses asynchronous sampling can observe values of exit_frame and enter_frame at inconvenient times. Tools must be prepared to handle ompt_frame_t objects observed just prior to when their field values will be set or cleared.
4.4.4.28 ompt_frame_flag_t
Summary
The ompt_frame_flag_t enumeration type defines valid frame information flags.
Format
C / C++
typedef enum ompt_frame_flag_t {
  ompt_frame_runtime = 0x00,
  ompt_frame_application = 0x01,
  ompt_frame_cfa = 0x10,
  ompt_frame_framepointer = 0x20,
  ompt_frame_stackaddress = 0x30
} ompt_frame_flag_t;
C / C++
Description
The value ompt_frame_runtime of the ompt_frame_flag_t type indicates that a frame address is a procedure frame in the OpenMP runtime implementation. The value ompt_frame_application of the ompt_frame_flag_t type indicates that an exit frame address is a procedure frame in the OpenMP application.
Higher order bits indicate the kind of provided information that is unique for the particular frame pointer. The value ompt_frame_cfa indicates that a frame address specifies a canonical frame address. The value ompt_frame_framepointer indicates that a frame address provides the value of the frame pointer register. The value ompt_frame_stackaddress indicates that a frame address specifies a pointer address that is contained in the current stack frame.
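A flags word from exit_frame_flags or enter_frame_flags combines an owner value in the low bits with a kind value in the higher bits. The sketch below separates the two. The helper names and the two masks are hypothetical: the text does not define mask constants, so the nibble boundaries used here are an assumption inferred from the listed values.

```c
#include <assert.h>

/* Local mirror of the enumeration shown above. */
typedef enum ompt_frame_flag_t {
  ompt_frame_runtime = 0x00,
  ompt_frame_application = 0x01,
  ompt_frame_cfa = 0x10,
  ompt_frame_framepointer = 0x20,
  ompt_frame_stackaddress = 0x30
} ompt_frame_flag_t;

/* Hypothetical masks, inferred from the listed values. */
#define FRAME_OWNER_MASK 0x0F /* runtime or application */
#define FRAME_KIND_MASK 0xF0  /* cfa, framepointer or stackaddress */

/* Non-zero if the frame address belongs to the application. */
int frame_is_application(int flags) {
  assert(flags >= 0);
  return (flags & FRAME_OWNER_MASK) == ompt_frame_application;
}

/* Returns the kind value stored in the higher-order bits. */
int frame_kind(int flags) {
  assert(flags >= 0);
  return flags & FRAME_KIND_MASK;
}

/* Printable name for the kind stored in a flags word. */
const char *frame_kind_name(int flags) {
  switch (frame_kind(flags)) {
  case ompt_frame_cfa: return "cfa";
  case ompt_frame_framepointer: return "framepointer";
  case ompt_frame_stackaddress: return "stackaddress";
  default: return "unspecified";
  }
}
```

Note that the kind values are not independent bits: ompt_frame_stackaddress (0x30) overlaps both ompt_frame_cfa and ompt_frame_framepointer, so the kind is compared as a whole rather than tested bit by bit.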
4.4.4.29 ompt_wait_id_t
Summary
The ompt_wait_id_t type describes wait identifiers for an OpenMP thread.
Format
C / C++
typedef uint64_t ompt_wait_id_t;
C / C++
Description
Each thread maintains a wait identifier of type ompt_wait_id_t. When a task that a thread executes is waiting for mutual exclusion, the wait identifier of the thread indicates the reason that the thread is waiting. A wait identifier may represent a critical section name, a lock, a program variable accessed in an atomic region, or a synchronization object that is internal to an OpenMP implementation. When a thread is not in a wait state then the value of the wait identifier of the thread is undefined.
ompt_wait_id_none is defined as an instance of type ompt_wait_id_t with the value 0.
4.5 OMPT Tool Callback Signatures and Trace Records
The C/C++ header file (omp-tools.h) provides the definitions of the types that are specified throughout this subsection.
Restrictions
• Tool callbacks may not use OpenMP directives or call any runtime library routines described in Section 3.
4.5.1 Initialization and Finalization Callback Signature
4.5.1.1 ompt_initialize_t
Summary
A callback with type signature ompt_initialize_t initializes use of the OMPT interface.
Format
C / C++
typedef int (*ompt_initialize_t) (
  ompt_function_lookup_t lookup,
  int initial_device_num,
  ompt_data_t *tool_data
);
C / C++
Description
To use the OMPT interface, an implementation of ompt_start_tool must return a non-null pointer to an ompt_start_tool_result_t structure that contains a non-null pointer to a tool initializer with type signature ompt_initialize_t. An OpenMP implementation will call the initializer after fully initializing itself but before beginning execution of any OpenMP construct or completing execution of any environment routine invocation.
The initializer returns a non-zero value if it succeeds.
Description of Arguments
The lookup argument is a callback to an OpenMP runtime routine that must be used to obtain a pointer to each runtime entry point in the OMPT interface. The initial_device_num argument provides the value of omp_get_initial_device(). The tool_data argument is a pointer to the tool_data field in the ompt_start_tool_result_t structure that ompt_start_tool returned. The expected actions of an initializer are described in Section 4.2.3.
Cross References
• omp_get_initial_device routine, see Section 3.2.41 on page 376.
• ompt_start_tool function, see Section 4.2.1 on page 420.
• ompt_start_tool_result_t type, see Section 4.4.1 on page 433.
• ompt_data_t type, see Section 4.4.4.4 on page 440.
• ompt_function_lookup_t type, see Section 4.6.3 on page 531.
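The following is a minimal sketch of an initializer with the ompt_initialize_t signature, not a definitive implementation. The type declarations are stand-ins that mirror omp-tools.h so that the fragment compiles on its own. The names my_tool_initialize, saved_set_callback, fake_lookup, and fake_entry_point are hypothetical, and storing the device number in tool_data is only one possible use of that field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in declarations mirroring omp-tools.h so that the sketch compiles on
 * its own; a real tool would include <omp-tools.h> instead. */
typedef union ompt_data_t { uint64_t value; void *ptr; } ompt_data_t;
typedef void (*ompt_interface_fn_t)(void);
typedef ompt_interface_fn_t (*ompt_function_lookup_t)(
    const char *interface_function_name);

/* Hypothetical tool state: the entry point saved for later use. */
static ompt_interface_fn_t saved_set_callback = NULL;

/* Tool initializer with the ompt_initialize_t signature. */
static int my_tool_initialize(ompt_function_lookup_t lookup,
                              int initial_device_num,
                              ompt_data_t *tool_data)
{
    assert(lookup != NULL && tool_data != NULL);

    /* Every runtime entry point is obtained by name through lookup. */
    saved_set_callback = lookup("ompt_set_callback");
    if (saved_set_callback == NULL)
        return 0;                 /* zero: the initializer did not succeed */

    /* Hypothetical use of tool_data: remember the initial device number. */
    tool_data->value = (uint64_t)initial_device_num;
    return 1;                     /* non-zero: success */
}

/* Hypothetical stand-ins for the runtime side, used only to exercise the
 * initializer without an OpenMP runtime. */
static void fake_entry_point(void) { }

static ompt_interface_fn_t fake_lookup(const char *name)
{
    return strcmp(name, "ompt_set_callback") == 0 ? fake_entry_point : NULL;
}
```

In a real tool, the pointer that lookup returns would be cast to the type of the corresponding entry point before it is called.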
4.5.1.2 ompt_finalize_t
Summary
A tool implements a finalizer with the type signature ompt_finalize_t to finalize the tool’s use of the OMPT interface.
Format
C / C++
typedef void (*ompt_finalize_t) (
    ompt_data_t *tool_data
);
C / C++
Description
To use the OMPT interface, an implementation of ompt_start_tool must return a non-null pointer to an ompt_start_tool_result_t structure that contains a non-null pointer to a tool finalizer with type signature ompt_finalize_t. An OpenMP implementation will call the tool finalizer after the last OMPT event as the OpenMP implementation shuts down.
Description of Arguments
The tool_data argument is a pointer to the tool_data field in the ompt_start_tool_result_t structure returned by ompt_start_tool.
Cross References
• ompt_start_tool function, see Section 4.2.1 on page 420.
• ompt_start_tool_result_t type, see Section 4.4.1 on page 433.
• ompt_data_t type, see Section 4.4.4.4 on page 440.
4.5.2 Event Callback Signatures and Trace Records
This section describes the signatures of tool callback functions that an OMPT tool may register and that are called during runtime of an OpenMP program. An implementation may also provide a trace of events per device. Along with the callbacks, the following defines standard trace records. For the trace records, tool data arguments are replaced by an ID, which must be initialized by the OpenMP implementation. Each of parallel_id, task_id, and thread_id must be unique per target region. Tool implementations of callbacks are not required to be async signal safe.
Cross References
• ompt_id_t type, see Section 4.4.4.3 on page 439.
• ompt_data_t type, see Section 4.4.4.4 on page 440.
4.5.2.1 ompt_callback_thread_begin_t
Summary
The ompt_callback_thread_begin_t type is used for callbacks that are dispatched when native threads are created.
Format
C / C++
typedef void (*ompt_callback_thread_begin_t) (
    ompt_thread_t thread_type,
    ompt_data_t *thread_data
);
C / C++
Trace Record
C / C++
typedef struct ompt_record_thread_begin_t {
    ompt_thread_t thread_type;
} ompt_record_thread_begin_t;
C / C++
Description of Arguments
The thread_type argument indicates the type of the new thread: initial, worker, or other. The binding of the thread_data argument is the new thread.
Cross References
• parallel construct, see Section 2.6 on page 74.
• teams construct, see Section 2.7 on page 82.
• Initial task, see Section 2.10.5 on page 148.
• ompt_data_t type, see Section 4.4.4.4 on page 440.
• ompt_thread_t type, see Section 4.4.4.10 on page 443.
4.5.2.2 ompt_callback_thread_end_t
Summary
The ompt_callback_thread_end_t type is used for callbacks that are dispatched when native threads are destroyed.
Format
C / C++
typedef void (*ompt_callback_thread_end_t) (
    ompt_data_t *thread_data
);
C / C++
Description of Arguments
The binding of the thread_data argument is the thread that will be destroyed.
Cross References
• parallel construct, see Section 2.6 on page 74.
• teams construct, see Section 2.7 on page 82.
• Initial task, see Section 2.10.5 on page 148.
• ompt_record_ompt_t type, see Section 4.4.3.4 on page 436.
• ompt_data_t type, see Section 4.4.4.4 on page 440.
4.5.2.3 ompt_callback_parallel_begin_t
Summary
The ompt_callback_parallel_begin_t type is used for callbacks that are dispatched when parallel and teams regions start.
Format
C / C++
typedef void (*ompt_callback_parallel_begin_t) (
    ompt_data_t *encountering_task_data,
    const ompt_frame_t *encountering_task_frame,
    ompt_data_t *parallel_data,
    unsigned int requested_parallelism,
    int flags,
    const void *codeptr_ra
);
C / C++
Trace Record
C / C++
typedef struct ompt_record_parallel_begin_t {
    ompt_id_t encountering_task_id;
    ompt_id_t parallel_id;
    unsigned int requested_parallelism;
    int flags;
    const void *codeptr_ra;
} ompt_record_parallel_begin_t;
C / C++
Description of Arguments
The binding of the encountering_task_data argument is the encountering task.
The encountering_task_frame argument points to the frame object that is associated with the encountering task.
The binding of the parallel_data argument is the parallel or teams region that is beginning.
The requested_parallelism argument indicates the number of threads or teams that the user requested.
The flags argument indicates whether the code for the region is inlined into the application or invoked by the runtime and also whether the region is a parallel or teams region. Valid values for flags are a disjunction of elements in the enum ompt_parallel_flag_t.
The codeptr_ra argument relates the implementation of an OpenMP region to its source code. If a runtime routine implements the region associated with a callback that has type signature ompt_callback_parallel_begin_t then codeptr_ra contains the return address of the call to that runtime routine. If the implementation of the region is inlined then codeptr_ra contains the return address of the invocation of the callback. If attribution to source code is impossible or inappropriate, codeptr_ra may be NULL.
Cross References
• parallel construct, see Section 2.6 on page 74.
• teams construct, see Section 2.7 on page 82.
• ompt_data_t type, see Section 4.4.4.4 on page 440.
• ompt_parallel_flag_t type, see Section 4.4.4.21 on page 448.
• ompt_frame_t type, see Section 4.4.4.27 on page 454.
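The flags encoding described above can be decoded as in the following sketch. The four macros are stand-ins for the ompt_parallel_flag_t enumerators of omp-tools.h, written as unsigned constants so that the fragment is portable on its own. The helper names parallel_region_kind and parallel_invoker are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Stand-ins for the ompt_parallel_flag_t enumerators from omp-tools.h. */
#define ompt_parallel_invoker_program 0x00000001u
#define ompt_parallel_invoker_runtime 0x00000002u
#define ompt_parallel_league          0x40000000u
#define ompt_parallel_team            0x80000000u

/* Hypothetical helper: which construct created the region. */
static const char *parallel_region_kind(int flags)
{
    unsigned int f = (unsigned int)flags;
    if (f & ompt_parallel_league) return "teams";
    if (f & ompt_parallel_team)   return "parallel";
    return "unknown";
}

/* Hypothetical helper: who invokes the code of the region. */
static const char *parallel_invoker(int flags)
{
    unsigned int f = (unsigned int)flags;
    if (f & ompt_parallel_invoker_program) return "program";
    if (f & ompt_parallel_invoker_runtime) return "runtime";
    return "unknown";
}
```

Both helpers test individual bits because valid values of flags are a disjunction of the enumerators.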
4.5.2.4 ompt_callback_parallel_end_t
Summary
The ompt_callback_parallel_end_t type is used for callbacks that are dispatched when parallel and teams regions end.
Format
C / C++
typedef void (*ompt_callback_parallel_end_t) (
    ompt_data_t *parallel_data,
    ompt_data_t *encountering_task_data,
    int flags,
    const void *codeptr_ra
);
C / C++
Trace Record
C / C++
typedef struct ompt_record_parallel_end_t {
    ompt_id_t parallel_id;
    ompt_id_t encountering_task_id;
    int flags;
    const void *codeptr_ra;
} ompt_record_parallel_end_t;
C / C++
Description of Arguments
The binding of the parallel_data argument is the parallel or teams region that is ending.
The binding of the encountering_task_data argument is the encountering task.
The flags argument indicates whether the execution of the region is inlined into the application or invoked by the runtime and also whether it is a parallel or teams region. Values for flags are a disjunction of elements in the enum ompt_parallel_flag_t.
The codeptr_ra argument relates the implementation of an OpenMP region to its source code. If a runtime routine implements the region associated with a callback that has type signature ompt_callback_parallel_end_t then codeptr_ra contains the return address of the call to that runtime routine. If the implementation of the region is inlined then codeptr_ra contains the return address of the invocation of the callback. If attribution to source code is impossible or inappropriate, codeptr_ra may be NULL.
Cross References
• parallel construct, see Section 2.6 on page 74.
• teams construct, see Section 2.7 on page 82.
• ompt_data_t type, see Section 4.4.4.4 on page 440.
• ompt_parallel_flag_t type, see Section 4.4.4.21 on page 448.
4.5.2.5 ompt_callback_work_t
Summary
The ompt_callback_work_t type is used for callbacks that are dispatched when worksharing regions, loop-related regions, and taskloop regions begin and end.
Format
C / C++
typedef void (*ompt_callback_work_t) (
    ompt_work_t wstype,
    ompt_scope_endpoint_t endpoint,
    ompt_data_t *parallel_data,
    ompt_data_t *task_data,
    uint64_t count,
    const void *codeptr_ra
);
C / C++
Trace Record
C / C++
typedef struct ompt_record_work_t {
    ompt_work_t wstype;
    ompt_scope_endpoint_t endpoint;
    ompt_id_t parallel_id;
    ompt_id_t task_id;
    uint64_t count;
    const void *codeptr_ra;
} ompt_record_work_t;
C / C++
Description of Arguments
The wstype argument indicates the kind of region.
The endpoint argument indicates that the callback signals the beginning of a scope or the end of a scope.
The binding of the parallel_data argument is the current parallel region.
The binding of the task_data argument is the current task.
The count argument is a measure of the quantity of work involved in the construct. For a worksharing-loop construct, count represents the number of iterations of the loop. For a taskloop construct, count represents the number of iterations in the iteration space, which may be the result of collapsing several associated loops. For a sections construct, count represents the number of sections. For a workshare construct, count represents the units of work, as defined by the workshare construct. For a single construct, count is always 1. When the endpoint argument signals the end of a scope, a count value of 0 indicates that the actual count value is not available.
The codeptr_ra argument relates the implementation of an OpenMP region to its source code. If a runtime routine implements the region associated with a callback that has type signature ompt_callback_work_t then codeptr_ra contains the return address of the call to that runtime routine. If the implementation of the region is inlined then codeptr_ra contains the return address of the invocation of the callback. If attribution to source code is impossible or inappropriate, codeptr_ra may be NULL.
Cross References
• Worksharing constructs, see Section 2.8 on page 86 and Section 2.9.2 on page 101.
• Loop-related constructs, see Section 2.9 on page 95.
• taskloop construct, see Section 2.10.2 on page 140.
• ompt_data_t type, see Section 4.4.4.4 on page 440.
• ompt_scope_endpoint_t type, see Section 4.4.4.11 on page 443.
• ompt_work_t type, see Section 4.4.4.15 on page 445.
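The rule for the count argument at the end of a scope can be sketched as follows. The enumerations are stand-ins that mirror omp-tools.h. The names work_count_is_known and on_work are hypothetical, and writing to stderr is only a placeholder for whatever a tool would actually record.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in declarations mirroring omp-tools.h; a real tool would include
 * <omp-tools.h> instead. */
typedef union ompt_data_t { uint64_t value; void *ptr; } ompt_data_t;
typedef enum ompt_scope_endpoint_t {
    ompt_scope_begin = 1,
    ompt_scope_end   = 2
} ompt_scope_endpoint_t;
typedef enum ompt_work_t {
    ompt_work_loop            = 1,
    ompt_work_sections        = 2,
    ompt_work_single_executor = 3,
    ompt_work_single_other    = 4,
    ompt_work_workshare       = 5,
    ompt_work_distribute      = 6,
    ompt_work_taskloop        = 7
} ompt_work_t;

/* Hypothetical helper: at the end of a scope, 0 means "count not available". */
static int work_count_is_known(ompt_scope_endpoint_t endpoint, uint64_t count)
{
    return !(endpoint == ompt_scope_end && count == 0);
}

/* Matches ompt_callback_work_t. */
static void on_work(ompt_work_t wstype, ompt_scope_endpoint_t endpoint,
                    ompt_data_t *parallel_data, ompt_data_t *task_data,
                    uint64_t count, const void *codeptr_ra)
{
    (void)parallel_data;
    (void)task_data;
    if (work_count_is_known(endpoint, count))
        fprintf(stderr, "work kind %d, endpoint %d, count %" PRIu64
                ", codeptr_ra %p\n",
                (int)wstype, (int)endpoint, count, (void *)codeptr_ra);
    else
        fprintf(stderr, "work kind %d ended, count not available\n",
                (int)wstype);
}
```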
4.5.2.6 ompt_callback_dispatch_t
Summary
The ompt_callback_dispatch_t type is used for callbacks that are dispatched when a thread begins to execute a section or loop iteration.
Format
C / C++
typedef void (*ompt_callback_dispatch_t) (
    ompt_data_t *parallel_data,
    ompt_data_t *task_data,
    ompt_dispatch_t kind,
    ompt_data_t instance
);
C / C++
Trace Record
C / C++
typedef struct ompt_record_dispatch_t {
    ompt_id_t parallel_id;
    ompt_id_t task_id;
    ompt_dispatch_t kind;
    ompt_data_t instance;
} ompt_record_dispatch_t;
C / C++
Description of Arguments
The binding of the parallel_data argument is the current parallel region.
The binding of the task_data argument is the implicit task that executes the structured block of the parallel region.
The kind argument indicates whether a loop iteration or a section is being dispatched.
For a loop iteration, the instance.value argument contains the iteration variable value. For a structured block in the sections construct, instance.ptr contains a code address that identifies the structured block. In cases where a runtime routine implements the structured block associated with this callback, instance.ptr contains the return address of the call to the runtime routine. In cases where the implementation of the structured block is inlined, instance.ptr contains the return address of the invocation of this callback.
Cross References
• sections and section constructs, see Section 2.8.1 on page 86.
• Worksharing-loop construct, see Section 2.9.2 on page 101.
• taskloop construct, see Section 2.10.2 on page 140.
• ompt_data_t type, see Section 4.4.4.4 on page 440.
• ompt_dispatch_t type, see Section 4.4.4.12 on page 444.
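Because the instance argument is a union, the kind argument decides which member is meaningful. The sketch below shows one way to read it, assuming stand-in declarations that mirror omp-tools.h. The helper describe_dispatch is hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in declarations mirroring omp-tools.h; a real tool would include
 * <omp-tools.h> instead. */
typedef union ompt_data_t { uint64_t value; void *ptr; } ompt_data_t;
typedef enum ompt_dispatch_t {
    ompt_dispatch_iteration = 1,
    ompt_dispatch_section   = 2
} ompt_dispatch_t;

/* Hypothetical helper: formats a description of the dispatched unit of work
 * into buf and returns the value that snprintf returned. */
static int describe_dispatch(ompt_dispatch_t kind, ompt_data_t instance,
                             char *buf, size_t len)
{
    switch (kind) {
    case ompt_dispatch_iteration:
        /* For a loop iteration, instance.value is the iteration variable. */
        return snprintf(buf, len, "iteration %" PRIu64, instance.value);
    case ompt_dispatch_section:
        /* For a section, instance.ptr is a code address. */
        return snprintf(buf, len, "section at %p", instance.ptr);
    default:
        return snprintf(buf, len, "unknown dispatch kind");
    }
}
```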
4.5.2.7 ompt_callback_task_create_t
Summary
The ompt_callback_task_create_t type is used for callbacks that are dispatched when task regions or initial tasks are generated.
Format
C / C++
typedef void (*ompt_callback_task_create_t) (
    ompt_data_t *encountering_task_data,
    const ompt_frame_t *encountering_task_frame,
    ompt_data_t *new_task_data,
    int flags,
    int has_dependences,
    const void *codeptr_ra
);
C / C++
Trace Record
C / C++
typedef struct ompt_record_task_create_t {
    ompt_id_t encountering_task_id;
    ompt_id_t new_task_id;
    int flags;
    int has_dependences;
    const void *codeptr_ra;
} ompt_record_task_create_t;
C / C++
Description of Arguments
The binding of the encountering_task_data argument is the encountering task. This argument is NULL for an initial task.
The encountering_task_frame argument points to the frame object associated with the encountering task. This argument is NULL for an initial task.
The binding of the new_task_data argument is the generated task.
The flags argument indicates the kind of the task (initial, explicit, or target) that is generated. Values for flags are a disjunction of elements in the ompt_task_flag_t enumeration type.
The has_dependences argument is true if the generated task has dependences and false otherwise.
The codeptr_ra argument relates the implementation of an OpenMP region to its source code. If a runtime routine implements the region associated with a callback that has type signature ompt_callback_task_create_t then codeptr_ra contains the return address of the call to that runtime routine. If the implementation of the region is inlined then codeptr_ra contains the return address of the invocation of the callback. If attribution to source code is impossible or inappropriate, codeptr_ra may be NULL.
Cross References
• task construct, see Section 2.10.1 on page 135.
• Initial task, see Section 2.10.5 on page 148.
• ompt_data_t type, see Section 4.4.4.4 on page 440.
• ompt_task_flag_t type, see Section 4.4.4.18 on page 446.
• ompt_frame_t type, see Section 4.4.4.27 on page 454.
4.5.2.8 ompt_callback_dependences_t
Summary
The ompt_callback_dependences_t type is used for callbacks that are related to dependences and that are dispatched when new tasks are generated and when ordered constructs are encountered.
Format
C / C++
typedef void (*ompt_callback_dependences_t) (
    ompt_data_t *task_data,
    const ompt_dependence_t *deps,
    int ndeps
);
C / C++
Trace Record
C / C++
typedef struct ompt_record_dependences_t {
    ompt_id_t task_id;
    ompt_dependence_t dep;
    int ndeps;
} ompt_record_dependences_t;
C / C++
Description of Arguments
The binding of the task_data argument is the generated task.
The deps argument lists dependences of the new task or the dependence vector of the ordered construct.
The ndeps argument specifies the length of the list passed by the deps argument. The memory for deps is owned by the caller; the tool cannot rely on the data after the callback returns.
The performance monitor interface for tracing activity on target devices provides one record per dependence.
Cross References
• ordered construct, see Section 2.17.9 on page 250.
• depend clause, see Section 2.17.11 on page 255.
• ompt_data_t type, see Section 4.4.4.4 on page 440.
• ompt_dependence_t type, see Section 4.4.4.9 on page 442.
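Since the memory for deps belongs to the caller, a tool that needs the dependences after the callback returns has to copy them. The sketch below shows one way to do so, assuming stand-in declarations that mirror omp-tools.h. The saved_deps_t structure and the on_dependences name are hypothetical, and the sketch further assumes that the tool has not already stored something else in task_data.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in declarations mirroring omp-tools.h; a real tool would include
 * <omp-tools.h> instead. */
typedef union ompt_data_t { uint64_t value; void *ptr; } ompt_data_t;
typedef enum ompt_dependence_type_t {
    ompt_dependence_type_in            = 1,
    ompt_dependence_type_out           = 2,
    ompt_dependence_type_inout         = 3,
    ompt_dependence_type_mutexinoutset = 4,
    ompt_dependence_type_source        = 5,
    ompt_dependence_type_sink          = 6
} ompt_dependence_type_t;
typedef struct ompt_dependence_t {
    ompt_data_t variable;
    ompt_dependence_type_t dependence_type;
} ompt_dependence_t;

/* Hypothetical tool-owned copy of a dependence list. */
typedef struct saved_deps_t {
    ompt_dependence_t *deps;
    int ndeps;
} saved_deps_t;

/* Matches ompt_callback_dependences_t. */
static void on_dependences(ompt_data_t *task_data,
                           const ompt_dependence_t *deps, int ndeps)
{
    saved_deps_t *saved = malloc(sizeof *saved);
    if (saved == NULL)
        return;
    saved->ndeps = ndeps > 0 ? ndeps : 0;
    saved->deps = NULL;
    if (saved->ndeps > 0) {
        size_t bytes = (size_t)saved->ndeps * sizeof *saved->deps;
        saved->deps = malloc(bytes);
        if (saved->deps == NULL) {
            free(saved);
            return;
        }
        memcpy(saved->deps, deps, bytes);   /* copy before the caller reuses deps */
    }
    task_data->ptr = saved;                 /* tool frees this when the task ends */
}
```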
4.5.2.9 ompt_callback_task_dependence_t
Summary
The ompt_callback_task_dependence_t type is used for callbacks that are dispatched when unfulfilled task dependences are encountered.
Format
C / C++
typedef void (*ompt_callback_task_dependence_t) (
    ompt_data_t *src_task_data,
    ompt_data_t *sink_task_data
);
C / C++
Trace Record
C / C++
typedef struct ompt_record_task_dependence_t {
    ompt_id_t src_task_id;
    ompt_id_t sink_task_id;
} ompt_record_task_dependence_t;
C / C++
Description of Arguments
The binding of the src_task_data argument is a running task with an outgoing dependence.
The binding of the sink_task_data argument is a task with an unsatisfied incoming dependence.
Cross References
• depend clause, see Section 2.17.11 on page 255.
• ompt_data_t type, see Section 4.4.4.4 on page 440.
4.5.2.10 ompt_callback_task_schedule_t
Summary
The ompt_callback_task_schedule_t type is used for callbacks that are dispatched when task scheduling decisions are made.
Format
C / C++
typedef void (*ompt_callback_task_schedule_t) (
    ompt_data_t *prior_task_data,
    ompt_task_status_t prior_task_status,
    ompt_data_t *next_task_data
);
C / C++
Trace Record
C / C++
typedef struct ompt_record_task_schedule_t {
    ompt_id_t prior_task_id;
    ompt_task_status_t prior_task_status;
    ompt_id_t next_task_id;
} ompt_record_task_schedule_t;
C / C++
Description of Arguments
The prior_task_status argument indicates the status of the task that arrived at a task scheduling point.
The binding of the prior_task_data argument is the task that arrived at the scheduling point.
The binding of the next_task_data argument is the task that is resumed at the scheduling point. This argument is NULL if the callback is dispatched for a task-fulfill event.
Cross References
• Task scheduling, see Section 2.10.6 on page 149.
• ompt_data_t type, see Section 4.4.4.4 on page 440.
• ompt_task_status_t type, see Section 4.4.4.19 on page 447.
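A task-schedule callback has to tolerate a NULL next_task_data argument, as the description above notes for task-fulfill events. The following sketch illustrates this under the assumption of stand-in declarations that mirror omp-tools.h. The counters tasks_completed and tasks_resumed and the name on_task_schedule are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in declarations mirroring omp-tools.h; a real tool would include
 * <omp-tools.h> instead. */
typedef union ompt_data_t { uint64_t value; void *ptr; } ompt_data_t;
typedef enum ompt_task_status_t {
    ompt_task_complete      = 1,
    ompt_task_yield         = 2,
    ompt_task_cancel        = 3,
    ompt_task_detach        = 4,
    ompt_task_early_fulfill = 5,
    ompt_task_late_fulfill  = 6,
    ompt_task_switch        = 7
} ompt_task_status_t;

/* Hypothetical tool counters; atomic because callbacks may run concurrently. */
static atomic_ulong tasks_completed;
static atomic_ulong tasks_resumed;

/* Matches ompt_callback_task_schedule_t. */
static void on_task_schedule(ompt_data_t *prior_task_data,
                             ompt_task_status_t prior_task_status,
                             ompt_data_t *next_task_data)
{
    (void)prior_task_data;
    if (prior_task_status == ompt_task_complete)
        atomic_fetch_add(&tasks_completed, 1);
    /* next_task_data is NULL for a task-fulfill event: nothing is resumed. */
    if (next_task_data != NULL)
        atomic_fetch_add(&tasks_resumed, 1);
}
```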
4.5.2.11 ompt_callback_implicit_task_t
Summary
The ompt_callback_implicit_task_t type is used for callbacks that are dispatched when initial tasks and implicit tasks are generated and completed.
Format
C / C++
typedef void (*ompt_callback_implicit_task_t) (
    ompt_scope_endpoint_t endpoint,
    ompt_data_t *parallel_data,
    ompt_data_t *task_data,
    unsigned int actual_parallelism,
    unsigned int index,
    int flags
);
C / C++
Trace Record
C / C++
typedef struct ompt_record_implicit_task_t {
    ompt_scope_endpoint_t endpoint;
    ompt_id_t parallel_id;
    ompt_id_t task_id;
    unsigned int actual_parallelism;
    unsigned int index;
    int flags;
} ompt_record_implicit_task_t;
C / C++
Description of Arguments
The endpoint argument indicates that the callback signals the beginning of a scope or the end of a scope.
The binding of the parallel_data argument is the current parallel region. For the implicit-task-end event, this argument is NULL.
The binding of the task_data argument is the implicit task that executes the structured block of the parallel region.
The actual_parallelism argument indicates the number of threads in the parallel region or the number of teams in the teams region. For initial tasks that are not closely nested in a teams construct, this argument is 1. For the implicit-task-end and the initial-task-end events, this argument is 0.
The index argument indicates the thread number or team number of the calling thread, within the team or league that is executing the parallel or teams region to which the implicit task region binds. For initial tasks that are not created by a teams construct, this argument is 1.
The flags argument indicates the kind of the task (initial or implicit).
Cross References
• parallel construct, see Section 2.6 on page 74.
• teams construct, see Section 2.7 on page 82.
• ompt_data_t type, see Section 4.4.4.4 on page 440.
• ompt_scope_endpoint_t enumeration type, see Section 4.4.4.11 on page 443.
4.5.2.12 ompt_callback_master_t
Summary
The ompt_callback_master_t type is used for callbacks that are dispatched when master regions start and end.
Format
C / C++
typedef void (*ompt_callback_master_t) (
    ompt_scope_endpoint_t endpoint,
    ompt_data_t *parallel_data,
    ompt_data_t *task_data,
    const void *codeptr_ra
);
C / C++
Trace Record
C / C++
typedef struct ompt_record_master_t {
    ompt_scope_endpoint_t endpoint;
    ompt_id_t parallel_id;
    ompt_id_t task_id;
    const void *codeptr_ra;
} ompt_record_master_t;
C / C++
Description of Arguments
The endpoint argument indicates that the callback signals the beginning of a scope or the end of a scope.
The binding of the parallel_data argument is the current parallel region.
The binding of the task_data argument is the encountering task.
The codeptr_ra argument relates the implementation of an OpenMP region to its source code. If a runtime routine implements the region associated with a callback that has type signature ompt_callback_master_t then codeptr_ra contains the return address of the call to that runtime routine. If the implementation of the region is inlined then codeptr_ra contains the return address of the invocation of the callback. If attribution to source code is impossible or inappropriate, codeptr_ra may be NULL.
Cross References
• master construct, see Section 2.16 on page 221.
• ompt_data_t type, see Section 4.4.4.4 on page 440.
• ompt_scope_endpoint_t type, see Section 4.4.4.11 on page 443.
4.5.2.13 ompt_callback_sync_region_t
Summary
The ompt_callback_sync_region_t type is used for callbacks that are dispatched when barrier regions, taskwait regions, and taskgroup regions begin and end, when waiting begins and ends for them, and when reductions are performed.
Format
C / C++
typedef void (*ompt_callback_sync_region_t) (
    ompt_sync_region_t kind,
    ompt_scope_endpoint_t endpoint,
    ompt_data_t *parallel_data,
    ompt_data_t *task_data,
    const void *codeptr_ra
);
C / C++
Trace Record
C / C++
typedef struct ompt_record_sync_region_t {
    ompt_sync_region_t kind;
    ompt_scope_endpoint_t endpoint;
    ompt_id_t parallel_id;
    ompt_id_t task_id;
    const void *codeptr_ra;
} ompt_record_sync_region_t;
C / C++
Description of Arguments
The kind argument indicates the kind of synchronization.
The endpoint argument indicates that the callback signals the beginning of a scope or the end of a scope.
The binding of the parallel_data argument is the current parallel region. For the barrier-end event at the end of a parallel region this argument is NULL.
The binding of the task_data argument is the current task.
The codeptr_ra argument relates the implementation of an OpenMP region to its source code. If a runtime routine implements the region associated with a callback that has type signature ompt_callback_sync_region_t then codeptr_ra contains the return address of the call to that runtime routine. If the implementation of the region is inlined then codeptr_ra contains the return address of the invocation of the callback. If attribution to source code is impossible or inappropriate, codeptr_ra may be NULL.
Cross References
• barrier construct, see Section 2.17.2 on page 226.
• Implicit barriers, see Section 2.17.3 on page 228.
• taskwait construct, see Section 2.17.5 on page 230.
• taskgroup construct, see Section 2.17.6 on page 232.
• Properties common to all reduction clauses, see Section 2.19.5.1 on page 294.
• ompt_data_t type, see Section 4.4.4.4 on page 440.
• ompt_scope_endpoint_t type, see Section 4.4.4.11 on page 443.
• ompt_sync_region_t type, see Section 4.4.4.13 on page 444.
4.5.2.14 ompt_callback_mutex_acquire_t
Summary
The ompt_callback_mutex_acquire_t type is used for callbacks that are dispatched when locks are initialized, acquired and tested and when critical regions, atomic regions, and ordered regions are begun.
Format
C / C++
typedef void (*ompt_callback_mutex_acquire_t) (
    ompt_mutex_t kind,
    unsigned int hint,
    unsigned int impl,
    ompt_wait_id_t wait_id,
    const void *codeptr_ra
);
C / C++
Trace Record
C / C++
typedef struct ompt_record_mutex_acquire_t {
    ompt_mutex_t kind;
    unsigned int hint;
    unsigned int impl;
    ompt_wait_id_t wait_id;
    const void *codeptr_ra;
} ompt_record_mutex_acquire_t;
C / C++
Description of Arguments
The kind argument indicates the kind of the lock involved.
The hint argument indicates the hint that was provided when initializing an implementation of mutual exclusion. If no hint is available when a thread initiates acquisition of mutual exclusion, the runtime may supply omp_sync_hint_none as the value for hint.
The impl argument indicates the mechanism chosen by the runtime to implement the mutual exclusion.
The wait_id argument indicates the object being awaited.
The codeptr_ra argument relates the implementation of an OpenMP region to its source code. If a runtime routine implements the region associated with a callback that has type signature ompt_callback_mutex_acquire_t then codeptr_ra contains the return address of the call to that runtime routine. If the implementation of the region is inlined then codeptr_ra contains the return address of the invocation of the callback. If attribution to source code is impossible or inappropriate, codeptr_ra may be NULL.
Cross References
• critical construct, see Section 2.17.1 on page 223.
• atomic construct, see Section 2.17.7 on page 234.
• ordered construct, see Section 2.17.9 on page 250.
• omp_init_lock and omp_init_nest_lock routines, see Section 3.3.1 on page 384.
• ompt_mutex_t type, see Section 4.4.4.16 on page 445.
• ompt_wait_id_t type, see Section 4.4.4.29 on page 456.
4.5.2.15 ompt_callback_mutex_t
Summary
The ompt_callback_mutex_t type is used for callbacks that indicate important synchronization events.
Format
C / C++
typedef void (*ompt_callback_mutex_t) (
    ompt_mutex_t kind,
    ompt_wait_id_t wait_id,
    const void *codeptr_ra
);
C / C++
Trace Record
C / C++
typedef struct ompt_record_mutex_t {
    ompt_mutex_t kind;
    ompt_wait_id_t wait_id;
    const void *codeptr_ra;
} ompt_record_mutex_t;
C / C++
Description of Arguments
The kind argument indicates the kind of mutual exclusion event.
The wait_id argument indicates the object being awaited.
The codeptr_ra argument relates the implementation of an OpenMP region to its source code. If a runtime routine implements the region associated with a callback that has type signature ompt_callback_mutex_t then codeptr_ra contains the return address of the call to that runtime routine. If the implementation of the region is inlined then codeptr_ra contains the return address of the invocation of the callback. If attribution to source code is impossible or inappropriate, codeptr_ra may be NULL.
Cross References
• critical construct, see Section 2.17.1 on page 223.
• atomic construct, see Section 2.17.7 on page 234.
• ordered construct, see Section 2.17.9 on page 250.
• omp_destroy_lock and omp_destroy_nest_lock routines, see Section 3.3.3 on page 387.
• omp_set_lock and omp_set_nest_lock routines, see Section 3.3.4 on page 388.
• omp_unset_lock and omp_unset_nest_lock routines, see Section 3.3.5 on page 390.
• omp_test_lock and omp_test_nest_lock routines, see Section 3.3.6 on page 392.
• ompt_mutex_t type, see Section 4.4.4.16 on page 445.
• ompt_wait_id_t type, see Section 4.4.4.29 on page 456.
4.5.2.16 ompt_callback_nest_lock_t
Summary
The ompt_callback_nest_lock_t type is used for callbacks that indicate that a thread that owns a nested lock has performed an action related to the lock but has not relinquished ownership of it.
Format
C / C++
typedef void (*ompt_callback_nest_lock_t) (
    ompt_scope_endpoint_t endpoint,
    ompt_wait_id_t wait_id,
    const void *codeptr_ra
);
C / C++
Trace Record
C / C++
typedef struct ompt_record_nest_lock_t {
    ompt_scope_endpoint_t endpoint;
    ompt_wait_id_t wait_id;
    const void *codeptr_ra;
} ompt_record_nest_lock_t;
C / C++
Description of Arguments
The endpoint argument indicates that the callback signals the beginning of a scope or the end of a scope.
The wait_id argument indicates the object being awaited.
The codeptr_ra argument relates the implementation of an OpenMP region to its source code. If a runtime routine implements the region associated with a callback that has type signature ompt_callback_nest_lock_t then codeptr_ra contains the return address of the call to that runtime routine. If the implementation of the region is inlined then codeptr_ra contains the return address of the invocation of the callback. If attribution to source code is impossible or inappropriate, codeptr_ra may be NULL.
Cross References
• omp_set_nest_lock routine, see Section 3.3.4 on page 388.
• omp_unset_nest_lock routine, see Section 3.3.5 on page 390.
• omp_test_nest_lock routine, see Section 3.3.6 on page 392.
• ompt_scope_endpoint_t type, see Section 4.4.4.11 on page 443.
• ompt_wait_id_t type, see Section 4.4.4.29 on page 456.
4.5.2.17 ompt_callback_flush_t
Summary
The ompt_callback_flush_t type is used for callbacks that are dispatched when flush constructs are encountered.
Format
C / C++
typedef void (*ompt_callback_flush_t) (
    ompt_data_t *thread_data,
    const void *codeptr_ra
);
C / C++
Trace Record
C / C++
typedef struct ompt_record_flush_t {
    const void *codeptr_ra;
} ompt_record_flush_t;
C / C++
Description of Arguments
The binding of the thread_data argument is the executing thread.
The codeptr_ra argument relates the implementation of an OpenMP region to its source code. If a runtime routine implements the region associated with a callback that has type signature ompt_callback_flush_t then codeptr_ra contains the return address of the call to that runtime routine. If the implementation of the region is inlined then codeptr_ra contains the return address of the invocation of the callback. If attribution to source code is impossible or inappropriate, codeptr_ra may be NULL.
Cross References
• flush construct, see Section 2.17.8 on page 242.
• ompt_data_t type, see Section 4.4.4.4 on page 440.
4 4.5.2.18 ompt_callback_cancel_t
5 Summary
6 The ompt_callback_cancel_t type is used for callbacks that are dispatched for cancellation,
7 cancel and discarded-task events.
8 Format
9 10 11 12 13
14 Trace Record
15 16 17 18 19
20 Description of Arguments
C / C++
C / C++
C / C++
C / C++
typedef void (*ompt_callback_cancel_t) ( ompt_data_t *task_data,
int flags,
const void *codeptr_ra
);
typedef struct ompt_record_cancel_t {
ompt_id_t task_id;
int flags;
const void *codeptr_ra;
} ompt_record_cancel_t;
21 The binding of the task_data argument is the task that encounters a cancel construct, a
22 cancellation point construct, or a construct defined as having an implicit cancellation
23 point.
24 The flags argument, defined by the ompt_cancel_flag_t enumeration type, indicates whether
25 cancellation is activated by the current task, or detected as being activated by another task. The
26 construct that is being canceled is also described in the flags argument. When several constructs are
27 detected as being concurrently canceled, each corresponding bit in the argument will be set.
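The flags word can be decoded with ordinary bit tests. The following is a minimal sketch of that decoding, not part of the specification: the SKETCH_ names are hypothetical, and their numeric values are assumed to mirror the ompt_cancel_flag_t enumerators, so a real tool should take the names from omp-tools.h instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Local mirror of the ompt_cancel_flag_t bits. The numeric values are an
 * assumption of this sketch; a real tool would include omp-tools.h and use
 * the enumerators that header defines. */
enum sketch_cancel_flag {
  SKETCH_CANCEL_PARALLEL       = 0x01,
  SKETCH_CANCEL_SECTIONS       = 0x02,
  SKETCH_CANCEL_LOOP           = 0x04,
  SKETCH_CANCEL_TASKGROUP      = 0x08,
  SKETCH_CANCEL_ACTIVATED      = 0x10,
  SKETCH_CANCEL_DETECTED       = 0x20,
  SKETCH_CANCEL_DISCARDED_TASK = 0x40
};

/* Bits that name a canceled construct (hypothetical helper mask). */
#define SKETCH_CANCEL_CONSTRUCTS \
  (SKETCH_CANCEL_PARALLEL | SKETCH_CANCEL_SECTIONS | \
   SKETCH_CANCEL_LOOP | SKETCH_CANCEL_TASKGROUP)

/* Writes a short summary of flags into out and returns the number of
 * constructs that the flags word reports as canceled. */
int sketch_describe_cancel(int flags, char *out, size_t out_size)
{
  assert(out != NULL && out_size > 0);

  int constructs = 0;
  for (int bits = flags & SKETCH_CANCEL_CONSTRUCTS; bits != 0; bits >>= 1)
    constructs += bits & 1;

  const char *how = "unspecified";
  if (flags & SKETCH_CANCEL_ACTIVATED)
    how = "activated by this task";
  else if (flags & SKETCH_CANCEL_DETECTED)
    how = "detected from another task";
  else if (flags & SKETCH_CANCEL_DISCARDED_TASK)
    how = "task discarded";

  snprintf(out, out_size, "%d construct(s), %s", constructs, how);
  return constructs;
}
```

A callback of type ompt_callback_cancel_t could pass its flags argument to a helper like this before logging the event.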
The codeptr_ra argument relates the implementation of an OpenMP region to its source code. If a runtime routine implements the region associated with a callback that has type signature ompt_callback_cancel_t then codeptr_ra contains the return address of the call to that runtime routine. If the implementation of the region is inlined then codeptr_ra contains the return address of the invocation of the callback. If attribution to source code is impossible or inappropriate, codeptr_ra may be NULL.
Cross References
• ompt_cancel_flag_t enumeration type, see Section 4.4.4.24 on page 450.
4.5.2.19 ompt_callback_device_initialize_t
Summary
The ompt_callback_device_initialize_t type is used for callbacks that initialize device tracing interfaces.
Format
typedef void (*ompt_callback_device_initialize_t) (
  int device_num,
  const char *type,
  ompt_device_t *device,
  ompt_function_lookup_t lookup,
  const char *documentation
);
Description
Registration of a callback with type signature ompt_callback_device_initialize_t for the ompt_callback_device_initialize event enables asynchronous collection of a trace for a device. The OpenMP implementation invokes this callback after OpenMP is initialized for the device but before execution of any OpenMP construct is started on the device.
Description of Arguments
The device_num argument identifies the logical device that is being initialized.
The type argument is a character string that indicates the type of the device. A device type string is a semicolon-separated character string that includes at a minimum the vendor and model name of the device. These names may be followed by a semicolon-separated sequence of properties that describe the hardware or software of the device.
The device argument is a pointer to an opaque object that represents the target device instance. Functions in the device tracing interface use this pointer to identify the device that is being addressed.
The lookup argument points to a runtime callback that a tool must use to obtain pointers to runtime entry points in the device’s OMPT tracing interface. If a device does not support tracing then lookup is NULL.
The documentation argument is a string that describes how to use any device-specific runtime entry points that can be obtained through the lookup argument. This documentation string may be a pointer to external documentation, or it may be inline descriptions that include names and type signatures for any device-specific interfaces that are available through the lookup argument along with descriptions of how to use these interface functions to control monitoring and analysis of device traces.
Constraints on Arguments
The type and documentation arguments must be immutable strings that are defined for the lifetime of a program execution.
Effect
A device initializer must fulfill several duties. First, the type argument should be used to determine if any special knowledge about the hardware and/or software of a device is employed. Second, the lookup argument should be used to look up pointers to runtime entry points in the OMPT tracing interface for the device. Finally, these runtime entry points should be used to set up tracing for the device.
Initialization of tracing for a target device is described in Section 4.2.5 on page 427.
Cross References
• ompt_function_lookup_t type, see Section 4.6.3 on page 531.
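The first two duties of a device initializer can be sketched as follows, under stated assumptions. The sketch_ typedefs are stand-ins for ompt_interface_fn_t and ompt_function_lookup_t, the vendor-then-model field order is assumed from the description of the type string, and sketch_mock_lookup is a hypothetical lookup that exists only to exercise the code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for ompt_interface_fn_t and ompt_function_lookup_t so that the
 * sketch compiles without omp-tools.h. */
typedef void (*sketch_interface_fn_t)(void);
typedef sketch_interface_fn_t (*sketch_function_lookup_t)(const char *name);

/* Copies field number index (0-based) of a semicolon-separated device type
 * string into out. Returns 1 if the field exists, 0 otherwise. */
int sketch_type_field(const char *type, int index, char *out, size_t out_size)
{
  assert(type != NULL && index >= 0 && out != NULL && out_size > 0);
  const char *start = type;
  for (int i = 0; i < index; i++) {
    const char *sep = strchr(start, ';');
    if (sep == NULL)
      return 0;
    start = sep + 1;
  }
  size_t len = strcspn(start, ";");
  if (len >= out_size)
    len = out_size - 1;   /* truncate rather than overflow */
  memcpy(out, start, len);
  out[len] = '\0';
  return 1;
}

/* Returns 1 if tracing can be set up: lookup is non-NULL and resolves the
 * ompt_start_trace entry point; returns 0 otherwise. */
int sketch_can_trace(sketch_function_lookup_t lookup)
{
  return lookup != NULL && lookup("ompt_start_trace") != NULL;
}

/* Hypothetical lookup used only to exercise the sketch. */
static void sketch_start_trace_stub(void) {}

sketch_interface_fn_t sketch_mock_lookup(const char *name)
{
  return strcmp(name, "ompt_start_trace") == 0 ? sketch_start_trace_stub : NULL;
}
```

A real initializer would go on to cast the looked-up pointers to their entry point types and call them to configure tracing, which this sketch leaves out.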
4.5.2.20 ompt_callback_device_finalize_t
Summary
The ompt_callback_device_finalize_t type is used for callbacks that finalize device tracing interfaces.
Format
typedef void (*ompt_callback_device_finalize_t) (
  int device_num
);
Description of Arguments
The device_num argument identifies the logical device that is being finalized.
Description
A registered callback with type signature ompt_callback_device_finalize_t is dispatched for a device immediately prior to finalizing the device. Prior to dispatching a finalization callback for a device on which tracing is active, the OpenMP implementation stops tracing on the device and synchronously flushes all trace records for the device that have not yet been reported. These trace records are flushed through one or more buffer completion callbacks with type signature ompt_callback_buffer_complete_t as needed prior to the dispatch of the callback with type signature ompt_callback_device_finalize_t.
Cross References
• ompt_callback_buffer_complete_t callback type, see Section 4.5.2.24 on page 487.
4.5.2.21 ompt_callback_device_load_t
Summary
The ompt_callback_device_load_t type is used for callbacks that the OpenMP runtime invokes to indicate that it has just loaded code onto the specified device.
Format
typedef void (*ompt_callback_device_load_t) (
  int device_num,
  const char *filename,
  int64_t offset_in_file,
  void *vma_in_file,
  size_t bytes,
  void *host_addr,
  void *device_addr,
  uint64_t module_id
);
Description of Arguments
The device_num argument specifies the device.
The filename argument indicates the name of a file in which the device code can be found. A NULL filename indicates that the code is not available in a file in the file system.
The offset_in_file argument indicates an offset into filename at which the code can be found. A value of -1 indicates that no offset is provided.
ompt_addr_none is defined as a pointer with the value ~0.
The vma_in_file argument indicates a virtual address in filename at which the code can be found. A value of ompt_addr_none indicates that a virtual address in the file is not available.
The bytes argument indicates the size of the device code object in bytes.
The host_addr argument indicates the address at which a copy of the device code is available in host memory. A value of ompt_addr_none indicates that a host code address is not available.
The device_addr argument indicates the address at which the device code has been loaded in device memory. A value of ompt_addr_none indicates that a device code address is not available.
The module_id argument is an identifier that is associated with the device code object.
Cross References
• Device directives, see Section 2.12 on page 160.
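Several arguments of a device load callback use sentinel values to mean "not available". A minimal sketch of how a tool might test for them is shown below. SKETCH_ADDR_NONE is a hypothetical stand-in for ompt_addr_none, written here as an all-ones pointer on the assumption that this matches "a pointer with the value ~0"; the struct and function names are likewise invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for ompt_addr_none. The exact cast is an assumption of this
 * sketch; a real tool would use the definition from omp-tools.h. */
#define SKETCH_ADDR_NONE ((void *)~(uintptr_t)0)

/* Which pieces of location information a load event actually carries. */
typedef struct sketch_load_info {
  int in_file;          /* filename is non-NULL */
  int has_offset;       /* offset_in_file is not -1 */
  int has_vma;          /* vma_in_file is not the "none" sentinel */
  int has_host_copy;    /* host_addr is not the "none" sentinel */
  int has_device_addr;  /* device_addr is not the "none" sentinel */
} sketch_load_info_t;

/* Fills info from the location arguments of a device load callback. */
void sketch_classify_load(const char *filename, int64_t offset_in_file,
                          void *vma_in_file, void *host_addr,
                          void *device_addr, sketch_load_info_t *info)
{
  assert(info != NULL);
  info->in_file = (filename != NULL);
  info->has_offset = (offset_in_file != -1);
  info->has_vma = (vma_in_file != SKETCH_ADDR_NONE);
  info->has_host_copy = (host_addr != SKETCH_ADDR_NONE);
  info->has_device_addr = (device_addr != SKETCH_ADDR_NONE);
}
```

Testing against the sentinel rather than against NULL matters here, because a NULL address is not described as meaning "unavailable" for the address arguments.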
4.5.2.22 ompt_callback_device_unload_t
Summary
The ompt_callback_device_unload_t type is used for callbacks that the OpenMP runtime invokes to indicate that it is about to unload code from the specified device.
Format
typedef void (*ompt_callback_device_unload_t) (
  int device_num,
  uint64_t module_id
);
Description of Arguments
The device_num argument specifies the device.
The module_id argument is an identifier that is associated with the device code object.
Cross References
• Device directives, see Section 2.12 on page 160.
4.5.2.23 ompt_callback_buffer_request_t
Summary
The ompt_callback_buffer_request_t type is used for callbacks that are dispatched when a buffer to store event records for a device is requested.
Format
typedef void (*ompt_callback_buffer_request_t) (
  int device_num,
  ompt_buffer_t **buffer,
  size_t *bytes
);
Description
A callback with type signature ompt_callback_buffer_request_t requests a buffer to store trace records for the specified device. A buffer request callback may set *bytes to 0 if it does not provide a buffer. If a callback sets *bytes to 0, further recording of events for the device is disabled until the next invocation of ompt_start_trace. This action causes the device to drop future trace records until recording is restarted.
Description of Arguments
The device_num argument specifies the device.
The *buffer argument points to a buffer where device events may be recorded. The *bytes argument indicates the length of that buffer.
Cross References
• ompt_buffer_t type, see Section 4.4.4.7 on page 441.
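A buffer request callback along the lines described above might look like the following sketch. The buffer size and all sketch_ names are assumptions made for illustration, and sketch_buffer_t stands in for ompt_buffer_t so that the code compiles without omp-tools.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for ompt_buffer_t (assumed here to be an opaque void type). */
typedef void sketch_buffer_t;

/* Arbitrary buffer size chosen for this sketch. */
#define SKETCH_BUFFER_BYTES ((size_t)64 * 1024)

/* Provides a heap buffer for trace records. If allocation fails, *bytes is
 * set to 0, which disables further recording for the device until tracing
 * is restarted. */
void sketch_buffer_request(int device_num, sketch_buffer_t **buffer,
                           size_t *bytes)
{
  assert(buffer != NULL && bytes != NULL);
  (void)device_num;   /* one allocation policy for every device */

  *buffer = malloc(SKETCH_BUFFER_BYTES);
  *bytes = (*buffer != NULL) ? SKETCH_BUFFER_BYTES : 0;
}
```

Setting *bytes to 0 on failure is a deliberate choice in this sketch: it trades lost trace records for a tool that keeps running when memory is exhausted.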
4.5.2.24 ompt_callback_buffer_complete_t
Summary
The ompt_callback_buffer_complete_t type is used for callbacks that are dispatched when devices will not record any more trace records in an event buffer and all records written to the buffer are valid.
Format
typedef void (*ompt_callback_buffer_complete_t) (
  int device_num,
  ompt_buffer_t *buffer,
  size_t bytes,
  ompt_buffer_cursor_t begin,
  int buffer_owned
);
Description
A callback with type signature ompt_callback_buffer_complete_t provides a buffer that contains trace records for the specified device. Typically, a tool will iterate through the records in the buffer and process them.
The OpenMP implementation makes these callbacks on a thread that is not an OpenMP master or worker thread.
The callee may not delete the buffer if the buffer_owned argument is 0.
The buffer completion callback is not required to be async signal safe.
Description of Arguments
The device_num argument indicates the device for which the buffer contains events.
The buffer argument is the address of a buffer that was previously allocated by a buffer request callback.
The bytes argument indicates the full size of the buffer.
The begin argument is an opaque cursor that indicates the position of the beginning of the first record in the buffer.
The buffer_owned argument is 1 if the data to which the buffer points can be deleted by the callback and 0 otherwise. If multiple devices accumulate trace events into a single buffer, this callback may be invoked with a pointer to one or more trace records in a shared buffer with buffer_owned = 0. In this case, the callback may not delete the buffer.
Cross References
• ompt_buffer_t type, see Section 4.4.4.7 on page 441.
• ompt_buffer_cursor_t type, see Section 4.4.4.8 on page 442.
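The ownership rule for completed buffers can be sketched as below. This is an illustration only: it assumes the buffer was obtained with malloc by the tool's own buffer request callback, uses uint64_t as a stand-in for ompt_buffer_cursor_t, and omits record iteration, which would require device tracing entry points that are not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Number of buffers this tool has released (bookkeeping for the sketch). */
static int sketch_buffers_released = 0;

/* Buffer completion handler that honors buffer_owned. The begin cursor is
 * declared as uint64_t, an assumed stand-in for ompt_buffer_cursor_t. */
void sketch_buffer_complete(int device_num, void *buffer, size_t bytes,
                            uint64_t begin, int buffer_owned)
{
  assert(buffer != NULL);
  (void)device_num;
  (void)bytes;
  (void)begin;

  /* A real tool would walk the records starting at begin here. */

  if (buffer_owned) {
    /* Only an owned buffer may be deleted; a shared buffer is left alone. */
    free(buffer);
    sketch_buffers_released++;
  }
}
```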
4.5.2.25 ompt_callback_target_data_op_t
Summary
The ompt_callback_target_data_op_t type is used for callbacks that are dispatched when a thread maps data to a device.
Format
typedef void (*ompt_callback_target_data_op_t) (
  ompt_id_t target_id,
  ompt_id_t host_op_id,
  ompt_target_data_op_t optype,
  void *src_addr,
  int src_device_num,
  void *dest_addr,
  int dest_device_num,
  size_t bytes,
  const void *codeptr_ra
);
Trace Record
typedef struct ompt_record_target_data_op_t {
  ompt_id_t host_op_id;
  ompt_target_data_op_t optype;
  void *src_addr;
  int src_device_num;
  void *dest_addr;
  int dest_device_num;
  size_t bytes;
  ompt_device_time_t end_time;
  const void *codeptr_ra;
} ompt_record_target_data_op_t;
Description
A registered ompt_callback_target_data_op callback is dispatched when device memory is allocated or freed, as well as when data is copied to or from a device.

Note – An OpenMP implementation may aggregate program variables and data operations upon them. For instance, an OpenMP implementation may synthesize a composite to represent multiple scalars and then allocate, free, or copy this composite as a whole rather than performing data operations on each scalar individually. Thus, callbacks may not be dispatched as separate data operations on each variable.

Description of Arguments
The host_op_id argument is a unique identifier for a data operation on a target device.
The optype argument indicates the kind of data mapping.
The src_addr argument indicates the data address before the operation, where applicable.
The src_device_num argument indicates the source device number for the data operation, where applicable.
The dest_addr argument indicates the data address after the operation.
The dest_device_num argument indicates the destination device number for the data operation.
It is implementation defined whether in some operations src_addr or dest_addr may point to an intermediate buffer.
The bytes argument indicates the size of data.
The codeptr_ra argument relates the implementation of an OpenMP region to its source code. If a runtime routine implements the region associated with a callback that has type signature ompt_callback_target_data_op_t then codeptr_ra contains the return address of the call to that runtime routine. If the implementation of the region is inlined then codeptr_ra contains the return address of the invocation of the callback. If attribution to source code is impossible or inappropriate, codeptr_ra may be NULL.
Cross References
• map clause, see Section 2.19.7.1 on page 315.
• ompt_id_t type, see Section 4.4.4.3 on page 439.
• ompt_target_data_op_t type, see Section 4.4.4.14 on page 444.
4.5.2.26 ompt_callback_target_t
Summary
The ompt_callback_target_t type is used for callbacks that are dispatched when a thread begins to execute a device construct.
Format
typedef void (*ompt_callback_target_t) (
  ompt_target_t kind,
  ompt_scope_endpoint_t endpoint,
  int device_num,
  ompt_data_t *task_data,
  ompt_id_t target_id,
  const void *codeptr_ra
);
Trace Record
typedef struct ompt_record_target_t {
  ompt_target_t kind;
  ompt_scope_endpoint_t endpoint;
  int device_num;
  ompt_id_t task_id;
  ompt_id_t target_id;
  const void *codeptr_ra;
} ompt_record_target_t;
Description of Arguments
The kind argument indicates the kind of target region.
The endpoint argument indicates that the callback signals the beginning of a scope or the end of a scope.
The device_num argument indicates the id of the device that will execute the target region.
The binding of the task_data argument is the generating task.
The binding of the target_id argument is the target region.
The codeptr_ra argument relates the implementation of an OpenMP region to its source code. If a runtime routine implements the region associated with a callback that has type signature ompt_callback_target_t then codeptr_ra contains the return address of the call to that runtime routine. If the implementation of the region is inlined then codeptr_ra contains the return address of the invocation of the callback. If attribution to source code is impossible or inappropriate, codeptr_ra may be NULL.
Cross References
• target data construct, see Section 2.12.2 on page 161.
• target enter data construct, see Section 2.12.3 on page 164.
• target exit data construct, see Section 2.12.4 on page 166.
• target construct, see Section 2.12.5 on page 170.
• target update construct, see Section 2.12.6 on page 176.
• ompt_id_t type, see Section 4.4.4.3 on page 439.
• ompt_data_t type, see Section 4.4.4.4 on page 440.
• ompt_scope_endpoint_t type, see Section 4.4.4.11 on page 443.
• ompt_target_t type, see Section 4.4.4.20 on page 448.
4.5.2.27 ompt_callback_target_map_t
Summary
The ompt_callback_target_map_t type is used for callbacks that are dispatched to indicate data mapping relationships.
Format
typedef void (*ompt_callback_target_map_t) (
  ompt_id_t target_id,
  unsigned int nitems,
  void **host_addr,
  void **device_addr,
  size_t *bytes,
  unsigned int *mapping_flags,
  const void *codeptr_ra
);
Trace Record
typedef struct ompt_record_target_map_t {
  ompt_id_t target_id;
  unsigned int nitems;
  void **host_addr;
  void **device_addr;
  size_t *bytes;
  unsigned int *mapping_flags;
  const void *codeptr_ra;
} ompt_record_target_map_t;
Description
An instance of a target, target data, target enter data, or target exit data construct may contain one or more map clauses. An OpenMP implementation may report the set of mappings associated with map clauses for a construct with a single ompt_callback_target_map callback to report the effect of all mappings or multiple ompt_callback_target_map callbacks with each reporting a subset of the mappings. Furthermore, an OpenMP implementation may omit mappings that it determines are unnecessary. If an OpenMP implementation issues multiple ompt_callback_target_map callbacks, these callbacks may be interleaved with ompt_callback_target_data_op callbacks used to report data operations associated with the mappings.
Description of Arguments
The binding of the target_id argument is the target region.
The nitems argument indicates the number of data mappings that this callback reports.
The host_addr argument indicates an array of host data addresses.
The device_addr argument indicates an array of device data addresses.
The bytes argument indicates an array of data sizes.
The mapping_flags argument indicates the kind of data mapping. Flags for a mapping include one or more values specified by the ompt_target_map_flag_t type.
The codeptr_ra argument relates the implementation of an OpenMP region to its source code. If a runtime routine implements the region associated with a callback that has type signature ompt_callback_target_map_t then codeptr_ra contains the return address of the call to that runtime routine. If the implementation of the region is inlined then codeptr_ra contains the return address of the invocation of the callback. If attribution to source code is impossible or inappropriate, codeptr_ra may be NULL.
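The host_addr, device_addr, bytes and mapping_flags arguments are parallel arrays of length nitems. As a minimal sketch of how a tool might consume them, the helper below sums the sizes of the mappings whose flags include a given kind. The SKETCH_MAP_ values are assumed to mirror ompt_target_map_flag_t and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Local mirror of ompt_target_map_flag_t. The numeric values are an
 * assumption of this sketch; prefer the enumerators from omp-tools.h. */
enum sketch_map_flag {
  SKETCH_MAP_TO       = 0x01,
  SKETCH_MAP_FROM     = 0x02,
  SKETCH_MAP_ALLOC    = 0x04,
  SKETCH_MAP_RELEASE  = 0x08,
  SKETCH_MAP_DELETE   = 0x10,
  SKETCH_MAP_IMPLICIT = 0x20
};

/* Sums the sizes of all reported mappings whose flags include wanted.
 * bytes and mapping_flags are the parallel arrays passed to a callback of
 * type ompt_callback_target_map_t; nitems is their common length. */
size_t sketch_mapped_bytes(unsigned int nitems, const size_t *bytes,
                           const unsigned int *mapping_flags,
                           unsigned int wanted)
{
  assert(nitems == 0 || (bytes != NULL && mapping_flags != NULL));

  size_t total = 0;
  for (unsigned int i = 0; i < nitems; i++) {
    if (mapping_flags[i] & wanted)
      total += bytes[i];
  }
  return total;
}
```

Because an implementation may split one construct's mappings across several callbacks, a tool that wants a per-construct total would accumulate these sums by target_id.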
Cross References
• target data construct, see Section 2.12.2 on page 161.
• target enter data construct, see Section 2.12.3 on page 164.
• target exit data construct, see Section 2.12.4 on page 166.
• target construct, see Section 2.12.5 on page 170.
• ompt_id_t type, see Section 4.4.4.3 on page 439.
• ompt_target_map_flag_t type, see Section 4.4.4.22 on page 449.
• ompt_callback_target_data_op_t callback type, see Section 4.5.2.25 on page 488.
4.5.2.28 ompt_callback_target_submit_t
Summary
The ompt_callback_target_submit_t type is used for callbacks that are dispatched when an initial task is created on a device.
Format
typedef void (*ompt_callback_target_submit_t) (
  ompt_id_t target_id,
  ompt_id_t host_op_id,
  unsigned int requested_num_teams
);
Trace Record
typedef struct ompt_record_target_kernel_t {
  ompt_id_t host_op_id;
  unsigned int requested_num_teams;
  unsigned int granted_num_teams;
  ompt_device_time_t end_time;
} ompt_record_target_kernel_t;
Description
A thread dispatches a registered ompt_callback_target_submit callback on the host when a target task creates an initial task on a target device.
Description of Arguments
The target_id argument is a unique identifier for the associated target region.
The host_op_id argument is a unique identifier for the initial task on the target device.
The requested_num_teams argument is the number of teams that the host requested to execute the kernel. The actual number of teams that execute the kernel may be smaller and generally will not be known until the kernel begins to execute on the device.
If ompt_set_trace_ompt has configured the device to trace kernel execution then the device will log an ompt_record_target_kernel_t record in a trace. The fields in the record are as follows:
• The host_op_id field contains a unique identifier that can be used to correlate an ompt_record_target_kernel_t record with its associated ompt_callback_target_submit callback on the host;
• The requested_num_teams field contains the number of teams that the host requested to execute the kernel;
• The granted_num_teams field contains the number of teams that the device actually used to execute the kernel;
• The time when the initial task began execution on the device is recorded in the time field of an enclosing ompt_record_t structure; and
• The time when the initial task completed execution on the device is recorded in the end_time field.
Cross References
• target construct, see Section 2.12.5 on page 170.
• ompt_id_t type, see Section 4.4.4.3 on page 439.
4.5.2.29 ompt_callback_control_tool_t
Summary
The ompt_callback_control_tool_t type is used for callbacks that dispatch tool-control events.
Format
typedef int (*ompt_callback_control_tool_t) (
  uint64_t command,
  uint64_t modifier,
  void *arg,
  const void *codeptr_ra
);
Trace Record
typedef struct ompt_record_control_tool_t {
  uint64_t command;
  uint64_t modifier;
  const void *codeptr_ra;
} ompt_record_control_tool_t;
Description
Callbacks with type signature ompt_callback_control_tool_t may return any non-negative value, which will be returned to the application as the return value of the omp_control_tool call that triggered the callback.
Description of Arguments
The command argument passes a command from an application to a tool. Standard values for command are defined by omp_control_tool_t in Section 3.8 on page 415.
The modifier argument passes a command modifier from an application to a tool.
The command and modifier arguments may have tool-specific values. Tools must ignore command
values that they are not designed to handle.
The arg argument is a void pointer that enables a tool and an application to exchange arbitrary state.
The arg argument may be NULL.
The codeptr_ra argument relates the implementation of an OpenMP region to its source code. If a runtime routine implements the region associated with a callback that has type signature ompt_callback_control_tool_t then codeptr_ra contains the return address of the call to that runtime routine. If the implementation of the region is inlined then codeptr_ra contains the return address of the invocation of the callback. If attribution to source code is impossible or inappropriate, codeptr_ra may be NULL.
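A tool-control callback that handles the standard commands, accepts one tool-specific command, and ignores everything else could be sketched as follows. The numeric values of the standard commands and of the "ignored" return code are assumed to mirror omp_control_tool_t and omp_control_tool_result_t, and SKETCH_CMD_RESET is a made-up tool-specific command. The compile-time check reflects the constraint that tool-specific command values must be ≥ 64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed mirrors of the standard omp_control_tool_t commands. */
enum {
  SKETCH_CONTROL_START = 1,
  SKETCH_CONTROL_PAUSE = 2,
  SKETCH_CONTROL_FLUSH = 3,
  SKETCH_CONTROL_END   = 4
};

/* Assumed mirrors of two non-negative omp_control_tool_result_t values. */
enum { SKETCH_RESULT_SUCCESS = 0, SKETCH_RESULT_IGNORED = 1 };

/* Hypothetical tool-specific command. */
#define SKETCH_CMD_RESET 64
static_assert(SKETCH_CMD_RESET >= 64, "tool-specific commands must be >= 64");

static int sketch_collecting = 0;      /* 1 while the tool is recording */
static unsigned long sketch_events = 0; /* events seen so far (unused here) */

/* Handler with the ompt_callback_control_tool_t shape. */
int sketch_control_tool(uint64_t command, uint64_t modifier, void *arg,
                        const void *codeptr_ra)
{
  (void)modifier;
  (void)arg;
  (void)codeptr_ra;

  switch (command) {
  case SKETCH_CONTROL_START:
    sketch_collecting = 1;
    return SKETCH_RESULT_SUCCESS;
  case SKETCH_CONTROL_PAUSE:
  case SKETCH_CONTROL_END:
    sketch_collecting = 0;
    return SKETCH_RESULT_SUCCESS;
  case SKETCH_CONTROL_FLUSH:
    return SKETCH_RESULT_SUCCESS;   /* nothing is buffered in this sketch */
  case SKETCH_CMD_RESET:
    sketch_events = 0;
    return SKETCH_RESULT_SUCCESS;
  default:
    return SKETCH_RESULT_IGNORED;   /* unknown commands are ignored */
  }
}
```

Whatever value the handler returns is what the application sees as the result of its omp_control_tool call, so the sketch only ever returns non-negative values.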
Constraints on Arguments
Tool-specific values for command must be ≥ 64.
Cross References
• omp_control_tool_t enumeration type, see Section 3.8 on page 415.
4.6 OMPT Runtime Entry Points for Tools
OMPT supports two principal sets of runtime entry points for tools. One set of runtime entry points enables a tool to register callbacks for OpenMP events and to inspect the state of an OpenMP thread while executing in a tool callback or a signal handler. The second set of runtime entry points enables a tool to trace activities on a device. When directed by the tracing interface, an OpenMP implementation will trace activities on a device, collect buffers of trace records, and invoke callbacks on the host to process these records. OMPT runtime entry points should not be global symbols since tools cannot rely on the visibility of such symbols.
OMPT also supports runtime entry points for two classes of lookup routines. The first class of lookup routines contains a single member: a routine that returns runtime entry points in the OMPT callback interface. The second class of lookup routines includes a unique lookup routine for each kind of device that can return runtime entry points in a device’s OMPT tracing interface.
The C/C++ header file (omp-tools.h) provides the definitions of the types that are specified throughout this subsection.
Restrictions
OMPT runtime entry points have the following restrictions:
• OMPT runtime entry points must not be called from a signal handler on a native thread before a native-thread-begin or after a native-thread-end event.
• OMPT device runtime entry points must not be called after a device-finalize event for that device.
4.6.1 Entry Points in the OMPT Callback Interface
Entry points in the OMPT callback interface enable a tool to register callbacks for OpenMP events and to inspect the state of an OpenMP thread while executing in a tool callback or a signal handler. Pointers to these runtime entry points are obtained through the lookup function that is provided through the OMPT initializer.
4.6.1.1 ompt_enumerate_states_t
Summary
The ompt_enumerate_states_t type is the type signature of the ompt_enumerate_states runtime entry point, which enumerates the thread states that an OpenMP implementation supports.
Format
typedef int (*ompt_enumerate_states_t) (
  int current_state,
  int *next_state,
  const char **next_state_name
);
Description
An OpenMP implementation may support only a subset of the states defined by the ompt_state_t enumeration type. An OpenMP implementation may also support implementation-specific states. The ompt_enumerate_states runtime entry point, which has type signature ompt_enumerate_states_t, enables a tool to enumerate the supported thread states.
When a supported thread state is passed as current_state, the runtime entry point assigns the next thread state in the enumeration to the variable passed by reference in next_state and assigns the name associated with that state to the character pointer passed by reference in next_state_name.
Whenever one or more states are left in the enumeration, the ompt_enumerate_states runtime entry point returns 1. When the last state in the enumeration is passed as current_state, ompt_enumerate_states returns 0, which indicates that the enumeration is complete.
Description of Arguments
The current_state argument must be a thread state that the OpenMP implementation supports. To begin enumerating the supported states, a tool should pass ompt_state_undefined as current_state. Subsequent invocations of ompt_enumerate_states should pass the value assigned to the variable passed by reference in next_state to the previous call.
The value ompt_state_undefined is reserved to indicate an invalid thread state. ompt_state_undefined is defined as an integer with the value 0.
The next_state argument is a pointer to an integer in which ompt_enumerate_states returns the value of the next state in the enumeration.
The next_state_name argument is a pointer to a character string pointer through which ompt_enumerate_states returns a string that describes the next state.
Constraints on Arguments
Any string returned through the next_state_name argument must be immutable and defined for the lifetime of a program execution.
Cross References
• ompt_state_t type, see Section 4.4.4.26 on page 452.
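The enumeration protocol for ompt_enumerate_states amounts to a loop that feeds each returned state back in as current_state until the entry point returns 0. The sketch below shows that loop against a mock entry point. The mock, its three state values and names, and the use of 0 for the undefined state (as stated in the text above) are all assumptions for illustration, since a real tool obtains the entry point through the lookup function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Starting value for the enumeration, taken from the text above. A real
 * tool would use ompt_state_undefined from omp-tools.h. */
#define SKETCH_STATE_UNDEFINED 0

typedef int (*sketch_enumerate_states_t)(int current_state, int *next_state,
                                         const char **next_state_name);

/* Mock runtime that supports three made-up states. */
static const int sketch_ids[] = { 0x001, 0x002, 0x101 };
static const char *const sketch_names[] = {
  "sketch_state_a", "sketch_state_b", "sketch_state_c"
};

int sketch_mock_enumerate_states(int current_state, int *next_state,
                                 const char **next_state_name)
{
  const size_t n = sizeof sketch_ids / sizeof sketch_ids[0];
  size_t next = 0;
  if (current_state != SKETCH_STATE_UNDEFINED) {
    size_t i = 0;
    while (i < n && sketch_ids[i] != current_state)
      i++;
    next = i + 1;   /* past the end if current_state was last or unknown */
  }
  if (next >= n)
    return 0;       /* enumeration complete */
  *next_state = sketch_ids[next];
  *next_state_name = sketch_names[next];
  return 1;
}

/* Tool side: visits every supported state and returns how many there are.
 * If out is non-NULL, each state is also printed to it. */
int sketch_list_states(sketch_enumerate_states_t enumerate, FILE *out)
{
  assert(enumerate != NULL);
  int state = SKETCH_STATE_UNDEFINED;
  const char *name = NULL;
  int count = 0;
  while (enumerate(state, &state, &name)) {
    if (out != NULL)
      fprintf(out, "0x%03x %s\n", (unsigned int)state, name);
    count++;
  }
  return count;
}
```

The same loop shape applies to ompt_enumerate_mutex_impls, with ompt_mutex_impl_none as the starting value.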
4.6.1.2 ompt_enumerate_mutex_impls_t
Summary
The ompt_enumerate_mutex_impls_t type is the type signature of the ompt_enumerate_mutex_impls runtime entry point, which enumerates the kinds of mutual exclusion implementations that an OpenMP implementation employs.
Format
typedef int (*ompt_enumerate_mutex_impls_t) (
  int current_impl,
  int *next_impl,
  const char **next_impl_name
);
Description
Mutual exclusion for locks, critical sections, and atomic regions may be implemented in several ways. The ompt_enumerate_mutex_impls runtime entry point, which has type signature ompt_enumerate_mutex_impls_t, enables a tool to enumerate the supported mutual exclusion implementations.
When a supported mutex implementation is passed as current_impl, the runtime entry point assigns the next mutex implementation in the enumeration to the variable passed by reference in next_impl and assigns the name associated with that mutex implementation to the character pointer passed by reference in next_impl_name.
Whenever one or more mutex implementations are left in the enumeration, the ompt_enumerate_mutex_impls runtime entry point returns 1. When the last mutex implementation in the enumeration is passed as current_impl, the runtime entry point returns 0, which indicates that the enumeration is complete.
Description of Arguments
The current_impl argument must be a mutex implementation that an OpenMP implementation supports. To begin enumerating the supported mutex implementations, a tool should pass ompt_mutex_impl_none as current_impl. Subsequent invocations of ompt_enumerate_mutex_impls should pass the value assigned to the variable passed in next_impl to the previous call.
The value ompt_mutex_impl_none is reserved to indicate an invalid mutex implementation. ompt_mutex_impl_none is defined as an integer with the value 0.
The next_impl argument is a pointer to an integer in which ompt_enumerate_mutex_impls returns the value of the next mutex implementation in the enumeration.
The next_impl_name argument is a pointer to a character string pointer in which ompt_enumerate_mutex_impls returns a string that describes the next mutex implementation.
Constraints on Arguments
Any string returned through the next_impl_name argument must be immutable and defined for the lifetime of a program execution.
Cross References
• ompt_mutex_t type, see Section 4.4.4.16 on page 445.
4.6.1.3 ompt_set_callback_t
Summary
The ompt_set_callback_t type is the type signature of the ompt_set_callback runtime entry point, which registers a pointer to a tool callback that an OpenMP implementation invokes when a host OpenMP event occurs.
Format
typedef ompt_set_result_t (*ompt_set_callback_t) (
  ompt_callbacks_t event,
  ompt_callback_t callback
);
Description
OpenMP implementations can use callbacks to indicate the occurrence of events during the execution of an OpenMP program. The ompt_set_callback runtime entry point, which has type signature ompt_set_callback_t, registers a callback for an OpenMP event on the current device. The return value of ompt_set_callback indicates the outcome of registering the callback.
Description of Arguments
The event argument indicates the event for which the callback is being registered.
The callback argument is a tool callback function. If callback is NULL then callbacks associated with event are disabled. If callbacks are successfully disabled then ompt_set_always is returned.
Constraints on Arguments
When a tool registers a callback for an event, the type signature for the callback must match the type signature appropriate for the event.
Restrictions
The ompt_set_callback runtime entry point has the following restriction:
• The entry point must not return ompt_set_impossible.
Cross References
• Monitoring activity on the host with OMPT, see Section 4.2.4 on page 425.
• ompt_callbacks_t enumeration type, see Section 4.4.2 on page 434.
• ompt_callback_t type, see Section 4.4.4.1 on page 438.
• ompt_set_result_t type, see Section 4.4.4.2 on page 438.
• ompt_get_callback_t host callback type signature, see Section 4.6.1.4 on page 502.
4.6.1.4 ompt_get_callback_t
Summary
The ompt_get_callback_t type is the type signature of the ompt_get_callback runtime entry point, which retrieves a pointer to a registered tool callback routine (if any) that an OpenMP implementation invokes when a host OpenMP event occurs.
Format
typedef int (*ompt_get_callback_t) (
  ompt_callbacks_t event,
  ompt_callback_t *callback
);
Description
The ompt_get_callback runtime entry point, which has type signature ompt_get_callback_t, retrieves a pointer to the tool callback that an OpenMP implementation may invoke when a host OpenMP event occurs. If a non-null tool callback is registered for the specified event, the pointer to the tool callback is assigned to the variable passed by reference in callback and ompt_get_callback returns 1; otherwise, it returns 0. If ompt_get_callback returns 0, the value of the variable passed by reference as callback is undefined.
Description of Arguments
The event argument indicates the event for which the callback would be invoked. The callback argument returns a pointer to the callback associated with event.
Constraints on Arguments
The callback argument must be a reference to a variable of the specified type.
Cross References
• ompt_callbacks_t enumeration type, see Section 4.4.2 on page 434.
• ompt_callback_t type, see Section 4.4.4.1 on page 438.
• ompt_set_callback_t type signature, see Section 4.6.1.3 on page 500.
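The return-value rule for ompt_get_callback can be sketched as follows. This is a non-normative illustration: the type declarations are local stand-ins for those in omp-tools.h, and on_thread_begin, mock_get_callback and registered_callback are hypothetical names. The helper reads the output variable only when the entry point returned 1, because the value is undefined otherwise.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for declarations normally taken from omp-tools.h. */
typedef enum {
  ompt_callback_thread_begin = 1,
  ompt_callback_thread_end = 2
} ompt_callbacks_t;
typedef void (*ompt_callback_t)(void);
typedef int (*ompt_get_callback_t)(ompt_callbacks_t event,
                                   ompt_callback_t *callback);

/* Hypothetical tool callback (simplified signature). */
void on_thread_begin(void) {}

/* Hypothetical mock runtime: only thread-begin has a registered callback. */
int mock_get_callback(ompt_callbacks_t event, ompt_callback_t *callback) {
  if (event == ompt_callback_thread_begin) {
    *callback = on_thread_begin;
    return 1;
  }
  return 0; /* *callback is deliberately left untouched */
}

/* Returns the registered callback for event, or NULL if there is none. */
ompt_callback_t registered_callback(ompt_get_callback_t get_callback,
                                    ompt_callbacks_t event) {
  ompt_callback_t cb = NULL;
  assert(get_callback != NULL);
  if (get_callback(event, &cb) == 0)
    return NULL; /* do not trust cb when the entry point returned 0 */
  return cb;
}
```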
4.6.1.5 ompt_get_thread_data_t
Summary
The ompt_get_thread_data_t type is the type signature of the ompt_get_thread_data runtime entry point, which returns the address of the thread data object for the current thread.
Format
C / C++
typedef ompt_data_t *(*ompt_get_thread_data_t) (void);
C / C++
Binding
The binding thread for the ompt_get_thread_data runtime entry point is the current thread.
Description
Each OpenMP thread can have an associated thread data object of type ompt_data_t. The ompt_get_thread_data runtime entry point, which has type signature ompt_get_thread_data_t, retrieves a pointer to the thread data object, if any, that is associated with the current thread. A tool may use a pointer to an OpenMP thread’s data object that ompt_get_thread_data retrieves to inspect or to modify the value of the data object. When an OpenMP thread is created, its data object is initialized with value ompt_data_none.
This runtime entry point is async signal safe.
Cross References
• ompt_data_t type, see Section 4.4.4.4 on page 440.
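One way that a tool might use the thread data object is as a per-thread counter, as in the following non-normative sketch. The ompt_data_t declaration is a local stand-in for the one in omp-tools.h, and mock_thread_data, mock_get_thread_data and count_event are hypothetical names. The mock starts from an all-zero data word, which is assumed here to correspond to ompt_data_none.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for declarations normally taken from omp-tools.h. */
typedef union ompt_data_t {
  uint64_t value;
  void *ptr;
} ompt_data_t;
typedef ompt_data_t *(*ompt_get_thread_data_t)(void);

/* Hypothetical mock runtime: a single thread whose data word starts at zero. */
ompt_data_t mock_thread_data = { 0 };
ompt_data_t *mock_get_thread_data(void) { return &mock_thread_data; }

/* Increments a counter kept in the current thread's data word.
   Returns the new count, or 0 if no data object is associated. */
uint64_t count_event(ompt_get_thread_data_t get_thread_data) {
  assert(get_thread_data != NULL);
  ompt_data_t *data = get_thread_data();
  if (data == NULL)
    return 0;
  data->value += 1;
  return data->value;
}
```

Because the data object belongs to the current thread, a counter of this kind needs no synchronization as long as only the owning thread updates it.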
4.6.1.6 ompt_get_num_procs_t
Summary
The ompt_get_num_procs_t type is the type signature of the ompt_get_num_procs runtime entry point, which returns the number of processors currently available to the execution environment on the host device.
Format
C / C++
typedef int (*ompt_get_num_procs_t) (void);
C / C++
Binding
The binding thread set for the ompt_get_num_procs runtime entry point is all threads on the host device.
Description
The ompt_get_num_procs runtime entry point, which has type signature ompt_get_num_procs_t, returns the number of processors that are available on the host device at the time the routine is called. This value may change between the time that it is determined and the time that it is read in the calling context due to system actions outside the control of the OpenMP implementation.
This runtime entry point is async signal safe.
4.6.1.7 ompt_get_num_places_t
Summary
The ompt_get_num_places_t type is the type signature of the ompt_get_num_places runtime entry point, which returns the number of places currently available to the execution environment in the place list.
Format
C / C++
typedef int (*ompt_get_num_places_t) (void);
C / C++
Binding
The binding thread set for the ompt_get_num_places runtime entry point is all threads on a device.
Description
The ompt_get_num_places runtime entry point, which has type signature ompt_get_num_places_t, returns the number of places in the place list. This value is equivalent to the number of places in the place-partition-var ICV in the execution environment of the initial task.
This runtime entry point is async signal safe.
Cross References
• place-partition-var ICV, see Section 2.5 on page 63.
• OMP_PLACES environment variable, see Section 6.5 on page 605.
4.6.1.8 ompt_get_place_proc_ids_t
Summary
The ompt_get_place_proc_ids_t type is the type signature of the ompt_get_place_proc_ids runtime entry point, which returns the numerical identifiers of the processors that are available to the execution environment in the specified place.
Format
C / C++
typedef int (*ompt_get_place_proc_ids_t) (
  int place_num,
  int ids_size,
  int *ids
);
C / C++
Binding
The binding thread set for the ompt_get_place_proc_ids runtime entry point is all threads on a device.
Description
The ompt_get_place_proc_ids runtime entry point, which has type signature ompt_get_place_proc_ids_t, returns the numerical identifiers of each processor that is associated with the specified place. These numerical identifiers are non-negative and their meaning is implementation defined.
Description of Arguments
The place_num argument specifies the place that is being queried.
The ids argument is an array in which the routine can return a vector of processor identifiers in the specified place.
The ids_size argument indicates the size of the result array that is specified by ids.
Effect
If the ids array of size ids_size is large enough to contain all identifiers then they are returned in ids and their order in the array is implementation defined. Otherwise, if the ids array is too small, the values in ids when the function returns are unspecified. The routine always returns the number of numerical identifiers of the processors that are available to the execution environment in the specified place.
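Because the routine always returns the number of identifiers, a tool can size its buffer with a first call and fetch the identifiers with a second call. The following non-normative sketch shows that pattern. The names mock_get_place_proc_ids and place_proc_ids are hypothetical, the mock pretends that place 0 contains processors 4, 5 and 6, and calling the entry point with an ids_size of 0 and a valid one-element buffer is an assumption of this sketch about how a size query may be issued.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef int (*ompt_get_place_proc_ids_t)(int place_num, int ids_size,
                                         int *ids);

/* Hypothetical mock runtime: place 0 holds processors {4, 5, 6};
   every other place is empty. */
int mock_get_place_proc_ids(int place_num, int ids_size, int *ids) {
  static const int procs[] = { 4, 5, 6 };
  const int n = (place_num == 0) ? 3 : 0;
  if (n > 0 && ids_size >= n)
    memcpy(ids, procs, (size_t)n * sizeof procs[0]);
  return n;
}

/* Returns a malloc'd array of processor ids for place_num and stores the
   number of ids in *count. Returns NULL (with *count == 0) if the place is
   empty, allocation fails, or the place grew between the two calls. */
int *place_proc_ids(ompt_get_place_proc_ids_t get_ids, int place_num,
                    int *count) {
  int probe = 0;
  assert(get_ids != NULL && count != NULL);
  *count = 0;
  int needed = get_ids(place_num, 0, &probe); /* size query only */
  if (needed <= 0)
    return NULL;
  int *ids = malloc((size_t)needed * sizeof *ids);
  if (ids == NULL)
    return NULL;
  int got = get_ids(place_num, needed, ids);
  if (got > needed) { /* buffer too small: contents are unspecified */
    free(ids);
    return NULL;
  }
  *count = got;
  return ids;
}
```

The caller owns the returned array and releases it with free.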
4.6.1.9 ompt_get_place_num_t
Summary
The ompt_get_place_num_t type is the type signature of the ompt_get_place_num runtime entry point, which returns the place number of the place to which the current thread is bound.
Format
C / C++
typedef int (*ompt_get_place_num_t) (void);
C / C++
Binding
The binding thread set of the ompt_get_place_num runtime entry point is the current thread.
Description
When the current thread is bound to a place, ompt_get_place_num returns the place number associated with the thread. The returned value is between 0 and one less than the value returned by ompt_get_num_places, inclusive. When the current thread is not bound to a place, the routine returns -1.
This runtime entry point is async signal safe.
4.6.1.10 ompt_get_partition_place_nums_t
Summary
The ompt_get_partition_place_nums_t type is the type signature of the ompt_get_partition_place_nums runtime entry point, which returns a list of place numbers that correspond to the places in the place-partition-var ICV of the innermost implicit task.
Format
C / C++
typedef int (*ompt_get_partition_place_nums_t) (
  int place_nums_size,
  int *place_nums
);
C / C++
Binding
The binding task set for the ompt_get_partition_place_nums runtime entry point is the current implicit task.
Description
The ompt_get_partition_place_nums runtime entry point, which has type signature ompt_get_partition_place_nums_t, returns a list of place numbers that correspond to the places in the place-partition-var ICV of the innermost implicit task.
This runtime entry point is async signal safe.
Description of Arguments
The place_nums argument is an array in which the routine can return a vector of place identifiers.
The place_nums_size argument indicates the size of the result array that the place_nums argument specifies.
Effect
If the place_nums array of size place_nums_size is large enough to contain all identifiers then they are returned in place_nums and their order in the array is implementation defined. Otherwise, if the place_nums array is too small, the values in place_nums when the function returns are unspecified. The routine always returns the number of places in the place-partition-var ICV of the innermost implicit task.
Cross References
• place-partition-var ICV, see Section 2.5 on page 63.
• OMP_PLACES environment variable, see Section 6.5 on page 605.
4.6.1.11 ompt_get_proc_id_t
Summary
The ompt_get_proc_id_t type is the type signature of the ompt_get_proc_id runtime entry point, which returns the numerical identifier of the processor of the current thread.
Format
C / C++
typedef int (*ompt_get_proc_id_t) (void);
C / C++
Binding
The binding thread set for the ompt_get_proc_id runtime entry point is the current thread.
Description
The ompt_get_proc_id runtime entry point, which has type signature ompt_get_proc_id_t, returns the numerical identifier of the processor of the current thread. A defined numerical identifier is non-negative and its meaning is implementation defined. A negative number indicates a failure to retrieve the numerical identifier.
This runtime entry point is async signal safe.
4.6.1.12 ompt_get_state_t
Summary
The ompt_get_state_t type is the type signature of the ompt_get_state runtime entry point, which returns the state and the wait identifier of the current thread.
Format
C / C++
typedef int (*ompt_get_state_t) (
  ompt_wait_id_t *wait_id
);
C / C++
Binding
The binding thread for the ompt_get_state runtime entry point is the current thread.
Description
Each OpenMP thread has an associated state and a wait identifier. If a thread’s state indicates that the thread is waiting for mutual exclusion then its wait identifier contains an opaque handle that indicates the data object upon which the thread is waiting. The ompt_get_state runtime entry point, which has type signature ompt_get_state_t, retrieves the state and wait identifier of the current thread. The returned value may be any one of the states predefined by ompt_state_t or a value that represents any implementation specific state. The tool may obtain a string representation for each state with the ompt_enumerate_states function.
If the returned state indicates that the thread is waiting for a lock, nest lock, critical section, atomic region, or ordered region then the value of the thread’s wait identifier is assigned to a non-null wait identifier passed as the wait_id argument.
This runtime entry point is async signal safe.
Description of Arguments
The wait_id argument is a pointer to an opaque handle that is available to receive the value of the thread’s wait identifier. If wait_id is not NULL then the entry point assigns the value of the thread’s wait identifier to the object to which wait_id points. If the returned state is not one of the specified wait states then the value of the opaque object to which wait_id points is undefined after the call.
Constraints on Arguments
The argument passed to the entry point must be a reference to a variable of the specified type or NULL.
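The rule that the wait identifier is meaningful only for mutual-exclusion wait states can be sketched as follows. This is a non-normative illustration. The enumerators are a local subset that stands in for ompt_state_t, and their numeric values are assumptions of this sketch; a real tool would use the definitions from omp-tools.h. The names is_mutex_wait_state, current_wait_id and the two mocks are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins; numeric state values are assumptions of this sketch. */
typedef uint64_t ompt_wait_id_t;
enum {
  ompt_state_work_parallel = 0x001,
  ompt_state_wait_lock = 0x041,
  ompt_state_wait_critical = 0x042,
  ompt_state_wait_atomic = 0x043,
  ompt_state_wait_ordered = 0x044
};
typedef int (*ompt_get_state_t)(ompt_wait_id_t *wait_id);

/* Hypothetical mock: thread is waiting on a critical section, handle 0xABC. */
int mock_get_state_waiting(ompt_wait_id_t *wait_id) {
  if (wait_id != NULL)
    *wait_id = 0xABC;
  return ompt_state_wait_critical;
}

/* Hypothetical mock: thread is working; wait_id receives a junk value. */
int mock_get_state_working(ompt_wait_id_t *wait_id) {
  if (wait_id != NULL)
    *wait_id = 0xDEAD;
  return ompt_state_work_parallel;
}

/* Returns 1 if state is one of the wait states that carry a wait identifier. */
int is_mutex_wait_state(int state) {
  return state == ompt_state_wait_lock || state == ompt_state_wait_critical ||
         state == ompt_state_wait_atomic || state == ompt_state_wait_ordered;
}

/* Stores the wait identifier in *out and returns 1 only when the current
   thread is in a mutual-exclusion wait state; otherwise *out is untouched. */
int current_wait_id(ompt_get_state_t get_state, ompt_wait_id_t *out) {
  ompt_wait_id_t wait_id = 0;
  assert(get_state != NULL && out != NULL);
  int state = get_state(&wait_id);
  if (!is_mutex_wait_state(state))
    return 0; /* wait_id is undefined for every other state */
  *out = wait_id;
  return 1;
}
```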
Cross References
• ompt_state_t type, see Section 4.4.4.26 on page 452.
• ompt_wait_id_t type, see Section 4.4.4.29 on page 456.
• ompt_enumerate_states_t type, see Section 4.6.1.1 on page 498.
4.6.1.13 ompt_get_parallel_info_t
Summary
The ompt_get_parallel_info_t type is the type signature of the ompt_get_parallel_info runtime entry point, which returns information about the parallel region, if any, at the specified ancestor level for the current execution context.
Format
C / C++
typedef int (*ompt_get_parallel_info_t) (
  int ancestor_level,
  ompt_data_t **parallel_data,
  int *team_size
);
C / C++
Description
During execution, an OpenMP program may employ nested parallel regions. The ompt_get_parallel_info runtime entry point, which has type signature ompt_get_parallel_info_t, retrieves information about the current parallel region and any enclosing parallel regions for the current execution context. The entry point returns 2 if there is a parallel region at the specified ancestor level and the information is available, 1 if there is a parallel region at the specified ancestor level but the information is currently unavailable, and 0 otherwise.
A tool may use the pointer to a parallel region’s data object that it obtains from this runtime entry point to inspect or to modify the value of the data object. When a parallel region is created, its data object will be initialized with the value ompt_data_none.
This runtime entry point is async signal safe.
Between a parallel-begin event and an implicit-task-begin event, a call to ompt_get_parallel_info(0,…) may return information about the outer parallel team, the new parallel team or an inconsistent state.
If a thread is in the state ompt_state_wait_barrier_implicit_parallel then a call to ompt_get_parallel_info may return a pointer to a copy of the specified parallel region’s parallel_data rather than a pointer to the data word for the region itself. This convention enables the master thread for a parallel region to free storage for the region immediately after the region ends, yet avoid having some other thread in the region’s team potentially reference the region’s parallel_data object after it has been freed.
Description of Arguments
The ancestor_level argument specifies the parallel region of interest by its ancestor level. Ancestor level 0 refers to the innermost parallel region; information about enclosing parallel regions may be obtained using larger values for ancestor_level.
The parallel_data argument returns the parallel data if the argument is not NULL.
The team_size argument returns the team size if the argument is not NULL.
Effect
If the runtime entry point returns 0 or 1, no argument is modified. Otherwise, ompt_get_parallel_info has the following effects:
• If a non-null value was passed for parallel_data, the value returned in parallel_data is a pointer to a data word that is associated with the parallel region at the specified level; and
• If a non-null value was passed for team_size, the value returned in the integer to which team_size points is the number of threads in the team that is associated with the parallel region.
Constraints on Arguments
While argument ancestor_level is passed by value, all other arguments to the entry point must be pointers to variables of the specified types or NULL.
Cross References
• ompt_data_t type, see Section 4.4.4.4 on page 440.
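The three-valued return code lends itself to a loop over ancestor levels, sketched below as a non-normative illustration. The ompt_data_t declaration is a local stand-in for the one in omp-tools.h, and mock_regions, mock_get_parallel_info and nesting_depth are hypothetical names. The mock pretends that two nested parallel regions exist with team sizes 2 (inner) and 4 (outer).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for declarations normally taken from omp-tools.h. */
typedef union ompt_data_t {
  uint64_t value;
  void *ptr;
} ompt_data_t;
typedef int (*ompt_get_parallel_info_t)(int ancestor_level,
                                        ompt_data_t **parallel_data,
                                        int *team_size);

/* Hypothetical mock runtime: two nested regions, innermost first. */
ompt_data_t mock_regions[2];
int mock_get_parallel_info(int ancestor_level, ompt_data_t **parallel_data,
                           int *team_size) {
  static const int sizes[2] = { 2, 4 };
  if (ancestor_level < 0 || ancestor_level >= 2)
    return 0; /* no region at this level; arguments are not modified */
  if (parallel_data != NULL)
    *parallel_data = &mock_regions[ancestor_level];
  if (team_size != NULL)
    *team_size = sizes[ancestor_level];
  return 2; /* region exists and information is available */
}

/* Counts the parallel regions visible from the current context and stores
   the largest team size that could be read in *largest_team. */
int nesting_depth(ompt_get_parallel_info_t get_info, int *largest_team) {
  int depth = 0;
  assert(get_info != NULL && largest_team != NULL);
  *largest_team = 0;
  for (;;) {
    int team_size = 0;
    int rc = get_info(depth, NULL, &team_size); /* NULL: data not needed */
    if (rc == 0)
      break; /* no region at this ancestor level */
    if (rc == 2 && team_size > *largest_team)
      *largest_team = team_size; /* team_size is valid only when rc == 2 */
    depth++;
  }
  return depth;
}
```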
4.6.1.14 ompt_get_task_info_t
Summary
The ompt_get_task_info_t type is the type signature of the ompt_get_task_info runtime entry point, which returns information about the task, if any, at the specified ancestor level in the current execution context.
Format
C / C++
typedef int (*ompt_get_task_info_t) (
  int ancestor_level,
  int *flags,
  ompt_data_t **task_data,
  ompt_frame_t **task_frame,
  ompt_data_t **parallel_data,
  int *thread_num
);
C / C++
Description
During execution, an OpenMP thread may be executing an OpenMP task. Additionally, the thread’s stack may contain procedure frames that are associated with suspended OpenMP tasks or OpenMP runtime system routines. To obtain information about any task on the current thread’s stack, a tool uses the ompt_get_task_info runtime entry point, which has type signature ompt_get_task_info_t.
Ancestor level 0 refers to the active task; information about other tasks with associated frames present on the stack in the current execution context may be queried at higher ancestor levels.
The ompt_get_task_info runtime entry point returns 2 if there is a task region at the specified ancestor level and the information is available, 1 if there is a task region at the specified ancestor level but the information is currently unavailable, and 0 otherwise.
If a task exists at the specified ancestor level and the information is available then information is returned in the variables passed by reference to the entry point. If no task region exists at the specified ancestor level or the information is unavailable then the values of variables passed by reference to the entry point are undefined when ompt_get_task_info returns.
A tool may use a pointer to a data object for a task or parallel region that it obtains from ompt_get_task_info to inspect or to modify the value of the data object. When either a parallel region or a task region is created, its data object will be initialized with the value ompt_data_none.
This runtime entry point is async signal safe.
Description of Arguments
The ancestor_level argument specifies the task region of interest by its ancestor level. Ancestor level 0 refers to the active task; information about ancestor tasks found in the current execution context may be queried at higher ancestor levels.
The flags argument returns the task type if the argument is not NULL.
The task_data argument returns the task data if the argument is not NULL.
The task_frame argument returns the task frame pointer if the argument is not NULL.
The parallel_data argument returns the parallel data if the argument is not NULL.
The thread_num argument returns the thread number if the argument is not NULL.
Effect
If the runtime entry point returns 0 or 1, no argument is modified. Otherwise, ompt_get_task_info has the following effects:
• If a non-null value was passed for flags then the value returned in the integer to which flags points represents the type of the task at the specified level; possible task types include initial, implicit, explicit, and target tasks;
• If a non-null value was passed for task_data then the value that is returned in the object to which it points is a pointer to a data word that is associated with the task at the specified level;
• If a non-null value was passed for task_frame then the value that is returned in the object to which task_frame points is a pointer to the ompt_frame_t structure that is associated with the task at the specified level;
• If a non-null value was passed for parallel_data then the value that is returned in the object to which parallel_data points is a pointer to a data word that is associated with the parallel region that contains the task at the specified level or, if the task at the specified level is an initial task, NULL; and
• If a non-null value was passed for thread_num then the value that is returned in the object to which thread_num points indicates the number of the thread in the parallel region that is executing the task at the specified level.
Constraints on Arguments
While argument ancestor_level is passed by value, all other arguments to ompt_get_task_info must be pointers to variables of the specified types or NULL.
Cross References
• ompt_data_t type, see Section 4.4.4.4 on page 440.
• ompt_task_flag_t type, see Section 4.4.4.18 on page 446.
• ompt_frame_t type, see Section 4.4.4.27 on page 454.
4.6.1.15 ompt_get_task_memory_t
Summary
The ompt_get_task_memory_t type is the type signature of the ompt_get_task_memory runtime entry point, which returns information about memory ranges that are associated with the task.
Format
C / C++
typedef int (*ompt_get_task_memory_t) (
  void **addr,
  size_t *size,
  int block
);
C / C++
Description
During execution, an OpenMP thread may be executing an OpenMP task. The OpenMP implementation must preserve the data environment from the creation of the task for the execution of the task. The ompt_get_task_memory runtime entry point, which has type signature ompt_get_task_memory_t, provides information about the memory ranges used to store the data environment for the current task.
Multiple memory ranges may be used to store these data. The block argument supports iteration over these memory ranges.
The ompt_get_task_memory runtime entry point returns 1 if there are more memory ranges available, and 0 otherwise. If no memory is used for a task, size is set to 0. In this case, addr is unspecified.
This runtime entry point is async signal safe.
Description of Arguments
The addr argument is a pointer to a void pointer return value to provide the start address of a memory block.
The size argument is a pointer to a size type return value to provide the size of the memory block.
The block argument is an integer value to specify the memory block of interest.
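Iteration over the memory ranges of the current task can be sketched as follows. This is a non-normative illustration, and it assumes that block numbers start at 0 and increase by one, which the text above does not state explicitly. The names mock_get_task_memory and task_memory_bytes are hypothetical, and the mock pretends that the task uses two ranges of 64 and 32 bytes.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*ompt_get_task_memory_t)(void **addr, size_t *size, int block);

/* Hypothetical mock runtime: the current task owns two memory ranges. */
char mock_range0[64];
char mock_range1[32];
int mock_get_task_memory(void **addr, size_t *size, int block) {
  if (block == 0) {
    *addr = mock_range0;
    *size = sizeof mock_range0;
    return 1; /* more ranges are available */
  }
  if (block == 1) {
    *addr = mock_range1;
    *size = sizeof mock_range1;
    return 0; /* this was the last range */
  }
  *size = 0; /* no memory for this block; addr is left unspecified */
  return 0;
}

/* Sums the sizes of all memory ranges reported for the current task.
   Assumes blocks are numbered 0, 1, 2, ... (assumption of this sketch). */
size_t task_memory_bytes(ompt_get_task_memory_t get_task_memory) {
  size_t total = 0;
  int block = 0;
  int more;
  assert(get_task_memory != NULL);
  do {
    void *addr = NULL;
    size_t size = 0;
    more = get_task_memory(&addr, &size, block);
    total += size; /* size is 0 when no memory is used */
    block++;
  } while (more);
  return total;
}
```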
4.6.1.16 ompt_get_target_info_t
Summary
The ompt_get_target_info_t type is the type signature of the ompt_get_target_info runtime entry point, which returns identifiers that specify a thread’s current target region and target operation ID, if any.
Format
C / C++
typedef int (*ompt_get_target_info_t) (
  uint64_t *device_num,
  ompt_id_t *target_id,
  ompt_id_t *host_op_id
);
C / C++
Description
The ompt_get_target_info entry point, which has type signature ompt_get_target_info_t, returns 1 if the current thread is in a target region and 0 otherwise. If the entry point returns 0 then the values of the variables passed by reference as its arguments are undefined.
If the current thread is in a target region then ompt_get_target_info returns information about the current device, active target region, and active host operation, if any.
This runtime entry point is async signal safe.
Description of Arguments
The device_num argument returns the device number if the current thread is in a target region.
The target_id argument returns the target region identifier if the current thread is in a target region.
If the current thread is in the process of initiating an operation on a target device (for example, copying data to or from an accelerator or launching a kernel) then host_op_id returns the identifier for the operation; otherwise, host_op_id returns ompt_id_none.
Constraints on Arguments
Arguments passed to the entry point must be valid references to variables of the specified types.
Cross References
• ompt_id_t type, see Section 4.4.4.3 on page 439.
4.6.1.17 ompt_get_num_devices_t
Summary
The ompt_get_num_devices_t type is the type signature of the ompt_get_num_devices runtime entry point, which returns the number of available devices.
Format
C / C++
typedef int (*ompt_get_num_devices_t) (void);
C / C++
Description
The ompt_get_num_devices runtime entry point, which has type signature ompt_get_num_devices_t, returns the number of devices available to an OpenMP program.
This runtime entry point is async signal safe.
4.6.1.18 ompt_get_unique_id_t
Summary
The ompt_get_unique_id_t type is the type signature of the ompt_get_unique_id runtime entry point, which returns a unique number.
Format
C / C++
typedef uint64_t (*ompt_get_unique_id_t) (void);
C / C++
Description
The ompt_get_unique_id runtime entry point, which has type signature ompt_get_unique_id_t, returns a number that is unique for the duration of an OpenMP program. Successive invocations may not result in consecutive or even increasing numbers.
This runtime entry point is async signal safe.
4.6.1.19 ompt_finalize_tool_t
Summary
The ompt_finalize_tool_t type is the type signature of the ompt_finalize_tool runtime entry point, which enables a tool to finalize itself.
Format
C / C++
typedef void (*ompt_finalize_tool_t) (void);
C / C++
Description
A tool may detect that the execution of an OpenMP program is ending before the OpenMP implementation does. To facilitate clean termination of the tool, the tool may invoke the ompt_finalize_tool runtime entry point, which has type signature ompt_finalize_tool_t. Upon completion of ompt_finalize_tool, no OMPT callbacks are dispatched.
Effect
The ompt_finalize_tool routine detaches the tool from the runtime, unregisters all callbacks and invalidates all OMPT entry points passed to the tool in the lookup-function. Upon completion of ompt_finalize_tool, no further callbacks will be issued on any thread.
Before the callbacks are unregistered, the OpenMP runtime should attempt to dispatch all outstanding registered callbacks as well as the callbacks that would be encountered during shutdown of the runtime, if possible in the current execution context.
4.6.2 Entry Points in the OMPT Device Tracing Interface
The runtime entry points with type signatures of the types that are specified in this section enable a tool to trace activities on a device.
4.6.2.1 ompt_get_device_num_procs_t
Summary
The ompt_get_device_num_procs_t type is the type signature of the ompt_get_device_num_procs runtime entry point, which returns the number of processors currently available to the execution environment on the specified device.
Format
C / C++
typedef int (*ompt_get_device_num_procs_t) (
  ompt_device_t *device
);
C / C++
Description
The ompt_get_device_num_procs runtime entry point, which has type signature ompt_get_device_num_procs_t, returns the number of processors that are available on the device at the time the routine is called. This value may change between the time that it is determined and the time that it is read in the calling context due to system actions outside the control of the OpenMP implementation.
Description of Arguments
The device argument is a pointer to an opaque object that represents the target device instance. The pointer to the device instance object is used by functions in the device tracing interface to identify the device being addressed.
Cross References
• ompt_device_t type, see Section 4.4.4.5 on page 441.
4.6.2.2 ompt_get_device_time_t
Summary
The ompt_get_device_time_t type is the type signature of the ompt_get_device_time runtime entry point, which returns the current time on the specified device.
Format
C / C++
typedef ompt_device_time_t (*ompt_get_device_time_t) (
  ompt_device_t *device
);
C / C++
Description
Host and target devices are typically distinct and run independently. If host and target devices are different hardware components, they may use different clock generators. For this reason, a common time base for ordering host-side and device-side events may not be available.
The ompt_get_device_time runtime entry point, which has type signature ompt_get_device_time_t, returns the current time on the specified device. A tool can use this information to align time stamps from different devices.
Description of Arguments
The device argument is a pointer to an opaque object that represents the target device instance. The pointer to the device instance object is used by functions in the device tracing interface to identify the device being addressed.
Cross References
• ompt_device_t type, see Section 4.4.4.5 on page 441.
• ompt_device_time_t type, see Section 4.4.4.6 on page 441.
4.6.2.3 ompt_translate_time_t
Summary
The ompt_translate_time_t type is the type signature of the ompt_translate_time runtime entry point, which translates a time value that is obtained from the specified device to a corresponding time value on the host device.
Format
C / C++
typedef double (*ompt_translate_time_t) (
  ompt_device_t *device,
  ompt_device_time_t time
);
C / C++
Description
The ompt_translate_time runtime entry point, which has type signature ompt_translate_time_t, translates a time value obtained from the specified device to a corresponding time value on the host device. The returned value for the host time has the same meaning as the value returned from omp_get_wtime.
Note – The accuracy of time translations may degrade if they are not performed promptly after a device time value is received and if either the host or the device varies its clock speed. Prompt translation of device times to host times is recommended.
Description of Arguments
The device argument is a pointer to an opaque object that represents the target device instance. The pointer to the device instance object is used by functions in the device tracing interface to identify the device being addressed.
The time argument is a time from the specified device.
Cross References
• omp_get_wtime routine, see Section 3.4.1 on page 394.
• ompt_device_t type, see Section 4.4.4.5 on page 441.
• ompt_device_time_t type, see Section 4.4.4.6 on page 441.
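The recommendation to translate device times promptly can be sketched as a helper that reads the device clock and converts the value in the same step. This is a non-normative illustration. The type declarations are local stand-ins for those in omp-tools.h, and the names mock_now, mock_get_device_time, mock_translate_time and device_now_on_host are hypothetical. The mock assumes a device clock that counts nanoseconds and started when the host clock read 100.0 seconds. In a real tool both entry points would be obtained through the lookup function associated with the device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for declarations normally taken from omp-tools.h. */
typedef void ompt_device_t;
typedef uint64_t ompt_device_time_t;
typedef ompt_device_time_t (*ompt_get_device_time_t)(ompt_device_t *device);
typedef double (*ompt_translate_time_t)(ompt_device_t *device,
                                        ompt_device_time_t time);

/* Hypothetical mock device: nanosecond ticks, started at host time 100.0 s. */
ompt_device_time_t mock_now = 0;
ompt_device_time_t mock_get_device_time(ompt_device_t *device) {
  (void)device;
  return mock_now;
}
double mock_translate_time(ompt_device_t *device, ompt_device_time_t time) {
  (void)device;
  return 100.0 + (double)time * 1e-9;
}

/* Reads the device clock and translates it to host time immediately,
   so that the translation is not affected by later clock drift. */
double device_now_on_host(ompt_get_device_time_t get_device_time,
                          ompt_translate_time_t translate_time,
                          ompt_device_t *device) {
  assert(get_device_time != NULL && translate_time != NULL);
  ompt_device_time_t t = get_device_time(device);
  return translate_time(device, t);
}
```

The returned value has the same meaning as a value returned from omp_get_wtime, so it can be compared directly with host-side time stamps.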
4.6.2.4 ompt_set_trace_ompt_t
Summary
The ompt_set_trace_ompt_t type is the type signature of the ompt_set_trace_ompt runtime entry point, which enables or disables the recording of trace records for one or more types of OMPT events.
Format
C / C++
typedef ompt_set_result_t (*ompt_set_trace_ompt_t) (
  ompt_device_t *device,
  unsigned int enable,
  unsigned int etype
);
C / C++
Description of Arguments
The device argument points to an opaque object that represents the target device instance. Functions in the device tracing interface use this pointer to identify the device that is being addressed.
The etype argument indicates the events to which the invocation of ompt_set_trace_ompt applies. If the value of etype is 0 then the invocation applies to all events. If etype is positive then it applies to the event in ompt_callbacks_t that matches that value.
The enable argument indicates whether tracing should be enabled or disabled for the event or events that the etype argument specifies. A positive value for enable indicates that recording should be enabled; a value of 0 for enable indicates that recording should be disabled.
Restrictions
The ompt_set_trace_ompt runtime entry point has the following restriction:
• The entry point must not return ompt_set_sometimes_paired.
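The meaning of the enable and etype arguments can be sketched with a helper that turns on tracing for all events. This is a non-normative illustration. The type declarations are local stand-ins for those in omp-tools.h, and mock_last_enable, mock_last_etype, mock_set_trace_ompt and enable_all_event_tracing are hypothetical names. The mock simply records the request and reports ompt_set_always.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for declarations normally taken from omp-tools.h. */
typedef enum {
  ompt_set_error = 0,
  ompt_set_never = 1,
  ompt_set_impossible = 2,
  ompt_set_sometimes = 3,
  ompt_set_sometimes_paired = 4,
  ompt_set_always = 5
} ompt_set_result_t;
typedef void ompt_device_t;
typedef ompt_set_result_t (*ompt_set_trace_ompt_t)(ompt_device_t *device,
                                                   unsigned int enable,
                                                   unsigned int etype);

/* Hypothetical mock runtime: remembers the last request it received. */
unsigned int mock_last_enable = 99;
unsigned int mock_last_etype = 99;
ompt_set_result_t mock_set_trace_ompt(ompt_device_t *device,
                                      unsigned int enable,
                                      unsigned int etype) {
  (void)device;
  mock_last_enable = enable;
  mock_last_etype = etype;
  return ompt_set_always;
}

/* Enables recording for every event type (etype == 0 selects all events).
   Returns 1 if records will be produced at least sometimes. */
int enable_all_event_tracing(ompt_set_trace_ompt_t set_trace_ompt,
                             ompt_device_t *device) {
  assert(set_trace_ompt != NULL);
  ompt_set_result_t r = set_trace_ompt(device, 1u, 0u);
  /* ompt_set_sometimes_paired is excluded by the restriction above. */
  return r == ompt_set_sometimes || r == ompt_set_always;
}
```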
Cross References
• Tracing activity on target devices with OMPT, see Section 4.2.5 on page 427.
• ompt_callbacks_t type, see Section 4.4.2 on page 434.
• ompt_set_result_t type, see Section 4.4.4.2 on page 438.
• ompt_device_t type, see Section 4.4.4.5 on page 441.
4.6.2.5 ompt_set_trace_native_t
Summary
The ompt_set_trace_native_t type is the type signature of the ompt_set_trace_native runtime entry point, which enables or disables the recording of native trace records for a device.
Format
C / C++
typedef ompt_set_result_t (*ompt_set_trace_native_t) (
  ompt_device_t *device,
  int enable,
  int flags
);
C / C++
Description
This interface is designed for use by a tool that cannot directly use native control functions for the device. If a tool can directly use the native control functions then it can invoke native control functions directly using pointers that the lookup function associated with the device provides and that are described in the documentation string that is provided to the device initializer callback.
Description of Arguments
The device argument points to an opaque object that represents the target device instance. Functions in the device tracing interface use this pointer to identify the device that is being addressed.
The enable argument indicates whether this invocation should enable or disable recording of events.
The flags argument specifies the kinds of native device monitoring to enable or to disable. Each
kind of monitoring is specified by a flag bit. Flags can be composed by using bitwise or to combine
enumeration values from type ompt_native_mon_flag_t.
To start, to pause, to flush, or to stop tracing for a specific target device associated with device, a
tool invokes the ompt_start_trace, ompt_pause_trace, ompt_flush_trace, or
ompt_stop_trace runtime entry point for the device.
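As a minimal sketch of how a flags word might be composed, the fragment below combines two monitoring kinds with the C `|` operator. The enumerators are local stand-ins (an assumption made so that the fragment compiles on its own); a real tool would use the ompt_native_mon_flag_t values from omp-tools.h. The helper example_compose_flags is hypothetical.

```c
#include <assert.h>

/* Local stand-ins for a few flag bits; a real tool would take the
   corresponding ompt_native_mon_flag_t values from omp-tools.h. */
typedef enum example_native_mon_flag_t {
  example_native_data_motion_explicit = 0x01,
  example_native_data_motion_implicit = 0x02,
  example_native_kernel_invocation    = 0x04,
  example_native_kernel_execution     = 0x08
} example_native_mon_flag_t;

/* Hypothetical helper: selects explicit data motion and kernel execution
   monitoring by setting one bit per monitoring kind. */
static int example_compose_flags(void) {
  return example_native_data_motion_explicit |
         example_native_kernel_execution;
}
```

The resulting value would be passed as the flags argument, with enable set to a nonzero value to turn the selected kinds on.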
Restrictions
The ompt_set_trace_native runtime entry point has the following restriction:
• The entry point must not return ompt_set_sometimes_paired.
Cross References
• Tracing activity on target devices with OMPT, see Section 4.2.5 on page 427.
• ompt_set_result_t type, see Section 4.4.4.2 on page 438.
• ompt_device_t type, see Section 4.4.4.5 on page 441.
4.6.2.6 ompt_start_trace_t
Summary
The ompt_start_trace_t type is the type signature of the ompt_start_trace runtime entry point, which starts tracing of activity on a specific device.
Format
C / C++
typedef int (*ompt_start_trace_t) (
  ompt_device_t *device,
  ompt_callback_buffer_request_t request,
  ompt_callback_buffer_complete_t complete
);
C / C++
Description
A device’s ompt_start_trace runtime entry point, which has type signature ompt_start_trace_t, initiates tracing on the device. Under normal operating conditions, every event buffer provided to a device by a tool callback is returned to the tool before the OpenMP runtime shuts down. If an exceptional condition terminates execution of an OpenMP program, the OpenMP runtime may not return buffers provided to the device.
An invocation of ompt_start_trace returns 1 if the command succeeds and 0 otherwise.
Description of Arguments
The device argument points to an opaque object that represents the target device instance. Functions in the device tracing interface use this pointer to identify the device that is being addressed.
The request argument specifies a tool callback that supplies a device with a buffer to deposit events.
The complete argument specifies a tool callback that is invoked by the OpenMP implementation to empty a buffer that contains event records.
Cross References
• ompt_device_t type, see Section 4.4.4.5 on page 441.
• ompt_callback_buffer_request_t callback type, see Section 4.5.2.23 on page 486.
• ompt_callback_buffer_complete_t callback type, see Section 4.5.2.24 on page 487.
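The interplay of the request and complete callbacks can be sketched as follows. All example_ names are hypothetical stand-ins so that the fragment compiles without omp-tools.h. The callback parameter lists mirror what Sections 4.5.2.23 and 4.5.2.24 are assumed to define, and example_mock_start_trace only imitates a runtime for illustration; it is not how any particular implementation behaves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-ins for OMPT types; a real tool takes them from omp-tools.h. */
typedef void example_device_t;                 /* plays ompt_device_t */
typedef void example_buffer_t;                 /* plays ompt_buffer_t */
typedef unsigned long long example_cursor_t;   /* plays ompt_buffer_cursor_t */

typedef void (*example_buffer_request_t)(int device_num,
                                         example_buffer_t **buffer,
                                         size_t *bytes);
typedef void (*example_buffer_complete_t)(int device_num,
                                          example_buffer_t *buffer,
                                          size_t bytes,
                                          example_cursor_t begin,
                                          int buffer_owned);
typedef int (*example_start_trace_t)(example_device_t *device,
                                     example_buffer_request_t request,
                                     example_buffer_complete_t complete);

static int example_buffers_returned = 0;

/* Tool callback: supplies the device with a buffer to deposit events. */
static void example_on_request(int device_num, example_buffer_t **buffer,
                               size_t *bytes) {
  (void)device_num;
  *bytes = 4096;
  *buffer = malloc(*bytes);
  if (*buffer == NULL) *bytes = 0;
}

/* Tool callback: would process the records, then releases an owned buffer. */
static void example_on_complete(int device_num, example_buffer_t *buffer,
                                size_t bytes, example_cursor_t begin,
                                int buffer_owned) {
  (void)device_num; (void)bytes; (void)begin;
  if (buffer_owned) {
    free(buffer);
    example_buffers_returned++;
  }
}

/* Mock runtime entry point: requests one buffer and hands it straight back. */
static int example_mock_start_trace(example_device_t *device,
                                    example_buffer_request_t request,
                                    example_buffer_complete_t complete) {
  example_buffer_t *buffer = NULL;
  size_t bytes = 0;
  (void)device;
  request(0, &buffer, &bytes);
  if (buffer == NULL) return 0;
  complete(0, buffer, 0, 0, 1);
  return 1;
}

/* Returns 1 when tracing was started, 0 otherwise. */
static int example_begin_tracing(example_start_trace_t start_trace,
                                 example_device_t *device) {
  if (start_trace == NULL) return 0;
  return start_trace(device, example_on_request, example_on_complete) == 1;
}
```

In a real tool the start_trace pointer would come from the device's lookup function rather than from a mock.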
4.6.2.7 ompt_pause_trace_t
Summary
The ompt_pause_trace_t type is the type signature of the ompt_pause_trace runtime entry point, which pauses or restarts activity tracing on a specific device.
Format
C / C++
typedef int (*ompt_pause_trace_t) (
  ompt_device_t *device,
  int begin_pause
);
C / C++
Description
A device’s ompt_pause_trace runtime entry point, which has type signature ompt_pause_trace_t, pauses or resumes tracing on a device. An invocation of ompt_pause_trace returns 1 if the command succeeds and 0 otherwise. Redundant pause or resume commands are idempotent and will return the same value as the prior command.
Description of Arguments
The device argument points to an opaque object that represents the target device instance. Functions in the device tracing interface use this pointer to identify the device that is being addressed.
The begin_pause argument indicates whether to pause or to resume tracing. To resume tracing, zero should be supplied for begin_pause; to pause tracing, any other value should be supplied.
Cross References
• ompt_device_t type, see Section 4.4.4.5 on page 441.
4.6.2.8 ompt_flush_trace_t
Summary
The ompt_flush_trace_t type is the type signature of the ompt_flush_trace runtime entry point, which causes all pending trace records for the specified device to be delivered.
Format
C / C++
typedef int (*ompt_flush_trace_t) (
  ompt_device_t *device
);
C / C++
Description
A device’s ompt_flush_trace runtime entry point, which has type signature ompt_flush_trace_t, causes the OpenMP implementation to issue a sequence of zero or more buffer completion callbacks to deliver all trace records that have been collected prior to the flush. An invocation of ompt_flush_trace returns 1 if the command succeeds and 0 otherwise.
Description of Arguments
The device argument points to an opaque object that represents the target device instance. Functions in the device tracing interface use this pointer to identify the device that is being addressed.
Cross References
• ompt_device_t type, see Section 4.4.4.5 on page 441.
4.6.2.9 ompt_stop_trace_t
Summary
The ompt_stop_trace_t type is the type signature of the ompt_stop_trace runtime entry point, which stops tracing for a device.
Format
C / C++
typedef int (*ompt_stop_trace_t) (
  ompt_device_t *device
);
C / C++
Description
A device’s ompt_stop_trace runtime entry point, which has type signature ompt_stop_trace_t, halts tracing on the device and requests that any pending trace records are flushed. An invocation of ompt_stop_trace returns 1 if the command succeeds and 0 otherwise.
Description of Arguments
The device argument points to an opaque object that represents the target device instance. Functions in the device tracing interface use this pointer to identify the device that is being addressed.
Cross References
• ompt_device_t type, see Section 4.4.4.5 on page 441.
4.6.2.10 ompt_advance_buffer_cursor_t
Summary
The ompt_advance_buffer_cursor_t type is the type signature of the ompt_advance_buffer_cursor runtime entry point, which advances a trace buffer cursor to the next record.
Format
C / C++
typedef int (*ompt_advance_buffer_cursor_t) (
  ompt_device_t *device,
  ompt_buffer_t *buffer,
  size_t size,
  ompt_buffer_cursor_t current,
  ompt_buffer_cursor_t *next
);
C / C++
Description
A device’s ompt_advance_buffer_cursor runtime entry point, which has type signature ompt_advance_buffer_cursor_t, advances a trace buffer pointer to the next trace record. An invocation of ompt_advance_buffer_cursor returns true if the advance is successful and the next position in the buffer is valid.
Description of Arguments
The device argument points to an opaque object that represents the target device instance. Functions in the device tracing interface use this pointer to identify the device that is being addressed.
The buffer argument indicates a trace buffer that is associated with the cursors.
The argument size indicates the size of buffer in bytes.
The current argument is an opaque buffer cursor.
The next argument returns the next value of an opaque buffer cursor.
Cross References
• ompt_device_t type, see Section 4.4.4.5 on page 441.
• ompt_buffer_cursor_t type, see Section 4.4.4.8 on page 442.
4.6.2.11 ompt_get_record_type_t
Summary
The ompt_get_record_type_t type is the type signature of the ompt_get_record_type runtime entry point, which inspects the type of a trace record.
Format
C / C++
typedef ompt_record_t (*ompt_get_record_type_t) (
  ompt_buffer_t *buffer,
  ompt_buffer_cursor_t current
);
C / C++
Description
Trace records for a device may be in one of two forms: native record format, which may be device-specific, or OMPT record format, in which each trace record corresponds to an OpenMP event and most fields in the record structure are the arguments that would be passed to the OMPT callback for the event.
A device’s ompt_get_record_type runtime entry point, which has type signature ompt_get_record_type_t, inspects the type of a trace record and indicates whether the record at the current position in the trace buffer is an OMPT record, a native record, or an invalid record. An invalid record type is returned if the cursor is out of bounds.
Description of Arguments
The buffer argument indicates a trace buffer.
The current argument is an opaque buffer cursor.
Cross References
• ompt_record_t type, see Section 4.4.3.1 on page 435.
• ompt_buffer_t type, see Section 4.4.4.7 on page 441.
• ompt_buffer_cursor_t type, see Section 4.4.4.8 on page 442.
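A tool typically combines ompt_get_record_type with ompt_advance_buffer_cursor to walk a completed buffer. The sketch below shows one possible loop shape against a mock device. Every example_ name, the buffer layout, and the use of an array index as the cursor are assumptions made for illustration only, since real buffers and cursors are opaque to the tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for ompt_record_t: OMPT, native, or invalid record. */
typedef enum example_record_kind_t {
  example_record_ompt = 1,
  example_record_native = 2,
  example_record_invalid = 3
} example_record_kind_t;

typedef uint64_t example_cursor_t;   /* plays ompt_buffer_cursor_t */

/* Mock buffer: the cursor is simply an index into kinds[]. */
typedef struct example_buffer_t {
  size_t count;
  example_record_kind_t kinds[8];
} example_buffer_t;

/* Mock of ompt_get_record_type: invalid when the cursor is out of bounds. */
static example_record_kind_t
example_get_record_type(example_buffer_t *buffer, example_cursor_t current) {
  if (current >= buffer->count) return example_record_invalid;
  return buffer->kinds[current];
}

/* Mock of ompt_advance_buffer_cursor: nonzero if a next record exists. */
static int example_advance_buffer_cursor(example_buffer_t *buffer, size_t size,
                                         example_cursor_t current,
                                         example_cursor_t *next) {
  (void)size;
  if (current + 1 >= buffer->count) return 0;
  *next = current + 1;
  return 1;
}

/* Walks the buffer from begin and counts records in OMPT format. */
static size_t example_count_ompt_records(example_buffer_t *buffer, size_t size,
                                         example_cursor_t begin) {
  size_t n = 0;
  example_cursor_t current = begin;
  int more = example_get_record_type(buffer, current) != example_record_invalid;
  while (more) {
    if (example_get_record_type(buffer, current) == example_record_ompt) n++;
    more = example_advance_buffer_cursor(buffer, size, current, &current);
  }
  return n;
}
```

A loop of this shape would normally run inside the buffer completion callback, starting from the begin cursor that the callback receives.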
4.6.2.12 ompt_get_record_ompt_t
Summary
The ompt_get_record_ompt_t type is the type signature of the ompt_get_record_ompt runtime entry point, which obtains a pointer to an OMPT trace record from a trace buffer associated with a device.
Format
C / C++
typedef ompt_record_ompt_t *(*ompt_get_record_ompt_t) (
  ompt_buffer_t *buffer,
  ompt_buffer_cursor_t current
);
C / C++
Description
A device’s ompt_get_record_ompt runtime entry point, which has type signature ompt_get_record_ompt_t, returns a pointer that may point to a record in the trace buffer, or it may point to a record in thread local storage in which the information extracted from a record was assembled. The information available for an event depends upon its type.
The return value of the ompt_record_ompt_t type includes a field of a union type that can represent information for any OMPT event record type. Another call to the runtime entry point may overwrite the contents of the fields in a record returned by a prior invocation.
Description of Arguments
The buffer argument indicates a trace buffer.
The current argument is an opaque buffer cursor.
Cross References
• ompt_record_ompt_t type, see Section 4.4.3.4 on page 436.
• ompt_device_t type, see Section 4.4.4.5 on page 441.
• ompt_buffer_cursor_t type, see Section 4.4.4.8 on page 442.
4.6.2.13 ompt_get_record_native_t
Summary
The ompt_get_record_native_t type is the type signature of the ompt_get_record_native runtime entry point, which obtains a pointer to a native trace record from a trace buffer associated with a device.
Format
C / C++
typedef void *(*ompt_get_record_native_t) (
  ompt_buffer_t *buffer,
  ompt_buffer_cursor_t current,
  ompt_id_t *host_op_id
);
C / C++
Description
A device’s ompt_get_record_native runtime entry point, which has type signature ompt_get_record_native_t, returns a pointer that may point into the specified trace buffer, or into thread local storage in which the information extracted from a trace record was assembled. The information available for a native event depends upon its type. If the function returns a non-null result, it will also set the object to which host_op_id points to a host-side identifier for the operation that is associated with the record. A subsequent call to ompt_get_record_native may overwrite the contents of the fields in a record returned by a prior invocation.
Description of Arguments
The buffer argument indicates a trace buffer.
The current argument is an opaque buffer cursor.
The host_op_id argument is a pointer to an identifier that is returned by the function. The entry point sets the identifier to which host_op_id points to the value of a host-side identifier for an operation on a target device that was created when the operation was initiated by the host.
Cross References
• ompt_id_t type, see Section 4.4.4.3 on page 439.
• ompt_buffer_t type, see Section 4.4.4.7 on page 441.
• ompt_buffer_cursor_t type, see Section 4.4.4.8 on page 442.
4.6.2.14 ompt_get_record_abstract_t
Summary
The ompt_get_record_abstract_t type is the type signature of the ompt_get_record_abstract runtime entry point, which summarizes the context of a native (device-specific) trace record.
Format
C / C++
typedef ompt_record_abstract_t *
(*ompt_get_record_abstract_t) (
  void *native_record
);
C / C++
Description
An OpenMP implementation may execute on a device that logs trace records in a native (device-specific) format that a tool cannot interpret directly. A device’s ompt_get_record_abstract runtime entry point, which has type signature ompt_get_record_abstract_t, translates a native trace record into a standard form.
Description of Arguments
The native_record argument is a pointer to a native trace record.
Cross References
• ompt_record_abstract_t type, see Section 4.4.3.3 on page 436.
4.6.3 Lookup Entry Points: ompt_function_lookup_t
Summary
The ompt_function_lookup_t type is the type signature of the lookup runtime entry points that provide pointers to runtime entry points that are part of the OMPT interface.
Format
C / C++
typedef void (*ompt_interface_fn_t) (void);
typedef ompt_interface_fn_t (*ompt_function_lookup_t) (
  const char *interface_function_name
);
C / C++
Description
An OpenMP implementation provides a pointer to a lookup routine that provides pointers to OMPT runtime entry points. When the implementation invokes a tool initializer to configure the OMPT callback interface, it provides a lookup function that provides pointers to runtime entry points that implement routines that are part of the OMPT callback interface. Alternatively, when it invokes a tool initializer to configure the OMPT tracing interface for a device, it provides a lookup function that provides pointers to runtime entry points that implement tracing control routines appropriate for that device.
Description of Arguments
The interface_function_name argument is a C string that represents the name of a runtime entry point.
Cross References
• Tool initializer for a device’s OMPT tracing interface, see Section 4.2.5 on page 427.
• Tool initializer for the OMPT callback interface, see Section 4.5.1.1 on page 457.
• Entry points in the OMPT callback interface, see Table 4.1 on page 426 for a list and Section 4.6.1 on page 497 for detailed definitions.
• Entry points in the OMPT tracing interface, see Table 4.3 on page 430 for a list and Section 4.6.2 on page 518 for detailed definitions.
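The following sketch illustrates how a tool might resolve an entry point by name and cast the generic pointer to its specific type signature. The two lookup typedefs are repeated from the Format above; the definition of ompt_device_t as void is a stand-in, and example_lookup, example_flush_trace_impl, and example_resolve_flush are hypothetical names that imitate a runtime so that the fragment is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*ompt_interface_fn_t) (void);
typedef ompt_interface_fn_t (*ompt_function_lookup_t) (
  const char *interface_function_name
);

typedef void ompt_device_t;   /* stand-in for the opaque device type */
typedef int (*ompt_flush_trace_t) (ompt_device_t *device);

/* Mock entry point that a runtime might expose under "ompt_flush_trace". */
static int example_flush_trace_impl(ompt_device_t *device) {
  (void)device;
  return 1;
}

/* Mock lookup function: maps a name to an entry point, or NULL if unknown. */
static ompt_interface_fn_t example_lookup(const char *interface_function_name) {
  if (strcmp(interface_function_name, "ompt_flush_trace") == 0)
    return (ompt_interface_fn_t)example_flush_trace_impl;
  return NULL;
}

/* Resolves the flush entry point; the result is NULL when the lookup
   function does not provide it, so callers should check before use. */
static ompt_flush_trace_t example_resolve_flush(ompt_function_lookup_t lookup) {
  return (ompt_flush_trace_t)lookup("ompt_flush_trace");
}
```

Checking the result for NULL before calling through it keeps the tool robust against implementations that do not provide a given entry point.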
CHAPTER 5
OMPD Interface
This chapter describes OMPD, which is an interface for third-party tools. Third-party tools exist in separate processes from the OpenMP program. To provide OMPD support, an OpenMP implementation must provide an OMPD library to be loaded by the third-party tool. An OpenMP implementation does not need to maintain any extra information to support OMPD inquiries from third-party tools unless it is explicitly instructed to do so.
OMPD allows third-party tools such as debuggers to inspect the OpenMP state of a live program or core file in an implementation-agnostic manner. That is, a tool that uses OMPD should work with any conforming OpenMP implementation. An OpenMP implementor provides a library for OMPD that a third-party tool can dynamically load. Using the interface exported by the OMPD library, the external tool can inspect the OpenMP state of a program. In order to satisfy requests from the third-party tool, the OMPD library may need to read data from, or to find the addresses of symbols in, the OpenMP program. The OMPD library provides this functionality through a callback interface that the third-party tool must instantiate for the OMPD library.
To use OMPD, the third-party tool loads the OMPD library. The OMPD library exports the API that is defined throughout this section and that the tool uses to determine OpenMP information about the OpenMP program. The OMPD library must look up the symbols and read data out of the program. It does not perform these operations directly, but instead it uses the callback interface that the tool exports to cause the tool to perform them.
The OMPD architecture insulates tools from the internal structure of the OpenMP runtime while the OMPD library is insulated from the details of how to access the OpenMP program. This decoupled design allows for flexibility in how the OpenMP program and tool are deployed, so that, for example, the tool and the OpenMP program are not required to execute on the same machine.
Generally the tool does not interact directly with the OpenMP runtime and, instead, interacts with it through the OMPD library. However, a few cases require the tool to access the OpenMP runtime directly. These cases fall into two broad categories. The first is during initialization, where the tool must look up symbols and read variables in the OpenMP runtime in order to identify the OMPD library that it should use, which is discussed in Section 5.2.2 on page 535 and Section 5.2.3 on page 536. The second category relates to arranging for the tool to be notified when certain events occur during the execution of the OpenMP program. For this purpose, the OpenMP implementation must define certain symbols in the runtime code, as is discussed in Section 5.6 on page 594. Each of these symbols corresponds to an event type. The runtime must ensure that control passes through the appropriate named location when events occur. If the tool requires notification of an event, it can plant a breakpoint at the matching location. The location may be, but is not required to be, a function. It can, for example, simply be a label. However, the names of the locations must have external C linkage.
5.1 OMPD Interfaces Definitions
C / C++
A compliant implementation must supply a set of definitions for the OMPD runtime entry points, OMPD tool callback signatures, OMPD tool interface routines, and the special data types of their parameters and return values. These definitions, which are listed throughout this chapter, and their associated declarations shall be provided in a header file named omp-tools.h. In addition, the set of definitions may specify other implementation-specific values.
The ompd_dll_locations function, all OMPD tool interface functions, and all OMPD runtime entry points are external functions with C linkage.
C / C++
5.2 Activating an OMPD Tool
The tool and the OpenMP program exist as separate processes. Thus, coordination is required between the OpenMP runtime and the external tool for OMPD.
5.2.1 Enabling the Runtime for OMPD
In order to support third-party tools, the OpenMP runtime may need to collect and to maintain information that it might not otherwise. The OpenMP runtime collects whatever information is necessary to support OMPD if the environment variable OMP_DEBUG is set to enabled.
Cross References
• Activating an OMPT Tool, Section 4.2 on page 420
• OMP_DEBUG, Section 6.20 on page 617
5.2.2 ompd_dll_locations
Summary
The ompd_dll_locations global variable indicates the location of OMPD libraries that are compatible with the OpenMP implementation.
Format
C
const char **ompd_dll_locations;
C
Description
An OpenMP runtime may have more than one OMPD library. The tool must be able to locate the right library to use for the OpenMP program that it is examining. The OpenMP runtime system must provide a public variable ompd_dll_locations, which is an argv-style vector of filename string pointers that provides the name(s) of any compatible OMPD library. This variable must have C linkage. The tool uses the name of the variable verbatim and, in particular, does not apply any name mangling before performing the look up.
The programming model or architecture of the tool and, thus, that of OMPD does not have to match that of the OpenMP program that is being examined. The tool must interpret the contents of ompd_dll_locations to find a suitable OMPD that matches its own architectural characteristics. On platforms that support different programming models (for example, 32-bit vs 64-bit), OpenMP implementations are encouraged to provide OMPD libraries for all models, and that can handle OpenMP programs of any model. Thus, for example, a 32-bit debugger that uses OMPD should be able to debug a 64-bit OpenMP program by loading a 32-bit OMPD implementation that can manage a 64-bit OpenMP runtime.
ompd_dll_locations points to a NULL-terminated vector of zero or more NULL-terminated pathname strings that do not have any filename conventions. This vector must be fully initialized before ompd_dll_locations is set to a non-null value, such that if a tool, such as a debugger, stops execution of the OpenMP program at any point at which ompd_dll_locations is non-null, then the vector of strings to which it points is valid and complete.
Cross References
• ompd_dll_locations_valid, see Section 5.2.3 on page 536
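The argv-style layout described above can be walked with a simple loop. The sketch below operates on a vector in the tool's own address space, which is a simplifying assumption: a real tool would first read the pointer and the strings out of the OpenMP process. The function name example_count_dll_locations and the returned sentinel -1 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the entries of an argv-style, NULL-terminated vector of
   pathname strings. A NULL vector is treated as "not yet initialized"
   and reported as -1; an empty vector yields 0. */
static int example_count_dll_locations(const char **locations) {
  int n = 0;
  if (locations == NULL) return -1;
  while (locations[n] != NULL) n++;
  return n;
}
```

A tool would typically try each entry in order until it finds a library that matches its own architecture and loads successfully.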
5.2.3 ompd_dll_locations_valid
Summary
The OpenMP runtime notifies third-party tools that ompd_dll_locations is valid by allowing execution to pass through a location that the symbol ompd_dll_locations_valid identifies.
Format
C
void ompd_dll_locations_valid(void);
C
Description
Since ompd_dll_locations may not be a static variable, it may require runtime initialization. The OpenMP runtime notifies third-party tools that ompd_dll_locations is valid by having execution pass through a location that the symbol ompd_dll_locations_valid identifies. If ompd_dll_locations is NULL, a third-party tool can place a breakpoint at ompd_dll_locations_valid to be notified that ompd_dll_locations is initialized. In practice, the symbol ompd_dll_locations_valid may not be a function; instead, it may be a labeled machine instruction through which execution passes once the vector is valid.
5.3 OMPD Data Types
This section defines the OMPD types.
5.3.1 Size Type
Summary
The ompd_size_t type specifies the number of bytes in opaque data objects that are passed across the OMPD API.
Format
C / C++
typedef uint64_t ompd_size_t;
C / C++
5.3.2 Wait ID Type
Summary
The ompd_wait_id_t type identifies the object on which a thread waits.
Format
C / C++
typedef uint64_t ompd_wait_id_t;
C / C++
5.3.3 Basic Value Types
Summary
These definitions represent word, address, and segment value types.
Format
C / C++
typedef uint64_t ompd_addr_t;
typedef int64_t ompd_word_t;
typedef uint64_t ompd_seg_t;
C / C++
Description
The ompd_addr_t type represents an unsigned integer address in an OpenMP process. The ompd_word_t type represents a signed version of ompd_addr_t to hold a signed integer of the OpenMP process. The ompd_seg_t type represents an unsigned integer segment value.
5.3.4 Address Type
Summary
The ompd_address_t type is used to specify device addresses.
Format
C / C++
typedef struct ompd_address_t {
  ompd_seg_t segment;
  ompd_addr_t address;
} ompd_address_t;
C / C++
Description
The ompd_address_t type is a structure that OMPD uses to specify device addresses, which may or may not be segmented. For non-segmented architectures, ompd_segment_none is used in the segment field of ompd_address_t; it is an instance of the ompd_seg_t type that has the value 0.
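As a small illustration, the sketch below builds an address for a non-segmented architecture. The typedefs repeat the Format definitions of this section and of the basic value types; example_segment_none is a local stand-in for ompd_segment_none (a real tool would use the definition from omp-tools.h), and example_flat_address is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t ompd_addr_t;
typedef uint64_t ompd_seg_t;

typedef struct ompd_address_t {
  ompd_seg_t segment;
  ompd_addr_t address;
} ompd_address_t;

/* Stand-in for ompd_segment_none, which has the value 0. */
static const ompd_seg_t example_segment_none = 0;

/* Hypothetical helper: wraps a flat address for a non-segmented target. */
static ompd_address_t example_flat_address(ompd_addr_t address) {
  ompd_address_t result;
  result.segment = example_segment_none;
  result.address = address;
  return result;
}
```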
5.3.5 Frame Information Type
Summary
The ompd_frame_info_t type is used to specify frame information.
Format
C / C++
typedef struct ompd_frame_info_t {
  ompd_address_t frame_address;
  ompd_word_t frame_flag;
} ompd_frame_info_t;
C / C++
Description
The ompd_frame_info_t type is a structure that OMPD uses to specify frame information. The frame_address field of ompd_frame_info_t identifies a frame. The frame_flag field of ompd_frame_info_t indicates what type of information is provided in frame_address. The values and their meanings are the same as those defined for the ompt_frame_t enumeration type.
Cross References
• ompt_frame_t, see Section 4.4.4.27 on page 454
5.3.6 System Device Identifiers
Summary
The ompd_device_t type provides information about OpenMP devices.
Format
C / C++
typedef uint64_t ompd_device_t;
C / C++
Description
Different OpenMP runtimes may utilize different underlying devices. Device identifiers can vary in size and format and, thus, are not explicitly represented in OMPD. Instead, device identifiers are passed across the interface via the ompd_device_t type, which is a pointer to where the device identifier is stored, and the size of the device identifier in bytes. The OMPD library and a tool that uses it must agree on the format of the object that is passed. Each different kind of device identifier uses a unique unsigned 64-bit integer value.
Recommended values of ompd_device_t are defined in the ompd-types.h header file, which is available on http://www.openmp.org/.
5.3.7 Native Thread Identifiers
Summary
The ompd_thread_id_t type provides information about native threads.
Format
C / C++
typedef uint64_t ompd_thread_id_t;
C / C++
Description
Different OpenMP runtimes may use different native thread implementations. Native thread identifiers can vary in size and format and, thus, are not explicitly represented in the OMPD API. Instead, native thread identifiers are passed across the interface via the ompd_thread_id_t type, which is a pointer to where the native thread identifier is stored, and the size of the native thread identifier in bytes. The OMPD library and a tool that uses it must agree on the format of the object that is passed. Each different kind of native thread identifier uses a unique unsigned 64-bit integer value.
Recommended values of ompd_thread_id_t are defined in the ompd-types.h header file, which is available on http://www.openmp.org/.
5.3.8 OMPD Handle Types
Summary
OMPD handle types are opaque types.
Format
C / C++
typedef struct _ompd_aspace_handle ompd_address_space_handle_t;
typedef struct _ompd_thread_handle ompd_thread_handle_t;
typedef struct _ompd_parallel_handle ompd_parallel_handle_t;
typedef struct _ompd_task_handle ompd_task_handle_t;
C / C++
Description
OMPD uses handles for address spaces (ompd_address_space_handle_t), threads (ompd_thread_handle_t), parallel regions (ompd_parallel_handle_t), and tasks (ompd_task_handle_t). Each operation of the OMPD interface that applies to a particular address space, thread, parallel region, or task must explicitly specify a corresponding handle. A handle for an entity is constant while the entity itself is alive. Handles are defined by the OMPD library, and are opaque to the tool.
Defining externally visible type names in this way introduces type safety to the interface, and helps to catch instances where incorrect handles are passed by the tool to the OMPD library. The structures do not need to be defined; instead, the OMPD library must cast incoming (pointers to) handles to the appropriate internal, private types.
5.3.9 OMPD Scope Types
Summary
The ompd_scope_t type identifies OMPD scopes.
Format
C / C++
typedef enum ompd_scope_t {
  ompd_scope_global = 1,
  ompd_scope_address_space = 2,
  ompd_scope_thread = 3,
  ompd_scope_parallel = 4,
  ompd_scope_implicit_task = 5,
  ompd_scope_task = 6
} ompd_scope_t;
C / C++
Description
The ompd_scope_t type identifies OpenMP scopes, including those related to parallel regions and tasks. When used in an OMPD interface function call, the scope type and the ompd handle must match according to Table 5.1.
TABLE 5.1: Mapping of Scope Type and OMPD Handles
Scope types                 Handles
ompd_scope_global           Address space handle for the host device
ompd_scope_address_space    Any address space handle
ompd_scope_thread           Any thread handle
ompd_scope_parallel         Any parallel handle
ompd_scope_implicit_task    Task handle for an implicit task
ompd_scope_task             Any task handle
5.3.10 ICV ID Type
Summary
The ompd_icv_id_t type identifies an OpenMP implementation ICV.
Format
C / C++
typedef uint64_t ompd_icv_id_t;
C / C++
The ompd_icv_id_t type identifies OpenMP implementation ICVs. ompd_icv_undefined is an instance of this type with the value 0.
5.3.11 Tool Context Types
Summary
A third-party tool uses contexts to uniquely identify abstractions. These contexts are opaque to the OMPD library and are defined as follows:
Format
C / C++
typedef struct _ompd_aspace_cont ompd_address_space_context_t;
typedef struct _ompd_thread_cont ompd_thread_context_t;
C / C++
5.3.12 Return Code Types
Summary
The ompd_rc_t type is the return code type of OMPD operations.
Format
C / C++
typedef enum ompd_rc_t {
  ompd_rc_ok = 0,
  ompd_rc_unavailable = 1,
  ompd_rc_stale_handle = 2,
  ompd_rc_bad_input = 3,
  ompd_rc_error = 4,
  ompd_rc_unsupported = 5,
  ompd_rc_needs_state_tracking = 6,
  ompd_rc_incompatible = 7,
  ompd_rc_device_read_error = 8,
  ompd_rc_device_write_error = 9,
  ompd_rc_nomem = 10,
} ompd_rc_t;
C / C++
Description
The ompd_rc_t type is used for the return codes of OMPD operations. The return code types and their semantics are defined as follows:
• ompd_rc_ok is returned when the operation is successful;
• ompd_rc_unavailable is returned when information is not available for the specified context;
• ompd_rc_stale_handle is returned when the specified handle is no longer valid;
• ompd_rc_bad_input is returned when the input parameters (other than handle) are invalid;
• ompd_rc_error is returned when a fatal error occurred;
• ompd_rc_unsupported is returned when the requested operation is not supported;
• ompd_rc_needs_state_tracking is returned when the state tracking operation failed because state tracking is not currently enabled;
• ompd_rc_device_read_error is returned when a read operation failed on the device;
• ompd_rc_device_write_error is returned when a write operation failed on the device;
• ompd_rc_incompatible is returned when this OMPD library is incompatible with, or is not capable of handling, the OpenMP program; and
• ompd_rc_nomem is returned when a memory allocation fails.
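Tools often want to report these codes in readable form. The sketch below maps each return code to a short label; the enumeration is repeated from the Format above, while example_rc_name and the label strings are hypothetical and not part of the OMPD API.

```c
#include <assert.h>
#include <string.h>

typedef enum ompd_rc_t {
  ompd_rc_ok = 0,
  ompd_rc_unavailable = 1,
  ompd_rc_stale_handle = 2,
  ompd_rc_bad_input = 3,
  ompd_rc_error = 4,
  ompd_rc_unsupported = 5,
  ompd_rc_needs_state_tracking = 6,
  ompd_rc_incompatible = 7,
  ompd_rc_device_read_error = 8,
  ompd_rc_device_write_error = 9,
  ompd_rc_nomem = 10,
} ompd_rc_t;

/* Hypothetical helper: returns a short label for a return code, or
   "unknown" for values outside the enumeration. */
static const char *example_rc_name(ompd_rc_t rc) {
  switch (rc) {
  case ompd_rc_ok:                   return "ok";
  case ompd_rc_unavailable:          return "unavailable";
  case ompd_rc_stale_handle:         return "stale_handle";
  case ompd_rc_bad_input:            return "bad_input";
  case ompd_rc_error:                return "error";
  case ompd_rc_unsupported:          return "unsupported";
  case ompd_rc_needs_state_tracking: return "needs_state_tracking";
  case ompd_rc_incompatible:         return "incompatible";
  case ompd_rc_device_read_error:    return "device_read_error";
  case ompd_rc_device_write_error:   return "device_write_error";
  case ompd_rc_nomem:                return "nomem";
  }
  return "unknown";
}
```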
5.3.13 Primitive Type Sizes
Summary
The ompd_device_type_sizes_t type provides the “sizeof” of primitive types in the OpenMP architecture address space.
Format
C / C++
typedef struct ompd_device_type_sizes_t {
  uint8_t sizeof_char;
  uint8_t sizeof_short;
  uint8_t sizeof_int;
  uint8_t sizeof_long;
  uint8_t sizeof_long_long;
  uint8_t sizeof_pointer;
} ompd_device_type_sizes_t;
C / C++
Description
The ompd_device_type_sizes_t type is used in operations through which the OMPD library can interrogate the tool about the “sizeof” of primitive types in the OpenMP architecture address space. The fields of ompd_device_type_sizes_t give the sizes of the eponymous basic types used by the OpenMP runtime. As the tool and the OMPD library, by definition, have the same architecture and programming model, the size of the fields can be given as uint8_t.
Cross References
• ompd_callback_sizeof_fn_t, see Section 5.4.2.2 on page 549
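A tool's sizeof callback has to fill in this structure for the target. The sketch below shows only the simplest case, in which the target is assumed to use the same data model as the tool, so the tool's own sizeof values apply. The helper example_fill_native_sizes is hypothetical; a tool that debugs a different architecture would obtain the sizes from its knowledge of the target instead.

```c
#include <assert.h>
#include <stdint.h>

typedef struct ompd_device_type_sizes_t {
  uint8_t sizeof_char;
  uint8_t sizeof_short;
  uint8_t sizeof_int;
  uint8_t sizeof_long;
  uint8_t sizeof_long_long;
  uint8_t sizeof_pointer;
} ompd_device_type_sizes_t;

/* Hypothetical helper: reports the tool's own type sizes, which is only
   correct when the target shares the tool's data model. */
static void example_fill_native_sizes(ompd_device_type_sizes_t *sizes) {
  sizes->sizeof_char = (uint8_t)sizeof(char);
  sizes->sizeof_short = (uint8_t)sizeof(short);
  sizes->sizeof_int = (uint8_t)sizeof(int);
  sizes->sizeof_long = (uint8_t)sizeof(long);
  sizes->sizeof_long_long = (uint8_t)sizeof(long long);
  sizes->sizeof_pointer = (uint8_t)sizeof(void *);
}
```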
1 5.4
2 3 4 5 6 7 8 9
10 11
12 5.4.1
13
14
15
16
17
18
19 20 21 22
23 24 25
26 27 28 29 30
31 32
OMPD Tool Callback Interface
For the OMPD library to provide information about the internal state of the OpenMP runtime system in an OpenMP process or core file, it must have a means to extract information from the OpenMP process that the tool is debugging. The OpenMP process on which the tool is operating may be either a “live” process or a core file, and a thread may be either a “live” thread in an OpenMP process, or a thread in a core file. To enable the OMPD library to extract state information from an OpenMP process or core file, the tool must supply the OMPD library with callback functions to inquire about the size of primitive types in the device of the OpenMP process, to look up the addresses of symbols, and to read and to write memory in the device. The OMPD library uses these callbacks to implement its interface operations. The OMPD library only invokes the callback functions in direct response to calls made by the tool to the OMPD library.
5.4.1 Memory Management of OMPD Library
The OMPD library must not access the heap manager directly. Instead, if it needs heap memory it must use the memory allocation and deallocation callback functions that are described in this section, ompd_callback_memory_alloc_fn_t (see Section 5.4.1.1 on page 546) and ompd_callback_memory_free_fn_t (see Section 5.4.1.2 on page 546), which are provided by the tool to obtain and to release heap memory. This mechanism ensures that the library does not interfere with any custom memory management scheme that the tool may use.
If the OMPD library is implemented in C++, memory management operators, such as new and delete in all their variants, must all be overloaded and implemented in terms of the callbacks that the tool provides. The OMPD library must be coded so that any of its definitions of new or delete do not interfere with any that the tool defines.
In some cases, the OMPD library must allocate memory to return results to the tool. The tool then owns this memory and has the responsibility to release it. Thus, the OMPD library and the tool must use the same memory manager.
The OMPD library creates OMPD handles, which are opaque to the tool and may have a complex internal structure. The tool cannot determine if the handle pointers that the API returns correspond to discrete heap allocations. Thus, the tool must not simply deallocate a handle by passing an address that it receives from the OMPD library to its own memory manager. Instead, the API includes functions that the tool must use when it no longer needs a handle.
A tool creates contexts and passes them to the OMPD library. The OMPD library does not release contexts; instead the tool releases them after it releases any handles that may reference the contexts.
5.4.1.1 ompd_callback_memory_alloc_fn_t
Summary
The ompd_callback_memory_alloc_fn_t type is the type signature of the callback routine that the tool provides to the OMPD library to allocate memory.
Format
C
typedef ompd_rc_t (*ompd_callback_memory_alloc_fn_t) (
  ompd_size_t nbytes,
  void **ptr
);
C
Description
The ompd_callback_memory_alloc_fn_t type is the type signature of the memory allocation callback routine that the tool provides. The OMPD library may call the ompd_callback_memory_alloc_fn_t callback function to allocate memory.
Description of Arguments
The nbytes argument is the size in bytes of the block of memory to allocate.
The address of the newly allocated block of memory is returned in the location to which the ptr argument points. The newly allocated block is suitably aligned for any type of variable, and is not guaranteed to be zeroed.
Cross References
• ompd_size_t, see Section 5.3.1 on page 536.
• ompd_rc_t, see Section 5.3.12 on page 543.
5.4.1.2 ompd_callback_memory_free_fn_t
Summary
The ompd_callback_memory_free_fn_t type is the type signature of the callback routine that the tool provides to the OMPD library to deallocate memory.
Format
C
typedef ompd_rc_t (*ompd_callback_memory_free_fn_t) (
  void *ptr
);
C
Description
The ompd_callback_memory_free_fn_t type is the type signature of the memory deallocation callback routine that the tool provides. The OMPD library may call the ompd_callback_memory_free_fn_t callback function to deallocate memory that was obtained from a prior call to the ompd_callback_memory_alloc_fn_t callback function.
Description of Arguments
The ptr argument is the address of the block to be deallocated.
Cross References
• ompd_rc_t, see Section 5.3.12 on page 543.
• ompd_callback_memory_alloc_fn_t, see Section 5.4.1.1 on page 546.
• ompd_callbacks_t, see Section 5.4.6 on page 556.
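As a non-normative illustration, the two memory callbacks of Sections 5.4.1.1 and 5.4.1.2 might be paired in a tool as sketched below. The sketch assumes that the tool's memory manager is the C heap; the typedefs are local stand-ins for the OMPD header, the enumerator values are placeholders, and the names tool_alloc, tool_free and tool_live_blocks are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins for declarations that the OMPD header normally supplies.
 * The enumerator values are placeholders for this sketch only. */
typedef uint64_t ompd_size_t;
typedef enum { ompd_rc_ok, ompd_rc_bad_input, ompd_rc_nomem } ompd_rc_t;

/* Hypothetical tool bookkeeping: blocks handed out and not yet released. */
static long tool_live_blocks = 0;

/* Matches ompd_callback_memory_alloc_fn_t. */
static ompd_rc_t tool_alloc(ompd_size_t nbytes, void **ptr) {
  if (ptr == NULL)
    return ompd_rc_bad_input;
  *ptr = NULL;
  if (nbytes > SIZE_MAX)          /* request does not fit in size_t */
    return ompd_rc_nomem;
  /* malloc returns storage aligned for any object type and does not zero it.
   * A zero-byte request is rounded up so that success never yields NULL. */
  *ptr = malloc(nbytes != 0 ? (size_t)nbytes : 1);
  if (*ptr == NULL)
    return ompd_rc_nomem;
  ++tool_live_blocks;
  return ompd_rc_ok;
}

/* Matches ompd_callback_memory_free_fn_t. */
static ompd_rc_t tool_free(void *ptr) {
  if (ptr != NULL) {
    free(ptr);
    --tool_live_blocks;
  }
  return ompd_rc_ok;
}
```

Because both callbacks route through one allocator, a block that the OMPD library allocates to return a result can later be released by the tool, and a counter such as tool_live_blocks gives the tool a simple way to notice blocks that were never released.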
5.4.2 Context Management and Navigation
Summary
The tool provides the OMPD library with callbacks to manage and to navigate context relationships.
5.4.2.1 ompd_callback_get_thread_context_for_thread_id_fn_t
Summary
The ompd_callback_get_thread_context_for_thread_id_fn_t is the type signature of the callback routine that the tool provides to the OMPD library to map a thread identifier to a tool thread context.
Format
C
typedef ompd_rc_t
(*ompd_callback_get_thread_context_for_thread_id_fn_t) (
  ompd_address_space_context_t *address_space_context,
  ompd_thread_id_t kind,
  ompd_size_t sizeof_thread_id,
  const void *thread_id,
  ompd_thread_context_t **thread_context
);
C
Description
The ompd_callback_get_thread_context_for_thread_id_fn_t is the type signature of the context mapping callback routine that the tool provides. This callback maps a thread identifier to a tool thread context. The thread identifier is within the address space that address_space_context identifies. The OMPD library can use the thread context, for example, to access thread local storage.
Description of Arguments
The address_space_context argument is an opaque handle that the tool provides to reference an address space. The kind, sizeof_thread_id, and thread_id arguments represent a native thread identifier. On return, the thread_context argument provides an opaque handle that maps a native thread identifier to a tool thread context.
Restrictions
Routines that use ompd_callback_get_thread_context_for_thread_id_fn_t have the following restriction:
• The provided thread_context must be valid until the OMPD library returns from the OMPD tool interface routine.
Cross References
• ompd_size_t, see Section 5.3.1 on page 536.
• ompd_thread_id_t, see Section 5.3.7 on page 539.
• ompd_address_space_context_t, see Section 5.3.11 on page 542.
• ompd_thread_context_t, see Section 5.3.11 on page 542.
• ompd_rc_t, see Section 5.3.12 on page 543.
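As a non-normative illustration, a tool might implement this callback as a search over the threads that it already tracks for the address space. In the sketch below the context layouts, the 64-bit identifier width and the name tool_get_thread_context are assumptions made for the example; the typedefs are local stand-ins for the OMPD header, and the enumerator values are placeholders.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for the OMPD header; enumerator values are placeholders. */
typedef uint64_t ompd_size_t;
typedef uint64_t ompd_thread_id_t;
typedef enum { ompd_rc_ok, ompd_rc_bad_input, ompd_rc_unavailable } ompd_rc_t;

/* Hypothetical tool-defined contexts. The OMPD library treats both as opaque. */
typedef struct ompd_thread_context {
  uint64_t native_id;
} ompd_thread_context_t;

typedef struct ompd_address_space_context {
  ompd_thread_context_t threads[4];
  size_t nthreads;
} ompd_address_space_context_t;

/* Matches ompd_callback_get_thread_context_for_thread_id_fn_t. */
static ompd_rc_t tool_get_thread_context(
    ompd_address_space_context_t *address_space_context,
    ompd_thread_id_t kind, ompd_size_t sizeof_thread_id,
    const void *thread_id, ompd_thread_context_t **thread_context) {
  uint64_t id;
  (void)kind;  /* this sketch knows a single kind of native identifier */
  if (address_space_context == NULL || thread_id == NULL ||
      thread_context == NULL || sizeof_thread_id != sizeof id)
    return ompd_rc_bad_input;
  memcpy(&id, thread_id, sizeof id);  /* the identifier arrives as raw bytes */
  for (size_t i = 0; i < address_space_context->nthreads; ++i) {
    if (address_space_context->threads[i].native_id == id) {
      *thread_context = &address_space_context->threads[i];
      return ompd_rc_ok;
    }
  }
  return ompd_rc_unavailable;  /* no such thread in this address space */
}
```

The returned pointer refers to storage that the tool owns, which makes it straightforward to keep the context valid until the OMPD library returns from the interface routine, as the restriction above requires.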
5.4.2.2 ompd_callback_sizeof_fn_t
Summary
The ompd_callback_sizeof_fn_t type is the type signature of the callback routine that the tool provides to the OMPD library to determine the sizes of the primitive types in an address space.
Format
C
typedef ompd_rc_t (*ompd_callback_sizeof_fn_t) (
  ompd_address_space_context_t *address_space_context,
  ompd_device_type_sizes_t *sizes
);
C
Description
The ompd_callback_sizeof_fn_t is the type signature of the type-size query callback routine that the tool provides. This callback provides the sizes of the basic primitive types for a given address space.
Description of Arguments
The callback returns the sizes of the basic primitive types used by the address space context that the address_space_context argument specifies in the location to which the sizes argument points.
Cross References
• ompd_address_space_context_t, see Section 5.3.11 on page 542.
• ompd_rc_t, see Section 5.3.12 on page 543.
• ompd_device_type_sizes_t, see Section 5.3.13 on page 544.
• ompd_callbacks_t, see Section 5.4.6 on page 556.
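As a non-normative illustration, the simplest form of this callback reports the tool's own type sizes. That is only correct under the assumption, made here, that the OpenMP program uses the same ABI as the tool (for example, native debugging on one host); a tool that examines a foreign target would instead report sizes taken from its description of that target. The typedefs are local stand-ins for the OMPD header, the enumerator values are placeholders, and the name tool_sizeof is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the OMPD header; enumerator values are placeholders. */
typedef enum { ompd_rc_ok, ompd_rc_bad_input } ompd_rc_t;
typedef struct ompd_address_space_context ompd_address_space_context_t;

typedef struct ompd_device_type_sizes_t {
  uint8_t sizeof_char;
  uint8_t sizeof_short;
  uint8_t sizeof_int;
  uint8_t sizeof_long;
  uint8_t sizeof_long_long;
  uint8_t sizeof_pointer;
} ompd_device_type_sizes_t;

/* Matches ompd_callback_sizeof_fn_t. Assumes target ABI == tool ABI. */
static ompd_rc_t tool_sizeof(
    ompd_address_space_context_t *address_space_context,
    ompd_device_type_sizes_t *sizes) {
  (void)address_space_context;  /* a single, native address space is assumed */
  if (sizes == NULL)
    return ompd_rc_bad_input;
  sizes->sizeof_char = (uint8_t)sizeof(char);
  sizes->sizeof_short = (uint8_t)sizeof(short);
  sizes->sizeof_int = (uint8_t)sizeof(int);
  sizes->sizeof_long = (uint8_t)sizeof(long);
  sizes->sizeof_long_long = (uint8_t)sizeof(long long);
  sizes->sizeof_pointer = (uint8_t)sizeof(void *);
  return ompd_rc_ok;
}
```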
5.4.3 Accessing Memory in the OpenMP Program or Runtime
The OMPD library may need to read from or to write to the OpenMP program. It cannot do this directly. Instead the OMPD library must use callbacks that the tool provides so that the tool performs the operation.
5.4.3.1 ompd_callback_symbol_addr_fn_t
Summary
The ompd_callback_symbol_addr_fn_t type is the type signature of the callback that the tool provides to look up the addresses of symbols in an OpenMP program.
Format
C
typedef ompd_rc_t (*ompd_callback_symbol_addr_fn_t) (
  ompd_address_space_context_t *address_space_context,
  ompd_thread_context_t *thread_context,
  const char *symbol_name,
  ompd_address_t *symbol_addr,
  const char *file_name
);
C
Description
The ompd_callback_symbol_addr_fn_t is the type signature of the symbol-address query callback routine that the tool provides. This callback looks up addresses of symbols within a specified address space.
Description of Arguments
This callback looks up the symbol provided in the symbol_name argument.
The address_space_context argument is the tool’s representation of the address space of the process, core file, or device.
The thread_context argument is NULL for global memory access. If thread_context is not NULL, thread_context gives the thread specific context for the symbol lookup, for the purpose of calculating thread local storage addresses. If thread_context is non-null then the thread to which thread_context refers must be associated with either the process or the device that corresponds to the address_space_context argument.
The tool uses the symbol_name argument that the OMPD library supplies verbatim. In particular, no name mangling, demangling or other transformations are performed prior to the lookup. The symbol_name parameter must correspond to a statically allocated symbol within the specified address space. The symbol can correspond to any type of object, such as a variable, thread local storage variable, function, or untyped label. The symbol can have a local, global, or weak binding.
The file_name argument is an optional input parameter that indicates the name of the shared library in which the symbol is defined, and is intended to help the third party tool disambiguate symbols that are defined multiple times across the executable or shared library files. The shared library name may not be an exact match for the name seen by the tool. If file_name is NULL then the tool first tries to find the symbol in the executable file, and, if the symbol is not found, the tool tries to find the symbol in the shared libraries in the order in which the shared libraries are loaded into the address space. If file_name is non-null then the tool first tries to find the symbol in the libraries that match the name in the file_name argument and, if the symbol is not found, the tool then uses the same procedure as when file_name is NULL.
The callback does not support finding symbols that are dynamically allocated on the call stack, or statically allocated symbols that are defined within the scope of a function or subroutine.
The callback returns the symbol’s address in the location to which symbol_addr points.
Restrictions
Routines that use the ompd_callback_symbol_addr_fn_t type have the following restrictions:
• The address_space_context argument must be non-null.
• The symbol that the symbol_name argument specifies must be defined.
Cross References
• ompd_address_t, see Section 5.3.4 on page 538.
• ompd_address_space_context_t, see Section 5.3.11 on page 542.
• ompd_thread_context_t, see Section 5.3.11 on page 542.
• ompd_rc_t, see Section 5.3.12 on page 543.
• ompd_callbacks_t, see Section 5.4.6 on page 556.
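As a non-normative illustration, the search order described above (libraries that match file_name first, then the executable and the shared libraries in load order) can be sketched over a flat symbol table. The table layout, the substring test used for inexact library names, the zero segment value and the names tool_symbol, tool_find and tool_symbol_addr are all assumptions made for the example; the typedefs are local stand-ins for the OMPD header with placeholder enumerator values, and thread local storage lookups are left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for the OMPD header; enumerator values are placeholders. */
typedef enum { ompd_rc_ok, ompd_rc_bad_input, ompd_rc_error } ompd_rc_t;
typedef struct ompd_address_t {
  uint64_t segment;
  uint64_t address;
} ompd_address_t;
typedef struct ompd_thread_context ompd_thread_context_t;

/* Hypothetical tool-side symbol table, stored in load order
 * (executable first, then shared libraries as they were loaded). */
struct tool_symbol {
  const char *name;
  const char *file;
  uint64_t addr;
};
typedef struct ompd_address_space_context {
  const struct tool_symbol *symbols;
  size_t nsymbols;
} ompd_address_space_context_t;

/* First entry with an exactly matching name; when file is non-null the
 * entry's file must also contain it (a loose match, by assumption). */
static const struct tool_symbol *tool_find(
    const ompd_address_space_context_t *as, const char *name,
    const char *file) {
  for (size_t i = 0; i < as->nsymbols; ++i) {
    const struct tool_symbol *s = &as->symbols[i];
    if (strcmp(s->name, name) != 0)
      continue;  /* the name is used verbatim, with no demangling */
    if (file == NULL || strstr(s->file, file) != NULL)
      return s;
  }
  return NULL;
}

/* Matches ompd_callback_symbol_addr_fn_t. */
static ompd_rc_t tool_symbol_addr(
    ompd_address_space_context_t *address_space_context,
    ompd_thread_context_t *thread_context, const char *symbol_name,
    ompd_address_t *symbol_addr, const char *file_name) {
  const struct tool_symbol *s = NULL;
  (void)thread_context;  /* thread local storage is outside this sketch */
  if (address_space_context == NULL || symbol_name == NULL ||
      symbol_addr == NULL)
    return ompd_rc_bad_input;
  if (file_name != NULL)  /* step 1: libraries that match file_name */
    s = tool_find(address_space_context, symbol_name, file_name);
  if (s == NULL)          /* step 2: same procedure as for a NULL file_name */
    s = tool_find(address_space_context, symbol_name, NULL);
  if (s == NULL)
    return ompd_rc_error;
  symbol_addr->segment = 0;  /* placeholder segment value */
  symbol_addr->address = s->addr;
  return ompd_rc_ok;
}
```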
5.4.3.2 ompd_callback_memory_read_fn_t
Summary
The ompd_callback_memory_read_fn_t type is the type signature of the callback that the tool provides to read data from an OpenMP program.
Format
C
typedef ompd_rc_t (*ompd_callback_memory_read_fn_t) (
  ompd_address_space_context_t *address_space_context,
  ompd_thread_context_t *thread_context,
  const ompd_address_t *addr,
  ompd_size_t nbytes,
  void *buffer
);
C
Description
The ompd_callback_memory_read_fn_t is the type signature of the read callback routines that the tool provides.
The read_memory callback copies a block of data from addr within the address space to the tool buffer.
The read_string callback copies a string to which addr points, including the terminating null byte (’\0’), to the tool buffer. At most nbytes bytes are copied. If a null byte is not among the first nbytes bytes, the string placed in buffer is not null-terminated.
Description of Arguments
The address from which the data are to be read from the OpenMP program specified by address_space_context is given by addr, while nbytes gives the number of bytes to be transferred. The thread_context argument is optional for global memory access, and in this case should be NULL. If it is non-null, thread_context identifies the thread specific context for the memory access for the purpose of accessing thread local storage.
The data are returned through buffer, which is allocated and owned by the OMPD library. The contents of the buffer are unstructured, raw bytes. The OMPD library must arrange for any transformations such as byte-swapping that may be necessary (see Section 5.4.4 on page 554) to interpret the data.
Cross References
• ompd_size_t, see Section 5.3.1 on page 536.
• ompd_address_t, see Section 5.3.4 on page 538.
• ompd_address_space_context_t, see Section 5.3.11 on page 542.
• ompd_thread_context_t, see Section 5.3.11 on page 542.
• ompd_rc_t, see Section 5.3.12 on page 543.
• ompd_callback_device_host_fn_t, see Section 5.4.4 on page 554.
• ompd_callbacks_t, see Section 5.4.6 on page 556.
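As a non-normative illustration, the difference between the read_memory and read_string callbacks can be sketched against a snapshot of target memory, such as a region mapped from a core file. The snapshot layout and the names tool_offset, tool_read_memory and tool_read_string are assumptions made for the example; the typedefs are local stand-ins for the OMPD header with placeholder enumerator values, and thread local storage is not modelled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for the OMPD header; enumerator values are placeholders. */
typedef uint64_t ompd_size_t;
typedef enum { ompd_rc_ok, ompd_rc_bad_input, ompd_rc_error } ompd_rc_t;
typedef struct ompd_address_t {
  uint64_t segment;
  uint64_t address;
} ompd_address_t;
typedef struct ompd_thread_context ompd_thread_context_t;

/* Hypothetical tool context: one contiguous snapshot of target memory. */
typedef struct ompd_address_space_context {
  const unsigned char *image;
  uint64_t base;  /* target address of image[0] */
  uint64_t size;  /* number of bytes in the snapshot */
} ompd_address_space_context_t;

/* Translates a target address into a snapshot offset; returns 0 on failure. */
static int tool_offset(const ompd_address_space_context_t *as,
                       const ompd_address_t *addr, uint64_t *off) {
  if (addr->address < as->base || addr->address - as->base > as->size)
    return 0;
  *off = addr->address - as->base;
  return 1;
}

/* read_memory: copies exactly nbytes raw bytes or fails. */
static ompd_rc_t tool_read_memory(ompd_address_space_context_t *as,
                                  ompd_thread_context_t *thread_context,
                                  const ompd_address_t *addr,
                                  ompd_size_t nbytes, void *buffer) {
  uint64_t off;
  (void)thread_context;
  if (as == NULL || addr == NULL || buffer == NULL)
    return ompd_rc_bad_input;
  if (!tool_offset(as, addr, &off) || nbytes > as->size - off)
    return ompd_rc_error;
  memcpy(buffer, as->image + off, (size_t)nbytes);
  return ompd_rc_ok;
}

/* read_string: copies up to and including the null byte, at most nbytes. */
static ompd_rc_t tool_read_string(ompd_address_space_context_t *as,
                                  ompd_thread_context_t *thread_context,
                                  const ompd_address_t *addr,
                                  ompd_size_t nbytes, void *buffer) {
  uint64_t off;
  char *out = buffer;
  (void)thread_context;
  if (as == NULL || addr == NULL || buffer == NULL)
    return ompd_rc_bad_input;
  if (!tool_offset(as, addr, &off))
    return ompd_rc_error;
  uint64_t limit = as->size - off;
  if (nbytes < limit)
    limit = nbytes;
  for (uint64_t i = 0; i < limit; ++i) {
    out[i] = (char)as->image[off + i];
    if (out[i] == '\0')
      break;  /* terminator copied; the result is a proper C string */
  }
  return ompd_rc_ok;  /* without a terminator the buffer is not a C string */
}
```

Both functions have the ompd_callback_memory_read_fn_t signature, which is why a single type serves the read_memory and read_string fields of ompd_callbacks_t. A caller of read_string that needs a C string has to check for the terminator itself.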
5.4.3.3 ompd_callback_memory_write_fn_t
Summary
The ompd_callback_memory_write_fn_t type is the type signature of the callback that the tool provides to write data to an OpenMP program.
Format
C
typedef ompd_rc_t (*ompd_callback_memory_write_fn_t) (
  ompd_address_space_context_t *address_space_context,
  ompd_thread_context_t *thread_context,
  const ompd_address_t *addr,
  ompd_size_t nbytes,
  const void *buffer
);
C
Description
The ompd_callback_memory_write_fn_t is the type signature of the write callback routine that the tool provides. The OMPD library may call this callback to have the tool write a block of data to a location within an address space from a provided buffer.
Description of Arguments
The address to which the data are to be written in the OpenMP program that address_space_context specifies is given by addr. The nbytes argument is the number of bytes to be transferred. The thread_context argument is optional for global memory access, and, in this case, should be NULL. If it is non-null then thread_context identifies the thread-specific context for the memory access for the purpose of accessing thread local storage.
The data to be written are passed through buffer, which is allocated and owned by the OMPD library. The contents of the buffer are unstructured, raw bytes. The OMPD library must arrange for any transformations such as byte-swapping that may be necessary (see Section 5.4.4 on page 554) to render the data into a form that is compatible with the OpenMP runtime.
Cross References
• ompd_size_t, see Section 5.3.1 on page 536.
• ompd_address_t, see Section 5.3.4 on page 538.
• ompd_address_space_context_t, see Section 5.3.11 on page 542.
• ompd_thread_context_t, see Section 5.3.11 on page 542.
• ompd_rc_t, see Section 5.3.12 on page 543.
• ompd_callback_device_host_fn_t, see Section 5.4.4 on page 554.
• ompd_callbacks_t, see Section 5.4.6 on page 556.
5.4.4 Data Format Conversion: ompd_callback_device_host_fn_t
Summary
The ompd_callback_device_host_fn_t type is the type signature of the callback that the tool provides to convert data between the formats that the tool and the OMPD library use and that the OpenMP program uses.
Format
C
typedef ompd_rc_t (*ompd_callback_device_host_fn_t) (
  ompd_address_space_context_t *address_space_context,
  const void *input,
  ompd_size_t unit_size,
  ompd_size_t count,
  void *output
);
C
Description
The architecture and/or programming model of the tool and the OMPD library may be different from that of the OpenMP program that is being examined. Thus, the conventions for representing data may differ. The callback interface includes operations to convert between the conventions, such as the byte order (endianness), that the tool and OMPD library use and the one that the OpenMP program uses. The callback with the ompd_callback_device_host_fn_t type signature converts data between formats.
Description of Arguments
The address_space_context argument specifies the OpenMP address space that is associated with the data. The input argument is the source buffer and the output argument is the destination buffer. The unit_size argument is the size of each of the elements to be converted. The count argument is the number of elements to be transformed.
The OMPD library allocates and owns the input and output buffers. It must ensure that the buffers have the correct size, and are eventually deallocated when they are no longer needed.
Cross References
• ompd_size_t, see Section 5.3.1 on page 536.
• ompd_address_space_context_t, see Section 5.3.11 on page 542.
• ompd_rc_t, see Section 5.3.12 on page 543.
• ompd_callbacks_t, see Section 5.4.6 on page 556.
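As a non-normative illustration, a conversion callback that handles only byte order might look as follows. The sketch assumes that endianness is the sole difference between the two data conventions, that the tool records the target byte order in its address space context, and that the input and output buffers do not overlap; the names tool_convert and host_is_big_endian are hypothetical, the typedefs are local stand-ins for the OMPD header, and the enumerator values are placeholders.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for the OMPD header; enumerator values are placeholders. */
typedef uint64_t ompd_size_t;
typedef enum { ompd_rc_ok, ompd_rc_bad_input } ompd_rc_t;

/* Hypothetical tool context that records the target byte order. */
typedef struct ompd_address_space_context {
  int target_is_big_endian;
} ompd_address_space_context_t;

/* Detects the byte order of the machine on which the tool runs. */
static int host_is_big_endian(void) {
  const uint16_t probe = 1;
  unsigned char first;
  memcpy(&first, &probe, 1);
  return first == 0;
}

/* Matches ompd_callback_device_host_fn_t. Reverses the bytes of each of the
 * count elements of unit_size bytes when host and target byte order differ,
 * and copies them unchanged otherwise. */
static ompd_rc_t tool_convert(
    ompd_address_space_context_t *address_space_context, const void *input,
    ompd_size_t unit_size, ompd_size_t count, void *output) {
  const unsigned char *in = input;
  unsigned char *out = output;
  if (address_space_context == NULL || input == NULL || output == NULL)
    return ompd_rc_bad_input;
  const int swap =
      !!address_space_context->target_is_big_endian != host_is_big_endian();
  for (ompd_size_t i = 0; i < count; ++i) {
    for (ompd_size_t b = 0; b < unit_size; ++b) {
      const ompd_size_t src = swap ? unit_size - 1 - b : b;
      out[i * unit_size + b] = in[i * unit_size + src];
    }
  }
  return ompd_rc_ok;
}
```

Because reversing the bytes of an element is its own inverse, a tool that makes these assumptions could install the same function in both the device_to_host and host_to_device fields of ompd_callbacks_t.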
5.4.5 Output: ompd_callback_print_string_fn_t
Summary
The ompd_callback_print_string_fn_t type is the type signature of the callback that the tool provides so that the OMPD library can emit output.
Format
C
typedef ompd_rc_t (*ompd_callback_print_string_fn_t) (
  const char *string,
  int category
);
C
Description
The OMPD library may call the ompd_callback_print_string_fn_t callback function to emit output, such as logging or debug information. The tool may set the ompd_callback_print_string_fn_t callback function to NULL to prevent the OMPD library from emitting output; the OMPD library may not write to file descriptors that it did not open.
Description of Arguments
The string argument is the null-terminated string to be printed. No conversion or formatting is performed on the string.
The category argument is the implementation-defined category of the string to be printed.
Cross References
• ompd_rc_t, see Section 5.3.12 on page 543.
• ompd_callbacks_t, see Section 5.4.6 on page 556.
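As a non-normative illustration, the sketch below shows a tool-side output callback together with the check that a library would make before it emits output. The choice of stderr as the destination, the bracketed category prefix, the use of category 0 and the names tool_print_string and library_log are assumptions made for the example; the typedefs are local stand-ins for the OMPD header with placeholder enumerator values.

```c
#include <stdio.h>

/* Stand-ins for the OMPD header; enumerator values are placeholders. */
typedef enum { ompd_rc_ok, ompd_rc_bad_input, ompd_rc_error } ompd_rc_t;
typedef ompd_rc_t (*ompd_callback_print_string_fn_t)(const char *string,
                                                     int category);

/* Tool side: matches ompd_callback_print_string_fn_t. */
static ompd_rc_t tool_print_string(const char *string, int category) {
  if (string == NULL)
    return ompd_rc_bad_input;
  /* The category is implementation defined; it is shown only as a prefix.
   * fputs emits the string as is, so '%' characters are never interpreted. */
  if (fprintf(stderr, "[ompd:%d] ", category) < 0 ||
      fputs(string, stderr) == EOF)
    return ompd_rc_error;
  return ompd_rc_ok;
}

/* Library side: a NULL callback means that output is suppressed. */
static ompd_rc_t library_log(ompd_callback_print_string_fn_t print_string,
                             const char *message) {
  if (print_string == NULL)
    return ompd_rc_ok;
  return print_string(message, 0);  /* category 0 is arbitrary here */
}
```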
5.4.6 The Callback Interface
Summary
All OMPD library interactions with the OpenMP program must be through a set of callbacks that the tool provides. These callbacks must also be used for allocating or releasing resources, such as memory, that the library needs.
Format
C
typedef struct ompd_callbacks_t {
  ompd_callback_memory_alloc_fn_t alloc_memory;
  ompd_callback_memory_free_fn_t free_memory;
  ompd_callback_print_string_fn_t print_string;
  ompd_callback_sizeof_fn_t sizeof_type;
  ompd_callback_symbol_addr_fn_t symbol_addr_lookup;
  ompd_callback_memory_read_fn_t read_memory;
  ompd_callback_memory_write_fn_t write_memory;
  ompd_callback_memory_read_fn_t read_string;
  ompd_callback_device_host_fn_t device_to_host;
  ompd_callback_device_host_fn_t host_to_device;
  ompd_callback_get_thread_context_for_thread_id_fn_t
    get_thread_context_for_thread_id;
} ompd_callbacks_t;
C
Description
The set of callbacks that the OMPD library must use is collected in the ompd_callbacks_t record structure. An instance of this type is passed to the OMPD library as a parameter to ompd_initialize (see Section 5.5.1.1 on page 558). Each field points to a function that the OMPD library must use to interact with the OpenMP program or for memory operations.
The alloc_memory and free_memory fields are pointers to functions the OMPD library uses to allocate and to release dynamic memory.
The print_string field points to a function that prints a string.
The architectures or programming models of the OMPD library and third party tool may be different from that of the OpenMP program that is being examined. The sizeof_type field points to a function that allows the OMPD library to determine the sizes of the basic integer and pointer types that the OpenMP program uses. Because of the differences in architecture or programming model, the conventions for representing data in the OMPD library and the OpenMP program may be different. The device_to_host field points to a function that translates data from the conventions that the OpenMP program uses to those that the tool and OMPD library use. The reverse operation is performed by the function to which the host_to_device field points.
The symbol_addr_lookup field points to a callback that the OMPD library can use to find the address of a global or thread local storage symbol. The read_memory, read_string, and write_memory fields are pointers to functions for reading from and writing to global memory or thread local storage in the OpenMP program.
The get_thread_context_for_thread_id field is a pointer to a function that the OMPD library can use to obtain a thread context that corresponds to a native thread identifier.
Cross References
• ompd_callback_memory_alloc_fn_t, see Section 5.4.1.1 on page 546.
• ompd_callback_memory_free_fn_t, see Section 5.4.1.2 on page 546.
• ompd_callback_get_thread_context_for_thread_id_fn_t, see Section 5.4.2.1 on page 547.
• ompd_callback_sizeof_fn_t, see Section 5.4.2.2 on page 549.
• ompd_callback_symbol_addr_fn_t, see Section 5.4.3.1 on page 550.
• ompd_callback_memory_read_fn_t, see Section 5.4.3.2 on page 551.
• ompd_callback_memory_write_fn_t, see Section 5.4.3.3 on page 553.
• ompd_callback_device_host_fn_t, see Section 5.4.4 on page 554.
• ompd_callback_print_string_fn_t, see Section 5.4.5 on page 556.
5.5 OMPD Tool Interface Routines
5.5.1 Per OMPD Library Initialization and Finalization
The OMPD library must be initialized exactly once after it is loaded, and finalized exactly once before it is unloaded. Per OpenMP process or core file initialization and finalization are also required.
Once the library is loaded, the tool can determine the version of the OMPD API that the library supports by calling ompd_get_api_version (see Section 5.5.1.2 on page 559). If the tool supports the version that ompd_get_api_version returns, the tool starts the initialization by calling ompd_initialize (see Section 5.5.1.1 on page 558) using the version of the OMPD API that the library supports. If the tool does not support the version that ompd_get_api_version returns, it may attempt to call ompd_initialize with a different version.
5.5.1.1 ompd_initialize
Summary
The ompd_initialize function initializes the OMPD library.
Format
C
ompd_rc_t ompd_initialize(
  ompd_word_t api_version,
  const ompd_callbacks_t *callbacks
);
C
Description
A tool that uses OMPD calls ompd_initialize to initialize each OMPD library that it loads. More than one library may be present in a third-party tool, such as a debugger, because the tool may control multiple devices, which may use different runtime systems that require different OMPD libraries. This initialization must be performed exactly once before the tool can begin to operate on an OpenMP process or core file.
Description of Arguments
The api_version argument is the OMPD API version that the tool requests to use. The tool may call ompd_get_api_version to obtain the latest version that the OMPD library supports.
The tool provides the OMPD library with a set of callback functions in the callbacks input argument, which enables the OMPD library to allocate and to deallocate memory in the tool’s address space, to look up the sizes of basic primitive types in the device, to look up symbols in the device, and to read and to write memory in the device.
Cross References
• ompd_rc_t type, see Section 5.3.12 on page 543.
• ompd_callbacks_t type, see Section 5.4.6 on page 556.
• ompd_get_api_version call, see Section 5.5.1.2 on page 559.
5.5.1.2 ompd_get_api_version
Summary
The ompd_get_api_version function returns the OMPD API version.
Format
C
ompd_rc_t ompd_get_api_version(ompd_word_t *version);
C
Description
The tool may call the ompd_get_api_version function to obtain the latest OMPD API version number of the OMPD library.
Description of Arguments
The latest version number is returned into the location to which the version argument points.
Cross References
• ompd_rc_t type, see Section 5.3.12 on page 543.
5.5.1.3 ompd_get_version_string
Summary
The ompd_get_version_string function returns a descriptive string for the OMPD API version.
Format
C
ompd_rc_t ompd_get_version_string(const char **string);
C
Description
The tool may call this function to obtain a pointer to a descriptive version string of the OMPD API version.
Description of Arguments
A pointer to a descriptive version string is placed into the location to which the string output argument points. The OMPD library owns the string that the OMPD library returns; the tool must not modify or release this string. The string remains valid for as long as the library is loaded. The ompd_get_version_string function may be called before ompd_initialize (see Section 5.5.1.1 on page 558). Accordingly, the OMPD library must not use heap or stack memory for the string.
The signatures of ompd_get_api_version (see Section 5.5.1.2 on page 559) and ompd_get_version_string are guaranteed not to change in future versions of the API. In contrast, the type definitions and prototypes in the rest of the API do not carry the same guarantee. Therefore a tool that uses OMPD should check the version of the API of the loaded OMPD library before it calls any other function of the API.
Cross References
• ompd_rc_t type, see Section 5.3.12 on page 543.
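As a non-normative illustration, the version check that precedes initialization can be sketched as follows. The sketch assumes that the tool has already resolved the library entry points into a table of function pointers (for example, with the dynamic loader) and that it supports exactly one API version, so it takes the conservative route of refusing a mismatch rather than retrying with a different version. The entry point table, the function tool_start_library, the mock library and the version value 7 are hypothetical; the typedefs are local stand-ins for the OMPD header with placeholder enumerator values.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the OMPD header; enumerator values are placeholders. */
typedef int64_t ompd_word_t;
typedef enum { ompd_rc_ok, ompd_rc_bad_input, ompd_rc_unsupported } ompd_rc_t;
typedef struct ompd_callbacks_t ompd_callbacks_t;  /* see Section 5.4.6 */

/* Hypothetical table of entry points resolved from a loaded OMPD library. */
struct ompd_entry_points {
  ompd_rc_t (*get_api_version)(ompd_word_t *version);
  ompd_rc_t (*initialize)(ompd_word_t api_version,
                          const ompd_callbacks_t *callbacks);
};

/* Checks the API version first, because only the version query routines
 * have signatures that are guaranteed to stay stable. */
static ompd_rc_t tool_start_library(const struct ompd_entry_points *lib,
                                    const ompd_callbacks_t *callbacks,
                                    ompd_word_t tool_version) {
  ompd_word_t lib_version = 0;
  if (lib == NULL || lib->get_api_version == NULL || lib->initialize == NULL)
    return ompd_rc_bad_input;
  ompd_rc_t rc = lib->get_api_version(&lib_version);
  if (rc != ompd_rc_ok)
    return rc;
  if (lib_version != tool_version)
    return ompd_rc_unsupported;  /* do not touch the rest of the API */
  return lib->initialize(lib_version, callbacks);
}

/* Mock library, present only so that the sketch can be exercised. */
static int mock_initialized = 0;
static ompd_rc_t mock_get_api_version(ompd_word_t *version) {
  *version = 7;  /* arbitrary value, not a real OMPD API version */
  return ompd_rc_ok;
}
static ompd_rc_t mock_initialize(ompd_word_t api_version,
                                 const ompd_callbacks_t *callbacks) {
  (void)callbacks;
  mock_initialized = (api_version == 7);
  return ompd_rc_ok;
}
```

A real tool would pass a fully populated ompd_callbacks_t instance; the mock ignores the argument only to keep the example short.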
5.5.1.4 ompd_finalize
Summary
When the tool is finished with the OMPD library it should call ompd_finalize before it unloads the library.
Format
C
ompd_rc_t ompd_finalize(void);
C
Description
The call to ompd_finalize must be the last OMPD call that the tool makes before it unloads the library. This call allows the OMPD library to free any resources that it may be holding.
The OMPD library may implement a finalizer section, which executes as the library is unloaded and therefore after the call to ompd_finalize. During finalization, the OMPD library may use the callbacks that the tool earlier provided after the call to ompd_initialize.
Cross References
• ompd_rc_t type, see Section 5.3.12 on page 543.
5.5.2 Per OpenMP Process Initialization and Finalization
5.5.2.1 ompd_process_initialize
Summary
A tool calls ompd_process_initialize to obtain an address space handle when it initializes a session on a live process or core file.
Format
C
ompd_rc_t ompd_process_initialize(
  ompd_address_space_context_t *context,
  ompd_address_space_handle_t **handle
);
C
Description
A tool calls ompd_process_initialize to obtain an address space handle when it initializes a session on a live process or core file. On return from ompd_process_initialize, the tool owns the address space handle, which it must release with ompd_rel_address_space_handle. The initialization function must be called before any OMPD operations are performed on the OpenMP process. This call allows the OMPD library to confirm that it can handle the OpenMP process or core file that the context identifies. Incompatibility is signaled by a return value of ompd_rc_incompatible.
Description of Arguments
The context argument is an opaque handle that the tool provides to address an address space. On return, the handle argument provides an opaque handle to the tool for this address space, which the tool must release when it is no longer needed.
Cross References
• ompd_address_space_handle_t type, see Section 5.3.8 on page 540.
• ompd_address_space_context_t type, see Section 5.3.11 on page 542.
• ompd_rc_t type, see Section 5.3.12 on page 543.
• ompd_rel_address_space_handle call, see Section 5.5.2.3 on page 564.
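As a non-normative illustration, a tool might wrap the lifetime of an address space handle in a small session object so that the handle is always released before the context that it references. The session type, the entry point table, the mock library and the layouts given to the two opaque types are assumptions made for the example; the typedefs are local stand-ins for the OMPD header with placeholder enumerator values.

```c
#include <stddef.h>

/* Stand-ins for the OMPD header; enumerator values are placeholders. */
typedef enum { ompd_rc_ok, ompd_rc_incompatible } ompd_rc_t;
typedef struct ompd_address_space_context {
  int compatible;  /* hypothetical tool-defined content */
} ompd_address_space_context_t;
typedef struct ompd_address_space_handle {
  int live;        /* hypothetical library-defined content (mock) */
} ompd_address_space_handle_t;

/* Hypothetical table of the two entry points that the sketch uses. */
struct ompd_process_api {
  ompd_rc_t (*process_initialize)(ompd_address_space_context_t *context,
                                  ompd_address_space_handle_t **handle);
  ompd_rc_t (*rel_address_space_handle)(ompd_address_space_handle_t *handle);
};

typedef struct tool_session {
  ompd_address_space_context_t *context;  /* owned by the tool */
  ompd_address_space_handle_t *handle;    /* released through the API only */
} tool_session;

static ompd_rc_t tool_session_open(const struct ompd_process_api *api,
                                   ompd_address_space_context_t *context,
                                   tool_session *session) {
  session->context = context;
  session->handle = NULL;
  /* ompd_rc_incompatible means the library cannot handle this process. */
  return api->process_initialize(context, &session->handle);
}

static ompd_rc_t tool_session_close(const struct ompd_process_api *api,
                                    tool_session *session) {
  ompd_rc_t rc = ompd_rc_ok;
  if (session->handle != NULL) {
    rc = api->rel_address_space_handle(session->handle);
    session->handle = NULL;
  }
  session->context = NULL;  /* only now may the tool release the context */
  return rc;
}

/* Mock library, present only so that the sketch can be exercised. */
static ompd_address_space_handle_t mock_handle;
static ompd_rc_t mock_process_initialize(ompd_address_space_context_t *context,
                                         ompd_address_space_handle_t **handle) {
  if (!context->compatible)
    return ompd_rc_incompatible;
  mock_handle.live = 1;
  *handle = &mock_handle;
  return ompd_rc_ok;
}
static ompd_rc_t mock_rel_address_space_handle(
    ompd_address_space_handle_t *handle) {
  handle->live = 0;
  return ompd_rc_ok;
}
```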
5.5.2.2 ompd_device_initialize
Summary
A tool calls ompd_device_initialize to obtain an address space handle for a device that has at least one active target region.
Format
C
ompd_rc_t ompd_device_initialize(
  ompd_address_space_handle_t *process_handle,
  ompd_address_space_context_t *device_context,
  ompd_device_t kind,
  ompd_size_t sizeof_id,
  void *id,
  ompd_address_space_handle_t **device_handle
);
C
Description
A tool calls ompd_device_initialize to obtain an address space handle for a device that has at least one active target region. On return from ompd_device_initialize, the tool owns the address space handle.
Description of Arguments
The process_handle argument is an opaque handle that the tool provides to reference the address space of the OpenMP process. The device_context argument is an opaque handle that the tool provides to reference a device address space. The kind, sizeof_id, and id arguments represent a device identifier. On return the device_handle argument provides an opaque handle to the tool for this address space.
Cross References
• ompd_size_t type, see Section 5.3.1 on page 536.
• ompd_device_t type, see Section 5.3.6 on page 539.
• ompd_address_space_handle_t type, see Section 5.3.8 on page 540.
• ompd_address_space_context_t type, see Section 5.3.11 on page 542.
• ompd_rc_t type, see Section 5.3.12 on page 543.
5.5.2.3 ompd_rel_address_space_handle
Summary
A tool calls ompd_rel_address_space_handle to release an address space handle.
Format
C
ompd_rc_t ompd_rel_address_space_handle(
  ompd_address_space_handle_t *handle
);
C
Description
When the tool is finished with the OpenMP process address space handle it should call ompd_rel_address_space_handle to release the handle, which allows the OMPD library to release any resources that it has related to the address space.
Description of Arguments
The handle argument is an opaque handle for the address space to be released.
Restrictions
The ompd_rel_address_space_handle has the following restriction:
• An address space context must not be used after the corresponding address space handle is released.
Cross References
• ompd_address_space_handle_t type, see Section 5.3.8 on page 540.
• ompd_rc_t type, see Section 5.3.12 on page 543.
5.5.3 Thread and Signal Safety
The OMPD library does not need to be reentrant. The tool must ensure that only one thread enters the OMPD library at a time. The OMPD library must not install signal handlers or otherwise interfere with the tool’s signal configuration.
5.5.4 Address Space Information
5.5.4.1 ompd_get_omp_version
Summary
The tool may call the ompd_get_omp_version function to obtain the version of the OpenMP API that is associated with an address space.
Format
C
ompd_rc_t ompd_get_omp_version(
  ompd_address_space_handle_t *address_space,
  ompd_word_t *omp_version
);
C
Description
The tool may call the ompd_get_omp_version function to obtain the version of the OpenMP API that is associated with the address space.
10
11 12
13
14 15 16 17
18
19 20
21
22 23 24 25 26
Summary
The ompd_get_omp_version_string function returns a descriptive string for the OpenMP API version that is associated with an address space.
1 Description of Arguments
2 The address_space argument is an opaque handle that the tool provides to reference the address
3 space of the OpenMP process or device.
4 Upon return, the omp_version argument contains the version of the OpenMP runtime in the
5 _OPENMP version macro format.
6 Cross References
7 • ompd_address_space_handle_t type, see Section 5.3.8 on page 540.
8 • ompd_rc_t type, see Section 5.3.12 on page 543.
9 5.5.4.2 ompd_get_omp_version_string
Format
C
ompd_rc_t ompd_get_omp_version_string(
    ompd_address_space_handle_t *address_space,
    const char **string
);
C
Description
After initialization, the tool may call the ompd_get_omp_version_string function to obtain the version of the OpenMP API that is associated with an address space.
Description of Arguments
The address_space argument is an opaque handle that the tool provides to reference the address space of the OpenMP process or device. A pointer to a descriptive version string is placed into the location to which the string output argument points. After returning from the call, the tool owns the string. The OMPD library must use the memory allocation callback that the tool provides to allocate the string storage. The tool is responsible for releasing the memory.
Cross References
• ompd_address_space_handle_t type, see Section 5.3.8 on page 540.
• ompd_rc_t type, see Section 5.3.12 on page 543.
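The numeric version that ompd_get_omp_version reports uses the _OPENMP macro format, a decimal value of the form yyyymm. The following is a minimal sketch of how a tool might split such a value into year and month. The helper name decode_omp_version is hypothetical, and the ompd_word_t typedef is a local stand-in (assumed to be a 64-bit signed integer) so that the fragment compiles without the OMPD header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the OMPD word type (assumption: 64-bit signed). */
typedef int64_t ompd_word_t;

/* Hypothetical helper: split a yyyymm value into year and month.
 * Returns 0 on success, -1 if the value does not look like yyyymm. */
int decode_omp_version(ompd_word_t omp_version, int *year, int *month)
{
    assert(year != NULL && month != NULL);
    if (omp_version < 100)
        return -1;
    int m = (int)(omp_version % 100);   /* last two decimal digits */
    if (m < 1 || m > 12)
        return -1;
    *year = (int)(omp_version / 100);   /* remaining leading digits */
    *month = m;
    return 0;
}
```

For example, the value 201811 decodes to year 2018 and month 11, which matches the publication date of this version of the specification.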
5.5.5 Thread Handles
5.5.5.1 ompd_get_thread_in_parallel
Summary
The ompd_get_thread_in_parallel function enables a tool to obtain handles for OpenMP threads that are associated with a parallel region.
Format
C
ompd_rc_t ompd_get_thread_in_parallel(
    ompd_parallel_handle_t *parallel_handle,
    int thread_num,
    ompd_thread_handle_t **thread_handle
);
C
Description
A successful invocation of ompd_get_thread_in_parallel returns a pointer to a thread handle in the location to which thread_handle points. This call yields meaningful results only if all OpenMP threads in the parallel region are stopped.
Description of Arguments
The parallel_handle argument is an opaque handle for a parallel region and selects the parallel region on which to operate. The thread_num argument selects the thread of the team to be returned. On return, the thread_handle argument is an opaque handle for the selected thread.
Restrictions
The ompd_get_thread_in_parallel function has the following restriction:
• The value of thread_num must be a non-negative integer smaller than the team size that was provided as the ompd-team-size-var from ompd_get_icv_from_scope.
Cross References
• ompd_parallel_handle_t type, see Section 5.3.8 on page 540.
• ompd_thread_handle_t type, see Section 5.3.8 on page 540.
• ompd_rc_t type, see Section 5.3.12 on page 543.
• ompd_get_icv_from_scope call, see Section 5.5.9.2 on page 590.
5.5.5.2 ompd_get_thread_handle
Summary
The ompd_get_thread_handle function maps a native thread to an OMPD thread handle.
Format
C
ompd_rc_t ompd_get_thread_handle(
    ompd_address_space_handle_t *handle,
    ompd_thread_id_t kind,
    ompd_size_t sizeof_thread_id,
    const void *thread_id,
    ompd_thread_handle_t **thread_handle
);
C
Description
The ompd_get_thread_handle function determines if the native thread identifier to which thread_id points represents an OpenMP thread. If so, the function returns ompd_rc_ok and the location to which thread_handle points is set to the thread handle for the OpenMP thread.
Description of Arguments
The handle argument is an opaque handle that the tool provides to reference an address space. The kind, sizeof_thread_id, and thread_id arguments represent a native thread identifier. On return, the thread_handle argument provides an opaque handle to the thread within the provided address space.
The native thread identifier to which thread_id points is guaranteed to be valid for the duration of the call. If the OMPD library must retain the native thread identifier, it must copy it.
Cross References
• ompd_size_t type, see Section 5.3.1 on page 536.
• ompd_thread_id_t type, see Section 5.3.7 on page 539.
• ompd_address_space_handle_t type, see Section 5.3.8 on page 540.
• ompd_thread_handle_t type, see Section 5.3.8 on page 540.
• ompd_rc_t type, see Section 5.3.12 on page 543.
5.5.5.3 ompd_rel_thread_handle
Summary
The ompd_rel_thread_handle function releases a thread handle.
Format
C
ompd_rc_t ompd_rel_thread_handle(
    ompd_thread_handle_t *thread_handle
);
C
Description
Thread handles are opaque to tools, which therefore cannot release them directly. Instead, when the tool is finished with a thread handle, it must pass it to ompd_rel_thread_handle for disposal.
Description of Arguments
The thread_handle argument is an opaque handle for a thread to be released.
Cross References
• ompd_thread_handle_t type, see Section 5.3.8 on page 540.
• ompd_rc_t type, see Section 5.3.12 on page 543.
5.5.5.4 ompd_thread_handle_compare
Summary
The ompd_thread_handle_compare function allows tools to compare two thread handles.
Format
C
ompd_rc_t ompd_thread_handle_compare(
    ompd_thread_handle_t *thread_handle_1,
    ompd_thread_handle_t *thread_handle_2,
    int *cmp_value
);
C
Description
The internal structure of thread handles is opaque to a tool. While the tool can easily compare pointers to thread handles, it cannot determine whether handles at two different addresses refer to the same underlying thread. The ompd_thread_handle_compare function compares thread handles.
On success, ompd_thread_handle_compare returns in the location to which cmp_value points a signed integer value that indicates how the underlying threads compare: a value less than, equal to, or greater than 0 indicates that the thread corresponding to thread_handle_1 is, respectively, less than, equal to, or greater than that corresponding to thread_handle_2.
Description of Arguments
The thread_handle_1 and thread_handle_2 arguments are opaque handles for threads. On return, the cmp_value argument is set to a signed integer value.
Cross References
• ompd_thread_handle_t type, see Section 5.3.8 on page 540.
• ompd_rc_t type, see Section 5.3.12 on page 543.
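Because the ordering is delivered through cmp_value while the return value only reports status, a tool has to check both. The sketch below illustrates that calling pattern. It does not use the real OMPD library: every type and function with a mock_ or MOCK_ prefix is a hypothetical stand-in that only mimics the shape of ompd_thread_handle_compare, and the comparison by native_id is an assumption made for the demonstration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; a real tool would use the OMPD header types. */
typedef enum { MOCK_RC_OK = 0, MOCK_RC_BAD_INPUT = 1 } mock_rc_t;
typedef struct { long native_id; } mock_thread_handle_t;

/* Mock with the same shape as ompd_thread_handle_compare: the ordering is
 * reported through *cmp_value, the return value is only a status code. */
mock_rc_t mock_thread_handle_compare(mock_thread_handle_t *h1,
                                     mock_thread_handle_t *h2,
                                     int *cmp_value)
{
    if (h1 == NULL || h2 == NULL || cmp_value == NULL)
        return MOCK_RC_BAD_INPUT;
    *cmp_value = (h1->native_id > h2->native_id) -
                 (h1->native_id < h2->native_id);
    return MOCK_RC_OK;
}

/* Returns 1 if both handles refer to the same thread, 0 if they differ,
 * and -1 if the comparison itself failed. */
int same_thread(mock_thread_handle_t *h1, mock_thread_handle_t *h2)
{
    int cmp = 0;
    if (mock_thread_handle_compare(h1, h2, &cmp) != MOCK_RC_OK)
        return -1;
    return cmp == 0;
}

/* Usage sketch: two distinct handle objects may name the same thread. */
void demo_same_thread(void)
{
    mock_thread_handle_t a = { 7 }, b = { 7 }, c = { 9 };
    assert(same_thread(&a, &b) == 1);   /* different addresses, same thread */
    assert(same_thread(&a, &c) == 0);
    assert(same_thread(&a, NULL) == -1);
}
```

Comparing the pointers &a and &b directly would report a difference here, which is the situation the description above warns about.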
5.5.5.5 ompd_get_thread_id
Summary
The ompd_get_thread_id function maps an OMPD thread handle to a native thread.
Format
C
ompd_rc_t ompd_get_thread_id(
    ompd_thread_handle_t *thread_handle,
    ompd_thread_id_t kind,
    ompd_size_t sizeof_thread_id,
    void *thread_id
);
C
Description
The ompd_get_thread_id function maps an OMPD thread handle to a native thread identifier.
Description of Arguments
The thread_handle argument is an opaque thread handle. The kind argument represents the kind of the native thread identifier. The sizeof_thread_id argument represents the size of the native thread identifier. On return, the thread_id argument is a buffer that represents a native thread identifier.
Cross References
• ompd_size_t type, see Section 5.3.1 on page 536.
• ompd_thread_id_t type, see Section 5.3.7 on page 539.
• ompd_thread_handle_t type, see Section 5.3.8 on page 540.
• ompd_rc_t type, see Section 5.3.12 on page 543.
5.5.6 Parallel Region Handles
5.5.6.1 ompd_get_curr_parallel_handle
Summary
The ompd_get_curr_parallel_handle function obtains a pointer to the parallel handle for an OpenMP thread’s current parallel region.
Format
C
ompd_rc_t ompd_get_curr_parallel_handle(
    ompd_thread_handle_t *thread_handle,
    ompd_parallel_handle_t **parallel_handle
);
C
Description
The ompd_get_curr_parallel_handle function enables the tool to obtain a pointer to the parallel handle for the current parallel region that is associated with an OpenMP thread. This call is meaningful only if the associated thread is stopped. The parallel handle must be released by calling ompd_rel_parallel_handle.
Description of Arguments
The thread_handle argument is an opaque handle for a thread and selects the thread on which to operate. On return, the parallel_handle argument is set to a handle for the parallel region that the associated thread is currently executing, if any.
Cross References
• ompd_thread_handle_t type, see Section 5.3.8 on page 540.
• ompd_parallel_handle_t type, see Section 5.3.8 on page 540.
• ompd_rc_t type, see Section 5.3.12 on page 543.
• ompd_rel_parallel_handle call, see Section 5.5.6.4 on page 574.
5.5.6.2 ompd_get_enclosing_parallel_handle
Summary
The ompd_get_enclosing_parallel_handle function obtains a pointer to the parallel handle for an enclosing parallel region.
Format
C
ompd_rc_t ompd_get_enclosing_parallel_handle(
    ompd_parallel_handle_t *parallel_handle,
    ompd_parallel_handle_t **enclosing_parallel_handle
);
C
Description
The ompd_get_enclosing_parallel_handle function enables a tool to obtain a pointer to the parallel handle for the parallel region that encloses the parallel region that parallel_handle specifies. This call is meaningful only if at least one thread in the parallel region is stopped. A pointer to the parallel handle for the enclosing region is returned in the location to which enclosing_parallel_handle points. After the call, the tool owns the handle; the tool must release the handle with ompd_rel_parallel_handle when it is no longer required.
Description of Arguments
The parallel_handle argument is an opaque handle for a parallel region that selects the parallel region on which to operate. On return, the enclosing_parallel_handle argument is set to a handle for the parallel region that encloses the selected parallel region.
Cross References
• ompd_parallel_handle_t type, see Section 5.3.8 on page 540.
• ompd_rc_t type, see Section 5.3.12 on page 543.
• ompd_rel_parallel_handle call, see Section 5.5.6.4 on page 574.
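The ownership rule above (the tool owns each returned handle and must release it) matters most when a tool walks outward through nested parallel regions. The following sketch shows one way to structure such a walk so that no handle is leaked. It is not built on the real OMPD library: all mock_ and MOCK_ names are hypothetical stand-ins that imitate the shape of ompd_get_enclosing_parallel_handle and ompd_rel_parallel_handle, and the choice of return code for "no enclosing region" is an assumption of the mock.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the OMPD types and calls. */
typedef enum { MOCK_RC_OK = 0, MOCK_RC_UNAVAILABLE = 1 } mock_rc_t;

typedef struct mock_region {          /* a parallel region in the "target" */
    struct mock_region *enclosing;    /* NULL for the outermost region */
} mock_region_t;

typedef struct { mock_region_t *region; } mock_parallel_handle_t;

int live_handles = 0;                 /* handles obtained but not yet released */

/* Same shape as ompd_get_enclosing_parallel_handle: on success the caller
 * owns *enclosing and must release it. */
mock_rc_t mock_get_enclosing_parallel_handle(mock_parallel_handle_t *h,
                                             mock_parallel_handle_t **enclosing)
{
    if (h->region->enclosing == NULL)
        return MOCK_RC_UNAVAILABLE;   /* mock choice: no enclosing region */
    mock_parallel_handle_t *e = malloc(sizeof *e);
    if (e == NULL)
        return MOCK_RC_UNAVAILABLE;
    e->region = h->region->enclosing;
    live_handles++;
    *enclosing = e;
    return MOCK_RC_OK;
}

/* Same shape as ompd_rel_parallel_handle. */
mock_rc_t mock_rel_parallel_handle(mock_parallel_handle_t *h)
{
    free(h);
    live_handles--;
    return MOCK_RC_OK;
}

/* Count the regions that enclose `start`, releasing every handle that the
 * walk obtains. The caller keeps ownership of `start` itself. */
int count_enclosing_regions(mock_parallel_handle_t *start)
{
    assert(start != NULL);
    int depth = 0;
    mock_parallel_handle_t *cur = start;
    mock_parallel_handle_t *next = NULL;
    while (mock_get_enclosing_parallel_handle(cur, &next) == MOCK_RC_OK) {
        if (cur != start)
            mock_rel_parallel_handle(cur);   /* done with the inner handle */
        cur = next;
        depth++;
    }
    if (cur != start)
        mock_rel_parallel_handle(cur);       /* release the outermost handle */
    return depth;
}
```

Releasing each intermediate handle as soon as the next one is obtained keeps at most two walk-owned handles alive at any time.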
5.5.6.3 ompd_get_task_parallel_handle
Summary
The ompd_get_task_parallel_handle function obtains a pointer to the parallel handle for the parallel region that encloses a task region.
Format
C
ompd_rc_t ompd_get_task_parallel_handle(
    ompd_task_handle_t *task_handle,
    ompd_parallel_handle_t **task_parallel_handle
);
C
Description
The ompd_get_task_parallel_handle function enables a tool to obtain a pointer to the parallel handle for the parallel region that encloses the task region that task_handle specifies. This call is meaningful only if at least one thread in the parallel region is stopped. A pointer to the parallel region handle is returned in the location to which task_parallel_handle points. The tool owns that parallel handle, which it must release with ompd_rel_parallel_handle.
Description of Arguments
The task_handle argument is an opaque handle that selects the task on which to operate. On return, the task_parallel_handle argument is set to a handle for the parallel region that encloses the selected task.
Cross References
• ompd_task_handle_t type, see Section 5.3.8 on page 540.
• ompd_parallel_handle_t type, see Section 5.3.8 on page 540.
• ompd_rc_t type, see Section 5.3.12 on page 543.
• ompd_rel_parallel_handle call, see Section 5.5.6.4 on page 574.
5.5.6.4 ompd_rel_parallel_handle
Summary
The ompd_rel_parallel_handle function releases a parallel region handle.
Format
C
ompd_rc_t ompd_rel_parallel_handle(
    ompd_parallel_handle_t *parallel_handle
);
C
Description
Parallel region handles are opaque so tools cannot release them directly. Instead, a tool must pass a parallel region handle to the ompd_rel_parallel_handle function for disposal when finished with it.
Description of Arguments
The parallel_handle argument is an opaque handle to be released.
Cross References
• ompd_parallel_handle_t type, see Section 5.3.8 on page 540.
• ompd_rc_t type, see Section 5.3.12 on page 543.
5.5.6.5 ompd_parallel_handle_compare
Summary
The ompd_parallel_handle_compare function compares two parallel region handles.
Format
C
ompd_rc_t ompd_parallel_handle_compare(
    ompd_parallel_handle_t *parallel_handle_1,
    ompd_parallel_handle_t *parallel_handle_2,
    int *cmp_value
);
C
Description
The internal structure of parallel region handles is opaque to tools. While tools can easily compare pointers to parallel region handles, they cannot determine whether handles at two different addresses refer to the same underlying parallel region and instead must use the ompd_parallel_handle_compare function.
On success, ompd_parallel_handle_compare returns a signed integer value in the location to which cmp_value points that indicates how the underlying parallel regions compare. A value less than, equal to, or greater than 0 indicates that the region corresponding to parallel_handle_1 is, respectively, less than, equal to, or greater than that corresponding to parallel_handle_2. This function is provided since the means by which parallel region handles are ordered is implementation defined.
Description of Arguments
The parallel_handle_1 and parallel_handle_2 arguments are opaque handles that correspond to parallel regions. On return, the cmp_value argument points to a signed integer value that indicates how the underlying parallel regions compare.
Cross References
• ompd_parallel_handle_t type, see Section 5.3.8 on page 540.
• ompd_rc_t type, see Section 5.3.12 on page 543.
5.5.7 Task Handles
5.5.7.1 ompd_get_curr_task_handle
Summary
The ompd_get_curr_task_handle function obtains a pointer to the task handle for the current task region that is associated with an OpenMP thread.
Format
C
ompd_rc_t ompd_get_curr_task_handle(
    ompd_thread_handle_t *thread_handle,
    ompd_task_handle_t **task_handle
);
C
Description
The ompd_get_curr_task_handle function obtains a pointer to the task handle for the current task region that is associated with an OpenMP thread. This call is meaningful only if the thread for which the handle is provided is stopped. The task handle must be released with ompd_rel_task_handle.
Description of Arguments
The thread_handle argument is an opaque handle that selects the thread on which to operate. On return, the task_handle argument points to a location that points to a handle for the task that the thread is currently executing.
Cross References
• ompd_thread_handle_t type, see Section 5.3.8 on page 540.
• ompd_task_handle_t type, see Section 5.3.8 on page 540.
• ompd_rc_t type, see Section 5.3.12 on page 543.
• ompd_rel_task_handle call, see Section 5.5.7.5 on page 580.
5.5.7.2 ompd_get_generating_task_handle
Summary
The ompd_get_generating_task_handle function obtains a pointer to the task handle of the generating task region.
Format
C
ompd_rc_t ompd_get_generating_task_handle(
    ompd_task_handle_t *task_handle,
    ompd_task_handle_t **generating_task_handle
);
C
Description
The ompd_get_generating_task_handle function obtains a pointer to the task handle for the task that encountered the OpenMP task construct that generated the task represented by task_handle. The generating task is the OpenMP task that was active when the task specified by task_handle was created. This call is meaningful only if the thread that is executing the task that task_handle specifies is stopped. The generating task handle must be released with ompd_rel_task_handle.
Description of Arguments
The task_handle argument is an opaque handle that selects the task on which to operate. On return, the generating_task_handle argument points to a location that points to a handle for the generating task.
Cross References
• ompd_task_handle_t type, see Section 5.3.8 on page 540.
• ompd_rc_t type, see Section 5.3.12 on page 543.
• ompd_rel_task_handle call, see Section 5.5.7.5 on page 580.
5.5.7.3 ompd_get_scheduling_task_handle
Summary
The ompd_get_scheduling_task_handle function obtains a task handle for the task that was active at a task scheduling point.
Format
C
ompd_rc_t ompd_get_scheduling_task_handle(
    ompd_task_handle_t *task_handle,
    ompd_task_handle_t **scheduling_task_handle
);
C
Description
The ompd_get_scheduling_task_handle function obtains a task handle for the task that was active when the task that task_handle represents was scheduled. This call is meaningful only if the thread that is executing the task that task_handle specifies is stopped. The scheduling task handle must be released with ompd_rel_task_handle.
Description of Arguments
The task_handle argument is an opaque handle for a task and selects the task on which to operate. On return, the scheduling_task_handle argument points to a location that points to a handle for the task that is still on the stack of execution on the same thread and was deferred in favor of executing the selected task.
Cross References
• ompd_task_handle_t type, see Section 5.3.8 on page 540.
• ompd_rc_t type, see Section 5.3.12 on page 543.
• ompd_rel_task_handle call, see Section 5.5.7.5 on page 580.
5.5.7.4 ompd_get_task_in_parallel
Summary
The ompd_get_task_in_parallel function obtains handles for the implicit tasks that are associated with a parallel region.
Format
C
ompd_rc_t ompd_get_task_in_parallel(
    ompd_parallel_handle_t *parallel_handle,
    int thread_num,
    ompd_task_handle_t **task_handle
);
C
Description
The ompd_get_task_in_parallel function obtains handles for the implicit tasks that are associated with a parallel region. A successful invocation of ompd_get_task_in_parallel returns a pointer to a task handle in the location to which task_handle points. This call yields meaningful results only if all OpenMP threads in the parallel region are stopped.
Description of Arguments
The parallel_handle argument is an opaque handle that selects the parallel region on which to operate. The thread_num argument selects the implicit task of the team that is returned. The selected implicit task would return thread_num from a call of the omp_get_thread_num() routine. On return, the task_handle argument points to a location that points to an opaque handle for the selected implicit task.
Restrictions
The following restriction applies to the ompd_get_task_in_parallel function:
• The value of thread_num must be a non-negative integer that is smaller than the team size, which is the value of the ompd-team-size-var that ompd_get_icv_from_scope returns.
Cross References
• ompd_parallel_handle_t type, see Section 5.3.8 on page 540.
• ompd_task_handle_t type, see Section 5.3.8 on page 540.
• ompd_rc_t type, see Section 5.3.12 on page 543.
• ompd_get_icv_from_scope call, see Section 5.5.9.2 on page 590.
5.5.7.5 ompd_rel_task_handle
Summary
The ompd_rel_task_handle function releases a task handle.
Format
C
ompd_rc_t ompd_rel_task_handle(
    ompd_task_handle_t *task_handle
);
C
Description
Task handles are opaque so tools cannot release them directly. Instead, when a tool is finished with a task handle, it must use the ompd_rel_task_handle function to release it.
Description of Arguments
The task_handle argument is an opaque task handle to be released.
Cross References
• ompd_task_handle_t type, see Section 5.3.8 on page 540.
• ompd_rc_t type, see Section 5.3.12 on page 543.
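Taken together, ompd_get_task_in_parallel and ompd_rel_task_handle suggest a simple loop over the implicit tasks of a team: request the task for each thread_num in the range 0 to team size minus 1, inspect it, and release the handle. The sketch below illustrates that loop under stated assumptions. The mock_ and MOCK_ names are hypothetical stand-ins rather than the OMPD library, and team_size stands for a value that a tool would have read from ompd-team-size-var.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the OMPD types and calls. */
typedef enum { MOCK_RC_OK = 0, MOCK_RC_BAD_INPUT = 1 } mock_rc_t;
typedef struct { int team_size; } mock_parallel_handle_t;
typedef struct { int thread_num; } mock_task_handle_t;

int live_task_handles = 0;        /* handles obtained but not yet released */

/* Same shape as ompd_get_task_in_parallel; the mock rejects a thread_num
 * outside the range [0, team_size). */
mock_rc_t mock_get_task_in_parallel(mock_parallel_handle_t *ph, int thread_num,
                                    mock_task_handle_t **task_handle)
{
    if (thread_num < 0 || thread_num >= ph->team_size)
        return MOCK_RC_BAD_INPUT;
    mock_task_handle_t *t = malloc(sizeof *t);
    if (t == NULL)
        return MOCK_RC_BAD_INPUT;
    t->thread_num = thread_num;
    live_task_handles++;
    *task_handle = t;
    return MOCK_RC_OK;
}

/* Same shape as ompd_rel_task_handle. */
mock_rc_t mock_rel_task_handle(mock_task_handle_t *t)
{
    free(t);
    live_task_handles--;
    return MOCK_RC_OK;
}

/* Visit every implicit task of the team. Returns the number of tasks
 * visited, or -1 as soon as a lookup fails. */
int visit_implicit_tasks(mock_parallel_handle_t *ph, int team_size)
{
    assert(ph != NULL);
    int visited = 0;
    for (int i = 0; i < team_size; i++) {
        mock_task_handle_t *t = NULL;
        if (mock_get_task_in_parallel(ph, i, &t) != MOCK_RC_OK)
            return -1;
        visited++;                 /* a real tool would inspect the task here */
        mock_rel_task_handle(t);   /* release once finished with the handle */
    }
    return visited;
}
```

Passing a team_size larger than the actual team violates the restriction on thread_num; in this mock the lookup then fails and the loop stops without leaking a handle.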
5.5.7.6 ompd_task_handle_compare
Summary
The ompd_task_handle_compare function compares task handles.
Format
C
ompd_rc_t ompd_task_handle_compare(
    ompd_task_handle_t *task_handle_1,
    ompd_task_handle_t *task_handle_2,
    int *cmp_value
);
C
Description
The internal structure of task handles is opaque so tools cannot directly determine if handles at two different addresses refer to the same underlying task. The ompd_task_handle_compare function compares task handles. After a successful call to ompd_task_handle_compare, the value of the location to which cmp_value points is a signed integer that indicates how the underlying tasks compare: a value less than, equal to, or greater than 0 indicates that the task that corresponds to task_handle_1 is, respectively, less than, equal to, or greater than the task that corresponds to task_handle_2. The means by which task handles are ordered is implementation defined.
Description of Arguments
The task_handle_1 and task_handle_2 arguments are opaque handles that correspond to tasks. On return, the cmp_value argument points to a location in which a signed integer value indicates how the underlying tasks compare.
Cross References
• ompd_task_handle_t type, see Section 5.3.8 on page 540.
• ompd_rc_t type, see Section 5.3.12 on page 543.
5.5.7.7 ompd_get_task_function
Summary
The ompd_get_task_function function returns the entry point of the code that corresponds to the body of a task.
Format
C
ompd_rc_t ompd_get_task_function (
    ompd_task_handle_t *task_handle,
    ompd_address_t *entry_point
);
C
Description
The ompd_get_task_function function returns the entry point of the code that corresponds to the body of code that the task executes.
Description of Arguments
The task_handle argument is an opaque handle that selects the task on which to operate. On return, the entry_point argument is set to an address that describes the beginning of application code that executes the task region.
Cross References
• ompd_address_t type, see Section 5.3.4 on page 538.
• ompd_task_handle_t type, see Section 5.3.8 on page 540.
• ompd_rc_t type, see Section 5.3.12 on page 543.
5.5.7.8 ompd_get_task_frame
Summary
The ompd_get_task_frame function extracts the frame pointers of a task.
Format
C
ompd_rc_t ompd_get_task_frame (
    ompd_task_handle_t *task_handle,
    ompd_frame_info_t *exit_frame,
    ompd_frame_info_t *enter_frame
);
C
Description
An OpenMP implementation maintains an ompt_frame_t object for every implicit or explicit task. The ompd_get_task_frame function extracts the enter_frame and exit_frame fields of the ompt_frame_t object of the task that task_handle identifies.
Description of Arguments
The task_handle argument specifies an OpenMP task. On return, the exit_frame argument points to an ompd_frame_info_t object that has the frame information with the same semantics as the exit_frame field in the ompt_frame_t object that is associated with the specified task. On return, the enter_frame argument points to an ompd_frame_info_t object that has the frame information with the same semantics as the enter_frame field in the ompt_frame_t object that is associated with the specified task.
Cross References
• ompt_frame_t type, see Section 4.4.4.27 on page 454.
• ompd_address_t type, see Section 5.3.4 on page 538.
• ompd_frame_info_t type, see Section 5.3.5 on page 538.
• ompd_task_handle_t type, see Section 5.3.8 on page 540.
• ompd_rc_t type, see Section 5.3.12 on page 543.
5.5.7.9 ompd_enumerate_states
Summary
The ompd_enumerate_states function enumerates thread states that an OpenMP implementation supports.
Format
C
ompd_rc_t ompd_enumerate_states (
    ompd_address_space_handle_t *address_space_handle,
    ompd_word_t current_state,
    ompd_word_t *next_state,
    const char **next_state_name,
    ompd_word_t *more_enums
);
C
Description
An OpenMP implementation may support only a subset of the states that the ompt_state_t enumeration type defines. In addition, an OpenMP implementation may support implementation-specific states. The ompd_enumerate_states call enables a tool to enumerate the thread states that an OpenMP implementation supports.
When the current_state argument is a thread state that an OpenMP implementation supports, the call assigns the value and string name of the next thread state in the enumeration to the locations to which the next_state and next_state_name arguments point.
On return, the third-party tool owns the next_state_name string. The OMPD library allocates storage for the string with the memory allocation callback that the tool provides. The tool is responsible for releasing the memory.
On return, the location to which the more_enums argument points has the value 1 whenever one or more states are left in the enumeration. On return, the location to which the more_enums argument points has the value 0 when current_state is the last state in the enumeration.
Description of Arguments
The address_space_handle argument identifies the address space. The current_state argument must be a thread state that the OpenMP implementation supports. To begin enumerating the supported states, a tool should pass ompt_state_undefined as the value of current_state. Subsequent calls to ompd_enumerate_states by the tool should pass the value that the call returned in the next_state argument. On return, the next_state argument points to an integer with the value of the next state in the enumeration. On return, the next_state_name argument points to a character string that describes the next state. On return, the more_enums argument points to an integer with a value of 1 when more states are left to enumerate and a value of 0 when no more states are left.
Constraints on Arguments
Any string that is returned through the next_state_name argument must be immutable and defined for the lifetime of program execution.
Cross References
• ompt_state_t type, see Section 4.4.4.26 on page 452.
• ompd_address_space_handle_t type, see Section 5.3.8 on page 540.
• ompd_rc_t type, see Section 5.3.12 on page 543.
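The enumeration protocol described for ompd_enumerate_states (start from an "undefined" state, feed each returned state into the next call, stop when more_enums is 0) can be sketched as a do/while loop. The code below is an illustration only. All mock_ and MOCK_ names, the three-entry state table and its values are hypothetical. The mock also makes one interpretive assumption: more_enums is 0 exactly when the state just returned is the final one, and the address space handle is omitted for brevity. String ownership is likewise left out because the mock returns static strings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t mock_word_t;      /* stand-in for ompd_word_t */
typedef enum { MOCK_RC_OK = 0, MOCK_RC_BAD_INPUT = 1 } mock_rc_t;

#define MOCK_STATE_UNDEFINED 0    /* plays the role of ompt_state_undefined */

/* A made-up list of supported states, in enumeration order. */
static const struct { mock_word_t value; const char *name; } mock_states[] = {
    { 1, "serial" },
    { 2, "parallel" },
    { 3, "idle" },
};
#define MOCK_NUM_STATES (sizeof mock_states / sizeof mock_states[0])

/* Same shape as ompd_enumerate_states, minus the address space handle. */
mock_rc_t mock_enumerate_states(mock_word_t current, mock_word_t *next,
                                const char **next_name, mock_word_t *more)
{
    assert(next != NULL && next_name != NULL && more != NULL);
    size_t idx = 0;                     /* index of the state to return */
    if (current != MOCK_STATE_UNDEFINED) {
        while (idx < MOCK_NUM_STATES && mock_states[idx].value != current)
            idx++;
        if (idx >= MOCK_NUM_STATES)
            return MOCK_RC_BAD_INPUT;   /* current is not a supported state */
        idx++;                          /* move to the state after current */
        if (idx >= MOCK_NUM_STATES)
            return MOCK_RC_BAD_INPUT;   /* current was already the last state */
    }
    *next = mock_states[idx].value;
    *next_name = mock_states[idx].name;
    *more = (idx + 1 < MOCK_NUM_STATES) ? 1 : 0;
    return MOCK_RC_OK;
}

/* Walk the whole enumeration. Returns the number of states seen, or -1 if
 * a call fails. */
int count_supported_states(void)
{
    mock_word_t current = MOCK_STATE_UNDEFINED, next = 0, more = 0;
    const char *name = NULL;
    int count = 0;
    do {
        if (mock_enumerate_states(current, &next, &name, &more) != MOCK_RC_OK)
            return -1;
        count++;           /* a real tool would record (next, name) here */
        current = next;    /* feed the returned state into the next call */
    } while (more);
    return count;
}
```

A tool that caches the resulting table once per address space can later translate the value reported by ompd_get_state into a readable name without repeating the enumeration.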
5.5.7.10 ompd_get_state
Summary
The ompd_get_state function obtains the state of a thread.
Format
C
ompd_rc_t ompd_get_state (
    ompd_thread_handle_t *thread_handle,
    ompd_word_t *state,
    ompd_wait_id_t *wait_id
);
C
Description
The ompd_get_state function returns the state of an OpenMP thread.
Description of Arguments
The thread_handle argument identifies the thread. The state argument represents the state of that thread as represented by a value that ompd_enumerate_states returns. On return, if the wait_id argument is non-null then it points to a handle that corresponds to the wait_id wait identifier of the thread. If the thread state is not one of the specified wait states, the value to which wait_id points is undefined.
Cross References
• ompd_wait_id_t type, see Section 5.3.2 on page 537.
• ompd_thread_handle_t type, see Section 5.3.8 on page 540.
• ompd_rc_t type, see Section 5.3.12 on page 543.
• ompd_enumerate_states call, see Section 5.5.7.9 on page 583.
5.5.8 Display Control Variables
5.5.8.1 ompd_get_display_control_vars
Summary
The ompd_get_display_control_vars function returns a list of name/value pairs for OpenMP control variables.
Format
C
ompd_rc_t ompd_get_display_control_vars (
    ompd_address_space_handle_t *address_space_handle,
    const char * const **control_vars
);
C
Description
The ompd_get_display_control_vars function returns a NULL-terminated vector of NULL-terminated strings of name/value pairs of control variables that have user controllable settings and are important to the operation or performance of an OpenMP runtime system. The control variables that this interface exposes include all OpenMP environment variables, settings that may come from vendor or platform-specific environment variables, and other settings that affect the operation or functioning of an OpenMP runtime.
The format of the strings is name=a string.
On return, the third-party tool owns the vector and the strings. The OMPD library must satisfy the termination constraints; it may use static or dynamic memory for the vector and/or the strings and is unconstrained in how it arranges them in memory. If it uses dynamic memory then the OMPD library must use the allocate callback that the tool provides to ompd_initialize. The tool must use ompd_rel_display_control_vars() to release the vector and the strings.
Description of Arguments
The address_space_handle argument identifies the address space. On return, the control_vars argument points to the vector of display control variables.
Cross References
• ompd_address_space_handle_t type, see Section 5.3.8 on page 540.
• ompd_rc_t type, see Section 5.3.12 on page 543.
• ompd_initialize call, see Section 5.5.1.1 on page 558.
• ompd_rel_display_control_vars call, see Section 5.5.8.2 on page 587.
5.5.8.2 ompd_rel_display_control_vars
Summary
The ompd_rel_display_control_vars function releases a list of name/value pairs of OpenMP control variables previously acquired with ompd_get_display_control_vars.
Format
C
ompd_rc_t ompd_rel_display_control_vars (
    const char * const **control_vars
);
C
Description
The third-party tool owns the vector and strings that ompd_get_display_control_vars returns. The tool must call ompd_rel_display_control_vars to release the vector and the strings.
Description of Arguments
The control_vars argument is the vector of display control variables to be released.
Cross References
• ompd_rc_t type, see Section 5.3.12 on page 543.
• ompd_get_display_control_vars call, see Section 5.5.8.1 on page 586.
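Between acquiring and releasing the vector, a tool typically scans the NULL-terminated list of name=value strings. The sketch below shows one way to look up a single variable by name using only the standard C library. The helper name find_control_var is hypothetical, and the function takes the vector itself (the value that the control_vars output argument points to) rather than calling into the OMPD library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: look up `name` in a NULL-terminated vector of
 * "name=value" strings. Returns a pointer to the value part inside the
 * vector's own storage, or NULL if the name is absent. */
const char *find_control_var(const char *const *control_vars, const char *name)
{
    assert(control_vars != NULL && name != NULL);
    size_t name_len = strlen(name);
    for (const char *const *p = control_vars; *p != NULL; p++) {
        const char *eq = strchr(*p, '=');
        if (eq == NULL)
            continue;                     /* skip entries without '=' */
        if ((size_t)(eq - *p) == name_len &&
            strncmp(*p, name, name_len) == 0)
            return eq + 1;                /* text after the first '=' */
    }
    return NULL;
}
```

Because the returned pointer aliases a string in the vector, it must not be used after the vector has been passed to ompd_rel_display_control_vars.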
Summary
The ompd_enumerate_icvs function enumerates ICVs. Format
1 5.5.9 Accessing Scope-Specific Information
2 5.5.9.1 ompd_enumerate_icvs
C
ompd_rc_t ompd_enumerate_icvs ( ompd_address_space_handle_t *handle, ompd_icv_id_t current,
ompd_icv_id_t *next_id, const char **next_icv_name, ompd_scope_t *next_scope, int *more
);
C
588
OpenMP API – Version 5.0 November 2018
Description
In addition to the ICVs listed in Table 2.1, an OpenMP implementation must support the OMPD specific ICVs listed in Table 5.2. An OpenMP implementation may support additional implementation specific variables. An implementation may store ICVs in a different scope than Table 2.3 indicates. The ompd_enumerate_icvs function enables a tool to enumerate the ICVs that an OpenMP implementation supports and their related scopes.
When the current argument is set to the identifier of a supported ICV, ompd_enumerate_icvs assigns the value, string name, and scope of the next ICV in the enumeration to the locations to which the next_id, next_icv_name, and next_scope arguments point. On return, the third-party tool owns the next_icv_name string. The OMPD library uses the memory allocation callback that the tool provides to allocate the string storage; the tool is responsible for releasing the memory.
On return, the location to which the more argument points has the value of 1 whenever one or more ICV are left in the enumeration. on return, that location has the value 0 when current is the last ICV in the enumeration.
Description of Arguments
The handle argument identifies the address space. The current argument must be an ICV that the OpenMP implementation supports. To begin enumerating the ICVs, a tool should pass ompd_icv_undefined as the value of current. Subsequent calls to ompd_enumerate_icvs should pass the value returned by the previous call in the next_id output argument. On return, the next_id argument points to an integer with the value of the ID of the next ICV in the enumeration. On return, the next_icv_name argument points to a character string with the name of the next ICV. On return, the next_scope argument points to the scope enum value of the scope of the next ICV. On return, the more argument points to an integer with the value of 1 when more ICVs are left to enumerate and the value of 0 when no more ICVs are left.
Constraints on Arguments
Any string that next_icv_name returns must be immutable and defined for the lifetime of a program execution.
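The enumeration protocol described above can be sketched in C as follows. This is a minimal illustration, not the OMPD library itself: every mock_ name is a hypothetical stand-in for the corresponding OMPD type or function, the address space handle is omitted, the ICV IDs are assumed to be consecutive integers starting at 1, and the names are static strings rather than storage allocated through the tool's callback.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the OMPD types; a real tool would use the
   definitions from the OMPD header of its OpenMP implementation. */
typedef int mock_rc_t;
typedef unsigned long mock_icv_id_t;
typedef int mock_scope_t;
enum { MOCK_RC_OK = 0, MOCK_RC_BAD_INPUT = 1 };
enum { MOCK_ICV_UNDEFINED = 0 };
enum { MOCK_SCOPE_DEVICE, MOCK_SCOPE_TEAM, MOCK_SCOPE_TASK };

typedef struct { const char *name; mock_scope_t scope; } mock_icv_t;

/* Three of the OMPD-specific ICVs from Table 5.2 (IDs 1, 2, 3 by assumption). */
static const mock_icv_t mock_icvs[] = {
    { "ompd-num-procs-var", MOCK_SCOPE_DEVICE },
    { "ompd-team-size-var", MOCK_SCOPE_TEAM },
    { "ompd-final-var",     MOCK_SCOPE_TASK },
};
enum { MOCK_ICV_COUNT = 3 };

/* Mock with the calling pattern of ompd_enumerate_icvs: given the current
   ID, report the next ICV and whether more ICVs follow it. */
static mock_rc_t mock_enumerate_icvs(mock_icv_id_t current,
                                     mock_icv_id_t *next_id,
                                     const char **next_icv_name,
                                     mock_scope_t *next_scope,
                                     int *more)
{
    if (current >= MOCK_ICV_COUNT)
        return MOCK_RC_BAD_INPUT;          /* nothing follows the last ICV */
    *next_id = current + 1;
    *next_icv_name = mock_icvs[current].name;
    *next_scope = mock_icvs[current].scope;
    *more = (current + 1 < MOCK_ICV_COUNT);
    return MOCK_RC_OK;
}

/* Tool-side loop: start from the undefined ID and feed each returned
   next_id back in as current until more becomes 0. */
static int count_icvs(void)
{
    mock_icv_id_t id = MOCK_ICV_UNDEFINED;
    int more = 1;
    int count = 0;
    while (more) {
        const char *name;
        mock_scope_t scope;
        if (mock_enumerate_icvs(id, &id, &name, &scope, &more) != MOCK_RC_OK)
            break;
        printf("ICV %lu: %s (scope %d)\n", id, name, scope);
        ++count;
    }
    return count;
}
```

A tool would typically record each (ID, scope) pair during this loop so that later value queries can be issued with a handle of the matching scope.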
TABLE 5.2: OMPD-specific ICVs
Variable             Scope   Meaning
ompd-num-procs-var   device  return value of omp_get_num_procs() when executed on this device
ompd-thread-num-var  task    return value of omp_get_thread_num() when executed in this task
ompd-final-var       task    return value of omp_in_final() when executed in this task
ompd-implicit-var    task    the task is an implicit task
ompd-team-size-var   team    return value of omp_get_num_threads() when executed in this team
Cross References
• ompd_address_space_handle_t type, see Section 5.3.8 on page 540.
• ompd_scope_t type, see Section 5.3.9 on page 541.
• ompd_icv_id_t type, see Section 5.3.10 on page 542.
• ompd_rc_t type, see Section 5.3.12 on page 543.
5.5.9.2 ompd_get_icv_from_scope
Summary
The ompd_get_icv_from_scope function returns the value of an ICV.
Format
C
ompd_rc_t ompd_get_icv_from_scope(
    void *handle,
    ompd_scope_t scope,
    ompd_icv_id_t icv_id,
    ompd_word_t *icv_value
);
C
Description
The ompd_get_icv_from_scope function provides access to the ICVs that ompd_enumerate_icvs identifies.
Description of Arguments
The handle argument provides an OpenMP scope handle. The scope argument specifies the kind of scope provided in handle. The icv_id argument specifies the ID of the requested ICV. On return, the icv_value argument points to a location with the value of the requested ICV.
Constraints on Arguments
If the ICV cannot be represented by an integer type value then the function returns ompd_rc_incompatible.
The provided handle must match the scope as defined in Section 5.3.10 on page 542.
The provided scope must match the scope for icv_id as requested by ompd_enumerate_icvs.
Cross References
• ompd_address_space_handle_t type, see Section 5.3.8 on page 540.
• ompd_thread_handle_t type, see Section 5.3.8 on page 540.
• ompd_parallel_handle_t type, see Section 5.3.8 on page 540.
• ompd_task_handle_t type, see Section 5.3.8 on page 540.
• ompd_scope_t type, see Section 5.3.9 on page 541.
• ompd_icv_id_t type, see Section 5.3.10 on page 542.
• ompd_rc_t type, see Section 5.3.12 on page 543.
• ompd_enumerate_icvs, see Section 5.5.9.1 on page 588.
5.5.9.3 ompd_get_icv_string_from_scope
Summary
The ompd_get_icv_string_from_scope function returns the value of an ICV.
Format
C
ompd_rc_t ompd_get_icv_string_from_scope(
    void *handle,
    ompd_scope_t scope,
    ompd_icv_id_t icv_id,
    const char **icv_string
);
C
Description
The ompd_get_icv_string_from_scope function provides access to the ICVs that ompd_enumerate_icvs identifies.
Description of Arguments
The handle argument provides an OpenMP scope handle. The scope argument specifies the kind of scope provided in handle. The icv_id argument specifies the ID of the requested ICV. On return, the icv_string argument points to a string representation of the requested ICV.
On return, the third-party tool owns the icv_string string. The OMPD library allocates the string storage with the memory allocation callback that the tool provides. The tool is responsible for releasing the memory.
Constraints on Arguments
The provided handle must match the scope as defined in Section 5.3.10 on page 542.
The provided scope must match the scope for icv_id as requested by ompd_enumerate_icvs.
Cross References
• ompd_address_space_handle_t type, see Section 5.3.8 on page 540.
• ompd_thread_handle_t type, see Section 5.3.8 on page 540.
• ompd_parallel_handle_t type, see Section 5.3.8 on page 540.
• ompd_task_handle_t type, see Section 5.3.8 on page 540.
• ompd_scope_t type, see Section 5.3.9 on page 541.
• ompd_icv_id_t type, see Section 5.3.10 on page 542.
• ompd_rc_t type, see Section 5.3.12 on page 543.
• ompd_enumerate_icvs, see Section 5.5.9.1 on page 588.
5.5.9.4 ompd_get_tool_data
Summary
The ompd_get_tool_data function provides access to the OMPT data variable stored for each OpenMP scope.
Format
C
ompd_rc_t ompd_get_tool_data(
    void *handle,
    ompd_scope_t scope,
    ompd_word_t *value,
    ompd_address_t *ptr
);
C
Description
The ompd_get_tool_data function provides access to the OMPT tool data stored for each scope. If the runtime library does not support OMPT then the function returns ompd_rc_unsupported.
Description of Arguments
The handle argument provides an OpenMP scope handle. The scope argument specifies the kind of scope provided in handle. On return, the value argument points to the value field of the ompt_data_t union stored for the selected scope. On return, the ptr argument points to the ptr field of the ompt_data_t union stored for the selected scope.
Cross References
• ompt_data_t type, see Section 4.4.4.4 on page 440.
• ompd_address_space_handle_t type, see Section 5.3.8 on page 540.
• ompd_thread_handle_t type, see Section 5.3.8 on page 540.
• ompd_parallel_handle_t type, see Section 5.3.8 on page 540.
• ompd_task_handle_t type, see Section 5.3.8 on page 540.
• ompd_scope_t type, see Section 5.3.9 on page 541.
• ompd_rc_t type, see Section 5.3.12 on page 543.
5.6 Runtime Entry Points for OMPD
The OpenMP implementation must define several entry point symbols through which execution must pass when particular events occur and data collection for OMPD is enabled. A tool can enable notification of an event by setting a breakpoint at the address of the entry point symbol.
Entry point symbols have external C linkage and do not require demangling or other transformations to look up their names to obtain the address in the OpenMP program. While each entry point symbol conceptually has a function type signature, it may not be a function. It may be a labeled location.
5.6.1 Beginning Parallel Regions
Summary
Before starting the execution of an OpenMP parallel region, the implementation executes ompd_bp_parallel_begin.
Format
C
void ompd_bp_parallel_begin(void);
C
Description
The OpenMP implementation must execute ompd_bp_parallel_begin at every parallel-begin event. At the point that the implementation reaches ompd_bp_parallel_begin, the binding for ompd_get_curr_parallel_handle is the parallel region that is beginning and the binding for ompd_get_curr_task_handle is the task that encountered the parallel construct.
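To make the entry point mechanism concrete, the following C sketch shows one way a runtime could route execution through such a symbol. It is an illustration under stated assumptions only: all mock_ names are hypothetical, the enable flag stands in for "data collection for OMPD is enabled", and the hit counter exists solely so that the sketch can observe calls without a debugger attached. A real entry point only needs a stable address on which a tool can set a breakpoint.

```c
/* Hypothetical flag standing in for "OMPD data collection is enabled". */
static int mock_ompd_enabled = 0;

/* Hypothetical counter so the sketch can be checked without a debugger. */
static int mock_bp_hits = 0;

/* Stand-in for an entry point such as ompd_bp_parallel_begin: external
   linkage, no arguments, no result. A debugger would break on its address. */
void mock_bp_parallel_begin(void)
{
    ++mock_bp_hits;
}

/* Runtime side: pass through the entry point before the region body runs. */
static void mock_start_parallel_region(void (*body)(void))
{
    if (mock_ompd_enabled)
        mock_bp_parallel_begin();
    body();
}

static void mock_region_body(void)
{
    /* Work of the parallel region would go here. */
}

/* Runs one mock region and reports how often the entry point was reached. */
static int mock_run_region(int ompd_enabled)
{
    mock_ompd_enabled = ompd_enabled;
    mock_bp_hits = 0;
    mock_start_parallel_region(mock_region_body);
    return mock_bp_hits;
}
```

The same pattern would apply to the other entry points in this section, with the call placed at the corresponding begin or end event.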
Cross References
• parallel construct, see Section 2.6 on page 74.
• ompd_get_curr_parallel_handle, see Section 5.5.6.1 on page 571.
• ompd_get_curr_task_handle, see Section 5.5.7.1 on page 576.
5.6.2 Ending Parallel Regions
Summary
After finishing the execution of an OpenMP parallel region, the implementation executes ompd_bp_parallel_end.
Format
C
void ompd_bp_parallel_end(void);
C
Description
The OpenMP implementation must execute ompd_bp_parallel_end at every parallel-end event. At the point that the implementation reaches ompd_bp_parallel_end, the binding for ompd_get_curr_parallel_handle is the parallel region that is ending and the binding for ompd_get_curr_task_handle is the task that encountered the parallel construct. After execution of ompd_bp_parallel_end, any parallel_handle that was acquired for the parallel region is invalid and should be released.
Cross References
• parallel construct, see Section 2.6 on page 74.
• ompd_get_curr_parallel_handle, see Section 5.5.6.1 on page 571.
• ompd_rel_parallel_handle, see Section 5.5.6.4 on page 574.
• ompd_get_curr_task_handle, see Section 5.5.7.1 on page 576.
5.6.3 Beginning Task Regions
Summary
Before starting the execution of an OpenMP task region, the implementation executes ompd_bp_task_begin.
Format
C
void ompd_bp_task_begin(void);
C
Description
The OpenMP implementation must execute ompd_bp_task_begin immediately before starting execution of a structured-block that is associated with a non-merged task. At the point that the implementation reaches ompd_bp_task_begin, the binding for ompd_get_curr_task_handle is the task that is scheduled to execute.
Cross References
• ompd_get_curr_task_handle, see Section 5.5.7.1 on page 576.
5.6.4 Ending Task Regions
Summary
After finishing the execution of an OpenMP task region, the implementation executes ompd_bp_task_end.
Format
C
void ompd_bp_task_end(void);
C
Description
The OpenMP implementation must execute ompd_bp_task_end immediately after completion of a structured-block that is associated with a non-merged task. At the point that the implementation reaches ompd_bp_task_end, the binding for ompd_get_curr_task_handle is the task that finished execution. After execution of ompd_bp_task_end, any task_handle that was acquired for the task region is invalid and should be released.
Cross References
• ompd_get_curr_task_handle, see Section 5.5.7.1 on page 576.
• ompd_rel_task_handle, see Section 5.5.7.5 on page 580.
5.6.5 Beginning OpenMP Threads
Summary
When starting an OpenMP thread, the implementation executes ompd_bp_thread_begin.
Format
C
void ompd_bp_thread_begin(void);
C
Description
The OpenMP implementation must execute ompd_bp_thread_begin at every native-thread-begin and initial-thread-begin event. This execution occurs before the thread starts the execution of any OpenMP region.
Cross References
• parallel construct, see Section 2.6 on page 74.
• Initial task, see Section 2.10.5 on page 148.
5.6.6 Ending OpenMP Threads
Summary
When terminating an OpenMP thread, the implementation executes ompd_bp_thread_end.
Format
C
void ompd_bp_thread_end(void);
C
Description
The OpenMP implementation must execute ompd_bp_thread_end at every native-thread-end and initial-thread-end event. This execution occurs after the thread completes the execution of all OpenMP regions. After executing ompd_bp_thread_end, any thread_handle that was acquired for this thread is invalid and should be released.
Cross References
• parallel construct, see Section 2.6 on page 74.
• Initial task, see Section 2.10.5 on page 148.
• ompd_rel_thread_handle, see Section 5.5.5.3 on page 569.
5.6.7 Initializing OpenMP Devices
Summary
The OpenMP implementation must execute ompd_bp_device_begin at every device-initialize event.
Format
C
void ompd_bp_device_begin(void);
C
Description
When initializing a device for execution of a target region, the implementation must execute ompd_bp_device_begin. This execution occurs before the work associated with any OpenMP region executes on the device.
Cross References
• Device Initialization, see Section 2.12.1 on page 160.
5.6.8 Finalizing OpenMP Devices
Summary
When finalizing an OpenMP device, the implementation executes ompd_bp_device_end.
Format
C
void ompd_bp_device_end(void);
C
Description
The OpenMP implementation must execute ompd_bp_device_end at every device-finalize event. This execution occurs after the thread executes all OpenMP regions. After execution of ompd_bp_device_end, any address_space_handle that was acquired for this device is invalid and should be released.
Cross References
• Device Initialization, see Section 2.12.1 on page 160.
• ompd_rel_address_space_handle, see Section 5.5.2.3 on page 564.
CHAPTER 6
Environment Variables
This chapter describes the OpenMP environment variables that specify the settings of the ICVs that affect the execution of OpenMP programs (see Section 2.5 on page 63). The names of the environment variables must be upper case. The values assigned to the environment variables are case insensitive and may have leading and trailing white space. Modifications to the environment variables after the program has started, even if modified by the program itself, are ignored by the OpenMP implementation. However, the settings of some of the ICVs can be modified during the execution of the OpenMP program by the use of the appropriate directive clauses or OpenMP API routines.
The following examples demonstrate how the OpenMP environment variables can be set in different environments:
• csh-like shells:
setenv OMP_SCHEDULE "dynamic"
• bash-like shells:
export OMP_SCHEDULE="dynamic"
• Windows Command Line:
set OMP_SCHEDULE=dynamic
6.1 OMP_SCHEDULE
The OMP_SCHEDULE environment variable controls the schedule kind and chunk size of all loop directives that have the schedule kind runtime, by setting the value of the run-sched-var ICV. The value of this environment variable takes the form:
[modifier:]kind[, chunk]
where
• modifier is one of monotonic or nonmonotonic;
• kind is one of static, dynamic, guided, or auto;
• chunk is an optional positive integer that specifies the chunk size.
If the modifier is not present, the modifier is set to monotonic if kind is static; for any other kind it is set to nonmonotonic.
If chunk is present, white space may be on either side of the “,”. See Section 2.9.2 on page 101 for a detailed description of the schedule kinds.
The behavior of the program is implementation defined if the value of OMP_SCHEDULE does not conform to the above format.
Implementation specific schedules cannot be specified in OMP_SCHEDULE. They can only be specified by calling omp_set_schedule, described in Section 3.2.12 on page 345.
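As an illustration of the value format described above, the following C sketch parses an OMP_SCHEDULE string and applies the default modifier rule. It is a minimal sketch, not how any particular runtime does it: the type and function names are hypothetical, a chunk of 0 is used here to mean "no chunk given", and rejecting a non-conforming value is only one possible implementation-defined behavior.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

typedef enum { SCHED_STATIC, SCHED_DYNAMIC, SCHED_GUIDED, SCHED_AUTO } sched_kind_t;

typedef struct {
    int monotonic;      /* 1 = monotonic, 0 = nonmonotonic */
    sched_kind_t kind;
    long chunk;         /* 0 means no chunk size was given (sketch convention) */
} sched_t;

/* Case-insensitive match of the n characters at s against a lowercase word. */
static int word_eq(const char *s, size_t n, const char *word)
{
    if (strlen(word) != n)
        return 0;
    for (size_t i = 0; i < n; ++i)
        if (tolower((unsigned char)s[i]) != word[i])
            return 0;
    return 1;
}

static const char *skip_ws(const char *s)
{
    while (isspace((unsigned char)*s))
        ++s;
    return s;
}

/* Parses [modifier:]kind[, chunk]; returns 0 on success, -1 otherwise. */
static int parse_omp_schedule(const char *value, sched_t *out)
{
    static const char *kinds[] = { "static", "dynamic", "guided", "auto" };
    const char *p = skip_ws(value);
    int have_modifier = 0;
    size_t n = strcspn(p, ":, \t");

    if (p[n] == ':') {                       /* optional modifier */
        if (word_eq(p, n, "monotonic"))
            out->monotonic = 1;
        else if (word_eq(p, n, "nonmonotonic"))
            out->monotonic = 0;
        else
            return -1;
        have_modifier = 1;
        p += n + 1;
        n = strcspn(p, ", \t");
    }

    int k;
    for (k = 0; k < 4; ++k)                  /* array order matches the enum */
        if (word_eq(p, n, kinds[k]))
            break;
    if (k == 4)
        return -1;
    out->kind = (sched_kind_t)k;

    /* Default: monotonic for static, nonmonotonic for any other kind. */
    if (!have_modifier)
        out->monotonic = (out->kind == SCHED_STATIC);

    out->chunk = 0;
    p = skip_ws(p + n);
    if (*p == ',') {                         /* optional chunk size */
        char *end;
        out->chunk = strtol(p + 1, &end, 10);
        if (end == p + 1 || out->chunk <= 0)
            return -1;
        p = skip_ws(end);
    }
    return *p == '\0' ? 0 : -1;
}
```

For example, parsing "guided,4" with this sketch yields the guided kind, a chunk size of 4, and the nonmonotonic modifier, while "static" alone yields the monotonic modifier.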
Examples:
setenv OMP_SCHEDULE "guided,4"
setenv OMP_SCHEDULE "dynamic"
setenv OMP_SCHEDULE "nonmonotonic:dynamic,4"
Cross References
• run-sched-var ICV, see Section 2.5 on page 63.
• Worksharing-Loop construct, see Section 2.9.2 on page 101.
• Parallel worksharing-loop construct, see Section 2.13.1 on page 185.
• omp_set_schedule routine, see Section 3.2.12 on page 345.
• omp_get_schedule routine, see Section 3.2.13 on page 347.
6.2 OMP_NUM_THREADS
The OMP_NUM_THREADS environment variable sets the number of threads to use for parallel regions by setting the initial value of the nthreads-var ICV. See Section 2.5 on page 63 for a comprehensive set of rules about the interaction between the OMP_NUM_THREADS environment variable, the num_threads clause, the omp_set_num_threads library routine and dynamic adjustment of threads, and Section 2.6.1 on page 78 for a complete algorithm that describes how the number of threads for a parallel region is determined.
The value of this environment variable must be a list of positive integer values. The values of the list set the number of threads to use for parallel regions at the corresponding nested levels.
The behavior of the program is implementation defined if any value of the list specified in the OMP_NUM_THREADS environment variable leads to a number of threads that is greater than an implementation can support, or if any value is not a positive integer.
Example:
setenv OMP_NUM_THREADS 4,3,2
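The list form of OMP_NUM_THREADS can be illustrated with a short C sketch that splits the value into one thread count per nesting level. The function name is hypothetical, and returning an error for a non-positive or malformed entry is just one choice, since the behavior in that case is implementation defined.

```c
#include <stdlib.h>

/* Parses a comma-separated list of positive integers into levels[],
   where levels[0] applies to the outermost parallel region.
   Returns the number of entries parsed, or -1 for a malformed list. */
static int parse_num_threads_list(const char *value, long *levels, int max_levels)
{
    int n = 0;
    const char *p = value;

    for (;;) {
        char *end;
        long v = strtol(p, &end, 10);    /* skips leading white space */
        if (end == p || v <= 0 || n == max_levels)
            return -1;
        levels[n++] = v;
        while (*end == ' ' || *end == '\t')
            ++end;
        if (*end == '\0')
            return n;
        if (*end != ',')
            return -1;
        p = end + 1;
    }
}
```

With the value 4,3,2 from the example above, the sketch produces three entries: 4 threads for the outermost level, 3 for the next nested level, and 2 for the level below that.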
Cross References
• nthreads-var ICV, see Section 2.5 on page 63.
• num_threads clause, see Section 2.6 on page 74.
• omp_set_num_threads routine, see Section 3.2.1 on page 334.
• omp_get_num_threads routine, see Section 3.2.2 on page 335.
• omp_get_max_threads routine, see Section 3.2.3 on page 336.
• omp_get_team_size routine, see Section 3.2.20 on page 354.
6.3 OMP_DYNAMIC
The OMP_DYNAMIC environment variable controls dynamic adjustment of the number of threads to use for executing parallel regions by setting the initial value of the dyn-var ICV. The value of this environment variable must be one of the following:
true | false
If the environment variable is set to true, the OpenMP implementation may adjust the number of threads to use for executing parallel regions in order to optimize the use of system resources. If the environment variable is set to false, the dynamic adjustment of the number of threads is disabled. The behavior of the program is implementation defined if the value of OMP_DYNAMIC is neither true nor false.
Example:
setenv OMP_DYNAMIC true
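Several variables in this chapter take a true | false value. Because values are case insensitive and may carry leading and trailing white space, a reader of such a variable could be sketched in C as below. The function name is hypothetical, and returning -1 for any other value merely marks the case whose behavior is implementation defined.

```c
#include <ctype.h>

/* Returns 1 for "true", 0 for "false" (any letter case, optional
   surrounding white space), and -1 for every other value. */
static int parse_omp_bool(const char *value)
{
    static const char *words[] = { "false", "true" };

    while (isspace((unsigned char)*value))
        ++value;
    for (int w = 0; w < 2; ++w) {
        const char *p = value;
        const char *q = words[w];
        while (*q != '\0' && tolower((unsigned char)*p) == *q) {
            ++p;
            ++q;
        }
        if (*q == '\0') {                 /* whole word matched */
            while (isspace((unsigned char)*p))
                ++p;
            if (*p == '\0')
                return w;                 /* index doubles as the result */
        }
    }
    return -1;
}
```

A runtime could apply the same helper to the result of getenv("OMP_DYNAMIC") once at startup, since later modifications of the environment are ignored.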
Cross References
• dyn-var ICV, see Section 2.5 on page 63.
• omp_set_dynamic routine, see Section 3.2.7 on page 340.
• omp_get_dynamic routine, see Section 3.2.8 on page 341.
6.4 OMP_PROC_BIND
The OMP_PROC_BIND environment variable sets the initial value of the bind-var ICV. The value of this environment variable is either true, false, or a comma separated list of master, close, or spread. The values of the list set the thread affinity policy to be used for parallel regions at the corresponding nested level.
If the environment variable is set to false, the execution environment may move OpenMP threads between OpenMP places, thread affinity is disabled, and proc_bind clauses on parallel constructs are ignored.
Otherwise, the execution environment should not move OpenMP threads between OpenMP places, thread affinity is enabled, and the initial thread is bound to the first place in the OpenMP place list prior to the first active parallel region.
The behavior of the program is implementation defined if the value in the OMP_PROC_BIND environment variable is not true, false, or a comma separated list of master, close, or spread. The behavior is also implementation defined if an initial thread cannot be bound to the first place in the OpenMP place list.
Examples:
setenv OMP_PROC_BIND false
setenv OMP_PROC_BIND "spread, spread, close"
Cross References
• bind-var ICV, see Section 2.5 on page 63.
• proc_bind clause, see Section 2.6.2 on page 80.
• omp_get_proc_bind routine, see Section 3.2.23 on page 357.
6.5 OMP_PLACES
A list of places can be specified in the OMP_PLACES environment variable. The place-partition-var ICV obtains its initial value from the OMP_PLACES value, and makes the list available to the execution environment. The value of OMP_PLACES can be one of two types of values: either an abstract name that describes a set of places or an explicit list of places described by non-negative numbers.
The OMP_PLACES environment variable can be defined using an explicit ordered list of comma-separated places. A place is defined by an unordered set of comma-separated non-negative numbers enclosed by braces. The meaning of the numbers and how the numbering is done are implementation defined. Generally, the numbers represent the smallest unit of execution exposed by the execution environment, typically a hardware thread.
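Intervals may also be used to define places. Intervals can be specified using the start:length:stride notation that appears in the grammar for OMP_PLACES values (for example, {0:4}:4:4); when the stride is omitted, a unit stride is assumed.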
An exclusion operator “!” can also be used to exclude the number or place immediately following the operator.
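The interval notation can be illustrated with a small C sketch that expands a place interval such as {0:4}:4:4, which appears among the examples in this section. The function name and the fixed MAX_RES bound are hypothetical, the resources inside each place are assumed to use unit stride, and exclusion with "!" is not handled.

```c
enum { MAX_RES = 16 };  /* hypothetical upper bound on resources per place */

/* Expands {lower:res_len}:num_places:stride into explicit places.
   Place p holds the res_len consecutive numbers starting at
   lower + p * stride. */
static void expand_place_interval(int lower, int res_len,
                                  int num_places, int stride,
                                  int places[][MAX_RES])
{
    for (int p = 0; p < num_places; ++p)
        for (int r = 0; r < res_len; ++r)
            places[p][r] = lower + p * stride + r;
}
```

Calling the sketch with lower 0, res_len 4, num_places 4, and stride 4 yields four places that hold the numbers 0 to 3, 4 to 7, 8 to 11, and 12 to 15.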
Alternatively, the abstract names listed in Table 6.1 should be understood by the execution and runtime environment. The precise definitions of the abstract names are implementation defined. An implementation may also add abstract names as appropriate for the target platform.
TABLE 6.1: Defined Abstract Names for OMP_PLACES
Abstract Name  Meaning
threads        Each place corresponds to a single hardware thread on the target machine.
cores          Each place corresponds to a single core (having one or more hardware threads) on the target machine.
sockets        Each place corresponds to a single socket (consisting of one or more cores) on the target machine.
The abstract name may be appended by a positive number in parentheses to denote the length of the place list to be created, that is abstract_name(num-places). When requesting fewer places than available on the system, the determination of which resources of type abstract_name are to be included in the place list is implementation defined. When requesting more resources than available, the length of the place list is implementation defined.
The behavior of the program is implementation defined when the execution environment cannot map a numerical value (either explicitly defined or implicitly derived from an interval) within the OMP_PLACES list to a processor on the target platform, or if it maps to an unavailable processor. The behavior is also implementation defined when the OMP_PLACES environment variable is defined using an abstract name.
The following grammar describes the values accepted for the OMP_PLACES environment variable.
⟨list⟩ |= ⟨p-list⟩ | ⟨aname⟩
⟨p-list⟩ |= ⟨p-interval⟩ | ⟨p-list⟩,⟨p-interval⟩
⟨p-interval⟩ |= ⟨place⟩:⟨len⟩:⟨stride⟩ | ⟨place⟩:⟨len⟩ | ⟨place⟩ | !⟨place⟩
⟨place⟩ |= {⟨res-list⟩}
⟨res-list⟩ |= ⟨res-interval⟩ | ⟨res-list⟩,⟨res-interval⟩
⟨res-interval⟩ |= ⟨res⟩:⟨num-places⟩:⟨stride⟩ | ⟨res⟩:⟨num-places⟩ | ⟨res⟩ | !⟨res⟩
⟨aname⟩ |= ⟨word⟩(⟨num-places⟩) | ⟨word⟩
⟨word⟩ |= sockets | cores | threads | implementation-defined abstract name
⟨res⟩ |= non-negative integer
⟨num-places⟩ |= positive integer
⟨stride⟩ |= integer
⟨len⟩ |= positive integer
Examples:
setenv OMP_PLACES threads
setenv OMP_PLACES "threads(4)"
setenv OMP_PLACES "{0,1,2,3},{4,5,6,7},{8,9,10,11},{12,13,14,15}"
setenv OMP_PLACES "{0:4},{4:4},{8:4},{12:4}"
setenv OMP_PLACES "{0:4}:4:4"
where each of the last three definitions corresponds to the same 4 places including the smallest units of execution exposed by the execution environment numbered, in turn, 0 to 3, 4 to 7, 8 to 11, and 12 to 15.
Cross References
• place-partition-var, see Section 2.5 on page 63.
• Controlling OpenMP thread affinity, see Section 2.6.2 on page 80.
• omp_get_num_places routine, see Section 3.2.24 on page 358.
• omp_get_place_num_procs routine, see Section 3.2.25 on page 359.
• omp_get_place_proc_ids routine, see Section 3.2.26 on page 360.
• omp_get_place_num routine, see Section 3.2.27 on page 362.
• omp_get_partition_num_places routine, see Section 3.2.28 on page 362.
• omp_get_partition_place_nums routine, see Section 3.2.29 on page 363.
6.6 OMP_STACKSIZE
The OMP_STACKSIZE environment variable controls the size of the stack for threads created by the OpenMP implementation, by setting the value of the stacksize-var ICV. The environment variable does not control the size of the stack for an initial thread.
The value of this environment variable takes the form:
size | sizeB | sizeK | sizeM | sizeG
where:
• size is a positive integer that specifies the size of the stack for threads that are created by the OpenMP implementation.
• B, K, M, and G are letters that specify whether the given size is in Bytes, Kilobytes (1024 Bytes), Megabytes (1024 Kilobytes), or Gigabytes (1024 Megabytes), respectively. If one of these letters is present, there may be white space between size and the letter.
If only size is specified and none of B, K, M, or G is specified, then size is assumed to be in Kilobytes.
The behavior of the program is implementation defined if OMP_STACKSIZE does not conform to the above format, or if the implementation cannot provide a stack with the requested size.
Examples:
setenv OMP_STACKSIZE 2000500B
setenv OMP_STACKSIZE "3000 k "
setenv OMP_STACKSIZE 10M
setenv OMP_STACKSIZE " 10 M "
setenv OMP_STACKSIZE "20 m "
setenv OMP_STACKSIZE " 1G"
setenv OMP_STACKSIZE 20000
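The unit rules for OMP_STACKSIZE can be sketched in C as a conversion of the value to a byte count. The function name is hypothetical, a result of 0 is used here to flag a non-conforming value (whose handling is implementation defined), and overflow of very large sizes is not checked in this sketch.

```c
#include <ctype.h>
#include <stdlib.h>

/* Converts an OMP_STACKSIZE value to bytes; returns 0 if the value does
   not match size | sizeB | sizeK | sizeM | sizeG. */
static unsigned long long parse_stacksize(const char *value)
{
    char *end;
    unsigned long long unit = 1024ULL;     /* no letter means Kilobytes */

    while (isspace((unsigned char)*value))
        ++value;
    if (!isdigit((unsigned char)*value))
        return 0;
    unsigned long long size = strtoull(value, &end, 10);
    if (size == 0)
        return 0;                          /* size must be positive */

    while (isspace((unsigned char)*end))   /* white space before the letter */
        ++end;
    if (*end != '\0') {
        switch (toupper((unsigned char)*end)) {
        case 'B': unit = 1ULL; break;
        case 'K': unit = 1024ULL; break;
        case 'M': unit = 1024ULL * 1024ULL; break;
        case 'G': unit = 1024ULL * 1024ULL * 1024ULL; break;
        default:  return 0;
        }
        ++end;
    }
    while (isspace((unsigned char)*end))   /* trailing white space */
        ++end;
    return *end == '\0' ? size * unit : 0;
}
```

Applied to the examples above, the sketch maps 2000500B to 2000500 bytes, "3000 k " to 3072000 bytes, and a bare 20000 to 20480000 bytes.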
Cross References
• stacksize-var ICV, see Section 2.5 on page 63.
6.7 OMP_WAIT_POLICY
The OMP_WAIT_POLICY environment variable provides a hint to an OpenMP implementation about the desired behavior of waiting threads by setting the wait-policy-var ICV. A compliant OpenMP implementation may or may not abide by the setting of the environment variable.
The value of this environment variable must be one of the following:
ACTIVE | PASSIVE
The ACTIVE value specifies that waiting threads should mostly be active, consuming processor cycles, while waiting. An OpenMP implementation may, for example, make waiting threads spin.
The PASSIVE value specifies that waiting threads should mostly be passive, not consuming processor cycles, while waiting. For example, an OpenMP implementation may make waiting threads yield the processor to other threads or go to sleep.
The details of the ACTIVE and PASSIVE behaviors are implementation defined.
The behavior of the program is implementation defined if the value of OMP_WAIT_POLICY is neither ACTIVE nor PASSIVE.
Examples:
setenv OMP_WAIT_POLICY ACTIVE
setenv OMP_WAIT_POLICY active
setenv OMP_WAIT_POLICY PASSIVE
setenv OMP_WAIT_POLICY passive
Cross References
• wait-policy-var ICV, see Section 2.5 on page 63.
6.8 OMP_MAX_ACTIVE_LEVELS
The OMP_MAX_ACTIVE_LEVELS environment variable controls the maximum number of nested active parallel regions by setting the initial value of the max-active-levels-var ICV.
The value of this environment variable must be a non-negative integer. The behavior of the program is implementation defined if the requested value of OMP_MAX_ACTIVE_LEVELS is greater than the maximum number of nested active parallel levels an implementation can support, or if the value is not a non-negative integer.
Cross References
• max-active-levels-var ICV, see Section 2.5 on page 63.
• omp_set_max_active_levels routine, see Section 3.2.16 on page 350.
• omp_get_max_active_levels routine, see Section 3.2.17 on page 351.
6.9 OMP_NESTED
The OMP_NESTED environment variable controls nested parallelism by setting the initial value of the max-active-levels-var ICV. If the environment variable is set to true, the initial value of max-active-levels-var is set to the number of active levels of parallelism supported by the implementation. If the environment variable is set to false, the initial value of max-active-levels-var is set to 1. The behavior of the program is implementation defined if the value of OMP_NESTED is neither true nor false.
If both the OMP_NESTED and OMP_MAX_ACTIVE_LEVELS environment variables are set, the value of OMP_NESTED is false, and the value of OMP_MAX_ACTIVE_LEVELS is greater than 1, the behavior is implementation defined. Otherwise, if both environment variables are set then the OMP_NESTED environment variable has no effect.
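The interaction between OMP_NESTED and OMP_MAX_ACTIVE_LEVELS described above can be summarized as a small decision function. The C sketch below is illustrative only: all names are hypothetical, a negative max_active_levels argument stands for "OMP_MAX_ACTIVE_LEVELS is not set", default_levels stands for whatever the implementation uses when neither variable is set, and in the implementation-defined case the sketch arbitrarily keeps the OMP_MAX_ACTIVE_LEVELS value while raising a flag.

```c
typedef enum { NESTED_UNSET, NESTED_FALSE, NESTED_TRUE } nested_t;

/* Computes an initial value for max-active-levels-var.
   supported:      active levels the implementation supports.
   default_levels: value used when neither variable is set (assumption).
   *impl_defined:  set to 1 when the specification leaves the behavior
                   implementation defined. */
static int initial_max_active_levels(nested_t nested, int max_active_levels,
                                     int supported, int default_levels,
                                     int *impl_defined)
{
    *impl_defined = 0;
    if (max_active_levels >= 0) {
        /* OMP_MAX_ACTIVE_LEVELS is set, so OMP_NESTED has no effect,
           except for the one conflicting combination. */
        if (nested == NESTED_FALSE && max_active_levels > 1)
            *impl_defined = 1;
        return max_active_levels;          /* arbitrary choice when flagged */
    }
    if (nested == NESTED_TRUE)
        return supported;
    if (nested == NESTED_FALSE)
        return 1;
    return default_levels;
}
```

For instance, with OMP_NESTED set to true and OMP_MAX_ACTIVE_LEVELS unset, the sketch returns the number of supported levels, and with both set and OMP_NESTED true it returns the OMP_MAX_ACTIVE_LEVELS value unchanged.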
The OMP_NESTED environment variable has been deprecated.
Example:
setenv OMP_NESTED false
Cross References
• max-active-levels-var ICV, see Section 2.5 on page 63.
• omp_set_nested routine, see Section 3.2.10 on page 343.
• omp_get_team_size routine, see Section 3.2.20 on page 354.
• OMP_MAX_ACTIVE_LEVELS environment variable, see Section 6.8 on page 608.
6.10 OMP_THREAD_LIMIT
The OMP_THREAD_LIMIT environment variable sets the maximum number of OpenMP threads to use in a contention group by setting the thread-limit-var ICV.
The value of this environment variable must be a positive integer. The behavior of the program is implementation defined if the requested value of OMP_THREAD_LIMIT is greater than the number of threads an implementation can support, or if the value is not a positive integer.
Cross References
• thread-limit-var ICV, see Section 2.5 on page 63.
• omp_get_thread_limit routine, see Section 3.2.14 on page 348.
6.11 OMP_CANCELLATION
The OMP_CANCELLATION environment variable sets the initial value of the cancel-var ICV. The value of this environment variable must be one of the following:
true | false
If set to true, the effects of the cancel construct and of cancellation points are enabled and cancellation is activated. If set to false, cancellation is disabled and the cancel construct and cancellation points are effectively ignored. The behavior of the program is implementation defined if OMP_CANCELLATION is set to neither true nor false.
Cross References
• cancel-var, see Section 2.5.1 on page 64.
• cancel construct, see Section 2.18.1 on page 263.
• cancellation point construct, see Section 2.18.2 on page 267.
• omp_get_cancellation routine, see Section 3.2.9 on page 342.
6.12 OMP_DISPLAY_ENV
The OMP_DISPLAY_ENV environment variable instructs the runtime to display the OpenMP version number and the value of the ICVs associated with the environment variables described in Chapter 6, as name = value pairs. The runtime displays this information once, after processing the environment variables and before any user calls to change the ICV values by runtime routines defined in Chapter 3.
The value of the OMP_DISPLAY_ENV environment variable may be set to one of these values:
TRUE | FALSE | VERBOSE
The TRUE value instructs the runtime to display the OpenMP version number defined by the _OPENMP version macro (or the openmp_version Fortran parameter) value and the initial ICV values for the environment variables listed in Chapter 6. The VERBOSE value indicates that the runtime may also display the values of runtime variables that may be modified by vendor-specific environment variables. The runtime does not display any information when the OMP_DISPLAY_ENV environment variable is FALSE or undefined. For all values of the environment variable other than TRUE, FALSE, and VERBOSE, the displayed information is unspecified.
The display begins with “OPENMP DISPLAY ENVIRONMENT BEGIN”, followed by the _OPENMP version macro (or the openmp_version Fortran parameter) value and ICV values, in the format NAME '=' VALUE. NAME corresponds to the macro or environment variable name, optionally prepended by a bracketed device-type. VALUE corresponds to the value of the macro or ICV associated with this environment variable. Values are enclosed in single quotes. The display is terminated with “OPENMP DISPLAY ENVIRONMENT END”.
For the OMP_NESTED environment variable, the printed value is true if the max-active-levels-var ICV is initialized to a value greater than 1; otherwise the printed value is false.
Example:
% setenv OMP_DISPLAY_ENV TRUE
The above example causes an OpenMP implementation to generate output of the following form:
OPENMP DISPLAY ENVIRONMENT BEGIN
_OPENMP='201811'
[host] OMP_SCHEDULE='GUIDED,4'
[host] OMP_NUM_THREADS='4,3,2'
[device] OMP_NUM_THREADS='2'
[host,device] OMP_DYNAMIC='TRUE'
[host] OMP_PLACES='{0:4},{4:4},{8:4},{12:4}'
…
OPENMP DISPLAY ENVIRONMENT END
CHAPTER6. ENVIRONMENTVARIABLES 611
2 3 4 5 6 7 8 9
10 11
12
13 14
15 16 17
18 19
20 21 22
23 24
25
26
27
28
29
30
31
6.13 OMP_DISPLAY_AFFINITY
The OMP_DISPLAY_AFFINITY environment variable instructs the runtime to display formatted affinity information for all OpenMP threads in the parallel region upon entering the first parallel region and when any change occurs in the information accessible by the format specifiers listed in Table 6.2. If affinity of any thread in a parallel region changes then thread affinity information for all threads in that region is displayed. If the thread affinity for each respective parallel region at each nesting level has already been displayed and the thread affinity has not changed, then the information is not displayed again. There is no specific order in displaying thread affinity information for all threads in the same parallel region.
The value of the OMP_DISPLAY_AFFINITY environment variable may be set to one of these values:
TRUE | FALSE
The TRUE value instructs the runtime to display the OpenMP thread affinity information, and uses
the format setting defined in the affinity-format-var ICV.
The runtime does not display the OpenMP thread affinity information when the value of the OMP_DISPLAY_AFFINITY environment variable is FALSE or undefined. For all values of the environment variable other than TRUE or FALSE, the display action is implementation defined.
Example:
setenv OMP_DISPLAY_AFFINITY TRUE
The above example causes an OpenMP implementation to display OpenMP thread affinity information during execution of the program, in a format given by the affinity-format-var ICV. The following is a sample output:
nesting_level= 1, thread_num= 0, thread_affinity= 0,1
nesting_level= 1, thread_num= 1, thread_affinity= 2,3
Cross References
• Controlling OpenMP thread affinity, see Section 2.6.2 on page 80.
• omp_set_affinity_format routine, see Section 3.2.30 on page 364.
• omp_get_affinity_format routine, see Section 3.2.31 on page 366.
• omp_display_affinity routine, see Section 3.2.32 on page 367.
• omp_capture_affinity routine, see Section 3.2.33 on page 368.
• OMP_AFFINITY_FORMAT environment variable, see Section 6.14 on page 613.
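As a rough illustration of where the display described in Section 6.13 would be triggered, the following sketch contains a single parallel region. The function name count_team_threads is hypothetical. When the program is built with OpenMP support and run with OMP_DISPLAY_AFFINITY set to TRUE, the runtime would be expected to print affinity information on entry to this region, assuming it is the first parallel region encountered. The _OPENMP guards let the same source build without OpenMP, in which case the region runs on one thread and nothing is displayed.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical function: runs one parallel region and returns the number
 * of threads in its team (1 when compiled without OpenMP support). */
int count_team_threads(void)
{
    int threads = 1;
    #pragma omp parallel
    {
        /* One thread records the team size; the others wait at the
         * implicit barrier at the end of the single construct. */
        #pragma omp single
        {
#ifdef _OPENMP
            threads = omp_get_num_threads();
#endif
        }
    }
    assert(threads >= 1);
    return threads;
}
```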
6.14 OMP_AFFINITY_FORMAT
The OMP_AFFINITY_FORMAT environment variable sets the initial value of the
affinity-format-var ICV which defines the format when displaying OpenMP thread affinity
information.
The value of this environment variable is a character string that may contain as substrings one or
more field specifiers, in addition to other characters. The format of each field specifier is
%[[[0].] size ] type
where an individual field specifier must contain the percent symbol (%) and a type. The type can be
a single character short name or its corresponding long name delimited with curly braces, such as
%n or %{thread_num}. A literal percent is specified as %%. Field specifiers can be provided in
any order.
The 0 modifier indicates whether or not to add leading zeros to the output, following any indication
of sign or base. The . modifier indicates the output should be right justified when size is specified.
By default, output is left justified. The minimum field length is size, which is a decimal digit string
with a non-zero first digit. If no size is specified, the actual length needed to print the field will be
used. If the 0 modifier is used with type of A, {thread_affinity}, H, {host}, or a type that
is not printed as a number, the result is unspecified. Any other characters in the format string that
are not part of a field specifier will be included literally in the output.
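The field specifier grammar %[[[0].] size ] type can be read as a small parser. The sketch below is one possible reading, not the behavior of any particular runtime: the field_spec structure and the parse_field_spec function are hypothetical names, the sketch treats the 0 and . modifiers as valid only when a size follows, and it does not handle the literal %% sequence.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical representation of one parsed field specifier. */
typedef struct {
    int zero_pad;       /* 1 if the 0 modifier was given */
    int right_justify;  /* 1 if the . modifier was given */
    int size;           /* minimum field length, 0 if not specified */
    char type[32];      /* short name ("n") or long name ("thread_num") */
} field_spec;

/* Parses one specifier of the form %[[[0].] size ] type starting at s.
 * Returns the number of characters consumed, or 0 if s does not start
 * with a specifier that this sketch accepts. */
size_t parse_field_spec(const char *s, field_spec *out)
{
    const char *p = s;
    assert(s != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    if (*p++ != '%')
        return 0;
    if (p[0] == '0' && p[1] == '.') {        /* "0." : leading zeros */
        out->zero_pad = 1;
        out->right_justify = 1;
        p += 2;
    } else if (p[0] == '.') {                /* "."  : right justified */
        out->right_justify = 1;
        p += 1;
    }
    if (*p >= '1' && *p <= '9') {            /* size: non-zero first digit */
        while (isdigit((unsigned char)*p)) {
            out->size = out->size * 10 + (*p++ - '0');
            if (out->size > 100000)          /* arbitrary cap for the sketch */
                return 0;
        }
    } else if (out->right_justify) {
        return 0;                            /* modifier without a size */
    }
    if (*p == '{') {                         /* long name in curly braces */
        const char *end = strchr(p, '}');
        size_t len;
        if (end == NULL)
            return 0;
        len = (size_t)(end - p - 1);
        if (len == 0 || len >= sizeof out->type)
            return 0;
        memcpy(out->type, p + 1, len);
        p = end + 1;
    } else if (isalpha((unsigned char)*p)) { /* single-character short name */
        out->type[0] = *p++;
    } else {
        return 0;
    }
    return (size_t)(p - s);
}
```

For example, parsing "%0.3L" yields zero_pad = 1, right_justify = 1, size = 3 and type "L", while "%.15{thread_affinity}" yields size = 15 and the long name thread_affinity.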
TABLE 6.2: Available Field Types for Formatting OpenMP Thread Affinity Information
Short Name   Long Name          Meaning
t            team_num           The value returned by omp_get_team_num().
T            num_teams          The value returned by omp_get_num_teams().
L            nesting_level      The value returned by omp_get_level().
n            thread_num         The value returned by omp_get_thread_num().
N            num_threads        The value returned by omp_get_num_threads().
a            ancestor_tnum      The value returned by omp_get_ancestor_thread_num(level), where level is omp_get_level() minus 1.
H            host               The name for the host machine on which the OpenMP program is running.
P            process_id         The process identifier used by the implementation.
i            native_thread_id   The native thread identifier used by the implementation.
A            thread_affinity    The list of numerical identifiers, in the format of a comma-separated list of integers or integer ranges, that represent processors on which a thread may execute, subject to OpenMP thread affinity control and/or other external affinity mechanisms.
Implementations may define additional field types. If an implementation does not have information for a field type, “undefined” is printed for this field when displaying the OpenMP thread affinity information.
Example:
setenv OMP_AFFINITY_FORMAT "Thread Affinity: %0.3L %.8n %.15{thread_affinity} %.12H"
The above example causes an OpenMP implementation to display OpenMP thread affinity information in the following form:
Thread Affinity: 001 0 0-1,16-17 nid003
Thread Affinity: 001 1 2-3,18-19 nid003
Cross References
• Controlling OpenMP thread affinity, see Section 2.6.2 on page 80.
• omp_set_affinity_format routine, see Section 3.2.30 on page 364.
• omp_get_affinity_format routine, see Section 3.2.31 on page 366.
• omp_display_affinity routine, see Section 3.2.32 on page 367.
• omp_capture_affinity routine, see Section 3.2.33 on page 368.
• OMP_DISPLAY_AFFINITY environment variable, see Section 6.13 on page 612.
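The effect of the 0 and . modifiers on a numeric field in Section 6.14 can be approximated with the standard printf width flags. The helper below is a hypothetical sketch (format_int_field is not an OpenMP routine); it assumes the field value is an integer and that a size of 0 stands for "no size specified".

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: renders an integer field value in at least `size`
 * characters. zero_pad adds leading zeros, right_justify pads with spaces
 * on the left, and the default is left justified.
 * Returns the value reported by snprintf. */
int format_int_field(char *buf, size_t bufsize, int zero_pad,
                     int right_justify, int size, long value)
{
    assert(buf != NULL && size >= 0);
    if (zero_pad)
        return snprintf(buf, bufsize, "%0*ld", size, value);
    if (right_justify)
        return snprintf(buf, bufsize, "%*ld", size, value);
    return snprintf(buf, bufsize, "%-*ld", size, value);
}
```

With zero_pad set and size 3, a nesting level of 1 is rendered as 001, which matches the %0.3L field in the sample output of the OMP_AFFINITY_FORMAT example.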
6.15 OMP_DEFAULT_DEVICE
The OMP_DEFAULT_DEVICE environment variable sets the device number to use in device
constructs by setting the initial value of the default-device-var ICV.
The value of this environment variable must be a non-negative integer value.
Cross References
• default-device-var ICV, see Section 2.5 on page 63.
• device directives, see Section 2.12 on page 160.
6.16 OMP_MAX_TASK_PRIORITY
The OMP_MAX_TASK_PRIORITY environment variable controls the use of task priorities by
setting the initial value of the max-task-priority-var ICV. The value of this environment variable
must be a non-negative integer.
Example:
% setenv OMP_MAX_TASK_PRIORITY 20
Cross References
• max-task-priority-var ICV, see Section 2.5 on page 63.
• Tasking Constructs, see Section 2.10 on page 135.
• omp_get_max_task_priority routine, see Section 3.2.42 on page 377.
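Both OMP_DEFAULT_DEVICE and OMP_MAX_TASK_PRIORITY require a non-negative integer. A validation step for such a value might look like the sketch below. The function name parse_non_negative is hypothetical, and what a runtime does with a rejected value is not stated in these sections, so the sketch only reports success or failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 and stores the value in *out if text is a
 * non-negative decimal integer (surrounding white space allowed), else 0. */
int parse_non_negative(const char *text, long *out)
{
    char *end;
    long v;
    assert(out != NULL);
    if (text == NULL)
        return 0;                        /* variable not set */
    errno = 0;
    v = strtol(text, &end, 10);          /* skips leading white space */
    if (end == text || errno == ERANGE || v < 0)
        return 0;                        /* no digits, overflow, or negative */
    while (*end == ' ' || *end == '\t')
        end++;                           /* tolerate trailing white space */
    if (*end != '\0')
        return 0;                        /* trailing garbage such as "2x" */
    *out = v;
    return 1;
}
```

A caller would typically pass the result of getenv("OMP_MAX_TASK_PRIORITY") as text; the value 20 from the example above is accepted, while -3 or 2x is rejected.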
6.17 OMP_TARGET_OFFLOAD
The OMP_TARGET_OFFLOAD environment variable sets the initial value of the target-offload-var
ICV. The value of the OMP_TARGET_OFFLOAD environment variable must be one of the
following:
MANDATORY | DISABLED | DEFAULT
The MANDATORY value specifies that program execution is terminated if a device construct or
device memory routine is encountered and the device is not available or is not supported by the
implementation. Support for the DISABLED value is implementation defined. If an
implementation supports it, the behavior is as if the only device is the host device.
The DEFAULT value specifies the default behavior as described in Section 1.3 on page 20.
Example:
% setenv OMP_TARGET_OFFLOAD MANDATORY
Cross References
• target-offload-var ICV, see Section 2.5 on page 63.
• Device Directives, see Section 2.12 on page 160.
• Device Memory Routines, see Section 3.6 on page 397.
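The three accepted values of OMP_TARGET_OFFLOAD map naturally onto an enumeration. The sketch below is illustrative only: the offload_policy type and its constants are hypothetical names, and matching the value without regard to case is an assumption of this sketch rather than something stated in this section.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical names; the OpenMP API does not define this type. */
typedef enum {
    OFFLOAD_DEFAULT,
    OFFLOAD_MANDATORY,
    OFFLOAD_DISABLED,
    OFFLOAD_INVALID
} offload_policy;

/* Compares two strings, ignoring differences in letter case. */
static int equals_ignore_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Classifies a value of OMP_TARGET_OFFLOAD. The caller is expected to
 * handle the case in which the variable is not set at all. */
offload_policy parse_target_offload(const char *value)
{
    assert(value != NULL);
    if (equals_ignore_case(value, "MANDATORY"))
        return OFFLOAD_MANDATORY;
    if (equals_ignore_case(value, "DISABLED"))
        return OFFLOAD_DISABLED;
    if (equals_ignore_case(value, "DEFAULT"))
        return OFFLOAD_DEFAULT;
    return OFFLOAD_INVALID;
}
```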
6.18 OMP_TOOL
The OMP_TOOL environment variable sets the tool-var ICV, which controls whether an OpenMP runtime will try to register a first party tool.
The value of this environment variable must be one of the following:
enabled | disabled
If OMP_TOOL is set to any value other than enabled or disabled, the behavior is unspecified. If OMP_TOOL is not defined, the default value for tool-var is enabled.
Example:
% setenv OMP_TOOL enabled
Cross References
• tool-var ICV, see Section 2.5 on page 63.
• OMPT Interface, see Chapter 4 on page 419.
6.19 OMP_TOOL_LIBRARIES
The OMP_TOOL_LIBRARIES environment variable sets the tool-libraries-var ICV to a list of tool
libraries that are considered for use on a device on which an OpenMP implementation is being
initialized. The value of this environment variable must be a list of names of dynamically-loadable
libraries, separated by an implementation specific, platform typical separator.
If the tool-var ICV is not enabled, the value of tool-libraries-var is ignored. Otherwise, if
ompt_start_tool is not visible in the address space on a device where OpenMP is being
initialized or if ompt_start_tool returns NULL, an OpenMP implementation will consider
libraries in the tool-libraries-var list in a left to right order. The OpenMP implementation will
search the list for a library that meets two criteria: it can be dynamically loaded on the current
device and it defines the symbol ompt_start_tool. If an OpenMP implementation finds a
suitable library, no further libraries in the list will be considered.
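The left to right search described above can be sketched as a loop over the separated list that stops at the first suitable entry. Everything named here is hypothetical: find_tool_library and library_test_fn are illustration names, the ':' separator is only an assumption (the separator is implementation specific), and accepts_absolute_path is a stand-in for the real suitability check, which would try to load the library and look up ompt_start_tool.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Caller-supplied check: non-zero if the named library can be loaded on
 * the current device and defines ompt_start_tool. */
typedef int (*library_test_fn)(const char *name);

/* Walks list from left to right and copies the first entry accepted by
 * is_suitable into out. Returns 1 if such an entry exists, otherwise 0.
 * Entries that are empty or do not fit into out are skipped. */
int find_tool_library(const char *list, char sep, library_test_fn is_suitable,
                      char *out, size_t outsize)
{
    const char *start = list;
    assert(list != NULL && is_suitable != NULL && out != NULL);
    assert(sep != '\0' && outsize > 0);
    for (;;) {
        const char *end = strchr(start, sep);
        size_t len = end ? (size_t)(end - start) : strlen(start);
        if (len > 0 && len < outsize) {
            memcpy(out, start, len);
            out[len] = '\0';
            if (is_suitable(out))
                return 1;            /* later entries are not considered */
        }
        if (end == NULL)
            break;
        start = end + 1;
    }
    out[0] = '\0';
    return 0;
}

/* Stand-in predicate for illustration only: accepts absolute paths. */
int accepts_absolute_path(const char *name)
{
    return name[0] == '/';
}
```

With the list libtoolXY64.so:/usr/local/lib/libtoolXY32.so and the stand-in predicate, the first entry is rejected and the second is returned.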
Example:
% setenv OMP_TOOL_LIBRARIES libtoolXY64.so:/usr/local/lib/libtoolXY32.so
Cross References
• tool-libraries-var ICV, see Section 2.5 on page 63.
• OMPT Interface, see Chapter 4 on page 419.
• ompt_start_tool routine, see Section 4.2.1 on page 420.
6.20 OMP_DEBUG
The OMP_DEBUG environment variable sets the debug-var ICV, which controls whether an
OpenMP runtime collects information that an OMPD library may need to support a tool.
The value of this environment variable must be one of the following:
enabled | disabled
If OMP_DEBUG is set to any value other than enabled or disabled then the behavior is
implementation defined.
Example:
% setenv OMP_DEBUG enabled
Cross References
• debug-var ICV, see Section 2.5 on page 63.
• OMPD Interface, see Chapter 5 on page 533.
• Enabling the Runtime for OMPD, see Section 5.2.1 on page 534.
6.21 OMP_ALLOCATOR
OMP_ALLOCATOR sets the def-allocator-var ICV that specifies the default allocator for allocation calls, directives and clauses that do not specify an allocator. The value of this environment variable is a predefined allocator from Table 2.10 on page 155. The value of this environment variable is not case sensitive.
Cross References
• def-allocator-var ICV, see Section 2.5 on page 63.
• Memory allocators, see Section 2.11.2 on page 152.
• omp_set_default_allocator routine, see Section 3.7.4 on page 411.
• omp_get_default_allocator routine, see Section 3.7.5 on page 412.
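Because the value of OMP_ALLOCATOR is not case sensitive, matching it against the predefined allocator names requires a case-insensitive comparison. The sketch below shows one way to do this. The function names are hypothetical, and the list of allocator names is written from memory of Table 2.10, so it should be checked against that table before being relied on.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Predefined allocator names, assumed here to match Table 2.10. */
static const char *const predefined_allocators[] = {
    "omp_default_mem_alloc", "omp_large_cap_mem_alloc",
    "omp_const_mem_alloc",   "omp_high_bw_mem_alloc",
    "omp_low_lat_mem_alloc", "omp_cgroup_mem_alloc",
    "omp_pteam_mem_alloc",   "omp_thread_mem_alloc"
};

/* Compares two strings, ignoring differences in letter case. */
static int same_text_ignoring_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Hypothetical helper: returns the index of value in the list above,
 * or -1 if value does not name a predefined allocator. */
int find_predefined_allocator(const char *value)
{
    size_t i;
    size_t count = sizeof predefined_allocators / sizeof predefined_allocators[0];
    assert(value != NULL);
    for (i = 0; i < count; i++) {
        if (same_text_ignoring_case(value, predefined_allocators[i]))
            return (int)i;
    }
    return -1;
}
```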
APPENDIX A
OpenMP Implementation-Defined Behaviors
This appendix summarizes the behaviors that are described as implementation defined in this API.
Each behavior is cross-referenced back to its description in the main specification. An
implementation is required to define and to document its behavior in these cases.
• Processor: a hardware unit that is implementation defined (see Section 1.2.1 on page 2).
• Device: an implementation defined logical execution engine (see Section 1.2.1 on page 2).
• Device address: reference to an address in a device data environment (see Section 1.2.6 on page 12).
• Memory model: the minimum size at which a memory update may also read and write back adjacent variables that are part of another variable (as array or structure elements) is implementation defined but is no larger than required by the base language (see Section 1.4.1 on page 23).
• requires directive: support of requirements is implementation defined. All implementation-defined requirements should begin with ext_ (see Section 2.4 on page 60).
• Requires directive: Support for any feature specified by a requirement clause on a requires directive is implementation defined (see Section 2.4 on page 60).
• Internal control variables: the initial values of dyn-var, nthreads-var, run-sched-var, def-sched-var, bind-var, stacksize-var, wait-policy-var, thread-limit-var, max-active-levels-var, place-partition-var, affinity-format-var, default-device-var and def-allocator-var are implementation defined. The method for initializing a target device’s internal control variable is implementation defined (see Section 2.5.2 on page 66).
• OpenMP context: the accepted isa-name values for the isa trait, the accepted arch-name values for the arch trait, and the accepted extension-name values for the extension trait are implementation defined (see Section 2.3.1 on page 51).
• declare variant directive: whether, for some specific OpenMP context, the prototype of the variant should differ from that of the base function, and if so how it should differ, is implementation defined (see Section 2.3.5 on page 58).
• Dynamic adjustment of threads: providing the ability to adjust the number of threads dynamically is implementation defined. Implementations are allowed to deliver fewer threads (but at least one) than indicated in Algorithm 2.1 even if dynamic adjustment is disabled (see Section 2.6.1 on page 78).
• Thread affinity: For the close thread affinity policy, if T > P and P does not divide T evenly, the exact number of threads in a particular place is implementation defined. For the spread thread affinity, if T > P and P does not divide T evenly, the exact number of threads in a particular subpartition is implementation defined. The determination of whether the affinity request can be fulfilled is implementation defined. If not, the mapping of threads in the team to places is implementation defined (see Section 2.6.2 on page 80).
• teams construct: the number of teams that are created is implementation defined but less than or equal to the value of the num_teams clause if specified. The maximum number of threads that participate in the contention group that each team initiates is implementation defined but less than or equal to the value of the thread_limit clause if specified. The assignment of the initial threads to places and the values of the place-partition-var and default-device-var ICVs for each initial thread are implementation defined (see Section 2.7 on page 82).
• sections construct: the method of scheduling the structured blocks among threads in the team is implementation defined (see Section 2.8.1 on page 86).
• single construct: the method of choosing a thread to execute the structured block is implementation defined (see Section 2.8.2 on page 89).
• Worksharing-Loop directive: the integer type (or kind, for Fortran) used to compute the iteration count of a collapsed loop is implementation defined. The effect of the schedule(runtime) clause when the run-sched-var ICV is set to auto is implementation defined. The value of simd_width for the simd schedule modifier is implementation defined (see Section 2.9.2 on page 101).
• simd construct: the integer type (or kind, for Fortran) used to compute the iteration count for the collapsed loop is implementation defined. The number of iterations that are executed concurrently at any given time is implementation defined. If the alignment parameter is not specified in the aligned clause, the default alignments for the SIMD instructions are implementation defined (see Section 2.9.3.1 on page 110).
• declare simd directive: if the parameter of the simdlen clause is not a constant positive integer expression, the number of concurrent arguments for the function is implementation defined. If the alignment parameter of the aligned clause is not specified, the default alignments for SIMD instructions are implementation defined (see Section 2.9.3.3 on page 116).
• distribute construct: the integer type (or kind, for Fortran) used to compute the iteration count for the collapsed loop is implementation defined. If no dist_schedule clause is specified then the schedule for the distribute construct is implementation defined (see Section 2.9.4.1 on page 120).
• taskloop construct: The number of loop iterations assigned to a task created from a taskloop construct is implementation defined, unless the grainsize or num_tasks clause is specified. The integer type (or kind, for Fortran) used to compute the iteration count for the collapsed loop is implementation defined (see Section 2.10.2 on page 140).
C++
• taskloop construct: For firstprivate variables of class type, the number of invocations of copy constructors to perform the initialization is implementation defined (see Section 2.10.2 on page 140).
C++
• Memory spaces: The actual storage resource that each memory space defined in Table 2.8 on page 152 represents is implementation defined.
• Memory allocators: The minimum partitioning size for partitioning of allocated memory over the storage resources is implementation defined (see Section 2.11.2 on page 152). The default value for the pool_size allocator trait is implementation defined (see Table 2.9 on page 153). The associated memory space for each of the predefined omp_cgroup_mem_alloc, omp_pteam_mem_alloc and omp_thread_mem_alloc allocators is implementation defined (see Table 2.10 on page 155).
• is_device_ptr clause: Support for pointers created outside of the OpenMP device data management routines is implementation defined (see Section 2.12.5 on page 170).
• target construct: the effect of invoking a virtual member function of an object on a device other than the device on which the object was constructed is implementation defined (see Section 2.12.5 on page 170).
• atomic construct: a compliant implementation may enforce exclusive access between atomic regions that update different storage locations. The circumstances under which this occurs are implementation defined. If the storage location designated by x is not size-aligned (that is, if the byte alignment of x is not a multiple of the size of x), then the behavior of the atomic region is implementation defined (see Section 2.17.7 on page 234).
Fortran
• Data-sharing attributes: The data-sharing attributes of dummy arguments without the VALUE attribute are implementation-defined if the associated actual argument is shared, except for the conditions specified (see Section 2.19.1.2 on page 273).
• threadprivate directive: if the conditions for values of data in the threadprivate objects of threads (other than an initial thread) to persist between two consecutive active parallel regions do not all hold, the allocation status of an allocatable variable in the second region is implementation defined (see Section 2.19.2 on page 274).
• Runtime library definitions: it is implementation defined whether the include file omp_lib.h or the module omp_lib (or both) is provided. It is implementation defined whether any of the OpenMP runtime library routines that take an argument are extended with a generic interface so arguments of different KIND type can be accommodated (see Section 3.1 on page 332).
Fortran
• omp_set_num_threads routine: if the argument is not a positive integer the behavior is implementation defined (see Section 3.2.1 on page 334).
• omp_set_schedule routine: for implementation specific schedule kinds, the values and associated meanings of the second argument are implementation defined (see Section 3.2.12 on page 345).
• omp_get_supported_active_levels routine: the number of active levels of parallelism supported by the implementation is implementation defined, but must be greater than 0 (see Section 3.2.15 on page 349).
• omp_set_max_active_levels routine: when called from within any explicit parallel region the binding thread set (and binding region, if required) for the omp_set_max_active_levels region is implementation defined and the behavior is implementation defined. If the argument is not a non-negative integer then the behavior is implementation defined (see Section 3.2.16 on page 350).
• omp_get_max_active_levels routine: when called from within any explicit parallel region the binding thread set (and binding region, if required) for the omp_get_max_active_levels region is implementation defined (see Section 3.2.17 on page 351).
• omp_get_place_proc_ids routine: the meaning of the non-negative numerical identifiers returned by the omp_get_place_proc_ids routine is implementation defined. The order of the numerical identifiers returned in the array ids is implementation defined (see Section 3.2.26 on page 360).
• omp_set_affinity_format routine: when called from within any explicit parallel region, the binding thread set (and binding region, if required) for the omp_set_affinity_format region is implementation defined and the behavior is implementation defined. If the argument does not conform to the specified format then the result is implementation defined (see Section 3.2.30 on page 364).
• omp_get_affinity_format routine: when called from within any explicit parallel region the binding thread set (and binding region, if required) for the omp_get_affinity_format region is implementation defined (see Section 3.2.31 on page 366).
• omp_display_affinity routine: if the argument does not conform to the specified format then the result is implementation defined (see Section 3.2.32 on page 367).
• omp_capture_affinity routine: if the format argument does not conform to the specified format then the result is implementation defined (see Section 3.2.33 on page 368).
• omp_get_initial_device routine: the value of the device number of the host device is implementation defined (see Section 3.2.41 on page 376).
• omp_target_memcpy_rect routine: the maximum number of dimensions supported is implementation defined, but must be at least three (see Section 3.6.5 on page 402).
• ompt_callback_sync_region_wait, ompt_callback_mutex_released, ompt_callback_dependences, ompt_callback_task_dependence, ompt_callback_work, ompt_callback_master, ompt_callback_target_map, ompt_callback_sync_region, ompt_callback_lock_init, ompt_callback_lock_destroy, ompt_callback_mutex_acquire, ompt_callback_mutex_acquired, ompt_callback_nest_lock, ompt_callback_flush, ompt_callback_cancel and ompt_callback_dispatch tool callbacks: if a tool attempts to register a callback with the string name using the runtime entry point ompt_set_callback, it is implementation defined whether the registered callback is never invoked or only sometimes invoked for the associated events (see Table 4.2 on page 428).
• Device tracing: Whether a target device supports tracing or not is implementation defined; if a target device does not support tracing, a NULL may be supplied for the lookup function to a tool’s device initializer (see Section 4.2.5 on page 427).
• ompt_set_trace_ompt and ompt_buffer_get_record_ompt runtime entry points: it is implementation defined whether a device-specific tracing interface will define this runtime entry point, indicating that it can collect traces in OMPT format. The kinds of trace records available for a device are implementation defined (see Section 4.2.5 on page 427).
• ompt_callback_target_data_op_t callback type: it is implementation defined whether in some operations src_addr or dest_addr might point to an intermediate buffer (see Section 4.5.2.25 on page 488).
• ompt_set_callback_t entry point type: the subset of the associated event in which the callback is invoked is implementation defined (see Section 4.6.1.3 on page 500).
• ompt_get_place_proc_ids_t entry point type: the meaning of the numerical identifiers returned is implementation defined. The order of ids returned in the array is implementation defined (see Section 4.6.1.8 on page 505).
• ompt_get_partition_place_nums_t entry point type: the order of the identifiers returned in the array place_nums is implementation defined (see Section 4.6.1.10 on page 507).
• ompt_get_proc_id_t entry point type: the meaning of the numerical identifier returned is implementation defined (see Section 4.6.1.11 on page 508).
• ompd_callback_print_string_fn_t callback function: the value of category is implementation defined (see Section 5.4.5 on page 556).
• ompd_parallel_handle_compare operation: the means by which parallel region handles are ordered is implementation defined (see Section 5.5.6.5 on page 575).
• ompd_task_handle_compare operation: the means by which task handles are ordered is implementation defined (see Section 5.5.7.6 on page 580).
• OMPT thread states: The set of OMPT thread states supported is implementation defined (see Section 4.4.4.26 on page 452).
• OMP_SCHEDULE environment variable: if the value does not conform to the specified format then the result is implementation defined (see Section 6.1 on page 601).
• OMP_NUM_THREADS environment variable: if any value of the list specified leads to a number of threads that is greater than the implementation can support, or if any value is not a positive integer, then the result is implementation defined (see Section 6.2 on page 602).
• OMP_DYNAMIC environment variable: if the value is neither true nor false the behavior is implementation defined (see Section 6.3 on page 603).
• OMP_PROC_BIND environment variable: if the value is not true, false, or a comma separated list of master, close, or spread, the behavior is implementation defined. The behavior is also implementation defined if an initial thread cannot be bound to the first place in the OpenMP place list (see Section 6.4 on page 604).
• OMP_PLACES environment variable: the meaning of the numbers specified in the environment variable and how the numbering is done are implementation defined. The precise definitions of the abstract names are implementation defined. An implementation may add implementation-defined abstract names as appropriate for the target platform. When creating a place list of n elements by appending the number n to an abstract name, the determination of which resources to include in the place list is implementation defined. When requesting more resources than available, the length of the place list is also implementation defined. The behavior of the program is implementation defined when the execution environment cannot map a numerical value (either explicitly defined or implicitly derived from an interval) within the OMP_PLACES list to a processor on the target platform, or if it maps to an unavailable processor. The behavior is also implementation defined when the OMP_PLACES environment variable is defined using an abstract name (see Section 6.5 on page 605).
• OMP_STACKSIZE environment variable: if the value does not conform to the specified format or the implementation cannot provide a stack of the specified size then the behavior is implementation defined (see Section 6.6 on page 607).
• OMP_WAIT_POLICY environment variable: the details of the ACTIVE and PASSIVE behaviors are implementation defined (see Section 6.7 on page 608).
• OMP_MAX_ACTIVE_LEVELS environment variable: if the value is not a non-negative integer or is greater than the number of parallel levels an implementation can support then the behavior is implementation defined (see Section 6.8 on page 608).
• OMP_NESTED environment variable: if the value is neither true nor false the behavior is implementation defined (see Section 6.9 on page 609).
• Conflicting OMP_NESTED and OMP_MAX_ACTIVE_LEVELS environment variables: if both environment variables are set, the value of OMP_NESTED is false, and the value of OMP_MAX_ACTIVE_LEVELS is greater than 1, the behavior is implementation defined (see Section 6.9 on page 609).
• OMP_THREAD_LIMIT environment variable: if the requested value is greater than the number of threads an implementation can support, or if the value is not a positive integer, the behavior of the program is implementation defined (see Section 6.10 on page 610).
• OMP_DISPLAY_AFFINITY environment variable: for all values of the environment variable other than TRUE or FALSE, the display action is implementation defined (see Section 6.13 on page 612).
• OMP_AFFINITY_FORMAT environment variable: if the value does not conform to the specified format then the result is implementation defined (see Section 6.14 on page 613).
• OMP_TARGET_OFFLOAD environment variable: the support of disabled is implementation defined (see Section 6.17 on page 615).
• OMP_DEBUG environment variable: if the value is neither disabled nor enabled the behavior is implementation defined (see Section 6.20 on page 617).
APPENDIX B
Features History
This appendix summarizes the major changes between OpenMP API versions since version 2.5.
B.1 Deprecated Features
The following features have been deprecated in Version 5.0.
• The nest-var ICV, the OMP_NESTED environment variable, and the omp_set_nested and omp_get_nested routines were deprecated.
• Lock hints were renamed to synchronization hints. The following lock hint type and constants were deprecated:
– the C/C++ type omp_lock_hint_t and the Fortran kind omp_lock_hint_kind;
– the constants omp_lock_hint_none, omp_lock_hint_uncontended, omp_lock_hint_contended, omp_lock_hint_nonspeculative, and omp_lock_hint_speculative.
B.2 Version 4.5 to 5.0 Differences
• The memory model was extended to distinguish different types of flush operations according to specified flush properties (see Section 1.4.4 on page 25) and to define a happens before order based on synchronizing flush operations (see Section 1.4.5 on page 27).
• Various changes throughout the specification were made to provide initial support of C11, C++11, C++14, C++17 and Fortran 2008 (see Section 1.7 on page 31).
• Fortran 2003 is now fully supported (see Section 1.7 on page 31).
• The requires directive (see Section 2.4 on page 60) was added to support applications that require implementation-specific features.
• The target-offload-var internal control variable (see Section 2.5 on page 63) and the OMP_TARGET_OFFLOAD environment variable (see Section 6.17 on page 615) were added to support runtime control of the execution of device constructs.
• Control over whether nested parallelism is enabled or disabled was integrated into the max-active-levels-var internal control variable (see Section 2.5.2 on page 66), the default value of which is now implementation defined, unless determined according to the values of the OMP_NUM_THREADS (see Section 6.2 on page 602) or OMP_PROC_BIND (see Section 6.4 on page 604) environment variables.
• Support for array shaping (see Section 2.1.4 on page 43) and for array sections with non-unit strides in C and C++ (see Section 2.1.5 on page 44) was added to facilitate specification of discontiguous storage and the target update construct (see Section 2.12.6 on page 176) and the depend clause (see Section 2.17.11 on page 255) were extended to allow the use of shape-operators (see Section 2.1.4 on page 43).
• Iterators (see Section 2.1.6 on page 47) were added to support expressions in a list that expand to multiple expressions.
• The metadirective directive (see Section 2.3.4 on page 56) and declare variant directive (see Section 2.3.5 on page 58) were added to support selection of directive variants and declared function variants at a callsite, respectively, based on compile-time traits of the enclosing context.
• The teams construct (see Section 2.7 on page 82) was extended to support execution on the host device without an enclosing target construct (see Section 2.12.5 on page 170).
• The canonical loop form was defined for Fortran and, for all base languages, extended to permit non-rectangular loop nests (see Section 2.9.1 on page 95).
• The relational-op in the canonical loop form for C/C++ was extended to include != (see Section 2.9.1 on page 95).
• The default loop schedule modifier for worksharing-loop constructs without the static schedule and the ordered clause was changed to nonmonotonic (see Section 2.9.2 on page 101).
• The collapse of associated loops that are imperfectly nested loops was defined for the worksharing-loop (see Section 2.9.2 on page 101), simd (see Section 2.9.3.1 on page 110), taskloop (see Section 2.10.2 on page 140) and distribute (see Section 2.9.4.2 on page 123) constructs.
The simd construct (see Section 2.9.3.1 on page 110) was extended to accept the if, nontemporal and order(concurrent) clauses and to allow the use of atomic constructs within it.
The loop construct and the order(concurrent) clause were added to support compiler optimization and parallelization of loops for which iterations may execute in any order, including concurrently (see Section 2.9.5 on page 128).
The scan directive (see Section 2.9.6 on page 132) and the inscan modifier for the reduction clause (see Section 2.19.5.4 on page 300) were added to support inclusive and exclusive scan computations.
To support task reductions, the task (see Section 2.10.1 on page 135) and target (see Section 2.12.5 on page 170) constructs were extended to accept the in_reduction clause (see Section 2.19.5.6 on page 303), the taskgroup construct (see Section 2.17.6 on page 232) was extended to accept the task_reduction clause (see Section 2.19.5.5 on page 303), and the task modifier was added to the reduction clause (see Section 2.19.5.4 on page 300).
The affinity clause was added to the task construct (see Section 2.10.1 on page 135) to support hints that indicate data affinity of explicit tasks.
The detach clause for the task construct (see Section 2.10.1 on page 135) and the omp_fulfill_event runtime routine (see Section 3.5.1 on page 396) were added to support execution of detachable tasks.
To support taskloop reductions, the taskloop (see Section 2.10.2 on page 140) and taskloop simd (see Section 2.10.3 on page 146) constructs were extended to accept the reduction (see Section 2.19.5.4 on page 300) and in_reduction (see Section 2.19.5.6 on page 303) clauses.
The taskloop construct (see Section 2.10.2 on page 140) was added to the list of constructs that can be canceled by the cancel construct (see Section 2.18.1 on page 263).
To support mutually exclusive inout sets, a mutexinoutset dependence-type was added to the depend clause (see Section 2.10.6 on page 149 and Section 2.17.11 on page 255).
Predefined memory spaces (see Section 2.11.1 on page 152), predefined memory allocators and allocator traits (see Section 2.11.2 on page 152) and directives, clauses (see Section 2.11 on page 152) and API routines (see Section 3.7 on page 406) to use them were added to support different kinds of memories.
The semantics of the use_device_ptr clause for pointer variables was clarified and the use_device_addr clause for using the device address of non-pointer variables inside the target data construct was added (see Section 2.12.2 on page 161).
To support reverse offload, the ancestor modifier was added to the device clause for target constructs (see Section 2.12.5 on page 170).
To reduce programmer effort, implicit declare target directives for some functions (C, C++, Fortran) and subroutines (Fortran) were added (see Section 2.12.5 on page 170 and Section 2.12.7 on page 180).
The target update construct (see Section 2.12.6 on page 176) was modified to allow array sections that specify discontiguous storage.
The to and from clauses on the target update construct (see Section 2.12.6 on page 176), the depend clause on task generating constructs (see Section 2.17.11 on page 255), and the map clause (see Section 2.19.7.1 on page 315) were extended to allow any lvalue expression as a list item for C/C++.
Support for nested declare target directives was added (see Section 2.12.7 on page 180).
New combined constructs master taskloop (see Section 2.13.7 on page 192), parallel master (see Section 2.13.6 on page 191), parallel master taskloop (see Section 2.13.9 on page 195), master taskloop simd (see Section 2.13.8 on page 194) and parallel master taskloop simd (see Section 2.13.10 on page 196) were added.
The depend clause was added to the taskwait construct (see Section 2.17.5 on page 230).
To support acquire and release semantics with weak memory ordering, the acq_rel, acquire, and release clauses were added to the atomic construct (see Section 2.17.7 on page 234) and flush construct (see Section 2.17.8 on page 242), and the memory ordering semantics of implicit flushes on various constructs and runtime routines were clarified (see Section 2.17.8.1 on page 246).
The atomic construct was extended with the hint clause (see Section 2.17.7 on page 234).
The depend clause (see Section 2.17.11 on page 255) was extended to support iterators and to support depend objects that can be created with the new depobj construct.
Lock hints were renamed to synchronization hints, and the old names were deprecated (see Section 2.17.12 on page 260).
To support conditional assignment to lastprivate variables, the conditional modifier was added to the lastprivate clause (see Section 2.19.4.5 on page 288).
The description of the map clause was modified to clarify the mapping order when multiple map-types are specified for a variable or structure members of a variable on the same construct. The close map-type-modifier was added as a hint for the runtime to allocate memory close to the target device (see Section 2.19.7.1 on page 315).
The capability to map C/C++ pointer variables and to assign the address of device memory that is mapped by an array section to them was added. Support for mapping of Fortran pointer and allocatable variables, including pointer and allocatable components of variables, was added (see Section 2.19.7.1 on page 315).
The defaultmap clause (see Section 2.19.7.2 on page 324) was extended to allow selecting the data-mapping or data-sharing attributes for any of the scalar, aggregate, pointer or allocatable classes on a per-region basis. Additionally it accepts the none parameter to support the requirement that all variables referenced in the construct must be explicitly mapped or privatized.
The declare mapper directive was added to support mapping of data types with direct and indirect members (see Section 2.19.7.3 on page 326).
The omp_set_nested (see Section 3.2.10 on page 343) and omp_get_nested (see Section 3.2.11 on page 344) routines and the OMP_NESTED environment variable (see Section 6.9 on page 609) were deprecated.
The omp_get_supported_active_levels routine was added to query the number of active levels of parallelism supported by the implementation (see Section 3.2.15 on page 349).
Runtime routines omp_set_affinity_format (see Section 3.2.30 on page 364), omp_get_affinity_format (see Section 3.2.31 on page 366), omp_display_affinity (see Section 3.2.32 on page 367), and omp_capture_affinity (see Section 3.2.33 on page 368) and environment variables OMP_DISPLAY_AFFINITY (see Section 6.13 on page 612) and OMP_AFFINITY_FORMAT (see Section 6.14 on page 613) were added to provide OpenMP runtime thread affinity information.
The omp_get_device_num runtime routine (see Section 3.2.37 on page 372) was added to support determination of the device on which a thread is executing.
The omp_pause_resource and omp_pause_resource_all runtime routines were added to allow the runtime to relinquish resources used by OpenMP (see Section 3.2.43 on page 378 and Section 3.2.44 on page 380).
Support for a first-party tool interface (see Section 4 on page 419) was added.
Support for a third-party tool interface (see Section 5 on page 533) was added.
Support for controlling offloading behavior with the OMP_TARGET_OFFLOAD environment variable was added (see Section 6.17 on page 615).
Stubs for Runtime Library Routines (previously Appendix A) were moved to a separate document.
Interface Declarations (previously Appendix B) were moved to a separate document.
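As one illustration of the Version 5.0 additions listed above, the following C sketch uses the scan directive together with the inscan modifier of the reduction clause to compute an inclusive prefix sum. The function name prefix_sum_inclusive and its parameters are hypothetical and chosen only for this example. The sketch assumes a compiler that implements the OpenMP 5.0 scan directive; when OpenMP is not enabled, the pragmas are ignored and the loop runs serially with the same result.

```c
/* Hypothetical helper (not part of the OpenMP API): writes the inclusive
 * prefix sum of in[0..n-1] into out[0..n-1] and returns the total. */
long prefix_sum_inclusive(const int *in, long *out, int n)
{
    long running = 0;

    /* The inscan modifier marks 'running' as a scan reduction variable. */
    #pragma omp parallel for reduction(inscan, +: running)
    for (int i = 0; i < n; i++) {
        running += in[i];                    /* input phase: update the reduction */
        #pragma omp scan inclusive(running)
        out[i] = running;                    /* scan phase: read the prefix value */
    }
    return running;
}
```

Replacing inclusive with exclusive, and placing the statement that reads the value before the scan directive and the update after it, would give the exclusive variant of the same computation.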
B.3 Version 4.0 to 4.5 Differences
Support for several features of Fortran 2003 was added (see Section 1.7 on page 31 for features that are still not supported).
A parameter was added to the ordered clause of the worksharing-loop construct (see Section 2.9.2 on page 101) and clauses were added to the ordered construct (see Section 2.17.9 on page 250) to support doacross loop nests and use of the simd construct on loops with loop-carried backward dependences.
The linear clause was added to the worksharing-loop construct (see Section 2.9.2 on page 101).
The simdlen clause was added to the simd construct (see Section 2.9.3.1 on page 110) to support specification of the exact number of iterations desired per SIMD chunk.
The priority clause was added to the task construct (see Section 2.10.1 on page 135) to support hints that specify the relative execution priority of explicit tasks. The omp_get_max_task_priority routine was added to return the maximum supported priority value (see Section 3.2.42 on page 377) and the OMP_MAX_TASK_PRIORITY environment variable was added to control the maximum priority value allowed (see Section 6.16 on page 615).
Taskloop constructs (see Section 2.10.2 on page 140 and Section 2.10.3 on page 146) were added to support nestable parallel loops that create OpenMP tasks.
To support interaction with native device implementations, the use_device_ptr clause was added to the target data construct (see Section 2.12.2 on page 161) and the is_device_ptr clause was added to the target construct (see Section 2.12.5 on page 170).
The nowait and depend clauses were added to the target construct (see Section 2.12.5 on page 170) to improve support for asynchronous execution of target regions.
The private, firstprivate and defaultmap clauses were added to the target construct (see Section 2.12.5 on page 170).
The declare target directive was extended to allow mapping of global variables to be deferred to specific device executions and to allow an extended-list to be specified in C/C++ (see Section 2.12.7 on page 180).
To support unstructured data mapping for devices, the target enter data (see Section 2.12.3 on page 164) and target exit data (see Section 2.12.4 on page 166) constructs were added and the map clause (see Section 2.19.7.1 on page 315) was updated.
To support a more complete set of device construct shortcuts, the target parallel (see Section 2.13.16 on page 203), target parallel worksharing-loop (see Section 2.13.17 on page 205), target parallel worksharing-loop SIMD (see Section 2.13.18 on page 206), and target simd (see Section 2.13.20 on page 209) combined constructs were added.
The if clause was extended to take a directive-name-modifier that allows it to apply to combined constructs (see Section 2.15 on page 220).
The hint clause was added to the critical construct (see Section 2.17.1 on page 223).
The source and sink dependence types were added to the depend clause (see Section 2.17.11 on page 255) to support doacross loop nests.
• The implicit data-sharing attribute for scalar variables in target regions was changed to firstprivate (see Section 2.19.1.1 on page 270).
• Use of some C++ reference types was allowed in some data sharing attribute clauses (see Section 2.19.4 on page 282).
• Semantics for reductions on C/C++ array sections were added and restrictions on the use of arrays and pointers in reductions were removed (see Section 2.19.5.4 on page 300).
• The ref, val, and uval modifiers were added to the linear clause (see Section 2.19.4.6 on page 290).
• Support was added to the map clauses to handle structure elements (see Section 2.19.7.1 on page 315).
• Query functions for OpenMP thread affinity were added (see Section 3.2.24 on page 358 to Section 3.2.29 on page 363).
• The lock API was extended with lock routines that support storing a hint with a lock to select a desired lock implementation for a lock’s intended usage by the application code (see Section 3.3.2 on page 385).
• Device memory routines were added to allow explicit allocation, deallocation, memory transfers and memory associations (see Section 3.6 on page 397).
• C/C++ Grammar (previously Appendix B) was moved to a separate document.
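To make the Version 4.5 reduction change above concrete, the following C sketch applies a reduction clause to an array section, which the list describes as newly supported for C/C++. The function histogram and its parameters are hypothetical names introduced for this example. It assumes nbins is positive; without OpenMP the pragma is ignored and the loop produces the same counts serially.

```c
/* Hypothetical helper: counts how often each value in [0, nbins) occurs
 * in data[0..n-1]; values outside that range are skipped. */
void histogram(const int *data, int n, int *hist, int nbins)
{
    for (int b = 0; b < nbins; b++)
        hist[b] = 0;

    /* Each thread works on a zero-initialized private copy of the section
     * hist[0:nbins]; the copies are added into the original at the end. */
    #pragma omp parallel for reduction(+: hist[0:nbins])
    for (int i = 0; i < n; i++) {
        int v = data[i];
        if (v >= 0 && v < nbins)
            hist[v]++;
    }
}
```

Using a reduction here avoids an atomic update per element, at the cost of one private copy of the section per thread.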
B.4 Version 3.1 to 4.0 Differences
Various changes throughout the specification were made to provide initial support of Fortran 2003 (see Section 1.7 on page 31).
C/C++ array syntax was extended to support array sections (see Section 2.1.5 on page 44).
The proc_bind clause (see Section 2.6.2 on page 80), the OMP_PLACES environment variable (see Section 6.5 on page 605), and the omp_get_proc_bind runtime routine (see Section 3.2.23 on page 357) were added to support thread affinity policies.
SIMD directives were added to support SIMD parallelism (see Section 2.9.3 on page 110).
Implementation defined task scheduling points for untied tasks were removed (see Section 2.10.6 on page 149).
Device directives (see Section 2.12 on page 160), the OMP_DEFAULT_DEVICE environment variable (see Section 6.15 on page 615), and the omp_set_default_device, omp_get_default_device, omp_get_num_devices, omp_get_num_teams, omp_get_team_num, and omp_is_initial_device routines were added to support execution on devices.
The taskgroup construct (see Section 2.17.6 on page 232) was added to support more flexible deep task synchronization.
The atomic construct (see Section 2.17.7 on page 234) was extended to support atomic swap with the capture clause, to allow new atomic update and capture forms, and to support sequentially consistent atomic operations with a new seq_cst clause.
The depend clause (see Section 2.17.11 on page 255) was added to support task dependences.
The cancel construct (see Section 2.18.1 on page 263), the cancellation point construct (see Section 2.18.2 on page 267), the omp_get_cancellation runtime routine (see Section 3.2.9 on page 342) and the OMP_CANCELLATION environment variable (see Section 6.11 on page 610) were added to support the concept of cancellation.
The reduction clause (see Section 2.19.5.4 on page 300) was extended and the declare reduction construct (see Section 2.19.5.7 on page 304) was added to support user defined reductions.
The OMP_DISPLAY_ENV environment variable (see Section 6.12 on page 611) was added to display the value of ICVs associated with the OpenMP environment variables.
Examples (previously Appendix A) were moved to a separate document.
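The user defined reductions introduced in Version 4.0 can be sketched in C as follows. The type maxloc_t, the reduction identifier maxloc and the function find_max are hypothetical names used only for this illustration. The combiner breaks ties in favor of the lower index, an assumption made here so that the result does not depend on how iterations are distributed; for an empty array the function returns the initial value { INT_MIN, INT_MAX }.

```c
#include <limits.h>

typedef struct {
    int value;   /* largest value seen so far */
    int index;   /* position of that value */
} maxloc_t;

/* User-defined reduction: keep the larger value; on ties keep the lower
 * index. omp_in, omp_out and omp_priv are the identifiers that the
 * declare reduction directive provides. */
#pragma omp declare reduction(maxloc : maxloc_t :                        \
        omp_out = (omp_in.value > omp_out.value ||                       \
                   (omp_in.value == omp_out.value &&                     \
                    omp_in.index < omp_out.index)) ? omp_in : omp_out)   \
        initializer(omp_priv = { INT_MIN, INT_MAX })

/* Hypothetical helper: returns the maximum of a[0..n-1] and its index. */
maxloc_t find_max(const int *a, int n)
{
    maxloc_t best = { INT_MIN, INT_MAX };

    #pragma omp parallel for reduction(maxloc : best)
    for (int i = 0; i < n; i++) {
        if (a[i] > best.value || (a[i] == best.value && i < best.index)) {
            best.value = a[i];
            best.index = i;
        }
    }
    return best;
}
```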
B.5 Version 3.0 to 3.1 Differences
The bind-var ICV has been added, which controls whether or not threads are bound to processors (see Section 2.5.1 on page 64). The value of this ICV can be set with the OMP_PROC_BIND environment variable (see Section 6.4 on page 604).
The nthreads-var ICV has been modified to be a list of the number of threads to use at each nested parallel region level and the algorithm for determining the number of threads used in a parallel region has been modified to handle a list (see Section 2.6.1 on page 78).
The final and mergeable clauses (see Section 2.10.1 on page 135) were added to the task construct to support optimization of task data environments.
The taskyield construct (see Section 2.10.4 on page 147) was added to allow user-defined task scheduling points.
The atomic construct (see Section 2.17.7 on page 234) was extended to include read, write, and capture forms, and an update clause was added to apply the already existing form of the atomic construct.
• Data environment restrictions were changed to allow intent(in) and const-qualified types for the firstprivate clause (see Section 2.19.4.4 on page 286).
• Data environment restrictions were changed to allow Fortran pointers in firstprivate (see Section 2.19.4.4 on page 286) and lastprivate (see Section 2.19.4.5 on page 288).
• New reduction operators min and max were added for C and C++ (see Section 2.19.5 on page 293).
• The nesting restrictions in Section 2.20 on page 328 were clarified to disallow closely-nested OpenMP regions within an atomic region. This allows an atomic region to be consistently defined with other OpenMP regions so that they include all code in the atomic construct.
• The omp_in_final runtime library routine (see Section 3.2.22 on page 356) was added to support specialization of final task regions.
• Descriptions of examples (previously Appendix A) were expanded and clarified.
• Replaced incorrect use of omp_integer_kind in Fortran interfaces with selected_int_kind(8).
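As a small illustration of the min and max reduction operators that Version 3.1 added for C and C++, the following sketch finds the smallest and largest element of an array in one pass. The function min_max and its parameters are hypothetical names for this example; the code gives the same result when the pragma is ignored.

```c
#include <limits.h>

/* Hypothetical helper: stores the minimum and maximum of a[0..n-1]. */
void min_max(const int *a, int n, int *min_out, int *max_out)
{
    int lo = INT_MAX;
    int hi = INT_MIN;

    /* Each thread keeps private lo/hi copies that are combined with the
     * min and max operators when the loop ends. */
    #pragma omp parallel for reduction(min: lo) reduction(max: hi)
    for (int i = 0; i < n; i++) {
        if (a[i] < lo) lo = a[i];
        if (a[i] > hi) hi = a[i];
    }
    *min_out = lo;
    *max_out = hi;
}
```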
B.6 Version 2.5 to 3.0 Differences
The definition of active parallel region has been changed: in Version 3.0 a parallel region is active if it is executed by a team consisting of more than one thread (see Section 1.2.2 on page 2).
The concept of tasks has been added to the OpenMP execution model (see Section 1.2.5 on page 10 and Section 1.3 on page 20).
The OpenMP memory model now covers atomicity of memory accesses (see Section 1.4.1 on page 23). The description of the behavior of volatile in terms of flush was removed.
In Version 2.5, there was a single copy of the nest-var, dyn-var, nthreads-var and run-sched-var internal control variables (ICVs) for the whole program. In Version 3.0, there is one copy of these ICVs per task (see Section 2.5 on page 63). As a result, the omp_set_num_threads, omp_set_nested and omp_set_dynamic runtime library routines now have specified effects when called from inside a parallel region (see Section 3.2.1 on page 334, Section 3.2.7 on page 340 and Section 3.2.10 on page 343).
The thread-limit-var ICV has been added, which controls the maximum number of threads participating in the OpenMP program. The value of this ICV can be set with the OMP_THREAD_LIMIT environment variable and retrieved with the omp_get_thread_limit runtime library routine (see Section 2.5.1 on page 64, Section 3.2.14 on page 348 and Section 6.10 on page 610).
The max-active-levels-var ICV has been added, which controls the number of nested active parallel regions. The value of this ICV can be set with the OMP_MAX_ACTIVE_LEVELS environment variable and the omp_set_max_active_levels runtime library routine, and it can be retrieved with the omp_get_max_active_levels runtime library routine (see Section 2.5.1 on page 64, Section 3.2.16 on page 350, Section 3.2.17 on page 351 and Section 6.8 on page 608).
The stacksize-var ICV has been added, which controls the stack size for threads that the OpenMP implementation creates. The value of this ICV can be set with the OMP_STACKSIZE environment variable (see Section 2.5.1 on page 64 and Section 6.6 on page 607).
The wait-policy-var ICV has been added, which controls the desired behavior of waiting threads. The value of this ICV can be set with the OMP_WAIT_POLICY environment variable (see Section 2.5.1 on page 64 and Section 6.7 on page 608).
The rules for determining the number of threads used in a parallel region have been modified (see Section 2.6.1 on page 78).
In Version 3.0, the assignment of iterations to threads in a loop construct with a static schedule kind is deterministic (see Section 2.9.2 on page 101).
In Version 3.0, a loop construct may be associated with more than one perfectly nested loop. The number of associated loops is controlled by the collapse clause (see Section 2.9.2 on page 101).
Random access iterators, and variables of unsigned integer type, may now be used as loop iterators in loops associated with a loop construct (see Section 2.9.2 on page 101).
The schedule kind auto has been added, which gives the implementation the freedom to choose any possible mapping of iterations in a loop construct to threads in the team (see Section 2.9.2 on page 101).
The task construct (see Section 2.10 on page 135) has been added, which provides a mechanism for creating tasks explicitly.
The taskwait construct (see Section 2.17.5 on page 230) has been added, which causes a task to wait for all its child tasks to complete.
Fortran assumed-size arrays now have predetermined data-sharing attributes (see Section 2.19.1.1 on page 270).
In Version 3.0, static class member variables may appear in a threadprivate directive (see Section 2.19.2 on page 274).
Version 3.0 makes clear where, and with which arguments, constructors and destructors of private and threadprivate class type variables are called (see Section 2.19.2 on page 274, Section 2.19.4.3 on page 285, Section 2.19.4.4 on page 286, Section 2.19.6.1 on page 310 and Section 2.19.6.2 on page 312).
In Version 3.0, Fortran allocatable arrays may appear in private, firstprivate, lastprivate, reduction, copyin and copyprivate clauses (see Section 2.19.2 on page 274, Section 2.19.4.3 on page 285, Section 2.19.4.4 on page 286, Section 2.19.4.5 on page 288, Section 2.19.5.4 on page 300, Section 2.19.6.1 on page 310 and Section 2.19.6.2 on page 312).
In Fortran, firstprivate is now permitted as an argument to the default clause (see Section 2.19.4.1 on page 282).
For list items in the private clause, implementations are no longer permitted to use the storage of the original list item to hold the new list item on the master thread. If no attempt is made to reference the original list item inside the parallel region, its value is well defined on exit from the parallel region (see Section 2.19.4.3 on page 285).
The runtime library routines omp_set_schedule and omp_get_schedule have been added; these routines respectively set and retrieve the value of the run-sched-var ICV (see Section 3.2.12 on page 345 and Section 3.2.13 on page 347).
The omp_get_level runtime library routine has been added, which returns the number of nested parallel regions enclosing the task that contains the call (see Section 3.2.18 on page 352).
The omp_get_ancestor_thread_num runtime library routine has been added, which returns, for a given nested level of the current thread, the thread number of the ancestor (see Section 3.2.19 on page 353).
The omp_get_team_size runtime library routine has been added, which returns, for a given nested level of the current thread, the size of the thread team to which the ancestor belongs (see Section 3.2.20 on page 354).
The omp_get_active_level runtime library routine has been added, which returns the number of nested active parallel regions enclosing the task that contains the call (see Section 3.2.21 on page 355).
In Version 3.0, locks are owned by tasks, not by threads (see Section 3.3 on page 381).
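The task and taskwait constructs introduced in Version 3.0 can be illustrated with a recursive Fibonacci computation. This is a minimal sketch under stated assumptions: fib_task and fib are hypothetical names for this example, and no cutoff for small subproblems is applied, so it is meant to show the structure rather than to perform well. Without OpenMP the pragmas are ignored and the recursion runs serially.

```c
/* Hypothetical helper: each call creates two child tasks and waits for
 * them. Intended to be called from inside a parallel region. */
static long fib_task(int n)
{
    long x, y;

    if (n < 2)
        return n;

    /* x and y must be shared so that the child tasks write the parent's
     * variables; n is firstprivate in each task by default. */
    #pragma omp task shared(x)
    x = fib_task(n - 1);
    #pragma omp task shared(y)
    y = fib_task(n - 2);

    #pragma omp taskwait    /* wait for both child tasks to complete */
    return x + y;
}

/* Hypothetical entry point: one thread creates the root of the task tree,
 * and the other threads of the team execute the generated tasks. */
long fib(int n)
{
    long result = 0;

    #pragma omp parallel
    {
        #pragma omp single
        result = fib_task(n);
    }
    return result;
}
```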
Index
Symbols
_OPENMP macro, 49, 611–613
A
acquire flush, 27
affinity, 80
allocate, 156, 158
array sections, 44
array shaping, 43
atomic, 234
atomic construct, 621 attribute clauses, 282 attributes, data-mapping, 314 attributes, data-sharing, 269 auto, 105
B
barrier, 226 barrier, implicit, 228
C
cancel, 263
cancellation constructs, 263
cancel, 263
cancellation point, 267
cancellation point, 267
canonical loop form, 95
capture, atomic, 234
clauses
allocate, 158
attribute data-sharing, 282
collapse, 101, 102 copyin, 310 copyprivate, 312 data copying, 309 data-sharing, 282 default, 282 defaultmap, 324 depend, 255 firstprivate, 286 hint, 260
if Clause, 220 in_reduction, 303 lastprivate, 288 linear, 290
map, 315
private, 285 reduction, 300 schedule, 103 shared, 283 task_reduction, 303
combined constructs, 185
master taskloop, 192
master taskloop simd, 194 parallel loop, 186
parallel master, 191
parallel master taskloop, 195 parallel master taskloop simd,
196
parallel sections, 188 parallel workshare, 189
parallel worksharing-loop construct, 185
parallel worksharing-loop SIMD construct, 190
target parallel, 203 target parallel loop, 208 target parallel worksharing-loop
construct, 205
target parallel worksharing-loop SIMD
construct, 206
target simd, 209
target teams, 210
target teams distribute, 211 target teams distribute parallel
worksharing-loop construct, 215 target teams distribute parallel
worksharing-loop SIMD
construct, 216
target teams distribute simd,
213
target teams loop construct, 214 teams distribute, 197 teams distribute parallel
worksharing-loop construct, 200 teams distribute parallel
worksharing-loop SIMD
construct, 201
teams distribute simd, 198 teams loop, 202
compilation sentinels, 50 compliance, 31
conditional compilation, 49 constructs
atomic, 234
barrier, 226
cancel, 263
cancellation constructs, 263 cancellation point, 267 combined constructs, 185 critical, 223
declare mapper, 326 declare target, 180 depobj, 254
device constructs, 160 distribute, 120
distribute parallel do, 125 distribute parallel do simd,
126
distribute parallel for, 125 distribute parallel for simd,
126
distribute parallel worksharing-loop construct, 125
distribute parallel worksharing-loop SIMD construct, 126
distribute simd, 123 do Fortran, 101
flush, 242
for, C/C++, 101
loop, 128
master, 221
master taskloop, 192
master taskloop simd, 194 ordered, 250
parallel, 74
parallel do Fortran, 185 parallel for C/C++, 185 parallel loop, 186
parallel master, 191
parallel master taskloop, 195 parallel master taskloop simd,
196
parallel sections, 188 parallel workshare, 189 parallel worksharing-loop
construct, 185
parallel worksharing-loop SIMD
construct, 190 sections, 86
simd, 110
single, 89
target, 170
target data, 161 target enter data, 164 target exit data, 166 target parallel, 203
target parallel do, 205 target parallel do simd, 206 target parallel for, 205 target parallel for simd, 206 target parallel loop, 208 target parallel worksharing-loop
construct, 205
target parallel worksharing-loop SIMD
construct, 206
target simd, 209
target teams, 210
target teams distribute, 211 target teams distribute parallel
worksharing-loop construct, 215 target teams distribute parallel
worksharing-loop SIMD
construct, 216
target teams distribute simd,
213
target teams loop, 214 target update, 176 task, 135
taskgroup, 232
tasking constructs, 135 taskloop, 140 taskloop simd, 146 taskwait, 230 taskyield, 147
teams, 82
teams distribute, 197 teams distribute parallel
worksharing-loop construct, 200 teams distribute parallel
worksharing-loop SIMD
construct, 201
teams distribute simd, 198 teams loop, 202
workshare, 92
worksharing, 86
worksharing-loop construct, 101 worksharing-loop SIMD construct, 114
controlling OpenMP thread affinity, 80 copyin, 310
copyprivate, 312 critical, 223
D
data copying clauses, 309
data environment, 269
data terminology, 12
data-mapping rules and clauses, 314 data-sharing attribute clauses, 282 data-sharing attribute rules, 269 declare mapper, 326
declare reduction, 304
declare simd, 116
declare target, 180
declare variant, 58
default, 282
defaultmap, 324
depend, 255
depend object, 254 depobj, 254 deprecated features, 627 device constructs
declare mapper, 326 declare target, 180
device constructs, 160 distribute, 120
distribute parallel worksharing-loop
construct, 125
distribute parallel worksharing-loop
SIMD construct, 126 distribute simd, 123 target, 170
target update, 176 teams, 82
device data environments, 24, 164, 166 device directives, 160
device memory routines, 397
directive format, 38
directives, 37 allocate, 156
declare mapper, 326 declare reduction, 304 declare simd, 116 declare target, 180
declare variant, 58
memory management directives, 152 metadirective, 56
requires, 60
scan Directive, 132 threadprivate, 274
variant directives, 51
distribute, 120
distribute parallel worksharing-loop
construct, 125
distribute parallel worksharing-loop SIMD
construct, 126 distribute simd, 123
do, Fortran, 101
do simd, 114
dynamic, 105
dynamic thread adjustment, 620
E
environment variables, 601 OMP_AFFINITY_FORMAT, 613 OMP_ALLOCATOR, 618 OMP_CANCELLATION, 610 OMP_DEBUG, 617 OMP_DEFAULT_DEVICE, 615 OMP_DISPLAY_AFFINITY, 612 OMP_DISPLAY_ENV, 611 OMP_DYNAMIC, 603 OMP_MAX_ACTIVE_LEVELS, 608 OMP_MAX_TASK_PRIORITY, 615 OMP_NESTED, 609 OMP_NUM_THREADS, 602 OMP_PLACES, 605 OMP_PROC_BIND, 604 OMP_SCHEDULE, 601 OMP_STACKSIZE, 607 OMP_TARGET_OFFLOAD, 615 OMP_THREAD_LIMIT, 610 OMP_TOOL, 616 OMP_TOOL_LIBRARIES, 617 OMP_WAIT_POLICY, 608
event, 396
event callback registration, 425 event callback signatures, 459
event routines, 396
execution environment routines, 334 execution model, 20
F
features history, 627
firstprivate, 286
fixed source form conditional compilation
sentinels, 50
fixed source form directives, 41
flush, 242
flush operation, 25
flush synchronization, 27
flush-set, 25
for, C/C++, 101
for simd, 114
frames, 454
free source form conditional compilation
sentinel, 50
free source form directives, 41
G
glossary, 2 guided, 105
H
happens before, 27 header files, 332 history of features, 627
I
ICVs (internal control variables), 63 if Clause, 220
implementation, 619 implementation terminology, 16 implicit barrier, 228
implicit flushes, 246 in_reduction, 303
include files, 332
internal control variables, 619 internal control variables (ICVs), 63 introduction, 1
iterators, 47
L
lastprivate, 288 linear, 290
list item privatization, 279 lock routines, 381
loop, 128
loop terminology, 8
M
map, 315
master, 221
master taskloop, 192 master taskloop simd, 194 memory allocators, 152
memory management, 152 memory management directives
memory management directives, 152 memory management routines, 406 memory model, 23
memory spaces, 152
metadirective, 56
modifying and retrieving ICV values, 68 modifying ICVs, 66
N
nesting of regions, 328 normative references, 31
O
OMP_AFFINITY_FORMAT, 613 omp_alloc, 413 OMP_ALLOCATOR, 618 OMP_CANCELLATION, 610 omp_capture_affinity, 368 OMP_DEBUG, 617 OMP_DEFAULT_DEVICE, 615 omp_destroy_allocator, 410 omp_destroy_lock, 387 omp_destroy_nest_lock, 387 OMP_DISPLAY_AFFINITY, 612 omp_display_affinity, 367 OMP_DISPLAY_ENV, 611 OMP_DYNAMIC, 603
omp_free, 414
omp_fulfill_event, 396 omp_get_active_level, 355 omp_get_affinity_format, 366 omp_get_ancestor_thread_num, 353 omp_get_cancellation, 342 omp_get_default_allocator, 412 omp_get_default_device, 370 omp_get_device_num, 372 omp_get_dynamic, 341 omp_get_initial_device, 376 omp_get_level, 352 omp_get_max_active_levels, 351 omp_get_max_task_priority, 377 omp_get_max_threads, 336 omp_get_nested, 344 omp_get_num_devices, 371 omp_get_num_places, 358 omp_get_num_procs, 338 omp_get_num_teams, 373 omp_get_num_threads, 335 omp_get_partition_num_places,
362
omp_get_partition_place_nums, 363
omp_get_place_num, 362 omp_get_place_num_procs, 359 omp_get_place_proc_ids, 360 omp_get_proc_bind, 357 omp_get_schedule, 347 omp_get_supported_active
_levels, 349 omp_get_team_num, 374
omp_get_team_size, 354 omp_get_thread_limit, 348 omp_get_thread_num, 337 omp_get_wtick, 395 omp_get_wtime, 394 omp_in_final, 356 omp_in_parallel, 339 omp_init_allocator, 409 omp_init_lock, 384, 385 omp_init_nest_lock, 384, 385 omp_is_initial_device, 375
OMP_MAX_ACTIVE_LEVELS, 608
OMP_MAX_TASK_PRIORITY, 615
OMP_NESTED, 609
OMP_NUM_THREADS, 602
omp_pause_resource, 378
omp_pause_resource_all, 380
OMP_PLACES, 605
OMP_PROC_BIND, 604
OMP_SCHEDULE, 601
omp_set_affinity_format, 364
omp_set_default_allocator, 411
omp_set_default_device, 369
omp_set_dynamic, 340
omp_set_lock, 388
omp_set_max_active_levels, 350
omp_set_nest_lock, 388
omp_set_nested, 343
omp_set_num_threads, 334
omp_set_schedule, 345
OMP_STACKSIZE, 607
omp_target_alloc, 397
omp_target_associate_ptr, 403
omp_target_disassociate_ptr, 405
omp_target_free, 399
omp_target_is_present, 400
omp_target_memcpy, 400
omp_target_memcpy_rect, 402
OMP_TARGET_OFFLOAD, 615
omp_test_lock, 392
omp_test_nest_lock, 392
OMP_THREAD_LIMIT, 610
OMP_TOOL, 616
OMP_TOOL_LIBRARIES, 617
omp_unset_lock, 390
omp_unset_nest_lock, 390
OMP_WAIT_POLICY, 608
ompd_bp_device_begin, 598
ompd_bp_device_end, 599
ompd_bp_parallel_begin, 594
ompd_bp_parallel_end, 595
ompd_bp_task_begin, 595
ompd_bp_task_end, 596
ompd_bp_thread_begin, 597
ompd_bp_thread_end, 597
ompd_callback_device_host_fn_t, 554
ompd_callback_get_thread_context_for_thread_id_fn_t, 547
ompd_callback_memory_alloc_fn_t, 546
ompd_callback_memory_free_fn_t, 546
ompd_callback_memory_read_fn_t, 551
ompd_callback_memory_write_fn_t, 553
ompd_callback_print_string_fn_t, 556
ompd_callback_sizeof_fn_t, 549
ompd_callback_symbol_addr_fn_t, 550
ompd_callbacks_t, 556
ompd_dll_locations_valid, 536
ompd_dll_locations, 535
ompt_callback_buffer_complete_t, 487
ompt_callback_buffer_request_t, 486
ompt_callback_cancel_t, 481
ompt_callback_control_tool_t, 495
ompt_callback_dependences_t, 468
ompt_callback_dispatch_t, 465
ompt_callback_device_finalize_t, 484
ompt_callback_device_initialize_t, 482
ompt_callback_flush_t, 480
ompt_callback_implicit_task_t, 471
ompt_callback_master_t, 473
ompt_callback_mutex_acquire_t, 476
ompt_callback_mutex_t, 477
ompt_callback_nest_lock_t, 479
ompt_callback_parallel_begin_t, 461
ompt_callback_parallel_end_t, 463
ompt_callback_sync_region_t, 474
ompt_callback_device_load_t, 484
ompt_callback_device_unload_t, 486
ompt_callback_target_data_op_t, 488
ompt_callback_target_map_t, 492
ompt_callback_target_submit_t, 494
ompt_callback_target_t, 490
ompt_callback_task_create_t, 467
ompt_callback_task_dependence_t, 470
ompt_callback_task_schedule_t, 470
ompt_callback_thread_begin_t, 459
ompt_callback_thread_end_t, 460
ompt_callback_work_t, 464
OpenMP compliance, 31
ordered, 250
P
parallel, 74
parallel loop, 186
parallel master construct, 191
parallel master taskloop, 195
parallel master taskloop simd, 196
parallel sections, 188
parallel workshare, 189
parallel worksharing-loop construct, 185
parallel worksharing-loop SIMD construct, 190
private, 285
R
read, atomic, 234
reduction, 300
reduction clauses, 293
release flush, 27
requires, 60
runtime, 105
runtime library definitions, 332
runtime library routines, 331
S
scan Directive, 132
scheduling, 149
sections, 86
shared, 283
simd, 110
SIMD Directives, 110
Simple Lock Routines, 382
single, 89
stand-alone directives, 42
static, 104
strong flush, 25
synchronization constructs, 223
synchronization constructs and clauses, 223
synchronization hints, 260
synchronization terminology, 9
T
target, 170
target data, 161
target memory routines, 397
target parallel, 203
target parallel loop, 208
target parallel worksharing-loop construct, 205
target parallel worksharing-loop SIMD construct, 206
target simd, 209
target teams, 210
target teams distribute, 211
target teams distribute parallel worksharing-loop construct, 215
target teams distribute parallel worksharing-loop SIMD construct, 216
target teams distribute simd, 213
target teams loop, 214
target update, 176
task, 135
task scheduling, 149
task_reduction, 303
taskgroup, 232
tasking constructs, 135
tasking terminology, 10
taskloop, 140
taskloop simd, 146
taskwait, 230
taskyield, 147
teams, 82
teams distribute, 197
teams distribute parallel worksharing-loop construct, 200
teams distribute parallel worksharing-loop SIMD construct, 201
teams distribute simd, 198
teams loop, 202
thread affinity, 80
threadprivate, 274
timer, 394
timing routines, 394
tool control, 415
tool initialization, 423
tool interfaces definitions, 419, 534
tools header files, 419, 534
tracing device activity, 427
U
update, atomic, 234
V
variables, environment, 601
variant directives, 51
W
wait identifier, 456
wall clock timer, 394
workshare, 92
worksharing
  constructs, 86
  parallel, 185
  scheduling, 109
worksharing constructs, 86
worksharing-loop construct, 101
worksharing-loop SIMD construct, 114
write, atomic, 234