
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
Computer Graphics
OpenMP Tasks


tasks.pptx
mjb – March 16, 2022
Remember OpenMP Sections?
Sections are independent blocks of code, able to be assigned to separate threads if they are available.
#pragma omp parallel sections
{
    #pragma omp section
    {
        // code for the first section
    }

    #pragma omp section
    {
        // code for the second section
    }
}
There is an implied barrier at the end
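For completeness, here is a minimal compilable sketch of the sections pattern above; the Task1( ) and Task2( ) function names and bodies are placeholders, not from the original slide:

#include <stdio.h>
#include <omp.h>

// placeholder work functions -- names are illustrative only
void Task1( ) { fprintf( stderr, "Task1 ran on thread %d\n", omp_get_thread_num( ) ); }
void Task2( ) { fprintf( stderr, "Task2 ran on thread %d\n", omp_get_thread_num( ) ); }

int main( )
{
    omp_set_num_threads( 2 );
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            Task1( );
        }

        #pragma omp section
        {
            Task2( );
        }
    }		// implied barrier here
    return 0;
}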
OpenMP sections are static, that is, they are good if you know, when you are writing the program, how many of them you will need.

It would be nice to have something more Dynamic
Imagine a capability where you can write a to-do item down on a Post-It® note, accumulate the Post-It notes, and then have all of the threads together execute that set of tasks.
You would also like to not have to know, ahead of time, how many of these Post-It notes you will write. That is, you want the total number to be dynamic.
Well, congratulations, you have just invented OpenMP Tasks!
OpenMP Tasks
• An OpenMP task is a single line of code or a structured block which is immediately “written down” in a list of tasks.
• The new task can be executed immediately, or it can be deferred.
• If the if clause is used and the argument evaluates to 0, then the task is executed immediately, superseding whatever else that thread is doing.
• There has to be an existing parallel thread team for this to work. Otherwise one thread ends up doing all tasks and you don’t get any contribution to parallelism.
• One of the best uses of this is to process elements of a linked list or a tree.
You can create a task barrier with:
#pragma omp taskwait
(Both the if clause and the task barrier are illustrated in the sketch below.)
Tasks are very much like OpenMP Sections, but Sections are static: the number of sections is fixed when you write the code, whereas Tasks can be created at any time, and in any number, under the control of your program's logic.
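Here is a small sketch of the if clause and the taskwait barrier described above; the SmallJob( ) and BigJob( ) functions and the cutoff value are illustrative assumptions, not from the slides:

#include <stdio.h>
#include <omp.h>

// hypothetical work functions -- names are illustrative only
void SmallJob( ) { fprintf( stderr, "small job on thread %d\n", omp_get_thread_num( ) ); }
void BigJob( )   { fprintf( stderr, "big job on thread %d\n",   omp_get_thread_num( ) ); }

int main( )
{
    int n = 10;			// pretend problem size

    omp_set_num_threads( 4 );
    #pragma omp parallel
    {
        #pragma omp single	// one thread creates the tasks (the "single" pragma is explained below)
        {
            // if the argument of the if clause evaluates to 0 (false), the task is
            // not deferred -- it is executed immediately by the encountering thread
            #pragma omp task if( n > 100 )
                SmallJob( );

            #pragma omp task
                BigJob( );

            #pragma omp taskwait	// task barrier: wait here until both tasks have completed
        }
    }
    return 0;
}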

OpenMP Task Example: Something (Supposedly) Simple
omp_set_num_threads( 2 );

#pragma omp parallel default(none)		// without this, thread #0 has to do everything
{
    #pragma omp task
        fprintf( stderr, "A\n" );		// writes fprintf( stderr, "A\n" ); on a sticky note and adds it to the list of tasks

    #pragma omp task
        fprintf( stderr, "B\n" );		// writes fprintf( stderr, "B\n" ); on a sticky note and adds it to the list of tasks
}

The #pragma omp task directive adds the next line of code (or block of code) to the list of tasks.
If You Run This a Number of Times, You Get This: (Uh-oh, What Happened?)

[Output of five runs: each run prints four letters (A's and B's), in a different order each time]
1. Why do we not get the same output every time?
2. Why do we get 4 things printed when we only have print statements in 2 tasks?
Not so simple, huh?
The first answer is easy. Unless you make some special arrangements, the order of execution of the different tasks is undefined.
The second answer is that we actually asked the two threads to each put two tasks on the sticky notes, for a total of four. How can we get only one thread to do this?

The “single” Pragma
omp_set_num_threads( 2 );

#pragma omp parallel default(none)
{
    #pragma omp single
    {
        #pragma omp task
            fprintf( stderr, "A\n" );

        #pragma omp task
            fprintf( stderr, "B\n" );
    }
}
When using Tasks, you only want one thread to write the to-do items down on the sticky notes, but you want all of the threads to be able to execute them.
But, if you run this, the order of printing will still be non-deterministic. If you care about order, do this:
omp_set_num_threads( 2 );

#pragma omp parallel default(none)
{
    #pragma omp single
    {
        #pragma omp task
            fprintf( stderr, "A\n" );

        #pragma omp taskwait		// waits here until all tasks created so far have completed

        #pragma omp task
            fprintf( stderr, "B\n" );

        #pragma omp taskwait		// waits here until all tasks created so far have completed
    }
}

A Better OpenMP Task Example: Processing each Element of a Linked List
#pragma omp parallel default(none) shared(listHead)	// without this, thread #0 has to do everything
{
    #pragma omp single					// without this, each thread does a full traversal – bad idea!
    {
        element *p = listHead;
        while( p != NULL )
        {
            #pragma omp task firstprivate(p)		// writes "Process( p )" on a sticky note and adds it to the list;
							// firstprivate(p) copies the current value of p into the task and
							// immediately makes it private (i.e., not shared)
                Process( p );
            p = p->next;
        }
    }

    #pragma omp taskwait	// put this here if you want to wait for all tasks to finish being executed before proceeding
}
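For context, here is one way the linked-list example could look as a complete program; the element struct, the Process( ) body, and the list construction are assumptions, not from the slide:

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

// placeholder node type -- illustrative only
typedef struct element
{
    int			value;
    struct element	*next;
} element;

// placeholder per-element work
void Process( element *e )
{
    fprintf( stderr, "processed %d on thread %d\n", e->value, omp_get_thread_num( ) );
}

int main( )
{
    // build a small list: 0 -> 1 -> ... -> 9
    element *listHead = NULL;
    for( int i = 9; i >= 0; i-- )
    {
        element *e = (element *) malloc( sizeof(element) );
        e->value = i;
        e->next  = listHead;
        listHead = e;
    }

    omp_set_num_threads( 4 );
    #pragma omp parallel default(none) shared(listHead)
    {
        #pragma omp single
        {
            element *p = listHead;
            while( p != NULL )
            {
                #pragma omp task firstprivate(p)	// one task per list element
                    Process( p );
                p = p->next;
            }
        }
        #pragma omp taskwait
    }
    return 0;
}

One thread walks the list and writes the tasks down; every thread in the team helps execute them.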
One more thing – Task Dependencies

Remember from before: unless you make some special arrangements, the order of execution of the different tasks is undefined. Here come some special arrangements.

omp_set_num_threads( 3 );

#pragma omp parallel
{
    #pragma omp single
    {
        float a, b, c;

        #pragma omp task shared(a) depend( out: a )		// shared( ) so the values flow between the tasks
            a = … ;						// produce a

        #pragma omp task shared(a,b) depend( in: a ) depend( out: b )
            b = a + 16.;

        #pragma omp task shared(b,c) depend( in: b )
            c = b + 12.;

        #pragma omp taskwait
    }
}
This maintains the proper dependencies but, because it chains all of the tasks together, it essentially serializes the parallelism out of them. Be careful not to go overboard with dependencies!
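By contrast, here is a sketch (with illustrative values) of dependencies that do not take all of the parallelism away: the two producer tasks below have no dependence on each other, so they are free to run concurrently, and only the consumer has to wait:

#include <stdio.h>
#include <omp.h>

int main( )
{
    float a, b, c;

    omp_set_num_threads( 3 );
    #pragma omp parallel shared(a,b,c)
    {
        #pragma omp single
        {
            #pragma omp task depend( out: a )	// producer #1
                a = 10.;

            #pragma omp task depend( out: b )	// producer #2 -- no dependence on producer #1, so the two can run at the same time
                b = 16.;

            #pragma omp task depend( in: a, b )	// consumer: waits on both producers, but nothing else is serialized
                c = a + b;

            #pragma omp taskwait
            fprintf( stderr, "c = %f\n", c );
        }
    }
    return 0;
}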

Tree Traversal Algorithms

Given a tree:
• We would like to traverse it as quickly as possible.
• We are assuming that we do not need to traverse it in order.
• We just need to visit all nodes.
• This is common in graph algorithms, such as searching.
• If the tree is binary and is balanced, then the maximum depth of the tree is log2(# of Nodes)
• Strategy at a node (this order could be re-arranged, depending on what you are trying to do):
  1. Follow one descendant node
  2. Follow the other descendant node
  3. Process the node you're at

Tree Traversal Algorithms
#pragma omp parallel			// without this, thread #0 has to do everything – bad idea!
{
    #pragma omp single			// without this, each thread does a full traversal – bad idea!
        Traverse( root );

    #pragma omp taskwait		// put this here if you want to wait for all nodes to be traversed before proceeding
}
Parallelizing a Binary Tree Traversal with Tasks
void Traverse( Node *n )
{
    if( n->left != NULL )
    {
        #pragma omp task firstprivate(n) untied		// the task gets its own copy of n
            Traverse( n->left );
    }

    if( n->right != NULL )
    {
        #pragma omp task firstprivate(n) untied
            Traverse( n->right );
    }

    #pragma omp taskwait	// put this here if you want to wait for both branches to be taken before processing the parent

    Process( n );
}

Benchmarking a Binary Task-driven Tree Traversal
#define NUM	1024*1024

void Process( Node *n )
{
    for( int i = 0; i < NUM; i++ )
    {
        n->value = pow( n->value, 1.01 );
    }
}
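Putting the last few slides together, here is a sketch of what a complete benchmark program might look like; the Node layout, the BuildTree( ) routine, NUMLEVELS, and the thread count are assumptions, not from the slides:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <omp.h>

#define NUM		( 1024*1024 )
#define NUMLEVELS	6		// depth of the balanced tree -- illustrative value

typedef struct Node
{
    float	value;
    struct Node	*left, *right;
} Node;

// build a full binary tree of the requested depth -- illustrative only
Node * BuildTree( int level )
{
    if( level == 0 )
        return NULL;
    Node *n = (Node *) malloc( sizeof(Node) );
    n->value = 1.0001f;
    n->left  = BuildTree( level-1 );
    n->right = BuildTree( level-1 );
    return n;
}

void Process( Node *n )
{
    for( int i = 0; i < NUM; i++ )
        n->value = pow( n->value, 1.01 );
}

void Traverse( Node *n )
{
    if( n->left != NULL )
    {
        #pragma omp task firstprivate(n) untied
            Traverse( n->left );
    }
    if( n->right != NULL )
    {
        #pragma omp task firstprivate(n) untied
            Traverse( n->right );
    }
    #pragma omp taskwait	// wait for both sub-trees before processing this node
    Process( n );
}

int main( )
{
    Node *root = BuildTree( NUMLEVELS );
    omp_set_num_threads( 8 );

    double time0 = omp_get_wtime( );
    #pragma omp parallel
    {
        #pragma omp single
            Traverse( root );
    }
    double time1 = omp_get_wtime( );
    fprintf( stderr, "%8.3lf seconds\n", time1 - time0 );
    return 0;
}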
Parallelizing a Binary Tree Traversal with Tasks

[Figure: a small example binary tree, traversed by calling Traverse( A ); the nodes shown are numbered 1, 2, 4, 5, 8, 9, 11, and 12]

Parallelizing a Binary Tree Traversal with Tasks: Tied (g++ 10.2)

[Figure: the same tree traversed with Traverse( A ); using tied tasks; each node is labeled with the thread that processed it — node:thread pairs 1:3, 2:3, 4:1, 5:1, 8:3, 9:2, 11:2, 12:2]
Parallelizing a Binary Tree Traversal with Tasks: Untied (g++ 10.2)

[Figure: the same tree traversed with Traverse( A ); using untied tasks; node:thread pairs 1:3, 2:3, 4:0, 5:0, 8:0, 9:3, 11:1, 12:1]

How Evenly Tasks Get Assigned to Threads: g++ vs. icpc

[Bar charts: number of tasks assigned to each thread, for a 6-level and a 12-level tree, compiled with g++ 10.2 vs. icpc 15.0.0]
How Evenly Tasks Get Assigned to Threads: g++ 4.9 vs. g++ 10.2

[Bar charts: number of tasks assigned to each thread, for a 6-level and a 12-level tree, compiled with g++ 4.9 vs. g++ 10.2]

How Evenly Tasks Get Assigned to Threads: Tied vs. Untied

[Bar charts: number of tasks assigned to each thread, for a 6-level and a 12-level tree, g++ 10.2, tied vs. untied tasks]
Performance vs. Number of Threads

[Graph: nodes processed per second vs. number of threads, with separate curves for different numbers of tree levels]

Performance vs. Number of Levels

[Graph: nodes processed per second vs. number of tree levels, with separate curves for different numbers of threads]

8-thread Speed-up ≈ 6.7		Fp ≈ 97%
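For reference, the 97% figure is consistent with the inverse-Amdahl estimate of the parallel fraction (assuming that is how it was computed here), with n = 8 threads and a measured speed-up S ≈ 6.7:

#include <stdio.h>

int main( )
{
    double n = 8.;		// number of threads
    double S = 6.7;		// measured speed-up
    // inverse Amdahl's Law estimate of the parallel fraction:
    double Fp = ( n / (n-1.) ) * ( 1. - 1./S );
    printf( "Fp = %5.3f\n", Fp );	// prints Fp = 0.972, i.e., about 97%
    return 0;
}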

Parallelizing a Tree Traversal with Tasks: Summary
• Tasks get spread among the current “thread team”
• Tasks can execute immediately or can be deferred. They are executed at “some time”.
• Tasks can be moved between threads, that is, if one thread has a backlog of tasks to do, an idle thread can steal some of that work.
• Tasks are more dynamic than Sections. The task paradigm would still work if there were a variable number of children at each node (see the sketch below).
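As a hedged illustration of that last point, here is a sketch of the same pattern on nodes with a variable number of children; the NNode layout and ProcessN( ) are made up for this example:

// hypothetical n-ary node: each node owns an array of child pointers
typedef struct NNode
{
    float	value;
    int		numChildren;
    struct NNode **children;
} NNode;

// stand-in for the per-node work, analogous to Process( ) above
void ProcessN( NNode *n )
{
    n->value *= 1.01f;
}

void TraverseN( NNode *n )
{
    // spawn one task per child -- the number of children can differ from node to node
    for( int i = 0; i < n->numChildren; i++ )
    {
        NNode *child = n->children[i];
        #pragma omp task firstprivate(child) untied
            TraverseN( child );
    }

    #pragma omp taskwait	// wait for all of the children before processing this node
    ProcessN( n );
}

As before, this would be launched from inside a parallel region by a single thread: #pragma omp parallel, then #pragma omp single, then TraverseN( root );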
