Intro to Parallel Computing
Topic 6 – Barriers and Mutexes (Critical Sections, Atomics and Locks)
COSC 407: Intro to Parallel Computing
Topic 6 : Barriers and Mutexes (Critical Sections, Atomics and Locks) COSC 407: Intro to Parallel Computing
Copyright By PowCoder代写 加微信 powcoder
Previously (LiveStream):
– Student lead questions
– Basics, HelloWorld
– Distributing the work
– Example: Summing it all up (as array)
– Synchronization (barriers, nowait)
– Mutual Exclusion (critical, atomic, locks)
Topic 6 : Barriers and Mutexes (Critical Sections, Atomics and Locks) COSC 407: Intro to Parallel Computing
Race Condition
§ Unpredictable results when two (or more) threads attempt to read or write the same data simultaneously (previous unprotected example)
global_sum += my_sum;
§ Consider two threads each increases the value of a shared integer variable by one
§ Data Race happens when two or more threads work on a shared data item, where at least one thread is trying to update (write to) this item.
§ Mutual Exclusion: only one thread at a time can have access to a shared resource (there can only be one!)
Topic 6 : Barriers and Mutexes (Critical Sections, Atomics and Locks) COSC 407: Intro to Parallel Computing
§ Barriers are used for threads synchronization
§ All threads must reach the barrier before any of them can proceed
– Wemayusebarriersbetweenpartsofthecodethatread and write to the same memory locations
§ Important
– Eachbarrierneedstobeencounteredbyallthethreadsina
team or none of them
– Thesequenceofwork-sharingregionsandbarrierregions
encountered must be the same for every thread in the team (more on this soon….)
Topic 6 : Barriers and Mutexes (Critical Sections, Atomics and Locks) COSC 407: Intro to Parallel Computing
§ Two types:
– Implicitbarriers:automaticallyaddedforyou • eg end of critical section (as we saw previously(
– Explicitbarriers:programmersaddthemexplicitlyusing #pragma omp barrier
Idle waiting
Topic 6 : Barriers and Mutexes (Critical Sections, Atomics and Locks) COSC 407: Intro to Parallel Computing
Barriers: Example
§ Assume the following two for-loops are executed in parallel.
Without a barrier, a race condition might occur.
– T1mightreadanoldvalueofa[i]beforeithasbeen updated by T0.
Topic 6 : Barriers and Mutexes (Critical Sections, Atomics and Locks) COSC 407: Intro to Parallel Computing
for(i=0; i<100; i++) a[i] = 10;
for(j=0; j<100; j++) b[i] = a[i] + 10;
Barriers: Example, cont’d
§ If you add a barrier between these two parts (i.e., the two for blocks), then you force T1 to wait until the first loop finishes before it resume executing its own for loop
for(i=0; i<100; i++) a[i] = 10;
Topic 6 : Barriers and Mutexes (Critical Sections, Atomics and Locks) COSC 407: Intro to Parallel Computing
for(j=0; j<100; j++) b[i] = a[i] + 10;
Illegal Barriers!
§ The barrier is illegal as it is not encountered by all the threads in the team
– Duetoconditionalstatement,somethreadsmayexecute different paths (Thread Divergence)
Topic 6 : Barriers and Mutexes (Critical Sections, Atomics and Locks) COSC 407: Intro to Parallel Computing
§ The use of barriers is expensive (as threads in a team will be idle for a while)
§ To reduce synchronization, you can add the nowait clause.
§ The implicit barrier at the end pragma construct can be
cancelled with a nowait
– i.e.IfnowaitisusedinanOpenMPconstruct,threadswill not synchronize (wait) at the end of these constructs.
§ Example:
#pragma omp single nowait
Threads don’t have to synchronize here
§ You can use nowait if there is no data dependency between two parts of your code
Topic 6 : Barriers and Mutexes (Critical Sections, Atomics and Locks) COSC 407: Intro to Parallel Computing
Mutual Exclusion in OpenMP
§ Sometimes, only one thread at a line can access a critical section of code or update a variable
– SameconceptaswithPOSIXthreads
§ OpenMP provides several mechanisms for insuring
mutual exclusion in critical sections:
– critical directive
• Unnamed and named critical directives
– atomic directive – simple locks
Topic 6 : Barriers and Mutexes (Critical Sections, Atomics and Locks) COSC 407: Intro to Parallel Computing
critical Directive
§ Race condition may be controlled by defining a critical region # pragma omp critical
*global_result_p += my_result;
§ Critical regions force threads to work one at a time
– Allthreadsrunthecode,butonlyonethreadatatimecanrun
the code in the critical region
– Whenathreadencountersacriticalconstruct,itwaitsuntilno other thread is executing a critical region with the same name
§ All structured blocks modified by an unnamed critical directive form a single critical section
https://www.openmp.org/spec-html/5.0/openmpsu89.html
Topic 6 : Barriers and Mutexes (Critical Sections, Atomics and Locks) COSC 407: Intro to Parallel Computing
critical Directive cont’d
§ On the right, assume we have a concurrent program that has three code sections: non-critical, critical, non-critical.
§ The illustration below shows how 4 threads could be running this program
statement;
statement;
statement;
statement;
statement;
statement;
statement;
statement;
statement;
A thread has to wait if it reaches a critical section that is being executed by another thread
Non-critical region
Critical sections have
no sync barriers at their entry or exit points within a thread
Threads wait for each other at a barrier.
Non-critical region
Critical region
Topic 6 : Barriers and Mutexes (Critical Sections, Atomics and Locks) COSC 407: Intro to Parallel Computing
Multiple critical
§ All critical (unnamed) section belong to the same section. Assume we have this code: non-critical, critical, non-critical, critical, non- critical
§ The illustration below shows a sample run of this program using 3 threads
Notice how the third thread caused the first to wait until the third one is done.
Topic 6 : Barriers and Mutexes (Critical Sections, Atomics and Locks) COSC 407: Intro to Parallel Computing
statement;
statement;
statement;
statement;
statement;
statement;
statement;
statement;
statement;
statement;
statement;
statement;
statement;
Named critical Sections
§ Sometimes you have two critical section and each one must be executed by only one thread at any time but it is ok for both to run in parallel
– Example:updatingtwosharedvariablesaandb
• It is ok that both are updated at the same time, but cannot
be accessed by more than one thread at any time
§ OpenMP provides the option of adding a name to a critical directive:
# pragma omp critical(name)
§ When we do this, two blocks protected with critical directives with different names can be executed simultaneously
§ Example:
# pragma omp critical(name1)
# pragma omp critical(name2)
Topic 6 : Barriers and Mutexes (Critical Sections, Atomics and Locks)
COSC 407: Intro to Parallel Computing
Named critical Sections, cont’d
§ In this illustration, assume we have a concurrent program that has four code sections with two named critical sections :
non-critical, critical A, critical B, non-critical.
§ Consider 4 threads could be running this program A
statement;
statement;
statement;
statement;
statement;
statement;
statement;
statement;
statement;
statement;
statement;
statement;
statement;
Threads can be running the two critical sections A and B concurrently
Topic 6 : Barriers and Mutexes (Critical Sections, Atomics and Locks) COSC 407: Intro to Parallel Computing
Named critical Sections, cont’d
§ Consider the following code sections:
non-critical, critical A, non-critical, critical B, non-critical
§ The illustration below shows one of the scenarios of running this program
– Assumethetopthreadtooklongerthanothersto finish the light brown section.....
In this case, the second thread entered B first, causing the first to wait until the first one is done (there can only be one section running the named section ‘B’ at any one time....
Topic 6 : Barriers and Mutexes (Critical Sections, Atomics and Locks) COSC 407: Intro to Parallel Computing
statement;
statement;
statement;
statement;
statement;
statement;
statement;
statement;
statement;
statement;
statement;
statement;
statement;
atomic Directive
§ Allows threads to safely update a shared variable and avoids
data race conditions
§ Similar to critical directive (protects shared data) and can be faster on many processors (if implemented correctly -> but can cause problems if done incorrectly)
– Acriticalsectionthatonlydoesaload-modify-storecanbe protected much more efficiently using this special instruction
– Onlythememoryupdateisatomic Example:
#pragma omp atomic
*global_result_p += my_result;
Topic 6 : Barriers and Mutexes (Critical Sections, Atomics and Locks) COSC 407: Intro to Parallel Computing
atomic Directive
§ It can only be applied to load-modify-store operation in one of
following forms:
a) A single statement in the form:
x++, ++x, x–, –x
x
x = x
x =
where
b) A structured block in the form:
{ value = x; //any of the expressions in (b) above; } { //any of the expressions in (b) above; value = x; } e.g. {value = x; x += 1;}
Topic 6 : Barriers and Mutexes (Critical Sections, Atomics and Locks) COSC 407: Intro to Parallel Computing
§ All elements are added sequentially, and there is performance penalty for using atomic, because the system coordinates all threads
§ Slower than a serial code! – Eachthreadmustwait!
§ Each thread adds up its local sum.
§ The atomic is only applied for adding up local sums to obtain the total sum
Topic 6 : Barriers and Mutexes (Critical Sections, Atomics and Locks) COSC 407: Intro to Parallel Computing
A lock consists of a data structure and functions that allow the programmer to explicitly enforce mutual exclusion in a critical section.
INITIALIZES a lock-data-structure (by one thread, e.g. master)
#pragma omp parallel
//..not critical section
//..critical section
//..not critical section
DESTROY the lock-data-structure (by e.g. the master)
Topic 6 : Barriers and Mutexes (Critical Sections, Atomics and Locks) COSC 407: Intro to Parallel Computing
A lock consists of a data structure and functions that allow the programmer to explicitly enforce mutual exclusion in a critical section
static omp_lock_t mylock; omp_init_lock(&mylock);
#pragma omp parallel
//not critical code
omp_set_lock(&mylock); //critical region
omp_unset_lock(&mylock); //not critical region
omp_destroy_lock(&mylock);
Topic 6 : Barriers and Mutexes (Critical Sections, Atomics and Locks) COSC 407: Intro to Parallel Computing
lock data structure is shared among the threads
One of the threads (e.g., master) initializes it
Only one thread can own (set) mylock, and only this thread can unset it. While a thread owns mylock, no other thread can enter this critical section. If another thread attempts to enter the critical section, it will block when it calls the lock function.
One of the threads destroys the lock.
When to Use Which?
§ First, think of atomic (usually highest performance)
– Useatomicforsingleload-modify-storestatement(be careful!)
§ Second, think of critical
– Nolargedifferencebetweentheperformancebetween critical directive and locks.
§ Finally, if neither atomic nor critical is possible then use locks
– e.g.,ifyouwanttoleavealockedregionwithjumps(e.g. break or return).
• You cannot leave a region protected by ‘critical’ with a jump
• Don’t forget to unset the lock!!!!
– e.g.,ifyouwanttoprotectasingledatastructure,notthe
complete block of code.
Topic 6 : Barriers and Mutexes (Critical Sections, Atomics and Locks) COSC 407: Intro to Parallel Computing
Some Caveats
Different types of mutual exclusion don’t belong to the same critical region
• Don’t mix the different types for a single critical section There is no guarantee of fairness in mutual exclusion constructs
It can be dangerous to “nest” mutual exclusion constructs – Use named critical sections if you have to nest mutual
exclusion constructs, but even then there is no guarantee to avoid deadlocks
Topic 6 : Barriers and Mutexes (Critical Sections, Atomics and Locks) COSC 407: Intro to Parallel Computing
Conclusion/Up Next
§ What we covered today (review key concepts):
– Synchronization (barriers, nowait)
– Mutual Exclusion (critical, atomic, locks)
– Variablescope(shared,private,firstprivate)
– Reduction
– WorkSharing
Topic 6 : Barriers and Mutexes (Critical Sections, Atomics and Locks) COSC 407: Intro to Parallel Computing
§ Please review
– OpenMPResources(Seeweekthreemodule) – AdditionalresourcesonCanvas
– Runthesamplecodeandtrythechallenge
• You need to be able to run and understand how to approach a problem
• Be familiar with the OpenMP directives introduced so far
Topic 6 : Barriers and Mutexes (Critical Sections, Atomics and Locks) COSC 407: Intro to Parallel Computing
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com