Overview Critical regions Parallel linked list Summary and next lecture
XJCO3221 Parallel Computation
University of Leeds
Lecture 6: Critical regions and atomics
Previous lecture
In the last lecture we looked at data races and loop parallelism:
If different threads access the same memory location, with at least one having write access, there is a potential race condition.
This leads to non-deterministic behaviour, and in loops arises through data dependencies.
Often possible to remove these dependencies, sometimes at the expense of increased parallel overheads.
This lecture
In this lecture we will look at an important concept in parallel programming: synchronisation.
Represents some form of coordination between processing units (threads etc.).
Will arise in many contexts throughout this module. Briefly mentioned in Lecture 4 in the context of fork-join.
Now we will focus on using synchronisation to avoid data races.
Define critical regions which can only be accessed by one thread at a time.
Atomics: Critical regions specialised to single operations. We will say more about atomics in Lecture 18.
Singly linked lists
Serial code on Minerva: linkedList.c
Linked lists are a form of container in which each item is linked together in a chain:
First item is the head, with a global pointer item_t *head. Last item is the tail, with next=NULL.
[Diagram: a chain of item_t nodes, each holding its data and an item_t *next pointer to the next node.]
Note this is a singly linked list; a doubly linked list has arrows 'both ways', i.e. a field item_t *prev;.
List storage
For today's example, the data for each item is just a single integer. To link the items into a list, it is convenient to use a struct in C:
typedef struct item {
    int data;               // The item's data
    struct item *next;      // Next in the list
} item_t;

item_t *head = NULL;        // First item
New items are added using addToList:
for( i=0; i<N; i++ ) addToList( i );

void addToList( int data )
{
    item_t *newItem = (item_t*) malloc( sizeof(item_t) );
    newItem->data = data;
    newItem->next = NULL;

    // Find tail and add newItem to it.
    if( head==NULL )            // i.e. list is empty.
    {
        head = newItem;
    }
    else                        // Find end of list and add to it.
    {
        item_t *tail = head;
        while( tail->next != NULL ) tail = tail->next;
        tail->next = newItem;
    }
}
Linked lists in parallel
Suppose we want to add multiple items in parallel for speed. The obvious approach is to create multiple threads and have each thread add items simultaneously.
In OpenMP, this could be achieved as follows:
#pragma omp parallel for
for( i=0; i<N; i++ )
    addToList( i );

However, all threads now read and modify the shared pointer head and the next pointers, so this naive version contains a data race.
If also removing (or 'popping') items, similar considerations would apply, although things would be more complicated.
Critical regions
It is not easy to remove the data dependencies in this case: we cannot make head local/private, since there is a single head for the entire list, and the list traversal is a while-loop whose trip count is not known at the start.
An alternative strategy is to ensure only one thread can access critical regions of code at a time.
Implemented in OpenMP as #pragma omp critical
Called lock synchronisation, for reasons that will become clear next lecture.
Critical region
Only one thread is allowed to be in a critical region at a time. Until it leaves, no other threads are allowed to enter.
A critical region is defined in OpenMP as follows:
#pragma omp critical
{
    ... // Critical region
}

The critical region is defined by the scope, i.e. from '{' to '}'.
Example for 4 threads
... // Non-critical
... // code executed
... // in parallel

#pragma omp critical
{
    ... // Critical region
}

Thread 1 reaches the critical region first. No other thread can enter until it leaves; exactly one of the waiting threads may then enter.
Performance
There can be a significant performance penalty for critical regions:
1 Need some mechanism to synchronise the threads on entering and leaving the region [cf. Lecture 7].
2 Threads 'blocked' at the start will be idle. This leads to poor load balancing [cf. Lecture 13].
3 The scheduler may suspend idle threads in favour of another (not necessarily yours!). Suspension and restart incur penalties.
Serialisation
Since only one thread can be in a critical region at any time, the critical code is executed in serial.
This is known as serialisation.
Recall Amdahl's law and the Gustafson-Barsis law from Lecture 4, which give the maximum speed-up S in terms of the serial fraction f.
By serialising regions of code we are increasing the value of f.
The maximum speed-up S is reduced, especially for Amdahl's law (i.e. strong scaling), which predicts S ≤ 1/f.
It is therefore important to restrict the number and size of critical regions to ensure reasonable parallel performance.
First attempt: Serialise calls to addToList()
#pragma omp parallel for
for( i=0; i<N; i++ )
{
    item_t *newItem = (item_t*) malloc( sizeof(item_t) );
    newItem->data = i;
    newItem->next = NULL;
    ...
}
newItem is local/private to each thread.
Each thread will create its own item independently of the other threads.
Each value of loop counter i will be mapped one-to-one to a value of data.
This is the behaviour we want! (so far. . . )
The data dependencies in the remainder of addToList() can be removed by placing this portion in a critical region:
#pragma omp critical
{
    if( head==NULL )            // i.e. list is empty.
    {
        head = newItem;
    }
    else                        // Find end of list and add to it.
    {
        item_t *tail = head;
        while( tail->next != NULL ) tail = tail->next;
        tail->next = newItem;
    }
}
Performance slightly improved compared to the previous attempt.
Making routines thread safe
Note the strategy followed for making addToList() thread safe:
1 Identify the data dependencies.
2 Enclose them in critical regions.
3 Reduce the size and/or number of critical regions until the required performance is achieved.
4 If further scaling is required, the algorithm may need to be rethought completely.
Often the data dependency is only a single arithmetic operation. For instance, counting the number of items in an array of size n
that obey some condition:
int count = 0;
for( i=0; i<n; i++ )
    if( a[i] > 0 )      // Some condition on each item.
        count++;