XJCO3221 Parallel Computation

University of Leeds
Lecture 5: Loop parallelism and data races

Previous lectures
In Lecture 3 we saw how two problems could be parallelised:
1 Vector addition, where two one-dimensional arrays were added together element-by-element.
2 Mandelbrot set, where pixel colours were calculated to generate a 2D image.
However, neither of these problems has any data dependencies:
1 Each vector element was calculated independently of the others.
2 Each pixel colour was calculated independently of the others.
Today's lecture
Today we will look at how to parallelise problems that do have data dependencies.
Can lead to data races on shared memory systems. Behaviour then becomes non-deterministic.
May require algorithm re-design to remove dependencies.
We will then look at three examples of loop parallel problems and how their dependencies can be resolved.
Example of a data race
Consider the following pseudocode1 for two concurrent threads, where each thread accesses the same variable x. x=0 at the start of the code segment.
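The listing below is a reconstruction rather than the original, based on the interleaving shown on a later slide (thread 0 adds 1 to x, thread 1 adds 2):

Thread 0:
    a = x;
    a += 1;
    x = a;

Thread 1:
    b = x;
    b += 2;
    x = b;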
What value does x take at the end?
1From §2.6 of McCool et al., Structured Parallel Programming (Morgan Kaufmann, 2012).
Non-determinism
The result may differ each time the program is executed. An example of non-determinism [Lecture 3].
The scheduler runs threads depending on various factors, including other applications and OS tasks.
Cannot predict which thread is launched first.
The OS may suspend threads to make way for other tasks
(pre-emptive multitasking).
The instructions may become interleaved.
Interleaved instructions (example)
Recall x=0 initially.
a = x;     // Thread 0: a now 0
a += 1;    // Thread 0: a now 1
b = x;     // Thread 1: b now 0
x = a;     // Thread 0: x now 1
b += 2;    // Thread 1: b now 2
x = b;     // Thread 1: x now 2
In this example, x=2 at the end.
Possible to get x=1 or x=3 with different interleavings of the instructions (checking this is left as an exercise).
Race conditions
This is known as a data race or a race condition.
Result of calculation depends on which thread reaches its
instructions first.
Non-deterministic, which is usually undesirable.
Only an issue for shared memory.
If each thread had its own x, there would be no race.
In this example, if each thread had its own x, x=1 for thread 0 and x=2 for thread 1 at the end, regardless of any interleaving.
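As a sketch (reusing the reconstructed fragment from earlier, so the exact statements are an assumption), with a thread-local x there is no longer any write to shared data:

Thread 0 (its own x, initially 0):
    a = x;
    a += 1;
    x = a;     // x now 1 for thread 0

Thread 1 (its own x, initially 0):
    b = x;
    b += 2;
    x = b;     // x now 2 for thread 1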
Read-only does not lead to a data race
For a race condition to arise, at least one thread must write to x. No race if all threads just read x.
There is no race for the following example, as both threads only read x:
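The listing for this case is not reproduced here; presumably, consistent with the result stated below, it is the earlier fragment without the writes back to x:

Thread 0:
    a = x;
    a += 1;

Thread 1:
    b = x;
    b += 2;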
For this example, x=0 (and a=1 and b=2) at the end.
Sequential consistency
Have assumed each thread executes its instructions in order, i.e. we have assumed sequential consistency.
Modern compilers often rearrange instructions to improve performance.
e.g. bring forward memory accesses, combine operations etc. The result is the same in serial.
However, these optimisations assume single-threaded execution, so they can break multithreaded code.
Can lead to unexpected results!
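A small, hypothetical illustration (not from the lecture): a thread waiting on a plain shared flag can be broken by exactly this kind of rearrangement, because the compiler may assume nothing in the loop changes the flag.

bool ready = false;   // plain shared flag, set to true by another thread

void wait_for_flag()
{
    // Within a single thread nothing here modifies 'ready', so the
    // compiler may hoist the load out of the loop; the wait may then
    // never end, even after the other thread sets the flag.
    while( !ready ) { }
}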
The volatile keyword
You may read that the way to solve this is to declare variables as volatile (in C/C++). However, this is only partially correct1.
volatile is for special memory, such as that read/written by external devices (memory-mapped I/O).
The way compilers handle such variables is not guaranteed
to work for multithreading.
If this might be an issue, should use features specific to concurrent programming.
e.g. memory fences, std::atomic<> in C++11 etc. Will come to atomics next lecture and Lecture 18.
1S. Meyers, Effective Modern C++ (O'Reilly, 2015).
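As a preview of one of these options (a sketch only; atomics are covered properly next lecture), making the flag from the previous example a C++11 std::atomic<> gives the waiting loop well-defined behaviour:

#include <atomic>

std::atomic<bool> ready{false};   // atomic shared flag

void producer()
{
    // ... produce the data ...
    ready.store( true );          // made visible to the waiting thread
}

void wait_for_flag()
{
    // Every iteration performs a real, ordered load of 'ready';
    // the compiler cannot hoist it out of the loop.
    while( !ready.load() ) { }
}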
Loop parallelism
Often we are required to parallelise loops. Known as loop parallelism.
If there are data dependencies, may have implicit data races within the loop.
There is no systematic way to parallelise loops.
How to remove a dependency depends on context.
May require extra resources or extra loops to reach the correct solution.
For the remainder of this lecture, will give examples of loops with data dependencies, and how to overcome them.
Example 1: Redundant variable
Consider the following serial code:
float temp, a[n], b[n], c[n];
…                             // Initialise arrays b and c
for( i=0; i<n; i++ )
{
    …                         // Loop body uses the shared scratch variable temp
}
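The loop body is not reproduced above, so the following is only a sketch: it assumes a representative, hypothetical body of the form temp = b[i] + c[i]; a[i] = 2.0f*temp; and uses OpenMP to show the usual fix, namely making the redundant variable temp private to each thread (equivalently, declaring it inside the loop):

#include <omp.h>

void example1( int n, float *a, const float *b, const float *c )
{
    int i;
    float temp;

    // private(temp) gives every thread its own copy of the scratch
    // variable, so concurrent iterations no longer race on it.
    #pragma omp parallel for private(temp)
    for( i=0; i<n; i++ )
    {
        temp = b[i] + c[i];    // hypothetical body; the original is not shown
        a[i] = 2.0f * temp;
    }
}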