Parallelism and Threads
Further reading: Sections 12.1-12.4, and the Parallelism slides Part 1 on Edstem, which include data-parallel SIMD examples.
Content based upon Dr.
COMMONWEALTH OF AUSTRALIA Copyright Regulations 1969 WARNING
This material has been reproduced and communicated to you by or on behalf of the University of Sydney pursuant to Part VB of the Copyright Act 1968 (the Act).
The material in this communication may be subject to copyright under the Act. Any further copying or communication of this material by you may be the subject of copyright protection under the Act.
Do not remove this notice.
Outline
• Forms of parallelism
§ Task-parallelism
§ Data-parallelism
• Examples
• POSIX Threads
§ Introduction
§ Thread creation, termination
§ Passing arguments to threads
§ Thread synchronization upon termination (pthread_join)
§ Thread scheduling
§ Basic Synchronisation with mutex
Example: preparing a salad
Tasks to do:
§ tear leaves
§ wash leaves
§ chop leaves
§ wash tomatoes
§ slice tomatoes
§ apply dressing
§ mix
Single Chef Salad (sequential)
[Figure: timeline over execution time (seconds) showing tasks T1 tear leaves, T2 wash leaves, T3 chop leaves, T4 wash tomatoes, T5 slice tomatoes, T6 apply dressing, T7 mix, processed one after the other.]
§ Preparing a salad consists of 7 tasks (T1- T7).
§ A single chef prepares a salad by processing the 7 tasks sequentially (one after the other).
§ It takes a single chef 142 seconds to prepare a salad.
§ We can speed up the process by making the chef work faster.
Fast Single Chef Salad (sequential)
[Figure: the same timeline over execution time (seconds), with tasks shortened, e.g., chop leaves (T3) now finishes earlier.]
If we can convince the Chef to finish every task in 18 instead of 20 seconds, we would reduce the process to 7×18 = 126 seconds.
The Chef agrees for Tasks T1-T6.
However, he cannot speed up Task T7 without spoiling the entire kitchen.
He refuses to work any faster than that.
We can speed up the process by bringing in an additional Chef that works in parallel to Chef 1.
[Figure: two timelines over execution time (seconds): Chef 1 handles T1 tear leaves, T2 wash leaves, T3 chop leaves, T6 apply dressing, T7 mix; Chef 2 handles T4 wash tomatoes and T5 slice tomatoes in parallel.]
Two Chefs in Parallel Salad
Chef 2 washes the tomatoes (Task T4) while Chef 1 tears the lettuce (Task T1).
Task T1 is processed in parallel to Task T4. Processing tasks in parallel is called task-parallelism.
Another form of task-parallelism occurs between tasks T2 and T5.
There exist dependencies between tasks:
§ Vegetables must be washed before they are chopped/sliced.
§ Mixing (T7) can only be done after the dressing has been applied (T6).
Task-dependency graph
Dependencies between tasks do not allow further task-parallelism with this example.
A task-dependency graph shows dependencies between tasks.
§ directed acyclic graph (DAG)
§ graph nodes represent tasks
§ a directed edge Ta → Tb indicates that Tb can only be processed after Ta has completed.
§ See example graph to the left.
Two Chefs in Parallel (task and data parallelism)
§ Dependencies between tasks do not allow further task-parallelism with this example.
§ However, we can have several Chefs work in parallel on the “data” of a task.
§ This form of parallelism is called data-parallelism.
§ Example 1: Chef 1 and Chef 2 chop one half of the lettuce each (Tasks T3a and T3b).
§ Example 2: On the next slide more data-parallelism is introduced.
[Figure: two-chef timeline in which tasks are split into parts, e.g., T1a/T1b, T2a/T2b, T3a/T3b, processed by different chefs.]
T1 || T4 means: Task T1 executes in parallel to Task T4.
The final outcome:
§ Task parallelism: T1 || T4, T2 || T5
§ Data parallelism: T2a || T2b, T5a || T5b, T3a || T3b || T3c || T3d
§ We can still increase data parallelism with the lettuce by bringing in more Chefs. What about limits?
Definitions
Task: a computation that consists of a sequence of instructions. The computation is a distinct part of a program or algorithm. (That is, programs and algorithms can be decomposed into tasks.)
Examples: “washing lettuce”, “initialize data-structures”, “sort array”, …
Task-parallelism: parallelism achieved from executing different tasks at the same time (i.e., in parallel).
Data-parallelism: performing the same task on different data items at the same time.
Example: 2 Chefs slicing 1 tomato each. (tomato = data, slicing ~ task).
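To make the division of data concrete, here is a minimal sketch (the names N and P and the chunking scheme are illustrative, not from the slides) of how a data-parallel task typically splits its data among P workers: worker t gets the contiguous range [t*N/P, (t+1)*N/P).

#include <stdio.h>

#define N 10   /* data items, e.g., tomatoes */
#define P 3    /* workers, e.g., chefs */

int main(void) {
    int t;
    for (t = 0; t < P; t++) {
        int lo = t * N / P;          /* first item of worker t's chunk */
        int hi = (t + 1) * N / P;    /* one past the last item */
        printf("worker %d handles items %d..%d\n", t, lo, hi - 1);
    }
    return 0;
}

With N = 10 and P = 3 this prints chunks 0..2, 3..5 and 6..9: the arithmetic distributes any remainder automatically.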
Definitions (cont.)
Dependencies: an execution order between two tasks Ta and Tb. Ta must complete before Tb can execute. Notation: Ta → Tb. Dependencies limit the amount of parallelism in an application.
Example task dependency graphs:
dependencies, no parallelism possible: [figure: tasks forming a chain]
no dependencies, maximum parallelism: [figure: independent tasks, no edges]
Definitions (cont. cont.)
Dependencies impose a partial ordering on the tasks:
Two tasks Ta and Tb can execute in parallel iff
1) there is no path in the dependence graph from Ta to Tb, and
2) there is no path in the dependence graph from Tb to Ta.
Dependency is transitive:
Ta → Tb and Tb → Tc implies Ta → Tc
Say: Ta has to complete before Tb, and Tb has to complete before Tc; therefore Ta has to complete before Tc.
What about this? [figure: another example task-dependency graph]
Who will do the actual parallelization ?
• The compiler?
§ Would be nice. Programmers could continue writing high-level language programs
§ The compiler would find a task-decomposition for a given multicore processor.
§ Unfortunately this approach does not work (yet).
• Especially heterogeneous multiprocessors are difficult to program.
• The speed-up gained from automatic parallelization is limited.
§ Parallelism from automatic parallelization is called implicit parallelism.
• The programmer?
§ Yes! (contents of this course)
• Needs to understand the program to find a task-decomposition.
• Needs to understand the hardware to achieve a task-decomposition that fits the underlying hardware.
• Needs to take care of communication & coordination among tasks.
§ Parallelism introduced by the programmer themselves is called explicit parallelism.
§ The research community is working on programming languages and tools that ease this task.
Outline
• Forms of parallelism ✓
§ Task-parallelism ✓
§ Data-parallelism ✓
• Examples
Task Parallelism Example
Example: Assume we have a large data-set in an array and the task is to compute the minimum, average and maximum value. This task can be decomposed into 3 tasks: computing minimum (T1), average (T2), and maximum value (T3).
#define maxN 1000000000
int m[maxN];

int i;
int min = m[0]; int max = m[0]; double avrg = m[0];
for (i = 1; i < maxN; i++) {
    if (m[i] < min)
        min = m[i];
    avrg = avrg + m[i];
    if (m[i] > max)
        max = m[i];
}
avrg = avrg / maxN;
#define maxN 1000000000
int m[maxN];

/* T1: minimum */
int i; int min = m[0];
for (i = 1; i < maxN; i++) {
    if (m[i] < min)
        min = m[i];
}

/* T2: average */
int j; double avrg = m[0];
for (j = 1; j < maxN; j++) {
    avrg = avrg + m[j];
}
avrg = avrg / maxN;

/* T3: maximum */
int k; int max = m[0];
for (k = 1; k < maxN; k++) {
    if (m[k] > max)
        max = m[k];
}
Dependence graph: [figure: T1, T2, T3 as independent nodes, no edges between them]
Note: if T1 – T3 should execute in parallel, then each task needs its own loop index variable (i, j, k)!
Task Parallelism Example (cont.)
§ The problem is now decomposed into three tasks T1, T2, T3.
(The same three-task listing (T1, T2, T3) as above.)
§ However: still sequential (T1, then T2, then T3).
§ We need a way to tell the compiler that tasks T1, T2 and T3 shall be executed in parallel.
Outline
• Forms of parallelism ✓
§ Task-parallelism ✓
§ Data-parallelism ✓
• Examples ✓
• POSIX Threads
§ Introduction
§ Thread creation, termination
§ Passing arguments to threads
§ Thread synchronization upon termination (pthread_join)
§ Thread scheduling
§ Basic Synchronisation with mutex
Introduction
• We need a way to tell the compiler that a sequence of instructions (=Task) shall be executed in parallel.
§ Examples: Tasks T1 (min), T2 (max).
• We need to be able to specify communication and synchronization between tasks.
§ Example communication: task T1 needs to communicate the computed minimum value (min) back to task T3 for output.
§ Example synchronization: task T3 can only output the min and max results after they have been received from task T1 and T2.
parallel min, max computations on integer array:

#define maxN 1000000000
int m[maxN];

/* T1: minimum */
int i; int min = m[0];
for (i = 1; i < maxN; i++) {
    if (m[i] < min)
        min = m[i];
}

/* T2: maximum */
int k; int max = m[0];
for (k = 1; k < maxN; k++) {
    if (m[k] > max)
        max = m[k];
}

/* T3: output */
printf("min %d max %d\n", min, max);
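To preview where this section is heading, here is a sketch of how T1 and T2 could be mapped onto POSIX threads (pthread_create and pthread_join are introduced below; the function names t1_min/t2_max and the reduced array size are choices made for this example, not from the slides):

#include <pthread.h>
#include <stdio.h>

#define maxN 1000000              /* reduced from 1000000000 so the example runs as-is */
int m[maxN];
int min, max;                     /* results, written by T1 and T2 respectively */

void *t1_min(void *arg) {         /* T1: compute minimum */
    int i;
    min = m[0];
    for (i = 1; i < maxN; i++)
        if (m[i] < min) min = m[i];
    return NULL;
}

void *t2_max(void *arg) {         /* T2: compute maximum */
    int k;
    max = m[0];
    for (k = 1; k < maxN; k++)
        if (m[k] > max) max = m[k];
    return NULL;
}

int main(void) {
    pthread_t tid1, tid2;
    int i;
    for (i = 0; i < maxN; i++)    /* some test data */
        m[i] = i % 1000;
    pthread_create(&tid1, NULL, t1_min, NULL);   /* start T1 */
    pthread_create(&tid2, NULL, t2_max, NULL);   /* start T2 */
    pthread_join(tid1, NULL);     /* T3 must wait for T1 and T2 ... */
    pthread_join(tid2, NULL);
    printf("min %d max %d\n", min, max);   /* ... before it can output the results */
    return 0;
}

The two pthread_join calls implement exactly the synchronization requirement described above: T3 (the printf) runs only after T1 and T2 have delivered their results.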
Introduction
• Communication between processes requires specific IPC mechanisms.
§ The task is so small, why not just share access to the memory of that process to begin with?
• A thread is a series of instructions that are executed in the memory context of a process
• So far in this course, every process described has had exactly one thread.
• To execute a task, we can create a new thread.
• A thread shares the virtual memory of a process with all other threads
§ Shares heap § Shares static § Shares code
• A thread has its own thread ID, program counter, registers, and stack.
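A small sketch illustrating the shared address space (the variable names are chosen for this example): a thread writes to a global variable, and after joining, the main thread sees the update.

#include <pthread.h>
#include <stdio.h>

int shared = 0;                 /* global: visible to all threads */

void *writer(void *arg) {
    int local = 42;             /* each thread has its own stack ... */
    shared = local;             /* ... but globals and the heap are shared */
    return NULL;
}

int main(void) {
    pthread_t tid;
    pthread_create(&tid, NULL, writer, NULL);
    pthread_join(tid, NULL);    /* wait, so the write is guaranteed visible */
    printf("shared = %d\n", shared);   /* prints 42 */
    return 0;
}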
The POSIX threads API
• API = Application Programming Interface: a set of functions, procedures or classes provided by an operating system or library for use by application programs.
• Many companies provide vendor-specific APIs to program threads.
• IEEE provides the 1003.1c-1995 POSIX API standard
§ also referred to as Pthreads (for "POSIX threads").
§ POSIX: Portable Operating System Interface (+X for “UniX”)
• Most operating systems support Pthreads
• POSIX threads allow us to create tasks
§ The sequence of instructions of a task becomes a POSIX thread.
§ POSIX threads can communicate and synchronize with each other.
§ The operating system schedules threads (i.e., lets them execute). Works both on uniprocessors and multicores.
• Studying Pthreads will make it easy for you to work with other types of threads.
• Pthreads are widely used today, e.g., with …
Our first POSIX Thread
• To compile this program, you need to specify -lpthread
§ Tells GCC to link in the POSIX thread library.
• ps -eLf | grep hello shows two threads:
§ the main thread, executing function main()
§ my_thread, executing function threadF()
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

void *threadF(void *arg) {
    printf("Hello from our first POSIX thread.\n");
    sleep(60);
    return NULL;
}

int main(void) {
    pthread_t my_thread;
    pthread_create(&my_thread, NULL, threadF, NULL);
    printf("Hello from the main thread.\n");
    pthread_join(my_thread, NULL);
    return 0;
}
~]$ gcc -o hello hello.c -lpthread
Our first POSIX Thread (cont.)
• The main thread of program hello calls pthread_create() to create the POSIX thread ‘my_thread’.
• threadF is a pointer to the function that this thread will execute.
• After its creation, the new thread starts executing its thread function (threadF).
[Diagram: the main thread of hello runs; pthread_create() marks the start of my_thread, which then executes in parallel with the main thread.]
• my_thread and the main thread execute in parallel.
• The main thread calls pthread_join() to wait for my_thread to terminate.
§ The call to pthread_join() returns after my_thread has terminated.
§ We say that the main thread is blocked (has to wait).
[Diagram: the main thread calls pthread_join() and waits (is blocked) until the termination of my_thread.]
Execution Indeterminism
[Diagram: timelines of the hello main thread and my_thread. After pthread_create() returns, we know that my_thread exists. Only after pthread_join() returns do we know that my_thread has terminated.]
For threads that execute in parallel, no assumption about the statement execution order between threads is possible!
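A minimal sketch of this indeterminism (the function name greeter is mine): run the program several times and the order of the two messages may differ between runs.

#include <pthread.h>
#include <stdio.h>

void *greeter(void *arg) {
    printf("Hello from the new thread.\n");
    return NULL;
}

int main(void) {
    pthread_t tid;
    pthread_create(&tid, NULL, greeter, NULL);
    printf("Hello from the main thread.\n");   /* may print before or after the thread's message */
    pthread_join(tid, NULL);
    return 0;
}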
pthread_create()
// Purpose: create a new thread.
int pthread_create(
    pthread_t *tid,                      // pointer to thread ID
    const pthread_attr_t *attr,          // pointer to thread attributes
    void * (*start_routine) (void *),    // pointer to function that the thread will execute
    void *arg                            // pointer to argument (run-time argument passing)
);
Arguments:
1. The thread ID variable that will be used for the created thread.
§ (We need a way to refer to the thread, e.g., when we want to join it.)
2. The thread's attributes (explained on subsequent slides).
§ NULL specifies default attributes.
3. The function that the thread will execute once it is created (start_routine).
§ void * (*start_routine) (void *) is a function pointer to a function that
§ has void * as its return value
§ accepts one void * argument
4. The argument that will be passed to start_routine() at run-time.
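One detail worth knowing: pthread_create() returns 0 on success and an error number otherwise (it does not set errno). A common checking pattern, sketched here with an illustrative thread function:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void *task(void *arg) { return NULL; }   /* placeholder thread function */

int main(void) {
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, task, NULL);
    if (rc != 0) {                       /* pthread functions return an error number, not -1 */
        fprintf(stderr, "pthread_create: %s\n", strerror(rc));
        exit(EXIT_FAILURE);
    }
    pthread_join(tid, NULL);
    return 0;
}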
pthread_create()
Once created, threads are peers
• may create other threads
• no implied hierarchy or dependency between threads
• No assumption on execution order of siblings possible.
§ Example: assume thread2 creates thread3 and then thread4.
§ You must not assume that thread3 will start execution before thread4.
§ Except when applying Pthread scheduling mechanisms.
[Diagram: the main() thread calls pthread_create() several times; the created threads may themselves call pthread_create() to create further threads.]
Passing Arguments to a Thread Function
§ pthread_create() allows us to pass one argument to a thread.
• Inside the loop several threads are created.
• The same thread function (tfunc) is used for each thread.
§ Common scenario with data parallelism.
• Thread argument passing is used to differentiate between threads.
• Example:
§ tell each thread on which part of a data array it shall work.
§ Tell each cook which tomato to slice.
 1  #include <pthread.h>
 2  #include <stdio.h>
 3  #define NUMT 4                        /* number of threads (value assumed) */
 4  pthread_t threadIDs[NUMT];            /* thread IDs */
 5  void * tfunc (void * p) {
 6      return NULL;                      /* thread work function goes here */
 7  }
 8  int main(void) {
 9      int i;
10      for (i = 0; i < NUMT; i++) {
11          pthread_create(&threadIDs[i], NULL,
12                         tfunc, ??? );  /* what do we pass here? */
13      }
14      for (i = 0; i < NUMT; i++) {
15          pthread_join(threadIDs[i], NULL);
16      }
17      return 0;
18  }
Passing Arguments to a Thread Function (cont.)
• In Line 4 we declare an array of integer variables. Each thread has its own “private” array element as argument.
• Each thread will receive a pointer to its “private” array element as argument.
§ In Line 14, &args[i] determines the address of the ith integer variable.
§ Taking the address of an integer variable gives us an integer pointer → type-cast to a void *.
§ In Line 7, the thread function casts the void * back to an int *.
• Dereference to get the actual argument value.
 1  #include <pthread.h>
 2  #include <stdio.h>
 3  #define NUMT 4                        /* number of threads (value assumed) */
 4  int args[NUMT];                       /* argument array */
 5  pthread_t threadIDs[NUMT];            /* thread IDs */
 6  void * tfunc (void * p) {
 7      printf("thread arg is %d\n", * (int *) p );
 8      return NULL;
 9  }
10  int main(void) {
11      int i;
12      for (i = 0; i < NUMT; i++) {
13          args[i] = i;                  /* init arg for thread #i */
14          pthread_create(&threadIDs[i], NULL, tfunc, (void *) &args[i] );
15      }
16      for (i = 0; i < NUMT; i++) {
17          pthread_join(threadIDs[i], NULL);
18      }
19      return 0;
20  }
[Diagram: the args array holding values 0, 1, 2, ..., NUMT-1.]
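An alternative idiom you may encounter (not from the slides): pass the index itself by value, cast through intptr_t, so no shared argument array is needed. This also avoids the pitfall shown next.

#include <pthread.h>
#include <stdio.h>
#include <stdint.h>   /* intptr_t */
#define NUMT 4

void *tfunc(void *p) {
    int my_id = (int)(intptr_t)p;        /* recover the index from the pointer value */
    printf("thread arg is %d\n", my_id);
    return NULL;
}

int main(void) {
    pthread_t tids[NUMT];
    int i;
    for (i = 0; i < NUMT; i++)
        pthread_create(&tids[i], NULL, tfunc, (void *)(intptr_t)i);  /* i passed by value */
    for (i = 0; i < NUMT; i++)
        pthread_join(tids[i], NULL);
    return 0;
}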
What will not work...
• In Line 12, a pointer to the loop index variable is passed as argument to each thread.
• Instead of setting up a unique parameter for each thread, the loop index variable is the shared argument between all threads.
• Leads to disaster quickly...
§ The threads may not access the parameter fast enough to read the value “assigned” to them.
§ See, e.g., sleep(10)...
 1  #include <pthread.h>
 2  #include <stdio.h>
 3  #include <unistd.h>        /* sleep(); NUMT defined as before, e.g. #define NUMT 4 */
 4  pthread_t threadIDs[NUMT]; /* thread IDs */
 5  void * tfunc (void * p) {
 6      sleep(10);
 7      printf("thread arg is %d\n", * (int *) p );
 8      return NULL;
 9  }
10  int main(void) {
11      int i; for (i = 0; i < NUMT; i++) {
12          pthread_create(&threadIDs[i], NULL, tfunc, (void *) &i );
13      }
14      /* all NUMT threads now share the same variable i */
15      for (i = 0; i < NUMT; i++) {
16          pthread_join(threadIDs[i], NULL);
17      }
18      return 0;
19  }
What will not work... (cont.)
(The listing from the previous slide is repeated here: the main thread creates thread0 and thread1, both executing tfunc.)
[Diagram: timeline. Main thread: i = 0 (intended for thread0), pthread_create(); i++ (now 1, intended for thread1), pthread_create(); ... Later, thread0 and thread1 each execute printf(..., * (int *) p).]
By the time thread0 dereferences the pointer to variable i, the value has already been changed by the main thread. Thread0 reads the wrong value (2 instead of 0). Thread1 also reads 2 → two chefs start chopping tomato nr. 2. Ouch!
Our first Race-condition
§ The correctness of the previous program depends on how fast the created threads manage to read variable i (reading must happen before the main thread updates i, otherwise a wrong value is read).
§ (Assume there was no sleep(10) statement).
§ Execution speed among threads is determined by the operating system that schedules the threads.
§ Depends on system load also.
Race Condition: “a situation in which multiple threads read and write a shared data item and the final result depends on the relative timing of their execution”.
From the previous example:
§ shared data item: variable i
§ final result: what value thread 0 reads as its parameter (0 or >0).
§ relative timing: when and how fast thread 0 and the main thread execute relative to each other.
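A classic sketch of a race condition (the names counter and NITER are mine, not from the slides): two threads increment a shared counter. Without the mutex shown here, the read-modify-write of counter++ can interleave and updates get lost; with the pthread_mutex_t lock (mutexes are covered later in the outline), the result is deterministic.

#include <pthread.h>
#include <stdio.h>

#define NITER 1000000
long counter = 0;                                 /* shared data item */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *inc(void *arg) {
    long n;
    for (n = 0; n < NITER; n++) {
        pthread_mutex_lock(&lock);                /* remove the lock/unlock pair to observe the race */
        counter++;                                /* read-modify-write of shared data */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, inc, NULL);
    pthread_create(&t2, NULL, inc, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld (expected %d)\n", counter, 2 * NITER);
    return 0;
}

Commenting out the lock/unlock pair and rerunning typically yields a final count well below 2×NITER: the relative timing of the two threads determines the result.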
Passing more than one argument to a thread
To pass more than one argument to a thread…
Create a struct that can hold multiple arguments (Lines 4-7).
Create a variable of that struct type (lines 13-15).
Pass a pointer to this struct variable to pthread_create (Line 17).
In the thread function, cast the void * back to a pointer to the struct (Line 9).
 1  #include <pthread.h>
 2  #include <stdio.h>
 3  pthread_t threadID;
 4  struct multi_arg {
 5      int range;
 6      int tomato;
 7  };
 8  void * tfunc (void * p) {
 9      struct multi_arg * ptr = (struct multi_arg *) p;
10      printf("range: %d, tomato %d\n",
11             ptr->range, ptr->tomato); return NULL; }
12  int main(void) {
13      struct multi_arg ma;
14      ma.range = 1;
15      ma.tomato = 2;
16      pthread_create(&threadID, NULL, &tfunc,
17                     (void *) &ma );
18      pthread_join(threadID, NULL);
19      return 0;
20  }
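A caveat not stated on the slide: ma lives on main's stack, so it must remain valid until the thread is done with it; the pthread_join above guarantees that. If a thread can outlive the scope that created it, a common pattern (sketched here) is to malloc one struct per thread and free it inside the thread function:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct multi_arg { int range; int tomato; };

void *tfunc(void *p) {
    struct multi_arg *ptr = (struct multi_arg *) p;
    printf("range: %d, tomato %d\n", ptr->range, ptr->tomato);
    free(ptr);                        /* the thread owns its argument and releases it */
    return NULL;
}

int main(void) {
    pthread_t tid;
    struct multi_arg *ma = malloc(sizeof *ma);  /* heap-allocated: valid even after the creator's scope ends */
    ma->range = 1;
    ma->tomato = 2;
    pthread_create(&tid, NULL, tfunc, ma);      /* a struct pointer already is a void * argument */
    pthread_join(tid, NULL);
    return 0;
}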
pthread_join()
// Purpose: wait for a thread to terminate.
int pthread_join(
    pthread_t tid,      // thread ID we are waiting for
    void ** status      // exit status
);

Note: with pthread_create() we pass a pointer to a thread ID as argument. With pthread_join() it is the thread ID itself!

Arguments:
1. The thread ID of the thread we are waiting for.
2. The completion status of the exiting thread will be copied into *status, unless status is NULL, in which case the completion status is ignored.
§ Once a thread is joined, the thread no longer exists. Its thread ID is no longer valid, and it cannot be joined again, e.g., by another thread!
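A sketch of actually retrieving the completion status (the names compute and result are mine): the thread hands back a pointer via pthread_exit() (or return), and the joiner receives it through the status argument. The returned object must outlive the thread, so it cannot live on the thread's stack.

#include <pthread.h>
#include <stdio.h>

static int result;                      /* must outlive the thread, so not on the thread's stack */

void *compute(void *arg) {
    result = 42;
    pthread_exit(&result);              /* equivalent to: return &result; */
}

int main(void) {
    pthread_t tid;
    void *status;
    pthread_create(&tid, NULL, compute, NULL);
    pthread_join(tid, &status);         /* the thread's exit value is copied into status */
    printf("thread returned %d\n", * (int *) status);
    return 0;
}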
pthread_join() (cont.)
§ Joining threads is a mechanism to accomplish synchronization between threads.
§ pthread_join blocks the calling thread until the specified thread terminates.
§ A thread can only be joined once. It is a logical error to attempt to join the same thread several times, e.g., from different threads.
§ Further synchronization methods will be discussed later § mutexes, condition variables
Thread termination
There are three ways to terminate a thread: return from the thread function, call pthread_exit(), or be cancelled by another thread via pthread_cancel().