Microsoft PowerPoint – Chapter 7 – Programming Shared Address Space Platforms
Introduction to
Parallel Computing
George Karypis
Programming Shared Address Space
Platforms
Outline
Shared Address-Space Programming
Models
Thread-based programming
POSIX API/Pthreads
Directive-based programming
OpenMP API
Shared Memory Programming
Communication is implicitly specified
Focus on constructs for expressing
concurrency and synchronization
Minimize data-sharing overheads
Commonly Used Models
Process model
All memory is local unless explicitly specified/allocated as shared.
Unix processes.
Light-weight process/thread model
All memory is global and can be accessed by all the threads.
Runtime stack is local but it can be shared.
POSIX thread API/Pthreads
Low-level & system-programming flavor to it.
Directive model
Concurrency is specified in terms of high-level compiler directives.
High-level constructs that leave some of the error-prone details to the
compiler.
OpenMP has emerged as a standard.
POSIX API/Pthreads
Has emerged as the de-facto standard
supported by most OS vendors.
Aids in the portability of threaded applications.
Provides a good set of functions that allow for
the creation, termination, and synchronization of
threads.
However, these functions are low-level and the API is
missing some high-level constructs for efficient data-
sharing
There are no collective communication operation like those
provided by MPI.
Pthreads Overview
Thread creation & termination
Synchronization primitives
Mutual exclusion locks
Conditional variables
Object attributes
Thread Creation & Termination
Computing the value of π
Synchronization Primitives
Access to shared variable need to be controlled to
remove race conditions and ensure serial semantics.
Mutual Exclusion Locks
Pthreads provide a special variable called a mutex lock that can be
used to guard critical sections of the program.
The idea is for a thread to acquire the lock before entering the critical
section and release on exit.
If the lock is already owned by another thread, the thread blocks until
the lock is released.
Lock represent serialization points, so too many locks can decrease
the performance.
Computing the minimum element of
an array.
Producer Consumer Queues
Conditional Variables
Waiting-queue like synchronization
principles.
Based on the outcome of a certain condition a
thread may attach itself to a waiting queue.
At a later point in time, another thread that
change the outcome of the condition, will
wake up one/all of the threads so that they
can see if they can proceed.
Conditional variables are always
associated with a mutex lock.
Conditional Variables API
Producer
Consumer
Example with
Conditional
Variables
Attribute Objects
Various attributes can be associated with threads, locks, and
conditional variables.
Thread attributes:
scheduling parameters
stack size
detached state
Mutex attributes:
normal
only a single thread is allowed to lock it.
if a threads tries to lock it twice a deadlock occurs.
recursive
a thread can lock the mutex multiple time.
each successive lock increments a counter and each successive release
decrements the counter.
a thread can lock a mutex only if its counter is zero.
errorcheck
like normal but an attempt to lock it again by the same thread leads to an error.
The book and the Posix thread API provide additional details.
OpenMP
A standard directive-based shared
memory programming API
C/C++/Fortran versions of the API exist
API consists of a set of compiler directive
along with a set of API functions.
Parallel Region
Parallel regions are specified by the parallel directive:
The clause list contains information about:
conditional parallelization
if (scalar expression)
degree of concurrency
num_threads (integer expression)
data handling
private (var list), firstprivate (var list), shared (var
list)
default(shared|private|none)
Reduction clause
Computing the value of π
Specifying concurrency
Concurrent tasks are specified using the
for and sections directives.
The for directive splits the iterations of a
loop across the different threads.
The sections directive assigns each thread
to explicitly identified tasks.
The for directive
An example
The loop index for the for directive is assumed to be private.
More one for directive
Loop scheduling schemes
schedule(static[, chunk-size])
splits the iterations into consecutive chucks of size chunk-size and
assigns them in round-robin fashion.
schedule(dynamic [, chunk-size])
splits the iterations into consecutive chunks of size chunk-size and
gives to each thread a chunk as soon as it finishes processing its
previous chunk.
schedule(guided [, chunk-size])
like dynamic but the chunk-size is reduced exponentially as each
chunk is dispatched to a thread.
schedule(runtime)
is determined by reading an environmental variable.
Restrictions on the for directive
For loops must not have break statements.
Loop control variables must be integers.
The initialization expression of the control
variable must be an integer.
The logical expression must be one of <
<=, >, >=.
The increment expression must have
integer increments and decrements.
The sections directive
Synchronization Directives
barrier directive
single/master directives
critical/atomic directives
ordered directive