More Concurrency Exercises and OpenMP
Shuaiwen Leon Song
Concurrency and Synchronization Exercise
2
How many lines of output will it produce?
Answer: 7
Q1(fork)
3
Q2 (sempahores)
Consider the following program using semaphores . Which of the following execution order of sem_wait and sem_post are possible (W represent sem_wait, P represent sem_post)?
(A) WPWWWP (B) WWPPWW (C) PPWWWW (D) WPWPWW (E) WWWWPP
4
Will the following programs create zombie processes?
Q3 (Zombie Processes)
5
Q4 (Scalability)
Consider a multithreaded webserver running on a machine with N CPU cores. The server has M worker threads. Every incoming request is put in a request queue, and served by one of the free worker threads. The server is fully saturated and has a certain throughput at saturation. Under which circumstances will increasing M might lead to an approximate linear increase in the saturation throughput of the server?
(A) Under any circumstances
(B) Under no circumstances
(C) when M < N and the workload to the server is CPU-bound (D) when M < N and the workload to the server is I/O-bound
6
Please choose all possible outputs of the following program.
Q5 (unintentional share or not)
A. 0123456789 B. 0234156789 C. 0987123456 D. 8905674321 E. 0114459876 F. 1124465789
7
In the program below, which variables are shared among threads? Which are not?
Q6 (variable scope)
8
Q6 (false sharing vs. true sharing)
Consider the program below, is there any potential performance issue with it? If there is, how to fix it? Note that the two functions are concurrently executed on two different processor cores, each of which has its own cache hierarchy.
What about
static struct foo f[3];
9
False Sharing vs True Sharing
True sharing: true sharing, would require programmatic synchronization constructs to ensure ordered data access.
False sharing: The frequent coordination required between processors when cache lines are marked ‘Invalid’ requires cache lines to be written to memory and subsequently loaded. False sharing increases this coordination and can significantly degrade application performance.
OpenMP programs are known to be prone to false sharing.
10
Q7 (semaphore mechanism)
If a process performs a sem_post(sem_t ∗sem) operation on a counting semaphore whose value is zero, and there are one or more other processes waiting on the semaphore, how many of the processes could be unblocked?
A. Exactly one process
B. All processes
C. Zero
D. Could be any value from one to N (the number of blocked process)
11
Q8 ( comparison across different synchronizations)
Which of the following descriptions about semaphores and spinlocks are correct.
A. Spinlocks waste CPU time (busy waiting) while semaphores almost do not.
B. A semaphore has a counter and could be acquired by multiple threads while a
spinlock could only be acquired by one thread.
C. When threads hold locks for a short period of time and the contention
between threads is not significant, Spinlocks are preferred due to no context
switch overheads.
D. When threads hold locks for a short period of time and the contention
between threads is not significant, Semaphores are preferred since spinlock’s busy-waiting will waste CPU cycles.
12
Please write out all the possible printouts of the following program A and B.
Q9 (fork() and waitpid(-1))
13
Q10 (thread detachment)
(I) pthread_detach(t); pthread_exit(NULL); (II) pthread_detach(t); return 0;
(III) pthread_detach(t); exit(0); (IV)pthread_join(t, NULL); return 0;
14
Q11 (unintentional sharing)
Please consider the following program. The program's original intention was to print out “hello from thread 0”, 'hello from thread 1' all the way to 'hello from thread 9' (note that the order of the printout should be random). However, there are several bugs in the code below. Please list all the bugs appeared in this code and explain why they cause problems. Please provide a fix to the code so it can run correctly.
15
› Door opening and locking
Q12 (Synchronization using mutex)
16
Q13 (writing a barrier)
17
OpenMP Basics
18
› What is OpenMP?
- Open specification for Multi-Processing
- “Standard” API for defining multi-threaded shared-memory programs - openmp.org – Talks, examples, forums, etc.
- computing.llnl.gov/tutorials/openMP/
- portal.xsede.org/online-training
- www.nersc.gov/assets/Uploads/XE62011OpenMP.pdf
› High-level API
- Preprocessor (compiler) directives ( ~ 80% ) - Library Calls ( ~ 19% )
- Environment Variables ( ~ 1% )
Introduction to OpenMP
19
A Programmer’s View of OpenMP › OpenMP is a portable, threaded, shared-memory programming
specification with “light” syntax
- Exact behavior depends on OpenMP implementation! - Requires compiler support (C, C++ or Fortran)
› OpenMP will:
- Allow a programmer to separate a program into serial regions and parallel
regions, rather than T concurrently-executing threads.
- Hide stack management
- Provide synchronization constructs
› OpenMP will not:
- Parallelize automatically
- Guarantee speedup
- Provide freedom from data races
20
int main() {
// Do this part in parallel
printf( "Hello, World!\n" );
return 0; }
Motivation – OpenMP
21
int main() {
omp_set_num_threads(16);
// Do this part in parallel
#pragma omp parallel
{
printf( "Hello, World!\n" ); }
return 0; }
Motivation – OpenMP
22
Programming Model
Because OpenMP is designed for shared memory parallel programming, it largely limited to single node parallelism. Typically, the number of processing elements (cores) on a node determine how much parallelism can be implemented.
23
› OpenMP easily parallelizes loops
- Requires: No data dependencies (reads/write or write/write pairs) between iterations!
› Preprocessor calculates loop bounds for each thread directly from serial source
#pragma omp parallel for
for( i=0; i < 25; i++ ) {
printf(“Foo”); }
?
Programming Model – Concurrent Loops
?
24
Motivation of using OpenMP
25
› No better materials then LLNL OpenMP tutorial.
› We are now going to go through some of the important
knowledge points using the tutorial:
› https://computing.llnl.gov/tutorials/openMP/
LLNL OpenMP Tutorial
26