[537] Concurrency Bugs
Concurrency Bugs
Questions answered in this lecture:
Why is concurrent programming difficult?
What types of concurrency bugs occur?
How to fix atomicity bugs (with locks)?
How to fix ordering bugs (with condition variables)?
How does deadlock occur?
How to prevent deadlock (with waitfree algorithms, grab all locks atomically, trylocks, and ordering across locks)?
CSE 2431
Systems 2
Based on slides by Andrea C. Arpaci-Dusseau
Remzi H. Arpaci-Dusseau
Concurrency in Medicine: Therac-25 (1980’s)
“The accidents occurred when the high-power electron beam was activated instead of the intended low power beam, and without the beam spreader plate rotated into place. Previous models had hardware interlocks in place to prevent this, but Therac-25 had removed them, depending instead on software interlocks for safety. The software interlock could fail due to a race condition.”
“…in three cases, the injured patients later died.”
Source: http://en.wikipedia.org/wiki/Therac-25
Lu etal. Study:
For four major projects, search for concurrency bugs among >500K bug reports. Analyze small sample to identify common types of concurrency bugs.
Source: http://pages.cs.wisc.edu/~shanlu/paper/asplos122-lu.pdf
Concurrency Study from 2008
OpenOffice
Apache
Atomicity MySQL Apache Mozilla OpenOffice 12 7 29 3 Order MySQL Apache Mozilla OpenOffice 1 6 15 2 Deadlock MySQL Apache Mozilla OpenOffice 9 4 16 2 Other MySQL Apache Mozilla OpenOffice 1 0 0 1
Bugs
Atomicity: MySQL
Thread 1:
if (thd->proc_info) {
…
fputs(thd->proc_info, …);
…
}
What’s wrong?
Thread 2:
thd->proc_info = NULL;
Test (thd->proc_info != NULL) and set (writing to thd->proc_info)
should be atomic
Fix Atomicity Bugs with Locks
Thread 1:
pthread_mutex_lock(&lock);
if (thd->proc_info) {
…
fputs(thd->proc_info, …);
…
}
pthread_mutex_unlock(&lock);
Thread 2:
pthread_mutex_lock(&lock);
thd->proc_info = NULL;
pthread_mutex_unlock(&lock);
Lu etal. Study:
For four major projects, search for concurrency bugs among >500K bug reports. Analyze small sample to identify common types of concurrency bugs.
Source: http://pages.cs.wisc.edu/~shanlu/paper/asplos122-lu.pdf
Concurrency Study from 2008
OpenOffice
Apache
Atomicity MySQL Apache Mozilla OpenOffice 12 7 29 3 Order MySQL Apache Mozilla OpenOffice 1 6 15 2 Deadlock MySQL Apache Mozilla OpenOffice 9 4 16 2 Other MySQL Apache Mozilla OpenOffice 1 0 0 1
Bugs
Ordering: Mozilla
Thread 1:
void init() {
…
mThread =
PR_CreateThread(mMain, …);
…
}
Thread 2:
void mMain(…) {
…
mState = mThread->State;
…
}
What’s wrong?
Thread 1 sets value of mThread needed by Thread2
How to ensure that reading mThread happens after mThread initialization?
Fix Ordering bugs with Condition variables
Thread 2:
void mMain(…) {
…
Mutex_lock(&mtLock);
while (mtInit == 0)
pthread_cond_wait(&mtCond, &mtLock);
Mutex_unlock(&mtLock);
mState = mThread->State;
…
}
Thread 1:
void init() {
…
mThread = PR_CreateThread(mMain, …);
pthread_mutex_lock(&mtLock);
mtInit = 1;
pthread_cond_signal(&mtCond);
pthread_mutex_unlock(&mtLock);
…
}
Lu etal. Study:
For four major projects, search for concurrency bugs among >500K bug reports. Analyze small sample to identify common types of concurrency bugs.
Source: http://pages.cs.wisc.edu/~shanlu/paper/asplos122-lu.pdf
Concurrency Study from 2008
OpenOffice
Apache
Atomicity MySQL Apache Mozilla OpenOffice 12 7 29 3 Order MySQL Apache Mozilla OpenOffice 1 6 15 2 Deadlock MySQL Apache Mozilla OpenOffice 9 4 16 2 Other MySQL Apache Mozilla OpenOffice 1 0 0 1
Bugs
Deadlock
Deadlock: No progress can be made because two or more threads are waiting for the other to take some action and thus neither ever does
“Cooler” name: the deadly embrace (Dijkstra)
STOP
STOP
STOP
STOP
STOP
STOP
STOP
STOP
A
STOP
STOP
STOP
STOP
A
B
STOP
STOP
STOP
STOP
A
B
STOP
STOP
STOP
STOP
A
B
who goes?
STOP
STOP
STOP
STOP
A
B
STOP
STOP
STOP
STOP
A
B
STOP
STOP
STOP
STOP
STOP
STOP
STOP
STOP
A
B
C
D
STOP
STOP
STOP
STOP
A
B
C
D
who goes?
STOP
STOP
STOP
STOP
A
B
C
D
who goes?
STOP
STOP
STOP
STOP
A
B
C
D
Deadlock!
Code Example
Thread 2:
lock(&B);
lock(&A);
Thread 1:
lock(&A);
lock(&B);
Can deadlock happen with these two threads?
Circular Dependency
Lock A
Lock B
Thread 1
Thread 2
holds
holds
wanted
by
wanted
by
Fix Deadlocked Code
Thread 2
lock(&A);
lock(&B);
Thread 1
lock(&A);
lock(&B);
Thread 2:
lock(&B);
lock(&A);
Thread 1:
lock(&A);
lock(&B);
How would you fix this code?
Non-circular Dependency (fine)
Lock A
Lock B
Thread 1
Thread 2
holds
wanted
by
wanted
by
What’s Wrong?
set_t *set_intersection (set_t *s1, set_t *s2) {
set_t *rv = Malloc(sizeof(*rv));
Mutex_lock(&s1->lock);
Mutex_lock(&s2->lock);
for(int i=0; i
if(set_contains(s2, s1->items[i])
set_add(rv, s1->items[i]);
Mutex_unlock(&s2->lock);
Mutex_unlock(&s1->lock);
}
Encapsulation
Modularity can make it harder to see deadlocks
Thread 1:
rv = set_intersection(setA, setB);
Thread 2:
rv = set_intersection(setB, setA);
Solution?
if (m1 > m2) {
// grab locks in high-to-low address order
pthread_mutex_lock(m1);
pthread_mutex_lock(m2);
} else {
pthread_mutex_lock(m2);
pthread_mutex_lock(m1);
}
Any other problems?
Code assumes m1 != m2 (not same lock)
Deadlock Theory
Deadlocks can only happen with these four conditions:
– mutual exclusion
– hold-and-wait
– no preemption
– circular wait
Eliminate deadlock by eliminating any one condition
STOP
STOP
STOP
STOP
A
B
C
D
Mutual Exclusion
Def:
Threads claim exclusive control of resources that they require (e.g., thread grabs a lock);
This means that if one thread has a resource, no other thread can have the same resource until the first thread releases it; therefore, if a thread has a resource that another thread requires, the second thread must wait for the resource.
Wait-Free Algorithms
Strategy: Eliminate locks!
Try to replace locks with atomic primitive:
int CompAndSwap(int *addr, int expected, int new)
Returns 0: fail, 1: success
void add (int *val, int amt) {
do {
int old = *val;
} while(!CompAndSwap(val, ??, old+amt);
}
void add (int *val, int amt) {
Mutex_lock(&m);
*val += amt;
Mutex_unlock(&m);
}
?? old
Deadlock Theory
Deadlocks can only happen with these four conditions:
– mutual exclusion
– hold-and-wait
– no preemption
– circular wait
Eliminate deadlock by eliminating any one condition
Hold-and-Wait
Def:
Threads hold resources allocated to them (e.g., locks they have already acquired) while waiting for additional resources (e.g., locks they wish to acquire).
Eliminate
Hold-and-Wait
Strategy: Acquire all locks atomically once
Can release locks over time, but cannot acquire again until all have been released
How to do this? Use a meta lock, like this:
lock(&meta);
lock(&L1);
lock(&L2);
… // Critical section code
unlock(…); //unlock locks besides meta lock
unlock(&meta);
Disadvantages?
Must know ahead of time which locks will be needed
Must be conservative (acquire any lock possibly needed)
Degenerates to just having one big lock
Deadlock Theory
Deadlocks can only happen with these four conditions:
– mutual exclusion
– hold-and-wait
– no preemption
– circular wait
Eliminate deadlock by eliminating any one condition
No preemption
Def:
Resources (e.g., locks) cannot be forcibly removed from threads that are holding them (that is, the OS cannot preempt, or forcibly take away, a resource that a thread holds).
Support (Pseudo-)Preemption
Strategy: if thread can’t get what it wants, release what it holds; provide a trylock() system call.
top:
lock(A);
if (trylock(B) == -1) {
unlock(A);
goto top;
}
…
Disadvantages?
Livelock:
no processes make progress, but the state of involved processes constantly changes
Deadlock Theory
Deadlocks can only happen with these four conditions:
– mutual exclusion
– hold-and-wait
– no preemption
– circular wait
Eliminate deadlock by eliminating any one condition
Circular Wait
Def:
There exists a circular chain of threads such that each thread holds a resource (e.g., lock) being requested by next thread in the chain.
Eliminating
Circular Wait
Strategy:
– decide which locks should be acquired before others (that is, decide order of lock acquisition)
– if A before B, never acquire A if B is already held!
– document this, and write code accordingly
Works well if system has distinct layers
Summary
-When in doubt about correctness, better to limit concurrency (i.e., add unnecessary lock)
-Concurrency is hard, encapsulation makes it harder!
-Have a strategy to avoid deadlock and stick to it
-Choosing a lock order is probably most practical