[537] Threads
Concurrency:
Locks
Questions answered in this lecture:
Review threads and mutual exclusion for critical sections
How can locks be used to protect shared data structures such as linked lists?
Can locks be implemented by disabling interrupts?
Can locks be implemented with loads and stores?
Can locks be implemented with atomic hardware instructions?
Are spinlocks a good idea?
UNIVERSITY of WISCONSIN-MADISON
Computer Sciences Department
CSE 2431
Systems 2
Based on slides by Andrea C. Arpaci-Dusseau
Remzi H. Arpaci-Dusseau
Review: Which registers store the same/different values across threads?
[Figure: CPU 1 runs thread 1 and CPU 2 runs thread 2; RAM holds PageDir A and PageDir B; both CPUs' PTBRs point to the same page directory (PageDir B), whose virtual memory contains shared CODE and HEAP plus one stack per thread (STACK 1, STACK 2); each CPU has its own IP and SP]
Review: What is needed for CORRECTNESS?
balance = balance + 1;
Instructions accessing shared memory must execute as an uninterruptible group
Need instructions to be atomic
mov 0x123, %eax
add $0x1, %eax
mov %eax, 0x123
critical section
More general:
Need mutual exclusion for critical sections
If process/thread A is in critical section C, process/thread B can't also be in C
(okay if other processes/threads do unrelated work)
Implementing Synchronization
Build higher-level synchronization primitives in OS
Operations that ensure correct ordering of instructions across threads
Motivation: Build them once and get them right
Monitors
Semaphores
Condition Variables
Locks
Loads
Stores
Test&Set
Disable Interrupts
Lock Implementation Goals
Correctness
Mutual exclusion
Only one thread in critical section at a time
Progress (deadlock-free)
If several simultaneous requests, must allow one to proceed
Bounded (starvation-free)
Must eventually allow each waiting thread to enter
Fairness
Each thread waits for (roughly) same amount of time
Performance
CPU is not used unnecessarily (e.g., spinning)
Implementing Synchronization
To implement, need atomic operations
Atomic operation: No other instructions can be interleaved (this means: no other instructions can be executed after the atomic operation begins and before it ends)
Examples of atomic operations
Code between interrupts on uniprocessors
Disable timer interrupts, don’t do any I/O
Loads and stores of words
Load r1, B
Store r1, A
Special hw instructions
Test&Set
Compare&Swap
Implementing Locks: W/ Interrupts
Turn off interrupts for critical sections
Prevent dispatcher from running another thread
Code between interrupts executes atomically
void acquire(lockT *l) {
    disableInterrupts();
}
void release(lockT *l) {
    enableInterrupts();
}
Disadvantages??
Only works on uniprocessors
Process can keep control of CPU for an arbitrary length of time
While interrupts are off, the system cannot perform other necessary work (e.g., service devices)
Implementing LOCKS: w/ Load+Store
Code uses a single shared lock variable
bool lock = false; // shared variable
void acquire(bool *lock) {
    while (*lock) /* wait */ ;
    *lock = true;
}
void release(bool *lock) {
    *lock = false;
}
Why doesn’t this work? Example schedule that fails with 2 threads?
Race Condition with LOAD and STORE
*lock == 0 initially
Thread 1                    Thread 2
while (*lock == 1)
                            while (*lock == 1)
*lock = 1
                            *lock = 1
Both threads grab lock!
Problem: Testing lock and setting lock are not atomic
Peterson’s Algorithm
Assume only two threads (tid = 0, 1) and use just loads and stores
int turn = 0; // shared
bool lock[2] = {false, false}; // also shared
void acquire() {
    lock[tid] = true;
    turn = 1 - tid;
    while (lock[1-tid] && turn == 1-tid) /* wait */ ;
}
void release() {
    lock[tid] = false;
}
Different Cases: All Work

Case 1: Only thread 0 wants the lock
    lock[0] = true;
    turn = 1;
    while (lock[1] && turn == 1);   // lock[1] is false: thread 0 enters immediately

Case 2: Thread 0 and thread 1 both want the lock
    Thread 0:                       Thread 1:
    lock[0] = true;                 lock[1] = true;
    turn = 1;                       turn = 0;
    while (lock[1] && turn == 1);   while (lock[0] && turn == 0);
    // turn ends up with a single final value, so exactly one thread
    // waits and the other enters, regardless of interleaving
Peterson’s Algorithm:
Intuition
Mutual exclusion: Enter critical section if and only if
Other thread does not want to enter
Other thread wants to enter, but your turn
Progress: Both threads cannot wait forever at while() loop
Completes if other process does not want to enter
Other process (matching turn) will eventually finish
Bounded waiting (not shown in examples)
Each process waits at most one critical section
Problem: doesn't work as written on modern hardware
(relaxed memory-consistency models let the hardware reorder the loads and stores; memory fences or atomic operations are needed)
xchg: atomic exchange,
or test-and-set
// xchg(int *addr, int newval)
// return what was pointed to by addr
// at the same time, store newval into addr
int xchg(int *addr, int newval) {
    int old = *addr;
    *addr = newval;   // hardware performs these two steps as one atomic action
    return old;
}
LOCK Implementation with XCHG
typedef struct __lock_t {
int flag;
} lock_t;
void init(lock_t *lock) {
lock->flag = ??;
}
void acquire(lock_t *lock) {
????;
// spin-wait (do nothing)
}
void release(lock_t *lock) {
lock->flag = ??;
}
XCHG Implementation
typedef struct __lock_t {
int flag;
} lock_t;
void init(lock_t *lock) {
lock->flag = 0;
}
void acquire(lock_t *lock) {
while(xchg(&lock->flag, 1) == 1) ;
// spin-wait (do nothing)
}
void release(lock_t *lock) {
lock->flag = 0;
}
Other Atomic HW Instructions
int CompareAndSwap(int *addr, int expected, int new) {
int actual = *addr;
if (actual == expected)
*addr = new;
return actual;
}
void acquire(lock_t *lock) {
while(CompareAndSwap(&lock->flag, ?, ?)
== ?) ;
// spin-wait (do nothing)
}
Other Atomic HW Instructions
int CompareAndSwap(int *addr, int expected, int new) {
int actual = *addr;
if (actual == expected)
*addr = new;
return actual;
}
void acquire(lock_t *lock) {
while(CompareAndSwap(&lock->flag, 0, 1)
== 1) ;
// spin-wait (do nothing)
}
Lock Implementation Goals
Correctness
Mutual exclusion
Only one thread in critical section at a time
Progress (deadlock-free)
If several simultaneous requests, must allow one to proceed
Bounded (starvation-free)
Must eventually allow each waiting thread to enter
Fairness
Each thread waits for same amount of time
Performance
CPU is not used unnecessarily
Basic Spinlocks are Unfair
[Figure: timeline 0–160; threads A and B alternate time slices; A acquires the lock at the start of each of its slices and releases it near the end, while B spins through each of its entire slices without ever acquiring the lock]
Scheduler is independent of locks/unlocks
Spinlock Performance
Fast when…
– many CPUs
– locks held a short time
– advantage: avoid context switch
Slow when…
– one CPU
– locks held a long time
– disadvantage: spinning is wasteful
CPU Scheduler is Ignorant
[Figure: timeline 0–160 with threads A, B, C, D; A acquires the lock, then B, C, and D each spin through a full time slice before A runs again and releases it]
CPU scheduler may run B, C, or D instead of A,
even though they are spinning on the lock that A holds
Spinlock Performance
Waste…
Without yield: O(threads * time_slice)
With yield: O(threads * context_switch)
So even with yield, spinning is slow with high thread contention (that is, when many threads are contending for the lock)
Next improvement: Block and put thread on waiting queue instead of spinning