Concurrency: Threads
Questions answered in this lecture:
Why is concurrency useful?
What is a thread and how does it differ from processes?
What can go wrong if scheduling of critical sections is not atomic?
CSE 2431
Systems 2
Based on slides by Andrea C. Arpaci-Dusseau and Remzi H. Arpaci-Dusseau
Announcements
Reading:
Chapter 26
Review: Easy Piece 1
Virtualization
CPU
Memory
Context Switch – Dispatcher
Schedulers
Segmentation
Paging
TLBs
Multilevel
Swapping
Allocation
http://cacm.acm.org/magazines/2012/4/147359-cpu-db-recording-microprocessor-history/fulltext
Motivation for Concurrency
CPU Trend (starting about 2005): Same speed, but multiple cores
Goal: Write applications that fully utilize many cores
Option 1: Build apps from many communicating processes
Example: Chrome (process per tab)
Communicate via pipe() or similar
Pros?
Don’t need new abstractions; good for security
Cons?
Cumbersome programming
High communication overheads
Expensive context switching (why expensive?)
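As a rough sketch of Option 1, the fragment below shows two processes communicating through pipe(); the message text and buffer size are made up for the example.

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int fds[2];                      /* fds[0]: read end, fds[1]: write end */
    if (pipe(fds) == -1) { perror("pipe"); return 1; }

    pid_t pid = fork();
    if (pid == 0) {                  /* child: the "worker" process */
        close(fds[1]);               /* child only reads */
        char buf[64];
        ssize_t n = read(fds[0], buf, sizeof(buf) - 1);
        if (n > 0) { buf[n] = '\0'; printf("child got: %s\n", buf); }
        close(fds[0]);
        return 0;
    }
    /* parent: sends work to the child, then waits for it to finish */
    close(fds[0]);                   /* parent only writes */
    const char *msg = "task #1";
    if (write(fds[1], msg, strlen(msg)) == -1) perror("write");
    close(fds[1]);                   /* reader sees end-of-file */
    wait(NULL);
    return 0;
}

Every byte exchanged here crosses the kernel, and switching between the two processes means switching address spaces, which is part of why the communication and context-switch costs above add up.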
Concurrency: Option 2
New abstraction: thread
Threads are like processes, except:
Multiple threads of same process share an address space
Divide large task across several cooperative threads
Communicate through shared address space
Common Programming Models
Multi-threaded programs tend to be structured as:
Producer/consumer
Multiple producer threads create data (or work) that is handled by one of multiple consumer threads (see the sketch after this list)
Pipeline
Task is divided into series of subtasks, each of which is handled in series by a different thread
Defer work with background thread
One thread performs non-critical work in the background (when CPU idle)
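As promised above, here is a minimal sketch of the producer/consumer model. It uses a Pthreads lock (introduced at the end of this lecture) and condition variables (covered in a later lecture); the buffer size and item count are arbitrary choices for the example.

#include <pthread.h>
#include <stdio.h>

#define CAPACITY 8
#define ITEMS    100

static int buffer[CAPACITY];
static int fill = 0, use = 0, count = 0;          /* ring-buffer state */
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

static void *producer(void *arg) {
    (void)arg;
    for (int i = 0; i < ITEMS; i++) {
        pthread_mutex_lock(&m);
        while (count == CAPACITY)                 /* wait for space */
            pthread_cond_wait(&not_full, &m);
        buffer[fill] = i;                         /* "create data" */
        fill = (fill + 1) % CAPACITY;
        count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&m);
    }
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    for (int i = 0; i < ITEMS; i++) {
        pthread_mutex_lock(&m);
        while (count == 0)                        /* wait for data */
            pthread_cond_wait(&not_empty, &m);
        int item = buffer[use];                   /* "handle the work" */
        use = (use + 1) % CAPACITY;
        count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&m);
        printf("consumed %d\n", item);
    }
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}

One producer and one consumer are shown; the same structure works with several of each, because every access to the shared buffer happens while holding the lock.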
[Diagram: CPU 1 running thread 1 and CPU 2 running thread 2, both connected to the same RAM]
What state do threads share?
[Diagram: same two CPUs and threads; RAM holds page directories PageDir A, PageDir B, …]
Which threads share page directories?
[Diagram: each CPU has a PTBR register; both PTBRs point to PageDir B, so the two threads use the same page directory]
[Diagram: each CPU also has its own Instruction Pointer (IP) register]
Do threads share Instruction Pointer?
[Diagram: the virtual address space mapped by PageDir B contains CODE and HEAP; both threads' IPs point into the shared code]
Threads share code, but each thread may be executing different code at the same time: different Instruction Pointers.
[Diagram: each CPU also has its own Stack Pointer (SP) register]
Do threads share stack pointer?
[Diagram: the shared address space contains STACK 1 and STACK 2; each thread's SP points to its own stack]
Threads executing different functions (or even different instances of the same function) need different stacks.
Summary
For two or more threads (which belong to the same process):
Share address space
Share PTBR (separate registers, but with same Page Table Base address)
Share code
Share heap
DO NOT share registers (IP or other registers) or stack
If two or more threads do not belong to the same process, they do not share any of the above (except that independent processes may share parts of their address spaces, as discussed previously)
Thread vs. Process
Multiple threads within a single process share:
Process ID (PID)
Address space
Code (instructions)
Most data (heap)
Open file descriptors
Current working directory
User and group id
Each thread has its own
Thread ID (TID)
Set of registers, including Program counter and Stack pointer
Stack for local variables and return addresses
(in same address space)
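A small sketch to make the shared/private split above concrete (the variable names are invented for the example): both threads read and write the same global, but each has its own copy of the stack-allocated local, at a different address.

#include <pthread.h>
#include <stdio.h>

static int shared = 0;                 /* one copy, visible to every thread */

static void *worker(void *arg) {
    (void)arg;
    int local = 0;                     /* one copy per thread, on that thread's stack */
    local++;
    shared++;                          /* both threads update the same location */
    printf("&shared = %p   &local = %p\n", (void *)&shared, (void *)&local);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* &shared prints the same address in both threads; &local differs */
    printf("shared = %d\n", shared);
    return 0;
}

Note that the unprotected shared++ is exactly the kind of update the rest of this lecture shows can go wrong.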
Thread API
Variety of thread systems exist
POSIX Pthreads
Common thread operations
Create
Exit
Join (instead of wait() for processes)
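A minimal sketch of these operations using the Pthreads API; the start routine and the value passed through it are invented for the example. Compile with -pthread.

#include <pthread.h>
#include <stdio.h>

/* thread start routine: receives one void* argument, returns a void* result */
static void *hello(void *arg) {
    long id = (long)arg;
    printf("hello from thread %ld\n", id);
    return (void *)(id * 10);                      /* value picked up by pthread_join */
}

int main(void) {
    pthread_t t;
    void *result;
    pthread_create(&t, NULL, hello, (void *)1L);   /* Create */
    pthread_join(t, &result);                      /* Join: wait for the thread to Exit */
    printf("thread returned %ld\n", (long)result);
    return 0;
}

Returning from the start routine is equivalent to calling pthread_exit() with that return value, and pthread_join() then delivers it to the waiting thread.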
OS Support: Approach 1
User-level threads: Many-to-one thread mapping
Implemented by user-level runtime libraries
Create, schedule, synchronize threads at user-level
OS is not aware of user-level threads
OS thinks each process contains only a single thread of control
Advantages
Does not require OS support; Portable
Can tune scheduling policy to meet application demands
Lower overhead thread operations since no system call
Disadvantages?
Cannot leverage multiprocessors; thread library can only schedule one thread at a time, so no concurrent execution
Entire process blocks when one thread blocks
OS Support: Approach 2
Kernel-level threads: One-to-one thread mapping
OS provides each user-level thread with a kernel thread
Each kernel thread scheduled independently, so concurrency possible
Thread operations (creation, scheduling, synchronization) performed by OS
Advantages
Each kernel-level thread can run in parallel on a multiprocessor
When one thread blocks, other threads from process can be scheduled
Disadvantages
Higher overhead for thread operations; system calls required
OS must scale well with increasing number of threads
One Schedule
Thread Schedule #1
balance = balance + 1;   (balance is at virtual address 0x9cd4, initial value 100)
0x195 mov 0x9cd4, %eax
0x19a add $0x1, %eax
0x19d mov %eax, 0x9cd4
Thread 1 runs to completion, then a thread context switch, then Thread 2 runs:
  (state after each step)
  Initially:              0x9cd4: 100   %eax: ?     %rip: 0x195
  T1: mov 0x9cd4, %eax    0x9cd4: 100   %eax: 100   %rip: 0x19a
  T1: add $0x1, %eax      0x9cd4: 100   %eax: 101   %rip: 0x19d
  T1: mov %eax, 0x9cd4    0x9cd4: 101   %eax: 101   %rip: 0x1a2
  Thread Context Switch: T1's registers (%eax: 101, %rip: 0x1a2) are saved in its
  thread control block; T2's saved registers (%eax: ?, %rip: 0x195) are restored
  T2: mov 0x9cd4, %eax    0x9cd4: 101   %eax: 101   %rip: 0x19a
  T2: add $0x1, %eax      0x9cd4: 101   %eax: 102   %rip: 0x19d
  T2: mov %eax, 0x9cd4    0x9cd4: 102   %eax: 102   %rip: 0x1a2
Final value is 2 greater than the original. Desired result!
Another Schedule
Thread Schedule #2
0x195 mov 0x9cd4, %eax
0x19a add $0x1, %eax
0x19d mov %eax, 0x9cd4
Thread 1 is interrupted just before its store; Thread 2 runs all three instructions; then Thread 1 resumes:
  (state after each step)
  Initially:              0x9cd4: 100   %eax: ?     %rip: 0x195
  T1: mov 0x9cd4, %eax    0x9cd4: 100   %eax: 100   %rip: 0x19a
  T1: add $0x1, %eax      0x9cd4: 100   %eax: 101   %rip: 0x19d
  Thread Context Switch: T1's registers (%eax: 101, %rip: 0x19d) are saved in its
  thread control block; T2's saved registers (%eax: ?, %rip: 0x195) are restored
  T2: mov 0x9cd4, %eax    0x9cd4: 100   %eax: 100   %rip: 0x19a
  T2: add $0x1, %eax      0x9cd4: 100   %eax: 101   %rip: 0x19d
  T2: mov %eax, 0x9cd4    0x9cd4: 101   %eax: 101   %rip: 0x1a2
  Thread Context Switch: T2's registers are saved; T1's (%eax: 101, %rip: 0x19d) are restored
  T1: mov %eax, 0x9cd4    0x9cd4: 101   %eax: 101   %rip: 0x1a2
WRONG result! Final value of balance is 101, not 102: each thread added 1, but the total increase is only 1.
Timeline View
Thread 1 (add 1)            Thread 2 (add 2)
mov 0x123, %eax
add $0x1, %eax
mov %eax, 0x123
                            mov 0x123, %eax
                            add $0x2, %eax
                            mov %eax, 0x123
How much is added to the shared variable?
3: correct!
Timeline View
Thread 1                    Thread 2
mov 0x123, %eax
add $0x1, %eax
                            mov 0x123, %eax
mov %eax, 0x123
                            add $0x2, %eax
                            mov %eax, 0x123
How much is added?
2: incorrect!
Timeline View
Thread 1                    Thread 2
mov 0x123, %eax
                            mov 0x123, %eax
                            add $0x2, %eax
add $0x1, %eax
                            mov %eax, 0x123
mov %eax, 0x123
How much is added?
1: incorrect!
Timeline View
Thread 1                    Thread 2
                            mov 0x123, %eax
                            add $0x2, %eax
                            mov %eax, 0x123
mov 0x123, %eax
add $0x1, %eax
mov %eax, 0x123
How much is added?
3: correct!
Timeline View
Thread 1                    Thread 2
                            mov 0x123, %eax
                            add $0x2, %eax
mov 0x123, %eax
add $0x1, %eax
mov %eax, 0x123
                            mov %eax, 0x123
How much is added?
2: incorrect!
Non-Determinism
Concurrency leads to non-deterministic results
Non-deterministic: different results even with the same inputs
Race conditions: unsynchronized access to shared data is the key
Whether bug manifests depends on CPU schedule!
Passing tests means little
How to program: imagine scheduler is malicious
Assume scheduler will pick bad ordering at some point…
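A small sketch that makes the non-determinism observable (thread count and iteration count are arbitrary): with nothing protecting counter, the final value usually falls short of the expected total and changes from run to run.

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define ITERS    1000000

static volatile long counter = 0;      /* shared; updates are NOT atomic */

static void *worker(void *arg) {
    (void)arg;
    for (long i = 0; i < ITERS; i++)
        counter = counter + 1;         /* compiles to load / add / store */
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    printf("expected %ld, got %ld\n", (long)NTHREADS * ITERS, counter);
    return 0;
}

The volatile qualifier only keeps the compiler from caching counter in a register so the load/add/store pattern stays visible; it does nothing to make the update atomic.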
What do we want?
Want the 3 instructions to execute as an uninterruptible group
That is, we want them to be atomic
mov 0x123, %eax
add $0x1, %eax
mov %eax, 0x123
critical section
More generally:
Need mutual exclusion for critical sections
If process A is in critical section C, process B can't be (it is okay if other processes do unrelated work, that is, work not in critical sections)
Synchronization
Build higher-level synchronization primitives in OS
Operations that ensure correct ordering of instructions across threads
Motivation: Build them once and get them right
Higher-level primitives:
Monitors
Semaphores
Condition Variables
Locks
Built from lower-level operations:
Loads
Stores
Test&Set
Disable Interrupts
Locks
Goal: Provide mutual exclusion (mutex)
Three common operations:
Allocate and Initialize
pthread_mutex_t mylock = PTHREAD_MUTEX_INITIALIZER;
Acquire
Acquire exclusive access to the lock;
Wait if the lock is not available (some other thread is in the critical section)
Spin or block (relinquish CPU) while waiting
pthread_mutex_lock(&mylock);
Release
Release exclusive access to the lock; let another thread enter the critical section
pthread_mutex_unlock(&mylock);
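Putting the three operations together, here is a sketch of the shared-counter update from earlier in the lecture with its critical section protected by a Pthreads mutex (the thread function and iteration count are invented for the example).

#include <pthread.h>
#include <stdio.h>

static long balance = 100;
static pthread_mutex_t mylock = PTHREAD_MUTEX_INITIALIZER;   /* Allocate and Initialize */

static void *deposit(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&mylock);      /* Acquire: enter critical section */
        balance = balance + 1;            /* load / add / store now execute as a group
                                             with respect to other threads holding the lock */
        pthread_mutex_unlock(&mylock);    /* Release: let another thread in */
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, deposit, NULL);
    pthread_create(&t2, NULL, deposit, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("balance = %ld\n", balance);   /* always 100 + 2 * 1000000 */
    return 0;
}

With the lock held around the load/add/store sequence, no interleaving of the two threads can lose an update, so the final balance is always the same.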
Conclusions
Concurrency is needed to obtain high performance by utilizing multiple cores
Threads are multiple execution streams within a single process or address space (they share the PID and address space; each has its own registers and stack)
Context switches within a critical section can lead to non-deterministic bugs (race conditions)
Use locks (or other mechanisms) to provide mutual exclusion