ECS 150 – Concurrency and threads
Prof. Joël Porquet-Lupine
UC Davis – 2020/2021
Copyright © 2017-2021 Joël Porquet-Lupine – CC BY-NC-SA 4.0 International License /
Concurrency
Definition
Concurrency is the composition of independently executing tasks
Tasks can start, run, complete in overlapping time periods
Opposite to sequential execution

Process concurrency
Decompose complex problems into simple(r) ones
Make each simple one a process
Resulting processes run concurrently
Example
[Figure: timelines of two processes T1 and T2 cooperating via IPC]
By default, gcc runs compilation tools sequentially, using intermediary files
With option -pipe, gcc runs cpp | cc1 | as | ld
Tools run concurrently as independent but cooperating processes
Concurrency
Types of concurrency
Example of sequential execution
A task alternates CPU bursts and I/O bursts

CPU virtualization
Processes interleaved on same CPU
Each task runs almost as fast as if it had its own computer

I/O concurrency
I/O bursts overlapped with CPU bursts
Total completion time reduced

CPU parallelism
Requires multiple CPUs
Processes running simultaneously
Speedup
Concurrency
Parallelism (short digression)
Real-life example
Parallelism is common in real life
A single sales employee sells $1M
Expectation that hiring another sales employee will generate $2M
Ideal speedup of 2

Speedup
Speedup can be linear
More often sublinear
Bottleneck in sharing resources, coordination overhead, etc.
Quite rarely superlinear
Caching effect, different algorithms

[Figure: speedup vs. number of CPUs (1-8), showing linear, sublinear, and superlinear curves]
Concurrency
Process concurrency limitations
Quite heavy for the OS to fork a process
Duplication of resources (address space, environment, execution flow)
Slow context switch
E.g., some processor caches not shared between processes
Difficulty to communicate between processes
Only IPCs, which all necessitate kernel intervention (syscalls)

Idea
Eliminate duplication of the address space and most of the environment
Place concurrent computations within the same address space
[Figure: three single-threaded processes on the kernel vs. one process containing multiple threads]
Threads: introduction
Definition
One or more threads per process (i.e., per memory address space)
Single execution sequence that represents a separately schedulable task
Familiar programming model (sequential execution of instructions)
Thread can be run or suspended at any time, independently from another
Also known as lightweight process or task
Example
[Figure: timeline of context switches between main(), threadA(), and threadB() within one process]

int main(void) {
    statements;
    ...
    thread_create(threadA);
    thread_create(threadB);
    ...
}

void threadA(void) {
    statements;
    ...
}

void threadB(void) {
    statements;
    ...
}
Threads: introduction
Rationale
Problem structure
We think linearly
But the world is concurrent
Responsiveness
One thread to maintain quick response with user
Other thread(s) to execute longer tasks in the background, or block on I/O
Faster execution
Threads scheduled across different processors in a multi-processor system
Achieve true parallelism
Sharing and communication
No need for heavy IPCs
Use of shared memory
Threads: introduction
Example 1: Web server

Monothreaded process

void webserver(void) {
    while (1) {
        uri = get_uri_from_req();
        data = read_from_disk(uri);
        resp = create_response(data);
        send_response(resp);
    }
}

Multithreaded process

int main(void) {
    for (int i = 0; i < N; i++)
        thread_create(webserver);
}
[Figure: request-handling timelines. Monothreaded: requests #1 and #2 pay disk and network latency back to back. Multithreaded: request #2's processing overlaps with request #1's disk and network latencies, reducing total time]
Threads: introduction
Example 2: Array computation
Assuming a dual-processor system...
Monothreaded process

int a[n], b[n], c[n];
for (i = 0; i < n; i++)
    a[i] = b[i] * c[i];

CPU #1 executes every iteration, from a[0] = b[0]*c[0] to a[n-1] = b[n-1]*c[n-1]; CPU #2 sits idle

Multi-threaded process

void do_mult(int p, int m) {
    for (i = p; i < m; i++)
        a[i] = b[i] * c[i];
}

int main(void) {
    thread_create(do_mult, 0, n/2);
    thread_create(do_mult, n/2, n);
    ...
}

CPU #1 computes a[0] through a[n/2-1] while CPU #2 computes a[n/2] through a[n-1]
Parallel computation
Not achievable by process forking
Threads: characteristics
Execution context
Threads have the exclusive use of the processor registers while executing
Threads each have their own stacks
But no memory protection
When a thread is preempted, the registers are saved as part of its state
The next thread gets to use the processor registers
Process environment
All threads of a process share the same environment
Current working directory
User ID, Group ID
File descriptors
Etc.
void thread(void) {
    printf("hello\n");
    chdir("/");
}

int main(void) {
    char dir[PATH_MAX];
    ...
    dup2(STDOUT_FILENO, fd[1]);
    thread_create(thread);
    ...
    printf("%s", getcwd(dir, PATH_MAX));
    ...
}
Threads: characteristics
Address space
All process data can be accessed by any thread
Particularly global variables
Heap is also shared (via pointers)
int global, *a;

void thread(int arg) {
    int var = global + arg;
    *a = var;
}

int main(void) {
    global = 42;
    a = malloc(sizeof(int));
    thread_create(thread, 23);
    thread_create(thread, 42);
    ...
}
Threads: characteristics
Metadata structures
Process Control Block (PCB)
Process-specific information
PID, Owner, priority, current working directory, active thread, pointers to thread control blocks, etc.
Thread Control Block (TCB)
Thread-specific information
Stack pointer, PC, thread state, register values, pointer to PCB, etc.
[Figure: process address space (instructions, data, heap, one stack per thread) alongside kernel metadata: the PCB (PID, file descriptors, address space, pointers to TCBs) and one TCB per thread (SP, PC, state, registers)]
Threads: characteristics
Differences between threads and processes
Thread
Has no code or data segment or heap of its own. Only has its own stack and set of registers.
Cannot live on its own: must live within a process. There can be more than one thread in a process; the original thread calls main() and retains the process's initial stack.
If it dies, its stack is reclaimed.
Depending on implementation, each thread can run on a different physical processor.
Communication between threads via implicit shared address space.
Inexpensive creation and context switch.
Process
Has code/data/heap and other segments of its own. Also has its own registers.
There must be at least one thread in a process: the one that executes main() and uses the process's stack.
If it dies, its resources are reclaimed and all its internal threads die as well.
Each process can run on a different physical processor.
Communication between processes through kernel-managed IPCs
More expensive creation and context switch.
Recap
Concurrency
Composition of independently executing tasks
Opposite to sequential execution
Processes vs threads
Process concurrency has limitations
Slow context switch, difficult to communicate
Concurrent threads within same process
Easier communication, parallel computation
Parallelism
Specific type of concurrency
Requires multiple CPUs
Threads: models
API
Exact API varies depending on OS/library (e.g., POSIX pthreads)
/* Thread function prototype */
typedef void (*func_t)(void *arg);

/* Create new thread and return its TID */
thread_t thread_create(func_t func, void *arg);

/* Wait for thread @tid and retrieve exit value */
int thread_join(thread_t tid, int *ret);

/* Yield to next available thread */
void thread_yield(void);

/* Exit and return exit value */
void thread_exit(int);
Implementation models
Kernel-level threads (one-to-one)
User-level threads (many-to-one)
Threads: models
Kernel-level threads (one-to-one)
Kernel-level threads are threads which the OS knows about
Every process is composed of at least one kernel-level thread (running main())
Kernel manages and schedules threads (along with processes)
System calls to create, destroy, synchronize threads
E.g., clone() syscall on Linux
Switching between threads of same process requires a light context switch
Values of CPU registers, PC and SP must be switched
Memory protection setup remains in place since threads share the same address space
Threads: models
[Figure: single-threaded processes vs. multi-threaded processes, each thread backed by a kernel-level thread]
Threads: models
Context switch procedure
Thread A stops running
That is, blocks (I/O), is interrupted (timer), or voluntarily yields (syscall)
Mode switch to kernel mode
OS chooses new thread B to run
OS switches from A to B
Saves thread A's state (save processor registers to A's TCB)
Restores new thread B's state (restore processor registers from B's TCB)
OS returns to thread B
Mode switch to user mode
New thread B is running
[Figure: timeline showing threads A and B in user mode, with user/kernel mode switches surrounding the in-kernel context switch]
Threads: models
User-level threads (many-to-one)
User-level threads are threads which the OS does not know about
OS only knows and schedules processes, not threads within processes
Programmer uses a dedicated thread library to manage threads
Functions to create, destroy, synchronize threads
User-level code can define scheduling policy
Switching between threads doesn't involve a (kernel-managed) context switch
Threads: models
Multi-threaded processes at user-level
Threads: models
Context switch procedure (sort of)
Thread A is running
Thread A is interrupted (by a signal), or voluntarily yields (function call)
Library picks new thread B to run
Library saves thread A's state (to A's custom TCB)
Library restores thread B's state (from B's custom TCB)
New thread B is running
Pitfall
Whole process is blocked if one thread blocks on I/O
Threads: models
Differences between kernel- vs user-level threads
Kernel-level thread

Pros
Blocking system calls suspend the calling thread only (I/O)
Threads can run simultaneously on a multiprocessor system
Signals can usually be delivered to specific threads
Used by existing systems, e.g. Linux
Cons
Can be heavy, not as flexible
User-level thread

Pros
Really fast to create and switch between threads (no system calls or full context switches necessary)
May be an order of magnitude faster
Customizable scheduler
Cons
All threads of process are blocked on system calls
Can use non-blocking versions, if they exist
Why you can have millions of Goroutines but only thousands of Java Threads
Threads: models
One abstraction, many flavors
1. Single-threaded processes
Traditional application model
Mapping 1:1 with kernel-level threads
2. Multi-threaded processes with user-level threads
3. Multi-threaded processes with kernel-level threads
4. In-kernel threads (aka kernel threads)
Threads: models
One abstraction, many flavors
1. Single-threaded processes
2. Multi-threaded processes with user-level threads
Threads are managed in user-space
Mapping M:1 with kernel-level threads
Thread management through function calls
Scheduled by user-space library scheduler
TCBs in user-space library data structures
3. Multi-threaded processes with kernel-level threads
4. In-kernel threads (aka kernel threads)
Threads: models
One abstraction, many flavors
1. Single-threaded processes
2. Multi-threaded processes with user-level threads
3. Multi-threaded processes with kernel-level threads
Threads are managed by OS
Thread management through system calls
TCBs and PCBs in kernel
Scheduled by kernel scheduler
4. In-kernel threads (aka kernel threads)
Threads: models
One abstraction, many flavors
1. Single-threaded processes
2. Multi-threaded processes with user-level threads
3. Multi-threaded processes with kernel-level threads
4. In-kernel threads (aka kernel threads)
The kernel itself can be multi-threaded
E.g., idle thread, thread migration, OOM reaper, disk writeback, etc.
Threads: models
One more flavor: cooperative vs preemptive
Cooperative
Threads run until they yield control to another thread (aka fibers)
The action of yielding is in the code itself
Better control of scheduling
Simpler reasoning about shared resources
Can lead to potential starvation on mono-core machines
Need to reason further for multi-core machines

Preemptive
Indeterministic scheduling
Certain guarantee of fairness
Threads are frequently interrupted and forced to yield
Resource sharing needs to be resistant to preemption
Threads: models
POSIX threads
POSIX 1003.1c (aka pthreads) is an API for multithreaded programming standardized by IEEE as part of the POSIX standards
Multithreaded programs using pthreads are likely to run unchanged on a wide variety of UNIX-based systems
Interface
Only defines the interface
Implementation can be in user space or kernel space
#include <pthread.h>

void pthread_exit(void *status);
int pthread_join(pthread_t thread, void **status);
pthread_t pthread_self(void);
int pthread_equal(pthread_t thread_1, pthread_t thread_2);
...
Threads: issues
Forking
void *thread_fcn(void *ptr)
{
    char *msg = (char*)ptr;
    fork();
    printf("%s\n", msg);
}

int main(void)
{
    pthread_t t1;
    char *msg1 = "Thread 1";
    pthread_create(&t1, NULL, thread_fcn, (void*)msg1);
    pthread_join(t1, NULL);
    printf("Done!\n");
    return 0;
}
$ ./a.out
Thread 1
Thread 1
Done!
fork() only clones the calling thread
Mixing multi-threading and forking is not recommended as it can lead to undesirable situations
Threads: issues
Sharing
Shared data structures and files
May lead to surprising results…
int a;

void *thread_fcn(void *ptr)
{
    a++;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, thread_fcn, NULL);
    pthread_create(&t2, NULL, thread_fcn, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("a=%d\n", a);
    return 0;
}
$ ./a.out
a=2
$ ./a.out
a=1
Will require use of synchronization mechanisms
Will see that in next topic!
Threads: issues
Signals
Signals can target specific threads (e.g., SIGSEGV)
Or target the entire process
Sent to first thread that doesn’t block them (e.g., SIGINT)
void sigsegv_hdl(int sig, siginfo_t *siginfo, void *context)
{
    ucontext_t *c = (ucontext_t*)context;
    ...
}

void *thread_fcn(void *ptr)
{
    *(int*)NULL = 42;
}

int main(void)
{
    struct sigaction act;
    pthread_t t1;

    /* Install handler for segfaults */
    act.sa_sigaction = &sigsegv_hdl;
    act.sa_flags = SA_SIGINFO;
    sigemptyset(&act.sa_mask);
    sigaction(SIGSEGV, &act, NULL);

    pthread_create(&t1, NULL, thread_fcn, NULL);
    pthread_join(t1, NULL);
    return 0;
}
Threads: issues
Libraries
Global variables and non-reentrant library functions
Functions should be reentrant, e.g., strtok_r()
No shared context between threads
Global variables: Thread-Local Storage
GCC extension __thread (standardized in C11 as _Thread_local)

__thread int errno;

Provides one global variable copy per thread
void *thread_fcn(void *ptr)
{
    char *msg = (char*)ptr;
    char *p = strtok(msg, " ");
    while (p) {
        if (write(1, p, strlen(p)) == -1)
            perror("write");
        p = strtok(NULL, " ");
    }
}

int main(void)
{
    pthread_t t1, t2;
    char msg1[] = "Thread 1";
    char msg2[] = "Thread 2";
    pthread_create(&t1, NULL, thread_fcn, (void*)msg1);
    pthread_create(&t2, NULL, thread_fcn, (void*)msg2);
    ...
}