
CMPSC 311 – Introduction to Systems Programming
Introduction to Concurrency
Suman Saha
(Slides are mostly by Professors Patrick McDaniel and )



Sequential Programming
• Processing a network connection as it arrives and fulfilling the exchange completely is sequential processing
• i.e., connections are processed in order of arrival
listen_fd = Listen(port);
while (1) {
    client_fd = accept(listen_fd);
    buf = read(client_fd);
    write(client_fd, buf);
    close(client_fd);
}
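The pseudocode above hides the socket setup details. Below is a minimal sketch of the same sequential echo server in real C, assuming POSIX sockets and a hard-coded port (8080 is an arbitrary choice); error handling is omitted for brevity.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);                        /* assumed port */
    bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(listen_fd, 16);
    while (1) {
        int client_fd = accept(listen_fd, NULL, NULL);  /* blocks for the next client */
        char buf[1024];
        ssize_t n = read(client_fd, buf, sizeof(buf));  /* blocks on a slow client */
        if (n > 0)
            write(client_fd, buf, n);                   /* echo the data back */
        close(client_fd);                               /* only now serve the next one */
    }
}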

Whither sequential?
• Benefits
  • simple to implement
  • very little persistent state to maintain
  • few complex error conditions
• Disadvantages
  • sometimes poor performance
    • one slow client causes all others to block
    • poor utilization of network, CPU
Think about it this way: if the class took the final exam sequentially, it would take 25 days to complete

An alternate design …
• Why not process multiple requests at the same time, interleaving processing while waiting for other actions (e.g., read requests) to complete?
• This is known as concurrent processing …
  • process multiple requests concurrently
• Approaches to concurrent server design
  • asynchronous servers (select())
  • multiprocess servers (fork())
  • multithreaded servers (pthreads)

Concurrency with processes
• The server process blocks on accept(), waiting for a new client to connect
• when a new connection arrives, the parent calls fork() to create another process
• the child process handles that new connection, and exit()’s when the connection terminates
• Children become “zombies” after death
  • wait() to “reap” children
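As a sketch of this fork-per-connection loop (handle_client is a hypothetical helper; error handling omitted):

#include <stdlib.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

void handle_client(int fd);                       /* hypothetical per-connection logic */

void serve_forever(int listen_fd) {
    while (1) {
        int client_fd = accept(listen_fd, NULL, NULL);
        pid_t pid = fork();
        if (pid == 0) {                           /* child handles this connection */
            close(listen_fd);                     /* the child never accepts */
            handle_client(client_fd);
            close(client_fd);
            exit(0);
        }
        close(client_fd);                         /* parent keeps only listen_fd */
        while (waitpid(-1, NULL, WNOHANG) > 0)
            ;                                     /* reap finished children (no zombies) */
    }
}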

Graphically
[Figure: clients connect over time; the server fork()s a child process for each new connection, so several server processes run while the parent continues to accept]

fork()
• The fork function creates a new process by duplicating the calling process:
pid_t fork(void);
• The new child process is an exact duplicate of the calling parent process, except that it has its own process ID and an initially empty set of pending signals
• The fork() function returns
  • 0 (zero) in the child process, OR …
  • the child’s process ID in the parent
Idea: think about duplicating the process state and running ….

Process control
• Parent:
  • fork() (pid == child PID)
  • wait for the child to complete (maybe)
• Child:
  • begins at fork (pid == 0)
  • runs until done
  • calls exit()

exit()
• The exit function causes normal process termination with a return value:
void exit(int status);
• status is sent to the parent
• Note: exit vs. return from a C program
  • return is a language keyword
    • returns (possibly a value) from a function
  • exit is a function that eventually calls the _exit system call
    • terminates the process immediately, returning status to the parent
  • exit and return behave similarly in the main function

wait()
• The wait function waits for a state change in a child of the calling process (e.g., for the child to terminate):
pid_t wait(int *status);
• returns the process ID of the child whose state changed
• status is filled in with the exit status set by the child process
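As a sketch, a parent that has forked several children can reap them all in a loop and decode each status with the standard wait macros:

#include <stdio.h>
#include <sys/wait.h>

/* Reap every child, reporting how each one ended. */
void reap_all_children(void) {
    int status;
    pid_t pid;
    while ((pid = wait(&status)) > 0) {           /* wait returns -1 once no children remain */
        if (WIFEXITED(status))
            printf("child %d exited with status %d\n", pid, WEXITSTATUS(status));
        else if (WIFSIGNALED(status))
            printf("child %d killed by signal %d\n", pid, WTERMSIG(status));
    }
}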

Putting it all together …
int main(void) {
    printf("starting parent process\n");
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork failed");
        exit(EXIT_FAILURE);
    }
    if (pid == 0) {                              /* child */
        printf("child processing\n");
        sleep(50);
    } else {                                     /* parent */
        printf("parent forked a child with pid = %d\n", pid);
        int status;
        wait(&status);
        printf("child exited with status = %d\n", WEXITSTATUS(status));
    }
    return 0;
}

Concurrency with processes
• Benefits
  • almost as simple as sequential
    • in fact, most of the code is identical!
  • parallel execution; good CPU, network utilization
  • often better security (isolation)
• Disadvantages
  • processes are heavyweight
    • relatively slow to fork
    • context switching latency is high
  • communication between processes is complicated

Concurrency with threads
• A single process handles all of the connections
• ... but ... a parent thread forks (dispatches) a new thread to handle each connection
• the child thread:
  • handles the new connection
  • exits when the connection terminates
Note: you can create as many threads as you want (up to a system limit)

Threads
• A thread is an independent stream of instructions that can be scheduled to run as such by the operating system.
• To the software developer: the concept of a “procedure” that runs independently from its main program.
• To go one step further, imagine a main program that contains a number of procedures. Now imagine all of these procedures being able to be scheduled to run simultaneously and/or independently by the operating system. That would describe a “multi-threaded” program.
Idea: “forking” multiple threads of execution in one process!

Multiple Processes vs Multiple Threads
[Figure: with multiple processes, each process has its own code, registers, and stack; with multiple threads, the threads in one process share the code but each thread has its own registers and stack]

Threads (cont.)
[Figures: a server process calls pthread_create() for each arriving client connection; all threads share the process’s data structures; a UNIX process shown first single-threaded, then with multiple threads]

Threads (cont.)
• This independent flow of control is accomplished because each thread maintains its own:
  • stack pointer
  • registers
  • scheduling properties (such as policy or priority)
  • set of pending and blocked signals
  • thread-specific data

Thread Summary
• Exists within a process and uses the process’s resources
• Has its own independent flow of control as long as its parent process exists and the OS supports it
• Duplicates only the essential resources it needs to be independently “schedulable”
• May share the process’s resources with other threads that act equally independently (and dependently)
• Dies if the parent process dies (or something similar)
• Is “lightweight” because most of the overhead has already been paid through the creation of its process
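Mirroring the fork-based server, here is a minimal sketch of the thread-per-connection design described above (handle_client is again a hypothetical helper; error checks omitted):

#include <pthread.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

void handle_client(int fd);                       /* hypothetical per-connection logic */

void *client_thread(void *arg) {
    int client_fd = *(int *)arg;
    free(arg);                                    /* this thread owns its argument */
    handle_client(client_fd);
    close(client_fd);
    return NULL;
}

void serve_forever(int listen_fd) {
    while (1) {
        int *fdp = malloc(sizeof(int));           /* heap copy: no reuse races */
        *fdp = accept(listen_fd, NULL, NULL);
        pthread_t tid;
        pthread_create(&tid, NULL, client_thread, fdp);
        pthread_detach(tid);                      /* no join needed: thread cleans itself up */
    }
}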
Threads (cont.)
• Because threads within the same process share resources:
  • changes made by one thread to shared system resources (such as closing a file) will be seen by all other threads
  • two pointers having the same value point to the same data
  • reading and writing to the same memory locations is possible, and therefore requires explicit synchronization by the programmer
Warning: shared data between threads can cause conflicts, deadlocks, etc.

Thread control
• Parent:
  • pthread_create() (create thread)
  • wait for the thread to finish via pthread_join() (maybe)
• Thread:
  • begins at the function pointer
  • runs until it returns or calls pthread_exit()
• Library support:
  • #include <pthread.h>
  • compile with option -lpthread to link with the pthread library
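For example, a pthread program in a file named threads.c (an assumed name) would be built and run like this:

gcc -Wall -o threads threads.c -lpthread
./threads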

pthread_create()
• The pthread_create function starts a new thread in the calling process.
int pthread_create(pthread_t *thread,
                   const pthread_attr_t *attr,
                   void *(*start_routine)(void *),
                   void *arg);
• thread is a pthread library structure holding thread info
• attr is a set of attributes to apply to the thread
• start_routine is the thread function pointer
• arg is an opaque data pointer to pass to thread

Thread with no arguments
void *func(void *arg) {
    printf("Hello from thread %lx\n", pthread_self());
    return NULL;
}

int main(void) {
    pthread_t t1, t2, t3;
    printf("main thread %lx starting a new thread\n", pthread_self());
    pthread_create(&t1, NULL, func, NULL);
    pthread_create(&t2, NULL, func, NULL);
    pthread_create(&t3, NULL, func, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_join(t3, NULL);
    return 0;
}
• Always check return values (omitted for brevity)
• Thread becomes alive in pthread_create – may even run before it returns
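A sketch of what checking that return value looks like; note that pthread functions report failure by returning an error number rather than setting errno:

int rc = pthread_create(&t1, NULL, func, NULL);
if (rc != 0) {                                    /* nonzero means failure */
    fprintf(stderr, "pthread_create: %s\n", strerror(rc));  /* strerror needs <string.h> */
    exit(EXIT_FAILURE);
}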

Thread with one argument
void *func(void *arg) {
    char *s = (char *) arg;
    printf("Hello from thread %s\n", s);
    return NULL;
}

int main(void) {
    pthread_t t1, t2, t3;
    pthread_create(&t1, NULL, func, "a");
    pthread_create(&t2, NULL, func, "b");
    pthread_create(&t3, NULL, func, "b");
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_join(t3, NULL);
    return 0;
}
• Run the above program in a loop to observe indeterminate scheduling
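For example, from the shell (assuming the compiled binary is named a.out):

for i in $(seq 1 10); do ./a.out; done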

Thread with multiple arguments
typedef struct {
    int num;
    const char *str;
} foo_t;

void *func(void *arg) {
    foo_t *val = arg;
    printf("thread %lx was passed values %d, %s\n",
           pthread_self(), val->num, val->str);
    return NULL;
}

int main(void) {
    foo_t v = {5678, "bar"};
    pthread_t t;
    printf("main thread %lx starting a new thread\n", pthread_self());
    pthread_create(&t, NULL, func, &v);
    pthread_join(t, NULL);
    return 0;
}
• The above is effectively a procedure call – real programs are more complex
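One real-program pattern worth sketching: when starting several threads in a loop, each thread needs its own argument storage; passing the address of a single loop variable to every thread is a classic bug. The names below are illustrative:

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

void *worker(void *arg) {
    int id = *(int *)arg;                         /* each thread reads its own slot */
    printf("worker %d running\n", id);
    return NULL;
}

int main(void) {
    pthread_t tids[NTHREADS];
    int ids[NTHREADS];                            /* one argument slot per thread */
    for (int i = 0; i < NTHREADS; ++i) {
        ids[i] = i;                               /* NOT &i: i keeps changing as the loop runs */
        pthread_create(&tids[i], NULL, worker, &ids[i]);
    }
    for (int i = 0; i < NTHREADS; ++i)
        pthread_join(tids[i], NULL);
    return 0;
}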

pthread_join()
• The pthread_join function waits for the thread specified by thread to terminate.
int pthread_join(pthread_t thread, void **retval);
• thread is the pthread library handle identifying the thread to wait for
• retval, if non-NULL, receives the value (a void *) returned by that thread

Returning values from a thread
typedef struct {
    int num;
    char *str;
} foo_t;

void *func(void *arg) {
    foo_t *a = arg;
    foo_t *b = malloc(sizeof(foo_t));      /* heap allocation outlives the thread */
    b->num = a->num * 2;
    b->str = malloc(strlen(a->str) + 1);
    strcpy(b->str, a->str);
    for (char *p = b->str; *p; ++p)
        *p = toupper(*p);
    return b;                              /* caller must eventually free b->str and b */
}

int main(void) {
    foo_t v = {5678, "bar"}, *p;
    pthread_t t;
    printf("main thread %lx starting a new thread\n", pthread_self());
    pthread_create(&t, NULL, func, &v);
    pthread_join(t, (void **) &p);
    printf("thread returned num = %d, str = %s\n", p->num, p->str);
    return 0;
}

Returning values from a thread
typedef struct {
    int num;
    const char *str;
} foo_t;

void *func(void *arg) {
    foo_t p;          /* local variable: lives on this thread's stack */
    return &p;        /* BUG: pointer into a dead stack frame */
}

int main(void) {
    foo_t v = {5678, "bar"}, *p;
    pthread_t t;
    printf("main thread %lx starting a new thread\n", pthread_self());
    pthread_create(&t, NULL, func, &v);
    pthread_join(t, (void **) &p);
    printf("thread returned num = %d, str = %s\n", p->num, p->str);
    return 0;
}
• The above will segfault! Do not return a pointer to a stack-allocated variable

pthread_exit()
• The pthread_exit function terminates the calling thread and returns a value:
void pthread_exit(void *retval);
• retval is a pointer to a return value
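A small sketch using pthread_exit instead of return; as on the previous slides, the returned value must not live on the thread's stack, so it is heap-allocated here:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

void *func(void *arg) {
    int *result = malloc(sizeof(int));            /* heap: survives the thread */
    *result = 42;
    pthread_exit(result);                         /* equivalent to: return result; */
}

int main(void) {
    pthread_t t;
    int *p;
    pthread_create(&t, NULL, func, NULL);
    pthread_join(t, (void **) &p);
    printf("thread returned %d\n", *p);
    free(p);
    return 0;
}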

Threads accessing shared data
static int counter = 0;
void *func(void *arg) {
    for (int i = 0; i < 5000; ++i)
        ++counter;
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    printf("counter = %d\n", counter);
    pthread_create(&t1, NULL, func, NULL);
    pthread_create(&t2, NULL, func, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("both threads completed, counter = %d\n", counter);
    return 0;
}

• What will this program output?

What is happening? A race condition!
• A race condition happens when the outcome of a program depends on the interleaving of the execution of multiple threads accessing a critical section
• A critical section is a piece of code that accesses a shared variable and must not be concurrently executed by more than one thread
• ++counter compiles to three instructions (load, add, store):

mov 0x2e50(%rip),%eax        # 4014  (load counter)
add $0x1,%eax                #       (increment)
mov %eax,0x2e47(%rip)        # 4014  (store counter)
• Each instruction executes atomically, but the three-instruction sequence as a whole does not
• Multiple threads executing the above instructions can result in different interleavings (and outcomes) due to uncontrollable OS scheduling

One Possible Interleaving
[Figure: the two threads' load/add/store sequences interleave so that both read the same value of counter and one increment is lost]

Avoiding race conditions
• To avoid race conditions we need to ensure that only a single thread can execute a critical section at any given time
• For simple cases we can use atomics (#include <stdatomic.h>)
  • modifying a variable results in a single CPU instruction
• In general, however, a critical section may contain complex logic
• We need primitives for mutual exclusion – a guarantee that only one thread is executing the critical section while others are prevented from doing so
• One way to achieve mutual exclusion is using locks:

lock_t mutex
lock(&mutex)
  critical section
unlock(&mutex)

Threads accessing shared data
• Fixing race condition using atomics
#include <stdatomic.h>
static atomic_int counter = 0;
void *func(void *arg) {
    for (int i = 0; i < 5000; ++i)
        ++counter;                         /* atomic_int: each increment is atomic */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    printf("counter = %d\n", counter);
    pthread_create(&t1, NULL, func, NULL);
    pthread_create(&t2, NULL, func, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("both threads completed, counter = %d\n", counter);
    return 0;
}

Threads accessing shared data – Fixed!
• Fixing the race condition using mutexes

static int counter = 0;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *func(void *arg) {
    for (int i = 0; i < 5000; ++i) {
        pthread_mutex_lock(&lock);         /* enter critical section */
        ++counter;
        pthread_mutex_unlock(&lock);       /* leave critical section */
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    printf("counter = %d\n", counter);
    pthread_create(&t1, NULL, func, NULL);
    pthread_create(&t2, NULL, func, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("both threads completed, counter = %d\n", counter);
    return 0;
}

Threads tradeoffs
• Benefits
  • still the case that much of the code is identical!
  • parallel execution; good CPU, network utilization
  • lower overhead than processes
  • shared-memory communication is possible
• Disadvantages
  • synchronization is complicated
  • shared fate within a process; one rogue thread can hurt you
  • security (no isolation)
• We scratched the surface – more advanced usage will be taught in 473