CS计算机代考程序代写 data structure database chain compiler concurrency cache algorithm Slide 1

Slide 1

Operating Systems

Hubertus Franke

.edu

CSCI-GA.2250-001

Concurrency

Deadlocks

For illustration
purpose only

Potential Deadlock

I need quad
A and B

I need quad
B and C

I need quad
C and D

I need quad
D and A

Actual Deadlock

HALT until B
is free

HALT until C
is free

HALT until D
is free

HALT until A
is free

Inter Process Communication (IPC)

• Processes often need to work
together or at the very least share
resources.

Issues:
• Send information
• Mitigate contentions over resources
• Synchronize dependencies

Inter-Process Communication

Race Conditions:
result depends on exact order of processes running

Two processes want to access shared memory at same time:
Read is typically not an issue but write or conditional execution IS

Process will first inquire
about free slot (read), then
write to the free slot and
finally increment the free
slot index.

Intra Process Communication

• Threads often need to work together and
access resources (e.g. memory) in the
common address space.

Same Issues:
• Send information
• Mitigate contentions over resources
• Synchronize dependencies

Example: Multi-threaded Webserver

• Dispatcher thread deposits work into
queue of requests to be processed

• Worker threads will pick work from the
queue

• Read-Modify-Write cycles are typically an issue

• For instance, you read a variable , make a decision and
modify the variable.

• Examples:

• These will have problems if the same code is executed by
two threads concurrently.

• Consider the race and preemption case with 2 threads

Expectation is that code has “consistent view” at data

Read-Modify-Write Cycles

If ( var == 0 ) {

/*do something */

var = 1;

}

ldw r3, @var

cmpwi r3, 0

bne L1

stwi @var, 1

/* some code */

L1:

If ( var == 0 ) {

var = 1;

/*do something */

}

read modify-write race or preemption

Critical Regions

Four conditions to prevent errors:

1. No two processes simultaneously in critical region

2. No assumptions made about speeds or numbers of CPUs

3. No process running outside its critical region may block
another process

4. No process must wait forever to enter its critical region

Mutual Exclusion: Only one process (thread)
at a time can access any shared variables,

memory, or resources.

Critical Regions

Mutual exclusion using critical regions

Mutual Exclusion with Busy Waiting

Simplest solution?

• How about disabling interrupt?

• User program has too much privilege.

• Won’t work in SMP (multi-core systems)

Lock variable?

• If the value of the lock variable is 0 then
process sets it to “1” and enters. Other
processes have to wait.

• What’s the problem here?
–> READ-MODIFY-WRITE cycle

• Using a simple Lock Variable

• Both Threads assume now they hold the lock

Read-Modify-Write Cycles
During Locks

L1: ldw r3, @var

cmpwi r3, 0

bnz L1

stwi @var, 1

/* lock acquired */

var=0

var=1

L1: ldw r3, @var

cmpwi r3, 0

bnz L1

stwi @var, 1

/* some code */

L1:

r3 still sees “0”

Thread-1 Thread-2

Context-switch

Mutual Exclusion with Busy Waiting

Process 0 Process 1

Proposed solution to critical region problem

But that’s just a toy and in reality not useful

Peterson’s solution for achieving mutual exclusion

Mutual Exclusion with Busy Waiting

Figure 2-25. Entering and leaving a critical region
using the TSL instruction.

The TSL (Test and Set Lock) Instruction

• With a “little help” from Hardware

Figure 2-26. Entering and leaving a critical region
using the XCHG instruction.

The XCHG Instruction
(other arch have cmp_and_swap)

Load/Store Conditional
• Modern processors resolve cmp_and_swap through the

cache coherency and instructions

• Reservation (ldwx) remembers ONE address on a CPU
and verifies on (stwx) whether still held otherwise
store will fail

• Reservation is lost on interrupts or if another CPU
steals cacheline holding (see discussion in
first lecture on cache coherency) or if another lwdx is
issued.

L1: ldwx r3, @lockvar // load r3 and set reservation register on CPU with &lockvar
add r3,r3,1 // r3 = r3+1
stwx r3, @lockvar // store conditionally r3 back to lockvar if reservation is still held
bcond L1 // if store conditionally failed try again

Solutions so far

• Both TSL (xchg/cmpswp) and
Peterson’s solutions are correct, but
they

– rely on busy-waiting during “contention”

– waste CPU cycles

– Priority Inversion problem

Priority Inversion Problem

• Higher priority process can be
prevented from entering a critical
section (CS) because the lock variable is
dependent on a lower priority process.

– Priority P1 < Priority P2 – P1 is in its CS but P1 is never scheduled. Since P2 is busy in busy-waiting using the CPU cycles Lock Contention • Lock Contention arises when a process/thread attempts to acquire a lock and the lock is not available. • This is a function of – frequency of attempts to acquire the lock – Lock hold time (time between acquisition and release) – Number of threads/processes acquiring a lock • Lock contention is a function of lock hold time and lock acquisition frequency. • If lock contention is low, TSL is an OK solution. • The linux kernel uses it extensively for many locking scenarios. • Alternative approaches will be discussed next. Concurrency • Reminder – Concurrency is having multiple contexts of execution not necessarily running at the exact same time. – Parallelism is having multiple contexts of execution running at the exact same time. Producer-Consumer problem Buffer with N slots • Producer: (1 .. j) – How do I know a slot is free ? – How do I know a slot became free ? • Consumer: (1 .. k) – How do I know nothing is available? – How do I know something became available ? P1 PJ : C1 CK :buffer How can mutual exclusion solutions be used in every day OSs? • Example: – UNIX: ls –ls | grep “yooh” | awk ‘{ print $1 }’ /usr/bin/ls -ls /usr/bin/grep “yooh” /usr/bin/awk ‘print $1’ Process-1 Process-2 Process-3 Pipe #1 Pipe #2 Kernel Kernel-Obj Kernel-Obj Terminal Output • Responsibility of a Pipe – Provide Buffer to store data from stdout of Producer and release it to stdin of Consumer – Block Producer when the buffer is full (because consumer has not consumed data) – Block Consumer if no data in buffer when the consumer wants to read(stdin) – Unblock Producer when buffer space becomes free – Unblock Consumer when buffer data becomes available Pipes • Pipes not just for stdin and stdout • Pipes can be created by applications • All kind of usages for pipes. • PipeBuffer typically has 16 write slots • 4KB guaranteed to be atomic • ➔ 64KB Not sure why the book put a while loop here, we just assume that these function produce/consume one elem Pre-emption!!! Count == 0 Called after pre-emption but before sleep in the consumer Fatal Race Condition 1. Let count = 1 2. Consumer begins loop, decrements count == 0 3. Consumer returns to loop beginning and executes: if(count == 0), then pre-emption happens. 4. Producer gets to run, executes count = count + 1; if(count == 1) and calls wakeup(consumer) 5.Pre-emption Consumer calls sleep(consumer) Requirements • Need a mechanism that allows synchronization between processes/threads on the base of shared resource. • Synchronization implies interaction with scheduling sub-system. • Led to the innovation of semaphores. Semaphore Data Structure class Semaphore { int value; // counter Queue waiting; // queue of threads

// waiting on this sema

void Init(int v); // initialization

void P(); // down(), wait()

void V(); // up(), signal ()

}

Initially developed as a “construct” to ease programming.

Dijkstra 1965

This version is based on multi-threaded capable OS or thread library
Older version used Process as the object , same principle

Semaphore implementations

void Semaphore::Init(int v)

{

value = v;

Queue.init(); // empty queue

}

Semaphore implementations

void Semaphore::P() // or wait() or down()

{

value = value – 1;
if (value < 0) { waiting.add(current_thread); current_thread.status = blocked; schedule(); // forces wait, thread blocked } } AKA “acquiring or grabbing the semaphore” think of it as obtaining access rights Semaphore implementations void Semaphore::V() // or signal() or up() { value = value + 1; if (value <= 0) { Thread *thd = waiting.getNextThread(); scheduler->add(thd); // make it scheduable

}

AKA “releasing the semaphore”

Semaphore Solution

How do P() and V() avoid the race condition?

P() and V() must be atomic.

e.g.

• First line of P() & V() can disable interrupts.

• Last line of P() & V() re-enables interrupts.

However: disabling interrupts only works on single CPU systems
Atomic lock variable an option on entry and exit,

but must release the lock as part of call to schedule() in P()
or must reacquire the lock as part of call to schedule() in V()

Semaphore Data Structure
(with atomicity)

class Semaphore

{

int lockvar; // to guarantee atomicity

int value; // counter

Queue waiting; // queue of threads

// waiting on this sema

void Init(int v); // initialization

void P(); // down(), wait()

void V(); // up(), signal ()

}

Semaphore implementations

void Semaphore::P() // or wait() or down()

{

lock(&lockvar);
value = value – 1;
if (value < 0) { waiting.add(current_thread); current_thread.status = blocked; unlock(&lockvar); schedule(); // forces wait, thread blocked lock(&lockvar); } } Semaphore implementations void Semaphore::V() // or signal() or up() { lock(&lockvar); value = value + 1; if (value <= 0) { Thread *thd = waiting.getNextThread(); scheduler->add(thd); // make it scheduable

}

unlock(&lockvar);

}

Two kinds of semaphores

• Mutex semaphores
(or binary semaphores or LOCK):
for mutual exclusion problems:
value initialized to 1

• Counting semaphores:
for synchronization problems.
Value initialized to any value 0..N
Value shows available tokens to enter
or number or processes waiting when
negative.

Semaphore Solution To the Producer

Consumer Problem

Producer(T item)

{

P(&empty);

P(&mutex); // Lock

buffer[widx] = item;

widx = (widx + 1 ) % N;

V(&mutex); // Unlock

V(&full);

}

Consumer(T &item)

{

P(&full);

P(&mutex); // Lock

item = buffer[ridx];

ridx = (ridx + 1 ) % N;

V(&mutex); // Unlock

V(&empty);

}

#define N

Semaphore empty = N;

Semaphore full = 0; 3 Semaphores required

Semaphore mutex = 1;

T buffer[N];

int widx = 0, ridx = 0;

LOCK

signaling

Semaphore Solution To the Producer

Consumer Problem

Producer(T item)

{

P(&empty);

P(&mutex_w); // Lock

buffer[widx] = item;

widx = (widx + 1 ) % N;

V(&mutex_w); // Unlock

V(&full);

}

Consumer(T &item)

{

P(&full);

P(&mutex_r); // Lock

item = buffer[ridx];

ridx = (ridx + 1 ) % N;

V(&mutex_r); // Unlock

V(&empty);

}

#define N

Semaphore empty = N;

Semaphore full = 0; 4 Semaphores for lower

Semaphore mutex_w = 1; lock contention

Semaphore mutex_r = 1;

T buffer[N];

int widx = 0, ridx = 0;

LOCK

signaling

Semaphore Solution To the Producer

Consumer Problem

Producer(T item)

{

P(&empty);

P(&mutex_w); // Lock

int wi = wdix;

widx = (widx + 1 ) % N;

V(&mutex_w); // Unlock

buffer[wi] = item;

V(&full);

}

Consumer(T &item)

{

P(&full);

P(&mutex_r); // Lock

int ri = rdix;

ridx = (ridx + 1 ) % N;

V(&mutex_r); // Unlock

item = buffer[ri];

V(&empty);

}

#define N

Semaphore empty = N;

Semaphore full = 0; 4 Semaphores for lower

Semaphore mutex_w = 1; lock contention

Semaphore mutex_r = 1;

T buffer[N];

int widx = 0, ridx = 0;

LOCK

signaling

Shorter lock hold times by only protecting the indices not the buffer themselves

Nasty condition on wi

This example doesn’t work; too aggressive !!!!!

Figure 2-29. Implementation of mutex lock and mutex unlock.

Mutex Implementation

Figure 2-30. Some of the Pthreads calls relating to mutexes.

Mutexes in Pthreads

Figure 2-31. Some of the Pthreads calls relating
to condition variables.

Mutexes in Pthreads

Using threads to solve
the producer-consumer

problem.

Mutexes in Pthreads

Problems with semaphore?

– It can be difficult to write semaphore code
(arguably)

– One has to be careful in code construction

– If a thread dies and it holds a semaphore,
the implicit token is lost

Deadlock-free code Code with potential deadlock

Some Advice for locks

• Always acquire multiple locks in the same
order

• Preferably release in reverse order as
acquired: not required but good hygiene

lots of discussion on this, there are scenarios where
that is not desired but highly optimized
implementation

• Example: SMP CPU scheduler
where load balancing is required.

Example

doit()

{

s1.P();

{

s2.P();

“do something awesome”

s2.V();

}

s1.V();

}

doit()

{

s1.P();

s2.P();

“do something awesome”

s1.V();

s2.V();

}

doit()

{

s1.P();

s2.P();

“do something awesome”

s2.V();

s1.V();

}

OK Better Making nesting clearer

Example

doit()

{

s1.P();

do {

s2.P();

do {

“do something awesome”

break;

“do something else awesome”

} while(0);

s2.V();

} while (0);

s1.V();

} Making nesting clearer

Example: SMP Scheduler

• Each cpu_i has ( runQ[i] and rqLock[i] )

• cpu0 ( i=0, j=1 ) ; cpu1 ( i=1, j=0 )

• Can lead to deadlocks → must use same order

CPU-0

runQ[0]

rqLock[0]

CPU-1

runQ[1]

rqLock[1]

Schedule(int i)

{

lock(rqlock[i]);

{ // load balance with j

lock(rqlock[j]);

; // pull some threads

unlock(rqlock[j]);

}

unlock(rqlock[i]);

}

Example SMP Scheduler

• So how do we get correct order?

• We force it !!
if necessary release owned lock first and reaquire.

• We just need some order ( could be “<“ or based on addresses, it doesn’t matter ) Schedule(int i) { lock(rqlock[i]); : { // load balance with j add_lock(i,j); ; // pull some threads unlock(rqlock[j]); } : unlock(rqlock[i]); } add_lock(int hlv, int alv) { if ( hlv > alv ) {

// first unlock hlv

unlock(rqlock[hlv]);

lock(rqlock[alv];

lock(rqlock[hlv];

} else {

lock(rqlock[alv]);

}

So which lock should I use

• Busy Lock vs. Semaphores ?

• If lock hold time is short and/or code is
uninterruptible, lock variables and busy
waiting are OK ( linux kernel uses it all
the time )

• Else use semaphores

Other Unix/Linux Mechanisms

• File based:

flock()

• System V semaphores
heavy weight as each call is a system call going into the kernel

semget(), semop() [ P and V ]

• Futexes

Lighter weight as uncontested cases are resolved done using cmpxchg
in userspace and if race condition is recognized it goes to kernel

futex()

• Message queues

mq_open(),mq_close(),mq_send(), mq_receive()

Monitors

⚫ Hoare and Brinch Hansen proposed a higher-level synchronization
primitive: monitor

⚫ Only ONE thread allowed inside the Monitor

⚫ Compiler achieves Mutual Exclusion

⚫ Monitor is a programming language construct like a
class or a for-loop.

⚫ We still need a way to synchronize on events!

⚫ Condition variables: wait & signal

⚫ Not counters: signaling with no one waiting → event lost

⚫ Waiting on signal releases the monitor and wakeup reacquires it.

Figure 2-33. A monitor.

Monitor Example

Monitors

• What to do when a thread A in a
monitor signals a thread B sleeping on a
condition variable?

– Ensure that signal is the last command in the
monitor?

– Block A while B runs?

– Let A finish then run B?

Figure 2-34. An outline of the producer-consumer problem with
monitors.

Monitors

Figure 2-36. The producer-consumer problem with N messages.

Producer-Consumer Problem
with Message Passing

. . .

Figure 2-36. The producer-consumer problem with N messages.

Producer-Consumer Problem
with Message Passing

Message Passing

▪ No shared memory on distributed

systems…

▪Instead we can send LAN messages.

send(destination, &message);

receive(destination, &message);

Message Passing Issues

• Guard against lost messages (acknowledgement)

• Authentication (guard against imposters)

• Addressing :

• To processes

• Via Mailbox (place to buffer messages)

• Send to a full mailbox means block

• Receive from an empty mailbox means block

Message Passing Issues

• What about buffer-less messages?

• Send and Receive wait (block) for each

Related Posts