Summary of Lecture 5
• Three ways to get from user code to kernel code:
– System calls, interrupts, exceptions
– All mediated through the Interrupt Descriptor Table
• Interrupt handler and other code shares various stuff:
– Register, EFLAGS (push on stack)
– Memory (use a separate stack)
– Shared state:
• Data structures (linked list example)
• Device state (VGA example)
• Memory (is_device_locked)
© 2021 Yih-Chun Hu ECE391
Summary of Lecture 5
• Solving these issues:
– CLI / STI – disable interrupts
– Define shared variables as volatile
– Prevent compiler and ISA reorderings
• Google: “memory barrier” and “serializing instruction”
© 2021 Yih-Chun Hu ECE391
ECE391
• Multiprocessors and locks
• Spin locks
Lecture Topics
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
• We solved the critical section problem for uniprocessors
• What about multiprocessors?
– CLI … critical section …. STI
• What is a multiprocessor?
– usually a symmetric multiprocessor (SMP)
– symmetric aspect: all processors have equal latency to all
memory banks
– multicore processors are similar from our perspective
Multiprocessors and Locks
P P P P
bus
I/O
I/O bus
M M
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
• Some non-uniform memory architecture (NUMA)
machines were built
Multiprocessors and Locks (cont.)
P P P P
bus
I/O
I/O bus
M M
com PPPP
bus
I/O
I/O bus
MM
com
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
• Multithreaded code is not protected by IF
• Why haven’t we solved the atomicity problem on
multiprocessors?
– interrupts are masked if IF is cleared!
– answer: IF is not cleared on other processors!
– just tell other processors to clear IF, too?
• too slow
• requires an interrupt!
• We need to use shared memory to synchronize…
Multiprocessors and Locks (cont.)
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
• Logically, we use a lock
– when we want to access a piece of shared data
we first look at the lock
– if it’s locked, we wait until it’s unlocked
– once it’s unlocked
• we lock it
• access the data
• then unlock it
Multiprocessors and Locks (cont.)
shared
data
(locked state is
often represented
as 1)
shared
data
(unlocked state is
often represented
as 0)
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
• Locking must be atomic with respect to other
processors!
INCL (%EAX)
read EAX
add 1
put it back
Multiprocessors and Locks (cont.)
PP
M read
write
It’s MY BUS!
LOCK: Prefix – execute
following instruction with bus
locked
LOCK INCL (%EAX)
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
• The simplest lock
– spin lock
– lock op
do {
try to change lock variable from 0 to 1
} while (attempt failed);
– other work to do? ignore it!
– spin in a tight loop on the lock (hence the name)
• Once successful, program/interrupt handler owns the lock
Spin Locks
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
• Only the owner can unlock
– How?
• (change lock variable to 0)
– Need to be atomic?
• (no, only owner can change when locked)
– What about memory op reordering?
• (must be avoided!)
• Why keep asking the last question?
– reordering leads to subtle bugs
– unlikely to happen often
– unlikely to be repeatable
– very hard to find!
Spin Locks (cont.)
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
• Static initialization
static spinlock_t a_lock = SPIN_LOCK_UNLOCKED;
• Dynamic initialization
spin_lock_init (&a_lock);
• When is dynamic initialization safe?
– lock must not be in use (race condition!)
– other synchronization method must prevent use
Linux Spin Lock API
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
void spin_lock (spinlock_t* lock);
void spin_unlock (spinlock_t* lock);
Linux Spin Lock API – Basic Functions
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
int spin_is_locked (spinlock_t* lock);
returns 1 if held, 0 if not, but beware of races!
int spin_trylock (spinlock_t* lock);
make one attempt; returns 1 on success, 0 on failure
void spin_unlock_wait (spinlock_t* lock);
wait until available (race condition again!)
Linux Spin Lock API – Testing Functions
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
• Is spinlock enough to protect a critical section?
Linux Spin Lock API (cont.)
P1 P2
try to get lock
(forced to wait)
interrupt (P2)
code
lock
gets lock
crit. sect.
unlock
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
• What about ?
• Still need CLI/STI
• Which is first, CLI or lock?
– CLI first
– interrupt may occur between them, leading to scenario
above
Linux Spin Lock API (cont.)
P1
try to get lock
(waits forever!)
interrupt (P1)
code
lock
called a
deadlock
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
static spinlock_t the_lock = SPIN_LOCK_UNLOCKED;
unsigned long flags;
spin_lock_irqsave (&the_lock, flags);
/* the critical section */
spin_unlock_irqrestore (&the_lock, flags);
asm volatile (”
PUSHFL
POPL %0
CLI
” : “=g” (flags) /* outputs */
: /* inputs */
: “memory” /* clobbers */
);
spin_lock (&the_lock);
Linux’ Lock/CLI Combo
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
spin_unlock (&the_lock);
asm volatile (”
PUSHL %0
POPFL
” : /* outputs */
: “g” (flags) /* inputs */
: “memory”, “cc” /* clobbers */
);
Linux’ Lock/CLI Combo (cont)
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
• Notice that spin_lock_irqsave changes the flags argument (it’s a macro)
• The “memory” argument
– tells compiler that all memory is written by assembly block
– prevents compiler from moving memory ops across assembly
• The “cc” argument
– condition codes change
– can lead to subtle bugs if left out!
• spin_lock and spin_unlock calls
– become NOPs on uniprocessors
– in that case, calls just change IF
• Restore rationale: may have had IF=0 on entry; if so, STI is unsafe
Comments on code
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
• Synch issues
• Conservative synchronization design
• Semaphores
• Reader/writer synchronization
• Selecting a synchronization mechanism
Lecture Topics
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
• Synchronization issues
– five hungry philosophers
– five chopsticks
• Protocol
– take left chopstick (or wait)
– take right chopstick (or wait)
– eat
– release right chopstick
– release left chopstick
– digest
– repeat
problems? deadlock!
Another Philosophy Lesson
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
• How about the following protocol?
– take left chopstick (or wait)
– if right chopstick is free, take it
– else release left chopstick and start over
– eat
– release right
– release left
– digest
– repeat
• Does this work?
Another Philosophy Lesson (cont.)
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
• What if all philosophers act in lock-step (same speed)?
left left left left left
release release release release release
left left left left left
release release release release release
left left left left left
(ad infinitum)
• Called a livelock
Another Philosophy Lesson (cont.)
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
• To solve the problem, need (partial) lock ordering
– e.g., call chopsticks #1 through #5
– protocol: take lower-numbered, then take higher-numbered
– two philosophers try to get #1 first
– can’t form a loop of waiting philosophers
– thus someone will be able to eat
Another Philosophy Lesson (cont.)
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
• Getting synchronization correct can be hard
– it’s the focus of several research communities
• On uniprocessor
– mentally insert handler code
between every pair of adjacent
instructions
– ask whether anything bad can
happen
– if so, prevent with CLI/STI
Conservative Synchronization Design
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
• On a multiprocessor
– consider all possible interleavings
of instructions
– amongst all pieces of (possibly
concurrent) code
– ask whether anything bad can
happen
– if so, use a lock to force
serialization
– good luck!
Conservative Synchronization Design (cont)
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
• What does “bad” mean, anyway?
• A conservative but systematic definition
if any data written by one piece of code
are also read or written by another piece of code
these two pieces must be atomic with respect to each other
Conservative Synchronization Design (cont)
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
• What variables are shared?
• step 0: ignore the parts that don’t touch shared data
• step 1: calculate read & write sets
• step 2: check for R/W, W/W relationships
– must be atomic!
• step 3: add lock(s) to guarantee atomic execution (pick order if > 1 locks)
• step 4: optimize if desired
Conservative Synchronization Design (cont)
not shared
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
• Read/write sets
– move reads position, velocity
writes position
– dist reads position
– speed reads velocity
– momentum reads mass, velocity
– stop writes velocity
– change_mass writes mass
• Edges in graph imply need for atomicity
Conservative Synchronization Design –
Example Code Analysis
momentum
dist
speed
move
stop change_mass
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
• Each lock = circle in graph
• All edges must be contained in some circle
• One lock suffices, but prevents parallelism (performance)
• Could use three (as shown above);
then MUST pick a lock order!
Conservative Synchronization Design –
Example Code Analysis (cont)
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
• Recall our philosophical friends
• Five philosophers
• Three pairs of chopsticks
(one “lock” per pair)
• Problem: how do you get a pair?
• Option 1: walk around the table until you find a pair free
– lots of walking
– other people may cut in front of you
Role of Semaphores
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
• Option 2: pick a target pair and wait for it to be free
– other pairs may be on the table
– but you’re still waiting hungrily for your chosen pair
• Instead, use a semaphore!
– an atomic counter
– proberen (P for short, meaning test/decrement)
– verhogen (V for short, meaning increment)
– Dutch courtesy of E. Dijkstra
Role of Semaphores
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
• When are semaphores useful?
• Fixed number of resources to be allocated dynamically
– physical devices
– virtual devices
– entries in a static array
– logical channels in a network
• Linux semaphores have a critical difference from
Linux spin locks
– can block (sleep) and let other programs run
(spin locks do not block)
– thus must not be used by interrupt handlers
– used for system calls, particularly with long critical sections
Semaphores
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
void sema_init (struct semaphore* sem, int val);
Initialize dynamically to a specified value
void init_MUTEX (struct semaphore* sem);
Initialize dynamically to value one (mutual exclusion)
void down (struct semaphore* sem);
Wait on the semaphore (P)
void up (struct semaphore* sem);
Signal the semaphore (V)
Linux Semaphore API
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
• If critical section needs both semaphores and spin locks
– Get all semaphores first!
– Linux expects not to be preempted while holding spin locks
– Semaphore code voluntarily relinquishes processor
Linux Semaphore API (cont.)
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
• Philosophers like to read newspaper
– each philosopher reads frequently, taking short breaks between
– multiple philosophers may read at same time
• different sections
• or just over another’s shoulder
• Paper carrier delivers new paper
– once per day (infrequently)
– must change all sections at once
• Reader/writer synchronization supports this style
– allows many (in theory infinite) readers simultaneously
– at most one writer (and never at same time as readers)
Reader/Writer Problem:
The Philosophers and Their Newspaper
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
• What if newspaper is always being read? starvation!
• Linux provides two types of reader/writer synchronization
– reader/writer spin locks (rwlock_t)
• extension of spin locks
• use for short critical sections
• ok to use in interrupt handlers
• admit writer starvation (you must ensure that this not happen)
– reader/writer semaphores (struct rw_semaphore)
• extension of semaphores
• use only in system calls
• do not admit writer starvation
(new readers wait if writer is waiting)
Reader/Writer Problem:
The Philosophers and Their Newspaper (cont)
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
• General questions for synchronization of shared resources
– What are the operations?
– What are their frequencies?
– example:
• walk list and find min. age?
• update min. age every time list changes?
– What are the shared data?
• Goals
– short critical sections
– low contention for data
Selecting a Synchronization Mechanism
© Steven Lumetta, Zbigniew Kalbarczyk ECE391
• Safe selection (when in doubt…)
Selecting a Synchronization Mechanism (cont)
mutual exclusion reader/writer
data shared by
interrupt handlers
spin_lock_irqsave
spin_unlock_irqrestore
read/write__lock_irqsave
read/write_unlock_irqrestore
data shared only
by system calls
down
up
down_read/write
up_read/write
© Steven Lumetta, Zbigniew Kalbarczyk ECE391