Mutual Exclusion and Quorums
CS 171
MUTUAL EXCLUSION & QUORUMS
1
CS 171
Distributed Mutual Exclusion
• Given a set of processes and a single
resource, develop a protocol to ensure
exclusive access to the resource by a single
process at a time.
• This is a fundamental operation in operating
systems, and is generalized to locking in
databases.
2
CS 171
Centralized algorithm
• One process elected as coordinator
P
Crequest(R)
grant(R)
1. Request resource
2. Wait for response
3. Receive grant
4. access resource
5. Release resource
release(R)
3
CS 171
Centralized Solution
• But we can have multiple concurrent requests!
• Coordinator maintains queue of pending requests.
• Protocol:
– Process send request to coordinator.
– If no other request, coordinator sends back reply.
• Otherwise, put request in queue
– On receipt of reply, process accesses resource.
– Once done, process sends release to coordinator.
– On receipt of release, coordinator checks queue for any
pending requests.
4
CS 171
Centralized Solution
P0
C
request(R)
reply(R)
release(R) P1
P2
request(R)
Queue
P1
request(R)
P2
reply(R)
5
thanks paul krzyzanowski rutgers
CS 171
Centralized Approach
• Does it satisfy requests in order the requests
are made?
6
CS 171
Lamport’s Distributed Solution
• Instead of a central coordinator, all processes
collectively
• Use similar approach, but engage ALL
processes.
• Each request is time stamped with:
– Lamport Logical Time + Identifier (process ID)
– Ie, Timestamps are totally-ordered Lamport pairs.
7
CS 171
Lamport’s Distributed Solution
– Process sends time stamped request to all processes
and puts request in local queue.
– On receipt of request
– process puts request in queue and sends back reply.
– Process accesses resource
• On receipt of all replies
• Own request at head of queue
– Once done, process sends release to all processes.
– On receipt of release, process removes request
8
CS 171
Distributed Solution
• Does this work (Lamport original solution)?
• Need to order queues so they are identical:
– Use logical Lamport time + proc id to break ties.
– FIFO channels
• Requests are executed in causal order.
– Why?
9
CS 171
Example
10
P0 P1
P2 P3
Assume that process P0 and process P2 need to access the
shared resource at the same time.
Queue
Queue
request P0
request
P2
request
P0
request
P2
Queue
request
P0
Queue
request
P2
P2
P0
CS 171
Example
11
P0 P1
P2 P3
Using process id to break ties, process P0’s request will
end up on top of the queue, over process P2’s.
reply
reply
reply
reply
reply
Queue
P0 P2
Queue
P0 P2
Queue
P0 P2
Queue
P0 P2
reply
P0
P2
CS 171
Example
12
P0 P1
P2 P3
Process P0 is done and broadcasts a release message.
release
release
release
Queue
P0 P2
Queue
P0 P2
Queue
P0 P2
Queue
P2
P2
CS 171
Example
13
P0 P1
P2 P3
When process P2 is done with the shared resource, it
also broadcasts the release message.
release
release
release
Queue
P2
Queue
P2
Queue
Queue
P2
CS 171
Complexity of Lamport’s Solution
• This algorithm creates 3(N − 1) messages per
request:
• (N − 1) total number of requests
• (N − 1) total number of replies
• (N − 1) total number of releases
• Can we reduce the number of messages while
maintaining the same guarantees?
14
CS 171
Quorums
• What if there are failures?
• Mostly network failures
• No failure of the resource holder
• Do we need to communicate with ALL
processes?
• A quorum is the minimum number of votes that
a process has to obtain in order to be allowed
access to the shared resource.
15
CS 171
Quorums
• Any two requests should have a common
process to act as an arbitrator.
• Vi is called a quorum.
• Let process pi (pj) request permission from Vi (Vj), then
Vi ⋂ Vj ≠ Ø
16
CS 171
Quorums
• Given N processes: |Vi| > N/2, ie,
• In general, majority, ie ⌈(N/2)⌉.
[Gifford 79]
17
Basic MUTEX Protocol
• Requesting CS
– process requests CS by sending request message to processes in
its quorum
– If a process receives a request it sends back reply unless it
granted permission to other process; in which case the request
is queued—similar to granting a lock
• Entering CS
– process may enter CS when it receives replys from all processes
in its quorum
• Releasing CS
– after exiting CS process sends release to every process in its
quorum—release lock
– when a process gets release it sends reply to another request in
its queue
CS 171
Example
19
P0 P1
P2 P3
Assume that process P0 and process P2 need to
access the shared resource at the same time.
P4
Quorums:
V0: {P0, P1, P4}
V2: {P2, P3, P4}
Queue
request
P0
Queue
request
P2
Queue
P2
Queue
P0
Queue
request
P0
request
P2
CS 171
Example
20
P0 P1
P2 P3
Process P4 is the arbitrator since it’s common in both
V0 and V2 quorums.
reply
P4
reply
replyQuorums:
V0: {P0, P1, P4}
V2: {P2, P3, P4}
Queue
P0
Queue
P2
Queue
P2
Queue
P0
Queue
P0 P2
CS 171
Example
21
P0 P1
P2 P3
Process P0 sends the release message to its quorum.
release
release
P4
Quorums:
V0: {P0, P1, P4}
V2: {P2, P3, P4}
Queue
P0
Queue
P2
Queue
P2
Queue
Queue
P0 P2
reply
CS 171
Example
22
P0 P1
P2 P3
Process P2 finishes and broadcasts to its quorum
the release message.
release
release
P4
Quorums:
V0: {P0, P1, P4}
V2: {P2, P3, P4}
Queue
Queue
P2
Queue
Queue
Queue
P2
CS 171
Deadlock
• Unfortunately, the quorums approach can
lead to Deadlocks.
• Deadlocks occur even with a single resource.
• Not the standard database or operating
system multi-object deadlock.
23
CS 171
Deadlock Example
24
P0 P1
P2 P3
Assume that process P0 and process P2 need to
access the shared resource at the same time.
Quorums:
V0: {P0, P1, P3}
V2: {P2, P1, P3}
Queue
request
P0
Queue
request
P2
Queue
P2
Queue
P0
request
P2
request
P0
CS 171
Deadlock Example
25
P0 P1
P2 P3
P1 and P3 don’t agree on which process should
get the resource.
reply
reply
Queue
P2
Queue
P0
Quorums:
V0: {P0, P1, P3}
V2: {P2, P1, P3}
Queue
P0
Queue
P2
P2
P0
CS 171
Site Failure Example
26
P0 P3
P1 P2
How many sites can we afford to go down?
P4
We can achieve quorum!
CS 171
Site Failure Example
27
P0 P3
P1 P2
How many sites can we afford to go down?
P4
We can not achieve quorum!
No majority possible.
CS 171
Network failure: Partitioning
28
P0 P3
P1 P2
If sites are unreachable, quorum might still be
possible.
P4
Here we can achieve quorum!
This partition won’t be able to get the resource.
CS 171
Network failure: Partitioning
29
P0 P3
P1 P2
If sites are unreachable, quorum might still be
possible.
P4
No partition can now get the resource.
CS 171
Studying the Sizes of Quorums
• Can we have quorums of sizes LESS than
majority?
• Yes:
– Using logical structures
30
CS 171
Quorum Sizes on Logical Grid Structure
31Quorum size is O(√N)