Week 8 – Coordination and Agreement
Dr Bailin Deng
Motivation
• Processes within a distributed system need to coordinate
actions or agree on values
• Example: different computers controlling a spaceship
• Agreement on current mission status: has it been aborted?
• Coordinate actions regarding shared resources (sensors etc.)
• Avoid a fixed master-slave relationship: simple to manage, but it creates a
single point of failure
Overview
• Distributed mutual exclusion
• Election
Distributed Mutual Exclusion
• Prevent concurrent access to shared resource/data
Distributed Mutual Exclusion
• Requirements:
• ME1 (safety): at most one process may execute in the critical
section (CS) at a time
• ME2 (liveness): requests to enter and exit the critical section
eventually succeed (no deadlock or starvation)
• ME3 (happened-before ordering): if one request to enter the CS
happened-before another, then entry to the CS is granted in that
order.
Happened-before Ordering
• Events within a single process can be ordered using the local clock
• We cannot order an arbitrary pair of events in a distributed system,
because we cannot synchronize all local clocks perfectly
• Only a partial ordering of some events is possible, based on physical causality
Happened-before Ordering
• Physical causality
From Coulouris et al., Section 14.4 (Logical time and logical clocks):

From the point of view of any single process, events are ordered uniquely by times shown on the local clock. However, as Lamport [1978] pointed out, since we cannot synchronize clocks perfectly across a distributed system, we cannot in general use physical time to find out the order of any arbitrary pair of events occurring within it.

In general, we can use a scheme that is similar to physical causality but that applies in distributed systems to order some of the events that occur at different processes. This ordering is based on two simple and intuitively obvious points:

• If two events occurred at the same process pi (i = 1, 2, ..., N), then they occurred in the order in which pi observes them – this is the single-process order →i.

• Whenever a message is sent between processes, the event of sending the message occurred before the event of receiving the message.

Lamport called the partial ordering obtained by generalizing these two relationships the happened-before relation. It is also sometimes known as the relation of causal ordering or potential causal ordering.

We can define the happened-before relation, denoted by →, as follows:

HB1: If there is a process pi with e →i e′, then e → e′.
HB2: For any message m, send(m) → receive(m) – where send(m) is the event of sending the message, and receive(m) is the event of receiving it.
HB3: If e, e′ and e″ are events such that e → e′ and e′ → e″, then e → e″.

Thus, if e and e′ are events and e → e′, then we can find a series of events e1, e2, ..., en occurring at one or more processes such that e = e1 and e′ = en, and for i = 1, 2, ..., n−1 either HB1 or HB2 applies between ei and ei+1. That is, either they occur in succession at the same process, or there is a message m such that ei = send(m) and ei+1 = receive(m). The sequence of events e1, e2, ..., en need not be unique.

The relation → is illustrated for the case of three processes, p1, p2 and p3, in Figure 14.5 (events a, b at p1; c, d at p2; e, f at p3; message m1 sent at b and received at c; message m2 sent at d and received at f; all drawn against physical time). It can be seen that a → b, since the events occur in this order at process p1 (a →1 b), and similarly c → d. Furthermore, b → c, since these events are the sending and reception of message m1, and similarly d → f.
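The definitions HB1–HB3 lend themselves to a mechanical check: treat events as nodes of a directed graph with program-order edges (HB1) and send/receive edges (HB2); transitivity (HB3) then makes e → e′ equivalent to reachability. Below is a minimal illustrative sketch (my own, not from the lecture or the textbook), with event names taken from Figure 14.5:

```python
# Sketch (illustration only): happened-before as graph reachability.
from collections import defaultdict

def happened_before(process_events, messages):
    """process_events: one list of events per process, in local order.
    messages: (send_event, receive_event) pairs.
    Returns a predicate hb(e1, e2) deciding whether e1 -> e2."""
    edges = defaultdict(set)
    for events in process_events:            # HB1: program order within a process
        for a, b in zip(events, events[1:]):
            edges[a].add(b)
    for snd, rcv in messages:                # HB2: send(m) -> receive(m)
        edges[snd].add(rcv)

    def hb(e1, e2):                          # HB3: transitivity = reachability
        stack, seen = [e1], {e1}
        while stack:
            for nxt in edges[stack.pop()]:
                if nxt == e2:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    return hb

# Figure 14.5: a, b at p1; c, d at p2; e, f at p3; m1 = (b, c), m2 = (d, f)
hb = happened_before([["a", "b"], ["c", "d"], ["e", "f"]],
                     [("b", "c"), ("d", "f")])
assert hb("a", "f")                            # a -> b -> c -> d -> f
assert not hb("a", "e") and not hb("e", "a")   # a and e are concurrent
```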
Distributed Mutual Exclusion
• Central server algorithm
• Ring-based algorithm
• Multicast and logical clocks
Central Server Algorithm
• Processes send messages to the server to request a "token"
• A process granted the token can enter the critical section, and releases the token
after exiting the critical section
From Coulouris et al., Section 15.2:

ME3: (→ ordering) If one request to enter the CS happened-before another, then entry to the CS is granted in that order.

If a solution grants entry to the critical section in happened-before order, and if all requests are related by happened-before, then it is not possible for a process to enter the critical section more than once while another waits to enter. This ordering also allows processes to coordinate their accesses to the critical section. A multi-threaded process may continue with other processing while a thread waits to be granted entry to a critical section. During this time, it might send a message to another process, which consequently also tries to enter the critical section. ME3 specifies that the first process be granted access before the second.

We evaluate the performance of algorithms for mutual exclusion according to the following criteria:

• the bandwidth consumed, which is proportional to the number of messages sent in each entry and exit operation;

• the client delay incurred by a process at each entry and exit operation;

• the algorithm's effect upon the throughput of the system. This is the rate at which the collection of processes as a whole can access the critical section, given that some communication is necessary between successive processes. We measure the effect using the synchronization delay between one process exiting the critical section and the next process entering it; the throughput is greater when the synchronization delay is shorter.

We do not take the implementation of resource accesses into account in our descriptions. We do, however, assume that the client processes are well behaved and spend a finite time accessing resources within their critical sections.

The central server algorithm • The simplest way to achieve mutual exclusion is to employ a server that grants permission to enter the critical section. Figure 15.2 (a server managing a mutual exclusion token for processes p1–p4: 1. request token, 2. release token, 3. grant token, with a queue of waiting requests) shows the use of this server. To enter a critical section, a process sends a request message to the server and awaits a reply from it.
Central Server Algorithm
• The server queues an incoming request while the token is held by another
process, and grants the token in FIFO order when the token is released
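As a concrete illustration of the server logic described above, here is a minimal single-threaded sketch (my own, not from the lecture or the textbook; message transport is abstracted as a send callback):

```python
# Sketch (illustration only): the central server's token-management logic.
from collections import deque

class MutexServer:
    def __init__(self, send):
        self.send = send          # send(pid, msg): deliver msg to process pid
        self.holder = None        # pid currently holding the token, or None
        self.queue = deque()      # FIFO queue of waiting pids

    def on_request(self, pid):
        if self.holder is None:   # token free: grant immediately
            self.holder = pid
            self.send(pid, "grant")
        else:                     # token busy: queue the request
            self.queue.append(pid)

    def on_release(self, pid):
        assert pid == self.holder
        if self.queue:            # grant to the oldest waiting request
            self.holder = self.queue.popleft()
            self.send(self.holder, "grant")
        else:
            self.holder = None

# Entry costs two messages (request + grant); exit costs one (release),
# matching the performance analysis quoted above.
```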
Central Server Algorithm
• Assuming there are no failures, does the central server algorithm
meet the requirements?
• ME1 (safety) ✓
• ME2 (liveness) ✓
• ME3 (happened-before) ✘ (requests can reach the server in an order that
differs from their happened-before order)
Ring-based Algorithm
• No central server
• Each process has a
communication channel
to the next one in a ring
• A token is passed along
the ring
From Coulouris et al., Section 15.2, continued:

Conceptually, the reply constitutes a token signifying permission to enter the critical section. If no other process has the token at the time of the request, then the server replies immediately, granting the token. If the token is currently held by another process, then the server does not reply, but queues the request. When a process exits the critical section, it sends a message to the server, giving it back the token.

If the queue of waiting processes is not empty, then the server chooses the oldest entry in the queue, removes it and replies to the corresponding process. The chosen process then holds the token. In the figure, we show a situation in which p2's request has been appended to the queue, which already contained p4's request. p3 exits the critical section, and the server removes p4's entry and grants permission to enter to p4 by replying to it. Process p1 does not currently require entry to the critical section.

Given our assumption that no failures occur, it is easy to see that the safety and liveness conditions are met by this algorithm. The reader should verify, however, that the algorithm does not satisfy property ME3.

We now evaluate the performance of this algorithm. Entering the critical section – even when no process currently occupies it – takes two messages (a request followed by a grant) and delays the requesting process by the time required for this round-trip. Exiting the critical section takes one release message. Assuming asynchronous message passing, this does not delay the exiting process.

The server may become a performance bottleneck for the system as a whole. The synchronization delay is the time taken for a round-trip: a release message to the server, followed by a grant message to the next process to enter the critical section.

A ring-based algorithm • One of the simplest ways to arrange mutual exclusion between the N processes without requiring an additional process is to arrange them in a logical ring (Figure 15.3: a ring of processes p1, p2, ..., pn transferring a mutual exclusion token). This requires only that each process pi has a communication channel to the next process in the ring, p(i+1) mod N. The idea is that exclusion is conferred by obtaining a token in the form of a message passed from process to process in a single direction.
Ring-based Algorithm
• When a process receives the token:
• If it needs to enter the CS: it keeps the token, enters
the CS, and passes the token on after exiting
• Otherwise: it passes the token to the next process immediately
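A minimal sketch of one process's token-handling step as just described (illustrative only, not from the lecture; the forwarding channel and the critical-section body are left abstract):

```python
# Sketch (illustration only): one process's step in the ring-based
# mutual exclusion algorithm.

class RingMutexProcess:
    def __init__(self, forward):
        self.forward = forward    # forward(): send token to the ring successor
        self.wants_cs = False     # set by the application when CS is needed

    def on_token(self):
        """Called when the token message arrives from the predecessor."""
        if self.wants_cs:
            self.critical_section()   # hold the token for the whole CS
            self.wants_cs = False
        self.forward()                # pass the token on in either case

    def critical_section(self):
        pass                          # access the shared resource here
```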
Ring-based Algorithm
• Does it meet the requirements?
• ME1 (safety) ✓
• ME2 (liveness) ✓
• ME3 (happened-before) ✘ (the token circulates in ring order, regardless of
the order in which requests are made)
Multicast and Logical Clocks
• No central server
• N processes, with a communication channel between
each pair
• Each process has
• A unique identifier
• A logical clock producing timestamps in happened-before order
Logical Clocks
From Coulouris et al., Section 14.4, continued:

Combining these relations, we may also say that, for example, a → f.

It can also be seen from Figure 14.5 that not all events are related by the relation →. For example, neither a → e nor e → a, since they occur at different processes and there is no chain of messages intervening between them. We say that events such as a and e that are not ordered by → are concurrent, and write this a ∥ e.

The relation → captures a flow of data intervening between two events. Note, however, that in principle data can flow in ways other than by message passing. For example, if Smith enters a command to his process to send a message, then telephones Jones, who commands her process to issue another message, the issuing of the first message clearly happened-before that of the second. Unfortunately, since no network messages were sent between the issuing processes, we cannot model this type of relationship in our system.

Another point to note is that if the happened-before relation holds between two events, then the first might or might not actually have caused the second. For example, if a server receives a request message and subsequently sends a reply, then clearly the reply transmission is caused by the request transmission. However, the relation captures only potential causality, and two events can be related by → even though there is no real connection between them. A process might, for example, receive a message and subsequently issue another message, but one that it issues every five minutes anyway and that bears no specific relation to the first message. No actual causality has been involved, but the relation → would order these events.

Logical clocks • Lamport [1978] invented a simple mechanism by which the happened-before ordering can be captured numerically, called a logical clock. A Lamport logical clock is a monotonically increasing software counter, whose value need bear no particular relationship to any physical clock. Each process pi keeps its own logical clock, Li, which it uses to apply so-called Lamport timestamps to events. We denote the timestamp of event e at pi by Li(e), and by L(e) we denote the timestamp of event e at whatever process it occurred at.

To capture the happened-before relation →, processes update their logical clocks and transmit the values of their logical clocks in messages as follows:

LC1: Li is incremented before each event is issued at process pi: Li := Li + 1.
LC2: (a) When a process pi sends a message m, it piggybacks on m the value t = Li.
(b) On receiving (m, t), a process pj computes Lj := max(Lj, t) and then applies LC1 before timestamping the event receive(m).

Although we increment clocks by 1, we could have chosen any positive value. It can easily be shown, by induction on the length of any sequence of events relating two events e and e′, that e → e′ implies L(e) < L(e′).

Note that the converse is not true: if L(e) < L(e′), we cannot infer that e → e′. Figure 14.6 illustrates the use of logical clocks for the example given in Figure 14.5. Each of the processes p1, p2 and p3 has its logical clock initialized to 0, and the clock values given are those immediately after the event to which they are adjacent. Note that, for example, L(b) > L(e) but b ∥ e.
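Rules LC1 and LC2 translate almost directly into code. A minimal sketch (my own illustration, not from the textbook):

```python
# Sketch (illustration only) of rules LC1 and LC2 for a Lamport logical clock.

class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):               # LC1: increment before each local event
        self.time += 1
        return self.time

    def send(self):               # LC2(a): piggyback t = Li on the message
        return self.tick()

    def receive(self, t):         # LC2(b): Lj := max(Lj, t), then apply LC1
        self.time = max(self.time, t)
        return self.tick()

# e -> e' implies L(e) < L(e'), but the converse does not hold:
# concurrent events may carry timestamps in either order.
p1, p2 = LamportClock(), LamportClock()
t = p1.send()                     # event send(m) at p1
r = p2.receive(t)                 # event receive(m) at p2
assert t < r                      # send(m) -> receive(m) forces L to increase
```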
Multicast and Logical Clocks
• Three possible states for a process
• Released – not in or requiring entry to the critical section
• Wanted – requiring entry to the critical section
• Held – acquired entry to the critical section and has not yet finished
that access
Multicast and Logical Clocks
• When a process wants to enter a CS:
• It sets its state to wanted
• It sends a request message (timestamp, ID) to all other processes
• Once all other processes have replied, it can enter the CS
Multicast and Logical Clocks
• When a process receives a request message from another process
• If the receiver's state is held, the request is queued
• If the receiver's state is wanted and the incoming timestamp is later than
the receiver's own request timestamp, the request is queued (if the
timestamps are equal, the process ID is used to order the requests)
• Otherwise: it replies immediately
This ensures a consistent ordering of requests
by (timestamp, ID)
Multicast and Logical Clocks
• When a process exits CS
• Sets its state to released
• Replies to all queued requests
Multicast and Logical Clocks
• This algorithm provides safety, liveness, and happened-before ordering.
• See: Coulouris et al., 2011. Distributed Systems: Concepts and Design. 5th Edition.
Pearson Education. (Chapter 15.2)
Example (figure): p1 and p2 multicast requests concurrently with Lamport timestamps 41 and 34 respectively, while p3 remains released and replies to both immediately. Since (34, p2) orders before (41, p1), p2 defers its reply to p1, collects replies from p1 and p3, and enters the CS first; when p2 exits and sends its deferred reply, p1 enters. States progress released → wanted → held.
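This multicast-based algorithm is due to Ricart and Agrawala [1981]. A condensed sketch of its request/reply logic (my own illustration, not lecture code; multicast and reply are abstract transport callbacks, and ties are broken by Python tuple comparison on (timestamp, ID)):

```python
# Sketch (illustration only): Ricart-Agrawala-style mutual exclusion.

RELEASED, WANTED, HELD = "released", "wanted", "held"

class MulticastMutex:
    def __init__(self, pid, n, multicast, reply):
        self.pid, self.n = pid, n
        self.multicast, self.reply = multicast, reply
        self.state = RELEASED
        self.clock = 0
        self.request_ts = None    # (timestamp, pid) of our pending request
        self.replies = 0
        self.deferred = []        # requests to answer when we exit the CS

    def request_entry(self):
        """Multicast a timestamped request; enter once n-1 replies arrive."""
        self.state = WANTED
        self.clock += 1
        self.request_ts = (self.clock, self.pid)
        self.replies = 0
        self.multicast(("request", *self.request_ts))

    def on_request(self, ts, sender):
        self.clock = max(self.clock, ts) + 1          # Lamport receive rule
        if self.state == HELD or (
            self.state == WANTED and self.request_ts < (ts, sender)
        ):
            self.deferred.append(sender)  # our request comes first: defer
        else:
            self.reply(sender)            # no conflict: reply immediately

    def on_reply(self):
        self.replies += 1
        if self.replies == self.n - 1:
            self.state = HELD             # all peers agreed: enter the CS

    def exit_cs(self):
        self.state = RELEASED
        for sender in self.deferred:      # answer everyone we made wait
            self.reply(sender)
        self.deferred.clear()
```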
Fault Tolerance
• Assuming an asynchronous system (no time bound on
message transmission, CS execution, etc.)
• What happens when messages are lost?
• What happens when a process crashes?
Overview
• Distributed mutual exclusion
• Election
Election
• Choosing a unique process to play a role
• For example, the server in the central server algorithm
• Any process can start an election
• All processes need to agree on the selection
• Only one process is chosen – the one with the largest
identifying value (could be process number, uptime, etc.)
Ring-based Algorithm
• To select the process with
the largest ID as coordinator
• Processes arranged in a
logical ring
• A process starts an election
by creating a message
containing its ID, and
sending it to the next
process in the ring
Figure 15.7 A ring-based election in progress (ring of processes 3, 4, 9, 15, 17, 24, 28 and 1).
Note: The election was started by process 17. The highest process identifier encountered
so far is 24. Participant processes are shown in a darker tint.
From Coulouris et al., Section 15.3:

If, however, the received identifier is that of the receiver itself, then this process's identifier must be the greatest, and it becomes the coordinator. The coordinator marks itself as a non-participant once more and sends an elected message to its neighbour, announcing its election and enclosing its identity.

When a process pi receives an elected message, it marks itself as a non-participant, sets its variable electedi to the identifier in the message and, unless it is the new coordinator, forwards the message to its neighbour.

It is easy to see that condition E1 (safety: a unique process with the largest identifier is chosen) is met. All identifiers are compared, since a process must receive its own identifier back before sending an elected message. For any two processes, the one with the larger identifier will not pass on the other's identifier. It is therefore impossible that both should receive their own identifier back.

Condition E2 (liveness) follows immediately from the guaranteed traversals of the ring (there are no failures). Note how the non-participant and participant states are used so that duplicate messages arising when two processes start an election at the same time are extinguished as soon as possible, and always before the 'winning' election result has been announced.

If only a single process starts an election, then the worst-performing case is when its anti-clockwise neighbour has the highest identifier. A total of N − 1 messages are then required to reach this neighbour, which will not announce its election until its identifier has completed another circuit, taking a further N messages. The elected message is then sent N times, making 3N − 1 messages in all. The turnaround time is also 3N − 1 message transmission times, since these messages are sent sequentially.
Ring-based Algorithm
• When a process receives an election message:
• If the ID in the message is smaller than its own ID,
it replaces the ID in the message with its own ID and
forwards the message to the next process
• If the ID in the message is greater, it forwards the
message unchanged to the next process
• If the ID in the message equals its own ID, its ID
is the largest in the ring; it passes an elected
message along the ring to notify the other processes
Ring-based Algorithm
• Worked example (figure): in a ring of processes 17, 24, 1, 28, 15, 9, 4, 3,
process 17 starts an election with message ID 17. Process 24 replaces the ID
with 24; process 1 forwards 24 unchanged; process 28 replaces it with 28; the
remaining processes forward 28 unchanged. When process 28 receives its own ID
back, it circulates an elected message announcing 28 as coordinator.
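A sketch of one process's handlers in the ring-based election (my own illustration, not lecture code; it includes the participant/non-participant refinement from the textbook excerpt above, which extinguishes duplicate election messages):

```python
# Sketch (illustration only): ring-based election handlers for one process.
# forward(msg) sends msg to the clockwise neighbour in the ring.

class ElectionProcess:
    def __init__(self, pid, forward):
        self.pid, self.forward = pid, forward
        self.participant = False
        self.elected = None

    def start_election(self):
        self.participant = True
        self.forward(("election", self.pid))

    def on_election(self, candidate):
        if candidate > self.pid:
            self.participant = True
            self.forward(("election", candidate))
        elif candidate < self.pid and not self.participant:
            self.participant = True
            self.forward(("election", self.pid))  # substitute our larger ID
        elif candidate == self.pid:               # our own ID came back: we win
            self.participant = False
            self.forward(("elected", self.pid))
        # candidate < pid while already a participant: discard the duplicate

    def on_elected(self, winner):
        self.participant = False
        self.elected = winner
        if winner != self.pid:                    # coordinator stops the circuit
            self.forward(("elected", winner))
```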
References
• Coulouris, G., Dollimore, J., Kindberg, T. and Blair, G.,
2011. Distributed Systems: Concepts and Design. 5th
Edition. Pearson Education. (Chapter 15)