程序代写代做代考 distributed system algorithm Distributed Systems Foundations

Distributed Systems Foundations

GLOBAL STATES AND
CHECKPOINTS

CS 271 1

Motivation

CS 271 2

Detecting global properties
 Want to discover if a property holds in a distributed system

 Three examples:

 Distributed garbage collection: if there are no longer any
reference to objects, the memory taken up by the objects
should be reclaimed.

 Distributed deadlock detection: when each of a collection
of processes waits for another process to send it a
message, and where there is a cycle in the graph of this
“wait-for” relationship.

 Distributed termination detection: detect if a distributed
algorithm has terminated. Need to test if each process
has halted and no more messages in the network.

CS 271 3

CS 271 4

Distributed Checkpoints and Rollback
Recovery

• Fault tolerance is achieved by periodically using
stable storage to save the processes’ states during
the failure-free execution.

• Upon a failure, a failed process rolls back from one of
its saved states, thereby reducing the amount of lost
computation.

• Each of the saved states is called a checkpoint

CS 271 5

Checkpoint based Recovery

• Uncoordinated checkpointing: Each
process takes its checkpoints
independently

• Coordinated checkpointing: Processes
coordinate their checkpoints in order to
save a system-wide consistent state.

CS 271 6

Domino effect: uncoordinated
example

P0

P1

P2

m0

m1

m2 m3

m4

m5
m7

m6

Recovery

Line

Domino Effect: Cascaded rollback which causes the

system to roll back too far in the computation (even to the

beginning), in spite of all the checkpoints

Coordinated Non-blocking

• Processes could coordinate, but …

• Do we really need to block …?

!
!

K. Mani Chandy Leslie Lamport

CS 271 7

Global State
Chandy and Lamport—TOCS 1985

• Global state of a distributed system
– Local state of each process
– Messages sent but not received

• Many applications need the state of the system
– Failure recovery, distributed deadlock detection
– Detect stable properties.

• Problem: how can you figure out the state of a
distributed system?
– Each process is independent
– Network does not have any processing power.

• Distributed snapshot: a consistent global state

CS 271 8

Distributed System Model

• Assume each process communicates with
another process using unidirectional FIFO
point-to-point channels (e.g, TCP connections)

CS 271 9

p q

r

c1

c2

c3c4

CS 271 10

A Simple Example
A Variant of producer-consumer example

• Producer code:

while (1)

{

produce m;

send m;

wait for ack;

}

• Consumer code:

while (1)

{

recv m;

consume m;

send ack;

}

CS 271 11

Example: Initial State

m

Producer Consumer

CS 271 12

Example

m

Producer Consumer

CS 271 13

Example

m

Producer Consumer

CS 271 14

Example

a

Producer Consumer

CS 271 15

Example

a

Producer Consumer

CS 271 16

Example

a

Producer Consumer

CS 271 17

A naïve snapshot algorithm

• Processes record their state at any arbitrary
point

• A designated process collects these states

+ So simple!!

– Correct??

CS 271 18

Example
Producer Consumer problem

Producer records its state

m

Producer Consumer

CS 271 19

Example

m

Producer Consumer

CS 271 20

Example

Consumer records its state

m

Producer Consumer

CS 271 21

Example
The recorded state

m m

Producer Consumer

!!!!!!!!!!!!!!!!!!

CS 271 22

Where did we err?

• What did we do wrong?

producer

consumer

m

CS 271 23

Error!!

• The sender has no record of the sending

• The receiver has the record of the receipt

• Result

– Global state has record of the receive event but no
send event violating the happened before
concept!!

CS 271 24

The Notion of Consistency

• A global state is consistent if it could have
been observed by an external observer

• If a b then it is never the case that b is
observed by an external observer and not a

• All feasible states are consistent

CS 271 25

An Example

p

q

p q

Sp
0 Sp

1 Sp
2 Sp

3

Sq
0 Sq

1 Sq
2 Sq

3

m1

m2

m3

CS 271 26

A Consistent State?

p

q

p q

Sp
0 Sp

1 Sp
2 Sp

3

Sq
0 Sq

1 Sq
2 Sq

3

m1

m2

m3

Sp
1 Sq

1

CS 271 27

Yes

p

q

p q

Sp
0 Sp

1 Sp
2 Sp

3

Sq
0 Sq

1 Sq
2 Sq

3

m1

m2

m3

Sp
1 Sq

1

CS 271 28

A Consistent State?

p

q

p q

Sp
0 Sp

1 Sp
2 Sp

3

Sq
0 Sq

1 Sq
2 Sq

3

m1

m2

m3

Sp
2 Sq

3

m3

CS 271 29

Yes

p

q

p q

Sp
0 Sp

1 Sp
2 Sp

3

Sq
0 Sq

1 Sq
2 Sq

3

m1

m2 m3

Sp
2 Sq

3

m3

CS 271 30

What about……

p

q

p q

Sp
0 Sp

1 Sp
2 Sp

3

Sq
0 Sq

1 Sq
2 Sq

3

m1

m2

m3

Sp
1 Sq

3

Consistent State

a) A consistent cut

b) An inconsistent cut

CS 271 31

Distributed Snapshot Algorithm

• Any process can initiate the algorithm
– Save local state

– Send MARKERs on every outgoing channel

• On receiving a first marker on a channel c:
– Process saves local state and state of c is empty

– Send MARKERs on all outgoing channels, and save
messages on all other incoming channels (not c).

• On receiving subsequent marker on a channel:
– stop saving messages for that channel

– Saved messages are the state of the channel

CS 271 32

Distributed Snapshot

• A process finishes when

– It receives a marker on each incoming channel and
processes them all

– State: local state plus state of all channels

– Send state to initiator

• Any process can initiate snapshot

– Multiple snapshots may be in progress

• Each is distinguished by tagging the marker with the
initiator ID (and sequence number)

CS 271 33

Example

p q

r

c1

c2

c3c4

initiator

p

q

r

marker

checkpoint

x x

x
x x

CS 271 34

CS 271 35

Execution Example

p

q
Sq

0 Sq
1 Sq

2 Sq
3

Sp
0 Sp

1 Sp
2 Sp

3

m1 m2 m3

p q

c1

c2

initiator

CS 271 36

Execution Example

p

q
Sq

0 Sq
1 Sq

2 Sq
3

Sp
0 Sp

1 Sp
2 Sp

3

m1 m2 m3

q records state as Sq
1 , sends marker to p

p q

c1

c2

initiator

CS 271 37

Execution Example

p

q
Sq

0 Sq
1 Sq

2 Sq
3

Sp
0 Sp

1 Sp
2 Sp

3

m1 m2 m3

p records state as Sp
2, channel state as empty

p q

c1

c2

initiator

CS 271 38

Execution
Example

p

q
Sq

0 Sq
1 Sq

2 Sq
3

Sp
0 Sp

1 Sp
2 Sp

3

m1 m2 m3

q records channel state as m3

p q

c1

c2

initiator

CS 271 39

Execution
Example

p

q
Sq

0 Sq
1 Sq

2 Sq
3

Sp
0 Sp

1 Sp
2 Sp

3

m1 m2 m3

Recorded Global State = ((Sp
2, Sq

1), (0,m3) )

p q

c1

c2

initiator

Two processes trading in “widgets”

p1 p2
c2

c1

account widgets

$1000 (none)

account widgets

$50 2000

• Process p1 sends orders for widgets over c2 to p2,

enclosing payment at the rate of $10 per widget.

• In exchange, process p2 sends widgets along

channel c1 to p1.
CS 271 40

Execution Example

CS 271 41

p
1

p
2

(empty)<$1000, 0> <$50, 2000>

(empty)

c
2

c1e0: p1 sends (10, $100).

p
1

p
2

(Order 10, $100)<$900, 0> <$50, 2000>c2

1. p1 records its state and

sends a marker over c2.

(empty)c1
e1: p2 sends 5 free widgets to p1

p
1

p
2

(Order 10, $100)<$900, 0> <$50, 1995>

(five widgets)

c
2

c
1

Send 5 freebie widgets!

2. p2 receives marker & records

its state & channel c2 as empty.

Then sends marker over c1

p
1

p
2

(Order 10, $100)<$900, 5> <$50, 1995>c2

c
1

e2: p1 receives five widgets

3. p1 receives the marker, &

records state of c1 as 5 widget
p

1
p

2
(Order 10, $100)<$900, 5> <$50, 1995>c2

c
1

The Retrieved State

What is this State? Does it correspond to

any of the actual global states the system

went through?

p1 p2

(empty)

<$1000, 0> <$50, 1995>

<5 widgets>

C2

c1

CS 271 42

The Execution of the Algorithm

e0: p1 sends (10, $100).

e1: p2 sends 5 widges to p1

e2: p1 receives five widgets

p1 p2

(empty)

<$1000, 0> <$50, 1995>

<5 widgets>

C2

c1

CS 271 43

The Execution of the Algorithm

e0: p1 sends (10, $100).

e1: p2 sends 5 widges to p1

e2: p1 receives five widgets

e1: p2 sends 5 widges to p1

e0: p1 sends (10, $100)

e2: p1 receives five widgets

p1 p2

(empty)

<$1000, 0> <$50, 1995>

<5 widgets>

C2

c1

CS 271 44

The Execution of the Algorithm

e0: p1 sends (10, $100).

e1: p2 sends 5 widges to p1

e2: p1 receives five widgets

e1: p2 sends 5 widges to p1

e0: p1 sends (10, $100)

e2: p1 receives five widgets

p1 p2

(empty)

<$1000, 0> <$50, 1995>

<5 widgets>

C2

c1

TAKE SNAPSHOT

CS 271 45

Model

• Finite set of processes. Finite set of directed
channels. Modeled as a Directed Graph.

• Channels are FIFO, error free.

• A process is a finite set of states: an initial
state and a set of events.

• State of channel is msgs sent but not received.

CS 271 46

Correctness Proof

• Global State: set of process and channel states.

• Let seq = e1, e2, ……., en be a dist comp.

• Let Si be the state before event ei.

• Let Sinit be the state in which the algorithm
started and Sfinal the state when it terminated.

• Ssnap be the recorded state.

• Show that Ssnap is reachable from Sinit and Sfinal
is reachable from Ssnap

CS 271 47

Correctness Proof

• There is a sequence seq’ such that:

1. seq’ is a permutation of seq

2. Sinit = Ssnap or Sinit occurs earlier than Ssnap
3. Sfinal = Ssnap or Ssnap occurs earlier than Sfinal
• e is a pre-recording event iff e is in p and p

records state after e in seq.

• E is post-recording event iff e is in p and p
records state before e in seq.

CS 271 48

Correctness Proof

1. All events ei where i < init are pre-recording. 2. All events ei where i > final are post-recording.

3. There can be some post-recording events before
some pre-recording events.

• Possible on same process? NO

• What about on different processes?

• A pair of events a, b can be scheduled in any

order if there is no causal order between

them, so (a; b) is equivalent to (b; a)

CS 271 49

Checkpoint Proof: Different
Processes Case

• Violates FIFO property

P

Q

m2

m3
M

m1

Post-Rec

Pre-Rec

Reachability between states in the
snapshot algorithm

Sinit Sfinal

Ssnap

actual execution e0,e1,…

recording recording
begins ends

pre-snap: e’0,e ‘1,…e

R-1 post-snap: e


R,e ‘R+1,…

CS 271 51

Why does it work?

Let an observer observe the following actions:

pre[i] pre[k] post[k] pre[j] post[i] pre[l] post[j] post[l] …

pre[i] pre[k] pre[j] post[k] post[i] pre[l] post[j] post[l] …

pre[i] pre[k] pre[j] post[k] pre[l] post[i] post[j] post[l] …

pre[i] pre[k] pre[j] pre[l] post[k] post[i] post[j] post[l] …

Ssnap

52

Ssnap
Easy conceptualization of the snapshot state

All pre All post

Returns a correct global state

• Obtain seq by reordering events of seq
between first snap and last snap, putting all pre-
recording events before all post-recording
events, preserving causality.

• Returned state is exactly the global state of seq
between the pre-recording and post-recording
events.

CS 271 53

Correctness

• Termination:
–Communication graph is strongly connected

–All snap eventually, because of either snap
input or marker message.

– If there is a communication path from pi to
pk, then pk will record its state a finite period
of time after pi

–Markers eventually sent and received on all
channels.

CS 271 54

What about Channel States?

• Recorded state of channel C from p to q:

– Sequence of msgs received by q before Marker received

Minus

– Sequence of msgs received by q before state recording

CS 271 55

What about Channel States?

• What we want:

– Sequence of msgs sent by p before state recording

Minus

– Sequence of msgs received by q before state recording

CS 271 56

What about Channel States?

• Recorded state of channel C from p to q:

– Sequence of msgs received by q before Marker received

Minus

– Sequence of msgs received by q before state recording

• What we want:

– Sequence of msgs sent by p before state recording

Minus

– Sequence of msgs received by q before state recording

CS 271 57

What about Channel States?

• Recorded state of channel C from p to q:

– Sequence of msgs received by q before Marker received

Minus

– Sequence of msgs received by q before state recording

• What we want:

– Sequence of msgs sent by p before state recording

Minus

– Sequence of msgs received by q before state recording

CS 271 58

What about Channel States?

• Recorded state of channel C from p to q:

– Sequence of msgs received by q before Marker received

Minus

– Sequence of msgs received by q before state recording

• What we want:

– Sequence of msgs sent by p before state recording

Minus

– Sequence of msgs received by q before state recording

• They are equal!
CS 271 59

Detecting Stable Properties
• A predicate y (S) is a stable property if once y is

true for state S it remains true (unless you
interfere with system) for all subsequent states.

• Run protocol and record state S*.

– If y(Sinit) is true then y(S*) is true

– If y(S*) is true then y (Sfinal) is true

• y(S*) true implies property holds

• y(S*) false does NOT implies property does not
hold.

CS 271 60