程序代写代做代考 distributed system chain algorithm data structure Distributed Systems Foundations

Distributed Systems Foundations

REPLICATED DICTIONARY
AND LOG

CS 171 1

Replicated dictionary problem
• Efficient solutions to the replicated log and

dictionary problems. Wuu and Bernstein
PODC 84.

• Replication is a fundamental method for fault-
tolerance.

• Replication is also used for performance.

• Dictionary is a common data structure which
is often fully replicated on all sites.

• (Blockchains are also fully replicated)

CS 171 2

Replication motivation

• What if data is stored in ONE place?

CS 171 3

DATA

Replication motivation

• What if data is stored in ONE place?

– Failures cause data to be inaccessible (single point
of failure)

CS 171 4

DATA
CRASH!!

Replication motivation

• What if data is stored in ONE place?

– Failures cause data to be inaccessible (single point
of failure)

– Users far away experience large latency to access
data

CS 171 5

DATA

Replication

• Replication requires maintaining data in
multiple locations

– Identical copies?

CS 171 6

Replica A Replica B

Basic assumptions

• Very general: Sites may crash, links may fail,
partitioning.

• Asynchronous System

• Each site maintains a local clock (a counter).

• Use Lamport’s event execution model and
happens-before relation.

CS 171 7

Motivating example for causality

Micro-blogging application

Causality in micro-blogging

• A micro-blogging website has users and a
micro-blog

• One global stream of users blog

• In what order should the stream display
blogs?

• In a non-replicated deployment, order by time
of arrival.

CS 171 9

Causality between blogs
• A user A blogs blog a1 as a consequence of an

earlier blog b2 (or blogs) by another user
(users) B.

CS 171 10

Blog stream

User B: in which room is

CS171?

User A: I cannot believe

students are still asking

about class locations!

Causality between blogs

• This causality is trivially captured in non-
replicated systems. User A made a comment
after seeing what caused it (blog by User B) in
the stream, which means it already exists.

CS 171 11

SERVER

Causality between blogs

• For three replicas, A, B, and C. If blogs are just
replicated asynchronously it will not
guarantee causality.

• Naïve-replication:

– When a blog is posted send to all replicas.

– When a replica receives a blog, append to log.

CS 171 12

Causality between blogs
• Blog a is caused by blog b2,

• at replica C blog a is ordered before blog b2;

• problem!

CS 171 13

Replica A

Replica B

Replica C

Blog b2

Blog a

Blog b1

Log stores

• An append operation inserts a record at the
head of the log

• Changing the content of a record: delete
record and then insert

• Reading log => all log content.

insert(a) Insert(b) Insert(c) Delete(b) Insert(d)

Head of log

CS 171 14

The Log- Example

Site 1

Log

The Log- Example

• I(x)

Site 1

I(x)

Log

The Log- Example

• I(y)

Site 1

I(x) I(y)

Log

The Log- Example

• D(y)

Site 1

I(x) I(y) D(y)

Log

The Log- Example

• D(x)

Site 1

I(x) I(y) D(y) D(x)

Log

The log problem

• Each site maintains a copy of the log.

• The log contains local events, i.e.,
– insert

– delete

• The goal of the algorithm is to keep all copies
of the log up to date.

• Li is the copy of the log at site i.

• L(e) is the contents of log Lnode(e) immediately
after event e is executed.

CS 171 20

An example of a log

• Logs are transmitted between replicas

• What to include in the transmitted log?

• Lb = {insert(x,20)}

• La = {insert(y,50),insert(z,100)}
CS 171 21

Replica A

Replica B

insert(z,100)

Lb

insert(y,50)

insert(x,20)

La

The log problem

• Log Problem: find an algorithm that maintains
the log such that given an execution ,

 events e,f: if f e then f is in L(e)

• General approach:

– For each local event, insert a record in the local log.

– Exchange logs to update other sites

• Main question: when to exchange logs? With
application communication to capture the
happens before relation.

CS 171 22

Attempt at Solution 1

• A solution:

– Site i sends to site j all records in the log that were inserted
since i last sent a message to j.

– WHY INCORRECT?
– Log message lost – log messages reordered

23

Replica A

Replica B

T1=insert(x,20)

insert(x,20) insert(w,60)

T2=insert(w,60)

CS 171

Attempt at Solution 2

• Another solution:

– each site i includes its log Li with each message.

– On receiving a message, a site j incorporates all new
event records.

–BAD:

• Entire log sent with each message

• Entire log kept at each node.

CS 171 24

Replica A

Replica B

T1=Lb

insert(x,20) insert(w,60)

T2=Lb

Efficient solution for log problem

• Observation 1: Once i “knows” that j
“knows” of an event e (which may have
occurred on site k), then i does not need to
include event e in messages to j.

• Observation 2: Once i “knows” that all sites
“know” about an event e, then i does not
need to keep a record of e in its local log.

CS 171 25

2 Dimensional Time-Table
• TTi[n,n]:

• if TTi[j,k] = t,
then site i knows that site j has learned of all

events that occurred at site k up to time t.

tj

k

CS 171 26

The 2 dimensional timetable

• site j might actually know about more
events, but site i may not be aware of it.

• TTi[i,i] is the value of clock at site i.

• TTi[i,k] is the value of clock at site k of the
most recent event at site k that site i is
aware of.

CS 171 27

Two dimensional timetable

• Let hasrec(TTi, e, k) be true iff

TTi[k,node(e)] >= time(e)

• The algorithm must guarantee that if
hasrec(TTi, e, k) is true, then site k has
learned of event e.

• Note: site i need not send a record of
event e to site k if hasrec(TTi, e, k) is true.

CS 171 28

The Replicated Log- Example

Site 1

Log

0 0 0

0 0 0

0 0 0

TimeTable

Site 1
Log

0 0 0

0 0 0

0 0 0

TimeTable

Site 2
Log

0 0 0

0 0 0

0 0 0

TimeTable

Site 3
Log

0 0 0

0 0 0

0 0 0

TimeTable

Site 1
I(X)Log

1 0 0

0 0 0

0 0 0

TimeTable

Site 2
Log

0 0 0

0 0 0

0 0 0

TimeTable

Site 3
Log

0 0 0

0 0 0

0 0 0

TimeTable

I(X)

Site 1
I(X) I(Y)Log

2 0 0

0 0 0

0 0 0

TimeTable

Site 2
Log

0 0 0

0 0 0

0 0 0

TimeTable

Site 3
Log

0 0 0

0 0 0

0 0 0

TimeTable

I(X) I(Y)

Site 1
I(X) I(Y)Log

2 0 0

0 0 0

0 0 0

TimeTable

Site 2
I(Z)Log

0 0 0

0 1 0

0 0 0

TimeTable

Site 3
Log

0 0 0

0 0 0

0 0 0

TimeTable

I(X) I(Y)

I(Z)

Site 1
I(X) I(Y) D(Y)Log

3 0 0

0 0 0

0 0 0

TimeTable

Site 2
I(Z)Log

0 0 0

0 1 0

0 0 0

TimeTable

Site 3
Log

0 0 0

0 0 0

0 0 0

TimeTable

I(X) I(Y)

I(Z)

D(Y)

Site 1
I(X) I(Y) D(Y)Log

3 0 0

0 0 0

0 0 0

TimeTable

Site 2
I(Z)Log

0 0 0

0 1 0

0 0 0

TimeTable

Site 3
Log

0 0 0

0 0 0

0 0 0

TimeTable

I(X) I(Y)

I(Z)

D(Y)S(2)

Site 1 knows that site 2

knows till clock 0 about

its local events

Site one send all its log

local events with clock > 0

Site 1 sends [(I(X),1),

(I(Y),2), (D(Y),3)] [] [] and

its timetable

How to update the timetable?

3 0 0

0 0 0

0 0 0
+

0 0 0

0 1 0

0 0 0
=

0 0 0

0 1 0

0 0 0

•Max of all elements

3

•Max of times in local i’th row and remote k’th row.

3

Log maintenance

• Initialize all entries in TT to 0.

• For each local operation, insert a copy in the local log.

• With each send operation from site i to site k: piggyback
TT+ the following subset of the local log Li: all records e
such that hasrec(TTi, e, k) is not true.

• On receipt of a message from site k by site i:

– incorporate all new events into local log

– update TT:

• Max of all elements

• Max of times in local i’th row and remote k’th row.

CS 171 37

Dictionary problem

• Assume we want to maintain a replicated
dictionary with insert, delete and lookup.

• On receipt of a message with partial log & TT:

– Update local copy of the dictionary

– Update local copy of TT as before

– Garbage collect local log from any records that
correspond to events e such that

⌐ site j such that hasrec(TTi, e, j) is not true

CS 171 38

Replicated Dictionary- Example

Site 1

Log Dictionary

Replicated Dictionary- Example

• I(x)

Site 1

I(x)

x …

Log

Dictionary

Replicated Dictionary- Example

• I(y)

Site 1

I(x) I(y)

X …

y …

Log

Dictionary

Replicated Dictionary- Example

• D(y)

Site 1

I(x) I(y) D(y)

X …

y …

Log

Dictionary

Replicated Dictionary- Example

• D(x)

Site 1

I(x) I(y) D(y) D(x)

X …

Log

Dictionary

Site 1
Log

Dictionary

0 0 0

0 0 0

0 0 0

TimeTable

Site 2
Log

Dictionary

0 0 0

0 0 0

0 0 0

TimeTable

Site 3
Log

Dictionary

0 0 0

0 0 0

0 0 0

TimeTable

Site 1
I(X) X …Log

Dictionary

1 0 0

0 0 0

0 0 0

TimeTable

Site 2
Log

Dictionary

0 0 0

0 0 0

0 0 0

TimeTable

Site 3
Log

Dictionary

0 0 0

0 0 0

0 0 0

TimeTable

I(X)

Site 1
I(X) I(Y) X …

Y …

Log

Dictionary

2 0 0

0 0 0

0 0 0

TimeTable

Site 2
Log

Dictionary

0 0 0

0 0 0

0 0 0

TimeTable

Site 3
Log

Dictionary

0 0 0

0 0 0

0 0 0

TimeTable

I(X) I(Y)

Site 1
I(X) I(Y) X …

Y …

Log

Dictionary

2 0 0

0 0 0

0 0 0

TimeTable

Site 2
I(Z) Z …Log

Dictionary

0 0 0

0 1 0

0 0 0

TimeTable

Site 3
Log

Dictionary

0 0 0

0 0 0

0 0 0

TimeTable

I(X) I(Y)

I(Z)

Site 1
I(X) I(Y) D(Y) X …

Y …

Log

Dictionary

3 0 0

0 0 0

0 0 0

TimeTable

Site 2
I(Z) Z …Log

Dictionary

0 0 0

0 1 0

0 0 0

TimeTable

Site 3
Log

Dictionary

0 0 0

0 0 0

0 0 0

TimeTable

I(X) I(Y)

I(Z)

D(Y)