CRDT – CONFLICT‐FREE REPLICATED DATA TYPES
Distributed Systems (Hans‐Arno Jacobsen) 1
Pixabay.com
CRDTs Units
• Eventual consistency, informally
• State‐based objects
• Eventual consistency, more formally • Conflict‐free replicated data types
Distributed Systems (Hans‐Arno Jacobsen) 2
EVENTUAL CONSISTENCY, INFORMALLY
Distributed Systems (Hans‐Arno Jacobsen)
3
Pixabay.com
Eventual Consistency
• Eventualconsistencyisdesirableforlarge‐scale distributed systems where high availability is important
• Tendstobecheaptoimplement(e.g.,viagossip)but may serve stale data
• Constitutesachallengeforenvironmentswhere stronger consistency is important
Distributed Systems (Hans‐Arno Jacobsen) 4
Handling Concurrent Writes
• Premise for eventual consistency were scenarios with few (no) concurrent writes to the same key (cf. client‐centric consistency)
• However, we do need a mechanism to handle concurrent writes should they so happen
• If there were a way to handle concurrent writes, we could support eventual consistency more broadly
• Would “only” need to guarantee that after processing all writes for a key, all replicas converge, no matter what order the writes are processed (e.g., assuming gossip)
Distributed Systems (Hans‐Arno Jacobsen) 5
Max register L1: 0 W(4) (4) W(2) (4)
Examples
Growth‐only counter (G‐counter)
L1: 0 W(+5) (5) W(+2) (7) W(+1) 8
L2: 0 W(+2) (2) W(+5) (7) W(+1) 8 Writes propagate to L2, L1, respectively
✔
Different locations (replicas)
merge(5) 5 L2: 0 W(5) (5) W(3) (5) merge(4) ✔
State propagate to L2, L1 via periodic merging
Distributed Systems (Hans‐Arno Jacobsen)
6
5
Self‐study Questions
• Think of a few basic data structures, like lists, sets, counters, binary trees, heaps, maps, etc., and visualize for yourself what happens if replicated instances of these structures are updated via gossip.
• Does their state converge, no matter the update sequence?
• What happens if update operations are lost or duplicated?
• What mechanisms we know other than gossip could be used to keep these replicated structures updated without violating their convergence.
• What are pros and cons of these mechanisms?
Distributed Systems (Hans‐Arno Jacobsen) 7
Distributed Systems (Hans‐Arno Jacobsen) 8
CRDT – FROM STATE‐BASED OBJECTS TO REPLICATED STATE‐BASED OBJECTS
Distributed Systems (Hans‐Arno Jacobsen)
9
Pixabay.com
State‐based objects Mostly plain old objects
• Offerupdateandqueryrequeststoclients
• Maintaininternalstate
• Processclientrequests
• Perform merge requests amongst each other • Periodicallymerge(supportinfrastructure)
Distributed Systems (Hans‐Arno Jacobsen) 10
State‐based Object
• What we commonly know as object • Comprised of
– Internal state
– One or more query methods – One or more update methods – A merge method
Distributed Systems (Hans‐Arno Jacobsen) 11
class Avg(object): def __init__(self):
def update(self, x): self.sum += x self.cnt += 1
self.sum = 0
self.cnt = 0
def query(self): if self.cnt != 0:
def merge(self, avg): self.sum += avg.sum self.cnt += avg.cnt
return
else: return 0
self.sum /
self.cnt
Class Average Running Example
Distributed Systems (Hans‐Arno Jacobsen) 12
Average
State‐based object representing a running average
• Internalstate
– self.sum and self.cnt
• Query returns average
• Update updates average with a new value x
• Merge merges one Avg instance into another one
Distributed Systems (Hans‐Arno Jacobsen) 13
Replicated State‐based Object
• State‐based object replicated across multiple nodes
• E.g., replicate Avg across two nodes
• Both nodes have a copy of state‐based object
• Clients send query and update to a single node
• Nodes periodically send their copy of state‐based object to other nodes for merging
Distributed Systems (Hans‐Arno Jacobsen) 14
Node a
a0 Timeline Update
Unique
Causal history based on operation identifiers
operation identifier
Each state represents a snapshot of object in time that results from updates applied
state
query
history
State
a1
a0 sum:0, cnt:0
0
a1 sum:1, cnt:1
1 0
a2 sum:4, cnt:2 Distributed Systems (Hans‐Arno Jacobsen)
2 0,1
State
15
Operation identifier is unique across replicas
Each state represents a snapshot of object in time that results from updates applied
state
query
history
Timeline
a0 sum:0, cnt:0
0
a1 sum:1, cnt:1
1 0
a2 sum:4, cnt:2
2 0,1
Distributed Systems (Hans‐Arno Jacobsen)
16
States and Causal Histories
If y = x.update(…) where the update has identifier i, then the causal history of y is the causal history of x union { i }.
a0 sum:0, a1 sum:1, b0 sum:0, b1 sum:2, b2 sum:6,
cnt:0 cnt:1 cnt:0 cnt:1 cnt:2
0 1 0 2 3
{} {0} {} {1} {1,2}
Distributed Systems (Hans‐Arno Jacobsen)
17
state
query ()
history
a0 sum:0,
cnt:0 cnt:1 cnt:0 cnt:1 cnt:2
0{} 2 {0} 0{} 4 {1}
a1 sum:2,
b0 sum:0,
b1 sum:4,
update 2 update 4
state
query () history
Merge
a2 sum:6,
Distributed Systems (Hans‐Arno Jacobsen)
3 {0,1}
18
Nodes Periodically Propagate Their State
Distributed Systems (Hans‐Arno Jacobsen) 19
Self‐study Questions
• Think of a few basic data structures, like lists, sets, counters, binary trees, heaps, maps, etc., and visualize for yourself what happens if replicated instances of these structures are updated via gossip.
• For the above data structures, specify merge operations that merge the state of two instances of a given structure.
• Assume merge happens periodically, does your replicated structures’ state converge?
Distributed Systems (Hans‐Arno Jacobsen) 20
Distributed Systems (Hans‐Arno Jacobsen) 21
CRDT –
EVENTUAL CONSISTENCY, MORE FORMALLY
Distributed Systems (Hans‐Arno Jacobsen)
22
Pixabay.com
Eventual Consistency (EC)
• A replicated state‐based object is
– eventually consistent if whenever two replicas of the state‐based object have the same causal history, they eventually (not necessarily immediately) converge to the same internal state
Distributed Systems (Hans‐Arno Jacobsen) 23
Strong Eventual Consistency (SEC)
• A replicated state‐based object is
– strongly eventually consistent if whenever two replicas of the state‐based object have the same causal history, they (immediately) have the same internal state
• Strong eventual consistency implies eventual consistency
Distributed Systems (Hans‐Arno Jacobsen) 24
– NoMergeAverage – BMergeAverage – MaxAverage
EC or SEC
That is the question?
• Variants of our Average object, defined next – Average
• Note that some of these objects do not represent realistic functionality (i.e., needed functionality)
• These objects are meant to illustrate convergence concepts only
Distributed Systems (Hans‐Arno Jacobsen) 25
Average
a, b attain the same causal history but do not converge to the same internal state – they do not converge at all!
state
query history
Neither eventually
b0 sum:0, cnt:0 b1 sum:4, cnt:1
0
consistent, nor
4
1
strongly eventually
b2 sum:10, cnt:3 b3 sum:26, cnt:8
3.3 3.25
0,1
consistent
0,1
Distributed Systems (Hans‐Arno Jacobsen)
a0 sum:0, cnt:0
a1 sum:2, cnt:1
a2 sum:6, cnt:2
a3 sum:16, cnt:5 3.2 0,1
0 2 0 3 0,1
26
NoMergeAverage
• Object’s merge does nothing
• All else is the same as for Average
Distributed Systems (Hans‐Arno Jacobsen) 28
a, b have same causal history, both converge to a stable but different internal state.
state
query
history
Neither eventually consistent, nor strongly eventually consistent.
a0 sum:0, a1 sum:2, a2 sum:2, a3 sum:2, b0 sum:0, b1 sum:4, b2 sum:4, b3 sum:4,
cnt:0 cnt:1 cnt:1 cnt:1 cnt:0 cnt:1 cnt:1 cnt:1
0 2 2 2 0 4 4 4
0 0,1 0,1 1 0,1 0,12 9
Distributed Systems (Hans‐Arno Jacobsen)
BMergeAverage
• Object’s merge
– At b – overwrite state with state at a – At a – do nothing
• All else is the same as for Average
Distributed Systems (Hans‐Arno Jacobsen) 30
a, b attain same causal history, both eventually converge to the same internal state – eventual consistent.
state
query history
a1, b1 have same causal history but different internal state – not strongly eventually consistent
0 0 0 0 0 4 0 0 0
a0 sum:0, cnt:0 a1 sum:0, cnt:0 a2 sum:0, cnt:0 b0 sum:0, cnt:0 b1 sum:4, cnt:1 b2 sum:0, cnt:0
0
Distributed Systems (Hans‐Arno Jacobsen)
31
MaxAverage
• Object’s merge
– Pair‐wise max of sum and cnt
• All else is the same as for Average
Distributed Systems (Hans‐Arno Jacobsen) 32
At a, b for all states with the same causal history, they have the same internal state – strongly eventually consistent.
state
query
history
Great!!! But, what
0,1
does it actually
compute? Here,
1
update(2) overwritten
0,1
by update(4)! Distributed Systems (Hans‐Arno Jacobsen)
0,1
a0 sum:0,
a1 sum:2,
a2 sum:4,
a3 sum:4,
b0 sum:0,
b1 sum:4,
b2 sum:4,
b3 sum:4,
cnt:0 cnt:1 cnt:1 cnt:1 cnt:0 cnt:1 cnt:1 cnt:1
0 2 4 4 0 4 4 4
0
0,1
33
Lessons Learned I
• Same causal history, different internal state
• Same causal history, converge to stable but different internal state
• Same causal history, eventually same internal state – EC
• Same causal history, always same internal state – SEC
Average NoMergeAverage BMergeAverage MaxAverage
no no no yes no no yes yes no yes yes yes
C? EC? SEC?
Designing a strongly eventually consistent state‐based object with intuitive semantics is challenging!
Distributed Systems (Hans‐Arno Jacobsen) 34
Lessons Learned II
• Replicatedstate‐basedobject
• No convergence
• Convergence
• Eventualconsistencyinthismodel
• Strongeventualconsistencyinthismodel
Distributed Systems (Hans‐Arno Jacobsen) 35
Self‐study Questions
• Can you design Average such that it becomes EC or SEC as well as offers correct averaging semantics?
• Think of other data structures and design update, query, and merge operations with reasonable semantics.
• Always draw timelines and state diagrams for your designs and proof EC or SEC, if possible.
• Think of data structures that support multiple update operations and one or more query operations.
Distributed Systems (Hans‐Arno Jacobsen) 36
Distributed Systems (Hans‐Arno Jacobsen) 37
CRDT –
CONFLICT‐FREE REPLICATED DATA TYPES, 2011
Distributed Systems (Hans‐Arno Jacobsen) 38
Pixabay.comv
• • •
A CRDT is a conflict‐free replicated state‐based object
•
CRDTs are no panacea but a great solution when they apply!
Conflict‐Free Replicated Data Types
A CRDT handles concurrent writes
Intuition ‐ restrictions:
– Do not allow writes with arbitrary values, limit to write
operations which are guaranteed not to conflict
– CRDTs are data structures with special write operations; they guarantee strong eventual consistency and are monotonic (no rollbacks)
Distributed Systems (Hans‐Arno Jacobsen) 39
•
Conflict‐Free Replicated Data Types CRDTs can be commutative, op‐based (CmRDT):
•
CRDTs can be convergent, state‐based (CvRDT):
– Example: A max register, which stores the maximum
•
Therefore, the value of a CRDT depends on multiple write operations or states, not just the latest one`
– Example: A growth‐only counter, which can only process increment operations
– Propagate operations among replicas (duplicate‐free, no‐loss messaging)
value written
– Propagate and merge states (idempotent)
Distributed Systems (Hans‐Arno Jacobsen) 40
• Supports – Query
CmCRDTs and CvCRDTs are equivalent. One can be transformed into the other one and vice versa.
– Update – Merge
State‐based CRDTs
• A CRDT is a replicated state‐based object
Distributed Systems (Hans‐Arno Jacobsen) 41
CRDT Properties
A CRDT is a replicated state‐based object that satisfies
• Mergeisassociative(e.g.,(A+(B+C))=((A+B)+C)) – For any three state‐based objects x, y, and z,
merge(merge(x, y), z) is equal to merge(x, merge(y, z)) • Mergeiscommutative(e.g.,A+B=B+A)
– For any two state‐based objects, x and y, merge(x, y) is equal to merge(y, x)
• Merge is idempotent
– For any state‐based object x, merge(x, x) is equal to x
• Every update is increasing
– Let x be a state‐based object and let y = update(x, …) be
the result of applying an update to x
– Then, update is increasing if merge(x, y) is equal to y
Distributed Systems (Hans‐Arno Jacobsen) 42
max of a, b
self.x = 0 def query(self):
Max Register is a CRDT The state‐based object IntMax is a CRDT
• IntMax wraps an integer • Merge(a, b)is the
class IntMax(object): def __init__(self):
• Update(x)adds x to the wrapped integer
return self.x
def update(self, x):
• Prove that IntMax is associative, commutative, idempotent, increasing
self.x += x def merge(self,
Distributed Systems (Hans‐Arno Jacobsen)
43
assert x >= 0
other):
self.x =
max(self.x,
other.x)
Establish Four Properties of CRDT
• Associativity merge(merge(a, b), c)
= max(max(a.x, b.x), c.x)
= max(a.x, max(b.x, c.x))
= merge(a, merge(b, c))
• Impotence merge(a, a)
= max(a.x, a.x) = a.x
= a
• Commutativity merge(a, b)
= max(a.x, b.x) = max(b.x, a.x) = merge(b, a)
• Update is increasing merge(a, update(a, x)) = max(a.x, a.x + x)
= a.x + x
= update(a, x)
Distributed Systems (Hans‐Arno Jacobsen) 44
G‐Counter CRDT Replicated growth‐only counter
• Internal state of a G‐Counter replicated on n nodes is an n‐length array of non‐negative integers
• query returns sum of every element in n‐length array • add(x)when invoked on the i‐th node, increments
the i‐th entry of the n‐length array by x
– E.g., Node 0 increments 0th entry, Node 1 increments 1st
entry of array, and so on
• merge performs a pairwise maximum of the two arrays
Distributed Systems (Hans‐Arno Jacobsen) 45
PN‐Counter CRDT
Replicated counter supporting addition & subtraction
• Internal state of a PN‐Counter
– pair of two G‐Counters named p and n.
• p represents total value added to PN‐Counter
• n represents total value subtracted from PN‐Counter.
• query method returns difference p.query() – n.query()
• add(x)- first of two updates: invokes p.add(x)
• sub(x)- second of two updates: invokes n.add(x)
• merge performs a pairwise merge of p and n
Distributed Systems (Hans‐Arno Jacobsen) 47
G‐Set CRDT Replicated growth‐only set
A G‐Set CRDT represents a replicated set which can be added to but not removed from
• Internal state of a G‐Set is just a set
• query returns the set
• add(x)adds x to the set
• merge performs a set union
Distributed Systems (Hans‐Arno Jacobsen) 49
2P‐Set CRDT
Replicated set supporting addition and subtraction
• Internalstateofa2P‐Setisa
– pair of two G‐Sets named a and r
• a represents set of values added to the 2P‐Set
• r represents set of values removed from the 2P‐Set
• query method returns the set difference a.query() – r.query()
• add(x) is the first of two updates – invokes a.add(x).
• sub(x)is the second of two updates – invokesr.add(x)
• merge performs a pairwise merge of a and r Distributed Systems (Hans‐Arno Jacobsen) 51
Summary on CRDTs
• Formalized and introduced in 2011/2014 • CmCRDTs and CvCRDTs are equivalent!
• Really neat solution if applicable
• Challenge is to design new CRDTs
Distributed Systems (Hans‐Arno Jacobsen) 53
Self‐study Questions
• For all CRDTs introduced, establish its four properties.
• Create sample execution sequences for each CRDT and
complete a timeline and a state table.
• Find use cases where the introduced CRDTs apply and show how they are used.
• Think of new CRDTs and repeat the above.
Distributed Systems (Hans‐Arno Jacobsen) 1
Distributed Systems (Hans‐Arno Jacobsen) 55