distributed systems
2020
CONSISTENCY MODELS
1
CONSISTENCY MODELS
Motivation
Requirements for a distributed system Security
Scalability
Availability Performance
Common solution: replication
Tradeoff: high availability vs data consistency
DISTRIBUTED SYSTEMS 2 CONSISTENCY MODELS
CONSISTENCY MODELS
Data Consistency
ADD 10%! (REQUEST A)
ADD 100$! (REQUEST B)
REQUEST A, THEN REQUEST B REQUEST B, THEN REQUEST A DATABASE REPLICAS
DISTRIBUTED SYSTEMS 3 CONSISTENCY MODELS
CONSISTENCY MODELS
History …
1970s: distribution transparency
Better to fail whole system than break transparency
An update to the data would ensure all observers saw that update
1990s: internet → focus on availability
Eric Brewer’s CAP theorem: data consistency, system availability,
tolerance to network partition (partial failure)
Problem: large-scale systems need to be failure tolerant!
Must choose from consistency or availability…
Solution: relax consistency guarantees
DISTRIBUTED SYSTEMS 4 CONSISTENCY MODELS
CONSISTENCY MODELS
ACID Atomicity
All or nothing
No partially-completed transactions
are ever observable
eg. money transfer. No withdraw but not deposit, etc.
Consistency
DBMS people take this to mean Sequential Consistency
Data constraints enforced within a transaction will hold universally, by composition of transactions.
Isolation
No transaction in progress can observe another transaction in- progress.
System provides an ordering on whole transactions, not individual reads/writes.
Durability
Transactions do not spontaneously undo, changes only made by future transactions
Results persistent through restarts etc
DISTRIBUTED SYSTEMS 5 CONSISTENCY MODELS • … more about this Real Soon Now
CONSISTENCY MODELS
CAP vs. ACID CAP
Consistency
Availability
Partition-tolerance
Address cluster-wide progress and consistency
ACID
Atomicity
Consistency Isolation
Durability
Properties of transactions
Are applicable even with respect to a single node
DISTRIBUTED SYSTEMS 6 CONSISTENCY MODELS
CONSISTENCY MODELS
CAP Theorem – Tradeoffs!!
It is impossible simultaneously to achieve always-on experience (availability) and to ensure that users read the latest written version of a distributed database (consistency) in the presence of partial failure (partition-tolerance)
Maintaining a single-system image in a distributed system has a cost
If two processes (or groups of processes) cannot communicate then updates cannot be synchronously propagated to all processes without blocking.
Under partitions a system cannot safely complete updates and hence is unavailable to some or all of its users.
A system that chooses availability over consistency enjoys benefits of low latency: if a server can safely respond to a user’s request when it is partitioned from all other servers, then it can also respond to a user’s request without contacting other servers even when it is able to do so.
DISTRIBUTED SYSTEMS 7 CONSISTENCY MODELS
CONSISTENCY MODELS
Design & Implementation
Choices …
If emphasis is on consistency
the system may not be available to take, for example, a write
If the write fails because of system unavailability => you will have to deal with what to do with the data to be written.
If the emphasis is on availability
It may always accept the write, but under certain conditions a read will not reflect the result of a recently completed write
You will have to decide whether the client requires access to the absolute latest update all the time
Both options require you to be aware of what the system is offering
DISTRIBUTED SYSTEMS 8 CONSISTENCY MODELS
CONSISTENCY MODELS
System Examples
Source: https://howtodoinjava.com/hadoop/brewers-cap-theorem-in-simple-words/
DISTRIBUTED SYSTEMS 9 CONSISTENCY MODELS
CONSISTENCY MODELS
This Lecture Focuses on
CONSISTENCY
DISTRIBUTED SYSTEMS 10 CONSISTENCY MODELS
CONSISTENCY MODELS
Consistency Models
Intuitive
1. Strict Consistency
2. Sequential Consistency
3. Causal Consistency
4. Processor Consistency 5. Release Consistency
n. Eventual Consistency
Expensive
Headache
Scalable, Efficient
DISTRIBUTED SYSTEMS 11 CONSISTENCY MODELS
CONSISTENCY MODELS
Client perspective …
A storage system/service:
A black box, but under the covers it is something of large scale and highly distributed, and that it is built to guarantee durability and availability.
Process A (client):
Writes to and reads from the storage system. Processes B and C (clients):
Independent of process A and write to and read from the storage system
Need to communicate to share information
DISTRIBUTED SYSTEMS 12 CONSISTENCY MODELS
CONSISTENCY MODELS
1. Strict Consistency
Any read on a data item x returns a value corresponding to the result of the most recent write on x.
DISTRIBUTED SYSTEMS 13 CONSISTENCY MODELS
CONSISTENCY MODELS
1. Strict Consistency
Any read on a data item returns a value corresponding to the result of the most recent write on the item (regardless of where the write occurred)
Any execution is the same as if write/read operations were performed in the order of wall- clock time at which they were issued
Consider the implications for the system model required to support this
E.g. After A’s write of “bar” to value, any subsequent read to value will return the value “bar”
DISTRIBUTED SYSTEMS 14 CONSISTENCY MODELS
CONSISTENCY MODELS
1. Strict Consistency
Advantages?
very intuitive Disadvantages?
difficult to implement
absolute global time of “Wall Clock” in a distributed system.
DISTRIBUTED SYSTEMS 15 CONSISTENCY MODELS
CONSISTENCY MODELS
2. Sequential Consistency
The result of any execution is the same as if the read and write operations by all processes were executed in some sequential order and the operations of each individual process appear in this sequence in the order specified by its program [Lamport, 1979]
E.g.
• Process P1 first performs W(x)a to x.
• Later (in absolute time), process P2 performs
a write operation, by setting the value of x to b.
• Both P3 and P4 first read value b, and later
value a.
• Write operation of process P2 appears to have
taken place before that of P1 (to P3 and P4)
Execution is sequentially consistent but not strictly consistent!
DISTRIBUTED SYSTEMS 16 CONSISTENCY MODELS
CONSISTENCY MODELS
2. Sequential Consistency
Any execution is the same as if write/read operations were performed in a logical order
Rule 1: Each machine’s own ops appear in order dictated by their program
Rule 2: All machines see results according to a single chosen total (sequential) ordering
Reads may be stale according to wall-clock time, but not logical time
Similar to Strict Consistency but we do not adhere to wallclock timing.
Analogy: FIFO asynchronous communication between clients and a single service
DISTRIBUTED SYSTEMS 17 CONSISTENCY MODELS
CONSISTENCY MODELS
2. Sequential Consistency
An execution of multiple threads is sequentially consistent if the method calls can be correctly arranged retaining the mutual order of the method calls in each thread/process.
The calls in different threads/process can be re-arranged whatever you wish, regardless of when they start or finish.
Happens all the time when caching is involved.
DISTRIBUTED SYSTEMS 18 CONSISTENCY MODELS
CONSISTENCY MODELS
2. Sequential Consistency
X:=A means write to object x the value ‘A’
P1
X := A
X := B
B==X
means read
from x and get/got
value ‘B’ P3
P4
P2
A==X B==X
B==X B==X
DISTRIBUTED SYSTEMS 19 CONSISTENCY MODELS
CONSISTENCY MODELS
2. Sequential Consistency
X:=A means write to object x the value ‘A’
P1
X := A
X := B
B==X
means read
from x and get/got
value ‘B’ P3
P4
P2
A==X B==X
B==X B==X
DISTRIBUTED SYSTEMS 20 CONSISTENCY MODELS
CONSISTENCY MODELS
2. Sequential Consistency
P1
P2
P3
P4
X := A X := A A == X X := B B == X
X := B
A == X
B == X
B == X
B == X
DISTRIBUTED SYSTEMS 21 CONSISTENCY MODELS
CONSISTENCY MODELS
2. Sequential Consistency
P1
P2
P3
P4
X := A X := A A == X X := B
X := B
B == X
A == X
B == X
B == X
B == X
DISTRIBUTED SYSTEMS 22 CONSISTENCY MODELS
CONSISTENCY MODELS
2. Sequential Consistency
P1 X := A
X := B
P2
P3
P4
B == X A == X
A == X B == X
DISTRIBUTED SYSTEMS 23 CONSISTENCY MODELS
CONSISTENCY MODELS
2. Sequential Consistency
P1 X := A
X := B
P2
P3
P4
B == X
A == X
Reads not consistent with any sequential write ordering
A == X B == X
DISTRIBUTED SYSTEMS 24 CONSISTENCY MODELS
CONSISTENCY MODELS
2. Sequential Consistency
P1 X := A
X := C
X := B
Reads not consistent with any sequential write ordering
P4
P2
P3
C == X
A == X
A == X B == X
DISTRIBUTED SYSTEMS 25 CONSISTENCY MODELS
CONSISTENCY MODELS
2. Sequential Consistency
Advantages?
still relatively intuitive
no notion of real time; system has some leeway in how it orders operations compatible with realistic (asynchronous) underlying system models
as used in several languages that define concurrent behaviour, e.g. C++-11
Disadvantages?
difficult/costly to implement distributed service [why?] : similar to atomic broadcast
once a write completes, other machines reads must see new data [why?] => writes and reads will still be expensive
what if you disconnect from the network but still want to edit your shared document?
DISTRIBUTED SYSTEMS 26 CONSISTENCY MODELS
CONSISTENCY MODELS
3. Causal Consistency
No enforcement of global ordering of all operations
Writes that are potentially causally related must be seen by all processes in the same order.
Concurrent writes may be seen in a different order on different machines.
DISTRIBUTED SYSTEMS 27 CONSISTENCY MODELS
CONSISTENCY MODELS
3. Causal Consistency
Any execution is the same as if all causally-related read/write ops were executed in an order that reflects their causality
Any concurrent operations might be seen in different orders by different clients
Reads are “fresh” only with respect to the writes they are causally dependent on
Only causally-related writes are ordered by all replicas in the same way
Concurrent writes may be committed in different orders by different replicas, and hence read in different orders by different clients
DISTRIBUTED SYSTEMS 28 CONSISTENCY MODELS
CONSISTENCY MODELS
3. Causal Consistency
P1
P2
P3
P4
X := A
X := B
B == X A == X
A == X B == X
DISTRIBUTED SYSTEMS 29 CONSISTENCY MODELS
CONSISTENCY MODELS
3. Causal Consistency
P1
P1 P2
P2
P3
P4
X := A
X := B
B == X
A == X
Differing orders OK due to lack of causal relationship
A == X B == X
DISTRIBUTED SYSTEMS 30 CONSISTENCY MODELS
CONSISTENCY MODELS
3. Causal Consistency
P1
P2
P3
P4
X := A
A == X
X := B
B == X A == X
A == X B == X
DISTRIBUTED SYSTEMS 31 CONSISTENCY MODELS
CONSISTENCY MODELS
3. Causal Consistency
P1
P2
P3
P4
X := A
Causal order A == X
X := B
B == X
A == X
Not permitted due to causal relationship
A == X B == X
DISTRIBUTED SYSTEMS 32 CONSISTENCY MODELS
CONSISTENCY MODELS
Another Example
This sequence is allowed with a causally-consistent DS, but not with sequentially or strictly consistent DS.
Can be implemented with vector clocks.
DISTRIBUTED SYSTEMS 33 CONSISTENCY MODELS
CONSISTENCY MODELS
3. Causal Consistency Consider this stream of posts:
Oh no! My cat just jumped out the window.
[a few minutes later] Whew, the catnip plant broke her fall. [reply from a friend] I love when that happens to cats!
It looks a little weird. if what shows up on someone else’s screen is:
Oh no! My cat just jumped out the window.
[reply from a friend] I love when that happens to cats!
There are even better examples, widely used, when talking about access control:
[Removes boss from friends list]
[Posts]: “My boss is the worst, I need a new job!”
If these two actions occur in the wrong order, then the post will not have been hidden from my boss as intended…
DISTRIBUTED SYSTEMS 34 CONSISTENCY MODELS
CONSISTENCY MODELS
3. Causal Consistency Advantages?
better performance because greater concurrency is possible than Sequential Consistency (why?)
can do some things, e.g. determine an event order, not possible under any weaker model
Disadvantages?
difficult to write a correct program under this model
implementation is complex: create graph of causal relationships between events
there are some design patterns (“writes after related reads”) that help to establish causal relationships, but they’re cumbersome/unintuitive/human-error-prone (“footguns”)
DISTRIBUTED SYSTEMS 35 CONSISTENCY MODELS
CONSISTENCY MODELS
4. Processor Consistency
All writes (to x) from one process must be seen in the same order as they were issued by all other processes.
No consistent ordering on writes issued by different processors
Weaker than Causal Consistency
DISTRIBUTED SYSTEMS 36 CONSISTENCY MODELS
CONSISTENCY MODELS
4. Processor Consistency
• Advantages? • Really cheap
• Disadvantages
• Not very useful except to solve some very specific problems
• Extremely difficult to write correct code under this model
• Many things impossible under this model
DISTRIBUTED SYSTEMS 37 CONSISTENCY MODELS
CONSISTENCY MODELS
4b. Cache Consistency
When multiple processors access their local copies of the same data item, ensure the values they read are consistent (same).
Similar to Processor Consistency, but weaker An ordering is enforced on writes, but not reads.
DISTRIBUTED SYSTEMS 38 CONSISTENCY MODELS
CONSISTENCY MODELS
5. Release Consistency
Accesses to shared state are explicitly controlled/ synchronised in the way that all pending acquires (lock) must be done before a release (unlock) is done:
1. Perform “Acquire” action on a synchronisation variable
2. Read/write shared state
3. Perform “Release” action on the same synchronisation variable
Looks a little like mutexes
DISTRIBUTED SYSTEMS 39 CONSISTENCY MODELS
CONSISTENCY MODELS
5. Release Consistency
Provides equivalent execution to Sequentially Consistent, iff:
The acquire/release protocol is strictly followed by clients for all shared accesses
All reads and writes by a processor are propagated before the Release is performed
Acquire/release operations are Processor Consistent
Acquire/Release are abstract events, not reads/writes
Used to synchronise the system, as they are observed in Processor-Consistent order
Operate similarly to barriers, provide happens-before relationships between reads/writes
DISTRIBUTED SYSTEMS 40 CONSISTENCY MODELS
CONSISTENCY MODELS
5. Release Consistency Advantages?
Cheap – synchronisation is explicit and occurs (mostly) only where desired
As strong as Sequential Consistency, therefore relatively “easy” to write correct code
Familiar programming model: critical sections / mutexes. Looks a bit like transactions.
Disadvantages?
Huge footgun: the programmer MUST do the acquire/release operations correctly or the execution may not be consistent in the way expected. Extremely difficult to test for this class of Heisenbug.
DISTRIBUTED SYSTEMS 41 CONSISTENCY MODELS
CONSISTENCY MODELS
5a. Eager Release
Propagate updates to all other processors on completion of updates at Release
Shared data is sent on update, during Release Ensure consistency each time of update
Widely used for hardware-coherent multiprocessors, Large communication volume
DISTRIBUTED SYSTEMS 42 CONSISTENCY MODELS
CONSISTENCY MODELS
5b. Lazy Release
Postpone updates propagation till another processor has successfully performed an Acquire
Shared data is only sent when required, during Acquire Reduced communication volume
Possibly increased latency Widely used in DSM
DISTRIBUTED SYSTEMS 43 CONSISTENCY MODELS
CONSISTENCY MODELS
Eager Release vs Lazy Release
DISTRIBUTED SYSTEMS 44 CONSISTENCY MODELS
CONSISTENCY MODELS
m. Client consistency
Consider a distributed database to which you have access through your notebook. Assume your notebook acts as a front end to the database.
At location A you access the database doing reads and updates.
At location B you continue your work, but unless you access the same
server as the one at location A, you may detect inconsistencies:
Your updates at A may not have yet been propagated to B
You may be reading newer entries than the ones available at A
Your updates at B may eventually conflict with those at A
The only thing you really want is that the entries you updated and/or read at A, are in B the way you left them in A. In that case, the database will appear to be consistent to you.
DISTRIBUTED SYSTEMS 45 CONSISTENCY MODELS
CONSISTENCY MODELS
m. Client Consistency A client-centric model
Maintain an internally-consistent views of the storage for individual clients
Even if different clients see different orderings
Client consistency guarantees monotonic reads
monotonic writes
read your writes
Write following reads
DISTRIBUTED SYSTEMS 46 CONSISTENCY MODELS
CONSISTENCY MODELS
n. Eventual Consistency
DISTRIBUTED SYSTEMS 47 CONSISTENCY MODELS
CONSISTENCY MODELS
n. Eventual Consistency
Permits the existence of “stale” reads
The storage system guarantees that if no new updates are made to the object, eventually all accesses will return the last updated value.
Implemented in
DNS (Domain Name System). Updates to a name are distributed according to a configured pattern and in combination with time- controlled caches; eventually, all clients will see the update.
Amazon Dynamo – key/value store that is behind Amazon services
DISTRIBUTED SYSTEMS 48 CONSISTENCY MODELS
CONSISTENCY MODELS
n. Eventual Consistency
The inconsistency window is the maximum amount of time that can elapse between the time that the update is made, and the time that the update is guaranteed to be visible to all clients.
If no failures occur, the inconsistency window is determined by
communication delays, the load on the system, and the number of replicas involved in the replication scheme.
If reading from asynchronous replica, inconsistency window = length of log shipment (replaying the update log for each server on the other servers)
DISTRIBUTED SYSTEMS 49 CONSISTENCY MODELS
CONSISTENCY MODELS
Use case: Social Media
A data infrastructure at a social network where users post new status updates that are sent to their followers’ timelines, represented by separate lists—one per user
The database of timelines is stored across multiple physical servers [why?]
In the event of a partition between two servers, however, you cannot deliver each update to all timelines
Should you tell the user that s/he cannot post an update, or should you let users post but wait until the partition heals before providing a response?
Both of these strategies choose consistency over availability, at the cost of user experience.
DISTRIBUTED SYSTEMS 50 CONSISTENCY MODELS
CONSISTENCY MODELS
Use case: Social Media
Or, you propagate the update to the reachable set of followers’ timelines, return to the user, and delay delivering the update to the other followers until the partition heals
No guarantee that all users see the same set of updates at every point in time (and admit the possibility of timeline reordering as partitions heal),
But you gain high availability and (arguably) a better user experience
Because updates are eventually delivered, all users eventually see the same timeline with all of the updates that users posted
DISTRIBUTED SYSTEMS 51 CONSISTENCY MODELS
CONSISTENCY MODELS
n. Eventual Consistency
Replicas must exchange information about which writes they have seen
Can use an asynchronous broadcast, when a replica receives a write to a data item:
it immediately responds to the user
in the background, sends the write to all other replicas, which in
turn update their locally stored data items
In the event of concurrent writes to a given data item, replicas deterministically choose a “winning” value
DISTRIBUTED SYSTEMS 52 CONSISTENCY MODELS
CONSISTENCY MODELS
n. Eventual Consistency
Advantages?
more efficient: less communication, less synchronisation
stale values are available; hopefully better than nothing? can make progress while partitioned; “AP but not C”
Disadvantages?
Stale reads must be acceptable to the application
Even weaker than Causal Consistency, suitable only for some special cases
DISTRIBUTED SYSTEMS 53 CONSISTENCY MODELS
CONSISTENCY MODELS
There’s more
• There are other types of consistency models
• Entry (related to release Consistency)
• FIFO or Pipelined Random-Access Memory (PRAM)
• General
• Quiescent
• But we wont get into them in this course
DISTRIBUTED SYSTEMS 54 CONSISTENCY MODELS
CONSISTENCY MODELS
Summary
Consistency models in two perspectives
Source: https://web2.qatar.cmu.edu/~msakr/15440-f12/lectures.html
writes
DISTRIBUTED SYSTEMS 55 CONSISTENCY MODELS
CONSISTENCY MODELS
Summary
Whether inconsistencies are acceptable depends on the client application
Be aware what consistency is provided by the storage vs what your application needs
Is the application even possible?
Are the inconsistencies important to the end use case?
Choose your tradeoff deliberately
DISTRIBUTED SYSTEMS 56 CONSISTENCY MODELS