Advanced Cloud
Time and Ordering
“Time” in Distributed Systems
Copyright By PowCoder代写 加微信 powcoder
l Replicas:
l Who is having the “latest” copy?
l Transaction:
l Which transaction appeared first?
n Your account $80 n 2 transactions
l mom withdraws $60
l girlfriend withdraws $70
Distributed Time
l We need time for ordering only l Not exactly need the time value
l Challenges:
l Delay in network
l Server delay (receive vs deliver)
n Ex: client sends out msg x to server A and then msg y to server B
l A receives x first; and then B receives y
l But A is busy to deliver x to the application on top
l E.g., OS scheduling
l and thus B delivers y to its application first
What is a “correct” order?
l Total order: all servers process messages in same order l The same order among servers might not respect the real-time l A.k.a. sequential consistency
l Linearizability: total order + respect the real-time
l Which one is harder to implement (achieve)? Total but not linearizable
Linearizable, so also total
Ack. Brown’s CS138
What is a “correct” order?
l Total order: all servers process messages in same order
l Linearizability: total order +
l The order doesn’t conflict with the real
time order
l FIFO order: different servers can get different orders as long as the order from the same client is preserved
l 2 clients: yellow and greenààà
l Which order is weaker?
l Which one is easier to implement/achieve?
Ack. Brown’s CS138
Total (!= linearizable)
FIFO: order of each client is preserved
What is a “correct” order?
l Total ordering is required for banking
l All branches see the same input order and reach the same next state
l FIFO ordering has limited applications
l It can’t tell the relationship between processes
l Causal ordering tells the causal relationship between processes
l Those without causal relationships can in different orders on different processes, e.g.,
n Time 1: Mother tweets A: “My God, I lost my son David”. n Time 2: Mother tweets B: “Oh I found him”.
n Time 3: Father “reply to” on tweet B: “Thanks God”.
l FIFO: people on Twitter see:
n Mother: “My God, I lost my son David”. n Father: “Thanks God”.
n Mother: “Oh I found him”.
Applications of ordering
l Broadcast
l Replicated State Machine l Logging (for recovery)
l Transaction
The key point is: whether the application agrees that the (potentially different) order(s) the shards/replicas see is still deemed as “correct”
The gap between what you want and what you can achieve
l Linearizable in Distributed systems?
l FLP – asynchronous setting (worst case)
l CAP – practical case
l Then how come Google Spanner can achieve linearizable transaction???
n Real-time order: it builds its own timing system that gives timestamp intervals n CAP: it gives its own “Internet” such that P becomes rare
l Linearizable in Parallel systems?
l OK because of synchronous setting
l In fact, Linearizable is the only acceptable correctness
Implementing total-order broadcast
l Need a global sequencer (or elect a leader first)
l Before sending messages to all servers, must ask a sequencer for a timestamp
l Not yet deal with reliability yet
1. What if server 1 dies?
n Then, the problem adds the requirement of being an “atomic total-order broadcast” problem
2. What if the leader dies?
n Then, it becomes leader election
l 1+2 = Replication State Machine!
Implementing FIFO-order broadcast
l Each client maintains its local counter
l Server maintains a queue for each client to order its messages
Classic use cases of clock
l Distributed mutex (next topic)
l For queuing up requests from distributed nodes l A happen-before ordering is enough
l System recovery
l Logging – ensure the order of happened events
l Need to replay the log to restore the system to the same state l Need to respect causal ordering
Happen-Before Ordering
l Want to capture the following: l If same process
l Itiscleara≺g
l If different processes P and Q
n Event a: P sends Q a message
n Event b: Q receives the message na≺b
Happen-Before Ordering
l Transitive Property:
n If(a≺b)and(b≺c)èa≺c
l So, we have l e≺d
l Goal: Say, after the system finishes, if I ask P, Q, and R (not ask you),
l “was b happened before d?”
l What kind of log info the processes shall the system keep in order to answer that question?
How to log a happen-before ordering among all events?
l Lamport’s Logical Clock
l Clockisa.k.a.“timestamp”
l Eachprocesshasitsownlocalcounter
l 1)Anylocalthingstakenplace,LC++
l 2)Whensendingoutamessage,piggybackitsLCinthemessage
l 3)Whenreceivingamessage,setlocalLC=1+max(LC,message’sLC)
n i.e., make sure the recipient’s time = 1 + time-when-sender-sends-its-message
l To make sure that the message is “arrived” *after* the sender’s sent it
l Bylookingattheclock,
n We see that logical clock yields a partial order among events
n E.g., a, h, f are all ‘1’
n If somehow you need to ensure P0 P1 P2 shares the exact total-order
l Just don’t break the ties randomly, but break the tie of a, h, f, say, by adding the process id to the timestamp and sort by pid
Limitations of Logical Clock
l So, if x ≺ y ==> LC(x) < LC(y) l But: LC(x) < LC(y) !=> x ≺ y
l Logical clock can answer: if LC(x) < LC(y) è
l (i)xmayhappen-beforey,(ii)orconcurrentwithy,but(iii)xsurelydidnothappenaftery
l For many applications we don’t need to differentiate (i) and (ii), (iii) is enough. E.g., (d,4), (g,4), and (e,5)
n For mutex, both dàgàe and gàdàe are fine (as long as no starvation)
l However, for some applications, we do want to know more precisely which one, d or g,
l How to design a clock that can tell?
How to log a casual ordering among events?
l Vector Clock
l If a ≺ b <==> VC(a) < VC(b)
l VC is not a value, but a vector of length N, if there are N processes l Process i: VC(Ti1, ..., Tii, Tij , ..., Tin)
l VC(a) < VC(b) [based on dominance (prevail)]
l VC implementation:
l 1) Any local things taken place at process i: VC[i]++
l 2) When process i sends out a message, piggyback its VC in the message l 3) When process j receives a message from process i,
n set j’s VC from (Tj1, ..., Tji, Tjj , ..., Tjn) to
(max(Ti1, Tj1), ..., max(Tij, Tjj++), ..., max(Tin, Tjn))
Vector Clock
l a ≺ b <==> VC(a) < VC(b)
l Can tell whether it is concurrent or happen-before l Consider events (0,0,1) and (2,1,0)
l Means they are concurrent ||
l If the events are more clearly stated as “read” and “write” l Then, write-(x) ≺ read-(x) ≺ write-(x)
l Can precisely capture causal relationship
l Exercise: work out the vector clock for the below and tell us which one, d or g, cause e?
a(1,0,0) b(2,0,1) c(3,1,1) d(4,1,1) e(5,1,1) f(0,1,0) g(3,2,1)
Summary of ordering
l Linearizability: total order also follows real-time order l Parallelsystems:useoflocksinconcurrentdatastructures
l Distributedsystems?GoogleSpanner
l Total order: all server processes receive the same sequential order of events l Requireaglobalsequencerorelectaleaderfirst
l FIFO ordering: all server process events honoring each client’s local order l Wayeasiertoimplement:eachclient’slocalclockismonotonic
l Onlyusefulifclientstouchdifferentthings,elseitdoesn’tworkbecause
n it can’t capture the order between clients on the same item l Causal ordering
l Achievablebyvectorclock
l Butvectorclock’smessagesizeincreaseswith#ofprocesses
l Happen-before ordering l Exapplication:mutex
Summary of clock
l Designed to help establish the desired level of ordering in a distributed system
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com