CIS 455/555: Internet and Web Systems
Paxos December 2, 2020
© 2020 A. Haeberlen, Z. Ives, V. Liu University of Pennsylvania
1
Plan for today
n Transactions
n ACID properties
n Concurrency control: 2PL
n Distributed transactions: 2PC
n Consensus n Paxos
NEXT
© 2020 A. Haeberlen, Z. Ives, V. Liu University of Pennsylvania
2
Illustration of 2PC
Coordinator log begin
send “prepare”
send “yes”
log commit
send “commit”
send “ack” log end
Subordinate 1
send “prepare”
log prepare send “yes”
send “commit” log commit
send “ack”
Subordinate 2
log prepare
log commit
© 2020 A. Haeberlen, Z. Ives, V. Liu
3
What if a node reboots in the middle?
n Suppose we find a commit or abort log for transaction T, but not an end record?
n Needtoredo/undoTdependingonthelog
n IfthisnodeisthecoordinatorforT,keepsendingcommit/abort
msgs to subordinates until acks have been received
n Suppose we find a prepare log record for transaction T, but not commit/abort?
n ThisnodeisasubordinateforT
n RepeatedlycontactthecoordinatortofindstatusofT,thenwrite
commit/abort log record; redo/undo T; and write end log record n Suppose we don’t find even a prepare record for T?
n UnilaterallyabortandundoT
n This node may be coordinator! If so, subordinates may send
messages and need to also be undone © 2020 A. Haeberlen, Z. Ives, V. Liu
Plan for today
n Transactions n Consensus
n Paxos
© 2020 A. Haeberlen, Z. Ives, V. Liu
University of Pennsylvania
5
NEXT
Consensus
n Consider a general replicated service
n Client sends its request to each of the machines
n The machines coordinate and each return a result n Many applications need something like this
© 2020 A. Haeberlen, Z. Ives, V. Liu University of Pennsylvania
6
A formal definition
n Each node may propose a value
n Nodes agree on which they want to use
n Formally, a solution must satisfy the following:
n Termination: Every correct node eventually decides
n Integrity: Every correct node decides at most one value
n Validity: If a node decides v, then some node must have proposed v
n Agreement: If some correct node decides v, then every other correct node also decides v
© 2020 A. Haeberlen, Z. Ives, V. Liu University of Pennsylvania
7
Two Generals Problem
Washington Grant
n Two generals on opposite sides of an invading army n If they both attack simultaneously, they can win
n If they attack at different times, they all die
n They haven’t had time to coordinate at all
n Can only send messengers through the enemy camp
n Messengers can be captured and killed © 2020 A. Haeberlen, Z. Ives, V. Liu University of Pennsylvania
8
FLP: Consensus is “impossible”!
In an asynchronous system where nodes can fail: n Even if the communication channel is reliable
n Even if no nodes actually fail
n What now?
n Change the problem statement: Randomized algorithms,
approximate agreement, k-set agreement, …
n Change the assumptions: Assume bounds on message delays, or that we have an unreliable oracle (failure detector) that tells us when a node crashed
n Paxos: Guarantees safety, but not liveness
M. Fischer, N. Lynch, M. Paterson, “Impossibility of distributed consensus with one faulty process”,
n
Journal of the ACM, April 1985, 32(2):374-382. ACM Knuth Prize 2007! © 2020 A. Haeberlen, Z. Ives, V. Liu University of Pennsylvania
9
Plan for today
n Transactions
n Consensus
n Paxos
© 2020 A. Haeberlen, Z. Ives, V. Liu
University of Pennsylvania
10
NEXT
An unusual paper…
© 2020 A. Haeberlen, Z. Ives, V. Liu University of Pennsylvania
11
Παξοί (Island of Paxos)
© 2020 A. Haeberlen, Z. Ives, V. Liu University of Pennsylvania
12
The Paxos algorithm
n Scenario:
…
37. Add 7 to X 38. Read Y 39. (nop)
40. Z:=X+Y …
n There is a replicated append-only log (“ledgers” in the paper)
n The instances of this log are kept consistent by the protocol
n We can use this log to record the sequence of operations → All nodes will process them in the same order
© 2020 A. Haeberlen, Z. Ives, V. Liu University of Pennsylvania
13
The Paxos algorithm
n How Paxos approximately works:
n Three different roles: Proposers, acceptors, learners
n Proposers can propose a new entry to append n Eachproposalhasauniqueversionnumber
n Acceptors accept proposed values
n Ifmultipleproposalsaremadeconcurrently,somenodesmayeven
accept more than one proposal
n But,ifanodehasacceptedaproposalwithversionnumbern,itwillnot accept any further proposals with version numbers smaller than n
n A proposal accepted by a majority of the nodes is decided as the final value, and is learned by the learners
n Note: Each proposal is for a specific entry
n Multiple instances of the protocol can be active concurrently n Example: Propose XYZ as entry 12 and ABC as entry 13
© 2020 A. Haeberlen, Z. Ives, V. Liu University of Pennsylvania
14
Model
n Network:
n May lose messages (messenger leaves forever)
n May duplicate messages
n Asynchronous (messages can be delayed arbitrarily) n But: No message corruption
n Nodes:
n Can fail by crashing (legislator leaves the Chamber)
n Lessthan(N-1)/2failures,whereN=#acceptors
n No central clock (hourglass timers)
n But: Have some persistent memory (ledgers)
n But: Strictly follow the protocol – no lying, data corruption…
© 2020 A. Haeberlen, Z. Ives, V. Liu University of Pennsylvania
15
Phase 1a: Prepare
n Suppose a node A wants to propose a value X for entry n:
n Node A chooses a new version number v
n Node A sends PREPARE(n, v) to all other nodes
n Intuition: PREPARE(n, v) means
n May I make a proposal for entry n with version number v?
n If so, can you suggest a value I should use?
n Note: Fairness is not a goal; even though A is
suggesting X, it is happy with other values too n Why? (Remember what the values are!)
© 2020 A. Haeberlen, Z. Ives, V. Liu University of Pennsylvania
16
Phase 1b: Promise
n If a node B receives PREPARE(n, v) from A: n If B has already acknowledged a PREPARE(n, v’) with v’>v,
then it does nothing
n If B has previously accepted any proposals for entry n, it responds with PROMISE(n, v, v’, X’), where v’ is the highest version number of any proposal it has accepted for entry n, and X’ is the corresponding value
n Otherwise, B responds with PROMISE(n, v, -, -)
n Intuition: A PROMISE means
n I promise to never accept any proposal with version # < v
n You should choose value X' (if v' and X' are given), or: any value is fine with me (if v' and X' are not given)
© 2020 A. Haeberlen, Z. Ives, V. Liu University of Pennsylvania
17
Phase 2: Accept
n If A receives PROMISEs from a “quorum” of the other nodes, it issues ACCEPT(n, v, X'')
n X'' is the value from the PROMISE with the highest version number, or the original X if none of the PROMISEs had a value
n If B receives ACCEPT(n, v, X'')
n If B has already responded to a PREPARE(n, v') with v'>v,
then B does nothing
n Otherwise B accepts the proposal and sends ACCEPT(n, v, X”) to all the learners
n If a learner L receives ACCEPT(n, v, X”) from a “quorum” of the acceptors, it decides X”
n L then sends DECIDE(n, X”) to all the other learners
n If another learner receives DECIDE(n, X”), it decides X” © 2020 A. Haeberlen, Z. Ives, V. Liu University of Pennsylvania
18