
Operating Systems
4160 – Distributed and Parallel Computing

Goal: Replicated State Machine


• Replicated (high availability) Data Store
– Serve as the high availability foundation of many systems on top
• Single-leader: all reads have to go through the leader → performance not great → clients can't read any replica directly
– Reading a replica directly may result in a stale read
– Many kv-stores tweak this to improve performance
» E.g., ZooKeeper allows reads from any replica (may result in stale reads)
– Used in applications where HA + consistency >> performance
• E.g., a directory service (maintaining the VM-to-IP list) – not throughput-demanding, pretty static
– Failure model: fail-stop (not Byzantine); messages may be lost or delayed

Goal: Replicated State Machine
• RSM vs Broadcasting
• RSM: agree on the final state
• Broadcasting: agree on the ordering of the input
• Under multi-core, same input != same output
– If each replica plays the input *serially* → all agree on the final state
– If replicas play the input in parallel → different final states (thread races from OS scheduling)
• RSM is a (popular) instance of Consensus
• Raft is (multi-)Paxos with a clearer objective: RSM
• Multi-Paxos:
• no need to elect new leaders for every new command to apply to the RSM
• Just re-use the leader if the leader is not dead (skip Phase 1)
• https://www.alibabacloud.com/blog/paxos-raft-epaxos-how-has-distributed-consensus-technology-evolved_597127

– Every replica
1. Gets the same input sequence // Replicated Input Log
2. Applies an input command using single-thread
3. Gets the same output state
• Step 1 is the key → consensus
– Buffer the log entry (input command) until all replicas agree on that
» An erroneous log entry can be discarded/corrected by subsequent operations that piggyback the synchronization info
– This step is ‘atomic broadcast’

• Break the problem into:
– Leader-election
– Log-replication

Server states (roles)
• At any given time, each server is in one of three roles:
– Leader:
• Handles all client interactions
• Operates until it fails
– Follower:
• completely passive as a replica only
• If a client X contacts it directly, it tells X who the leader L is
– Candidate:
• discover leader dies, starts election
• Normal case: 1 leader, n-1 followers

Time divided into “terms”
• Each term has 2 parts:
– Begins with an election
– Normal operation (with the elected leader)
• Some terms have no leader (e.g., split vote)
• Each replica maintains its own currentTerm value
– Useful for a server to know whether it is out-of-date
– E.g., server 1 is in term 1945, but it receives an RPC from server 9 saying that its term is 2022
(Figure: split vote → no majority, so that term has no leader)

Follower f
• Everybody starts up as a follower
• Follower f expects to receive RPCs from leader or candidates
– Leader would:
• f.appendEntries(command x, …)
• f.appendEntries(null, …) // leader's heartbeat in case no cmd recently
• Return
– Candidate would: // there could be more than 1 candidate
• f.requestVote(…)
• Return
– If no RPC from the leader within electionTimeout:
• Assume the leader is dead
• f becomes a candidate and triggers a new election

When f triggers an election
1. my.currentTerm++
2. my.state = candidate
3. Vote for myself
4. For all other servers s: s.requestVote(my id)
• If I get the majority of votes:
a) my.state = leader
b) For all other servers: s.appendEntries(my id, null) // heartbeat = announcement of the new leader
• Else if I get appendEntries from some server L:
my.state = follower
my.myleader = L
• Else (split vote):
restart from #1
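The steps and the three outcomes above can be sketched as follows. This is a minimal illustration only; `run_election`, the `me` dict, and `request_vote` are made-up names, not a real Raft API.

```python
# Hypothetical sketch of a candidate triggering an election.

def run_election(me, peers, request_vote):
    """request_vote(peer, term, candidate_id) -> True if peer grants a vote."""
    me["current_term"] += 1                # 1. my.currentTerm++
    me["state"] = "candidate"              # 2. my.state = candidate
    votes = 1                              # 3. vote for myself
    for p in peers:                        # 4. ask every other server
        if request_vote(p, me["current_term"], me["id"]):
            votes += 1
    cluster_size = len(peers) + 1
    if votes * 2 > cluster_size:           # majority -> become leader
        me["state"] = "leader"             # (then send heartbeats, not shown)
    return me["state"]

# Demo: 5-node cluster; 3 of the 4 peers grant their vote -> majority of 5.
me = {"id": 1, "current_term": 4, "state": "follower"}
grants = {2: True, 3: True, 4: True, 5: False}
print(run_election(me, [2, 3, 4, 5], lambda p, t, c: grants[p]))  # -> leader
```

The split-vote and "heard from another leader" outcomes would simply leave `me["state"]` as candidate or set it back to follower.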

Log Replication
• Log Entry = ⟨index, term, command⟩
• Log stored on persistent storage
• A log entry is committed if it is known to be on **a majority of servers**
– commitIdx++
– Once committed, it is eventually executed by every replica

• Clients send a command to the leader
• Leader appends the command to its own log
• Leader broadcasts the command to its followers f
– f.appendEntries(command)
• Once a command is committed
– L.apply(command)
– Reply to client: done!
• Very efficient because the leader waits only for a majority of followers, not all of them → no straggler problem
– In subsequent f.appendEntries(cmd, commitIdx)
• followers use commitIdx to apply the committed commands
• Leader re-sends committed entries to followers that are slow, lost a message, newly joined, rebooted, etc.
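The "majority, not all" rule can be sketched as the leader scanning for the highest index stored on a majority. A minimal sketch; `advance_commit` and `match_index` are illustrative names, not a real API.

```python
# Sketch: the leader advances commitIdx to the highest index that a
# majority of the cluster (leader included) has stored.

def advance_commit(match_index, cluster_size, commit_idx):
    """match_index: per-follower highest replicated log index (1-based)."""
    highest = max(match_index.values(), default=commit_idx)
    for n in range(commit_idx + 1, highest + 1):
        replicas = 1 + sum(1 for m in match_index.values() if m >= n)  # +1 = leader
        if replicas * 2 > cluster_size:
            commit_idx = n        # committed: apply and reply to the client
        else:
            break                 # wait for more acks; stragglers can't block us
    return commit_idx

# 5 servers: two fast followers at index 7, two stragglers at 3 and 1.
print(advance_commit({"A": 7, "B": 7, "C": 3, "D": 1}, 5, 0))  # -> 7
```

Note how indices 4–7 commit with only 2 follower acks (leader + A + B = 3 of 5): the stragglers C and D never delay the client's reply.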

Raft is designed to uphold the following at all times
• Same log entry and same term?
→ Must be the same command
→ All preceding entries must be the same as well
• Once an entry is committed, all preceding entries are also committed

But how to uphold that in the protocol?
• Via AppendEntries piggyback info
• When L sends appendEntries( command x, term&index of the preceding entry)
– E.g., appendEntries(command 3jmp, term=2&index=4)
• Follower f must check that its own entry at that index has the same term
– Reply fail if they don't match
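The follower-side check can be sketched like this. A minimal sketch: `append_entries` and the `(term, command)` tuples are illustrative, not a real API.

```python
# Follower-side sketch of the AppendEntries consistency check: accept new
# entries only if the follower's entry at prev_index carries prev_term.

def append_entries(log, prev_index, prev_term, entries):
    # prev_index is 1-based; 0 means "append from the very beginning".
    if prev_index > len(log):
        return False                              # hole in my log: reply fail
    if prev_index > 0 and log[prev_index - 1][0] != prev_term:
        return False                              # term mismatch: reply fail
    del log[prev_index:]                          # drop any conflicting suffix
    log.extend(entries)
    return True

log = [(1, "mov"), (1, "cmp"), (2, "jmp")]
print(append_entries(log, 3, 2, [(2, "add")]))  # True: my entry 3 has term 2
print(append_entries(log, 6, 2, [(2, "ret")]))  # False: I have no entry 5
```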

But how to uphold that in the protocol?
• If appendEntries fails, the leader retries by sending earlier entries to overwrite the inconsistent uncommitted ones, until the two logs (leader & follower) match up again
• In the example, the leader decrements nextIndex and attempts appendEntries(command 2mov, term=1&index=3)

Reconciling the replicated logs – leader change case
• Example inconsistency:
– Old leader dies with some but not all entries replicated (i.e., uncommitted yet)
• New leader is always the single source of truth
• It will make all followers converge to its log progressively
• In the example, all entries except the committed ones are dispensable
• Say S2 becomes the new leader; eventually everyone will get the same log as S2

Eligibility of becoming a candidate
• If S2 dies, can S4/S5 be a leader?
– No! Neither has all the committed entries!
• So, if a candidate C with a less complete log contacts me for a vote, I say no to C (because my log is more complete than C's) and promote myself to candidate instead:
– My last log entry's term > yours? Reject
– Our last terms are the same but my log is longer than yours? Reject
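The two rejection rules amount to an "at least as up-to-date" comparison on the last log entry. A minimal sketch with illustrative names:

```python
# Sketch of the vote check: compare the candidate's last log entry with my
# own, first by term, then by log length.

def grant_vote(my_last_term, my_len, cand_last_term, cand_len):
    if my_last_term > cand_last_term:
        return False          # my latest term is newer -> reject
    if my_last_term == cand_last_term and my_len > cand_len:
        return False          # same term, but my log is longer -> reject
    return True               # candidate is at least as up-to-date as I am

print(grant_vote(2, 4, 1, 9))  # False: candidate's last term is stale
print(grant_vote(2, 4, 2, 3))  # False: same term but shorter log
print(grant_vote(2, 4, 2, 4))  # True
```

Note the first rule wins even against a longer log: a long log of stale-term entries is still behind.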

Leader election
• Who can be the leader of term 3?
• S5: can’t
– S1-S4 have longer logs and newer terms than S5;
• they won’t vote for S5
• Only the ones with idx=4 and term=2 can win the election

Leader election bug
• If S1 dies, could S5 become the new leader for term 5?
– Yes, because once S1 dies, the yellow entry is effectively uncommitted again!
– And S5 would get the votes and overwrite the "committed" 2 on S1 to S3!
• So, the definition of 'commit' must be fine-tuned

So, the true commit rule
• Shall go beyond the majority
• So, an entry is committed if:
1) It is on a majority of servers, and
2) At least one entry from the leader's current term is also on a majority of servers
• Now, once the yellow entry is committed:
– Even if S1 is dead, S5 can't be the leader for term 5
– The yellow entry is safe and won't shock the client
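The fine-tuned rule can be sketched as a two-part check on the index the leader wants to commit directly (earlier entries then commit transitively). Illustrative names only:

```python
# Sketch: the leader directly commits index n only if (1) a majority stores
# it AND (2) entry n belongs to the leader's CURRENT term.

def can_commit(n, match_index, cluster_size, entry_terms, current_term):
    replicas = 1 + sum(1 for m in match_index.values() if m >= n)  # +1 = leader
    in_majority = replicas * 2 > cluster_size                      # rule 1
    return in_majority and entry_terms[n - 1] == current_term      # rule 2

entry_terms = [1, 2, 2]                      # terms of entries at indices 1..3
match = {"S2": 2, "S3": 2, "S4": 0, "S5": 0}
print(can_commit(2, match, 5, entry_terms, 4))  # False: entry 2 is old (term 2)
print(can_commit(2, match, 5, entry_terms, 2))  # True: leader's own term
```

The first call is exactly the bug scenario: the entry is on a majority, but a term-4 leader must not call a term-2 entry committed until one of its own term-4 entries is also majority-replicated.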

Reconciling the log inconsistency – in detail
• Now, the leader for term 8 faces this situation:

Repairing the log inconsistency – in detail
• Leader keeps nextIndex for each follower
→ the next entry the leader should send to f
– If appendEntries to f is rejected:
• nextIndex[f]-- and retry
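The retry loop can be sketched end to end: each rejection backs nextIndex off by one, and the resend carries progressively earlier entries until the follower's check passes. A minimal sketch; `catch_up` and `append_rpc` are illustrative names.

```python
# Sketch of the leader's repair loop for one follower.

def catch_up(leader_log, next_index, append_rpc):
    """leader_log: list of (term, cmd); next_index: 1-based first entry to send."""
    while True:
        prev = next_index - 1
        prev_term = leader_log[prev - 1][0] if prev > 0 else 0
        if append_rpc(prev, prev_term, leader_log[prev:]):
            return len(leader_log) + 1       # follower fully caught up
        next_index -= 1                      # rejected: back off one entry

follower = [(1, "mov"), (1, "cmp"), (3, "old")]   # diverges after index 2

def append_rpc(prev_index, prev_term, entries):   # follower's consistency check
    if prev_index > len(follower):
        return False
    if prev_index > 0 and follower[prev_index - 1][0] != prev_term:
        return False
    del follower[prev_index:]
    follower.extend(entries)
    return True

leader = [(1, "mov"), (1, "cmp"), (2, "jmp"), (2, "add")]
catch_up(leader, 4, append_rpc)   # rejected at nextIndex=4, succeeds at 3
print(follower == leader)         # True: conflicting suffix was overwritten
```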

The split-brain problem
• An old leader might not be dead, but just temporarily network-partitioned, and later come back
• Now we have 2 "leaders", where the old one still thinks it is the leader and sends appendEntries…
• Term update rule (for everyone):
– If any RPC I send is rejected because the receiver's term is newer than mine, or I receive any RPC whose sender's term is newer than mine:
• I update my term and become a follower
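The rule above guarantees the stale leader demotes itself the moment the partition heals and it sees a newer term. A minimal sketch with illustrative names:

```python
# Sketch of the term update rule: observing any newer term (on a received
# RPC or a rejected reply) forces a step-down to follower.

def observe_term(me, other_term):
    if other_term > me["current_term"]:
        me["current_term"] = other_term
        me["state"] = "follower"      # old "leader" steps down
    return me["state"]

old_leader = {"current_term": 3, "state": "leader"}
print(observe_term(old_leader, 5))    # -> follower (partition healed)
print(observe_term(old_leader, 4))    # stays follower; keeps term 5
```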

Client Protocol
• What if the reply message is lost?
– Leader commits your cmd x
– Leader replies to the client: ok
– The reply message is lost
– Client re-sends cmd x
• Client makes a uniqueID (say, IP address + Seq ID) to uniquely identify each cmd
• Server can then ignore duplicated command
• If the client doesn't crash → exactly-once semantics
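The deduplication idea can be sketched as a per-client cache of the last (seq, reply) pair. A minimal sketch; `DedupServer` and `submit` are illustrative names.

```python
# Sketch: each command carries a client-unique (client_id, seq); the server
# replays the cached reply for a duplicate instead of applying it again.

class DedupServer:
    def __init__(self):
        self.applied = []       # commands actually executed
        self.last = {}          # client_id -> (seq, cached reply)

    def submit(self, client_id, seq, cmd):
        if self.last.get(client_id, (None,))[0] == seq:
            return self.last[client_id][1]          # duplicate: replay reply
        self.applied.append(cmd)                    # apply exactly once
        reply = f"ok:{cmd}"
        self.last[client_id] = (seq, reply)
        return reply

s = DedupServer()
print(s.submit("10.0.0.7:1", 1, "put x"))   # ok:put x
print(s.submit("10.0.0.7:1", 1, "put x"))   # ok:put x  (re-sent after lost reply)
print(s.applied)                            # ['put x']  -- applied only once
```

A real system would store this table in the replicated state itself, so a new leader keeps deduplicating after a failover.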

Configuration changes
• E.g., change the replication factor from 3 to 5
– Everybody switches to the new config at different speed
• Conflicting majority problem:
– Servers 1&2 are not ready; their majority is 2 out of 3
– Servers 3-5 are ready; their majority is 3 out of 5
– During transition, may result in two leaders
• (e.g., S2 elected with vote S1&S2; S4 elected from votes S3,S4,S5)

Solution: 2-phase
• Add special config change “commands”
– Special command:
• Applied (acted on) immediately, without waiting for it to be committed
• Phase 1:
– On view change, the leader applies and broadcasts command Cold+new
• Phase 2:
– Once Cold+new is committed, the leader broadcasts command Cnew
– When Cnew is committed, the transition finishes

In joint-consensus mode
• Leader L must be the first to apply cmd Cold+new
– Then appendEntries(Cold+new) to all nodes in Cold ∪ Cnew
• Nodes that have applied Cold+new are in joint-consensus mode
– Any node in Cold or Cnew can be the leader
– Agreement (for elections and entry commitment) requires separate majorities from both Cold and Cnew
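The "separate majorities" requirement can be sketched as one predicate evaluated against both configurations. A minimal sketch with illustrative names:

```python
# Sketch of joint-consensus agreement: while C_old,new is in effect, a vote
# or commit succeeds only with a majority of C_old AND a majority of C_new.

def joint_agreed(supporters, c_old, c_new):
    def majority(cfg):
        return len(supporters & cfg) * 2 > len(cfg)
    return majority(c_old) and majority(c_new)

c_old = {"S1", "S2", "S3"}                    # replication factor 3 ...
c_new = {"S1", "S2", "S3", "S4", "S5"}        # ... changing to 5
print(joint_agreed({"S1", "S2"}, c_old, c_new))            # False: only 2/5 of C_new
print(joint_agreed({"S2", "S3", "S4"}, c_old, c_new))      # True: 2/3 and 3/5
```

This is exactly what blocks the conflicting-majority scenario: S1&S2 alone can no longer elect a leader, because they are not a majority of Cnew.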

2-leader problem is solved
• Say leader S1 dies during the transition period
– Any new leader must get enough votes from both Cold [S1-S3] and Cnew [S2-S5]
(Figure: one majority is computed over Cold, the other over Cnew)

Difficulties of implementing Raft
• Assume 5 nodes: L, A, B, C, D
• Leader has added “put(key1, value-apple)” to its log
• AppendEntries to the followers
• Nodes A and B
• Replied
• Nodes C and D
• No reply……
• Case 2:
•…
