程序代写 VLDB 2021

Distributed Transactions

Distributed Databases
Distributed Single-Node Database

Copyright By PowCoder代写 加微信 powcoder

(Availability, or by nature)
(Throughput)

Transactions and ACID in (single-node) DB
l Withdraw$100fromAccountA l Deposit $100 to Account B
l Atomicity: All or Nothing
l Consistency: C in single-node DB != C in distributed world
l End Result = preserve the invariants (e.g., constraint: account >=0)
l Isolation: Multiple Concurrent Transactions T1 and T2
l End Result = Serializable = T1 T2 or T2 T1 but not something else l Concurrency Control (e.g., locking)
l Durability: Result of a committed transaction persist no matter what
l e.g., earthquake, machine failure
l Logging and Recovery (not our focus)

Distributed Transactions and ACID

Explore Cross-database MSDTC for distributed transactions in SQL Server Always On Availability Groups

Concurrency control (OS vs DB) on single node
l Concurrent update to a single-value
n Locking (Semaphore; Pthread lock) n Lock-free (Use of CAS)
l A transaction is a bigger unit
n T involves R/W on multiple values with ACID

ACID on single-node DB
l Pessimistic (~ to OS Lock-based) l 2 Phase Locking (2PL)
n As long as a txn wants to access (read/write) an item, it must acquire a lock of that item first n A txn that touches multiple items
l once unlock //then step into phase 2
l you can’t acquire any lock again
n 2PL guarantees you a serializable schedule = the order of the final lock acquired n Good: little coordination between transactions
l Each transaction locally respects 2PL protocol and access the lock table n Bad: deadlock (will abort one), lock overhead
l Fine-grained locks: reader-writer
l Optimistic (~ to OS Lock-free) l Instead of waiting for the lock
Ref: geeks4geeks
n let them interleave whatever, but abort on observing conflicts and restart

ACID on single-node DB
l Optimistic
l Timestamp-ordering based
n Cross check between timestamps of txns on data items during read and write n The equivalent serializable order = timestamp order of the transaction
l Serializable order is: T < U l Whenever you (T) read X l Last-time-X-written-by-others (e.g., U) >
Your start timestamp (i.e., T start) l ”Read too late”èAbort and Restart T

ACID on single-node DB
l Optimistic
l Timestamp + Multi-versioning (MVTO)
n If keeping versions of X
l Then T can read the previous version of X

ACID on single-node DB
l Optimistic
l Validation based (a.k.a. OCC*)
n Space overhead spent on active transactions only l Read Phase 1:
n The writes are written into thread-local in-memory write set (not to DB yet) l When a transaction T is ending
n Validation Phase 2:
l Validate its read-set and write-set of T overlap with other active transactions’ r/w-sets?
n If no overlap / no serializability conflict / no problem after reordering l Apply the write-set //write Phase 3
l Abort T and Restart T
*very confusing naming convention; But we follow Complete book convention here

Pessimistic vs Optimistic • Locking
• Setting a lock on/off = writing shared memory 0/1 = overhead
• If low-contention
• (e.g., T1 writes X but T2 reads Y)
• Locking overhead is for nothing
• àhurt throughput
• Optimistic
• If low-contention
• àlittle overheadàbetter throughput
• If high-contention
• àAbort&Restart&Abort&Restart..à waste of work

Notion of Correctness under Concurrency
l Linearizability (Shared Memory, OS):
l Multiple threads/processes invoke operations on a single object (e.g., linked list)
n But it must behave like operations happen one-by-one (serial) following real-time order
l Serializability (Shared Memory, Single-Node DB): l Multiple threads (transactions) on multiple items
n But it must behave like any serial order
l So:T1T2T3orT2T1T3isalsofine
l Linearizability (Replicated State Machine, Distributed):
l Multiple replicas on a “single” logical object (e.g., a log, a kv store)
n But behaves like a single-threaded machine
http://www.bailis.org/blog/linearizability-versus-serializability/

ACID on distributed DB
l Problem: any site may fail or disconnect while a commit for transaction T is in progress.
l Atomicity says
n T does not “partly commit”, i.e., commit at some site but abort at the others.
n Individual sites cannot unilaterally choose to abort T without the agreement of the other
n IfTholdslocksatasiteS,
l then S cannot release them until it knows if T is committed or aborted.
n If T has pending updates to data at a site S,
l then S cannot expose the data until T commits/aborts.

ACID on distributed DB l 2PC
l Needs a designated coordinator (so if that is dead, hanged); vs. Raft: any one can be a leader
End Transaction // Commit //Trigger 2PC now

Explore Cross-database MSDTC for distributed transactions in SQL Server Always On Availability Groups

l TM = tran manager (coordinator)
l RM = remote manager (Participant)
Might block forever if the coordinator (a.k.a. transaction manager) fails or disconnects
https://www2.cs.duke.edu/courses/fall07/cps212/consensus.pdf

9 cases of fail-stop during 2PC
(1) before the coordinator sends prepare requests
Lu et al. VLDB 2021
(2) after some participant nodes receive prepared requests
(3) after all participant nodes receive prepared requests
(4) before the coordinator receives all votes from participant nodes
(5) after the coordinator writes the commit record
(6) before some participant nodes receive commit requests
(7) before the coordinator receives any acknowledgement
(8) after the coordinator receives some acknowledgements, and
(9) after the coordinator receives all acknowledgements.

Availability angle: 2PC is not fault-tolerant
l All processes agree on {Commit, Abort} l To counter FLP
l Weneedsafetymorethanlivenessinthis$case l 2-Phase Commit
l Mightblockforeverifboththecoordinatorandashardfailduringthecommitphase l 3-Phase Commit
l Syncsetting
l Can Paxos solve atomic commit? Yes
l PaxosCommit[GrayandLamport;2xTuringAward;2004]
n So, it’s also fault-tolerance and decentralized
n Less efficient than 2PC (which is fair because it is more fault-tolerance) and more efficient than
n In Distributed Transaction, processes do have stronger voting right (to say no / abort)
l But remember in Paxos? A process is more “whatever choice” and just want to get the consensus done asap

Scalability angle: 2PC is a bottleneck of distributed DB
l Locks must be held until 2PC steps into phase 2
l Significantly hurt concurrency è hurt throughput/latency/scalability l Motivated NoSQL movement

Consistent vs Availability vs A vS
Scalability
NoSQL (<2007) NewSQL (>2008)
Cassandra,DynamoDB, BigTable, …
Weaker C for A S
Þ No multiple items txn Þ No need to consider
serializability; so no
Þ No txn -> No cross-
shard; so no 2PC Þ NoSQL was not their
desire outcome (but they sold/thought it is), it is just a limitation that they didn’t how to solve
H-store (2008)
C&S over A
Demonstrated that
– txn + SQL +
sharding + serializable can scale
– By *avoiding* cross- shard txn; so no
Spanner (2013)
C&A & ok S
Said have SQL demand
Their workloads can’t avoid cross-shard txnè need 2PC
Txn + SQL + sharding + 2PC + serializable + C over A + linearizable(?) RSM
After all these years, isn’t that a distributed SQL database +
replication? !
TAPIR, Carousel, CALM, SLOG, … (2013+)
C&A & better S
Make 2PC+Paxos integrate better for higher efficiency; see 2PC+RSM papers
Or do away coordination; see SLOG and CALM, etc.
NoSQL is a “giant step backward” (2008)

程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com