
Distributed Transactions with
Two-Phase Commit
R&G – Chapter 20

Distributed vs. Parallel?
Earlier we discussed Parallel DBMSs
Shared-memory
Shared-disk
Shared-nothing
Distributed is basically shared-nothing parallel
Perhaps with a slower network

What’s Special About
Distributed Computing?
Parallel computation
No shared memory/disk
Unreliable Networks
Delay, reordering, loss of packets
Unsynchronized clocks
Impossible to have perfect synchrony
Partial failure: can’t know what’s up, what’s down
“A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable”.
— Leslie Lamport (2013 Turing Award winner)

Distributed Database Systems
DBMS an influential special case of distributed computing
The trickiest part of distributed computing is state, i.e., data
Transactions provide an influential model for concurrency/parallelism
DBMSs worried about fault handling early on
Special-case because not all programs are written transactionally
And if not, database techniques may not apply
Many of today’s most complex distributed systems are databases
Cloud SQL databases like Spanner, Aurora, Azure SQL
NoSQL databases like DynamoDB, Cassandra, MongoDB, Couchbase…
We’ll focus on transactional concurrency control and recovery
You already know many lessons of distributed query processing

Distributed Locking

Distributed Concurrency Control
Consider a shared-nothing distributed DBMS
For today, assume partitioning but no replication of data
Each transaction arrives at some node:
The “coordinator” for the transaction


Where is the Lock Table
Typical design: Locks partitioned with the data
Independent: each node manages “its own” lock table
Works for objects that fit on one node (pages, tuples)
For coarser-grained locks, assign a “home” node
Object being locked (table, DB) exists across nodes
[Diagram: nodes holding Sailors, Boats, and Reserves data, each managing its own lock table]

Where is the Lock Table, Pt 2
Typical design: Locks partitioned with the data
Independent: each node manages “its own” lock table
Works for objects that fit on one node (pages, tuples)
For coarser-grained locks, assign a “home” node
Object being locked (table, DB) exists across nodes
These locks can be partitioned across nodes

[Diagram: coarse-grained locks for Sailors, Boats, and Reserves partitioned across the nodes]

Where is the Lock Table, Pt 3
Typical design: Locks partitioned with the data
Independent: each node manages “its own” lock table
Works for objects that fit on one node (pages, tuples)
For coarser-grained locks, assign a “home” node
Object being locked (table, DB) exists across nodes
These locks can be partitioned across nodes
Or centralized at a master node

[Diagram: coarse-grained locks for Sailors, Boats, and Reserves centralized at a single master node]
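A rough sketch of that routing choice in Python (all names here are illustrative, not from any real system): fine-grained locks are requested at the node that stores the object, while coarse-grained locks go to a deterministic "home" node, or to a single master.

# Sketch: where to send a lock request in a partitioned lock manager.
# node_storing(), cluster_nodes, and the master choice are assumed/illustrative.

def lock_manager_for(obj, cluster_nodes, node_storing, use_master=False):
    if obj.granularity in ("page", "tuple"):
        # Fine-grained object lives on exactly one node;
        # that node manages the lock in its own lock table.
        return node_storing(obj)
    if use_master:
        # Centralized option: one master node holds all coarse-grained locks.
        return cluster_nodes[0]          # assume node 0 is the master
    # Partitioned option: hash the object's name to a "home" node.
    return cluster_nodes[hash(obj.name) % len(cluster_nodes)]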

Ignore global locks for a moment…
Every node does its own locking
Clean and efficient
“Global” issues remain:
Deadlock
Commit/Abort


Distributed Deadlock Detection

What Could Go Wrong? #1
Deadlock detection


What Could Go Wrong? #1 Part 2
Deadlock detection
Easy fix: periodically union the local waits-for graphs at a designated master node
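A minimal sketch of that fix in Python (the waits-for representation and helper names are assumptions): each node periodically ships its local waits-for edges to the master, which unions them and runs ordinary cycle detection.

# Sketch: global deadlock detection by unioning local waits-for graphs.
# local_graphs: one dict per node, mapping waiter_xid -> set of holder_xids.

def global_deadlock(local_graphs):
    merged = {}
    for g in local_graphs:
        for waiter, holders in g.items():
            merged.setdefault(waiter, set()).update(holders)

    # DFS-based cycle detection on the merged graph.
    visiting, done = set(), set()

    def has_cycle(x):
        visiting.add(x)
        for y in merged.get(x, ()):
            if y in visiting:
                return True            # back edge => cycle => deadlock
            if y not in done and has_cycle(y):
                return True
        visiting.discard(x)
        done.add(x)
        return False

    return any(x not in done and has_cycle(x) for x in list(merged))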

Distributed Commit: 2PC

What Could Go Wrong? #2
Failures/Delays: Nodes
Commit? Abort?
When the node comes back, how does it recover in a world that moved forward?

What Could Go Wrong? #2, Part 2
Failures/Delays: Nodes
Failures/Delays: Messages
Non-deterministic reordering per channel, interleaving across channels
“Lost” (very delayed) messages


What Could Go Wrong? #2, Part 3
Failures/Delays: Nodes
Failures/Delays: Messages
Non-deterministic reordering per channel, interleaving across channels
“Lost” (very delayed) messages
How do all nodes agree on Commit vs. Abort?


Basic Idea: Distributed Voting
Vote for Commitment
How many votes does a commit need to win?
Any single node could observe a problem (e.g. deadlock, constraint violation)
Hence must be unanimous.
[Diagram: commit request C(T1) arrives for transaction T1]

Distributed voting? How?
How do we implement distributed voting?!
In the face of message/node failure/delay?

2-Phase Commit
A.k.a. 2PC. (Not to be confused with 2PL!)
Like a wedding ceremony!
Phase 1: “do you take this man/woman…”
Coordinator tells participants to “prepare”
Participants respond with yes/no votes
Unanimity required for yes!
Phase 2: “I now pronounce you…”
Coordinator disseminates result of the vote
Need to do some logging for failure handling….
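A minimal coordinator-side sketch of the two phases in Python (send/recv_vote are assumed messaging helpers, not a real API); the logging rules come in the next slides.

# Sketch: 2PC from the coordinator's point of view (no logging yet).

def two_phase_commit(xid, participants, send, recv_vote):
    # Phase 1: ask every participant to prepare, then collect yes/no votes.
    for p in participants:
        send(p, ("PREPARE", xid))
    votes = [recv_vote(p, xid) for p in participants]

    # Unanimity required: any "NO" (or a timeout treated as "NO") means abort.
    decision = "COMMIT" if all(v == "YES" for v in votes) else "ABORT"

    # Phase 2: disseminate the decision and wait for acknowledgements.
    for p in participants:
        send(p, (decision, xid))
    return decision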

2-Phase Commit, Part 1
Phase 1:
Coordinator tells participants to “prepare”
Participants respond with yes/no votes
Unanimity required for commit!
Phase 2:
Coordinator disseminates result of the vote
Participants respond with Ack

[Diagram: coordinator sends Prepare(T1) to each participant]

2-Phase Commit, Part 2
Phase 1:
Coordinator tells participants to “prepare”
Participants respond with yes/no votes
Unanimity required for commit!
Phase 2:
Coordinator disseminates result of the vote
Participants respond with Ack

[Diagram: Prepare(T1) messages arrive at the participants]

2-Phase Commit, Part 3
Phase 1:
Coordinator tells participants to “prepare”
Participants respond with yes/no votes
Unanimity required for commit!
Phase 2:
Coordinator disseminates result of the vote
Participants respond with Ack

[Diagram: participants reply with Yes votes T1a through T1d]

2-Phase Commit, Part 4
Phase 1:
Coordinator tells participants to “prepare”
Participants respond with yes/no votes
Unanimity required for commit!
Phase 2:
Coordinator disseminates result of the vote
Participants respond with Ack

[Diagram: Yes votes from all participants arrive at the coordinator]

2-Phase Commit, Part 5
Phase 1:
Coordinator tells participants to “prepare”
Participants respond with yes/no votes
Unanimity required for commit!
Phase 2:
Coordinator disseminates result of the vote
Participants respond with Ack

[Diagram: coordinator sends Commit(T1) to each participant]

2-Phase Commit, Part 6
Phase 1:
Coordinator tells participants to “prepare”
Participants respond with yes/no votes
Unanimity required for commit!
Phase 2:
Coordinator disseminates result of the vote
Participants respond with Ack

[Diagram: Commit(T1) messages arrive at the participants]

2-Phase Commit, Part 7
Phase 1:
Coordinator tells participants to “prepare”
Participants respond with yes/no votes
Unanimity required for commit!
Phase 2:
Coordinator disseminates result of the vote
Participants respond with Ack

[Diagram: participants reply with Ack(T1a) through Ack(T1d)]

2-Phase Commit, Part 8
Phase 1:
Coordinator tells participants to “prepare”
Participants respond with yes/no votes
Unanimity required for commit!
Phase 2:
Coordinator disseminates result of the vote
Participants respond with Ack

[Diagram: Acks from all participants arrive at the coordinator]

One More Time, With Logging
Phase 1
Coordinator tells participants to “prepare”
Participants generate prepare/abort record
Participants flush prepare/abort record
Participants respond with yes/no votes
Coordinator generates commit record
Coordinator flushes commit record

[Diagram: coordinator about to send Prepare(T1); coordinator and participant log tails are empty]

One More Time, With Logging, Part 2
Phase 1
Coordinator tells participants to “prepare”
Participants generate prepare/abort record
Participants flush prepare/abort record
Participants respond with yes/no votes
Coordinator generates commit record
Coordinator flushes commit record

[Diagram: Prepare(T1) arrives at the participant; both log tails still empty]

One More Time, With Logging, Part 3
Phase 1
Coordinator tells participants to “prepare”
Participants generate prepare/abort record
Participants flush prepare/abort record
Participants respond with yes/no votes
Coordinator generates commit record
Coordinator flushes commit record

[Diagram: participant appends a prepare(T1) record to its log tail]

One More Time, With Logging, Part 4
Phase 1
Coordinator tells participants to “prepare”
Participants generate prepare/abort record
Participants flush prepare/abort record
Participants respond with yes/no votes
Coordinator generates commit record
Coordinator flushes commit record

[Diagram: participant flushes the prepare(T1) record]

One More Time, With Logging, Part 5
Phase 1
Coordinator tells participants to “prepare”
Participants generate prepare/abort record
Participants flush prepare/abort record
Participants respond with yes/no votes
Coordinator generates commit record
Coordinator flushes commit record

[Diagram: participant sends its yes(T1) vote to the coordinator]

One More Time, With Logging, Part 6
Phase 1
Coordinator tells participants to “prepare”
Participants generate prepare/abort record
Participants flush prepare/abort record
Participants respond with yes/no votes
Coordinator generates commit record
Coordinator flushes commit record

[Diagram: yes(T1) vote arrives at the coordinator]

One More Time, With Logging, Part 7
Phase 1
Coordinator tells participants to “prepare”
Participants generate prepare/abort record
Participants flush prepare/abort record
Participants respond with yes/no votes
Coordinator generates commit record
Coordinator flushes commit record

[Diagram: coordinator appends a commit(T1) record to its log tail]

One More Time, With Logging, Part 8
Phase 1
Coordinator tells participants to “prepare”
Participants generate prepare/abort record
Participants flush prepare/abort record
Participants respond with yes/no votes
Coordinator generates commit record
Coordinator flushes commit record

[Diagram: coordinator flushes the commit(T1) record]

One More Time, With Logging, Part 9
Phase 1
Coordinator tells participants to “prepare”
Participants generate prepare/abort record
Participants flush prepare/abort record
Participants respond with yes/no votes
Coordinator generates commit record
Coordinator flushes commit record

[Diagram: prepare(T1) durable at the participant, commit(T1) durable at the coordinator; Phase 1 complete]
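A sketch of Phase 1 with the logging rules above, in Python (log.append/log.flush, send/recv, and can_commit are assumptions, not a real WAL API); the point is the flush-before-send ordering.

# Sketch: Phase 1 with logging (presumed-abort style records).

def participant_phase1(log, xid, coord_id, can_commit, send):
    rec = "prepare" if can_commit(xid) else "abort"
    log.append((rec, xid, coord_id))     # generate prepare/abort record
    log.flush()                          # force it to disk BEFORE voting
    send(coord_id, ("YES" if rec == "prepare" else "NO", xid))

def coordinator_phase1(log, xid, participants, send, recv):
    for p in participants:
        send(p, ("PREPARE", xid))
    votes = [recv(p, xid) for p in participants]
    rec = "commit" if all(v == "YES" for v in votes) else "abort"
    log.append((rec, xid, list(participants)))  # commit record lists participants
    log.flush()                          # decision is durable before Phase 2 begins
    return rec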

One More Time, With Logging, Part 10
Phase 2:
Coordinator broadcasts result of vote
Participants make commit/abort record
Participants flush commit/abort record
Participants respond with Ack
Coordinator generates end record
Coordinator flushes end record

[Diagram: coordinator sends Commit(T1) to the participant]

One More Time, With Logging, Part 11
Phase 2:
Coordinator broadcasts result of vote
Participants make commit/abort record
Participants flush commit/abort record
Participants respond with Ack
Coordinator generates end record
Coordinator flushes end record

[Diagram: Commit(T1) arrives at the participant]

One More Time, With Logging, Part 12
Phase 2:
Coordinator broadcasts result of vote
Participants make commit/abort record
Participants flush commit/abort record
Participants respond with Ack
Coordinator generates end record
Coordinator flushes end record

[Diagram: participant appends a commit(T1) record to its log tail]

One More Time, With Logging, Part 13
Phase 2:
Coordinator broadcasts result of vote
Participants generate commit/abort record
Participants flush commit/abort record
Participants respond with Ack
Coordinator generates end record
Coordinator flushes end record

[Diagram: participant flushes the commit(T1) record]

One More Time, With Logging, Part 14
Phase 2:
Coordinator broadcasts result of vote
Participants generate commit/abort record
Participants flush commit/abort record
Participants respond with Ack
Coordinator generates end record
Coordinator flushes end record

[Diagram: participant sends Ack(T1a) to the coordinator]

One More Time, With Logging, Part 15
Phase 2:
Coordinator broadcasts result of vote
Participants generate commit/abort record
Participants flush commit/abort record
Participants respond with Ack
Coordinator generates end record
Coordinator flushes end record

[Diagram: Ack(T1a) arrives at the coordinator]

One More Time, With Logging, Part 16
Phase 2:
Coordinator broadcasts result of vote
Participants generate commit/abort record
Participants flush commit/abort record
Participants respond with Ack
Coordinator generates end record
Coordinator flushes end record

[Diagram: coordinator appends an end(T1) record to its log tail]

One More Time, With Logging, Part 17
Phase 2:
Coordinator broadcasts result of vote
Participants generate commit/abort record
Participants flush commit/abort record
Participants respond with Ack
Coordinator generates end record
Coordinator flushes end record

[Diagram: coordinator flushes end(T1); Phase 2 complete]
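And a matching sketch of Phase 2 with logging (commit path; same assumed helpers as the Phase 1 sketch). With presumed abort, the Ack and the end record only matter on commit.

# Sketch: Phase 2 with logging, commit path.

def participant_phase2_commit(log, xid, coord_id, send, release_locks):
    log.append(("commit", xid))          # participant's own commit record
    log.flush()                          # durable before acknowledging
    release_locks(xid)                   # locks can be dropped once flushed
    send(coord_id, ("ACK", xid))

def coordinator_phase2_commit(log, xid, participants, send, recv):
    for p in participants:
        send(p, ("COMMIT", xid))
    for p in participants:
        recv(p, xid)                     # wait for every participant's Ack
    log.append(("end", xid))             # coordinator can now forget the xact
    log.flush()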

2PC In a Nutshell
Timeline (time flows downward), showing messages between Coordinator and Participant and each side's log writes:
Coordinator → Participant: Prepare
Participant logs: prepare* or abort* (with coord ID)
Participant → Coordinator: Vote Yes/No
Coordinator logs: commit* or abort* (commit includes all participant IDs)
Coordinator → Participant: Commit/Abort
Participant logs: commit* or abort*
Participant → Coordinator: Ack on commit
Coordinator logs: end (on commit)
NOTE: asterisk* means wait for log flush before sending next msg

Recovery and 2PC

Failure Handling
Assume everybody recovers eventually
Big assumption!
Depends on WAL (and short downtimes)
Coordinator notices a Participant is down?
If participant hasn’t voted yet, coordinator aborts the transaction
If waiting for a commit Ack, hand to “recovery process”
Participant notices Coordinator is down?
If it hasn’t yet logged prepare, then abort unilaterally
If it has logged prepare, hand to “recovery process”
Note
Thinking a node is “down” may be incorrect!
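A sketch of those timeout rules in Python (the state names and the recovery-process hand-off are illustrative).

# Sketch: what each side does when it suspects the other is down.

def coordinator_suspects_participant(xid, state, abort, hand_to_recovery):
    if state[xid] == "collecting_votes":    # participant hasn't voted yet
        abort(xid)                          # safe: just abort the transaction
    elif state[xid] == "awaiting_acks":     # decision already logged
        hand_to_recovery(xid)               # keep re-sending until acked

def participant_suspects_coordinator(xid, state, abort, hand_to_recovery):
    if state[xid] == "not_prepared":        # no prepare record logged yet
        abort(xid)                          # safe to abort unilaterally
    elif state[xid] == "prepared":          # voted yes: must learn the outcome
        hand_to_recovery(xid)               # inquire at the coordinator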

Integration with ARIES Recovery
On recovery
Assume there’s a “Recovery Process” at each node
It will be given tasks to do by the Analysis phase of ARIES
These tasks can run in the background (asynchronously)
Note: multiple roles on a single node
Coordinator for some xacts, Participant for others

Integration with ARIES: Analysis
Recall transaction table states
Running, Committing, Aborting
On seeing Prepare log record (participant)
Change state to committing
Tell recovery process to ask coordinator recovery process for status
When coordinator responds, recovery process handles commit/abort as usual
(Note: During REDO, Strict 2PL locks will be acquired)

Integration with ARIES: Analysis, cont
On seeing Commit/Abort log record (coordinator)
Change state to committing/aborting respectively
Tell recovery process to send commit/abort msgs to participants
Once all participants ack commit, recovery process writes End and forgets
If at end of analysis there are no 2PC log records for xact X
Simply set to Aborting locally, and let ToUndo handle it.
Same for participant and coordinator
A.k.a. “Presumed Abort”
There is an optimization called “Presumed Commit”
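A sketch of how Analysis might treat 2PC log records (record fields and the recovery-process interface are assumptions): prepared participants go to committing and inquire, and transactions with no 2PC records simply abort locally (presumed abort).

# Sketch: Analysis-phase handling of 2PC log records (presumed abort).

def analyze_2pc_record(rec, xact_table, recovery):
    if rec.type == "prepare":                       # participant side
        xact_table[rec.xid] = "committing"
        recovery.ask_coordinator(rec.xid, rec.coord_id)
    elif rec.type in ("commit", "abort"):           # coordinator side
        xact_table[rec.xid] = "committing" if rec.type == "commit" else "aborting"
        recovery.notify_participants(rec.xid, rec.type, rec.participants)
    elif rec.type == "end":
        xact_table.pop(rec.xid, None)               # fully finished: forget it

def finish_analysis(xact_table):
    # No 2PC records seen for a transaction? Presumed abort:
    # mark it aborting locally and let ToUndo handle it.
    for xid, state in xact_table.items():
        if state == "running":
            xact_table[xid] = "aborting"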

How Does Recovery Process Work?
Coordinator recovery process gets inquiry from a “prepared” participant
If transaction table at coordinator says aborting/committing
send appropriate response and continue protocol on both sides
If transaction table at coordinator says nothing: send ABORT
Only happens if coordinator had also crashed before writing commit/abort
Inquirer does the abort on its end
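The coordinator's side of that inquiry, as a sketch (the transaction-table states and reply() helper are illustrative).

# Sketch: coordinator recovery process answering a prepared participant.

def handle_inquiry(xid, xact_table, reply):
    state = xact_table.get(xid)
    if state == "committing":
        reply("COMMIT")        # resume Phase 2 on both sides
    elif state == "aborting":
        reply("ABORT")
    else:
        # No entry: the coordinator must have crashed before logging a
        # decision, so presumed abort lets it safely answer ABORT.
        reply("ABORT")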

2PC In a Nutshell
[Same timeline as above, now with Crash! markers showing failures striking mid-protocol.]

Recovery: Think it through
What happens when coordinator recovers?
With “commit” and “end”?
With just “commit”?
With “abort”?
What happens when participant recovers:
With no prepare/commit/abort?
With “prepare” and “commit”?
With just “prepare”?
With “abort”?
Commit iff coordinator logged a commit

Recovery: Think it through, cont
What happens when coordinator recovers?
With “commit” and “end”? Nothing
With just “commit”? Rerun Phase 2!
With “abort”? Nothing (Presumed Abort)
What happens when participant recovers:
With no prepare/commit/abort? Nothing (Presumed Abort)
With “prepare” & “commit”? Send Ack to coordinator.
With just “prepare”? Send inquiry to Coordinator
With “abort”? Nothing (Presumed Abort)
Commit iff coordinator logged a commit
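Those cases read directly as two small decision functions (a sketch; the returned strings stand for the recovery actions named above).

# Sketch: recovery decisions implied by the log contents (presumed abort).

def coordinator_recovery(has_commit, has_end, has_abort):
    if has_commit and not has_end:
        return "rerun phase 2"       # re-send Commit until every Ack arrives
    return "do nothing"              # commit+end, abort, or no records at all

def participant_recovery(has_prepare, has_commit, has_abort):
    if has_commit:
        return "send ack"            # commit already durable locally
    if has_prepare and not has_abort:
        return "inquire"             # ask the coordinator for the outcome
    return "do nothing"              # no prepare, or abort: presumed abort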

2PC + 2PL
Ensure point-to-point messages are densely ordered
1,2,3,4,5…
Dense per (sender/receiver/XID)
Receiver can detect anything missing or out-of-order
Receiver buffers message k+1 until [1..k] received
Commit:
When a participant processes a Commit request, it has all the locks it needs
Flush log records and drop locks atomically
Abort:
It’s safe to abort autonomously, locally: no cascade.
Log appropriately for 2PC (presumed abort in our case)
Perform local Undo, drop locks atomically
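A sketch of the dense per-channel sequencing described above (purely illustrative): the receiver buffers anything out of order and only delivers message k+1 once 1..k have been delivered.

# Sketch: gap detection with dense sequence numbers per (sender, receiver, xid).

class Channel:
    def __init__(self):
        self.next_expected = 1       # sequence numbers run 1, 2, 3, ...
        self.buffered = {}           # out-of-order messages parked here

    def receive(self, seqno, msg, deliver):
        self.buffered[seqno] = msg
        # Deliver in order; stop at the first missing sequence number.
        while self.next_expected in self.buffered:
            deliver(self.buffered.pop(self.next_expected))
            self.next_expected += 1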

Availability Concerns
What happens while a node is down?
Other nodes may be in limbo, holding locks
So certain data is unavailable
This may be bad…
Dead Participants? Respawned by coordinator
Recover from log
And if the old participant comes back from the dead, just ignore it and tell it to recycle itself
Dead Coordinator?
This is a problem!
3-Phase Commit was an early attempt to solve it
Paxos Commit provides a more comprehensive solution
Gray+Lamport paper! Out of scope for this class.

Summing Up
Distributed Databases
A central aspect of Distributed Systems
Partitioning provides Scale-Up
Can also partition lock tables and logs
But need to do some global coordination:
Deadlock detection: easy
Commit: trickier
Two-phase commit is a classic distributed consensus protocol
Logging/recovery aspects unique:
many distributed protocols gloss over
But 2PC is unavailable on any single failure
This is bad news for scale-up, because odds of failure go up with #machines
Paxos Commit (Gray+Lamport) addresses that problem
