程序代写代做代考 database go algorithm concurrency graph Concurrency Control

Concurrency Control
Ed Knorr, Slides from Fall 2020, CPSC 404
Based on:
Chapter 17 of Ramakrishnan & Gehrke (textbook), but building on the content of Chapter 16,
with some notes from George Tsiknis.
1

Learning Goals
 Explain why a serializable schedule is often a desirable schedule.
 Given a set of transactions, determine whether a given schedule is
serial, serializable, conflict serializable, and/or view serializable.
 Differentiate among these terms about concurrency and scheduling transactions: serial, serializable, conflict serializable, conflict equivalent, view serializable, and view equivalent. Identify which of these terms implies other terms in this list.
 Explain the purposes and principles of lock management.
 Construct a waits-for dependency graph to detect deadlock in a
schedule.
 Apply the wait-die and wound-wait protocols to a schedule to prevent deadlock.
2

Learning Goals (cont.)
 Argue for or against: Deadlock prevention is preferable to deadlock detection (and rolling back a contending transaction) in a heavily-used database system.
 Explain how phantom records could occur in a system that follows strict two-phase locking.
 Explain how index and predicate locking can help avoid phantom records.
 Show how a tree-locking algorithm can achieve good concurrency.
3

Schedules (and Some Review)
 Scheduler: determines the timing of read/write requests among transactions
– Either permits the reads and writes immediately, or delays them
 Schedule = sequence of time-ordered events involving actions like read/write, lock/unlock
 Serial schedule = all the actions of one transaction precede all the actions of another transaction
– e.g., (T1 → T3 → T2 → T4) or just (T1, T3, T2, T4)
4

Schedules (cont.)
 Every serial schedule guarantees consistency!
– We’re interested in schedules that produce the same
result as if the transactions were executed serially.
 A schedule is serializable if its effect on the DB state is the same as that of some (any) serial schedule (i.e., it preserves DB consistency).
– Serializable does not mean it is a serial schedule.
5

Serializability and Conflicts
 A conflict occurs when two transactions operate on the same DB object (e.g., row, page, table), and at least one of those operations is a write operation.
 We want to make sure that the ordering of conflicting operations is correct, in order to maintain consistency of the results.
 Most schedulers use a slightly stronger test for serializability than necessary, called conflict serializability (see the next slide).
 2PL ensures serializability.
6

Conflict Serializable Schedules
 Two schedules are conflict equivalent if: – They involve the same actions of the same
transactions.
– Every pair of conflicting actions is ordered the same
way.
 Schedule S is conflict serializable if S is conflict equivalent to some (any) serial schedule.
 A conflict-serializable schedule can be converted into a serial schedule by swaps of non-conflicting adjacent actions …
7

Converting a CS Schedule to Serial
T1: R(D) R(A) W(A) R(B) W(B) W(D) T2: R(A) W(A)
R(B) W(B)
Using a series of non-conflicting swaps of actions by the scheduler, prove that the above conflict serializable schedule can be turned into a serial schedule:
T1: R(D) R(A) W(A) T2:
R(B) W(B) W(D) R(A) W(A)
R(B) W(B) R(B) W(B) R(B) W(B)
T1: R(D) R(A) W(A) T2:
R(B) W(B) W(D) R(A) W(A)
T1: R(D) R(A) W(A) T2:
R(B) W(B) W(D) R(A) W(A)
T1: R(D) R(A) W(A) T2:
R(B) W(B)
R(A) W(A)
W(D)
R(B) W(B)
T1: R(D) R(A) W(A) T2:
R(B) W(B) W(D) R(A)
R(B) W(B) R(B) W(B)
T1: R(D) R(A) W(A) T2:
R(B)
W(B) W(D)
W(A) R(A) W(A)
8

A schedule that is conflict serializable:
 A schedule that is serializable, but not conflict serializable:
S1
T1 T2
S2
T1 T2 T3
R(A) W(A)
R(D) W(D)
R(B) W(B)
R(B) W(B) R(C) W(C)
W(D)
W(D)
Examples
 Every conflict serializable schedule is serializable. The reverse is not true.
9

Precedence Graph for Concurrent Transactions
 A precedence graph is sometimes called a dependency graph or a serializability graph:
– There is one node per transaction.
– We create an edge from Ti to Tj if an action of Ti precedes and conflicts with Tj
 e.g., Tj reads/writes an object last written by Ti  e.g., Tj writes an object last read by Ti
 Theorem: A schedule is conflict serializable if and only if its dependency graph is acyclic.
10

T1
A B
R(A), W(A), R(B), W(B)
T2 Precedence graph
Example
 Here is a schedule that is not conflict serializable:
T1: R(A), W(A), T2:
R(B), W(B)
 The cycle in the graph reveals the problem: the output of T1 depends on T2, and vice-versa.
11

R(C) W(C) …

– The schedule is not conflict serializable.
Schedule
Precedence Graph
T1 T2 T3
R(B) W(B)
T3
Another Example
R(A) T1 W(A)
T2
W(A) R(A)
W(A)
W(C)
 The cycle in the graph reveals the problem. The output of T1 depends on T3, and T3’s depends on T1’s output
12

T1: T2: T3:
R(A)
W(A)
T1: R(A),W(A)
– – –
If Ti reads initial value of A in S1, then Ti must read the same value of A in S2.
View Serializability
 Schedules S1 and S2 are view equivalent if:
If Ti reads value of A written by Tj in S1, then Ti also reads value of A written by Tj in S2.
If Ti writes final value of A in S1, then Ti also writes final value of A in S2.
W(A)
T2: W(A) W(A) T3:
W(A)
13

View Serializability (cont.)
 A schedule is view serializable if it is view equivalent to a serial schedule.
 If a schedule is conflict serializable, it is also view serializable.
– The reverse is not always true.
 It is very hard to test for view serializability; therefore, such a test is not used in practice.
 There is no locking algorithm that can enforce view serializability.
14

Strict Two-Phase Locking Protocol (Strict 2PL)
 Strict 2PL features:
– Each transaction must obtain a shared (S) or exclusive
T1 T2
(X) lock on the object before reading it, and an X lock
R(A) W(A)
on the object before writing it.
– All locks held by a transaction are released when the
R(A) W(A) Commit
transaction completes (commits or aborts).
 Strict 2PL allows only conflict serializable and recoverable schedules—and avoids cascading aborts.
R(B) W(B) Commit
– The precedence graph is acyclic.
– If T1 gets a lock for a (conflicting) item A before T2, T1 precedes T2 in the graph
– Transactions are ordered by the order they commit or abort.
 Example: The schedule on the right is conflict serializable, but is not allowed by strict 2PL.
– T1 gets an X lock on object A.
– T2 cannot get the X lock on A before T1 ends.
15

Two-Phase Locking (2PL)
 Strict 2PL is too strict. We can relax last the condition.  Non-strict 2PL:
– Each transaction must obtain an S (or an X) lock on the object before reading it, and an X lock on the object before writing it.
– A transaction can release locks at any time but it cannot request additional locks once it releases any lock.
 A transaction has two phases:
1. Growing: acquires locks
2. Shrinking: releases locks
 2PL produces conflict serializable schedules.
– The precedence graph is acyclic.
– Transactions are ordered by the order they enter their shrinking phase.
16

Two-Phase Locking (2PL) (cont.)
Unlike non-strict 2PL, strict 2PL schedules: – Are recoverable
e.g., T2 commits only after all transactions whose changes T2 reads (e.g., updates from T1 and T3) commit.
– Avoid cascading aborts
A cascading abort occurs if T2 reads a value written by T1, and then T1 aborts (rather than commits).
17

Lock Management
 Lock and unlock requests are handled by the lock manager, which typically maintains two tables:
1. Lock Table: a hash table with one entry per data object
2. Transaction Table: a list of locks held by each transaction
 What’s in a lock table entry?
– Number of transactions currently holding a lock
– Type of lock held (shared or exclusive)
– Pointer to a queue of lock requests
 Locking and unlocking have to be atomic operations.
 Lock upgrade: a transaction that holds a shared lock can be upgraded to hold an exclusive lock.
– We can expand 2PL to include upgrades or downgrades during the growing phase.
18

 2PL can produce deadlocks.
R(B) W(B) R(A) …
 In the schedule on the right:
– T1 and T2 are deadlocked.
– T1hasanXlockonAandwaitsforanXlockonB.
– T2hasanXlockonBandwaitsforanXlockonA.
– Neither can release locks before they finish.
R(B) …
 Two ways of dealing with deadlocks:
– Deadlock detection and recovery
– Deadlock prevention
Deadlocks
 When a transaction T waits for a lock, we say that T is blocked.
T1 T2
 A deadlock (a permanent block) occurs when 2 or more transactions are blocked, waiting for each other to release the locks.
R(A) W(A)
19

Deadlock Detection
 We need to create a waits-for graph:
– Nodes are transactions.
– There is an edge from Ti to Tj if Ti is waiting for Tj to release a lock.
 The lock manager:
– Adds edges when it queues a lock request
– Removes edges when it grants a lock request
 Periodically, we check for cycles in the waits-for graph.
 A deadlock is present iff the waits-for graph has a cycle.
 To resolve the deadlock, a transaction in the cycle is chosen and aborted. Choices could be based on:
– Number of locks held
– Work done or remaining work
– Times aborted, so far
– Number of log records written (e.g., DB2)
 Alternative deadlock detection and handling method using timeouts: abort a transaction if it is waiting too long for locks.
20

W(B) S(B)
Deadlock Detection Example
This example shows the change in the waits-for graph when T3 requests X(A):
T1 T2 T3 T4
S(A) R(A)
T1 T2 T4 T3
X(B)
R(C) X(C)
S(C)
X(B) X(A)
21

Deadlock Prevention
 Assign priorities based on timestamps.
– If Ti is older than Tj, Ti has a higher priority than Tj.
 Assume Ti wants a lock for an object that Tj holds. Two policies are possible:
1. Wait-Die: If Ti has higher priority, Ti waits for Tj; otherwise, Ti aborts.
2. Wound-Wait: If Ti has higher priority, Tj aborts; otherwise, Ti waits.
 Both policies avoid deadlock:
– In wait-die, lower priority transactions never wait; they must restart.
– In wound-wait, higher priority transactions never wait; they cause an existing transaction to restart.
 If a transaction re-starts, we use its original timestamp upon re-starting.
 Conservative 2PL: A transaction must acquire all the locks it needs
(or may need) before it starts:
– Prevents deadlock, but reduces performance; so, it’s not used in practice
22

Performance and Locking
 Throughput = # of transactions processed per unit time
T h r o u g h p u t
thrashing
 As more transactions are blocked, throughput decreases.
 There are 3 ways to increase throughput:
– Make locks apply to smaller objects.
– Reduce the time a transaction holds a lock.
# of transactions
– Reduce the # of items that are most frequently accessed (hot spots).
23

Dynamic Databases
 If we relax the assumption that the DB is a fixed collection of objects, even Strict 2PL will not assure serializability. For example, suppose:
– T1 locks all pages containing sailor records with rating = 1, and finds the oldest sailor among them (say, age = 71).
– Next, T2 inserts a new sailor with rating = 1 and age = 96.
– T2 then deletes the oldest sailor with rating = 2 (maybe having
age = 80), and commits.
– T1 now locks all pages containing sailor records with rating = 2,
and finds the oldest sailor among them with, say, age = 63.
 Thus, new phantom records are introduced, and there is no consistent DB state where T1 is “correct”:
– If the order is T1-T2, then T1 should result in: 71 and 80. – If the order is T2-T1, then T1 should result in: 96 and 63.
24

clause condition)
The Problem
 T1 implicitly assumes that it has locked the set of all sailor records with rating = 1.
– The assumption only holds if no sailor records are added while T1 is executing.
– We need some mechanism to enforce this assumption. We can use:
 index locking (i.e., lock the index page for rating = 1), or  predicate locking (i.e., allow locking based on a WHERE-
 This example shows that conflict serializability
guarantees serializability only if the set of objects is fixed!
– Note that we can take locks at the table level instead, but then we have low concurrency.
25

Index Locking
Index
 If there is a dense index (which we’ve been assuming in this course) on the rating field, T1 should lock the index page containing the data entries with rating = 1.
– If there are no records with rating = 1, T1 must lock the index page where such a data entry would be, if it existed!
 If there is no suitable index, T1 must lock all pages (i.e., lock the file/table to prevent new pages from being added, to ensure that no new records with rating = 1 are added).
r=1
Data
26

Predicate Locking
 Here we grant a lock on all records that satisfy some logical predicate, e.g., age > 2*salary
 Index locking is a special case of predicate locking for which an index supports efficient implementation of the predicate lock.
– What is the predicate in the sailor example?
 In general, predicate locking has a lot of locking overhead.
27

Locking in B+ Trees
 How can we efficiently lock a particular leaf node?
 One solution: Ignore the tree structure, and just lock pages while traversing the tree, following 2PL rules.
– But, this has terrible performance.
– The root node (and other high-level nodes in the index) can become bottlenecks because every tree access begins at the root!
28

Two Useful Observations
 Higher levels of the tree only direct the searches for leaf pages.
 For insertions, a node on the path from the root to the modified leaf must be locked (in X mode), but only if a split can propagate up to it from the modified leaf. (A similar point holds with respect to deletions.)
 We can exploit these observations to design efficient locking protocols that guarantee serializability even though they violate 2PL.
29

A Simple Tree-Locking Algorithm
 Search: Start at root and go down; repeatedly, S lock child then unlock parent
 Insert/Delete: Start at root and go down, obtaining X locks as needed; once child is locked, check if it is safe:
– If child is safe, release all locks on ancestors
 Safe node = A node such that changes will not propagate up beyond this node
– For insertions, this means the node is not full.
– For deletions, this means the node is not half-empty.
 Side note: Some improvements on this simple scheme exist (e.g., IX locks), but the above is a good intro.
30

Example
20
ROOT
A
Do:
1) Search 38* 2) Delete 38* 3) Insert 45* 4) Insert 25*
35
B
23 F3844C
GHIDE
20* 22* 23* 24* 35* 36* 38* 41* 44*
31

Transaction Support in SQL-92
 Each transaction has an access mode (read only, or R/W), error diagnostics size, isolation level (see table below), etc.
– You can specify these with an SQL statement or as a parameter when binding a plan.
Isolation Level
Dirty Repeatable Read? Read?
Phantom Problem
Read Uncommitted Read Committed Repeatable Reads Serializable
Maybe Maybe No Maybe No Yes No Y es
Maybe Maybe Maybe No
32

 Conflicts are bound to occur.
 A serializable schedule is good.
Summary
 Concurrency is highly desirable in a DBMS.
 Conflicts between transactions can be detected in the precedence (dependency) graph.
 There are two lock-based concurrency control schemes: Strict 2PL and 2PL.
 DB2 and SQL Server use Strict 2PL or variants.
 The lock manager keeps track of the locks issued.
 Deadlocks can either be prevented or detected.
33

– Wound-wait
Summary (cont.)
 Two deadlock prevention protocols are: – Wait-die
 Naïve locking strategies may have the phantom problem.
 Index locking is common, and affects performance significantly .
– Needed when accessing records via index
– Needed for locking logical sets of records (index locking/predicate locking)
 For tree-structured indexes:
– A straightforward use of 2PL is very inefficient.
– Several variations in tree-locking algorithms illustrate the potential for improvement.
34