Concurrency Control
Ed Knorr, Slides from Fall 2020, CPSC 404
Based on:
Chapter 17 of Ramakrishnan & Gehrke (textbook), but building on the content of Chapter 16,
with some notes from George Tsiknis.
1
Learning Goals
Explain why a serializable schedule is often a desirable schedule.
Given a set of transactions, determine whether a given schedule is
serial, serializable, conflict serializable, and/or view serializable.
Differentiate among these terms about concurrency and scheduling transactions: serial, serializable, conflict serializable, conflict equivalent, view serializable, and view equivalent. Identify which of these terms implies other terms in this list.
Explain the purposes and principles of lock management.
Construct a waits-for dependency graph to detect deadlock in a
schedule.
Apply the wait-die and wound-wait protocols to a schedule to prevent deadlock.
2
Learning Goals (cont.)
Argue for or against: Deadlock prevention is preferable to deadlock detection (and rolling back a contending transaction) in a heavily-used database system.
Explain how phantom records could occur in a system that follows strict two-phase locking.
Explain how index and predicate locking can help avoid phantom records.
Show how a tree-locking algorithm can achieve good concurrency.
3
Schedules (and Some Review)
Scheduler: determines the timing of read/write requests among transactions
– Either permits the reads and writes immediately, or delays them
Schedule = sequence of time-ordered events involving actions like read/write, lock/unlock
Serial schedule = all the actions of one transaction precede all the actions of another transaction
– e.g., (T1 → T3 → T2 → T4) or just (T1, T3, T2, T4)
4
Schedules (cont.)
Every serial schedule guarantees consistency!
– We’re interested in schedules that produce the same
result as if the transactions were executed serially.
A schedule is serializable if its effect on the DB state is the same as that of some (any) serial schedule (i.e., it preserves DB consistency).
– Serializable does not mean it is a serial schedule.
5
Serializability and Conflicts
A conflict occurs when two transactions operate on the same DB object (e.g., row, page, table), and at least one of those operations is a write operation.
We want to make sure that the ordering of conflicting operations is correct, in order to maintain consistency of the results.
Most schedulers use a slightly stronger test for serializability than necessary, called conflict serializability (see the next slide).
2PL ensures serializability.
6
Conflict Serializable Schedules
Two schedules are conflict equivalent if: – They involve the same actions of the same
transactions.
– Every pair of conflicting actions is ordered the same
way.
Schedule S is conflict serializable if S is conflict equivalent to some (any) serial schedule.
A conflict-serializable schedule can be converted into a serial schedule by swaps of non-conflicting adjacent actions …
7
Converting a CS Schedule to Serial
T1: R(D) R(A) W(A) R(B) W(B) W(D) T2: R(A) W(A)
R(B) W(B)
Using a series of non-conflicting swaps of actions by the scheduler, prove that the above conflict serializable schedule can be turned into a serial schedule:
T1: R(D) R(A) W(A) T2:
R(B) W(B) W(D) R(A) W(A)
R(B) W(B) R(B) W(B) R(B) W(B)
T1: R(D) R(A) W(A) T2:
R(B) W(B) W(D) R(A) W(A)
T1: R(D) R(A) W(A) T2:
R(B) W(B) W(D) R(A) W(A)
T1: R(D) R(A) W(A) T2:
R(B) W(B)
R(A) W(A)
W(D)
R(B) W(B)
T1: R(D) R(A) W(A) T2:
R(B) W(B) W(D) R(A)
R(B) W(B) R(B) W(B)
T1: R(D) R(A) W(A) T2:
R(B)
W(B) W(D)
W(A) R(A) W(A)
8
A schedule that is conflict serializable:
A schedule that is serializable, but not conflict serializable:
S1
T1 T2
S2
T1 T2 T3
R(A) W(A)
R(D) W(D)
R(B) W(B)
R(B) W(B) R(C) W(C)
W(D)
W(D)
Examples
Every conflict serializable schedule is serializable. The reverse is not true.
9
Precedence Graph for Concurrent Transactions
A precedence graph is sometimes called a dependency graph or a serializability graph:
– There is one node per transaction.
– We create an edge from Ti to Tj if an action of Ti precedes and conflicts with Tj
e.g., Tj reads/writes an object last written by Ti e.g., Tj writes an object last read by Ti
Theorem: A schedule is conflict serializable if and only if its dependency graph is acyclic.
10
T1
A B
R(A), W(A), R(B), W(B)
T2 Precedence graph
Example
Here is a schedule that is not conflict serializable:
T1: R(A), W(A), T2:
R(B), W(B)
The cycle in the graph reveals the problem: the output of T1 depends on T2, and vice-versa.
11
R(C) W(C) …
…
– The schedule is not conflict serializable.
Schedule
Precedence Graph
T1 T2 T3
R(B) W(B)
T3
Another Example
R(A) T1 W(A)
T2
W(A) R(A)
W(A)
W(C)
The cycle in the graph reveals the problem. The output of T1 depends on T3, and T3’s depends on T1’s output
12
T1: T2: T3:
R(A)
W(A)
T1: R(A),W(A)
– – –
If Ti reads initial value of A in S1, then Ti must read the same value of A in S2.
View Serializability
Schedules S1 and S2 are view equivalent if:
If Ti reads value of A written by Tj in S1, then Ti also reads value of A written by Tj in S2.
If Ti writes final value of A in S1, then Ti also writes final value of A in S2.
W(A)
T2: W(A) W(A) T3:
W(A)
13
View Serializability (cont.)
A schedule is view serializable if it is view equivalent to a serial schedule.
If a schedule is conflict serializable, it is also view serializable.
– The reverse is not always true.
It is very hard to test for view serializability; therefore, such a test is not used in practice.
There is no locking algorithm that can enforce view serializability.
14
Strict Two-Phase Locking Protocol (Strict 2PL)
Strict 2PL features:
– Each transaction must obtain a shared (S) or exclusive
T1 T2
(X) lock on the object before reading it, and an X lock
R(A) W(A)
on the object before writing it.
– All locks held by a transaction are released when the
R(A) W(A) Commit
transaction completes (commits or aborts).
Strict 2PL allows only conflict serializable and recoverable schedules—and avoids cascading aborts.
R(B) W(B) Commit
– The precedence graph is acyclic.
– If T1 gets a lock for a (conflicting) item A before T2, T1 precedes T2 in the graph
– Transactions are ordered by the order they commit or abort.
Example: The schedule on the right is conflict serializable, but is not allowed by strict 2PL.
– T1 gets an X lock on object A.
– T2 cannot get the X lock on A before T1 ends.
15
Two-Phase Locking (2PL)
Strict 2PL is too strict. We can relax last the condition. Non-strict 2PL:
– Each transaction must obtain an S (or an X) lock on the object before reading it, and an X lock on the object before writing it.
– A transaction can release locks at any time but it cannot request additional locks once it releases any lock.
A transaction has two phases:
1. Growing: acquires locks
2. Shrinking: releases locks
2PL produces conflict serializable schedules.
– The precedence graph is acyclic.
– Transactions are ordered by the order they enter their shrinking phase.
16
Two-Phase Locking (2PL) (cont.)
Unlike non-strict 2PL, strict 2PL schedules: – Are recoverable
e.g., T2 commits only after all transactions whose changes T2 reads (e.g., updates from T1 and T3) commit.
– Avoid cascading aborts
A cascading abort occurs if T2 reads a value written by T1, and then T1 aborts (rather than commits).
17
Lock Management
Lock and unlock requests are handled by the lock manager, which typically maintains two tables:
1. Lock Table: a hash table with one entry per data object
2. Transaction Table: a list of locks held by each transaction
What’s in a lock table entry?
– Number of transactions currently holding a lock
– Type of lock held (shared or exclusive)
– Pointer to a queue of lock requests
Locking and unlocking have to be atomic operations.
Lock upgrade: a transaction that holds a shared lock can be upgraded to hold an exclusive lock.
– We can expand 2PL to include upgrades or downgrades during the growing phase.
18
2PL can produce deadlocks.
R(B) W(B) R(A) …
In the schedule on the right:
– T1 and T2 are deadlocked.
– T1hasanXlockonAandwaitsforanXlockonB.
– T2hasanXlockonBandwaitsforanXlockonA.
– Neither can release locks before they finish.
R(B) …
Two ways of dealing with deadlocks:
– Deadlock detection and recovery
– Deadlock prevention
Deadlocks
When a transaction T waits for a lock, we say that T is blocked.
T1 T2
A deadlock (a permanent block) occurs when 2 or more transactions are blocked, waiting for each other to release the locks.
R(A) W(A)
19
Deadlock Detection
We need to create a waits-for graph:
– Nodes are transactions.
– There is an edge from Ti to Tj if Ti is waiting for Tj to release a lock.
The lock manager:
– Adds edges when it queues a lock request
– Removes edges when it grants a lock request
Periodically, we check for cycles in the waits-for graph.
A deadlock is present iff the waits-for graph has a cycle.
To resolve the deadlock, a transaction in the cycle is chosen and aborted. Choices could be based on:
– Number of locks held
– Work done or remaining work
– Times aborted, so far
– Number of log records written (e.g., DB2)
Alternative deadlock detection and handling method using timeouts: abort a transaction if it is waiting too long for locks.
20
W(B) S(B)
Deadlock Detection Example
This example shows the change in the waits-for graph when T3 requests X(A):
T1 T2 T3 T4
S(A) R(A)
T1 T2 T4 T3
X(B)
R(C) X(C)
S(C)
X(B) X(A)
21
Deadlock Prevention
Assign priorities based on timestamps.
– If Ti is older than Tj, Ti has a higher priority than Tj.
Assume Ti wants a lock for an object that Tj holds. Two policies are possible:
1. Wait-Die: If Ti has higher priority, Ti waits for Tj; otherwise, Ti aborts.
2. Wound-Wait: If Ti has higher priority, Tj aborts; otherwise, Ti waits.
Both policies avoid deadlock:
– In wait-die, lower priority transactions never wait; they must restart.
– In wound-wait, higher priority transactions never wait; they cause an existing transaction to restart.
If a transaction re-starts, we use its original timestamp upon re-starting.
Conservative 2PL: A transaction must acquire all the locks it needs
(or may need) before it starts:
– Prevents deadlock, but reduces performance; so, it’s not used in practice
22
Performance and Locking
Throughput = # of transactions processed per unit time
T h r o u g h p u t
thrashing
As more transactions are blocked, throughput decreases.
There are 3 ways to increase throughput:
– Make locks apply to smaller objects.
– Reduce the time a transaction holds a lock.
# of transactions
– Reduce the # of items that are most frequently accessed (hot spots).
23
Dynamic Databases
If we relax the assumption that the DB is a fixed collection of objects, even Strict 2PL will not assure serializability. For example, suppose:
– T1 locks all pages containing sailor records with rating = 1, and finds the oldest sailor among them (say, age = 71).
– Next, T2 inserts a new sailor with rating = 1 and age = 96.
– T2 then deletes the oldest sailor with rating = 2 (maybe having
age = 80), and commits.
– T1 now locks all pages containing sailor records with rating = 2,
and finds the oldest sailor among them with, say, age = 63.
Thus, new phantom records are introduced, and there is no consistent DB state where T1 is “correct”:
– If the order is T1-T2, then T1 should result in: 71 and 80. – If the order is T2-T1, then T1 should result in: 96 and 63.
24
clause condition)
The Problem
T1 implicitly assumes that it has locked the set of all sailor records with rating = 1.
– The assumption only holds if no sailor records are added while T1 is executing.
– We need some mechanism to enforce this assumption. We can use:
index locking (i.e., lock the index page for rating = 1), or predicate locking (i.e., allow locking based on a WHERE-
This example shows that conflict serializability
guarantees serializability only if the set of objects is fixed!
– Note that we can take locks at the table level instead, but then we have low concurrency.
25
Index Locking
Index
If there is a dense index (which we’ve been assuming in this course) on the rating field, T1 should lock the index page containing the data entries with rating = 1.
– If there are no records with rating = 1, T1 must lock the index page where such a data entry would be, if it existed!
If there is no suitable index, T1 must lock all pages (i.e., lock the file/table to prevent new pages from being added, to ensure that no new records with rating = 1 are added).
r=1
Data
26
Predicate Locking
Here we grant a lock on all records that satisfy some logical predicate, e.g., age > 2*salary
Index locking is a special case of predicate locking for which an index supports efficient implementation of the predicate lock.
– What is the predicate in the sailor example?
In general, predicate locking has a lot of locking overhead.
27
Locking in B+ Trees
How can we efficiently lock a particular leaf node?
One solution: Ignore the tree structure, and just lock pages while traversing the tree, following 2PL rules.
– But, this has terrible performance.
– The root node (and other high-level nodes in the index) can become bottlenecks because every tree access begins at the root!
28
Two Useful Observations
Higher levels of the tree only direct the searches for leaf pages.
For insertions, a node on the path from the root to the modified leaf must be locked (in X mode), but only if a split can propagate up to it from the modified leaf. (A similar point holds with respect to deletions.)
We can exploit these observations to design efficient locking protocols that guarantee serializability even though they violate 2PL.
29
A Simple Tree-Locking Algorithm
Search: Start at root and go down; repeatedly, S lock child then unlock parent
Insert/Delete: Start at root and go down, obtaining X locks as needed; once child is locked, check if it is safe:
– If child is safe, release all locks on ancestors
Safe node = A node such that changes will not propagate up beyond this node
– For insertions, this means the node is not full.
– For deletions, this means the node is not half-empty.
Side note: Some improvements on this simple scheme exist (e.g., IX locks), but the above is a good intro.
30
Example
20
ROOT
A
Do:
1) Search 38* 2) Delete 38* 3) Insert 45* 4) Insert 25*
35
B
23 F3844C
GHIDE
20* 22* 23* 24* 35* 36* 38* 41* 44*
31
Transaction Support in SQL-92
Each transaction has an access mode (read only, or R/W), error diagnostics size, isolation level (see table below), etc.
– You can specify these with an SQL statement or as a parameter when binding a plan.
Isolation Level
Dirty Repeatable Read? Read?
Phantom Problem
Read Uncommitted Read Committed Repeatable Reads Serializable
Maybe Maybe No Maybe No Yes No Y es
Maybe Maybe Maybe No
32
Conflicts are bound to occur.
A serializable schedule is good.
Summary
Concurrency is highly desirable in a DBMS.
Conflicts between transactions can be detected in the precedence (dependency) graph.
There are two lock-based concurrency control schemes: Strict 2PL and 2PL.
DB2 and SQL Server use Strict 2PL or variants.
The lock manager keeps track of the locks issued.
Deadlocks can either be prevented or detected.
33
– Wound-wait
Summary (cont.)
Two deadlock prevention protocols are: – Wait-die
Naïve locking strategies may have the phantom problem.
Index locking is common, and affects performance significantly .
– Needed when accessing records via index
– Needed for locking logical sets of records (index locking/predicate locking)
For tree-structured indexes:
– A straightforward use of 2PL is very inefficient.
– Several variations in tree-locking algorithms illustrate the potential for improvement.
34