T ransactions, Recovery
and Concurrency (I)
3/30/20 1
Air-line Reservation
• 10 available seats vs 15 travel agents.
• How do you design a robust and fair reservation
system?
– Insufficient resources
– Fair policy to every body
– Robustness
3/30/20 2
Failures
Number of factors might cause failures in user requirements
processing.
1. System failure:
§ Disk failure – e.g. head crash, media fault.
§ System crash – unexpected failure requiring a reboot.
2. Program error – e.g. a divide by zero.
3. Exception conditions – e.g. no seats for your reservation.
4. Concurrency control – e.g. deadlock, expired locks.
3/30/20 3
To handle failures correctly and efficiently
Each database user must express his requirements as a
set of program units.
Each program unit is a transaction that either
– accesses the contents of the database, or
– changes the state of the database, from one
consistent state to another.
Example transaction: buy a ticket from Sydney to
N.Y. by JAL.
A transaction must be treated as an atomic unit.
3/30/20 4
§ Sydney à Tokyo à LA
à N.Y
§ It does not make sense
only partial trip has tickets
Transaction Processing
Three kinds of operations may be used in a transaction:
– Read.
– Write.
– Computation.
3/30/20 5
Buffer Management in a DBMS
3/30/20 6
Read
1. Compute the data block that contains the item to be read
2. Either
– find a buffer containing the block, or
– read from disk into a buffer
3. Copy the value from the buffer.
3/30/20 7
Write
1. Compute the disk block containing the item to be written,
2. Either
– find a buffer containing the block, or
– read from disk into a buffer,
3. Copy the new value into the buffer,
4. At some point (maybe later), write the buffer back to disk.
3/30/20 8
Processing States of a Transaction
• The typical processing states are illustrated in the figure below (E/N Fig
17.4):
• Partially committed point: At this point, check and enforce the correctness of the
concurrent execution.
• Committed state: Once a transaction enters the committed state, it has concluded its
execution successfully.
3/30/20 9
Desirable Properties of Transaction
Processing ACID
• Atomicity: A transaction is either performed in its entirety or not
performed at all.
• Consistency preservation: A correct execution of the transaction
must take the database from one consistent state to another.
• Isolation: A transaction should not make its updates visible to other
transactions until it is committed.
• Durability or permanency: Once a transaction changes the database
and the changes are committed, these changes must never be lost
because of subsequent failure.
3/30/20 10
Problems without Enforcing ACID
§ For a banking system,
§ If durability is not enforced, then a customer may
lose a deposit.
§ If consistency preservation is not enforced, then
the bank runs a high risk of bankrupt. E.g., run-
over upper-limit.
§ Below are the problems if atomicity and isolation are
not enforced in a concurrent execution of
transactions.
3/30/20 11
Lost Update Problem (Isolation is not enforced)
• Suppose we have these two transactions, T1 and T2:
3/30/20 12
T1 :
read(X)
X ← X + N
wr i te (X )
read(Y)
Y ← Y N
w r i t e ( Y )
T2 :
read(X)
X ← X + M
w r i t e ( X )
• Let us see what may happen if T1 and T2 are executed concurrently
in an uncontrolled way:
Suppose initially that X = 100; Y = 50;N = 5 and M = 8.
3/30/20 13
• At the end of T1 and T2, X should be 113, Y should be 45.
• The update X ← X + N has been lost.
3/30/20 14
Incorrect Summary Problem (Isolation Issue)
§ Here the sum calculated by T3 will be wrong by N.
3/30/20 15
The Temporary Update Problem
Several possibilities for what might happen next:
3/30/20 16
Recover from the disk
3/30/20 17
Database T1 T2
X = 105, Y = 50 X = 113
X = 113Case 1: DBMS undoes T1
Database T1 T2
X = 105, Y = 50 X = 113
X = 113Case 2: DBMS does nothing to T1
Database T1 T2
X = 105, Y = 50 X = 113
X= 105, Y = 50 X = 113
Case 3: DBMS undoes T1
X=113, Y=50
Write (X)
X= 113
X=100, Y=50
X=105, Y=50
X=113, Y=50
Write (X)
X= 113
X=100, Y=50
Write (X), X= 113
X= 100
§ Case 1&2, only
half of T1 has
been executed.
§ Case 3, T1 & T2
have been lost.
Recover from Failures
Ensure ACID
Log-based Recovery
– Undo logging
– Redo logging
– Undo/Redo logging
3/30/20 18
System Log
• System Log
– The system needs to record the states information
to recover failures correctly.
– The information is maintained in a log (also called
journal or audit trail).
– The system log is kept in hard disk but maintains
its current contents in main memory.
3/30/20 19
System Log
• Start transaction marker [start transaction, T]: Records that transaction T
has started execution.
• [read item, T, X]: Records that transaction T has read the value of database
item X.
• [write item, T, X, old value, new value]: Records that T has changed the
value of database item X from old value to new value.
• Commit transaction marker [commit, T]: Records that transaction T has
completed successfully, and arms that its effect can be committed (recorded
permanently) to the database.
• [abort, T]: Records that transaction T has been aborted.
3/30/20 20
System Log (Cont’d)
• In fact some other entries (rollback, undo, redo) are also required for
a recovery method.
• These entries allow the recovery manager to rollback an
unsuccessful transaction (undo any partial updates).
3/30/20 21
Recovery
• Let us see how the log might be used to recover from a system
crash.
• The diagram below shows transactions between the last system
backup and a crash.
3/30/20 22
Recovery (Cont’d)
3/30/20 23
• The database on disk will be in a state somewhere
between that at t0 and the state at tx.
• The same is also true for log entries.
Recovery (Cont’d)
• We will assume that the write-ahead log strategy is used. This
means that
– old data values must be force-written to the log (i.e. the buffer must be copied
to disk) before any change can be made to the database, and
– the transaction is regarded as committed when the new data values and the
commit marker have been force-written to the log.
• Thus the log is force-written at least at t1, t2 and t3 in the above.
3/30/20 24
• Suppose the log was last written to disk at t4.
• By examining the log:
1. We know that T0, T1 and T2 have committed and their effects should be
reflected in the database after recovery.
2. But we do not know whether the effects of T0, T1 and T2 were reflected at the
time of the crash.
3. We also know that T3 has started, may have modified some data, but is not
committed. Thus T3 should be undone.
3/30/20 25
• The database can be recovered by rolling back T3 using the old data
values from the log, and redoing the changes made by T0 … T2 using
the new data values (for these committed transactions) from the log.
• Notice that instead of rolling back, the database could have been
restored from the backup. This might be necessary in the event of a
disk crash for example (for this reason, the log should be stored on
an independent disk pack).
3/30/20 26
Checkpoints
• Notice also that using this system, the longer the time between
crashes, the longer recovery may take.
• To avoid this problem, the system may take checkpoints at regular
intervals.
• To do this:
– a start of checkpoint marker is written to the log, then
– the database updates in buffers are force-written, then
– an end of checkpoint marker is written to the log.
3/30/20 27
In our example, suppose a checkpoint is taken at time tc. Then on
recovery we only need redo T2.
3/30/20 28
Schedules of Transactions
§ To fully utilise resources, desirable to interleave the operations of
transactions in an appropriate way.
§ For example, if one transaction is waiting for I/O to complete,
another transaction can use the CPU.
§ A schedule S of the transactions T1,…, Tn
– is a sequential ordering of the operations of T1,…, Tn, and
– preserves the ordering of operations in each transaction Ti.
3/30/20 29
3/30/20 30
cannot swap read (X) and write (X)
3/30/20 31
Incorrect correct
§ As we have seen, if operations are interleaved arbitrarily, incorrect results
may occur.
§ However, it is reasonable to assume that schedules (a) and (b) in the figure
will give correct results (as long as the transactions are independent).
§ (a) and (b) are called serial schedules, and we will assume that any serial
schedule is correct.
§ Notice that schedule (d) always produces the same result as schedules (a)
and (b), so it should also give correct results.
§ A schedule is serializable if it always produces the same result as some
serial schedule. (see E/N 17.5.1 for a formal definition).
§ Notice that schedule (c) is not serializable.
3/30/20 32
Scheduling Transactions
• Schedule and Complete Schedule?
• Serial schedule: Schedule that does not interleave the actions of different
transactions.
• Equivalent schedules: For any database state, the effect (on the set of
objects in the database) of executing the first schedule is identical to the
effect of executing the second schedule.
• Serializable schedule: A schedule over a set S of transactions is equivalent
to some serial execution of the set of committed transactions in S.
(Note: If each transaction preserves consistency, every serializable schedule
preserves consistency. )
3/30/20 33
3/30/20 34
Serial Schedule
3/30/20 35
Non-serializable Serializable
Scheduling Transactions (Cont.)
§ Recoverable schedule (RS): Transactions commit only after (and if) all
transactions whose changes they read commit.
EX1: T1.R(X), T1.W(X), T2. R(X), T2.W(X), COMMIT.T2.
EX1 is not recoverable.
EX2: T1.R(X), T1.W(X), T2. R(X), T2.W(X), COMMIT.T1. Recoverable!
§ Avoid cascading aborts (ACA): Transactions read only the changes of
committed transactions.
EX3: T1.R(X), T1.W(X), T2. R(X), T2.W(X)… EX3 is not ACA.
EX4: T1.R(X), T1.W(X), COMMIT.T1,T2. R(X), T2.W(X)… ACA!
§ Strict schedules (SS): A value written by a transaction is not read or
overwritten by other transactions until T either aborts or commits.
EX5: T1.R(X), T1.W(X), T2.W(X)… EX5 is RS and ACA but not SS.
EX6: T1.R(X), T1.W(X), COMMIT.T1,T2.W(X)… EX6 is SS.
Note: SS is ACA and ACA is RS but not vice versa.
3/30/20 36
• When there are only two transactions, there are only two serial
schedules – for n transactions there will be n!.
• Fortunately there is an efficient algorithm to check whether a
schedule is serializable without checking all these possibilities.
3/30/20 37
Check Serializability
Conflict Serializable Schedules
• Two schedules are conflict equivalent if:
– Involve the same actions of the same transactions
– Every pair of conflicting actions is ordered the same way
• Schedule S is conflict serializable if S is conflict equivalent to some serial
schedule
3/30/20 38
View Serializability
• Schedules S1 and S2 are view equivalent if:
– If Ti reads initial value of A in S1, then Ti also reads initial value of A in S2
– If Ti reads value of A written by Tj in S1, then Ti also reads value of A written
by Tj in S2
– If Ti writes final value of A in S1, then Ti also writes final value of A in S2
• A schedule is view serializable if view equivalent to a serial
schedule.
3/30/20 39
Properties of Serizability
§ View Serializability does not have monotonic
property; that is, a schedule is view serializable but its
sub-schedule may not necessarily view serializable.
Example:
3/30/20 40
T1: R (A) W(A)
T2: W (A)
T3: W(A)
T1: R (A) W(A)
T2: W (A)
Yes! No!
§ If no blind writes, conflict serializability is equivalent
to view serializability .
Check Conflict Serializability
• Algorithm
Step 1: Construct a schedule (or precedence) graph – a directed
graph.
Step 2: Check if the graph is cyclic:
• Cyclic: non-serializable.
• Acyclic: serializable.
3/30/20 41
• A directed graph G = (V, A) consists of
– a vertex set V, and
– an arc set A such that each arc connects two vertices.
• G is cyclic if G contains a directed cycle.
3/30/20 42
Cyclic Graph
Construct a Schedule Graph GS = (V, A) for a
schedule S
1. A vertex in V represents a transaction.
2. For two vertices Ti and Tj, an arc Ti→Tj is added to A if
– there are two conflicting operations O1 ∈Ti and O2 ∈Tj,
– in S, O1 is before O2.
Two operations O1 and O2 are conflicting if
– they are in different transactions but on the same data item,
– one of them must be a write.
3/30/20 43
3/30/20 44
3/30/20 45
3/30/20 46
3/30/20 47
• Unfortunately, testing for serializability on the fly is not
practical.
• Instead, a number of protocols have been developed which
ensure that if every transaction obeys the rules, then every
schedule will be serializable, and thus correct.
§ SS is serializable?
Ø irrelevant!
Example:
T1.R(X), T2.R (X), T1.W(X), COMMIT.T1,
T2.W(X), COMMIT.T2
3/30/20 48