3/30/20
1
Transactions, Recovery and Concurrency (I)
Air-line Reservation
• 10 available seats vs 15 travel agents.
• How do you design a robust and fair reservation system?
– Insufficient resources
– Fair policy to every body – Robustness
3/30/20
2
Failures
Number of factors might cause failures in user requirements processing.
3/30/20
3
1.
2. 3. 4.
§ §
System failure:
Disk failure – e.g. head crash, media fault.
System crash – unexpected failure requiring a reboot.
Program error – e.g. a divide by zero.
Exception conditions – e.g. no seats for your reservation. Concurrency control – e.g. deadlock, expired locks.
To handle failures correctly and efficiently
Each database user must express his requirements as a set of program units.
Each program unit is a transaction that either
§ SydneyàTokyoàLA à N .Y
§ It does not make sense only partial trip has tickets
– –
accesses the contents of the database, or
changes the state of the database, from one consistent state to another.
Example transaction: buy a ticket from Sydney to N.Y. by JAL.
A transaction must be treated as an atomic unit.
3/30/20 4
Transaction Processing Three kinds of operations may be used in a transaction:
– Read.
– Write.
– Computation.
3/30/20
5
Buffer Management in a DBMS
3/30/20 6
Read
1. Compute the data block that contains the item to be read
2. Either
– find a buffer containing the block, or – read from disk into a buffer
3. Copy the value from the buffer.
3/30/20
7
Write
1. Compute the disk block containing the item to be written,
2. Either
– find a buffer containing the block, or – read from disk into a buffer,
3. 4.
Copy the new value into the buffer,
At some point (maybe later), write the buffer back to disk.
3/30/20
8
Processing States of a Transaction
• The typical processing states are illustrated in the figure below (E/N Fig 17.4):
• Partially committed point: At this point, check and enforce the correctness of the concurrent execution.
• Committed state: Once a transaction enters the committed state, it has concluded its execution successfully.
3/30/20 9
Desirable Properties of Transaction Processing ACID
• Atomicity: A transaction is either performed in its entirety or not performed at all.
• Consistency preservation: A correct execution of the transaction must take the database from one consistent state to another.
• Isolation: A transaction should not make its updates visible to other transactions until it is committed.
• Durability or permanency: Once a transaction changes the database and the changes are committed, these changes must never be lost because of subsequent failure.
3/30/20 10
Problems without Enforcing ACID § For a banking system,
§ If durability is not enforced, then a customer may lose a deposit.
§ If consistency preservation is not enforced, then the bank runs a high risk of bankrupt. E.g., run- over upper-limit.
§ Below are the problems if atomicity and isolation are not enforced in a concurrent execution of transactions.
3/30/20 11
•
Let us see what may happen if T1 and T2 are executed concurrently in an uncontrolled way:
Lost Update Problem (Isolation is not enforced)
•
Suppose we have these two transactions, T1 and T2: T1:
read(X) X←X+N
write(X) read(Y) Y←YN
write(Y)
T2: read(X)
X←X+M write(X)
3/30/20 12
Suppose initially that X = 100; Y = 50;N = 5 and M = 8.
3/30/20 13
• At the end of T1 and T2, X should be 113, Y should be 45.
• The update X ← X + N
has been lost.
3/30/20
14
Incorrect Summary Problem (Isolation Issue)
§ Here the sum calculated by T3 will be wrong by N.
3/30/20 15
The Temporary Update Problem
Several possibilities for what might happen next:
3/30/20 16
Recover from the disk
Database
X=105, Y=50
X=100, Y=50 Case 1: DBMS undoes T1
X=113, Y=50
Database
X=105, Y=50
X=105, Y=50
Case 2: DBMS does nothing to T1
X=113, Y=50
X = 113
X = 113
Write (X) X= 113
§ §
Case 1&2, only half of T1 has been executed.
Case 3, T1 & T2 have been lost.
Database
X=105, Y=50
X= 105, Y = 50
T1
T1
Write (X) X= 113
T2
T2
X = 113
X = 113
3/30/20
17
X=100, Y=50
Case 3: DBMS undoes T1
T1
T2
X = 113
X = 113
Write (X), X= 113 X= 100
Recover from Failures Ensure ACID
Log-based Recovery – Undo logging
– Redo logging
– Undo/Redo logging
3/30/20 18
System Log • System Log
– The system needs to record the states information to recover failures correctly.
– The information is maintained in a log (also called journal or audit trail).
– The system log is kept in hard disk but maintains its current contents in main memory.
3/30/20
19
System Log
• Start transaction marker [start transaction, T]: Records that transaction T has started execution.
• [read item, T, X]: Records that transaction T has read the value of database item X.
• [write item, T, X, old value, new value]: Records that T has changed the value of database item X from old value to new value.
• Commit transaction marker [commit, T]: Records that transaction T has completed successfully, and arms that its effect can be committed (recorded permanently) to the database.
• [abort, T]: Records that transaction T has been aborted.
3/30/20 20
System Log (Cont’d)
• In fact some other entries (rollback, undo, redo) are also required for a recovery method.
• These entries allow the recovery manager to rollback an unsuccessful transaction (undo any partial updates).
3/30/20 21
Recovery
• Let us see how the log might be used to recover from a system crash.
• The diagram below shows transactions between the last system backup and a crash.
3/30/20 22
Recovery (Cont’d)
• The database on disk will be in a state somewhere between that at t0 and the state at tx.
• The same is also true for log entries.
3/30/20 23
Recovery (Cont’d)
• We will assume that the write-ahead log strategy is used. This means that
– old data values must be force-written to the log (i.e. the buffer must be copied to disk) before any change can be made to the database, and
– the transaction is regarded as committed when the new data values and the commit marker have been force-written to the log.
• Thus the log is force-written at least at t1, t2 and t3 in the above.
3/30/20 24
• Suppose the log was last written to disk at t4.
• By examining the log:
3/30/20
25
1. 2. 3.
We know that T0, T1 and T2 have committed and their effects should be reflected in the database after recovery.
But we do not know whether the effects of T0, T1 and T2 were reflected at the time of the crash.
We also know that T3 has started, may have modified some data, but is not committed. Thus T3 should be undone.
• The database can be recovered by rolling back T3 using the old data values from the log, and redoing the changes made by T0 … T2 using the new data values (for these committed transactions) from the log.
• Notice that instead of rolling back, the database could have been restored from the backup. This might be necessary in the event of a disk crash for example (for this reason, the log should be stored on an independent disk pack).
3/30/20 26
Checkpoints
• Notice also that using this system, the longer the time between crashes, the longer recovery may take.
• To avoid this problem, the system may take checkpoints at regular intervals.
• To do this:
3/30/20
27
– – –
a start of checkpoint marker is written to the log, then the database updates in buffers are force-written, then an end of checkpoint marker is written to the log.
In our example, suppose a checkpoint is taken at time tc. Then on recovery we only need redo T2.
3/30/20 28
Schedules of Transactions
§ To fully utilise resources, desirable to interleave the operations of transactions in an appropriate way.
§ For example, if one transaction is waiting for I/O to complete, another transaction can use the CPU.
§ A schedule S of the transactions T1,…, Tn
– is a sequential ordering of the operations of T1,…, Tn, and – preserves the ordering of operations in each transaction Ti.
3/30/20
29
cannot swap read (X) and write (X)
3/30/20 30
3/30/20
31
Incorrect
correct
§ As we have seen, if operations are interleaved arbitrarily, incorrect results may occur.
§ However, it is reasonable to assume that schedules (a) and (b) in the figure will give correct results (as long as the transactions are independent).
§ (a) and (b) are called serial schedules, and we will assume that any serial schedule is correct.
§ Notice that schedule (d) always produces the same result as schedules (a) and (b), so it should also give correct results.
§ A schedule is serializable if it always produces the same result as some serial schedule. (see E/N 17.5.1 for a formal definition).
§ Notice that schedule (c) is not serializable.
3/30/20 32
Scheduling Transactions
• Schedule and Complete Schedule?
• Serial schedule: Schedule that does not interleave the actions of different transactions.
• Equivalent schedules: For any database state, the effect (on the set of objects in the database) of executing the first schedule is identical to the effect of executing the second schedule.
• Serializable schedule: A schedule over a set S of transactions is equivalent to some serial execution of the set of committed transactions in S.
(Note: If each transaction preserves consistency, every serializable schedule preserves consistency. )
3/30/20 33
Serial Schedule
3/30/20 34
3/30/20
35
Non-serializable
Serializable
Scheduling Transactions (Cont.)
§ Recoverable schedule (RS): Transactions commit only after (and if) all transactions whose changes they read commit.
EX1: T1.R(X), T1.W(X), T2. R(X), T2.W(X), COMMIT.T2.
EX1 is not recoverable.
EX2: T1.R(X), T1.W(X), T2. R(X), T2.W(X), COMMIT.T1. Recoverable!
§ Avoid cascading aborts (ACA): Transactions read only the changes of committed transactions.
EX3: T1.R(X), T1.W(X), T2. R(X), T2.W(X)… EX3 is not ACA.
EX4: T1.R(X), T1.W(X), COMMIT.T1,T2. R(X), T2.W(X)… ACA!
§ Strict schedules (SS): A value written by a transaction is not read or overwritten by other transactions until T either aborts or commits.
EX5: T1.R(X), T1.W(X), T2.W(X)… EX5 is RS and ACA but not SS.
EX6: T1.R(X), T1.W(X), COMMIT.T1,T2.W(X)… EX6 is SS.
Note: SS is ACA and ACA is RS but not vice versa.
3/30/20 36
Check Serializability
• When there are only two transactions, there are only two serial schedules – for n transactions there will be n!.
• Fortunately there is an efficient algorithm to check whether a schedule is serializable without checking all these possibilities.
3/30/20 37
• •
Conflict Serializable Schedules
Two schedules are conflict equivalent if:
– Involve the same actions of the same transactions
– Every pair of conflicting actions is ordered the same way
Schedule S is conflict serializable if S is conflict equivalent to some serial schedule
3/30/20
38
View Serializability
• Schedules S1 and S2 are view equivalent if:
– If Ti reads initial value of A in S1, then Ti also reads initial value of A in S2
– If Ti reads value of A written by Tj in S1, then Ti also reads value of A written
by Tj in S2
– If Ti writes final value of A in S1, then Ti also writes final value of A in S2
• A schedule is view serializable if view equivalent to a serial schedule.
3/30/20 39
Properties of Serizability
§ View Serializability does not have monotonic property; that is, a schedule is view serializable but its sub-schedule may not necessarily view serializable.
Example:
Yes!
No!
T1: R (A) W(A) T2: W (A)
T1: R (A) T2:
T3:
W(A) W (A)
W(A)
§ If no blind writes, conflict serializability is equivalent to view serializability .
3/30/20 40
Check Conflict Serializability
• Algorithm
Step 1: Construct a schedule (or precedence) graph – a directed
graph.
Step 2: Check if the graph is cyclic:
3/30/20
41
• •
Cyclic: non-serializable. Acyclic: serializable.
• A directed graph G = (V, A) consists of
– a vertex set V, and
– an arc set A such that each arc connects two vertices.
• G is cyclic if G contains a directed cycle.
3/30/20 Cyclic Graph 42
Construct a Schedule Graph GS = (V, A) for a schedule S
1. A vertex in V represents a transaction.
2. For two vertices Ti and Tj, an arc Ti →Tj is added to A if
– there are two conflicting operations O1 ∈Ti and O2 ∈Tj,
– in S, O1 is before O2.
Two operations O1 and O2 are conflicting if
3/30/20
43
– –
they are in different transactions but on the same data item, one of them must be a write.
3/30/20 44
3/30/20 45
3/30/20 46
• Unfortunately, testing for serializability on the fly is not practical.
• Instead, a number of protocols have been developed which
ensure that if every transaction obeys the rules, then every
schedule will be serializable, and thus correct.
3/30/20 47
§ SS is serializable? Ø irrelevant!
Example:
T1.R(X), T2.R (X), T1.W(X), COMMIT.T1, T2.W(X), COMMIT.T2
3/30/20 48