CS3402
1
CS3402 Chapter 10: Transactions
Semester B, 2020/2021
Single-User versus Multiuser Systems
A DBMS is single-user if at most one user at a time can use the system, and it is multiuser if many users can use the system—and hence access the database—concurrently.
Single-user DBMSs are mostly restricted to personal computer systems; most other DBMSs are multiuser, for example, an airline reservations system is used by hundreds of users and travel agents concurrently.
In multiuser systems, hundreds or thousands of users are typically operating on the database by submitting transactions concurrently to the system. So we need transaction concurrency control to make the database system consistent.
CS3402 2
What is a Transaction?
A transaction is an executing program that forms a logical unit of database processing.
E.g. use of ATM and buy online tickets
A transaction includes one or more database access operations— these can include insertion, deletion, modification (update), or retrieval operations.
To specify the transaction boundaries, we use begin transaction and end transaction statements in an application program; in this case, all database access operations between the two are considered as forming one transaction.
CS3402 3
What is an Operation?
Structurally, each transaction is a process and consists of atomic steps. Each atomic step is called an operation.
A transaction = database operations + transaction operations
Database operations: read and write operations on a database
Read operation: to read the value of a data item or a group of items, e.g., SELECT
Write operation: to create a new value for a data item or a group of items, e.g., UPDATE
In between the read/write operations, there may be computation
Transaction operations: a begin operation and an end operation (commit or
abort)
CS3402
4
For transaction management (where to start and where to finish)
The new values from a transaction will become permanent only if the transaction is committed successfully
Transaction Structure & Database Consistency
Begin Transaction
Execution of End Transaction Transaction
Database in a consistent state
Database may be temporarily in an inconsistent state during execution
Database in a consistent state
Atomicity: The whole transaction is considered as an atomic unit Partial results are not allowed and is considered to be incorrect
CS3402
5
Two Sample Transactions
Two sample transactions (only showing the database operations):
CS3402 6
Read Operation: read_item(X)
Data are resided on disk and the basic unit of data transferring from the disk to the main memory is one disk block
In general, a data item (what is read or written) will be the field/fields of some records in the database (in a disk block)
read_item(X) command includes the following steps:
CS3402
7
Find the address of the disk block that contains item X
Copy that disk block into a buffer in main memory (if that disk block is not already in some main memory buffer)
Search for the required value in the buffer
Copy item X from the buffer to the program variable named X
Program
Buffer
Disk
transaction
Temporary storage
Permanent storage
Write Operation: write_item(X)
write_item(X) command includes the following steps:
CS3402
8
Find the address of the disk block that contains item X
Copy that disk block into a buffer in main memory (if that disk block is not already in some main memory buffer)
Search for the required value in the buffer
Copy item X from the program variable named X into its correct
location in the buffer
Store the updated block from the buffer back to disk (either
immediately or at some later point in time)
Note that we DO NOT need to read an item before update it
Program
Buffer
Disk
transaction
Temporary storage
Permanent storage
ACID Properties of Transactions
Atomicity: A transaction is an atomic unit of processing. It is either performed completely or not performed at all (all or nothing)
Consistency: A correct execution of a transaction must take the database from one consistent state to another (correctness)
Isolation: A transaction should not make its updates visible to other transactions until it is committed (no partial results)
Durability: Once a transaction changes the database state and the changes are committed, these changes must never be lost because of subsequent failure (committed and permanent results)
CS3402 9
What is a Schedule?
A schedule of n transactions T1, T2, … , Tn is an ordering of the operations of the transactions.
CS3402
10
Operations from different transactions can be interleaved in the schedule S.
However, for each transaction Ti that participates in the schedule S, the operations of Ti in S must appear in the same order in which they occur in Ti.
Why Concurrency Control is Needed?
Several problems can occur when concurrent transactions execute in an uncontrolled manner.
We illustrate some of these problems by referring to a simplified airline reservations database in which a record is stored for each airline flight.
X, Y are database items to represent the number of reserved seats on each flight. Figure (a) shows a transaction T1 that transfers N reservations from
flight(X) to flight(Y).
Figure (b) shows a simpler transaction T2 that just reserves M seats on flight (X).
CS3402 11
Consistency Problems in Concurrent Schedule
Lost Update Problem (write/write conflicts)
When two transactions that access the same database items have their operations interleaved in a way that makes the value of some database item incorrect (inconsistent)
For example, if X = 80 at the start (originally there were 80 reservations on the flight X), N = 5 (T1 transfers 5 seat reservations from the flight X to the flight Y), and M = 4 (T2 reserves 4 seats on X), the final result should be X = 79.
Q: In this interleaving of operations, what is the value of X in the disk?
CS3402 12
Consistency Problems in Concurrent Schedule
Incorrect Summary Problem (read/write conflicts)
One transaction (i.e. T3) is calculating an aggregate summary
function on a number of database items
Other transactions (i.e. T1) are updating some of these items, the aggregate function may calculate some values before they are updated (e.g., Y) and others after they are updated (e.g., X).
CS3402 13
Conflicting operations
Two operations in a schedule are said to conflict if they satisfy all three of the following conditions:
CS3402
14
(1) they belong to different transactions;
(2) they access the same item (e.g., item X); and
(3) at least one of the operations is a write operation (e.g., write_item(X)).
CS3402
15
How to improve transaction processing performance?
Serial Schedule Vs. Serializable Schedule
Transaction Processing Performance
Serial execution: execute transitions one by one (slow) Concurrent execution:
Interleaved processing: Concurrent execution of processes/transactions are interleaved in a single CPU system
Parallel processing: Processes/transactions are concurrently executed in multiple CPUs system
Higher Concurrency (more than one transaction is executing)
Better performance, i.e., lower response time
Problem: difficult to maintain database consistency if it is not well scheduled
CS3402 16
Interleaved vs Parallel
While waiting disk data, the CPU processes another transaction
CS3402 17
Serial schedule
A schedule S is serial if, for every transaction T participating in the schedule, all the operations of T are executed consecutively in the schedule.
Serial schedules can maintain the database consistency
Problem: Limit concurrency by prohibiting interleaving of operations
CS3402
18
If a transaction waits for an I/O operation to complete, we cannot switch the CPU processor to another transaction, thus wasting valuable CPU processing time.
Serializable schedule
Conflict equivalent: Two schedules are said to be conflict equivalent if the relative order of any two conflicting operations is the same in both schedules.
Serializable schedule: A concurrent schedule S which is conflict equivalent to a serial schedule.
CS3402
19
Can guarantee the database consistency and can have better performance.
CS3402 20
Serialization Graphs
Q: How to determine whether a schedule is serializable?
A: Check using a serialization graph (SG) (or called precedence
graph) through the following steps:
1. Nodes: For each transaction Ti participating in schedule S,
create a node labeled Ti
2. Edges: The set of edges consists of all edges Ti→Tj for which one of the following three conditions holds:
W/R conflict: Ti executes write(x) before Tj executes read(x) R/W conflict: Ti executes read(x) before Tj executes write(x) W/W conflict: Ti executes write(x) before Tj executes write(x)
3. Checking: The schedule S is serializable if and only if the precedence graph has no cycles.
CS3402 21
Serialization Graphs
CS3402 22
Serialization Graphs
CS3402 23
Constructing the precedence test for conflict serializability.
graphs for schedules A and D from Figure 17.5 to
(a) Precedence graph for
serial schedule A.
serial schedule B.
schedule C (not conflict serializable). schedule D (conflict serializable, equivalent to
(b) Precedence graph for
(c) Precedence graph for
(d) Precedence graph for
CS3402
24
schedule A).
CS3402
25
How to ensure Recoverability in a schedule?
A Transaction may Fail
Why recovery is needed? What causes a transaction to abort? 1. A computer failure (system crash):
CS3402
26
A hardware or software error occurs in the computer system during transaction execution
If the hardware crashes, the contents of the computer’s internal memory may be lost.
2. A transaction error:
Some operation in the transaction may cause it to fail, such as integer overflow or division by zero
Transaction failure may also occur because of erroneous parameter values or because of a logical programming error. In addition, the user may interrupt the transaction during its execution.
A Transaction may Fail
CS3402
27
3. Local errors or exception conditions detected by the transaction:
Certain conditions necessitate cancellation of the transaction. For example, data for the transaction may not be found
4. Concurrency control enforcement:
The concurrency control method may decide to abort the transaction, to be restarted later, because it violates serializability or because several transactions are in a state of deadlock
A Transaction may Fail
5. Disk failure:
Some disk blocks may lose their data because of a read or write malfunction or because of a disk read/write head crash. This may happen during a read or a write operation of the transaction.
6. Physical problems and catastrophes:
This refers to an endless list of problems that includes power or air-conditioning failure, fire, theft, overwriting disks
CS3402 28
Transaction State
A transaction is an atomic unit of work that is either completed in its entirety or not done at all (atomicity)
CS3402
29
For recovery purposes, the system needs to keep track of when a transaction starts, terminates, and commits or aborts
Undo Logging for Recovery
A log is a file of log records, each telling something about what some transaction has done
As transactions execute, the log manager records in the log each important event (e.g., write operations)
Logs are initially created in main memory and are allocated by the buffer manager
Why not on disk? Disk I/O takes a lot of time
Logs are periodically copied to disk by “flush-log” operation
If log records appear in nonvolatile storage (disk), we can use them to restore the database to a consistent state after a system crash
After a system failure, all data in volatile storage (memory) will lose but the data in nonvolatile storage remain
CS3402 30
Recoverability Problems in Concurrent Schedule
Dirty Read Problem (write/read conflicts)
This problem occurs when one transaction updates a database item and then the transaction fails for some reason. Meanwhile, the updated item is accessed (read) by another transaction before it is changed back (or rolled back) to its original value.
CS3402 31
Schedules Classified on Recoverability
Rollback:
If a transaction is aborted, all its effects have to be undone
Rollback of a transaction: undo those processed operations of an aborted transaction
Why needs to be rollback?
maintaining consistency and all or nothing property
Non-recoverable schedule: The committed transaction with dirty- read problem cannot be rolled back.
A schedule S is recoverable if for all Ti and Tj, Tj commits before Ti whenever Ti reads an item written by Tj
CS3402
32
i.e., Transactions commit only after all transactions whose changes they read commit.
Example (Non-Recoverable vs Recoverable)
T1 T2
T1 T2
Read(A) Write(A)
Read(A) Write(A)
Read(B)
Read(B) Commit/Abort
Commit/Abort
Non-recoverable
Recoverable
If T2 commits and then
T1 Aborts…
CS3402
33
Read(A)
Read(A) Write(A) Commit/Abort
Write(A) Commit/Abort
Cascading rollback
Cascading rollback:
CS3402
34
A single rollback leads to a series of rollback
All uncommitted transactions that read data items from a failed (aborted) transaction must be rolled back
Example (Cascading Rollback)
T1 T2 T3
Read(A) Read(B) Write(A)
Commit/Abort
Recoverable but:
When T1 fails, T2 and T3 should rollback
CS3402
35
Read(A) Write(A)
Commit/Abort
Read(A)
Commit/Abort
Cascadeless schedule
Because cascading rollback can be time-consuming—since numerous transactions can be rolled—it is important to characterize the schedules where this phenomenon is guaranteed not to occur
Cascadeless schedule: Every transaction reads only the items that are written by committed transactions.
CS3402
36
i.e., before Ti reads an item written by Tj, Tj has already committed
Example (Non-Cascadeless vs Cascadeless)
T1 T2
T1 T2
Read(A) A=A+1 Write(A)
Read(A) A=A+1 Write(A) Commit/Abort
Commit/Abort
CS3402
37
Non-cascadeless
Cacadeless
T1 rollbacks, T2 also has to rollback
T2 read A after T1 commits
Read(A) Read(B) B=B+A ….
Read(A) Read(B) B=B+A
… Commit/Abort
Commit/Abort
Strict schedule
Strict schedule: A schedule in which transactions can neither read nor write an item X until the last transaction that wrote X has committed (or aborted).
Strict schedules simplify the recovery process.
CS3402
38
In a strict schedule, the process of undoing a write_item(X) operation of an aborted transaction is simply to restore the before image (old_value) of data item X.
This simple procedure always works correctly for strict schedules, but it may not work for recoverable or cascadeless schedules.
Example 3 (Cascadeless vs Strict )
S: w1 (X,5); w2 (X,8); a1; Suppose initially, X=9 T1 writes a value 5 for X T2 writes a value 8 for X Then T1 aborts
Schedule S is cascadeless, but it may lead to potentially incorrect results:
Reason: If T1 aborts, the recovery procedure that restores the before image of an aborted write operation will restore the value of X to 9, even though it has already been changed to 8 by transaction T2
Schedule S is not strict, as it permits T2 to write item X even though
the transaction T1 that last wrote X had not yet committed (or
aborted).
CS3402 39
Schedule
Note:
Any strict schedule is also cascadeless
Any cascadeless schedule is also recoverable
Most recovery protocols allow only strict schedules, so that the recovery process itself is not complicated.
CS3402 40
References
7e
Chapter 20
CS3402 41