Database Design
Introduction to Transactions
Copyright By PowCoder代写 加微信 powcoder
Simple Model of a DB
DB: A collection of named data items
Data granularity: A field, a record, a table, a whole disk block
Basic operations: Retrieve/Change data, i.e. read/write
Transaction Management
What is a Transaction?
Transaction: A logical unit of DB processing
A set of operations
Includes one or more access operations
E.g., read (retrieval), write (insert or update), delete
move money from my savings account to my checking account
The two updates to my savings and checking account would be a single transaction
Transaction Management
What is a Transaction?
A DBMS will typically receive many transaction requests possibly at the same time (concurrently)
Transaction Manager: A component of the DB
Execute DB requests from one or more users
Protect the data from loss or damage
Transaction Management
What is a Transaction?
Specified as a stand alone operation in a high level language such as SQL
Either submitted interactively or embedded in a program
An application program may contain several transactions
Each transaction has boundaries … Begin/End
Transaction Management
Basic Transaction Operations
read_item(X) – Read a DB item named X into a program variable
write_item(X) – Writes the value of a program variable into the DB item named X
(INSERT, UPDATE, DELETE)
Transaction Management
Multi-user database systems
Suppose two “users” are accessing a banking database simultaneously
How can we make sure that the automatic process doesn’t check my accounts in the middle of the move?
This is known as “concurrency control”
I’m transferring money from savings to my checking account
Bank is checking to see if the total amount in all of my accounts is less than $250
If so, charge fee and send alert
Concurrency Control
Ensuring that processes running simultaneously (concurrently) behave in an appropriate way
Result from two processes running in an interleaved fashion is predictable
The final result of running A and B interleaved depends on how A and B are divided
Concurrency control makes this behavior predictable
Transaction Properties – ACID
To handle concurrency, transactions need to be:
The transaction needs to either be entirely performed or not performed at all – no half-done transactions are allowed
Consistent
Transactions that start from a database in a consistent state should end with the database in a consistent state
Transaction Properties – ACID
To handle concurrency, transactions need to be:
Transactions should always function as if they were executing in isolation from other transactions
Database changes made by a completed transaction must persist – even if the database fails
Concurrency Control Issues
Lost Update
Two processes update same data
Incorrect Summary
Summary process at the same time as update
Temporary Update (Dirty Read)
Write operation fails, requires undo
Transaction Management
Concurrency Control Issues
Lost updates
Two processes try to update the same item
Interleaved in a way that causes incorrect values
T1 – move money from checking to savings
T2 – add money to checking
Concurrency Control Issues
Incorrect Summary
One process is computing an aggregation of values
Another process is making changes while the computation is occurring
T1 – Move money from checking to savings
T3 – Compute the total in all accounts
Concurrency Control Issues
Temporary Update (Dirty Read)
T1 – take money out of the account
T1 performs a read, then a write
T2 – add money to the account
T2 reads, updates and writes the same item
Then T1 fails at its next step, needing to undo the write
What causes a Transaction to Fail?
Unfortunate events –
Computer failure (crash, out of memory)
Transaction/System Errors (database full, disk full)
Logical/exception Errors
insufficient funds in a banking transaction, no such record, referential integrity fail
Disk I/O failure
What should be done if a failure occurs?
Remember, the job of the Transaction Manager
Protect the data from loss or damage
SQLITE_FULL: database or disk full
SQLITE_IOERR: disk I/O error
SQLITE_BUSY: database in use by another process
SQLITE_NOMEM: out or memory
SQLITE_FULL: database or disk full
SQLITE_IOERR: disk I/O error
SQLITE_BUSY: database in use by another process
SQLITE_NOMEM: out or memory
Transaction Implementation – DBMS
How are transactions implemented at the DBMS level?
Need to be able to restore the database to the last commit point after a failure
DBMS keeps a system log (or journal) of transactions
Sequential, append-only file – all transactions appended to end of log file
Kept on disk – only a catastrophic disk failure prevents a recovery
In case of failure, system log can “play back” transactions back to the last transaction before failure
Transaction Implementation – DBMS
DBMS keeps a system log (or journal) of transactions
Need to be able to restore the database to the last commit point after a failure
Sequential, append-only file – all transactions appended to end of log file
Kept on disk – only a catastrophic disk failure prevents a recovery
In case of failure, system log can “play back” transactions back to the last transaction before failure
Transaction Schedules
We are most concerned with read (r) and write (w) operations from transactions:
r1(X) – T1 performs a read on data item X
w2(Y) – T2 performs a write on data item Y
We may also be concerned with the beginning and ending of transactions, and commit points:
b1 – T1 transaction begins
c2 – T2 transaction commits
Transaction Schedules
Some notation:
Each schedule is noted by an alphabetic subscript:
Sa, Sb, etc.
Each transaction noted by a numeric subscript:
T1, T2, etc.
Schedules (example)
The above pair of transactions would have the following notation:
Sa: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y)
Schedules (another example)
Sb: r3(A); r1(X); w1(X); r3(X); r3(Y); r1(Y); w1(Y)
Transaction Schedules (Histories)
The order of operations of a sequence of transactions
Individual operations from multiple transactions can be interleaved together
Assume schedules have Total Ordering
For any two operations in the schedule, one MUST occur before the other
Transaction Schedules
Some notation:
Each schedule is noted by an alphabetic subscript:
Sa, Sb, etc.
Each transaction noted by a numeric subscript:
T1, T2, etc.
We are most concerned with read (r) and write (w) operations from transactions:
r1(X) – T1 performs a read on data item X
w2(Y) – T2 performs a write on data item Y
We may also be concerned with the beginning and ending of transactions, and commit points:
b1 – T1 transaction begins
c2 – T2 transaction commits
a1 – T1 transaction aborts (rollback)
Schedules (example)
The above pair of transactions would have the following notation:
Sa: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y)
Schedules (another example)
Sb: r3(A); r1(X); w1(X); r3(X); r3(Y); r1(Y); w1(Y)
Transaction Schedules – Conflicts
Two operations in a schedule conflict if:
They belong to different transactions
They operate on the same database item
At least one of the two operations is a write operation
Transaction Schedules – Conflicts
Sa: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y)
Two operations conflict if changing their order changes the final result of the schedule
Different transactions
Write operation
NO CONFLICT
Same transaction
Transaction Schedule – Conflicts
Suppose we have two operations:
Sa: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y)
Two different schedules would be possible:
r2(X); w1(X) – first READ the original balance, then WRITE the new balance
w1(X); r2(X) – first WRITE the new balance, then READ the new balance
These two schedules give two different results
This kind of conflict is known as a read-write conflict
Write-write conflicts also possible
Transaction Schedules
We call a schedule a complete schedule if:
The operations in the schedule are exactly those of the transactions that make up the schedule
For any pair of operations within the same transaction, their order in the schedule is the SAME as their order in the transaction
For any pair of conflicting operations, one of the operations must occur before the other in the schedule
Complete Schedules
T1: r1(X); w1(X); r1(Y); w1(Y); c1;
T2: r2(X); w2(X); c2;
Some possible complete schedules for the above transactions
Sa: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y); c1; c2;
Sb: r1(X); w1(X); r1(Y); w1(Y); c1; r2(X); w2(X); c2;
Sf: r2(X); w2(X); c2; r1(X); w1(X); r1(Y); w1(Y); c1;
Sg: r2(X); r1(X); w2(X); w1(X); c2; r1(Y); w1(Y); c1;
This is NOT a possible complete schedule for the above:
Sh: r1(Y); r2(X); w2(X); w1(X); c2; r1(X); w1(Y); c1;
All of the operations are there, but NOT in the correct order for T1
Complete Schedules
Note condition 3
“For any pair of conflicting operations, one of the operations must occur before the other in the schedule”
Only requires that conflicting operations have a total ordering
Two read operations can happen in any order if they come from two different transactions
Remember – operations from same transaction MUST occur in the order given by the transaction
This means that a schedule is actually a partial order of the operations of a set of transactions
Partial order – some operation pairs have no specific order
In practice, most schedules are given a total order anyway
One way to characterize schedules – ease of recovery
How easy is it for a schedule to recover from transaction/system failures?
Possible but not simple?
Impossible?
Avoid schedules where recovery is impossible
Try to create schedules where recovery is simple
What makes a schedule recoverable?
Once a transaction T is committed, it should never be necessary to roll back T
This ensures Durability in our ACID properties
If a schedule behaves with regards to this rule, the schedule is recoverable
Schedules that do not have this property are nonrecoverable
DBMS software needs to avoid placing transactions into nonrecoverable schedules!
Nonrecoverable Schedules
T1: r1(X); w1(X); r1(Y); w1(Y); c1;
T2: r2(X); w2(X); c2;
This schedule would be recoverable
Sa: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y); c1; c2;
If we roll back just before the commit of T1, the result of T2 is unaffected – T2’s operations can be left in place
Nonrecoverable Schedules
T1: r1(X); w1(X); r1(Y); w1(Y); c1;
T2: r2(X); w2(X); c2;
This schedule would be nonrecoverable
Sc: r1(X); w1(X); r2(X); r1(Y); w2(X); c2; a1
Note that we’ve aborted T1 after committing T2
Rolling back T1 changes the result of T2
That first read for T2 would have a different result, since the first write for T1 was undone
But we’ve already committed T2! We can’t roll it back now
Nonrecoverable Schedules
T1: r1(X); w1(X); r1(Y); w1(Y); c1;
T2: r2(X); w2(X); c2;
We can modify Sc to be recoverable:
Sc: r1(X); w1(X); r2(X); r1(Y); w2(X); c2; a1
Sd: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); c1; c2;
What makes Sd recoverable when Sc isn’t?
Commit of T1 happens before the commit of T2
If we need to abort T1 before it finishes, T2 can be rolled back
If we need to abort T2 before it finishes but after T1 commits, there are no reads by T1 that are affected
Cascading Rollback
Sd: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); c1; c2;
Sd is recoverable, but there’s another issue
What happens if we abort T1 right before the commit?
T2 also has to be rolled back to its beginning
First read by T2 reads something that T1 has written, but hasn’t committed
These kinds of rollbacks are called cascading rollbacks
Cascading Rollback
Cascadeless rollbacks – every transaction in the schedule reads only items that were written by transactions that have committed:
Sd: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); c1; c2;
Sd is not cascadeless
Se: r1(X); w1(X); r1(Y); w1(Y); c1; r2(X); w2(X); c2;
Se is cascadeless
By delaying the T2’s read from X until after the commit by T1, cascading rollback is eliminated
Cascading rollbacks are time consuming
Serializability
Recall: ACID
What do we mean by isolated ?
Transactions need to appear as if they were run by themselves – without other transactions operating
What does it take to enforce this?
The transaction needs to run without interference by other transactions
Does this mean that the only schedules we’re allowed to have are serial schedules?
Serial Schedules
The representations above are serial schedules
Either T1 executes, followed by T2…
… or T2 executes, followed by T1
Here isolation is obviously preserved
Each transaction is independent of the other
No lost updates, incorrect summaries or other issues
Upshot – any serial ordering of transactions is considered a correct ordering
Non-Serial Schedules
Now Consider Transactions T1 and T2 above
The representations above are non-serial schedules for T1 and T2
No longer completely isolated – transactions interleaved
Correctness is no longer so obvious
Could we have a lost update here? An incorrect summary?
Can we still interleave operations while preserving correctness?
Interleaving Operations
Can we interleave operations while preserving correctness?
First question – why do we care about interleaving?
Idle processors waste time!
We want to maximize our efficiency
Allow as much concurrency of operation as possible
Serial schedules are inefficient
If T1 is waiting for an I/O operation, it would be better to let T2 process if we can
Forcing every schedule to actually be a serial schedule is unacceptable from an efficiency standpoint
Interleaving Operations
So, can we interleave operations while preserving correctness?
Sometimes – if our operations are ordered in a way that makes them equivalent to a serial schedule
Interleaving
Consider these schedules
A & B are serial schedules
D is equivalent to a serial schedule,
C is vulnerable to a lost update and is not equivalent to a serial schedule
Serializable Schedules
A schedule of n transactions is a serializable schedule if it is equivalent to some serial schedule of the same n transactions
How do we test to see if a non-serial schedule is serializable?
Result Equivalence
Simplest way of testing for equivalence
Two schedules are result equivalent if they produce the same final state of the database
Not a good approach – equivalences may be accidental
Consider the schedules below – equivalent for X=100 but NOT in general
We want tests that would not consider these schedules to be equivalent
Conflict Equivalence
A better test for equivalence is conflict equivalence
Two schedules are conflict equivalent if they have the same operations and the order of any two conflicting operations is the same in both schedules
Recall: conflicting operations:
Belong to two different transactions
Access the same database item
At least one is a write operation
A schedule is considered conflict serializable if it is conflict equivalent to some serial schedule
Serializability
Schedule D above is conflict equivalent to schedule A – serializable
Schedule C above is not conflict equivalent to either possible serial schedule – not serializable
Testing for Conflict Serializability
Create a precedence graph
For each transaction Ti participating in schedule S, create a node labelled Ti in the precedence graph.
Testing for Conflict Serializability
Create a precedence graph
For each case in S where Tj executes a read_item(X) after Ti executes a write_item(X), create an edge (Ti –> Tj) in the precedence graph.
T2 reads X after T1 has written X
T1 writes X before T2 reads X
Testing for Conflict Serializability
Create a precedence graph
For each case in S where Tj executes a write_item(X) after Ti executes a read_item(X), create an edge (Ti –> Tj) in the precedence graph.
T1 writes X after T2 has read X
T2 reads X before T1 writes X
Testing for Conflict Serializability
Create a precedence graph
For each case in S where Tj executes a write_item(X) after Ti executes a write_item(X), create an edge (Ti –> Tj) in the precedence graph.
T2 writes X after T1 has written X
T1 writes X before T2 writes X
Testing for Conflict Serializability
Create a precedence graph
The schedule S is serializable if the precedence graph has no cycles.
T2 writes X after T1 has read X
T1 writes X after T2 has read X
We cannot declare Schedule C as serializable
Using Serializability
The idea of serializability is used for concurrency control
DMBS does not attempt to show that any arbitrary schedule is serializable
Instead, the DBMS guarantees that any schedule that follows the protocols will be serializable
Being serializable is not the same as being serial
Being serializable implies that the schedule is a correct schedule.
It will leave the database in a consistent state.
The interleaving is appropriate and will result in a state as if the transactions were serially executed, yet will achieve efficiency due to concurrent execution.
Serial schedule:
A schedule S is serial if, for every transaction T participating in the schedule, all the operations of T are executed consecutively in the schedule.
Otherwise, the schedule is called nonserial schedule.
Serializable schedule:
A schedule S is serializable if it is equivalent to some serial schedule of the same n transactions.
Result equivalent:
Two schedules are called result equivalent if they produce the same final state of the database.
Conflict equivalent:
Two schedules are said to be conflict equivalent if the order of any two conflicting operations is the same in both schedules.
Conflict serializable:
A schedule S is said to be conflict serializable if it is conflict equivalent to some serial schedule S’.
Transaction Processing – SQL
Transactions are implemented in SQL:
The sequence of SQL statements that follow will be processed as a single transaction
BEGIN TRANSACTION
Transaction Processing
Active transactions may have any number of read/write operations
If at any time a read/write operation fails, the transaction aborts and no changes are made
INSERT INTO SP VALUES (‘S5’, ‘P1′, 1000);
UPDATE P SET TOTQTY = TOTQTY + 1000
WHERE P#=’P1’;
Transaction Processing
Once the read/write operations are finished, the transaction moves to a partially committed state
Ensures that if there is a failure, it will be able to recover to the point after the transaction
If it can’t ensure that, the transaction aborts and no changes are made
IF error THEN ROLLBACK;
Transaction Processing
Finally the DBMS is in the committed state
All read/write operations are made permanent
If a failure happens, the database will recover to this point
The transaction is finished and we say it has been committed
END TRANSACTION;
SQL Transaction commands
COMMIT TRANSACTION (or COMMIT)
The sequence of SQL statements for the transaction is finished and the DBMS should attempt to commit the transaction
All of the updates can now be “committed”/made permanent/persistent
certain operations must be redone to ensure that all operations of a committed transaction have been applied successfully to the DB
ROLLBACK TRANSACTION (or ROLLBACK)
Abort the transaction – undo all statements from the BEGIN TRANSACTION statement
Something has gone wrong so the DB might be in an inconsistent state
Transaction Management
The Suppliers-Parts Example
Transaction Management
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com