Transactions
Reading:
http://en.wikipedia.org/wiki/Database_transact http://en.wikipedia.org/wiki/Serializability http://en.wikipedia.org/wiki/Schedule_(comput
– http://www.postgresql.org/docs/9.4/static/tutorial-t discusses savepoints
• • •
•
Definition, ACID Serializability Recoverability
i
e r
Definition
set of operations that access the database
•
– SQL includes START TRANSACTION,
ROLLBACK and COMMIT
– Statements following START TRANSACTION until either ROLLBACK or COMMIT are part of the transaction
– transactions must satisfy ACID properties
•
ACID
– entire transaction behaves like one indivisible unit
– either all operations are done, or none are
Atomicity
Atomicity
– if transaction is executed, database must look as if all operations were done in order
• (can reorder non-conflicting operations if necessary)
– if transaction is not executed, database must look as if none of the operations was done
• must undo effects of any operations that were done
ACID
– execution must not violate consistency of database
•
•
Consistency
Isolation
– execution must not be affected by concurrent execution of other transactions
– database must look as if transactions were performed in some order
• does allow concurrent execution as long as no conflicts
ACID
• ex: Kim is transferring money from her checking to her savings account
• OK if the system removes money from checking, then let’s Jim remove money from his money market account, and finally completes Kim’s transfer
•
Durability
– above properties maintained even if system fails
Sample Error scenarios
•
Interleaved execution
• balance = $100
• you withdraw $50 through ATM, bank autopays a $70 utility bill
• system gets balance = $100 for withdrawal
• because of context switch or concurrency, system next does autopay
• system still gets balance = $100, updates balance to be $100-$70 = $30
• system returns to withdrawal, updates balance to be $100- $50 = $50
• bank is unhappy 🙁
Sample Error scenarios
•
System/network failure during execution
• balance = $50 in checking, $10 in savings
• you transfer $30 from checking to savings
• system withdraws from checking: balance = $20 in checking, $10 in savings
• system crashes
• you are unhappy 🙁
•
execution
Database solutions
Concurrency control handles interleaved
– 2-phase locking is standard solution
•
system/network failures
Recovery management handles – log operations to support undo/redo
Possible transaction states
• •
•
Active : executing
Committed : successfully completed
– SQL: COMMIT
• all modifications of current transaction are saved to
permanent storage
Aborted : finished before success
– SQL: ROLLBACK
• all modifications of database are undone
Possible transaction states
•
•
Partially committed
– operations executed, but not all necessary info saved
Failed: cannot complete
– still need to confirm all modifications undone
– If you interrupt a transaction before it is done (ex: a script of INSERT’s to load data), need to ABORT to really cancel the transaction
Savepoints
•
– Only works inside a transaction
Savepoints let you to do a partial rollback
– SAVEPOINT label; • marks the point
– ROLLBACK TO label;
• Rolls back to that label
• Continues execution until transaction committed or completely rolled back
•
next
Oversimplified solutions
Complete each transaction before starting
– schedule : an actual sequence of operations
– the above would produce a serial schedule
– Which scenario would this fix? Why don’t you like it?
Oversimplified solutions
•
maintain pointer to data; edit copy of data
– on abort, delete copy
– to commit, force copy to disk, update pointer to point to copy
– (this is the approach of many text editors/word processors)
– Why not use this for a database?
•
serial schedule
Serializability
serializable schedule : equivalent to a
– let r(i,X) mean transaction i reads value of X,
– w(i, X) to update X
– ex: r(1,A), r(2,A), w(2,A), w(1,B) is equivalent to r(1,A), w(1,B), r(2,A), w(2,A)
• so as if transaction 1 is executed before transaction 2
Tables to show serializability
Transaction 1
Transaction 2
read(A)
read(A)
write(A)
write(B)
Transaction 1
Transaction 2
read(A)
write(B)
read(A)
write(A)
Serializability
– ex: r(1,A), r(2,A), w(2,A), w(1,A) is not serializable
Transaction 1
Transaction 2
read(A)
read(A)
write(A)
write(A)
Serializability
– ex: r(2,B), r(1,A), w(2,A), w(1,B) — serializable?
Transaction 1
Transaction 2
read(B)
read(A)
write(A)
write(B)
Serializability
– reads never conflict with other reads
– reads and writes on the same object may conflict
Recoverability
•
maintain consistency even with failure
schedule is recoverable if system can
– schedule notation needs: • c(i) — transaction i commits • a(i) — transaction i aborts
Recoverability examples
– ex: r(1,A), r(2,A), w(2,A), w(1,B), c(1) is recoverable
Transaction 1
Transaction 2
read(A)
read(A)
write(A)
write(B)
commit
Recoverability examples
– ex: w(1,A), r(2,A), w(2,A), w(1,B), c(2) is not • why? (what could go wrong?)
Transaction 1
Transaction 2
write(A)
read(A)
write(A)
write(B)
commit
Cascading rollback
•
transaction may cause other aborts
schedule is recoverable, but aborting a – ex: r(1,A), r(2,A), w(2,A), w(1,B), c(2) is
cascadeless
Transaction 1
Transaction 2
read(A)
read(A)
write(A)
write(B)
Cascading rollback
– ex: r(1,A), w(1,A), r(2,A) is not
• suppose we abort transaction 1 next — why does transaction 2 need to abort?
Transaction 1
Transaction 2
read(A)
write(A)
read(A)
write(A)
Cascading rollback
•
ensure a cascadeless schedule?
•
Can set isolation level of transaction
– Default will avoid cascading rollback, but can set lower level of isolation, too
How can we restrict read operations to
Transactions and performance
•
•
Serializability affects performance
– Major companies have been dropping use of transactions for non-critical data
Transactions have overhead
– Ex: for bulk insert, should avoid having a separate transaction for each INSERT
• Start transaction before first INSERT • Commit transaction after last INSERT