CS计算机代考程序代写 SQL scheme python database Java concurrency 1/42

1/42

Week 10 Workshop – Database Transactions

Designing Data-Intensive Applications, Martin Kleppmann, O’Reilly, 2015

2/42

Transactions
A transaction is a sequence of database operations grouped together for
execution as a logic unit in a DBMS.

Steps Transaction
BEGIN TRANSACTION

1 SELECT balance FROM ACCOUNT WHERE name = ’Steve’;
2 UPDATE ACCOUNT SET balance = balance-500 WHERE name=’Steve’;
3 SELECT balance FROM ACCOUNT WHERE name = ’Bob’;
4 UPDATE ACCOUNT SET balance = balance+500 WHERE name = ’Bob’;
5 COMMIT;

3/42

Transactions
What’s the difference between database transactions and programs written
by a programming language like C, Java and Python?

How are transactions handled in the query processing?

4/42

Transaction Manager – A Simplified View

Disk (tables, indexes, etc.)

Buffer
Manager

Main
Memory

Index/File
Manager

Evaluation
Engine

Transaction
Manager

5/42

Transactions – ACID Properties

Transactions
T1 : BEGIN TRANSACTION

SELECT …

UPDATE …

COMMIT

T2 : SELECT …

T3 : INSERT …

T4 : BEGIN TRANSACTION
SELECT …

DELETE …

ABORT

ACID properties

Atomicity

Consistency

Isolation

Durability

6/42

Transactions – ACID Properties

7/42

Transactions – ACID Properties

ACID properties

Atomicity

Consistency

Isolation

Durability

Send SQL statements

Fetch SQL results
Databases

Web forms

Hacker

Transaction Manager

Recovery

Concurrency
control

Consistency is the responsibility of an application developer.

8/42

Transaction Manager – Common Techniques

Transaction Manager

Concurrency
Control

Recovery

Logging for recovery – assuring atomicity/durability of transactions

e.g., Write-Ahead Log (WAL) Protocol

9/42

Logging – Introduction

A transaction log is an append-only file that records changes to objects
made by transactions.

When multiple transactions run concurrently, log records are interleaved.

A transaction log can be implemented as a separate file or set of files in the
database.

Recovery amounts to either undoing or redoing changes from the log:

Undo the operations that have not been committed;

Redo the operations that have been committed but not yet been
written to disk.

Checkpoints tell the points from which to begin applying transaction logs
during database recovery.

10/42

Write-Ahead Log (WAL) Protocol

Write-Ahead Log (WAL) requires that a record of every change to a
database is available while attempting to recover from a crash.

Any change to an object is first recorded in the log, i.e., a record
containing both the old and new values for the object.

A record in the log must be written to persistent storage before
committing the transaction.

Accordingly, the definition of a committed transaction is:

“ A transaction, all of whose log records, including a commit record,
have been written to persistent storage”.

11/42

Write-Ahead Log (WAL) Protocol

Typical fields in a log record:

Log Records
prevLSN is the LSN of the

previous log record written by
this Xact

(records of an Xact form a linked list
backwards in time)

Possible log record types:
• Update, Commit, Abort
• Checkpoint (for log

maintainence)
• Compensation Log Records

(CLRs)
– for UNDO actions

• End (end of commit or abort)

LSN

prevLSN

transID

type

length

pageID

offset

before-image

after-image

Additional

fields for

update log

records

– Each log record has a unique id called LSN (Log
Sequence Number).

– prevLSN is the LSN of the previous log record
written by the same transaction.

– Possible types include: update, commit, abort,
end, etc.

Does WAL bring in some benefits for performance?

Often results in a significantly reduced number of disk writes
Supports one sync against the log file instead of potentially many
against the data files
Enables online backup and point-in-time recovery

12/42

Transaction Manager – Recovery

Key concepts to aid in recovery:

Transaction log: records of database operations

Write-Ahead Log (WAL)

Undo …

Redo …

Checkpoint: snapshot of the state of a database

(Widely used in practice, but not covered in this course)

13/42

Transaction Manager – Common Techniques

Transaction Manager

Concurrency
Control

Recovery

Logging for recovery – assuring atomicity/durability of transactions

e.g., Write-Ahead Log (WAL) Protocol

Locking for concurrency control – assuring isolation of transactions

e.g., Two-Phase Locking (2PL) Protocol

14/42

Locking – Introduction

A lock is associated with an object, e.g., file, table, record, page, etc.

Two main types of locks:

Shared lock (read-lock): for reading an object by a transaction
Exclusive lock (write-lock): for writing an object by a transaction

(Note: there are other types of locks defined by different DBMSs)

Lock compatibility:

Lock type read-lock write-lock
read-lock Yes No
write-lock No No

15/42

Two-Phase Locking (2PL) Protocol

Locks are handled in two phases:

Expanding: locks are acquired and no locks are released.
Shrinking: locks are released and no locks are acquired.

16/42

Two-Phase Locking (2PL) Protocol

Bad news:

2PL can radically limit interleaving among transactions in some cases …

2PL may be subject to deadlocks, i.e., the mutual blocking of two or more
transactions.

Step T1 T2
1 lock-r(A)
2 read(A)
3 lock-r(B)
4 read(B)
5 lock-w(B)
6 write(B)
7 lock-w(A)
8 write(A)

Transaction Manager

Concurrency
Control

Recovery

T1 T2

B A

T1 is waiting for T2 to get a
write-lock on B. T2 is waiting
for T1 to get a write-lock on A.

17/42

Two-Phase Locking (2PL) Protocol

Good news:

2PL makes interleaving safe, i.e., guarantee the serializability property for
transactions.

Serializability means that a resulting database state is equal to a
database state of running transactions serially.

Serializability is the major correctness criterion for concurrent
transactions.

18/42

Serializability – Example
Consider A = 200 and B = 500, and we have two concurrent transactions:

A+=100 B-=100

A*=3 B*=2

T1

T2

A+=100 B-=100T1

A*=3 B*=2 T2
time

A+=100 B-=100T1

A*=3 B*=2 T2
time

Before any transactions:

A=200

B=500

A=900

B=800

A=700

B=900

A+=100 B-=100T1

A*=3 B*=2 T2
time

A=900

B=800

A+=100 B-=100T1

A*=3 B*=2 T2
time

A=900

B=900

Serializable transactions:

Case 1:

A+=100 B-=100

A*=3 B*=2

T1

T2

A+=100 B-=100T1

A*=3 B*=2 T2
time

A+=100 B-=100T1

A*=3 B*=2 T2
time

Before any transactions:

A=200

B=500

A=900

B=800

A=700

B=900

A+=100 B-=100T1

A*=3 B*=2 T2
time

A=900

B=800

A+=100 B-=100T1

A*=3 B*=2 T2
time

A=900

B=900

Case 2:

A+=100 B-=100

A*=3 B*=2

T1

T2

A+=100 B-=100T1

A*=3 B*=2 T2
time

A+=100 B-=100T1

A*=3 B*=2 T2
time

Before any transactions:

A=200

B=500

A=900

B=800

A=700

B=900

A+=100 B-=100T1

A*=3 B*=2 T2
time

A=900

B=800

A+=100 B-=100T1

A*=3 B*=2 T2
time

A=900

B=900

19/42

Serializability – Example
Consider A = 200 and B = 500, and we have two concurrent transactions:

A+=100 B-=100

A*=3 B*=2

T1

T2

A+=100 B-=100T1

A*=3 B*=2 T2
time

A+=100 B-=100T1

A*=3 B*=2 T2
time

Before any transactions:

A=200

B=500

A=900

B=800

A=700

B=900

A+=100 B-=100T1

A*=3 B*=2 T2
time

A=900

B=800

A+=100 B-=100T1

A*=3 B*=2 T2
time

A=900

B=900

Case 1: A=900 and B=800

Case 2: A=700 and B=900

Serializable transactions:

Case 1:

A+=100 B-=100

A*=3 B*=2

T1

T2

A+=100 B-=100T1

A*=3 B*=2 T2
time

A+=100 B-=100T1

A*=3 B*=2 T2
time

Before any transactions:

A=200

B=500

A=900

B=800

A=700

B=900

A+=100 B-=100T1

A*=3 B*=2 T2
time

A=900

B=800

A+=100 B-=100T1

A*=3 B*=2 T2
time

A=900

B=900

Case 2:

A+=100 B-=100

A*=3 B*=2

T1

T2

A+=100 B-=100T1

A*=3 B*=2 T2
time

A+=100 B-=100T1

A*=3 B*=2 T2
time

Before any transactions:

A=200

B=500

A=900

B=800

A=700

B=900

A+=100 B-=100T1

A*=3 B*=2 T2
time

A=900

B=800

A+=100 B-=100T1

A*=3 B*=2 T2
time

A=900

B=900

20/42

Serializability – Example
Consider A = 200 and B = 500, and we have two concurrent transactions:

A+=100 B-=100

A*=3 B*=2

T1

T2

A+=100 B-=100T1

A*=3 B*=2 T2
time

A+=100 B-=100T1

A*=3 B*=2 T2
time

Before any transactions:

A=200

B=500

A=900

B=800

A=700

B=900

A+=100 B-=100T1

A*=3 B*=2 T2
time

A=900

B=800

A+=100 B-=100T1

A*=3 B*=2 T2
time

A=900

B=900

Case 1: A=900 and B=800

Case 2: A=700 and B=900

Are the following transactions serializable?

Case 3:

A+=100 B-=100

A*=3 B*=2

T1

T2

A+=100 B-=100T1

A*=3 B*=2 T2
time

A+=100 B-=100T1

A*=3 B*=2 T2
time

Before any transactions:

A=200

B=500

A=900

B=800

A=700

B=900

A+=100 B-=100T1

A*=3 B*=2 T2
time

A=900

B=800

A+=100 B-=100T1

A*=3 B*=2 T2
time

A=900

B=900Yes. A=900 and B=800 ↪→ equivalent to Case 1!

21/42

Serializability – Example
Consider A = 200 and B = 500, and we have two concurrent transactions:

A+=100 B-=100

A*=3 B*=2

T1

T2

A+=100 B-=100T1

A*=3 B*=2 T2
time

A+=100 B-=100T1

A*=3 B*=2 T2
time

Before any transactions:

A=200

B=500

A=900

B=800

A=700

B=900

A+=100 B-=100T1

A*=3 B*=2 T2
time

A=900

B=800

A+=100 B-=100T1

A*=3 B*=2 T2
time

A=900

B=900

Case 1: A=900 and B=800

Case 2: A=700 and B=900

Are the following transactions serializable?

Case 4:

A+=100 B-=100

A*=3 B*=2

T1

T2

A+=100 B-=100T1

A*=3 B*=2 T2
time

A+=100 B-=100T1

A*=3 B*=2 T2
time

Before any transactions:

A=200

B=500

A=900

B=800

A=700

B=900

A+=100 B-=100T1

A*=3 B*=2 T2
time

A=900

B=800

A+=100 B-=100T1

A*=3 B*=2 T2
time

A=900

B=900

No. A=900 and B=900 ↪→ not equivalent to Case 1 or Case 2!

22/42

Problems in Concurrent Transactions

If no concurrency control for transactions, some problems may occur:

Lost
update

write y write

Dirty
read

write y read

Unrepeatable
read

read y write
(read)

Phantom
read

read y write
(read)

23/42

Problems in Concurrent Transactions

If no concurrency control for transactions, some problems may occur:

Lost
update

write y write

Dirty
read

write y read

Unrepeatable
read

read y write
(read)

Phantom
read

read y write
(read)

objects objects objects a set of objects

24/42

The Lost Update Problem – Another Example
Ben and Amy have the same salary. T1 sets their salaries to $80,000, and
T2 sets their salaries to $90,000.

1 If executing T1 and T2 sequentially,
for T1;T2, both receive $90,000.

for T2;T1, both receive $80,000.

↪→ Either is acceptable from the transaction viewpoint.

2 If executing T1 and T2 concurrently, we may have:
T1 T2

1 write(A) (A:=80000)
2 write(A) (A:=90000)
3 write(B) (B:=90000)
4 commit
5 write(B) (B:=80000)
6 commit

↪→ It is not acceptable!

25/42

The Dirty Read Problem – Another Example
Both Ben and Amy are rewarded a bonus $5,000 and a pay rise 5%. T1
increases their salaries with $5,000 and T2 increments their salaries by 5%.

1 If executing T1 and T2 sequentially, they would have …. Also, T1 or T2
could abort for some reasons. ↪→ all are acceptable from the
transaction viewpoint.

2 If executing T1 and T2 concurrently, we may have:
T1 T2

1 read(A)
2 write(A) (A:=A+5000)
3 read(A)
4 read(B)
5 write(B) (B:=B+5000)
6 abort
7 write(A) (A:=A+A×5%)
8 read(B)
9 write(B) (B:=B+B×5%)
10 commit

↪→ It is not acceptable!

26/42

The Unrepeatable Read Problem – Another Example

Amy and Ben are using a website to book flight tickets to Brisbane.

Amy signs on first to see that only one ticket is left, and finds it
expensive.
Amy takes time to decide. Ben signs on later and also finds one ticket
left, orders it instantly, and logs off.
Amy decides to buy a ticket, and finds no tickets left.

T1 (from Amy) T2 (from Ben)
1 read(X)
2 read(X)
3 write(X) (X:=X-1)
4 commit
5 read(X)

This situation can never arise in a serial execution of T1 and T2.

27/42

The Phantom Read Problem – Another Example
Amy is 30 years old, but her age in the table players is mistakenly recorded
as 40. Ben is 28 years old and his age is correctly recorded in players.
Suppose that we have the following two current transactions:

T1: SELECT * FROM players
WHERE age<32; ... SELECT * FROM players WHERE age<32; COMMIT; T2: UPDATE players SET age=30 WHERE rating=8 and name=‘Amy’; COMMIT; T1 T2 1 read(players) 2 read(players) 3 write(players) 4 commit 5 read(players) 6 commit This situation also can never arise in a serial execution of T1 and T2. 28/42 Discussion – What are the differences between “unrepeatable read” and “phantom read”? Unrepeatable read Executing the same SELECT twice yields the same tuples, but attribute values might be different; May occur when reading objects that are affected by UPDATE from another transaction; Can be prevented using record-level locking. Phantom read Executing the same SELECT twice yields two different sets of tuples; May occur when querying a set of tuples that are affected by INSERT/DELETE/UPDATE from another transaction; Can be prevented using table-level locking. 29/42 What Should We lock? Consider the following two concurrent transactions again: T1: SELECT * FROM players WHERE age<32; ... SELECT * FROM players WHERE age<32; COMMIT; T2: UPDATE players SET age=30 WHERE rating=8 and name=‘Amy’; COMMIT; What objects should the DBMS lock in order to avoid the phantom read problem? Table-level locks e.g., read-lock on players for T1, write-lock on players for T2 Record-level locks e.g., read-lock on every record with age<32 for T1, write-lock on every record with rating=8 and name=‘Amy’ for T2 ... 30/42 Transaction Support in SQL An explicit transaction may have no BEGIN TRANSACTION statement, but must be ended with either COMMIT or ABORT (ROLLBACK) statement. When no explicit transaction statements are given, each single SQL statement is considered to be a transaction. To give programmers more control over transaction overhead, SQL allows them to specify isolation level, i.e., the degree of interference that a transaction is prepared to tolerate on concurrent transactions. Key idea: To trade off consistency (i.e., increased risk of violating database integrity) with performance (i.e., greater concurrent access to data) 31/42 Isolation Levels SQL-92 defines four isolation levels: 1 Read Uncommitted 2 Read Committed 3 Repeatable Reads 4 Serializable To specify an isolation level, e.g., SET TRANSACTION ISOLATION LEVEL serializable; The SQL standard does not impose a specific locking scheme or mandate particular behaviors. 32/42 Isolation Levels The intention is to prohibit certain problems: Isolation Level Dirty Read Unrepeatable Read Phantom Read READ UNCOMMITTED Yes Yes Yes READ COMMITTED No Yes Yes REPEATABLE READ No No Yes SERIALIZABLE No No No Different DBMSs implement isolation levels quite differently. The isolation level required for Lost Update is debatable (depending on a DBMS’s implementations). But in general, it may require the highest level SERIALIZABLE to prevent it. 1 1 https://drtom.ch/posts/2011/11/12/The Lost Update Problem - Part 1/ 33/42 Isolation Levels - Concurrency Control A DBMS provides different levels of isolation → different degrees of concurrency control to prevent different problems. Send SQL statements Fetch SQL results Databases Web forms Hacker Serializable Repeatable read Read committed Phantom read Unrepeatable read Dirty read Read uncommitted Concurrency control is NOT binary in a database system. 34/42 Isolation Levels - Read Uncommitted Read Uncommitted is the least restrictive isolation level. One transaction can see changes made by other transactions which are not yet committed. This can be quite dangerous. Use it when executing queries over read-only data or if it does not matter whether a query returns uncommitted data. Possible issue in multi-user Scenario: Lost update: This occurs when two transactions read and than update the same data and one of the updates is lost. Uncommitted Read : This occurs when a transaction reads the same row of data twice but get different data values each time. Phantom read : This occurs when a row of data that matches search criteria is not seen initially but than seen in a later read operation Isolation levels to enforce concurrency: Managing database consistency and data integrity while allowing more than one application to access the same data at the same time is known as concurrency. DB2 attempts to enforce concurrency is thought the use of given for isolation levels, which determine how data used in one transaction is locked or isolated from other transactions while the first transaction works with it . Repatble read Read stability Cursor stability Uncommitted read Uncommitted read(UR) Least restrictive isolation level available. Allows an application to access uncommitted changes of other applications. When this isolation level is used, rows retrieved by a transaction are only locked if the transaction modifies data associated with one or more rows retrieved or if another transaction drop or alter the table the rows were retrieved from. When this isolation level is used, dirty reads, non-repeatable reads, and phantoms can occur. Use it if you're executing queries on read-only tables/views/databases or if it doesn't matter whether a query returns uncommitted data values. Cursor Stability(CS) View All How to use Crosscheck Backup command in PL/SQL with Oracle Enterprise Manager SQL Server Reporting Services FAQs IBM DB2 FAQs LINQ to XML Axis methods Part 2: Descendants and DescendantsAndSelf What is tablespace and types of tablespace in Oracle SUBSTR() Function in Oracle SQL Plus View All Database database tutorials db2 DB2 article DB2 Create Database db2article DBA DBMS LINQ LINQ to XML LINQ Tutorial MySQL Oracleoracle article oracle articles oracle dba Oracle Tutorial Select statement SQL SQL function Sql server Sql server Articles SQL Server Tutorial Sql server Tutorials SQL*PLUS View More View All Jobs | Post a New Job Beginning VB 2008 Databases: From Novice to Professional Murach's SQL Server 2008 for Developers SQL Server 2008 Query Performance Tuning Distilled View All Sponsored by Become a Sponsor Sponsored by Become a Sponsor TOP LEADERS TRENDING UP MOST LIKED ARTICLE TAG CLOUD PREMIUM SPONSORS LATEST JOBS WHITEPAPERS AND BOOKS Database Concurrency in DB2 http://www.dbtalks.com/uploadfile/551dc5/database-con... 2 of 5 26/08/12 18:05 35/42 Isolation Levels - Read Committed Read Committed: One transaction only sees committed changes by other transactions. It is the most commonly used isolation level in database applications. Use it when you want to maximize concurrency between applications but do not want queries to see uncommitted data. This isolation level only locks the row that is currently referenced by a cursor that was declared and opened by the owning transaction. The lock remains in effect until next row is fetched or transaction is terminated. When this isolation level is used, lost updates and dirty reads cannot occur; non-repeatable reads and phantoms can and may be seen. Default isolation level. Use the Cursor Stability isolation level when you want maximum concurrency between applications, yet you don't want queries to see uncommitted data. Read stability: When this isolation level is used, only rows that are actually retrieved or modified by the owning transaction are locked. When this isolation level is used, lost updates, dirty reads, and non-repeatable reads cannot occur; phantoms, however, can and may be seen. Use the Read Stability isolation level when you want some level of concurrency between applications, yet you also want qualified rows to remain stable for the duration of an individual transaction. Repatable Read: Most restrictive isolation level available If an entire table or view is scanned in response to a query, the entire table or all table rows referenced by the view are locked. This greatly reduces concurrency, especially when large tables are used. Lost updates, dirty reads, non-repeatable reads, and phantoms cannot occur. Use the Repeatable Read isolation level if you're executing large queries and you don't want concurrent transactions to have the ability to make changes that could cause the query to return different results if run more than once. Database Concurrency in DB2 http://www.dbtalks.com/uploadfile/551dc5/database-con... 3 of 5 26/08/12 18:05 36/42 Isolation Levels - Repeatable Reads Repeatable Reads: The objects touched by a transaction are locked and cannot be updated or deleted by a concurrent transaction. Use it when you want some level of concurrency between applications but do not expect individual objects to be changed during a transaction. This isolation level only locks the row that is currently referenced by a cursor that was declared and opened by the owning transaction. The lock remains in effect until next row is fetched or transaction is terminated. When this isolation level is used, lost updates and dirty reads cannot occur; non-repeatable reads and phantoms can and may be seen. Default isolation level. Use the Cursor Stability isolation level when you want maximum concurrency between applications, yet you don't want queries to see uncommitted data. Read stability: When this isolation level is used, only rows that are actually retrieved or modified by the owning transaction are locked. When this isolation level is used, lost updates, dirty reads, and non-repeatable reads cannot occur; phantoms, however, can and may be seen. Use the Read Stability isolation level when you want some level of concurrency between applications, yet you also want qualified rows to remain stable for the duration of an individual transaction. Repatable Read: Most restrictive isolation level available If an entire table or view is scanned in response to a query, the entire table or all table rows referenced by the view are locked. This greatly reduces concurrency, especially when large tables are used. Lost updates, dirty reads, non-repeatable reads, and phantoms cannot occur. Use the Repeatable Read isolation level if you're executing large queries and you don't want concurrent transactions to have the ability to make changes that could cause the query to return different results if run more than once. Database Concurrency in DB2 http://www.dbtalks.com/uploadfile/551dc5/database-con... 3 of 5 26/08/12 18:05 37/42 Isolation Levels - Serializable Serializable is the highest solution level. All transactions are totally isolated from other transactions. It is safe but may cause significant performance hit. Use it when you want some level of concurrency between applications but do not expect that a query returns different sets of results when running at different times. Specifying isolation levels with queries: WITH clause (WITH [RR | RS | CS | UR]) that can be appended to a SELECT statement to set a specific query's isolation level to Repeatable Read (RR), Read Stability (RS), Cursor Stability (CS), or Uncommitted Read (UR). For Example: SELECT * FROM employee WHERE empid = '001' WITH RR SHARE THIS ARTICLE : Locking and concurrency in DB2 Locking Techniques for Concurrency Control in Database Working with databases with database objects in DB2 Concurrency in LINQ to SQL Create Database using DB2 Control Center RELATED ARTICLES POST COMMENT Hosted By CBeyond Cloud ServicesTOP MEMBERS MOST VIEWED TRENDING NOW Adding Parameter Fields in Crystal Report 10 Tags: Crystal Report 10, Crystal Report 10 Article, Crystal Report 10 Tutorial, How to add Parameter Fields in Crystal Report Adding a chart in Crystal Report 10 Tags: Adding chart in Crystal Report 2010., Crystal Report 2010, Crystal Report 2010 Article, Crystal Report 2010 Tutorial PHOTOS TIPS CONSULTING TRAINING STUDENTS MEMBERS MEDIA KIT SUGGEST AN IDEA Database Concurrency in DB2 http://www.dbtalks.com/uploadfile/551dc5/database-con... 4 of 5 26/08/12 18:05 38/42 Locks Taken by SQL Server for Isolation Levels 2 2 http://michaeljswart.com/2012/06/visualizing-transaction-isolations-for-sql-server/ 39/42 Wrap-up - Isolation Levels A lower isolation level increases the ability of many users to access data at the same time, but also increases the number of concurrency effects (such as dirty reads or lost updates) users might encounter. Conversely, a higher isolation level reduces the types of concurrency effects that users may encounter, but requires more system resources and increases the chances that one transaction will block another. Choosing the appropriate isolation level depends on balancing the data integrity requirements of the application against the overhead of each isolation level. 40/42 Research Topics 41/42 Research Topics This is an active research area covering many interesting research topics. Historically, much of the work has been done in the context of relational database systems. However, the ideas in general are independent of whether the underlying system is a relational database system or something else. Distributed database systems Graph database systems Document-oriented database systems ... 42/42 Research Topics Distributed transactions