编程辅导 www.cardiff.ac.uk/medic/irg-clinicalepidemiology

www.cardiff.ac.uk/medic/irg-clinicalepidemiology

Concurrency control

Copyright By PowCoder代写 加微信 powcoder

Information modelling
& database systems

Concurrency
large databases are used by many users
many users  many transactions
if these transactions are run sequentially, then long transactions will make others wait for long periods
therefore, it is desirable to let transactions run concurrently
… but we need to preserve isolation

let C be a column
transaction A: decrease the value of C by 5, i.e. C := C – 5
transaction B: increase the value of C by 5, i.e. C := C + 5
What would be the effect on C after completing transactions A and B?
(C – 5) + 5 = C  no change in the value of C

Problem: lost update
the final value of C has increased by 5 (i.e. C=12), but should not have changed (i.e. C=7–5+5=7)
Transaction A Time Transaction B e.g. C = 7
read(C) t1 7
C = C – 5 t2 2
t3 read(C) 7
t4 C = C + 5 12
write(C) t5 2
t6 write(C) 12
COMMIT t7 2
t8 COMMIT 12

occurs when two different transactions are trying to update the same cell at the same time

Problem: uncommitted update (“dirty read”)
the final value of C has not changed (i.e. C=7), but should have increased by 5 (i.e. C=12)
it should be as if transaction A never happened
Transaction A Time Transaction B e.g. C = 7
read(C) t1 7
C = C – 5 t2 2
write(C) t3 2
t4 read(C) 2
t5 C = C + 5 7
t6 write(C) 7
ROLLBACK t7 7
t8 COMMIT 7

occurs when a transaction reads data that has been modified by another transaction, but not yet committed

Problem: inconsistent analysis
SUM should be C + D = 2 + 9 = 11
Transaction A Time Transaction B e.g. C = 7, D = 4
read(C) t1 7
C = C – 5 t2 2
write(C) t3 2
t4 read(C) 2
t5 read(D) 4
t6 SUM = C + D 6
read(D) t7 4
D = D + 5 t8 9
write(D) 9
print(SUM) 6

occurs when a transaction reads several values trying to aggregate them, but another transaction updates some of them

Concurrency control
a transaction may be correct in itself, but it can still produce incorrect result if its execution is interfered by other transactions
concurrency control is about managing simultaneous execution of multiple transactions without them interfering with one another
to solve the problem, transactions use locks on shared data items before operating on them

Serialisability

a schedule is a time–ordered execution of operations from a set of concurrent transactions
a serial schedule is a schedule in which …
operations of individual transactions are executed consecutively
… and do not interleave with operations from other transactions
… and each transaction commits before another one is allowed to begin

Serial schedule
serial schedules are guaranteed to avoid interference and keep the database consistent
however, databases need concurrent access, which means interleaving operations from different transactions
Transaction A Time Transaction B e.g. C = 7
read(C) t1 7
C = C – 5 t2 2
write(C) t3 2
COMMIT t4 2
t5 read(C) 2
t6 C = C + 5 7
t7 write(C) 7
t8 COMMIT 7

Serialisability
two schedules are equivalent if they always have the same effect
a schedule is serialisable if it is equivalent to some serial schedule
if two transactions only read some data items, then the order is which they do it is not important
if transaction A reads and updates a data item C and transaction B reads and updates a different data item D, then again they can be scheduled in any order

Serial vs. serialisable
if two transactions only read some data items,
then the order is which they do it is not important
interleaved schedule
serial schedule
Transaction Operation

Transaction Operation

this schedule is serialisable

Conflict serialisability
Do two transactions have a conflict?
NO if they refer to different resources
NO if they only read
YES if at least one is a write and
they use the same resource
a schedule is conflict serialisable if transactions in the schedule have a conflict, but the schedule is still serialisable

Conflict serialisable schedule
interleaved schedule
serial schedule
Transaction Operation
A write(X)
B write(X)
A write(Y)
B write(Y)

this schedule is serialisable even though A and B read and write the same resources X and Y, i.e. they have a conflict
Transaction Operation
A write(X)
A write(Y)
B write(X)
B write(Y)

Conflict serialisable schedules
the main focus of concurrency control
they allow for interleaving and at the same time they are guaranteed to behave like serial schedules
important questions:
How to determine whether a schedule is conflict serialisable?
How to construct conflict serialisable schedules?

Precedence graphs
to determine whether a schedule is serialisable or not, we build a precedence graph
nodes are transactions
edges are precedence: there is an edge from A to B if A must happen before B in any equivalent serial schedule
the schedule is serialisable if there are no cycles in its precedence graph

Precedence
let A and B be two transactions
let a be an action of A and b is an action of B
A takes precedence over B if:
a is ahead of b in the schedule
both a and b involve the same resource R
at least one of a and b is a write action
in other words, A  B if:
A read(R) followed by B write(R)
A write(R) followed by B read(R)
A write(R) followed by B write(R)

remember, A  B if:
A read(R) followed by B write(R)
A write(R) followed by B read(R)
A write(R) followed by B write(R)
Transaction Action
A write(Y)
D write(X)

this schedule is serialisable

remember, A  B if:
A read(R) followed by B write(R)
A write(R) followed by B read(R)
A write(R) followed by B write(R)
Transaction Action
A write(Y)
D write(X)
A write(Z)

this schedule is not serialisable

locking is a procedure used to control concurrent access to data by ensuring serialisability of concurrent transactions
in order to use a resource (e.g. table, row, etc.) a transaction must first acquire a lock on that resource
this may deny access to other transactions to prevent incorrect results

two types of locks:
read lock (shared lock or S–lock)
write lock (exclusive lock or X–lock)
read lock allows several transactions simultaneously to read a resource, but no transactions can change it at the same time
write lock allows one transaction exclusive access to write to a resource and no other transaction can read this resource at the same time
the lock manager in the DBMS assigns locks and records them in the data dictionary

Concurrency control by locking
let T be a transaction and R be a resource
if T holds a write lock on R, then no other transactions may lock R
if T holds a read lock on R, then no other transactions may write lock A
T must acquire a read lock on R before reading R
T must acquire a write lock on R before writing R
after using a lock on R, T must release the lock in order to free up R
if the requested lock is not available, transaction waits

Two–phase locking
a transaction follows the two-phase locking protocol (2PL) if all locking operations precede the first unlock operation in the transaction
two phases:
growing phase where locks are acquired on resources
shrinking phase where locks are released

A follows 2PL protocol
all of its locks are acquired before any of them is released
B does not follow 2PL
it releases its lock on X and then goes on acquire a lock on Y

Transaction A Transaction B
read-lock(X) read-lock(X)
read(X) read(X)
write-lock(Y) unlock(X)
unlock(X) write-lock(Y)
read(Y) read(Y)
Y = Y + X Y = Y + X
write(Y) write(Y)
unlock(Y) unlock(Y)

Serialisability theorem
Any schedule of two–phased transactions is conflict serialisable.

Lost update cannot happen with 2PL
Comment Transaction A Transaction B Comment
read–lock(C) read(C)
read(C) read–lock(C)
cannot acquire write–lock(C) because B has read–lock(C) write(C)
write(C) cannot acquire write–lock(C) because A has read–lock(C)

Uncommitted update cannot happen with 2PL
Comment Transaction A Transaction B Comment
read–lock(C) read(C)
write–lock(C) write(C)
read(C) waits for A
to release
write–lock(C)
locks released ROLLBACK

Inconsistent analysis cannot happen with 2PL
Comment Transaction A Transaction B Comment
read–lock(C) read(C)
write–lock(C) write(C)
read(C) waits for A
to release
write–lock(C) and later write–lock(D)
sum = C + D
read–lock(D) read(D)
write–lock(D) write(D)

deadlock detection
deadlock prevention
timestamping

the use of locks solves one problem (serialising schedules)
… but introduces another (deadlocked schedules)
deadlock is a situation in which two or more transactions are in a simultaneous wait state, each waiting for others to release a lock
transaction A has a lock on a resource C and is waiting for a lock on a resource D
transaction B has a lock on a resource D and is waiting for a lock on a resource C

https://www.youtube.com/watch?v=lSpNhAX7DNY

Wait–for graphs
given a schedule, potential deadlocks can be detected using a wait–for graph (WFG)
nodes are transactions
there is an edge from transaction B to transaction A if B waits for A, i.e.
A holds a lock on a resource R
B is waiting for a lock on the resource R
B cannot get the lock on R unless A releases it
if the graph does not contain cycles, then the schedule will not be deadlocked

transaction B waits for A in any of the following scenarios:
A read–locks R, then B tries to write–lock it
A write–locks R, then B tries to read–lock it
A write–locks R, then B tries to write–lock it

the transactions will be deadlocked
Transaction Action Lock Waits for
A write(P) write–lock(P)
C read(S) read–lock(S)
D read(R) read–lock(R)
C read(P) A
B write(R) D
C read(S) read–lock(S)
B read(Q) read–lock(Q)
D write(P) A
A read(Q) read–lock(Q)
A write(Q) B
B read(S) read–lock(S)

Deadlock prevention
deadlocks can arise with two–phase locking
deadlock is less of a problem than an inconsistent database
we can detect and recover from deadlock
… but it would be nice to avoid it altogether
e.g. conservative two–phase locking
all locks must be acquired before the transaction starts
low ‘lock utilisation’ – transactions can hold on to locks for a long time, but not effectively use them much

Deadlock prevention
deadlocks may be prevented if transactions are to lock resources in some arbitrary but fixed order
we impose an ordering on the resources
transactions must acquire locks in this order
transactions can be ordered on the last resource they locked
this prevents deadlock
if B is waiting for a resource from A then that resource must come after all of A’s current locks
all edges in the wait–for graph point ‘forwards’, so no cycles

Resource ordering
let the resource order be X < Y, i.e. if a transaction need locks on X and Y, it will first acquire a lock on X and only afterwards a lock on Y it is impossible to end up in a situation where: B is waiting for a lock on X held by A, and A is waiting for a lock on Y held by B therefore, no deadlocks Timestamping Timestamping transactions can run concurrently using a variety of techniques we previously looked at using locks to prevent interference an alternative technique is timestamping requires less overhead in terms of tracking locks or detecting deadlocks determines the order of transactions before they are executed Timestamping each transaction has a timestamp, TS if transaction A starts before transaction B, then TS(A) < TS(B) timestamps can be generated using the system clock or an incrementing counter each resource X has two timestamps: R(X) the largest timestamp of any transaction that has read X W(X) the largest timestamp of any transaction that has written X Timestamp protocol let T be a transaction and X be a resource If T tries to read X, then if TS(T) < W(X), then T is rolled back and restarted with a later timestamp otherwise the read succeeds and R(X) is set to be max(R(X), TS(T)) if T tries to write X, then if TS(T) < W(X) or TS(T) < R(X), then T is rolled back and restarted with a later timestamp otherwise the write succeeds and W(X) is set to TS(T) let A and B be two transactions we assume that: the transactions make alternate actions timestamps are allocated from a counter starting with 1 A goes first transactions transactions no W(X) stamp, so A succeeds in read(X) R(X) is set to TS(A), which is 1 transactions no W(X) stamp, so B succeeds in read(X) R(X) is set to max(R(X), TS(B)) = max(1, 2) = 2 transactions no W(Y) stamp, so A succeeds in read(Y) R(Y) is set to TS(A), which is 1 transactions no W(Y) stamp, so B succeeds in read(Y) R(Y) is set to max(R(Y), TS(B)) = max(1, 2) = 2 1 Y = Y + X transactions no reading or writing, so no change in timestamps 1 Y = Y + X 2 Z = Y – X transactions no reading or writing, so no change in timestamps 2 Z = Y – X 1 write(Y) transactions TS(A) = 1 < 2 = read(Y) transactions A is rolled back and restarted with a later timestamp 2 write(Z) transactions B succeeds to write(Z) W(Z) is set to TS(B), which is 2 transactions A succeeds in read(X) R(X) is set to TS(A), which is 3 transactions A succeeds in read(Y) R(Y) is set to TS(A), which is 3 3 Y = Y + X transactions no reading or writing, so no change in timestamps 3 write(Y) transactions A succeeds to write(Y) W(Y) is set to TS(A), which is 3 Timestamping transactions with higher timestamps take precedence equivalent to running transactions in order of their final time values no waiting, no deadlock disadvantages long transactions might keep getting restarted by new transactions rolls back old transactions, which may have done a lot of work /docProps/thumbnail.jpeg 程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com