Concurrency control (1)
Permits the concurrent execution different transactions
• Crucial in information systems with high load • banks,financialinstitution,airlinebooking
• Permits to efficiently use the DBMS
Copyright By PowCoder代写 加微信 powcoder
• Maximizing the number of served transactions • Minimizing response times
• Application load measured in transactions per second (tps) • Typical values: from tens to thousands
Concurrency control (2)
Mediates access requests to the data
• Decides to grant or deny requests
• Establishes the order of accesses (scheduler)
Concurrency control (3)
For the sake of lectures, we consider two simplifying assumptions:
• Databases in terms of abstract objects x, y, z with numerical values
• Read/write operations on object x, as read/write of the whole page where x is stored
Concurrency control: architecture
read, write begin, commit, abort Lock table
read, write
(not all and possibly in different order)
Access methods manager
Transaction manager
Concurrency manager
Secondary storage manager
Concurrency control (4)
The concurrent execution of different transactions can cause correctness issues and anomalies
• Update loss
• Dirty read
• Inconsistent reads • Ghost update
• Ghost insert
Update loss
Updates by a transaction lost because overwritten by a concurrent transaction
Transaction t1
x := x + 1
w1(x) commit
Transaction t2
x := x + 1
w2(x) Initial value: 2
commit Final value: 3 instead of 4
(the update by t1 is lost)
Dirty read
A transaction reads the intermediate result of another transaction that then aborts (whose modification is then cancelled)
Transaction t1
x := x + 1 w1(x)
Transaction t2
x := x + 1
w2(x) commit
t2 reads an intemediate state which is then cancelled
Inconsistent reads
A transaction reads objects that another transaction is modifying: some read operations are before, others are after the updates
Transaction t1
bot r1(x) w1(y)
r1(x) w1(z) commit
Transaction t2
x := x + 1 w2(x) commit
t1 reads different values for x
Ghost update
A transaction observes only a subset of the effects of another transaction (observing a status of the data that does not satisfy
integrity constraints)
Transaction t1 bot
Transaction t2 bot
y := y – 100 r2(z)
z := z + 100 w2(y)
w2(z) commit
r1(x) r1(y)
Constraint: x+y+z=1000;
s=1100 but the constraint is satisfied
s := x + y + z commit
Ghost insert
A transaction evaluates twice an aggregated value about a set of elements in a selection condition
select avg(Mark) from Exam
where Subject=‘IM’
If a new tuple is inserted between an evaluation and the subsequent one, the results may be different
The anomaly cannot be prevented referring only to already known data
Concurrency control theory
Formally a transaction
• is a sequence of read and write operations
• has a transaction identifier assigned by the system
• starts with begin transaction and ends with end transaction (omitted)
t1: r1(x) r1(y) w1(x) w1(y)
Sequence of input/output operations presented by concurrent transactions
• all the operations of each transaction that committed must appear in the schedule
• for each transaction, operations must appear in the schedule in the same order as in the transaction
For the study:
• we consider the commit-projection and ignore transactions that
(simplifying assumption, not acceptable in practice, will be removed afterwards)
Schedule: example
Transactions
• t1 : r1(x) w1(x) r1(y) w1(y) • t2 : r2(y) w2(y)
Some possible schedules
• S1 : r1(x) w1(x) r1(y) w1(y) r2(y) w2(y) • S2 : r2(y) w2(y) r1(x) w1(x) r1(y) w1(y) • S3 : r1(x) r2(y) w1(x) r1(y) w2(y) w1(y) • S4 : r2(y) r1(x) w1(x) r1(y) w1(y) w2(y)
Concurrency control
• Avoids schedules causing anomalies
• Managed by a module that accepts or refuses operations requested
by transactions (scheduler)
• Based on the identification of classes of acceptable schedules based on definitions of equivalence
• serial schedule
• serializable schedule
Serial and serializable schedules
Serial schedule
• For each transaction ti in the schedule, all the operations in ti are
executed consecutively
• n transactions Þ n! possible serial schedules
Serializable schedule
• non-serial schedule that produces the same result as a serial schedule of the same transactions
• need definition of equivalence among schedules
• progressive concepts: view-equivalence, conflict-equivalence, two-phase locking, timestamp-based
Serial schedule: example
Transactions
• t1 : r1(x) w1(x) r1(y) w1(y) • t2 : r2(y) w2(y)
Serial schedules:
• S1 : r1(x) w1(x) r1(y) w1(y) r2(y) w2(y) • S2 : r2(y) w2(y) r1(x) w1(x) r1(y) w1(y)
Possible schedules:
• S3 : r1(x) r2(y) w1(x) r1(y) w2(y) w1(y) • S4 : r2(y) r1(x) w1(x) r1(y) w1(y) w2(y)
View-serializability (1)
Preliminary definitions
• ri(x) reads-from wj(x) in a schedule S if:
• wj(x) precedes ri(x) in S
• there is no wk(x) between wj(x) and ri(x) in S
• wi(x) is a final write in a schedule S if: • itisthelastwriteonxinS
S: r1(x) w1(x) w1(y) r2(x) w2(y) • r2(x) read-from w1(x)
• w1(x) final write on x
• w2(y) final write on y
View-serializability (2)
View-equivalence
• two schedules Si and Sj (with i ≠ j) are view-equivalent (Si ov Sj) if they have
• the same read-from relations • the same final writes
View-serializability
• a schedule is view-serializable if it is • view-equivalent to a serial schedule
We denote with VSR the set of view-serializable schedules
View-serializability: example (1)
S: w0(x) r2(x) r1(x) w2(x) w2(z)
View-serializability: example (1)
S: w0(x) r2(x) r1(x) w2(x) w2(z)
• read-from:
• r2(x) ¬ w0(x)
• r1(x) ¬ w0(x)
• final write: • x: w2(x) • z: w2(z)
• view-equivalent to the serial schedule w0(x) r1(x) r2(x) w2(x) w2(z)
Þ view-serializable
View-serializability: example (2)
S: w0(x) r1(x) w1(x) r2(x) w1(z)
View-serializability: example (2)
S: w0(x) r1(x) w1(x) r2(x) w1(z)
• read-from:
• r1(x) ¬ w0(x)
• r2(x) ¬ w1(x)
• final write: • x: w1(x) • z: w1(z)
• view-equivalent to the serial schedule w0(x) r1(x) w1(x) w1(z) r2(x)
Þ view-serializable
View-serializability: example (3)
S: r1(x) r2(x) w2(x) w1(x)
View-serializability: example (3)
S: r1(x) r2(x) w2(x) w1(x)
• read-from: • r1(x) ¬
• final write: • x: w1(x)
Þ not view-serializable (loss of update)
View-serializability: example (4)
S: r1(x) r2(x) w2(x) r1(x)
View-serializability: example (4)
S: r1(x) r2(x) w2(x) r1(x)
• read-from: • r1(x) ¬
• r1(x) ¬ w2(x) • final write:
• x: w2(x)
Þ not view-serializable (inconsistent read)
View-serializability: example (5)
S: r1(x) r1(y) r2(z) r2(y) w2(y) w2(z) r1(z)
View-serializability: example (5)
S: r1(x) r1(y) r2(z) r2(y) w2(y) w2(z) r1(z)
• read-from: • r1(x) ¬ • r1(y) ¬ • r2(z) ¬
• r1(z) ¬ w2(z)
• final write: • y: w2(y) • z: w2(z)
Þ not view-serializable (ghost update)
VSR: exercise
w1(x) r1(x) w2(y) r3(x) w1(t) r2(y) r3(t) w3(x) w2(x) r2(t)
View-serializability: complexity
• Deciding if two schedules are view-equivalent: polynomial cost
• Deciding view-serializability of an arbitrary schedule: NP-hard problem
• it is necessary to compare the schedule with all the possible serial schedules
Note: view-equivalence not usable in practice due to its complexity • It is necessary to define more restrictive but practicable conditions
Conflict between operations
• Two operations ai and aj (i ≠ j) are in conflict if • they belong to two different transactions
• they access the same object
• at least one is a write
• read-write conflicts (rw o wr) • write-write conflicts (ww)
Conflict: example
t1: r1(x) r1(y) w1(x) w1(y) t2: r2(y) r2(x) w2(y)
Conflicts:
• w1(x), r2(x) • w1(y), r2(y) • w1(y), w2(y) • r1(y), w2(y)
Conflict-serializability
Conflict-equivalence
• two schedules Si and Sj (i ≠ j) are conflict-equivalent (Si oC Sj) if
• they include the same operations
• each pair of conflicting operations appear in the same order in both schedules
Conflict-serializability
• a schedule is conflict-serializable if it is • conflict-equivalent to a serial schedule
We use notation CSR to denote the set of conflict-serializable schedules
Conflict-serializability: example
S: w0(x) r1(x) w0(z) r1(z) r2(x) r3(z) w3(z) w1(x)
Conflict-serializability: example
S: w0(x) r1(x) w0(z) r1(z) r2(x) r3(z) w3(z) w1(x)
• conflicts:
• w0(x), r1(x)
• w0(x), r2(x) • w0(x), w1(x) • w0(z), r1(z) • w0(z), r3(z) • w0(z), w3(z) • r1(z), w3(z) • r2(x), w1(x)
Conflict-equivalent to serial schedule w0(x) w0(z) r2(x) r1(x) r1(z) w1(x) r3(z) w3(z)
Conflict-serializability: conflict graph
Could be evaluated building the conflict graph:
• each transaction is represented by a node
• for each pair ai and aj of operations in conflict such that ai precedes aj in the schedule, there is an edge from ti (transaction of ai) to tj (transaction of aj )
A schedule is conflict-serializable if and only if the graph is acyclic: it is conflict-equivalent to all the topological orders of the graph
Conflict graph: example
S: w0(x) r1(x) w0(z) r1(z) r2(x) r3(z) w3(z) w1(x)
• conflicts:
• w0(x), r1(x)
• w0(x), r2(x) • w0(x), w1(x) • w0(z), r1(z) • w0(z), r3(z) • w0(z), w3(z) • r1(z), w3(z) • r2(x), w1(x)
Conflict graph: example
S: w0(x) r1(x) w0(z) r1(z) r2(x) r3(z) w3(z) w1(x)
• conflicts:
• w0(x), r1(x)
• w0(x), r2(x) • w0(x), w1(x) • w0(z), r1(z) • w0(z), r3(z) • w0(z), w3(z) • r1(z), w3(z) • r2(x), w1(x)
Conflict-equivalent to serial schedule w0(x) w0(z) r2(x) r1(x) r1(z) w1(x) r3(z) w3(z)
CSR vs VSR
CSR Í VSR
• SÎCSRÞSÎVSR • SÏVSRÞSÏCSR
• SÏCSRÞdonotknowifVSR • SÎVSRÞdonotknowifCSR
CSR vs VSR: example
t1 : r1(x) w1(x)
t2 : w2(x) blind write t3 : w3(x) blind write
S : r1(x) w2(x) w1(x) w3(x) • S is view-serializable
• view-equivalent to t1, t2, t3 • S is not conflict-serializable
Conflict-serializability: complexity
• Deciding if a schedule is conflict-serializable has linear cost in the size of the graph
• However, it is practically too expensive, especially in case of distributed data
• The mechanisms adopted in practice are based on more restrictive, but also more efficient, conditions
VSR-CSR: exercise (1)
• r1(x) w2(y) w2(x) w1(y) r3(x) r3(y) w3(x) w3(y)
VSR-CSR: exercise (2)
• w2(x) r2(y) r1(x) r3(y) r2(z) w3(x) w4(y) w1(z) w4(z)
VSR-CSR: exercise (3)
• r1(x) w1(t) r3(y) w3(y) w1(x) w1(y) r2(t) r3(z) w3(t) r2(x) w2(z) w4(t)
VSR-CSR: exercise (4)
• w2(y) r3(z) w2(x) r1(y) w3(z) r4(x) w1(z) w3(x) w1(x)
Concurrency control
Systems used in practice do not evaluate serializability a-posteriori, but operate in such a way to guarantee such a property
Possible techniques:
• two phase locking
• timestamp
• mono-version • multi-version
Variable describing the status of data wrt the operations that can be executed over it
• two states (binary) lock • unlocked (0)
• locked (1)
• three states lock • unlocked
• r_locked (read, shared)
• w_locked (write, exclusive)
Locking: two states
• To access data, the transaction must ask a lock on the data
• The transaction releases the lock (unlock) when it does not need the
data anymore
• A transaction can access data only if it obtained the corresponding lock
A transaction that follows these rules is said to be well formed according to locking
Locking: three states
• To access data for reading it, the transaction must ask a r_lock on the data (shared)
• To access data for writing it, the transaction must ask a w_lock on the data (exclusive)
• The transaction releases the lock (unlock) when it does not need the data anymore
• A transaction can access data only if it obtained the corresponding lock
A transaction that follows these rules is said to be well formed according to locking
Lock management
Module that acts as lock manager
• Receives lock requests
• Grants or denies lock requests based on conflict tables
• If the request is denied, the requesting transaction is in wait status until the
lock is granted
• Properly modifies lock tables (status and possibly counter)
Conflict table: two states lock
Conflict table: three states lock
OK (c=c+1) r_locked
no* r_locked
OK (c=c-1) c=0: unlocked c>0: r_locked
*: unless the case of upgrade of an exclusive r_lock
Two phase locking
A transaction, after releasing a lock, cannot acquire other locks
• Each transaction has two phases • Increasing (acquires locks)
• Decreasing (releases locks) Guarantees serializability
Resource requests
increasing phase
decreasing phase
2PL schedule
The schedules produced by a scheduler using
• well formed transactions according to locking • conflict-based lock management
• two phase locking
belong to the 2PL class
2PL vs CSR vs VSR
2PL Í CSR Í VSR
• SÎ2PLÞSÎCSR • SÏCSRÞSÏ2PL
• SÏ2PLÞdonotknowifCSR • SÎCSRÞdonotknowif2PL
2PL vs CSR: example
S: r1(x) w1(x) r2(x) w2(x) r3(y) w1(y) • conflict-serializable
• conflict-equivalent to t3, t1, t2 • not 2PL
2PL: exercise (1)
• r1(x) w1(z) w2(t) r3(y) r4(y) w3(t) r1(z) w2(x) r4(x) w4(y)
2PL: exercise (2)
• r1(x) w3(t) r2(y) r2(z) w2(y) w1(z) r3(k) w2(t) w1(y)
Strict two phase locking
The scheduler operates without knowing how transactions will end, hence:
• we need to remove the hypothesis of using a commit-projection • we add a restriction to 2PL
Strict 2PL (used by commercial DBMSs)
• a transaction, after releasing a lock, cannot acquire other locks
• the locks of a transaction can be released only after commit/abort operations
Variations over 2PL
• Base 2PL
• a transaction, after releasing a lock, cannot acquire other locks
• Strict 2PL (prevents dirty reads) • base 2PL
• a transaction can release locks only after commit/abort
• Conservative 2PL (guarantees absence of deadlock) • base 2PL
• a transaction must acquire all the locks before starting executing operations
Concurrency control
Systems used in practice do not evaluate serializability a-posteriori, but operate in such a way to guarantee such a property
Possible techniques:
• two phase locking
• timestamp
• mono-version
• multi-version
Identifier that defines a total order among the temporal events of the system
• the timestamp of an event is greater than the timestamps of the events that preceded it
• no two events have the same timestamp
Possible implementations: • counter
• system clock
Transactions and timestamp
Each transaction is associated with a timestamp (ts)
• assigned by the system at the beginning of the transaction • represents the time at which the transaction started
A schedule is accepted only if it it reflects the serial order of transactions based on the timestamp value of the transactions
Timestamp (1)
Each object x has two indicators:
• RTM(x): max timestamp among transactions that have read x
• WTM(x): max timestamp among transactions that have written x
Timestamp (2)
Answer of the scheduler to requests
• read(x, ts)
• ts < WTM(x) à request denied, transaction aborted
• Otherwiseàrequest accepted; RTM(x) := max(RTM(x),ts)
• write(x, ts)
• ts < WTM(x) or ts < RTM(x)àrequest denied, transaction aborted • Otherwiseàrequest accepted; WTM(x) := ts
Timestamp: example
write(x,8)
write(x,11)
read(x,10)
Timestamp: example
write(x,8)
no (t8 killed)
write(x,11)
read(x,10)
no (t10 killed)
Timestamp: exercise
write(x,7)
read(x,10)
write(x,8)
read(x,13)
read(x,11)
Concurrency control with timestamps
• there is no lock request on resources Þ free from deadlock
• may kill many transactions
• behaves correctly only under the commit-projection assumption
• to remove the assumption it is necessary to buffer write operations (write on persistent memory only after commit)
• read operations of buffer data wait for commit of the writing transaction
Timestamp and multiversion
Each write operation of an object generates a new copy (version) of the same
• read operations requested by ‘old’ transactions are executed on old versions (instead of being denied and transactions aborted)
For each object x:
• there are multiple versions xi each with its own RTM and WTM • each write operation generates a new version
• each read operation operates on the version created by the most recent write operation that precedes the read operation
Transactions and timestamp with multiversion
Answer of the scheduler to requests
• read(x, ts)
• request always accepted
• read version xk such that WTM(xk) = max{WTM(xi) | WTM(xi) ≤ ts } • RTM(xk):= max(RTM(xk),ts)
• write(x, ts)
• find version xj such that WTM(xj) = max{WTM(xi) | WTM(xi) ≤ ts } • ts < RTM(xj)àrequest denied, transaction aborted
• Otherwiseàrequest accepted; create new version xk and RTM(xk):=ts
WTM(xk):=ts
Timestamp and multiversion: example
write(x,7)
write(x,12)
read(x,14)
write(x,13)
write(x,10)
read(x,11)
Timestamp and multiversion: example
write(x,7)
no (t7 killed)
write(x,12)
read(x,14)
write(x,13)
no (t13 killed)
write(x,10)
read(x,11)
Timestamp and multiversion: exercise
write(x, 7)
read(x, 15)
write(x, 10)
read(x, 8)
write(x, 18)
read(x, 6)
write(x, 16)
TSvs2PLvsCSRvsVSR
• SÎTSÞSÎCSR • SÏCSRÞSÏTS
• SÏTSÞdonotknowifCSR • SÎCSRÞdonotknowifTS
transactions with denied actions
killed and restarted
serialization order
implied by conflicts
implied by timestamp
wait for commit
strict 2PL
write buffering
It is more expensive to re-execute transactions than put them in wait Þ 2PL is better
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com