CS5481 Data Engineering Tutorial 10
1. Consider the expression r1 ⋈ r2 and the following numerical information. r1 with schema R1 is stored at S1 and r2 with schema R2 is stored at S2 and the result is needed at S1.
Size of r1: 300,000 bytes
Size of r2: 180,000 bytes
Number of tuples in r1: 20,000
Number of tuples in r2: 15,000
SizeofR1 R2:9bytes
R1 R2 in R1 is a foreign key referencing R2
Determine whether semi‐join strategy should be used in the following two cases.
a) 2/3 tuples of r2 join with some tuples of r1.
b) Only 1/3 tuples of r2 join with some tuples of r1.
a)
Strategy 1: Ship r2 to S1 Transmission cost = 180,000 bytes
Strategy 2: Semijoin
1. Compute temp1 R1 R2 (r1) at S1
2. Ship temp1 from S1 to S2; transmission cost = 90,000 bytes 3. Compute temp2 r2 ⋈ temp1 at S2.
4. Ship temp2 from S2 to S1; transmission cost = 120,000 bytes 5. Compute r1 ⋈ temp2 at S1.
Total transmission cost = 90,000 + 120,000 = 210,000 bytes
Strategy 1 has a lower transmission cost.
b)
Strategy 1: Ship r2 to S1 Transmission cost = 180,000 bytes
Strategy 2: Semijoin
1. Compute temp1 R1 R2 (r1) at S1
2. Ship temp1 from S1 to S2; transmission cost = 45,000 bytes 3. Compute temp2 r2 ⋈ temp1 at S2.
4. Ship temp2 from S2 to S1; transmission cost = 60,000 bytes 5. Compute r1 ⋈ temp2 at S1.
Total transmission cost = 45,000 + 60,000 = 105,000 bytes
Strategy 2 has a lower transmission cost.
CS5481 Data Engineering Tutorial 10
2. Consider the following schedule.
r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y);
(a) Should the schedule be allowed? Why?
(b) Drawaprecedencegraphfortheschedule.
(c) Is this schedule conflict‐serializable?
(a) The schedule should not be allowed. There is a lost update problem: The final value of X is incorrect because T2 reads the value of X before T1 changes it in the database, and hence the updated value resulting from T1 is lost.
(b) T2T1T2
(c) The schedule is not conflict‐serializable.
3. Consider the following schedule. r1(X); w1(X); r2(X); w2(X); r1(Y);
(a) Is there any problem if T1 fails at the end of the schedule?
(b) Is there any problem if T2 commits followed by T1 fails at the end of the schedule?
(c) What can be done to fix the problem in (b)?
(a) Dirty read problem: T1 updates X and then fails, so the system must change X back to its original value. Before it can do so, however, T2 reads the temporary value of X, which will not be recorded permanently in the database because of the failure of T1. The value of X that is read by T2 is called dirty data because it has been created by a transaction that has not completed and committed yet.
(b) The schedule becomes unrecoverable because T2 reads X from T1, but T1 aborts after T2 commits. Then, the value of X that T2 read is no longer valid and T2 must be aborted after it is committed, leading to a schedule that is not recoverable.
(c) The commit instruction of T2 must be postponed until after T1 commits. So, if T1 aborts, then T2 can also abort.
4. Consider the following schedule in which T3 is calculating a total of data items A to Y. r3(A); … r1(X); w1(X); r3(X); r3(Y); r1(Y); w1(Y);
(a) Is the total calculated by T3 correct? Why?
(b) Drawaprecedencegraphfortheschedule.
(c) Is this schedule conflict‐serializable?
(a) The total calculated by T3 is incorrect. There is an incorrect summary problem: T3 is calculating an aggregate summary function (such as total) while T1 is updating X and Y. The aggregate function calculates Y before it is updated and X after it is updated.
(b) T1T3T1
(c) The schedule is not conflict‐serializable.
5. Consider the following schedule that involves three transactions, T1, T2, and T3. r2(A); r1(B); w2(A); r3(A); w1(B); w3(A); r2(B); w2(B);
(a) Drawaprecedencegraphfortheschedule.
(b) Isthisscheduleconflict‐serializable?
(c) Construct a conflict‐equivalent serial schedule by swapping instructions.
(a) T1T2T3
(b) Thisscheduleisconflict‐serializable.
(c) r1(B); w1(B); r2(A); w2(A); r2(B); w2(B); r3(A); w3(A);