1. Query Execution Modeling
1. (6 points) If you were selecting a join algorithm for your query, under what conditions would you prefer each of the following algorithms? (2 points each)
(a) Sort-merge join
When there is an “interesting sort order” in the larger query. This usually arises from an ORDER BY clause.
Copyright By PowCoder代写 加微信 powcoder
(b) Nested loop join
When the join inputs are small and one or both of the relations fit into the buffer pool.
(c) Hash join
All other cases.
2. (15 points) Consider the following query:
FROM r, s, t
WHERE r.id = s.id AND s.id = t.id AND r.zip = 60208;
It has the execution tree:
50 × 750 × 1 max(50,75)
= 500 tuples
◃▹ I D max(50,100)
50×100× 1 = 50 tuples
750 tuples T
500 × 1 = 50 tuples 10
100 tuples S
500 tuples
σzip=60208
We know the following statistics about the data:
• |R| = 500, ICARD(R.zip) = 10, ICARD(R.id) = 50 • |S| = 100, ICARD(S.id) = 100
• |T| = 750, ICARD(T.id) = 75
Label the estimated cardinality of each edge in the query tree.
2. Query Optimization
3. (25 points) Given the query:
FROM a, b, c, d
WHERE a.id = b.id AND b.id = c.id AND c.x = d.x;
Let’s characterize the complexity of optimizing this query. Hint: n = n!
(a) What is the maximum number of possible plans produced by this query? Assume that we have just one
join operator implementation. Consider all possible parenthesizations. (5 points) n!×(2(n−1)/n) =⇒ 4!×6/4=24×20 =120possibleplans.
(b) What pairs of relations would produce cross products if they were joined in a single operator during
this query’s execution? (5 points)
Pairs: AD and BD (aka DA and DB)
(c) How many join orderings are possible for the relations in this query plan? (5 points)
(d) What fraction of these orders are not considered owing to the “no cross products” heuristic in the Selinger query optimizer? Recall that only left-deep plans are allowed in this setup. (10 points)
• ABCD • BACD • CABD • ACBD • BCAD • CBAD • CBDA • BCDA • DCBA • CDBA • BDCA • DBCA • DACB • ADCB • CDAB • DCAB • ACDB • CADB • BADC • ABDC • DBAC • BDAC • ADBC • DABC
10 ≈ 42% of join orders are eliminated. 24
Intuition: disqualified if D before C in order and not (CD) at the beginning.
3. Transactions
4. (9 points) For the following transaction schedules, determine if they are view serializable, conflict serializable, both, or neither. (3 points each)
Transaction 1 1. RA
Transaction 2
5. RC 6. WA
7. WA 8. WC commit
(a) 3. WA 4. WC
View Serializable Conflict Serializable Both
Transaction 1
1. RB 2. RC 3. RA 4. WA
Transaction 2 6. WA
View Serializable Conflict Serializable Both
Transaction 1
2. WA 3. WB commit
Transaction 2
4. RC 5. WA
6. WB 7. WA commit
View Serializable Conflict Serializable Both
5. (15 points) For each of the following transaction schedules select whether or not they are potentially vul- nerable to the phantom problem. (5 points each)
Transaction 1
FROM students
WHERE age > 30;
SELECT COUNT(*)
FROM students
WHERE major=’history’;
Transaction 2
DELETE FROM STUDENTS
WHERE status = ’graduated’;
Transaction 1
SELECT avg(height)
FROM athletes;
INSERT INTO athletes
VALUES (’James’, ’basketball’);
Vulnerable to the phantom problem?
Transaction 2
FROM athletes
WHERE sport = ’basketball’;
SELECT sport, COUNT(*)
FROM athletes
GROUP BY sport;
Vulnerable to the phantom problem?
T1 may reference students who are greater than 30 years old, who have graduated, and were history ma jors.
The athletes T2 views when selecting for basketball may differ from the ones it sees when counting the athletes per sport.
Transaction 1
INSERT INTO classrooms
VALUES (’Tech’, ’Auditorium’, 600)
SELECT building, max(seating_capacity)
FROM classrooms
GROUP BY building;
Vulnerable to the phantom problem?
Transaction 2
SELECT COUNT(*)
FROM classrooms
WHERE building = ’Tech’;
All writes happen before any transaction returns results. So the set of tuples these transactions are querying is consistent.
4. Concurrency Control
6. (15 points) For each of the following schedules, determine if they will result in deadlock. Assume we are using rigorous two-phase locking. Draw a waits-for graph for each of them with all of the edges that occurred during this schedule. (7.5 points each)
(a) T1(RA), T1(RB), T3(RA), T2(RB), T1(WB), T3(WC), T2(RC), T3(RB), T3(Commit), T2(RA), T2(Commit), T1(WA), T1(Commit)
Does a deadlock happen? YES NO Waits-for-graph:
(b) T1(RA), T3(RD), T2(WB), T3(RB), T4(RA), T2(WC), T1(WA), T2(RA), T4(RC), T1(Commit),
T2(WD), T3(Commit), T2(Commit), T4(Commit) Does a deadlock happen? YES NO
Waits-for-graph:
7. (15 points) The following transactions are run using optimistic concurrency control using serial validation. All of the transactions in a batch start at the same time. Determine the transactions that commit and the ones that abort. The timestamp associated with each transaction indicates the order in which it completed its read phase. (7.5 points each)
• T1: RA, WA, RB, WB • T2: RC, WC, RD, WB • T3: RB, WB, RC, WC
What transactions committed? Which ones aborted?
T1 commits because it has no competition when it enters its validation phase. T2 commits because its
read set does not intersect the write set of T1. T3 aborts because it reads B and C, and those were written by earlier transactions.
• T1: RA, RB, RC, WD • T2: RA, RB, WC, WB • T3: WA, WB, WC, WD • T4: RA, RB, RC, RD
What transactions committed? Which ones aborted?
T1 commits because it has no competition when it enters its validation phase. T2 commits because its
read set does not intersect the write set of T1. Because we are doing serial validation, T3 will overwrite the values installed by T1 and T2. T4 reads objects that are potentially dirty and thus aborts.
5. Recovery
8. (10 points (bonus)) Given the following transactions
T1 T2 T3 RA RA RC RC WB WC WB RA WC WB
This is not a schedule nor an interleaving of their work, just a set of operations for each one. If we are using rigorous two-phase locking and have the following log:
1 T1 BEGIN
2 T2 BEGIN
3 T3 BEGIN
4 T2 UPDATE B
5 T2 COMMIT
6 T3 UPDATE C
(a) What should appear in the place of ******? 8 T3 UPDATE B
(b) Give the rest of this recovery log if all transactions commit.
9 T3 COMMIT
11 T1 UPDATE B
12 T1 UPDATE C
13 T1 COMMIT
T1 cannot make progress until T3 releases its exclusive lock on C. So all of T1’s work must be last.
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com