1. Storage
1. (20 points) For the following queries, estimate their disk access costs using the following access paths. Re- lation R has an attribute x that is its primary key. The table has 100 tuples. You may give R.x a range of 1 . . . 100 since primary keys admit no repeated values. Given the following constants:
• D=timetoreadorwriteapage(15ms) • B=#ofpagesinr(100pages)
• F = fanout of tree index (fanout of 10)
Copyright By PowCoder代写 加微信 powcoder
What are the costs associated with the following database configurations? You may express your answer in formulas.
Costs: c = comparison time n = # of matching records p = # of matching pages
Heap Sorted File on x
Unclustered B+-Tree on x D(1+
+logF 0.15B)
D(logF 0.15B +r)
D(logF 0.15B +r)
Clustered B+-Tree on x DlogF 1.5B
DlogF 1.5B + p DlogF 1.5B + p
Hash Index on x
BD+c(rlog2r)
SELECT * 0.5BD FROM t
WHERE x = 5;
SELECT * BD FROM t
WHERE x > 10;
SELECT * BD+c(rlog2r) FROM t
WHERE x > 7
ORDER BY x;
Dlog2B Dlog2B + p Dlog2B + p
You run this workload many times on your database (varying the constants with which x is compared, e.g., 10, 55, and 70 above) on your database and that these are the only queries you will pose to it. The 3 queries are issued at the same frequency. If I/O is your only cost, how would you design your database?
(a) (5 points) Relation r should be stored as a: A. heap file
B. sorted on x
(b) (5 points)Given this table layout, we should index x with a: A. clustered B+-tree index
B. unclustered B+-tree index C. hash index
2. Indexing
2. (15 points) Draw an extendible hash index that will halve its directory size with the removal of one element. Assume that the index has a minimum directory size of 4. Specify the element to remove below.
Remove entry 4 to halve the directory size to 2 bits.
3. (20 points) Starting with an empty B+-tree insert the values below using the configuration given. Show the resulting index.
a. (10 points) Order d = 1, Keys: 2, 14, 7, 9, 11, 24, 10, 21, 64, 72 Solution:
b. (10points)Orderd=2,Keys: 1,5,7,9,2,12,25,15 Solution:
3. Query Optimization
4. (15 points) Draw a query plan for:
SELECT a, b, c, d
FROM r, s, t
WHERE r.x > 7 AND r.id = s.id AND s.fid = t.fid;
Do not allow any cross products. Given the following:
• r has 500 tuples
• s has 700 tuples
• t has 1,000 tuples
• r.x has a range of 1…10 inclusive
• r.id has a range of 1…50 inclusive
• s.id has a range of 1…100 inclusive • s.fidhasarangeof1…100inclusive • t.fidhasarangeof1…150inclusive
Estimate the cardinality of each edge in your query plan’s directed acyclic graph.
πa,b,c,d 630×1,000× 1 =4200
700×90× 1 =630 100
◃▹s.f id=t.f id 1,000
500× 3 =90 10
◃▹r.id=s.id T
4. Transactions
5. (10 points) For the following schedules, determine if they are conflict serializable. If so, give a correct serial ordering for this schedule (e.g., T1, T2). Otherwise, show why serialization is not possible.
T1 1. RA 2. WA
1. RA 2. RB
(a) Serializable as schedule (T1, T2) by moving T1(RC) before T2’s operations. This access to C has no conflicts with the reads and writes to A and B in T2.
(b) Not conflict serializable. Conflicts that create a loop in precedence graph are (2, 6), (7, 3).
6. (10 points) Determine if the following transaction lock requests will result in deadlock. If so, identify the queries that are waiting on one another. Otherwise, give the order in which the transactions will complete. Assume rigorous two-phase locking.
(a) T1(RI), T2(RJ), T2(WI), T3(RJ), T1(WK),T1(Commit), T3(WJ), T3(RK), T3(Commit), T2(Commit)
(b) T3(RX), T2(RY), T2(WY), T1(WX), T3(RY), T1(WZ), T2(RZ), T1(WY) T1(Commit), T2(Commit), T3(Commit)
(a) T1 |-RI————–WK–| Commit
T2 |—–RJ–WI———————–| Commit T3 |————–RJ——–WJ–RK–| Commit
T2 waits for T1: (WI → RI) T3 waits for T1: (RK → WK) T3 waits for T2: (WJ → RJ)
No cycles so no deadlocks.
4. RB 5. WA 6. WB Commit
5. RB 6. WB 7. WC
T1 |————–WX——WZ——WY–| Commit
T2 |——RY–WY————–RZ——–| Commit
T3 |–RX————–RY——————| Commit
T1 waits for T3: (WX → RX) T3 waits for T2: (RY → WY)
No cycles for deadlock.
Concurrency Control
7. (10 points) Are the following queries potentially susceptible to the phantom problem? Assume that the queries run simultaneously and have interleaving operations.
FROM courses
WHERE dept=’eecs’;
SELECT COUNT(*)
FROM animals
WHERE age = 7;
DELETE FROM animals
WHERE id > 1000;
(b) Assume the courses relation has the attributes (dept, course_no).
INSERT INTO courses
VALUES (’chem’, 101);
(a) Yes, this is potentially subject to the phantom problem because the deletes in T2 may overlap with the tuples accessed by COUNT(*).
(b) No, this schedule is not vulnerable to this issue because the tuples accessed by each transaction are non-intersecting. The chemistry class would not appear in the EECS course list.
8. (10 points) Show the locks and lock points for the following schedules. Assume rigorous two-phase locking.
T1 RA RB WA
WB RA RC Commit
Locks(1) SA
XA (upgrade)
— lock pt.
XA (upgrade) — lock pt.
— lock pt.
Locks(2) SB
XB (upgrade)
— lock pt.
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com