程序代写代做 graph algorithm concurrency go C chain ER Hive database Contents:

Contents:
CS W186 Fall 2019 Final
Do not turn this page until instructed to start the exam.
• You should receive one double-sided answer sheet and a 28-page exam packet.
• The midterm has 11 questions, each with multiple parts, and worth a total of 153.5 points.
Taking the exam:
• You have 170 minutes to complete the midterm.
• All answers should be written on the answer sheet. The exam packet will be collected but not graded.
• For each question, place only your final answer on the answer sheet; do not show work.
• For multiple choice questions, please fill in the bubble or box completely as shown on the left below. Answers marking the bubble or box with an X or check mark may receive a point penalty.
• A blank page is provided at the end of the exam packet for use as scratch paper.
Aids:
• You are allowed three handwritten 8.5” × 11” double-sided pages of notes.
• No electronic devices are allowed on this exam. No calculators, tablets, phones, smartwatches, etc.
Grading Notes:
• All I/Os must be written as integers. There is no such thing as 1.02 I/Os – that is actually 2 I/Os. • 1 KB = 1024 bytes. We will be using powers of 2, not powers of 10
• Unsimplified answers, like those left in log format, will receive a point penalty.
0 Pre-Exam Questions (0 points)
1. (0.0001 points) How many hours of week should a head TA work while concussed on medical leave? 2. (0.0001 points) What was the maximum attainable selection index for the PSAT in 2013-14?

1 Broken Joins (12 points)
1. (1 point) Select all possible implementations that can be used for the following join:
SELECT * FROM R, S WHERE R.a != S.b;
A. Page Nested Loop Join B. Block Nested Loop Join C. Index Nested Loop Join D. Grace Hash Join
E. Sort Merge Join
2. (3 points) The inventor of Block Nested Loop Join has decided that block sizes are now always going tobeB/2. WhatistheminI/OcostofjoiningR,Swhere[R]=70pages,[S]=50pages,andB=4 pages.
3. (4 points) From the depths of the 186 archives, we have retrieved 2 files G, S dating back to 1998 where G = grades and S = students. [G] = 110 pages, [S] = 25 pages, and B = 6 pages. 80% of values in G’s join column are duplicates while the other 20% are unique values. Assume a perfect hash function is used to partition G and that its duplicates are all mapped to partition 1. S’s join column consists of unique values and an imperfect hash function is used to create the following partitions of S:
• partition 1: 6 pages • partition 2: 3 pages • partition 3: 4 pages • partition 4: 8 pages • partition 5: 4 pages
What is the total I/O cost of performing Grace Hash Join on G, S assuming that perfect hash functions are used on each partitioning phase following the initial partitioning phase?
4. (3 points) Due to old age, our machine has forgotten how to sort. Luckily, we have indexes on the join columns between R and S. R’s index is an Alt 2 unclustered index with a height of 2 and contains 10 leaf pages. S’s index is an Alt 2 clustered index with a height of 2 and contains 12 leaf pages. R consists of 30 pages with 4 records on each page and S consists of 40 pages with 5 records on each page. What is the average cost of performing Sort Merge Join on R and S?
You can assume that resetting to a record will never require accessing a previous leaf page or a previous data page and that there are enough memory pages to satisfy any operation.
Page 2 of 28

5. (1 point) Grace Hash Join uses exactly two hash functions: 1 for partitioning and 1 for the in-memory hash table.
A. True B. False
Page 3 of 28

2 Concussion Recovery (@lakya) (11 points)
Here’s a list of buzzwords about databases that investors love to hear:
• Atomicity • Durability • Flush
• Force
• Memory Usage • No-Force
• No-Steal
• Performance
• Redo
• Recovery • Steal
• Undo
Use the given database buzzwords to fill in the blanks. A word may be used more than once.
1. (0.5 points) A property of transactions is , which refers to the idea that we will never lose the result of some transaction.
2. (0.5 points) One way to ensure this property is to use the policy.
3. (0.5 points) However, the downside to using this policy is .
4. (0.5 points) In the forward processing portion of ARIES recovery (as covered in this course), we choose to use the policy, which allows us to avoid immediately flushing dirty pages to disk upon commit.
5. (0.5 points) However, our decision to use this policy complicates which is a key part of ACID.
6. (0.5 points) To overcome this problem, we will need to operations during recovery.
Fill in the following blanks with your favorite (correct) inequality symbols:
7. (1 point) Log records must be written to disk when a transaction t commits. The inequality that must
be true in order to achieve this is: lastLSNt flushedLSN.
8. (1 point) Log records must be on disk before data page i gets written to disk. The inequality that must
be true in order to achieve this is: pageLSNi flushedLSN.
Page 4 of 28

You’re a database. You’re managing transactions and pages and logs and you oop… you just crashed. You get the log up to the crash and see the following:
LSN
Record
prevLSN
0
master: checkpoint at LSN 80

10
update: T1 writes P1
0
20
update: T2 writes P2
0
30
update: T3 writes P3
0
40
update: T1 writes P1
10
50
update: T2 writes P4
20
60
abort: T1
40
70
CLR: undo T1 LSN 40, undoNextLSN 10
60
80
begin checkpoint

90
update: T2 writes P4
50
100
abort: T2
90
110
commit: T3
30
120
end: T3
110
130
end checkpoint

CRASH
The end checkpoint also contains the following tables:
9. (2 points) Fill in the following dirty page table and transaction table as they would appear at the end of the analysis phase of ARIES. If an entry should not appear in the table then leave the row blank.
TID
Status
lastLSN
P1
10
T1
Aborting
70
P2
Page 5 of 28
PID
recLSN
Running
Running
50
30
20
T2
P3
30
T3
P4
50
PID
recLSN
TID
Status
lastLSN
P1
T1
P2
T2
P3
T3
P4

10. (4 points) Assume that we’ve already completed the redo phase. Complete the undo phase, specifically, what log records are emitted? Use the table provided in the answer sheet to fill out all records emitted during this phase. You may not need all the rows provided in the table.
Page 6 of 28

3 Octograder Queries (13 points)
There has been an increasing amount of interest in the Introduction to Database Systems course, and this semester, CS W186 expanded to a class size larger than ever. To accommodate the expected increase in volume of homework submissions, the TAs set up a new autograder, the Octograder, for the course homeworks. Octograder is accompanied by Octobot, which is a bot that manages homework submissions (and more).
Consider the following schema for Octograder:
CREATE TABLE edx_students ( edx_id INTEGER PRIMARY KEY, name VARCHAR,
email VARCHAR UNIQUE
);
CREATE TABLE assignments ( assignment_name VARCHAR PRIMARY KEY,
due_date TIMESTAMP );
CREATE TABLE submissions (
submission_id INTEGER PRIMARY KEY,
edx_id INTEGER REFERENCES edx_students(edx_id),
assignment_name VARCHAR REFERENCES assignments(assignment_name), submission_time TIMESTAMP,
submission_file VARCHAR
);
As a reminder, students can submit assignments multiple times. Also, assume all tables only contain infor- mation about the current semester.
The TAs have been very busy this semester, so Octobot, being a highly intelligent and sentient bot, wants to create some SQL queries to help the TAs.
1. (3 points) To get an idea of students’ progress on hw5, it is useful to know how many students have submitted the homework.
Select the queries that find the number of students who submitted hw5. There may be zero, one, or more than one correct answers.
A. SELECT COUNT(*)
FROM submissions
WHERE assignment_name = ‘hw5’;
B. SELECT COUNT(edx_id)
FROM submissions
WHERE assignment_name = ‘hw5’;
Page 7 of 28

C. SELECT COUNT(DISTINCT edx_id) FROM submissions
GROUP BY assignment_name
HAVING assignment_name = ‘hw5’;
2. (3 points) Since students are allowed to use slip minutes on homeworks, Octobot wants to write a query that fetches late submissions.
Select the queries that find the submission_id for all the hw1 submissions that are late. There may be zero, one, or multiple correct answers.
A. SELECT submission_id
FROM submissions S, assignments A
WHERE S.assignment_name = A.assignment_name AND S.assignment_name = ‘hw1’
AND S.submission_time > A.due_date;
B. SELECT submission_id
FROM submissions S INNER JOIN assignments A ON S.assignment_name = A.assignment_name AND S.submission_time > A.due_date
WHERE S.assignment_name = ‘hw1’;
C. SELECT submission_id FROM (
SELECT due_date
FROM assignments
WHERE assignment_name = ‘hw1’
) AS A1, submissions S
WHERE S.submission_time > A1.due_date;
3. (3 points) Octobot wants to send an email to the students who didn’t submit hw5 to ask why they didn’t submit.
Select the queries that find all the emails for all the students who didn’t submit hw5.
There may be zero, one, or multiple correct answers.
A. SELECT email FROM (
SELECT edx_id
FROM submissions
WHERE assignment_name = ‘hw5’
) AS S FULL OUTER JOIN edx_Students E
ON S.edx_id = E.edx_id
WHERE S.edx_id IS NULL;
B. SELECT email
FROM edx_students E LEFT OUTER JOIN (
SELECT edx_id
FROM submissions S
WHERE S.assignment_name = ‘hw5’
) AS S1
ON E.edx_id = S1.edx_id
Page 8 of 28

WHERE E.edx_id IS NULL;
C. SELECT email
FROM edx_students WHERE email NOT IN (
SELECT email
FROM edx_students E, submissions S
WHERE E.edx_id = S.edx_id
AND S.assignment_name = ‘hw5’);
Octobot, being an ambitious bot, wants to do more than just querying the submissions database. He wants to help with grading the submissions! The first step is to find the submission_id of all the submissions to grade. Only the latest submissions for each assignment for every student will be graded.
In order to find all the submission_id of the latest submission for each assignment for every student, Octobot comes up with the following query:
— Octobot’s query
SELECT S.submission_id
FROM Submissions S INNER JOIN (
SELECT edx_id, assignment_name, MAX(submission_time) as latest_sub_time FROM Submissions
GROUP BY edx_id, assignment_name) AS S1
ON S.submission_time = S1.latest_sub_time AND S.edx_id = S1.edx_id
AND S.assignment_name = S1.assignment_name;
4. (1.5 points) Does Octobot’s query output the correct result?
A. Yes, it outputs the submission_id of all the latest submissions for each assignment for every student.
B. No, it might output too few results.
C. No, it might output too many results.
D. No, this query errors.
Octobot is not very confident about its query, so it asks its friend, Kiwibot. You might ask, how does a food delivery bot know about SQL? Well, Kiwibot has been wandering around the campus of the top institution in Computer Science for more than two years now, and it actually gained some knowledge of SQL by overhearing the conversations of EECS students outside Soda Hall!
Kiwibot comes up with the following query:
— Kiwibot’s query
SELECT S1.submission_id
FROM Submissions S1
WHERE S1.submission_time >= ANY (
SELECT S2.submission_time
FROM Submissions S2
WHERE S1.edx_id = S2.edx_id
AND S1.assignment_name = S2.assignment_name);
Page 9 of 28

5. (1.5 points) Does Kiwibot’s query output the correct result?
A.
B. C. D.
6. (1 point) A. B.
Yes, it outputs the submission_id of all the latest submissions for each assignment for every student.
No, it might output too few results. No, it might output too many results. No, this query errors.
Kiwibot’s query contains a subquery (lines 4-7). Is it a correlated subquery? Yes.
No.
7. (0.0001 points) How many octopuses are there in the “HW5 Grades Released” piazza post?
Page 10 of 28

4 Disks, Fires, Buffalo (5 points)
1. (2 points) Fill in the corresponding box on the answer sheet for True or False.
A. For the current generation of flash storage (SSDs), reads are more costly than writes.
B. In a heap file, some pages may not be filled to capacity.
C. Fragmentation is not an issue with packed variable-length record page format
D. For the heap file implementations, the main advantage of linked lists over page directories is that inserting records is more efficient
2. (3 points) One day, the CS W186 TAs got together in a very productive meeting and made the decision to host weekly raffles for their students, with yet-to-be-determined mystery prizes. In this raffle, a new file is created every week. Throughout the week, every student who goes to office hours (assume there are many) gets entered into the file as a record containing their edX id and name. At the end of the week, the TAs draw 5 edX ids to be the winners.
In this scenario, which choices (one letter per each column) would be appropriate and optimize the performance of the raffle execution?
Page Format
File Layout
Index
(a) Fixed Length (Packed)
(d) Sorted File
(f) Clustered Index on edX ids
(b) Fixed Length (Unpacked)
(e) Heap File
(g) Unclustered Index on edX ids
(c) Slotted Page
(h) No Index
Page 11 of 28

5 More Query Pessimization (13 points)
For the following questions, assume the following:
• The System R assumptions about uniformity and independence from lecture hold.
• We use
System R defaults when selectivity estimation is not possible.
Table Schema
Records
Pages
Notes
CREATE TABLE Customers (
cid INTEGER PRIMARY KEY, name VARCHAR,
address VARCHAR,
email VARCHAR,
);
250
25
• Primary key cid is sequential in this table, starting from 1, and there are no gaps in the cid.
CREATE TABLE Purchases (
pid INTEGER PRIMARY KEY,
cid REFERENCES Customers(cid), item_id REFERENCES Items(iid),
);
10,000
1,000
None
CREATE TABLE Items (
iid INTEGER PRIMARY KEY, price INTEGER,
quantity INTEGER,
);
5000
500
• There is a clustered Alternative 3 index on iid of height 2.
• iid ranges from 1 to 7500.
• Prices for each item is unique. • Prices range from 1 to 10,000.
1. (1 point) A. B. C. D. E.
2. (1 point) A. B. C. D. E.
What is the selectivity of cid <= 100 from the Customers table? 1 2 1 4 1 5 2 5 1 10 What is the selectivity price = 10000 AND quantity < 10? 1 100000 1 50000 1 10000 1 5000 1 10 Page 12 of 28 3. (2 points) After applying the System R optimizer, which query has a lower estimated final cost for the optimal query plan. A. SELECT * FROM Customers; B. SELECT * FROM Purchases WHERE pid > 1000; C. SELECT * FROM Items WHERE iid > 1500;
4. (5 points) Select all of the following statements that are true:
A. If there is an eligible index exists, an index scan will always be selected over a sequential scan. B. We can always determine the resulting cardinality of a join.
C. We should always push down selections involving a single table.
D. We use the pass 1 table and previous pass (n-1) table to get the table for the nth pass.
E. In our pass 1 table, we only keep one type of scan for each relation.
Join
Left Relation
Left Ordering
Right Relations
Right Ordering
Join Type
(A)
C
none
P
none
BNLJ
(B)
I
iid
P
none
SMJ
(C)
C
none
I
iid
BNLJ
(D)
C
cid
P
cid
SMJ
5. (2 points) Which of the joins from the table above will NOT result in an interesting order for the next pass of the System R algorithm for the following query?
SELECT *
FROM Customers as C, Purchases as P, Items as I
WHERE C.cid = P.cid AND I.iid = P.item_id AND C.cid < 100 ORDER BY C.cid; 6. (2 points) Which of the following join orders will NOT be considered for Pass 3 in the System R algorithm for the query above? A. C ◃▹ (P ◃▹ I) B. (C ◃▹ I) ◃▹ P C. (I◃▹P)◃▹C D. P ◃▹ (C ◃▹ I) E. (C◃▹P)◃▹I F. I ◃▹ (P ◃▹ C) G. (P ◃▹ I) ◃▹ C Page 13 of 28 6 B+ tree is no more. (16 points) Kiwibot is ending its delivery service (for the semester, not for good) and plans to better balance workload across their fleet of robots for next semester so no robot malfunctions, shuts down, or catches on fire due to being overworked. They want to first know how many deliveries each robot made and how many miles each robot traveled this semester. Luckily, the startup hired an intern who happened to take CS W186 and is an expert in tree indexes. The intern decides to build the following order d = 1 Alternative 3 unclustered B+ tree index on the composite key . Assume B = 20.
1. (1 point) How many I/Os will it take to check if there exists a Kiwibot that made 3 deliveries this semester?
2. (1 point) If the intern wants to access records of all Kiwibots that have made 8 deliveries and traveled 21 miles, how many I/Os would this take, assuming there are 5 such Kiwibots with all their references on the same page?
3. (2 points) One of the company’s field testers found a lost Kiwibot in a bush but was able to repair it and retrieve its delivery and mileage data. The tester tells the intern to insert a new record into the B+ tree with Deliveries = 13, Miles = 29. How many I/Os will this take?
4. (4 points) In the worst case, how many I/Os would it take to insert a new record into the B+ tree in the original diagram? Assume we have 20 buffer pages and we do not need to read in a new page before writing to it.
Page 14 of 28

5. (6 points) Which of the following statements about the intern’s index design are true? There may be zero, one, or more than one correct answer.
A. The B+ tree in the original diagram will always be able to handle 8 new record insertions without growing its height.
B. We can lower the I/O cost of accesses over time by saving the root and all inner nodes in memory.
C. The range Deliveries < 20 will require at least a full scan. D. The range Miles > 50 will require at least a full scan.
E. Deliveries < 8 && Miles > 10 is a lexicographic range.
F. Next semester, Kiwibot implements better workload balancing and discovers a lot of robots have duplicate or very similar keys. Modifying the B+ tree to be Alternative 3 clustered will lower the I/O cost for some range scans.
Great news! Kiwibot secured their Series A funding and expanded incredibly fast in Spring 2020. The company now has delivery service in over 100 college campuses across the country. Try to manage so many new robots, its CIO instructs the intern to build a new B+ tree that is now order d = 4.
6. (2 points) What is the maximum number of keys this new B+ tree can hold if its height is 3?
Page 15 of 28

7 Zero Waste Sorting and Hashing (18 points)
It’s almost 2020! That’s bad news for Berkeley who is rushing to reach its goal of “Zero Waste by 2020.” What’s more is that they just found more waste in their external sorting and hashing algorithms! It’s your job to find out how much is being wasted and what can be done to reduce the waste.
1. (5 points) First, we need to sort N = 1000 pages using B = 30 buffer frames. For each instance of creating runs or merging in a pass, find the number of frames that go unused then find the sum of unused frames across all passes. How many total frames will go unused in the process of sorting?
2. (1.5 points) Pertaining to the question above, select all of the following that could have an effect on reducing the number of unused frames. Assume we decrease the number of buffer frames to where we can still sort the data.
A. Increasing the number of buffer frames B. Decreasing the number of buffer frames C. The values of data we’re sorting
3. (3 points) Let’s now try to sort an N=120 page file. Let’s assume that between pass 0 and pass 1, someone magically shrunk our buffer pool; in pass 0, we had B=10 buffer frames, but from pass 1 onwards, we had B=3. How many I/Os will it take to sort the file?
4. (4 points) Now, we need to hash a N = 50-page file using B = 6 buffer frames. For the first partitioning phase, assume two partitions contain 18 pages worth of data evenly between them and all other partitions are uniformly partitioned. Assume uniform partitioning. How much total empty space will be written to disk when partitioning this data across all partitioning passes?
Page 16 of 28

5. (1.5 points) Pertaining to the question above, select all of the following that could have an effect on reducing the amount of empty space on pages written to partitions. Assume uniform partitioning and that we only decrease the number of buffer frames to where we can still hash the data.
A. Increasing the number of buffer frames B. Decreasing the number of buffer frames C. The values of data we’re hashing
6. (3 points) Let us assume that we have 2 machines, M1 with B=5 and M2 with B=10. Partitioning is done on the machine where B=5, and the partitions are then sent over to the B=10 machine for the conquer phase. How many I/Os will it take to hash an N=100 page file, assuming uniform partitioning and assuming that partitions are streamed directly to memory on M2?
Page 17 of 28

8 Parallelograms (10.5 points)
We have 3 machines: m1, m2, and m3. The data for the Teachers table is range partitioned on the age column such that m1 has 40 pages of data, m2 has 15 pages, and m3 has 50 pages. The ranges for each machine are: m1 has values 1-25, m2 has values 26-30, and m3 has values 31-120. The data is not stored in sorted files on any machine. For questions 1-3, how many I/Os across all machines will it take to execute each query? Assume that each machine has 742 pages of memory.
1. (2 points) SELECT MIN(age) FROM Teachers;
2. (2 points) SELECT COUNT(DISTINCT age) FROM Teachers;
3. (1.5 points) Which queries, if any, would run faster if we round-robin partitioned the data instead of the range partitioning scheme described in the problem?
A. Query 1
B. Query 2
C. None of them
We want to join the Students table with the Classes table using the join condition: S.cid = C.id. The Students table has 90,000 pages and the Classes table has 90 pages. We have three machines: m1, m2, and m3. Currently the Students table is round robin partitioned across the 3 machines and the Classes table is all on m1. Each page is 1KB.
4. (1 point) How many KB of data from the Students table will be sent across the network in a broadcast join?
5. (1 point) How many KB of data from the Classes table will be sent across the network in a broadcast join?
6. (1 point) How many KB of data from the Classes table will be sent across the network in a parallel hash join in the average case? Assume that we have perfect hash functions that divide up the data evenly.
7. (1 point) How many KB of data from the Students table will be sent across the network in a parallel hash join in the average case? Again assume that the data is distributed uniformly and that we have hash functions that divide up the data completely evenly.
8. (1 point) If disk access cost and CPU cost are negligible compared to the network cost, which join should we pick to optimize for performance?
A. Broadcast Join B. Hash Join
Page 18 of 28

9 Emergency Rooms & Fire Departments (21 points)
Student
a. b.
takes
teaches
Class
Professor
Complete the ER diagram to enforce that every student must be enrolled in at least one class and that classes can have any number of students.
1. (1 point) Fill in a. (between Students and takes): A. Thin Line
B. Bold Line C. Thin Arrow D. Bold Arrow
2. (1 point) Fill in b. (between takes and Class): A. Thin Line
B. Bold Line C. Thin Arrow D. Bold Arrow
3. (4 points) Which of the following statements are true? There may be zero, one, or more than one correct answer.
A. Josh Hug, under the current design, can teach two classes.
B. Weak Entities have a directly attached primary key
C. Josh Hug and Joe Hellerstein cannot both teach CS W186.
D. ER Diagrams are purely engineering and have no ’policy’ implications.
Page 19 of 28

The existing database design does not allow for students to teach classes, and the department decides to change this to allow a qualified student to co-instruct a course with a professor.
Implement the following constraints:
• Students may instruct up to one class
• A course cannot be instructed by multiple students. • A student cannot instruct a course on their own.
• Classes may have multiple TAs.
• Professors cannot opt out of instructing.
• Students have to take at least one course.
• Classes must have an enrollment > 0.
4. (1 point) What line goes between Student and Instructs (Student)? A. Thin Line
B. Bold Line C. Thin Arrow D. Bold Arrow
Page 20 of 28

5. (1 point) What line goes between Class and Instructs (Student)? A. Thin Line
B. Bold Line C. Thin Arrow D. Bold Arrow
6. (1 point) What line goes between Class and takes? A. Thin Line
B. Bold Line C. Thin Arrow D. Bold Arrow
7. (1 point) What line goes between Student and takes? A. Thin Line
B. Bold Line C. Thin Arrow D. Bold Arrow
8. (1 point) What line goes between Class and TAs? A. Thin Line
B. Bold Line C. Thin Arrow D. Bold Arrow
9. (1 point) What line goes between Student and TAs? A. Thin Line
B. Bold Line C. Thin Arrow D. Bold Arrow
10. (1 point) What line goes between Professor and Instructs (Professor)? A. Thin Line
B. Bold Line C. Thin Arrow D. Bold Arrow
11. (1 point) What line goes between Class and Instructs (Professor)? A. Thin Line
B. Bold Line C. Thin Arrow D. Bold Arrow
In the final part we will be using the following attribute set: R = (SFLNGCRWH) = (S: SID, F: First Name, L: Last Name, N: Number of Units, G: GPA, C: Course (of teaching), R: Rating for class, W: Wage, H: Hours per week of work)
Page 21 of 28

12. (3 points) Given the following FDs: F = {S −→ FL,S −→ NG,SC −→ R,CH −→ W,FL −→ R,C −→ H,H −→ C}, which of the following additional FDs (added to the set in isolation from the other answer choices) would reduce the size of the candidate key if it were the only one applied? There may be zero, one, or more than one correct answer.
A. SID determines Rating
B. Rating determines Wage
C. Rating determines Course
D. Rating and First Name determine Wage and Hours of Work
13. (4 points) What is the BCNF decomposition of the following FDs F = {S −→ F L, F L −→ N G, S −→ C, SC −→ RH, CH −→ W }? Please check the appropriate boxes to indicate the attribute and its table in BCNF; you might not use all 4 columns on the answer sheet. Please reserve R1 for the largest table (in terms of number of attributes), R2 for the second largest, and so forth.
Page 22 of 28

10 Congo Web Services (19 points)
1. (4 points) Which of the following statements are true? There may be zero, one, or more than one correct answer.
A. There exists a way to check in polynomial time if any schedule is view serializable.
B. Starting from a node on a dependency graph with only outgoing edges and running Depth First Search will always return a conflict equivalent serial schedule.
C. Two Phase Locking prevents cascading aborts, but does not guarantee conflict serializability.
D. According to HW4, you cannot substitute a S lock for an IS lock (replace IS with a S).
Consider the following schedule for the next two questions:
2. (2 points) What is the set of edges of the dependency graph for the schedule? We write (Ti, Tj) if there exists a directed edge from Ti to Tj.
A. {(T1, T4), (T3, T2), (T4, T2), (T1, T3)}
B. {(T2, T4), (T3, T4), (T4, T1), (T2, T3), (T1, T4), (T3, T1)} C. {(T4, T2), (T4, T3), (T1, T4), (T3, T2), (T4, T1), (T1, T3)} D. None of the above
3. (1 point) True or False: This schedule is conflict serializable.
After graduating from UC Berkeley, the number one public university, you start working at Congo Inc. with an extremely lucrative career managing databases for their Congo Web Services (CWS). The company tasks you with solving their concurrency problems. Good thing you took CS W186!
For the next two questions, consider a database X with a table Y. Y has pages A, B, and C. Page A has tuples A1, …, A10, page B has tuples B1, …, B10, and page C has tuples C1, …, C10.
Your manager has given you the following schedule and tells you that there is currently too little concurrency because each transaction locks the entire database for every update. Having taken CS W186, you suggest using multigranularity locking (with minimum privilege)!
T1
R(A)
W(C)
R(B)
T2
R(A)
R(B)
T3
R(A)
W(B)
T4
W(A)
R(C)
Transaction
1
2
3
4
5
6
7
8
T1
R(A1 )
R(B)
W(B2 )
W(A2 )
T2
R(B1 )
T3
R(C1 )
T4
W(A2 )
W(B)
Page 23 of 28

4. (3 points) Using the new multigranularity scheme, list all the locks that T1 has after timestep 8 in the order of acquisition with the first lock acquired being on the top left and most recent lock being on the bottom right (following all time of acquisition rules from the homework). You may or may not need all the boxes. Leave unused boxes blank.
5. (1 point) True or False: This schedule has deadlock.
After solving the low concurrency issue, your manager praises your knowledge of databases and hands you a snippet of another schedule:
Your manager tells you that the transactions seem to be stuck! You do some investigating and find out that the schedule causes the transactions to be in deadlock. Again, using your CS W186 knowledge, you decide to implement deadlock avoidance!
For the following four questions, assume that T1 started first, T2 started second, T3 started third, and T4 started last. If a transaction gets aborted, it will immediately release all locks it has acquired and the wait queue will be processed. Additionally, if a transaction gets placed on a waiting queue (blocked), locks that the transaction acquired up to that point are not released and the actions for that transaction while it is blocked do not happen (as if the action was never on the schedule). Additionally, assume promotes are implemented as described on homework 4.
6. (2 points) If you use the wound-wait approach, which transaction(s) will end up getting aborted? Select all correct choices. If no transaction gets aborted, select None
A. T1
B. T2
C. T3
D. T4
E. None
Lock 1
Lock 2
Lock 3
Lock 4
Lock 5
Lock 6
Lock 7
Lock 8
Transaction
1
2
3
4
5
6
7
8
9
10
T1
W(A)
W(C)
T2
R(A)
R(C)
W(A)
T3
W(B)
W(A)
T4
R(B)
R(C)
W(C)
Page 24 of 28

7. (2 points) Which transaction(s) will end up blocked by the end of timestep 10 if using wound-wait? Select all correct choices. If no transaction gets blocked, select None
A. T1
B. T2
C. T3
D. T4
E. None
8. (2 points) If you use the wait-die approach, which transaction(s) will end up getting aborted? Select all correct choices. If no transaction gets aborted, select None
A. T1
B. T2
C. T3
D. T4
E. None
9. (2 points) Which transaction(s) will end up blocked by the end of timestep 10 if using wait-die? Select all correct choices. If no transaction gets blocked, select None
A. T1
B. T2
C. T3
D. T4
E. None
Page 25 of 28

11 Distributed Fires (15 points)
1. (1 point) Two-phase commit allows a distributed database to commit transactions as long as a majority of the servers are alive.
A. True. B. False.
2. (1 point) In a distributed database, if there is no lock dependency cycle at any individual node, then there cannot be a deadlock.
A. True. B. False.
3. (1 point) In two-phase commit with two-phase locking, when do participants release their locks? A. After logging prepare.
B. After logging commit.
The owner of a chain of boba shops was never any good at databases, and following an embarrassing incident where they couldn’t figure out how to find their most popular drink, they’ve hired a Berkeley student (that’s you!) to run their now-massive parallel distributed database.
You’re talking to the owner one day, trying to explain the two-phase commit protocol with presumed abort that you just built for the database. The owner doesn’t seem to understand why it works, though. They wonder if the protocol would still work if various parts of the protocol were changed.
For each of the following modifications, select all the problems that it might cause. There might be zero, one, or multiple correct answers.
4. (1 point) In phase 1, participants log prepare records after sending out the vote, instead of before. A. Transactions that commit might have operations that do not get committed.
B. Transactions that abort might have operations that do not get aborted.
5. (1 point) In phase 1, participants log abort records after sending out the vote, instead of before. A. Transactions that commit might have operations that do not get committed.
B. Transactions that abort might have operations that do not get aborted.
6. (1 point) The coordinator does not flush the commit records that it writes.
A. Transactions that commit might have operations that do not get committed. B. Transactions that abort might have operations that do not get aborted.
7. (1 point) In phase 2, the participant sends out acks before logging commit.
A. Transactions that commit might have operations that do not get committed. B. Transactions that abort might have operations that do not get aborted.
8. (1 point) In phase 1, a participant logs prepare, but accidentally votes no instead of yes. A. Transactions that commit might have operations that do not get committed. B. Transactions that abort might have operations that do not get aborted.
Page 26 of 28

9. (1 point) In phase 1, the coordinator sends out prepare requests to the participants. Before hearing back, however, the coordinator gets cold feet and decides to abort the transaction. It immediately logs and flushes an abort record, and tells the user the transaction aborted.
A. Transactions that commit might have operations that do not get committed. B. Transactions that abort might have operations that do not get aborted.
10. (1 point) A participant, after logging a prepare record, is too impatient to wait to hear the commit decision. It goes behind the coordinator’s back and asks all the other participants what they voted. If they all voted yes, the participant logs a commit record and commits the transaction.
A. Transactions that commit might have operations that do not get committed. B. Transactions that abort might have operations that do not get aborted.
11. (1 point) A participant, after logging a prepare record, notices that the coordinator has crashed, so it aborts the transaction.
A. Transactions that commit might have operations that do not get committed. B. Transactions that abort might have operations that do not get aborted.
12. (1 point) The coordinator, after logging a commit record, forgets to send phase 2 commit messages. A. Transactions that commit might have operations that do not get committed.
B. Transactions that abort might have operations that do not get aborted.
13. (1 point) The coordinator is paranoid and instead of logging commit after phase 1, it waits for phase 2 to complete (receiving ACKS from all participants) before committing the transaction.
A. Transactions that commit might have operations that do not get committed. B. Transactions that abort might have operations that do not get aborted.
14. (1 point) After the coordinator logs a commit record, it logs an end record right away and forgets about the transaction.
A. Transactions that commit might have operations that do not get committed. B. Transactions that abort might have operations that do not get aborted.
15. (1 point) Why bother waiting for prepares? In phase 1, as soon as the coordinator receives a single YES vote, it logs a commit record right away and commits the transaction.
A. Transactions that commit might have operations that do not get committed. B. Transactions that abort might have operations that do not get aborted.
Page 27 of 28

12 Scratch Work (0 points)
Page 28 of 28