CS 186 Introduction to Database Systems Spring 2022
INSTRUCTIONS
This is your exam. Complete it either at exam.cs61a.org or, if that doesn’t work, by emailing course staff with your solutions before the exam deadline.
This exam is intended for the student with email address If this is not your email address, notify course staff immediately, as each exam is different. Do not distribute this exam PDF even after the exam ends, as some students may be taking the exam in a different time zone.
Copyright By PowCoder代写 加微信 powcoder
For questions with circular bubbles, you should select exactly one choice. You must choose either this option
Or this one, but not both!
For questions with square checkboxes, you may select multiple choices. You could select this choice.
You could select this one too!
You may start your exam now. Your exam is due at 06:00PM Pacific Time. Go to the next page to begin.
Exam generated for 2 Preliminaries
1 CS 186 Spring 2022 Final
Unless you have special accommodations, you will have 170 minutes to complete the final.
Do not share this exam until solutions are released.
1.0.1 Contents:
• The Final has 12 questions, each with multiple parts, and worth a total of 230 points.
1.0.2 Aids:
• You may use 3 pages (double-sided) of handwritten notes as well as a calculator. • You must work individually on this exam.
1.0.3 Grading Notes:
• All I/Os must be written as integers. There is no such thing as 1.02 I/Os – that is actually 2 I/Os.
• 1 KB = 1024 bytes. We will be using powers of 2, not powers of 10.
• Unsimplified answers, like those left in log format, will receive a point penalty.
(a) What is your full name?
(b) What’s your exam room?
(c) What is your student ID number?
(d) SID of the person to your left (Put None if you are taking the exam remotely):
(e) SID of the person to your right (Put None if you are taking the exam remotely):
Exam generated for 3 1. (18 points) GradQL
The university is trying to plan graduation and is asking for your help in writing queries for their database! We have the following tables below. The graduates table represents all students who are graduating and the graduation_tickets table represents all the tickets and who owns them.
CREATE TABLE graduates (
student_id INT NOT NULL,
first_name VARCHAR(30),
last_name VARCHAR(30),
major VARCHAR(30),
PRIMARY KEY (student_id),
CREATE TABLE graduation_tickets (
ticket_id INT NOT NULL,
PRIMARY KEY (ticket_id),
FOREIGN KEY (owner) REFERENCES graduates (student_id)
Exam generated for 4 (a) (3 pt) Which of the following queries returns the number of graduation tickets each English major graduate
owns. Select all that apply.
SELECT student_id, first_name, last_name, COUNT(*)
FROM graduates g LEFT JOIN graduation_tickets t
ON g.student_id = t.owner
WHERE g.major = ‘English’
GROUP BY g. student_id;
SELECT student_id, first_name, last_name, COUNT(*)
FROM graduates g INNER JOIN graduation_tickets t
ON g.student_id = t.owner
WHERE g.major = ‘English’
GROUP BY g. student_id;
SELECT student_id, first_name, last_name, COUNT(*)
FROM graduation_tickets t LEFT JOIN graduates g
ON g.student_id = t.owner
GROUP BY g. student_id
HAVING g.major = ‘English’;
SELECT student_id, first_name, last_name, COUNT(*)
FROM graduation_tickets t INNER JOIN graduates g
ON g.student_id = t.owner
GROUP BY g. student_id
HAVING g.major = ‘English’;
E. None of the above
Exam generated for 5 (b) (3 pt) Same tables repeated below for your convenience.
CREATE TABLE graduates (
student_id INT NOT NULL,
first_name VARCHAR(30),
last_name VARCHAR(30),
major VARCHAR(30),
PRIMARY KEY (student_id),
CREATE TABLE graduation_tickets (
ticket_id INT NOT NULL PRIMARY KEY,
PRIMARY KEY (ticket_id),
FOREIGN KEY (owner) REFERENCES graduates (student_id)
The university wants to pair students with another student. Which of the following queries returns a list of ids of all possible pairings of students. Select all that apply.
SELECT g1.student_id, g2.student_id
FROM graduates g1 FULL OUTER JOIN graduates g2
ON g1.student_id != g2.student_id;
SELECT g1.student_id, g2.student_id
FROM graduates g1, graduates g2
WHERE g1.student_id != g2.student_id;
SELECT g1.student_id, g2.student_id
FROM graduates g1, graduates g2
WHERE g1.student_id NOT IN (
SELECT * FROM
graduates AS temp
WHERE g2.student_id = temp.student_id
D. None of the above
Exam generated for 6 (c) (3 pt) Same tables repeated below for your convenience.
CREATE TABLE graduates (
student_id INT NOT NULL,
first_name VARCHAR(30),
last_name VARCHAR(30),
major VARCHAR(30),
PRIMARY KEY (student_id),
CREATE TABLE graduation_tickets (
ticket_id INT NOT NULL PRIMARY KEY,
PRIMARY KEY (ticket_id),
FOREIGN KEY (owner) REFERENCES graduates (student_id)
CREATE TABLE club_members (
club_id INT NOT NULL,
student_id INT NOT NULL
FOREIGN KEY (student_id) REFERENCES graduates (student_id)
Suppose we have an additional table shown above containing club members across all clubs shown below. Which of the following queries computes the average amount of money spent on tickets across all members of each club. Assume each ticket costs $10.
SELECT c.club_id, COUNT(*) * 10
FROM graduates g INNER JOIN graduation_tickets t INNER JOIN club_members c
ON g.student_id = t.owner AND g.student_id = c.student_id
GROUP BY c.club_id;
SELECT c.club_id, COUNT(*) * 10
FROM graduates g INNER JOIN graduation_tickets t INNER JOIN club_members c
ON g.student_id = t.owner AND g.student_id = c.student_id
GROUP BY g.student_id;
SELECT c.club_id, COUNT(*) * 10
FROM graduates g INNER JOIN club_members c
ON g.student_id = c.student_id
GROUP BY g.student_id;
SELECT c.club_id, SUM(p.ticket_cost)
FROM club_members c
INNER JOIN (
SELECT g.student_id, COUNT(*) * 10 AS ticket_cost
FROM graduates g INNER JOIN graduation_tickets t
ON g.student_id = t.student_id GROUP BY g.student_id
GROUP BY c.club_id;
WITH ticket_prices AS (
Exam generated for 7 SELECT owner, 10 AS price FROM graduation_tickets
SELECT c.club_id, SUM(p.price)
FROM graduates g, ticket_prices p, club_members c
WHERE c.student_id = g.student_id AND p.owner = g.student_id
GROUP BY c.club_id;
F. None of the above
Exam generated for 8 (d) (3 pt) Suppose we have two additional tables shown below representing the students who attended the
mechanical engineering commencement and the students who attended the economics major commencement.
CREATE TABLE mech_eng_graduation_attendees (
student_id INT NOT NULL,
seat VARCHAR(30),
FOREIGN KEY (student_id) REFERENCES graduates (student_id)
CREATE TABLE econ_graduation_attendees (
student_id INT NOT NULL,
seat VARCHAR(30),
FOREIGN KEY (student_id) REFERENCES graduates (student_id)
Which of the following queries computes the total number of attendees across both the mechanical engineering and economics commencements? Assume for this question only that students may have 2 majors and may have attended both graduations. If a student attends both, count as 1 in your output. Select all that apply.
WITH attendees AS (
SELECT student_id
FROM mech_eng_graduation_attendees
SELECT student_id
FROM econ_graduation_attendees
SELECT COUNT(*)
FROM attendees;
WITH attendees AS (
SELECT student_id
FROM mech_eng_graduation_attendees
SELECT student_id
FROM econ_graduation_attendees
EXCEPT ALL
SELECT m.student_id
FROM mech_eng_graduation_attendees m, econ_graduation_attendees e
WHERE m.student_id = e.student_id
SELECT COUNT(*)
FROM attendees;
WITH attendees AS (
SELECT student_id
Exam generated for 9
FROM mech_eng_graduation_attendees
SELECT student_id
FROM econ_graduation_attendees
SELECT COUNT(*)
FROM attendees;
SELECT COUNT(*)
FROM mech_eng_graduation_attendees m OUTER JOIN econ_graduation_attendees e
WHERE m.student_id != e.student_id;
E. None of the above
Exam generated for 10 (e) (3 pt) Which of the following queries computes all students who did not attend any commencement?
Select all that apply.
SELECT g.student_id
FROM graduates g
WHERE g.student_id NOT IN (
SELECT m.student_id
FROM mech_eng_graduation_attendees m INNER JOIN econ_graduation_attendees e
SELECT g.student_id
FROM graduates g
WHERE g.student_id NOT IN (
SELECT m.student_id
FROM mech_eng_graduation_attendees m
AND g.student_id NOT IN (
SELECT m.student_id
FROM econ_graduation_attendees e
SELECT g.student_id
FROM graduates g
WHERE g.student_id NOT EXISTS (
SELECT m.student_id
FROM mech_eng_graduation_attendees m WHERE m.student_id = g.student_id
AND g.student_id NOT EXISTS (
SELECT m.student_id
FROM econ_graduation_attendeese e WHERE e.student_id = g.student_id
WITH attendees AS (
SELECT student_id
FROM mech_eng_graduation_attendees
SELECT student_id
FROM econ_graduation_attendees
SELECT a.student_id
FROM attendees
WHERE a.student_id NOT IN (SELECT g.student_id FROM graduates g);
E. None of the above
Exam generated for 11 (f) (3 pt) Which of the following is a valid relational algebra representation of the query below? Select all
that apply.
SELECT tb1.col1, COUNT(*)
FROM tb1 INNER JOIN tb2
ON tb1.col3 = tb2.col4
WHERE tb2.col5 = 3
GROUP BY tb1.col1;
A. γtb1.col1,COUNT(∗)(πtb1.col1(σtb2.col5=3(tb1tb1.col3=tb2.col4tb2))) B. γtb1.col1,COUNT(∗)(σtb2.col5=3(tb1tb1.col3=tb2.col4tb2))
C. γtb1.col1,COUNT(∗)(tb1tb1.col3=tb2.col4σtb2.col5=3(tb2)))
D. γtb1.col1,COUNT(∗)(σtb2.col5=3(πtb1.col3(tb1)tb1.col3=tb2.col4tb2)) E. None of the above
Exam generated for 12 2. (19 points) Disk and Files
Heardle (Wordle but with songs!) wants to add a new feature where players can create playlists with songs that they were not able to guess correctly. Before spending time and resources developing this feature, Heardle wants to better understand the different design decisions and actions associated with how information can be stored, read, and updated. They reached out to CS 186 to help out!
For questions 1.1 – 1.4 consider the following schema:
CREATE TABLE Songs (
song_id INTEGER PRIMARY KEY,
artist_id INTEGER NOT NULL,
title VARCHAR(30) NOT NULL,
album_id INTEGER,
length INTEGER NOT NULL,
in_top_100 BOOLEAN NOT NULL
For questions 1.1 – 1.4, assume the following:
• Integers are 4 bytes and pointers are 4 bytes.
• Record headers contain a bitmap. Bitmaps should be as small as possible, rounded up to the nearest
• Eachdatapageis1KB(1KB=1024B).
• The Songs table is stored as a heap file using the page directory implementation.
– Each header page has 9 entries. There are 100 data pages.
• If a question requires the use of slotted pages, the implementation of slotted pages will be the same as seen
in lecture, discussion, and course notes.
• There are no indices on any field in the Songs table.
(a) (3 pt) What is the maximum size of a record from the Songs table?
(b) (3 pt) What is the maximum number of records that can fit onto one data page if variable length records are used?
(c) (3 pt) What is the worst case in I/Os for the following query (assume that there is enough space on an existing data page to insert the record)?
INSERT INTO Songs VALUES (123, 1, “Love Story (Taylor’s Version)”, 13, 235, False);
(d) (3 pt) What is the worst case in I/Os for the following query? DELETE FROM Songs WHERE in_top_100 = False;
Exam generated for 13
(e) (3 pt) Now Heardle is considering storing information in a heap file using the linked list implementation seen in lecture, notes, and discussion. The heap file contains 186 data pages, where one third of the data pages are full and the remaining pages contain free space.
What is the worst case in I/Os to insert a record into the file?
(f) (4 pt) Select all the statements below that are true. There may be zero, one, or more than one correct answer.
A. Storing variable length fields at the beginning of a record is a good optimization for quick access to a record’s fields.
B. Sorted Files are generally faster than heap files when searching for a range of records by key.
C. A heap file stored as a linked list performs the same or better than a page directory implementation
when inserting records.
D. Fragmentation can occur on an unpacked page with fixed length records.
E. None of the above.
Exam generated for 14
3. (21 points) The Power of B+ Trees
(a) (3 pt) Which of the following choices are true statements? Mark all that apply.
A. While traversing a B+ Tree, we need to do a binary search across all nodes in a layer to figure out which node to take to the next layer.
B. When a leaf node overflows when inserting into an Alt. 3 B+ Tree, we copy up the middle entry (out of 2d + 1 entries).
C. We can always assume clustered indexes incur 1 I/O per page of records regardless of caching or not. D. Bulkloading allows us to make use of temporal locality.
E. The main usage of B+ Trees is to sort the records.
F. None of the above.
(b) The Technoking
Due to current events, Bob wants to analyze Elon Musk’s Twitter tweets to see if he can derive any insights.
Bob has the following table tweets containing information about Elon Musk’s tweets:
CREATE TABLE tweets {
tid INTEGER PRIMARY KEY, // the tweet id
subject TEXT, // string that summarizes the tweet’s topic
time, INTEGER, // Unix timestamp of this tweet
likes INTEGER, // number of likes
retweets INTEGER, // number of retweets
spam BOOLEAN // whether Twitter marked this as spam or not
We have one index on this table: Index A:
• Alt. 2, clustered B+ Tree on (subject, time) with h = 2, d = 4
• All leaf nodes are full.
• Each data page stores 20 records
• The index was built on a file completely sorted on key (subject, time)
Additional Assumptions:
• The tweets table has exactly 400 data pages
i. (2 pt) Bob wants to find Elon’s Tweet where he first thought about purchasing Twitter. Assuming we are given subject = “Buying Twitter” and the time this tweet was posted, what is the minimum I/O cost to execute this query?
ii. (2 pt) While sitting in , Bob overhears a conversation saying Elon Musk’s latest tweet went viral and is now his most liked tweet ever. Curious, Bob decides to write a query to find this tweet. What is the minimum I/O cost to execute this query?
Exam generated for 15 iii. (3 pt) Bob wants to write a query to find all of Elon’s Tweets with the subject “Tesla”. Assume Bob’s
data includes 80 tweets with the subject “Tesla” at unique timestamps.
What is the worst case I/O cost of Bob’s query? You must use the index (assume the index is used properly and no extra I/Os incur). Assume the query is run independently of previous parts.
Exam generated for 16 (c) B+ Trees for Machine Learning
In machine learning, the kNN(k nearest neighbor) algorithm is an algorithm used to solve classification problems. The algorithm essentially works by:
i. Given a sample point, find the distance between the sample point and all other points. Assume distance is calculated using Euclidean distance (definition of Euclidean distance is irrelevant to this question).
ii. We take the k-closest neighbors and aggregate their labels. For example, neighbor 1 is class 1, neighbor 2 is class 1, neighbor 3 is class 2, neighbor 4 is class 3. We then have 2 class 1’s, 1 class 2 and 1 class 3.
iii. Predict our sample point as the class with the most neighbors. Using our example from above, our sample point would be labeled as class 1.
We will use a B+ Tree to speed up the finding k-closest neighbors step for a sample point sample.
Assume we are given the following lines of code:
/** Returns the Euclidean distance between point1 and point2 */
private float distance(Point point1, Point point2) {
// implementation hidden
/** Creates and adds a record corresponding to `point` into our B+ Tree.
* The record is added to our B+ Tree using `key`.
private void createAddRecord(
// implementation hidden
/** Fill up a B+ Tree with keys to make finding the K-nearest neighbors of `sample` faster.
* `sample` is the sample point we want to classify
* `points` is an array of all other points not including `sample`
private void fillUpBPlusTree(Point sample, Point[] points) {
for (Point point : points) {
// QUESTION 1
// QUESTION 2
i. (2 pt) What line of code should go in the line marked as // QUESTION 1? Only answer with a single
line of code and correct Java syntax will be given credit.
ii. (2 pt) What line of code should go in the line marked as // QUESTION 2? Only answer with a single line of code and correct Java syntax will be given credit.
Exam generated for 17
iii. (3 pt) Assume we built our B+ Tree correctly. In less than 3 sentences, explain how we would use our B+ tree to find the K closest neighbors to sample. Answers longer than 3 sentences will not be given credit.
iv. (4 pt) In what situations is our B+ Tree helpful when running kNN for sample? A. We want to get rid of outliers from the graph.
B. We want to find neighbors within a range of distance from the sample point. C. We want to rerun kNN multiple times.
D. We want to run kNN for a point which is not sample. E. None of the above.
Exam generated for 18
4. (15 points) Buffer Management (a) (3 points) True/False
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com