The University of British Columbia
Computer Science 404: Database Systems Midterm #2, October 23, 2018. Instructor: E.M. Knorr
Time: 75 minutes. Only a simple, non-programmable, non-communicating calculator is permitted. Closed book. No notes. No cell phones. Have your student ID card ready.
Name Student # (PRINT) (Last) (First)
Signature
The examination has 10 pages, including this cover sheet. Check that you have a complete paper. Print your name and ID at the top of this page, and provide your signature.
You do not need to print your name on any pages other than the cover page because we use serial numbers.
A simple, non-programmable, non-communicating calculator is permitted. No other aids are allowed.
Work quickly and do the easy questions first. Part marks are available.
The marks for each question are given in braces. Do not spend too much time on any one question.
To minimize disruptions during the exam, please avoid asking the invigilators for help, tips, or explanations.
Write down any reasonable assumptions that you are making, if you believe that a question is ambiguous.
Marks

Question    Max.    Achieved
1            4
2            4
3            4
4            6
5            6
6            4
7            4
8            7
9           10
10           6
11          10
Total       65

Questions 1-3 are multiple-choice questions. Circle all TRUE answers. There may be as few as 0 TRUE answers to a question, and as many as 5 TRUE answers, so be sure to read all parts of the question! For each question, you'll get full marks if you get every part right, 3 marks if you get only 1 wrong, 2 marks if you get 2 wrong, and 0 marks otherwise.
In case of ambiguity, please write down any reasonable assumptions that you need to make.
1. {4 marks} This question is about B+ trees. Unless told otherwise, assume that the index holds unique keys (i.e., no duplicate keys are allowed).
a) If a multi-level B+ tree has order 2, then that means it has at most 4 children per internal node.
b) In a B+ tree: All things being equal, a longer search key (in bytes) results in a smaller fanout.
c) In a B+ tree: For a large index, a longer search key (in bytes) usually results in a tree with fewer levels.
d) Consider a B+ tree index on student ID number. If there are 1000 leaf pages, you may have to visit all 1000 leaf pages if you’re searching for a specific key that does not exist in the index (e.g., when searching for student ID #12345678 that doesn’t exist).
e) Depending on the key distributions, it is possible for a B+ tree to have different path lengths (heights) for some of its leaf nodes. For example, it is possible for leaf page x to have path length 3, and leaf page y to have path length 4.
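As a study aid, the fanout trade-offs behind parts (b) and (c) can be sanity-checked with a little arithmetic. This is a minimal sketch, assuming the 4K pages and 10-byte pointers used elsewhere in this exam and ignoring overhead bytes; the 40-byte key length is just an illustrative value:

```python
import math

def fanout(page_size, key_len, ptr_len):
    """Max children per internal node: one (key, pointer) index
    entry per child, ignoring overhead bytes (exam convention)."""
    return page_size // (key_len + ptr_len)

def height(num_leaves, f):
    """Levels above the leaf level needed to cover num_leaves pages."""
    return math.ceil(math.log(num_leaves, f))

# A longer key shrinks the fanout...
short = fanout(4096, 4, 10)     # 4096 // 14 = 292 children
long_ = fanout(4096, 40, 10)    # 4096 // 50 = 81 children

# ...so for a large index it can only add levels, never remove them:
print(height(1_000_000, short), "vs", height(1_000_000, long_))
```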
2. {4 marks} This question deals with prefix-key compression and with bulk loading in an Alt. 2 index.
a) Prefix key compression takes place in the non-leaf nodes.
b) Prefix key compression works best for data that is relatively static (meaning that updates don’t occur very often).
c) Prefix key compression can lead to shallower B+ trees.
d) When loading a large table via bulk loading, we can turn on logging to speed things up just in case the bulk load job fails.
e) When building indexes via bulk loading, there is usually no need to sort the keys before loading them into the leaf pages of a B+ tree.
3. {4 marks} Which of the following statements is/are true?
a) A bucket in a hash index usually contains exactly one data entry per page.
b) Looking up a primary key in a hash index tends to be faster than looking it up in a B+ tree.
c) During the operation of the merge phase of the external mergesort algorithm, it is possible for a sorted run to have more pages than the size B of the buffer pool that we are using.
d) A memory-resident file can be sorted in one pass.
e) In the sort phase of External Mergesort, we must reserve space for an output buffer.
Page 2

4. {6 marks} Insert the following entries (in the order given) into a B+ tree that is order 1 (max. 2 entries per page). Show the result at each split of the B+ tree. (Don’t show the state of the tree for every key you add; otherwise, it takes too much time.) Be sure to show the final result of the tree. Follow the notation used in the textbook and in class; so, if the number of keys doesn’t split evenly, put the extra one in the right-hand node.
Insert the keys in this order: 1260, 70, 40, 2520, 12, 8, 9, 3.
Start with an empty B+ tree root node containing two blank entries. Show the keys that you’re inserting at each stage.
Page 3

5. {6 marks} Suppose we have an Extendible Hash index structure shown below. You can assume that each bucket can hold up to 3 entries. Use the same hash function from class (i.e., modulo last n bits of each binary value). All of the nubs are 1 at the start; the first bucket holds keys 2, 4, and 6; the second bucket holds keys 1, 3, and 5.
a) Add this key: 12—and show the index structure after it is successfully inserted:
b) Let us build on your answer to (a). Provide an example of the minimum number of unique keys that need to be added to (a)’s result so that the directory is capable of holding 8 arrows (because the global depth nub is 3; note that 2^3 = 8). Draw the resulting structure with the new keys that you plan to insert into the result of (a). Use the space below to draw the complete final index structure.
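For review purposes, the hash function from class (the last n bits of the key, i.e., key mod 2^n) can be sketched as follows; the starting keys are those given in the question, and the placement of key 12 is shown under the stated assumption that each bucket holds 3 entries:

```python
def last_n_bits(key, n):
    """Hash function from class: the last n bits of the key,
    i.e., key mod 2**n."""
    return key & ((1 << n) - 1)

# With global depth 1 (n = 1), keys split on the last bit:
evens = [k for k in [2, 4, 6, 1, 3, 5] if last_n_bits(k, 1) == 0]
odds  = [k for k in [2, 4, 6, 1, 3, 5] if last_n_bits(k, 1) == 1]
print(evens, odds)          # [2, 4, 6] [1, 3, 5]

# Inserting 12 overflows the full "even" bucket, forcing a split
# on the last 2 bits: 12 = (1100)2, so it lands in bucket 00.
print(format(12, '04b'), format(last_n_bits(12, 2), '02b'))
```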
Page 4

6. {4 marks} Suppose you have B = 60 buffer pages, and you want to sort a file containing 12,000 pages using general multiway external mergesort (GMEMS). For each pass, compute: the number of sorted runs produced at each pass, and the size of each sorted run (in pages) at each pass, including the final pass. Show all your work.
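The GMEMS bookkeeping above can be checked with a short sketch: the sort phase produces ceil(N/B) runs of B pages each, and every subsequent pass does a (B-1)-way merge (the last run of a pass may be smaller than the sizes shown):

```python
import math

def gmems_passes(N, B):
    """Runs and maximum run sizes per pass of general multiway
    external mergesort: the sort phase makes ceil(N/B) runs of up
    to B pages, then each merge pass does a (B-1)-way merge."""
    runs = math.ceil(N / B)
    passes = [("sort", runs, B)]
    size = B
    while runs > 1:
        runs = math.ceil(runs / (B - 1))
        size = min(size * (B - 1), N)
        passes.append(("merge", runs, size))
    return passes

for phase, runs, size in gmems_passes(12_000, 60):
    print(phase, runs, "runs of up to", size, "pages")
```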
7. {4 marks} Consider an Alt. 2 B+ tree index containing only unique keys, with a maximum of 6 entries for any of its pages.
What is the maximum total number of: (a) nodes in the whole tree, and (b) keys in the leaf pages … that you can have in a tree of height 2 (i.e., a 3-level tree)? In other words, it will not be possible to put even one more key into it, and still have height 2. Justify your answer by showing your work. It may help to sketch enough of the tree to support your answer.
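The counting argument can be sketched as below; it assumes the textbook convention that an internal node holding k keys has k + 1 child pointers, so a page with the maximum 6 entries has 7 children:

```python
def max_tree(levels, max_entries):
    """Max node count and leaf-key count for a full B+ tree with
    the given number of levels, assuming a node with k keys has
    k + 1 children (textbook convention)."""
    fanout = max_entries + 1            # 6 keys -> 7 children
    nodes_per_level = [fanout ** i for i in range(levels)]
    leaves = nodes_per_level[-1]
    return sum(nodes_per_level), leaves * max_entries

nodes, keys = max_tree(levels=3, max_entries=6)
print(nodes, keys)    # 1 + 7 + 49 = 57 nodes; 49 * 6 = 294 keys
```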
Page 5

Note: In this exam, unless mentioned otherwise, assume that all indexes are Alt. 2; have page size 4K (i.e., 4096 bytes); no record or data entry (i.e., key + rid) or index entry (i.e., key + pointer) can span across two pages; that all pointers (including each rid) are 10 bytes long; and as usual, we’ll ignore housekeeping bytes (e.g., overhead bytes, sibling pointers) on all the pages. Don’t worry about whether there is an odd or an even number of data entries or records in your answer; just compute one of these two. Always show your work.
8. {7 marks} Consider the following information about a table and an index in a Realtor database. A record takes up 120 bytes.
a) If we have 89,000 records in the Homes table, and we try to fill each 4K data page to capacity, how many pages would we need?
b) For the Homes table, suppose we create a B+ tree clustering index on the combination of SellerID (4 bytes), BuyerID (4 bytes), and Location (15 bytes). Thus, we will have a composite search key. Compute the number of pages found at each level of the B+ tree, assuming that we want to fill the pages to capacity.
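The page arithmetic for both parts can be sketched under the exam's conventions (4K pages, 10-byte rids/pointers, no overhead bytes, pages filled to capacity); note that with a 23-byte composite key, a leaf entry (key + rid) and an index entry (key + pointer) happen to be the same size here:

```python
import math

PAGE = 4096

# (a) 120-byte records, pages filled to capacity
recs_per_page = PAGE // 120                 # 34 records per page
data_pages = math.ceil(89_000 / recs_per_page)
print(data_pages, "data pages")

# (b) composite key: SellerID 4 + BuyerID 4 + Location 15 = 23 bytes;
# each entry (key + 10-byte rid or pointer) is 33 bytes, so the
# leaf capacity and the non-leaf fanout both come out to 124 here.
per_page = PAGE // (23 + 10)
levels, n = [], math.ceil(89_000 / per_page)
while True:
    levels.append(n)
    if n == 1:
        break
    n = math.ceil(n / per_page)
print(levels, "pages per level, leaf level first")
```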
Page 6

9. {10 marks} Suppose we have the following:
A B+ tree index with N = 150,000 leaf pages and a 4-level (height = 3) Alt. 2 index of customer IDs. The root page has 15 children.
An extendible hash index on the customer IDs. There are N = 150,000 Alt. 2 buckets and a 4-page directory pointing to those buckets.
The buffer pool is large and currently empty. (Assume this for every one of the questions below.)
Let us assume that exactly 20 records can fit on each data page, and that the data pages are filled to capacity.
The search key is ONLY on the customer ID, and it is an integer that takes up 4 bytes.
Answer the following questions, and in each case, briefly justify your answer. You can assume that the source of all of these queries is an SQL statement, which you do not have to write. You can assume we want to know the total number of disk pages: index pages and data table pages, as needed.
a) We want to display the full data records of 5 different customers. In the best case possible, what is the smallest number of page I/Os that can achieve this? Briefly, justify your answer.
i. What is your answer using the hash index?
ii. What is your answer using the B+ tree?
b) We want to display the complete data records of 25 different customers. In the worst case, what is the number of page I/Os that are needed? Briefly, justify your answer.
i. What is your answer using the hash index?
ii. What is your answer using the B+ tree?
c) If we have 150,000 leaf pages, and we fill the leaf pages to capacity, then how many records are in the data table?
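Part (c) is pure arithmetic under the stated conventions (4-byte integer keys, 10-byte rids, leaves filled to capacity, 20 records per data page); a quick sketch:

```python
PAGE = 4096

# Each Alt. 2 data entry is key + rid = 4 + 10 = 14 bytes.
entries_per_leaf = PAGE // (4 + 10)       # 292 entries per leaf page
records = 150_000 * entries_per_leaf      # one data entry per record
data_pages = records // 20                # 20 records per data page
print(entries_per_leaf, records, data_pages)
```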
Page 7

10. {6 marks} Consider the following Linear Hashing index. Assume that we allow a maximum of 3 data entries per bucket.
Assume that the splitting rule is as follows (just like in class): If you try to insert an entry into a location that currently has no room for another data entry anywhere in its bucket/overflow chain, then you’ll trigger a split (but there’s at most one split per insertion). If you need to make any other reasonable assumptions, write them down.
next = 0; Level = 0, so we’re going to use just the last 2 bits to start. Binary example: 25 = (11001)2.

Bucket 00: 16, 8, 4
Bucket 01: 25, 9, 1
Bucket 10: 6, 2
Bucket 11: 3

Add the following 5 keys in this order, and show the result of the complete index structure at the end. (It’s OK to overwrite the structure above during your work, but show the final result of the complete structure below.)
Add these keys, in this order: 15, 10, 18, 32, 17
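For review, the hash family from class can be sketched as below; since next = 0 before any split, every incoming key is initially placed with h0 (the last 2 bits), and only buckets before `next` would use h1 (the last 3 bits):

```python
def h(key, bits):
    """The hash family from class: the last `bits` bits of the key."""
    return key & ((1 << bits) - 1)

# Where each incoming key hashes with h0, before any splits occur:
for k in [15, 10, 18, 32, 17]:
    print(k, "->", format(h(k, 2), '02b'))
```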
Page 8

11. {10 marks} This question is about external mergesort. We have an unsorted input file that fills 600 contiguous cylinders. The buffer pool is 20 cylinders in size. Our disk geometry specs state that we have 10 tracks per cylinder, so that means the buffer pool holds 200 tracks. (Don’t worry about the number of individual pages.)
Let us assume that the arm/head on the disk drive is currently at some unpredictable location. There are multiple users of the disk drive, but once you start a read or write operation, you do so uninterrupted until the number of pages of your request is completed, even if you need to access multiple tracks or cylinders in the same disk request. Furthermore, you always read and write contiguously.
A short seek (SS) is to an adjacent (neighbouring) cylinder only. All other seeks are called long seeks (LS), even if you move just two cylinders over. Do not include time for rotation or transfer.
There are 2 questions for you to answer, and you do not have to calculate any milliseconds.
a) Compute the number of long seeks (LS) and short seeks (SS) that it takes to READ the unsorted file into memory during the first phase or pass (i.e., during the sort phase), and the number of LS and SS operations that it takes to WRITE out the sorted runs from this phase. Show your work. (Do not compute the merge step on this page; we’ll do that in part (b) on the next page.)
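One plausible way to organize the count for part (a) is sketched below. This is a study sketch, not the official solution; it assumes each 20-cylinder chunk is read straight through (so moves between adjacent cylinders are short seeks) and that each sorted run is written contiguously somewhere else on disk, so every read or write request begins with a long seek:

```python
CYLS, BUF = 600, 20          # file size and buffer pool, in cylinders

chunks = CYLS // BUF         # 30 chunks, i.e., 30 sorted runs

# Reading one 20-cylinder chunk: one long seek to reach it, then
# 19 short seeks as we step through the 19 adjacent cylinders.
read_LS, read_SS = chunks, chunks * (BUF - 1)

# Writing each 20-cylinder run contiguously elsewhere: again one
# long seek per run plus 19 short seeks.
write_LS, write_SS = chunks, chunks * (BUF - 1)

print(read_LS + write_LS, "LS and", read_SS + write_SS, "SS")
```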
Page 9

b) We still have 20 cylinders (i.e., 200 tracks) of space available in the buffer pool for the merge phase, and we’re going to break down the input buffers and output buffer, as follows. Use 5 tracks (0.5 cylinders) for every input buffer, and the rest for the output buffer. Compute the number of long seeks (LS) and short seeks (SS) that it takes to do the merge phase (both reading and writing) to produce the final sorted file. Show your work.
i) First, provide a sketch of the merge phase (like in class), labelling the input buffers and their sizes, and the output buffer and its size.
ii) Then, compute the number of LS and SS seek operations. Provide a total for the merge part (i.e., Part (b)), but you don’t have to add in your calculations from Part (a).
Page 10