The University of British Columbia
Computer Science 404: Database Systems Midterm #2, February 11, 2015. Instructor: E.M. Knorr
Time: 48 minutes. Only a simple, non-programmable, non-communicating calculator is permitted. Closed book. No notes. No cell phones.
Name Student No (PRINT) (Last) (First)
Signature
The examination has 8 pages + 2 appendix pages, and that includes this cover sheet. Check that you have a complete paper consisting of 10 pages.
Print your name and ID at the top of this page, and provide your signature. Have your student ID ready.
You do not need to print your name on any pages other than the cover page because we use serial numbers.
A simple, non-programmable, non-communicating calculator is permitted. No other aids are allowed.
Work quickly and do the easy questions first. Part marks are available.
The marks for each question are given in braces. Do not spend too much time on any one question.
To minimize disruptions during the exam, please avoid asking the invigilators for help, tips, or explanations.
Write down any reasonable assumptions that you are making, if you believe that a question is ambiguous.
Marks
Question
Max.
Achieved
1
4
2
9
3
5
4
4
5
4
6
4
7
4
8
5
9
4
Total
43
Question 1 is a multiple-choice question worth 4 marks. You¡¯ll get 3 if you only get 1 wrong, 2 if you get 2 wrong, and 0 otherwise. Circle all TRUE answers. There may be as few as 0 TRUE answers to the question, and as many as 5 correct answers¡ªso be sure to read all parts of the question!
In case of ambiguity, please write down any reasonable assumptions that you need to make.
1. {4 marks} Which of the following statements about Alt. 2 extendible hash indexes are generally true? Assume that we are dealing with indexes on unique keys (i.e., no duplicates). A ¡°bucket¡± is a ¡°page¡±.
a) A bad hash function will still distribute keys pretty evenly across the entire hash structure.
b) It is often faster to find a unique key in a hash index than in a B+ tree index.
c) Usually, hash indexes can support range queries efficiently.
d) The local depth nub in an extendible hashing index bucket tells you how many
e) Hash buckets need to be at least half full.
Page 2
2. {9 marks} Consider the following schema for a mining table created by an investment firm. Assume that we use 4K pages (4096 bytes), that a tuple cannot span pages, and that we try to fill pages to capacity. All rid and child pointers are 10 bytes long, and as usual, we¡¯ll ignore housekeeping bytes. Don¡¯t worry about whether there is an odd or an even number of data entries in your answer.
Details: Of the many thousands of mines around the world, supposes an investment firm tracks 200 mines and has 10 full years¡¯ worth of data for each of those mines (assume 365.25 days/year). Numeric fields take up 4 bytes, dates are stored in integer format (e.g., 20150211 means February 11, 2015), and strings are 15 bytes each. Each data record contains 110 bytes. The table and its fields are:
MineDailyProduction (mineID, mineDate, KilogramsOfOre, TypeOfMetal, …) Suppose there is an Alternative 2 B+ tree clustering index on the MineDailyProduction table¡¯s
mineDate and mineID fields taken together (i.e., it is a composite key containing 2 fields). a) {5 marks} Compute the number of leaf pages (only) in this index. Show your work.
b) {2 marks} How many complete records will fit on one data page if we plan to leave approximately 30% free space on each data page of the table? Show your work. Make reasonable assumptions.
c) {2 marks} Note that the leading field of the clustering index¡¯s key is the date. Why is this firm planning to leave 30% free space on each data page? Be specific.
Page 3
3. {5 marks} Suppose we have the following entries that we¡¯re going to insert into an empty Extendible Hash index structure shown below. You can assume that each bucket can hold up to 3 entries. Use the same hash function from class (i.e., modulo last n bits of each binary value). The index contains no keys so far, but we¡¯ll start with pointers to 2 initially empty buckets.
Add these keys, in this order: 12, 15, 7, 24, 8, 6, 16¡ªand show the resulting structure every time the directory doubles, and at the end. Note that 12 = (1100)2.
For the ¡°end¡±, it is OK to add any remaining keys directly to the existing structure if it doesn¡¯t split again (to save you writing time).
Page 4
4. {4 marks} A data entry is a
5. {4 marks} Suppose we have a B+ tree of height 2 containing 28,000 unique primary keys, kept sorted in alphanumeric order. The leaf pages can contain a maximum of 140 data entries per page, and these leaf pages are filled to capacity. How many pages need to be retrieved into an empty buffer pool to determine the first 10,000 primary keys (only), in order? Show your work.
Page 5
6. {4 marks} Suppose a parent node in a B+ tree has 3 children: A, B, and C¡ªwhere the middle node is B. All 3 of A, B, and C are leaf pages, and they currently contain 56 data entries each. The order of the tree is 42. Using the principles of B+ tree deletions and maintenance from class, how many keys can we delete from B without having to merge two of A, B, and C? Assume that only node B has keys that are being deleted from the tree. Briefly, justify your answer.
7. {4 marks} Consider a B+ tree index of order d = 5 (i.e., max. 10 entries for any of its nodes/pages). What is the maximum possible number of data entries at the leaf level that can appear in this B+ tree, if its height is exactly 2 (i.e., has 3 levels)?
Page 6
8. {5 marks} Compute the total number of cylinders that the disk arm has to move for each of these 2 algorithms: (a) the regular elevator algorithm (SCAN with LOOK), just like in class; and (b) the SSTF (Shortest Seek Time First) algorithm. Let us assume that the disk assembly is currently at cylinder 0 and is heading higher.
The cylinder requests that have already arrived are, in order: 1500, 2100, 4000, and 800. Assume that immediately after fetching the page(s) (i.e., completing the service) for cylinder 2100 that the following requests arrive nearly simultaneously: 1300, 2000, 4500, and 1600. Thus, there are 8 service requests in all.
Page 7
9. {4 marks} Consider the SYSIBM.SYSCOLUMNS and SYSIBM.SYSTABLES DB2 catalog tables on the next 2 pages, and the query below. Recall that a TBCREATOR is a unique identifier in case multiple tables have the same name.
In plain English, in one sentence, explain what the purpose of the following (relatively simple) SQL catalog query is.
SELECT FROM
WHERE
C.TBCREATOR, C.TBNAME SYSIBM.SYSTABLES T, SYSIBM.SYSCOLUMNS C C.NAME = ‘SALARY’
AND
C.TBNAME = T.NAME
AND
C.TBCREATOR = T.CREATOR AND
T.TYPE = ¡®T¡¯
AND
T.DBNAME = ¡®PAYROLL¡¯;
Page 8