The University of British Columbia
Computer Science 404: Database Systems Midterm #1, January 21, 2015. Instructor: E.M. Knorr
Time: 48 minutes. Only a simple, non-programmable, non-communicating calculator is permitted. Closed book. No help sheets. No cell phones, no smartphones, or other devices.
Name Student No (PRINT) (Last) (First)
Signature
The examination has 6 pages, but that includes
this cover sheet. Check that you have a complete paper.
Print your name and ID at the top of this page, and provide your signature. Have your student ID ready.
You do not need to print your name on any pages other than the cover page because we’re using serial numbers.
A simple, non-programmable, non-communicating calculator is permitted. No other aids are allowed.
No cell phones of any kind.
Work quickly and do the easy questions first. Part marks are available.
The marks for each question are given in braces. Do not spend too much time on any one question.
To minimize disruptions during the exam, please avoid asking the invigilators for help, tips, or explanations. Please write down any reasonable assumptions that you are making, if you believe that a question is ambiguous.
Marks
Question
Max.
Achieved
1
4
2
4
3
4
4
3
5
6
3
4
7
3
8
4
9
6
10
4
11
4
12
5
Total
48
Questions 1-3 are multiple choice {i.e., 4 marks each—you get 3 if you only get 1 wrong, 2 if you get 2 wrong, and 0 otherwise}. Circle all TRUE answers. There may be as few as 0 answers to circle for a given numbered question, and as many as 5 to circle—so be sure to read all parts of each question. (In case of ambiguity, please write down any reasonable assumptions that you need to make.)
1. {4 marks} Which of the following statements are true? Circle the letter(s) for ALL correct answers. a) Caching can occur at levels between the registers and RAM, but not at the disk level or higher (e.g., disk, tapes, Internet).
b) It is probably not a good idea to put 10 or more indexes on a frequently-updated table.
c) The main purpose of an index is: to quickly detect duplicate keys during SQL INSERT operations.
d) A database system like DB2 maintains an exclusive lock on the files containing the data table and its indexes.
e) It is usually a good idea to leave 15-20% free space on each page of a read-only table such as a large Sales table containing historical data for the years 2010-2014.
2. {4 marks} Which of the following statements are true about disks and buffer pools?
a) The more platters and surfaces that you have in a given disk drive, the greater the total number of tracks on that disk drive.
b) Sequential flooding refers to the case where we have many users all wanting to get service from the same disk drive, at about the same time.
c) The MRU page replacement algorithm is useful for database utilities that make only a single pass through the data pages (i.e., when you only have to read each page once).
d) The more pages that you can fit on a track, the less time it takes to read that track.
e) Suppose we have a large buffer pool, and a small SeattleSeahawks table. When millions of Super Bowl (football) fans want to access the table at about the same time using an SQL SELECT statement, then each page will have to be brought into the buffer pool many times— thus slowing down performance.
3. {4 marks} Which of the following statements are true about indexes, tables, and DB2 (as per our DB2 z/OS case study in class)?
a) When we reorganize a table in a database, it’s a good idea to rebuild its indexes.
b) When changes occur to a record on a page in a DB2 table, the (ARIES) log contains the before and after cases for the changed data.
c) One of the main reasons for self-tuning (or autonomic) database systems is to reduce the number of decisions that a database/technical support person has to make about the DBMS.
d) Every column in a table must appear in some index.
e) A single instance of a DB2 table (e.g., the live, production Payroll table) cannot be owned by two different databases.
Page 2
The questions on this page are short answer questions. Please do not write too much.
4. {3 marks} Briefly (one or two sentences, or point form) explain why a table cannot be clustered in several different ways at the same time. In other words, why can’t we have several different clustering indexes for the same table?
5. {3 marks} Instead of using a reference bit, suppose we use a pin count to keep track of our pages in the buffer pool. Our goal is that if a page is in use by a transaction (like SELECT or UPDATE), then we try to “pin” it (i.e., keep it) in the buffer pool until that transaction either commits or aborts. We want these pages to stay in the buffer pool longer, and therefore victim pages should come from committed or aborted transactions. But, suppose the buffer pool is full of pinned pages, and the user of a pinned frame has gone away for lunch. What can the DBMS do to allow a new transaction to safely start using the buffer pool? Provide a brief answer—one sentence is fine.
6. {4 marks} Explain the difference between random reads and sequential reads, and briefly argue why one is preferable over the other. One or two sentences (or point form) are fine.
Page 3
Use this information for Questions 7-10. The Doctor table contains 13 columns. It has 10 numeric fields each taking up 4 bytes of space, and it has 3 alphanumeric fields each taking up 25 bytes of space. We’re using 4K pages and no Doctor record is allowed to span 2 pages (i.e., you can’t put a partial row on a page).
Use the following disk geometry specifications for these next few questions:
Seek time = 0 if on the same cylinder, else (1 ms setup time + 1 ms for every 800 cylinders moved)
Rotation speed = 14,000 rpm (revolutions per minute)
Transfer time = only the time it takes for the complete block to pass under the head
No parallel reads are allowed
Tracks per cylinder = 8
Number of cylinders = 32,000
Sector size = 2,048 bytes
Sectors on a track = 16,384
Block (Page) size = 4K (4,096 bytes)
Assume that you are the only user on the disk drive, and assume that there are no other requests queued. List any reasonable assumptions that you need to make. The arm/head assembly is currently at some unknown cylinder on the disk drive.
7. {3 marks} How many 4K pages will fit on a single cylinder? Show your work.
8. {4 marks} How many Doctor records will fit on a single cylinder? Remember that a record cannot span two pages (i.e., the complete record must fit on a single page). Show your work.
Page 4
9. {6 marks} If you’re currently on cylinder 23,000, then how long will it take us to read all of cylinders 23,001-23,005, inclusive, assuming that there is plenty of room in the buffer pool. Use the same disk geometry specifications as on the previous page. Provide answers in milliseconds, rounded to 3 places after the decimal (e.g., 2.073 ms). Show your work.
10. {4 marks} What is the capacity of the whole disk drive in terabytes (TB), when using these conversion factors: 1 TB = 1,024 GB, 1 GB = 1,024 MB, and 1 MB = 1,024 KB. Show your work.
Page 5
11. {4 marks} The LRU (Least Recently Used) algorithm created the following table of BP frames. Determine the reference string that produced it. Write the reference string in the empty boxes at the top of the table.
Reference String
Frame 1
10
10
10
10
35
35
Frame 2
40
50
50
50
50
50
Frame 3
70
70
90
90
90
25
Page Fault?
N
Y
Y
N
Y
Y
12. {5 marks} Use the clock algorithm for this question. (This is the algorithm without the dirty bit, and for which the reference bit only takes on the values 0 & 1.) Complete the table below to show the contents of the buffer pool for each I/O request in the reference string, just like in class. Assume that the buffer pool starts out empty. Circle the current oldest page in the buffer pool for each column (starting with the frame for “25 / 1”). Also, be sure to check the box for page faults, when they occur.
Ref. String
25
3
82
3
89
3
24
3
15
Frame 1
25 /1
Frame 2
24 /0
Frame 3
89 /0
Page Fault?
Y
Page 6