The University of British Columbia
Computer Science 404: Database Systems Midterm #1, September 27, 2018. Instructor: E.M. Knorr
Time: 70 minutes. Only a simple, non-programmable, non-communicating calculator is permitted. Closed book. No help sheets. No cell phones, no smartphones, etc.
Name Student No (PRINT) (Last) (First)
Signature
The examination has 11 pages, but that includes
this cover sheet. Check that you have a complete paper.
Print your name and ID at the top of this page, and provide your signature. Have your student ID ready.
You do not need to print your name on any pages other than the cover page because we’re using serial numbers.
A simple, non-programmable, non-communicating calculator is permitted. No other aids are allowed.
Work quickly and do the easy questions first. Part marks are available. Show your work for all calculations!
The marks for each question are given in braces. Do not spend too much time on any one question.
To minimize disruptions during the exam, please avoid asking the invigilators for help, tips, or explanations. Please write down any reasonable assumptions that you are making, if you believe that a question is ambiguous.
Marks
Question
Max.
Achieved
1-3
12
4
9
5
9
6
3
7
6
8
7
9
6
10
6
Total
58
Questions 1-3 are multiple choice {i.e., 4 marks each—you get 3 if you only get 1 wrong, 2 if you get 2 wrong, and 0 otherwise}. Circle the letter in front of all true statements. There may be as few as 0 correct answers to circle for a given question number, and as many as 5 correct answers—so be sure to read all parts of each question. (In case of ambiguity, please write down any reasonable assumptions that you need to make.)
1. {4 marks} As discussed in class, which of the following statements are true?
a) To improve performance, a DBMS buffer pool is often stored on a fast disk drive.
b) DB2 catalog statistics are updated whenever a transaction involving one or more SQL
INSERT, UPDATE, or DELETE statements commits.
c) Larger buffer pools tend to result in slower response times for typical SQL queries.
d) If we have two buffer pools used by a DBMS, it’s possible that each of them can use
different page replacement algorithms.
e) If the page size is 4K, and we have a 1 GB buffer pool that’s currently empty, and we issue
an SQL SELECT query to read an entire 500 MB data table, then we will get a lot of page faults.
2. {4 marks} Which of the following statements are true?
a) If there are t tracks per surface, and s surfaces, then the total number of cylinders on that disk
drive is t * s.
b) On a disk drive with only 1 platter (but 2 surfaces), it takes the same amount of time to read
track number 3588 as it does to read cylinder 3588.
c) The clock algorithm is one of the better disk scheduling algorithms.
d) When a disk drive spins faster (higher RPM), then its seek time will go down.
e) For its transactions, DB2 uses a STEAL/NO FORCE approach to buffer pool management.
3. {4 marks} Which of the following statements are true?
a) Both the C-SCAN (circular scan) and FCFS (first come, first serve) disk scheduling
algorithms have the advantage of fairness when servicing disk requests, compared to SSTF
(shortest seek time first).
b) In terms of the execution of DB2 utility programs in the real-world: we tend to run more
image copy jobs than reorg jobs. (A “reorg” is a reorganization of a table and its indexes.)
c) Hash indexes are great for looking up a record based on its primary key.
d) In a university application, it is usually a good idea to cluster the Student table by
StudentNumber, if the student numbers were originally generated randomly for each student.
e) Metadata for DB2 database objects is stored in the DB2 catalog.
Page 2
4. {9 marks} Consider the following disk geometry specifications. Show your work below.
Seek time = 0 if on the same cylinder, else (1 ms setup time + 1 ms for every 500 cylinders moved)
Rotation speed = 16,000 rpm (revolutions per minute)
Transfer time = only the time it takes for the complete block to pass under the head
Tracks per cylinder = 16
Number of cylinders = 32,000
Sector size = 1,024 bytes
Sectors on a track = 16,384
Block (Page) size = 4K (4,096 bytes)
Assume that you are the only user on the disk drive, and assume that there are no other requests queued.
a) {3 marks} If there are 50 records per 4K page, then how many records can you fit on a cylinder?
b) {4 marks} Suppose you’re on cylinder 5000 on the disk drive, and you want to read track number 8000 on surface #5 (out of 1-16 surfaces), and track number 8000 on surface #7. How long would it take to read both tracks? (i.e., compute the total access time)
c) {2 marks} On average, how long does it take to read a randomly selected page from this disk, assuming a random cylinder starting position for the disk arm/head? Let us assume it takes 0.0 ms for transfer time for the page.
Page 3
5. {9 marks} For the following page replacement algorithms, fill in all the missing parts.
a) {5 marks} Use the clock algorithm for this question. (This is the algorithm without the dirty bit, and for which the reference bit only takes on the values 0 & 1.) Complete the table below (including the entire missing left column, with one set of values that work). Show the contents of the buffer pool for each I/O request in the reference string. Assume that the buffer pool does not start out empty. Circle the current oldest page in the buffer pool for each column in the table. Enter “Y” for page faults, when they occur.
Ref. String
50
40
40
12
88
Frame 1
50 /1
Frame 2
33 /0
Frame 3
22 /1
Page Fault?
Y
b) {4 marks} Use the extended clock algorithm for this question. This is the algorithm with both a reference bit and a dirty page bit—and the values for these counters can only be 0 and 1. Complete the table below to show the contents of the buffer pool for each I/O request in the reference string—just like in class. The oldest page is circled. Fill in the rest of the details in this table to the right of the column that I’ve filled in, and show the circle in the final column. (Just in case it’s helpful to you the transitions are as follows: (1) 0/0 or 0/0* becomes the victim page; (2) 0/1 becomes 0/0*; (3) 1/0 becomes 0/0; (4) 1/0* becomes 0/0*; and (5) 1/1 becomes 0/1.)
Reference String
246w
88
360
404w
Frame 1
777 0/1
Frame 2
246 1/1
Frame 3
404 0/0*
Page Fault? (y or n)
Y
Which page do we write out, if any, at this point?
none
Page 4
6. {3 marks} How fast would a disk have to spin (in RPM) if we wanted to transfer (transmit) one track in exactly 3.5 ms (or very close to it)? Show your work. Ignore rotational delay and seek time. For precision, just round the RPM value to the nearest integer.
7. {6 marks} We’re about to load a DB2 table with 500,000 records of size 30 bytes per record. (Remember, you can’t split a record across 2 pages.) Suppose we specify PCTFREE 30 and FREEPAGE 19, so that we have 30% free space on each page, and after every 19 pages, we leave a blank page. In other words, every 20th page is blank. How many 4K pages will be needed? (Don’t worry if you’re one page off.)
Page 5
8. {7 marks} The Pharmacy database has many tables. Two of the tables have these schemas, where integers are 4 bytes, variable-character fields need 2 extra bytes to store their length, and the number of bytes for character fields are as shown. Show your work for the questions below.
Drug( drugID (int), brandName (varchar(25)), genericName (varchar(50)), manufacturerCode (char (3)) )
Dosage( drugID (int), deliveryTypeCode (char (2)), unitType (int), dose(int) )
a) {2 marks} Compute the maximum number of Dosage records that can fit on a 4K page, if we want to leave 18% free space on each page. Remember that a 4K page has 4096 bytes, and records cannot span 2 pages (i.e., use full rows only on the page).
b) {2 marks} If we allocate no free space, but we fill our variable-length fields to capacity, then how many complete Drug records can we fit on a 4K page?
c) {3 marks} If a Drug record has an average of 15 characters for the brand name (e.g., names like “Tylenol”), and an average of 22 characters for the generic name (e.g., names like “Acetaminophen”), then what percentage of space would you save (to the nearest 1%) using varchar’s … compared to using fixed-length characters for those same varchar fields? In other words, how much space, in percent, would you save? If you wish, you can assume that there are 5000 rows in the Drug table.
Page 6
9. {6 marks} Compute the service completion schedule (i.e., cumulative completion times, like in class and the exercises) for the following table of incoming single-page disk requests, using the Elevator Algorithm (with Look ahead: go only as far as necessary). Please note the arrival time for each request!
When traveling from one cylinder to another (e.g., cyl. 100 to cyl. 200), it is OK to be interrupted along the way by a request for cyl. 150, providing cyl. 150 is on or between your current location and cyl. 200.
We have just finished servicing cylinder 5000, and are waiting for where to go next, since there were no more requests in the service list. Seek time is 1 ms for setup time, plus 2 ms for every 1000 cylinders moved. A complete revolution takes 6 ms, and for simplicity, the transfer time for one page is 0.0 ms. Be sure to show your work. We’ll assume average times to get each page. We start at time = 0 ms.
We are requesting a specific page from the following cylinders:
Here is the arrival time of the request for that cylinder (ms)
9000
0
5000
8
7000
15
8000
18
Page 7
10. {6 marks} Consider the following pages (attached) of DB2 catalog data for SYSIBM.SYSCOPY.
A bank is conducting an audit. They would like to have a list of all the backups (image copies) performed for the ACCOUNT database. There are only two kinds of image copies that they’re interested in: full image copies, and non-full image copies (these are called incremental image copies).
Write an SQL query to list these backups for the database by reporting: all the tablespace names, an indicator of whether this was a full or incremental image copy, the number of pages actually copied to the backup file, the name of the backup file (this is just called the backup’s dataset name), and the date and time on record for this image copy (called its timestamp). Sort the rows in descending order by timestamp, so that the newest image copy appears first. You can assume that the timestamp is something like YYYY-MM-DD-HHMMSS…
Page 8