Q1. Relational Algebra [9 marks, 3 marks each]
Consider the following relational database, where the primary keys are underlined.
employee (person_name, city)
works (person_name, company_name, salary) company (company_name, business)
Give an expression in relational algebra to express each of the following queries
a) Find the names of all employees who do not work for company “Microhard”, if there exist some employees who are self-employed and do not work for any company.
person_name (employee) – person_name (company_name = “Microhard” (works)) b) Find the names and cities of all employees who work for IT companies (IT is the name
of the business).
person_name, city (employee ⨝ works ⨝ business = “IT” (company)))
c) For each company with number of employees greater than 100, list the company name and the business.
t ← company_name γ count(person_name) as num_employees (works) company_name, business (company ⨝ (num_employees > 100 t))
Q2. Indexing [10 marks]
a) Consider a file with 30,000 records of size 100 bytes stored on a disk with block size 1,024 bytes and a record cannot span multiple blocks. Suppose that a secondary index is constructed on a candidate key of the file (i.e., the search key of the secondary index is a candidate key of the file). Suppose that the search key is 10 bytes long and a pointer is 6 bytes long. Find the number of block accesses required to perform a binary search for a record using the index. Show the steps clearly (no steps, no marks).
[4 marks]
Number of index entries per block = (1024 / (10+6)) = 64
Number of index entries = number of records in the data file = 30000
Number of index blocks = (30000/64) = 469
Number of block accesses required to perform a binary search for an index record = log2469 = 9
Number of block accesses required to search for a record using the index = 9 + 1 = 10
b) Consider the following B+-tree with n=4. What is the minimum number of search- key values you must delete for the tree to shrink down by one level? Show the sequence of deletions and draw a diagram for each deletion.
[6 marks]
38
15
27
34
70
70
77
27
28
30
38
47
5
7
34
35
15
17
2 search-key values have to be deleted. -5
-70
38
27
34
70
70
77
38
47
34
35
27
28
30
7
15
17
27
34
38
38
47
77
34
35
27
28
30
7
15
17
Q3. Hashing [6 marks]
Suppose that we are using extendable hashing on a file that contains records with integer search-key values. Suppose the hash function is h(x) = x mod 32 which generates 5-bit values and each bucket can hold two records. In the following figure, some records have been inserted. Draw the structure after a new record with search-key value equal to your own student ID is inserted. You need to show the i value of the bucket address table and the ij value of each bucket in your diagram.
14, 35
57, 20
Suppose the student ID is 51234567.
2
51234567, 35
2
2
14
1
57, 20
Query processing and optimization [25 marks]
Consider the following relations, where the keys are underlined:
ENGINEER (ID, Name)
PROJECT (PID, ICEngID)
The ICEngID attribute in PROJECT is the ID of the engineer who is in charge of the project and PID is the ID of the project.
Consider the following query.
SELECT FROM WHERE
*
ENGINEER E, PROJECT P E.ID=P.ICEngID
Given the following statistics and indices:
o number of tuples in ENGINEER: 1,600
o number of tuples in PROJECT: 3,200
o size of a tuple in ENGINEER: 50 bytes
o size of a tuple in PROJECT: 80 bytes
o disk block size: 512 bytes
o tuples do not span across blocks
o V(ICEngID, PROJECT) = V(ID, ENGINEER) = 1,600
o 3-level B+-tree primary index on ID for ENGINEER
o 4-level B+-tree primary index on PID for PROJECT
o 3-level B+-tree secondary index on ICEngID for PROJECT
a) Estimate the number of output tuples for the query. Explain.
Since ID is a key for ENGINNER, then a tuple of PROJECT will join with one tuple from ENGINNER. Therefore, the number of output tuples is the number of tuples in PROJECT, i.e. 3,200.
b) Draw a fully annotated evaluation plan if the query is computed with the merge-join algorithm for the worst-case estimate.
[3 marks]
[4 marks]
materialize
sortICEngID
PROJECT
ENGINEER
c) What is the minimum amount of memory in number of blocks for the worst-case estimate of the evaluation plan in part b)?
3 memory blocks.
ICEngID=ID
(merge join)
[1 mark]
d) What is the worst-case cost in number of disk block transfers of the evaluation plan in part b)? Show the steps clearly (no steps, no marks).
o Number of blocks in ENGINEER = 1,600 / 512/50) = 160 blocks o Number of blocks in PROJECT = 3,200 / 512/80 = 534 blocks
o Sort PROJECT on ICEngID
o initial number of runs = 534 / 3 = 178 o number of merge passes = log2178 = 8 o cost=534*(2*8+1)=9,078
o Cost of writing the sorting output = 534 o Costofmergejoin=160+534=694
o Total cost = 9,078 + 534 + 694 = 10,306
e) Is it possible to reduce the worst-case cost in number of disk block transfers if the query is computed with the indexed nested-loop join algorithm instead? Draw a revised evaluation plan and show the steps clearly to support your answer (no steps, no marks).
[11 marks]
[6 marks]
PROJECT
Number of projects per engineer = 3200 / V(ICEngID, PROJECT) = 3200 / 1600 = 2
ENGINEER
ICEngID=ID
(use index on ICEngID)
Cost of indexed nested-loop join = 160 + 1,600 * (3+2) = 8,160.
So, the cost is reduced.
– END –