Example Exam Paper DTA(M)
Part A: Relational Modelling
Question 1.
(a) Provide a formal definition of the Super Key and the Primary Key in a relation R;
Answer: Slide 16/Lecture: Relational Model
(b) Provide an example for the super key and for the primary key.
Answer: Slide 16/Lecture: Relational Model
Question 2.
(a) Provide a formal definition of the Functional Dependency: X Y in a relation R(X,Y);
Answer: Slide 17/Lecture: Functional Dependency Theory & Normalization Theory: Part I
(b) Provide an example of a functional dependency.
Answer: Slide 18/Lecture: Functional Dependency Theory & Normalization Theory: Part I
(c) Provide a formal definition of the Partial Functional Dependency in a relation R;
Answer: Slide 32/Lecture: Functional Dependency Theory & Normalization Theory: Part I
(d) Provide an example of a partial functional dependency.
Answer: Slide 32/Lecture: Functional Dependency Theory & Normalization Theory: Part I
Question 3.
Assume the relation R(A, B, C, D) with four attributes {A, B, C, D}. Consider the set of the Functional
Dependencies (FDs), which are the only dependencies that hold true in the relation R:
FD1: C → D
FD2: C → A
FD3: B → C
(a) Identify the candidate key(s) for the relation R and explain your answer.
Key is B; B functionally determines C, and transitively determines attributes D and A.
(b) Identify the highest Normal Form (NF) that relation R satisfies, i.e., one of: 1NF, 2NF, 3NF, or BCNF
and explain your answer.
R is in 2NF, since no partial FDs hold true (key B is singleton); R is not in 3NF due to the transitive FDs
of the non-prime D and A from the candidate key B via the non-prime attribute C.
(c) If relation R is not in BCNF, decompose R into a set of BCNF relations. Explain your
methodology.
FDs: C → D and C → A both cause violations of BCNF;
Apply BCNF Theorem for C → D
R\{D} and {C} U {D} thus: R1(A, B, C) and R2(C, D)
R1 is not in BCNF due to C → A; R2 is in BCNF
R1\{A} and {C} U {A}, thus: R3(B, C) and R4(C, A)
Merge R2 with R4 (optional): R3(B, C) and R5(C, D, A)
Part B: SQL
Question 4.
Consider the following relational schema:
EMPLOYEE(SSN, NAME, SALARY, SUPERVISOR_SSN, PID)
PROJECT(PID, NAME, MANAGER_SSN)
Description: EMPLOYEE is a table of employees uniquely identified by their SSN, working at a project
with unique code: PID and having an employee as supervisor, identified by the SUPERVISOR_SSN.
PROJECT is a table storing information about projects uniquely identified by a PID, and having one
manager, who is an employee identified by MANAGER_SSN.
(a) For each relation, identify the Primary Key and Foreign Keys (if any) and explain your answer.
Employee: PK is SSN and FK is PID referencing to PID attribute in Project relation.
Project: PK is PID and FK is MANAGER_SSN referencing to SSN in Employee relation.
(b) Write a SQL query that shows the name(s) of the employee(s) who earn more than £40000.
SELECT E.NAME
FROM EMPLOYEE E
WHERE E.SALARY > 40000
(c) For those employees, who work in the projects with PID =1 or PID = 2, retrieve the names of their
supervisors.
SELECT S.NAME
FROM EMPLOYEE S, EMPLOYEE E
WHERE E.SUPERVISOR_SSN = S.SSN AND E.PID IN (1, 2)
(d) Find the number of employees working per project.
SELECT E.PID, COUNT(*)
FROM EMPLOYEE E
GROUP BY E.PID
(e) Find the minimum and the maximum salaries per project.
SELECT E.PID, MAX(E.SALARY), MIN(E.SALARY)
FROM EMPLOYEE E
GROUP BY E.PID
(f) For those projects with more than 10 employees, show their average salary.
SELECT E.PID, COUNT(*), AVG(E.SALARY)
FROM EMPLOYEE E
GROUP BY E.PID
HAVING COUNT(*) > 10
(g) Check if there exists a project whose manager is supervisor.
SELECT P.PID
FROM PROJECT P, EMPLOYEE S
WHERE S.SSN = P.MANAGER_SSN AND
EXISTS (SELECT * FROM EMPLOYEE E WHERE S.SSN = E.SUPERVISOR_SSN)
Part C: Relational Algebra and Heuristic Optimization
Question 5.
Consider the relational schema in Question 4 (repeated here for convenience):
EMPLOYEE(SSN, NAME, SALARY, SUPERVISOR_SSN, PID)
PROJECT(PID, NAME, MANAGER_SSN)
(a) Write the relational algebra expression of the following SQL query:
SELECT E.NAME, P.NAME
FROM EMPLOYEE E, PROJECT P
WHERE P.PID = E.PID AND E.SALARY > 50000
E.NAME, P.NAME(σE.SALARY > 5000 (PROJECT P.PID = E.PID EMPLOYEE)
(b) Provide the Canonical Tree and an Optimal Tree of the SQL query in Question 5(a). Explain why
your solution is optimal.
Let us provide Only the Optimal Tree; the Canonical Tree is trivial and can be any non-optimal tree
derived directly from the query.
(c) Provide the Optimal Tree of the following SQL query and explain why your solution is optimal:
SELECT S.NAME
FROM EMPLOYEE S
WHERE S.PID IN (1, 2) AND S.SSN IN (SELECT E.SSN
FROM EMPLOYEE E
WHERE E.SALARY < 50000) Let us provide Only the Optimal Tree; the Canonical Tree is trivial and can be any non-optimal tree derived directly from the query. (d) Assume that there are 100 employees and 10 projects. (1). Optimize the following SQL query: SELECT E.NAME, P.NAME FROM EMPLOYEE E, PROJECT P WHERE P.MANAGER_SSN = E.SSN The query joins two relations with their PK-FK. Since one project has only one manager and we retrieve the information for each manager, then the only optimization is on projecting on the attributes [SSN, E.Name, MANAGER_SSN, PName} before the joining and then only on the E.Name and P.Name after the joining. (2). Estimate the number of tuples retrieved. Given that each department has a unique manager, we expect to retrieve 10 tuples.