Database Theory and Applications M
SELECTED SOLUTIONS FROM PAST EXAM PAPERS
Part A: Relational Modelling & Normalisation
Exam Paper 2019: Question 2 (b) Consider the relation R(A, B, C, D, E) with the following FDs:
• FD1: A → E
• FD2: B → D
• FD3: {A, B} → C
1. Which are the possible candidate keys in the relation R? Explain your answer.
The minimum subset of attributes that functionally determine all attributes is {A,B}, since A and B partially
determine E and D, respectively, and both determine C. Hence, {A,B} → {E,D,C}.
2. Which is the maximum Normal Form (NF) of the relation R? Explain your answer.
The relation is in 1NF due to atomic attributes. Now, to check about the 2NF, there should not be any partial
dependency of non-prime attributes on the primary key (and any candidate key). D and E are partially
dependent on {A,B}, thus, the relation is in 1NF only.
3. Decompose the relation R into a set of relations in Boyce-Codd NF (BCNF) and show the process of this
decomposition in your answer.
To avoid the partial dependencies, we split relation R into relations corresponding to the direct FDs. That is,
we construct the relations where the primary key directly determines the non-prime attributes, which are:
R1(A,E), R2(B,D), and R3(A,B,C) and the primary keys are provided underlined. Now, in each of the relations
R1, R2, and R3, the primary key directly determines the non-prime attributes. In this case, R1, R2, and R3 are
in 2NF. There are no transitive dependencies in each relation, thus, the schema is also in 3NF. Since in all FDs
now, the left-hand side corresponds to primary key, the schema is also in BCNF.
Exam Paper 2020: Question 2
Consider the relation R(A, B, C, D, E) with the following FDs:
FD1: A → C; FD2: C → D; FD3: D → E; FD4: B → D
a) Which are the possible candidate keys in the relation R? Explain your answer.
b) Which is the maximum Normal Form (NF) of the relation R? Explain your answer.
c) Decompose the relation R into a set of relations in Boyce-Codd NF (BCNF). Describe your
methodology at each stage of decomposition.
(a) By transitivity, A → {C, D, E} and since B → D then {A, B} can determine all the attributes in the
relation.
(b) Since there is a partial FD in the PK {A, B}, where B → D, then it is not in the 2NF. Hence, it is in
the 1NF.
(c) For 2NF, A → C, A → D, and A → E are partial FDs, we obtain that R1(A, C, D, E) and R2(A, B).
To 3NF, there are transitive FDs: A→C with C→D and D→E, this we split R1 to R3(A,C), R4(C,D), and
R5(D,E). Now, they are all also in BCNF.
Exam Paper 2020: Question 3
Assume a relation R(A, B, C, D, E) where {A,B,C} is the (composite) primary key. Consider also the
FDs: FD1: {A,B} → E; FD2: C → D;
a) Which is the maximum Normal Form (NF) of the relation R? Explain your answer.
b) Decompose the relation R into a set of relations in 2NF. Describe your methodology of decomposition.
(a) There are partial FDs like {A, B} → E, thus being only in 1NF.
(b) AB→E is partial FD, thus, we create the new R1(A, B, E). C→D is a partial FD, thus, we create the
new R2(C, D). The remaining is R3(A, B, C). Now they are all in 2NF.
Exam Paper 2020: Question 4
Assume a relation R(A, B, C) where B is the primary key. Consider also the FD: C → A; Decompose the
relation R into a set of relations in BCNF, if relation R is not already in BCNF. For each relation in BCNF,
define the primary and foreign keys, if exist.
The relation is not in the BCNF since in the C→A, the LHS (left-hand side) of the FD is not a superkey.
By applying the BCNF decomposition theorem given the violating FD C→A, we obtain that: R1(C, A)
and R2(B, C). In R1, the attribute C is the PK (due to the original FD: C →A), while in R2, B is the PK,
due to the original FD: B → C (recall: B is the PK, thus, determining all the other attributes in R). Also,
the C attribute in R2 is the FK referencing to the C attribute in R1.
Part B: SQL
Exam Paper 2020: Question 5. Consider the following relational schema:
Author(AuthorID, Name)
Authoring(ARTID, AID)
Article(ArticleID, PublicationYear, Title)
where the attribute AID is a foreign key in Authoring relation referencing to AuthorID in Author relation, and the
attribute ARTID is a foreign key in Authoring referencing to ArticleID in Article relation. The primary key is
underlined in each relation.
a) Write a SQL query that shows the publication years of the articles written by author ‘John’.
SELECT DISTINCT C.PublicationYear
FROM Author A, Authoring B, Article C
WHERE A.AuthorID = B.AID AND B.ARTID = C.ArticleID
AND A.Name LIKE ‘John’
b) Write a SQL query that shows the number of articles written by each author.
SELECT A.AuthorID, COUNT(*)
FROM Author A, Authoring B
WHERE A.AuthorID = B.AID
GROUP BY A.AuthorID
c) Write a SQL query that shows the names of the authors who have written more than 100 articles.
SELECT A.AuthorID, A.Name
FROM Author A, Authoring B
WHERE A.AuthorID = B.AID
GROUP BY A.AuthorID
HAVING COUNT(*) > 100
d) Write a SQL query that checks if there exist authors who have not written any article. If so, the query returns
their names.
SELECT A.Name
FROM Author A
WHERE NOT EXISTS (
SELECT *
FROM Authoring B
WHERE A.AuthorID = B.AID)
e) For each author who has written more than 50 articles, show how many of these articles have been
published since 2016.
SELECT B.AID, COUNT(*)
FROM Authoring B, Article C
WHERE B.ARTID = C.ArticleID AND C.PublicationYear >=2016
AND B.AID IN (SELECT A1.AID
FROM Authoring A1
GROUP BY A1.AID
HAVING COUNT(*) > 50)
GROUP BY B.AID;
f) Which is the expected result of the following query? Explain your answer.
SELECT * FROM ARTICLE WHERE ARTICLEID NOT IN (NULL, 1, 2)
The problem is the NULL value in the set, where the IN operator is applied. Specifically, ARTICLEID NOT IN (1, 2,
NULL) means: ARTICLEID should be neither 1, nor 2, nor NULL. That is, it is ARTICLEID < > 1 AND ARTICLEID < > 2
AND ARTICLEID < > NULL. However, ARTICLEID < > NULL is UNKNOWN thus any TRUE AND UNKNOWN evaluates
to UNKNOWN and then the WHERE evaluates to UNKNOWN then returns nothing!
More explanation with an example (only for your revision): Assume that the article ID is 4. Then, 4 is neither 1 (i.e.,
‘4 <> 1’ evaluates to TRUE) nor 2 (i.e., ‘4 <> 2’ evaluates to TRUE). The comparison of 4 with NULL is ‘4 <> NULL’,
which returns UNKNOWN. Hence, the whole expression is: TRUE AND TRUE AND UNKNOWN, which evaluates to
UNKNOWN, thus, nothing will be returned. Assume now that the article ID is 2. Then, ‘2 <> 1’ is TRUE, ‘2 <> 2’ is
FALSE, and ‘2 <> NULL’ is UNKNOWN. Hence, the whole expression is: TRUE AND FALSE AND UNKNOWN. TRUE
AND FALSE is FALSE. And, FALSE AND UNKNOWN is FALSE. Thus, nothing will be returned.
Part C: Relational Algebra & Heuristic Optimization
Exam Paper 2019: Question 6. Assume the relations: Employee(SSN, SUPER_SSN, Name) and Department(DNO,
MGR_SSN), where SSN is the Social Security Number (SSN) of the employee (Primary Key in relation Employee),
SUPER_SSN is the SSN of the employee’s supervisor, DNO is the unique identifier of the department (Primary Key
in relation Department) and MGR_SSN corresponds to the SSN of an employee who is manager at a department.
Consider the following query:
SQL1:
SELECT E1.NAME
FROM EMPLOYEE E1, EMPLOYEE E2, DEPARTMENT D
WHERE E1.SUPER_SSN = E2.SSN
AND E2.SSN = D.MGR_SSN AND D.DNO = 5
(a) What does the SQL1 query return?
Employees who are supervised by the manager of Department 5.
(b) Write the SQL1 query in a Relational Algebra Expression.
πE1.NAME (EMPLOYEE θ E1.SUPER_SSN = E2.SSN (EMPLOYEE θ E2.SSN = D.MGR_SSN (σD.DNO = 5 DEPARTMENT)))
Note: θ stands for the join operator.
(d) Draw the optimal Relation Algebra Tree of the SQL1 query after applying heuristic optimization rules. Provide
a brief explanation on the heuristic rules you adopted.
Firstly, the selection operator over the DEPARTMENT relation is applied to retrieve the unique department No.5,
since the DNO is a PK on that relation. This is the ‘entry’ point to the optimization. We reduce the attributes and
focus only on the MGR_SSN (project operator before joining), which will be used for the joining operator of the
Employee E2. The same holds true for applying the project operator over the Employee E2 to keep only the SSN
attribute before joining. Then, we apply the join operator over the condition ‘MGR_SSN = E2.SSN‘ and in addition,
we apply the project operator to keep only the E2.SSN, which will be used to join the Employee E1 relation. Before
this join, we apply the project operator over the E1 and keep the attributes: NAME and SUPER_SSN. That is
because, first, we need the SUPER_SSN to join with the E2.SSN, and then the E1.Name will be used for the last
project operator, which will give us the final outcome. The rationale is that we apply the selection and projection
operators before a join operation, to reduce the number of tuples and their attributes.