CS计算机代考程序代写 SQL database Database Theory and Applications M

Database Theory and Applications M

SELECTED SOLUTIONS FROM PAST EXAM PAPERS

Part A: Relational Modelling & Normalisation

Exam Paper 2019: Question 2 (b) Consider the relation R(A, B, C, D, E) with the following FDs:

• FD1: A → E

• FD2: B → D

• FD3: {A, B} → C

1. Which are the possible candidate keys in the relation R? Explain your answer.

The minimum subset of attributes that functionally determine all attributes is {A,B}, since A and B partially

determine E and D, respectively, and both determine C. Hence, {A,B} → {E,D,C}.

2. Which is the maximum Normal Form (NF) of the relation R? Explain your answer.

The relation is in 1NF due to atomic attributes. Now, to check about the 2NF, there should not be any partial

dependency of non-prime attributes on the primary key (and any candidate key). D and E are partially

dependent on {A,B}, thus, the relation is in 1NF only.

3. Decompose the relation R into a set of relations in Boyce-Codd NF (BCNF) and show the process of this

decomposition in your answer.

To avoid the partial dependencies, we split relation R into relations corresponding to the direct FDs. That is,

we construct the relations where the primary key directly determines the non-prime attributes, which are:

R1(A,E), R2(B,D), and R3(A,B,C) and the primary keys are provided underlined. Now, in each of the relations

R1, R2, and R3, the primary key directly determines the non-prime attributes. In this case, R1, R2, and R3 are

in 2NF. There are no transitive dependencies in each relation, thus, the schema is also in 3NF. Since in all FDs

now, the left-hand side corresponds to primary key, the schema is also in BCNF.

Exam Paper 2020: Question 2

Consider the relation R(A, B, C, D, E) with the following FDs:

FD1: A → C; FD2: C → D; FD3: D → E; FD4: B → D

a) Which are the possible candidate keys in the relation R? Explain your answer.

b) Which is the maximum Normal Form (NF) of the relation R? Explain your answer.

c) Decompose the relation R into a set of relations in Boyce-Codd NF (BCNF). Describe your
methodology at each stage of decomposition.

(a) By transitivity, A → {C, D, E} and since B → D then {A, B} can determine all the attributes in the

relation.

(b) Since there is a partial FD in the PK {A, B}, where B → D, then it is not in the 2NF. Hence, it is in

the 1NF.

(c) For 2NF, A → C, A → D, and A → E are partial FDs, we obtain that R1(A, C, D, E) and R2(A, B).

To 3NF, there are transitive FDs: A→C with C→D and D→E, this we split R1 to R3(A,C), R4(C,D), and

R5(D,E). Now, they are all also in BCNF.

Exam Paper 2020: Question 3

Assume a relation R(A, B, C, D, E) where {A,B,C} is the (composite) primary key. Consider also the

FDs: FD1: {A,B} → E; FD2: C → D;

a) Which is the maximum Normal Form (NF) of the relation R? Explain your answer.

b) Decompose the relation R into a set of relations in 2NF. Describe your methodology of decomposition.

(a) There are partial FDs like {A, B} → E, thus being only in 1NF.

(b) AB→E is partial FD, thus, we create the new R1(A, B, E). C→D is a partial FD, thus, we create the

new R2(C, D). The remaining is R3(A, B, C). Now they are all in 2NF.

Exam Paper 2020: Question 4

Assume a relation R(A, B, C) where B is the primary key. Consider also the FD: C → A; Decompose the

relation R into a set of relations in BCNF, if relation R is not already in BCNF. For each relation in BCNF,

define the primary and foreign keys, if exist.

The relation is not in the BCNF since in the C→A, the LHS (left-hand side) of the FD is not a superkey.

By applying the BCNF decomposition theorem given the violating FD C→A, we obtain that: R1(C, A)

and R2(B, C). In R1, the attribute C is the PK (due to the original FD: C →A), while in R2, B is the PK,

due to the original FD: B → C (recall: B is the PK, thus, determining all the other attributes in R). Also,

the C attribute in R2 is the FK referencing to the C attribute in R1.

Part B: SQL

Exam Paper 2020: Question 5. Consider the following relational schema:

Author(AuthorID, Name)

Authoring(ARTID, AID)

Article(ArticleID, PublicationYear, Title)

where the attribute AID is a foreign key in Authoring relation referencing to AuthorID in Author relation, and the
attribute ARTID is a foreign key in Authoring referencing to ArticleID in Article relation. The primary key is
underlined in each relation.

a) Write a SQL query that shows the publication years of the articles written by author ‘John’.

SELECT DISTINCT C.PublicationYear

FROM Author A, Authoring B, Article C

WHERE A.AuthorID = B.AID AND B.ARTID = C.ArticleID

AND A.Name LIKE ‘John’

b) Write a SQL query that shows the number of articles written by each author.

SELECT A.AuthorID, COUNT(*)

FROM Author A, Authoring B

WHERE A.AuthorID = B.AID

GROUP BY A.AuthorID

c) Write a SQL query that shows the names of the authors who have written more than 100 articles.

SELECT A.AuthorID, A.Name

FROM Author A, Authoring B

WHERE A.AuthorID = B.AID

GROUP BY A.AuthorID

HAVING COUNT(*) > 100

d) Write a SQL query that checks if there exist authors who have not written any article. If so, the query returns
their names.

SELECT A.Name

FROM Author A

WHERE NOT EXISTS (

SELECT *

FROM Authoring B

WHERE A.AuthorID = B.AID)

e) For each author who has written more than 50 articles, show how many of these articles have been
published since 2016.

SELECT B.AID, COUNT(*)

FROM Authoring B, Article C

WHERE B.ARTID = C.ArticleID AND C.PublicationYear >=2016

AND B.AID IN (SELECT A1.AID

FROM Authoring A1

GROUP BY A1.AID

HAVING COUNT(*) > 50)

GROUP BY B.AID;

f) Which is the expected result of the following query? Explain your answer.

SELECT * FROM ARTICLE WHERE ARTICLEID NOT IN (NULL, 1, 2)

The problem is the NULL value in the set, where the IN operator is applied. Specifically, ARTICLEID NOT IN (1, 2,

NULL) means: ARTICLEID should be neither 1, nor 2, nor NULL. That is, it is ARTICLEID < > 1 AND ARTICLEID < > 2

AND ARTICLEID < > NULL. However, ARTICLEID < > NULL is UNKNOWN thus any TRUE AND UNKNOWN evaluates

to UNKNOWN and then the WHERE evaluates to UNKNOWN then returns nothing!

More explanation with an example (only for your revision): Assume that the article ID is 4. Then, 4 is neither 1 (i.e.,

‘4 <> 1’ evaluates to TRUE) nor 2 (i.e., ‘4 <> 2’ evaluates to TRUE). The comparison of 4 with NULL is ‘4 <> NULL’,

which returns UNKNOWN. Hence, the whole expression is: TRUE AND TRUE AND UNKNOWN, which evaluates to

UNKNOWN, thus, nothing will be returned. Assume now that the article ID is 2. Then, ‘2 <> 1’ is TRUE, ‘2 <> 2’ is

FALSE, and ‘2 <> NULL’ is UNKNOWN. Hence, the whole expression is: TRUE AND FALSE AND UNKNOWN. TRUE

AND FALSE is FALSE. And, FALSE AND UNKNOWN is FALSE. Thus, nothing will be returned.

Part C: Relational Algebra & Heuristic Optimization

Exam Paper 2019: Question 6. Assume the relations: Employee(SSN, SUPER_SSN, Name) and Department(DNO,

MGR_SSN), where SSN is the Social Security Number (SSN) of the employee (Primary Key in relation Employee),

SUPER_SSN is the SSN of the employee’s supervisor, DNO is the unique identifier of the department (Primary Key

in relation Department) and MGR_SSN corresponds to the SSN of an employee who is manager at a department.

Consider the following query:

SQL1:

SELECT E1.NAME

FROM EMPLOYEE E1, EMPLOYEE E2, DEPARTMENT D

WHERE E1.SUPER_SSN = E2.SSN

AND E2.SSN = D.MGR_SSN AND D.DNO = 5

(a) What does the SQL1 query return?

Employees who are supervised by the manager of Department 5.

(b) Write the SQL1 query in a Relational Algebra Expression.

πE1.NAME (EMPLOYEE θ E1.SUPER_SSN = E2.SSN (EMPLOYEE θ E2.SSN = D.MGR_SSN (σD.DNO = 5 DEPARTMENT)))

Note: θ stands for the join operator.

(d) Draw the optimal Relation Algebra Tree of the SQL1 query after applying heuristic optimization rules. Provide

a brief explanation on the heuristic rules you adopted.

Firstly, the selection operator over the DEPARTMENT relation is applied to retrieve the unique department No.5,

since the DNO is a PK on that relation. This is the ‘entry’ point to the optimization. We reduce the attributes and

focus only on the MGR_SSN (project operator before joining), which will be used for the joining operator of the

Employee E2. The same holds true for applying the project operator over the Employee E2 to keep only the SSN

attribute before joining. Then, we apply the join operator over the condition ‘MGR_SSN = E2.SSN‘ and in addition,

we apply the project operator to keep only the E2.SSN, which will be used to join the Employee E1 relation. Before

this join, we apply the project operator over the E1 and keep the attributes: NAME and SUPER_SSN. That is

because, first, we need the SUPER_SSN to join with the E2.SSN, and then the E1.Name will be used for the last

project operator, which will give us the final outcome. The rationale is that we apply the selection and projection

operators before a join operation, to reduce the number of tuples and their attributes.