CO526 Databases: Exercises
2018
In family history database, there is a person table, where people are identified by their name,
and always have their gender, date of birth (dob) and place of birth (born in) recorded. In addition,
each person may optionally have recorded the name of their father, and the name of their mother.
If the person has died, then the date of death dod must be present. Note that only a fragment of
the data held in the database is listed below.
person
name gender dob dod? father? mother? born in
Alice F 1885-02-25 1969-12-05 null null Windsor
Andrew M 1960-02-19 null Philip Elizabeth II London
Andrew of Greece M 1882-02-02 1944-12-03 George I of Greece null Athens
Anne (Princess) F 1950-08-15 null Philip Elizabeth II London
Charles M 1948-11-14 null Philip Elizabeth II London
Elizabeth II F 1926-04-21 null George VI Elizabeth London
…
person(father)
fk
⇒ person(name)
person(mother)
fk
⇒ person(name)
Questions
In the following questions you can test for a value v being null using the predicate isNull(v), and
v being not null using isNotNull(v). You may use subcripts on relation names to creat aliases of
relations, such that persona, personb, etc are aliases for person.
1. Describe how you would enhance the database schema (with additonal tables, columns,
primary keys or foreign keys) to allow the storage of which person is a monarch, and ensure
that we record for just monarchs (i) the year of succession to the throne as succ year, and
(ii) the name of the country which they are monarch of as cname.
The cleanest technique is to note that this implies a subset of persons in the form
monarch(name,succ year,country)
monarch(name)
fk
⇒ person(name)
2. Write a query in each of the following languages that returns the scheme (name,born in)
containing the name and place of birth of all people known to have been born in the same
place as their mother.
(a) RA
(b) Datalog
(a)
πperson.name,person.born in σperson.mother=personm.name∧person.born in=personm.born in(person× personm)
(b) We will assume SQL type handling of NULLs in Datalog
shared maternal birthplace(Name,BornIn) :-
person(Name, , ,Mother,BornIn),
person(Mother, , , ,BornIn).
3. Write a query in each of the following languages that returns the scheme (name) containing
names of all people known to be parents.
(a) RA
(b) Datalog
(a)
πmother σIsNotNull(mother) person ∪ πfather σIsNotNull(father) person
(b)
parent(Name) :-
person( , ,Name, , ),
isNotNull(Name).
parent(Name) :-
person( , , ,Name, ),
isNotNull(Name).
4. Write a query in each of the following languages that returns the scheme (name) containing
the names of all men not known to be fathers.
(a) RA
(b) Datalog
(a)
πname σgender=‘M’ person− πfather σisNotNull(father) person
(b) Using the general approach in the lectures:
isNotFather(Man) :-
person(Man, ‘M’, , , ),
¬isFather(Man).
isFather(Man) :-
person( , ,Man, , ).
In this case, since isFather contains only one predicate, one can simplify it to:
isNotFather(Man) :-
person(Man, ‘M’, , , ),
¬person( , ,Man, , ).
5. Write a query in each of the following languages returning the scheme (name) listing those
people that have had at least one child of each gender that appears in the database.
(a) RA
πfather as name,gender σIsNotNull(father) person÷ πgender person ∪
πmother as name,gender σIsNotNull(mother) person÷ πgender person
Note that the above answer does make the assumption that each person can be just
a father, or just a mother. If you want to capture the concept of a person sometimes
being a mother, and sometimes being a father, you would need a union on the LHS of
each division.
(b) Datalog
all genders(Name) :-
parent(Name, ),
¬missing gender(Name).
missing gender(Name) :-
parent(Name, ),
person( ,Gender, , , , , ),
¬parent(Name,Gender).
parent(Name,Gender) :-
person( ,Gender, , ,Name, , ),
isNotNull(Name).
parent(Name,Gender) :-
person( ,Gender, , , ,Name, ),
isNotNull(Name).
6. Suppose the following RA query q has been executed:
πpersonb.name,persona.father σpersona.name=personb.mother∧isNotNull(persona.father)(persona × personb)
(a) If the query has been executed at one point in time, since which ∆ row have been
inserted into person to give person′ = person ∪∆p, give an RA query that returns any
additional answers to the original query.
(b) If the enhancements requested in Question 1(a) have been added, how could your answer
be improved?
(a) Note that the additional rows might be mothers, or children, or both. So
π∆p.name,person.father σperson.name=∆p.mother∧isNotNull(person.father)(person×∆p) ∪
πperson.name,∆p.father σ∆p.name=person.mother∧isNotNull(∆p.father)(∆p × person) ∪
π∆pa .name,∆pb .father
σ∆pb .name=∆pa .mother∧isNotNull(∆pb .father)
(∆pb ×∆pa)
(b) If one observes that the FK means that ∆p cannot be adding a mother of an existing
record in person, and hence we do not need to consider any existing rows of person in the
RHS of the union. Thus the query simplifies to:
π∆p.name,person.father σperson.name=∆p.mother∧isNotNull(person.father)(person×∆p) ∪
π∆pa .name,∆pb .father
σ∆pb .name=∆pa .mother∧isNotNull(∆pb .father)
(∆pb ×∆pa)
7. Write an SQL query that returns the scheme (name,born in) ordered by name containing the
name and place of birth of all people known to have been born in the same place as their
mother.
SELECT pe r son . name ,
pe r s on . b o r n i n
FROM pe r son
JOIN pe r son AS mother
ON pe r son . mother=mother . name AND pe r son . b o r n i n=mother . b o r n i n
8. Write an SQL query that returns the scheme (name) ordered by name containing the name
of all people known to be parents.
SELECT mother AS name
FROM per son AS mother
WHERE mother IS NOT NULL
UNION
SELECT f a t h e r AS name
FROM per son AS f a t h e r
WHERE f a t h e r IS NOT NULL
ORDER BY name
9. Write an SQL query that returns the scheme (name) ordered by name that lists parents for
whom all known children are of the same sex.
SELECT mother AS name ,
gender
FROM per son AS mother
WHERE mother IS NOT NULL
AND gender=ALL (SELECT gender
FROM per son
WHERE pe r son . mother=mother . mother )
UNION
SELECT f a t h e r AS name ,
gender
FROM per son AS f a t h e r
WHERE f a t h e r IS NOT NULL
AND gender=ALL (SELECT gender
FROM per son
WHERE pe r son . f a t h e r=f a t h e r . f a t h e r )
ORDER BY name ;
10. Suppose you have to design a new database to hold the following information about the com-
panies that do business with ACME Computing Ltd. These companies may be customers,
suppliers, or both. For all companies we record their name and contact email address, and
for companies that are VAT registered, we must record their VAT number.
For suppliers, we record the purchasing manager that deals with the supplier, and record a
number of types of product that the supplier can supply. We associate to each supplier all the
stock items currently being supplied. It is company policy that each stock item may come
from only one supplier, and some stock items are manufactured by ACME Computing Ltd
itself, and therefore have no supplier. Each stock item has a part number and description,
and some stock items have their colour recorded.
For customers, we record the sales manager that deals with the customer, and a credit limit.
We also record all orders made by the customer. Each order is on a particular date, is given
an order number, and has a reference number given by the customer. An order may have
any number of stock items, with the quantity and price for each stock item recorded.
When orders are sent, we record the tracking number of each parcel the order was sent in,
and the date on which the parcel was sent. We also wish to record how much of each stock
item was put in each parcel, in case the parcel gets lost and an insurance claim must be
made. We record the delivery address for each order, from a record of delivery addresses, for
which we record the customer name, postcode and address. We identify an address record
by the combination of the customer name and postcode.
(a) Design an ERADHKLMNOSVW schema to represent this new database.
stock
pno
description
colour?
supplier
purchasing manager
type*
qty price
on
0:N
0:N
parcel
sent in
0:N
0:N
qty
track no
date
for
1:1
0:N
supplies
0:1
0:N
delivery
address
postcode
address
sent to
0:N
1:1
order
order no
date
reference
for
1:1
0:N
customer
credit limit
manager
company
name
vat number?
email
✒ ■
(b) Map the ER schema you designed in (i) into a relational schema.
company(name,vat number?,email)
supplier(name,purchasing manager)
supplier type(name,type)
customer(name,credit limit,manager)
stock(pno,description,colour?,name?)
order(ono,date,reference,dname,postcode)
parcel(track no,date)
on(pno,ono,qty,price)
sent in(pno,ono,track no,qty)
delivery address(name,postcode,address)
customer(name)
fk
⇒ company(name)
supplier(name)
fk
⇒ company(name)
supplier type(name)
fk
⇒ supplier(name)
stock(name)
fk
⇒ supplier(name)
on(pno)
fk
⇒ stock(pno)
on(ono)
fk
⇒ order(ono)
sent in(pno,ono)
fk
⇒ on(pno,ono)
sent in(track no)
fk
⇒ parcel(track no)
order(dname,postcode)
fk
⇒ delivery address(name,postcode)
delivery address(name)
fk
⇒ customer(name)
11. Suppose that a relation R(A,B,C,D,E, F,G,H) has the functional dependency set S =
{A → BE,AC → G,AFG → E,B → ACG,CF → D,D → G,DEG → FCD,G → H}
(a) Compute a minimum cover Sc of S.
Rewrite with single attribute on RHS
Since D → G
DEG → FCD ⇒ DE → FCD
Since D → D
DE → FCD ⇒ DE → CF
Since A → B,B → CG
AC → G ⇒ ∅
Since A → BE
AFG → E ⇒ ∅
Therefore Sc = {A → BE,B → ACG,CF → D,D → G,DE → CF,G → H}
(b) Identify and justify all the candidate keys of R.
We find A+ = A,B,C,E,G,H
Thus A is not a candiate key, and we should consider adding D or F to A
adding D (ie AD+) will cover all attributes
Therefore, AF+ covers all attributes (because of A → C,CF → D)
Therefore BD+ and BF+ cover all attributes (because of B → A)
Since nothing determines A or B except each other, all candidate keys must include A or B,
and therefore we need not consider keys based on any other attributes.
So the candidate keys are AD,BD,AF,BF
(c) Decompose the relation R into 3NF.
Non-prime attributes are CEGH
Since G → H , and G is not a key, remove H from R to get
R1(G,H)
Since D → G, and D is not a key, remove G from R to get
R2(D,G)
Since A → CE, and A is not a key, remove CE from R to get
R3(A,C,E)
This leaves
R4(A,B,D, F )
R1, R2, R3, R4 are in 3NF.
However, the FDs B → G,CF → D,D → G,DE → CF are not preserved.
B → G can be preserved by adding G to R3 to get R5(A,C,E,G) (since A → B and B → A, it
is as good to preserve A → G).
DE → CF can be preserved by adding R6(C,D,E, F ).
Now R1, R2, R4, R5, R6 are in 3NF and preserve FDs.
Note that it is wrong to overnormalise the relations, and you should use the algorithm to
perform 3NF rather than just use all FDs to breakup the original relation.
(d) Decompose the relation R into BCNF.
Since A → B and A is not a key, R4 is not in BCNF, so remove B from R4 to get
R7(A,B) and R8(A,D, F )
Note that R7 can combine into R5 (since A → B and B → A) to get R9(A,B,C,E,G)
Since CF → D and CF is not a key of R6, we no longer can have this relation to preserve
FDs. However we can add R10(C,D, F ) to preserve CF → D. To preserve DE → CF we must
add R11(D,E,C) and R12(D,E, F ). Now BCNF is R1, R2, R8, R9, R10, R11, R12.
12. The following histories describe the sequence of operations performed respectively by four
transactions T1–T4.
H1 = r1[cCZ ], r1[cR], r1[cB], r1[cGB], c1
H2 = r2[cB], r2[cR], r2[cCZ ], w2[cCZ ], c2
H3 = r3[cB], w3[cB], r3[cCZ ], w3[cCZ ], c3
H4 = w4[cR], w4[cB], w4[cGB], c4
(a) Briefly explain if the following concurrent execution is serialisable and recoverable. If
non-serialisable, explain what anomaly occurs.
Ha = r1[cCZ ], r1[cR], r3[cB], w3[cB ], r3[cCZ ], w3[cCZ ],
r1[cB], r1[cGB], c1, c3,
Conflicts r1[cCZ ] → w3[cCZ ] and w3[cB] → r1[cB] form a cycle, therefore not serialisable.
Anomaly is inconsistent analysis by T1
No dirty r1[cB ] is committed whilst still dirty, so non-recoverable
(b) Briefly explain if the following concurrent execution is serialisable and recoverable. If
non-serialisable, explain what anomaly occurs.
Hb = r3[cB], w4[cR], w4[cB], w4[cGB], w3[cB],
r3[cCZ ], w3[cCZ ], c4, c3
Conflicts r3[cB] → w4[cB] and w4[cB] → w3[cB] form a cycle, therefore not serialisable.
No dirty reads so recoverable and ACA. However, dirty write w3[cB] means not ST.
(c) Briefly explain if the following concurrent execution is serialisable and recoverable. If
non-serialisable, explain what anomaly occurs.
Hc = r1[cCZ ], w4[cR], r1[cR], w4[cB ], r1[cB], w4[cGB], c4, r1[cGB], c1
Conflicts w4[cR] → r1[cR], w4[cB ] → r1[cB], and w4[cGB] → r1[cGB], do not form a cycle, and
therefore the history is serialisable.
Two dirty reads, and therefore not ACA (or ST), but since at time of c1 reads are no longer
dirty, the history is recoverable.
(d) Briefly explain which (if any) pair of the transactions taken from T1–T4 will be serialis-
able for all concurrent executions of the pair, and also briefly explain which (if any) pair
of the transactions taken from T1–T4 will be recoverable for all concurrent executions
of the pair.
Since T1 and T2 share only one conflict (between r1[cCZ ] and w2[cCZ ]), and therefore can
sequence those two operations in any order, it means all possible concurrent executions of
T1, T2 are serialisable.
Since all pairs of histories have at least one read-write conflict, there will always be a possible
non-recoverable concurrent history where the read follows the write in the conflicting pair, and
the transaction containing the read commits first.
(e) Give a concurrent execution of the four transactions, which produces a deadlock involv-
ing all four transactions, and draw a waits-for graph for the deadlock state.
r1[cCZ ], r1[cR], r3[cB], w3[cB ], r3[cCZ ], deadlock
T3 T2
T1 T4
r1[cB ]
❄
w4[cR]
✛
w3[cCZ ]
✻
r2[cB ]
✛
13. The table below lists the contents of a database log, which keeps only UNDO records on a
table country.
UNDO w1[cR, population = 148, 179, 000]
UNDO w4[cTR, population = 62, 481, 123]
UNDO w2[cCH, population = 7, 392, 444]
UNDO w2[cGB, population = 58, 543, 111]
UNDO w3[cCH, population = 7, 312, 222]
UNDO w5[cCH, population = 7, 210, 000]
LOG c4
UNDO w3[cB, population = 11, 020, 000]
LOG c3
country
name code capital area population
Czech Republic CZ Prague 78,703 10,321,120
Switzerland CH Bern 41,290 7,207,060
Russia R Moscow 17,075,200 148,178,487
Belgium B Brussels 30,510 10,170,241
Turkey TR Ankara 780,580 62,484,478
United Kingdom GB London 244,820 58,489,975
Egypt ET Cairo 1,001,450 63,575,107
.
.
.
(a) If the country table has the contents illustrated above, describe the actions performed
by the recovery procedure using the above log, and what population figures will be left
after recovery.
Working back from the end of the log, we must perform the oldest UNDO action of each object
from transactions which did not commit, where that UNDO is after the last UNDO action for
that object from a committed transaction.
UNDO w1[cR, population = 148, 179, 000]
UNDO w2[cGB, population = 58, 543, 111]
UNDO w5[cCH, population = 7, 210, 000]
Hence all rows of country will be the same, except for R the value will be 148,179,000, GB the
value will be 58,543,111, CH the value will be 7,210,000.
(b) If an additional LOG record were added for a5 to record the completion of aborting T5,
would your answer to (a) change, and if so, how does it change?
You do not need to perform UNDO records associated with T5, since the purpose of recording
of the abort is to indicate that the UNDO for T5 has been completed.
(c) Considering the time just after when c4 occurs, describe and justify which updates from
the above log must have been written to disc, which might have been written to disc,
and which must not have been written to disc.
With UNDO logging, everything must be written to disc from a committed transaction, so all
w4 operations must be on disc. Since T3 has yet to commit, this rule does not apply the the
preceeeding w3.
Any transactions might have been flushed out, so all the write operations preceding c4 from
T2, T3, T4 and T5 might be written to disc.
In UNDO only logging, there are no restrictions on which items must not have been flushed
out.