Databases
Worksheets
Peter Mc.Brien
Wednesday 28 th March 2018
DB Worksheet 1: Primitive Relational Algebra Operators
branch
sortcode bname cash
56 ’Wimbledon’ 94340.45
34 ’Goodge St’ 8900.67
67 ’Strand’ 34005.00
movement
mid no amount tdate
1000 100 2300.00 5/1/1999
1001 101 4000.00 5/1/1999
1002 100 -223.45 8/1/1999
1004 107 -100.00 11/1/1999
1005 103 145.50 12/1/1999
1006 100 10.23 15/1/1999
1007 107 345.56 15/1/1999
1008 101 1230.00 15/1/1999
1009 119 5600.00 18/1/1999
account
no type cname rate? sortcode
100 ’current’ ’McBrien, P.’ NULL 67
101 ’deposit’ ’McBrien, P.’ 5.25 67
103 ’current’ ’Boyd, M.’ NULL 34
107 ’current’ ’Poulovassilis, A.’ NULL 56
119 ’deposit’ ’Poulovassilis, A.’ 5.50 56
125 ’current’ ’Bailey, J.’ NULL 56
key branch(sortcode)
key branch(bname)
key movement(mid)
key account(no)
movement(no)
fk
⇒ account(no)
account(sortcode)
fk
⇒ branch(sortcode)
1. What is the result of the RA query πcname,sortcode σtype=‘deposit’ account
2. What is the result of the RA query
πaccount.no,cname,mid σaccount.no=movement.no∧movement.amount<0 account×movement
3. Write an RA query to return the scheme (cname) that lists the cname of all customers that
have made a withdrawal from an account.
4. Write an RA query to return the scheme (cname,bname) that lists the cname of all deposit
account holders, together with bname where the account is held.
5. Write an RA query to return the scheme (sortcode) that lists the sortcodes of branches that
either have less than £10,000 of cash, or that hold deposit accounts.
6. Write an RA query to return the scheme (sortcode) that lists those branches where no deposit
account is held
DB Worksheet 2: Derived Relational Algebra Operators
1. What is the result of the RA query πbname,cname(branch ⋊⋉ account)
2. What is the result of the RA query πbname,cname(branch ⋊⋉ account ⋊⋉ movement)
3. What is the result of the RA query πsortcode,type account÷ πsortcode branch
4. Write an RA query returning the scheme (bname,mid) that lists branch names, together with
mids that have occurred at that branch.
5. Write an RA query returning the scheme (sortcode,bname,cash) that lists rows of all branches
that have at least one deposit account.
6. Write an RA query returning the scheme (sortcode) that lists the sortcodes of branches that
have both have less than £10,000 of cash, and that hold deposit accounts.
DB Worksheet 3: Equivalences Between RA Expressions
Consider the following queries over the bank branch database.
1. Put a tick or cross to indicate if each equivalence holds:
Equivalence Correct?
πno,type σtype=‘deposit’ account ≡ σtype=‘deposit’ πno,type account
πno,type σtype=‘deposit’ account ≡ σtype=‘deposit’ πtype,no account
πtype σtype=‘deposit’ account ≡ σtype=‘deposit’ πno account
σsortcode=56(account×movement) ≡ σsortcode=56 account×movement
πsortcode(account×movement) ≡ πsortcode account×movement
πno,type σtype=‘deposit’ account ≡ πno,type σtype<>‘current’ account
2. Simplify the following RA query to contain as few RA operators as possible
πno,type σsortcode=56 πno,type,sortcode σtype=‘deposit’ account
3. Rewrite the following RA query into the form π… σ…R × S, the general form of which we
will call a project select product (PSP) query.
σaccount.no=movement.no(πno,cname account× πmid,no σamount>1000 movement)
4. Rewrite the following RA to be a union between two PSP queries
σaccount.no=movement.no(πno,cname,rate account×
(πmid,no σamount>1000 movement ∪ πmid,no σamount<100movement))
5. Rewrite the following query to an equivalent form that minimises the number of tuples, and
the number of attributes within those tuples, that are handled by the × operator.
πno,cname,tdate σamount<0∧account.no=movement.no(account×movement)
DB Worksheet 4: Datalog
branch
sortcode bname cash
56 ’Wimbledon’ 94340.45
34 ’Goodge St’ 8900.67
67 ’Strand’ 34005.00
movement
mid no amount tdate
1000 100 2300.00 5/1/1999
1001 101 4000.00 5/1/1999
1002 100 -223.45 8/1/1999
1004 107 -100.00 11/1/1999
1005 103 145.50 12/1/1999
1006 100 10.23 15/1/1999
1007 107 345.56 15/1/1999
1008 101 1230.00 15/1/1999
1009 119 5600.00 18/1/1999
account
no type cname rate? sortcode
100 ’current’ ’McBrien, P.’ NULL 67
101 ’deposit’ ’McBrien, P.’ 5.25 67
103 ’current’ ’Boyd, M.’ NULL 34
107 ’current’ ’Poulovassilis, A.’ NULL 56
119 ’deposit’ ’Poulovassilis, A.’ 5.50 56
125 ’current’ ’Bailey, J.’ NULL 56
key branch(sortcode)
key branch(bname)
key movement(mid)
key account(no)
movement(no)
fk
⇒ account(no)
account(sortcode)
fk
⇒ branch(sortcode)
1. Write a Datalog query returning the scheme (cname,bname) listing the cname of all deposit
account holders, together with bname where the account is held.
2. Write a Datalog query returning the scheme (no,bname) listing the account number and
branch name of all accounts that have no movements recorded for the account.
3. Write a Datalog query that returns the scheme (no) listing the ‘target account’ numbers,
defined as those accounts that have made a withdrawal, or are of type deposit.
4. List what the following Datalog query returns, and explain its semantics.
query(CName) :-
account( , , ,CName, , ),
¬sub query(CName).
sub query(CName) :-
account( , ,CName, , ),
account( ,Type, , , ),
¬account( ,Type,CName, , ).
DB Worksheet 5: Translating Between RA and SQL
Consider the following fragment of the bank branch database:
account
no type cname rate? sortcode
100 ’current’ ’McBrien, P.’ NULL 67
101 ’deposit’ ’McBrien, P.’ 5.25 67
103 ’current’ ’Boyd, M.’ NULL 34
107 ’current’ ’Poulovassilis, A.’ NULL 56
119 ’deposit’ ’Poulovassilis, A.’ 5.50 56
125 ’current’ ’Bailey, J.’ NULL 56
movement(no)
fk
⇒ account(no)
movement
mid no amount tdate
1000 100 2300.00 5/1/1999
1001 101 4000.00 5/1/1999
1002 100 -223.45 8/1/1999
1004 107 -100.00 11/1/1999
1005 103 145.50 12/1/1999
1006 100 10.23 15/1/1999
1007 107 345.56 15/1/1999
1008 101 1230.00 15/1/1999
1009 119 5600.00 18/1/1999
1. Write the RA expression using project, select and product operators equivalent to the SQL
query below.
SELECT account.cname, movement.amount
FROM account JOIN movement ON account.no=movement.no
WHERE account.rate>2.0
AND movement.amount>1000
2. Modify your RA expression to use any other RA operators you have been shown to write a
mimimal RA expression (i.e. one that uses as few operators as possible).
3. Write a minimal SQL query (i.e. as compact as posible) equivalent to the RA expression
πno movement− πno account
4. Write a minimal SQL query equivalent to the RA expression πno,type account
5. Write a minimal SQL query equivalent to the RA expression πtype account
6. Write a minimal SQL query equivalent to the RA expression πtype,cname account
7. Write a minimal SQL query equivalent to the RA expresss account⋉movement
DB Worksheet 6: SQL Set Operators
The following questions again use the bank branch database:
branch
sortcode bname cash
56 ’Wimbledon’ 94340.45
34 ’Goodge St’ 8900.67
67 ’Strand’ 34005.00
movement
mid no amount tdate
1000 100 2300.00 5/1/1999
1001 101 4000.00 5/1/1999
1002 100 -223.45 8/1/1999
1004 107 -100.00 11/1/1999
1005 103 145.50 12/1/1999
1006 100 10.23 15/1/1999
1007 107 345.56 15/1/1999
1008 101 1230.00 15/1/1999
1009 119 5600.00 18/1/1999
account
no type cname rate? sortcode
100 ’current’ ’McBrien, P.’ NULL 67
101 ’deposit’ ’McBrien, P.’ 5.25 67
103 ’current’ ’Boyd, M.’ NULL 34
107 ’current’ ’Poulovassilis, A.’ NULL 56
119 ’deposit’ ’Poulovassilis, A.’ 5.50 56
125 ’current’ ’Bailey, J.’ NULL 56
key branch(sortcode)
key branch(bname)
key movement(mid)
key account(no)
movement(no)
fk
⇒ account(no)
account(sortcode)
fk
⇒ branch(sortcode)
1. Write an SQL query returning the scheme (mid,no,amount,tdate) listing all details of move-
ments for accounts 100,101,103 and 107.
2. Write an SQL query returning the scheme (sortcode) listing the sortcode of all branches
without any deposit accounts.
3. Write an SQL query without using any negation (i.e. without the use of NOT or EXCEPT)
returning the scheme (no) listing accounts with no movements on or before the 11-Jan-1999.
4. Write an SQL query returning the scheme (cname) listing customers that have every type of
account that appears in account.
DB Worksheet 7: Null Values in SQL
In a modified version of the bank branch database, called bank branch null, there are the
following two tables:
movement
mid no amount tdate
0999 119 45.00 null
1000 100 2300.00 5/1/1999
1001 101 4000.00 5/1/1999
1002 100 -223.45 8/1/1999
1004 107 -100.00 11/1/1999
1005 103 145.50 12/1/1999
1006 100 10.23 15/1/1999
1008 101 1230.00 15/1/1999
1009 119 5600.00 18/1/1999
1010 100 null 20/1/1999
1011 null null 20/1/1999
1012 null 600.00 20/1/1999
1013 null -46.00 20/1/1999
account
no type cname rate? sortcode
100 ’current’ ’McBrien, P.’ null 67
101 ’deposit’ ’McBrien, P.’ 5.25 67
119 ’deposit’ ’Poulovassilis, A.’ 5.50 56
125 ’current’ ’Bailey, J.’ null 56
1. Write an SQL query returning the scheme (mid) to find movements known not to have
occurred on 5/1/1999
2. Write an SQL query returning the scheme (mid) to find movements that have or might have
occurred on 5/1/1999.
3. Write an SQL query returning the scheme (no,mid) that lists account numbers, and any
movements mids that have or might have occurred on that account
4. What is the result of the following query:
SELECT account . no
FROM account
WHERE account . no NOT IN (SELECT movement . no FROM movement )
5. Write an SQL query returning the scheme (no) that lists those account numbers that might
not have any movements.
DB Worksheet 8: Left, Right, Outer and Inner Joins
The following question uses the bank branch null database:
movement
mid no amount tdate
0999 119 45.00 null
1000 100 2300.00 5/1/1999
1001 101 4000.00 5/1/1999
1002 100 -223.45 8/1/1999
1004 107 -100.00 11/1/1999
1005 103 145.50 12/1/1999
1006 100 10.23 15/1/1999
1008 101 1230.00 15/1/1999
1009 119 5600.00 18/1/1999
1010 100 null 20/1/1999
1011 null null 20/1/1999
1012 null 600.00 20/1/1999
1013 null -46.00 20/1/1999
account
no type cname rate? sortcode
100 ’current’ ’McBrien, P.’ null 67
101 ’deposit’ ’McBrien, P.’ 5.25 67
119 ’deposit’ ’Poulovassilis, A.’ 5.50 56
125 ’current’ ’Bailey, J.’ null 56
1. What is the result of
SELECT account . no , movement . mid
FROM account NATURAL LEFT JOIN movement
2. What is the result of
SELECT account . no , movement . mid
FROM account NATURAL FULL OUTER JOIN movement
3. Write an SQL query returning the scheme (no,cname,mid) that lists all account numbers in
the database (including just those in movement), and lists any known mid or cname for each
account. The result for the current data should be:
no cname mid
100 McBrien, P. 1000
100 McBrien, P. 1006
100 McBrien, P. 1010
100 McBrien, P. 1002
101 McBrien, P. 1008
101 McBrien, P. 1001
103 null 1005
107 null 1004
119 Poulovassilis, A. 999
119 Poulovassilis, A. 1009
125 Bailey, J. null
DB Worksheet 9: OLAP Queries in SQL
The following questions should be written to run on the bank branch database listed below.
branch
sortcode bname cash
56 ’Wimbledon’ 94340.45
34 ’Goodge St’ 8900.67
67 ’Strand’ 34005.00
movement
mid no amount tdate
1000 100 2300.00 5/1/1999
1001 101 4000.00 5/1/1999
1002 100 -223.45 8/1/1999
1004 107 -100.00 11/1/1999
1005 103 145.50 12/1/1999
1006 100 10.23 15/1/1999
1007 107 345.56 15/1/1999
1008 101 1230.00 15/1/1999
1009 119 5600.00 18/1/1999
account
no type cname rate? sortcode
100 ’current’ ’McBrien, P.’ NULL 67
101 ’deposit’ ’McBrien, P.’ 5.25 67
103 ’current’ ’Boyd, M.’ NULL 34
107 ’current’ ’Poulovassilis, A.’ NULL 56
119 ’deposit’ ’Poulovassilis, A.’ 5.50 56
125 ’current’ ’Bailey, J.’ NULL 56
key branch(sortcode)
key branch(bname)
key movement(mid)
key account(no)
movement(no)
fk
⇒ account(no)
account(sortcode)
fk
⇒ branch(sortcode)
1. Write an SQL query returning the scheme (no,balance,avg trans) that lists for each account
that has transactions the account no, the balance (computed as the sum of the movements
for the account), and the average value of transactions on that account.
2. Alter the query for (1) so that it includes the customer name of each account, and includes all
accounts held at the bank in the listing of balances, even if the account has no movements.
3. Write an SQL query returning the scheme (cname,current balance,deposit balance) that lists
one row for each customer (i.e. each distinct cname), with a column for the net balance of
all current accounts held by the customer, and a column for the net balance of all deposit
accounts held by the customer.
4. Write an SQL query returning the scheme (no,cname,type,pc cust funds,pc type funds) that
lists one row for each account, and for each account, lists the no, cname and type of the
account, and in pc cust funds the percentage of the customer funds held in the account, and
in pc type funds the percentage of the total funds in this particular type of account. For the
current data this should result in:
no cname type pc cust funds pc type funds
100 McBrien, P. current 28.52 84.22
101 McBrien, P. deposit 71.48 48.29
103 Boyd, M. current 100.00 5.87
107 Poulovassilis, A. current 4.20 9.91
119 Poulovassilis, A. deposit 95.80 51.71
125 Bailey, J. current NULL 0.00
DB Worksheet 10: Constructing an ERKLMOS Schema
The following text gives a description of a UoD, which you have been asked to build a relational
database that stores data associated with the UoD.
The payroll system for BIG Inc records the salaries, status, joining date, name, and
payroll number for all of the corporations 30,000 employees. Each employee works for
one division, and each division has an account number for paying its staff. We identify
divisions by their name, and record the address where the division’s HQ is located.
For employees sent abroad by BIG Inc, we record the address, country and telephone
number of the foreign tax office that will handle the employee. It is assumed that each
country has one central tax office that we have to deal with. All other employees have
their tax affairs dealt with by the Inland Revenue.
Draw an ERKLMOS schema of the UoD
DB Worksheet 11: Constructing an ERADHKLMNOSVW Schema
The following text gives a description of a UoD, which you have been asked to build a relational
database that stores data associated with the UoD.
The customer and supplier database of Big Inc will hold all accounts of the company,
divided into customer accounts and supplier accounts. All accounts have an account
number, and one account manager assigned from the company’s staff. Big Inc identifies
staff by a sid, and records the staff member’s name and room. The account managers
have a limit on the number of accounts they can manage. Only certain staff members
are permitted to be account managers.
For customer accounts we need to record a credit limit on the balance of the account,
and the telephone number of the accounts department at the customer.
For supplier accounts we need to record which Big Inc products are supplied, and at
what price.
Big Inc products are identified by the company standard part no and all have a descrip-
tion. For some we record the colour. Some products have a record of the components,
each component identified by a combination of part no and component number, and
again each has a description. Some products do not have a supplier.
Big Inc has purchased a copy of the Post Office address file, and associates every
account to an address from this file. The address data includes street number, street
name, town, county and post code, and uses a combination of street number and post
code as a key.
Draw an ERADHKLMNOSVW schema of the UoD
DB Worksheet 12: Minimal Cover of FDs and Candidate Keys
Suppose a relation R(A,B,C,D,E, F,G,H) has the FDs
S = {AB → DEH,BEF → A,FGH → C,D → EG,EG → BF,F → BH}
1. Rewrite S to an equivalent set of FDs which only have a single attribute on the RHS of each
FD.
2. Consider each FD X → A, and for each B ∈ X , consider if X → B from the other FDs. If
so, replace X → A by (X −B) → A in S.
3. Consider each FD X → A, and compute X+ without using X → A. If A ⊆ X+, delete
X → A since it is rundundant. This will give a minimal cover Sc of S.
4. Justify what are the minimal candidate keys of R constrained by Sc
DB Worksheet 13: Lossless Decomposition of Relations
1. Suppose relation R(A,B,C,D,E) has the FDs S = {AB → C,C → DE,E → A}. For each
of the following decompositions, determine if the decomposition if lossless, and if not lossless,
illustrate a dataset for R that fails to decompose in a lossless manner over the relations.
(a) R1(A,B,C), R2(C,D,E)
(b) R1(A,B,C), R2(C,D), R3(D,E)
2. Suppose relation R(A,B,C,D,E, F ) has the FDs S = {AB → CD,C → E,A → F}. Give
a lossless decomposition of R into three relations R1, R2, and R3.
3. Suppose relation R(A,B,C,D,E, F ) has the FDs S = {AB → CD,C → E,F → A}. Give
a lossless decomposition of R into three relations R1, R2, and R3.
DB Worksheet 14: Normal Forms
Suppose a relation R(A,B,C,D,E, F,G,H) has the FDs
Sc = {AB → D,EF → A,FG → C,D → EG,EG → F, F → BH}
1. Decompose the relation into 3NF
2. Decompose the relation into BCNF
3. Determine if your decompositions in (1) and (2) preserve FDs, and if they do not, suggest
how to amend you schema to preserve FDs.
DB Worksheet 15: Anomalies in Transactions
BEGIN TRANSACTION rental charge
UPDATE directory
SET charge=charge+17
COMMIT TRANSACTION rental charge
BEGIN TRANSACTION transfer charge
UPDATE directory
FROM charge=charge+100
WHERE telephone=1000
UPDATE directory
SET charge=charge-100
WHERE telephone=1002
COMMIT TRANSACTION transfer charge
BEGIN TRANSACTION total charge
SELECT SUM(charge)
FROM directory
COMMIT TRANSACTION total charge
directory
telephone name charge
1000 Adams 10.00
1001 Jones 120.25
1002 Black 344.00
rental charge H1 = r1[d1000], w1[d1000], r1[d1001], w1[d1001], r1[d1002], w1[d1002]
transfer charge H2 = r2[d1000], w2[d1000], r2[d1002], w2[d1002]
total charge H3 = r3[d1000], r3[d1001], r3[d1002]
For each of the following histories, identify if they are a concurrent execution of some pair of the
above histories, and if so, then determine if any of the following three anomalies has occurred:
A lost update
B inconsistent analysis
C dirty read
1. r3[d1000], r1[d1000], w1[d1000], r1[d1001], w1[d1001], r1[d1002], w1[d1002], c1, r3[d1001], r3[d1002], c3
2. r1[d1000], r1[d1001], r1[d1002], r2[d1000], w2[d1000], r2[d1002], w2[d1002], w1[d1000], w1[d1001], w1[d1002], c1, c2
3. r1[d1000], r2[d1000], w1[d1000], r1[d1001], w1[d1001], r1[d1002], w2[d1000], r2[d1002], w1[d1002], w2[d1002], c1, c2
4. r3[d1000], r3[d1001], r2[d1000], w2[d1000], r3[d1002], r2[d1002], w2[d1002], c2, a3
5. r2[d1000], w2[d1000], r1[d1000], r2[d1002], w2[d1002], a2, w1[d1000], r1[d1001], w1[d1001], r1[d1002], w1[d1002], c1
DB Worksheet 16: Serialisability
H1 = r1[o1], w1[o1], w1[o2], w1[o3], c1
H2 = r2[o2], w2[o2], w2[o1], c2
H3 = r3[o1], w3[o1], w3[o2], c3
Hx = r1[o1], w1[o1], r2[o2], w2[o2], w2[o1], c2, w1[o2], r3[o1], w3[o1], w3[o2], c3, w1[o3], c1
Hy = r3[o1], w3[o1], r1[o1], w1[o1], w3[o2], c3, w1[o2], r2[o2], w2[o2], w2[o1], c2, w1[o3], c1
1. Determine what are the ordered conflicting pairs in Hx, and write them down in the form
rwi[o] → rwj [o] (where the rw should be replaced by a r or w in your answer as appropriate).
Use your answer to determine if Hx is CSR.
2. Determine the conflicting pairs in Hy, and use your answer to determine if Hy is CSR.
DB Worksheet 17: Recoverability
Hw = r2[o1], r2[o2], w2[o2], r1[o2], w2[o1], r2[o3], c2, c1
Hx = r2[o1], r2[o2], w2[o1], w2[o2], w1[o1], w1[o2], c1, r2[o3], c2
Hy = r2[o1], r2[o2], w2[o2], r1[o2], w2[o1], c1, r2[o3], c2
Hz = r2[o1], w1[o1], r2[o2], w2[o2], r2[o3], c2, r1[o2], w1[o2], w1[o3], c1
1. Fill in for each history the list of pairs of operations where one transaction has read from
another transaction (wi[o] → rj [o]), and where one transaction has overwritten another
transaction (wi[o] → wj [o]).
Hw Hx Hy Hz
2. Determine if Hw is RC, ACA, or ST.
3. Determine if Hx is RC, ACA, or ST.
4. Determine if Hy is RC, ACA, or ST.
5. Determine if Hz is RC, ACA, or ST.
DB Worksheet 18: Deadlocks and Waits-For Graphs
You are told that three transactions H1, H2, H3 perform the following sequence of operations:
H1 : w1[o1], r1[o2], r1[o4]
H2 : r2[o3], r2[o2], r2[o1]
H3 : r3[o4], w3[o4], r3[o3], w3[o3]
1. Write down all the possible conflict pairs from H1, H2, H3 in the form rwi → rwj .
2. Write a concurrent execution of H1, H2, H3 which results in a deadlock involving all three
transactions. [Hint, use the conflict pairs to determine which transaction might have to wait
for another, and then work out possible cycles of such waits-for].
3. Draw the WFG for your answer in (2).
4. How would you resolve the deadlock?
DB Worksheet 19: Cache Consistent Checkpoint
The log belong is kept by a DM that is using cache consistent checkpointing and where the
scheduler is using strict executions.
LOG b7
UNDO w7[b67, cash=34005.25]
REDO w7[b67, cash=37005.25]
LOG b2
UNDO w2[b34, cash=10900.67]
REDO w2[b34, cash=8900.67]
LOG b6
UNDO w6[a101, rate=5.25]
REDO w6[a101, rate=6.00]
LOG b1
UNDO w1[b56, cash=94340.45]
REDO w1[b56, cash=84340.45]
CP {1, 2, 6}
UNDO w6[a119, rate=5.50]
REDO w6[a119, rate=6.00]
LOG c6
UNDO w2[b67, cash=34005.00]
REDO w2[b67, cash=36005.25]
LOG b8
LOG c2
UNDO w1[b34, cash=8900.67]
REDO w1[b34, cash=18900.67]
LOG b9
UNDO w9[b67, cash=36005.00]
REDO w9[b67, cash=20000.00]
LOG c9
You have to recover the database after a system failure using the above log.
1. In the backward scan for undos, what will be the set of committed transactions, the set of
uncommitted transactions, and list of undo action(s) be when you reach the cp record?
2. Which actions(s) must you undo before the cp record?
3. Which actions(s) must you redo?
4. Can you devise a more efficient algorithm that minimises the number of redo and undo
actions performed?