6. (De-)normalisation
6. (De-)normalisation
We will study:
Decomposition & Normalisation Two properties
Lossless join Dependency preservation
Normal forms
BCNF 3NF
De-normalisation Trade-o↵s
Data redundancy vs. query e ciency
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 2
6. (De-)normalisation
Housekeeping
Assignment 1 marking
Any appeals must be submitted in writing within two weeks of the results being released on Wattle. You must provide the reasons why your marks need to be reviewed and email to comp2400@anu.edu.au or comp6240@anu.edu.au.
SQL assessment in Week 8
COMP2400
The specification will be released on 30 August and submissions are due on 8am 20 September.
The assessment can only be done individually.
The submissions can only be done on Wattle (no hard copy). Each student needs to attend a 5-minute oral exam in Week 8 (specific time/venue to be released next week).
COMP6240
A mid-term exam will be held during 6:00-9:00pm, 21 September 2016 (Wednesday in Week 8) at CSIT labs
(specific time/venue to be released next week).
Each student may bring one A4 page with notes on both sides.
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 1
6. (De-)normalisation
Recap
A driving force for the study of dependencies has been schema design.
primary key
candidate keys superkeys
functional dependencies
The goal of schema design is to select the most appropriate schema for a particular database application.
The choice of a schema is guided by semantic information about the application data provided by users and captured by dependencies.
A common approach starts with a universal relation and applies decomposition to create new relations that satisfy certain normal forms (i.e. normalization).
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 3
6. (De-)normalisation
Decomposition vs Normalisation
Decomposing a relation schema can possibly create more problems than it solves!
Thus, we need to consider two important questions:
1 Do we need to decompose a relation?
2 What problem (if any) does a given decomposition cause?
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 4
6. (De-)normalisation
Two Properties
We need to consider the following properties when decomposing a relation:
1 Lossless join – “ capture the same data ”
To disallow the possibility of generating spurious tuples when a NATURAL JOIN operation is applied to the relations after decomposition.
2 Dependency preservation – “ capture the same meta-data ”
To ensure that each functional dependency can be inferred from functional dependencies after decomposition.
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 6
6. (De-)normalisation
Decomposition vs Normalisation
Decomposing a relation schema can possibly create more problems than it solves!
Thus, we need to consider two important questions:
1 Do we need to decompose a relation?
Several normal forms
,! help us to decide whether or not to decompose a relation
2 What problem (if any) does a given decomposition cause? Two properties
,! help us to decide how to decompose a relation
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 5
6. (De-)normalisation
Two Properties – Lossless Join
Example 1: Decomposition R into R1 and R2 as follows has the lossless join property because the natural join of R1 and R2 yields R.
R
Name
StudentID
DoB
Mike
123456
20/09/1989
Mike
123458
25/01/1988
R1
Name
StudentID
Mike
123456
Mike
123458
R2
StudentID
DoB
123456
20/09/1989
123458
25/01/1988
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 7
6. (De-)normalisation
Two Properties – Lossless Join
Example 2: However, the following decomposition from R into R3 and R4 doesn’t have the lossless join property. Why?
R
Name
StudentID
DoB
Mike
123456
20/09/1989
Mike
123458
25/01/1988
R3
Name
StudentID
Mike
123456
Mike
123458
R4
Name
DoB
Mike
20/09/1989
Mike
25/01/1988
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 8
6. (De-)normalisation
Two Properties – Lossless Join
Example 2: However, the following decomposition from R into R3 and R4 doesn’t have the lossless join property. It generates spurious tuples.
SELECT * FROM R3 NATURAL JOIN R4
Name
StudentID
DoB
Mike
123456
20/09/1989
Mike
123456
25/01/1988
Mike
123458
20/09/1989
Mike
123458
25/01/1988
R
Name
StudentID
DoB
Mike
123456
20/09/1989
Mike
123458
25/01/1988
R3
Name
StudentID
Mike
123456
Mike
123458
R4
Name
DoB
Mike
20/09/1989
Mike
25/01/1988
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 9
6. (De-)normalisation
Two Properties – Dependency Preservation
Example 1: Suppose that we have {StudentID} ! {Name} defined on R. Is the following decomposition dependency preserving?
R
Name
StudentID
CourseNo
Mike
123456
COMP2400
Mike
123458
COMP2600
R1
Name
CourseNo
Mike
COMP2400
Mike
COMP2600
R2
StudentID
CourseNo
123456
COMP2400
123458
COMP2600
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 10
6. (De-)normalisation
Two Properties – Dependency Preservation
Example 1: Suppose that we have {StudentID} ! {Name} defined on R. As StudentID and Name are decomposed into the different relations, {StudentID} ! {Name} is lost.
R
Name
StudentID
CourseNo
Mike
58
COMP2400
Mike
58
COMP2600
R1
Name
CourseNo
Mike
COMP2400
Mike
COMP2600
R2
StudentID
CourseNo
58
COMP2400
58
COMP2600
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 11
6. (De-)normalisation
Normal forms
1NF
+
2NF
+
3NF
+
BCNF …
Note that:
Normal Forms
Test criteria
weak
+
strong
BCNF 3NF 2NF 1NF
1NF is independent of keys and functional dependencies.
2NF, 3NF and BCNF are based on keys and functional dependencies.
4NF and 5NF are based on other dependencies (will not be covered in this course).
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 12
6. (De-)normalisation
BCNF
Do not represent the same fact twice (within a relation)!
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 14
6. (De-)normalisation
1 2
Normalisation
Decomposing a relation into smaller relations in a certain normal form Each normal form reduces certain kind of data redundancy.
Each normal form does not have certain types of (undesirable) dependencies.
What normal forms will we learn?
Boyce-Codd normal form (BCNF) Third normal form (3NF)
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 13
6. (De-)normalisation
BCNF – Definition
A relation schema R is in BCNF if whenever a non-trivial FD X ! A holds in R, then X is a superkey .
Note: this does not necessarily mean a good design.
When a relation schema is in BCNF, all data redundancy based on
functional dependency are removed.
Here data redundancy is considered in terms of FDs, i.e., for a non-trivial FD X ! Y , there exists a relation R that contains two distinct tuples t1 and t2 with t1[XY] = t2[XY].
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 15
6. (De-)normalisation
BCNF – Definition
A relation schema R is in BCNF if whenever a non-trivial FD X ! A holds in R, then X is a superkey .
Note: this does not necessarily mean a good design.
When a relation schema is in BCNF, all data redundancy based on
functional dependency are removed.
Here data redundancy is considered in terms of FDs, i.e., for a non-trivial FD X ! Y , there exists a relation R that contains two distinct tuples t1 and t2 with t1[XY] = t2[XY].
{CourseName} ! {Instructor}
TEACH
StudentID
CourseName
Instructor
u123456
Operating Systems
Hegel
u234566
Databases
Qing
u234567
Databases
Qing
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 16
6. (De-)normalisation
Normalisation to BCNF
Consider the relation schema TEACH with the following FDs: {StudentID, CourseName} ! {Instructor};
{Instructor} ! {CourseName}.
Is TEACH in BCNF?
TEACH
StudentID
CourseName
Instructor
u123456
Operating Systems
Hegel
u234567
Operating Systems
Ammar
u234567
Databases
Mark
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 17
6. (De-)normalisation
Normalisation to BCNF
Consider the relation schema TEACH with the following FDs: {StudentID, CourseName} ! {Instructor};
{Instructor} ! {CourseName}.
Is TEACH in BCNF?
{StudentID, CourseName} is a superkey but {Instructor} is not a superkey.
Not in BCNF because of {Instructor} ! {CourseName}.
TEACH
StudentID
CourseName
Instructor
u123456
Operating Systems
Hegel
u234567
Operating Systems
Ammar
u234567
Databases
Mark
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 18
6. (De-)normalisation
Normalisation to BCNF
Algorithm for a lossless BCNF decomposition
Input: a relation schema R and a set ⌃ of FDs on R.
Output: a set S of relation schemas in BCNF, each having a set of FDs
Start with S = {R};
While R0 in S is not in BCNF, do the following:
Find a (non-trivial) FD X ! Y in R0 that violates BCNF, where X \ Y = ; and (X )+ = X [ Y ;
Replace R0 in S by two relation schemas (R0 Y) and (X [ Y). Project the FDs in ⌃ onto each relation schema in S.
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 19
6. (De-)normalisation
Normalisation to BCNF
R
X->Y
XY
R-Y
…
X’->Y’
X’Y’
…
R-Y-Y’
…
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 20
6. (De-)normalisation
BCNF – Example
Consider TEACH with the following FDs again: {StudentID,CourseName} ! {Instructor};
{Instructor} ! {CourseName}.
Can we normalise TEACH into BCNF by a lossless decomposition?
TEACH
StudentID
CourseName
Instructor
u123456
Operating Systems
Hegel
u234567
Operating Systems
Ammar
u234567
Databases
Mark
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 21
6. (De-)normalisation
BCNF – Example
Consider TEACH with the following FDs again: {StudentID,CourseName} ! {Instructor};
{Instructor} ! {CourseName}.
Start with S = {TEACH}, then replace TEACH with R1 and R2.
TEACH
StudentID
CourseName
Instructor
u123456
Operating Systems
Hegel
u234567
Operating Systems
Ammar
u234567
Databases
Mark
CourseName
R1
Instructor
StudentID
R2
Instructor
Operating Systems
u123456
Hegel
Operating Systems
Hegel
Ammar
u234567
Databases
Mark
u234567
Ammar
Mark
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 22
6. (De-)normalisation
BCNF – Example
Consider the relation schema TEACH with the following FDs: {StudentID,CourseName} ! {Instructor};
{Instructor} ! {CourseName}.
TEACH
StudentID
CourseName
Instructor
u123456
Operating Systems
Hegel
u234567
Operating Systems
Ammar
u234567
Databases
Mark
CourseName
R1
Instructor
R2
Operating Systems
Operating Systems
Hegel
u123456
Ammar
u234567
Databases
Mark
u234567
Ammar
Mark
Is the decomposition dependency preserving?
StudentID
Instructor
Hegel
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 23
6. (De-)normalisation
BCNF – Example
Consider the relation schema TEACH with the following FDs:
{StudentID,CourseName} ! {Instructor}; Lost! {Instructor} ! {CourseName}.
TEACH
StudentID
CourseName
Instructor
u123456
Operating Systems
Hegel
u234567
Operating Systems
Ammar
u234567
Databases
Mark
CourseName
R1
Instructor
R2
Operating Systems
Operating Systems
Hegel
u123456
Ammar
u234567
Databases
Mark
u234567
Ammar
Mark
No. We have {Instructor} ! {CourseName} on R1 and ; on R2.
StudentID
Instructor
Hegel
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 24
6. (De-)normalisation
BCNF – Exercise
Consider INTERVIEW={OfficerID, CustomerID, Date, Time, Room} with the following FDs:
{OfficerID, Date} ! {Room} {CustomerID, Date} ! {OfficerID, Time} {OfficerID, Date, Time} ! {CustomerID} {Date, Time, Room} ! {CustomerID}
Is INTERVIEW in BCNF? If not, normalize INTERVIEW into BCNF using a lossless decomposition.
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 25
6. (De-)normalisation
BCNF – Exercise
Consider INTERVIEW={OfficerID, CustomerID, Date, Time, Room} with the following FDs:
{OfficerID, Date} ! {Room} {CustomerID, Date} ! {OfficerID, Time} {OfficerID, Date, Time} ! {CustomerID} {Date, Time, Room} ! {CustomerID}
Is INTERVIEW in BCNF? If not, normalize INTERVIEW into BCNF using a lossless decomposition.
{CustomerID, Date}, {OfficerID, Date, Time}, and {Date, Time, Room} are the keys.
Every superset of one of these keys is a superkey.
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 26
6. (De-)normalisation
BCNF – Exercise
Consider INTERVIEW={OfficerID, CustomerID, Date, Time, Room} with the following FDs:
{OfficerID, Date} ! {Room} {CustomerID, Date} ! {OfficerID, Time} {OfficerID, Date, Time} ! {CustomerID} {Date, Time, Room} ! {CustomerID}
Is INTERVIEW in BCNF? If not, normalize INTERVIEW into BCNF using a lossless decomposition.
{CustomerID, Date}, {OfficerID, Date, Time}, and {Date, Time, Room} are the keys.
Every superset of one of these keys is a superkey.
INTERVIEW is not in BCNF because of {OfficerID, Date} ! {Room}.
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 27
6. (De-)normalisation
BCNF – Exercise
We decompose INTERVIEW along the FD: {OfficerID, Date} ! {Room}:
OfficerID
S1011
S1011
S1024
S1024
CustomerID
P100
P105
P108
P107
INTERVIEW
Date
14/11/2013
Time
Room
12/11/2013
12/11/2013
10:00
R15
12:00
R15
14:00
R10
14/11/2013
INTERVIEW1
OfficerID
Date
Room
S1011
12/11/2013
R15
S1024
14/11/2013
R10
OfficerID
S1011
14:00
R10
INTERVIEW2
CustomerID
Date
Time
10:00
S1011
S1024
S1024
P100
P105
P108
P107
12/11/2013
12/11/2013
14/11/2013
12:00
14:00
14/11/2013
14:00
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 28
6. (De-)normalisation
BCNF – Exercise
Consider INTERVIEW={OfficerID, CustomerID, Date, Time, Room} with the following FDs:
{OfficerID, Date} ! {Room} {CustomerID, Date} ! {OfficerID, Time} {OfficerID, Date, Time} ! {CustomerID} {Date, Time, Room} ! {CustomerID}
Does this decomposition preserve FDs?
INTERVIEW2
OfficerID
CustomerID
Date
Time
S1011
P100
12/11/2013
10:00
S1011
P105
12/11/2013
12:00
S1024
P108
14/11/2013
14:00
S1024
P107
14/11/2013
14:00
INTERVIEW1
OfficerID
Date
Room
S1011
12/11/2013
R15
S1024
14/11/2013
R10
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 29
6. (De-)normalisation
BCNF – Exercise
Consider INTERVIEW={OfficerID, CustomerID, Date, Time, Room} with the following FDs:
{OfficerID, Date} ! {Room} {CustomerID, Date} ! {OfficerID, Time} {OfficerID, Date, Time} ! {CustomerID} {Date, Time, Room} ! {CustomerID}
{Date, Time, Room} ! {CustomerID} is lost!
INTERVIEW2
OfficerID
CustomerID
Date
Time
S1011
P100
12/11/2013
10:00
S1011
P105
12/11/2013
12:00
S1024
P108
14/11/2013
14:00
S1024
P107
14/11/2013
14:00
INTERVIEW1
OfficerID
Date
Room
S1011
12/11/2013
R15
S1024
14/11/2013
R10
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 30
6. (De-)normalisation
BCNF – Order Does Matter
When applying BCNF decomposition, the order in which the FDs are applied may lead to different results.
Example: Consider R = {A,B,C,D} and {A ! B,B ! C}. What are the results if we decompose along the FDs in different orders?
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 31
6. (De-)normalisation
BCNF – Order Does Matter
When applying BCNF decomposition, the order in which the FDs are applied may lead to different results.
Example: Consider R = {A,B,C,D} and {A ! B,B ! C}. What are the results if we decompose along the FDs in different orders?
Case 1:
R11 = {A,B},⌃11 = {A ! B};R12 = {A,C,D},⌃12 = ;. Case 2:
R21 = {B,C},⌃21 = {B ! C};R22 = {A,B,D},⌃22 = {A ! B}.
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 32
6. (De-)normalisation
BCNF – Order Does Matter
When applying BCNF decomposition, the order in which the FDs are applied may lead to different results.
Example: Consider R = {A,B,C,D} and {A ! B,B ! C}. Case 1:
R11 = {A,B},⌃11 = {A ! B};R12 = {A,C,D},⌃12 = ;. Case 2:
R21 = {B,C},⌃21 = {B ! C};R22 = {A,B,D},⌃22 = {A ! B}.
Are they dependency preserving?
Case 2 is dependency preserving, but Case 1 is not.
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 34
6. (De-)normalisation
BCNF – Order Does Matter
When applying BCNF decomposition, the order in which the FDs are applied may lead to different results.
Example: Consider R = {A,B,C,D} and {A ! B,B ! C}. Case 1:
R11 = {A,B},⌃11 = {A ! B};R12 = {A,C,D},⌃12 = ;. Case 2:
R21 = {B,C},⌃21 = {B ! C};R22 = {A,B,D},⌃22 = {A ! B}. Are they dependency preserving?
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 33
6. (De-)normalisation
Lossless join & Dependency Preservation
So far, we know how to find a lossless BCNF-decomposition, but it may not be dependency preserving.
Is there a less restrictive normal form such that a lossless and dependency preserving decomposition can always be found?
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 35
6. (De-)normalisation
3NF – Definition
A relation schema R is in 3NF if whenever a non-trivial FD X ! A holds in R, then X is a superkey or A is a prime attribute .
3NF allows data redundancy but excludes relation schemas with certain kinds of FDs:
1 Partial FD
2 Transitive FD
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 36
6. (De-)normalisation
StudentID
123456
123458
Normalisation to 3NF
Consider the following FDs of ENROL:
{StudentID, CourseNo, Semester} ! {ConfirmedBy ID, StaffName};
{ConfirmedBy ID} ! {StaffName}.
Is ENROL in 3NF?
CourseNo
COMP2400
ENROL
Semester
2008 S2
ConfirmedBy ID
u12
StaffName
2010 S2
Jane
123458
COMP2400
COMP2600
2008 S2
u13
u13
Linda
Linda
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 38
6. (De-)normalisation
3NF – Definition
1 Partial FD: A non-prime attribute A is functionally dependent on X, which is a proper subset of some key.
Key Attributes X Attribute A
2 Transitive FD: A non-prime attribute A is functionally dependent on X, which is not a proper subset of any key.
Key Attributes X Attribute A
Example: Let R = {A,B,C,D}, ⌃ = {A ! B,B ! C} and {A,D} is the only key. Then
B ! C is a transitive FD; A ! B is a partial FD.
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 37
6. (De-)normalisation
Normalisation to 3NF
Consider the following FDs of ENROL:
{StudentID, CourseNo, Semester} ! {ConfirmedBy ID, StaffName};
{ConfirmedBy ID} ! {StaffName}.
Is ENROL in 3NF?
{StudentID, CourseNo, Semester} is the only key.
Not in 3NF. {ConfirmedBy ID} ! {StaffName} is a transitive FD, i.e., StaffName is transitively dependent on the key.
StudentID
123456
123458
CourseNo
COMP2400
ENROL
Semester
2010 S2
ConfirmedBy ID
StaffName
Jane
2008 S2
u12
123458
COMP2400
COMP2600
2008 S2
u13
u13
Linda
Linda
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 39
6. (De-)normalisation
Normalisation to 3NF
Algorithm for a dependency-preserving and lossless 3NF-decomposition
Input: a relation schema R and a set ⌃ of FDs on R.
Output: a set S of relation schemas in 3NF, each having a set of FDs
choose a key K for R and a minimal cover ⌃0 for ⌃ takeR0 =K andstartwithS ={R0}
then successively take all FDs X ! A1,…,X ! Ak for each left hand sideX ofaFDin⌃0:
addX [{A1}···[{Ak}toS
remove all redundant ones from S (i.e., remove Rj if Rj ✓ Ri ) project the FDs in ⌃ onto each relation schema in S
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 40
6. (De-)normalisation
Normalisation to 3NF
R
A key
A minimal cover
R0 … X’A’1…A’n (Remove redundant ones)
XA1…AK
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 41
6. (De-)normalisation
Normalisation to 3NF – Example
Consider ENROL again:
{StudentID, CourseNo, Semester} ! {ConfirmedBy ID, StaffName}
{ConfirmedBy ID} ! {StaffName}
Can we normalise ENROL into 3NF by a lossless and dependency preserving decomposition?
StudentID
CourseNo
Semester
ConfirmedBy ID
StaffName
123456
COMP2400
2010 S2
u12
Jane
123458
COMP2400
2008 S2
u13
Linda
123458
COMP2600
2008 S2
u13
Linda
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 42
6. (De-)normalisation
Normalisation to 3NF – Example
Consider ENROL again:
{StudentID, CourseNo, Semester} ! {ConfirmedBy ID, StaffName} {ConfirmedBy ID} ! {StaffName}
A key is {StudentID, CourseNo, Semester}.
A minimal cover is {{StudentID, CourseNo, Semester} !
{ConfirmedBy ID}, {ConfirmedBy ID} ! {StaffName}}.
Hence, we have:
R0={StudentID, CourseNo, Semester} (omitted)
R1={StudentID, CourseNo, Semester, ConfirmedBy ID} with {StudentID, CourseNo, Semester} ! {ConfirmedBy ID, StaffName} R2={ConfirmedBy ID, StaffName} with
{ConfirmedBy ID} ! {StaffName}
StudentID
CourseNo
Semester
ConfirmedBy ID
StaffName
123456
COMP2400
2010 S2
u12
Jane
123458
COMP2400
2008 S2
u13
Linda
123458
COMP2600
2008 S2
u13
Linda
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 43
6. (De-)normalisation
3NF – Exercises
Is a relation schema R in 3NF in terms of a given set ⌃ of FDs? Exercise1:TakeR={C,Z,S}and⌃={Z !C, CS!Z}
Exercise 2: Take R = {A, B, C, D} and ⌃={A!B, B!C, CD!A, AC!D}
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 44
6. (De-)normalisation
3NF – Exercises
Let us do some exercises for the 3NF-decomposition algorithm. Exercise 1: R = {A, B, C, D} and ⌃ = {A ! B, B ! C, AC ! D}:
Exercise2:R={A,B,C,D}and⌃={AD!B, AB!C, C!B}:
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 46
6. (De-)normalisation
3NF – Exercises
Is a relation schema R in 3NF in terms of a given set ⌃ of FDs? Exercise1:TakeR={C,Z,S}and⌃={Z !C, CS!Z}
We discover that the keys are ZS and CS. Hence, all attributes are prime and R is in 3NF.
Exercise 2: Take R = {A, B, C, D} and ⌃={A!B, B!C, CD!A, AC!D}
We discover that the keys are A, BD and CD. Hence, all attributes are prime and R is in 3NF.
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 45
6. (De-)normalisation
3NF – Exercises
Let us do some exercises for the 3NF-decomposition algorithm. Exercise 1: R = {A, B, C, D} and ⌃ = {A ! B, B ! C, AC ! D}:
A is a key and {A ! B, B ! C, A ! D} is a minimal cover. R0 =A,R1 =ABD(henceomitR0),R2 =BC
The 3NF-decomposition is {ABD, BC} .
Exercise2:R={A,B,C,D}and⌃={AD!B, AB!C, C!B}:
AD is a key, and ⌃ is its own minimal cover.
R0 =AD,R1 =ABD(omitR0),R2 =ABC,R3 =CB(omitR3) The 3NF-decomposition is {ABD, ABC} .
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 47
6. (De-)normalisation
Consider INTERVIEW:
3NF – Exercise
INTERVIEW
OfficerID
CustomerID
Date
Time
Room
S1011
P100
12/11/2013
10:00
R15
S1011
P105
12/11/2013
12.00
R15
S1024
P108
12/11/2013
10.00
R10
S1024
P108
14/11/2013
14.00
R10
1 {OfficerID, Date} ! {Room}
2 {CustomerID, Date} ! {OfficerID, Time}
3 {OfficerID, Date, Time} ! {CustomerID}
4 {Date, Time, Room} ! {CustomerID}
Is INTERVIEW in 3NF? If not, normalise INTERVIEW into lossless and dependency preserving 3NF.
We know that {CustomerID, Date}, {OfficerID, Date, Time}, and {Date, Time, Room} are the keys.
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 48
6. (De-)normalisation
3NF vs BCNF 1
Let R be a relation schema. We check every non-trivial FD X ! Y of R,
1 R is in 3NF if X is a superkey or Y are prime attributes;
2 RisinBCNFifX isasuperkey.
1R is in 3NF but not in BCNF if X is not a superkey but Y are prime attributes
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 50
6. (De-)normalisation
Summary of Normal Forms
1NF, 3NF and BCNF are popular in practice. Other normal forms are rarely used.
1NF: only atomic values for attributes
(part of the definition for the relational data model);
2NF: no partial dependencies
(just some intermediate result in the history of normalization theory);
3NF: no partial and transitive FDs, and dependencies can be preserved;
BCNF: no partial and transitive FDs, but dependencies may not be preserved.
3NF can only minimise (not necessarily eliminate) redundancy. So a relation schema in 3NF may still have update anomalies.
A relation schema in BCNF eliminates redundancy.
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 49
6. (De-)normalisation
3NF vs BCNF
Consider R = {A,B,C,D} and ⌃ = {AB ! CD,D ! A}. We know that:
1 AB and BD are the keys of R;
2 A, B, D are the prime attributes of R.
Is R in 3NF? Is R in BCNF?
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 51
6. (De-)normalisation
3NF vs BCNF
Consider R = {A,B,C,D} and ⌃ = {AB ! CD,D ! A}. We know that:
1 AB and BD are the keys of R1;
2 A, B, D are the prime attributes of R1.
R is in 3NF but not in BCNF.
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 52
6. (De-)normalisation
Denormalisation
The normalisation process enhances data integrity by minimising redundancy and inconsistency (i.e., avoid update anomalies).
However, do we need to normalize relation schemas in all cases when designing a relational database?
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 54
6. (De-)normalisation
3NF vs BCNF
We often want to keep two properties in a decomposition:
1 lossless join
2 dependency preservation
Facts
There exists an algorithm that can always generate a lossless and
dependency-preserving decomposition into 3NF, There exists an algorithm that can generate a lossless
decomposition into BCNF,
But, a lossless and dependency-preserving BCNF decomposition
does not always exist.
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 53
6. (De-)normalisation
Why Denormalisation?
The normalisation process may degrade performance when data are frequently queried.
Since relation schemas are decomposed into many smaller ones after normalisation, queries need to join many relations together in order to return the results.
Unfortunately, join operation is very expensive.
When data is more frequently queried rather than being updated (e.g., data warehousing system), a weaker normal form is desired (i.e., denormalisation).
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 55
6. (De-)normalisation
Denormalisation
Denormalisation is a design process that
happens after the normalisation process,
is often performed during the physical design stage, and
reduces the number of relations that need to be joined for certain queries.
We need to distinguish:
Unnormalised – there is no systematic design.
Normalised – redundancy is reduced after a systematic design (to minimise data inconsistencies).
Denormalised – redundancy is introduced after analysing the normalised design (to improve efficiency of queries)
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 56
6. (De-)normalisation
Trade-offs
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 57
6. (De-)normalisation
Trade-offs
A good database design is to find a balance between desired properties, then normalise/denormalise relations to a desired degree.
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 58
6. (De-)normalisation
Trade-offs – Data Redundancy vs. Query Efficiency
Normalisation:
No Data Redundancy but No Efficient Query Processing
Data redundancies are eliminated in the following relations.
STUDENT
Name
StudentID
DoB
Tom
123456
25/01/1988
Michael
123458
21/04/1985
COURSE
CourseNo
Unit
COMP2400
6
COMP8740
12
ENROL
StudentID
CourseNo
Semester
123456
COMP2400
2010 S2
123456
COMP8740
2011 S2
123458
COMP2400
2009 S2
However, the query for “list the names of students who enrolled in a course with 6 units” requires 2 join operations.
SELECT Name, CourseNo FROM Enrol e, Course c, Student s WHERE
e.StudentID=s.StudentID and e.CourseNo=c.CourseNo and c.Unit=6;
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 59
6. (De-)normalisation
Trade-offs – Data Redundancy vs. Query Efficiency
Denormalisation:
Data Redundancy but Efficient Query Processing
If a student enrolled 15 courses, then the name and DoB of this student need to be stored repeatedly 15 times in ENROLMENT.
However, the query for “list the names of students who enrolled a course with 6 units” can be processed efficiently (no join needed).
SELECT Name, CourseNo FROM Enrolment WHERE Unit=6;
ENROLMENT
Name
StudentID
DoB
CourseNo
Semester
Unit
Tom
123456
25/01/1988
COMP2400
2010 S2
6
Tom
123456
25/01/1988
COMP8740
2011 S2
12
Michael
123458
21/04/1985
COMP2400
2009 S2
6
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 60
6. (De-)normalisation
Summary
Both normalisation and denormalisation are useful in database design.
Normalisation: obtain database schema avoiding redundancies and data inconsistencies
Denormalisation: join normalized relation schemata for the sake of better query processing
Some problems of (de-)normalisation:
FDs cannot handle null values.
To apply normalisation, FDs must be fully specified.
The algorithms for normalisation are not deterministic, leading to different decompositions.
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 61
6. (De-)normalisation
Big Data: Model or Not Model?
Traditional data modelling?
If it is not impossible, analyzing the dependencies (keys, functional dependencies, etc.) is challenging. How/whether traditional dat modelling techniques can be used for modelling large data sets?
No data modeling?
Is data just dumped into big files? If so, how would you retrieve data? Start at the beginning and search through, reading all the data to the end until you find what youre looking for?
Other approaches for data modelling?
NoSQL concepts Will be covered in Weeks 11 & 12
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 62
6. (De-)normalisation
Some questions
Weekly Review
What problems may be caused by the use of decompositions?
When is a decomposition said to be dependency-preserving or lossless-join?
What is the motivation for putting a relation in BCNF? Similarly, what is the motivation for putting a relation in 3NF?
What is a partial functional dependency? What is a transitive functional dependency?
How can we find a lossless-join BCNF decomposition?
How can we find a lossless-join and dependency-preserving 3NF decomposition?
Why do we need to consider de-normalisation for some applications?
More exercises
Attend Lab 5 next week, which will be about functional dependencies and normalisation.
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 63