www.cardiff.ac.uk/medic/irg-clinicalepidemiology
Normalisation
Copyright By PowCoder代写 加微信 powcoder
Information modelling
& database systems
in the last lecture we considered what logical database design is about
in this lecture we will consider the process of normalisation, which is fundamental to database designs
… but first we will consider a concept that is fundamental to normalisation – functional dependencies
Functional dependencies
Functional dependencies
functional dependency (FD) can be explained using the following diagram
t1 a1 b1
t2 a2 b2
let A and B be sets of attributes
let t1 and t2 be any two tuples
if whenever a1 = a2 we have b1 = b2, then we say that A functionally determines B
notation: A B
{ID} {name, surname}
{code} {title}
{ID, code} {name, surname, title}
ID name surname code title
123 CMT207 Information modelling & database systems
456 CMT209 Informatics
Properties of FDs
FDs are defined for a relation schema, not for instances
a key is a special kind of FD and determines all the attributes of a table
if A B and A is not a key, then the relation has redundancy
combining rule: if A B and A C, then A B,C
splitting rule: if A B,C, then A B and A C
A A is always true
AB A is always true
Armstrong rules
to determine whether a functional dependency is implied by others, we use these rules:
reflexivity if B ⊆ A, then A B
augmentation if A B, then AC BC
transitivity if A B and B C, then A C
given a set of functional dependencies S, these rules can determine which dependencies are redundant with respect to S or if S is minimal
let R(A, B, C, D, E, F) be a relation
let S = {A BC, B E, CD EF} be a set of FDs
Q: Is AD F implied by S?
proof reason
1. A BC given
2. BC C reflexivity
3. A C 1, 2, transitivity
4. AD CD 3, augmentation
5. CD EF given
6. AD EF 4, 5, transitivity
7. EF F reflexivity
8. AD F 6, 7, transitivity
is implied
Normalisation
Functional dependencies & normalisation
normalisation is a process used to eliminate unwanted data dependencies (functional or others) from relations
we define a set of normal forms
each normal form has fewer functional dependencies than the previous one
functional dependencies represent redundancy
each normal form
has less redundancy
than the previous
First normal form (1NF)
First normal form (1NF)
a relation is in 1NF if its every attribute value is an atomic (non-decomposable) data item
to normalise a relation to 1NF, simply split up any non-atomic values
to normalise the following table into 1NF, we need to “flatten” it
Supplier# Status City Product# Qty
S1 20 London {P1,P2,P3,P4} {30,20,40,20}
S2 10 Paris {P1,P2} {30, 40}
S3 10 Paris P2 200
S4 20 London {P2,P5} {20,40}
Supplier# Status City Product# Qty
S1 20 London {P1,P2,P3,P4} {30,20,40,20}
S2 10 Paris {P1,P2} {30, 40}
S3 10 Paris P2 200
S4 20 London {P2,P5} {20,40}
Supplier# Status City Product# Qty
S1 20 London P1 30
S1 20 London P2 20
S1 20 London P3 40
S1 20 London P4 20
S2 10 Paris P1 30
S2 10 Paris P2 40
S3 10 Paris P2 200
S4 20 London P2 40
S4 20 London P5 40
First normal form (1NF)
some problems persist in 1NF
INSERT anomalies
can’t add supplier S5 until it supplies at least one product
UPDATE anomalies
if S1 moves from London to Amsterdam, we must update the relation 4 times
DELETE anomalies
if we remove product P2,
we lose information about supplier S3
Supplier# Status City Product# Qty
S1 20 London P1 30
S1 20 London P2 20
S1 20 London P3 40
S1 20 London P4 20
S2 10 Paris P1 30
S2 10 Paris P2 40
S3 10 Paris P2 200
S4 20 London P2 40
S4 20 London P5 40
Second normal form (2NF)
Second normal form (2NF)
a relation is in 2NF if it is in 1NF and every non–key attribute is dependent on the key
in the previous example, we have {Supplier#, Product#} as a key and the following functional dependencies
the given relation is not in 2NF because City and Status are not dependent on the key, but only a part of it
Normalising 1NF relations to 2NF
decompose a relation so that non-key attributes dependent on the key only
Supplier(Supplier#, Status, City)
SP(Supplier#, Product#, Qty)
Supplier# Status City Product# Qty
S1 20 London P1 30
S1 20 London P2 20
S1 20 London P3 40
S1 20 London P4 20
S2 10 Paris P1 30
S2 10 Paris P2 40
S3 10 Paris P2 200
S4 20 London P2 40
S4 20 London P5 40
Supplier# Product# Qty
Supplier# Status City
S1 20 London
S2 10 Paris
S3 10 Paris
S4 20 London
S5 30 Cardiff
Second normal form (2NF)
some problems will still persist in 2NF
INSERT anomalies
cannot add cities without suppliers, e.g.
(City:Rome, Status:50)
UPDATE anomalies
if a city changes a status, we must update multiple rows
DELETE anomalies
if we delete S5, we lose information about Cardiff as well
these are caused by the dependencies among the
non–key attributes
Supplier# Status City
S1 20 London
S2 10 Paris
S3 10 Paris
S4 20 London
S5 30 Cardiff
Third normal form (3NF)
Third normal form (3NF)
a relation is in 3NF if it is in 2NF and there are no functional dependencies among non–key attributes
in the previous example, we have Supplier# as a key and the following functional dependencies
this relation is not in 3NF because of a non–key dependency between City and Status
Normalising 2NF relations to 3NF
decompose a relation so that non–key attributes are not dependent on other non–key attributes
CS(City, Status)
SC(Supplier#, City)
Supplier# Status City
S1 20 London
S2 10 Paris
S3 10 Paris
S4 20 London
S5 30 Cardiff
City Status
Cardiff 30
Supplier# City
S5 Cardiff
Third normal form (3NF)
3NF resolves problems that persist in 2NF
but 3NF can still have anomalies
normalisation is related to database design
a database should normally be in 3NF at least
if your design leads to a non–3NF database,
then you need to re–design it
identify functional dependencies
think if they will lead to any insert, update or delete anomalies
remove them
Normalisation so far
first normal form 1NF
all data values are atomic
second normal form 2NF
every non–key attribute is dependent on the key
third normal form 3NF
no functional dependencies among non–key attributes
Boyce–Codd normal form (BCNF)
Third normal form (3NF)
some problems will still persist in 3NF
consider the following relation, which is in 3NF:
Enrolment(Student#, Module, Lecturer, Mark)
functional dependencies:
Student# Module Lecturer 1 Physics Hawking 80
S1 Music Mozart 70
S2 Literature Orwell 80
S3 Music Bach 65
S4 Physics Hawking 75
Third normal form (3NF)
INSERT anomalies
cannot add that Knuth teaches Computer Science until some student is enrolled
UPDATE anomalies
if we replace Hawking by Einstein, we must update multiple rows
DELETE anomalies
if we delete student S3, we lose information that Bach teaches Music
these anomalies are caused by a dependency
Lecturer Module whose determinant is not a key
Boyce–Codd normal form (BCNF)
a relation is in BCNF if for every functional dependency A B either:
B A (i.e. the dependency is trivial), or
A contains a candidate key of the relation
in other words, a relation is in BCNF iff every determinant in a non–trivial dependency is a (super) key
if there is only one candidate key, then 3NF and BCNF are the same
Normalising relations to BCNF
to normalise Enrolment into BCNF, we re-define the key
Enrolment(Student#, Module, Lecturer, Mark)
Enrolment(Student#, Lecturer, Module, Mark)
functional dependencies:
the revised relation is not in 2NF!
Normalising relations to BCNF
we just need to normalise the revised table as usual
S2(Lecturer, Module)
S1(Student#, Lecturer, Mark)
they are now in 3NF and BCNF
Student# Module Lecturer 1 Physics Hawking 80
S1 Music Mozart 70
S2 Literature Orwell 80
S3 Music Bach 65
S4 Physics Hawking 75
Student# Lecturer 1 Hawking 80
S1 Mozart 70
S2 Orwell 80
S3 Bach 65
S4 Hawking 75
Lecturer Module
Hawking Physics
Mozart Music
Orwell Literature
Normalisation
when a relation is in BCNF there are no longer any anomalies that result from functional dependencies
however, there may still ne anomalies that are caused by other dependencies
normalisation is good for relations that are subject to frequent updates, but is not always good for retrieval
e.g. Customer(Name, City, Postcode) is not in 3NF because Postcode City, but do we want to normalise it into 3NF?
in practice, we usually normalise relations into BCNF at the logical design stage and then de-normalise them at the physical design stage where necessary
/docProps/thumbnail.jpeg
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com