CS代写 CMT207 Information modelling & database systems

www.cardiff.ac.uk/medic/irg-clinicalepidemiology

Normalisation

Copyright By PowCoder代写 加微信 powcoder

Information modelling
& database systems

in the last lecture we considered what logical database design is about
in this lecture we will consider the process of normalisation, which is fundamental to database designs
… but first we will consider a concept that is fundamental to normalisation – functional dependencies

Functional dependencies

Functional dependencies
functional dependency (FD) can be explained using the following diagram

t1 a1 b1
t2 a2 b2

let A and B be sets of attributes
let t1 and t2 be any two tuples
if whenever a1 = a2 we have b1 = b2, then we say that A functionally determines B
notation: A  B

{ID}  {name, surname}
{code}  {title}
{ID, code}  {name, surname, title}

ID name surname code title
123 CMT207 Information modelling & database systems
456 CMT209 Informatics

Properties of FDs
FDs are defined for a relation schema, not for instances
a key is a special kind of FD and determines all the attributes of a table
if A  B and A is not a key, then the relation has redundancy
combining rule: if A  B and A  C, then A  B,C
splitting rule: if A  B,C, then A  B and A  C
A  A is always true
AB  A is always true

Armstrong rules
to determine whether a functional dependency is implied by others, we use these rules:
reflexivity if B ⊆ A, then A  B
augmentation if A  B, then AC  BC
transitivity if A  B and B  C, then A  C
given a set of functional dependencies S, these rules can determine which dependencies are redundant with respect to S or if S is minimal

let R(A, B, C, D, E, F) be a relation
let S = {A  BC, B  E, CD  EF} be a set of FDs
Q: Is AD  F implied by S?
proof reason
1. A  BC given
2. BC  C reflexivity
3. A  C 1, 2, transitivity
4. AD  CD 3, augmentation
5. CD  EF given
6. AD  EF 4, 5, transitivity
7. EF  F reflexivity
8. AD F 6, 7, transitivity

is implied

Normalisation

Functional dependencies & normalisation
normalisation is a process used to eliminate unwanted data dependencies (functional or others) from relations
we define a set of normal forms
each normal form has fewer functional dependencies than the previous one
functional dependencies represent redundancy
each normal form
has less redundancy
than the previous

First normal form (1NF)

First normal form (1NF)
a relation is in 1NF if its every attribute value is an atomic (non-decomposable) data item
to normalise a relation to 1NF, simply split up any non-atomic values
to normalise the following table into 1NF, we need to “flatten” it
Supplier# Status City Product# Qty
S1 20 London {P1,P2,P3,P4} {30,20,40,20}
S2 10 Paris {P1,P2} {30, 40}
S3 10 Paris P2 200
S4 20 London {P2,P5} {20,40}

Supplier# Status City Product# Qty
S1 20 London {P1,P2,P3,P4} {30,20,40,20}
S2 10 Paris {P1,P2} {30, 40}
S3 10 Paris P2 200
S4 20 London {P2,P5} {20,40}

Supplier# Status City Product# Qty
S1 20 London P1 30
S1 20 London P2 20
S1 20 London P3 40
S1 20 London P4 20
S2 10 Paris P1 30
S2 10 Paris P2 40
S3 10 Paris P2 200
S4 20 London P2 40
S4 20 London P5 40

First normal form (1NF)
some problems persist in 1NF
INSERT anomalies
can’t add supplier S5 until it supplies at least one product
UPDATE anomalies
if S1 moves from London to Amsterdam, we must update the relation 4 times
DELETE anomalies
if we remove product P2,
we lose information about supplier S3
Supplier# Status City Product# Qty
S1 20 London P1 30
S1 20 London P2 20
S1 20 London P3 40
S1 20 London P4 20
S2 10 Paris P1 30
S2 10 Paris P2 40
S3 10 Paris P2 200
S4 20 London P2 40
S4 20 London P5 40

Second normal form (2NF)

Second normal form (2NF)
a relation is in 2NF if it is in 1NF and every non–key attribute is dependent on the key
in the previous example, we have {Supplier#, Product#} as a key and the following functional dependencies

the given relation is not in 2NF because City and Status are not dependent on the key, but only a part of it

Normalising 1NF relations to 2NF
decompose a relation so that non-key attributes dependent on the key only

Supplier(Supplier#, Status, City)
SP(Supplier#, Product#, Qty)

Supplier# Status City Product# Qty
S1 20 London P1 30
S1 20 London P2 20
S1 20 London P3 40
S1 20 London P4 20
S2 10 Paris P1 30
S2 10 Paris P2 40
S3 10 Paris P2 200
S4 20 London P2 40
S4 20 London P5 40

Supplier# Product# Qty

Supplier# Status City
S1 20 London
S2 10 Paris
S3 10 Paris
S4 20 London
S5 30 Cardiff

Second normal form (2NF)
some problems will still persist in 2NF
INSERT anomalies
cannot add cities without suppliers, e.g.
(City:Rome, Status:50)
UPDATE anomalies
if a city changes a status, we must update multiple rows
DELETE anomalies
if we delete S5, we lose information about Cardiff as well
these are caused by the dependencies among the
non–key attributes

Supplier# Status City
S1 20 London
S2 10 Paris
S3 10 Paris
S4 20 London
S5 30 Cardiff

Third normal form (3NF)

Third normal form (3NF)
a relation is in 3NF if it is in 2NF and there are no functional dependencies among non–key attributes
in the previous example, we have Supplier# as a key and the following functional dependencies

this relation is not in 3NF because of a non–key dependency between City and Status

Normalising 2NF relations to 3NF
decompose a relation so that non–key attributes are not dependent on other non–key attributes

CS(City, Status)
SC(Supplier#, City)

Supplier# Status City
S1 20 London
S2 10 Paris
S3 10 Paris
S4 20 London
S5 30 Cardiff

City Status
Cardiff 30

Supplier# City
S5 Cardiff

Third normal form (3NF)
3NF resolves problems that persist in 2NF
but 3NF can still have anomalies
normalisation is related to database design
a database should normally be in 3NF at least
if your design leads to a non–3NF database,
then you need to re–design it
identify functional dependencies
think if they will lead to any insert, update or delete anomalies
remove them

Normalisation so far
first normal form 1NF
all data values are atomic
second normal form 2NF
every non–key attribute is dependent on the key
third normal form 3NF
no functional dependencies among non–key attributes

Boyce–Codd normal form (BCNF)

Third normal form (3NF)
some problems will still persist in 3NF
consider the following relation, which is in 3NF:
Enrolment(Student#, Module, Lecturer, Mark)
functional dependencies:

Student# Module Lecturer 1 Physics Hawking 80
S1 Music Mozart 70
S2 Literature Orwell 80
S3 Music Bach 65
S4 Physics Hawking 75

Third normal form (3NF)
INSERT anomalies
cannot add that Knuth teaches Computer Science until some student is enrolled
UPDATE anomalies
if we replace Hawking by Einstein, we must update multiple rows
DELETE anomalies
if we delete student S3, we lose information that Bach teaches Music
these anomalies are caused by a dependency
Lecturer  Module whose determinant is not a key

Boyce–Codd normal form (BCNF)
a relation is in BCNF if for every functional dependency A  B either:
B  A (i.e. the dependency is trivial), or
A contains a candidate key of the relation
in other words, a relation is in BCNF iff every determinant in a non–trivial dependency is a (super) key
if there is only one candidate key, then 3NF and BCNF are the same

Normalising relations to BCNF
to normalise Enrolment into BCNF, we re-define the key
Enrolment(Student#, Module, Lecturer, Mark)

Enrolment(Student#, Lecturer, Module, Mark)
functional dependencies:

the revised relation is not in 2NF!

Normalising relations to BCNF
we just need to normalise the revised table as usual

S2(Lecturer, Module)
S1(Student#, Lecturer, Mark)
they are now in 3NF and BCNF

Student# Module Lecturer 1 Physics Hawking 80
S1 Music Mozart 70
S2 Literature Orwell 80
S3 Music Bach 65
S4 Physics Hawking 75

Student# Lecturer 1 Hawking 80
S1 Mozart 70
S2 Orwell 80
S3 Bach 65
S4 Hawking 75

Lecturer Module
Hawking Physics
Mozart Music
Orwell Literature

Normalisation
when a relation is in BCNF there are no longer any anomalies that result from functional dependencies
however, there may still ne anomalies that are caused by other dependencies
normalisation is good for relations that are subject to frequent updates, but is not always good for retrieval
e.g. Customer(Name, City, Postcode) is not in 3NF because Postcode  City, but do we want to normalise it into 3NF?
in practice, we usually normalise relations into BCNF at the logical design stage and then de-normalise them at the physical design stage where necessary

/docProps/thumbnail.jpeg

程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com