程序代写代做代考 Functional Dependencies hadoop database Databases

Databases
Lecture 9 – Normalisation
Bernhard Reus
1
Normalisation
[C&B, Ch. 13]
In this lecture:.
• How to get rid of anomalies by Normalisation.
• What normal forms are there?
staffNo
005
007
003
004
lname
Smith
Nail
Gordo n
Miller
jobTitle
Manager
Agent
Manager
Agent
salary
36,000
25,200
NULL
20,400
branchNo
003
004
005
003
BranchAddress
163 Main Street, Glasgow
22 Deer Road, London
82 Oxford Street, London
163 Main Street, Glasgow
2
© Bernhard Reus, University of Sussex, 2004-16

Objective of Normalisation
Disallow Anomalies (from last lecture) by removing data redundancy.
How?
3
What is Normalisation?
“A technique for producing a set of relations with desirable properties, given the data
requirements of an enterprise.”
• First developed by Codd (1972) • Supports good Database Design. • Get rid of anomalies.
4
© Bernhard Reus, University of Sussex, 2004-16

First Normal Form (1NF)
• … is not yet about dependency.
• Each row must only contain one value for each column.
• i.e. no multi-valued attributes in columns.
• Relations obtained by our translation from
conceptual to logical design (Lec. 7) are
automatically in 1NF.
(multi-valued attributes stored in separate table)
5
Second Normal Form (2NF)
“A relation in 1NF such that every non- primary-key attribute is fully functionally
dependent on the primary key, is in 2NF.”
Slogan:
Each attribute individually functionally depends on the whole primary key.
6
© Bernhard Reus, University of Sussex, 2004-16

Example
clientNo
propNo
startDate
propAddress
ownerNo
primary key
attributes depend on
a part of the primary key
{clientNo,propNo} ® startDate only propNo ® {propAddress,ownerNo,ownerName}
ownerNo ® ownerName
ownerName
7
clientNo
propNo
startDate
propAddress
ownerNo
ownerName
clientNo
propNo
startDate
propNo
propAddress
ownerNo
ownerName
2NF Normalisation (Example)
• Decompose the relational schema along the functional dependency that is only full on a partial key (with this partial key as primary key of the new schema).

© Bernhard Reus, University of Sussex, 2004-16
8

Observation
• If the primary key of a schema in 1NF consists of one attribute only, the schema is automatically in 2NF.
9
Transitive Dependency
“If A®B and B®C, then C is transitively dependent on A via B (provided that A is
not functionally dependent on B or C)” • Example
propNo ® ownerNo ® ownerName
propNo
propAddress
ownerNo
ownerName
10
© Bernhard Reus, University of Sussex, 2004-16

Third Normal Form (3NF)
“A relation that is in 2NF and in which no non-primary-key attribute is transitively
dependent on the primary key is in 3NF.”
Slogan:
Each attribute functionally depends on the primary key only (and not on other attributes).
11
clientNo
propNo
startDate
propNo
propAddress
ownerNo
ownerName
Example (cont’d)

{clientNo,propNo} ® startDate
propNo ® {propAddress,ownerNo,ownerName} ownerNo ® ownerName
Transitive functional dependency:
propNo ® ownerNo ® ownerName ownerName depends on non-primary-key attribute.
© Bernhard Reus, University of Sussex, 2004-16
12

3NF Normalisation (Example)

• Decompose the relational schema along the transitive dependency, introducing the “middle” attribute(s) as primary
clientNo
propNo
startDate
propNo
propAddress
ownerNo
propNo
propAddress
ownerNo
ownerNo
ownerName
key of the new schema.
13
ownerName
3NF Result
clientNo
propNo
startDate
propAddress
ownerNo
ownerName
Rents
Owner
Property
14
clientNo
propNo
startDate
propNo
propAddress
ownerNo
ownerNo
ownerName
© Bernhard Reus, University of Sussex, 2004-16

Summary (popular crib)
• “Every field in a record must depend on – the Key (1NF),
– the Whole Key (2NF),
– and Nothing But The Key (3NF).”
15
What about candidate keys?
• We have only considered functional dependencies on the primary key and ignored alternative candidate keys.
• Should there be alternative candidate keys one must define the Normal Forms slightly more generally to detect all redundancies.
• See [C&B. Chapter 13.9, Ch 14]
• Sketched on the following slides:
16
© Bernhard Reus, University of Sussex, 2004-16

Redefine
• Primary key attribute replaced by
– prime attribute: an attribute which is part of any candidate key.
not “all”
n Functional dependency on primary key
replaced by
n functional dependency on all candidate keys (not just primary).
17
General Definition of 2NF/3NF
non-prime
“A relation in 1NF such that every non-primary key attribute is fully functionally dependent on
the primary key, is in 2NF.” all candidate keys
“A relation that is in 2NF and for which no non-primary key attribute is transitively
dependent on the primary key, is in 3NF.”
non-prime
some candidate key
20
© Bernhard Reus, University of Sussex, 2004-16

Example
ClientNo
PropNo
StartDate
Fee
candidate key {ClientNo,PropNo} {ClientNo,StartDate} {PropNo,StartDate}
primary key
candidate key
Fee does depend fully functionally on primary key {ClientNo,PropNo} but not on candidate key {ClientNo,StartDate}
Non prime: Fee
22
Reformulation of 3NF
“A relation that is in 2NF and for which no non-prime attribute is transitively dependent on
some candidate key, is in 3NF.”
“A relation is in 3NF provided that, for all functional dependencies X ® A of a non-prime attribute A, it is true that if X ® A is full then X
is a key.”
23
© Bernhard Reus, University of Sussex, 2004-16

Normal Form and Design
• Every good design will deliver schemas in 3NF.
• 3NF eliminates partial or transitive dependencies in order to reduce redundancy (and avoid anomalies).
• In case we have several, overlapping, candidate keys there may still be some redundancy:
24
Boyce-Codd Normal Form (BCNF)
“A relation is in BCNF, if and only if, every determinant of a full
dependency is a candidate key.” left side of dependency
• BCNF is a stronger form of 3NF (if there are composite candidate keys that are overlapping)
• Canwehave:X®AwhereAisa prime attribute and X is not a key?
Ted Codd
25
Yes in 3NF
No in BCNF
© Bernhard Reus, University of Sussex, 2004-16

More Normalisation
• Fourth and Fifth Normal Forms apply to
situations involving M:M relationships (expressed through cross-reference tables).
Need additional dependency types.
Multi-valued (MVD) dependency; Lossless-join dependency
• In practice, many databases are de- normalised to a certain extent. This has to do
with performance: a de-normalised database may require fewer joins and can be faster for retrievals (but inserts are slower for a de-normalised table due to extra checks)
26
Normalisation Pro and Con
OLTP
• For online transactional databases (where data is inserted and updated continuously) one usually normalises to at least 3rd Normal Form to avoid anomalies (redundancy).
OLAP
27
• For online analytical processing (eg data warehousing) where one works with “fixed” data it is advantageous to use de-normalised databases to avoid costly join operations and allow for declaration of good indexes.
© Bernhard Reus, University of Sussex, 2004-16

From de-normalised to noSQL
• The NoSQL movement started in 2009.
• NoSQL now translated as “not only SQL”.
• They are a response to the need (Web 2.0) for horizontally scalable (also often distributed, schema-free, open-source) databases with large amount of indexed data (videos etc).
• Very different technology to relational databases, less “support”, more rudimentary “key-value” storage solution.
28
NoSQL (cont’d)
• NoSQL databases are optimised for heavy
read/write load (support REST api)
• Types: key-value stores, column stores, document databases, graph databases
REpresentational State Transfer
• Schema-free so not appropriate if you have many (small) concurrent transactions with insert/update/delete (see later in course).
• >200 different products: www.nosql-database.org
• MostpopularMongoDB(from“humongous”),a
document based NoSQL database that stores BSON.
Binary JSON
29
© Bernhard Reus, University of Sussex, 2004-16

NoSQL and The Cloud
• Amazon EC2 uses its own e DynamoDB (SimpleDB outdated)
• Google uses its own BigTable
• Facebook uses its own Apollo but also RockDB,
Habse/Hadoop
• All still use relational databases (eg. MySQL) for certain data…
More on NoSQL later, if we have time. First we need to learn SQLJ
30
© Bernhard Reus, University of Sussex, 2004-16