Microsoft PowerPoint – 12- DatabaseDesign_SchemaRefinementv2.pptx [Autosaved]
© 2018 A. Alawini & A. Parameswaran
Database Design:
Normal Forms
Abdu Alawini
University of Illinois at Urbana-Champaign
CS411: Database Systems
October 10, 2018
1
© 2018 A. Alawini & A. Parameswaran
Announcements
• Project Track 1 – Stage 2
And
• Project Track 2 – Stage 1
are due Today!
• HW 2 is due Oct 15 (23:59)
2
© 2018 A. Alawini & A. Parameswaran
•Conceptual design: (ER & UML Models are used
for this.)
• What are the entities and relationships we need?
•Logical design:
• Transform ER design to Relational Schema
•Schema Refinement: (Normalization)
• Check relational schema for redundancies and
related anomalies.
•Physical Database Design and Tuning:
• Consider typical workloads; (sometimes) modify the
database design; select file types and indexes.
Overview of Database Design
3
© 2018 A. Alawini & A. Parameswaran
Last time!
How do we obtain a good design?
Functional Dependencies and Keys
Rules about Functional Dependencies
Splitting/Combination
Trivial Dependencies
Attribute Closure
• FD Closure
4
© 2018 A. Alawini & A. Parameswaran
Recap: Functional Dependencies
• A form of constraint (hence, part of the schema)
• Finding them is part of the database design
• Used heavily in schema refinement
• Holds for ALL instances!
Definition:
If two tuples agree on the attributes
A , A , … A
1 2 n
then they must also agree on the attributes
B , B , … B 1 2 m
Formally:
A , A , … A
1 2 n
B , B , … B 1 2 m
Where have we seen
this before?
5
© 2018 A. Alawini & A. Parameswaran
Recap: Keys are a type of FD
• Key of a relation R is a set of attributes that
• functionally determines all attributes of R
• none of its subsets determines all attributes of R
• There could be many keys of a relation
• Student (UIN, email, dept, age)
• UIN UIN, email, dept, age
• email UIN, email, dept, age
• Superkey
• “Superset” of key
• a set of attributes that contains a key
• Any examples for student?
6
© 2018 A. Alawini & A. Parameswaran
Today!
•Closure of a set of functional dependencies
•Armstrong’s axioms
•Boyce-Codd Normal Form
•Third Normal Form
7
© 2018 A. Alawini & A. Parameswaran
Closure of a set of FDs
• Given a relation schema R & a set S of FDs
• Closure of S: S+ = all FDs logically implied by S
• Allows us to answer all questions of the type
• is the FD f logically implied by S?
• Example
• R = {A,B,C,G,H,I}
• S = A B; A C; CG H; CG I; B H
• would A H be logically implied?
• yes (you can prove this, using the definition of FD)
• How to compute S+?
8
© 2018 A. Alawini & A. Parameswaran
Computing S+
•To check if a specific A B is true, we can compute A+
• We already have an algorithm for that
•To compute all A B implied by S, i.e., to compute the
closure of S, we can use a particular algorithm.
• This algorithm depends on the so-called “Armstrong’s axioms”.
• These axioms form a complete description of FD rules
9
© 2018 A. Alawini & A. Parameswaran
Armstrong’s Axioms
•Reflexivity rule
•A1A2…An a subset of A1A2…An
•Augmentation rule
•A1A2…An B1B2…Bm, then
A1A2…An C1C2..Ck B1B2…Bm C1C2…Ck
•Transitivity rule
•A1A2…An B1B2…Bm and
B1B2…Bm C1C2…Ck, then
A1A2…An C1C2…Ck
10
© 2018 A. Alawini & A. Parameswaran
Inferring S+ using Armstrong’s Axioms
• S+ = S
• Loop until S+ does not change any further
• For each f in S+, apply reflexivity & augmentation rules
• Add the new FDs to S+
• For each pair of FDs in S+, apply the transitivity rule
• Add the new FD to S+
This procedure could lead to LOTS of rules!!
• You have n attributes, with one FD valid, A B
• Exponential in n. Why?
11
© 2018 A. Alawini & A. Parameswaran
Two Simple Rules using Armstrong’s
Axioms
• Armstrong’s axioms
• Reflexivity: A subset of A
• Transitivity: A B, B C implies A C
• Augmentation: A B implies AC BC
• Simple Rules
• Trivial rule – same as reflexivity
• Combining/splitting rule be expressed as well?
• X Y and X Z, then X YZ
• X YZ then X Y
12
© 2018 A. Alawini & A. Parameswaran
Two Simple Rules in Armstrong’s Axioms
• Armstrong’s axioms
• Reflexivity: A subset of A
• Transitivity: A B, B C implies A C
• Augmentation: A B implies AC BC
• Simple Rules
• Trivial rule – same as reflexivity
• Combining rule be expressed as well?
• To show: X Y and X Z, then X YZ
• XZ YZ (aug) and X XZ (aug)
• X YZ (trans)
13
© 2018 A. Alawini & A. Parameswaran
Two Simple Rules in Armstrong’s Axioms
• Armstrong’s axioms
• Reflexivity: A subset of A
• Transitivity: A B, B C implies A C
• Augmentation: A B implies AC BC
• Simple Rules
• Trivial rule – same as reflexivity
• Splitting rule be expressed as well?
• To show: X YZ then X Y
• YZ Y (reflex); then XY (trans)
14
© 2018 A. Alawini & A. Parameswaran
Outline of Rules
• Two examples of rules
• Splitting/Combination
• Trivial Dependencies
• Attribute Closure
• Algorithm
• Uses
• FD Closure
• A complete set of rules:
• Armstrong’s axioms
• Algorithm
15
© 2018 A. Alawini & A. Parameswaran
But we were talking about schema design
•Armed with the concepts of “functional dependency” and
“superkey” (and tools to reason about them), we will now
define what a “good schema” is.
16
© 2018 A. Alawini & A. Parameswaran
17
Normal Forms
First Normal Form = all attributes are atomic
Second Normal Form (2NF) = old and obsolete
Boyce Codd Normal Form (BCNF)
Third Normal Form (3NF)
Fourth Normal Form (4NF)
Others…
© 2018 A. Alawini & A. Parameswaran 18
Boyce-Codd Normal Form
A simple condition for removing anomalies from relations:
A relation R is in BCNF if and only if:
Whenever there is a nontrivial FD: A1A2…An B,
then A1A2…An is a super-key for R.
In English (though a bit vague):
Whenever a set of attributes of R is determining another attribute,
it should determine all attributes of R.
© 2018 A. Alawini & A. Parameswaran 19
Boyce-Codd Normal Form
A simple condition for removing anomalies from relations:
A relation R is in BCNF if and only if:
Whenever there is a nontrivial FD: A1A2…An B,
then A1A2…An is a super-key for R.
Why does this make sense?
Say R(A, B, C) with AB as the key has an FD: AC.
Then C is being repeated for multiple Bs
© 2018 A. Alawini & A. Parameswaran 20
Example
Name SSN Phone Number
Fred 123-321-99 (201) 555-1234
Fred 123-321-99 (206) 572-4312
Joe 909-438-44 (908) 464-0028
Joe 909-438-44 (212) 555-4000
What are the dependencies?
SSN Name
Is the left side a superkey?
No
Is it in BCNF?
No.
© 2018 A. Alawini & A. Parameswaran 21
Decompose it into BCNF
SSN Name
123-321-99 Fred
909-438-44 Joe
SSN Phone Number
123-321-99 (201) 555-1234
123-321-99 (206) 572-4312
909-438-44 (908) 464-0028
909-438-44 (212) 555-4000
SSN Name
Now is it in BCNF?
© 2018 A. Alawini & A. Parameswaran 22
BCNF Decomposition
Find a dependency that violates the BCNF condition:
A , A , … A
1 2 n
B , B , … B
1 2 m
A’sOthers B’s
R2 R1
Heuristic : choose B , B , … B “as large as possible”1 2 m
Decompose:
Continue until
there are no
BCNF violations
left.
© 2018 A. Alawini & A. Parameswaran 23
Example Decomposition
Name SSN Age EyeColor PhoneNumber
Functional dependencies:
SSN Name, Age, Eye Color
Person:
BCNF: Person1(SSN, Name, Age, EyeColor),
Person2(SSN, PhoneNumber)
© 2018 A. Alawini & A. Parameswaran
Example
Same example, slightly more complex.
Person (Name, SSN, Age, EyeColor, Phone, Draftworthy)
• FD 1: SSN Name, Age, EyeColor
• FD 2: Age Draftworthy
24
© 2018 A. Alawini & A. Parameswaran
Example
• Person (Name, SSN, Age, EyeColor, Phone, Draftworthy)
• FD 1: SSN Name, Age, EyeColor
• FD 2: Age Draftworthy
• FD 1 and 2 imply SSN Name, Age, EyeColor, Draftworthy
• Split based on this
• (SSN, Name, Age, EyeColor, Draftworthy)
• (SSN, Phone Number)
• Split based on Age Draftworthy
• (SSN, Name, Age, EyeColor)
• (Age, Draftworthy)
• (SSN, Phone Number)
25
© 2018 A. Alawini & A. Parameswaran
Example
•Movie (title, yr, length, genre, studioName, starName)
•(Title, year, starName) is a key
•FD: Title, year length, genre, studioName
26
© 2018 A. Alawini & A. Parameswaran
Example
•Movie (title, yr, length, genre, studioName, starName)
•(Title, year, starName) is a key
•FD: Title, year length, genre, studioName
• (Title, Year, Length, Genre, StudioName)
• (Title, Year, StarName)
27
© 2018 A. Alawini & A. Parameswaran
Example
•Movie (title, yr, studioName, President, PresAddr)
•FD: Title, yr studioName
•FD: studioName President
•FD: President PresAddr
28
© 2018 A. Alawini & A. Parameswaran
Example
• Movie (title, yr, studioName, president, presAddr)
• FD: Title, yr studioName
• FD: studioName president
• FD: president presAddr
(title, yr, studioName, president)
(president, presidentaddr)
(title, yr, studioName)
(studioName, president)
(president, presidentaddr)
29
© 2018 A. Alawini & A. Parameswaran
Two-attribute relations
• Let A and B be the only two attributes of R
• Claim: R is in BCNF. (See Example 3.17.)
• If A B is true, B A is not:
• If B A is true, A B is not:
• If A B is true, B A is true:
30
© 2018 A. Alawini & A. Parameswaran
Two-attribute relations
• Let A and B be the only two attributes of R
• Claim: R is in BCNF. (See Example 3.17.)
• If A B is true, B A is not:
• A B does not violate BCNF
• If B A is true, A B is not:
• Symmetric
• If A B is true, B A is true:
• Both are keys, therefore neither violate BCNF
31
© 2018 A. Alawini & A. Parameswaran
32
BCNF Decomposition: The Algorithm
Input: relation R, set S of FDs over R
1) Check if R is in BCNF, if not:
a) pick a violation FD f: A B
b) compute A+
c) create R1 = A+, R2 = A union (R – A+)
d) compute all FDs over R1, using R and S.
Repeat similarly for R2. (See Algorithm 3.12)
e) Repeat Step 1 for R1 and R2
2) Stop when all relations are BCNF, or are
two-attributes
(Remember, two attribute relations are always in BCNF)
© 2018 A. Alawini & A. Parameswaran
33
Q: Is BCNF Decomposition unique?
• R(SSN, netid, phone).
• FD1: SSN -> netid
• FD2: netid -> SSN
• Each of these two FDs violates BCNF.
Can you tell me two different BCNF decomp for R?
• Pick FD1 and decompose, you get:
• (SSN, netid); (SSN, phone).
• Pick FD2 and you get
• (netid, SSN); (netid, phone).
© 2018 A. Alawini & A. Parameswaran
34
Properties of BCNF
• Property 1. BCNF removes certain types of redundancies
• those caused by adding many-many or one-many relationships
• For examples of redundancy that it cannot remove, see “multi-valued
redundancy”
• But this is not in curriculum
© 2018 A. Alawini & A. Parameswaran
35
Properties of BCNF
• BCNF Decomposition avoids information loss
• You can construct the original relation instance from the decomposed
relations’ instances.
• How would get R(A, B, C) from R(A, B), R(B, C)?
• Ans: Natural join
© 2018 A. Alawini & A. Parameswaran
An easy decomposition?
• We saw that two-attribute relations are in BCNF.
• Why don’t we break any R(A,B,C,D,E) into R1(A,B); R2(B,C);
R3(C,D); R4(D,E)? Why bother with finding BCNF violations etc.?
• Turns out, this leads to information loss …
36
© 2018 A. Alawini & A. Parameswaran
Example of the “easy decomposition”
• R = (A,B,C); decomposed into R1(A,B); R2(B,C)
37
A B C
1 2 3
4 2 6
A B
1 2
4 2
B C
2 3
2 6
© 2018 A. Alawini & A. Parameswaran
Example of the “easy decomposition”
• R = (A,B,C); decomposed into R1(A,B); R2(B,C)
38
A B C
1 2 3
4 2 6
A B
1 2
4 2
B C
2 3
2 6
A B C
1 2 3
4 2 6
1 2 6
4 2 3
Nat.Join
We get back some “bogus tuples” !
© 2018 A. Alawini & A. Parameswaran 39
BCNF Decomposition is Lossless
Example R(A, B, C), A -> C
BCNF: R1(A,B), R2(A,C)
Some tuple (a,b,c) in R
decomposes into (a,b) in R1
and (a,c) in R2
Recover tuples in R: (a,b,c),
© 2018 A. Alawini & A. Parameswaran 40
BCNF Decomposition is Lossless
Example R(A, B, C), A -> C
BCNF: R1(A,B), R2(A,C)
Some tuple (a,b,c) in R (a,b’,c’) also in R
decomposes into (a,b) in R1 (a,b’) also in R1
and (a,c) in R2 (a,c’) also in R2
Recover tuples in R: (a,b,c),
© 2018 A. Alawini & A. Parameswaran 41
BCNF Decomposition is Lossless
Example R(A, B, C), A -> C
BCNF: R1(A,B), R2(A,C)
Some tuple (a,b,c) in R (a,b’,c’) also in R
decomposes into (a,b) in R1 (a,b’) also in R1
and (a,c) in R2 (a,c’) also in R2
Recover tuples in R: (a,b,c), (a,b,c’), (a,b’,c), (a,b’,c’) also in R
Is any of these a “bogus tuple” (not present in R)?
© 2018 A. Alawini & A. Parameswaran 42
BCNF Decomposition is Lossless
Example R(A, B, C), A -> C
BCNF: R1(A,B), R2(A,C)
Some tuple (a,b,c) in R (a,b’,c’) also in R
decomposes into (a,b) in R1 (a,b’) also in R1
and (a,c) in R2 (a,c’) also in R2
Recover tuples in R: (a,b,c), (a,b,c’), (a,b’,c), (a,b’,c’) also in R
Is any of these a “bogus tuple” (not present in R)?
No! Also see text 3.4.1 for proof. (Essentially an extension of this arg)
© 2018 A. Alawini & A. Parameswaran
43
Lossless Decompositions
A decomposition is lossless if we can recover:
R(A,B,C)
{ R1(A,B) , R2(A,C) }
R’(A,B,C) = R(A,B,C)
R’ is in general larger than R. Why?
Must ensure R’ = R
Decompose
Recover
© 2018 A. Alawini & A. Parameswaran
✔1) minimize redundancy
✔2) avoid info loss
3) preserve dependency
4) ensure good query performance
44
Desirable Properties of Schema Refinement
© 2018 A. Alawini & A. Parameswaran
45
However,
•BCNF is not always dependency preserving
•In fact, some times we cannot find a BCNF
decomposition that is dependency preserving
•Can handle this situation using 3NF
•But what is “dependency preserving”?
© 2018 A. Alawini & A. Parameswaran
46
Normal Forms
First Normal Form = all attributes are atomic
Second Normal Form (2NF) = old and obsolete
Boyce Codd Normal Form (BCNF)
Third Normal Form (3NF)
Fourth Normal Form (4NF)
Others…
© 2018 A. Alawini & A. Parameswaran 47
3NF: A Problem with BCNF
Phone Address Name
FD’s: Phone -> Address; Address, Name -> Phone
So, there is a BCNF violation (Phone Address), and we decompose.
Phone Address
Phone Name
Phone -> Address
No FDs
© 2018 A. Alawini & A. Parameswaran 48
So where’s the problem?
Phone Address Phone Name
1234 10 Downing 1234 John
5678 10 Downing 5678 John
FD’s: Phone-> Address; Address, Name -> Phone
No problem so far. All local FD’s are satisfied.
Let’s put all the data into a single table:
Phone Address Name
1234 10 Downing John
5678 10 Downing John
Violates the dependency: Address, Name Phone