Relational DB Design Theory
CSC 343
Winter 2021
MICHAEL LIUT (MICHAEL.LIUT@UTORONTO.CA)
ILIR DEMA (ILIR. DEMA@UTORONTO.CA)
DEPARTMENT OF MATHEMATICAL AND COMPUTATIONAL SCIENCES UNIVERSITY OF TORONTO MISSISSAUGA
Introduction
• There are always many different schemas for a given set of data. e.g. you could combine or divide tables.
• How do you pick a schema? Which is better? What does “better” mean?
• Fortunately, there are some principles to guide us.
2
Schemas and Constraints
• Consider the following sets of schemas: Students(utorid, name, email)
vs.
Students(utorid, name) Emails(utorid, address)
• Consider also:
House(street, city, value, owner, propertyTax)
vs. House(street, city, value, owner)
TaxRates(city, value, propertyTax)
Constraints are domain-dependent
Avoid Redundancy
This table has redundant data, and that can lead to anomalies.
name
addr
beersLiked
manf
favBeer
Janeway
Voyager
Bud
A.B.
WickedAle
Janeway
Voyager
WickedAle
Pete’s
WickedAle
Spock
Enterprise
Bud
A.B.
Bud
Update anomaly: if Janeway is transferred to Intrepid, will we remember to change each of her tuples?
Deletion anomaly: if nobody likes Bud, we lose track of the fact that Anheuser-Busch manufactures Bud.
4
Database Design Theory
• Allows us to improve a schema systematically.
• The general idea is to:
1. express constraints on data; and
2. use these to decompose the relations.
• Ultimately, get a schema that is in normal form.
• “Normal” meaning conforming to a standard.
• “Normal Form” referring to guaranteeing ‘good’ properties; such as no anomalies.
• The process of converting a schema to a normal form is called normalization.
5
Part I
Functional Dependency Theory
6
Functional Dependencies
• Let’s say “XàY holds in R”, this means that “X functionally determines Y”.
• Conventions:
• …, X, Y, Z represent sets of attributes; A, B, C,… represent single attributes.
• No braces used for sets of attributes, just ABC, rather than {A,B,C}.
• Why functional dependency?
• “functional” because there is a mathematical function that takes a value for X and
gives a unique value for Y.
• “dependency” because the value of Y depends on the value of X.
7
Functional Dependencies (FDs)
• Need a special type of constraint to help us with normalization.
XàY is an assertion about relation R that whenever two tuples of R agree on all the
attributes in set X, they must also agree on all attributes in set Y. e.g. Let’s say that X = {AB} and Y = {C} R
A
B
C
x1
y1
c2
x1
y1
c2
x2
y2
c3
8
Properties about FDs
1. Rules
◦ Trivial FDs
◦ Splitting/Combining
◦ Armstrong’s Axioms
2. Algorithms related to FDs
◦ Closure (of a set of attributes of a relation)
◦ Minimal Basis (of a relation)
9
Rule: Trivial FDs
• Not all functional dependencies are useful. • AàA always holds
• ABCàA also always holds.
• A functional dependency with an attribute on both sides. • ABCàAD becomes ABCàD
OR
• Delete the trivial FDs:
ABCàA and ABCàD becomes just ABCàD
The right side is a subset of the left side.
This is called “singleton form”.
10
Rule: Splitting/Combining
XàA1A2…An holds for R exactly when each of XàA1, XàA2, …, XàAn hold for R. e.g. AàBC is equivalent to AàB and AàC
e.g. AàF and AàG is equivalent to AàFG • There is no splitting rule for the left side.
e.g. ABCàDEF is NOT equivalent to ABàDEF and CàDEF • Usually, FDs are expressed with singleton right sides.
11
Example: FDs
Drinkers(name, addr, beersLiked, manf, favBeer)
Reasonable FDs to assert:
nameàaddr, favBeer
Note this FD is the same as: nameàaddr and nameàfavBeer beersLiked à manf
12
Example: FDs
name à addr
beersLiked à manf name à favBeer
name
addr
beersLiked
manf
favBeer
Janeway
Voyager
Bud
A.B.
WickedAle
Janeway
Voyager
WickedAle
Pete’s
WickedAle
Spock
Enterprise
Bud
A.B.
Bud
13
Rule: Armstrong’s Axioms
X, Y, Z are sets of attributes
1. Reflexivity: if X Ê Y, then X à Y.
2. Augmentation: if X à Y, then XZ à YZ for any Z.
3. Transitivity: if XàY and YàZ , then XàZ.
4. Union: if XàY and XàZ, then XàYZ.
5. Decomposition: if XàYZ, then XàY and XàZ.
14
Transitive Property
The transitive property holds for FDs
– Consider the FDs: AàB and BàC; then AàC holds.
– Consider the FDs: ADàB and BàCD; then ADàCD holds or just ADàC (because of trivial FDs).
15
How do you identify FDs?
• FDs are based on domain knowledge
o Intrinsic features of the data that is specific to your use case. o Something you know (or assume) about the data.
• Database engines cannot identify FDs for you o Designer must specify them as part of the schema.
o DBMS can only enforce FDs when told to do so.
• DBMSs cannot “optimize” FDs
o It has only a finite sample of the data. o A FD constrains the entire domain.
16
FDs are a Generalization of Keys
• Superkey: XàR
o A superkey must include all of the attributes of the relation on the right-hand side
(RHS).
•A Functional Dependency: XàY
• A FD can involve just a subset of the key.
e.g. House (street, city, value, owner, tax) street, city à value, owner, tax city, value à tax (FD only)
(both FD and key)
17
Inferring FDs
• Given a set of FDs, it is often possible to infer further FDs. • Let’s assume that we have the following FDs:
X1àA1, X2àA2,
…, XnàAn.
• Does the FD YàB also hold in any relation that satisfies the given FDs? o To prove this, you must assume that two tuples agree on all attributes of Y
18
Example: Inferring FDs
For Example:
if AàB and BàC holds, then surely AàC holds, even if we do not explicitly say so.
AàC is entailed (implied) by {AàB, BàC}
19
Example: Inferring FDs in General
ClientID
Income
OtherProd
Rate
Country
City
State
225
High
A
2.1%
USA
San Francisco
MD
420
High
A
2.1%
USA
San Francisco
CA
333
High
B
3.0%
USA
San Francisco
CA
576
High
B
3.0%
USA
San Francisco
CA
128
Low
C
4.5%
UK
Reading
Berkshire
193
Low
C
4.5%
UK
London
London
550
Low
B
3.5%
UK
London
London
F1: [Income, OtherProd]à[Rate] F2: [Country, City]à[State] How to prove it in the general case?
20
Algorithm: Closure Test
• Closure Test for Functional Dependencies • Given attribute set Y and FD set F:
Denote YF+ or Y+ the closure of Y relative to F YF+ is the set of all FDs given or implied by Y
• Computing the closure of Y 1. Startwith:YF+ =Y, F’=F
2. While there exists an f ∈ F’ s.t. LHS(f) ⊆ YF+ : § YF+ = YF+ ∪ RHS(f)
§ F’ = F’ – f
3. Stop when: YàB for all B YF+
21
Algorithm: Closure
• Computing the closure Y+ given attribute set Y and FD set F:
XA
Y+
new Y+
22
Example: Closure
23
Example: Closure
• Given:
F: ABà C X
XF+
{A, D, E}
{A, B, C, D, E}
{A, C, B, D, E} {B}
{D, E}
AàD
A Dà E AB ACà B AC
• Question:
1. Is ABàE entailed by F? YES
B D
2. Is DàC entailed by F? NO
• Result: XF+ allows us to determine all FDs of the form XàY entailed by F.
24
A Geometric View of FDs
• Imagine the set of all instances of a particular relation.
• That is, all finite sets of tuples have the proper number of components. • Each instance is a point in space.
e.g. R(A,B):
{}
{(1,2),(3,4)}
{(1,2),(3,4), (1,3)}
{(5,1)}
25
A FD is a Subset of Instances
• For each XàA there is a subset of all instances that satisfy the FD. • Therefore, FDs can be represented by a region in the space.
e.g. R(A,B), where AàB:
{(1,2),(3,4)}
{}
AàB
{(1,2),(3,4), (1,3)}
{(5,1)}
26
Representing Sets of FDs
• If each FD is a set of relation instances, then a collection of FDs correspond to the intersection of those sets.
o intersection: all instances that satisfy all of the FDs. BàC
AàB
CDàA
Instances satisfying
AàB, BàC, and CDàA
27
Implication of FDs
• If a FD YàB follows from the FDs X1àA1, …, XnàAn, then the region in space of instances for YàB must include the intersection of the regions for the FDs XiàAi.
o Every instance satisfying all the FDs XiàAi surely satisfies YàB.
o However, an instance could satisfy YàB, yet not be in this intersection.
•ForasetofFDsF,F+ istheset of all FDs implied by F.
AàB AàC BàC
28
Part II
Schema Decomposition
29
Relational Schema Design
• Goal is to avoid redundancy and the anomalies it enables.
o Update anomaly: one occurrence of a fact is changed, but not all
occurrences have been updated.
o Deletion anomaly: valid fact is lost when a tuple is deleted.
Recall the FDs: nameàaddr,favBeer and beersLikedàmanf.
Drinkers(name, addr, beersLiked, manf, favBeer)
name
addr
beersLiked
manf
favBeer
Janeway
Voyager
Bud
A.B.
WickedAle
Janeway
???
WickedAle
Pete’s
???
Spock
Enterprise
Bud
???
Bud
Data is redundant, because the ??? can be figured out by using the FDs.
30
Goal of Decomposition
• Eliminate redundancy by decomposing a relation into several relations. • Check that a decomposition does not lead to “bad design”.
vYou want to identify a “good” method to split relations. üSplitting a relation into 2+ smaller relations, limiting redundancy. üSplitting F into subsets which apply to the new relations.
üCompute the projection of functional dependencies to check!
31
Schema Decomposition
An issue with decomposition is information loss – we may not be able to reconstruct the corresponding instance of the original relation.
32
Good Properties of Decomposition
1.
2. 3.
Lossless Join Decomposition
◦ When we join decomposed relations we should get exactly what we started with.
Avoid Anomalies
◦ Avoid redundant data.
Dependency Preservation
◦ (𝐹!∪…∪𝐹”)#=𝐹#
33
Example: Splitting Relations
Student Name
Student Email
Course
Instructor
Alice
alice@utoronto.ca
CSC 343
Liut
Alice
alice@utoronto.ca
CSC 209
Petersen
Bob
bob@utoronto.ca
CSC 148
Zingaro
Laura
laura@utoronto.ca
CSC 343
Dema
Students (email, name)
Courses (code, instructor)
Taking (email, courseCode)
Students Taking Courses has additional tuples!
– (Alice, alice@utoronto.ca, CSC 343, Jones), but Alice is not in Dema’s section of CSC 343
– (Laura, laura@utoronto.ca, CSC 343, Liut), but Laura is not in Liut’s section of CSC 343
Why did this happen? How to prevent it?
34
Information Loss with Decomposition
• Decompose R into S and T
oConsidertheFD AàB,withAonlyinSandBonlyinT.
• FD Loss
o Attributes A and B are no longer in the same relation; you must join T and S to
enforce AàB (which is expensive!).
• Join Loss
o Neither (S ∩ T)àS nor (S ∩ T)àT in F+
§ Joining T and S will produce extraneous tuples.
35
Property: Lossless Join Decomposition
• Often confused with “Lossy Decomposition”.
• Lost of information is unavoidable when retrieving the initial relation.
• A decomposition should not lose information!
• A decomposition (R1,…,Rn) of a schema, R, is lossless if every valid
instance, r, of R can be reconstructed from its components: r=r1⋈…⋈rn whereri=PRi(r)
36
Example: Lossless Decomposition
◦ Loses the fact that (12, Jen) lives at 2 Pine (not 3 Oak) ◦ Loses the fact that (13, Jen) lives at 3 Oak
Remember: lossy decompositions yield more tuples than they should when relations are joined together.
37
Testing for Losslessness
• A (binary) decomposition of R = (R, F) into R1= (R1, F1) and R2 = (R2, F2) is lossless if and only if:
◦ either the FD (R1 ∩ R2)àR1 is in F+; OR
◦ theFD(R1 ∩R2)àR2 isin F+.
◦ all attributes common to both R1 and R2 functionally determine ALL the attributes in R1; OR
◦ all attributes common to both R1 and R2 functionally determine ALL the attributes in R2.
38
Example: Decomposition Property
In our example
◦ NameàID,Name
◦ NameàName,Addr
A lossless decomposition ◦ [ID,Name] and [ID,Addr]
Example 2:
◦ CategoryàModelName,Category
◦ CategoryàPrice,Category
◦ Better to use [MN, Category] and [MN, Price]
In other words, if R1 ∩ R2 forms a superkey of either R1 or R2, the decomposition of R is a lossless decomposition
39
Example: Lossless Decomposition
• A decomposition is lossless if we can recover:
40
Example: Lossless Decomposition
• Given:
• Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number,
amount)
• FDs: branch-nameàbranch-city, assets and loan-numberàamount, branch-name
• Decompose Lending-schema into two schemas:
• Branch-schema = (branch-name, branch-city, assets)
• Loan-info-schema = (branch-name, customer-name, loan-number, amount)
Show that the decomposition is a Lossless Decomposition
41
Example: Lossless Decomposition
• Decompose Lending-schema into two schemas:
• Branch-schema = (branch-name, branch-city, assets)
• Loan-info-schema = (branch-name, customer-name, loan-number, amount)
Since Branch-schema ∩ Loan-info-schema = {branch-name} We are given: branch-nameàbranch-city, assets
Therefore, this is a lossless decomposition.
42
Projecting FDs
• Once we have split a relation, we have to re-factor our FDs to match. o Each FD must only mention attributes from one relation.
• Similar to geometric projection.
o Many possible projections (depends on how we slice it). o Keep only the ones we need (minimal basis).
43
Part III Normal Forms
44
Motivation for Normal Forms
• To assist us in identifying a “good” schema
• Everyone may have a slightly different definition of “good”, but there are some guidelines that assist us in achieving what we all want (avoiding anomalies, reducing/eliminating redundant information, etc…).
• Many Normal Forms: • 1st
• 2nd
• 3rd
• Boyce-Codd
• … and many others which we won’t discuss …
BCNF 3NF 2NF 1NF
45
1st Normal Form (1NF)
• No multi-valued attributes allowed.
o Imagine storing a list of values in an attribute.
• Counterexample
o Course(name, instructor, [student,email]*)
Name
Instructor
Student Name
Student Email
CSC 343
Liut
Alice
alice@utoronto.ca
CSC 343
Liut
Alice
alice@utoronto.ca
CSC 343
Liut
Bob
bob@utoronto.ca
CSC 148
Simion
Laura
laura@utoronto.ca
46
2nd Normal Form (2NF)
• Non-prime attributes depend on candidate keys.
o Consider non-prime attributes A
o Then there exists a FD X s.t. XàA, and X is a candidate key.
• Counterexample
o Movies(title, year, star, studio, studioAddress, salary)
o FDs: title, yearàstudio and studioàstudioAddress and staràsalary
Title
Star Wars
Star Wars
Star Wars
Patriot Games
Last Crusade
Year
1977
1977
1977
1992
1989
Star
Hamill
Ford
Fisher
Ford
Ford
Studio
LucasFilm
LucasFilm
LucasFilm
Paramount
LucasFilm
StudioAddr
1 Lucas Way
1 Lucas Way
1 Lucas Way
Cloud 9
1 Lucas Way
Salary
$100,000
$100,000
$100,000
$2,000,000
$1,000,000
47
3rd Normal Form (3NF)
• Non-prime attributes depend only on candidate keys. o Consider FD XàA
o Either X is a superkey OR A is prime (part of a key)
• Counterexample
o studio à studioAddr (studioAddr depends on studio which is not a candidate key)
Title
Year
Studio
StudioAddr
Star Wars
1977
LucasFilm
1 Lucas Way
Patriot Games
1992
Paramount
Cloud 9
Last Crusade
1989
LucasFilm
1 Lucas Way
48
3NF, Dependencies, and Join Loss
Theorem: always possible to convert a schema to become lossless join and dependency-preserving.
Caveat: always possible to create schemas in 3NF for which these properties do not hold.
• FD Loss Example 1:
– MovieInfo(title, year, studioName)
– StudioAddress(title, year, studioAddress)
Cannot enforce studioNameà studioAddress
• Join Loss Example 2:
– Movies(title, year, star)
– StarSalary(star, salary)
Movies StarSalary yields additional tuples
49
Boyce-Codd Normal Form (BCNF)
• One additional restriction over 3NF
o All non-trivial FDs have superkey LHS.
• Counterexample
– CanadianAddress(street, city, province, postalCode)
– Candidate keys: {street, postalCode}, {street, city, province}
– FD: postalCode à city, province
– Satisfies 3NF: city, province both prime
– Violates BCNF: postalCode is not a superkey Possible anomalies involving postalCode
50
Boyce-Codd Normal Form (BCNF)
• A relation R is in BCNF if, whenever, XàA is a non-trivial FD that holds in R, X is a superkey.
o Recall: non-trivial means that A is not contained in X.
• A relation not in BCNF: Drinkers(name, addr, beersLiked, manf, favBeer) with the FDs name à addr, favBeer and beersLiked à manf.
o The only key is: {name, beersLiked}
o In each FD, the left side is not a superkey.
oAny one of these FDs shows Drinkers is not in BCNF.
51
Example: BCNF
Beers(name, manf, manfAddr) FDs: nameàmanf and manfàmanfAddr qBeers w.r.t. nameàmanf does not violate BCNF, but manfàmanfAddr does.
vBCNF required that the only FDs that hold are the result of key(s).
52
Decomposition into BCNF
• Given: relation R with FDs F
• Look among the given FDs for a BCNF violation XàY (i.e., X is not a superkey)
• Compute X +.
• Find X + ≠ X ≠ all attributes, (otherwise X is a superkey)
• Replace R by relations with: § R1=X+.
§ R2=R–(X+–X)=R-X+ÈX
• Continue to recursively decompose the two new relations
• Project given FDs F onto the two new relations.
53
Decomposition on X à Y LHS
R1
R-X +
X X +-X
R2
R
54
What do we want from a decomposition?
ü Lossless Join : it should be possible to project the original relations onto the decomposed schema, and then reconstruct the original,
i.e. retrieve the original tuples
ü Dependency Preservation : All the original FDs should be satisfied.
ü No anomalies
55
BCNF Decomposition
• What do we get from a BCNF decomposition?
§ Lossless Join : ✓
§ No anomalies : ✓
§ DependencyPreservation:✗
56
3NF Decomposition
• What do we get from a 3NF decomposition?
§ Lossless Join : ✓
§ No anomalies : ✗
§ DependencyPreservation:✓
Unfortunately, neither BCNF nor 3NF can guarantee all three properties we want.
57
Limits of Decomposition
• Pick two…
o Lossless Join
o Dependency Preserving o Anomaly-Free
• 3NF
o Provides lossless join and dependency preservation. o May allow anomalies.
• BCNF
o Anomaly-free and lossless join.
o Sacrifice dependency preservation.
Use domain knowledge to choose 3NF vs. BCNF
58
Questions?
Q
THANKS FOR LISTENING
I’LL BE ANSWERING QUESTIONS NOW
&
A
59
Citations, Images and Resources
Database Management Systems (3rd Ed.), Ramakrishnan & Gehrke
Some content is based off the slides of Dr. Fei Chiang – http://www.cas.mcmaster.ca/~fchiang/
60
Supporting Content
61
Class Exercise!
• You will be given 5 minutes to prove if ABàF holds in relation R(A, B, C, D, E, F), given the FDs:
ABàC,
BCàAD,
DàE,
CFàB.
Algorithm:
• Hint: you are computing the closure!
1. Startwith:YF+ =Y, F’=F
2. While there exists an f ∈ F’ s.t. LHS(f) ⊆ YF+ :
§ YF+ =YF+∪RHS(f)
§ F’ = F’–f
3. Stop when: YàB for all B YF+
62
Solution: Class Exercise!
• Iterations:
– X = {A,B}
– X = {A,B,C}
– X = {A,B,C,D}
– X = {A,B,C,D,E}
Use: AB®C
Use: BC®AD
Use: D®E
No more changes to X are possible
• The FD: CF®B cannot be used because its left side is never contained in X.
63
Algorithm: Minimal Basis
• A.k.a.: Minimal Cover. Slightly different from Canonical Cover. • A canonical cover allows more than one attribute on the RHS.
• Minimal Bases is the opposite of closure.
• Given a set of FDs F, find the minimal F’ s.t. F’ ⊆ F and F’ entails f for all f ∈ F’
• Properties of a Minimal Basis F’ are:
o RHS is always singleton
o If any FD is removed from F’, F’ is no longer a minimal basis.
o If for any FD in F’ we remove one or more attributes from the LHS of f ∈ F’, the result is no longer a minimal basis.
64
Algorithm: Minimal Basis
• Minimal Basis for functional dependencies:
o Right sides are singleton.
o No FD can be removed.
o No attribute can be removed from a left side.
• Constructing a Minimal Cover:
o Decompose the RHS to single attributes.
o Repeatedly try to remove an attribute from a LHS and see if the removed attribute can be derived from the remaining FDs.
o Repeatedly try to remove a FD and see if the remaining FDs are equivalent to the original set (i.e. does the closure of the LHS attributes, with removed FD, include the RHS attribute?).
65
Algorithm: Minimal Basis
• Formally, it is straightforward but time-consuming;
1. Split all RHS into singletons
2. For all f in F’, test whether J = (F’– f)+ is still equivalent to F+, as it might make F’ too small.
3. Foralli∈LHS(f),forallf∈F’,letLHS(f’)=LHS(f)–i
** test whether (F’ – f + f’)+ is still equivalent to F+ **
4. Repeat (2) and (3) until neither makes progress.
66
Example: Minimal Basis
Given a relation R(A, B, C, D) and a defined set of FDs F = {AàAC, BàABC, DàABC}, find the minimal basis M of F.
1st Step: H = {AàA, AàC, BàA, BàB, BàC, DàA, D->B, DàC}
2nd Step:
– A->A, B->B: can be removed as trivial
– A->C: can’t be removed, as there is no other LHS with A –
– B->A: can’t be removed, because for J=H-{BàA} is B+=BC –
– B->C: can be removed, because for J=H-{BàC} is B+=ABC –
D->A: can be removed, because for J=H-{DàA} is D+=DBA D->B: can’t be removed, because for J=H-{DàB} is D+=DC D->C: can be removed, because for J=H-{DàC} is D+=DBAC
Step 2 outcome: H = {AàC, BàA, DàB}
67
Example: Minimal Basis
3rd Step
– H does not change as all LHS in H are single attributes.
4th Step
– H does not change.
Minimal Basis: M = H = {AàC, BàA, DàB}
Finding Minimal Basis can be complicated! You will work through examples in tutorial!
68
Algorithm: Projecting FDs
Projecting FDs
Given: a relation 𝑅, with a set of FDs 𝐹 that hold in 𝑅, and a relation 𝑅’ ⊂ 𝑅. Determine: the set of all FDs 𝐹’ that they follow from 𝐹 and involve only attributes of 𝑅’.
FD Projection Algorithm
1. Start with 𝐹’ = Ø
2. For each subset X of 𝑅’:
a. Compute 𝑋!.
b. For each attribute A in 𝑋!: if A is in 𝑅”, then add XàA to 𝐹”.
3. Compute the minimal basis of 𝐹’.
69
Improving Projection’s Efficiency
• Ignore trivial dependencies.
o There is no need to add XàA if A is in X itself.
• Ignore trivial subsets.
o The empty set or the set of all attributes (both are subsets of X).
• Ignore supersets of 𝑋 if 𝑋! = 𝑅.
o They can only give us “weaker” FDs (with more on the LHS).
Even with these tricks, projection is still expensive!
70
Example: Projecting FDs
• ABC with FDs AàB and BàC
– A+=ABC;yieldsAàB,AàC
• We ignore AàA as it is trivial.
• We ignore the supersets of A, AB + and AC +, because they can only
give us “weaker” FDs (with more on the LHS). – B+ = BC; yields BàC
– C + = C; yields nothing.
– BC+=BC;yieldsnothing.
• Resulting FDs: AàB, AàC, and BàC
• Projection onto AC : AàC
o Only FD that involves a subset of {A,C}
• Projection on BC: BàC
o Only FD that involves subset of {B,C}
71
Example: BCNF
Failure to Preserve Dependencies
• Suppose we start with R(A,B,C) and FDs ABàC and CàB
• There are two keys: {A,B} and {A,C}.
• CàB is a BCNF violation, so we must decompose it into AC, BC.
The problem is that if we use AC and BC as our database schema, we cannot enforce the FD ABàC in these decomposed relations.
72
3NF avoids this problem
• 3rd Normal Form (3NF) modifies the BCNF condition so we do not have to decompose in this problem situation.
• An attribute is prime if it is a member of any key.
• XàA violates 3NF if and only if X is not a superkey, and also A is not prime. i.e. it’s ok if X is not a superkey, as long as A is prime.
73
Example: 3NF
Preserves Dependencies
• In our problem situation with FDs ABàC and CàB, we have keys AB and AC.
• Thus, A, B, and C are each prime.
• Although CàB violates BCNF, it does not violate 3NF.
74
Algorithm: 3NF Synthesis
• We can always construct a decomposition into 3NF relations with a lossless join and dependency preservation.
• Need minimal basis for the FDs (same as used in projection) oRight sides are single attributes.
oNo FD can be removed.
oNo attribute can be removed from a left side.
• One relation for each FD in the minimal basis. oSchema is the union of the left and right sides.
• If no key is contained in an FD, then add one relation whose schema is some key.
75
Example: 3NF Synthesis
• Relation R(ABCD) with FDs A à B and A à C.
• Decomposition: AB and AC from the FDs, with AD for a key.
76