CS3402
1
CS3402 : Chapter 4 Normal Forms
Functional Dependency
Functionaldependencyisaconstraintbetweentwosetsof attributes from the database
Formaldefinition:
Let R be a relation schema, and R, R (i.e., and are
sets of R’s attributes). We say:
If in any relation instance r(R), for all pairs of tuples t1 and t2 in r, we have:
(t1[] = t2[]) (t1[] = t2[])
CS3402
2
Inference Rules for FDs
IR1 (reflective rule)
If X is a subset of Y, then X → Y
IR2 (augmentation rule)
If X Y, then XZ YZ
IR3 (transitive rule)
If X Y and Y Z, then X Z
IR4 (decomposition rule)
If X YZ, then X Y and X Z
IR5 (union rule)
If X Y and X Z, then X YZ
IR6 (pseudotransitive rule)
If X Y and WY Z, then WX Z
CS3402 3
Inference Rules for FDs
Closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X
Note both X and X+ are a set of attributes
If X+ consists of all attributes of R, X is a superkey for R
From the value of X, we can determine the values the whole tuple
X+ can be calculated by repeatedly applying IR1, IR2, IR3 using the FDs in F
FromXtofindoutX+
CS3402 4
Inference Rules for FDs
Closure of a set F of FDs is the set F+ of all FDs that can be inferred from F.
TwosetsofFDsFandGareequivalentif:
Every FD in F can be inferred from G, and Every FD in G can be inferred from F
Hence, F and G are equivalent if F+ =G+
CS3402 5
Relational Database Design
Logical/ConceptualDBDesign Schema
what relations (tables) are needed? what their attributes should be?
Whatisa“bad”DBDesign?
– Repetition of data/information
– Potential inconsistency
– Inability to represent certain information – Loss of data/information
CS3402
6
Relational Database Design
Introduction (con’d)
Normalization theory
– based on functional dependencies
* universe of relations
* 1st Normal Form (1NF) * 2NF
* 3NF
* BCNF
* 4NF
* …
1NF
2NF
3NF
BCNF
4NF
5NF
CS3402
7
Normalization
Normalization
Proposed by Codd (1972)
take a relation schema through a series of tests based on FD and primary keys to certify whether it satisfies a certain normal form
Decompose a relation into several related relation to achieve: Minimizing redundancy
Minimizing the insertion, deletion and update anomalies
CS3402 8
Normalization
Normalizationrequirestwoproperties
Non-additive or lossless join
Decomposition is reversible and no information is lost
No spurious tuples (tuples that should not exist) should be generated by doing a natural-join of any relations (Extremely important)
Preservation of the functional dependencies
Ensure each functional dependency is represented in some individual relation or could be inferred from other dependencies after decomposition.
CS3402
9
Lossless Decomposition
S#
Country
City
S3
USA
N.Y.
S5
USA
L.A.
reversible
decomposition
Natural Join
Natural Join
S#
Country
S3
USA
S5
USA
S#
Country
S3
USA
S5
USA
S#
City
S3
N.Y.
S5
L.A.
Country
City
USA
N.Y.
USA
L.A.
Country
City
S#
USA
N.Y.
S3
USA
N.Y.
S5
USA
L.A.
S3
USA
L.A.
S5
CS3402
10
First Normal Form with Primary Key
First normal form (1NF)
Disallow multi-values attributes, composite attributes and their combination
The domain of an attribute must be atomic (simple and indivisible) values
DEPARTMENT: each department can have a number of locations 1NF: DEPT_LOCATIONS(Dnumber, Dlocation)
1NF also disallows multivalued attributes that are themselves composite
CS3402 11
CS3402 12
CS3402
13
Figure 14.10 Normalizing nested relations into 1NF. (a) Schema of the EMP_PROJ relation with a nested relation attribute PROJS. (b) Sample extension of the EMP_PROJ relation showing nested relations within each tuple. (c) Decomposition of EMP_PROJ into relations EMP_PROJ1 and EMP_PROJ2 by propagating the primary key.
Problems with 1NF
Example:
FIRST(S#, Country, City, P#, Qty)
and its Functional Dependency Diagram: What’s the primary key? (S#,P#)
Qty
S# P#
Country City
CS3402
14
Problems with 1NF
Problems with 1NF Insert Anomalies
Inability to represent certain information:
Eg, cannot enter “Supplier and City” information until
Supplier supplies at least one product
Delete Anomalies
Deleting the “only tuple” for a supplier will destroy all the information about that supplier
Update Anomalies
“S# and City” could be redundantly represented for each P#, which may cause potential inconsistency when updating a tuple
CS3402
15
Solution: 1NF->2NF
Possible Solution:
Replace the original table by two sub-tables
SECOND(S#, City, Country) SP (S#, P#, Qty)
and now, their FD diagrams are:
Qty
S# P#
S# Country City
CS3402
16
Second Normal Form with Primary Key
X->Y
Full functional dependency
If removal of any attribute A from X means that the dependency
does not hold any more
E.g., {S#, P#} → Qty
Partial functional dependency
CS3402
17
If some attributes A belonging to X can be removed from X and the dependency still holds
E.g., {S#, P#} → Country as S# → Country
Second Normal Form with Primary Key
A relation R is in 2NF if every nonprime attributes A in R is fully functional dependent on the primary key of R
An attribute of R is called prime (key) attribute of R if it is a member of some candidate key of R. Otherwise it is nonprime
Non-prime (non-key) attributes: Country, City, Qty
Primary key: {S#, P#}
CS3402 18
Second Normal Form with Primary Key
The nonprime attributes Country violates 2NF because of S#- >Country
Similarly, S#->City
If a relation is not in 2NF, it can be 2NF normalized into a number of 2NF relations in which nonprime attributes are associated only with the part of the primary key on which they are fully functional dependent.
Solutions: SP (P#,S#,Qty); Second(S#, City, Country);
CS3402 19
Problem with 2NF
Still, there are problems with 2NF: Insert Anomalies
Cannot insert “City and Country” information until some Supplier in that city exist
Delete Anomalies
Deleting an “only tuple” for a particular city will destroy all
the information about that city
Update Anomalies
The supplier move to another city,inconsistency between CS3402 city and country will happen 20
Country
Solution: 2NF->3NF
Possible Solution:
Replace the SECOND table by two sub-tables
SC (S#, City)
CC (City, country)
and still keep the table SP (S#, P#, Qty). Now the FD diagrams for the new tables:
S# P#
city city Country
Qty
S#
CS3402
21
Third Normal Form with Primary Key
3NFisbasedontheconceptoftransitivedependency
AfunctionaldependencyX->YinarelationRistransitive dependency if there exists a set of attributes Z in R that is neither a candidate key nor a subset of any key of R, and both X->Z and Z- >Y hold
X->Z->Y(Z:anon-primeattributeorasetincludingnon-prime attributes )
E.g.,dependencyS#->CountryistransitivethroughCity
S#->City and City->Country, and City is neither a key itself nor a
subset of the key
S#-> City-> Country
CS3402 22
Third Normal Form with Primary Key
AccordingtoCodd’soriginaldefinition,arelationschemeRisin 3NF if it satisfies 2NF and no non-prime attribute of R is transitively dependent on the primary key
TonormalizeRinto3NF,Rshouldbedecomposedtobreakthe transitive dependency.
3NF:SC(S#,City),CC(City,Country),SP(S#,P#,Qty).
S# P#
city city Country
Qty
S#
CS3402
23
Normal Forms Defined Informally
1st normal form
All attributes are simple (no composite or multivalued attributes)
2nd normal form
All non-prime attributes depend on the whole key (no partial
dependency)
3rd normal form
All non-prime attributes only depend on the key (no transitive
dependency)
Ingeneral,3NFisdesirableandpowerfulenough
But,stilltherearecaseswhereastrongernormalformisneeded
CS3402 24
Test and Remedy for Normalization
CS3402 25
Consider all candidate keys of a relation (not just a defined primary key)
General Definitions of 2NF and 3NF
LOTSdescribesparcelsoflandforsaleinvariouscountiesofa state
Twocandidatekeys:Property_id#and{County_name,Lot#}
Property_id#istheprimarykey
LOTSviolates2NFduetoFD3asTax_rateispartiallydependent on the candidate key {County_name, Lot#}
County_name=>Tax-rate(partialdependent)
CS3402 26
General Definition of Second Normal Form
CS3402 27
CS3402 28
General Definition of Third Normal Form
Superkey=>includeakey
Primeattributes=>attributesinakey
LOTS2 is in 3NF
FD4inLOTS1violates3NFbecauseAreaisnotasuperkeyand
Price is not a prime attribute in LOTS1
LOTS1violates3NFbecausePriceistransitivelydependenton
each of the candidate keys of LOTS via the nonprime attribute Area
TonormalizeLOTS1into3NF,LOTS1B{Area,Price}
CS3402 29
Boyce-Codd Normal Form
BCNFwasproposedasasimplerformof3NF,butitwasfoundto be stricter than 3NF
Every relation in BCNF is also in 3NF
BUT Relation in 3NF is not necessarily in BCNF
Difference:
Condition which allows A to be prime is absent from BCNF
CS3402 30
Boyce-Codd Normal Form
Supposewehavethousandsoflotsintherelationbutthelotsare from only two counties: A and B
LotsizesinAareonly0.5,0.6,0.7,0.8,0.9acreswhereaslotsinB are 1.1, 1.2, …, 1.9
Then, Area -> County_name
FD5satisfies3NFinLOTS1AbecauseCounty_nameisaprime
attribute
FD5violatesBCNFinLOTS1AbecauseAreaisnotasuperkeyof LOTS1A
CreatenewrelationR(Area,County_name).Theareaoflotscanbe represented in R(Area, County_name) to reduce redundancy
CS3402 31
CS3402
32
County_name is a prime attribute
Boyce-Codd Normal Form
Some notes on BCNF:
BCNF is stronger than 3NF
BCNF is useful when a relation has: multiple candidate keys, and
these candidate keys are composite ones, and they overlap on some common attributes
BCNF reduces to 3NF if the above conditions do not apply
CS3402 33
References
6e
Ch. 14, p. 487-515 Ch. 15, p. 529-533
CS3402
34