CS计算机代考程序代写 database scheme Functional Dependencies CS3402

CS3402
1
CS3402 : Chapter 4 Normal Forms

Functional Dependency
 Functionaldependencyisaconstraintbetweentwosetsof attributes from the database
 Formaldefinition:
 Let R be a relation schema, and   R,   R (i.e.,  and  are
sets of R’s attributes). We say: 
 If in any relation instance r(R), for all pairs of tuples t1 and t2 in r, we have:
(t1[] = t2[])  (t1[] = t2[])
CS3402
2

Inference Rules for FDs
IR1 (reflective rule)
If X is a subset of Y, then X → Y
IR2 (augmentation rule)
If X  Y, then XZ  YZ
IR3 (transitive rule)
If X  Y and Y  Z, then X  Z
IR4 (decomposition rule)
If X YZ, then X  Y and X  Z
IR5 (union rule)
If X  Y and X  Z, then X  YZ
IR6 (pseudotransitive rule)
If X  Y and WY  Z, then WX  Z
CS3402 3

Inference Rules for FDs
 Closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X
 Note both X and X+ are a set of attributes
 If X+ consists of all attributes of R, X is a superkey for R
 From the value of X, we can determine the values the whole tuple
 X+ can be calculated by repeatedly applying IR1, IR2, IR3 using the FDs in F
 FromXtofindoutX+
CS3402 4

Inference Rules for FDs
 Closure of a set F of FDs is the set F+ of all FDs that can be inferred from F.
 TwosetsofFDsFandGareequivalentif:
 Every FD in F can be inferred from G, and  Every FD in G can be inferred from F
 Hence, F and G are equivalent if F+ =G+
CS3402 5

Relational Database Design
 Logical/ConceptualDBDesign  Schema
what relations (tables) are needed? what their attributes should be?
 Whatisa“bad”DBDesign?
– Repetition of data/information
– Potential inconsistency
– Inability to represent certain information – Loss of data/information
CS3402
6

Relational Database Design
 Introduction (con’d)
 Normalization theory
– based on functional dependencies
* universe of relations
* 1st Normal Form (1NF) * 2NF
* 3NF
* BCNF
* 4NF
* …
1NF
2NF
3NF
BCNF
4NF
5NF
CS3402
7

Normalization
 Normalization
 Proposed by Codd (1972)
take a relation schema through a series of tests based on FD and primary keys to certify whether it satisfies a certain normal form
 Decompose a relation into several related relation to achieve:  Minimizing redundancy
 Minimizing the insertion, deletion and update anomalies
CS3402 8

Normalization
 Normalizationrequirestwoproperties
 Non-additive or lossless join
 Decomposition is reversible and no information is lost
 No spurious tuples (tuples that should not exist) should be generated by doing a natural-join of any relations (Extremely important)
 Preservation of the functional dependencies
 Ensure each functional dependency is represented in some individual relation or could be inferred from other dependencies after decomposition.
CS3402
9

Lossless Decomposition
S#
Country
City
S3
USA
N.Y.
S5
USA
L.A.
reversible
decomposition
Natural Join
Natural Join
S#
Country
S3
USA
S5
USA
S#
Country
S3
USA
S5
USA
S#
City
S3
N.Y.
S5
L.A.
Country
City
USA
N.Y.
USA
L.A.
Country
City
S#
USA
N.Y.
S3
USA
N.Y.
S5
USA
L.A.
S3
USA
L.A.
S5
CS3402
10

First Normal Form with Primary Key

First normal form (1NF)
 Disallow multi-values attributes, composite attributes and their combination
 The domain of an attribute must be atomic (simple and indivisible) values
DEPARTMENT: each department can have a number of locations 1NF: DEPT_LOCATIONS(Dnumber, Dlocation)
1NF also disallows multivalued attributes that are themselves composite
  
CS3402 11

CS3402 12

CS3402
13
Figure 14.10 Normalizing nested relations into 1NF. (a) Schema of the EMP_PROJ relation with a nested relation attribute PROJS. (b) Sample extension of the EMP_PROJ relation showing nested relations within each tuple. (c) Decomposition of EMP_PROJ into relations EMP_PROJ1 and EMP_PROJ2 by propagating the primary key.

Problems with 1NF
 Example:
FIRST(S#, Country, City, P#, Qty)
and its Functional Dependency Diagram: What’s the primary key? (S#,P#)
Qty
S# P#
Country City
CS3402
14

Problems with 1NF
 Problems with 1NF  Insert Anomalies
Inability to represent certain information:
Eg, cannot enter “Supplier and City” information until
Supplier supplies at least one product
 Delete Anomalies
Deleting the “only tuple” for a supplier will destroy all the information about that supplier
 Update Anomalies
“S# and City” could be redundantly represented for each P#, which may cause potential inconsistency when updating a tuple
CS3402
15

Solution: 1NF->2NF
 Possible Solution:
Replace the original table by two sub-tables
SECOND(S#, City, Country) SP (S#, P#, Qty)
and now, their FD diagrams are:
Qty
S# P#
S# Country City
CS3402
16

Second Normal Form with Primary Key
 X->Y

Full functional dependency
 If removal of any attribute A from X means that the dependency
does not hold any more
 E.g., {S#, P#} → Qty
Partial functional dependency

CS3402
17
 
If some attributes A belonging to X can be removed from X and the dependency still holds
E.g., {S#, P#} → Country as S# → Country

Second Normal Form with Primary Key
 A relation R is in 2NF if every nonprime attributes A in R is fully functional dependent on the primary key of R
 An attribute of R is called prime (key) attribute of R if it is a member of some candidate key of R. Otherwise it is nonprime
 Non-prime (non-key) attributes: Country, City, Qty
 Primary key: {S#, P#}
CS3402 18

Second Normal Form with Primary Key
 The nonprime attributes Country violates 2NF because of S#- >Country
 Similarly, S#->City
 If a relation is not in 2NF, it can be 2NF normalized into a number of 2NF relations in which nonprime attributes are associated only with the part of the primary key on which they are fully functional dependent.
 Solutions: SP (P#,S#,Qty); Second(S#, City, Country);
CS3402 19

Problem with 2NF
 Still, there are problems with 2NF:  Insert Anomalies
 Cannot insert “City and Country” information until some Supplier in that city exist
 Delete Anomalies
Deleting an “only tuple” for a particular city will destroy all
the information about that city
 Update Anomalies
The supplier move to another city,inconsistency between CS3402 city and country will happen 20
Country

Solution: 2NF->3NF
 Possible Solution:
Replace the SECOND table by two sub-tables
SC (S#, City)
CC (City, country)
and still keep the table SP (S#, P#, Qty). Now the FD diagrams for the new tables:
S# P#
city city Country
Qty
S#
CS3402
21

Third Normal Form with Primary Key
 3NFisbasedontheconceptoftransitivedependency
 AfunctionaldependencyX->YinarelationRistransitive dependency if there exists a set of attributes Z in R that is neither a candidate key nor a subset of any key of R, and both X->Z and Z- >Y hold
 X->Z->Y(Z:anon-primeattributeorasetincludingnon-prime attributes )
 E.g.,dependencyS#->CountryistransitivethroughCity
 S#->City and City->Country, and City is neither a key itself nor a
subset of the key
 S#-> City-> Country
CS3402 22

Third Normal Form with Primary Key
 AccordingtoCodd’soriginaldefinition,arelationschemeRisin 3NF if it satisfies 2NF and no non-prime attribute of R is transitively dependent on the primary key
 TonormalizeRinto3NF,Rshouldbedecomposedtobreakthe transitive dependency.
 3NF:SC(S#,City),CC(City,Country),SP(S#,P#,Qty).
S# P#
city city Country
Qty
S#
CS3402
23

Normal Forms Defined Informally
 1st normal form
 All attributes are simple (no composite or multivalued attributes)
 2nd normal form
 All non-prime attributes depend on the whole key (no partial
dependency)
 3rd normal form
 All non-prime attributes only depend on the key (no transitive
dependency)
 Ingeneral,3NFisdesirableandpowerfulenough
 But,stilltherearecaseswhereastrongernormalformisneeded
CS3402 24

Test and Remedy for Normalization
CS3402 25

Consider all candidate keys of a relation (not just a defined primary key)
General Definitions of 2NF and 3NF
 LOTSdescribesparcelsoflandforsaleinvariouscountiesofa state
 Twocandidatekeys:Property_id#and{County_name,Lot#}
 Property_id#istheprimarykey
 LOTSviolates2NFduetoFD3asTax_rateispartiallydependent on the candidate key {County_name, Lot#}
 County_name=>Tax-rate(partialdependent)
CS3402 26

General Definition of Second Normal Form
CS3402 27

CS3402 28

General Definition of Third Normal Form
 Superkey=>includeakey
 Primeattributes=>attributesinakey
 LOTS2 is in 3NF
 FD4inLOTS1violates3NFbecauseAreaisnotasuperkeyand
Price is not a prime attribute in LOTS1
 LOTS1violates3NFbecausePriceistransitivelydependenton
each of the candidate keys of LOTS via the nonprime attribute Area
 TonormalizeLOTS1into3NF,LOTS1B{Area,Price}
CS3402 29

Boyce-Codd Normal Form
 BCNFwasproposedasasimplerformof3NF,butitwasfoundto be stricter than 3NF
 Every relation in BCNF is also in 3NF
 BUT Relation in 3NF is not necessarily in BCNF
 Difference:
 Condition which allows A to be prime is absent from BCNF
CS3402 30

Boyce-Codd Normal Form
 Supposewehavethousandsoflotsintherelationbutthelotsare from only two counties: A and B
 LotsizesinAareonly0.5,0.6,0.7,0.8,0.9acreswhereaslotsinB are 1.1, 1.2, …, 1.9
 Then, Area -> County_name
 FD5satisfies3NFinLOTS1AbecauseCounty_nameisaprime
attribute
 FD5violatesBCNFinLOTS1AbecauseAreaisnotasuperkeyof LOTS1A
 CreatenewrelationR(Area,County_name).Theareaoflotscanbe represented in R(Area, County_name) to reduce redundancy
CS3402 31

CS3402
32
County_name is a prime attribute

Boyce-Codd Normal Form
 Some notes on BCNF:
 BCNF is stronger than 3NF
 BCNF is useful when a relation has: multiple candidate keys, and
these candidate keys are composite ones, and they overlap on some common attributes
 BCNF reduces to 3NF if the above conditions do not apply
CS3402 33

References
 6e 

Ch. 14, p. 487-515 Ch. 15, p. 529-533
CS3402
34