Relational Model
CHAPTER 14
Copyright By PowCoder代写 加微信 powcoder
Basics of Functional Dependencies and Normalization for Relational Databases
Collect requirements (users, views, constraints, tasks)
Model requirements and validate (ERD and EERD)
Map ERD to relational model
Verify model with relational algebra
Build model using DBMS (SQL)
Optimize model (Normalization and Indexing)
Build transactions and other user tools
Chapter 14 Outline
Informal Design Guidelines for Relation Schemas
Functional Dependencies
Normal Forms Based on Primary Keys
General Definitions of Second and Third Normal Forms
Boyce- Form
Informal Design Guidelines
for Relation Schemas
Measures of quality:
Making sure attribute semantics are clear
Reducing redundant information in tuples
Reducing NULL values in tuples
Disallowing possibility of generating spurious tuples
Guideline 1
Design relation schema so that it is easy to explain its meaning
Small errors in ERD are translated into errors in relational schema and result in dysfunctional DB
Do not combine attributes from multiple entity types and relationship types into a single relation
What do we see here?
Client List
Client Account Investor C/S Investment
Id Client Name Type AUM Contact Phone Inception Strategy Representative Manager
2242F1 Allegany Fire Department Pension P1 $ 32,426,757.32 Sheryl Robinson 555-386-8197 9/21/2009 LCV Riley Denning
1322P2 Johnsonville Police Department 401K K1 $ 22,167,992.72 555-287-7177 2/15/2001 MCG Keisha Robinette
19 I1 $ 2,309,023.00 555-302-1697 11/14/2014 SCG Karl Rentoff Jorge Villa
I10 I1 $ 752,323.99 555-302-1791 6/2/2007 QNT Karl Rentoff Bryan Reynolds
9923F1 Three Rivers Fire Department Pension P2 $ 5,922,233.00 555-521-2123 4/23/2016 LCV Riley Lumberg 555-521-2127
2344C1 City of Workers 401K K1 $ 14,299,728.33 555-979-2223 5/26/2011 LCV Keisha Denning
2344C1 City of Workers 401K K1 $ 9,827,298.41 555-979-2223 7/27/2013 QNT Keisha Reynolds
2344C1 City of Workers 401K K2 $ 41,237,998.72
* Note: is with Parkland Investment Group
555-979-2223 7/27/2013 SCG Keisha Villa
What is Good or Bad About This Relation?
S# – Suppliers, P# – Parts;
Primary Key = {S#, P#}
Normalization
Data Redundancy
City of a supplier is listed several times
Waste of storage
Normalization
Guideline 2
Design base relation schemas so that
NO Redundant Information exists in Tuples
NO Update anomalies will occur
Types of anomalies:
Modification
Group attributes into relation schemas
Significant effect on storage space
Update (Modification) Anomaly
Changing data in a row forces changes to other rows because of duplication or results in invalid data
Deletion Anomaly
Deleting rows may cause a loss of data that would be needed for other future rows
Insertion Anomaly
Adding new rows forces user to create duplicate data, store NULL values, create invalid dummy data, or being unable to store data at all
Data Anomalies
Update Anomaly
If we change the city of a supplier, we must update all the tuples related to this supplier. If we miss one tuple, the data is no longer consistent.
If we do a find/replace, we risk changing the city for the wrong supplier
Normalization
Data Anomalies
Insertion Anomaly
Until a supplier actually supplies a part, we can’t record (insert) its city.
Normalization
Data Anomalies
Deletion Anomaly
If we delete all the shipments of a supplier, its city is gone too.
Normalization
3. NULL Values in Tuples
Problems with NULLs
Wasted storage space
Problems understanding meaning
Consider creating separate relations for specializations to prevent occurrence of numerous NULL values
Guideline 3
Multivalued attributes in ERDs should be resolved into a separate relation and not in multiple attributes in the same relation
Student ( … Major1, Major2, Major 3 …)
If NULLs are unavoidable, make sure that they apply in exceptional cases only, not to a majority of tuples
4. Generation of Spurious Tuples
NATURAL JOIN
Many-to-many
Result produces spurious tuples
Represent spurious information that is not valid
Guideline 4
Design relation schemas to be joined with equality conditions on attributes that are appropriately related
Guarantees that no spurious tuples are generated
Avoid relations that contain matching attributes that are not (foreign key, primary key) combinations
Avoid dummy data or overuse of default values
Summary and Discussion of Design Guidelines
Anomalies cause redundancies, data duplication, decline in data quality, and make data unusable in the long run
Waste of storage space due to NULLs
Difficulty of performing operations and joins due to NULL values
Generation of invalid and spurious data
Normal Forms Based on Primary Keys
Normalization process
Approaches for relational schema design
Perform a conceptual schema design using a conceptual model then map conceptual design into a set of relations
Design relations based on external knowledge derived from existing implementation of files or forms or reports
i.e. Functional dependancies
Normalization of Relations
Takes a relation schema through a series of tests
Certify whether it satisfies a certain normal form
Proceeds in a top-down fashion
Normal form tests
Normalization
Normal Forms – definitions
1NF: All domain values in R are atomic
2NF : R is in 1NF and every non-key attribute is fully dependent on the key
3NF: R is 2NF and every non-key attribute is non-transitively dependent on the key
The Whole Key
And Nothing But The Key
functional
dependencies
multivalued
dependencies
dependencies
HIGHER NORMAL FORMS
First Normal Form
Part of the formal definition of a relation in the basic (flat) relational model
Only attribute values permitted are single atomic (or indivisible) values
Does not allow nested relations
Tuple with a relation within it
First Normal Form (cont’d.)
Techniques to achieve first normal form
Remove nested relation and non-atomic attributes into a new relation
Propagate the primary key into it
1NF – Normalization of Repeating Groups
OrderNum was original PK
PK is expanded to include the determinant of the repeated group: OrderNum+PartNum
1NF – Normalization of Repeating Groups
Before: S# (PQ)*
3 “records”
After: S# P# QTY
6 “records”
Definition of Functional Dependency
Constraint between two sets of attributes from the database
Cannot determine which FDs hold and which do not unless meaning of, and relationships among, attributes is known
Functional Dependence
Take a relation R(X, Y, …), where X and Y are sets of attributes:
X Y means
Y is functionally dependent on X
X (functionally) determines Y
In Plain English:
Each X-value in R is associated with precisely one Y-value in R.
Or, equivalently
If two tuples have the same X-value, they must have the same Y-value.
Normalization
Identify the Functional Dependencies
If two tuples have the same X-value, they must have the same Y-value.
For example, {S#} → {CITY}
Normalization
Functional Dependencies
{Part, Warehouse} {Qty}
{Warehouse} {W_Address}
Normalization
( Part, Warehouse, Qty, W_Address)
P1 W1 15 Columbus
P2 W1 10 Columbus
P3 W1 25 Columbus
P1 W2 10 Dayton
P2 W2 35 Dayton
P3 W2 11 Dayton
P1 W3 44 Cincinnati
Primary Key = {Part, Warehouse}
Full Functional Dependency
X Y i.e. ,Y depends on X
If Z is a proper subset of X, i.e. Z X.
there may NOT exist a Z such that Z Y.
Normalization
( Part, Warehouse, Qty, W_Address)
P1 W1 15 Columbus
P2 W1 10 Columbus
P3 W1 25 Columbus
P1 W2 10 Dayton
P2 W2 35 Dayton
P3 W2 11 Dayton
P1 W3 44 Cincinnati
Full Functional Dependencies
{Part, Warehouse} {Qty}
{Warehouse} {W_Address}
{Part, Warehouse} {W_Address}
W_Address does not depend on both Part and Warehouse
Normalization
( Part, Warehouse, Qty, W_Address)
P1 W1 15 Columbus
P2 W1 10 Columbus
P3 W1 25 Columbus
P1 W2 10 Dayton
P2 W2 35 Dayton
P3 W2 11 Dayton
P1 W3 44 Cincinnati
Second Normal Form
Full functional dependency on the key
Table is in 2NF if it is in 1NF and no non-prime attribute is dependent on any proper subset of any candidate key of the table.
Put simply, a table is in 2NF if it is in 1NF and every non-prime attribute of the table is dependent on the whole of every candidate key.
Second, normalize into a number of 2NF relations
Nonprime attributes are associated only with part of candidate key on which they are fully functionally dependent
Normalization
Second Normal Form: general definition
Second Normal Form: every nonprime attribute is not partially dependent on any candidate key.
W_Address is non-prime.
W_Address is partially dependent on candidate key Part, Warehouse.
Thus (Part, Warehouse, Qty, W_Address) is not in 2NF.
Decompose To Satisfy 2NF
Normalization
INVENTORY’
( Part, Warehouse, Qty)
( Warehouse, W_Address)
W1 Columbus
W3 Cincinnati
Every non-key attribute is fully dependent on the key
Decompose Relations
Dependencies preserved
{Part, Warehouse} {Qty}
{Warehouse} {W_Address}
{Part, Warehouse} {W_Address}
No loss of information
INVENTORY = INVENTORY_1 * LOCATION
Normalization
( Part, Warehouse, Qty, W_Address)
P1 W1 15 Columbus
P2 W1 10 Columbus
P3 W1 25 Columbus
P1 W2 10 Dayton
P2 W2 35 Dayton
P3 W2 11 Dayton
P1 W3 44 Cincinnati
INVENTORY_1
( Part, Warehouse, Qty)
( Warehouse, W_Address)
W1 Columbus
W3 Cincinnati
Validating Normalization
Given a relation R and functional dependencies, F, on R
Assume R is not in a desirable form for enforcing F
Decompose relation R into relations, R1,…, Rk, with associated functional dependencies, F1,…, Fk, such that R1,…, Rk are in a more desirable form
Preserve full dependencies
Don’t lose information.
Normalization
2NF to solve update anomalies
S# – Suppliers, P# – Parts
Primary Key = {S#, P#}
P# Pname, Color, Weight, Pcity
S# Status, City
S#, P# Qty
S# Status, City
S#, P# Qty
P# Pname, Color, Weight, Pcity
Normalization
Result : 2NF
Data Anomalies Solved
Insertion Anomaly
Enter information that S5 located in Athens, even though S5 does not currently supply any parts.
Normalization
Data Anomalies Solved
Deletion Anomaly
Delete the shipment connecting S3 and P2 by deleting the appropriate tuple from SP; we do not lose information that S3 is located in .
Normalization
Data Anomalies Solved
Update Anomaly
Change the city for S1 from London to Amsterdam by changing it once.
Normalization
Possible Data Anomalies in 2NF
Normalization
Transitive dependence
A supplier’s city has one status value associated with it
S# City Status
Possible Data Anomalies in 2NF
Can’t enter the status of a city until a supplier is located in that city.
Deleting the only tuple for a supplier causes status of the city to be lost.
Status of a city appears many times causing redundancy.
Normalization
Transitive dependence
Third Normal Form
Based on concept of transitive dependency
Problematic FD
Left-hand side is part of primary key
Left-hand side is a nonkey attribute
Normalization
Third Normal Form
A relation, R, is in 3NF if it satisfies 2NF and no nonprime attribute of R is transitively dependent on Primary Key
X Y is a Transitive Dependence:
set of attributes Z that is not a subset of any candidate key,
Third Normal Form:
Every nonprime attribute is
Fully functionally dependent on every candidate key.
Nontransitively dependent on every candidate key.
General Definitions of Second
and Third Normal Forms (cont’d.)
Prime attribute
Part of any candidate key will be considered as prime
Consider partial, full functional, and transitive dependencies with respect to all candidate keys of a relation
Result: 3NF
Normalization
functional
dependencies
multivalued
dependencies
dependencies
HIGHER NORMAL FORMS
General Definitions of Second
and Third Normal Forms
Normalization
Normal Forms – definitions
1NF: All domain values in R are atomic
2NF : R is in 1NF and every non-key attribute is fully dependent on the key
3NF: R is 2NF and every non-key attribute is non-transitively dependent on the key
BCNF: Every determinant in R is a candidate key.
The Whole Key
And Nothing But The Key
Normalization Steps
Check for repeated groups, multivalued attributes. If exist, propagate data to eliminate any repeating groups or nested tables, must expand PK to allow this. Determinant of the repeated group or multivalued attribute becomes a part of new composite PK. 1NF level.
Identify candidate key(s). Show functional dependencies. Any attributes not FD on the entire Key? Any attributes FD on part of the Key? If so, decompose (normalize) to 2NF.
Any non-key attributes FD on non-key? If so, decompose (normalize) to 3NF
Surrogate Keys in Normalization
Keys, keys, keys…
data attribute(s) that uniquely identify a tuple
Minimal superkey
Candidate key
One of several keys
Keys, keys, keys…
Primary key
preferred candidate key for an entity type
Alternate key
AKA secondary key, another unique identifier
Foreign key
Attribute(s) in an entity type that relates to a primary key in another relation
Informal Definitions
Key of a Relation:
Each row has a value of a data item (or set of items) that uniquely identifies that row in the table
Called the key
Sometimes row-ids or sequential numbers are assigned as keys to identify the rows in a table
Called artificial key or surrogate key
Keys, keys, keys…
Natural key
key that is formed of attributes that already exist in the real world. For example, SSN
Composite key
key of two or more attributes.
Surrogate key
key with no business meaning
Natural or Surrogate Key?
Natural keys exist already
Natural keys are coupled to business requirements
Can use Natural keys if the key values won’t change
If requirements change, you may need to change the key:
For example, if clients change CustomerNumber from numeric to alphanumeric then you would have to change every single table where CustomerNumber is used as a foreign key.
Normalization
Second Normal Form
in the presence of
Multiple Candidate Keys
Candidate key(s)
minimal super key, before surrogate key is created
Prime Attribute
An attribute that is a member of any candidate key of R.
Nonprime Attribute
An attribute of R that is not a prime attribute.
2NF Summary
Only a concern in the presence of composite candidate keys.
Having an surrogate key does not eliminate the need to consider all candidate keys and functional dependencies.
Table is in 2NF if it is in 1NF and every non-prime attribute of the table is dependent on the whole of every candidate key.
S# SNAME STATUS CITY P# PNAME COLOR WEIGHT QTY
S1 Smith 20 London P1 Nut Red 12 300
S1 Smith 20 London P2 Bolt Green 17 200
S1 Smith 20 London P3 Screw Blue 17 400
S1 Smith 20 London P4 Screw Red 14 200
S1 Smith 20 London P5 Cam Blue 12 100
S1 Smith 20 London P6 Cog Red 19 100
S2 Jones 10 Paris P1 Nut Red 12 300
S2 Jones 10 Paris P2 Bolt Green 17 400
S3 Blake 30 P2 Bolt Green 17 200
S4 Clark 20 London P2 Bolt Green 17 200
S4 Clark 20 London P4 Screw Red 14 300
S4 Clark 20 London P5 Cam Blue 12 400
S# SNAME STATUS CITY P# PNAME COLOR WEIGHT QTY
S1 Smith 20 London P1 Nut Red 12 300
S1 Smith 20 London P2 Bolt Green 17 200
S1 Smith 20 London P3 Screw Blue 17 400
S1 Smith 20 London P4 Screw Red 14 200
S1 Smith 20 London P5 Cam Blue 12 100
S1 Smith 20 London P6 Cog Red 19 100
S2 Jones 10 Paris P1 Nut Red 12 300
S2 Jones 10 Paris P2 Bolt G
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com