Normalization
Normalization
Lecture 09
What is Normalization?
Normalization is a series of steps used to evaluate and modify table structures to ensure that every non-key column in every table is directly dependent on the primary key.
The results of normalization are reduced redundancies, fewer anomalies and improved efficiencies.
Two purposes of normalization
Purposes of normalization
Eliminate redundant data (the same data stored in more than one table)
Eliminating redundant data is achieved by splitting tables with redundant data into two or more tables without the redundancy
Ensure the data within a table are related
Normalization involves the process of applying rules called normal forms to table structures that produce a design that is free of data redundancy problems.
Normal Forms
Several normal forms exist
1NF
2NF
3NF
Each normal form addresses the potential for a particular type of redundancy.
A table is said to be in one of the normal forms if it satisfies the rules required by that form.
First Normal Form (1NF)
1
2
no repeating groups – each row/column intersection only contains one value
3
Primary key is identified
Two dimensional table format
1NF (Example)
product_id product_desc whse_id bin qty whse_address city prov pcode
145 Saw 122
322 136
175 40
25 122 Peter St.
4433 Oak Ave Newmarket
Oakville Ont
Ont L4T5Y6
L5T6R5
355 Screwdriver 122 111 55 122 Peter St. Newmarket Ont L4T5Y6
130 Hammer 322 98 35 4433 Oak Ave Oakville Ont L5T6R5
Product no 145 is stored in two different warehouses.
Having more than one value at the intersection of a row and a column is referred to as having a repeating group.
A table that contains a repeating group is called an un-normalized table (UNF)
Inventory [product_id, product_desc(whse_id,bin,qty,whse_address,city,prov,pcode)]
Inventory
1NF (Example)
The repeating groups can be eliminated by filling in the values in vacant cells of the table.
Each row/column intersection must contain only a single value to satisfy the first rule for 1NF.
What should the primary key be?
product_id product_desc whse_id bin qty whse_address city prov pcode
145 Saw 122 136 40 122 Peter St. Newmarket Ont L4T5Y6
145 Saw 322 175 25 4433 Oak Ave Oakville Ont L5^6R5
355 Screwdriver 122 111 55 122 Peter St. Newmarket Ont L4T5Y6
130 Hammer 322 98 35 4433 Oak Ave Oakville Ont L5T6R5
Inventory
1NF (Example)
The primary Key should be product_id concatenated to whse_id.
Inventory [product_id, whse_id, product_desc, bin,qty, whse_address, city, prov, pcode]
product_id whse_id product_desc bin qty whse_address city prov pcode
145 122 Saw 136 40 122 Peter St. Newmarket Ont L4T5Y6
145 322 Saw 175 25 4433 Oak Ave Oakville Ont L5T6R5
355 122 Screwdriver 111 55 122 Peter St. Newmarket Ont L4T5Y6
130 322 Hammer 98 35 4433 Oak Ave Oakville Ont L5T6R5
Inventory
Problems with Un-normalized Data
Update Problem
Data Inconsistency Problem
Data Redundancy Problem
Insert Problem
Deletion Problem
Modification Anomaly
product_id whse_id product_desc bin qty whse_address city prov pcode
145 122 Saw 136 40 122 Peter St. Newmarket Ont L4T5Y6
145 322 Saw 175 25 4433 Oak Ave Oakville Ont L5T6R5
355 122 Screwdriver 111 55 122 Peter St. Newmarket Ont L4T5Y6
130 322 Hammer 98 35 4433 Oak Ave Oakville Ont L5T6R5
Inventory
The Update Problem
The need to perform the same update in several locations of the database because the same data is repeated
Oakville warehouse is moved to Burlington
We will have to make more than one change to the database
Data Inconsistency
product_id whse_id product_desc bin qty whse_address city prov pcode
145 122 Saw 136 40 122 Peter St. Newmarket Ont L4T5Y6
145 322 Saw 175 25 4433 Oak Ave Oakville Ont L5T6R5
355 122 Screwdriver 111 55 122 Peter St. Newmarket Ont L4T5Y6
130 322 Hammer 98 35 4433 Oak Ave Oakville Ont L5T6R5
Inventory
The Data Inconsistency Problem
When the same data is repeated in several records, they can be inconsistent
What is the inconsistency?
Data Redundancy
product_id whse_id product_desc bin qty whse_address city prov pcode
145 122 Saw 136 40 122 Peter St. Newmarket Ont L4T5Y6
145 322 Saw 175 25 4433 Oak Ave Oakville Ont L5T6R5
355 122 Screwdriver 111 55 122 Peter St. Newmarket Ont L4T5Y6
130 322 Hammer 98 35 4433 Oak Ave Oakville Ont L5T6R5
Inventory
The Data Redundancy Problem
the unnecessary repetition of data in the database of non-key fields.
While it is fine to repeat Primary Keys and Foreign Keys, we do not want to repeat data fields.
The Insert Problem
Only one table for storing information, STUDENT
Let us say we have just hired a new teacher: Mr. Vert. We have no way to put him into the database as he has no students yet.
STUDENT(Student-Num, Student-Name, Teacher, Student-Age)
1243658712 Tom Blu Ms.Greene 14
2343216578 Jill Fall Mr.Brown 14
3214325436 Jack Pail Ms.Green 14
The Deletion Problem
If there is no teacher table, and if a teacher’s students all go to high school, then the teacher will disappear from our database.
STUDENT(Student-Num, Student-Name, Teacher, Student-Age)
1243658712 Tom Blu Ms.Greene 14
2343216578 Jill Fall Mr.Brown 14
3214325436 Jack Pail Ms.Green 14
Second Normal Form (2NF)
1
2
no partial dependencies; all non-key columns are fully dependent on the entire primary key
1NF
2NF
Is this table in 2NF?
Fix this
product_id whse_id product_desc bin qty whse_address city prov pcode
145 122 Saw 136 40 122 Peter St. Newmarket Ont L4T5Y6
145 322 Saw 175 25 4433 Oak Ave Oakville Ont L5T6R5
355 122 Screwdriver 111 55 122 Peter St. Newmarket Ont L4T5Y6
130 322 Hammer 98 35 4433 Oak Ave Oakville Ont L5T6R5
Inventory
2NF
Inventory
Warehouse
Product
Let’s write this out in a text based form called Relational Schema:
Capitalize the entity name
Put attributes in parenthesis
Bold and underline the primary key
For later – Italicize the foreign key or use FK
2NF
Inventory
Warehouse
Product
PRODUCT (product_id, prod_desc)
WAREHOUSE (whse_id, whse_address, city, prov, pcode)
INVENTORY (whse_id (FK), product_id (FK), bin, qty)
ERD comes to a similar conclusion
Distribution Company Many-to-Many ERD Many products stored in many warehouses
A large distribution company has many warehouses across the county which provides faster shipping for customers. The company handles many products that may be stored in any of the company’s warehouses. Of course, there will be times when a product is out of stock and is not stored in a particular warehouse. The company wants to know the quantity on hand (inventory) for each product at each warehouse. For products store the product id, product name, and unit cost. For warehouses, store the warehouse id and city.
ERD
Third Normal Form (3NF)
1
2
A non-key column cannot determine the value of another non-key column. Every non-key column must depend directly on the primary key
2NF
3NF
Warehouse
To satisfy the second rule of 3NF, the warehouse table can be split into two tables.
3NF
Warehouse
Postal Code determines city and province.
PostalCode
Why is this table not in 3NF?
ORDER
ORDER in 3NF
ORDER
Functional Dependency
A functional dependency occurs when one or more attributes in a table uniquely determines another attribute
product_id prod_desc
whse_id, product_id bin, qty
whse_id whse_address, city, prov, pcode
Partial Dependency
Partial dependency is where a non-key column is dependent on part of the primary key but is not dependent on the entire primary key.
[product_id, whse_id, product_desc, bin,qty, whse_address, city, prov, pcode]
The prod_desc column is dependent on the product_id key but is not determined by the whse_id key.
The whse_address column is dependent on the whse_id key but is not related to the product_id key
We only need to look for this and solve it when we have a concatenated key
Boyce-Codd Normal Form (BCNF)
1
Every determinant in a table is a candidate key. If there is only one candidate key, 3NF and BCNF are the same.
Fourth Normal Form (4NF)
1
2
It has no multivalued dependencies. A multivalued fact is one in which several values for a column might be determined by one value for another column.
3NF
Summary
Rules for reducing data redundancy and related problems called normal forms
UNF?
1NF – no repeating groups
2NF – no partial dependencies
3NF – no transitive dependencies
Functional dependency?
Partial dependency?
User Views and Merging Relations
Do not overstep normalization rules
1NF:
Order [OrderNo, , PartNo, Orderdate, CustNo, CustLname,PartDesc,QtyOrd, Price)]
Going from 1NF to 2NF does not give us a
Customer[CustNo, CustLname]
The non key value determining a non key value is handled
when Going from 2NF to 3NF
We should not lose any attributes, relations or composite keys
when going from 1NF to 2NF to 3NF
Only apply the normalization rules as written.
When we reach the merge stage we will get rid of repeated attributes and reduce composite keys if possible.
Merging Relations
1A CUSTOMER[ CustNo, CustName, CustStreet, CustZip, CustRep ]
1B ZipCode[CustZip, CustCity, CustSt]
2A PART[ PartNo, PartDescr, QtyOnHand, Class, Whse, Price]
3A CUSTOMER[CustNo, CustName]
3B CUSTORDER[CustNo, OrderNo, OrderDate]
4A ORDER [OrderNo, OrderDate, CustNo]
4B ORDERDETAIL [OrderNo, PartNum, NumOrdered QuotedPrice]
4C PART [PartNum, PartDescr]
5A CUSTORDER[CustNum, OrderNum]
5B ORDER[OrderNum, OrderDate]
5C CUST[CustNum, CustName, RepNo]
5D REP [RepNo, RepName]
Use consistent names for same attribute in each relation
2 PART[ PartNo, PartDescr, QtyOnHand, Class, Whse, Price]
4B ORDERDETAIL [ OrderNo, PartNum, NumOrdered]
4B ORDERDETAIL [ OrderNo, PartNo, NumOrdered, QuotedPrice]
4C PART [PartNum, PartDescr]
4C PART [PartNo, PartDescr]
5A CUSTORDER[CustNum, OrderNum]
5A CUSTORDER[CustNo, OrderNo]
5B ORDER[OrderNum, OrderDate]
5B ORDER[OrderNo, OrderDate]
5C CUST[CustNum, CustName, RepNo]
5C CUST[CustNo, CustName, RepNo]
Examine the Primary Key of each relation using a composite key
Determine if all parts should be part of the Primary Key
3B CUSTORDER[CustNo, OrderNo, OrderDate]
CustNo is not required to be part of the PK because
OrderNo uniquely identifies an order.
If the Composite Primary key Was left as is then unique values could include
(OrderNo 1001, CustNo 2) and (OrderNo 1001, CustNo 5)
This would be incorrect because there is no situation where Customers 2 and 5 would each have an order numbered 1001
As a result we will have:
5A CUSTORDER[OrderNo, CustNo, OrderDate]
and CUSTORDER[CustNo, OrderNo] becomes
5A CUSTORDER[ OrderNo, CustNo]
Use consistent names for relations with the same attributes in the Primary Key
Consider the following relations
4A ORDER[OrderNo,OrderDate,CustNo]
5A CUSTORDER[CustNo,OrderNum]
5B ORDER[OrderNum,OrderDate]
After fixing the inconsistent naming:
4A ORDER[OrderNo,OrderDate,CustNo]
5A CUSTORDER[CustNo,OrderNo]
5B ORDER[OrderNo,OrderDate]
Create one relation for relations having an identical Primary Key
The PK can be a one-part PK or a concatenated PK (2 or more attributes) but it must match exactly
1A,3A,5C CUSTOMER[ CustNo, CustName, CustStreet,CustZip, RepNo ]
2,4C PART[ PartNo, PartDescr, QtyOnHand, Class, Whse, Price]
3B, 4A,5A,5B ORDER [OrderNo, OrderDate, CustNo]
4B ORDERDETAIL [OrderNo, PartNum, NumOrdered,QuotedPrice]
5D REP [RepNo, RepName]
Resolve any new transitive dependencies
If any new transitive were created when merging relations, it must be resolved.
Example:
CUSTOMER[ CustNo, CustName, CustStreet, CustZip, RepNo ]
CUSTOMER[ CustNo, CustName, RepName]
Merged as
CUSTOMER[ CustNo, CustName, CustStreet, CustZip, RepNo,RepName ]
Which has the transitive dependency after merging
RepName is determined by attribute RepNo
and would be resolved as
CUSTOMER[ CustNo, CustName, CustStreet, CustZip, RepNo]
REP[RepNo, RepName]
After the Merge We Have:
CUSTOMER[ CustNo, CustName, CustStreet, CustZip, RepNo(FK)]
ZipCode[CustZip, CustCity, CustSt,]
REP[RepNo, RepName]
PART[ PartNo, PartDescr, QtyOnHand, Class, Whse, Price]
ORDER [OrderNo, OrderDate, CustNo(FK)]
ORDERDETAIL [OrderNo (FK1), PartNum (FK2), NumOrdered, QuotedPrice ]
CUSTORDER[CustNo, OrderNo]
/docProps/thumbnail.jpeg