INFO20003 Database Systems
Dr Renata Borovica-Gajic
Lecture 15 Normalization
Copyright By PowCoder代写 加微信 powcoder
Learning Objectives
• By the end of this lecture, you should be able to:
– Define normalization
– Explain and identify database anomalies
– Define and identify functional dependencies
– Normalize relations to:
• 1st Normal Form (1NF) • 2nd Normal Form (2NF) • 3rd Normal Form (3NF)
INFO20003 Database Systems © University of Melbourne 4
Motivation for normalization
• What happens if we don’t normalize?
INFO20003 Database Systems © University of Melbourne 5
What’s wrong with the organization of data in this table?
Student ID#
Student Name
Campus Address
Subject ID
Subject Title
Lecturer Name
Lecturer Office
Lecturer Phone
166 Grattan Street
Accounting
166 Grattan Street
166 Grattan Street
166 Grattan Street
166 Grattan Street
302 Royal Parade
Accounting
302 Royal Parade
302 Royal Parade
302 Royal Parade
224 Swanston St.
Accounting
224 Swanston St.
224 Swanston St.
224 Swanston St.
Accounting
INFO20003 Database Systems © University of Melbourne 6
Anomalies in Denormalized Data:
• Consider the following denormalized table (relation) :
130 C200 75 200 C300 100 250 C200 75 425 C400 150 500 C300 100 575 C500 50 ………
Student-ID
• Insertion Anomaly: A new course cannot be added until at least one student has enrolled (which comes first student or course?)
• Deletion Anomaly: If student 425 withdraws, we lose all record of course C400 and its fee!
• Update Anomaly: If the fee for course C200 changes, we have to change it in multiple records (rows), else the data will be inconsistent.
INFO20003 Database Systems © University of Melbourne 8
Normalisation
• A technique used to remove undesired redundancy from databases (Break one large table into several smaller tables).
A relation is normalized if all determinants are candidate keys
How do we normalise?
INFO20003 Database Systems © University of Melbourne 9
Invoice example
INFO20003 Database Systems © University of Melbourne 10
Invoice example – Spreadsheet Format
Invoice Number
Customer Name
Customer Address
Sales Person
Product ID
Product Name
Unit Price
14-Aug- 09
John / Synex
128 Juanita Ave…
PSV880. 006
AMD Athlon X2DC
PSV880. 037
LC.V890. 002
LG 8.5” LCD
HPQ754. 071
HP LaserJet 5200
not relational model
15-Aug- 09
Mary / ThisCo
HP Q754.071
HP LaserJet 5200
LCV890. 002
LG 8.5” LCD
INFO20003 Database Systems © University of Melbourne 11
Invoice Number
Product ID
PSV880.006
PSV880.037
LC.V890.002
HPQ754.071
HPQ754.071
LCV890.002
Product Name
AMD Athlon X2DC
LG 8.5” LCD
HP LaserJet 5200
HP LaserJet 5200
LG 8.5” LCD
Customer Name
John / Synex
Mary / ThisCo
Invoice example – Spreadsheet Format
Unit Price
Customer Address
128 Juanita Ave…
Sales Person
Break into two
How do we connect?
INFO20003 Database Systems © University of Melbourne 12
Invoice Number
Customer Name
John / Synex
Mary / ThisCo
Product ID
PSV880.006
PSV880.037
LC.V890.00 2
HPQ754.07 1
HPQ754.07 1
LCV890.002
Invoice example – Spreadsheet Format
Product Name
Customer Address
128 Juanita Ave…
AMD Athlon X2DC
LG 8.5” LCD
HP LaserJet 5200
HP LaserJet 5200
LG 8.5” LCD
Unit Price
Sales Person
Invoice Number
INFO20003 Database Systems © University of Melbourne 13
Invoice Number
Customer Name
John / Synex
Mary / ThisCo
Invoice example – Spreadsheet Format
Customer Address
128 Juanita Ave…
Product Name
Sales Person
This is about product
This is about Order (invoice)
Product ID
PSV880.006
PSV880.037
LC.V890.00 2
HPQ754.07 1
HPQ754.07 1
LCV890.002
AMD Athlon X2DC
LG 8.5” LCD
HP LaserJet 5200
HP LaserJet 5200
LG 8.5” LCD
Unit Price
Invoice Number
INFO20003 Database Systems © University of Melbourne 14
Invoice Number
Product ID
PSV880.006
PSV880.037
LC.V890.00 2
HPQ754.07 1
HPQ754.07 1
Customer Name
John / Synex
Mary / ThisCo
Invoice example – Spreadsheet Format
Customer Address
128 Juanita Ave…
Invoice Number
Sales Person
Product ID
PSV880.006
PSV880.037
LC.V890.00 2
HPQ754.07 1
Product Name
AMD Athlon X2DC
LG 8.5” LCD
HP LaserJet 5200
Break into two
Unit Price
INFO20003 Database Systems © University of Melbourne 15
Invoice Number
Product ID
PSV880.006
PSV880.037
LC.V890.00 2
HPQ754.07 1
HPQ754.07 1
Customer Name
John / Synex
Mary / ThisCo
Invoice example – Spreadsheet Format
Customer Address
128 Juanita Ave…
Invoice Number
Sales Person
Product ID
PSV880.006
PSV880.037
LC.V890.00 2
HPQ754.07 1
Product Name
AMD Athlon X2DC
LG 8.5” LCD
HP LaserJet 5200
What about amount?
Unit Price
INFO20003 Database Systems © University of Melbourne 16
Invoice Number
Product ID
PSV880.006
PSV880.037
LC.V890.00 2
HPQ754.07 1
HPQ754.07 1
Customer Name
John / Synex
Mary / ThisCo
Invoice Number
Invoice example – Spreadsheet Format
Customer Address
128 Juanita Ave…
Sales Person
Product ID
PSV880.006
PSV880.037
LC.V890.00 2
HPQ754.07 1
Product Name
AMD Athlon X2DC
LG 8.5” LCD
HP LaserJet 5200
What about sales person?
Could be derived
Unit Price
INFO20003 Database Systems © University of Melbourne 17
Invoice Number
Product ID
PSV880.006
PSV880.037
LC.V890.00 2
HPQ754.07 1
HPQ754.07 1
14-Aug- 09
15-Aug- 09
Customer Name
John / Synex
Mary / ThisCo
Invoice Number
Invoice example – Spreadsheet Format
Customer Address
128 Juanita Ave…
Sales Person ID
What about customer?
Product ID
PSV880.006
PSV880.037
LC.V890.00 2
HPQ754.07 1
Sales Person ID
Sales Person
Product Name
AMD Athlon X2DC
LG 8.5” LCD
HP LaserJet 5200
Unit Price
INFO20003 Database Systems © University of Melbourne 18
Invoice Number
Product ID
PSV880.006
PSV880.037
LC.V890.00 2
HPQ754.07 1
HPQ754.07 1
Customer ID
Invoice Number
Invoice example – Spreadsheet Format
Sales Person ID
Sales Person ID
Sales Person
Product ID
Product Name
Unit Price
PSV880.006
AMD Athlon X2DC
PSV880.037
LC.V890.00 2
LG 8.5” LCD
HPQ754.07 1
HP LaserJet 5200
Customer ID
Customer Name
Customer Address
John / Synex
128 Juanita Ave…
Mary / ThisCo
INFO20003 Database Systems © University of Melbourne 19
Normalized Relations and ER Diagram
• We can name the relations now
– Customer (CustomerNumber, CustomerName, CustomerAddress)
– Clerk (ClerkNumber, ClerkName)
– Product (ProductNumber, ProductDescription)
– Invoice (InvoiceNumber, Date, CustomerNumber, ClerkNumber)
– InvoiceLineItem (InvoiceNumber, ProductNumber, UniPrice, Quantity)
INFO20003 Database Systems © University of Melbourne 20 —
• Now let’s go back to theoretical concepts…
INFO20003 Database Systems © University of Melbourne 21
Functional Dependency
• A functional dependency concerns values of attributes in a relation
• A set of attributes X determines another set of
attributes Y if each value of X is associated with only
one value of Y
– Written X → Y
• X determines Y (If I know X then I also know Y)
• Emp# Emp-name • Emp# Salary
INFO20003 Database Systems © University of Melbourne 22
Functional Dependency: Definitions
• Determinants (X,Y→ Z) A(X, Y, Z, D) – the attribute(s) on the left hand side of the arrow
• Key and Non-Key attributes
– each attribute is either part of the primary key or it is not
• Partial functional dependency (Y→Z)
– a functional dependency of one or more non-key attributes
upon part (but not all) of the primary key
• Transitive dependency (Z→D)
– a functional dependency between 2 (or more) non-key
attributes
INFO20003 Database Systems © University of Melbourne 23 —
Armstrong’s Axioms
Functional dependencies can be identified using Armstrong’s Axioms
𝐴 = 𝑋1,𝑋2,…,𝑋𝑛 𝑎𝑛𝑑 𝐵 = (𝑌1,𝑌2,…,𝑌𝑛) 1. Reflexivity: 𝐵⊆𝐴⟹𝐴→𝐵
Example: Student_ID, name -> name
2. Augmentation: 𝐴→𝐵⟹AC→𝐵𝐶
Example: Student_ID -> name => Student_ID, surname ->name, surname
3. Transitivity: 𝐴→𝐵andB→𝐶⟹𝐴→𝐶 Example: ID -> birthdate and birthdate -> age then ID ->age
INFO20003 Database Systems © University of Melbourne 24
Steps in Normalisation
First Normal Form
Keep atomic data/Remove repeating groups Remove partial dependencies
Remove transitive dependencies
Second Normal Form
Third Normal Form
INFO20003 Database Systems © University of Melbourne 25
First Normal Form
Remove Repeating Groups
• repeating groups of attributes cannot be represented in a flat, two dimensional table
• removing cells with multiple values (keep atomic data)
Set of values
Example: Order-Item (Order#, Customer#, (Item#, Desc, Qty))
• Order-Item (Order#, Customer#, (Item#, Desc, Qty))
• Order-Item (Order#, Item#, Desc, Qty)
• Order (Order#, Customer#)
Break them into two Use PK/FK to connect
INFO20003 Database Systems © University of Melbourne 26
Second Normal Form
Remove Partial Dependencies
a non-key attribute cannot be identified by part of a composite key
Example: Order-Item (Order#, Item#, Desc, Qty)
• Order-Item (Order#, Item#, Desc, Qty) Partial dependency
Item (Item#, Desc)
Order-Item (Order#, Item#, Qty)
INFO20003 Database Systems © University of Melbourne 27
Partial Dependency Anomalies
Order-Item (Order#, Item#, Desc, Qty)
change item desc in many places
data for last item lost when last order for that item is deleted
cannot add new item until it is ordered
INFO20003 Database Systems © University of Melbourne 28
Solution to these Anomalies
Order-Item
delete last order for item, but item
add new item at any time
change item description in one place
INFO20003 Database Systems © University of Melbourne 29
Third Normal Form
Remove Transitive Dependencies
a non-key attribute cannot be identified by another non-key attribute
Example: Employee (Emp#, Ename, Dept#, Dname)
• Employee (Emp#, Ename, Dept#, Dname)
Transitive dependency
• Employee (Emp#, Ename, Dept#)
• Department( Dept#, Dname)
INFO20003 Database Systems © University of Melbourne 30
Transitive Dependency Anomalies
Example: Employee (Emp#, Ename, Dept#, Dname)
change dept name in many places
data for dept lost when last employee for that dept is deleted
cannot add new dept until an employee is allocated to it
INFO20003 Database Systems © University of Melbourne 31
Solution to these Anomalies
delete last emp in dept, but dept remains
add new dept at any time
change dept name in one place
INFO20003 Database Systems © University of Melbourne 32
Summary of Normalisation
First Normal Form
Remove repeating groups Remove partial dependencies Remove transitive dependencies
Second Normal Form
Third Normal Form
INFO20003 Database Systems © University of Melbourne 36
Normalisation vs :
– Normalised relations contains a minimum amount of redundancy and allow users to insert, modify, and delete rows in tables without errors or inconsistencies (anomalies)
Denormalization:
– The pay-off: query speed.
– The price: extra work on updates to keep redundant data
consistent.
• Denormalization may be used to improve performance of
time-critical operations.
INFO20003 Database Systems © University of Melbourne 37
What’s Examinable?
• Normalisation Process
• Anomalies
• Functional dependencies
INFO20003 Database Systems © University of Melbourne 38 —
Next Lecture
• Hands on Normalization
INFO20003 Database Systems © University of Melbourne 39 —
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com