INFO20003 Database Systems
Dr Renata Borovica-Gajic
Lecture 15 Normalization
Week 8
Learning Objectives
• By the end of this lecture, you should be able to:
– Define normalization
– Explain and identify database anomalies
– Define and identify functional dependencies
– Normalize relations to:
• 1st Normal Form (1NF) • 2nd Normal Form (2NF) • 3rd Normal Form (3NF)
INFO20003 Database Systems © University of Melbourne 4
Motivation for normalization
• What happens if we don’t normalize?
INFO20003 Database Systems © University of Melbourne 5
What’s wrong with the organization of data in this table?
Student ID#
Student Name
Campus Address
Degree
Phone
Subject ID
Subject Title
Lecturer Name
Lecturer Office
Lecturer Phone
Sem.
Grade
A121
Joy Egbert
166 Grattan Street
B.Com.
555-7771
ACC101
Accounting
Davern
T240C
8344-1846
1-11
H1
A121
Joy Egbert
166 Grattan Street
B.Com.
555-7771
ECO101
Economics
Smyth
T240F
8344-1868
1-11
H2B
A121
Joy Egbert
166 Grattan Street
B.Com.
555-7771
ECO104
Quant. M.
Collier
T240D
8344-5716
1-11
H2B
A121
Joy Egbert
166 Grattan Street
B.Com.
555-7771
FIN101
Finance.
James
T240D
8344-5275
1-11
H2A
A121
Joy Egbert
166 Grattan Street
B.Com.
555-7771
ACC103
Processes
Wise
T240E
8344-5309
1-11
H3
A123
Larry Mueller
302 Royal Parade
B.Com.
555-1235
ACC101
Accounting
Davern
T240C
8344-1846
1-11
H1
A123
Larry Mueller
302 Royal Parade
B.Com.
555-1235
ECO101
Economics
Smyth
T240F
8344-1868
1-11
H2B
A123
Larry Mueller
302 Royal Parade
B.Com.
555-1235
ECO104
Quant. M.
Collier
T240D
8344-5716
1-11
H2A
A123
Larry Mueller
302 Royal Parade
B.Com.
555-1235
FIN101
Finance.
James
T240D
8344-5275
1-11
H3
A124
Mike Guon
224 Swanston St.
B.Eco.
555-2214
ACC101
Accounting
Davern
T240C
8344-1846
1-11
H2A
A124
Mike Guon
224 Swanston St.
B.Eco.
555-2214
ECO101
Economics
Smyth
T240F
8344-1868
1-11
H2A
A124
Mike Guon
224 Swanston St.
B.Eco.
555-2214
ECO104
Quant. M.
Collier
T240D
8344-5716
1-11
H2B
A124
Mike Guon
224 Swanston St.
B.Eco.
555-2214
ACC103
Processes
Wise
T240E
8344-5309
1-11
H2B
A126
Jackie Judson
85 Barry Street
B.Eco.
555-1245
ACC101
Accounting
Davern
T240C
8344-1846
1-11
H1
A126
Jackie Judson
85 Barry Street
B.Eco.
555-1245
ECO101
Economics
Smyth
T240F
8344-1868
1-11
H2B
A126
Jackie Judson
85 Barry Street
B.Eco.
555-1245
ECO104
Quant. M.
Collier
T240D
8344-5716
1-11
H2B
A126
Jackie Judson
85 Barry Street
B.Eco.
555-1245
ACC103
Processes
Wise
T240E
8344-5309
1-11
H2A
…
…
…
…
…
…
…
…
…
…
…
…
INFO20003 Database Systems © University of Melbourne 6
Anomalies in Denormalized Data:
• Consider the following denormalized table (relation) :
Student-ID
Course-ID
Fee
130
C200
75
200
C300
100
250
C200
75
425
C400
150
500
C300
100
575
C500
50
…
…
…
• Insertion Anomaly: A new course cannot be added until at least one student has enrolled (which comes first student or course?)
• Deletion Anomaly: If student 425 withdraws, we lose all record of course C400 and its fee!
• Update Anomaly: If the fee for course C200 changes, we have to change it in multiple records (rows), else the data will be inconsistent.
INFO20003 Database Systems © University of Melbourne 8
Normalisation
• A technique used to remove undesired redundancy from databases (Break one large table into several smaller tables).
A relation is normalized if all determinants are candidate keys
How do we normalise?
INFO20003 Database Systems © University of Melbourne 9
Invoice example
INFO20003 Database Systems © University of Melbourne 10
Invoice example – Spreadsheet Format
Invoice Number
Date
Customer Name
Customer Address
Sales Person
Terms
Product ID
Product Name
Unit Price
Quantity
Amount
Sub Total
INV0012
14-Aug- 09
John / Synex
128 Juanita Ave…
Charles Wooten
COD
PSV880. 006
AMD Athlon X2DC
580
6
3480
9463
PSV880. 037
PDC E5300
645
4
2580
LC.V890. 002
LG 8.5” LCD
230
10
2300
HPQ754. 071
HP LaserJet 5200
1103
1
1103
This is
not relational model
INV0013
15-Aug- 09
Mary / ThisCo
123 Smith Street…
Charles Wooten
COD
HP Q754.071
HP LaserJet 5200
1103
2
2206
3356
LCV890. 002
LG 8.5” LCD
230
5
1150
INFO20003 Database Systems © University of Melbourne 11
Invoice Number
INV0012
INV0013
Product ID
PSV880.006
PSV880.037
LC.V890.002
HPQ754.071
HPQ754.071
LCV890.002
Date
14-Aug-09
15-Aug-09
Product Name
AMD Athlon X2DC
PDC E5300
LG 8.5” LCD
HP LaserJet 5200
HP LaserJet 5200
LG 8.5” LCD
Customer Name
John / Synex
Mary / ThisCo
Invoice example – Spreadsheet Format
Unit Price
580
645
230
1103
1103
230
Customer Address
128 Juanita Ave…
123 Smith Street…
Quantity
6
4
10
1
2
5
Sales Person
Charles Wooten
Charles Wooten
Terms
COD
COD
Amount
3480
2580
2300
1103
2206
1150
Sub Total
9463
3356
Discount
0
0
Sales Tax
780.70
100
Shipping
0
0
Break into two
But…
How do we connect?
INFO20003 Database Systems © University of Melbourne 12
Invoice Number
INV0012
INV0013
Date
14-Aug-09
15-Aug-09
Customer Name
John / Synex
Mary / ThisCo
Product ID
PSV880.006
PSV880.037
LC.V890.00 2
HPQ754.07 1
HPQ754.07 1
LCV890.002
Invoice example – Spreadsheet Format
Product Name
Customer Address
128 Juanita Ave…
123 Smith Street…
AMD Athlon X2DC
PDC E5300
LG 8.5” LCD
HP LaserJet 5200
HP LaserJet 5200
LG 8.5” LCD
Unit Price
580
645
230
1103
1103
230
Sales Person
Charles Wooten
Charles Wooten
Terms
COD
COD
Quantity
6
4
10
1
2
5
Sub Total
9463
3356
Amount
3480
2580
2300
1103
2206
1150
Discount
0
0
Invoice Number
INV0012
INV0012
INV0012
INV0012
INV0013
INV0013
Sales Tax
780.70
100
Shipping
0
0
Add FK
INFO20003 Database Systems © University of Melbourne 13
Invoice Number
INV0012
INV0013
Date
14-Aug-09
15-Aug-09
Customer Name
John / Synex
Mary / ThisCo
Invoice example – Spreadsheet Format
Customer Address
128 Juanita Ave…
123 Smith Street…
Product Name
Sales Person
Charles Wooten
Charles Wooten
Terms
COD
COD
Sub Total
9463
3356
Discount
0
0
Sales Tax
780.70
100
Shipping
0
0
This is about product
This is about Order (invoice)
Product ID
PSV880.006
PSV880.037
LC.V890.00 2
HPQ754.07 1
HPQ754.07 1
LCV890.002
AMD Athlon X2DC
PDC E5300
LG 8.5” LCD
HP LaserJet 5200
HP LaserJet 5200
LG 8.5” LCD
Unit Price
580
645
230
1103
1103
230
Quantity
6
4
10
1
2
5
Amount
3480
2580
2300
1103
2206
1150
Invoice Number
INV0012
INV0012
INV0012
INV0012
INV0013
INV0013
INFO20003 Database Systems © University of Melbourne 14
Invoice Number
INV0012
INV0013
Product ID
PSV880.006
PSV880.037
LC.V890.00 2
HPQ754.07 1
HPQ754.07 1
LCV890.00
Date
14-Aug-09
15-Aug-09
Quantity
6
4
10
1
2
5
Customer Name
John / Synex
Mary / ThisCo
Amount
3480
2580
2300
1103
2206
1150
Invoice example – Spreadsheet Format
Customer Address
128 Juanita Ave…
123 Smith Street…
Invoice Number
INV0012
INV0012
INV0012
INV0012
INV0013
INV0013
Sales Person
Charles Wooten
Charles Wooten
Terms
COD
COD
Sub Total
9463
3356
Discount
0
0
Product ID
PSV880.006
PSV880.037
LC.V890.00 2
HPQ754.07 1
Sales Tax
780.70
100
Product Name
AMD Athlon X2DC
PDC E5300
LG 8.5” LCD
HP LaserJet 5200
Shipping
0
0
Break into two
Unit Price
580
645
230
1103
INFO20003 Database Systems © University of Melbourne 15
Invoice Number
INV0012
INV0013
Product ID
PSV880.006
PSV880.037
LC.V890.00 2
HPQ754.07 1
HPQ754.07 1
LCV890.00
Date
14-Aug-09
15-Aug-09
Quantity
6
4
10
1
2
5
Customer Name
John / Synex
Mary / ThisCo
Amount
3480
2580
2300
1103
2206
1150
Invoice example – Spreadsheet Format
Customer Address
128 Juanita Ave…
123 Smith Street…
Invoice Number
INV0012
INV0012
INV0012
INV0012
INV0013
INV0013
Sales Person
Charles Wooten
Charles Wooten
Terms
COD
COD
Sub Total
9463
3356
Discount
0
0
Product ID
PSV880.006
PSV880.037
LC.V890.00 2
HPQ754.07 1
Sales Tax
780.70
100
Product Name
AMD Athlon X2DC
PDC E5300
LG 8.5” LCD
HP LaserJet 5200
Shipping
0
0
What about amount?
Unit Price
580
645
230
1103
INFO20003 Database Systems © University of Melbourne 16
Invoice Number
INV0012
INV0013
Product ID
PSV880.006
PSV880.037
LC.V890.00 2
HPQ754.07 1
HPQ754.07 1
LCV890.00
Date
14-Aug-09
15-Aug-09
Quantity
6
4
10
1
2
5
Customer Name
John / Synex
Mary / ThisCo
Invoice Number
INV0012
INV0012
INV0012
INV0012
INV0013
INV0013
Invoice example – Spreadsheet Format
Customer Address
128 Juanita Ave…
123 Smith Street…
Sales Person
Charles Wooten
Charles Wooten
Terms
COD
COD
Sub Total
9463
3356
Discount
0
0
Product ID
PSV880.006
PSV880.037
LC.V890.00 2
HPQ754.07 1
Sales Tax
780.70
100
Product Name
AMD Athlon X2DC
PDC E5300
LG 8.5” LCD
HP LaserJet 5200
Shipping
0
0
What about sales person?
Could be derived
Unit Price
580
645
230
1103
INFO20003 Database Systems © University of Melbourne 17
Invoice Number
INV0012
INV0013
Product ID
PSV880.006
PSV880.037
LC.V890.00 2
HPQ754.07 1
HPQ754.07 1
LCV890.00
Date
14-Aug- 09
15-Aug- 09
Quantity
6
4
10
1
2
5
Customer Name
John / Synex
Mary / ThisCo
Invoice Number
INV0012
INV0012
INV0012
INV0012
INV0013
INV0013
Invoice example – Spreadsheet Format
Customer Address
128 Juanita Ave…
123 Smith Street…
Sales Person ID
1
1
Terms
COD
COD
What about customer?
Sub Total
9463
3356
Discount
0
0
Sales Tax
780.70
100
Product ID
PSV880.006
PSV880.037
LC.V890.00 2
HPQ754.07 1
Shipping
0
0
Sales Person ID
Sales Person
1
Charles Wooten
Product Name
AMD Athlon X2DC
PDC E5300
LG 8.5” LCD
HP LaserJet 5200
Unit Price
580
645
230
1103
INFO20003 Database Systems © University of Melbourne 18
Invoice Number
INV0012
INV0013
Product ID
PSV880.006
PSV880.037
LC.V890.00 2
HPQ754.07 1
HPQ754.07 1
LCV890.00
Date
14-Aug-09
15-Aug-09
Quantity
6
4
10
1
2
5
Customer ID
1
2
Invoice Number
INV0012
INV0012
INV0012
INV0012
INV0013
INV0013
Invoice example – Spreadsheet Format
Sales Person ID
1
1
Terms
COD
COD
Sub Total
9463
3356
Discount
0
0
Sales Tax
780.70
100
Shipping
0
0
Sales Person ID
Sales Person
1
Charles Wooten
Product ID
Product Name
Unit Price
PSV880.006
AMD Athlon X2DC
580
PSV880.037
PDC E5300
645
LC.V890.00 2
LG 8.5” LCD
230
HPQ754.07 1
HP LaserJet 5200
1103
Customer ID
Customer Name
Customer Address
1
John / Synex
128 Juanita Ave…
2
Mary / ThisCo
123 Smith Street…
INFO20003 Database Systems © University of Melbourne 19
Normalized Relations and ER Diagram
• We can name the relations now
– Customer (CustomerNumber, CustomerName, CustomerAddress)
– Clerk (ClerkNumber, ClerkName)
– Product (ProductNumber, ProductDescription)
– Invoice (InvoiceNumber, Date, CustomerNumber, ClerkNumber)
– InvoiceLineItem (InvoiceNumber, ProductNumber, UniPrice, Quantity)
INFO20003 Database Systems © University of Melbourne 20 —
• Now let’s go back to theoretical concepts…
INFO20003 Database Systems © University of Melbourne 21
Functional Dependency
• A functional dependency concerns values of attributes in a relation
• A set of attributes X determines another set of
attributes Y if each value of X is associated with only
one value of Y
– Written X → Y
• X determines Y (If I know X then I also know Y)
• Emp# Emp-name • Emp# Salary
INFO20003 Database Systems © University of Melbourne 22
Functional Dependency: Definitions
• Determinants (X,Y→ Z) A(X, Y, Z, D) – the attribute(s) on the left hand side of the arrow
• Key and Non-Key attributes
– each attribute is either part of the primary key or it is not
• Partial functional dependency (Y→Z)
– a functional dependency of one or more non-key attributes
upon part (but not all) of the primary key
• Transitive dependency (Z→D)
– a functional dependency between 2 (or more) non-key
attributes
INFO20003 Database Systems © University of Melbourne 23 —
Armstrong’s Axioms
Functional dependencies can be identified using Armstrong’s Axioms
𝐴 = 𝑋1,𝑋2,…,𝑋𝑛 𝑎𝑛𝑑 𝐵 = (𝑌1,𝑌2,…,𝑌𝑛) 1. Reflexivity: 𝐵⊆𝐴⟹𝐴→𝐵
Example: Student_ID, name -> name
2. Augmentation: 𝐴→𝐵⟹AC→𝐵𝐶
Example: Student_ID -> name => Student_ID, surname ->name, surname
3. Transitivity: 𝐴→𝐵andB→𝐶⟹𝐴→𝐶 Example: ID -> birthdate and birthdate -> age then ID ->age
INFO20003 Database Systems © University of Melbourne 24
Steps in Normalisation
First Normal Form
Keep atomic data/Remove repeating groups Remove partial dependencies
Remove transitive dependencies
Second Normal Form
Third Normal Form
INFO20003 Database Systems © University of Melbourne 25
First Normal Form
Remove Repeating Groups
• repeating groups of attributes cannot be represented in a flat, two dimensional table
• removing cells with multiple values (keep atomic data)
Set of values
Example: Order-Item (Order#, Customer#, (Item#, Desc, Qty))
• Order-Item (Order#, Customer#, (Item#, Desc, Qty))
• Order-Item (Order#, Item#, Desc, Qty)
• Order (Order#, Customer#)
Break them into two Use PK/FK to connect
INFO20003 Database Systems © University of Melbourne 26
Second Normal Form
Remove Partial Dependencies
a non-key attribute cannot be identified by part of a composite key
Example: Order-Item (Order#, Item#, Desc, Qty)
• Order-Item (Order#, Item#, Desc, Qty) Partial dependency
Item (Item#, Desc)
Order-Item (Order#, Item#, Qty)
INFO20003 Database Systems © University of Melbourne 27
Partial Dependency Anomalies
Order-Item (Order#, Item#, Desc, Qty)
Order#
Item#
Desc
Qty
27
873
nut
2
28
402
bolt
1
28
873
nut
10
30
495
washer
50
• UPDATE
• DELETE
• INSERT
change item desc in many places
data for last item lost when last order for that item is deleted
cannot add new item until it is ordered
INFO20003 Database Systems © University of Melbourne 28
Solution to these Anomalies
Order-Item
Order#
27
28
28
30
Item#
873
402
873
2
1
10
Qty
delete last order for item, but item
ains
Item
rem
495
50
Item#
Desc
873
nut
402
bolt
495
washer
add new item at any time
change item description in one place
INFO20003 Database Systems © University of Melbourne 29
Third Normal Form
Remove Transitive Dependencies
a non-key attribute cannot be identified by another non-key attribute
Example: Employee (Emp#, Ename, Dept#, Dname)
• Employee (Emp#, Ename, Dept#, Dname)
Transitive dependency
• Employee (Emp#, Ename, Dept#)
• Department( Dept#, Dname)
INFO20003 Database Systems © University of Melbourne 30
Transitive Dependency Anomalies
Example: Employee (Emp#, Ename, Dept#, Dname)
Emp#
Ename
Dept#
Dname
10
Smith
D5
MIS
20
Jones
D7
Finance
25
Smith
D7
Finance
30
Black
D8
Sales
• UPDATE
• DELETE
• INSERT
change dept name in many places
data for dept lost when last employee for that dept is deleted
cannot add new dept until an employee is allocated to it
INFO20003 Database Systems © University of Melbourne 31
Solution to these Anomalies
Employee
delete last emp in dept, but dept remains
Emp#
Ename
Dept#
10
Smith
D5
20
Jones
D7
25
Smith
D7
30
Black
D8
Dept#
Dname
D5
MIS
D7
Finance
D8
Sales
add new dept at any time
change dept name in one place
INFO20003 Database Systems © University of Melbourne 32
Summary of Normalisation
First Normal Form
Remove repeating groups Remove partial dependencies Remove transitive dependencies
Second Normal Form
Third Normal Form
INFO20003 Database Systems © University of Melbourne 36
Normalisation vs Denormalization
Normalisation:
– Normalised relations contains a minimum amount of redundancy and allow users to insert, modify, and delete rows in tables without errors or inconsistencies (anomalies)
Denormalization:
– The pay-off: query speed.
– The price: extra work on updates to keep redundant data
consistent.
• Denormalization may be used to improve performance of
time-critical operations.
INFO20003 Database Systems © University of Melbourne 37
What’s Examinable?
• Normalisation Process
• Anomalies
• Functional dependencies
INFO20003 Database Systems © University of Melbourne 38 —
Next Lecture
• Hands on Normalization
INFO20003 Database Systems © University of Melbourne 39 —