Slide 1
INFO20003 Database Systems
Lecture 16
Normalization
Semester 2 2018, Week 8
Dr Renata Borovica-Gajic
INFO20003 Database Systems © University of Melbourne 2018 2
Administration
• Academic integrity:
https://app.lms.unimelb.edu.au/bbcswebdav/institution/Engineering/MSE_academic_integrity/
Academic%20Integrity%20for%20MSE%20LMS.html
• I would love to get your feedback (only 2 min):
– https://www.surveymonkey.com/r/NCPZPWK
“Cheating – copying from or providing to another student or students an answer or answers to any assessment task
or essay:
• First year undergraduate student – failure of subject – Mark 0, Grade N;
• Any other student, second or subsequent offence – suspension or termination of enrolment – Mark 0, Grade N;
• Second or subsequent offence – termination of enrolment – Mark 0, Grade N. “
https://app.lms.unimelb.edu.au/bbcswebdav/institution/Engineering/MSE_academic_integrity/Academic Integrity for MSE LMS.html
WHY ARE DATABASES COOL ?
Featuring people from industry
Part 1
WHY ARE DATABASES COOL ?
Featuring people from industry
SNOWFLAKE COMPUTING, USA
INFO20003 Database Systems © University of Melbourne 2018 5
Learning Objectives
• By the end of this lecture, you should be able to:
– Define normalization
– Explain and identify database anomalies
– Define and identify functional dependencies
– Normalize relations to:
• 1st Normal Form (1NF)
• 2nd Normal Form (2NF)
• 3rd Normal Form (3NF)
INFO20003 Database Systems © University of Melbourne 2018 6
Motivation for normalization
• What happens if we don’t normalize?
INFO20003 Database Systems © University of Melbourne 2018 7
What’s wrong with the organization of
data in this table?
Student
ID#
Student
Name
Campus
Address
Degree Phone Subject
ID
Subject
Title
Lecturer
Name
A121 Joy Egbert 166 Grattan Street B.Com. 555-7771 ACC101 Accounting Davern
A121 Joy Egbert 166 Grattan Street B.Com. 555-7771 ECO101 Economics Smyth
A121 Joy Egbert 166 Grattan Street B.Com. 555-7771 ECO104 Quant. M. Collier
A121 Joy Egbert 166 Grattan Street B.Com. 555-7771 FIN101 Finance. James
A121 Joy Egbert 166 Grattan Street B.Com. 555-7771 ACC103 Processes Wise
A123 Larry Mueller 302 Royal Parade B.Com. 555-1235 ACC101 Accounting Davern
A123 Larry Mueller 302 Royal Parade B.Com. 555-1235 ECO101 Economics Smyth
A123 Larry Mueller 302 Royal Parade B.Com. 555-1235 ECO104 Quant. M. Collier
A123 Larry Mueller 302 Royal Parade B.Com. 555-1235 FIN101 Finance. James
A124 Mike Guon 224 Swanston St. B.Eco. 555-2214 ACC101 Accounting Davern
A124 Mike Guon 224 Swanston St. B.Eco. 555-2214 ECO101 Economics Smyth
Lecturer
Office
T240C
T240F
T240D
T240D
T240E
T240C
T240F
T240D
T240D
T240C
T240F
Lecturer
Phone
8344-1846
8344-1868
8344-5716
8344-5275
8344-5309
8344-1846
8344-1868
8344-5716
8344-5275
8344-1846
8344-1868
Sem.
1-11
1-11
1-11
1-11
1-11
1-11
1-11
1-11
1-11
1-11
1-11
Grade
H1
H2B
H2B
H2A
H3
H1
H2B
H2A
H3
H2A
H2A
A124 Mike Guon 224 Swanston St. B.Eco. 555-2214 ECO104 Quant. M. Collier
A124 Mike Guon 224 Swanston St. B.Eco. 555-2214 ACC103 Processes Wise
A126 Jackie Judson 85 Barry Street B.Eco. 555-1245 ACC101 Accounting Davern
A126 Jackie Judson 85 Barry Street B.Eco. 555-1245 ECO101 Economics Smyth
A126 Jackie Judson 85 Barry Street B.Eco. 555-1245 ECO104 Quant. M. Collier
A126 Jackie Judson 85 Barry Street B.Eco. 555-1245 ACC103 Processes Wise
… … … … … … … …
T240D
T240E
T240C
T240F
T240D
T240E
…
8344-5716
8344-5309
8344-1846
8344-1868
8344-5716
8344-5309
…
1-11
1-11
1-11
1-11
1-11
1-11
…
H2B
H2B
H1
H2B
H2B
H2A
…
INFO20003 Database Systems © University of Melbourne 2018 9
Anomalies in Denormalized Data:
• Consider the following
denormalized table (relation) :
• Insertion Anomaly: A new course cannot be added until at least one
student has enrolled (which comes first student or course?)
• Deletion Anomaly: If student 425 withdraws, we lose all record of course
C400 and its fee!
• Update Anomaly: If the fee for course C200 changes, we have to change
it in multiple records (rows), else the data will be inconsistent.
Student-ID Course-ID Fee
130 C200 75
200 C300 100
250 C200 75
425 C400 150
500 C300 100
575 C500 50
… … …
INFO20003 Database Systems © University of Melbourne 2018 10
Normalisation
A relation is normalized if
all determinants are
candidate keys
• A technique used to remove undesired redundancy from
databases (Break one large table into several smaller
tables).
How do we normalise?
INFO20003 Database Systems © University of Melbourne 2018 11
Invoice example
INFO20003 Database Systems © University of Melbourne 2018 12
Invoice example – Spreadsheet Format
Invoice
Number
Date Customer
Name
Customer
Address
Sales
Person
Terms Product
ID
Product
Name
Unit
Price
Quantity Amount Sub
Total
INV0012 14-Aug-
09
John /
Synex
128
Juanita
Ave…
Charles
Wooten
COD PSV880.
006
AMD
Athlon
X2DC
580 6 3480 9463
PSV880.
037
PDC
E5300
645 4 2580
LC.V890.
002
LG 8.5”
LCD
230 10 2300
HPQ754.
071
HP
LaserJet
5200
1103 1 1103
INV0013 15-Aug-
09
Mary /
ThisCo
123 Smith
Street…
Charles
Wooten
COD HP
Q754.071
HP
LaserJet
5200
1103 2 2206 3356
LCV890.
002
LG 8.5”
LCD
230 5 1150
This is not relational model
INFO20003 Database Systems © University of Melbourne 2018 13
Invoice example – Spreadsheet Format
Invoice
Number
Date Customer
Name
Customer
Address
Sales
Person
Terms Sub Total Discount Sales Tax Shipping
INV0012 14-Aug-09 John / Synex 128 Juanita
Ave…
Charles
Wooten
COD 9463 0 780.70 0
INV0013 15-Aug-09 Mary / ThisCo 123 Smith
Street…
Charles
Wooten
COD 3356 0 100 0
Product ID Product Name Unit Price Quantity Amount
PSV880.006 AMD Athlon
X2DC
580 6 3480
PSV880.037 PDC E5300 645 4 2580
LC.V890.002 LG 8.5” LCD 230 10 2300
HPQ754.071 HP LaserJet
5200
1103 1 1103
HPQ754.071 HP LaserJet
5200
1103 2 2206
LCV890.002 LG 8.5” LCD 230 5 1150
Break into two
But…
How do we connect?
INFO20003 Database Systems © University of Melbourne 2018 14
Invoice example – Spreadsheet Format
Invoice
Number
Date Customer
Name
Customer
Address
Sales
Person
Terms Sub Total Discount Sales Tax Shipping
INV0012 14-Aug-09 John / Synex 128 Juanita
Ave…
Charles
Wooten
COD 9463 0 780.70 0
INV0013 15-Aug-09 Mary / ThisCo 123 Smith
Street…
Charles
Wooten
COD 3356 0 100 0
Product ID Product
Name
Unit Price Quantity Amount Invoice
Number
PSV880.006 AMD Athlon
X2DC
580 6 3480 INV0012
PSV880.037 PDC E5300 645 4 2580 INV0012
LC.V890.00
2
LG 8.5” LCD 230 10 2300 INV0012
HPQ754.07
1
HP LaserJet
5200
1103 1 1103 INV0012
HPQ754.07
1
HP LaserJet
5200
1103 2 2206 INV0013
LCV890.002 LG 8.5” LCD 230 5 1150 INV0013
Add FK
INFO20003 Database Systems © University of Melbourne 2018 15
Invoice example – Spreadsheet Format
Invoice
Number
Date Customer
Name
Customer
Address
Sales
Person
Terms Sub Total Discount Sales Tax Shipping
INV0012 14-Aug-09 John / Synex 128 Juanita
Ave…
Charles
Wooten
COD 9463 0 780.70 0
INV0013 15-Aug-09 Mary / ThisCo 123 Smith
Street…
Charles
Wooten
COD 3356 0 100 0
Product ID Product
Name
Unit Price Quantity Amount Invoice
Number
PSV880.006 AMD Athlon
X2DC
580 6 3480 INV0012
PSV880.037 PDC E5300 645 4 2580 INV0012
LC.V890.00
2
LG 8.5” LCD 230 10 2300 INV0012
HPQ754.07
1
HP LaserJet
5200
1103 1 1103 INV0012
HPQ754.07
1
HP LaserJet
5200
1103 2 2206 INV0013
LCV890.002 LG 8.5” LCD 230 5 1150 INV0013
This is about
product
This is about
Order (invoice)
INFO20003 Database Systems © University of Melbourne 2018 16
Invoice example – Spreadsheet Format
Invoice
Number
Date Customer
Name
Customer
Address
Sales
Person
Terms Sub Total Discount Sales Tax Shipping
INV0012 14-Aug-09 John / Synex 128 Juanita
Ave…
Charles
Wooten
COD 9463 0 780.70 0
INV0013 15-Aug-09 Mary / ThisCo 123 Smith
Street…
Charles
Wooten
COD 3356 0 100 0
Product ID Quantity Amount Invoice
Number
PSV880.006 6 3480 INV0012
PSV880.037 4 2580 INV0012
LC.V890.00
2
10 2300 INV0012
HPQ754.07
1
1 1103 INV0012
HPQ754.07
1
2 2206 INV0013
LCV890.00 5 1150 INV0013
Break into two
Product ID Product
Name
Unit Price
PSV880.006 AMD Athlon
X2DC
580
PSV880.037 PDC E5300 645
LC.V890.00
2
LG 8.5” LCD 230
HPQ754.07
1
HP LaserJet
5200
1103
INFO20003 Database Systems © University of Melbourne 2018 17
Invoice example – Spreadsheet Format
Invoice
Number
Date Customer
Name
Customer
Address
Sales
Person
Terms Sub Total Discount Sales Tax Shipping
INV0012 14-Aug-09 John / Synex 128 Juanita
Ave…
Charles
Wooten
COD 9463 0 780.70 0
INV0013 15-Aug-09 Mary / ThisCo 123 Smith
Street…
Charles
Wooten
COD 3356 0 100 0
Product ID Quantity Amount Invoice
Number
PSV880.006 6 3480 INV0012
PSV880.037 4 2580 INV0012
LC.V890.00
2
10 2300 INV0012
HPQ754.07
1
1 1103 INV0012
HPQ754.07
1
2 2206 INV0013
LCV890.00 5 1150 INV0013
Product ID Product
Name
Unit Price
PSV880.006 AMD Athlon
X2DC
580
PSV880.037 PDC E5300 645
LC.V890.00
2
LG 8.5” LCD 230
HPQ754.07
1
HP LaserJet
5200
1103
What about amount?
INFO20003 Database Systems © University of Melbourne 2018 18
Invoice example – Spreadsheet Format
Invoice
Number
Date Customer
Name
Customer
Address
Sales
Person
Terms Sub Total Discount Sales Tax Shipping
INV0012 14-Aug-09 John / Synex 128 Juanita
Ave…
Charles
Wooten
COD 9463 0 780.70 0
INV0013 15-Aug-09 Mary / ThisCo 123 Smith
Street…
Charles
Wooten
COD 3356 0 100 0
Product ID Quantity Invoice
Number
PSV880.006 6 INV0012
PSV880.037 4 INV0012
LC.V890.00
2
10 INV0012
HPQ754.07
1
1 INV0012
HPQ754.07
1
2 INV0013
LCV890.00 5 INV0013
Product ID Product
Name
Unit Price
PSV880.006 AMD Athlon
X2DC
580
PSV880.037 PDC E5300 645
LC.V890.00
2
LG 8.5” LCD 230
HPQ754.07
1
HP LaserJet
5200
1103
Could be derived
What about sales person?
INFO20003 Database Systems © University of Melbourne 2018 19
Invoice example – Spreadsheet Format
Invoice
Number
Date Customer
Name
Customer
Address
Sales
Person ID
Terms Sub Total Discount Sales Tax Shipping
INV0012 14-Aug-
09
John /
Synex
128 Juanita
Ave…
1 COD 9463 0 780.70 0
INV0013 15-Aug-
09
Mary /
ThisCo
123 Smith
Street…
1 COD 3356 0 100 0
Product ID Quantity Invoice
Number
PSV880.006 6 INV0012
PSV880.037 4 INV0012
LC.V890.00
2
10 INV0012
HPQ754.07
1
1 INV0012
HPQ754.07
1
2 INV0013
LCV890.00 5 INV0013
Product ID Product
Name
Unit Price
PSV880.006 AMD Athlon
X2DC
580
PSV880.037 PDC E5300 645
LC.V890.00
2
LG 8.5” LCD 230
HPQ754.07
1
HP LaserJet
5200
1103
Sales
Person ID
Sales
Person
1 Charles
Wooten
What about customer?
INFO20003 Database Systems © University of Melbourne 2018 20
Invoice example – Spreadsheet Format
Invoice
Number
Date Customer
ID
Sales Person
ID
Terms Sub Total Discount Sales Tax Shipping
INV0012 14-Aug-09 1 1 COD 9463 0 780.70 0
INV0013 15-Aug-09 2 1 COD 3356 0 100 0
Product ID Quantity Invoice
Number
PSV880.006 6 INV0012
PSV880.037 4 INV0012
LC.V890.00
2
10 INV0012
HPQ754.07
1
1 INV0012
HPQ754.07
1
2 INV0013
LCV890.00 5 INV0013
Product ID Product
Name
Unit Price
PSV880.006 AMD Athlon
X2DC
580
PSV880.037 PDC E5300 645
LC.V890.00
2
LG 8.5” LCD 230
HPQ754.07
1
HP LaserJet
5200
1103
Sales
Person ID
Sales
Person
1 Charles
Wooten
Customer
ID
Customer
Name
Customer
Address
1 John /
Synex
128 Juanita
Ave…
2 Mary /
ThisCo
123 Smith
Street…
INFO20003 Database Systems © University of Melbourne 2018 21
Normalized Relations and ER Diagram
– –
• We can name the relations now
– Customer (CustomerNumber, CustomerName, CustomerAddress)
– Clerk (ClerkNumber, ClerkName)
– Product (ProductNumber, ProductDescription)
– Invoice (InvoiceNumber, Date, CustomerNumber, ClerkNumber)
– InvoiceLineItem (InvoiceNumber, ProductNumber, UniPrice, Quantity)
INFO20003 Database Systems © University of Melbourne 2018 22
• Now let’s go back to theoretical concepts…
INFO20003 Database Systems © University of Melbourne 2018 23
Functional Dependency
• A functional dependency concerns values of
attributes in a relation
• A set of attributes X determines another set of
attributes Y if each value of X is associated with only
one value of Y
– Written X → Y
• X determines Y (If I know X then I also know Y)
• Emp# Emp-name
• Emp# Salary
INFO20003 Database Systems © University of Melbourne 2018 24
Functional Dependency: Definitions
– –
• Determinants (X,Y Z)
– the attribute(s) on the left hand side of the arrow
• Key and Non-Key attributes
– each attribute is either part of the primary key or it is not
• Partial functional dependency (Y Z)
– a functional dependency of one or more non-key attributes
upon part (but not all) of the primary key
• Transitive dependency (Z D)
– a functional dependency between 2 (or more) non-key
attributes
A(X, Y, Z, D)
INFO20003 Database Systems © University of Melbourne 2018 25
Armstrong’s Axioms
Functional dependencies can be identified using
Armstrong’s Axioms
1. Reflexivity:
2. Augmentation:
3. Transitivity:
𝐴 = 𝑋1, 𝑋2, … , 𝑋𝑛 𝑎𝑛𝑑 𝐵 = (𝑌1, 𝑌2, … , 𝑌𝑛)
𝐴 → 𝐵 ⟹ AC → 𝐵𝐶
𝐴 → 𝐵 and B → 𝐶 ⟹ 𝐴 → 𝐶
𝐵 ⊆ 𝐴 ⟹ 𝐴 → 𝐵
Example: Student_ID, name -> name
Example: Student_ID -> name => Student_ID, surname ->name, surname
Example: ID -> birthdate and birthdate -> age then ID ->age
INFO20003 Database Systems © University of Melbourne 2018 26
Steps in Normalisation
First Normal
Form
Second
Normal Form
Third Normal
Form
Keep atomic data/Remove repeating groups
Remove partial dependencies
Remove transitive dependencies
INFO20003 Database Systems © University of Melbourne 2018 27
First Normal Form
Remove Repeating Groups
• repeating groups of attributes cannot be represented in a
flat, two dimensional table
• removing cells with multiple values (keep atomic data)
• Order-Item (Order#, Customer#, (Item#, Desc, Qty))
• Order-Item (Order#, Item#, Desc, Qty)
• Order (Order#, Customer#)
Example: Order-Item (Order#, Customer#, (Item#, Desc, Qty))
Set of values
Break them into two
Use PK/FK to connect
INFO20003 Database Systems © University of Melbourne 2018 28
Second Normal Form
Remove Partial Dependencies
a non-key attribute cannot be identified
by part of a composite key
• Order-Item (Order#, Item#, Desc, Qty)
Example: Order-Item (Order#, Item#, Desc, Qty)
Partial dependency
Order-Item (Order#, Item#, Qty)
Item (Item#, Desc)
INFO20003 Database Systems © University of Melbourne 2018 29
Partial Dependency Anomalies
Order# Item# Desc Qty
27
28
28
873
402
873
nut
bolt
nut
2
1
10
Order-Item (Order#, Item#, Desc, Qty)
30 495 washer 50
• UPDATE change item desc in many places
• DELETE data for last item lost when last order for
that item is deleted
• INSERT cannot add new item until it is ordered
INFO20003 Database Systems © University of Melbourne 2018 30
Solution to these Anomalies
Order# Item# Qty
27
28
28
873
402
873
2
1
10
Order-Item
30 495 50
Item# Desc
873
402
nut
bolt
495 washer
delete last
order for item,
but item
remains
add new item at any time
change item
description in one
place
Item
INFO20003 Database Systems © University of Melbourne 2018 31
Third Normal Form
Remove Transitive Dependencies
a non-key attribute cannot be identified
by another non-key attribute
• Employee (Emp#, Ename, Dept#, Dname)
• Employee (Emp#, Ename, Dept#)
• Department( Dept#, Dname)
Example: Employee (Emp#, Ename, Dept#, Dname)
Transitive dependency
INFO20003 Database Systems © University of Melbourne 2018 32
Transitive Dependency Anomalies
Emp# Ename Dept# Dname
10
20
25
Smith
Jones
Smith
D5
D7
D7
MIS
Finance
Finance
Example: Employee (Emp#, Ename, Dept#, Dname)
30 Black D8 Sales
• UPDATE change dept name in many places
• DELETE data for dept lost when last employee
for that dept is deleted
• INSERT cannot add new dept until an
employee is allocated to it
INFO20003 Database Systems © University of Melbourne 2018 33
Solution to these Anomalies
delete last emp in
dept, but dept
remainsEmp# Ename Dept#
10
20
25
Smith
Jones
Smith
D5
D7
D7
Employee
30 Black D8
Dept# Dname
D5
D7
MIS
Finance
D8 Sales
add new dept at any time
change dept
name in one
place
INFO20003 Database Systems © University of Melbourne 2018 37
Summary of Normalisation
First Normal
Form
Second
Normal Form
Third Normal
Form
Remove repeating groups
Remove partial dependencies
Remove transitive dependencies
INFO20003 Database Systems © University of Melbourne 2018 38
Normalisation vs Denormalization
Normalisation:
– Normalised relations contains a minimum amount of
redundancy and allow users to insert, modify, and delete
rows in tables without errors or inconsistencies
(anomalies)
Denormalization:
– The pay-off: query speed.
– The price: extra work on updates to keep redundant data
consistent.
• Denormalization may be used to improve performance of
time-critical operations.
INFO20003 Database Systems © University of Melbourne 2018 39
What’s Examinable?
– –
• Normalisation Process
• Anomalies
• Functional dependencies
• Denormalisation
INFO20003 Database Systems © University of Melbourne 2018 40
Next Lecture
– –
• Database Admin