程序代写代做代考 Normalisation

Normalisation

FIT2094

MONASH
INFORMATION
TECHNOLOGY

2

Data Normalisation
▪ Relations should be normalised in order to avoid anomalies

which may occur when inserting, updating and deleting data
– operates at the LOGICAL level.
▪ Normalisation is a systematic series of steps for

progressively refining the data model.
▪ A formal approach to analysing relations based on their

primary key (or candidate keys) and functional
dependencies.
▪ Used:

▪ as a design technique “bottom up design”, and
▪ as a way of validating table structures produced via

“top down design” (ER modelling)

3

Sample Data

4

Problems with data in Figure 6.1
▪PROJ_NUM intended to be primary key, but it contains nulls
▪JOB_CLASS invites entry errors eg. Elec. Eng. vs Elect. Engineer vs E.E.
▪Table has redundant data

– Details of a charge per hour are repeated for every occurrence of job
class

– Every time an employee is assigned to a project emp name repeated
▪Relations that contain redundant information may potentially suffer from
several update anomalies

– Types of update anomalies include:
• Insert Anomaly

–Insert a new employee only if they are assigned to a project
• Delete Anomaly

–Delete the last employee assigned to a project?
–Delete the last employee of a particular job class?

• Modification (or update) Anomaly
–Update a job class hourly rate – need to update multiple rows

5

The Normalisation Process Goals
▪ Creating valid relations, i.e. each relation meets the

properties of the relational model. In particular:
– Entity integrity
– Referential integrity
– No many-to-many relationship
– Each cell contains a single value (is atomic).

▪ In practical terms:
– Each table represents a single subject
– No data item will be unnecessarily stored in more than one table.
– The relationship between tables can be established (pair of PK

and FK is identified).
– Each table is void of insert, update and delete anomalies.

6

ORDER ( orderno, orderdate, custnumb, custname, custaddress
(partno, partdesc, qtyordered, lineprice))
● Note this is not a relation
● (partno, partdesc, qtyordered, lineprice) – is a multivalued set of

attributes – called a repeating group in normalisation terminology

Representing a form as a relation

7

Functional Dependency Revisited

▪ An attribute B is FUNCTIONALLY DEPENDENT on another attribute A, if a value of A
determines a single value of B at any one time.

– A ➔ B
– EMP# ➔ EMP_NAME
– CUSTNUMB ➔ CUSTNAME
– ORDER-NUMBER ➔ ORDER-DATE

• ORDER-NUMBER – independent variable, also know as the
DETERMINANT

• ORDER-DATE – dependent variable
▪ TOTAL DEPENDENCY

– attribute A determines B AND attribute B determines A
• EMPLOYEE-NUMBER ➔ TAX-FILE-NUMBER
• TAX-FILE-NUMBER ➔ EMPLOYEE-NUMBER

8

Functional Dependency
▪ For a composite PRIMARY KEY, it is possible to have FULL or PARTIAL

dependency.
▪ FULL DEPENDENCY

– occurs when an attribute is always dependent on all attributes in the
composite PK

– ORDER-NUMBER, PART-NUMBER ➔ QTY-ORDERED

▪ Lack of full dependency for multiple attribute key = PARTIAL DEPENDENCY
– ORDER-NUMBER, PART-NUMBER

➔ QTY-ORDERED, PART-DESCRIPTION
– here although qty-ordered is fully dependent on order-number and

part-number, only part-number is required to determine part-description
– part-description is said to be partially dependent on order-number and

part-number

9

Functional Dependency
▪ TRANSITIVE DEPENDENCY

– occurs when Y depends on X, and Z depends on Y – thus Z also
depends on X ie. X ➔ Y ➔ Z

– and Y is not a candidate key (or part of a candidate key)
– ORDER-NUMB ➔ CUSTOMER-NUMB ➔ CUSTOMER-NAME

▪ Dependencies are depicted with the help of a Dependency Diagram.

▪ Normalisation converts a relation into relations of progressively smaller
number of attributes and tuples until an optimum level of decomposition
is reached – little or no data redundancy exists.

▪ The output from normalisation is a set of relations that meet all
conditions set in the relational model principles.

10

11

Unormalised Form (UNF)
▪ Identify a “subject” that needs to be modelled

– For example from figure 6.1 possible “subjects” of
interest:

• PROJECT (we will call this representation 1)
• EMPLOYEE_PROJECT_ASSIGNMENT (we will

abbreviate this as ASSIGNMENT and will call this
representation 2).

▪ Choose one subject of interest as a starting point and
identify a primary key for this subject of interest.

– For example for PROJECT, primary key would be
project _number (or we will abbreviate it as proj_num).

12

First Normal Form
▪FIRST NORMAL FORM (part of formal definition of a
relation)

– A RELATION IS IN FIRST NORMAL FORM (1NF) IF:
• a unique primary key has been identified for each
tuple/row.

• it is a valid relation
– Entity integrity (no part of PK is null)
– Single value for each cell.
– No repeating group.

• all attributes are functionally dependent on all or
part of the primary key

13

▪ Identify the repeating group(s), if any, in the
unnormalised relation.

– For representation 1, a project will have more
than one employee assigned to it, hence there
is a repeating group.

– We have one-to-many relationship from
PROJECT to EMPLOYEE.

14

Q1. Is there any repeating group for the
ASSIGNMENT relation?

A. Yes (what is it)?
B. No

15

▪ Representation 1:

– PROJECT (proj_num, proj_name (emp_num, emp_name, ….))

▪ Representation 2:

– ASSIGNMENT(proj_num, emp_num, proj_name, emp_name,

job_class, …)

16

UNF to 1NF
▪Move from UNF to 1NF by:

identify a unique identifier for the repeating
group.
remove the repeating group along with the PK
of the main relation.
The PK of the new relation resulting from the
removal of repeating group will normally have
a composite PK made up of the PK of the main
relation and the unique identifier chosen in 1.
above, but this must be checked.

17

UNF to 1NF (representation 1)
PROJECT (proj_num, proj_name)

ASSIGNMENT (proj_num, emp_num, emp_name,
job_class, chg_hour, assign_hours)

18

Q2. Assume that we have started with ASSIGNMENT as the
chosen “subject” (representation 2) in UNF and its UNF is in
the form of:

A. Two relations
– PROJECT (proj_num, proj_name) and
– ASSIGNMENT (proj_num, emp_num, emp_name,
job_class, chg_hour, assign_hours)

B. ASSIGNMENT (proj_num, emp_num, proj_name, emp_name,
job_class, chg_hour, assign_hours)

C. PROJECT (proj_num, proj_name)
D. ASSIGNMENT (proj_num, emp_num, job_class, proj_name,

emp_name, chg_hour, assign_hours)

19

1NF to 2NF

▪ A RELATION IS IN 2NF IF –
– all non key attributes are functionally dependent on

the entire primary key (simplified definition)
• i.e. no partial dependencies exist

– all non key attributes are functionally dependent on
any candidate key (general definition)

– for this unit we will only use the simplified definition
ie. look for partial dependencies based on the
primary key

20

Q3. Which of the following attributes has a partial
dependency in the relation ASSIGNMENT?:

A. proj_name
B. emp_name
C. job_class
D. chg_hrs
E. assign_hours
F. More than one option is correct.

21

Q4. Which of the following attributes has a transitive
dependency in the relation ASSIGNMENT?:

A. proj_name
B. emp_name
C. job_class
D. chg_hrs
E. assign_hours
F. More than one option is correct.

22

Representation 2: Dependency Diagram (1NF)

Alternative notation:
proj_no, emp_no -> hours FULL
proj_no -> proj_name PARTIAL
emp_no -> emp_name, job_class PARTIAL
job_class -> chr_hr TRANSITIVE

FULL
proj_no, emp_no -> hours

23

1NF to 2NF Representation 1
▪ Move from 1NF to 2NF by removing partial dependencies

– 1NF: PROJECT (proj_num, proj_name)

– 1NF: ASSIGNMENT (proj_num, emp_num,

emp_name, job_class, chg_hour, assign_hours)

▪ 1NF: PROJECT (proj_num, proj_name)

– already in 2NF only one attribute in PK thus CANNOT

be any partial dependencies

• 2NF: PROJECT (proj_num, proj_name)

24

1NF to 2NF (Representation 1)
▪ 1NF: ASSIGNMENT (proj_num, emp_num, emp_name, job_class,

chg_hour, assign_hours)
– emp_name, job_class and depend on emp_num (partial

dependency), chg_hour depends on job_class (transitive).
– These attributes need to be removed into a new relation.
– Also include the part of PK that causes the partial dependency to

be the PK of the new relation.
• 2NF EMPLOYEE (emp_num, emp_name, job_class,

chg_hour)
– It leaves ASSIGN relation with the remaining attributes.

• 2NF ASSIGNMENT (proj_num, emp_num, assign_hours)

25

2NF Representation 1
PROJECT (proj_num, proj_name)

EMPLOYEE (emp_num, emp_name, job_class,
chg_hour)

ASSIGNMENT (proj_num, emp_num,
assign_hours)

26

Q5. Where is the location of the FK for the relations
below?

EMPLOYEE (emp_num, emp_name, job_class, chg_hour)
ASSIGNMENT (proj_num, emp_num, assign_hours)
PROJECT (proj_num, proj_name)

A. EMPLOYEE
B. ASSIGNMENT
C. PROJECT
D. More than one answer is correct

27

2NF Conversion Results (Representations 1 & 2)

Note Representations 1 & 2 now equivalent

28

Q6. What type of relationship is the relationship between:
ASSIGNMENT and EMPLOYEE
and
ASSIGNMENT and PROJECT

EMPLOYEE (emp_num, emp_name, job_class, chg_hour)
ASSIGNMENT (proj_num, emp_num, assign_hours)

PROJECT (proj_num, proj_name)

A. non-identifying, non-identifying
B. identifying, identifying
C. identifying, non-identifying
D. non-identifying, identifying

29

2NF to 3NF
▪ A RELATION IS IN 3NF IF –

– all transitive dependencies have been removed
– check for non key attribute dependent on
another non key attribute

▪Move from 2NF to 3NF by removing transitive
dependencies

30

2NF to 3NF
▪ PROJECT and ASSIGN already in 3NF

– 3NF PROJECT (proj_num, proj_name)
– 3NF ASSIGNMENT (proj_num, emp_num, assign_hours)

▪ 2NF EMPLOYEE (emp_num, emp_name, job_class, chg_hour)
– It has transitive dependency, job_class-> chg_hour.

• Remove the attributes with transitive dependency into a
new relation.

• The determinant will be an attribute in both the original
and new relations (it will become the PK and FK
relationship)

• Assign the determinant to be the PK of the new relation.

31

2NF to 3NF
▪ After the removal of transitive dependency in

EMPLOYEE, we have:
– 3NF EMPLOYEE (emp_num, emp_name,

job_class)
– 3NF JOB (job_class, chg_hour)

32

Relations in 3NF

33

Q7. Where is the location of the FK for the relations
below?

EMPLOYEE (emp_num, emp_name, job_class)
JOB (job_class, chg_hour)

A. EMPLOYEE
B. JOB
C. Both EMPLOYEE and JOB

34

Q8. What type of relationship is the relationship
between the JOB and EMPLOYEE?

EMPLOYEE (emp_num, emp_name, job_class)
JOB (job_class, chg_hour)

A. non-identifying
B. identifying
C. Cannot be determined

35

Entire Process UNF to 3NF (representation 1)
▪ UNF

– PROJECT (proj_num, proj_name (emp_num, emp_name, job_class,
chg_hour, assign_hours))

▪ 1NF – remove repeating group
– PROJECT (proj_num, proj_name)
– ASSIGNMENT (proj_num, emp_num, emp_name, job_class, chg_hour,

assign_hours)
▪ 2NF – remove partial dependencies

– PROJECT (proj_num, proj_name)
– EMPLOYEE (emp_num, emp_name, job_class, chg_hour)
– ASSIGNMENT (proj_num, emp_num, assign_hours)

▪ 3NF – remove transitive dependencies
– PROJECT (proj_num, proj_name)
– EMPLOYEE (emp_num, emp_name, job_class)
– ASSIGNMENT (proj_num, emp_num, assign_hours)
– JOB (job_class, chg_hour)

36

Entire Process UNF to 3NF (representation 2)
▪ UNF

– ASSIGNMENT (proj_num,emp_num ,proj_name, emp_name, job_class,
chg_hour, assign_hours)

▪ 1NF – remove repeating group
– ASSIGNMENT (proj_num,emp_num ,proj_name, emp_name, job_class,

chg_hour, assign_hours) => 1NF is only identify PK, no repeating group.
▪ 2NF – remove partial dependencies

– ASSIGNMENT (proj_num, emp_num, assign_hours)
– PROJECT (proj_num, proj_name)
– EMPLOYEE (emp_num, emp_name, job_class, chg_hour)

▪ 3NF – remove transitive dependencies
– ASSIGNMENT (proj_num, emp_num, assign_hours)
– PROJECT (proj_num, proj_name)
– EMPLOYEE (emp_num, emp_name, job_class)
– JOB (job_class, chg_hour)

37

Monash Software EMPLOYEE form
▪ List all attributes found on the form, maintain consistency

with previously used attribute names if exist:
– emp_no, emp_fname, emp_lname, emp_dob,

emp_street_no, emp_street, emp_town, emp_pcode,
phone_type, phone_no, degree_name,
degree_institution, degree_year, fmemb_no,
fmemb_name, fmemb_dob, skill_name

▪ Determine if any attribute is multivalued for a given entity
instance
– phone_type, phone_no, degree_name,

degree_institution, degree_year, fmemb_no,
fmemb_name, fmemb_dob, skill_name

38

Monash Software EMPLOYEE form continued
▪ Group multivalued attributes that are related and place in brackets

EMPLOYEE (emp_no, emp_fname, emp_lname, emp_dob,
emp_street_no, emp_street, emp_town, emp_pcode, (phone_type,
phone_no), (degree_name, degree_institution, degree_year),
(fmemb_no, fmemb_name, fmemb_dob), (skill_name))

▪ This is our beginning UNF
– PK of main relation EMPLOYEE is emp_no
– Four repeating groups to remove

39

Monash Software EMPLOYEE form continued
EMPLOYEE (emp_no, emp_fname, emp_lname, emp_dob,
emp_street_no, emp_street, emp_town, emp_pcode, (phone_type,
phone_no), (degree_name, degree_institution, degree_year),
(fmemb_no, fmemb_name, fmemb_dob), (skill_name))
1NF
EMPLOYEE (emp_no, emp_fname, emp_lname, emp_dob,
emp_street_no, emp_street, emp_town, emp_pcode)
EMP_PHONE (emp_no, phone_no, phone_type)
EMP_QUALIFICATION (emp_no, degree_name, degree_institution,
degree_year)
FAMILY_MEMBER (emp_no, fmemb_no, fmemb_name, fmemb_dob)
EMPLOYEE_SKILL (emp_no, skill_name)

40

Monash Software EMPLOYEE form continued
2NF

There are no partial dependencies, relations are in 2NF

EMPLOYEE (emp_no, emp_fname, emp_lname, emp_dob,
emp_street_no, emp_street, emp_town, emp_pcode)
EMP_PHONE (emp_no, phone_no, phone_type)
EMP_QUALIFICATION (emp_no, degree_name, degree_institution,
degree_year)
FAMILY_MEMBER (emp_no, fmemb_no, fmemb_name, fmemb_dob)
EMPLOYEE_SKILL (emp_no, skill_name)

41

Monash Software EMPLOYEE form continued
3NF

There are no transitive dependencies, relations are in 3NF

EMPLOYEE (emp_no, emp_fname, emp_lname, emp_dob,
emp_street_no, emp_street, emp_town, emp_pcode)
EMP_PHONE (emp_no, phone_no, phone_type)
EMP_QUALIFICATION (emp_no, degree_name, degree_institution,
degree_year)
FAMILY_MEMBER (emp_no, fmemb_no, fmemb_name, fmemb_dob)
EMPLOYEE_SKILL (emp_no, skill_name)

42

Consolidate 3NF relations and Logical Model

▪ Check to see whether a new relation has been discovered from the
normalisation

▪ Check to see whether a new attribute has been discovered from the
normalisation

▪ Check whether FKs are correctly identified
▪ Update Logical Model

– Reflect changes back to Conceptual Model

43

44

Monash Software EMPLOYEE form continued
As we are operating at the logical level, surrogate keys may be added
AFTER normalisation has been completed to improve design:

EMPLOYEE (emp_no, emp_fname, emp_lname, emp_dob, emp_street_no,
emp_street, emp_town, emp_pcode)
EMP_PHONE (emp_no, phone_no, phone_type)
EMP_DEGREE (emp_no, degree_code, degree_year)
DEGREE (degree_code, degree_name, institution_code)
INSTITUTION (institution_code, institution_name)
FAMILY_MEMBER (emp_no, fmemb_no, fmemb_name, fmemb_dob)
EMPLOYEE_SKILL (emp_no, skill_code)
SKILL (skill_code, skill_name)
These surrogate keys must not be reflected on your conceptual model

45

* Note – labels and legend have
been omitted for clarity
– they must be shown in your
work

46

Summary
▪ Things to remember

– Primary Key selection in moving from UNF to 1NF is
important, it will determine the starting point (choose
your subject of interest).

– Functional dependency
– Process of removing attributes in relations based on

the concept of 1NF, 2NF and 3NF.
• UNF to 1NF define PK & remove repeating group.
• 1NF to 2NF remove partial dependency.
• 2NF to 3NF remove transitive dependency.