程序代写代做代考 Normalisation

Normalisation

MONASH
INFORMATION
TECHNOLOGY

FIT2094

2

Data Normalisation
▪ Relations should be normalised in order to avoid anomalies

which may occur when inserting, updating and deleting data
– operates at the LOGICAL level.
▪ Normalisation is a systematic series of steps for

progressively refining the data model.
▪ A formal approach to analysing relations based on their

primary key (or candidate keys) and functional
dependencies.
▪ Used:

▪ as a design technique “bottom up design”, and
▪ as a way of validating table structures produced via

“top down design” (ER modelling)

3

Sample Data

4

Problems with data in Figure 6.1
▪PROJ_NUM intended to be primary key, but it contains nulls
▪JOB_CLASS invites entry errors eg. Elec. Eng. vs Elect. Engineer vs E.E.
▪Table has redundant data

– Details of a charge per hour are repeated for every occurrence of job
class

– Every time an employee is assigned to a project emp name repeated
▪Relations that contain redundant information may potentially suffer from
several update anomalies

– Types of update anomalies include:
• Insert Anomaly

–Insert a new employee only if they are assigned to a project
• Delete Anomaly

–Delete the last employee assigned to a project?
–Delete the last employee of a particular job class?

• Modification (or update) Anomaly
–Update a job class hourly rate – need to update multiple rows

5

The Normalisation Process Goals
▪ Creating valid relations, i.e. each relation meets the

properties of the relational model. In particular:
– Entity integrity
– Referential integrity
– No many-to-many relationship
– Each cell contains a single value (is atomic).

▪ In practical terms:
– Each table represents a single subject
– No data item will be unnecessarily stored in more than one table.
– The relationship between tables can be established (pair of PK

and FK is identified).
– Each table is void of insert, update and delete anomalies.

6

ORDER ( orderno, orderdate, custnumb, custname, custaddress
(partno, partdesc, qtyordered, lineprice))
● Note this is not a relation
● (partno, partdesc, qtyordered, lineprice) – is a multivalued set of

attributes – called a repeating group in normalisation terminology

Representing a form as a relation

7

Functional Dependency Revisited

▪ An attribute B is FUNCTIONALLY DEPENDENT on another attribute A, if a value of A
determines a single value of B at any one time.

– A ➔ B
– EMP# ➔ EMP_NAME
– CUSTNUMB ➔ CUSTNAME
– ORDER-NUMBER ➔ ORDER-DATE

• ORDER-NUMBER – independent variable, also know as the
DETERMINANT

• ORDER-DATE – dependent variable
▪ TOTAL DEPENDENCY

– attribute A determines B AND attribute B determines A
• EMPLOYEE-NUMBER ➔ TAX-FILE-NUMBER
• TAX-FILE-NUMBER ➔ EMPLOYEE-NUMBER

8

Functional Dependency
▪ For a composite PRIMARY KEY, it is possible to have FULL or PARTIAL

dependency.
▪ FULL DEPENDENCY

– occurs when an attribute is always dependent on all attributes in the
composite PK

– ORDER-NUMBER, PART-NUMBER ➔ QTY-ORDERED

▪ Lack of full dependency for multiple attribute key = PARTIAL DEPENDENCY
– ORDER-NUMBER, PART-NUMBER

➔ QTY-ORDERED, PART-DESCRIPTION
– here although qty-ordered is fully dependent on order-number and

part-number, only part-number is required to determine part-description
– part-description is said to be partially dependent on order-number and

part-number

9

Functional Dependency
▪ TRANSITIVE DEPENDENCY

– occurs when Y depends on X, and Z depends on Y – thus Z also
depends on X ie. X ➔ Y ➔ Z

– and Y is not a candidate key (or part of a candidate key)
– ORDER-NUMB ➔ CUSTOMER-NUMB ➔ CUSTOMER-NAME

▪ Dependencies are depicted with the help of a Dependency Diagram.
▪ Normalisation converts a relation into relations of progressively smaller

number of attributes and tuples until an optimum level of decomposition
is reached – little or no data redundancy exists.

▪ The output from normalisation is a set of relations that meet all
conditions set in the relational model principles.

10

11

Unormalised Form (UNF)
▪ Identify a “subject” that needs to be modelled

– For example from figure 6.1 possible “subjects” of
interest:

• PROJECT (we will call this representation 1)
• EMPLOYEE_PROJECT_ASSIGNMENT (we will

abbreviate this as ASSIGNMENT and will call this
representation 2).

▪ Choose one subject of interest as a starting point and
identify a primary key for this subject of interest.

– For example for PROJECT, primary key would be
project _number (or we will abbreviate it as proj_num).

12

First Normal Form
▪FIRST NORMAL FORM (part of formal definition of a
relation)

– A RELATION IS IN FIRST NORMAL FORM (1NF) IF:
• a unique primary key has been identified for each
tuple/row.

• it is a valid relation
– Entity integrity (no part of PK is null)
– Single value for each cell.
– No repeating group.

• all attributes are functionally dependent on all or
part of the primary key

13

UNF to 1NF transformation

▪ Identify the repeating group(s), if any, in the
unnormalised relation.

– For representation 1, a project will have more
than one employee assigned to it, hence there
is a repeating group.

– We have one-to-many relationship from
PROJECT to EMPLOYEE.

16

UNF to 1NF
▪Move from UNF to 1NF by:

1. identify a unique identifier for the repeating
group.

2. remove the repeating group along with the PK
of the main relation.

3. The PK of the new relation resulting from the
removal of repeating group will normally have
a composite PK made up of the PK of the main
relation and the unique identifier chosen in 1.
above, but this must be checked.

19

1NF to 2NF

▪ A RELATION IS IN 2NF IF –
– all non key attributes are functionally dependent on

the entire primary key (simplified definition)
• i.e. no partial dependencies exist

– all non key attributes are functionally dependent on
any candidate key (general definition)

– for this unit we will only use the simplified definition
ie. look for partial dependencies based on the
primary key

29

2NF to 3NF
▪ A RELATION IS IN 3NF IF –

– all transitive dependencies have been removed
– check for non key attribute dependent on
another non key attribute

▪Move from 2NF to 3NF by removing transitive
dependencies

37

Monash Software EMPLOYEE form
▪ List all attributes found on the form, maintain consistency

with previously used attribute names if exist:
– emp_no, emp_fname, emp_lname, emp_dob,

emp_street_no, emp_street, emp_town, emp_pcode,
phone_type, phone_no, degree_name,
degree_institution, degree_year, fmemb_no,
fmemb_name, fmemb_dob, skill_name

▪ Determine if any attribute is multivalued for a given entity
instance
– phone_type, phone_no, degree_name,

degree_institution, degree_year, fmemb_no,
fmemb_name, fmemb_dob, skill_name

46

Summary
▪ Things to remember

– Primary Key selection in moving from UNF to 1NF is
important, it will determine the starting point (choose
your subject of interest).

– Functional dependency
– Process of removing attributes in relations based on

the concept of 1NF, 2NF and 3NF.
• UNF to 1NF define PK & remove repeating group.
• 1NF to 2NF remove partial dependency.
• 2NF to 3NF remove transitive dependency.