CS计算机代考程序代写 database algorithm Normalisation – Part 3

Normalisation – Part 3

Summary and Discussion

Summary of Normal Forms

1NF, 3NF and BCNF are popular in practice. Other normal forms are rarely
used.

1NF: only atomic values for attributes
(part of the definition for the relational data model);

2NF: an intermediate result in the history of database design
theory;

3NF: lossless and dependencies can be preserved;

BCNF: lossless but dependencies may not be preserved.

3NF can only minimise (not necessarily eliminate) redundancy. So a
relation schema in 3NF may still have update anomalies.

A relation schema in BCNF eliminates redundancy.

Why Denormalisation?

Do we need to normalize relation schemas in all cases when designing
a relational database?

The normalisation process may degrade performance when data are
frequently queried.

Since relation schemas are decomposed into many smaller ones after
normalisation, queries need to join many relations together in order to
return the results.

Unfortunately, join operation is very expensive.

When data is more frequently queried rather than being updated (e.g.,
data warehousing system), a weaker normal form is desired (i.e.,
denormalisation).

Denormalisation

Denormalisation is a design process that

happens after the normalisation process,
is often performed during the physical design stage, and
reduces the number of relations that need to be joined for certain
queries.

We need to distinguish:

Unnormalised – there is no systematic design.
Normalised – redundancy is reduced after a systematic design (to
minimise data inconsistencies).
Denormalised – redundancy is introduced after analysing the
normalised design (to improve efficiency of queries)

Trade-offs

 

A good database design is to find a balance between desired properties,
then normalise/denormalise relations to a desired degree.

Trade-offs – Data Redundancy vs. Query Efficiency

Normalisation: No Data Redundancy but No Efficient Query Processing
Data redundancies are eliminated in the following relations.

STUDENT
Name StudentID DoB
Tom 123456 25/01/1988

Michael 123458 21/04/1985

COURSE
CourseNo Unit

COMP2400 6
COMP8740 12

ENROL
StudentID CourseNo Semester
123456 COMP2400 2010 S2
123456 COMP8740 2011 S2
123458 COMP2400 2009 S2

However, the query for “list the names of students who enrolled in a course
with 6 units” requires 2 join operations.

SELECT Name, CourseNo FROM Enrol e, Course c, Student s WHERE

e.StudentID=s.StudentID and e.CourseNo=c.CourseNo and c.Unit=6;

Trade-offs – Data Redundancy vs. Query Efficiency

Denormalisation: Data Redundancy but Efficient Query Processing

If a student enrolled 15 courses, then the name and DoB of this student
need to be stored repeatedly 15 times in ENROLMENT.

ENROLMENT
Name StudentID DoB CourseNo Semester Unit
Tom 123456 25/01/1988 COMP2400 2010 S2 6
Tom 123456 25/01/1988 COMP8740 2011 S2 12

Michael 123458 21/04/1985 COMP2400 2009 S2 6

However, the query for “list the names of students who enrolled a course
with 6 units” can be processed efficiently (no join needed).

SELECT Name, CourseNo FROM Enrolment WHERE Unit=6;

Discussion

Both normalisation and denormalisation are useful in database design.

Normalisation: obtain database schema avoiding redundancies and
data inconsistencies

Denormalisation: join normalized relation schemata for the sake of
better query processing

Some problems of (de-)normalisation:

FDs cannot handle null values.

To apply normalisation, FDs must be fully specified.

The algorithms for normalisation are not deterministic, leading to
different decompositions.