module10
Database management – GMU, Prof. Alex Brodsky. Module 10 1
Schema Refinement &
Normalization Theory
Module 10
Prof. Alex Brodsky
Database Systems
Database management – GMU, Prof. Alex Brodsky. Module 10 2
What’s the Problem
❖ Consider relation obtained (call it SNLRHW)
Hourly_Emps(ssn, name, lot, rating, hrly_wages, hrs_worked)
❖ What if we know the FD: R® W holds?
Database management – GMU, Prof. Alex Brodsky. Module 10 3
Redundancy
❖ When part of data can be derived from other
parts, we say redundancy exists.
– Example: the hrly_wage of Smiley can be derived
from the hrly_wage of Attishoo because they have
the same rating and we know rating determines
hrly_wage.
❖ Redundancy exists because of of the existence
of integrity constraints (e.g., FD: R® W).
Database management – GMU, Prof. Alex Brodsky. Module 10 4
What’s the problem, again
❖ Update anomaly: Can we change W in
just the 1st tuple of SNLRWH?
❖ Insertion anomaly: What if we want to
insert an employee and don’t know the
hourly wage for his rating?
❖ Deletion anomaly: If we delete all
employees with rating 5, we lose the
information about the wage for rating 5!
Database management – GMU, Prof. Alex Brodsky. Module 10 5
What do we do? Decomposition
S N L R H
123-22-3666 Attishoo 48 8 40
231-31-5368 Smiley 22 8 30
131-24-3650 Smethurst 35 5 30
434-26-3751 Guldu 35 5 32
612-67-4134 Madayan 35 8 40
R W
8 10
5 7
S N L R W H
123-22-3666 Attishoo 48 8 10 40
231-31-5368 Smiley 22 8 10 30
131-24-3650 Smethurst 35 5 7 30
434-26-3751 Guldu 35 5 7 32
612-67-4134 Madayan 35 8 10 40
!”=
Database management – GMU, Prof. Alex Brodsky. Module 10 6
Functional Dependencies (FDs)
❖ A functional dependency (FD) has the form:
X®Y, where X and y are two sets of
attributes.
– Examples: R®W
❖ The FD X®Y is satisfied by a relation
instance r if:
– for each pair of tuples t1 and t2 in r:
t1[X] = t2[X] implies t1[Y] =t2[Y]
– i.e., given any two tuples in r, if the X values
agree, then the Y values must also agree. (X and
Y are sets of attributes.)
Database management – GMU, Prof. Alex Brodsky. Module 10 7
Reasoning About FDs
❖ Given some FDs, we can usually infer
additional FDs:
– ssn® did, did ® lot implies ssn® lot
– A ® BC implies A ® B
❖ An FD f is logically implied by a set of FDs F,
denoted by F |= f, if for every relational
instance r that satisfies all fd’s in F, f is also
satisfied.
– F+ = closure of F is the set of all FDs that are implied
by F.
Database management – GMU, Prof. Alex Brodsky. Module 10 8
Reasoning about FDs
❖ How do we get all the FDs that are logically
implied by a given set of FDs?
❖ Armstrong’s Axioms (X, Y, Z are sets of
attributes):
– Reflexivity: If X Ê Y, then X ® Y
– Augmentation: If X ® Y, then XZ ® YZ for any Z
– Transitivity: If X ® Y and Y ® Z, then X ® Z
Database management – GMU, Prof. Alex Brodsky. Module 10 9
Reasoning About FDs (Contd.)
❖ Computing the closure of a set of FDs can be
expensive. (Size of closure is exponential in # attrs!)
❖ Typically, we just want to check if a given FD X ®Y is
in the closure of a set of FDs F. An efficient check:
– Compute attribute closure of X (denoted X+) wrt F:
◆ Set of all attributes A such that X ® A is in F+
◆ There is a linear time algorithm to compute this.
– Check if Y is in X+
❖ Does F = {A ® B, B ® C, C D ® E } imply A ® E?
– i.e, is A ® E in the closure F+? Equivalently, is E in A+?
Database management – GMU, Prof. Alex Brodsky. Module 10 10
Computing X+
❖ Input F (a set of FDs), and X (a set of attributes)
❖ Output: Result=X+ (under F)
❖ Method:
– Step 1: Result :=X;
– Step 2: Take Y ® Z in F, and Y is in Result, do:
Result := Result union Z
– Repeat step 2 until Result cannot be changed and
then output Result.
Database management – GMU, Prof. Alex Brodsky. Module 10 11
Example of computing X+
❖ F={A ®B, AC ®D, AB ®C}
❖ X=A
❖ Result should be X+=ABCD
Database management – GMU, Prof. Alex Brodsky. Module 10 12
Normal Forms
❖ The first question: Is any refinement needed!
❖ Normal forms:
– If a relation is in a certain normal form (BCNF, 3NF
etc.), it is known that certain kinds of problems are
avoided/minimized. This can be used to help us
decide whether decomposing the relation will help.
❖ Role of FDs in detecting redundancy:
– Consider a relation R with 3 attributes, ABC.
◆ No FDs hold: There is no redundancy here.
◆ Given A ® B: Several tuples could have the same A value,
and if so, they’ll all have the same B value!
Database management – GMU, Prof. Alex Brodsky. Module 10 13
Boyce-Codd Normal Form (BCNF)
❖ Reln R with FDs F is in BCNF if, for each non-trivial fd
X ® A in F+ , X is a (super) key for R (i.e., X ® R in F+).
❖ In other words, R is in BCNF if the only non-trivial FDs
that hold over R are key constraints.
❖ If BCNF:
– No “data” in R can be predicted using FDs alone. Why:
– Because X is a key,
we can’t have two different tuples that
agree on the X value
X Y A
x y1 a
x y2 ?
Database management – GMU, Prof. Alex Brodsky. Module 10 14
Decomposition of a Relation Scheme
❖ When a relation schema is not in BCNF: decompose.
❖ Suppose that relation R contains attributes A1 … An.
A decomposition of R consists of replacing R by two or
more relations such that:
– Each new relation scheme contains a subset of the attributes
of R (and no attributes that do not appear in R), and
– Every attribute of R appears as an attribute of at least one of
the new relations.
❖ Intuitively, decomposing R means we will store
instances of the relation schemes produced by the
decomposition, instead of instances of R.
Database management – GMU, Prof. Alex Brodsky. Module 10 15
Decomposition example
S N L R H
123-22-3666 Attishoo 48 8 40
231-31-5368 Smiley 22 8 30
131-24-3650 Smethurst 35 5 30
434-26-3751 Guldu 35 5 32
612-67-4134 Madayan 35 8 40
R W
8 10
5 7
S N L R W H
123-22-3666 Attishoo 48 8 10 40
231-31-5368 Smiley 22 8 10 30
131-24-3650 Smethurst 35 5 7 30
434-26-3751 Guldu 35 5 7 32
612-67-4134 Madayan 35 8 10 40
!”=
Original relation
(not stored in DB!)
Decomposition
(in the DB)
Database management – GMU, Prof. Alex Brodsky. Module 10 16
Problems with Decompositions
❖ There are three potential problems to consider:
❶ Some queries become more expensive.
◆ e.g., How much did sailor Attishoo earn? (earn = W*H)
❷ Given instances of the decomposed relations, we may
not be able to reconstruct the corresponding instance
of the original relation!
◆ Fortunately, not in the SNLRWH example.
❸ Checking some dependencies may require joining the
instances of the decomposed relations.
◆ Fortunately, not in the SNLRWH example.
❖ Tradeoff: Must consider these issues vs.
redundancy.
Database management – GMU, Prof. Alex Brodsky. Module 10 17
Example of problem 2
Student_ID Name Dcode Cno Grade
123-22-3666 Attishoo INFS 501 A
231-31-5368 Guldu CS 102 B
131-24-3650 Smethurst INFS 614 B
434-26-3751 Guldu INFS 614 A
434-26-3751 Guldu INFS 612 C
Name Dcode Cno Grade
Attishoo INFS 501 A
Guldu CS 102 B
Smethurst INFS 614 B
Guldu INFS 614 A
Guldu INFS 612 C
Student_ID Name
123-22-3666 Attishoo
231-31-5368 Guldu
131-24-3650 Smethurst
434-26-3751 Guldu
!”
¹
Database management – GMU, Prof. Alex Brodsky. Module 10 18
Lossless Join Decompositions
❖ Decomposition of R into R1 and R2 is lossless-join
w.r.t. a set of FDs F if, for every instance r that
satisfies F, we have:
❖ It is always true that
❖ In general, the other direction does not hold! If
it does, the decomposition is lossless-join.
rrr RR =)()( 21 pp !”
)()(
21
rrr RR pp !”Í
Database management – GMU, Prof. Alex Brodsky. Module 10 19
Example (lossy decomposition)
A B C
1 2 3
4 5 6
7 2 8
1 2 8
7 2 3
A B C
1 2 3
4 5 6
7 2 8
A B
1 2
4 5
7 2
)()( rr BCAB pp !”
)(rBCp
)(rABp
r
Database management – GMU, Prof. Alex Brodsky. Module 10 20
Example (lossless join decomposition)
A B C
1 2 3
4 5 6
7 2 3
A B C
1 2 3
4 5 6
7 2 3
A B
1 2
4 5
7 2
B C
2 3
5 6
)()( rr BCAB pp !”
)(rBCp
)(rABp
r
BCBCABhaveWe ®Ç )(
Database management – GMU, Prof. Alex Brodsky. Module 10 21
Lossless Join Decomposition
❖ The decomposition of R into R1 and R2 is
lossless-join wrt F if and only if F+ contains:
– R1 ÇR2 ® R1, or
– R1 ÇR2 ® R2
❖ In particular, the decomposition of R into
(UV) and (R-V) is lossless-join if U ® V holds
on R
– assume U and V do not share attributes.
– WHY?
Database management – GMU, Prof. Alex Brodsky. Module 10 22
Decomposition
❖ Definition extended to decomposition into 3 or
more relations in a straightforward way.
❖ It is essential that all decompositions used to deal
with redundancy be lossless! (Avoids Problem (2).)
Database management – GMU, Prof. Alex Brodsky. Module 10 23
Decomposition into BCNF
❖ Consider relation R with FDs F. If X ® A in
F+ over R, violates BCNF, i.e.,
– XA are all in R
– A is not in X
– X ® R is not in F+
❖ Then: decompose R into R – A and XA.
❖ Repeated application of this idea will give us
a collection of relations that are in BCNF;
lossless join decomposition, and guaranteed
to terminate.
Database management – GMU, Prof. Alex Brodsky. Module 10 24
BCNF Decomposition Example
❖ Assume relation schema CSJDPQV
– key C, JP ® C, SD ® P, J ® S
❖ To deal with SD ® P, decompose into SDP, CSJDQV.
❖ To deal with J ® S, decompose CSJDQV into JS and
CJDQV
❖ A tree representation of the decomposition:
CSJDPQV
SDP CSJDQV
JS CJDQVUsing SD ® P
Using J ® S
Database management – GMU, Prof. Alex Brodsky. Module 10 25
BCNF Decomposition
❖ In general, several dependencies may cause
violation of BCNF. The order in which we
“deal with’’ them could lead to very different
sets of relations!
Database management – GMU, Prof. Alex Brodsky. Module 10 26
How do we know R is in BCNF?
❖ If R has only two attributes, then it is in BCNF
❖ If F only uses attributes in R, then:
– R is in BCNF if and only if for each X ® Y in F (not
F+!), X is a superkey of R, i.e., X ® R is in F+ (not
F!).
❖ In general (F may use attributes outside of R!
See example earlier for CSJDQV) ,
– Need to consider all FD X ® A in F+ (not F!).
Database management – GMU, Prof. Alex Brodsky. Module 10 27
BCNF and Dependency Preservation
❖ In general, there may not be a dependency
preserving decomposition into BCNF.
❖ E.g., schema CSZ with FDs: CS ® Z, Z ® C
❖ Can’t decompose while preserving CS ® Z, but
CSZ is not in BCNF.
Database management – GMU, Prof. Alex Brodsky. Module 10 28
Dependency Preserving Decomposition
❖ Consider CSJDPQV, C is key, JP ® C and SD
® P.
– BCNF decomposition: CSJDQV and SDP
– Problem: Checking JP ® C requires a join!
❖ Dependency preserving decomposition
(Intuitive):
– If R is decomposed into X, Y and Z, and we enforce
the FDs that hold on X, on Y and on Z, then all FDs
that were given to hold on R must also hold. (Avoids
Problem (3).)
Database management – GMU, Prof. Alex Brodsky. Module 10 29
What FD on a decomposition?
❖ Projection of set of FDs F: If R is decomposed
into X, … the projection of F onto X (denoted
FX ) is the set of FDs U ® V in F+ (closure of F )
such that U, V are in X.
Database management – GMU, Prof. Alex Brodsky. Module 10 30
Dependency Preserving Decompositions (Contd.)
❖ Decomposition of R into X and Y is dependency
preserving if (FX union FY ) + = F +
– i.e., if we consider only dependencies in the closure F + that
can be checked in X without considering Y, and in Y
without considering X, these imply all dependencies in F +.
❖ Important to consider F +, not F, in this definition:
– ABC, A ® B, B ® C, C ® A, decomposed into AB and BC.
– Is this dependency preserving? Is C ® A preserved?????
❖ Dependency preserving does not imply lossless join:
– ABC, A ® B, decomposed into AB and BC.
❖ And vice-versa! (Example?)
Database management – GMU, Prof. Alex Brodsky. Module 10 31
Another example
❖ Assume CSJDQV is decomposed into
SDP, JS, CJDQV
is not dependency preserving
w.r.t. the FDs: JP ® C, SD ® P and J ® S.
❖ However, it is a lossless join decomposition.
❖ In this case, adding JPC to the collection of
relations gives us a dependency preserving
decomposition.
❖ JPC tuples stored only for checking FD!
Database management – GMU, Prof. Alex Brodsky. Module 10 32
Third Normal Form (3NF)
❖ Reln R with FDs F is in 3NF if, for all X ® A in F+
– A in X (i.e., FD is trivial), or
– X contains a key for R, or
– A is part of some (candidate) key for R.
❖ Minimality of a (candidate) key is crucial in third
condition above!
Database management – GMU, Prof. Alex Brodsky. Module 10 33
Third Normal Form (3NF)
❖ If R is in BCNF, obviously in 3NF.
❖ If R is in 3NF, some redundancy is possible.
It is a compromise, used when BCNF not
achievable (e.g., no “good’’ decomposition,
or performance considerations).
– Lossless-join, dependency-preserving decomposition of
R into a collection of 3NF relations always possible.
Database management – GMU, Prof. Alex Brodsky. Module 10 34
What Does 3NF Achieve?
❖ If 3NF is violated by X®A, one of the following holds:
– X is a subset of some key K
◆ We store (X, A) pairs redundantly.
– X is not a proper subset of any key.
◆ There is a chain of FDs K ® X ® A, which means that we cannot
associate an X value with a K value unless we also associate an A
value with an X value.
❖ But: even if reln is in 3NF, these problems could arise.
– e.g., Reserves SBDC, S ®C, C ®S is in 3NF, but for each
reservation of sailor S, same (S, C) pair is stored.
❖ Thus, 3NF is indeed a compromise relative to BCNF.
Database management – GMU, Prof. Alex Brodsky. Module 10 35
Decomposition into 3NF
❖ Obviously, the algorithm for lossless join decomp into
BCNF can be used to obtain a lossless join decomp
into 3NF (typically, can stop earlier).
❖ To ensure dependency preservation, one idea:
– If X ® Y is not preserved, add relation XY.
– Problem is that XY may violate 3NF! e.g., consider the
addition of CJP to `preserve’ JP ® C. What if we also have
J ® C ?
❖ Refinement: Instead of the given set of FDs F, use a
minimal cover for F.
Database management – GMU, Prof. Alex Brodsky. Module 10 36
Minimal Cover for a Set of FDs
❖ Minimal cover G for a set of FDs F:
– Closure of F = closure of G.
– Right hand side of each FD in G is a single attribute.
– If we modify G by deleting an FD or by deleting attributes
from an FD in G, the closure changes.
❖ Intuitively, every FD in G is needed, and “as small as
possible’’ in order to get the same closure as F.
❖ e.g., A ® B, ABCD ® E, EF ® GH, ACDF ® EG
has the following minimal cover:
– A ® B, ACD ® E, EF ® G and EF ® H
❖ M.C. ® Lossless-Join, Dep. Pres. Decomp!!! (in book)
®
Database management – GMU, Prof. Alex Brodsky. Module 10 37
Summary of Schema Refinement
❖ If a relation is in BCNF, it is free of redundancies that
can be detected using FDs. Thus, trying to ensure
that all relations are in BCNF is a good heuristic.
❖ If a relation is not in BCNF, we can try to decompose
it into a collection of BCNF relations.
– Must consider whether all FDs are preserved. If a lossless-
join, dependency preserving decomposition into BCNF is
not possible (or unsuitable, given typical queries), should
consider decomposition into 3NF.
– Decompositions should be carried out and/or re-examined
while keeping performance requirements in mind.