module9
Database Management – GMU, Prof. Alex Brodsky. Module 9 1
Schema Refinement &
Normalization Theory
Module 9
Prof. Alex Brodsky
Database Systems
Database Management – GMU, Prof. Alex Brodsky. Module 9 2
What’s the Problem
❖ Consider relation obtained (call it SNLRHW)
Hourly_Emps(ssn, name, lot, rating, hrly_wages, hrs_worked)
❖ What if we know rating determines hrly_wages?
Database Management – GMU, Prof. Alex Brodsky. Module 9 3
Redundancy
❖ When part of data can be derived from other
parts, we say redundancy exists.
– Example: the hrly_wage of Smiley can be derived
from the hrly_wage of Attishoo because they have
the same rating and we know rating determines
hrly_wage.
❖ Redundancy exists because of of the existence
of integrity constraints.
Database Management – GMU, Prof. Alex Brodsky. Module 9 4
What’s the problem, again
❖ Update anomaly: Can we change W in
just the 1st tuple of SNLRWH?
❖ Insertion anomaly: What if we want to
insert an employee and don’t know the
hourly wage for his rating?
❖ Deletion anomaly: If we delete all
employees with rating 5, we lose the
information about the wage for rating 5!
Database Management – GMU, Prof. Alex Brodsky. Module 9 5
What do we do?
❖ Since constraints, in particular functional
dependencies, cause problems, we need to study
them, and understand when and how they
cause redundancy.
❖ When redundancy exists, refinement is needed.
– Main refinement technique: decomposition (replacing
ABCD with, say, AB and BCD, or ACD and ABD).
❖ Decomposition should be used judiciously:
– Is there reason to decompose a relation?
– What problems (if any) does the decomposition
cause?
Database Management – GMU, Prof. Alex Brodsky. Module 9 6
Refining an ER Diagram
❖ 1st diagram translated:
Workers(S,N,L,D,S)
Departments(D,M,B)
– Lots associated with workers.
❖ Suppose all workers in a
dept are assigned the same
lot: D L
❖ Can fine-tune this:
Workers2(S,N,D,S)
Departments(D,M,B,L)
®
lot
dname
budgetdid
since
name
Works_In DepartmentsEmployees
ssn
lot
dname
budget
did
since
name
Works_In DepartmentsEmployees
ssn
Before:
After:
Database Management – GMU, Prof. Alex Brodsky. Module 9 7
Functional Dependencies (FDs)
❖ A functional dependency (FD) has the form: X®Y,
where X and y are two sets of attributes.
– Examples: Rating®hrly_Wage, AB ®C
❖ The FD X®Y is satisfied by a relation instance r if:
– for each pair of tuples t1 and t2 in r:
t1[X] = t2[X] implies t1[Y] =t2[Y]
i.e., given any two tuples in r, if the X values agree,
then the Y values must also agree. (X and Y are
sets of attributes.)
❖ Convention: X, Y, Z etc denote sets of attributes,
and A, B, C, etc denote attributes.
Database Management – GMU, Prof. Alex Brodsky. Module 9 8
Functional Dependencies (FDs)
❖ The FD holds over relation name R if, for every
allowable instance r of R, r satisfies the FD.
❖ An FD, as an integrity constraint, is a statement about
all allowable relation instances.
– Must be identified based on semantics of application.
– Given some instance r1 of R, we can check if it violates some
FD f or not
– But we cannot tell if f holds over R by looking at an instance!
– This is the same for all integrity constraints!
Database Management – GMU, Prof. Alex Brodsky. Module 9 9
Example: Constraints on Entity Set
❖ Consider relation obtained from Hourly_Emps:
– Hourly_Emps (ssn, name, lot, rating, hrly_wages,
hrs_worked)
❖ Notation: We will denote this relation schema by
listing the attributes: SNLRWH
– This is really the set of attributes {S,N,L,R,W,H}.
– Sometimes, we will refer to all attributes of a relation
by using the relation name. (e.g., Hourly_Emps for
SNLRWH)
❖ Some FDs on Hourly_Emps:
– ssn is the key: S ® SNLRWH
– rating determines hrly_wages: R ® W
Database Management – GMU, Prof. Alex Brodsky. Module 9 10
One more example
A B C
1 1 2
1 1 3
2 1 3
2 1 2
How many possible
FDs totally on this
relation instance?
49.
FDs with A as
the left side:
Satisfied by
the relation
instance?
A®A yes
A®B yes
A®C No
A®AB yes
A®AC No
A®BC No
A®ABC No
Database Management – GMU, Prof. Alex Brodsky. Module 9 11
Violation of FD by a relation
❖ The FD X®Y is NOT satisfied by a relation
instance r if:
– There exists a pair of tuples t1 and t2 in r such that
t1[X] = t2[X] , but t1[y] ¹ t2[y]
i.e., we can find two tuples in r, such that X values
agree, but Y values don’t.
Database Management – GMU, Prof. Alex Brodsky. Module 9 12
Some other FDs
A B C
1 1 2
1 1 3
2 1 3
2 1 2
FD Satisfied by
the relation
instance?
C®B yes
C®AB No
B®C No
B®B Yes
AC ®B Yes [note!]
… …
Database Management – GMU, Prof. Alex Brodsky. Module 9 13
Relationship between FDs and Keys
❖ Given R(A, B, C).
– A®ABC means that A is a key.
❖ In general,
– X ® R means X is a (super)key.
❖ How about key constraint?
– ssn ®did
lot
dname
budgetdid
since
name
Works_In DepartmentsEmployees
ssn
Database Management – GMU, Prof. Alex Brodsky. Module 9 14
Reasoning About FDs
❖ Given some FDs, we can usually infer additional FDs:
– ssn® did, did ® lot implies ssn® lot
– A ® BC implies A ® B
❖ An FD f is logically implied by a set of FDs F, denoted F |= f, if for
every relational instance r, the following holds:
– if r satisfies all FD’s in F,
– then r satisfies f
❖ The closure of F, denoted is the set of all FDs that are
logically implied by F.
F
+
Database Management – GMU, Prof. Alex Brodsky. Module 9 15
Reasoning about FDs
❖ How do we get all the FDs that are logically
implied by a given set of FDs?
❖ Armstrong’s Axioms (X, Y, Z are sets of
attributes):
– Reflexivity: If Y Ê X, then X ® Y
– Augmentation: If X ® Y, then XZ ® YZ for any Z
– Transitivity: If X ® Y and Y ® Z, then X ® Z
Database Management – GMU, Prof. Alex Brodsky. Module 9 16
Armstrong’s axioms
❖ Armstrong’s axioms are sound and complete
inference rules for FDs!
– Sound: all the derived FDs (by using the axioms)
are those logically implied by the given set
– Complete: all the logically implied (by the given
set) FDs can be derived by using the axioms.
Database Management – GMU, Prof. Alex Brodsky. Module 9 17
Example of using Armstrong’s
Axioms
❖ Couple of additional rules (that follow from
AA):
– Union: If X ® Y and X ® Z, then X ® YZ
– Decomposition: If X ® YZ, then X ® Y and X ® Z
❖ Derive the above two by using Armstrong’s
axioms!
Database Management – GMU, Prof. Alex Brodsky. Module 9 18
Reasoning About FDs (Contd.)
❖ Example: Contracts(cid,sid,jid,did,pid,qty,value), and:
– C is the key: C ® CSJDPQV
– Project (jid) purchases each part using single contract:
JP ® C
– Dept purchases at most one part from a supplier: SD ® P
❖ JP ® C, C ® CSJDPQV imply JP ® CSJDPQV
❖ SD ® P implies SDJ ® JP
❖ SDJ ® JP, JP ® CSJDPQV imply SDJ ® CSJDPQV
Database Management – GMU, Prof. Alex Brodsky. Module 9 19
Reasoning About FDs (Contd.)
❖ Computing the closure of a set of FDs can be
expensive. (Size of closure is exponential in # attrs!)
❖ Typically, we just want to check if a given FD X ®Y is
in the closure of a set of FDs F.
❖ An efficient check:
– Compute attribute closure of X (denoted X+) wrt F:
◆ Set of all attributes A such that X ® A is in F+
◆ There is a linear time algorithm to compute this.
❖ Claim: F |= X à Y if and only if Y is in X+
❖ Example: Does F = {A ® B, B ® C, C D ® E } imply
A ® E?
– i.e, is A ® E in the closure F+? Equivalently, is E in A+?
Database Management – GMU, Prof. Alex Brodsky. Module 9 20
Computing X+
❖ Input F (a set of FDs), and X (a set of attributes)
❖ Output: Result=X+ (under F)
❖ Method:
– Step 1: Result :=X;
– Step 2: Take Y ® Z in F, and Y is in Result, do:
Result := Result union Z
– Repeat step 2 until Result cannot be changed and
then output Result.
Database Management – GMU, Prof. Alex Brodsky. Module 9 21
Example of computing X+
❖ F={A ®B, AC ®D, AB ®C}
❖ X=A
❖ Result should be X+=ABCD
Database Management – GMU, Prof. Alex Brodsky. Module 9 22
Computing F+
❖ Given F={ A ® B, B ® C}. Compute F+ (with
attributes A, B, C).
A B C AB AC BC ABC
A Ö Ö Ö Ö Ö Ö Ö
B Ö Ö Ö
C Ö
AB Ö Ö Ö Ö Ö Ö Ö
AC Ö Ö Ö Ö Ö Ö Ö
BC Ö Ö Ö
ABC Ö Ö Ö Ö Ö Ö Ö
Attribute closure
A+=ABC
B+=BC
C+=C
AB+=ABC
AC+=ABC
BC+=BC
ABC+=ABC
• An entry with Ö means FD (the row) ® (the column) is in F+.
• An entry gets Ö when (the column) is in (the row)+
Database Management – GMU, Prof. Alex Brodsky. Module 9 23
Check if two sets of FDs are
equivalent
❖ Two sets of FDs are equivalent if they
logically imply the same set of FDs.
– I.e., if F1+ = F2+, then they are equivalent.
❖ For example, F1={A ®B, A ®C} is equivalent
to F2={A ® BC}
❖ How to test? Two steps:
– Every FD in F1 is in F2+
– Every FD in F2 is in F1+
❖ These two steps can use the algorithm (many
times) for X+
Database Management – GMU, Prof. Alex Brodsky. Module 9 24
Summary
❖ Constraints give rise to redundancy
– Three anomalies
❖ FD is a “popular” type of constraint
– Satisfaction & violation
– Logical implication
– Reasoning