Relational Database Design
1
10 Relational Database Design
Anomalies can be removed from relation designs by decomposing them until they are in a normal form.
Several problems should be investigated regarding a decomposition.
A decomposition of a relation scheme, R, is a set of relation schemes {R1, . . . ,Rn}
𝑅 = 𝑖𝑅 𝑛ڂ such that Ri⊆R for each i, and 1=𝑖
Note that in a decomposition {R1, . . . ,Rn}, the intersect of each pair of Ri and Rj does not have to be empty.
Example:R={A,B,C,D,E}, R1={A,B},R2={A,C},R3={C,D,E} A naive decomposition: each relation has only attribute.
A good decomposition should have the following two properties.
2

Dependency Preserving
Definition: Two sets F and G of FD’s are equivalent if F = G . Given a decomposition {R1, . . . ,Rn} of R:
. 𝑅∈𝑌,𝑅∈𝑋&𝐹∈𝑌→𝑋:𝑌→𝑋 = 𝐹 𝑖𝑖𝑖
The decomposition {R1, . . . ,Rn} of R is dependency preserving with respect to F if
++
𝐹𝑛=𝑖ڂ =+𝐹
.+ 𝑖 1=𝑖
3

Examples
F = { A→BC, D→EG, M→A }, R = (A, B, C, D, E, G, M, A) 1) Given R1= ( A, B, C, M) and R2 = (C, D, E, G),
F1 = { A→BC, M→A}, F2 = {D→EG}
F = F1 U F2. thus, dependency preserving
2) Suppose that F’ = F U {M→D}. R1 and R2 remain the same. Thus, F1 and F2 remain the same.
We need to verify if M→D is inferred by F1 U F2.
Since M+ | F1 U F2 = {M, A, B, C}, M→D is not inferred by F1 U F2. Thus, R1 and R2 are not dependency preserving regarding F’.
3) F’’ = {A→BC, D→EG, M→A , M→C, C→D, M→D} F1 = {A→BC, M→A, M→C}, F2 = {D→EG, C→D}
It can be verified that M→ D is inferred by F1 and F2.
Thus, F’’+ = (F1 U F2)+
Hence, R1 and R2 are dependency preserving regarding F’’.
4
Lossless Join Decomposition A second necessary property for decomposition:
A decomposition {R1, . . . ,Rn} of R is a lossless join decomposition with respect to a set F of FD’s if for every relation instance r that satisfies F:
𝑟=𝜋𝑅 𝑟⋈⋯⋈𝜋𝑅 𝑟. 1𝑛
If 𝑟⊂π𝑅1 𝑟 ⋈ ⋯ ⋈π𝑅𝑛 𝑟 , the decomposition is lossy.
5
Lossless Join Decomposition(cont)
Example 2:
Suppose that we decompose the following relation:
STUDENT_ADVISOR
Name Jones Ng Martin Dulles Duke James Evan Baxter
Department Comp Sci Chemistry Physics Decision Sci Mathematics Comp Sci Comp Sci English
Advisor Smith Turner Bosky Hall James Clark Smith Bronte
With dependencies ሼ𝑁𝑎𝑚𝑒 → 𝐷𝑒𝑝𝑎𝑟𝑡𝑚𝑒𝑛𝑡, 𝑁𝑎𝑚𝑒 → 𝐴𝑑𝑣𝑖𝑠𝑜𝑟, 𝐴𝑑𝑣𝑖𝑠𝑜𝑟 → 𝐷𝑒𝑝𝑎𝑟𝑡𝑚𝑒𝑛𝑡ሽ, into two relations:
6
Lossless Join Decomposition(cont)
STUDENT_DEPARTMENT DEPARTMENT_ADVISOR
Name
Department
Jones Ng Martin Duke Dulles James Evan Baxter
Comp Sci Chemistry Physics Mathematics Decision Sci Comp Sci Comp Sci English
Department
Advisor
Comp Sci Chemistry Physics Decision Sci Mathematics Comp Sci English
Smith Turner Bosky Hall James Clark Bronte
If we join these decomposed relations we get:
7
Lossless Join Decomposition(cont)
Name
Department
Advisor
Jones Jones Ng Martin Dulles Duke James James Evan Evan Baxter
Comp Sci Comp Sci Chemistry Physics Decision Sci Mathematics Comp Sci Comp Sci Comp Sci Comp Sci English
Smith Clark* Turner Bosky Hall James Smith* Clark Smith Clark* Bronte
This is not the same as the original relation (the tuples marked with * have been added). Thus the decomposition is lossy.
Useful theorem: The decomposition {R1,R2} of R is lossless iff the common attributes R1∩R2 form a superkey for either R1 or R2.
8
Lossless Join Decomposition(cont)
Example 3: Given R(A,B,C) and F = {A→ B}. The decomposition into R1(A,B) and R2(A,C) is lossless because A→ B is an FD over R1, so the common attribute A is a key of R1.
9
Testing for the lossless join property Algorithm TEST_LJ
Step 1: Create a matrix S, each element si,j∈S corresponds the relation Ri and the attribute Aj, such that:
sj,i = a if Ai∈ Rj, otherwise sj,i = b.
Step 2: Repeat the following process till S has no change or one row is
made up entirely of “a” symbols.
Step 2.1: For each X→ Y , choose the rows where the elements
corresponding to X take the value a.
Step 2.2: In those chosen rows (must be at least two rows), the elements
corresponding to Y also take the value a if one of the chosen rows take the value a on Y .
10
Testing for the lossless join property(cont) The decomposition is lossless if one row is entirely made up by “a” values.
The algorithm can be found as the Algorithm 15.2 in E/N book.
Note: The correctness of the algorithm is based on the assumption that no null values are allowed for the join attributes.
If and only if exists an order such that Ri ∩ Mi-1 forms
a superkey of R i or Mi-1, where Mi-1 is the join on R1, R2, … Ri-1
11
Testing for the lossless join property(cont) Example: R = (A,B,C,D), F = {A→B,A →C,C → D}.
Let R1 = (A,B,C), R2 = (C,D).
Initially, S is
ABCD R1 a a a b R2 b b a a
Note: rows 1 and 2 of S agree on {C}, which is the left hand side of C→D. Therefore, change the D value on rows 1 to a, matching the value from row 2.
Now row 1 is entirely a’s, so the decomposition is lossless. (Check it.)
12
Testing for the lossless join property(cont) Example 2: R = (A,B,C,D,E),
F = {AB →CD,A → E,C → D}. Let R1 = (A,B,C),
R2 = (B,C,D) and R3 = (C,D,E).
Example 3: R = (A,B,C,D,E,G),
F = {A → B,C → DE,AB → G}. Let R1 = (A,B), R2 = (C,D,E) and R3 = (A,C,G).
13
Testing for the lossless join property(cont) Example 4: R = (A,B,C,D,E,G),
F = {AB →G, C → DE, A → B}.
Let R1 = (A,B), R2 = (C, D, E) and R3 = (A,C,G).
ABCDEG ABCDEG R1 a a b b b b R1 a a b b b b R2 b b a a a b R2 b b a a a b R3 a b a b b a R3 a b a b b a
ABCDEG
R1 a a b b b b
R2 b b a a a b
Rabaaaa 3 X
XX aaa
14
Lossless decomposition into BCNF
Algorithm TO_BCNF
D := {R1,R2, …Rn}
While ∃ a Ri ∈ D and Ri is not in BCNF Do
{findaX→YinRithatviolatesBCNF; replaceRiinDby(Ri−Y)and(X ∪ Y ); }
F = {A→B,A→C, A→D, C→E, E→D, C→G}, R1 = (C, D, E, G), R2 = (A, B, C, D)
R11 = (C, E, G), R12 = (E, D) due E→D
R21 = (A, B, C), R22 = (C, D) because of C→D
15
Lossless decomposition into BCNF
Algorithm TO_BCNF
D := {R1,R2, …Rn}
While ∃ a Ri ∈ D and Ri is not in BCNF Do
{findaX→YinRithatviolatesBCNF; replaceRiinDby(Ri−Y)and(X ∪ Y ); }
Since a X →Y violating BCNF is not always in F, the main difficulty is to verify if Ri is in BCNF; see the approach below:
1. For each subset X of Ri, computer X+.
2. X →(X+|Ri − X) violates BCNF, if X+|Ri − X ≠ ∅ and Ri − X+ ≠ ∅ .
Here, X+|Ri − X = ∅ means that each F.D with X as the left hand side is trivial; Ri −X+ =∅meansXisasuperkeyofRi
16
Lossless decomposition into BCNF(cont)
Example: (From Desai 6.31)
Find a BCNF decomposition of the relation scheme below: SHIPPING(Ship , Capacity , Date , Cargo , Value)
F consists of:
Ship→ Capacity
{Ship , Date}→ Cargo {Cargo , Capacity}→ Value
17
Lossless decomposition into BCNF(cont)
From Ship→ Capacity, we decompose SHIPPING into R1(Ship , Date , Cargo , Value)
Key: {Ship,Date}
A nontrivial FD in F+ violates BCNF: {Ship , Cargo} → Value
and
R2(Ship , Capacity)
Key: {Ship}
Only one nontrivial FD in F+: Ship → Capacity
Ship → Capacity
{Ship , Date}→ Cargo {Cargo , Capacity}→ Value
18
Lossless decomposition into BCNF(cont)
R1 is not in BCNF so we must decompose it further into
R11 (Ship , Date , Cargo)
Key: {Ship,Date}
Only one nontrivial FD in F+ with single attribute on the right side: {Ship , Date} →Cargo
and
R12 (Ship , Cargo , Value)
Key: {Ship,Cargo}
Only one nontrivial FD in F+ with single attribute on the right side:
{Ship,Cargo} → Value
This is in BCNF and the decomposition is lossless but not dependency preserving (the FD {Capacity,Cargo} → Value) has been lost.
Ship → Capacity
{Ship , Date}→ Cargo {Cargo , Capacity}→ Value
19
Lossless decomposition into BCNF(cont)
Or we could have chosen {Cargo , Capacity} →Value, which would give us: R1 (Ship , Capacity , Date , Cargo)
Key: {Ship,Date}
A nontrivial FD in F+ violates BCNF:
Ship → Capacity
and
R2 (Cargo , Capacity , Value) Key: {Cargo,Capacity}
Only one nontrivial FD in F+ with single attribute on the right side: {Cargo , Capacity} → Value
Ship → Capacity
{Ship , Date}→ Cargo {Cargo , Capacity}→ Value
20
and then from Ship→ Capacity,
R11 (Ship , Date , Cargo)
Key: {Ship,Date}
Only one nontrivial FD in F+ with single attribute on the right side: {Ship , Date} → Cargo
And
R12 (Ship , Capacity)
Key: {Ship}
+
Only one nontrivial FD in F : Ship → Capacity
This is in BCNF and the decomposition is both lossless and dependency preserving.
However, there are relation schemes for which there is no lossless, dependency preserving decomposition into BCNF.
Ship → Capacity
{Ship , Date}→ Cargo {Cargo , Capacity}→ Value
A
B
C
21
Lossless and dependency-preserving
decomposition into 3NF
A lossless and dependency-preserving decomposition into 3NF is always possible.
More definitions regarding FD’s are needed. A set F of FD’s is minimal if
1. Every FD X→ Y in F is simple: Y consists of a single attribute, 2. Every FD X→ A in F is left-reduced: there is no proper subset
Y ⊂ X such that X → A can be replaced with Y→A. that is, there is no Y ⊂X such that
((F−{X→A})∪{Y→A})+ =F+
3. No FD in F can be removed; that is, there is no FD X→A in F
Iff F |= Y→A
such that
Iff X→A is inferred From F– { X→A}
(F − {X → A})+ = F+.
22
Computing a minimum cover F is a set of FD’s.
A minimal cover (or canonical cover) for F is a minimal set of FD’s Fmin such that F+ = F+min.
Algorithm Min_Cover
Input: a set F of functional dependencies. Output: a minimum cover of F.
Step 1: Reduce right side. Apply Algorithm Reduce right to F.
Step 2: Reduce left side. Apply Algorithm Reduce left to the output of Step 2.
Step 3: Remove redundant FDs. Apply Algorithm Remove_redundency to the output of Step 2. The
output is a minimum cover. Below we detail the three Steps.
23
Computing a minimum cover(cont)
Algorithm Reduce_right
INPUT: F.
OUTPUT: right side reduced F’.
For each FD X→ Y ∈ F where Y = {A ,A , …,A }, we use all X →{A } (for 1≤ i ≤ k) to replace
X→ Y .
Algorithm Reduce_left
INPUT: right side reduced F.
OUTPUT: right and left side reduced F’.
For each X → {A} ∈ F where X = {Ai : 1 ≤ i ≤ k}, do the following. For i = 1 to k, replace X with X − {Ai} if A∈(X − {Ai})+.
Algorithm Reduce_redundancy
INPUT: right and left side reduced F.
OUTPUT: a minimum cover F’ of F.
For each FD X → {A} ∈ F, remove it from F if: A ∈ X+ with respect to F − {X →{A}}.
12ki
24
Example:
R = (A, B, C, D, E, G)
F={A→BCD, B→CDE,AC→E}
Step 1: F’ = {A→B, A→C, A→D, B→C, B→D, B→E, AC→E}
Step 2: AC → E
C+ = {C}; thus C→E is not inferred by F’.
Hence, AC → E cannot be replaced by A → E.
A+ = {A, B, C, D, E}; thus, A→E is inferred by F’.
Hence, AC→ E can be replaced by A → E.
F’’ = {A→ B, A→C, A→ D, A→ E, B→ C, B→D, B→ E}
Step 3: A+|F’’ – {A→B}= {A, C, D, E}; thus A→B is not inferred by F’’ –{A→B}. That is, A→B is not redundant.
A+|F’’ – {A→C}= {A, B, C, D, E}; thus,A→C is redundant.
Thus, we can remove A→C from F’’ to obtain F’’’.
Iteratively, we can A→D and A→E but not the others.
Thus, Fmin={A→B, B→C, B→D, B→E}.
25
3NF decomposition algorithm
Algorithm 3NF decomposition
1. Find a minimum cover F’ of F.
2. For each left side X that appears in F’, do:
create a relation schema X ∪ A1 ∪ A2… ∪ Am where X →{A1}, … , X →
{Am} are all the
dependencies in F’ with X as left side.
3. if none of the relation schemas contains a key of R,
create one more relation schema that contains attributes that form a key for R. See E/N Algorithm 15.4.
26
Example:
R = (A, B, C, D, E, G) Fmin={A→B, B→C, B→D, B→E}. Candidate key: (A, G)
R1 = (A, B), R2 = (B, C, D, E)
R3 = (A, G)
27
3NF decomposition algorithm(cont) Example: (From Desai 6.31)
Beginning again with the SHIPPING relation. The functional dependencies already form a canonical cover.
• From Ship→Capacity, derive R1(Ship,Capacity), • From {Ship,Date} → Cargo, derive
R2(Ship , Date , Cargo),
• From {Capacity,Cargo} → Value, derive
R3(Capacity , Cargo , Value).
• There are no attributes not yet included and the original key {Ship,Date} is included in R2.
28
3NF decomposition algorithm(cont) Another Example: Apply the algorithm to the LOTS example given earlier. A minimal cover is
{ Property_Id→Lot_No,
Property_Id → Area, {City,Lot_No} → Property_Id, Area → Price, Area → City, City → Tax_Rate }.
This gives the decomposition:
R1 (Property_Id , Lot_No , Area) R2 (City , Lot_No , Property_Id) R3 (Area , Price , City)
R4 (City , Tax_Rate)
Exercise 1: Check that this is a lossless, dependency preserving decomposition into 3NF.
Exercise 2: Develop an algorithm for computing a key of a table R with respect to a given F of FDs.
29
Summary
Data redundancies are undesirable as they create the potential for update anomalies,
One way to remove such redundancies is to normalise a design, guided by FD’s.
BCNF removes all redundancies due to FD’s, but a dependency preserving decomposition cannot always be found,
A dependency preserving, lossless decomposition into 3NF can always be found, but some redundancies may remain,
Even where a dependency preserving, lossless decomposition that removes all redundancies can be found, it may not be possible, for efficiency reasons, to remove all redundancies.
30