DECOMPOSITION &
SCHEMA NORMALIZATION
CS 564
WHAT IS THIS LECTURE ABOUT?
• Bad schemas lead to redundancy
• To “correct” bad schemas: decompose relations
– lossless-join
– dependency preserving
• BCNF : a desired normal forms
2CS 564 [Fall 2021] – Paris Koutris
DB DESIGN THEORY
• Helps us identify the “bad” schemas and improve them
1. express constraints on the data: functional
dependencies (FDs)
2. use the FDs to decompose the relations
• The process, called normalization, obtains a schema in a
“normal form” that guarantees certain properties
– examples of normal forms: BCNF, 3NF, …
3CS 564 [Fall 2021] – Paris Koutris
SCHEMA DECOMPOSITION
4CS 564 [Fall 2021] – Paris Koutris
WHAT IS A DECOMPOSITION?
We decompose a relation R(A1, …, An) by creating
• R1(B1, .., Bm)
• R2(C1,…, Ck)
where {𝐵!, … , 𝐵”} ∪ {𝐶!, … , 𝐶#} = {𝐴!, …𝐴$}
• The instance of R1 is the projection of R onto B1, .., Bm
• The instance of R2 is the projection of R onto C1, .., Cl
5CS 564 [Fall 2021] – Paris Koutris
In general we can decompose a relation into multiple relations.
EXAMPLE: DECOMPOSITION
SSN name age
934729837 Paris 24
123123645 John 30
384475687 Arun 20
SSN name age phoneNumber
934729837 Paris 24 608-374-8422
934729837 Paris 24 603-534-8399
123123645 John 30 608-321-1163
384475687 Arun 20 206-473-8221
SSN phoneNumber
934729837 608-374-8422
934729837 603-534-8399
123123645 608-321-1163
384475687 206-473-8221
6CS 564 [Fall 2021] – Paris Koutris
DECOMPOSITION DESIDERATA
What should a good decomposition achieve?
1. minimize redundancy
2. avoid information loss (lossless-join)
3. preserve the FDs (dependency preserving)
4. ensure good query performance
7CS 564 [Fall 2021] – Paris Koutris
EXAMPLE: INFORMATION LOSS
8CS 564 [Fall 2021] – Paris Koutris
name age phoneNumber
Paris 24 608-374-8422
John 24 608-321-1163
Arun 20 206-473-8221
Decompose into:
R1(name, age)
R2(age, phoneNumber)
name age
Paris 24
John 24
Arun 20
age phoneNumber
24 608-374-8422
24 608-321-1163
20 206-473-8221
We can’t figure out which phoneNumber
corresponds to which person!
LOSSLESS-JOIN DECOMPOSITION
9CS 564 [Fall 2021] – Paris Koutris
R(A, B, C)
R1(A, B) R2(B, C)
decompose (projection)
R’(A, B, C)
recover (natural join)
A schema decomposition is lossless-join if for any
initial instance R, R = R’
A natural join is a join on the
same attribute names
THE CHASE ALGORITHM
The chase algorithm is a classic database technique that can
be used to check for lossless-join decomposition
Running example
• relation R(A, B, C, D, E)
• FDs: 𝐴 ⟶ 𝐵, 𝐶 𝐷 ⟶ 𝐸
Question: is the following decomposition lossless-join?
R1(A, D) R2(A, B, C) R3(D, E)
10CS 564 [Fall 2021] – Paris Koutris
CHASE: INITIALIZATION
• We create a table with the attributes of the original relation
• We add one row for each relation we split to
11CS 564 [Fall 2021] – Paris Koutris
A B C D E
a b1 c1 d e1
a b c d2 e2
a3 b3 c3 d e
R1(A, D)
R2(A, B, C)
R3(D, E)
Attribute not in the relation
gets a subscript
Attribute in the relation – no subscript
CHASE: MAIN ALGORITHM
At every iteration, we check whether an FD is violated, and if
so, we “force” it to hold
• If one has a subscript and the other not, we remove the subscript
• If both have a subscript, we make one subscript equal to the other
12CS 564 [Fall 2021] – Paris Koutris
A B C D E
a b1 c1 d e1 e
a b c d2 e2
a3 b3 c3 d e
R1(A, D)
R2(A, B, C)
R3(D, E)
𝐴 ⟶ 𝐵, 𝐶
𝐷 ⟶ 𝐸
The FD 𝐷 ⟶ 𝐸 is violated, so we need to
drop the subscript from the first row
CHASE: MAIN ALGORITHM
13CS 564 [Fall 2021] – Paris Koutris
A B C D E
a b1 c1 d e
a b c d2 e2
a3 b3 c3 d e
𝐴 ⟶ 𝐵, 𝐶
𝐷 ⟶ 𝐸
A B C D E
a b c d e
a b c d2 e2
a3 b3 c3 d e
𝐴 ⟶ 𝐵, 𝐶
A B C D E
a b1 c1 d e1
a b c d2 e2
a3 b3 c3 d e
𝐷 ⟶ 𝐸
At the end of the chase:
• If there is a row without subscripts,
we can say that the decomposition
is lossless-join
• otherwise, it is not
MORE EXAMPLES
• relation R(A, B, C, D)
• FD 𝐴 ⟶ 𝐵, 𝐶
Lossless-join
• decomposition into R1(A, B, C) and R2(A, D)
Not lossless-join
• decomposition into R1(A, B, C) and R2(D)
14CS 564 [Fall 2021] – Paris Koutris
A B C D
R1
R2
DEPENDENCY PRESERVING
Given R and a set of FDs F, we decompose R into R1
and R2. Suppose:
– R1 has a set of FDs F1
– R2 has a set of FDs F2
– F1 and F2 are computed from F
A decomposition is dependency preserving if by
enforcing F1 over R1 and F2 over R2, we can enforce F
over R
15CS 564 [Fall 2021] – Paris Koutris
A NOTE ON FDS OF SPLIT RELATIONS
Given R and a set of FDs F, we decompose R into R1
and R2. How do we find the FDs F1 that hold for R1?
• It is not enough to only keep the FDs from Fwith attributes
in R1
• Instead, we need to find the non-trivial FDs in the fd
closure of Fwith attributes in R1
Example: R(A, B, C) with FDs: 𝐴 ⟶ 𝐵 B⟶ 𝐶
• For R1(A, C) F1 = 𝐴 ⟶ 𝐶
16CS 564 [Fall 2021] – Paris Koutris
GOOD EXAMPLE
Person(SSN, name, age, canDrink)
• 𝑆𝑆𝑁 ⟶ 𝑛𝑎𝑚𝑒, 𝑎𝑔𝑒
• 𝑎𝑔𝑒 ⟶ 𝑐𝑎𝑛𝐷𝑟𝑖𝑛𝑘
decomposes into
• R1(SSN, name, age)
– 𝑆𝑆𝑁 ⟶ 𝑛𝑎𝑚𝑒, 𝑎𝑔𝑒
• R2(age, canDrink)
– 𝑎𝑔𝑒 ⟶ 𝑐𝑎𝑛𝐷𝑟𝑖𝑛𝑘
17CS 564 [Fall 2021] – Paris Koutris
BAD EXAMPLE
R(A, B, C)
• 𝐴 ⟶ 𝐵
• 𝐵, 𝐶 ⟶ 𝐴
Decomposes into:
• R1(A, B)
– 𝐴 ⟶ 𝐵
• R2(A, C)
– no FDs here!!
18CS 564 [Fall 2021] – Paris Koutris
A B
a1 b
a2 b
R1
A C
a1 c
a2 c
R2
recover
A B C
a1 b c
a2 b c
The recovered table
violates 𝐵, 𝐶 ⟶ 𝐴
NORMAL FORMS
19CS 564 [Fall 2021] – Paris Koutris
A normal form represents a “good” schema design:
• 1NF (flat tables/atomic values)
• 2NF
• 3NF
• BCNF
• 4NF
• …
more
restrictive
BCNF DECOMPOSITION
20CS 564 [Fall 2021] – Paris Koutris
BOYCE-CODD NORMAL FORM (BCNF)
Equivalent definition: for every attribute set X
• either 𝑋( = 𝑋
• or 𝑋( = 𝑎𝑙𝑙 𝑎𝑡𝑡𝑟𝑖𝑏𝑢𝑡𝑒𝑠
21
A relation R is in BCNF if whenever 𝑋 ⟶ 𝐵 is
a non-trivial FD, then X is a superkey in R
CS 564 [Fall 2021] – Paris Koutris
BCNF EXAMPLE 1
22
SSN name age phoneNumber
934729837 Paris 24 608-374-8422
934729837 Paris 24 603-534-8399
123123645 John 30 608-321-1163
384475687 Arun 20 206-473-8221
𝑆𝑆𝑁 ⟶ 𝑛𝑎𝑚𝑒, 𝑎𝑔𝑒
• key = {𝑆𝑆𝑁, 𝑝ℎ𝑜𝑛𝑒𝑁𝑢𝑚𝑏𝑒𝑟}
• 𝑆𝑆𝑁 ⟶ 𝑛𝑎𝑚𝑒, 𝑎𝑔𝑒 is a “bad” FD
• The above relation is not in BCNF!
CS 564 [Fall 2021] – Paris Koutris
BCNF EXAMPLE 2
23
𝑆𝑆𝑁 ⟶ 𝑛𝑎𝑚𝑒, 𝑎𝑔𝑒
• key = {𝑆𝑆𝑁}
• The above relation is in BCNF!
SSN name age
934729837 Paris 24
123123645 John 30
384475687 Arun 20
CS 564 [Fall 2021] – Paris Koutris
BCNF EXAMPLE 3
24
• key = {𝑆𝑆𝑁, 𝑝ℎ𝑜𝑛𝑒𝑁𝑢𝑚𝑏𝑒𝑟}
• The above relation is in BCNF!
• Is it possible that a binary relation is not in BCNF?
SSN phoneNumber
934729837 608-374-8422
934729837 603-534-8399
123123645 608-321-1163
384475687 206-473-8221
CS 564 [Fall 2021] – Paris Koutris
BCNF DECOMPOSITION
• Find an FD that violates the BCNF condition
𝐴!, 𝐴), … , 𝐴$ ⟶ 𝐵!, 𝐵), …, 𝐵”
• Decompose R to R1 and R2:
• Continue until no BCNF violations are left
25
A’sB’s remaining
attributes
R1 R2
CS 564 [Fall 2021] – Paris Koutris
EXAMPLE
SSN name age phoneNumber
934729837 Paris 24 608-374-8422
934729837 Paris 24 603-534-8399
123123645 John 30 608-321-1163
384475687 Arun 20 206-473-8221
26
• The FD 𝑆𝑆𝑁 ⟶ 𝑛𝑎𝑚𝑒, 𝑎𝑔𝑒 violates BCNF
• Split into two relations R1, R2 as follows:
SSN
name
phoneNumber
R1 R2
age
CS 564 [Fall 2021] – Paris Koutris
EXAMPLE CONT’D
SSN name age
934729837 Paris 24
123123645 John 30
384475687 Arun 20
SSN phoneNumber
934729837 608-374-8422
934729837 603-534-8399
123123645 608-321-1163
384475687 206-473-8221
27
SSN
name
phoneNumber
R1 R2
age
𝑆𝑆𝑁 ⟶ 𝑛𝑎𝑚𝑒, 𝑎𝑔𝑒
CS 564 [Fall 2021] – Paris Koutris
BCNF DECOMPOSITION PROPERTIES
The BCNF decomposition:
– removes certain types of redundancy
– is lossless-join
– is not always dependency preserving
28CS 564 [Fall 2021] – Paris Koutris
BCNF IS LOSSLESS-JOIN
Example:
R(A, B, C) with 𝐴 ⟶ 𝐵 decomposes into:
R1(A, B) and R2(A, C)
• The BCNF decomposition always satisfies the
lossless-join criterion!
29CS 564 [Fall 2021] – Paris Koutris
BCNF IS NOT DEPENDENCY PRESERVING
30CS 564 [Fall 2021] – Paris Koutris
R(A, B, C)
• 𝐴 ⟶ 𝐵
• 𝐵, 𝐶 ⟶ 𝐴
The BCNF decomposition is:
• R1(A, B) with FD 𝐴 ⟶ 𝐵
• R2(A, C) with no FDs
There may not exist any BCNF
decomposition that is FD preserving!
BCNF EXAMPLE (1)
Books (author, gender, booktitle, genre, price)
• 𝑎𝑢𝑡ℎ𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟
• 𝑏𝑜𝑜𝑘𝑡𝑖𝑡𝑙𝑒 ⟶ 𝑔𝑒𝑛𝑟𝑒, 𝑝𝑟𝑖𝑐𝑒
What is the candidate key?
• (author, booktitle) is the only one!
Is is in BCNF?
• No, because the left hand side of both (not trivial) FDs
is not a superkey!
31CS 564 [Fall 2021] – Paris Koutris
BCNF EXAMPLE (2)
Books (author, gender, booktitle, genre, price)
• 𝑎𝑢𝑡ℎ𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟
• 𝑏𝑜𝑜𝑘𝑡𝑖𝑡𝑙𝑒 ⟶ 𝑔𝑒𝑛𝑟𝑒, 𝑝𝑟𝑖𝑐𝑒
Splitting Books using the FD 𝑎𝑢𝑡ℎ𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟:
• Author (author, gender)
FD: 𝑎𝑢𝑡ℎ𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟 in BCNF!
• Books2 (authos, booktitle, genre, price)
FD: 𝑏𝑜𝑜𝑘𝑡𝑖𝑡𝑙𝑒 ⟶ 𝑔𝑒𝑛𝑟𝑒, 𝑝𝑟𝑖𝑐𝑒 not in BCNF!
32CS 564 [Fall 2021] – Paris Koutris
BCNF EXAMPLE (3)
Books (author, gender, booktitle, genre, price)
• 𝑎𝑢𝑡ℎ𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟
• 𝑏𝑜𝑜𝑘𝑡𝑖𝑡𝑙𝑒 ⟶ 𝑔𝑒𝑛𝑟𝑒, 𝑝𝑟𝑖𝑐𝑒
Splitting Books using the FD 𝑎𝑢𝑡ℎ𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟:
• Author (author, gender)
FD: 𝑎𝑢𝑡ℎ𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟 in BCNF!
• Splitting Books2 (author, booktitle, genre, price):
– BookInfo (booktitle, genre, price)
FD: 𝑏𝑜𝑜𝑘𝑡𝑖𝑡𝑙𝑒 ⟶ 𝑔𝑒𝑛𝑟𝑒, 𝑝𝑟𝑖𝑐𝑒 in BCNF!
– BookAuthor (author, booktitle) in BCNF!
33CS 564 [Fall 2021] – Paris Koutris
IS NORMALIZATION ALWAYS GOOD?
• Example: suppose A and B are always used
together, but normalization says they should be in
different tables
– decomposition might produce unacceptable
performance loss
• Example: data warehouses
– huge historical DBs, rarely updated after creation
– joins expensive or impractical
34CS 564 [Fall 2021] – Paris Koutris