CS代考 DECOMPOSITION & SCHEMA NORMALIZATION

DECOMPOSITION & SCHEMA NORMALIZATION
CS 564 – Fall 2021
ACKs: , Jignesh Patel, An

WHAT IS THIS LECTURE ABOUT?
• Badschemasleadtoredundancy
• To“correct”badschemas:decomposerelations – lossless-join
– dependency preserving
• BCNF:adesirednormalforms
CS 564 [Fall 2021] – Paris Koutris 2

DB DESIGN THEORY

Helps us identify the “bad” schemas and improve them
1. express constraints on the data: functional
dependencies (FDs)
2. use the FDs to decompose the relations
The process, called normalization, obtains a schema in a “normal form” that guarantees certain properties
– examples of normal forms: BCNF, 3NF, …
CS 564 [Fall 2021] – Paris Koutris 3

SCHEMA DECOMPOSITION
CS 564 [Fall 2021] – Paris Koutris 4

WHAT IS A DECOMPOSITION?
We decompose a relation R(A1, …, An) by creating • R1(B1,..,Bm)
• R2(C1,…,Ck)
where {𝐵!,…,𝐵”} ∪ {𝐶!,…,𝐶#} = {𝐴!,…𝐴$}
• The instance of R1 is the projection of R onto B1, .., Bm
• The instance of R2 is the projection of R onto C1, .., Cl In general we can decompose a relation into multiple relations.
CS 564 [Fall 2021] – Paris Koutris 5

EXAMPLE: DECOMPOSITION
SSN
name
age
phoneNumber
934729837
Paris
24
608-374-8422
934729837
Paris
24
603-534-8399
123123645
John
30
608-321-1163
384475687
Arun
20
206-473-8221
SSN
name
age
934729837
Paris
24
123123645
John
30
384475687
Arun
20
SSN
phoneNumber
934729837
608-374-8422
934729837
603-534-8399
123123645
608-321-1163
384475687
206-473-8221
CS 564 [Fall 2021] – Paris Koutris 6

DECOMPOSITION DESIDERATA
What should a good decomposition achieve?
1. minimize redundancy
2. avoid information loss (lossless-join)
3. preserve the FDs (dependency preserving)
4. ensure good query performance
CS 564 [Fall 2021] – Paris Koutris 7

EXAMPLE: INFORMATION LOSS
Decompose into:
R1(name, age)
R2(age, phoneNumber)
name
age
phoneNumber
Paris
24
608-374-8422
John
24
608-321-1163
Arun
20
206-473-8221
name
age
Paris
24
John
24
Arun
20
age
phoneNumber
24
608-374-8422
24
608-321-1163
20
206-473-8221
We can’t figure out which phoneNumber corresponds to which person!
CS 564 [Fall 2021] – Paris Koutris 8

LOSSLESS-JOIN DECOMPOSITION
R(A, B, C) R1(A, B) R2(B, C)
decompose (projection) recover (natural join)
R’(A, B, C)
A schema decomposition is lossless-join if for any
initial instance R, R = R’
CS 564 [Fall 2021] – Paris Koutris 9
A natural join is a join on the same attribute names

THE CHASE ALGORITHM
The chase algorithm is a classic database technique that can be used to check for lossless-join decomposition
Running example
• relation R(A, B, C, D, E)
• FDs: 𝐴⟶𝐵,𝐶 𝐷⟶𝐸
Question: is the following decomposition lossless-join? R1(A, D) R2(A, B, C) R3(D, E)
CS 564 [Fall 2021] – Paris Koutris 10

CHASE: INITIALIZATION
• •
We create a table with the attributes of the original relation We add one row for each relation we split to
A
B
C
D
E
a
b1
c1
d
e1
a
b
c
d2
e2
a3
b3
c3
d
e
R1(A, D) R2(A, B, C)
R3(D, E)
Attribute not in the relation gets a subscript
Attribute in the relation – no subscript
CS 564 [Fall 2021] – Paris Koutris 11

CHASE: MAIN ALGORITHM
At every iteration, we check whether an FD is violated, and if so, we “force” it to hold
• •
If one has a subscript and the other not, we remove the subscript If both have a subscript, we make one subscript equal to the other
A
B
C
D
E
a
b1
c1
d
e1 e
a
b
c
d2
e2
a3
b3
c3
d
e
R1(A, D) R2(A, B, C)
R3(D, E)
𝐴 ⟶ 𝐵, 𝐶 𝐷⟶𝐸
The FD 𝐷 ⟶ 𝐸 is violated, so we need to drop the subscript from the first row
CS 564 [Fall 2021] – Paris Koutris
12

CHASE: MAIN ALGORITHM
𝐴 ⟶ 𝐵, 𝐶 𝐷⟶𝐸
A
B
C
D
E
a
b1
c1
d
e1
a
b
c
d2
e2
a3
b3
c3
d
e
A
B
C
D
E
a
b1
c1
d
e
a
b
c
d2
e2
a3
b3
c3
d
e
𝐷⟶𝐸
𝐴 ⟶ 𝐵, 𝐶
At the end of the chase:
• If there is a row without subscripts,
we can say that the decomposition
is lossless-join
• otherwise, it is not
A
B
C
D
E
a
b
c
d
e
a
b
c
d2
e2
a3
b3
c3
d
e
CS 564 [Fall 2021] – Paris Koutris
13

MORE EXAMPLES • relation R(A, B, C, D)
• FD𝐴⟶𝐵,𝐶 Lossless-join
• decomposition into R1(A, B, C) and R2(A, D) Not lossless-join
• decomposition into R1(A, B, C) and R2(D)
CS 564 [Fall 2021] – Paris Koutris 14
A
B
C
D
R1
R2

DEPENDENCY PRESERVING
Given R and a set of FDs F, we decompose R into R1
and R2. Suppose:
–R1 hasasetofFDsF1
–R2 hasasetofFDsF2
– F1 and F2 are computed from F
A decomposition is dependency preserving if by enforcing F1 over R1 and F2 over R2, we can enforce F over R
CS 564 [Fall 2021] – Paris Koutris 15

A NOTE ON FDS OF SPLIT RELATIONS
Given R and a set of FDs F, we decompose R into R1
and R2. How do we find the FDs F1 that hold for R1?
• It is not enough to only keep the FDs from F with attributes in R1
• Instead, we need to find the non-trivial FDs in the fd closure of F with attributes in R1
Example: R(A, B, C) with FDs: 𝐴 ⟶ 𝐵 B⟶ 𝐶 • ForR1(A,C) F1 =𝐴⟶𝐶
CS 564 [Fall 2021] – Paris Koutris 16

GOOD EXAMPLE
Person(SSN, name, age, canDrink) • 𝑆𝑆𝑁⟶𝑛𝑎𝑚𝑒,𝑎𝑔𝑒
• 𝑎𝑔𝑒 ⟶ 𝑐𝑎𝑛𝐷𝑟𝑖𝑛𝑘
decomposes into
• R1(SSN,name,age)
– 𝑆𝑆𝑁 ⟶ 𝑛𝑎𝑚𝑒, 𝑎𝑔𝑒
• R2(age,canDrink)
– 𝑎𝑔𝑒 ⟶ 𝑐𝑎𝑛𝐷𝑟𝑖𝑛𝑘
CS 564 [Fall 2021] – Paris Koutris 17

BAD EXAMPLE
R(A, B, C)
• 𝐴⟶𝐵
• 𝐵,𝐶⟶𝐴
Decomposes into:
• R1(A,B)
–𝐴⟶𝐵 • R2(A,C)
– no FDs here!!
R1 R2
A
B
a1
b
a2
b
A
C
a1
c
a2
c
recover
A
B
C
a1
b
c
a2
b
c
The recovered table
violates 𝐵, 𝐶 ⟶ 𝐴
CS 564 [Fall 2021] – Paris Koutris 18

NORMAL FORMS
A normal form represents a “good” schema design:
• 1NF (flat tables/atomic values)
• 2NF
• 3NF
• BCNF
• 4NF •…
more restrictive
CS 564 [Fall 2021] – Paris Koutris
19

BCNF DECOMPOSITION
CS 564 [Fall 2021] – Paris Koutris 20

BOYCE-CODD NORMAL FORM (BCNF)
A relation R is in BCNF if whenever 𝑋 ⟶ 𝐵 is a non-trivial FD, then X is a superkey in R
Equivalent definition: for every attribute set X • either𝑋(=𝑋
• or 𝑋( = 𝑎𝑙𝑙 𝑎𝑡𝑡𝑟𝑖𝑏𝑢𝑡𝑒𝑠
CS 564 [Fall 2021] – Paris Koutris 21

BCNF EXAMPLE 1
SSN
name
age
phoneNumber
934729837
Paris
24
608-374-8422
934729837
Paris
24
603-534-8399
123123645
John
30
608-321-1163
384475687
Arun
20
206-473-8221
𝑆𝑆𝑁 ⟶ 𝑛𝑎𝑚𝑒, 𝑎𝑔𝑒
• key={𝑆𝑆𝑁,𝑝h𝑜𝑛𝑒𝑁𝑢𝑚𝑏𝑒𝑟}
• 𝑆𝑆𝑁 ⟶ 𝑛𝑎𝑚𝑒,𝑎𝑔𝑒 is a “bad” FD
• TheaboverelationisnotinBCNF!
CS 564 [Fall 2021] – Paris Koutris 22

BCNF EXAMPLE 2
SSN
name
age
934729837
Paris
24
123123645
John
30
384475687
Arun
20
𝑆𝑆𝑁 ⟶ 𝑛𝑎𝑚𝑒, 𝑎𝑔𝑒 • key={𝑆𝑆𝑁}
• TheaboverelationisinBCNF!
CS 564 [Fall 2021] – Paris Koutris 23

BCNF EXAMPLE 3
SSN
phoneNumber
934729837
608-374-8422
934729837
603-534-8399
123123645
608-321-1163
384475687
206-473-8221
• key={𝑆𝑆𝑁,𝑝h𝑜𝑛𝑒𝑁𝑢𝑚𝑏𝑒𝑟}
• TheaboverelationisinBCNF!
• IsitpossiblethatabinaryrelationisnotinBCNF?
CS 564 [Fall 2021] – Paris Koutris 24

BCNF DECOMPOSITION
• FindanFDthatviolatestheBCNFcondition 𝐴!,𝐴),…,𝐴$ ⟶ 𝐵!,𝐵),…,𝐵”
• DecomposeRtoR1andR2: R1
remaining attributes
R2
B’s
A’s
• ContinueuntilnoBCNFviolationsareleft
CS 564 [Fall 2021] – Paris Koutris 25

EXAMPLE
SSN
name
age
phoneNumber
934729837
Paris
24
608-374-8422
934729837
Paris
24
603-534-8399
123123645
John
30
608-321-1163
384475687
Arun
20
206-473-8221
• TheFD𝑆𝑆𝑁⟶𝑛𝑎𝑚𝑒,𝑎𝑔𝑒violatesBCNF • SplitintotworelationsR1,R2asfollows:
R1 name age
R2
SSN
phoneNumber
CS 564 [Fall 2021] – Paris Koutris
26

EXAMPLE CONT’D
R1 name age
𝑆𝑆𝑁 ⟶ 𝑛𝑎𝑚𝑒, 𝑎𝑔𝑒
R2
SSN
phoneNumber
SSN
name
age
934729837
Paris
24
123123645
John
30
384475687
Arun
20
SSN
phoneNumber
934729837
608-374-8422
934729837
603-534-8399
123123645
608-321-1163
384475687
206-473-8221
CS 564 [Fall 2021] – Paris Koutris
27

BCNF DECOMPOSITION PROPERTIES
The BCNF decomposition:
– removes certain types of redundancy – is lossless-join
– is not always dependency preserving
CS 564 [Fall 2021] – Paris Koutris 28

BCNF IS LOSSLESS-JOIN
Example:
R(A, B, C) with 𝐴 ⟶ 𝐵 decomposes into: R1(A, B) and R2(A, C)
• TheBCNFdecompositionalwayssatisfiesthe lossless-join criterion!
CS 564 [Fall 2021] – Paris Koutris 29

BCNF IS NOT DEPENDENCY PRESERVING
R(A, B, C)
• 𝐴⟶𝐵
• 𝐵,𝐶⟶𝐴
The BCNF decomposition is: • R1(A,B)withFD𝐴⟶𝐵
• R2(A,C)withnoFDs
There may not exist any BCNF decomposition that is FD preserving!
CS 564 [Fall 2021] – Paris Koutris 30

BCNF EXAMPLE (1)
Books (author, gender, booktitle, genre, price)
• 𝑎𝑢𝑡h𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟
• 𝑏𝑜𝑜𝑘𝑡𝑖𝑡𝑙𝑒 ⟶ 𝑔𝑒𝑛𝑟𝑒, 𝑝𝑟𝑖𝑐𝑒
What is the candidate key?
• (author, booktitle) is the only one!
Is is in BCNF?
• No, because the left hand side of both (not trivial) FDs is not a superkey!
CS 564 [Fall 2021] – Paris Koutris 31

BCNF EXAMPLE (2)
Books (author, gender, booktitle, genre, price)
• 𝑎𝑢𝑡h𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟
• 𝑏𝑜𝑜𝑘𝑡𝑖𝑡𝑙𝑒 ⟶ 𝑔𝑒𝑛𝑟𝑒, 𝑝𝑟𝑖𝑐𝑒
Splitting Books using the FD 𝑎𝑢𝑡h𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟:
• Author (author, gender)
FD: 𝑎𝑢𝑡h𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟 in BCNF!
• Books2 (authos, booktitle, genre, price)
FD: 𝑏𝑜𝑜𝑘𝑡𝑖𝑡𝑙𝑒 ⟶ 𝑔𝑒𝑛𝑟𝑒, 𝑝𝑟𝑖𝑐𝑒 not in BCNF!
CS 564 [Fall 2021] – Paris Koutris 32

BCNF EXAMPLE (3)
Books (author, gender, booktitle, genre, price)
• 𝑎𝑢𝑡h𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟
• 𝑏𝑜𝑜𝑘𝑡𝑖𝑡𝑙𝑒 ⟶ 𝑔𝑒𝑛𝑟𝑒, 𝑝𝑟𝑖𝑐𝑒
Splitting Books using the FD 𝑎𝑢𝑡h𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟:
• •
Author (author, gender)
FD: 𝑎𝑢𝑡h𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟 in BCNF!
Splitting Books2 (author, booktitle, genre, price): – BookInfo (booktitle, genre, price)
FD: 𝑏𝑜𝑜𝑘𝑡𝑖𝑡𝑙𝑒 ⟶ 𝑔𝑒𝑛𝑟𝑒, 𝑝𝑟𝑖𝑐𝑒 in BCNF! – BookAuthor (author, booktitle) in BCNF!
CS 564 [Fall 2021] – Paris Koutris 33

IS NORMALIZATION ALWAYS GOOD?
• Example:supposeAandBarealwaysused together, but normalization says they should be in different tables
– decomposition might produce unacceptable performance loss
• Example:datawarehouses
– huge historical DBs, rarely updated after creation
– joins expensive or impractical
CS 564 [Fall 2021] – Paris Koutris 34