CS计算机代考程序代写 database algorithm DECOMPOSITION &

DECOMPOSITION &
SCHEMA NORMALIZATION

CS 564

WHAT IS THIS LECTURE ABOUT?

• Bad schemas lead to redundancy
• To “correct” bad schemas: decompose relations
– lossless-join
– dependency preserving

• BCNF : a desired normal forms

2CS 564 [Fall 2021] – Paris Koutris

DB DESIGN THEORY

• Helps us identify the “bad” schemas and improve them
1. express constraints on the data: functional

dependencies (FDs)
2. use the FDs to decompose the relations

• The process, called normalization, obtains a schema in a
“normal form” that guarantees certain properties
– examples of normal forms: BCNF, 3NF, …

3CS 564 [Fall 2021] – Paris Koutris

SCHEMA DECOMPOSITION

4CS 564 [Fall 2021] – Paris Koutris

WHAT IS A DECOMPOSITION?

We decompose a relation R(A1, …, An) by creating
• R1(B1, .., Bm)
• R2(C1,…, Ck)
where {𝐵!, … , 𝐵”} ∪ {𝐶!, … , 𝐶#} = {𝐴!, …𝐴$}

• The instance of R1 is the projection of R onto B1, .., Bm
• The instance of R2 is the projection of R onto C1, .., Cl

5CS 564 [Fall 2021] – Paris Koutris

In general we can decompose a relation into multiple relations.

EXAMPLE: DECOMPOSITION

SSN name age
934729837 Paris 24
123123645 John 30
384475687 Arun 20

SSN name age phoneNumber
934729837 Paris 24 608-374-8422
934729837 Paris 24 603-534-8399
123123645 John 30 608-321-1163
384475687 Arun 20 206-473-8221

SSN phoneNumber
934729837 608-374-8422
934729837 603-534-8399
123123645 608-321-1163
384475687 206-473-8221

6CS 564 [Fall 2021] – Paris Koutris

DECOMPOSITION DESIDERATA

What should a good decomposition achieve?

1. minimize redundancy
2. avoid information loss (lossless-join)
3. preserve the FDs (dependency preserving)
4. ensure good query performance

7CS 564 [Fall 2021] – Paris Koutris

EXAMPLE: INFORMATION LOSS

8CS 564 [Fall 2021] – Paris Koutris

name age phoneNumber
Paris 24 608-374-8422
John 24 608-321-1163
Arun 20 206-473-8221

Decompose into:
R1(name, age)
R2(age, phoneNumber)

name age
Paris 24
John 24
Arun 20

age phoneNumber
24 608-374-8422
24 608-321-1163
20 206-473-8221

We can’t figure out which phoneNumber
corresponds to which person!

LOSSLESS-JOIN DECOMPOSITION

9CS 564 [Fall 2021] – Paris Koutris

R(A, B, C)

R1(A, B) R2(B, C)

decompose (projection)

R’(A, B, C)

recover (natural join)

A schema decomposition is lossless-join if for any
initial instance R, R = R’

A natural join is a join on the
same attribute names

THE CHASE ALGORITHM

The chase algorithm is a classic database technique that can
be used to check for lossless-join decomposition

Running example
• relation R(A, B, C, D, E)
• FDs: 𝐴 ⟶ 𝐵, 𝐶 𝐷 ⟶ 𝐸

Question: is the following decomposition lossless-join?
R1(A, D) R2(A, B, C) R3(D, E)

10CS 564 [Fall 2021] – Paris Koutris

CHASE: INITIALIZATION

• We create a table with the attributes of the original relation
• We add one row for each relation we split to

11CS 564 [Fall 2021] – Paris Koutris

A B C D E
a b1 c1 d e1

a b c d2 e2

a3 b3 c3 d e

R1(A, D)

R2(A, B, C)

R3(D, E)

Attribute not in the relation
gets a subscript

Attribute in the relation – no subscript

CHASE: MAIN ALGORITHM

At every iteration, we check whether an FD is violated, and if
so, we “force” it to hold
• If one has a subscript and the other not, we remove the subscript
• If both have a subscript, we make one subscript equal to the other

12CS 564 [Fall 2021] – Paris Koutris

A B C D E
a b1 c1 d e1 e

a b c d2 e2

a3 b3 c3 d e

R1(A, D)

R2(A, B, C)

R3(D, E)

𝐴 ⟶ 𝐵, 𝐶
𝐷 ⟶ 𝐸

The FD 𝐷 ⟶ 𝐸 is violated, so we need to
drop the subscript from the first row

CHASE: MAIN ALGORITHM

13CS 564 [Fall 2021] – Paris Koutris

A B C D E
a b1 c1 d e
a b c d2 e2
a3 b3 c3 d e

𝐴 ⟶ 𝐵, 𝐶
𝐷 ⟶ 𝐸

A B C D E
a b c d e
a b c d2 e2
a3 b3 c3 d e

𝐴 ⟶ 𝐵, 𝐶

A B C D E
a b1 c1 d e1
a b c d2 e2
a3 b3 c3 d e

𝐷 ⟶ 𝐸

At the end of the chase:
• If there is a row without subscripts,

we can say that the decomposition
is lossless-join

• otherwise, it is not

MORE EXAMPLES

• relation R(A, B, C, D)
• FD 𝐴 ⟶ 𝐵, 𝐶

Lossless-join
• decomposition into R1(A, B, C) and R2(A, D)

Not lossless-join
• decomposition into R1(A, B, C) and R2(D)

14CS 564 [Fall 2021] – Paris Koutris

A B C D
R1

R2

DEPENDENCY PRESERVING

Given R and a set of FDs F, we decompose R into R1
and R2. Suppose:
– R1 has a set of FDs F1
– R2 has a set of FDs F2
– F1 and F2 are computed from F

A decomposition is dependency preserving if by
enforcing F1 over R1 and F2 over R2, we can enforce F
over R

15CS 564 [Fall 2021] – Paris Koutris

A NOTE ON FDS OF SPLIT RELATIONS

Given R and a set of FDs F, we decompose R into R1
and R2. How do we find the FDs F1 that hold for R1?
• It is not enough to only keep the FDs from Fwith attributes
in R1

• Instead, we need to find the non-trivial FDs in the fd
closure of Fwith attributes in R1

Example: R(A, B, C) with FDs: 𝐴 ⟶ 𝐵 B⟶ 𝐶
• For R1(A, C) F1 = 𝐴 ⟶ 𝐶

16CS 564 [Fall 2021] – Paris Koutris

GOOD EXAMPLE

Person(SSN, name, age, canDrink)
• 𝑆𝑆𝑁 ⟶ 𝑛𝑎𝑚𝑒, 𝑎𝑔𝑒
• 𝑎𝑔𝑒 ⟶ 𝑐𝑎𝑛𝐷𝑟𝑖𝑛𝑘

decomposes into
• R1(SSN, name, age)
– 𝑆𝑆𝑁 ⟶ 𝑛𝑎𝑚𝑒, 𝑎𝑔𝑒

• R2(age, canDrink)
– 𝑎𝑔𝑒 ⟶ 𝑐𝑎𝑛𝐷𝑟𝑖𝑛𝑘

17CS 564 [Fall 2021] – Paris Koutris

BAD EXAMPLE

R(A, B, C)
• 𝐴 ⟶ 𝐵
• 𝐵, 𝐶 ⟶ 𝐴

Decomposes into:
• R1(A, B)
– 𝐴 ⟶ 𝐵

• R2(A, C)
– no FDs here!!

18CS 564 [Fall 2021] – Paris Koutris

A B
a1 b
a2 b

R1
A C
a1 c
a2 c

R2

recover

A B C
a1 b c
a2 b c

The recovered table
violates 𝐵, 𝐶 ⟶ 𝐴

NORMAL FORMS

19CS 564 [Fall 2021] – Paris Koutris

A normal form represents a “good” schema design:

• 1NF (flat tables/atomic values)
• 2NF
• 3NF
• BCNF
• 4NF
• …

more
restrictive

BCNF DECOMPOSITION

20CS 564 [Fall 2021] – Paris Koutris

BOYCE-CODD NORMAL FORM (BCNF)

Equivalent definition: for every attribute set X
• either 𝑋( = 𝑋
• or 𝑋( = 𝑎𝑙𝑙 𝑎𝑡𝑡𝑟𝑖𝑏𝑢𝑡𝑒𝑠

21

A relation R is in BCNF if whenever 𝑋 ⟶ 𝐵 is
a non-trivial FD, then X is a superkey in R

CS 564 [Fall 2021] – Paris Koutris

BCNF EXAMPLE 1

22

SSN name age phoneNumber
934729837 Paris 24 608-374-8422
934729837 Paris 24 603-534-8399
123123645 John 30 608-321-1163
384475687 Arun 20 206-473-8221

𝑆𝑆𝑁 ⟶ 𝑛𝑎𝑚𝑒, 𝑎𝑔𝑒

• key = {𝑆𝑆𝑁, 𝑝ℎ𝑜𝑛𝑒𝑁𝑢𝑚𝑏𝑒𝑟}
• 𝑆𝑆𝑁 ⟶ 𝑛𝑎𝑚𝑒, 𝑎𝑔𝑒 is a “bad” FD
• The above relation is not in BCNF!

CS 564 [Fall 2021] – Paris Koutris

BCNF EXAMPLE 2

23

𝑆𝑆𝑁 ⟶ 𝑛𝑎𝑚𝑒, 𝑎𝑔𝑒

• key = {𝑆𝑆𝑁}
• The above relation is in BCNF!

SSN name age
934729837 Paris 24
123123645 John 30
384475687 Arun 20

CS 564 [Fall 2021] – Paris Koutris

BCNF EXAMPLE 3

24

• key = {𝑆𝑆𝑁, 𝑝ℎ𝑜𝑛𝑒𝑁𝑢𝑚𝑏𝑒𝑟}
• The above relation is in BCNF!
• Is it possible that a binary relation is not in BCNF?

SSN phoneNumber
934729837 608-374-8422
934729837 603-534-8399
123123645 608-321-1163
384475687 206-473-8221

CS 564 [Fall 2021] – Paris Koutris

BCNF DECOMPOSITION

• Find an FD that violates the BCNF condition
𝐴!, 𝐴), … , 𝐴$ ⟶ 𝐵!, 𝐵), …, 𝐵”

• Decompose R to R1 and R2:

• Continue until no BCNF violations are left
25

A’sB’s remaining
attributes

R1 R2

CS 564 [Fall 2021] – Paris Koutris

EXAMPLE
SSN name age phoneNumber
934729837 Paris 24 608-374-8422
934729837 Paris 24 603-534-8399
123123645 John 30 608-321-1163
384475687 Arun 20 206-473-8221

26

• The FD 𝑆𝑆𝑁 ⟶ 𝑛𝑎𝑚𝑒, 𝑎𝑔𝑒 violates BCNF
• Split into two relations R1, R2 as follows:

SSN
name

phoneNumber

R1 R2

age

CS 564 [Fall 2021] – Paris Koutris

EXAMPLE CONT’D

SSN name age
934729837 Paris 24
123123645 John 30
384475687 Arun 20

SSN phoneNumber
934729837 608-374-8422
934729837 603-534-8399
123123645 608-321-1163
384475687 206-473-8221

27

SSN
name

phoneNumber

R1 R2

age

𝑆𝑆𝑁 ⟶ 𝑛𝑎𝑚𝑒, 𝑎𝑔𝑒

CS 564 [Fall 2021] – Paris Koutris

BCNF DECOMPOSITION PROPERTIES

The BCNF decomposition:
– removes certain types of redundancy
– is lossless-join
– is not always dependency preserving

28CS 564 [Fall 2021] – Paris Koutris

BCNF IS LOSSLESS-JOIN

Example:
R(A, B, C) with 𝐴 ⟶ 𝐵 decomposes into:
R1(A, B) and R2(A, C)

• The BCNF decomposition always satisfies the
lossless-join criterion!

29CS 564 [Fall 2021] – Paris Koutris

BCNF IS NOT DEPENDENCY PRESERVING

30CS 564 [Fall 2021] – Paris Koutris

R(A, B, C)
• 𝐴 ⟶ 𝐵
• 𝐵, 𝐶 ⟶ 𝐴

The BCNF decomposition is:
• R1(A, B) with FD 𝐴 ⟶ 𝐵
• R2(A, C) with no FDs

There may not exist any BCNF
decomposition that is FD preserving!

BCNF EXAMPLE (1)

Books (author, gender, booktitle, genre, price)
• 𝑎𝑢𝑡ℎ𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟
• 𝑏𝑜𝑜𝑘𝑡𝑖𝑡𝑙𝑒 ⟶ 𝑔𝑒𝑛𝑟𝑒, 𝑝𝑟𝑖𝑐𝑒

What is the candidate key?
• (author, booktitle) is the only one!

Is is in BCNF?
• No, because the left hand side of both (not trivial) FDs
is not a superkey!

31CS 564 [Fall 2021] – Paris Koutris

BCNF EXAMPLE (2)

Books (author, gender, booktitle, genre, price)
• 𝑎𝑢𝑡ℎ𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟
• 𝑏𝑜𝑜𝑘𝑡𝑖𝑡𝑙𝑒 ⟶ 𝑔𝑒𝑛𝑟𝑒, 𝑝𝑟𝑖𝑐𝑒

Splitting Books using the FD 𝑎𝑢𝑡ℎ𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟:
• Author (author, gender)
FD: 𝑎𝑢𝑡ℎ𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟 in BCNF!

• Books2 (authos, booktitle, genre, price)
FD: 𝑏𝑜𝑜𝑘𝑡𝑖𝑡𝑙𝑒 ⟶ 𝑔𝑒𝑛𝑟𝑒, 𝑝𝑟𝑖𝑐𝑒 not in BCNF!

32CS 564 [Fall 2021] – Paris Koutris

BCNF EXAMPLE (3)

Books (author, gender, booktitle, genre, price)
• 𝑎𝑢𝑡ℎ𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟
• 𝑏𝑜𝑜𝑘𝑡𝑖𝑡𝑙𝑒 ⟶ 𝑔𝑒𝑛𝑟𝑒, 𝑝𝑟𝑖𝑐𝑒

Splitting Books using the FD 𝑎𝑢𝑡ℎ𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟:
• Author (author, gender)
FD: 𝑎𝑢𝑡ℎ𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟 in BCNF!

• Splitting Books2 (author, booktitle, genre, price):
– BookInfo (booktitle, genre, price)
FD: 𝑏𝑜𝑜𝑘𝑡𝑖𝑡𝑙𝑒 ⟶ 𝑔𝑒𝑛𝑟𝑒, 𝑝𝑟𝑖𝑐𝑒 in BCNF!

– BookAuthor (author, booktitle) in BCNF!

33CS 564 [Fall 2021] – Paris Koutris

IS NORMALIZATION ALWAYS GOOD?

• Example: suppose A and B are always used
together, but normalization says they should be in
different tables
– decomposition might produce unacceptable
performance loss

• Example: data warehouses
– huge historical DBs, rarely updated after creation
– joins expensive or impractical

34CS 564 [Fall 2021] – Paris Koutris