程序代写代做代考 database ER algorithm flex Functional Dependencies Microsoft PowerPoint – 12- DatabaseDesign_SchemaRefinementv2.pptx [Autosaved]

Microsoft PowerPoint – 12- DatabaseDesign_SchemaRefinementv2.pptx [Autosaved]

© 2018 A. Alawini & A. Parameswaran

Database Design:
Normal Forms

Abdu Alawini
University of Illinois at Urbana-Champaign

CS411: Database Systems

October 10, 2018

1

© 2018 A. Alawini & A. Parameswaran

Announcements

• Project Track 1 – Stage 2
And

• Project Track 2 – Stage 1
are due Today!

• HW 2 is due Oct 15 (23:59)

2

© 2018 A. Alawini & A. Parameswaran

•Conceptual design: (ER & UML Models are used
for this.)
• What are the entities and relationships we need?

•Logical design:
• Transform ER design to Relational Schema

•Schema Refinement: (Normalization)
• Check relational schema for redundancies and

related anomalies.

•Physical Database Design and Tuning:
• Consider typical workloads; (sometimes) modify the

database design; select file types and indexes.

Overview of Database Design

3

© 2018 A. Alawini & A. Parameswaran

Last time!

How do we obtain a good design?
Functional Dependencies and Keys
Rules about Functional Dependencies
Splitting/Combination
Trivial Dependencies
Attribute Closure
• FD Closure

4

© 2018 A. Alawini & A. Parameswaran

Recap: Functional Dependencies

• A form of constraint (hence, part of the schema)
• Finding them is part of the database design
• Used heavily in schema refinement
• Holds for ALL instances!
Definition:

If two tuples agree on the attributes

A , A , … A
1 2 n

then they must also agree on the attributes

B , B , … B 1 2 m

Formally:

A , A , … A
1 2 n

B , B , … B 1 2 m

Where have we seen
this before?

5

© 2018 A. Alawini & A. Parameswaran

Recap: Keys are a type of FD

• Key of a relation R is a set of attributes that
• functionally determines all attributes of R
• none of its subsets determines all attributes of R

• There could be many keys of a relation
• Student (UIN, email, dept, age)
• UIN  UIN, email, dept, age
• email  UIN, email, dept, age

• Superkey
• “Superset” of key
• a set of attributes that contains a key
• Any examples for student?

6

© 2018 A. Alawini & A. Parameswaran

Today!

•Closure of a set of functional dependencies
•Armstrong’s axioms

•Boyce-Codd Normal Form
•Third Normal Form

7

© 2018 A. Alawini & A. Parameswaran

Closure of a set of FDs

• Given a relation schema R & a set S of FDs
• Closure of S: S+ = all FDs logically implied by S
• Allows us to answer all questions of the type

• is the FD f logically implied by S?
• Example

• R = {A,B,C,G,H,I}
• S = A B; A C; CG  H; CG  I; B  H
• would A  H be logically implied?
• yes (you can prove this, using the definition of FD)

• How to compute S+?

8

© 2018 A. Alawini & A. Parameswaran

Computing S+

•To check if a specific A  B is true, we can compute A+
• We already have an algorithm for that

•To compute all A  B implied by S, i.e., to compute the
closure of S, we can use a particular algorithm.

• This algorithm depends on the so-called “Armstrong’s axioms”.
• These axioms form a complete description of FD rules

9

© 2018 A. Alawini & A. Parameswaran

Armstrong’s Axioms

•Reflexivity rule
•A1A2…An  a subset of A1A2…An

•Augmentation rule
•A1A2…An  B1B2…Bm, then

A1A2…An C1C2..Ck  B1B2…Bm C1C2…Ck

•Transitivity rule
•A1A2…An  B1B2…Bm and

B1B2…Bm C1C2…Ck, then
A1A2…An  C1C2…Ck

10

© 2018 A. Alawini & A. Parameswaran

Inferring S+ using Armstrong’s Axioms

• S+ = S
• Loop until S+ does not change any further

• For each f in S+, apply reflexivity & augmentation rules
• Add the new FDs to S+
• For each pair of FDs in S+, apply the transitivity rule
• Add the new FD to S+

This procedure could lead to LOTS of rules!!

• You have n attributes, with one FD valid, A  B
• Exponential in n. Why?

11

© 2018 A. Alawini & A. Parameswaran

Two Simple Rules using Armstrong’s
Axioms

• Armstrong’s axioms
• Reflexivity: A  subset of A
• Transitivity: A  B, B  C implies A  C
• Augmentation: A  B implies AC  BC

• Simple Rules
• Trivial rule – same as reflexivity
• Combining/splitting rule be expressed as well?

• X  Y and X  Z, then X YZ
• X  YZ then X Y

12

© 2018 A. Alawini & A. Parameswaran

Two Simple Rules in Armstrong’s Axioms

• Armstrong’s axioms
• Reflexivity: A  subset of A
• Transitivity: A  B, B  C implies A  C
• Augmentation: A  B implies AC  BC

• Simple Rules
• Trivial rule – same as reflexivity
• Combining rule be expressed as well?

• To show: X Y and X Z, then X YZ
• XZ YZ (aug) and X  XZ (aug)
• X  YZ (trans)

13

© 2018 A. Alawini & A. Parameswaran

Two Simple Rules in Armstrong’s Axioms

• Armstrong’s axioms
• Reflexivity: A  subset of A
• Transitivity: A  B, B  C implies A  C
• Augmentation: A  B implies AC  BC

• Simple Rules
• Trivial rule – same as reflexivity
• Splitting rule be expressed as well?

• To show: X YZ then X Y
• YZ Y (reflex); then XY (trans)

14

© 2018 A. Alawini & A. Parameswaran

Outline of Rules

• Two examples of rules
• Splitting/Combination
• Trivial Dependencies

• Attribute Closure
• Algorithm
• Uses

• FD Closure
• A complete set of rules:

• Armstrong’s axioms
• Algorithm

15

© 2018 A. Alawini & A. Parameswaran

But we were talking about schema design

•Armed with the concepts of “functional dependency” and
“superkey” (and tools to reason about them), we will now
define what a “good schema” is.

16

© 2018 A. Alawini & A. Parameswaran
17

Normal Forms

First Normal Form = all attributes are atomic
Second Normal Form (2NF) = old and obsolete

Boyce Codd Normal Form (BCNF)
Third Normal Form (3NF)
Fourth Normal Form (4NF)

Others…

© 2018 A. Alawini & A. Parameswaran 18

Boyce-Codd Normal Form

A simple condition for removing anomalies from relations:

A relation R is in BCNF if and only if:

Whenever there is a nontrivial FD: A1A2…An B,

then A1A2…An is a super-key for R.

In English (though a bit vague):

Whenever a set of attributes of R is determining another attribute,

it should determine all attributes of R.

© 2018 A. Alawini & A. Parameswaran 19

Boyce-Codd Normal Form

A simple condition for removing anomalies from relations:

A relation R is in BCNF if and only if:

Whenever there is a nontrivial FD: A1A2…An B,

then A1A2…An is a super-key for R.

Why does this make sense?

Say R(A, B, C) with AB as the key has an FD: AC.

Then C is being repeated for multiple Bs

© 2018 A. Alawini & A. Parameswaran 20

Example

Name SSN Phone Number

Fred 123-321-99 (201) 555-1234

Fred 123-321-99 (206) 572-4312

Joe 909-438-44 (908) 464-0028

Joe 909-438-44 (212) 555-4000

What are the dependencies?

SSN  Name

Is the left side a superkey?

No

Is it in BCNF?

No.

© 2018 A. Alawini & A. Parameswaran 21

Decompose it into BCNF

SSN Name

123-321-99 Fred

909-438-44 Joe

SSN Phone Number

123-321-99 (201) 555-1234

123-321-99 (206) 572-4312

909-438-44 (908) 464-0028

909-438-44 (212) 555-4000

SSN  Name

Now is it in BCNF?

© 2018 A. Alawini & A. Parameswaran 22

BCNF Decomposition

Find a dependency that violates the BCNF condition:

A , A , … A
1 2 n

B , B , … B
1 2 m

A’sOthers B’s

R2 R1

Heuristic : choose B , B , … B “as large as possible”1 2 m

Decompose:

Continue until

there are no

BCNF violations

left.

© 2018 A. Alawini & A. Parameswaran 23

Example Decomposition

Name SSN Age EyeColor PhoneNumber

Functional dependencies:

SSN  Name, Age, Eye Color

Person:

BCNF: Person1(SSN, Name, Age, EyeColor),

Person2(SSN, PhoneNumber)

© 2018 A. Alawini & A. Parameswaran

Example

Same example, slightly more complex.

Person (Name, SSN, Age, EyeColor, Phone, Draftworthy)

• FD 1: SSN  Name, Age, EyeColor
• FD 2: Age Draftworthy

24

© 2018 A. Alawini & A. Parameswaran

Example

• Person (Name, SSN, Age, EyeColor, Phone, Draftworthy)
• FD 1: SSN  Name, Age, EyeColor
• FD 2: Age Draftworthy

• FD 1 and 2 imply SSN  Name, Age, EyeColor, Draftworthy
• Split based on this

• (SSN, Name, Age, EyeColor, Draftworthy)
• (SSN, Phone Number)

• Split based on Age  Draftworthy
• (SSN, Name, Age, EyeColor)
• (Age, Draftworthy)
• (SSN, Phone Number)

25

© 2018 A. Alawini & A. Parameswaran

Example

•Movie (title, yr, length, genre, studioName, starName)
•(Title, year, starName) is a key
•FD: Title, year  length, genre, studioName

26

© 2018 A. Alawini & A. Parameswaran

Example

•Movie (title, yr, length, genre, studioName, starName)
•(Title, year, starName) is a key
•FD: Title, year  length, genre, studioName

• (Title, Year, Length, Genre, StudioName)
• (Title, Year, StarName)

27

© 2018 A. Alawini & A. Parameswaran

Example

•Movie (title, yr, studioName, President, PresAddr)
•FD: Title, yr studioName
•FD: studioName President
•FD: President  PresAddr

28

© 2018 A. Alawini & A. Parameswaran

Example

• Movie (title, yr, studioName, president, presAddr)
• FD: Title, yr studioName
• FD: studioName president
• FD: president  presAddr
(title, yr, studioName, president)

(president, presidentaddr)

(title, yr, studioName)

(studioName, president)

(president, presidentaddr)

29

© 2018 A. Alawini & A. Parameswaran

Two-attribute relations

• Let A and B be the only two attributes of R

• Claim: R is in BCNF. (See Example 3.17.)

• If A  B is true, B A is not:

• If B A is true, A  B is not:

• If A  B is true, B A is true:
30

© 2018 A. Alawini & A. Parameswaran

Two-attribute relations

• Let A and B be the only two attributes of R

• Claim: R is in BCNF. (See Example 3.17.)

• If A  B is true, B A is not:
• A  B does not violate BCNF

• If B A is true, A  B is not:
• Symmetric

• If A  B is true, B A is true:
• Both are keys, therefore neither violate BCNF

31

© 2018 A. Alawini & A. Parameswaran
32

BCNF Decomposition: The Algorithm

Input: relation R, set S of FDs over R

1) Check if R is in BCNF, if not:

a) pick a violation FD f: A  B

b) compute A+

c) create R1 = A+, R2 = A union (R – A+)

d) compute all FDs over R1, using R and S.
Repeat similarly for R2. (See Algorithm 3.12)

e) Repeat Step 1 for R1 and R2

2) Stop when all relations are BCNF, or are
two-attributes

(Remember, two attribute relations are always in BCNF)

© 2018 A. Alawini & A. Parameswaran
33

Q: Is BCNF Decomposition unique?

• R(SSN, netid, phone).
• FD1: SSN -> netid
• FD2: netid -> SSN
• Each of these two FDs violates BCNF.
Can you tell me two different BCNF decomp for R?

• Pick FD1 and decompose, you get:
• (SSN, netid); (SSN, phone).

• Pick FD2 and you get
• (netid, SSN); (netid, phone).

© 2018 A. Alawini & A. Parameswaran
34

Properties of BCNF

• Property 1. BCNF removes certain types of redundancies
• those caused by adding many-many or one-many relationships

• For examples of redundancy that it cannot remove, see “multi-valued
redundancy”

• But this is not in curriculum

© 2018 A. Alawini & A. Parameswaran
35

Properties of BCNF

• BCNF Decomposition avoids information loss
• You can construct the original relation instance from the decomposed

relations’ instances.

• How would get R(A, B, C) from R(A, B), R(B, C)?
• Ans: Natural join

© 2018 A. Alawini & A. Parameswaran

An easy decomposition?

• We saw that two-attribute relations are in BCNF.

• Why don’t we break any R(A,B,C,D,E) into R1(A,B); R2(B,C);
R3(C,D); R4(D,E)? Why bother with finding BCNF violations etc.?

• Turns out, this leads to information loss …

36

© 2018 A. Alawini & A. Parameswaran

Example of the “easy decomposition”
• R = (A,B,C); decomposed into R1(A,B); R2(B,C)

37

A B C

1 2 3

4 2 6

A B

1 2

4 2

B C

2 3

2 6

© 2018 A. Alawini & A. Parameswaran

Example of the “easy decomposition”
• R = (A,B,C); decomposed into R1(A,B); R2(B,C)

38

A B C

1 2 3

4 2 6

A B

1 2

4 2

B C

2 3

2 6

A B C

1 2 3

4 2 6

1 2 6

4 2 3

Nat.Join

We get back some “bogus tuples” !

© 2018 A. Alawini & A. Parameswaran 39

BCNF Decomposition is Lossless

Example R(A, B, C), A -> C

BCNF: R1(A,B), R2(A,C)

Some tuple (a,b,c) in R

decomposes into (a,b) in R1

and (a,c) in R2

Recover tuples in R: (a,b,c),

© 2018 A. Alawini & A. Parameswaran 40

BCNF Decomposition is Lossless

Example R(A, B, C), A -> C

BCNF: R1(A,B), R2(A,C)

Some tuple (a,b,c) in R (a,b’,c’) also in R

decomposes into (a,b) in R1 (a,b’) also in R1

and (a,c) in R2 (a,c’) also in R2

Recover tuples in R: (a,b,c),

© 2018 A. Alawini & A. Parameswaran 41

BCNF Decomposition is Lossless

Example R(A, B, C), A -> C

BCNF: R1(A,B), R2(A,C)

Some tuple (a,b,c) in R (a,b’,c’) also in R

decomposes into (a,b) in R1 (a,b’) also in R1

and (a,c) in R2 (a,c’) also in R2

Recover tuples in R: (a,b,c), (a,b,c’), (a,b’,c), (a,b’,c’) also in R

Is any of these a “bogus tuple” (not present in R)?

© 2018 A. Alawini & A. Parameswaran 42

BCNF Decomposition is Lossless

Example R(A, B, C), A -> C

BCNF: R1(A,B), R2(A,C)

Some tuple (a,b,c) in R (a,b’,c’) also in R

decomposes into (a,b) in R1 (a,b’) also in R1

and (a,c) in R2 (a,c’) also in R2

Recover tuples in R: (a,b,c), (a,b,c’), (a,b’,c), (a,b’,c’) also in R

Is any of these a “bogus tuple” (not present in R)?

No! Also see text 3.4.1 for proof. (Essentially an extension of this arg)

© 2018 A. Alawini & A. Parameswaran
43

Lossless Decompositions

A decomposition is lossless if we can recover:

R(A,B,C)

{ R1(A,B) , R2(A,C) }

R’(A,B,C) = R(A,B,C)

R’ is in general larger than R. Why?

Must ensure R’ = R

Decompose

Recover

© 2018 A. Alawini & A. Parameswaran

✔1) minimize redundancy
✔2) avoid info loss
3) preserve dependency
4) ensure good query performance

44

Desirable Properties of Schema Refinement

© 2018 A. Alawini & A. Parameswaran
45

However,

•BCNF is not always dependency preserving
•In fact, some times we cannot find a BCNF

decomposition that is dependency preserving

•Can handle this situation using 3NF
•But what is “dependency preserving”?

© 2018 A. Alawini & A. Parameswaran
46

Normal Forms

First Normal Form = all attributes are atomic
Second Normal Form (2NF) = old and obsolete

Boyce Codd Normal Form (BCNF)
Third Normal Form (3NF)
Fourth Normal Form (4NF)

Others…

© 2018 A. Alawini & A. Parameswaran 47

3NF: A Problem with BCNF

Phone Address Name

FD’s: Phone -> Address; Address, Name -> Phone

So, there is a BCNF violation (Phone Address), and we decompose.
Phone Address

Phone Name

Phone -> Address

No FDs

© 2018 A. Alawini & A. Parameswaran 48

So where’s the problem?
Phone Address Phone Name

1234 10 Downing 1234 John

5678 10 Downing 5678 John

FD’s: Phone-> Address; Address, Name -> Phone

No problem so far. All local FD’s are satisfied.

Let’s put all the data into a single table:

Phone Address Name

1234 10 Downing John

5678 10 Downing John

Violates the dependency: Address, Name  Phone