程序代写代做代考 C Java database Functional Dependencies algorithm CS157A: Introduction to Database Management Systems

CS157A: Introduction to Database Management Systems
Chapter 3. Design Theory for Relational Databases
1

Database Design
• Designingarelationaldatabaseschemaforan application.
• Ifadatabaseschemahasflaws,itwillcause problems referred to as “design anomalies”.
• Often,thedesignanomaliesarecausedbytoo much information is crammed into a single relation.
• “Dependencies” can imply what makes a good relational database schema and what we can do about a schema if it has flaws.
2

Design Anomalies
• Anomalies: problems caused by too much information is crammed into a single relation
– Redundancy – capturing info multiple times
– Update Anomalies – forget to update in a tuple
– Deletion Anomalies – deleting a tuple causes a loss of other information as a side effect
3

Solution: Normalization
The goal of normalization is to decompose a relation into several in a way that the decomposition will have
• Elimination of Anomalies
• Recoverability of Information • Preservation of Dependencies
4

Normalization
• 1NF2NF3NFBCNF4NF (optional)
• The higher NF implies the lower NF.
• These days, there is little use for 1NF and 2NF
• BCNF will be covered in detail.
• 3NF is more relaxed condition than BCNF to allow a relational schema that cannot be decomposed into BCNF relations without losing both the lossless-join and dependency –preservation properties
• The most commonly used and easiest normal form to get to that will give you a good database is known to be third normal form (3NF).
5

BCNF Normalization
Relation +
its functional dependencies
Relations in BCNF
normalization
6

Idea of Normalization
• Define BCNF in terms of FD and key.
• From a given mega relation, discover all true FD’s: closure algorithm
• Identify BCNF violations and decompose relations until no BCNF violation exists
7

Functional Dependencies • Definition
– In a relation R, a set of attributes A is said to functionally determine another set of attributes B (AB), if two tuples of R agree on A then they also agree on B.
– R satisfies a FD if the FD is true for every instance of R
• Implication
– Each A value is associated with precisely one B value.
– If the A value is known, then the B value corresponding to A can be determined by looking up in any tuple of R containing the A value.
8

Functional Dependencies
Suppose a relational schema is (A, B, C) and AB
A
B
C
a
b
c1
a
b
c2
9

Example: Functional Dependencies
Based on the knowledge of the real world:
Movies1(title, year, length, genre, studioName, starName)
title
year
length
genre
studioName
starName
Star Wars
1977
124
SciFi
Fox
Carrie Fisher
Star Wars
1977
124
SciFi
Fox
Mark Hamill
Star Wars
1977
124
SciFi
Fox
Harrison Ford
Gone With the Wind
1939
231
drama
MGM
Vivien Leigh
Wayne’s World
1992
95
comedy
Paramount
Dana Carvey
Wayne’s World
1992
95
comedy
Paramount
Mike Meyers
title year  length genre studioName (o) title year  starName (x)
10

Keys
• In a relation R with no duplicates, if Aall other attributes, then A is a key of R.
• Minimal key: no proper subset of A functionally determines all other attributes
• Super key: a set of attributes that contains a key.
11

Example: Keys
[Q] Is {title, year, starName} a key for Movie1 ?
[A] Yes. {title, year, starName} functionally determines all other attributes of Movie1.
[Q] Is the key minimal?
[A] Yes. No proper subset of the key can functionally determine all other attributes.
{title} {year} {starName} {title, year} {title, starName}{year, starName}
12

Functional Dependency Rules
With a given set of FDs, we can deduce other functional dependencies using following rules.
• Transitive rule
• Splitting rule
• Combining rule
• Trivial dependency rules (two of them)
13

Transitive Rule
Example: If we are told that a relation R (A, B, C) satisfies the FD’s AB and BC, we can deduce that R also satisfies the FD AC.
• Let (a, b1, c1) and (a, b2, c2) be two tuples that agree on attribute A.
• Since R satisfies A→B it follows that b1=b2 so the tuples are: (a, b, c1) and (a, b, c2)
• Similarly, since R satisfies B→C and the tuples agree on B they will agree also on C. So, c1=c2.
14

The Splitting/Combining Rule involving the right side of FDs
A1A2…An→B1 Combining Rule A1A2…An→B2
…A1A2…An→Bm Splitting Rule
A1A2…An→B1B2…Bm.
15

title yearlength title yeargenre title year
studioName
Combining Rule
title year
length genre
studioName
Example: The Splitting/Combining Rule involving the right side of FDs
Splitting Rule
16

Can we split the left side ?
[Q]
From, title yearlength, can we deduce titlelength (false FD)
yearlength (false FD) ?
[A] No
17

Trivial Functional Dependencies
• AfunctionaldependencyA1A2…An→BistrivialifBisa subset of A
Example: title year →title
• AfunctionaldependencyA1A2…An→B1B2…Bmis:
– Nontrivial if at least one of the B’s is not among the A’s.
– Completely nontrivial if A and B do not have any overlap.
Example: title year → year length is nontrivial but not completely nontrivial.
18

Trivial Dependency Rule
• We can always remove from the right side of a FD those attributes that appear on the left.
• Suppose AB. Then, AC
e.g) title, yearyear, length. Then, title, yearlength
AC
B
t u
19

Computing the Closure of Attributes
• The closure of a set of attributes {A1, A2, …, An}: {A1, A2, …, An}+
• Suppose {A1, A2, …, An} is a set of attributes and S is a set of FD’s. {A1, A2, …, An}+ under the dependencies in S is the set of attributes B, which are functionally determined by A1, A2, …, An
• That is, it finds A1A2…An→B that follows from S
• Since we allow trivial dependencies, A1, A2, …, An
are in {A1, A2, …, An}+.
20

Algorithm 3.7: Closure of a Set of Attributes
Input: A set of attributes {A1, A2, …, An} and a set of FD’s S Output: The closure {A1, A2, …, An}+
1 Let X be a set of attributes that eventually will become the closure. Initialize X to be {A1, A2, …, An}.
2 Now, repeatedly search for some FD in S: B1B2…Bm→C
such that all of B’s are in the set X, but C is not. We then add C to X.
3 Repeat step 2 as many times as necessary until no more attributes can be added to X. (Since X can only grow, and the number of attributes is finite, eventually nothing more can be added to X.)
4 The set X after no more attributes can be added to it is the: {A1, A2, …, An}+.
21

Example
• Let’s consider a relation with attributes A, B, C, D, E and F. Suppose that this relation satisfies the FD’s:
AB→C, BC→AD, D→E, CF→B. What is {A,B}+?
X = {A,B}
X = {A,B,C}
X = {A,B,C,D}
X = {A,B,C,D,E} {A,B}+.
Use: AB→C Use: BC→AD
Use: D→E
No more changes to X are possible so X =
• The FD: CF→B cannot be used because its left side is never contained in X.
22

Use of the closure of attributes
We can test whether any given functional dependency
A1A2…An→B follows from a set of dependencies S. • Compute {A1, A2, … , An}+ using the set of
dependencies S.
• IfB∈{A1,A2,…,An}+thentheFD:A1A2…An→Bdoes
follow from S.
• IfB∉{A1,A2,…,An}+thentheFD:A1A2…An→B doesn’t follow from S.
23

Example: Use of the closure of attributes
[Q] Consider the previous example. Test whether AB→D follows from the set of the dependencies.
[A] Yes since D∈ {A,B}+ = {A,B,C,D,E}. [Q] Consider the FD: D→A.
[A] No since A ∉ {D}+ = {D,E}. We say, D→A does not follow from the given set of dependencies.
X= {D} Use DE
X= {D, E} we have reached the closure.
24

The Transitive Rule
• If A1A2…AnB1B2…Bm and B1B2…Bm
C1C2…Cm , then A1A2…AnC1C2…Ck
• Prove this rule using the closure of attributes
1. With {A1A2…An } and two FDs A1A2…AnB1B2…Bm and B1B2…BmC1C2…Cm
2. {A1, A2, … , An}+ = {A1A2…An, B1B2…Bm, C1C2…Cm}
3. Therefore, A1A2…AnC1C2…Ck follows from the given FDs.
25

Example: The Transitive Rule
title
year
length
genre
studioName
studioAddr
Star Wars
1977
124
sciFi
Fox
Hollywood
Eight Below
2005
120
drama
Disney
Buena Vista
Wayne’s World
1992
95
comedy
Paramount
Hollywood
title yearstudioName studioName  studioAddr
Then, we can deduce a new FD based on the transitive rule.
title yearstudioAddr
26

Closures and Keys
• {A1, A2, … , An}+ is the set of all attributes of a relation if and only if {A1, A2, … , An} is a superkey for the relation.
• We can test if A1, A2, … , An is a minimal key for a relation by checking:
– first that {A1, A2, … , An}+ contains all attributes,
– and if any of attribute is removed from {A1, A2, … ,
An }, then {reduced set of attributes} + will not contain all the attributes.
27

Design of Relational Database Schemas
• The principal problem that we encounter is redundancy, where a fact is repeated in more than one tuple.
28

Example: relation with redundancy
Movie1 Relation
title
year
length
genre
studioName
starName
Star Wars
1977
124
SciFi
Fox
Carrie Fisher
Star Wars
1977
124
SciFi
Fox
Mark Hamill
Star Wars
1977
124
SciFi
Fox
Harrison Ford
Gone With the Wind
1939
231
drama
MGM
Vivien Leigh
Wayne’s World
1992
95
comedy
Paramount
Dana Carvey
Wayne’s World
1992
95
comedy
Paramount
Mike Meyers
29

Anomalies • Redundancy.
– capturing info multiple times unnecessarily. E.g. length and genre.
• Update anomalies.
– forget to update in a tuple
– E.g. we could change the length of Star Wars to 125, in the first tuple, and forget to do the same in the second and third tuple.
• Deletion anomalies.
– deleting a tuple causes a loss of other information as a side
effect
– E.g. if we delete Vivien Leigh. we will lose all the information about Gone With the Wind.
30

Decomposing Relations – Example
Movie2 Relation
title year Star Wars 1977 Gone With the Wind 1939
Wayne’s World 1992
title year Star Wars 1977
Star Wars 1977
Star Wars 1977 Gone With the Wind 1939
Wayne’s World 1992 Wayne’s World 1992
length 124 231
95
starName Carrie Fisher
Mark Hamill Harrison Ford Vivien Leigh Dana Carvey Mike Meyers
genre sciFi drama
comedy
studioName Fox
Disney
Paramount
Movie3 Relation
31

Decomposing Relations – Example
• No true redundancy!
• The update anomaly disappeared. If we change the length of a movie, it is done only once.
• The deletion anomaly disappeared. If we delete all the stars from Movie2 we still will have the other info for a movie.
32

Boyce-Codd Normal Form (BCNF)
• Thegoalofdecompositionistoreplacearelationby several that do not exhibit anomalies.
• ThereisasimpleconditioncalledBoyce-Codd Normal Form under which the anomalies can be guaranteed not to exist.
• A relation is in BCNF if and only if: whenever there is a nontrivial dependency A1A2…An→B1B2…Bm for R, it must be the case that {A1 ,A2 ,… , An} is a superkey for R.
• Thatis,theleftsideofeverynontrivialfunctional dependency must contain a key.
33

Example: BCNF
Relation Movie1 is not in BCNF.
• {title,year,starName}isakeyoftherelation.
• Consider the FD: title year→length genre studioName
• Theleftsideoftheabovedependencyisnotasuper key. In particular, we know that the title and the year does not functionally determine starName.
Movie2 is in BCNF.
• Theonlykeyis{title,year}and
• titleyear→lengthgenrestudioNameholdsinthe relation
34

Example: BCNF
A relation with two attributes is always in BCNF.
(a) If there is no non Trivial FDs, then it is in BCNF (b) AB, but not BA: A is the only key and the
left side of non-trivial FD AB contains A. (c) BA, but not AB: Symmetric to (b)
(d) AB and BA: both A and B are keys. AB contains a key (A) and BA contains a key(B) in their left sides, respectively.
35

Decomposition into BCNF
• Decomposition Strategy
– Find a non-trivial FD A1A2…An→B1B2…Bm that violates BCNF, i.e. A1A2…An is not a super key.
– Decompose the relation schema into two overlapping relation schemas:
– One is all the attributes involved in the violating FD and
– The other is the left side of the FD and all the other
attributes not involved in the FD.
– By repeatedly, choosing suitable decompositions, we can break any relation schema into a collection of smaller schemas in BCNF.
• The original relation should be able to be reconstructed from the decomposed relations.
36

Projecting Functional Dependencies
• It will be used to find FDs for the decomposed relations so that we can eventually check that the decomposed relations are in BCNF.
• Suppose a relation R with a set of FD’s S and R1 is a projection of R.
– What FDs hold for R1 ?
– The algorithm will find all FDs that follow from S and involve only attributes of R1
37

Algorithm 3.12: Projecting Functional Dependencies
Input: R1 and a set of FD’s on R
Output: a set of FD’s T that hold in R1
Method:
• Consider each subset X of attributes of R1.
• Compute X+ using the FD on R.
• At the end, throw out the attributes of R, which
aren’t in R1.
• Then, add to T all nontrivial FD’s XA such that A is
in X+ and A is an attribute of R1 • Construct a minimal basis of T.
38

Example: Projecting Functional Dependencies • Consider R(A, B, C, D, E) decomposed into R1(A, C, D) and
another relation. Let FDs of R be: AB, BC, CD.
• {A}+={A,B,C,D} T = {AC, A D}
• {C}+={C,D} T = {A C, A D, CD}
• {D}+={D} T = {A C, A D, CD}
Since {A}+includes all attributes, we don’t need to consider any superset of {A}.
• {C,D}+={C,D} CD C or CD D are trivial. Therefore, T won’t be changed.
• Based on the transitive rule, AD follows from AC and CD.
• T = {AC, CD} which is a minimal basis.
39

Some simplifications
• Don’t need to compute the closure of the empty set or of the set of all attributes because they never yield a non-trivial FD.
• If we find X + = all attributes, don’t bother computing the closure of any supersets of X.
40

Algorithm 3.20: BCNF Decomposition Algorithm
Input: A Relation R with a set of functional dependencies S Output: Decomposed relations in BCNF
The following steps can be applied recursively to any relation R and a set of FD’s S.
• Check if R is in BCNF. If so, return R as it is.
• If there are BCNF violations, let one be XY.
• Use Algorithm 3.7 to compute X+. The relation will be decomposed into R1 = X+ and R2 = X and the attributes that are not in X+.
• Use Algorithm 3.12 to project FD’s for R1 and R2. Let these be S1 and S2, respectively.
• Recursively decompose R1 and R2 using this algorithm. Return the union of the results of these decompositions.
41

Example: BCNF (continued)
• Consider a schema:
(title, year, studioName, president, presAddr)
and functional dependencies:
title year  studioName studioName  president president  presAddr
• To find BCNF violating FDs, you need to find keys of this relation. Compute {title, year}+, {studioName}+, {president}+ and see if you get all the attributes of the relation. If not, you got a BCNF violation, and need to break relation. You will find that {title, year} is the only key.
• Last two violate BCNF.
42

Example: BCNF (continued)
• Decomposition can start with any of these violating
FDs. Let’s starting with studioNamepresident
• Add to the right-hand side any other attributes in the closure of studioName (optional but often reduces the amount of work)
1. X={studioName} studioNamepresident
2. X={studioName, president} presidentpresAddr
3. X={studioName}+={studioName, president, presAddr}
43

Example: BCNF (continued)
The choice of FD is now studioNamepresident presAddr Therefore, the original relation is decomposed into
Movies1(title, year, studioName) Movies2(studioName, president, presAddr):
[Q1] Is Movies1 in BCNF ? [A] Yes
1. Using Algorithm 3.12, find a minimal basis of FDs that hold in
Movies1.
2. You will find that Movies1 has a basis title yearstudioName.
3. See if {title, year} is a key by finding its closure and see if the
closure includes all attributes of Movie1. {title, year} + = {title, year, studioName}
44

Example: BCNF (continued)
Movies2(studioName, president, presAddr)
[Q2] Is Movies2 in BCNF ?
[A] No
1. Using Algorithm 3.12, find a minimal basis of FDs that hold in Movies2.
2. You will find that Movies2 has bases studioNamepresident & presidentpresAddr 3. See if {studioName} and {president} are keys. {studioName} + = {studioName, present, presAddr}
{president} + = {president, presAddr} and thus it is not a key. We conclude presidentpresAddr is a BCNF voilation.
45

Example: BCNF (continued)
• WehavetodecomposeMovie2into Movie2-1 (president, presidentAddr) Movie2-2 (president, studioName)
Both of them are relations with 2 attributes and thus in BCNF.
• Finalresult
Movies1(title, year, studioName) Movie2-1 (president, presidentAddr) Movie2-2 (president, studioName)
46

3NF
• In some cases, it is not possible to decompose a relation into BCNF relations that have both the lossless- join and dependency-preservation properties.
• 3NF allows us to make a tradeoff between preserving dependencies and BNCF.
47

Example
• Bookings(title,theater,city)
• FD’s: theatercity, title citytheater
• Keys:{title,city}(theater,title}
• BCNF violation: theater city
• IfBookingsisdecomposedinto(theater,city)and (theater, title) to satisfy the BNCF condition, the functional dependency “title citytheater” cannot be preserved as shown below
theater
city
Guild
Menlo Park
Park
Menlo Park
theater
title
Guild
Antz
Park
Antz
theater
city
title
Guild
Menlo Park
Antz
Park
Menlo Park
Antz
48

Definition of 3NF
• A relation R is in 3NF if and only if for every non-
trivial FD A1A2…AnB, either
{A1, A2, …, An} is a super key or
B is an attribute in some key (prime).
• 3NF violation
A non-trivial FD XA violates 3NF if and only if X is
not a super key and A is not prime.
• Ifthereexistsa3NFviolation,decomposeRusing the synthesis algorithm presented in the next slide.
49

The Synthesis Algorithm for 3NF Schemas
• Input: A relation R and a set F of functional dependencies that hold for R
• Output: A decomposition of R into a collection of relations, each of which is in 3NF. The composition has the lossless-join and dependency-preservation properties
• Method:
1. Find a minimal basis for F, say G
2. For each functional dependency XA in G, use XA as the schema of one of the relations in the decomposition.
3. If none of the relation schemas from Step 2 is a super key for R, add another relation whose schema is a key for R.
50

Example
R(A, B, C, D) with FD’s ABC, CD, and DA
• There are 14 nontrivial dependencies. They are: CA, CD, DA, ABD, ABC, ACD, BCA, BCD, BDA, BDC, CDA, ABCD, ABDC, and BCDA.
• There are three keys. They AB, BC, and BD. Since all the attributes on the right sides of the FDs are prime, there are no 3NF violations.
• Since there are no 3NF violations, it is not necessary to decompose the relation.
51

Example R(A, B, C, D) with FD’s BC and BD
• There are 8 nontrivial dependencies. They are: BC, BD, ABC, ABD, BCD, BDC, ABCD and ABDC.
• AB is the only key. FDs where the left side is not a superkey and the attributes on the right are not part of some key (prime) are 3NF violations. The 3NF violations are BC, BD, BCD and BDC.
• Using the synthesis algorithm 3.26, we can decompose into relations using the minimal basis BC and BD. The resulting decomposed relations would be BC and BD.
• However, none of these two sets of attributes is a superkey. Thus we add relation AB to the result. The final set of decomposed relations is BC, BD and AB.
52

2NF
• Every non-key attribute is dependent on the key – the whole key for composite keys
• Not 2NF because name, role, and dID are dependent on the key (eID, skill) but they are also dependent on just eID.
eID
name
role
dID
skill
1
John
Developer
1
Java
1
John
Developer
1
Python
2
Dan
DBA
2
MySQL
• Solution – decompose skills table (like before) 53

1NF
• Each attribute is atomic.
• Not 1NF because skills has multiple values.
eID2
name
role
dID
skills
1
John
Developer
1
Java, Python
2
Dan
DBA
2
MySQL, Oracle
• Solution
eID
Skill
1
Java
1
Python
2
MySQL
2
Oracle
eID
name
role
dID
1
John
IT
1
2
Dan
DBA
2
54