Microsoft PowerPoint – 11- DatabaseDesign_SchemaRefinement
© 2018 A. Alawini & A. Parameswaran
Database Design:
Functional
Dependencies
Abdu Alawini
University of Illinois at Urbana-Champaign
CS411: Database Systems
October 8, 2018
1
© 2018 A. Alawini & A. Parameswaran
Announcements
•HW 2 is due Oct 15 (23:59)
•Project Track 1 – Stage 2
And
•Project Track 2 – Stage 1
are due on Wednesday (10/10)
•Midterm Oct, 29th in class (11-12:15)
•Final Dec, 12 in class (11-12:15)
2
© 2018 A. Alawini & A. Parameswaran
Construct an E-R diagram for a car-insurance company
whose customers own one or more cars each. Each car
has associated with it zero to any number of recorded
accidents.
ERD Exercise
3
© 2018 A. Alawini & A. Parameswaran
ERD Exercise Student Solution
4
© 2018 A. Alawini & A. Parameswaran
Various notations for “one-to-many”
5
0..1 1..*
zero..one one..many
maximum cardinalities only minimum and maximum
cardinalities
1 m
manyone
1 *
0-1 1+
0
© 2018 A. Alawini & A. Parameswaran
Various notations for “many-to-many”
6
1..* 1..*
one..many one..many
maximum cardinalities only minimum and maximum
cardinalities
manymany
1+ 1+
* *
m n
© 2018 A. Alawini & A. Parameswaran
•Conceptual design: (ER & UML Models are used
for this.)
• What are the entities and relationships we need?
•Logical design:
• Transform ER design to Relational Schema
•Schema Refinement: (Normalization)
• Check relational schema for redundancies and
related anomalies.
•Physical Database Design and Tuning:
• Consider typical workloads; (sometimes) modify the
database design; select file types and indexes.
Overview of Database Design
We’re here
7
© 2018 A. Alawini & A. Parameswaran
Agenda
•How do we obtain a good design?
•Functional Dependencies and Keys
•Rules about Functional Dependencies
• Splitting/Combination
• Trivial Dependencies
• Attribute Closure
• FD Closure
8
© 2018 A. Alawini & A. Parameswaran
Motivation
• We have designed ER diagram, and translated it into a relational db
schema R = set of R1, R2, … Now what?
• We can do the following
• implement R in SQL
• start using it
• However, R may not be well-designed, thus causing us a lot of problems
• OR: people may start without an ER diagram, and you need to reformat
the schema R
• Either way you may need to improve the schema
9
© 2018 A. Alawini & A. Parameswaran
Q: Is this a good design?
10 Green 123-321-99 (201) 555-1234
10 Green 123-321-99 (206) 572-4312
431 Purple 909-438-44 (908) 464-0028
Individuals with several phones:
Address SSN Phone Number
10
© 2018 A. Alawini & A. Parameswaran
Potential Problems
• Redundancy
• Update anomalies
• maybe we’ll update the address of the person with phone number ‘(206)
572-4312’ to something other than ‘10 Green’. Then there will be two
addresses for that person.
• Deletion anomalies
• delete the phone number of a person; if not careful then the address can
also disappear with it.
Address SSN Phone Number
10 Green 123-321-99 (201) 555-1234
10 Green 123-321-99 (206) 572-4312
431 Purple 909-438-44 (908) 464-0028
11
© 2018 A. Alawini & A. Parameswaran
Better Designs Exist
SSN Address
123-321-99 10 Green
909-438-44 431 Purple
SSN Phone Number
123-321-99 (201) 555-1234
123-321-99 (206) 572-4312
909-438-44 (908) 464-0028
Break the relation into two:
Unfortunately, this is not something you will detect even
if you did principled ER design and translation
12
© 2018 A. Alawini & A. Parameswaran
How do We Obtain a Good Design?
•Start with the original db schema R
• From ER translation or otherwise
•Transform it until we get a good design R*
• Some desirable properties for R*
• must preserve the information of R
• must have minimal amount of redundancy
• must be “dependency preserving”
• (we’ll come to this later)
• must also give good query performance
13
© 2018 A. Alawini & A. Parameswaran
How do We Obtain a Good Design?
14
© 2018 A. Alawini & A. Parameswaran
Normal Forms
•DB gurus have developed many “normal forms”
•These are basically schemas obeying certain rules
• Converting a schema that doesn’t obey rules to one that does is
called “normalization”
• This typically involves some kind of decomposition into smaller
tables, just like we saw earlier.
• (the opposite: grouping tables together, is called
“denormalization”)
15
© 2018 A. Alawini & A. Parameswaran
Normal Forms
•DB gurus have developed many “normal forms”
•Most important ones
• Boyce-Codd, 3rd, and 4th normal forms
•If R* is in one of these forms, then R* is guaranteed to
achieve certain good properties
• e.g., if R* is in Boyce-Codd NF, it is guaranteed to not have certain
types of redundancies
•DB gurus have also developed algorithms to transform R
into R* in these normal forms
16
© 2018 A. Alawini & A. Parameswaran
Normal Forms (cont.)
•There are also trade-offs among normal forms
•Thus, our goal is to:
• learn these forms
• transform R into R* in one of these forms
• carefully evaluate the trade-offs
•To understand these normal forms we’ll need to understand
certain types of constraints
• functional dependencies and keys
17
© 2018 A. Alawini & A. Parameswaran
Functional Dependencies
and Keys
18
© 2018 A. Alawini & A. Parameswaran
Functional Dependencies
• A form of constraint (hence, part of the schema)
• Finding them is part of the database design
• Used heavily in schema refinement
• Holds for ALL instances!
Definition:
If two tuples agree on the attributes
A , A , … A
1 2 n
then they must also agree on the attributes
B , B , … B 1 2 m
Formally:
A , A , … A
1 2 n
B , B , … B 1 2 m
Where have we seen
this before?
19
© 2018 A. Alawini & A. Parameswaran
Examples
•EmpID Name, Phone, Position
•Position Phone
•but Phone Position: why?
EmpID Name Phone Position
E0045 Smith 1234 Clerk
E1847 John 9876 Salesrep
E1111 Smith 9876 Salesrep
E9999 Mary 1234 Lawyer
20
© 2018 A. Alawini & A. Parameswaran
What a FD actually means
• Knowing FD: A B holds in R(A, B, C) means that
• For ALL valid instances R(A, B, C):
• A determines B
• Or, if two tuples share A, then they share the same B
• This is the property of the “world”
• Conversely, if: A / B, then there is no guarantee that the “A
determines B” property holds in a given instance (though it
might).
• Trivially, it holds when you have only one tuple.
21
© 2018 A. Alawini & A. Parameswaran
More examples
Product: name, manufacturer price
Person: ssn name, age
Company: name stock price, president
22
© 2018 A. Alawini & A. Parameswaran
Q: From this, can you conclude phone SSN?
SSN Phone Number
123-321-99 (201) 555-1234
123-321-99 (206) 572-4312
909-438-44 (908) 464-0028
909-438-44 (212) 555-4000
No, you cannot. Like we discussed, this is a property of
the world, not of the current data
In general, you cannot conclude from a given instance of
a table that an FD is true. An FD is not an observation
made from a table’s current tuples, it is an assertion that
must always be respected.
You can however check if a given FD is violated by the
table instance.
23
© 2018 A. Alawini & A. Parameswaran
Keys are a type of FD
• Key of a relation R is a set of attributes that
• functionally determines all attributes of R
• none of its subsets determines all attributes of R
• There could be many keys of a relation
• Student (UIN, email, dept, age)
• UIN UIN, email, dept, age
• email UIN, email, dept, age
• Superkey
• “Superset” of key
• a set of attributes that contains a key
• Any examples for student?
24
© 2018 A. Alawini & A. Parameswaran
Many many FDs…
• MovieInfo (name, year, actor, director, studio)
• Same movie can be remade multiple years, but a name, year pair
uniquely determines a movie
• A movie has a single director/studio but many actors
• Name, year director, studio
• Name, year director; Name, year studio
• Name, year / actor
• A director works only with a single studio
• Director studio
• An actor works on a given movie only once (never for remakes), but
may work for many movies in a year
• Actor, name year; actor, year / name
25
© 2018 A. Alawini & A. Parameswaran
Many many FDs…
•MovieInfo (name, year, actor, director, studio)
• Name, year director, studio
• Name, year director
• Name, year studio
• Director studio
• Actor, name year
• …
• Actor, name, year director, studio
• Director, actor, name studio, year
• Director, name, year studio
• Studio, actor, name year
• …
Any missing
FDs?
26
© 2018 A. Alawini & A. Parameswaran
Rules about Functional
Dependencies
27
© 2018 A. Alawini & A. Parameswaran
Outline of Rules
• Two examples of rules
• Splitting/Combination
• Trivial Dependencies
• Attribute Closure
• Algorithm
• Uses
• FD Closure
• A complete set of rules:
• Armstrong’s axioms
• Algorithm
28
© 2018 A. Alawini & A. Parameswaran
The Splitting/Combining Rule
•A1A2…An B1B2…Bm
•Equivalent to:
A1A2…An B1;
A1A2…An B2;
…
A1A2…An Bm
•Can replace one for the other.
29
© 2018 A. Alawini & A. Parameswaran
Trivial Functional Dependencies
•A1A2…An A1
•In general,
A1A2…An B1B2…Bm
if {B1B2…Bm} {A1A2…An}
Example: name, UIN UIN
Why does this make sense?
30
© 2018 A. Alawini & A. Parameswaran
Closure of a Set of Attributes
Given a set of attributes {A1, …, An} and a set of FDs S.
Problem: find all attributes B such that:
for all relations that satisfy S, they also satisfy:
A1, …, An B
The closure of {A1, …, An}, denoted {A1, …, An} ,
is the set of all such attributes B
We will discuss the motivations for attribute closures soon
+
31
© 2018 A. Alawini & A. Parameswaran
Algorithm to Compute Closure
Split the FDs in S so that every FD has a single attribute on the
right. (Simplify the FDs)
Start with X={A1A2…An}.
Repeat until X doesn’t change do:
If (B1B2…Bm C) is in S,
such that B1,B2,…Bm are in X and C is not in X:
add C to X.
// X is now the correct value of {A1A2…An}
+
Why does this algorithm converge?
32
© 2018 A. Alawini & A. Parameswaran
Example
A B C
A D E
B D
A F B
Closure of {A,B}:
Closure of {A, F}:
• Set of attributes A,B,C,D,E,F.
• Functional Dependencies:
33
© 2018 A. Alawini & A. Parameswaran
Example
A B C
A D E
B D
A F B
Closure of {A,B}: X = {A, B, C, D, E}
Closure of {A, F}: X = {A, F, B, D, C, E}
• Set of attributes A,B,C,D,E,F.
• Functional Dependencies:
34
© 2018 A. Alawini & A. Parameswaran
1. Decompose all FDs in F
sid -> name, crn -> cid, crn -> subj
2. We start with with the attribute crn in the initial
closure = {crn}
3. We look for all FDs that has crn, the 2nd FD (crn -> cid) has crn
closure = cid U closure = {crn, cid}
4. We look for all FDs that has cid or crn, the 3rd FD (crn -> subj) has
crn
closure = subj U closure = {crn, cid, subj}
Attribute Closure Exercise
35
Given:
Student_info(sid, name, crn, subj, cid, grade), and
F={sid ->name, crn -> cid, subj}, find A+ for A = {crn}
© 2018 A. Alawini & A. Parameswaran
Is this algorithm correct?
•Yes. See Text (Section 3.2.5) for proof.
•Two parts of proof:
• Anything determined to be part of {S}+ deserves to be there:
• soundness
• There’s nothing missing:
• completeness
36
© 2018 A. Alawini & A. Parameswaran
Uses for Attribute Closure
•Use 1: To test if X is a superkey
• How?
• compute X+, and check if X+ contains all attrs of R
•Use 2: To check if X Y holds
• How?
• by checking if Y is contained in X+
37
© 2018 A. Alawini & A. Parameswaran
An exercise
•Show that each of the following are not valid rules about
FD’s, by giving example relations that satisfy the given
FDs (following the “If”), but not the FD that allegedly
follows (after the “then”).
(1) If A –> B then B –> A
(2) If AB –> C and A –> C then B –> C.
•(1) A = SSN, B = Name
•(2) A = SSN, B = Phone, C = Name
38