CS计算机代考程序代写 Functional Dependencies algorithm database Design Theory and Normalization

Design Theory and Normalization
CSC343
Mark Kazakevich
(with slides from Diane Horton, Jeff Ullman)

What’s the point?
● Most likely, you have some idea of how to make a schema for a data set
● But how do you know it’s a good schema?
● How do you make it better?
● Is there an ‘optimal’ schema for your data?

Database ‘Design’ Theory
● Improve your (likely non-optimal) schema systematically
● Try to get a schema that is in a ‘normal form’ that guarantees good properties.
○ ‘Normal’ in the sense of conforming to a standard.
● The process of converting a schema to a normal form is called normalization.

Database ‘Design’ Theory
● General idea:
○ Express constraints on the relationships between
attributes
○ Use these constraints to decompose the
relations
● We will talk about how to put these into practise

What are these ‘constraints’?
● We’ve defined constraints before like keys, foreign keys, etc. based on our knowledge of the problem domain.
● We can define other domain-specific constraints on the data that can help us design better schemas
○ These constraints are called functional dependencies

● A functional dependency (FD) on a relation R is a statement of the form:
If two tuples of R agree on all of the attributes A1,A2,…,An
then they must also agree on all of another list of attributes B1,B2,…,Bm
● We write this FD formally as
A1,A2,…,An B1,B2,…,Bm
and we say:
“A1,A2,…,An functionally determines B1,B2,…,Bm”
G-M, Ullman, Widom

Visual representation
Note: it is not a requirement that the attributes B have to follow A
directly – they can be anywhere
G-M, Ullman, Widom

Example
What is a functional dependency here? (think about the domain)
G-M, Ullman, Widom

Example
title year -> length genre studioName
What assumptions did we make?
G-M, Ullman, Widom

Example
What is not a functional dependency here?
G-M, Ullman, Widom

Coincidence or FD?
● An FD is an assertion about every instance of the relation.
● You can’t know it holds just by looking at one instance.
● You must use knowledge of the domain to determine whether an FD holds.

FDs are closely related to keys
● Suppose K is a set of attributes for relation R. ● Our earlier definition of superkey:
○ a set of attributes for which no two rows can have the same values.
● Using FDs
○ K is a superkey for R iff
K functionally determines all of R.

FDs are a generalization of keys

FD Properties

Trivial FDs
● Suppose A1,A2…An -> B1,B2…Bn holds ● Trivial FDs are those where the right side is a
subset of the left side, that is, {B1,B2…Bn} ⊆ {A1,A2…An}
Examples:
● title -> title
● title year -> title ● title year -> {}

Splitting Rules
● An FD such as A -> BCD can be split up into a set of equivalent FDs:
○ A -> B
○ A -> C
○ A -> D
● We are ‘splitting’ the Right Hand Side of the FD
(can also combine the three into the original) ○ Can we split the left hand side?
■ No. (Just like we can’t split a key)

Inferring FDs
● Given a set of FDs, we can often infer further FDs.
● This will be handy when we apply FDs to the problem of database design.
● Big task: given a set of FDs,
○ infer every other FD that must also hold.
● Simpler task: given a set of FDs,
○ check whether a given FD must also hold.

What we’d like to be able to find out:
● If A -> B and B -> C hold, must A -> C hold?
● If A -> H,C -> F,andFG -> ADhold, must FA -> D hold?
must CG -> FH hold?
● IfH -> GD,HD -> CE,andBD -> Ahold,
must EH -> C hold?
● Note: we are not generating new FDs,
but testing a specific possible one.

How do we figure them out?
● Method 1 (first principles):
○ Refer to FDs you know and use definition
of functional dependency ○ This can get complicated…
● Method 2
○ TheClosureTest
○ Systematic, and much easier!

Closure { }+
Suppose {A1,A2,…,An} is a set of attributes
and S is a set of FDs
● The closure of {A1,A2,…,An} under the FD’s in S is the set of attributes B such that: ○ Every relation that satisfies all the FD’s in
set S also satisfies A1A2..An -> B.
● We denote the closure of a set of attributes A1A2..An by {A1,A2,..An}+.
G-M, Ullman, Widom

Closure Algorithm
● Find Y+, the closure of a set of attributes Y under a set of FDs S.
Attribute_closure(Y, S):
Initialize Y+ to Y; split RHS’s of FDs if necessary Repeat until no more changes occur:
If there is an FD LHS->RHS in S
such that LHS is in Y+:
Return Y+
Add RHS to Y+

Example
● Relation R(A,B,C,D,E,F) Y = {A,B}
● FDs: AB->C, BC->AD, D->E, CF->B ● What is {A, B}+ (the closure of {A, B})
Y+ = {A,B}; split BC->AD to BC->A, BC->D
LHS of AB->C in Y+, so Y+={A,B,C}
LHS of BC->A and BC->D in Y+, so Y+={A,B,C,D} LHS of D->E in X, so Y+={A,B,C,D,E}
(cannot use CF->B,since F not in Y+}
So we’re done, {A, B}+ = {A,B,C,D,E}
G-M, Ullman, Widom

Closure Test
● S is a set of FDs; LHS -> RHS is a single FD.
○ Return true iff LHS -> RHS follows from S.
FD_follows(S, LHS -> RHS):
Y+ = Attribute_closure(LHS, S)
return (RHS is in Y+)

Property that follows:
● Transitive Rule
○ (A -> B andB -> C)impliesA -> C ○ Check this using closure test
■ C would appear in A+
G-M, Ullman, Widom

FD Worksheet Q 1-4

Projecting FDs

Projecting FDs
● Later, we will learn how to normalize a schema by decomposing relations into smaller ones
○ This is the whole point of this theory.
● We will need to know what FDs hold in the new, smaller relations.
○ We must project our FDs onto the
attributes of our new relations.

Projecting FDs Relation R(A1..An), Set of attributes: A
Sets of attributesBandCsuch thatBU C= A

Projecting FDs
R1 = π B (R) R2 = π C (R)
R1 ⋈ R2 = R
Question: What FDs hold in the new relation

Projecting FDs
Question: If a set of FDs S hold in R,
what FDs hold in the new relations R1 and R2?

Projection Algorithm
● S is a set of FDs;L is a set of attributes.
● Return T, the projection of S onto L,
that is, all FDs that follow from S and involve only attributes from L.
Project(S, L):
Initialize T to {}. For each subset X of L:
Compute X+ # Close X and see what we get. For every attribute A in X+:
If A is in L: # X -> A is only relevant if A # is in L (we know X is).
add X -> A to T.
Return T.

Board Example: FD Worksheet Q5, part a

Some Projection algorithm speed-ups
1. No need to addX->AifA is in X itself. It’s a trivial FD.
2. These subsets of X won’t yield a nontrivial FD, so no need to compute their closures:
○ the empty set
○ the set of all attributes in X
Neither are big savings, but …

A big speed-up
● If we find X+ = all attributes, we can ignore any superset of X.
● It can only give use “weaker” FDs (with more on the LHS).
This is a big time saver!

FD Worksheet Q5 part b

Projection/MB Worksheet Q1

Minimal Basis


Minimal basis
We saw earlier that we can very often rewrite sets
of FDs in equivalent ways.
○ Example: S1 = {A -> BC} is equivalent
to S2 = {A->B, A->C}.
Given a set of FDs S, we may want to find a minimal basis: A set of FDs that is equivalent, but has
○ no redundant FDs, and
○ no FDs with unnecessary attributes on the LHS.
○ All RHS will be single attributes

Minimal Basis
● S is a set of FDs. Return a minimal basis for S.
Minimal_basis(S):
1. Split the RHS of each FD
2. For each FD X->Y where |X| ≥ 2:
If you can remove an attribute from X
(use closures!) and get an FD that
follows from S:
Do so! (It’s a stronger FD.)
3. For each FD f :
If S – {f} implies f (use closures!):
Remove f from S.

Board Example:
Projection/Minimal Basis Worksheet Q2

Some comments on Minimal Basis
● Often there are multiple possible results.
○ Depends on the order in which you consider the
possible simplifications.
● After you identify a redundant FD,
you must not use it when computing subsequent closures.

and less intuitive..
When you are computing closures to decide whether the LHS of an FD
X -> Y
can be simplified, continue to use that FD.