Functional Dependencies – Part 2
Definition and Identification
Codd and Functional Dependencies
Functional dependencies (FDs) were introduced by Codd in 1971 1
Edgar F. Codd of IBM Research (1923-2003) invented the relational data
model for data management in 1970.
He received the ACM Turing Award in 1981 for his contributions on the
theoretical foundations of relational databases:
Functional dependencies
Normalization
– Boyce–Codd Normal Form (BCNF)
Query languages
– Relational Calculus
– Relational Algebra
1
Further Normalization of the Data Base Relational Model. E. F. Codd, IBM Research Report, San Jose, California,
1971.
Why Functional Dependencies?
We need some formal way of analysing whether a database schema is
well-designed, or why one is better than another.
FDs are developed to define the goodness and badness of (relational)
database design in a formal way.
Top down: start with a relation schema and FDs, and produce smaller
relation schemas in certain normal form (called normalisation).
Bottom up: start with attributes and FDs, and produce relation
schemas (not popular in practice).
FDs tell us “relationship between and among attributes”!
Functional Dependencies – Informal Description
We have two FDs on ENROLMENT:
ENROLMENT
Name StudentID DoB CourseNo Semester Unit
Tom 123456 25/01/1988 COMP2400 2010 S2 6
Tom 123456 25/01/1988 COMP8740 2011 S2 12
Michael 123458 21/04/1985 COMP2400 2009 S2 6
Michael 123458 21/04/1985 COMP8740 2011 S2 12
Fran 123457 11/09/1987 COMP2400 2009 S2 6
StudentID functionally determines Name and DoB, i.e.,
{StudentID} → {Name, DoB}
CourseNo functionally determines Unit, i.e.,
{CourseNo} → {Unit}
Functional Dependencies – Informal Description
A FD says that, within a relation, the values of some attributes determine
the values of other attributes.
Animal → Legs
Ostrich 2
Wombat 4
If attributes A,B,C determine attributes D,E , then we write
{A,B,C} → {D,E}
This means, if two tuples have the same values for A,B and C, then
they must also have the same values for D and E .
A,B and C are the determinant, while D and E are the dependent.
Formal Definition
Let R be a relation schema.
A FD on R is an expression X → Y with attribute sets X ,Y ⊆ R.
A relation r(R) satisfies X → Y on R if, for any two tuples
t1, t2 ∈ r(R), whenever the tuples t1 and t2 coincide on values of X ,
they also coincide on values of Y .
t1[X ] = t2[X ]
⇓
t1[Y ] = t2[Y ]
A FD is trivial if it can always be satisfied, e.g.,
{A,B,C} → {C}
{A,B,C} → {A,B}
Syntactical convention: (1) Instead of {A,B,C}, we may use ABC. (2)
A,B, . . . for individual attributes and X ,Y , . . . for sets of attributes.
Exercise – Functional Dependencies on Relations
Consider the following relations with attributes {A,B,C,D,E}. Do they satisfy:
(1) AB → E ; (2) C → DE ;
r1(R)
A B C D E
1 4 1 9 4
1 4 2 8 9
1 4 3 8 9
r2(R)
A B C D E
1 3 1 3 8
1 3 2 4 8
1 2 2 4 9
Check:
r1(R) r2(R)
(1) AB → E no yes
(2) C → DE yes no
How to Identify FDs in General?
A functional dependency specifies a constraint on the relation schema that
must hold at all times.
In real-life applications, we often use the following approaches:
(1) Analyse data requirements
Can be provided in the form of discussion with application users
and/or data requirement specifications.
(2) Analyse sample data
Useful when application users are unavailable for consultation and/or
the document is incomplete.
(1) Identifying FDs – Analyse Data Requirements
Consider the following relation schema:
RENTAL={CustID,CustName,PropertyNo,DateStart,Owner} .
Data requirements:
1 Each customer can be uniquely identified by his or her customer ID.
{CustID} → {CustName}
2 A customer cannot rent two or more properties from the same date.
{CustID, DateStart} → {PropertyNo}
3 A customer cannot rent the same property more than once.
{PropertyNo, CustID} → {DateStart}
4 Each property can be uniquely identified by its owner.
{Owner} → {PropertyNo}
(2) Identifying FDs – Analyse Sample Data
Can you find some FDs on ENROLMENT based on the sample data?
ENROLMENT
Name StudentID DoB CourseNo Semester Unit
Tom 123456 25/01/1988 COMP2400 2010 S2 6
Tom 123456 25/01/1988 COMP8740 2011 S2 12
Michael 123458 21/04/1985 COMP2400 2009 S2 6
Michael 123458 21/04/1985 COMP8740 2011 S2 12
Fran 123457 11/09/1987 COMP2400 2009 S2 6
We may have:
{StudentID} → {Name, DoB};
{CourseNo} → {Unit};
{StudentID, CourseNo, Semester} → {Name, DoB, Unit};
{Name} → {StudentID} ×;
{DoB} → {StudentID} ×;
……
Limitations: Sample data needs to be a true
representation of all possible values that the
database may hold.