File Structures
FUNDAMENTALS OF RELATIONAL
MODELLING
Database Theory and Applications (M)
Dr Chris Anagnostopoulos
CONCEPTUAL DATA MODELLING
Challenge: transform a textual description of a real problem into a set of
concepts conveying exactly the same information.
Approach: Entity-Relationship Modeling
Inventor: Prof P Chen; 1976
Does not guarantee optimality in operations and query
executions.
Approach: Relational Modeling
Inventor: Dr Edgar Codd; 1970
Mathematics-driven: foundation of relational algebra, set theory,
functional dependency theory.
Guarantees query optimization
9
Edgar F. Codd (1923-2003)
Peter Chen (1947)
CONCEPTUAL DATA MODEL
A mathematical model for interpreting our data
Why mathematical model: theorems from: set theory, functional
dependency & normalization theory, and relational algebra.
Why interpretation: need to understand the context
Which are the entities? e.g., bank accounts; students; employees …
Which are the attributes (characteristics) of a data entity? e.g.,
name; address; ID …
Which are the relationships between entities? e.g., an employee
works in a department; a student attends many courses?
10
RELATIONAL CONCEPTUAL MODEL
E.F. Codd; A Relational Model for Large Shared Data Banks,
Communications of the ACM, June 1970 (ACM Turing Award).
Informally, any entity might relate with any other entity when
they both share common attributes.
Edgar F. Codd (1923-2003)
• Name
• Age
• Book Title
• ID
• Book Title
• Subject
• #pages
• ISBN
11
RELATIONAL MODEL
Any entity and any relationship are modelled as a relation, which maps to a 2-
dimensional table:
an ordered set of attributes (columns);
a set of tuples (rows), which represents instances;
There exists a specific attribute that uniquely identifies a tuple in the
relation,
e.g., sequential numbers 1, 2, 3, …, or
e.g., logical values like the matriculation number/ID.
12
Attribute 1 Attribute 2 … Attribute n
ID1 Thomas C … active
ID2 Carolyn B … inactive
RELATIONAL MODEL
Example: being in the context of bank accounts:
Query: attributes of interest to be retrieved, and constrained attributes to filter
out irrelevant tuples.
Return the names of those customers with active accounts and balance > £500
SELECT name FROM BankAccount
WHERE balance > 500 AND status = ‘active’
# account name balance status
1234567 ‘Thomas C’ £1,000 ‘active’
7654321 ‘Carolyn B’ £2,300 ‘inactive’
attribute
tuple
unique value per tuple
Relation: BankAccount
13
RELATIONAL MODEL: FORMALISM
Schema of a Relation: R(A1, A2, …., An)
Relation with name R and an ordered set of attributes A1, A2, …., An
Each attribute Ai assumes values in a domain Di, i.e., Ai Di
e.g., BankAccount(account, name, balance, status)
#account N = {1, 2, 3, …}; natural numbers (positive integers)
name Varchar(50); character strings of maximum length 50
balance R; real numbers
status {‘active’, ‘inactive’}; finite domain / enumerated type
14
RELATIONAL MODEL: FORMALISM
A tuple t of R is an ordered set of values corresponding to attributes of R
satisfying the domain constraints:
t = (v1, v2, …, vn), vi Di
e.g., t = (1234567, ‘Thomas C’, 1000, ‘active’) or
t[account] = 1234567, t[name] = ‘Thomas C’, t[balance] = 1000, t[status]=‘active’
An instance r(R) is a set of tuples
r(R) = {t1, t2, …, tm}: ti is a tuple of R
15
RELATIONAL MODEL: FORMALISM
NULL: represents an unknown, or inapplicable, or uncertain, or missing value.
Relational Database Schema: set of relations
S = {R1, R2, …, Rk} U {NULL}
attribute
tuple
relation
NULL values with
different interpretation
16
[A1] RELATIONAL MODEL: COMPANY
A company is organized into departments. Each department has a unique
number, and a particular employee who manages the department.
We keep track of the start date that employee began managing the
department.
A department may have several locations.
A department controls a number of projects, each of which has a unique name,
number and a single location.
The company stores the information about the employee (e.g., name, salary,
birth date, …), and the unique social security number. An employee is assigned
only to one department, but may work on several projects, which are not
necessarily controlled by the same department.
We keep track of the current number of hours per week that an employee
works on each project and their direct supervisor, who is another employee.
The company keeps track of the dependent (e.g., child) of each employee for
insurance purposes and the corresponding relationship with the employee (e.g.,
son, daughter).
17
[A1] RELATIONAL MODEL: COMPANY
A company is organized into departments. Each department has a unique
number, and a particular employee who manages the department.
We keep track of the start date that employee began managing the
department.
A department may have several locations.
A department controls a number of projects, each of which has a unique name,
number and a single location.
The company stores the information about the employee (e.g., name, salary,
birth date, …), and the unique social security number. An employee is assigned
only to one department, but may work on several projects, which are not
necessarily controlled by the same department.
We keep track of the current number of hours per week that an employee
works on each project and their direct supervisor, who is another employee.
The company keeps track of the dependent (e.g., child) of each employee for
insurance purposes and the corresponding relationship with the employee (e.g.,
son, daughter).
18
[A1] RELATIONAL MODEL: COMPANY
Schema: Company
Entities:
Department
Employee
Project
Dependent
Relationships:
An employee manages a department.
A department may have several locations.
A department controls a number of projects.
An employee is assigned only to one department,
An employee may work on several projects (for each, store hours per week), which are not
necessarily controlled by the same department.
A department controls several projects
An employee is supervised by a supervisor, who is another employee.
An employee has several dependents, each one corresponding to a specific relationship with the
employee 19
RELATIONAL DATABASE SCHEMA: COMPANY
20
21
Task 1:
Mail a Christmas card to
John’s supervisor’s son.
• Which is his name?
• Which relations
should be joined for
this query?
Task 2:
How many hours per
week does Alicia’s
supervisor work?
• In which projects is
she involved?
• Which relations
should be joined for
this query?
RELATIONAL CONSTRAINTS
Conditions that must hold on all instances, for each relation.
Distinguish three fundamental constraints:
Key constraint (unique tuple identification)
Entity integrity constraint (keys are never null!)
Referential integrity constraint (interpretation of relationships)
22
KEY CONSTRAINT; E. CODD, 1970
Superkey (SK) of relation R is a set of attributes containing at least one attribute
that uniquely identifies any tuple.
For any two distinct tuples t1 and t2 r(R) it holds true the implication:
t1 ≠ t2 → t1[SK] ≠ t2[SK]
EMPLOYEE (SSN, Ename, Lname, Bdate, Salary, Dno)
{SSN, Ename, Bdate}
{SSN} (singleton)
{SSN, Ename}
{Ename, Salary}
23
KEY CONSTRAINT; E. CODD, 1970
Candidate Key is the minimal superkey; i.e., the set with the smallest number of
attributes that uniquely identify tuples.
Let K = {A1, …, Ak}, then:
K is candidate key ↔ K’ = K \ {Ai} is not a SK for any Ai K
i.e., the removal of any attribute from K results in K’ that is no longer a superkey.
{SSN, Ename} \ {Ename} = {SSN} is SK, thus {SSN, Ename} is not candidate key
{SSN, Ename, Lname} \ {Ename, Lname} = {SSN} is SK, thus {SSN, Ename, Lname}
is not candidate key
{SSN} is the minimal since {SSN}\{SSN} = { } is not a SK
Primary Key (PK): If a relation has several candidate keys, one is chosen
arbitrarily to be primary key; the rest candidate keys are called secondary keys.
24
[A2] EXAMPLE
• License number (plate) uniquely identifies a car;
• Engine serial number uniquely identifies a car;
Hence, there are two candidate keys:
• K1 = {LicenceNumber}
• K2 = {EngineSerialNumber}
A = {LicenceNumber, EngineSerialNumber}
What can we say for A? superkey or candidate key?
• A is not a candidate key since A \{LicenceNumber} = K2.
• Hence, A is a superkey & not a candidate key
Lesson Learnt: A composite set with unique attributes is not a candidate key.
Convention: the PK attributes are underlined in the relation schema.
25
Candidate key is the minimum
sufficient statistic that discriminates
any pair of tuples
ENTITY INTEGRITY CONSTRAINT; E. CODD, 1970
Principle: Primary Key (PK) cannot be NULL in any tuple of instance r(R).
t[PK] NULL for any tuple t in r(R)
If PK has several attributes, NULL is not allowed in any of these attributes
If {student-id, course-id} is a composite PK then:
student-id NULL and course-id NULL
Note: There might be non-key attributes which are not allowed to be NULL, e.g.,
Employee’s surname specified by the database designer, e.g., unique value.
26
REFERENTIAL INTEGRITY CONSTRAINT ; E. CODD, 1970
Roles: referencing relation R1 and referenced relation R2.
There exists an attribute Foreign Key (FK) in R1 that either has
exactly the same value with the Primary Key (PK) in R2 or is NULL.
t1[FK] references t2[PK] → t1[FK] = t2[PK] or t1[FK] = NULL
If t1[FK] = NULL then the FK in R1 should not be a part of its own
primary key (not violating the entity integrity constraint).
Notation: directed arc from R1.FK to R2.PK
27
R1 R2
REDESIGN: REFERENTIAL INTEGRITY CONSTRAINT
• Name
• Age
• Book Title
• ID
• Book Title
• Subject
• #pages
• ISBN
• Name
• Age
• ISBN
• ID
• Title
• Subject
• #pages
• ISBN
Referencing R1
(Reader)
Referenced R2
(Book)
28
FK
PK
PK
Can this be NULL?
Can this be NULL?
Can this be NULL?
Can this be NULL?
Can this be NULL?
29
SO FAR…
Superkey is used to uniquely identify a tuple in a relation.
Lemma: If K is a superkey, then any superset of K is a superkey.
Proof…left for exercise
Candidate Key is the minimal superkey: i.e., K is a candidate key if
there is no subset of K that is also a superkey.
A relation can have several different candidate keys; one of them is chosen
to be the Primary Key (PK).
Convention: key attributes are underlined in the relation schema.
A Foreign Key (FK) in a referencing relation should either be NULL (if it
is not part of the primary key) or have a value of the primary key of the
referenced relation.
30