Week 2 – Relational Data Model
FIT2094 Database
MONASH
INFORMATION
TECHNOLOGY
2
Overview
▪ Relational Model
▪ Relational Algebra
3
The Relational Model
▪ Introduced by CODD in 1970 – the fundamental basis for relational DBMS’s
▪ Basic structure is the mathematical concept of a RELATION mapped to the
‘concept’ of a table (tabular representation of relation)
– Relation – abstract object
– Table – pictorial representation
– Storage structure – “real thing” – eg. isam file
▪ Relational Model Terminology
– DOMAIN – set of atomic (indivisible) values
• specify
– name
– data type
– data format
▪ Examples:
– customer_number domain – 5 character string of the form xxxdd
– name domain – 20 character string
– address domain – 30 character string containing street, town & postcode
– credit_limit domain – money in the range $1,000 to $99,999
4
A Relation
▪ A relation consists of two parts
– heading
– body
▪ Relation Heading
– Also called Relational Schema consists of a fixed set of attributes
• R (A1,A2,…….An)
– R = relation name, Ai = attribute i
– Each attribute corresponds to one underlying domain:
• Customer relation heading:
– CUSTOMER (custno, custname, custadd, credlimit)
» dom(custno) = customer_number
» dom(custname) = name
» dom(custadd) = address
» dom(credlimit) = credit_limit
credlimitcustaddcustnamecustno
5
Relation Body
▪ Relation Body
– Also called Relation Instance (state)
• r(R) = {t1, t2, t3, …., tm}
• consists of a time-varying set of n-tuples
– Relation R consists of tuples t1, t2, t3 .. tm
– m = number of tuples = relation cardinality
• each n-tuple is an ordered list of n values
• t = < v1, v2, ....., vn>
– n = number of values in tuple (no of attributes) = relation degree
– In the tabular representation:
• Relation heading ⇨ column headings
• Relation body ⇨ set of data rows
credlimitcustaddcustnamecustno
SMITHSMI13 Wide Rd, Clayton, 3168 2000
JON44 JONES Narrow St, Clayton, 3168 10000
BRO23 BROWN Here Rd, Clayton, 3168 10000
6
Relation Properties
▪ No duplicate tuples
– by definition sets do not contain duplicate elements
• hence tuples are unique
▪ Tuples are unordered within a relation
– by definition sets are not ordered
• hence tuples can only be accessed by content
▪ No ordering of attributes within a tuple
– by definition sets are not ordered
7
Relation Properties cont’d
▪ Tuple values are atomic – cannot be divided
• EMPLOYEE(eid, ename, departno, dependants)
– not allowed: dependants (depname, depage)
multivalued
– hence no multivalued (repeating) attributes allowed,
called the first normal form rule
▪ COMPARE with tabular representation
• normally nothing to prevent duplicate rows
• rows are ordered
• columns are ordered
– tables and relations are not the same ‘thing’
9
surname firstname degree DOB
Black Sam BBIS 02-02-1996
Brown Jane BITS 01-01-1995
Chen Chan BITS 09-02-1996
Grey Maria BCS 15-12-1995
Indigo Jose BITS 28-10-1995
Black Jet BCS 13-05-1996
Chen Maria BITS 31-08-1995
▪ Functional Dependency:
– A set of attributes X functionally determines an attribute Y if, and only if,
for each X value, there is exactly one Y value in the relation. It is denoted
as X → Y.
▪ For example, given the data above:
– firstname, surname → degree
• but
– firstname → degree does not hold
– What about: degree → firstname, surname?
11
Relational Keys
▪ A candidate key K of a relation R is an attribute or set of
attributes which exhibits the following properties:
– No two tuples of R have the same value for K
(Uniqueness property)
– No proper subset of K has the uniqueness property
(Minimality or Irreducibility property)
▪ One candidate key is chosen to be the primary key of a
relation. Remaining candidate keys are termed alternate
keys.
▪ A superkey is an attribute or set of attributes which only
exhibits the uniqueness property
16
stu_no surname firstname degree DOB
1111 Black Sam BBIS 02-02-1996
1112 Brown Jane BITS 01-01-1995
1113 Chen Chan BITS 09-02-1996
1114 Grey Maria BCS 15-12-1995
1115 Indigo Jose BITS 28-10-1995
1116 Black Jet BCS 13-05-1996
1117 Chen Maria BBIS 31-08-1995
▪ A primary key must be chosen considering the data that may be added to the table in the
future
– Names, dates of birth etc are rarely unique and as such are not a good option
– PK should be free of ‘extra’ semantic meaning, preferably single attribute, preferably
numeric (see Table 5.3 Coronel & Morris)
– Natural vs Surrogate
Selection of a Primary key
17
18
Writing Relations
▪ Relations may be represented using the following
notation:
– relation_name(attribute1, attribute2,…)
▪ The primary key is underlined.
▪ Example:
– staff(staffid, surname, initials, address, phone)
19
Relational Database
▪ A relational database is a collection of normalised
relations.
▪ Normalisation is part of the design phase of the database
and will be discussed in a later lecture.
Example relational database:
order (order_id, orderdate,)
order-line (order_id, product_id, quantity)
product (product_id, description, unit_price)
20
Foreign Key (FK)
▪ An attribute/s in a table that exists in the same, or
another table as a Primary Key.
▪ Referential Integrity
– A Foreign Key value must either match the primary
key in another table or be NULL.
▪ The pairing of PK and FK creates relationships (logical
connections) between tables. Hence the abstraction
away from the underlying storage model.
23
Data Integrity
▪ Entity integrity
– Primary key values must be unique
– Primary key value must not be NULL.
▪ Referential integrity
– The values of FK must either match a value of the
PK in the related relation or be NULL.
▪ Column/Domain integrity
– All values in a given column must come from the
same domain (the same data type and range).
26
Relational DMLs
▪ Relational Calculus
▪ Relational Algebra
▪ Transform Oriented Languages (e.g. SQL)
▪ Graphical Languages
▪ Exhibit the “closure” property – queries on relations
produce relations
27
Relational Calculus
▪ Based on mathematical logic.
▪ Non-procedural.
▪ Primarily of theoretical importance.
▪ May be used as a yardstick for measuring the power of
other relational languages (“relational completeness”).
▪ Operators may be applied to any number of relations.
28
RELATIONAL ALGEBRA
Manipulation of relational data
29
Relational Algebra
▪ Relationally complete.
▪ Procedural.
▪ Operators only apply to at most two relations at a time.
▪ 8 basic operations:
– single relation: selection, projection
– cartesian product, join
– union
– intersection
– difference
– division
30
Relational Operation PROJECT
31
Relational Operation SELECT
32
Relational Operation Multiple Actions
1
2
33
SQL vs Relational Algebra in the Database
35
JOIN
▪ Join operator used to combine data from two or more
relations, based on a common attribute or attributes.
▪ Different types:
– theta-join
– equi-join
– natural join
– outer join
36
THETA JOIN (Generalised join)
(Relation_1) ⨝F (Relation_2)
– F is a predicate (i.e. truth-valued function) which is
of the form Relation_1.ai θ Relation2.bi
– θ is one of the standard arithmetic comparison
operators, i.e. <, ≤, =, ≥, >
– Most commonly, θ is equals (=)
39
NATURAL JOIN
ID Name
1 Alice
2 Bob
ID Subj Marks
1 1004 95
2 1045 55
1 1045 90
STUDENT MARK
STUDENT.
ID
Name MARK.ID Subj Marks
1 Alice 1 1004 95
1 Alice 2 1045 55
1 Alice 1 1045 90
2 Bob 1 1004 95
2 Bob 2 1045 55
2 Bob 1 1045 90
Step 1: STUDENT X MARK
Step 2: delete rows where IDs do not match (select =)
40
NATURAL JOIN
STUDENT.I
D
Name MARK.ID Subj Marks
1 Alice 1 1004 95
1 Alice 1 1045 90
2 Bob 2 1045 55
Step 1: STUDENT X MARK
Step 2: delete rows where IDs do not match (select =)
Step 3: delete duplicate columns (project away)
ID Name
1 Alice
2 Bob
ID Subj Marks
1 1004 95
2 1045 55
1 1045 90
STUDENT MARK
41
NATURAL JOIN
ID Name Subj Marks
1 Alice 1004 95
1 Alice 1045 90
2 Bob 1045 55
A natural join of STUDENT and MARK
Step 1: STUDENT X MARK
Step 2: delete rows where IDs do not match (select =)
Step 3: delete duplicate columns (project away)
ID Name
1 Alice
2 Bob
ID Subj Marks
1 1004 95
2 1045 55
1 1045 90
STUDENT MARK
⨝
44
OUTER JOIN
ID Name Subj Marks
1 Alice 1004 95
1 Alice 1045 90
2 Bob 1045 55
A natural join of STUDENT and MARK
No information for Chris and the student with ID 4
ID Name
1 Alice
2 Bob
3 Chris
ID Subj Marks
1 1004 95
2 1045 55
1 1045 90
4 1004 100
STUDENT MARK
⨝
45
FULL OUTER JOIN
ID Name Subj Marks
1 Alice 1004 95
1 Alice 1045 90
2 Bob 1045 55
3 Chris Null Null
4 Null 1004 100
A full outer join of STUDENT and MARK
Get (incomplete) information of both Chris and student with ID 4
ID Name
1 Alice
2 Bob
3 Chris
ID Subj Marks
1 1004 95
2 1045 55
1 1045 90
4 1004 100
STUDENT MARK
⟗
46
LEFT OUTER JOIN
ID Name Subj Marks
1 Alice 1004 95
1 Alice 1045 90
2 Bob 1045 55
3 Chris Null Null
A left outer join of STUDENT and MARK
Get (incomplete) information of only Chris
ID Name
1 Alice
2 Bob
3 Chris
ID Subj Marks
1 1004 95
2 1045 55
1 1045 90
4 1004 100
STUDENT MARK
⟕
47
RIGHT OUTER JOIN
ID Name Subj Marks
1 Alice 1004 95
1 Alice 1045 90
2 Bob 1045 55
4 Null 1004 100
A right outer join of STUDENT and MARK
Get (incomplete) information of the student with ID 4
ID Name
1 Alice
2 Bob
3 Chris
ID Subj Marks
1 1004 95
2 1045 55
1 1045 90
4 1004 100
STUDENT MARK
⟖