The Relational Database Model
MONASH
INFORMATION TECHNOLOGY
Overview
We now have a conceptual model for Monash Software, it is time to move to the second stage and map this to a logical model
For our unit this will involve mapping to the Relational Model in preparation for implementation in a RDBMS
▪ Relational Model
▪ RelationalAlgebra
2
The Relational Model
▪ IntroducedbyCODDin1970-thefundamentalbasisfortherelationalDBMS
▪ BasicstructureisthemathematicalconceptofaRELATIONmappedtothe
‘concept’ of a table (tabular representation of relation)
– Relation – abstract object
– Table – pictorial representation
– Storage structure – “real thing” – eg. isam file of 1’s and 0’s
▪ RelationalModelTerminology
– DOMAIN – set of atomic (indivisible) values
• specify
– name
– data type
– data format ▪ Examples:
– customer_number domain – 5 character string of the form xxxdd
– name domain – 20 character string
– address domain – 30 character string containing street, town & postcode
– credit_limit domain – money in the range $1,000 to $99,999
3
A Relation
▪ Arelationconsistsoftwoparts – heading
– body
▪ RelationHeading
– Also called Relational Schema consists of a fixed set of attributes • R(A1,A2,…….An)
– R = relation name, Ai = attribute i
– Each attribute corresponds to one underlying domain:
• Customerrelationheading:
– CUSTOMER (custno, custname, custadd, credlimit)
» dom(custno) = customer_number » dom(custname) = name
» dom(custadd) = address
» dom(credlimit) = credit_limit
custno
custname
custadd
credlimit
4
Relation Body
▪ RelationBody
– Also called Relation Instance (state of the relation at any point in time)
• r(R)={t1,t2,t3,….,tm}
• consistsofatime-varyingsetofn-tuples
– Relation R consists of tuples t1, t2, t3 .. tm
– m = number of tuples = relation cardinality • eachn-tupleisanorderedlistofnvalues
• t=
– n = number of values in tuple (no of attributes) = relation degree
– In the tabular representation:
• Relationheading⇨columnheadings
• Relationbody⇨setofdatarows
custno
custname
SMI13
JON44 BRO23
BROWN
Here Rd, Clayton, 3168
2000
10000
10000
SMITH
JONES
custadd
Wide Rd, Clayton, 3168
Narrow St, Clayton, 3168
credlimit
5
Relation Properties
▪ No duplicate tuples
– bydefinitionsetsdonotcontainduplicateelements
• hence tuples must be unique ▪ Tuples are unordered within a relation
– bydefinitionsetsarenotordered
• hence tuples can only be accessed by content
▪ No ordering of attributes within a tuple – bydefinitionsetsarenotordered
6
Relation Properties cont’d
▪ Tuple values are atomic – cannot be divided
• EMPLOYEE (eid, ename, departno, dependants)
– not allowed: dependants (depname, depage) multivalued
– hencenomultivalued(repeating)attributesallowed, called the first normal form rule
▪ COMPARE with tabular representation
• normally nothing to prevent duplicate rows • rows are ordered
• columns are ordered
– tablesandrelationsarenotthesame’thing’
7
Functional Dependency
▪ Functional Dependency:
– A set of attributes A functionally determines an attribute B if, and only if,
for each A value, there is exactly one value of B in the relation. It is denoted as A → B (A determines B, or B depends on A)
• order_no → order_date
• prod_no → prod_desc
• order_no, prod_no → qty_ordered
9
Relational Model Keys
▪ A superkey of a relation R is an attribute or set of attributes which exhibits only the uniqueness property
– No two tuples of R have the same value for the superkey (Uniqueness property)
– t1[superkey] ≠ t2[superkey]
▪ A candidate key CK of a relation R is an attribute or set of attributes
which exhibits the following properties:
– Uniqueness property (as above), and
– No proper subset of CK has the uniqueness property
(Minimality or Irreducibility property) ie. a minimal superkey
▪ One candidate key is chosen to be the primary key (PK) of a relation. Remaining candidate keys are termed alternate keys (AK).
Only ONE primary key
(may be composed of
many attributes – a
composite primary key) 10
Many possible superkeys
Potentially many possible candidate keys
Selection of a Primary key
▪ A primary key must be chosen considering the data that may be added to the table in the future
– Names, dates of birth etc are rarely unique and as such are not a good option
– PK should be free of ‘extra’ semantic meaning and security compliant, preferably a single attribute, preferably numeric (see Table 5.3 Coronel & Morris)
– Natural vs Surrogate primary key
• PATIENT_TREATMENT (patient_id, physician_id, treatment_code, pt_date, pt_time, pt_result)
– Superkey
– CK
– PK
– Issues with PK?
14
15
Null in the Relational Model
▪ NULL is NOT a value – is a representation of the fact that there is NO VALUE
▪ Reasons for a NULL:
– VALUE NOT APPLICABLE –
• EMP relation – empno, deptno, salary, commission – commission only applies to staff in sales dept
– VALUE UNKNOWN –
• Joe’s salary is NULL, Joe’s salary is currently unknown
– VALUE DOES NOT EXIST –
• Tax File Number – is applicable to all employees but Joe may not
have a number at this time
– VALUE UNDEFINED –
• Certain items explicitly undefined eg. divide by zero
– Columns Number_of_payments, Total_payments – ColumnAverage_payment_made
– If Number_of_payments = 0 => Average undefined
16
Writing Relations
▪ Relations may be represented using the following notation:
– RELATION_NAME(attribute1,attribute2,…)
▪ The primary key is underlined.
▪ Example:
– STAFF(staffid,surname,initials,address,phone)
17
Relational Database
▪ A relational database is a collection of normalised relations.
▪ Normalisation is part of the design phase of the database and will be discussed in a later lecture.
Example relational database:
ORDER (order_id, orderdate,)
ORDER_LINE (order_id, product_id, quantity) PRODUCT (product_id, description, unit_price)
18
Foreign Key (FK)
▪ FK: An attribute/s in a relation that exists in the same, or another relation as a Primary Key.
▪ Referential Integrity
– AForeignKeyvaluemusteithermatchthefull
primary key in a relation or be NULL.
▪ The pairing of PK and FK creates relationships (logical connections) between tables when implemented in a RDBMS. Hence the abstraction away from the underlying storage model.
19
Data Integrity
▪ Entity integrity
– PrimarykeyvaluemustnotbeNULL.
• No duplicate tuple property then ensures that each primary key must be unique
▪ Referential integrity
– ThevaluesofFKmusteithermatchavalueofafull PK in the related relation or be NULL.
▪ Column/Domain integrity
– Allvaluesinagivencolumnmustcomefromthe same domain (the same data type and range).
22
Relational DMLs
▪ Relational Calculus
▪ Relational Algebra
▪ Transform Oriented Languages (e.g. SQL)
▪ Graphical Languages
▪ Exhibit the “closure” property – queries on relations produce relations
24
Relational Calculus
▪ Based on mathematical logic.
▪ Non-procedural.
▪ Primarily of theoretical importance.
▪ May be used as a yardstick for measuring the power of other relational languages (“relational completeness”).
▪ Operators may be applied to any number of relations.
25
RELATIONAL ALGEBRA
Manipulation of relational data
26
Relational Algebra
▪ Relationally complete.
▪ Procedural.
▪ Operators only apply to at most two relations at a time. ▪ 8 basic operations:
– single relation: selection, projection
– cartesianproduct,join
– union
– intersection
– difference
– division
27
Relational Operation PROJECT
28
Relational Operation SELECT
29
Relational Operation Multiple Actions 2
1
30
SQL vs Relational Algebra in the Database
31
JOIN
▪ Join operator used to combine data from two or more relations, based on a common attribute or attributes.
▪ Different types: – theta-join
– equi-join
– naturaljoin – outerjoin
33
THETA JOIN (Generalised join)
(Relation_1) ⨝F (Relation_2)
– Fisapredicate(i.e.truth-valuedfunction)whichis
of the form Relation_1.ai θ Relation2.bi
• CUSTOMER.cust_no θ ORDER.cust_no
– θisoneofthestandardarithmeticcomparison operators, i.e. <, ≤, =, ≥, >
– Mostcommonly,θisequals(=),butcanbeanyof the operators
• EMPLOYEE.emp_sal > SALARYSCALE.step_5
34
NATURAL JOIN
STUDENT MARK
ID Name ID Subj Marks
1Alice 1100495 2Bob 2104555
Step 1: STUDENT X MARK
Step 2: delete rows where IDs do not match (select =)
1 1045 90
STUDENT. Name ID
1 Alice 1 Alice
1 Alice
2 Bob
2 Bob 2 Bob
MARK.ID Subj Marks
1 1004 95
2 1045 55
1 1045 90
1 1004 95
2 1045 55
1 1045 90
37
NATURAL JOIN
STUDENT MARK
ID Name ID Subj Marks
1Alice 1100495 2Bob 2104555
Step 1: STUDENT X MARK
Step 2: delete rows where IDs do not match (select =) Step 3: delete duplicate columns (project away)
1
1045 90
STUDENT.I Name D
MARK.ID Subj Marks
1 1004 95 1 1045 90
1 Alice
1 Alice
2 Bob 2 1045 55
38
NATURAL JOIN
STUDENT MARK
ID Name ID Subj Marks
1 Alice ⨝1 100495 2Bob 2104555
Step 1: STUDENT X MARK
Step 2: delete rows where IDs do not match (select =) Step 3: delete duplicate columns (project away)
1
1045 90
Marks 95
90
55
ID 1 1 2
Name Subj Alice 1004 Alice 1045 Bob 1045
A natural join of STUDENT and MARK
39
STUDENT
ID Name
1 Alice
2 Bob
3 Chris
⨝
MARK
ID Subj Marks
1 1004 95
2 1045 55
1 1045 90 4 1004 100
OUTER JOIN
No information for Chris (no mark, e.g. just enrolled) and the student with ID 4 (no student, e.g. quit uni)
ID 1 1 2
Name Subj Marks Alice 1004 95 Alice 1045 90 Bob 1045 55
A natural join of STUDENT and MARK
41
STUDENT ID
1 2 3
Name Alice Bob Chris
MARK
ID Subj Marks
⟗
FULL OUTER JOIN
Get (incomplete) information of both Chris and student with ID 4
ID Name 1 Alice
1 Alice
2 Bob
3 Chris
4
Subj Marks 1004 95 1045 90 1045 55
1 1004 95 2 1045 55 1 1045 90 4 1004 100
1004 100 A full outer join of STUDENT and MARK
Null Null
Null
42
STUDENT
ID Name
1 Alice 2 Bob 3 Chris
⟕
MARK
ID Subj Marks
1 1004 95 2 1045 55 1 1045 90 4 1004 100
LEFT OUTER JOIN
← Get (incomplete) information of only Chris
ID Name 1 Alice 1 Alice 2 Bob
3 Chris
Subj Marks 1004 95 1045 90 1045 55
Null Null
A left outer join of STUDENT and MARK
Memory aid: Chris is on the ← LEFT of the nulls.
43
STUDENT
ID Name
1 Alice 2 Bob 3 Chris
MARK
ID Subj Marks
1 1004 95 ⟖2 104555 1 1045 90
4 1004 100
ID 1 1 2 4
Name Alice Alice Bob
Subj Marks 1004 95 1045 90 1045 55 1004 100
RIGHT OUTER JOIN
Get (incomplete) information of the student with ID 4 →
Null
A right outer join of STUDENT and MARK.
Memory aid: the marks data is on the RIGHT → of the null.
44