CS计算机代考程序代写 database AI File Structures

File Structures

FUNDAMENTALS OF RELATIONAL

MODELLING

Database Theory and Applications (M)

Dr Chris Anagnostopoulos

CONCEPTUAL DATA MODELLING

Challenge: transform a textual description of a real problem into a set of

concepts conveying exactly the same information.

 Approach: Entity-Relationship Modeling

 Inventor: Prof P Chen; 1976

 Does not guarantee optimality in operations and query

executions.

 Approach: Relational Modeling

 Inventor: Dr Edgar Codd; 1970

 Mathematics-driven: foundation of relational algebra, set theory,

functional dependency theory.

 Guarantees query optimization

9

Edgar F. Codd (1923-2003)

Peter Chen (1947)

CONCEPTUAL DATA MODEL

 A mathematical model for interpreting our data

 Why mathematical model: theorems from: set theory, functional

dependency & normalization theory, and relational algebra.

 Why interpretation: need to understand the context

 Which are the entities? e.g., bank accounts; students; employees …

 Which are the attributes (characteristics) of a data entity? e.g.,

name; address; ID …

 Which are the relationships between entities? e.g., an employee

works in a department; a student attends many courses?

10

RELATIONAL CONCEPTUAL MODEL

E.F. Codd; A Relational Model for Large Shared Data Banks,

Communications of the ACM, June 1970 (ACM Turing Award).

 Informally, any entity might relate with any other entity when

they both share common attributes.

Edgar F. Codd (1923-2003)

• Name

• Age

• Book Title

• ID

• Book Title

• Subject

• #pages

• ISBN

11

RELATIONAL MODEL

 Any entity and any relationship are modelled as a relation, which maps to a 2-

dimensional table:

 an ordered set of attributes (columns);

 a set of tuples (rows), which represents instances;

 There exists a specific attribute that uniquely identifies a tuple in the

relation,

 e.g., sequential numbers 1, 2, 3, …, or

 e.g., logical values like the matriculation number/ID.

12

Attribute 1 Attribute 2 … Attribute n

ID1 Thomas C … active

ID2 Carolyn B … inactive

RELATIONAL MODEL

Example: being in the context of bank accounts:

 Query: attributes of interest to be retrieved, and constrained attributes to filter

out irrelevant tuples.

 Return the names of those customers with active accounts and balance > £500

SELECT name FROM BankAccount

WHERE balance > 500 AND status = ‘active’

# account name balance status

1234567 ‘Thomas C’ £1,000 ‘active’

7654321 ‘Carolyn B’ £2,300 ‘inactive’

attribute

tuple

unique value per tuple

Relation: BankAccount

13

RELATIONAL MODEL: FORMALISM

 Schema of a Relation: R(A1, A2, …., An)

 Relation with name R and an ordered set of attributes A1, A2, …., An

 Each attribute Ai assumes values in a domain Di, i.e., Ai  Di

e.g., BankAccount(account, name, balance, status)

 #account  N = {1, 2, 3, …}; natural numbers (positive integers)

 name  Varchar(50); character strings of maximum length 50

 balance  R; real numbers

 status  {‘active’, ‘inactive’}; finite domain / enumerated type
14

RELATIONAL MODEL: FORMALISM

 A tuple t of R is an ordered set of values corresponding to attributes of R

satisfying the domain constraints:

t = (v1, v2, …, vn), vi  Di

e.g., t = (1234567, ‘Thomas C’, 1000, ‘active’) or

t[account] = 1234567, t[name] = ‘Thomas C’, t[balance] = 1000, t[status]=‘active’

 An instance r(R) is a set of tuples

r(R) = {t1, t2, …, tm}: ti is a tuple of R

15

RELATIONAL MODEL: FORMALISM

NULL: represents an unknown, or inapplicable, or uncertain, or missing value.

 Relational Database Schema: set of relations

S = {R1, R2, …, Rk} U {NULL}

attribute

tuple

relation

NULL values with

different interpretation

16

[A1] RELATIONAL MODEL: COMPANY

 A company is organized into departments. Each department has a unique

number, and a particular employee who manages the department.

 We keep track of the start date that employee began managing the

department.

 A department may have several locations.

 A department controls a number of projects, each of which has a unique name,

number and a single location.

 The company stores the information about the employee (e.g., name, salary,

birth date, …), and the unique social security number. An employee is assigned

only to one department, but may work on several projects, which are not

necessarily controlled by the same department.

 We keep track of the current number of hours per week that an employee

works on each project and their direct supervisor, who is another employee.

 The company keeps track of the dependent (e.g., child) of each employee for

insurance purposes and the corresponding relationship with the employee (e.g.,

son, daughter).

17

[A1] RELATIONAL MODEL: COMPANY

 A company is organized into departments. Each department has a unique

number, and a particular employee who manages the department.

 We keep track of the start date that employee began managing the

department.

 A department may have several locations.

 A department controls a number of projects, each of which has a unique name,

number and a single location.

 The company stores the information about the employee (e.g., name, salary,

birth date, …), and the unique social security number. An employee is assigned

only to one department, but may work on several projects, which are not

necessarily controlled by the same department.

 We keep track of the current number of hours per week that an employee

works on each project and their direct supervisor, who is another employee.

 The company keeps track of the dependent (e.g., child) of each employee for

insurance purposes and the corresponding relationship with the employee (e.g.,

son, daughter).

18

[A1] RELATIONAL MODEL: COMPANY

Schema: Company

Entities:

 Department

 Employee

 Project

 Dependent

Relationships:

 An employee manages a department.

 A department may have several locations.

 A department controls a number of projects.

 An employee is assigned only to one department,

 An employee may work on several projects (for each, store hours per week), which are not

necessarily controlled by the same department.

 A department controls several projects

 An employee is supervised by a supervisor, who is another employee.

 An employee has several dependents, each one corresponding to a specific relationship with the

employee 19

RELATIONAL DATABASE SCHEMA: COMPANY

20

21

Task 1:

Mail a Christmas card to

John’s supervisor’s son.

• Which is his name?

• Which relations

should be joined for

this query?

Task 2:

How many hours per

week does Alicia’s

supervisor work?

• In which projects is

she involved?

• Which relations

should be joined for

this query?

RELATIONAL CONSTRAINTS

 Conditions that must hold on all instances, for each relation.

 Distinguish three fundamental constraints:

 Key constraint (unique tuple identification)

 Entity integrity constraint (keys are never null!)

 Referential integrity constraint (interpretation of relationships)

22

KEY CONSTRAINT; E. CODD, 1970

 Superkey (SK) of relation R is a set of attributes containing at least one attribute

that uniquely identifies any tuple.

 For any two distinct tuples t1 and t2  r(R) it holds true the implication:

t1 ≠ t2 → t1[SK] ≠ t2[SK]

EMPLOYEE (SSN, Ename, Lname, Bdate, Salary, Dno)

 {SSN, Ename, Bdate}

 {SSN} (singleton)

 {SSN, Ename}

 {Ename, Salary}

23

KEY CONSTRAINT; E. CODD, 1970

 Candidate Key is the minimal superkey; i.e., the set with the smallest number of

attributes that uniquely identify tuples.

Let K = {A1, …, Ak}, then:

K is candidate key ↔ K’ = K \ {Ai} is not a SK for any Ai  K

i.e., the removal of any attribute from K results in K’ that is no longer a superkey.

 {SSN, Ename} \ {Ename} = {SSN} is SK, thus {SSN, Ename} is not candidate key

 {SSN, Ename, Lname} \ {Ename, Lname} = {SSN} is SK, thus {SSN, Ename, Lname}

is not candidate key

 {SSN} is the minimal since {SSN}\{SSN} = { } is not a SK

 Primary Key (PK): If a relation has several candidate keys, one is chosen

arbitrarily to be primary key; the rest candidate keys are called secondary keys.
24

[A2] EXAMPLE

• License number (plate) uniquely identifies a car;

• Engine serial number uniquely identifies a car;

Hence, there are two candidate keys:

• K1 = {LicenceNumber}

• K2 = {EngineSerialNumber}

A = {LicenceNumber, EngineSerialNumber}

What can we say for A? superkey or candidate key?

• A is not a candidate key since A \{LicenceNumber} = K2.

• Hence, A is a superkey & not a candidate key

Lesson Learnt: A composite set with unique attributes is not a candidate key.

Convention: the PK attributes are underlined in the relation schema.

25

Candidate key is the minimum

sufficient statistic that discriminates

any pair of tuples

ENTITY INTEGRITY CONSTRAINT; E. CODD, 1970

Principle: Primary Key (PK) cannot be NULL in any tuple of instance r(R).

t[PK]  NULL for any tuple t in r(R)

 If PK has several attributes, NULL is not allowed in any of these attributes

 If {student-id, course-id} is a composite PK then:

student-id  NULL and course-id  NULL

Note: There might be non-key attributes which are not allowed to be NULL, e.g.,

Employee’s surname specified by the database designer, e.g., unique value.
26

REFERENTIAL INTEGRITY CONSTRAINT ; E. CODD, 1970

Roles: referencing relation R1 and referenced relation R2.

 There exists an attribute Foreign Key (FK) in R1 that either has
exactly the same value with the Primary Key (PK) in R2 or is NULL.

t1[FK] references t2[PK] → t1[FK] = t2[PK] or t1[FK] = NULL

If t1[FK] = NULL then the FK in R1 should not be a part of its own

primary key (not violating the entity integrity constraint).

 Notation: directed arc from R1.FK to R2.PK
27

R1 R2

REDESIGN: REFERENTIAL INTEGRITY CONSTRAINT

• Name

• Age

• Book Title

• ID

• Book Title

• Subject

• #pages

• ISBN

• Name

• Age

• ISBN

• ID

• Title

• Subject

• #pages

• ISBN

Referencing R1

(Reader)

Referenced R2

(Book)

28

FK

PK
PK

Can this be NULL?

Can this be NULL?

Can this be NULL?

Can this be NULL?

Can this be NULL?
29

SO FAR…

 Superkey is used to uniquely identify a tuple in a relation.

 Lemma: If K is a superkey, then any superset of K is a superkey.

 Proof…left for exercise

 Candidate Key is the minimal superkey: i.e., K is a candidate key if

there is no subset of K that is also a superkey.

 A relation can have several different candidate keys; one of them is chosen

to be the Primary Key (PK).

 Convention: key attributes are underlined in the relation schema.

 A Foreign Key (FK) in a referencing relation should either be NULL (if it

is not part of the primary key) or have a value of the primary key of the

referenced relation.

30