File Structures
CONTEXTUAL DATABASE
& CONCEPTUAL MODELLING
Database Theory & Applications (M)
Dr Chris Anagnostopoulos
ROADMAP
Database in context
Database System abstract architectural view
Families of Data
Focus on structured-data
Entity-Relationship Conceptual Model
Prof Peter Chen (CMU) invented the ER model; 1976
{Entities, Attributes, Relationships, Semantics}
Examples
Query the E-R Model
2
CONTEXTUAL DATABASE
Observation: human activity is data-driven, i.e., we are making
decisions, proceeding with actions and reasoning based on observed data.
Observation: we are limited in storing all data in our memory and recall
them;
Idea: Define a robust base that can store humongous data, update and
delete data, and recall / search data efficiently;
A database holds data relevant to our current contextual activity:
Web Search: a database with web page links [Google Databases]
Data Mining: a database managing multidimensional data for
discovering patterns, outliers, novel trends [UCI ML Repository]
Scientific/Medical databases used for drug discovery, health
monitoring and viruses analytics [MEDLINE]
Customer/Retail databases for customers profiles and preferences,
products [Amazon Databases]
3
DATABASE SYSTEM
The fundamental functionality is to provide software to:
model data: relational data modeling, object-oriented data
modeling, first-order logic data modeling, description logics
data modeling, fuzzy logic modeling…
access data: query for data, insert, delete and update;
analyze data: complex aggregation queries, function
approximation, histograms, multi-dimensional visualization,
outliers detection, …
store (physically) data: from memory to hard disks;
secure data: control access to sensitive & confidential data,
cipher / encode data;
4
DATABASE SYSTEM
maintain data consistency in the face of:
failures, e.g., think of machine crashes due to software
bugs, power cuts, disk crashes;
recovery from failures.
optimize data access to efficiently retrieve data
index and hashing data structures: fast access to data!
optimization algorithms.
5
DATABASE SYSTEM ABSTRACTION
A box with an interface for users/applications offering the discussed
functionality;
Data Modelling;
Declarative Programming Language (SQL) to manage & query data
Declarative: we tell the database what to do and not how to do a task.
result
query (how to optimally process)
query (what I want)
6
DATABASE SYSTEM ZOOM-IN
7
Users
App
Application
(DB Connector)
Parser
Optimizer
Processing
Algorithms
Code
Generator
SQL
Database System Operating System
Data Files
Relational Model
CPU Memory
Data Files Blocks
DATA
Distinguish three families of data:
Structured data: well-defined data structure, e.g., tables;
E.g., 3 Kg: 3 is datum and Kg is meta-data for this datum
Unstructured data, e.g., web pages, texts, sensor measurements;
Less information is provided on interpreting the data (high entropy)
3
Semi-structured data, e.g., XML or JSON documents
Self-descriptive data; they interpret themselves (medium entropy)
3
Traditional DBSs are designed to manage structured data, i.e., we require meta-data to
manage data, e.g., database schema
Modern DBSs manage all families of data: documents, graphs, multimedia content…
8
Kg
3.0
4.1
3.6
3.0, 4.1, 3.6, 6.7, 8.8, …
CONCEPTUAL DATA MODELLING
Challenge: transform a textual description into a set of concepts
conveying exactly the same information.
Approach: Entity-Relationship (E-R) Modeling
Inventor: Prof P Chen; 1976 [*]
Semantics-driven: our reasoning about the world.
Does not guarantee optimality in operations and query
executions.
Approach: Relational Modeling
Inventor: Dr Edgar Codd; 1970
Mathematics-driven: foundation of relational algebra, set theory,
functional dependency theory.
Guarantees query optimization (heuristic optimization)
[*] Chen, P. The Entity-Relationship Model-Toward a Unified View of Data. ACM
Transactions on Database Systems. 1 (1): 9–36, March 1976
9
Edgar F. Codd (1923-2003)
Peter Chen (1947)
PATHWAY…
10
Textual description
SQL Scheme
SQL
Create
Table
{…}
Conceptual model Relational model
SQL Query
SELECT
FROM
WHERE;
SQL Result
E-R MODELLING COMPONENTS
Entities: concepts of interest (subjects)
e.g., lecturer, employee, vehicle, student, bank account
Attributes: properties of an Entity (nouns)
e.g., name, ID, number plate, balance
Meta-data: domain (integer/real, characters, image, …),
private or public or protected, unique, non-unique, encrypted, …
lecturer
age
name
NINO
ID
office-address
office-phone
surname
title
salary
password
username
11
E-R MODELLING COMPONENTS
Relationships: association between Entities (verbs)
e.g., employee is working on a project, student attends a course,
a lecturer teaches students, a supervisor (who is an employee)
supervises another employee.
teaches
lecturer student
supervises
supervisor
employee
attends
student
course
works on
employee project
12
[A1] EXAMPLE: COMPANY
A company stores information about its employees, e.g., name, salary, and
their unique social security number.
The company is organized into departments. Each department has a
unique ID, name, and a particular employee who manages the
department, i.e., the manager.
We keep track of the start date the manager began managing the
department.
A department controls a number of projects, each of which has a unique
name, unique number and a single location.
An employee is appointed only to one department, but may work on
several projects.
We keep track of the current number of hours per week that an employee
works on each project.
An employee has a direct supervisor, who is just another employee.
Task: Identify or imagine as many entities, attributes, and relationships as
possible.
13
[A1] EXAMPLE: COMPANY
A company stores information about its employees, e.g., name,
salary, and their unique social security number.
The company is organized into departments. Each department
has an ID, name, and a particular employee who manages the
department, i.e., the manager.
We keep track of the start date the manager began managing the
department.
A department controls a number of projects, each of which has a
unique name and a single location.
An employee is appointed to one department, but may work on
several projects.
We keep track of the number of hours per week that an employee
works on each project.
An employee is supervised by a supervisor, who is just another
employee.
14
appoints
employee department
is supervised by
project
c
o
n
tr
o
ls
manages
works in
15
DIAGRAMMING ENTITIES
Entity is drawn as a box
with rounded corners labelled
with the name of the concept
that represents that entity,
e.g., Employee, Department,
Project
Instance is a fact that
materializes an entity, e.g.,
‘Chris’ is an Employee
‘HR’ is a Department
‘Product-X’ is a Project
DepartmentEmployee
Project
Entity=
16
DIAGRAMMING ATTRIBUTES
Attribute is drawn as an oval and is
linked to an Entity, e.g., name.
Value is an instantiation of an
Attribute of a specific domain:
name = ‘Chris’ (characters)
salary = £100K (real number)
SSN = 1234 (integer)
Employee
[‘Chris’, 1234, £100K]
[‘Philip’, 5678, £50K]
Department Project
[123, ‘HR’] [Glasgow, ‘Product-X’]
[455, ‘R&D’] [Glasgow, ‘Product-Y’]
SSN
SalaryName
DepartmentEmployee
Project
NameID
NameLocation
tuples
17
DIAGRAMMING RELATIONSHIPS
Relationship is an association among participating Entities (diamond).
There are roles for each participating Entity (optional)
Entity can play many roles in the same or in many other relationships!
DepartmentEmployee appoints
DepartmentEmployee
manages
role: appointee
appoints
role: manager
role: appointee
role: appointer
role: appointer
18
[A2] WORKED EXAMPLE: ADD RELATIONSHIPS
A Department (appointer) appoints several Employees (appointees).
An Employee (worker) works in several Projects.
A Department controls several Projects.
An Employee (manager) manages a Department.
An Employee (supervisor) supervises several Employees (supervisee).
19
SSN
SalaryName
DepartmentEmployee
Project
NameID
NameLocation
20
SSN
SalaryName
DepartmentEmployee
Project
NameID
NameLocation
works
manages
manager
worker
supervises
supervisee
supervisor
appoints
controls
appointee
appointer
21
RELATIONSHIP CARDINALITY 1:1
Each entity participates with
zero, one, or more than one
instances in a relationship.
A role has a cardinality
degree M ≥ 0 indicating the
number of instances adopting
this role.
One to One (1:1)
An Employee as a manager,
manages only one Department
Each Department is managed
only by one manager.
DepartmentEmployee manages
manager
1 1
11
Entity A Entity B
22
RELATIONSHIP CARDINALITY 1:M
One to Many (1:M) and Many to One (M:1)
A (1) Department controls many (M) Projects
Many (M) Projects are controlled by one (1) Department
A (1) Department appoints many (M) Employees
Many (M) Employees are appointed by one (1) Department
Entity A Entity B
1 M
ProjectDepartment controls
1 M
EmployeeDepartment appoints
1 M
23
[A3] WORKED EXAMPLE: RECURSIVE 1:M
An (1) Employee, as supervisor, supervises many (M)
Employees (supervisees)
Many (M) Employees, as supervisees, are supervised only by
one (1) Employee (supervisor)
Employee supervises
1
M
supervisor
supervisee
24
RELATIONSHIP CARDINALITY N:M
Many to Many (M:N)
Each Employee works on several (M) Projects
Each Project is worked by several (N) Employees.
ProjectEmployee works on
N M
Entity A Entity B
N M
25
[A4] WORKED EXAMPLE: MANY-TO-MANY
An (1) Employee (manager) manages only one (1) Department
and (worker) works in five (5) Projects. Those five projects are
controlled by this Department and are assigned to nine (9)
Employees (workers).
Project
works on
1
5
controls
5
9
DepartmentEmployee manages
manager
1 1
worker
26
SSN
SalaryName
DepartmentEmployee
Project
NameID
NameLocation
works
manages
manager
worker
supervises
supervisee
supervisor
appoints
controls
appointee
1
M
N
L
K
1 1
1F
1
appointer
ADDING ALL SEMANTICS…
27
TRANSFORMING N:M TO 1:N AND M:1
Context: An Employee (worker) works in each
Project a specific number of hours per week
and gets certain bonus for each project.
Information per project per employee.
Hours per week
Bonus related to the project
Challenge: whose attributes are these?
Solution: introduce a new entity!
Lemma: A many-to-many (N:M) relationship
always splits into two one-to-many (1:N &
M:1) relationships by transforming the original
one into a new Relationship Entity.
It has these attributes!
It is indirectly generated by the semantics
Employee
Project
works
M
N
Hours
Per Week
Project
Bonus
?
28
Context: An Employee works in
N= 5 Projects.
Context: A Project is worked by
M = 10 Employees.
Introduce entity: Task
Employee has one Task per
Project, thus, 5 Tasks in total.
For each Task, we store:
number of working hours per
week and bonus
A Task refers only to one
Project for a specific Employee.
Employee
Project
works Task
Employee
Project
Refers
Has
Hours
Per Week
10
5
1
1
10
5
Project
Bonus
29
Task
Employee
Project
Refers
Has
Hours
Per Week
1
1
10
5
Project
BonusEmployee: [‘Chris’, 1234, £100K]
Task per Project: [P1, ‘Task 1’, 20, £100]
Task per Project: [P2, ‘Task 2’, 5, £20]
Task per Project: [P3, ‘Task 3’, 2, £50]
Task per Project: [P4, ‘Task 4’, 3, £10]
Task per Project: [P5, ‘Task 5’, 10, £200]
Project: [P1, ‘Project 1’]
Project: [P2, ‘Project 2’]
Project: [P3, ‘Project 3’]
Project: [P4, ‘Project 3’]
Project: [P5, ‘Project 5’]
30
SSN
SalaryName
DepartmentEmployee
Project
NameID
NameLocation
manages
manager
worker
supervises
supervisee
supervisor
appoints
controls
appointee
1
M
N
L
K
1 1
1F
1
appointer
refersTask
has
1
1
Hours
Per Week
Project
Bonus
31
[A5] WORKED EXAMPLE
An (1) Employee (manager) manages one (1) Department;
‘We keep track of the start date the manager began managing the
department.’
DepartmentEmployee manages
manager
1 1
32
[A5] WORKED EXAMPLE
‘We keep track of the start date the manager began managing the
department.’
DepartmentEmployee
has
1 1
Management refers
Start Date
manager
1 1
33