The Entity/Relationship (E/R) Model & DB Design
Slides by Manos Papagelis, Ryan Johnson, John Mylopoulos, Arnold Rosenbloom, Renee Miller and Diane Horton
Overview
• Using the Entity/Relationship (ER) Model to model the real world
• From there, designing a database schema
– Restructuring of an E/R model
– Translating an E/R model into a logical model (DB Schema)
2
THE ENTITY/RELATIONSHIP (E/R) MODEL
3
Conceptualizing the real-world
• DB design begins with a boss or client who wants a database.
• We must map the entities and relationships of the world into the concepts of a database. This is called modeling.
• Sketching the key components is an efficient
way to develop a design.
– Sketch out (and debug) schema designs
– Express as many constraints as possible
– Convert to relational DB once the client is happy
4
Entity/Relationship Model
• Visual data model (diagram-based)
– Quickly “chart out” a database design
– Easier to “see” big picture
– Comparable to class diagrams in UML
• Basic concepts:
– entities
– relationships among them
– attributes describing the entities and the relationships
5
Example: 2 entities with a relationship6
Name:
Horton
Dept:
CompSci
Group:
CS Education
teaches
Code:
CSC343
Title:
Intro DB
Defining a Schema
• E/R allows us to specify what structures can and must look like.
• We generalize from specific entities & relationships to the sets they are drawn from.
7
Instance
Schema
Entity
[with attributes]
Entity Set
[with attributes]
Relationship [with attributes]
Relationship Set [with attributes]
Entity Sets
• An entity set represents a category of objects that have properties in common and an autonomous existence (e.g., City, Department, Employee, Sale)
• An entity is an instance of an entity set (e.g., Stockholm is a City; Peterson is an Employee)
8
Relationship Sets
• A relationship set is an association between 2+ entity sets (e.g., Residence is a relationship set between entity sets City and Employee)
• A relationship is an instance of a n-ary relationship set (e.g., the pair
9
Example Instance
10
Exam
Recursive Relationships
• Recursive relationships relate an entity set to itself
• The relationship may be asymmetric
– If so, we indicate the two roles that the entity plays in the relationship
11
Ternary Relationships
12
Attributes
• Describe elementary properties of entities or relationships (e.g., Surname, Salary and Age are attributes of Employee)
• May be single-valued, or multi-valued
13
Composite Attributes
• composite attributes are grouped attributes of the same entity or relationship that have closely connected meaning or uses
14
Example Schema with Attributes
15
Keys in E/R
• Notation: solid circle
• If multi-attribute, connect with a line and a “knob”
16
Cardinalities
• Each entity set participates in a relationship set with a
17
minimum (min) and a maximum (max) cardinality
• Cardinalities constrain how entity instances participate
in relationship instances
• Notation: pairs of (min, max) values for each entity set
Cardinalities
• Cardinalities are pairs of non-negative integers
(min, max) such that min ≤ max.
• minimum cardinality min:
– If 0, entity participation in the relationship is optional
– If 1, entity participation in the relationship is mandatory
– Other values are possible
• maximum cardinality max:
– If 1, each instance of the entity is associated at most with a single
instance of the relationship
– If > 1, then each instance of the entity can be associated multiple instances of the relationship
– We write N to indicate no upper limit
– Other values are possible
20
Cardinality Examples
21
ER Diagrams Worksheet Q1
Multiplicity of relationships
If entity sets E1 and E2 participate in relationship R with cardinalities (n1, N1) and (n2, N2) then the multiplicity of R is N1-to-N2 (which is the same as saying N2-to-N1)
Ì
23
Ì
1-to-1
N-to-1 OR 1-to-N
N-to-N
Cardinalities of Attributes
• Describe min/max number of values an attribute can have
• When the cardinality of an attribute is (1, 1) it can be omitted
from the diagram. This is a single-valued attribute.
• The value of an attribute may also be null, or have several
values (that is a multi-valued attribute). Surname
24
Person
(0,N)
License Number
(0,1)
CarRegistration#
Cardinalities of Attributes (cont.)
• Multi-valued attributes often represent situations that can be modeled with additional entities. E.g., the ER schema of the previous slide can be revised into:
25
Surname
(0,1) License Number
(0,N)
(1,1)
Person
Owns
Car
CarRegistration#
More about keys in E/R
• A key is a minimal set of attributes which uniquely identifies instances of an entity set.
• Usually, a key is formed by one or more attributes of the entity itself. This is an internal key.
26
internal, single-attribute key
internal, multi-attribute key
Weak entity sets
• Sometimes, an entity set doesn’t have a key among its
attributes. This is called a weak entity set.
• Solution: the keys of related entities are brought in to
help with identification (becoming foreign keys). foreign, multi-attribute key
27
Weak entity set
Keys of relationship sets
• The key for a relationship set consists of the keys of the entity sets that it relates.
• In this example, the key of the Made By relationship set is Part Number and Name.
Name
Part Number
(1,1) Made By
(1,N)
Address
Name
28
Part
Manufacturer
ER Diagrams Worksheet Q2
Requirements for Keys
• Each attribute in a key must have (1,1) cardinality.
• A foreign key for a weak entity set must come through a relationship which the entity set participates in with cardinality (1,1).
• E.g., what if a student could enrol at > 1 university?
30
Requirements for Keys
• A foreign key may involve an entity that has itself a foreign key, as long as cycles are not generated.
• Each entity set must have at least one (internal or foreign) key.
• E.g., What if universities can have the same name?
31
A larger schema with keys
ER Diagrams to DB Schema Worksheet Q1
Challenge: modeling the “real world”
• Life is arbitrarily complex
– Directors who are also actors? Actors who play multiple roles in one movie? Animal actors?
• Design choices: Should a concept be modeled as an entity, an attribute, or a relationship?
• Limitations of the E/R Model: A lot of data semantics can be captured but some cannot
• Key to successful model: parsimony
– As complex as necessary, but no more
– Choose to represent only “relevant” things
34
EXAMPLE
35
From real world to E/R Model
We wish to create a database for a company that runs training courses. For this, we must store data about trainees and instructors. For each course participant (about 5,000 in all), identified by a code, we want to store her social security number, surname, age, sex, place of birth, employer’s name, address and telephone number, previous employers (and periods employed), the courses attended (there are about 200 courses) and the final assessment for each course. We need also to represent the seminars that each participant is attending at present and, for each day, the places and times the classes are held.
Each course has a code and a title and any course can be given any number of times. Each time a particular course is given, we will call it an ‘edition’ of the course. For each edition, we represent the start date, the end date, and the number of participants. If a trainee is self-employed, we need to know her area of expertise, and, if appropriate, her title. For somebody who works for a company, we store the level and position held. For each instructor (about 300), we will show the surname, age, place of birth, the edition of the course taught, those taught in the past and the courses that the tutor is qualified to teach. All the instructors’ telephone numbers are also stored. An instructor can be permanently employed by the training company or freelance.
36
From real world to E/R Model
We wish to create a database for a company that runs training courses. For this, we must store data about the trainees and the instructors. For each course participant (about 5,000), identified by a code, we want to store her social security number, surname, age, sex, place of birth, employer’s name, address and telephone number, previous employers (and periods employed), the courses attended (there are about 200 courses) and the final assessment for each course. We need also to represent the seminars that each participant is attending at present and, for each day, the places and times the classes are held.
Each course has a code and a title and any course can be given any number of times. Each time a particular course is given, we will call it an ‘edition’ of the course. For each edition, we represent the start date, the end date, and the number of participants. If a trainee is self-employed, we need to know her area of expertise, and, if appropriate, her title. For somebody who works for a company, we store the level and position held. For each instructor (about 300), we will show the surname, age, place of birth, the edition of the course taught, those taught in the past and the courses that the tutor is qualified to teach. All the instructors’ telephone numbers are also stored. An instructor can be permanently employed by the training company or freelance.
37
Glossary
38
The picture can’t be displayed.
More Annotations
We wish to create a database for a company that runs training courses. For this, we must store data about trainees and instructors. For each course participant (about 5,000), identified by a code, we want to store her social security number, surname, age, sex, place of birth, employer’s name, address and telephone number, previous employers (and periods employed), courses attended (there are about 200 courses) and the final assessment for each course. We need also to represent seminars that each participant is attending at present and, for each day, the places and times the classes are held.
Each course has a code and a title and any course can be given any number of times. Each time a particular course is given, we will call it an ‘edition’ of the course. For each edition, we represent the start date, the end date, and the number of participants. If a trainee is self-employed, we need to know her area of expertise, and, if appropriate, her title. For somebody who works for a company, we store the level and position held. For each instructor (about 300), we will show the surname, age, place of birth, the edition of the course taught, those taught in the past and the courses that the tutor is qualified to teach. All the instructors’ telephone numbers are also stored. An instructor can be permanently employed by the training company or freelance.
39
… the E/R model result
40
isA
isA
FROM E/R MODEL TO DATABASE SCHEMA
41
Two Steps
• Restructure the ER schema to improve it, based on criteria
• Translate the schema into the relational model
42
1. RESTRUCTURING AN E/R MODEL
43
Restructuring Overview Input: E/R Schema
Output: Restructured E/R Schema
It includes (not necessarily in this order):
a. Analysis of redundancies
b. Choosing entity set vs attribute
c. Limiting the use of weak entity sets
d. Selection of keys
e. Creating entity sets to replace attributes with cardinality greater than one
44
1a. Analysis of redundancies
45
Example: no redundancy It is not redundant to have Name twice.
46
Name
Part Number
(1,1) (1,N)
Address
Name
Part
Made By
Manufacturer
Example: redundancy What is redundant here?
47
Manf Name
Name
Part Number
(1,1) (1,N)
Address
Name
Part
Made By
Manufacturer
Example: redundancy What is redundant here?
Manf Name Manf Address
Name
Part
Part Number
48
1b. Entity sets vs attributes
49
Overall, Prefer Attributes
• An entity set should satisfy at least one of the following conditions:
– It is more than the name of something; it has at least one non-key attribute, or
– It is the “many” in a many-one or many-many relationship.
• Rules of thumb
– A “thing” in its own right => Entity Set
– A “detail” about some other “thing” => Attribute
This is just about avoiding redundancy
50
E.S. vs. attributes: examples
Domain fact change: A part can have more than one manufacturer …
Name
Part Number
Manf Name
(1,N) (1,N)
Manf Address
Address
Name
51
Part
Made By
Manufacturer
Name
Part Number
Part
E.S. vs. attributes: examples
Domain fact change: Not representing Manufacturer address …
52
Name
Part Number
Manf Name
(1,N)
Part
(1,N) (1,N)
(No address attribute)
Name
Part
Made By
Manufacturer
Name
Part Number
E.S. vs. attributes: examples
53
New domain
Name
Student number
Mentor email
(0,1) (1,1)
Mentor name
Name
email
Student
Mentored by
Mentor
Name
Student
Student number
E.S. vs. attributes: examples
Domain fact change: A mentor can have more than one mentee …
54
Name
Student number
Mentor email
(0,1) (1,N)
Mentor name
Name
email
Student
Mentored by
Mentor
Name
Student
Student number
1c. Limiting weak entity sets
55
When to use weak entity sets?
• The usual reason is that there is no global authority capable of creating unique ID’s
• Example: it is unlikely that there could be an agreement to assign unique student numbers across all students in the world
56
Don’t Overuse Weak Entity Sets
• Beginning database designers often doubt that anything could be a key by itself
– They make all entity sets weak, supported by all other entity sets to which they are linked
• It is usually better to create unique IDs – Social insurance number, automobile VIN, etc.
57
1d. Selection of keys
58
Selecting a Primary Key
• Every relation must have a primary key
• The criteria for this decision are as follows:
– Attributes with null values cannot be part of primary keys
– One/few attributes is preferable to many attributes
– Internal keys preferable to external ones (weak entities depend for their existence on other entities)
– A key that is used by many operations to access instances of an entity is preferable to others
59
Avoid multi-attribute and string keys60
• They are wasteful
– e.g. Movies(title, year, …): 2 attributes in key, requires ~16 bytes
– Number of movies ever made (<< 232) can be distinguished with 4 bytes => Having an integer movieID key saves 75% space and a lot of typing
• They break encapsulation
– e.g. Patient(firstName, lastName, phone, …)
– Security/privacy hole
=> Integer patientID prevents information leaks
• They are brittle (nasty interaction of above two points)
– Name or phone number change? Parent and child with same name?
– Patient with no phone? Two movies with same title and year? => Internal ID always exist, are immutable, unique
Also: computers are really good at integers!
1e. Creating entity sets to replace attributes with cardinality greater than one
61
Attributes with cardinality > 1
• The relational model doesn’t allow multi-valued
attributes. We must convert these to entity
sets.
Name City
Phone (1,N)
Name City
62
Company
Company
(1,N) Possesses (1,1)
Phone
number
2. TRANSLATING AN E/R MODEL INTO A DB SCHEMA
63
Translation into a Logical Schema
Input: E/R Schema
Output: Relational Schema
• Starting from an E/R schema, an equivalent
relational schema is constructed
– “equivalent”: a schema capable of representing the same information
• A good translation should also:
– not allow redundancy
– not invite unnecessary null values
64
The general idea
• Each entity set becomes a relation. Its attributes are
– the attributes of the entity set.
• Each relationship becomes a relation. It’s attributes are
– the keys of the entity sets that it connects, plus
– the attributes of the relationship itself.
• We’ll see opportunities to simplify.
65
Many-to-Many Binary Relationships
Many-to-Many Binary Relationships
Employee(Number, Surname, Salary) Project(Code, Name, Budget) Participation(Number, Code, StartDate)
Participation[Number] ⊆ Employee[Number] Participation[Code] ⊆ Project[Code]
Many-to-Many Recursive Relationships68
Many-to-Many Recursive Relationships69
Product(Code, Name, Cost) Composition(Part, SubPart, Quantity)
Composition[Part] ⊆ Product[Code] Composition[SubPart] ⊆ Product[Code]
Many-to-Many Ternary Relationships70
Many-to-Many Ternary Relationships71
Supplier(SupplierID, SupplierName) Product(Code, Type)
Department(Name, Telephone)
Supply(Supplier, Product, Department, Quantity)
Supply[Supplier] ⊆ Supplier[SupplierID] Supply[Product] ⊆ Product[Code] Supply[Department] ⊆ Department[Name]
Simplifications
• These straight translations are acceptable. • But we can often simplify.
One-to-Many Relationships
with mandatory participation on the “one” side
One-to-Many Relationships
with mandatory participation on the “one” side
Standard translation:
Player(Surname, DOB, Position)
Team(Name, Town, TeamColours) Contract(PlayerSurname, PlayerDOB, Team, Salary)
Contract[PlayerSurname, PlayerDOB]⊆Player[Surname, DOB] Contract[Team] ⊆Team[Name]
One-to-Many Relationships
with mandatory participation on the “one” side
Simpler translation:
Player(Surname, DOB, Position, TeamName, Salary) Team(Name, Town, TeamColours)
Player[TeamName] ⊆Team[Name]
One-to-One Relationships with mandatory participation for both
One-to-One Relationships with mandatory participation for both
The standard translation has 3 relations (what is key of management)?
One-to-One Relationships with mandatory participation for both
Simpler v1 (with Management info moved over to Head):
Head(Number, Name, Salary, Department, StartDate) Department(Name, Telephone, Branch)
Head[Department] ⊆ Department[Name]
Simpler v2 (with Management info moved over to Department): Head(Number, Name, Salary)
Department(Name, Telephone, Branch, HeadNumber, StartDate)
Department[HeadNumber] ⊆ Head[Number]
One-to-One Relationships with mandatory participation for both
Simpler v3 (with Management info split between Head and Department):
Head(Number, Name, Salary, StartDate) Department(Name, Telephone, HeadNumber, Branch)
Department[HeadNumber] ⊆ Head[Number]
One-to-One Relationships with optional participation for one
One-to-One Relationships with optional participation for one
Standard:
Employee(Number, Name, Salary) Department(Name, Telephone, Branch)
Management(Head, Department, StartDate) Management[Head] ⊆ Employee[Number] Management[Department] ⊆ Department[Name]
One-to-One Relationships with optional participation for one
Simpler:
Employee(Number, Name, Salary)
Department(Name, Telephone, Branch, Head, StartDate)
Department[Head] ⊆ Employee[Number]
ER Diagrams to DB Schema Worksheet Q 2-3
Summary of Types of Relationship
• many-to-many (binary or ternary) • one-to-many
– mandatory: (1,1) on the “one” side
– optional: (0,1) on the “one” side • one-to-one
– both mandatory: (1,1) on both sides
– one mandatory, one optional:
(1,1) on one side and (0,1) on other side
– both optional: (0,1) on both sides
84
Will the schema be “good”?
• If we use this process, will the schema we get be a good one?
• The process should ensure that there is no redundancy.
• But only with respect to what the E/R diagram represents.
• Crucial thing we are missing: functional dependencies. (We only have keys, not other FDs.)
• So we still need FD theory.
85
Redundancy can be desirable
• Disadvantages of redundancy:
– More storage (but usually at negligible cost)
– Additional operations to keep the data consistent
• Advantages of redundancy:
– Speed: Fewer accesses necessary to obtain information
• How to decide to maintain or eliminate a redundancy?
Examine:
– the speedup in operations made possible by the redundant information
– the relative frequency of those operations
– the storage needed for the redundant informations
Performance analysis is required to decide about redundancy
86