Database Fundamentals
Conceptual Design
Most DB Designers will use the ER or UML approach to start off with an acceptable design and then use Normalisation rules to check the design for efficiency and redundancy
1
Relational Database Modelling
Conceptual Modelling
Database requirements are collected and visualised as an ER (or
UML) diagram
Logical Modelling
The next phase is to create functional relational schemas with keys
based on the conceptual design.
This includes deciding which candidate key will become the primary key. A visual depiction of how the relational database will be implemented
Physical Design
Take the relational schemas and implement them in a DBMS
2
2
3
Entity Relation Model/UML
Entities – correspond to classes in object oriented (oo) programming They reflect real-world and theoretical/conceptual objects
Relationships
Depict how may objects of one type interact with another
1 -> 1, 1 -> many, many -> many
+ descriptive/enhanced relationships (inheritance, weak entity, aggregation,
composition)
Attributes – describing details of types
For entity types
For relationship types that record entity interactions Can include the domain (data type)
Multiplicity
Constraints on relationships – related to the relationship type
Entity Relation Model/UML
Entities
These represent what the database keeps track of
They are translated into relations/tables in the final database Entities describe a group of objects with similar properties
A “real-world” object is one that can be easily identifiable and distinguishable
Real: Student, Book etc
Conceptual: Class, Course
Some are the result of relationships between Entities
StudentClasses, StudentCourses, CourseBooks Represented as rectangles in the ER/UML diagram
Each represents a number of instances (ie tuples/rows of records)
4
4
Entity Relation Model/UML
When you are designing a database you should:
1. Create a List of all the real-world AND conceptual
entities you need to store data about
2. To each of those items, list the attributes required to
capture the desired data
– ALL of them!
– Especially those required for specific searches/purposes
3. Check that each of your entities only captures only relevant information to that entity
Relevant to that entity and only to that entity
5
5
Entity Relation Model/UML
When you are designing a database you should: 4. Check to see if you have any candidate keys
Do they apply to ALL Records? Are they reliable?
If so, pick the best one (usually the smallest one) as the
Primary Key
If not, add an artificial (surrogate) Primary key – a standard
ID field with a number that increases for each new row
6
6
Entity Relation Model/UML
Process
1. List the various Entities needed to capture the required data –
consider future requirements
2. Add to the Entities the attributes required to capture the
relevant data
3. Determine the Candidate Keys. Indicate the preferred PK if
known
7
7
Entity Relation Model/UML
Process
8
4.
Optional: Add Data Types and domains to the attributes CustomerName – do we need the individual components?
custFirstName, custLastName, middle name or initials? CustomerAddress – do we need the individual components?
custAddrNumber, custAddrStreetName, custAddressAptNmbr, custCity, custState, custPostcode, custPhone
This division allows us to check the number is valid, sort by city, state, postcode etc.
If only using Phone number to call them it can be left as a single column
If needed to search on other parts – then separate.
8
Entity Relation Model/UML
Process
9
4.
Optional: Add Data Types and domains to the
attributes
Does a Customer have a single phone number or
Address? – place in separate class if many Home, mobile
Home address, shipping address etc
9
Unified Modelling
Language
A Step By Step Guide
10
10
UML Data Modelling – Basic Concepts
11
In this module we will cover the basic requirements for building a UML Model:
Classes
Associations Association Classes Inheritance – Sub Classes
11
UML Data Modelling – Basic Concepts
Terminology in UML
A Class is equivalent to a table or relation in the proposed
relational DB
An Object is a single instance of an Entity or Relation (ie, a
tuple)
An Association is a relationship between two classes
Associations may be accompanied by their own class (an Association Class) that records additional detail about the association
Executable UML shows hidden classes and presents Association Classes differently
12
12
UML Data Modelling – Basic Concepts
Classes:
Represent our real-
world/conceptual objects entities in our conceptual
design
Consist of a class name,
attributes and nominate any candidate keys
Can nominate a PK if known
13
If you have already chosen the PK otherwise nominate it as a {CK}
Class Name
attribute 1 {PK} attribute 2 {CK1} attribute 3 {CK1} attribute N…
Where a candidate key is composed of more than one attribute include a number to indicate it forms part of the same collection (ie. {CK1})
13
UML Data Modelling – Basic Concepts
Classes
Represent our real-world
objects (entities in our
conceptual design) Consist of a class name,
attributes and nominate any candidate keys
14
Students
studentID {PK} emailID {CK1} studentName
Courses
courseID {PK} courseName {CK1}
14
UML Data Modelling – Basic Concepts
Attributes
Represent data about real-world objects Describe our real-world objects entities in our
conceptual design
No spaces, No special characters Use camelCase
Derived attributes are those whose values are calculated from the values of other attributes
total = qty * price
Structured attributes are those composed of
more than one attribute
Name is really Salutation + FirstName + LastName
15
PurchaseOrders
qty price /total
Employees
name: salutation
firstName
lastName address:
addressLine1 addressLine2
15
UML Data Modelling – Basic Concepts
Attributes Highlight any
potential/desired keys
Candidate Key (CK)
The attribute(s) that could be
used to uniquely identify each record
Primary Key (PK)
The attribute(s) chosen to uniquely identify each record
16
Student
studentID {PK}
emailID {CK1}
studentName
Course
courseID {PK}
courseName {CK1}
16
UML Data Modelling – Basic Concepts
Pick PKs from the available Candidate Keys Keep the PK as small as possible
The PK will be distributed among related tables – 1000s of copies of that PK value will appear in the database
The smaller they are, the less the DBMS has to hunt through to find related records
17
17
UML Data Modelling – Basic Concepts
Pick PKs from attributes that are relatively stable You don’t want the DBMS constantly updating PK and FK
values
Pick a column that has no inherent meaning for the entity
Eg StudentID, ProductID are good choices (in general)
Picking StudentName + BirthDate + Address is bad as these can
change over time
Add a surrogate key if no good candidate keys for PK
But remember this is more data to manage that has no meaning or relationship to the business data
18
18
UML Data Modelling – Basic Concepts
Consider 1NF when selecting attributes
Columns or attributes should be atomic – they should provide a
single piece of useful information and not consist of subparts or
multiple values
A Name column may be no good
A Name could have a prefix (Mr, Mrs) a First + Last Name and even a Middle name
What if you want to search on Last Name? Or sort records by prefixes/titles?
What if some names don’t have prefixes?
19
19
UML Data Modelling – Basic Concepts
Consider 1NF when selecting attributes Consider any business use on the values in a column
Do you need to directly access the Last Name for a customer search?
Phone Numbers – will you need access to area codes vs the whole phone number?
do you need to search for (+61)?
Will you need to sort by suburb, postcode or street number? – if
not, maybe lump them together
Do you have any sets of columns that repeat related information
– remove and separate out.
mobile1, mobile2, home, office, emergency, fax, email. . .
20
20
UML Data Modelling – Basic Concepts
Other tips
Values represented as options in a Drop Down List or that
must be accurate should be represented by a separate class This serves as a “look-up” table
Where possible, small natural keys may improve data readability
Surrogate keys have no meaning
Natural keys can be useful
eg streetTypeID = ‘Rd’ and streetType = ‘Road’
21
21
UML Data Modelling – Basic Concepts
22
Students
studentID : int (PK)
studentName : varchar(100)
emailID : varchar(10) (CK)
favColour : varchar(10) {red, green, blue}
Attributes
Can include the Data Type
and Domain Data Type int, char
Domain
range of acceptable values {red, green, blue}
Data type
Domain
22
UML Data Modelling – Basic Concepts
Associations
Capture relationships between
objects of two different classes If Students Enrol in a course we
create an association line between the classes and give it a description or a role
This indicates that objects in the student class interact with objects in the course class
23
Relationship Description
Courses
courseID {PK} courseName {CK}
Enrols in
Relationship/ Association
Students
studentID {PK} emailID {CK} studentName
23
UML Data Modelling – Basic Concepts
24
Multiplicity
Indicates the number of
interactions between an instance of an object and instances of another through the association
No value or a 1 implies 1..1 (1:1 relationship)
* implies 0..*
Can have a specific number as
minimum or maximum
1.. 1
10..200
Has →
Has →
0..*
3..*
Entity1
Entity2
attr1 attr2 …
attr1 attr2 …
Multiplicity: An instance of Entity 2 relates to 1 of Entity1
Multiplicity:
An instance of Entity1 relates to zero or more of Entity2
Students
attr1 attr2 …
Courses
attr1 attr2 …
A course must have a minimum of 10 and maximum of 200 students
A student must enrol in at least 3 courses
24
UML Data Modelling – Basic Concepts
Multiplicity
One-to-One (1:1)
Every object is associated with at most one of the other object Denoted 0..1 on both sides or 1..1
25
Citizen
firstName lastName dateOfBirth
Has →
AustralianPassport
1.. 1
0..1
passportNo {PK} dateIssued dateExpires
Each “Passport” belongs to one and only one Person
Each “Person” has zero or one Australian Passport
Note: this is from the perspective of Australian citizens holding standard Australian Passports
25
UML Data Modelling – Basic Concepts
26
Multiplicity
One-to-Many (1:m)
Many elements of one object are related to at most one of
the other object 0..1 → 1..*
Has →
1..1 0..*
Course
courseID {PK}
courseName …
Tutorial
classNo {PK}
Location
startTime …
A given Tutorial can only belong to one course
A course may have 0 or more Tutorials
26
UML Data Modelling – Basic Concepts
27
Multiplicity Many-to-Many (m:n)
0..* → 0..*
5..30
Attends →
Student
studentID {PK}
studentName …
0..*
Tutorial
classNo {PK}
Location
startTime …
A given Tutorial can must have between 5 and 30 students
A student can attend 0 or more tutorials
27
UML Data Modelling – Basic Concepts
28
Multiplicity
Specifies the number of possible occurrences of an entity type that
may relate to a single occurrence of an associated entity type
through a given relationship.
Multiplicities represent business rules established by a user or
company
They do not necessarily modify the database design
They are generally implemented at the application level/user
interface
28
UML Data Modelling – Basic Concepts
29
Multiplicity
Participation identifies whether all or only some objects participate
in a relationship Is it mandatory?
A course must have at least 10 students – every course participates
A student may not enrol in any course – does not participate in “enrols” Cardinality indicates the maximum number of possible relationship
occurrences for an entity participating in a the relationship
how many times did it take place?
A course can have many students.
A student maximally enrols in 5 courses.
Cardinality
10..* enrols in → 0..5
Participation
Courses
Students
29
UML Data Modelling – Basic Concepts
30
Association Classes Allows attributes to be
included with the association
relationship
E.g. what if with the enrolment
we want to record the date commenced and mark they received at the end of the course?
The association class describes the properties of each association occurrence
1..*
Student
studentID {PK} emailID {CK} studentName
Course
courseID {PK} courseName
1..*
Enrolments
dateCommenced mark
30
UML Data Modelling – Basic Concepts
Association Classes vs 1 : many and many :1 1..* Enrols in→
1..*
1**1
31
Courses
Students
studentID {PK} emailID {CK} studentName
courseID {PK} courseName
Enrolment
enrolmentID (PK) dateCommenced mark
Captures that there WAS at least ONE interaction
Students
studentID {PK} emailID {CK} studentName
Enrolments
enrolmentID {PK} sutdentID (FK) courseID (FK) dateCommenced mark
Courses
Captures EVERY interaction
courseID {PK} courseName
Executable UML
31
UML Data Modelling – Basic Concepts
Recursive Relationships/Self Associations:
A Class that associates with itself
The same entity type can participate more than once in
different roles
Role names should be used in a recursive relationship type to
distinguish between each of these roles
Supervisor Supervise
1..*
32
1..*: An Employee can supervise 1 or more other Employees (called Supervisees)
0..1: An Employee may or may not be supervised by another Employee (called a Supervisor)
Employees
0..1
id {PK} name
Supervisee
32
UML Enhancements
Things to make UML that bit Better harder
33
33
Aggregation and Composition
Aggregation represents a ‘has-a’ or ‘is-part-of’ association between two entity types
Conceptual notion that distinguishes a ‘whole’ from its ‘parts’
Represented with an unfilled diamond at the ‘whole’ end
For example, a program is an aggregation of courses
34
Deleting a program does not delete the courses!
A course is part of more than one program
1..*
*
Courses
Staff
Programs
0..1 OR 0..*
Office
TeaRoom
Composition: Newspaper is composed of Articles.
34
Aggregation and Composition
35
Composition is a stronger notion
The lifetime of the parts are bound up with the whole
Represented with a filled diamond at the ‘whole’ end
For example, a foot / course is composed of many toes / tutorials
Deleting the foot / course takes the toes / tutorials with it!
Toe * Tutorial *
1
Foot
Course
If in doubt stick to multiplicities
• 1..1 | 1 = mandatory (composition)
• 0..1 |* = optional (aggregation)
Composition: Newspaper is composed of Articles.
35
UML Data Modelling – Basic Concepts
Strong Entity Types
An Entity that is not existence-dependent on some other
entity type
e.g. Student, Building, Competition
All previous examples except maybe association classes
1..*
< Has
36
RoundNo
Fights
Winner
Rounds
FightNo (PK) FightDate Winner
The PK of Rounds is RoundNo, FightNo
36
UML Data Modelling – Basic Concepts
Strong Entity Types
An Entity that is not existence-dependent on some other entity type e.g. Student, Building, Competition
All previous examples except maybe association classes
Weak Entity Type
Entity type that is existence-dependent on some other entity type An instance cannot be uniquely identified by its attributes alone
e.g. Week 4 lecture - what subject, which semester, which year? Student Assignments - what subject, which semester?
37
Competition Rounds, building rooms
1..*
< Has
RoundNo
Fights
Winner
Rounds
FightNo (PK) FightDate Winner
The PK of Rounds is RoundNo, FightNo
37
UML Data Modelling – Basic Concepts
Inheritance and Sub Classes - Generalisations
Students, Staff and Contractors are all People within our university
database system
They are Sub Classes of People “is A” relationship
They share common attributes PersonID
PersonName
Sub Classes may or may not have any attributes of their own
38
People
personID {PK}
personName
Staff
position
Students
Contractors
gpa
licenseNo
38
39
Superclass / Subclass
Superclass
An entity type that includes one or more distinct subgroups
of its occurrences. e.g., Person (name)
Subclass
A subgroup of occurrences of an entity type
e.g. Student and Staff are two subclasses of Person This is called an inheritance hierarchy
also called a Generalisation/specialisation structure
3
Superclass / Subclass
Attribute Inheritance
Subclasses inherit attributes from their
Superclasses
A Student has attributes
GPA (defined in the Student class) AND PersonName, PersonID(inherited from the superclass
Person)
An instance of a subclass is also an instance
of the superclass
if “John” is an instance of Student, he is also a Person
4
People
personID {PK}
personName
Staff
position
Students
gpa
Contra
{optional}
Treckies
Forcies
starTrekCharacter
starWarsCharac
40
Specialisation
Generalisation
Superclass / Subclass
Multiple Inheritance Specialisation
Process of identifying distinguishing characteristics of subclasses of a class.
John as a student is a more specific example of a person
Generalisation
Process of identifying common
characteristics of subclasses for a superclass.
4
People
personID {PK}
personName
Staff
position
Students
gpa
Contra
{optional}
Treckies
Forcies
starTrekCharacter
starWarsCharac
41
Specialisation
Generalisation
Constraints on Specialization / Generalization
A subclass member is always a member of the superclass Optional Participation (default if not specified)
42
{optional}
People
personID {PK}
personName
Some people are not students, staff or contractors
A person doesn’t need to be a member of any subclass Mandatory Participation (must specify)
A Person must be a staff member or a student Disjoint (OR)
{mandatory}
Staff
position
Every person is either one of a staff member, student or contractor Non-Disjoint (AND) - overlapping
Every person is represented by at least two of the sub-classes
A person may be a staff member and a student or a student and a contractor. . .
Students
Contractors
gpa
42
UML Modelling
Potential Issues
43
43
44
Problems with ER Models
Problems called connection traps may arise when designing a conceptual data model.
Due to the inability to interpret the meaning of certain relationships.
Two main types of connection traps are called Fan trap
Chasm trap
4
UML Data Modelling – Fan Trap
45
Many: Many Incorrect Where a model represents
a relationship between
entities, but the pathway
between specific entity
occurrences is ambiguous
A Fan trap can occur if entities are related in the wrong order
Can you tell which tutorial class Peter enrolled in?
0..* 0..*
Enrols In →
1..1 0..*
Has→
DB Fundamentals Programming Fund.. Networking Fund..
55034 55036 55038 40014 40024 40034
23004 23005
Solution 1
Relate Student directly back to Tutorial
Enrols In → Has →
Students
StudentID StudentName ...
Courses
CourseID CourseName ...
Tutorials
ClassNo Location StartTime
Which Tutorial did Peter enrol in?
Peter
45
UML Data Modelling – Fan Trap
46
Many: Many Correct Specific
5..30
→ General
0..*
Belongs To →
Students
StudentID StudentName ...
...
Indicator of a potential Fan Trap: 1 → many → many
ClassNo Location StartTime
0..* 1..1 CourseID CourseName
...
Courses
Solution 2
Ensure relations transition from most specific relationship to more general relationships correctly
Tutorials
46
47
Example of a Fan Trap
Fan Trap 2
many → 1 ← many
● Solution: swap relationships to 1 → many and 1 → many
4
© Pearson Education Limited 1995, 2005
© Pearson Education Limited 1995, 2005
Which Branch does each st member work at?
a
UML Data Modelling – Chasm Trap
48
The Chasm trap occurs when two or more “many to one” relationships converge on a single entity
The following diagram suggests the existence of a relationship between entity types (Staff and Properties for rent), but no pathway exists between certain entity occurrences
Not all properties for rent are overseen by staff (0..1)
Which Branch is offering the Property for rent?
Adding the Offers relationship resolves the chasm trap
© Pearson Education Limited 1995, 2005
48
49
Database Fundamentals
UML Translation – Next Week!
49