PowerPoint Presentation
Introduction to Database
Jianjun Chen
Thanks, Matt
Contents
Database introduction 不必掌握
Relational model
Relational Keys
What is Data?
Example 1: An array that stores some numbers, which can be retrieved sometime later.
private int workloadWeekly[] = {2, 3, 5, 2, 1, 9};
public int getWorkload(String dayOfWeek) {
switch (dayOfWeek.toLowerCase()) {
case “monday”:
return workloadWeekly[0];
case “tuesday”:
return workloadWeekly[1];
…
}
}
public void increaseAllWorkload() {
…
}
What is Data?
Example 2: A piece of data inside CPU register
“-4(%rbp)” is where the variable x is stored
What is Data?
We can see that:
Data is only meaningful under its designed scenario.
In example 1: The array workloadWeekly is private, thus cannot be used outside of its class.
In example 2: The data stored in “-4(%rbp)” is invalid once the function returns.
Must have ways to create/modify data.
Must have ways to access data.
What is Database?
Database: “Organised collection of data. Structured, arranged for ease and speed of search and retrieval.”
Database Management System (DBMS):
Designed scenario: Software that is designed to enable users and programs to store, retrieve and update data from database.
Create/Modify/Access: A software must have a set of standard functions to be called DBMS.
We will learn about these functions soon!
But why do we need DBMS?
Adina (A) – 目的:designed scenario
Pre-DBMS Methods
Applications store data as files.
Each application uses its own format.
Other applications need to understand that specific format.
Leads to duplicated code and wasted effort.
Compatibility issues.
Application must know the file format
Application 1
Custom data files
*.docx
Application must know the file format
Application 2
Custom data files
*.odt
Pre-DBMS Methods
How about using a common data format?
Still need to write duplicated code for reading this file format.
Synchronisation issues.
Accessed simultaneously?
Very hard to coordinate operations from different apps.
Custom data file(s)
Application must know the file format
Application must know the file format
Application must know the file format
Application 1
Application 3
Application 2
DBMS Approach
Work as a delegate for this common collection of data.
Applications use a common API for accessing database.
A header file in C, plus the dynamic/static library file.
A Java Interface, plus a set of class files or a single “.jar” file
Application 1
Application must know the DBMS Interface
Application 3
Application 2
DB data file(s)
Database Management System (DBMS)
We will see
Adina (A) – API 应用程序接口
Application Programme Interface
Commonly Seen DBMS
Oracle
DB2
MySQL
Ingres
PostgreSQL
Microsoft SQL Server
MS Access
Databases: Applications
Example 1: Member cards of a chain store
Example 2: Banking service, account balance.
Phone No. Name Points
233333 Vincent 1000
233334 Matt 1231
Card ID Holder ID Name Balance
0933 1223 0001 4321 12360 Daryl XXXX -50
0963 1245 0291 0177 78799 Jessie XXXX 233333
DBMS Functions / Must Haves
Allow users to store, retrieve and update data
Ensure either that all the updates corresponding to a given action are made or that none of them is made (Atomicity)
Ensure that DB is updated correctly when multiple users are updating it concurrently
Recover the DB in the event it is damaged in any way
Ensure that only authorised users can access the DB
Be capable of integrating with other software
About this Module
Module convener’s email address:
Jianjun. .cn
Textbook:
Database Systems: a practical approach to design, implementation, and management – Connolly, Thomas M., Begg, Carolyn E.
Assessments:
A two-stage coursework (10% + 15%)
1 In-class exam at the end of this semester (15%)
One final exam.
The Relational Model
And the relational database management systems (RDBMS)
✬
The Relational Model
The relational model is one approach to managing data. Originally Introduced by E.F. Codd in his paper “A Relational Model of Data for Large Shared Databanks”, 1970.
An earlier model is called the navigational model导航模型.
RDBMS are based on the relational model.
The model uses a structure and language that is consistent with first-order predicate logic一阶谓词逻辑
Provides a declarative method for specifying data and queries
Details are covered in the Chapter 4 of the textbook.
Relational Model: Terminologies*名词
A relation is a mathematical concept. The physical form of a relation is a table with columns and rows. 有着行和列的表格
An attribute is a named column of a relation. 属性是一个被命名的列
A domain is the set of allowable values for attributes. 一个域是一系列容许值的集合
Age must be a positive integer.
Postcodes have length limit.
*A red title indicates that the knowledge is needed throughout the module
Terminologies 专业术语
Attribute
Domain
Tuple: a tuple is a row of a relation. (order of tuples does not matter)
一个元组是一行
The degree of a relation is the number of attributes it contains
relation的程度就是由它包含的属性的数量
Cardinality: the number of tuples in a relation.
基数就是元组的数量
Terminologies
Relation schema关系模式:
The definition of a relation, which contains the name and domain of each attribute.
Formally (See Chapter 4.2.3): “A named relation defined by a set of attribute and domain name pairs”
Relational database schema关系数据库模式:
A set of relation schemas, each with a distinct name
branchNO Character: size 4, range B001-B999
street Character: size 25
city Character: size 15
postcode Character: size 8
Alternative Terminologies替代的术语
Formal Terms Alternative #1 Alternative #2
Relation Table File
Tuple Row Record
Attribute Column Field
Table 4.1 in the textbook
ID Name Salary Department
M139 John Smith 18000 Marketing
M140 Mary Jones 22000 Marketing
A368 Jane Brown 22000 Accounts
P222 Mark Brown 24000 Personnel
A367 David Jones 20000 Accounts
Relation schema:
relation_name(ID: Char, Name: Char, Salary: Monetary, Department: Char)
Attributes are: ID, Name, Salary & Department
Tuples, e.g.
{
(ID, A368),
(Name, Jane Brown), (Salary, 22,000), (Department, Accounts) }
The cardinality of the relation is 5
The degree of the relation is 4
Relation: Properties 属性
A relation also has the following properties:
Its name is unique in the relational database scheme.
Each cell contains exactly one atomic value.
Each attribute of a relation must have a distinct name.
The values of an attribute are from the same domain.
No duplicate tuples.
The order of attributes has no significance.
The order of tuples has no significance.
顺序无意义!
Relational Keys
Super key, candidate key, primary key, foreign key
NULL
Data Integrity 数据完整性
In this section, you will be introduced to a very important part of the relational model called relational keys.
In relational databases, related information are often grouped together to form tables.
For example, in a company that has many branch offices:
The information related to staff is put together.
The information related to office is saved in another table.
Also, there will be a lot of referencing among tables.
For example, given an ID of a staff, you should be able to locate the address of his office.
Integrity Consideration 1
We want to minimise data redundancy within a relation.
The database should be able to check for any duplicate tuples before tuples are added or modified
This is enforced by something called Primary key.
Relational Keys: Super Key
Super key: one or more attributes that uniquely identifies a tuple within a relation.
{Passport ID}
{Passport ID, Name}
{Student ID}
{Student ID, Name}
{Passport ID, Student ID}
{Passport ID, Student ID, Name}
Passport ID Student ID Name
{Name} alone does not uniquely identify a tuple.
Because there are people with the same name.
Relational Keys: Candidate Key & Primary Key
Candidate key: a super key such that no proper subset is a super key within the relation
Every tuple has a unique value for that set of attributes: uniqueness
No proper subset of the set has the uniqueness property: minimality 同时满足!
Primary key: The candidate key that is selected to identify tuples uniquely within the relation.
Available choices:
{Passport ID}
{Passport ID, Name}
{Student ID}
{Student ID, Name}
{Passport ID, Student ID}
{Passport ID, Student ID, Name}
Passport ID Student ID Name
Adina (A) – ?
Choosing Candidate Keys
You can’t necessarily infer the candidate keys based solely on the data in your table
More often than not, an instance of a relation will only hold a small subset of all the possible values
E.g. Restaurants’ booking number might reset to 1 after a large number.
You must use knowledge of the real-world to help.
Queue No. A31
Table type: up to 4
31 People are waiting ahead of you.
A1, A2, A3… A99, A999 -> A1
Adina (A) – ??
Choosing Candidate Keys
What are the possible candidate keys of the following relation?
OfficeID Name Country Postcode Phone
O1001 Headquarters UK W1 1AA 0044 20 1545 3241
O1002 R&D Labs UK W1 1AA 0044 20 1545 4984
O1003 US West USA 94130 001 415 665981
O1004 US East USA 10201 001 212 448731
O1005 Telemarketing UK NE5 2GE 0044 1909 559862
O1006 Telemarketing USA 84754 001 385 994763
The candidate keys are : {OfficeID}, {Phone}, {Name, Postcode/Zip}, {Name, Country}
Note: Keys like {Name, Postcode/Zip, Phone} satisfy uniqueness, but not minimality
OfficeID Name Country Postcode Phone
O1001 Headquarters UK W1 1AA 0044 20 1545 3241
O1002 R&D Labs UK W1 1AA 0044 20 1545 4984
O1003 US West USA 94130 001 415 665981
O1004 US East USA 10201 001 212 448731
O1005 Telemarketing UK NE5 2GE 0044 1909 559862
O1006 Telemarketing USA 84754 001 385 994763
Integrity Consideration 2
It is also very common that tuples in one relation references data from another relation.
As a result, a database should provide such mechanism to ensure correct references.
This is enforced by something called foreign key
Relational Keys: Foreign Key
Foreign key:
One or more attributes within one relation that must match the candidate key of some (possibly the same) relation
Example: We want the values of the ‘branchNo’ in relation staff to be one of the ‘branchNo’ in relation Branch.
Relational Keys and Integrity Constraints 完整性约束
Primary Key enforces entity integrity
Foreign Key enforces referential integrity
Why they are important? Read here. You need more lectures to understand that, though.
A Small Challenge
Assume that the two tables on the right are stored in 2-dimensional arrays:
String[][] Branch
String[][] Staff
How would you find out the postcode of the branch where Julie Lee works in? (In Java code)
A Small Challenge
Now, write a function that allows the caller to find out the branch information of any Staff.
Doing so really helps understand future contents.
String find(staffID, branchAttributeName)
Extended Reading
“Database Systems: A Practical Approach …” by Connolly and Begg.
Go through Chapter 1 to Chapter 4 quickly.
Focus more on chapter 4.
Software Preparation For Labs
Prepare your personal laptop. Install the necessary software for this course:
MySQL Database or MariaDB.
A visual database administration tool like phpMyAdmin or MySQL workbench.
A tutorial will be available on ICE.
/docProps/thumbnail.jpeg