1. Introduction to Databases
1. Introduction to Databases
We will study:
Basic terminology
Database
Database management system Database system
2 Data models
3 Three-level ANSI/SPARC architecture
External, logical and physical schemas Data independence
4 Database management (assumptions, languages, roles, challenges)
5 Historical remarks
6 From math to relational databases
Set and set operations
1
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 1
1. Introduction to Databases
Why Databases?
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 2
1. Introduction to Databases
Role of Databases?
Have you designed a database system?
Have you worked with a database system?
Can you think of role of databases for the following?
Crisis management
earthquakes, tsunamis, floods, droughts, fires, stampedes, terror attacks.
Business management
universities, libraries, supermarkets, financial institutions (e.g. banks), travel agents, factories, sports, statistical organizations (e.g. census), your affairs or business.
Others
criminal investigation, navigation, people and products identification.
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 3
1. Introduction to Databases
Some Examples of Databases
A database is a (possibly huge) collection of related data. Implicit properties:
represents some aspects of the real world;
a logically coherent collection of data with inherent meaning; designed and built for a specific purpose.
Examples:
Amazon: – It has 244 million active customers, over 60 million items occupying many terabytes of data (clothing, sports, videos, office products).
YouTube: – Over 1.3 billion users, over 100 million videos watched per day, 300 hours of videos added every minute, average of one billion mobile YouTube views per day
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 4
1. Introduction to Databases
Large database organisations in Australia
Example: Department of Human Services
– It has hundreds of databases and is responsible for the development of service delivery policy and provide access to social, health and other payments and services.
Example: Australian Taxation Department
– ATO is the federal government tax and superannuation administration,
known for contemporary service, expertise and integrity.
Example: Australian Defence
– Australian Defence Force is responsible for defending Australia and its
national interests.
Example: Australian Bureau of Statistics
– The ABS is Australia’s national statistical agency, providing trusted official statistics on a wide range of economic, social, population and environmental matters of importance to Australia.” .
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 5
1. Introduction to Databases
What is a Database Management System?
A database management system (DBMS) is a collection of programs that enables users to create and maintain a database.
It is a general-purpose software system that facilitates the process of defining: specifying data types, structures and constraints;
constructing: storing data on some storage medium; manipulating: retrieving and manipulating data;
sharing: using data by multiple users/programs simultaneously.
Well-known relational DBMSs include Oracle, IBM DB2, Microsoft’s SQL Server, MySQL, postgreSQL, etc.
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 6
1. Introduction to Databases
What is a Database System?
A database system is part of information systems dealing with data retrieval and manipulation.
It often refers to a DBMS plus a database.
Database system
Main services a database system provides:
answer queries efficiently; execute updates efficiently.
DBMS
Database
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 7
1. Introduction to Databases
Advantages of Using a Database System
Controlled data redundancy: Data redundancy is controlled to ensure consistency and save the storage space.
Sharing data: The data stored in the database can be shared among multiple users or application programs.
Enforcing data integrity: Some integrity constraints identified by a database designer can be enforced automatically by the DBMS.
Improved data security: Since the data is managed centrally, the DBMS ensures that access to the database is through an authorized channel.
In addition to the above, the DBS also facilitates the following:
Data storage, retrieval and modification; transaction support; concurrency, backup and recovery services; data independence; etc.
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 8
1. Introduction to Databases
Data Models
A data model is a collection of concepts for describing data.
In any data model, it is important to distinguish between the description of
the database and the database itself:
A database state refers to the mutable content of a database (in fact, it is a database);
ENROL
StudentID
CourseNo
Semester
Status
EnrolDate
456
COMP2400
2012 S2
active
25/05/2012
458
COMP1130
2012 S1
active
20/02/2012
459
COMP2400
2012 S2
active
11/06/2012
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 9
1. Introduction to Databases
Data Models
A database schema refers to the description of allowed database states.
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 10
1. Introduction to Databases
Characteristics of File Systems
In file systems
Each user defines the files needed for a specific software application
Definition of data is embedded in application programs, rather than being stored separately and independently
No control over access and manipulation of data beyond that imposed by application programs
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 11
1. Introduction to Databases
Characteristics of Database Systems
In database systems
Single repository maintains data that is defined once and then
accessed by various users
Insulation between programs and data, and data abstraction Structure of data files is stored in DBMS catalog separately
Support of multiple views of the data
Sharing of data and multiuser transaction processing Enhanced Security
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 12
1. Introduction to Databases
1. Introduction to Databases
Three-level ANSI/SPARC Architecture
Three-level ANSI/SPARC Architecture
External Schema
perspective of the user / application
describes restructured parts of the database used in applications
Conceptual or Logical Schema
perspective of a community of users
describes what data is stored in the database and relationships among data (independent from their physical storage structures).
Internal Schema
perspective of the implementation / system realization
describes how data is stored in the database (i.e., physical storage structures and indexes).
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 14
1. Introduction to Databases
Derived Principles – Data Independence
Logical data independence: change the conceptual/logical schemas without having to change external schemas or application programs
Example: If adding or removing entities, external schemas that refer only to the remaining data should not be affected.
Physical data independence: change the internal schemas without having to change the conceptual/logical schemas
Example: If physical files were reorganised, we should not have to change the conceptual/logical schemas.
External level (e.g., views)
Conceptual/logical level
(e.g., entities, relationships,
relations, constraints)
Internal level
(e.g., physical files, blocks,
pointers)
Users
…
…
Note: schemas at the three levels are descriptions of data; the stored data actually exists at the internal level (i.e., physical level) only.
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 13
1. Introduction to Databases
External schema (views)
Example
ENROLLEDSTUDENT COURSESTATISTICS
CourseNo StudentID CourseNo
Conceptual or logical schema (relations)
NumberOfEnrollments
COURSE No Cname
StudentID
Internal schema
Unit CourseNo
StudentID
ENROL Semester
STUDENT Name
Status
DoB Email EnrolDate
CREATE TABLE Course (No INT, …) TABLESPACE users
STORAGE (initial 1M
next 512K
pctincrease 0
minextents 2
maxextents unlimited)
USING INDEX (…)
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016
15
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 16
1. Introduction to Databases
Derived Principles – Data Independence
Key idea: When the schema is changed at some level,
the schema at the next higher level remains unchanged; only the mapping between two levels is changed.
External level (e.g., views)
Conceptual/logical level (e.g., entities, relationships,
relations, constraints)
Internal level
(e.g., physical files, blocks, pointers)
Users
…
Logical
data independence
Physical data independence
…
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 17
1. Introduction to Databases
Database Management
Fundamental assumptions:
Data is accessed and manipulated by multiple users/programs; Data is persistent;
Data is dynamic, which may change over time;
The amount of data in databases can be huge.
We need database languages to manage databases:
A query language is used to access and retrieve data in the database.
A data manipulation language is used to update the data in the database (e.g., insert, delete, modify).
A data definition language is used to define the database schemas.
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 18
1. Introduction to Databases
Roles in Database Management
Data administrators are the custodian of the data in the entire organization. Database administrators (DBAs) are responsible for managing resources
in a database environment:
authorizing access to the database (security);
the decision on the physical schema and storage structures; physical optimisation and tuning (system response time).
Database designers are responsible for
the design of conceptual/logical and external schemata
specification of interfaces to application programs (queries, transactions)
liaison with current or potential users
Database administrators ‘own’ the DBMS, while database designers ‘own’ the database.
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 19
1. Introduction to Databases
Historical Remarks I/II
Hierarchical Databases
Oldest data model (1960s);
Provides language constructs to manipulate database instances; Physical data independence is marginal (not intended).
Network Databases
Simple extension of hierarchical databases (late 1960s); Still no physical data independence (not intended);
Relational Databases
Beginning / mid of the 1970s in tight connection with the prototype DBMS System R;
First data model which abstracts from physical storage (data independence);
Nevertheless efficient storage and retrieval.
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 20
1. Introduction to Databases
Historical Remarks II/II
Object-Oriented Databases
Driven by object-oriented programming languages (1980s); Designed to store and share complex, structured objects;
Many concepts were incorporated into relational databases – object-relational databases.
XML Databases
XML is emerged as the standard for Web data exchange (1990s);
Contain semi-structured data in which markup tags are used to specify the logical structure.
NoSQL Databases
Recent development in industry (since 2009);
Refer to a broad class of non-relational databases;
We will discuss Google’s bigtable at the end of this course.
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 21
1. Introduction to Databases
Set Notation
We need set notation to represent formal definitions in this course. A set is several things considered together as one thing.
There are two ways of specifying a set.
1 {x1,…,xn} (i.e., list all the members in a set)
{Sydney , Toulouse, London}
{}, i.e., the empty set.
{{1}, {1, 2}, {1, 2, 3}, …}, i.e., a set of sets.
2 {x|φ} (i.e., describe the members that satisfy a property φ)
{wheels — all the wheels on Greg’s van };
{students — all the students currently enrolled in Comp2400 }; {x|x ∈ N}, i.e., the set of natural numbers.
Comp2400/Comp6240 – Relational Databases, Semester 2, 2016 22
1. Introduction to Databases
The Essence of Sets
If two sets have the same members, they are the same set {x|x ∈ N, x < 6} = {1,2,3,4,5}
If one set contains something that is not in the other set, then they are different.
e.g., { ..., bowls, ... } ≠ { extreme sports }
The members in a set have no order. e.g., {1, 2} = {2, 1}
Each member can not be in the set more than once. e.g., {1, 1} = {1}
Comp2400/Comp6240 - Relational Databases, Semester 2, 2016 23
1. Introduction to Databases
Set Operations
Membership: x ∈Aifx isinthesetA;x ̸∈Aifx isnotinthesetA. Equality: A and B are equal if they have the same members we write
A = B.
Subset: if every member of A is in B we write A ⊆ B; A is called a proper
subset of B if A ⊆ B and A and B are not equal.
Union: A ∪ B for the set containing everything in A and everything in B.
{3, 4, 5} ∪ {3, 5, 7, 9} = {3, 4, 5, 7, 9}.
Intersection: A ∩ B for the set of things that are in both A and B
{3, 4, 5} ∩ {3, 5, 7, 9} = {3, 5}.
Difference: A − B is the elements from A that are not in B.
{3, 4, 5} − {3, 5, 7, 9} = {4}.
Comp2400/Comp6240 - Relational Databases, Semester 2, 2016 24
1. Introduction to Databases
Set Operations – Exercise
Let A = {1, 2, 3} and B = {true, false}. Which of the following are correct?
1 {2}⊆A
2 2∈B
3 {2,3}∈(A∪B)
4 2∈(A∩B)
5 2∈(A−{1,3})
6 {1,2,3}⊆{oddintegers}
7 {1,2,3}∩{oddintegers}={3,1}
Comp2400/Comp6240 - Relational Databases, Semester 2, 2016 25
1. Introduction to Databases
Set Operations – Exercise
Let A = {1, 2, 3} and B = {true, false}. Which of the following are correct?
1 {2}⊆A
2 2∈B
3 {2,3}∈(A∪B)
4 2∈(A∩B)
5 2∈(A−{1,3})
6 {1,2,3}⊆{oddintegers}
7 {1,2,3}∩{oddintegers}={3,1}
Answer: 1,5,7
Comp2400/Comp6240 - Relational Databases, Semester 2, 2016 26
1. Introduction to Databases
Ordered Pairs
An ordered pair is two things in order: a first thing and a second thing. (15, ‘apple”)
(Melbourne, true)
Two ordered pairs are equal if they have the same thing in the first place,
and the same thing in the second place.
(1, 2) ̸= (2, 1) (i.e., the order does matter!)
The same thing can be in a pair twice. (1, 1) is an ordered pair
Comp2400/Comp6240 - Relational Databases, Semester 2, 2016 27
1. Introduction to Databases
Tuples
Pairs are 2-tuples, but we can also have n-tuples for a positive integer n. (3.141, Darwin, sardine, wallpaper ) is a 4-tuple.
A n-tuple is an ordered list of n elements.
The positions can have names as well as (or instead of) numbers.
first = productID, second = description (15, “apple”)
first = city, second = hasBeaches (Melbourne, true)
The same element can be in a n-tuple more than once. (a, a, a, a, a) is a 5-tuple.
Comp2400/Comp6240 - Relational Databases, Semester 2, 2016 28
1. Introduction to Databases
Cartesian Product of Sets
The Cartesian product operation takes an ordered list of sets, and returns a set of tuples.
Cartesian product D1 × ... × Dn is the set of all possible combinations of values from the sets D1 , ..., Dn .
It is all the tuples with the first thing from the first set, the second thing from the second set, ...
For ordered pairs, A × B = {(a, b) where a ∈ A and b ∈ B}.
{Sydney , Paris} × {2, 4, 11} = {(Sydney , 2), (Sydney , 4), (Sydney , 11), (Paris, 2), (Paris, 4), (Paris, 11)}
Comp2400/Comp6240 - Relational Databases, Semester 2, 2016 29
1. Introduction to Databases
Relations
A relation is a subset of a Cartesian product of sets. Example
Let Z = {..., −1, 0, 1, 2, ...}, the set of all integers, and LetR={(x,y)| x,y∈Zandx