COMP9311
DATABASE SYSTEMS
• Xuemin L in
• Schoo l o f Compu te r Sc ience and Eng inee r i ng
• Off i ce : K17 -503
• E-ma i l : l xue@cse .unsw.edu .au
• Ex t : 6493
• h t tp : / /www.cse .unsw.edu .au /~ l xue
• www home address o f 9311 :
• h t tp : / /www.cse .unsw.edu .au /~cs9311
1
Course Information
Lectures:
17:00 – 19:00 (Mon) week 1 – 9
16:00 – 18:00 (Tue) week 1 – 9
9 weeks lectures in total.
Lab: Held online through Moodle(https://moodle.telt.unsw.edu.au/ ),
in“Lectures and Recordings” of course (COMP9311 – Database Systems
2021 T1).
week 2 – 5, 7 – 10 (week 11 Mon for Easter Monday makup)
Consultation: TBA
Q&A Forum: https://groups.google.com/group/comp9311-21t1
Course Email: comp9311unsw@gmail.coms
For routine questions, we recommend you use the Q&A forums. You are also
welcome to contact us via course email if something is private.
2
https://moodle.telt.unsw.edu.au/
https://groups.google.com/group/comp9311-21t1
Course Information(cont)
2 assignments, 1 project. All individual work !
Assignments (50%):
◦ Ass 1: Data Modelling + Relational Algebra (20%) (week 2-4)
◦ Ass 2: DB Design Theory + Database Storage Structures + Transaction (30%) (week 7-9)
Projects (50%)
◦ Proj 1: SQL & PLpgSQL (50%) (week 4-7 for SQL, week 8 for PLpgSQL)
Penalty for late submissions:
Assignments: 0 mark for late submissions
Project: 10% reduction for the 1st day, then 30% reduction per day.
3
Course Information(cont)
Exam: 100%
◦ If you are ill on the day of the exam, do not attend the exam.
◦ I will not accept medical special consideration claims from people who have
already attempted the exam.
Final Mark by Geometric Mean:
◦ Final mark = 𝑎𝑠𝑠1 + 𝑎𝑠𝑠2 + 𝑝𝑟𝑜𝑗1 ∗ 𝑒𝑥𝑎𝑚
4
Course Information(cont)
Text Book:
◦ Elmasri & Navathe, Fundamentals of Database Systems,
Benjamin/Cummings, 6th Edition, 2010.
Reference Books:
◦ J. D. Ullman & J. Widom, A First Course in Database Systems, Prentice Hall,
1997.
◦ R. Ramakrishan, Database Management Systems, McGRAW-HILL, 1997.
◦ D. Maier, The Theory of Relational Databases, Computer Science Press,
1983.
5
Course Outline
Time Monday Tuesday
Week 1 Subject Introduction, Conceptual DB
Design (ER)
Conceptual DB Design (continue),
Relational Data Model
Week 2 Relational Data Model(continue),
Relational Algebra
SQL
Week 3 SQL(continue), PLpgSQL Functional Dependencies
Week 4 Functional Dependencies (continue),
Normal Forms
Normal Forms (continue)
Week 5 Relational DB design Relational DB design (continue)
Week 6 Disks, Files Index
Week 7 Transaction Management Transaction Management (continue)
Week 8 Graph Data and Graph Database Graph Pattern Matching
Week 9 Towards Big Graph Processing:
applications and challenges
Revisions
6
Introduction
Database Applications:
◦ Banking System,
◦ Stock Market,
◦ Transportation,
◦ Social Network,
◦ Marine Data Analysis,
◦ Criminal Analysis and Control,
◦ Now, BIG DATA….
7
Intelligent Transportation
Public Health Modern Military Tourism Development
Business Services Natural Disasters
8
Introduction
Introduction
9
Big
Data
Volume
• Petabytes
• Records
• Transactions
Velocity
• Batch
• Real time
• Streaming
Variety
• Structured
• Unstructured
• Semi-
structured
10
New Graph Analytics Models
New Processing Algorithms & Indexing
Techniques
New Computing Platform/Architecture
Major Research Issues
Graph Processing System
§ Primitive operators
§ Query language
§ Distributed techniques
§ Storage
§ etc
11
Introduction(cont)
12
Develop a good database
system:
Effectively organize data
(database design).
Efficiently execute users
queries (transaction
management).
These are even more
important in modern
applications, e.g. internet:
Huge unstructured
information is available in
the internet.
Must access the information
efficiently and effectively
What is data?
Data – (Elmasri/Navathe):
◦ known facts that can be recorded and
have explicit meaning . . .
Example – a student records
database:
Contents – Information identifying
students, courses they are enrolled
in, results from past courses . . .
13
Item Type of
data
Stored as
Family
name
String Character
strings?
Birthdate Date 3 integers?
Weight Real
number
Floating point
number?
…
What is a database?
Elmasri/Navathe:
◦ . . . a collection of related data . . .
Data items alone are relatively useless.
We need the data to have some structure.
Database can be manipulated by a database management system.
14
What is a database management system (DBMS)?
Elmasri/Navathe:
◦ DBMS: . . . a collection of programs that enables users to create and maintain
a database . . .
◦ Database system: . . . The database and DBMS together . . .
15
Database requirements
Database system provides facilities to:
◦ Define a database – specifying the data items to be stored and their types,
◦ Construct a database – loading the data items and storing them on some
storage medium (usually disk),
◦ Manipulate a database
◦ querying – i.e. retrieving relevant data,
◦ updating – i.e. adding, deleting or modifying data items:
◦ from one “correct” state to another “correct” state,
◦ reporting
16
Database requirements(cont)
Database system must be
◦ Timely – e.g. an airline database (fast response), a CAD system (must be interactive),
◦ Multi-user – e.g. trading system,
◦ Modifiable – must be able to be extended or reorganised, e.g. to cope with new laws,
requirements, business conditions,
◦ Secure – different classes of users may need different levels of access,
◦ No redundancy,
◦ Robust – e.g. power failure during an update – must be able to recover to a consistent
state.
17
Database requirements(cont)
A database system must address these issues and provide
solutions – DBMS:
◦ a special purpose DBMS,
◦ a general DBMS.
The DBMS solution vs meta-data
To allow a general DBMS to be applied to a particular
database application, we need meta-data.
18
Database requirements(cont)
Meta-data: a definition and description of the stored database, such as
structure of each file, type and storage format of each data item,
constraints etc.
Stored in the system catalog.
19
Benefits of meta-data
program-data independence – DBMS access programs may be written
independent of file structures and storage formats,
data abstraction – information hiding.
◦ Users are provided with a conceptual representation of the data using a high level
data model.
support for views – different users can have different views of the database. e.g.
◦ salary details may be hidden from some users,
◦ statistical summaries may be derived and appear as stored data for some users.
20
Database personnel
Database Administrator(DBA) – This person is responsible for the
centralised control of the database:
◦ authorising access
◦ monitoring usage,
◦ recovery,
◦ identifying the data,
◦ choosing appropriate structures to represent and store the data,
◦ managing definitions of views . . .
21
Database personnel(cont)
End user – People requiring access to the database for querying,
updating, reporting etc.
◦ Naive (parametric) user – typically use the database via “canned transactions”
– standardised queries and updates, often through a menu system of some
kind,
◦ Online user – has an understanding of the database system. May be capable of
designing their own queries etc.
22
Database personnel(cont)
Systems analyst:
◦ determine end users requirements,
◦ develop specifications for canned transactions and reports,
◦ may also take part in database design.
Application programmer – Implements the specifications given by
analyst:
◦ tests,
◦ debugs,
◦ maintains the resulting programs.
23
DBMS concepts
Data model: a set of concepts that is used to describe the allowed
structure of a database. i.e. the structure of the meta-data.
May be classified as:
◦ High-level or conceptual (e.g. ER model – concerns entities, attributes and
relationships)
◦ Implementation or record-based (e.g. Relational, Network, Hierarchical –
suggests a physical implementation)
◦ Low-level or physical (concerns record formats, access paths etc)
24
DBMS concepts(cont)
Database Schema: An instance of a data model, that is, a description of the structure
of a particular database in the formalism of the data model. (Intention)
Database Instance (or State): The data in the database at a particular time.
(Extension)
In these terms:
◦ We define a database by specifying its schema.
◦ The state is then an empty instance of the schema.
◦ To create the initial instance we load in data.
◦ After this, each change in state is an update.
25
ANSI-SPARC three level architecture
ANSI: American National Standard Institute.
SPARC: Standards Planning and Requirements Committee.
ANSI-SPARC three level architecture (1975-1977):
◦ The external or view level includes a number of external schemas or user
views.
◦ The conceptual level has a conceptual schema, which describes the structure
of the whole database for a community of users.
◦ The internal level has an internal schema, which describes the physical
storage structure of the database.
26
27
EXTERNAL VIEW
1
EXTERNAL VIEW
1
EXTERNAL VIEW
1
CONCEPTUAL
VIEW
INTERNAL VIEW
EXTERNAL_TO_CONCEPTUAL
MAPPINGS
CONCEPTUAL_TO_INTERNAL
MAPPINGS
ANSI-SPARC three level architecture(cont)
3 levels of abstraction => 2 levels of data independence:
◦ logical data independence: the ability to change the conceptual schema
without changing external views. Must change the external-to-conceptual
mapping though.
◦ physical data independence: the ability to change physical storage paths and
access structures without changing the conceptual view. Must change the
conceptual-to-internal mapping though.
28
Database languages
In the three level architecture:
◦ Data definition language (DDL): used to define the conceptual schema.
◦ View definition language (VDL): used to define external schemas.
◦ Storage definition language (SDL): used to define the internal schemas.
In DBMS where conceptual and internal levels are mixed up, DDL is
used to define both schemas.
29
Database languages(cont)
Data manipulation language (DML): used to construct retrieval requests
(queries) and update requests:
◦ Low-level or procedural
◦ embedded in a general purpose language,
◦ record at a time
◦ High-level or non-procedural
◦ interactive and/or embedded
◦ set at a time/ set oriented.
In most current DBMSs, a comprehensive integrated language is used; for
example SQL.
30
Database components
See Fig2.3 in Elmasri/Navathe.
Run-time database processor – Receives retrieval and update requests and carries them out with
the help of the stored data manager.
Stored data manager or file manager – Controls access to the DBMS information stored on disk:
◦ may use the OS for disk access,
◦ controls other aspects of data transfer, such as handling buffers.
Pre-compiler – Extracts DML commands from the host language program.
◦ These are compiled by the DML compiler, the rest is compiled by the host language compiler, then they are
linked to produce executable code with calls to the data manager.
Query processor (or Complier) – Parses high-level queries and converts them into calls to be
executed by the data manager.
31
32
Component modules of a DBMS and their interactions.