Databases: Introduction
Introduction Data
Databases: Introduction
P.J. McBrien
Imperial College London
P.J. McBrien (Imperial College London) Databases: Introduction 1 / 23
Introduction Data Models
Databases are Computer Stores of Data!
Tiny Bank Ltd Customer: McBrien, P.
Strand Branch Current Acc: 10000100
Sortcode: 55-66-67
Trans Amount Date
1000 2300.00 5/1/1999
1002 -223.45 8/1/1999
1006 10.23 15/1/1999
Tiny Bank Ltd Customer: McBrien, P.
Strand Branch Deposit Acc: 10000101
Sortcode: 55-66-67
Trans Amount Date
1001 4000.00 5/1/1999
1008 1230.00 15/1/1999
Tiny Bank Ltd Customer: Boyd, M.
Goodge St Branch Current Acc: 10000103
Sortcode: 55-66-34
Trans Amount Date
1005 145.50 12/1/1999
Tiny Bank Ltd Customer: Poulovassilis, A.
Wimbledon BranchCurrent Acc: 10000107
Sortcode: 55-66-56
Trans Amount Date
1004 -100.00 11/1/1999
1007 345.56 15/1/1999
Tiny Bank Ltd Customer: Poulovassilis, A.
Wimbledon BranchDeposit Acc: 10000119
Sortcode: 55-66-56
Trans Amount Date
1009 5600.00 18/1/1999
Tiny Bank Ltd Customer: Bailey, J.
Wimbledon BranchCurrent Acc: 10000125
Sortcode: 55-66-56
Trans Amount Date
No transactions this month
Deposit Rates
AccountRate
101 5.25
119 5.50
P.J. McBrien (Imperial College London) Databases: Introduction 2 / 23
Introduction Data Models
Relational Data Model
Relational Data Model
Roughly: storing data in tables
bank data
no sortcode bname cash type cname rate? mid amount tdate
100 67 Strand 34005.00 current McBrien, P. 1000 2300.00 1999-01-05
101 67 Strand 34005.00 deposit McBrien, P. 5.25 1001 4000.00 1999-01-05
100 67 Strand 34005.00 current McBrien, P. 1002 -223.45 1999-01-08
107 56 Wimbledon 84340.45 current Poulovassilis, A. 1004 -100.00 1999-01-11
103 34 Goodge St 6900.67 current Boyd, M. 1005 145.50 1999-01-12
100 67 Strand 34005.00 current McBrien, P. 1006 10.23 1999-01-15
107 56 Wimbledon 84340.45 current Poulovassilis, A. 1007 345.56 1999-01-15
101 67 Strand 34005.00 deposit McBrien, P. 5.25 1008 1230.00 1999-01-15
119 56 Wimbledon 84340.45 deposit Poulovassilis, A. 5.50 1009 5600.00 1999-01-18
P.J. McBrien (Imperial College London) Databases: Introduction 3 / 23
Introduction Data Models
Database Design: ER Modelling
branchbname
cash
sortcode
account
rate?
cname
type
no
movement
amount
tdate
mid
has
0:N
1:1
holds
1:1
0:N
P.J. McBrien (Imperial College London) Databases: Introduction 4 / 23
Introduction Data Models
Structured Data: Relational Model
branch
sortcode bname cash
56 ’Wimbledon’ 94340.45
34 ’Goodge St’ 8900.67
67 ’Strand’ 34005.00
movement
mid no amount tdate
1000 100 2300.00 5/1/1999
1001 101 4000.00 5/1/1999
1002 100 -223.45 8/1/1999
1004 107 -100.00 11/1/1999
1005 103 145.50 12/1/1999
1006 100 10.23 15/1/1999
1007 107 345.56 15/1/1999
1008 101 1230.00 15/1/1999
1009 119 5600.00 18/1/1999
account
no type cname rate? sortcode
100 ’current’ ’McBrien, P.’ NULL 67
101 ’deposit’ ’McBrien, P.’ 5.25 67
103 ’current’ ’Boyd, M.’ NULL 34
107 ’current’ ’Poulovassilis, A.’ NULL 56
119 ’deposit’ ’Poulovassilis, A.’ 5.50 56
125 ’current’ ’Bailey, J.’ NULL 56
key branch(sortcode)
key branch(bname)
key movement(mid)
key account(no)
movement(no)
fk
⇒ account(no)
account(sortcode)
fk
⇒ branch(sortcode)
P.J. McBrien (Imperial College London) Databases: Introduction 5 / 23
Introduction Data Models
Data Model: CSV
branch.csv
sortcode,bname,cash
56,”Wimbledon”,94340.45
34,”Goodge St”, 8900.67
67,”Strand”,34005.00
account.csv
no,type,cname,rate,sortcode
100,”current”,”McBrien, P.”,,67
101,”deposit”,”McBrien, P.”,5.25,67
103,”current”,”Boyd, M.”,,34
107,”current”,”Poulovassilis, A.”,,56
119,”deposit”,”Poulovassilis, A.”,5.50,56
125,”current”,”Bailey, J.”,,56
movement.csv
mid,no,amount,tdate
1000,100,2300.00,5/1/1999
1001,101,4000.00,5/1/1999
1002,100,-223.45,8/1/1999
1004,107,-100.00,11/1/1999
1005,103,145.50,12/1/1999
1006,100,10.23,15/1/1999
1007,107,345.56,15/1/1999
1008,101,1230.00,15/1/1999
1009,119,5600.00,18/1/1999
P.J. McBrien (Imperial College London) Databases: Introduction 6 / 23
Introduction Data Models
Semistructured Data: XML
〈bank〉
〈branch sortcode=“67” bname=“Strand” cash=“34005.00”〉
〈account no=“100” type=“current” cname=“McBrien, P.”〉
〈movement mid=“1000” amount=“2300.00” tdate=“5/1/1999”/〉
〈movement mid=“1002” amount=“-223.45” tdate=“8/1/1999”/〉
〈movement mid=“1006” amount=“10.23” tdate=“15/1/1999”/〉
〈/account〉
〈account no=“101” type=“deposit” cname=“McBrien, P.” rate=“5.25”〉
〈movement mid=“1001” amount=“4000.00” tdate=“5/1/1999”/〉
〈movement mid=“1008” amount=“1230.00” tdate=“15/1/1999”/〉
〈/account〉
〈/branch〉
〈/bank〉
P.J. McBrien (Imperial College London) Databases: Introduction 7 / 23
Introduction Data Models
SQL DDL: Implementation of the Relational Model
CREATE TABLE branch
( sortcode INTEGER NOT NULL,
bname VARCHAR(20) NOT NULL,
cash DECIMAL(10,2) NOT NULL,
CONSTRAINT branch pk PRIMARY KEY (sortcode)
)
CREATE UNIQUE INDEX branch bname idx
ON branch(bname)
CREATE TABLE account
( no INTEGER NOT NULL,
type CHAR(8) NOT NULL,
cname VARCHAR(20) NOT NULL,
rate DECIMAL(4,2) NULL,
sortcode INTEGER NOT NULL,
CONSTRAINT account pk
PRIMARY KEY (no),
CONSTRAINT account fk
FOREIGN KEY (sortcode) REFERENCES branch
)
CREATE INDEX account type idx ON account(type)
CREATE TABLE movement
( mid INTEGER NOT NULL,
no INTEGER NOT NULL,
amount DECIMAL(10,2) NOT NULL,
tdate DATETIME NOT NULL,
CONSTRAINT movement pk
PRIMARY KEY (mid),
CONSTRAINT movement fk
FOREIGN KEY (no) REFERENCES account
)
P.J. McBrien (Imperial College London) Databases: Introduction 8 / 23
Introduction Data Models
SQL DML: Implementation of the Relational Algebra
Basic SQL SELECT statements
SELECT no , cname , r a t e
FROM account
WHERE type=’ d e p o s i t ’
SQL Joins
SELECT bname , no , r a t e
FROM branch JOIN account USING ( so r t code )
WHERE type=’ d e p o s i t ’
Same as
SELECT bname , no , r a t e
FROM account JOIN branch ON branch . s o r t c ode=account . s o r t c ode
WHERE type=’ d e p o s i t ’
Same as
SELECT bname , no , r a t e
FROM account , branch
WHERE branch . s o r t c ode=account . s o r t c ode
AND type=’ d e p o s i t ’
P.J. McBrien (Imperial College London) Databases: Introduction 9 / 23
Introduction Data Models
RDBMS Products
Product SQL Language Company
DB2 SQL PL IBM
Oracle PL/SQL Oracle
Sybase Transact-SQL SAP
SQLServer Transact-SQL Microsoft
PostgreSQL PL/pgSQL Open Source
MySQL MySQL Open Source (Oracle)
All partially implement ANSI SQL
P.J. McBrien (Imperial College London) Databases: Introduction 10 / 23
Transactions
Transactions
BEGIN TRANSACTION
UPDATE branch
SET cash=cash-10000.00
WHERE sortcode=56
UPDATE branch
SET cash=cash+10000.00
WHERE sortcode=34
COMMIT TRANSACTION
database management systems
(DBMS) implements indivisible tasks
called transactions
The ACID Properties
Atomicity all or nothing
Consistency consistent before → consistent after
Isolation independent of any other transaction
Durability completed transaction are durable
P.J. McBrien (Imperial College London) Databases: Introduction 11 / 23
Transactions
Transaction Properties: Atomicity
BEGIN TRANSACTION
UPDATE branch
SET cash=cash −10000.00
WHERE so r t c od e=56
CRASH
Failure to maintain Atomicity
Suppose that the system crashes half way through processing a cash transfer, and the
first part of the transfer has been written to disc
The database on disc is left in an inconsistent state: the sum of cash should be
£137,246.12 but only £127,246.12 recorded
A DBMS implementing Atomicity of transactions would on restart undo the
change to branch 56
P.J. McBrien (Imperial College London) Databases: Introduction 12 / 23
Transactions
Transaction Properties: Consistency
BEGIN TRANSACTION
DELETE FROM branch
WHERE so r t c od e=56
INSERT INTO account
VALUES (100 , ’ Smith , J ’ , ’ d e p o s i t ’ , 5 . 0 0 , 34 )
END TRANSACTION
Failure to maintain Consistency
Suppose that a user deletes branch with sortcode 56, and inserts a desposit account
number 100 for John Smith at branch sortcode 34
The database is left in an inconsistent state for two reasons
it has three accounts recorded for a branch that appears not to exist, and
it has two records for account number 100, with different details for the account
A DBMS implementing Consistency of transactions would forbid both of these
changes to the database
P.J. McBrien (Imperial College London) Databases: Introduction 13 / 23
Transactions
Transaction Properties: Isolation
BEGIN TRANSACTION
UPDATE branch
SET cash=cash −10000.00
WHERE so r t c od e=56
UPDATE branch
SET cash=cash +10000.00
WHERE so r t c od e=34
END TRANSACTION
BEGIN TRANSACTION
SELECT SUM( cash ) AS n e t c a sh
FROM branch
END TRANSACTION
Failure to maintain Isolation
Suppose that the system sums the cash in the bank in one transaction, half way
through processing a cash transfer in another transaction
The result of the summation of cash in the bank erroneously reports
£127,246.12, whereas the movement of cash always leaves a total of £137,246.12
A DBMS implementing Isolation of transactions ensures that transactions
always report results based on the values of committed transactions
P.J. McBrien (Imperial College London) Databases: Introduction 14 / 23
Transactions
Transaction Properties: Durability
BEGIN TRANSACTION
UPDATE branch
SET cash=cash −10000.00
WHERE so r t c od e=56
UPDATE branch
SET cash=cash+10000.00
WHERE so r t c od e=34
END TRANSACTION
CRASH
Failure to maintain Durability
Suppose that the system crashes after informing the user that it has committed the
transfer of cash, but has not yet written to disc the update to branch 34
The database on disc is left in an inconsistent state, with £10,000 ‘missing’
A DBMS implementing Durability of transactions would on restart complete
the change to branch 34 (or alternatively never inform a user of commitment
with writing the results to disc).
P.J. McBrien (Imperial College London) Databases: Introduction 15 / 23
DBMS Architecture
DBMS Architecture
log
disc
data manager
buffer
manager
memory
recovery manager
scheduler
transaction manager
read
write❄✻
read
write
✛
read
write❄
flush
fetch ❄
read
write
begin
abort
commit❄
execute
❄
result
reject
delay
✻
Transaction Manager
BEGIN TRANSACTION
UPDATE branch
SET cash=cash −10000.00
WHERE so r t c od e=56
UPDATE branch
SET cash=cash +10000.00
WHERE so r t c od e=34
END TRANSACTION
r1[b56], w1[b56], r1[b34], w1[b34]
P.J. McBrien (Imperial College London) Databases: Introduction 16 / 23
DBMS Architecture
DBMS Architecture
log
disc
data manager
buffer
manager
memory
recovery manager
scheduler
transaction manager
read
write❄✻
read
write
✛
read
write❄
flush
fetch ❄
read
write
begin
abort
commit❄
execute
❄
result
reject
delay
✻
Transaction Manager
BEGIN TRANSACTION
SELECT SUM( cash ) AS n e t c a sh
FROM branch
END TRANSACTION
r2[b56], r2[b34], r2[b67]
P.J. McBrien (Imperial College London) Databases: Introduction 16 / 23
DBMS Architecture
DBMS Architecture
log
disc
data manager
buffer
manager
memory
recovery manager
scheduler
transaction manager
read
write❄✻
read
write
✛
read
write❄
flush
fetch ❄
read
write
begin
abort
commit❄
execute
❄
result
reject
delay
✻
Scheduler
execute(r1[b56], w1[b56], r1[b34], w1[b34])
execute(r2[b56], r2[b34], r2[b67])
b1, r1[b56], w1[b56], b2, r2[b56]
r1[b34], w1[b34], c1, r2[b34], r2[b67], c2
P.J. McBrien (Imperial College London) Databases: Introduction 16 / 23
DBMS Architecture
DBMS Architecture
log
disc
data manager
buffer
manager
memory
recovery manager
scheduler
transaction manager
read
write❄✻
read
write
✛
read
write❄
flush
fetch ❄
read
write
begin
abort
commit❄
execute
❄
result
reject
delay
✻
Data Manager
b1, r1[b56], w1[b56], b2,
r2[b56]
r1[b34], w1[b34],
c1, r2[b34],
r2[b67], c2
Say P1 = [b34], P2 = [b56, b67]
Mr(P2), Br(P2), Dr(P2),Mw(P2), Lw(P2),
Mr(P2),
Mr(P1), Br(P1), Dr(P1),Mw(P1), Lw(P1),
Lw(c1),Mr(P1),
Mr(P2), Lw(c2), Dw(P1), Dw(P2)
P.J. McBrien (Imperial College London) Databases: Introduction 16 / 23
ANSI/SPARC
ANSI/SPARC Model
DB →
ր
ց
external
schema1
. . . external
scheman
conceptual
schema
internal
schema
✻
❑ ✕
transform
filter
ANSI/SPARC model views three levels of abstractions
schema means structure of the database
P.J. McBrien (Imperial College London) Databases: Introduction 17 / 23
ANSI/SPARC
ANSI/SPARC Model (Internal Schema)
external
schema1
. . . external
scheman
conceptual
schema
internal
schema
✻
❑ ✕
✲
2k page size
B-tree index
Strings end with char 0
Describes the physical layout of data
P.J. McBrien (Imperial College London) Databases: Introduction 18 / 23
ANSI/SPARC
ANSI/SPARC Model (Conceptual Schema)
external
schema1
. . . external
scheman
conceptual
schema
internal
schema
✻
❑ ✕
✲
CREATE TABLE branch
( sortcode INTEGER,
bname VARCHAR(20),
cash DECIMAL(10,2)
)
SELECT sortcode
FROM branch
WHERE cash<0
defined in data definition language (DDL)
queried using data manipulation language (DML)
controlled by database administrator (DBA)
P.J. McBrien (Imperial College London) Databases: Introduction 19 / 23
ANSI/SPARC
ANSI/SPARC Model (External Schema)
external
schema1
. . . external
scheman
conceptual
schema
internal
schema
✻
❑ ✕
✲
CREATE VIEW bust
SELECT sortcode
FROM branch
WHERE cash<0
Define a schema for a particular user/application
P.J. McBrien (Imperial College London) Databases: Introduction 20 / 23
Course Format
Course Format
Schedule
Three hours combined lectures/tutorials per week, running into week 10
Coursework that helps you prepare for the exam
May Exam
Books
Several good text books on the market. Some that will also cover material in more
advanced courses are:
Fundamentals of Database Systems,
6th Ed, Elmasri and Navathe, Addison Wesley
Database Systems: The Complete Book,
2nd Ed, Garcia-Molina, Ullman and Widom, Pearson
Database Systems,
5th Ed, Connolly and Begg, Addison Wesley
P.J. McBrien (Imperial College London) Databases: Introduction 21 / 23
Course Format
Course Resources
Course Web Site
http://www.doc.ic.ac.uk/~pjm/db/
Lecture slides
Example Databases
Postgres
SQL Server
Resources
CATe course work handout and submission
Piazza discussion forum
email course email list
If you are not on Level 2 on CATe, nothing works!
P.J. McBrien (Imperial College London) Databases: Introduction 22 / 23
http://www.doc.ic.ac.uk/~pjm/db/
Course Format
Course Content
Conceptual Layer: Relational Algebra
SQL
Datalog
Conceptual Layer: Relational Data Model
Properties of a ‘good’ schema: keys and normalisation
Database design using ER models
Physical Layer: Transaction Processing
Serialisability
Recovery and Checkpointing
P.J. McBrien (Imperial College London) Databases: Introduction 23 / 23
Introduction
Data
Data Models
Transactions
DBMS Architecture
ANSI/SPARC
Course Format