CS 200
Lecture 08 Relational Databases – SQL (Structured Query Language)
CS 200 Spring 2019 1 SQL
Administrivia
BEFORE lab, please read and highlight
• Assignment 8
• Database Design (on learn.uwaterloo.ca)
Adapted from Access Database Design & Programming by Steven Roman • The SQL tutorial notes (on learn.uwaterloo.ca)
• Slides for this lecture
START the lab this week by doing the SQL tutorial
Topics for today
• relational database design
• when you need another table, and why • SQL (Structured Query Language)
• a model for relational databases
Please ask questions!
CS 200 Spring 2019
2
SQL
Assumptions for Today’s Lecture
You’ve seen a two-table relational database
• you’ve been exposed to
forms, reports, queries, sorting, & data validation
• you’re familiar with the terms
database
table
record (aka row)
field (aka column)
If this wasn’t true at the beginning of the term,
• by now you’ve completed the Filemaker Intro
CS 200 Spring 2019
3
SQL
Things to Think about
How does a DBMS differ from a spreadsheet?
Why would I choose to use a DBMS?
How does SQL differ from FileMaker?
What are the reasons for needing more than one database table?
CS 200 Spring 2019
4
SQL
Why SQL?
It’s an excellent MODEL for how relational DBMS’s work
Modern “big” DBMS’s are SQL-based
Many PC databases are not
• but can often be used as “front-ends” to mainframe SQL systems • though FileMaker Pro 11–16 and MS Access are based on SQL
OS X actually comes with two SQLs!
• “SQL Lite,” which is designed to be embedded in programs (including the O/S)
• “My SQL,” a (very) popular open source SQL server (used for the Math Faculty’s inventory database)
Often you can
import data from an SQL database into software with a nice GUI (eg FileMaker)
by crafting an appropriate “SQL select statement”
CS 200 Spring 2019
5
SQL
Databases – Structured Data
Why use a database at all?
Structuring data allows us to do things we can’t do efficiently,
or can’t feasibly do, with unstructured data
• The added power & flexibility aren’t free
it takes time and effort to create (and maintain) the structure
• You have to decide if that effort is worthwhile
Obvious questions:
• What do I mean by “structure” in a database? • What’s the payoff?
CS 200 Spring 2019
6
SQL
“Structure”
The visa worksheet in the Excel assignment
• is an example of a 1–table database,
although we built it in Excel, not with a DBMS
• each ROW (“record”) holds data for a particular transaction
• each COLUMN holds a particular piece of data about that transaction (a “field”)
• we could have used FileMaker
though for what we wanted to do,
it wouldn’t have been worth the effort of learning FileMaker
• indeed, we could have placed the data in a Word table
or even in a text processor (eg BBEdit)…
separate fields by tabs, separate records by ¶
but working with the data would have been MUCH harder
— think about implementing the Actual Balance and Statement Balance columns!
CS 200 Spring 2019
7
SQL
The Excel Assignment — Keeping Track of VISA Charges
CS 200 Spring 2019
8
SQL
Key DBMS Functionality
Data entry “validation”
Sophisticated searching (aka “queries”) Sophisticated summarizing and reporting Safe simultaneous updates by multiple users
The REAL power of a relational database
• appears when you have multiple related tables • what does “related” mean?
• why have multiple tables?
CS 200 Spring 2019
9
SQL
A Music Library
The goal — to refine our understanding of why/when multiple tables are necessary
Suppose you want to keep track of your music • Album Title
• Artist
• Medium (CD, Tape, LP record, …)
• Category (Jazz, Classical, Hard Rock, …) • Price
• Purchase Date
• Copyright
• Label
Well, you could do it with a word processor
• but … how to find all the recordings by Led Zeppelin?
• & … what’s the value of your Charlie Parker albums?
• & … how to avoid entry of a bogus Medium, Category, etc. • & …
How well would Excel work?
Consider how we might manage our music in FileMaker
CS 200 Spring 2019
10
SQL
Albums in FileMaker (1)
Filemaker
• requires datatypes — Text, Number, Date, etc — why?
• provides data entry options for data validation (default values, value lists, range checks, etc)
CS 200 Spring 2019
11
SQL
Albums in FileMaker (2)
Aside: looks a lot like a spreadsheet, eh?
But as you know, FileMaker has a lot more layout flexibility than Excel.
Also, FileMaker restricts the way you can inter-connect fields via computation (=
formulas) [“structure”…]
CS 200 Spring 2019
SQL
12
SQL – Structured Query Language
Here’s a simple language that lets us describe matching between (database) tables
select field_list from table_list where conditions EG
select Title, Artist, Price from Albums
Click to run the select
CS 200 Spring 2019
13
SQL
Sybase’s Interactive SQL App
Another Example of an SQL Select Statement
select Album_ID, Mins, Secs, Title from Songs
CS 200 Spring 2019
14
SQL
Asking for albums below a certain price
select Title, Artist, Price from Albums where Price < 10.00
CS 200 Spring 2019
15
SQL
Asking for albums with a particular album title
select Title, Artist, Medium, Category, Price from Albums where Title = 'North Country'
CS 200 Spring 2019
16
SQL
Asking for albums by a particular artist
select Title, Artist, Medium, Category, Price from Albums where Artist = 'The Rankin Family'
CS 200 Spring 2019
17
SQL
Suppose you want songs too?
Containing such data as • Title
• Side
• Track
• Playing Time
Maybe you’re the librarian for a radio station...
Can we just add Song fields to the Albums Table?
• It’s a lot of work (145 additional fields!)
• And how to find a song?
• Or list all the songs that are more than 3 minutes long? • Or make an alphabetical list of the songs!
• How many songs should you set the table up for? If too few ... you run out
If too many ... you waste effort & space
CS 200 Spring 2019
18
SQL
The Songs for each album ... as a list
Laborious to set up... CS 200 Spring 2019
19
SQL
... as a form (1)
This looks pretty good . . .
. . . though it’s also laborious to set up . . . . . . but try another record, . . .
CS 200 Spring 2019
20
SQL
As a Form (2)
. . . and we can see there are a lot of empty fields
CS 200 Spring 2019
21
SQL
The General Problem Just Illustrated
We want to have multiple copies of some field(s) and we can’t know in advance how many copies
— especially difficult if there’s no limit!
Replicating fields is bad because •it’s a lot of work to set up
• it makes searching difficult
• you waste a lot of space
• you must modify the database structure
to add more copies
if you run out
CS 200 Spring 2019
22
SQL
How about a table with a separate record for each song?
Now it’s easy to find a song, but ... look at all that space wasted in repetitive album info! (See next slide.)
And consider changing the Category for an album...
Be careful
• to get ALL the songs for that album
• and ONLY the songs for that album CS 200 Spring 2019
23
SQL
One Table, One Song/Record – The Data (1)
Space wasted in repetitive album info!
Consider changing the Category for an album CS 200 Spring 2019
24
SQL
One Table, One Song/Record – The Data (2)
Notice that Album Title, Artist, Medium, Category, ... & Label are completely determined by the Album ID
That is
If you know the Album ID,
you know what the Album Title, Group, Medium, Category, ... & Label are
ie they’re always the same for a given Album ID
So why store the Album Title, Group, Medium, ... & Label repeatedly?
Why not store them once
somewhere else, and keep just the
Album ID with each song?
T
CS 200 Spring 2019
25
SQL
One Table, One Song/Record – The Data (3)
So the idea is to keep (just) the songs in a separate table
• With just an Album ID field for each song record
to locate the album information for each song
it’s just an integer, so it doesn’t take much space
• From the Album Table
use Album ID to find Songs in the Song Table
DBMS’s do this for you automatically
• From the Song Table
use Album ID to find album info in the
Album Table
DBMS’s do this for you automatically
• Avoids wasted space
• Searching is straightforward
• Adapts automatically and efficiently
to ANY number of songs / album
Important!
Don’t be confused by album data shown in the Song Table
— it’s temporarily copied from the Album Table just for
T
CS 200 Spring 2019
26
SQL
as a list, showing all the songs on an album
as a form, showing one song & info for the album on which it appears
The Song Table in FileMaker
CS 200 Spring 2019
27
SQL
The Album info is temporarily copied from the Album table.
The Album Table in FileMaker
as a list, showing album information only
as a form, showing info about one album
and a list of all the songs on that album in the Songs table
The Songs info is temporarily copied from the Songs table.
CS 200 Spring 2019
28
SQL
Terminology
Album ID
• is a “primary key” for the Album Table
because it uniquely identifies an album • is a “foreign key” of the Songs Table
because it contains a primary key of the Album Table
and thus links a Song record to a unique Album record
“One-to-many” and “many-to-one” • wrt Album ID
Albums is the “one table”
• Songs is the “many table” because for a given Album ID
there is only ONE Album record
— but are (usually) MANY Song records • “many-to-many” can happen, too
— though not by matching a primary key in each of two tables!
— it is often useful
• we’ll see an example next week
CS 200 Spring 2019
29
SQL
The Albums & Songs Database
Albums
Album_ID
Title
Group Medium Category Price Purchase Date Copyright Label
Sample
Songs
Song_ID
Side
Track Title
Mins
Secs Album_ID
(a primary key)
(a foreign key)
the relationship from Albums to Songs on Album_ID is
one-to-many
the relationship from Songs to Albums on Album_ID is
many-to-one
CS 200 Spring 2019
30
SQL
Summary
Relational databases
• keep data in multiple tables
• each of which has a primary key
• and link those tables
• by matching field values (though not necessarily via a foreign key)
This works in both directions
• given a song, get the album info
• given an album, list the songs on that album
— just by matching field values (a “relationship”)
CS 200 Spring 2019
31
SQL
Criteria for when you need another table...
(1) You have more than one “entity” (eg cars and drivers); fields for one are empty for the other (& vice- versa)
(2) You can have multiple values of some field, ESPECIALLY when you can’t predict how many (such as
having multiple songs on each album) (3) Given the value of one field A ...
eg in a Course Offerings table:
you know the value of another field B without looking because B’s value is always the same for a given value of A just as, given Album ID, we knew Album Title, Group, etc
T
Cours
Sec
Teacher
Office
Phone
Userid
Machine
e
cs100
3
KAAnderson DC 3141 6656 kaanders math KAAnderson DC 3141 6656 kaanders math
cs100
4
cs100
6
BMDaly DC 3133 6692 bmzister math
cs100
13
DMSwitzer DC 3111 6200 dmswitze math
cs100
14
BMDaly DC 3133 6692 bmzister math
cs100
15
KAAnderson DC 3141 6656 kaanders math
cs100
16
BMDaly DC 3133 6692 bmzister math
cs100
21
DMSwitzer DC 3141 6656 kaanders math
cs100
22
KAAnderson DC 3141 6656 kaanders math
cs100
23
DMSwitzer DC 3111 6200 dmswitze math
cs100
24
BMDaly DC 3133 6692 bmzister math
cs100
25
DMSwitzer DC 3111 6200 dmswitze math
cs100
26
BMDaly DC 3133 6692 bmzister math
cs100
31
DMSwitzer DC 3111 6200 dmswitze math
cs100
32
BMDaly DC 3133 6692 bmzister math
cs200
22
DMSwitzer DC 3111 6200 dmswitze math DMSwitzer DC 3111 6200 dmswitze math
cs200
25
cs200
26
JCBeatty
DC 2109
4525
jcbeatty
math
cs200
34
JCBeatty
DC 2109
4525
jcbeatty
math
cs230
1
JCBeatty
DC 2109
4525
jcbeatty
math
•••
•••
•••
•••
•••
•••
•••
CS 200 Spring 2019
SQL
Listing the songs on an album (“dot notation”) . . .
and matching songs to a particular album
select Side, Track, Songs.Title from Albums, Songs
where (Albums.Title = 'North Country') and
(Albums.Album_ID= Songs.Album_ID)
CS 200 Spring 2019
33
SQL
Listing the songs in alphabetical order (“order by”)
selectSide, Track, Songs.Title from Albums, Songs
where order by
(Albums.Title = 'North Country') and
(Albums.Album_ID = Songs.Album_ID)
Songs.Title
CS 200 Spring 2019
34
SQL
More on Matching (1)
Here’s the precise syntax of a select statement
select from [where [order by [group by
field_listA table_list conditions] field_listB] field_listC]
You must type each clause in the order shown
just as “The red bounces ball.” is incorrect English
[•••]meansthat ••• isoptional
Fields in the various field_lists must exist in a table of table_list
you can use “*” as field_listA to mean “all the fields”
If two tables use the same field name, you must write TableName.FieldName
to indicate which field you mean
they aren’t necessarily a (foreignKey, primaryKey) matchup
CS 200 Spring 2019
35
SQL
Repeated from the previous slide...
select from [where [order by [group by
field_listA table_list conditions] field_listB ] field_listC]
The ordering specified by “order by” is
• first by the leftmost field in field_listB
• then by the second leftmost field in field_listB • etc, from left to right
The “where” clause can accomplish two things • extract only specific records
eg where (Albums.Title = ‘North Country’) • specify a connection between two tables
eg where (Albums.AlbumID = Songs.AlbumID) Actually, these two actions aren’t really different . . .
CS 200 Spring 2019
36
SQL
More on Matching (2)
Consider the two tables Students
IDN Name
with Students data
Register
IDN
Course
Term
Mark
and Register data
A model for how the where clause works (1)
IDN
Name
10
Aaron
11
Sarah
12
Jose
13
Marie
IDN
Course
Term
Mark
10
CS200
9701
89
10
Biol458
9701
75
11
CS200
9701
81
11
Econ335
9701
94
(What are the primary keys for these tables? The foreign keys?)
CS 200 Spring 2019
37
SQL
A model for how the where clause works (2)
Suppose you said
select * from Students as S, Register as R
Here’s what that produces —
(Note the convention for defining)
the “table aliases” S and R.) Every possible combination
of a Student and a Register record!
S.IDN
S.Name
R.IDN
R.Course
R.Term
R.Mark
10
Aaron
10
CS200
9701
89
10
Aaron
10
Biol458
9701
75
10
Aaron
11
CS200
9701
81
10
Aaron
11
Econ335
9701
94
11
Sarah
10
CS200
9701
89
11
Sarah
10
Biol458
9701
75
11
Sarah
11
CS200
9701
81
11
Sarah
11
Econ335
9701
94
12
Jose
10
CS200
9701
89
12
Jose
10
Biol458
9701
75
12
Jose
11
CS200
9701
81
12
Jose
11
Econ335
9701
94
13
Marie
10
CS200
9701
89
13
Marie
10
Biol458
9701
75
13
Marie
11
CS200
9701
81
13
Marie
11
Econ335
9701
94
CS 200 Spring 2019
38
SQL
A model for how the where clause works (3)
More likely what you want is something like select * from Students as S, Register as R where S.IDN = R.IDN
which produces
S.IDN
S.Name
R.IDN
R.Cours
R.Term
R.Mark
10
Aaron
10
e CS200
9701
89
10
Aaron
10
Biol458
9701
75
11
Sarah
11
CS200
9701
81
11
Sarah
11
Econ33
9701
94
CS 200 Spring 2019
39
SQL
5 by constructing the previous table
and then throwing out rows that don’t satisfy the where clause
How the where clause works (4) select S.IDN, Name, Course from Students as S, Register as R
where S.IDN = R.IDN which produces
Or more elegantly,
S.IDN
S.Name
R.Course
10
Aaron
CS200
10
Aaron
Biol458
11
Sarah
CS200
11
Sarah
Econ335
CS 200 Spring 2019
40
SQL
Just the courses for Aaron
select S.IDN, Name, Course from Students as S, Register as R where (Name = 'Aaron') and (S.IDN = R.IDN)
which produces
S.IDN
S.Name
R.Course
10
Aaron
CS200
10
Aaron
Biol458
CS 200 Spring 2019
41
SQL
List the students taking CS 200
select Course, R.IDN, Name
from Students as S, Register as R
where (Course = 'CS200') and (R.IDN = S.IDN)
which produces
R.Course
R.IDN
S.Name
CS200
10
Aaron
CS200
11
Sarah
CS 200 Spring 2019
42
SQL
Suppose we add a third table Students Register Courses IDN IDN Name Name Course Room
Term Time
Mark Description
NB: Students.Name and Courses.Name hold different things
Here’s some data for Courses
Name Room CS100 DC1351 CS200 MC4060 Biol458 B2 350 Econ335 ML212
Time
M 1230 M 1230 MWF 830 TR 1000
Description
Introduction to Computer Usage Advanced Concepts for Computer Usage Behavioural Ecology
Economic Development
Why Stop at 2?
What’s the primary key for Courses?
CS 200 Spring 2019
43
SQL
A list of the courses for Aaron, with description
select S.Name, Course, Description
from Students as S, Register as R, Courses as C where (S.Name = 'Aaron' )
and (S.IDN = R.IDN)
and (R.Course = C.Name)
which produces
S.Name
R.Course
C.Description
Aaron
CS200
Advanced Concepts for Computer Usage
Aaron
Biol458
Behavioural Ecology
Incidentally,
it would be better to use Aaron’s IDN than his name (why?) if it makes sense, you can use <, ≤, >, ≥ or <> instead of = you can use as many tables as you want
CS 200 Spring 2019
44
SQL
Adding New Records – Insert
Insert Into table_name ( list_of_fields ) Values ( list_of_values ) EG
insert into Students ( IDN, Name ) values ( 14, ‘Barbara’ )
CS 200 Spring 2019
45
SQL
Altering Existing Data – Update
Update table_name Set field = value Where condition
EG
update Students set Name = ‘Mike’ where IDN = 10 update Register set Mark = 100 where Course = ‘CS200’ update Register set Mark = 0
CS 200 Spring 2019
46
SQL
Removing Records – Delete
Delete From table_name Where condition
EG
delete from Students where IDN = 14 delete from Students
CS 200 Spring 2019
47
SQL
CREATE TABLE Students
(
IDN integer NOT NULL, Name varchar(10) NOTNULL,
PRIMARY KEY ( IDN ),
CHECK( IDN between 10 and 99 ),
);
CREATE TABLE Courses
(
Name varchar(10) Room varchar(10) Thyme varchar(10) Description varchar(40)
PRIMARY KEY ( Name ),
);
NOTNULL, NOTNULL, NOTNULL, NOT NULL,
Each field has a type (Integer, Char, …) allocate storage
know how to manipulate (eg compare)
“NOT NULL”
you must supply a value
PRIMARY KEYs are identified CS 200 Spring 2019
48
SQL
Creating Tables and Fields (1)
Creating Tables (2)
CREATE TABLE Register
(
IDN integer NOT NULL, Course varchar(10) NOT NULL, Term integer NOT NULL, Mark integer,
PRIMARY KEY ( IDN, Course, Term ),
FOREIGN KEY ( IDN ) REFERENCES Students ( IDN ) ON DELETE CASCADE,
FOREIGN KEY ( Course ) REFERENCES Courses ( Name ) ON DELETE CASCADE,
CHECK( IDN between 10 and 99 ),
CHECK( Term between 5800 and 9999 ),
CHECK( Mark between 0 and 100 ),
);
FOREIGN KEY
identify the target table
what happens on deletion or updates?
CS 200 Spring 2019
49
SQL
Indices (1)
Consider
• select Name, IDN from Students where Name = ‘Marie’
How long does it take to find the right record?
• suppose you have 15,000 students
• examine the records one-by-one?
• is that how you look up somebody’s phone number?
No!
• if the records were sorted by Name
it would be much faster
Can we assume records are sorted in the order we want?
• not if we might look things up on any of two or more fields!
• also, we don’t want to have to enter data in sorted order,
or have to sort all the data before looking something up
CS 200 Spring 2019
50
SQL
Indices (2)
The solution: create indices
• and update them whenever a record is added • or a name is changed
Index 1
Aaron (1)
Jose (3)
Marie (4)
Sarah (2)
Name
Phone
Aaron
555-6393
Sarah
555-5285
Jose
555-6206
Marie
555-2337
Index 2
555-2337 (4)
555-5285 (2)
555-6206 (3)
555-6393 (1)
Indices are “auxiliary tables” you can’t (directly) manipulate
• the DBMS updates them when you change an indexed table
For our example database
• Create Index ByName On Students( Name )
• Create Index ByIDN On Students( IDN )
• Create Index ByCourse On Register( Course ) • etc.
If an index exists and would be useful • SQL will use it automatically
SQL won’t create an index for you, however • except for primary keys
CS 200 Spring 2019
SQL
Other SQL Commands
EG
• delete a table
• delete an index
• add a field
• delete a field
We won’t worry about them because it’s easier to use the Sybase Central GUI
CS 200 Spring 2019
52
SQL
Sybase Central — Creating a New Field
CS 200 Spring 2019
53
SQL
Sybase Central — Utilities (Creating a New Database, Backups, etc)
CS 200 Spring 2019
54
SQL
Sub-Selects (1)
CS 200 Spring 2019
55
SQL
Selects albums whose purchase price
is equal to the purchase price of the album “Led Zeppelin.”
Sub-Selects (2)
CS 200 Spring 2019
56
SQL
Selects albums whose purchase price
is greater than the average purchase price of all albums.
Sub-Selects (3)
song
Selects albums for which there’s at least one whose title is identical to the album’s title.
CS 200 Spring 2019
57
SQL
Sub-Selects (4)
Selects albums for which there’s no song whose title is identical to the album’s title.
There are other variations…
CS 200 Spring 2019
58
SQL
Comparing FileMaker and SQL Remember to read
• “Database Design” on learn.uwaterloo.ca
CS 200 Spring 2019
59
SQL
What’s Next?