CS 200
Lecture 08 Relational Databases – SQL (Structured Query Language)
CS 200 Spring 2018 1 SQL
Administrivia
BEFORE lab, please read and highlight
• Assignment 8
• Database Design (on learn.uwaterloo.ca)
Adapted from Access Database Design & Programming by Steven Roman • The SQL tutorial notes (on learn.uwaterloo.ca)
• Slides for this lecture
START the lab this week by doing the SQL tutorial
Topics for today
• relational database design
• when you need another table, and why • SQL (Structured Query Language)
• a model for relational databases
Please ask questions!
CS 200 Spring 2018 2 SQL
Assumptions for Today’s Lecture
You’ve seen a two-table relational database
• you’ve been exposed to
forms, reports, queries, sorting, & data validation
• you’re familiar with the terms
database
table
record (aka row)
field (aka column)
If this wasn’t true at the beginning of the term,
• by now you’ve completed the Filemaker Intro
CS 200 Spring 2018 3 SQL
Things to Think about
How does a DBMS differ from a spreadsheet?
Why would I choose to use a DBMS?
How does SQL differ from FileMaker?
What are the reasons for needing more than one database table?
CS 200 Spring 2018 4 SQL
Why SQL?
It’s an excellent MODEL for how relational DBMS’s work
Modern “big” DBMS’s are SQL-based
Many PC databases are not
• but can often be used as “front-ends” to mainframe SQL systems • though FileMaker Pro 11–16 and MS Access are based on SQL
OS X actually comes with two SQLs!
• “SQL Lite,” which is designed to be embedded in programs (including the O/S)
• “My SQL,” a (very) popular open source SQL server (used for the Math Faculty’s inventory database)
Often you can
import data from an SQL database into software with a nice GUI (eg FileMaker)
by crafting an appropriate “SQL select statement”
CS 200 Spring 2018 5 SQL
Databases – Structured Data
Why use a database at all?
Structuring data allows us to do things we can’t do efficiently,
or can’t feasibly do, with unstructured data
• The added power & flexibility aren’t free
it takes time and effort to create (and maintain) the structure
• You have to decide if that effort is worthwhile
Obvious questions:
• What do I mean by “structure” in a database? • What’s the payoff?
CS 200 Spring 2018 6 SQL
“Structure”
The visa worksheet in the Excel assignment
• is an example of a 1–table database,
although we built it in Excel, not with a DBMS
• each ROW (“record”) holds data for a particular transaction
• each COLUMN holds a particular piece of data about that transaction (a “field”)
• we could have used FileMaker
though for what we wanted to do,
it wouldn’t have been worth the effort of learning FileMaker
• indeed, we could have placed the data in a Word table
or even in a text processor (eg BBEdit)…
separate fields by tabs, separate records by ¶
but working with the data would have been MUCH harder
— think about implementing the Actual Balance and Statement Balance columns!
CS 200 Spring 2018 7 SQL
The Excel Assignment — Keeping Track of VISA Charges
CS 200 Spring 2018 8 SQL
Key DBMS Functionality
Data entry “validation”
Sophisticated searching (aka “queries”) Sophisticated summarizing and reporting Safe simultaneous updates by multiple users
The REAL power of a relational database
• appears when you have multiple related tables • what does “related” mean?
• why have multiple tables?
CS 200 Spring 2018 9 SQL
A Music Library
The goal — to refine our understanding of why/when multiple tables are necessary
Suppose you want to keep track of your music
• Album Title
• Artist
• Medium (CD, Tape, LP record, …)
• Category (Jazz, Classical, Hard Rock, …) • Price
• Purchase Date
• Copyright
• Label
Well, you could do it with a word processor
• but … how to find all the recordings by Led Zeppelin?
• & … what’s the value of your Charlie Parker albums?
• & … how to avoid entry of a bogus Medium, Category, etc. • & …
How well would Excel work?
Consider how we might manage our music in FileMaker
CS 200 Spring 2018 10 SQL
Albums in FileMaker (1)
Filemaker
• requires datatypes — Text, Number, Date, etc — why?
• provides data entry options for data validation (default values, value lists, range checks, etc)
CS 200 Spring 2018 11 SQL
Albums in FileMaker (2)
Aside: looks a lot like a spreadsheet, eh?
But as you know, FileMaker has a lot more layout flexibility than Excel.
Also, FileMaker restricts the way you can inter-connect fields via computation (=
formulas) [“structure”…]
CS 200 Spring 2018 12 SQL
SQL – Structured Query Language
Here’s a simple language that lets us describe matching between (database) tables
select field_list from table_list where conditions EG
select Title, Artist, Price from Albums
Click to run the select
CS 200 Spring 2018 13
SQL
Sybase’s Interactive SQL App
Another Example of an SQL Select Statement
select Album_ID, Mins, Secs, Title from Songs
CS 200 Spring 2018 14 SQL
Asking for albums below a certain price
select Title, Artist, Price from Albums where Price < 10.00
CS 200 Spring 2018 15 SQL
Asking for albums with a particular album title
select Title, Artist, Medium, Category, Price from Albums where Title = 'North Country'
CS 200 Spring 2018 16 SQL
Asking for albums by a particular artist
select Title, Artist, Medium, Category, Price from Albums where Artist = 'The Rankin Family'
CS 200 Spring 2018 17 SQL
Suppose you want songs too?
Containing such data as
• Title
• Side
• Track
• Playing Time
Maybe you’re the librarian for a radio station...
Can we just add Song fields to the Albums Table?
• It’s a lot of work (145 additional fields!)
• And how to find a song?
• Or list all the songs that are more than 3 minutes long? • Or make an alphabetical list of the songs!
• How many songs should you set the table up for?
If too few ... you run out
If too many ... you waste effort & space
CS 200 Spring 2018
18 SQL
The Songs for each album ... as a list
Laborious to set up...
CS 200 Spring 2018 19 SQL
... as a form (1)
This looks pretty good . . .
. . . though it’s also laborious to set up . . . . . . but try another record, . . .
CS 200 Spring 2018 20 SQL
As a Form (2)
. . . and we can see there are a lot of empty fields
CS 200 Spring 2018 21 SQL
The General Problem Just Illustrated
We want to have multiple copies of some field(s)
and we can’t know in advance how many copies — especially difficult if there’s no limit!
Replicating fields is bad because
•it’s a lot of work to set up
• it makes searching difficult
• you waste a lot of space
• you must modify the database structure
to add more copies
if you run out
CS 200 Spring 2018
22 SQL
How about a table with a separate record for each song?
Now it’s easy to find a song, but ... look at all that space wasted in repetitive album info! (See next slide.)
And consider changing the Category for an album...
Be careful
• to get ALL the songs for that album
CS 200 Spring 2018 23 SQL
One Table, One Song/Record – The Data (1)
Space wasted in repetitive album info!
Consider changing the Category for an album
CS 200 Spring 2018 24 SQL
One Table, One Song/Record – The Data (2)
Notice that Album Title, Artist, Medium, Category, ... & Label
are completely determined by the Album ID
That is
If you know the Album ID,
you know what the Album Title, Group, Medium, Category, ... & Label are
ie they’re always the same for a given Album ID
So why store the Album Title, Group, Medium, ... & Label repeatedly?
Why not store them once
somewhere else, and keep just the
Album ID with each song?
CS 200 Spring 2018 25 SQL
T
One Table, One Song/Record – The Data (3)
So the idea is to keep (just) the songs in a separate table
• With just an Album ID field for each song record
to locate the album information for each song
it’s just an integer, so it doesn’t take much space
• From the Album Table
use Album ID to find Songs in the Song Table
DBMS’s do this for you automatically
• From the Song Table
use Album ID to find album info in the
Album Table
DBMS’s do this for you automatically
• Avoids wasted space
• Searching is straightforward
• Adapts automatically and efficiently
to ANY number of songs / album
Important!
Don’t be confused by album data shown in the Song Table
— it’s temporarily copied from the Album Table just for
T
CS 200 Spring 2018 26 SQL
The Song Table in FileMaker
as a list, showing all the songs on an album as a form, showing one song & info for the album on which it appears
CS 200 Spring 2018 27
The Album info is temporarily copied from the Album table.
SQL
The Album Table in FileMaker
as a list, showing album information only
as a form, showing info about one album
and a list of all the songs on that album in the Songs table
The Songs info is temporarily copied from the Songs table.
CS 200 Spring 2018 28 SQL
Terminology
Album ID
• is a “primary key” for the Album Table because it uniquely identifies an album
• is a “foreign key” of the Songs Table
because it contains a primary key of the Album Table
and thus links a Song record to a unique Album record
“One-to-many” and “many-to-one”
• wrt Album ID
Albums is the “one table”
• Songs is the “many table” because for a given Album ID there is only ONE Album record
— but are (usually) MANY Song records
• “many-to-many” can happen, too
— though not by matching a primary key in each of two tables! — it is often useful
• we’ll see an example next week
CS 200 Spring 2018 29 SQL
The Albums & Songs Database
Albums
Album_ID
Title
Group Medium Category Price Purchase Date Copyright Label
Sample
Songs
Song_ID
Side
Track Title
Mins
Secs Album_ID
(a primary key)
CS 200 Spring 2018
30
SQL
(a foreign key)
the relationship from Albums to Songs on Album_ID is
one-to-many
the relationship from Songs to Albums on Album_ID is
many-to-one
Relational databases
• keep data in multiple tables
• each of which has a primary key
• and link those tables
• by matching field values (though not necessarily via a foreign key)
This works in both directions
• given a song, get the album info
• given an album, list the songs on that album — just by matching field values (a “relationship”)
Summary
CS 200 Spring 2018 31 SQL
Criteria for when you need another table...
T
(1) You have more than one “entity” (eg cars and drivers); fields for one are empty for the other (& vice-versa)
(2) You can have multiple values of some field, ESPECIALLY when you can’t predict how many
(such as having multiple songs on each album)
(3) Given the value of one field A ...
eg in a Course Offerings table:
you know the value of another field B without looking because B’s value is always the same for a given value of A just as, given Album ID, we knew Album Title, Group, etc
Cours
Sec
Teacher
Office
Phone
Userid
Machine
e
cs100
3
KAAnderson DC 3141 6656 kaanders math KAAnderson DC 3141 6656 kaanders math
cs100
4
cs100
6
BMDaly DC 3133 6692 bmzister math
cs100
13
DMSwitzer DC 3111 6200 dmswitze math
cs100
14
BMDaly DC 3133 6692 bmzister math
cs100
15
KAAnderson DC 3141 6656 kaanders math
cs100
16
BMDaly DC 3133 6692 bmzister math
cs100
21
DMSwitzer DC 3141 6656 kaanders math
cs100
22
KAAnderson DC 3141 6656 kaanders math
cs100
23
DMSwitzer DC 3111 6200 dmswitze math
cs100
24
BMDaly DC 3133 6692 bmzister math
cs100
25
DMSwitzer DC 3111 6200 dmswitze math
cs100
26
BMDaly DC 3133 6692 bmzister math
cs100
31
DMSwitzer DC 3111 6200 dmswitze math
cs100
32
BMDaly DC 3133 6692 bmzister math
cs200
22
DMSwitzer DC 3111 6200 dmswitze math DMSwitzer DC 3111 6200 dmswitze math
cs200
25
cs200
26
JCBeatty
DC 2109
4525
jcbeatty
math
cs200
34
JCBeatty
DC 2109
4525
jcbeatty
math
cs230
1
JCBeatty
DC 2109
4525
jcbeatty
math
•••
•••
•••
•••
•••
•••
•••
CS 200 Spring 2018
SQL
CS 200 Spring 2018
33 SQL
Listing the songs on an album (“dot notation”) . . .
and matching songs to a particular
select Side, Track, Songs.Title from Albums, Songs
album where (Albums.Title = 'North Country') and
(Albums.Album_ID= Songs.Album_ID)
select
where order by
Side, Track, Songs.Title from Albums, Songs
(Albums.Title = 'North Country') and
(Albums.Album_ID = Songs.Album_ID)
Songs.Title
Listing the songs in alphabetical order (“order by”)
CS 200 Spring 2018
34 SQL
More on Matching (1)
Here’s the precise syntax of a select statement
select from [where [order by [group by
field_listA table_list conditions] field_listB] field_listC]
You must type each clause in the order shown
just as “The red bounces ball.” is incorrect English
[•••]meansthat ••• isoptional
Fields in the various field_lists must exist in a table of table_list
you can use “*” as field_listA to mean “all the fields”
If two tables use the same field name, you must write TableName.FieldName
to indicate which field you mean
they aren’t necessarily a (foreignKey, primaryKey) matchup
CS 200 Spring 2018 35 SQL
Repeated from the previous slide...
select from [where [order by [group by
field_listA table_list conditions] field_listB ] field_listC]
The ordering specified by “order by” is
• first by the leftmost field in field_listB
• then by the second leftmost field in field_listB • etc, from left to right
The “where” clause can accomplish two things • extract only specific records
eg where (Albums.Title = ‘North Country’) • specify a connection between two tables
eg where (Albums.AlbumID = Songs.AlbumID)
Actually, these two actions aren’t really different . . .
More on Matching (2)
CS 200 Spring 2018 36 SQL
Consider the two tables
Students
IDN Name
with Students data
Register
IDN
Course
Term
Mark
and Register data
A model for how the where clause works (1)
IDN
Name
10
Aaron
11
Sarah
12
Jose
13
Marie
IDN
Course
Term
Mark
10
CS200
9701
89
10
Biol458
9701
75
11
CS200
9701
81
11
Econ335
9701
94
(What are the primary keys for these tables? The foreign keys?)
CS 200 Spring 2018 37 SQL
A model for how the where clause works (2)
Suppose you said
select * from Students as S, Register as R Here’s what that produces —
(Note the convention for
defining)
the “table aliases” S and R.) Every possible combination
of a Student and a Register record!
S.IDN
S.Name
R.IDN
R.Course
R.Term
R.Mark
10
Aaron
10
CS200
9701
89
10
Aaron
10
Biol458
9701
75
10
Aaron
11
CS200
9701
81
10
Aaron
11
Econ335
9701
94
11
Sarah
10
CS200
9701
89
11
Sarah
10
Biol458
9701
75
11
Sarah
11
CS200
9701
81
11
Sarah
11
Econ335
9701
94
12
Jose
10
CS200
9701
89
12
Jose
10
Biol458
9701
75
12
Jose
11
CS200
9701
81
12
Jose
11
Econ335
9701
94
13
Marie
10
CS200
9701
89
13
Marie
10
Biol458
9701
75
13
Marie
11
CS200
9701
81
13
Marie
11
Econ335
9701
94
CS 200 Spring 2018 38
SQL
A model for how the where clause works (3)
More likely what you want is something like
select * from Students as S, Register as R where S.IDN = R.IDN
which produces
5
by constructing the previous table
and then throwing out rows that don’t satisfy the where clause
S.IDN
S.Name
R.IDN
R.Cours
R.Term
R.Mark
10
Aaron
10
e CS200
9701
89
10
Aaron
10
Biol458
9701
75
11
Sarah
11
CS200
9701
81
11
Sarah
11
Econ33
9701
94
CS 200 Spring 2018 39 SQL
How the where clause works (4) select S.IDN, Name, Course from Students as S, Register as R
where S.IDN = R.IDN which produces
Or more elegantly,
S.IDN
S.Name
R.Course
10
Aaron
CS200
10
Aaron
Biol458
11
Sarah
CS200
11
Sarah
Econ335
CS 200 Spring 2018 40 SQL
Just the courses for Aaron
select S.IDN, Name, Course from Students as S, Register as R where (Name = 'Aaron') and (S.IDN = R.IDN)
which produces
S.IDN
S.Name
R.Course
10
Aaron
CS200
10
Aaron
Biol458
CS 200 Spring 2018 41 SQL
List the students taking CS 200
select Course, R.IDN, Name
from Students as S, Register as R
where (Course = 'CS200') and (R.IDN = S.IDN)
which produces
R.Course
R.IDN
S.Name
CS200
10
Aaron
CS200
11
Sarah
CS 200 Spring 2018 42 SQL
Suppose we add a third table
Students Register Courses
IDN IDN Name Name Course Room
Term Time
Mark Description
NB: Students.Name and Courses.Name hold different things
Here’s some data for Courses
Why Stop at 2?
Name Room
CS100 DC1351
CS200 MC4060
Biol458 B2 350
Econ335 ML212
What’s the primary key for Courses?
Time Description
M 1230 Introduction to Computer Usage
M 1230 Advanced Concepts for Computer Usage MWF 830 Behavioural Ecology
TR 1000 Economic Development
CS 200 Spring 2018 43 SQL
A list of the courses for Aaron, with description
select S.Name, Course, Description
from Students as S, Register as R, Courses as C where (S.Name = 'Aaron' )
and (S.IDN = R.IDN)
and (R.Course = C.Name)
which produces
Incidentally,
it would be better to use Aaron’s IDN than his name (why?) if it makes sense, you can use <, ≤, >, ≥ or <> instead of = you can use as many tables as you want
S.Name
R.Course
C.Description
Aaron
CS200
Advanced Concepts for Computer Usage
Aaron
Biol458
Behavioural Ecology
CS 200 Spring 2018 44 SQL
Adding New Records – Insert
Insert Into table_name ( list_of_fields ) Values ( list_of_values ) EG
insert into Students ( IDN, Name ) values ( 14, ‘Barbara’ )
CS 200 Spring 2018 45 SQL
Altering Existing Data – Update
Update table_name Set field = value Where condition
EG
update Students set Name = ‘Mike’ where IDN = 10 update Register set Mark = 100 where Course = ‘CS200’ update Register set Mark = 0
CS 200 Spring 2018 46 SQL
Removing Records – Delete
Delete From table_name Where condition
EG
delete from Students where IDN = 14 delete from Students
CS 200 Spring 2018 47 SQL
CREATE TABLE Students
(
IDN integer NOT NULL, Name varchar(10) NOTNULL,
PRIMARY KEY ( IDN ),
CHECK( IDN between 10 and 99 ),
);
CREATE TABLE Courses
(
Name varchar(10) Room varchar(10) Thyme varchar(10) Description varchar(40)
PRIMARY KEY ( Name ),
);
NOTNULL, NOTNULL, NOTNULL, NOT NULL,
Each field has a type (Integer, Char, …)
allocate storage
know how to manipulate (eg compare)
“NOT NULL”
you must supply a value
PRIMARY KEYs are identified
Creating Tables and Fields (1)
CS 200 Spring 2018 48 SQL
Creating Tables (2)
CREATE TABLE Register
(
IDN integer NOT NULL, Course varchar(10) NOT NULL, Term integer NOT NULL, Mark integer,
PRIMARY KEY ( IDN, Course, Term ),
FOREIGN KEY ( IDN ) REFERENCES Students ( IDN ) ON DELETE CASCADE,
FOREIGN KEY ( Course ) REFERENCES Courses ( Name ) ON DELETE CASCADE,
CHECK( IDN between 10 and 99 ),
CHECK( Term between 5800 and 9999 ),
CHECK( Mark between 0 and 100 ),
);
FOREIGN KEY
identify the target table
what happens on deletion or updates?
CS 200 Spring 2018 49 SQL
Consider
• select Name, IDN from Students where Name = ‘Marie’
How long does it take to find the right record?
• suppose you have 15,000 students
• examine the records one-by-one?
• is that how you look up somebody’s phone number?
No!
•
if the records were sorted by Name
it would be much faster
Can we assume records are sorted in the order we want?
• not if we might look things up on any of two or more fields!
• also, we don’t want to have to enter data in sorted order,
or have to sort all the data before looking something up
Indices (1)
CS 200 Spring 2018 50 SQL
Indices (2)
The solution: create indices
• and update them whenever a record is added • or a name is changed
Indices are “auxiliary tables” you can’t (directly) manipulate
• the DBMS updates them when you change an indexed table
For our example database
• Create Index ByName On Students( Name )
• Create Index ByIDN On Students( IDN )
• Create Index ByCourse On Register( Course ) • etc.
If an index exists and would be useful
• SQL will use it automatically
SQL won’t create an index for you, however
• except for primary keys
CS 200 Spring 2018 SQL
Index 1
Name
Aaron (1)
Aaron
Phone
Index 2
555-2337 (4)
Jose (3)
Sarah
555-6393
555-5285
555-5285 (2)
Marie (4)
Jose
Sarah (2)
Marie
555-6206
555-2337
555-6206 (3)
555-6393 (1)
EG
• delete a table
• delete an index
• add a field
• delete a field
Other SQL Commands
We won’t worry about them because it’s easier to use the Sybase Central
CS 200 Spring 2018 52 SQL
Sybase Central — Creating a New Field
CS 200 Spring 2018 53 SQL
Sybase Central — Utilities (Creating a New Database, Backups, etc)
CS 200 Spring 2018 54 SQL
Sub-Selects (1)
CS 200 Spring 2018
Selects albums whose purchase price
is equal to the purchase price of the album “Led Zeppelin.”
55 SQL
Sub-Selects (2)
Selects albums whose purchase price
is greater than the average purchase price of all albums.
CS 200 Spring 2018
56 SQL
Sub-Selects (3)
Selects albums for which there’s at least one whose title is identical to the album’s title.
CS 200 Spring 2018 57 SQL
song
Sub-Selects (4)
Selects albums for which there’s no song whose title is identical to the album’s title.
There are other variations…
CS 200 Spring 2018 58 SQL
Comparing FileMaker and SQL Remember to read
• “Database Design” on learn.uwaterloo.ca
What’s Next?
CS 200 Spring 2018 59 SQL