8/30/2021
1
Introduction to Data Science
Lecture 3
Structured Data
CIS 5930/4930 – Fall 2021
Course Staff
• Instructor – Michael Gubanov
– .edu
– OH: Wed 1.00-2.15pm (online)
• TA – Md Muhib Khan
– .edu
– OH: Mon 10-11am, Fri 5-6pm
CIS 5930/4930 – Fall 2021
Exam
• One Midterm exam
– Proctored, online (Canvas)
– TBD during regular scheduled lecture time
• 3.05 – 4.20pm EST
– To be done individually
– Multiple choice test
– A week before (TBD)
• during regular scheduled lecture time – 1.20 – 2.35pm EST
• course material review lectures
– No other exams
CIS 5930/4930 – Fall 2021
Exam
• Exam (online) – the student is solely
responsible for obtaining/installing all
necessary components for Honorlock to
function properly. This includes, but is not
limited to:
o A laptop or desktop computer
o A webcam
o A microphone
o A stable and strong Internet connection
CIS 5930/4930 – Fall 2021
Structured Query Language (SQL)
CIS 5930/4930 – Fall 2021
• A declarative query language, based on
relational algebra
• Data definition language (DDL)
• Statements to create, modify tables and
views
• Data manipulation language (DML)
• Statements to query, insert, delete data
SQL Query
CIS 5930/4930 – Fall 2021
• Syntax
SELECT
FROM
WHERE
1 2
3 4
5 6
mailto: .edu
mailto: .edu
8/30/2021
2
Examples
CIS 5930/4930 – Fall 2021
Product
SELECT PName, Price, Manufacturer
FROM Product
WHERE Price > 100
PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
PName Price Manufacturer
SingleTouch $149.99 Canon
MultiTouch $203.99 Hitachi
Syntax
CIS 5930/4930 – Fall 2021
• Case insensitive
• Select = select = selecT
• Supplier = supplier = supplier
• Case sensitive (data)
• ‘San Antonio’ <> ‘SAN ANTONIO’
Duplicates
CIS 5930/4930 – Fall 2021
SELECT DISTINCT category
FROM Product
SELECT category
FROM Product
Category
Gadgets
Photography
Household
Category
Gadgets
Gadgets
Photography
Household
Ordering
CIS 5930/4930 – Fall 2021
SELECT pname, price, manufacturer
FROM Product
WHERE category=‘Gadgets’ AND price > 50
ORDER BY price ASC/DESC, pname ASC/DESC
• Ties are broken by the next attribute
Join
CIS 5930/4930 – Fall 2021
Product (pname, price, category, manufacturer)
Company (cname, stockPrice, country)
• Find all products under $100 manufactured in Japan
• Return their names and prices
SELECT PName, Price
FROM Product, Company
WHERE Manufacturer=CName AND Country=‘Japan’
AND Price <= 100
Pseudonyms
CIS 5930/4930 - Fall 2021
Person (pname, address, worksfor)
Company (cname, address)
SELECT DISTINCT pname, address
FROM Person, Company
WHERE worksfor = cname
SELECT DISTINCT Person.pname, Company.address
FROM Person, Company
WHERE Person.worksfor = Company.cname
SELECT DISTINCT P.pname, C.address
FROM Person AS P, Company AS C
WHERE P.worksfor = C.cname
7 8
9 10
11 12
8/30/2021
3
Nested Queries
CIS 5930/4930 - Fall 2021
• Nested query
• Has another query embedded within it
• The embedded query is called a subquery
• Nesting makes possible referring to a
dynamically generated table (result of a
select statement)
• Subqueries can be found in
• FROM clause
• WHERE clause
• HAVING clause
Examples
CIS 5930/4930 - Fall 2021
Company (name, city)
Product (pname, maker)
Purchase (id, product, buyer)
SELECT Company.city
FROM Company
WHERE Company.name IN (
SELECT Product.maker
FROM Purchase, Product
WHERE Product.pname = Purchase.product
AND Purchase .buyer = ‘John‘)
• Return cities having
• companies that manufacture products bought by
John
Correlated Queries
CIS 5930/4930 - Fall 2021
Movie (title, year, director, length)
• Find movies shown only in different years
SELECT DISTINCT title
FROM Movie AS X
WHERE year <> ANY
(SELECT year
FROM Movie
WHERE title = X.title);
Aggregation
CIS 5930/4930 – Fall 2021
SELECT AVG(price)
FROM Product
WHERE maker=“Honda”
SELECT Count(*)
FROM Product
WHERE year > 2015
• Aggregate functions
• Sum, Count, Min, Max, Avg
• Expect a single argument
Grouping
CIS 5930/4930 – Fall 2021
SELECT S
FROM T1,…,Tn
WHERE C1
GROUP BY a1,…,ak
HAVING C2
• Evaluate SELECT-FROM-WHERE (SFW), apply condition C1
• Group by the attributes a1,…,ak
• Apply an aggregate condition C2 to each group
• Return the result
Group BY Example
CIS 5930/4930 – Fall 2021
Supplier(sno, sname, scity, sstate)
Order(sno, pno, qty, price)
Part(pno, pname, psize, pcolor)
SELECT p.pname, count(*) as cnt
FROM Order as O, Part as P
WHERE o.pno = p.pno
GROUP BY p.pname
13 14
15 16
17 18
8/30/2021
4
SQL → Relational Algebra
CIS 5930/4930 – Fall 2021
Product (pid, name, price)
Purchase (pid, cid, store)
Customer (cid, name, city)
SELECT DISTINCT P.name, C.name
FROM Product AS P, Purchase AS Pu, Customer AS C
WHERE P.pid = Pu.pid AND C.cid = Pu.cid AND
P.price > 100 AND C.city = ‘Houston’
SQL → Relational Algebra
CIS 5930/4930 – Fall 2021
Product (pid, name, price)
Purchase (pid, cid, store)
Customer (cid, name, city)
δ
P
Product Purchase
cid = cid
Customer
s
P.name, C.name
price > 100 and city = “Houston”
pid = pid
Relational Query Optimization
CIS 5930/4930 – Fall 2021
δ
P
Product Purchase
cid = cid
Customer
P.name, C.name
pid = pid
s city = ‘Houston’
s Price > 100
Product (pid, name, price)
Purchase (pid, cid, store)
Customer (cid, name, city)
Extended Relational Algebra (ERA)
CIS 5930/4930 – Fall 2021
• Duplicate elimination
•
• Grouping
•
• Sorting
•
ERA Query Plan Template
CIS 5930/4930 – Fall 2021
• Extended SELECT-PROJECT-JOIN
…
s
fields
select conditions
join condition
…
fields, sum/count/min/max(fields)
conditionhaving
ERA Query Plan Example
CIS 5930/4930 – Fall 2021
SELECT city
FROM sales
GROUP BY city
HAVING SUM(price) > 100
• Temporarily created tables
• T1, T2, T3
city
sales(product, city, price)
P
T3(city)
s
p > 100
T2(city, p)
T1(city, p)
city, sum(price) as p →city, p
19 20
21 22
23 24
8/30/2021
5
Relational Model Benefits
CIS 5930/4930 – Fall 2021
• Physical Data Independence
• Oblivious to data organization on disk
• Logical Data Independence
• Schema can evolve without affecting
the users/applications
25