CS计算机代考程序代写 SQL data science 8/30/2021

8/30/2021

1

Introduction to Data Science

Lecture 3

Structured Data

CIS 5930/4930 – Fall 2021

Course Staff

• Instructor – Michael Gubanov

– .edu

– OH: Wed 1.00-2.15pm (online)

• TA – Md Muhib Khan

– .edu

– OH: Mon 10-11am, Fri 5-6pm

CIS 5930/4930 – Fall 2021

Exam

• One Midterm exam
– Proctored, online (Canvas)

– TBD during regular scheduled lecture time
• 3.05 – 4.20pm EST

– To be done individually

– Multiple choice test

– A week before (TBD)
• during regular scheduled lecture time – 1.20 – 2.35pm EST

• course material review lectures

– No other exams

CIS 5930/4930 – Fall 2021

Exam

• Exam (online) – the student is solely
responsible for obtaining/installing all
necessary components for Honorlock to
function properly. This includes, but is not
limited to:

o A laptop or desktop computer
o A webcam
o A microphone
o A stable and strong Internet connection

CIS 5930/4930 – Fall 2021

Structured Query Language (SQL)

CIS 5930/4930 – Fall 2021

• A declarative query language, based on
relational algebra

• Data definition language (DDL)
• Statements to create, modify tables and

views
• Data manipulation language (DML)

• Statements to query, insert, delete data

SQL Query

CIS 5930/4930 – Fall 2021

• Syntax

SELECT
FROM
WHERE

1 2

3 4

5 6

mailto: .edu
mailto: .edu

8/30/2021

2

Examples

CIS 5930/4930 – Fall 2021

Product

SELECT PName, Price, Manufacturer
FROM Product
WHERE Price > 100

PName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

SingleTouch $149.99 Photography Canon

MultiTouch $203.99 Household Hitachi

PName Price Manufacturer

SingleTouch $149.99 Canon

MultiTouch $203.99 Hitachi

Syntax

CIS 5930/4930 – Fall 2021

• Case insensitive
• Select = select = selecT
• Supplier = supplier = supplier

• Case sensitive (data)
• ‘San Antonio’ <> ‘SAN ANTONIO’

Duplicates

CIS 5930/4930 – Fall 2021

SELECT DISTINCT category
FROM Product

SELECT category
FROM Product

Category

Gadgets

Photography

Household

Category

Gadgets

Gadgets

Photography

Household

Ordering

CIS 5930/4930 – Fall 2021

SELECT pname, price, manufacturer
FROM Product
WHERE category=‘Gadgets’ AND price > 50
ORDER BY price ASC/DESC, pname ASC/DESC

• Ties are broken by the next attribute

Join

CIS 5930/4930 – Fall 2021

Product (pname, price, category, manufacturer)
Company (cname, stockPrice, country)

• Find all products under $100 manufactured in Japan
• Return their names and prices

SELECT PName, Price
FROM Product, Company
WHERE Manufacturer=CName AND Country=‘Japan’
AND Price <= 100 Pseudonyms CIS 5930/4930 - Fall 2021 Person (pname, address, worksfor) Company (cname, address) SELECT DISTINCT pname, address FROM Person, Company WHERE worksfor = cname SELECT DISTINCT Person.pname, Company.address FROM Person, Company WHERE Person.worksfor = Company.cname SELECT DISTINCT P.pname, C.address FROM Person AS P, Company AS C WHERE P.worksfor = C.cname 7 8 9 10 11 12 8/30/2021 3 Nested Queries CIS 5930/4930 - Fall 2021 • Nested query • Has another query embedded within it • The embedded query is called a subquery • Nesting makes possible referring to a dynamically generated table (result of a select statement) • Subqueries can be found in • FROM clause • WHERE clause • HAVING clause Examples CIS 5930/4930 - Fall 2021 Company (name, city) Product (pname, maker) Purchase (id, product, buyer) SELECT Company.city FROM Company WHERE Company.name IN ( SELECT Product.maker FROM Purchase, Product WHERE Product.pname = Purchase.product AND Purchase .buyer = ‘John‘) • Return cities having • companies that manufacture products bought by John Correlated Queries CIS 5930/4930 - Fall 2021 Movie (title, year, director, length) • Find movies shown only in different years SELECT DISTINCT title FROM Movie AS X WHERE year <> ANY

(SELECT year
FROM Movie
WHERE title = X.title);

Aggregation

CIS 5930/4930 – Fall 2021

SELECT AVG(price)
FROM Product
WHERE maker=“Honda”

SELECT Count(*)
FROM Product
WHERE year > 2015

• Aggregate functions
• Sum, Count, Min, Max, Avg
• Expect a single argument

Grouping

CIS 5930/4930 – Fall 2021

SELECT S
FROM T1,…,Tn
WHERE C1
GROUP BY a1,…,ak
HAVING C2

• Evaluate SELECT-FROM-WHERE (SFW), apply condition C1
• Group by the attributes a1,…,ak
• Apply an aggregate condition C2 to each group
• Return the result

Group BY Example

CIS 5930/4930 – Fall 2021

Supplier(sno, sname, scity, sstate)

Order(sno, pno, qty, price)

Part(pno, pname, psize, pcolor)

SELECT p.pname, count(*) as cnt
FROM Order as O, Part as P
WHERE o.pno = p.pno
GROUP BY p.pname

13 14

15 16

17 18

8/30/2021

4

SQL → Relational Algebra

CIS 5930/4930 – Fall 2021

Product (pid, name, price)
Purchase (pid, cid, store)
Customer (cid, name, city)

SELECT DISTINCT P.name, C.name
FROM Product AS P, Purchase AS Pu, Customer AS C
WHERE P.pid = Pu.pid AND C.cid = Pu.cid AND
P.price > 100 AND C.city = ‘Houston’

SQL → Relational Algebra

CIS 5930/4930 – Fall 2021

Product (pid, name, price)
Purchase (pid, cid, store)
Customer (cid, name, city)

δ

P

Product Purchase

cid = cid

Customer

s

P.name, C.name

price > 100 and city = “Houston”

pid = pid

Relational Query Optimization

CIS 5930/4930 – Fall 2021

δ

P

Product Purchase

cid = cid

Customer

P.name, C.name

pid = pid
s city = ‘Houston’

s Price > 100

Product (pid, name, price)
Purchase (pid, cid, store)
Customer (cid, name, city)

Extended Relational Algebra (ERA)

CIS 5930/4930 – Fall 2021

• Duplicate elimination
• 

• Grouping
• 

• Sorting
• 

ERA Query Plan Template

CIS 5930/4930 – Fall 2021

• Extended SELECT-PROJECT-JOIN

s

fields

select conditions

join condition

fields, sum/count/min/max(fields)

conditionhaving

ERA Query Plan Example

CIS 5930/4930 – Fall 2021

SELECT city
FROM sales
GROUP BY city
HAVING SUM(price) > 100

• Temporarily created tables
• T1, T2, T3

city

sales(product, city, price)

P

T3(city)

s
p > 100

T2(city, p)

T1(city, p)


city, sum(price) as p →city, p

19 20

21 22

23 24

8/30/2021

5

Relational Model Benefits

CIS 5930/4930 – Fall 2021

• Physical Data Independence
• Oblivious to data organization on disk

• Logical Data Independence
• Schema can evolve without affecting

the users/applications

25