CS计算机代考程序代写 data science database algorithm 8/25/2021

8/25/2021

1

Introduction to Data Science

Lecture 2

Data Science, Structured Data

CIS 5930/4930 – Fall 2021

What is Data Science

• Relatively new Computer Science field

• Systems, algorithms, and processes for
managing and deriving insights from
heterogeneous and/or large data sets

CIS 5930/4930 – Fall 2021

Typical Workflow for a Data Scientist

CIS 5930/4930 – Fall 2021

Structured Data: Relations

CIS 5930/4930 – Fall 2021

• Definitions:
• A Relation R is a subset of S1 x S2 x … x Sn,

where Si is Domain of attribute i ϵ [1, n],
and n is a number of attributes of R

• A Tuple t is an element of S1 x S2 x … x Sn
• A relation is a set of tuples

• Relational Database is a collection of
relations

Structured Data: Schema

CIS 5930/4930 – Fall 2021

• Relation schema
• Relation name
• Name of each attribute
• Type of each attribute

• Database schema
• A set of all relation schemas

Example

CIS 5930/4930 – Fall 2021

Relation Schema
Supplier (sno: integer, sname: string, scity: string, sstate: string)

Supplier

sno sname scity sstate

1 s1 city1 TX

2 s2 city1 TX

3 s3 city2 TX

4 s4 city2 TX

1 2

3 4

5 6

8/25/2021

2

Keys

CIS 5930/4930 – Fall 2021

• Super Key: a set of attributes that functionally
determines all other attributes in a tuple

• Key (candidate key): a minimal super-key
• A relation can have several candidate

keys

• Primary key: one minimal key can be selected
as primary key

Foreign Keys

CIS 5930/4930 – Fall 2021

• Foreign key (A → B)
• A and B are relations
• An attribute K (or collection of attributes)

in relation B that uniquely identifies a
tuple in relation A

• Typically, K is a primary key in A

Key Constraints

CIS 5930/4930 – Fall 2021

CREATE TABLE Part

(

pno integer,

pname varchar(20),

psize integer,

pcolor varchar(20),

PRIMARY KEY (pno)

);

CREATE TABLE Order

(

sno integer,

pno integer,

qty integer,

price integer,

PRIMARY KEY (sno, pno),
FOREIGN KEY (pno) REFERENCES Part

);

Relational Query Languages

CIS 5930/4930 – Fall 2021

• Relational Queries
• Inputs and outputs are relations
• Takes one/many relation instance(s) as

input, returns one relation instance as a
result

• Easy to compose into relational algebra
expressions

Basic Relational Operators

CIS 5930/4930 – Fall 2021

• Selection: sCondition (S)
• Condition is a Boolean combination (, ) of

atomic predicates ( , =, = , ≠, =, = , )

• Projection: list-of-attributes (S)

• Union ()

• Set difference ( – )
• Cross-product/Cartesian product (x)
• Join: R ⨝θ S = sθ (R x S)

Selection/Projection Examples

CIS 5930/4930 – Fall 2021

Patient

no name zip disease

1 p1 78225 flu

2 p2 78225 heart

3 p3 78220 lung

4 p4 78220 heart

zip, disease (Patient)

zip disease

78225 flu

78225 heart

78220 lung

78220 heart

sdisease = ‘heart’ (Patient)

no name zip disease

2 p2 78225 heart

4 p4 78220 heart

zip sdisease = ‘heart’ (Patient)

zip

78220

78225

7 8

9 10

11 12

8/25/2021

3

Cross-Product Example

CIS 5930/4930 – Fall 2021

AnonymousPatient P

age zip disease

54 78225 heart

20 78220 flu

Voters V

name age zip

p1 54 78225

p2 20 78220

P x V

P.age P.zip disease name V.age V.zip

54 78225 heart p1 54 78225

54 78225 heart p2 20 78220

20 78220 flu p1 54 78225

20 78220 flu p2 20 78220

Natural Join

CIS 5930/4930 – Fall 2021

AnonymousPatient P

age zip disease

54 78225 heart

20 78220 flu

Voters V

name age zip

p1 54 78225

p2 20 78220

P⨝ V

age zip disease name

54 78225 heart p1

20 78220 flu p2

Outer Join

CIS 5930/4930 – Fall 2021

• Outer join
• Includes tuples, even with no matches
• NULL values for missing attributes

Outer Join

CIS 5930/4930 – Fall 2021

AnonPatient P

age zip disease

54 78225 heart

20 78220 flu

33 78220 lung

Voters V

name age zip

p1 54 78225

p2 20 78220

P ⨝V

age zip disease name

54 78225 heart p1

20 78220 flu p2

33 78220 lung null

Relational Algebra Expressions

CIS 5930/4930 – Fall 2021

E1: Names of voting patients who have heart
disease:

name (Voter ⨝ (sdisease =‘heart’(AnonymousPatient))

CIS 5930/4930 – Fall 2021

Relations
Supplier(sno,sname,scity,sstate)

Part(pno,pname,psize,pcolor)

Order(sno,pno,qty,price)

E2: Suppliers of parts, size greater than 20
sname (Supplier ⨝ Order⨝(spsize > 20(Part))

E3: Suppliers of red parts or parts of size greater than 30
sname (Supplier ⨝ Order⨝(spsize > 30(Part)  spcolor = ‘red’(Part)))

Relational Algebra Expressions

13 14

15 16

17 18

8/25/2021

4

CIS 5930/4930 – Fall 2021

Supplier(sno, sname, scity, sstate)

Order(sno, pno, qty, price)

Part(pno, pname, psize, pcolor)

Supplier Order

pno = pno

 sname, scity

sno = sno

spsize > 30

Part

Relational Algebra Expression
Query Plan

19