CS计算机代考程序代写 SQL data science database 9/1/2021

9/1/2021

1

Introduction to Data Science

Lecture 4

Semi-Structured Data,

XML

CIS 5930/4930 – Fall 2021

Exam

• One Midterm exam
– Proctored, online (Canvas)
– November 3 during regular scheduled lecture time

• 3.05 – 4.20pm EST

– To be done individually
– Multiple choice test
– Closed book, notes
– A week before (October 25, 27, November 1st)

• during regular scheduled lecture time – 3.05 – 4.20pm EST
• course material review lectures

– No other exams

CIS 5930/4930 – Fall 2021

Exam

• Exam (online) – the student is solely
responsible for obtaining/installing all
necessary components for Honorlock to
function properly. This includes, but is not
limited to:

o A laptop or desktop computer
o A webcam
o A microphone
o A stable and strong Internet connection

CIS 5930/4930 – Fall 2021

SQL → Relational Algebra

CIS 5930/4930 – Fall 2021

Product (pid, name, price)
Purchase (pid, cid, store)
Customer (cid, name, city)

SELECT DISTINCT P.name, C.name
FROM Product AS P, Purchase AS Pu, Customer AS C
WHERE P.pid = Pu.pid AND C.cid = Pu.cid AND
P.price > 100 AND C.city = ‘Houston’

SQL → Relational Algebra

CIS 5930/4930 – Fall 2021

Product (pid, name, price)
Purchase (pid, cid, store)
Customer (cid, name, city)

δ

P

Product Purchase

cid = cid

Customer

s

P.name, C.name

price > 100 and city = “Houston”

pid = pid

Relational Query Optimization

CIS 5930/4930 – Fall 2021

δ

P

Product Purchase

cid = cid

Customer

P.name, C.name

pid = pid
s city = ‘Houston’

s Price > 100

Product (pid, name, price)
Purchase (pid, cid, store)
Customer (cid, name, city)

1 2

3 4

5 6

9/1/2021

2

Relational Model Benefits

CIS 5930/4930 – Fall 2021

• Physical Data Independence
• Oblivious to data organization on disk

• Logical Data Independence
• Schema can evolve without affecting

the users/applications

XML Data Model

bib

book book

publisher
author . . . .

Addison-Wesley Serge Abiteboul

The root

The root element

CIS 5930/4930 – Fall 2021

XML Sample

Addison-Wesley Serge Abiteboul
Rick

Hull

Victor Vianu
Foundations of Databases
1995


Freeman Jeffrey D. Ullman
Principles of Database and Knowledge Base Systems
1998

CIS 5930/4930 – Fall 2021

XPath: Simple Expressions

Result: 1995

1998

Result: empty (there were no papers)

/bib/book/year

/bib/paper/year

CIS 5930/4930 – Fall 2021

Result: Serge Abiteboul
Rick

Hull

Victor Vianu

Jeffrey D. Ullman

Result: Rick

//author

/bib//first-name

XPath: Simple Expressions

CIS 5930/4930 – Fall 2021

XPath: Attribute Nodes

Result: “55, 75”

@price means that price is has to be an
attribute

/bib/book/@price

CIS 5930/4930 – Fall 2021

7 8

9 10

11 12

9/1/2021

3

XPath: Wildcard

Result: Rick

Hull

* Matches any element

@* Matches any attribute

//author/*

CIS 5930/4930 – Fall 2021

XPath: Predicates

Result: Rick

Hull

/bib/book/author[first-name=‘Rick’]

CIS 5930/4930 – Fall 2021

XPath: More Predicates

//product[@price < 60] /bib/book[author/@age < 25] CIS 5930/4930 - Fall 2021 Xpath: Summary bib matches a bib element * matches any element / matches the root element /bib matches a bib element under root bib/paper matches a paper in bib bib//paper matches a paper in bib, at any depth //paper matches a paper at any depth paper|book matches a paper or a book @price matches a price attribute bib/book/@price matches price attribute in book, in bib bib/book[@price<“55”]/author/lastname matches… CIS 5930/4930 - Fall 2021 XQuery • Based on Quilt, which is based on XML-QL • Uses XPath to express more complex queries CIS 5930/4930 - Fall 2021 FLWR (“Flower”) Expressions FOR ... LET... WHERE... RETURN... CIS 5930/4930 - Fall 2021 13 14 15 16 17 18 9/1/2021 4 FOR-WHERE-RETURN Find all book titles published after 1995: FOR $x IN document("bib.xml")/bib/book, $y IN $x/title WHERE $x/year/text() > 1995

RETURN $y

abc
def
ghi

CIS 5930/4930 – Fall 2021

FOR-WHERE-RETURN

Equivalent

FOR $x IN document(“bib.xml”)/bib/book[year/text() > 1995] /title

RETURN $x

Even shorter:

document(“bib.xml”)/bib/book[year/text() > 1995]/title

CIS 5930/4930 – Fall 2021

FOR-WHERE-RETURN

• Find all book titles and the year when they
were published:

FOR $x IN document(“bib.xml”)/ bib/book
RETURN

{ $x/title/text() }
{ $x/year/text() }

abc 1995
def < year > 2002
ghk < year > 1980

CIS 5930/4930 – Fall 2021

Nesting

For each author of a book by Morgan
Kaufmann, list all books s/he published:

FOR $b IN document(“bib.xml”)/bib,
$a IN $b/book[publisher /text()=“Morgan Kaufmann”]/author

RETURN
{ $a,

FOR $t IN $b/book[author/text()=$a/text()]/title
RETURN $t

}

In the RETURN clause comma concatenates XML fragments
CIS 5930/4930 – Fall 2021

Result


Jones
abc
def


Smith
ghi

CIS 5930/4930 – Fall 2021

Aggregates, Ordering

Find all books with more than 3 co-authors:

count = a function that counts

avg = computes the average

sum = computes the sum

distinct-values = eliminates duplicates

Order by – orders the result set in ascending or descending order

FOR $x IN document(“bib.xml”)/bib/book
WHERE count($x/author)>=3

ORDER BY $x/@price descending
RETURN $x

CIS 5930/4930 – Fall 2021

19 20

21 22

23 24

9/1/2021

5

Aggregates

Same thing:

FOR $x IN document(“bib.xml”)/bib/book[count(author)>3]
ORDER BY $x/@price descending

RETURN $x

CIS 5930/4930 – Fall 2021

Aggregates

Print all authors who published more than 3
books (no duplicates → distinct-values clause)

FOR $b IN document(“bib.xml”)/bib,
$a IN distinct-values($b/book/author/text())

WHERE count($b/book[author/text()=$a])>3
RETURN { $a }

CIS 5930/4930 – Fall 2021

Aggregates

Find books whose price is larger than average:

FOR $b in document(“bib.xml”)/bib,
$x in $b/book

LET $a:=avg($b/book/price/text())
WHERE $x/price/text() > $a
RETURN $x

CIS 5930/4930 – Fall 2021

Flattening

• “Flatten” the authors, i.e. return a list of
(author, title) pairs

FOR $b IN document(“bib.xml”)/bib/book,
$x IN $b/title/text(),
$y IN $b/author/text()

RETURN
{ $x }
{ $y }

Result:

abc
efg


abc
hkj

CIS 5930/4930 – Fall 2021

Re-grouping

• For each author, return all titles of her/his
books

FOR $b IN document(“bib.xml”)/bib,
$x IN $b/book/author/text()

RETURN

{ $x }
{ FOR $y IN $b/book[author/text()=$x]/title

RETURN $y }

Result:

efg
abc
klm
. . . .

CIS 5930/4930 – Fall 2021

Re-grouping

• Same, but eliminate duplicate authors:

FOR $b IN document(“bib.xml”)/bib
LET $a := distinct-values($b/book/author/text())
FOR $x IN $a
RETURN


$x
{ FOR $y IN $b/book[author/text()=$x]/title

RETURN $y }

CIS 5930/4930 – Fall 2021

25 26

27 28

29 30

9/1/2021

6

XQuery: Standard Functions

CIS 5930/4930 – Fall 2021

• Divided into subsets by functionality
• Numeric (e.g. abs(), floor(), round())
• String (e.g. concat(), normalize-space())
• Duration, Date and Time (e.g. year-from-

date())
• Sequences (e.g. reverse())
• Accessors (e.g. data())

XQuery: User Defined Functions

CIS 5930/4930 – Fall 2021

• XQuery (depending on vendor) supports
many useful functions

• If you can not find a function, you can write
it yourself

• Reuse common expressions written in your
queries

• Make your queries easy to read
• Use recursion if necessary

XQuery: User Defined Functions

CIS 5930/4930 – Fall 2021

declare function
namespace:function_name(

$input1 as input_datatype1,
…)

as return_datatype
{
(: function body: for, let, where,
order by, return 🙂
};

31 32

33