XML & XPath

DSCI 551

Wensheng Wu

Agenda

• XML:

– What is it and why do we care?

– Data model (ordered tree)

– Query language: XPath

XML

• eXtensible Markup Language

• XML 1.0 – a W3C Recommendation (first edition 1998; fifth edition 2008)

• Roots: SGML (Standard Generalized Markup Language)

• Beyond its roots: a format for sharing data

• Ajax (the "x" stands for XML)

• jQuery ($.ajax({…, dataType: "xml"/"json"}))

SGML

• Derived from IBM’s GML (Generalized Markup Language), developed in the 1960s

– Charles Goldfarb, Edward Mosher, and Raymond Lorie

– For sharing large-project documents

• Basis for HTML & XML

– XML is roughly an augmented subset (adds more restrictions)

– HTML is an application of SGML

Why XML is of Interest to Us

• XML is a syntax (serialization format) for data

• This is exciting because:

– Can translate any data to XML

– Can ship XML over the Web (HTTP)

– Can input XML into any application

– Thus: data sharing and exchange on the Web

XML Data Sharing and Exchange

[Figure: relational, object-relational, and legacy data are translated to XML data and exchanged between applications over the Web (HTTP). Specific data management tasks: transform, integrate, warehouse.]

From HTML to XML

HTML describes the presentation

HTML

Bibliography

Foundations of Databases
Abiteboul, Hull, Vianu
Addison Wesley, 1995

Data on the Web
Abiteboul, Buneman, Suciu
Morgan Kaufmann, 1999

XML

Foundations…

Abiteboul

Hull

Vianu

Addison Wesley

1995

XML describes the content

Web Services

• A software system designed to support interoperable machine-to-machine interaction over a network (from Wikipedia)

• Use HTTP for machine-to-machine communication of files

– E.g., in XML & JSON formats

Ajax

• Asynchronous JavaScript and XML

• Web clients send data to and receive data from the server asynchronously

– Benefit: more responsive web pages

• Common to use XML, JSON as data format

Ajax in action (link): https://www.amazon.com/s?k=data+mining&ref=nb_sb_noss_2

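XML Terminology

• tags: book, title, author, …

• start tag: <book>, end tag: </book>

• elements: <book>…</book>, <author>…</author>

• elements may be nested: <book> <title>…</title> </book>

• empty element (no content): <author></author>, abbrv. <author/>

– Note that an empty element can have attributes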
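The terminology above can be tried out with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch: the document string, the `cover` element, and its `format` attribute are hypothetical and chosen only to illustrate nesting and an empty element that carries an attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <book> contains nested <title> and <author>
# elements plus an empty <cover/> element that still has an attribute.
doc = (
    '<book>'
    '<title>Data on the Web</title>'
    '<author>Suciu</author>'
    '<cover format="hard"/>'
    '</book>'
)

book = ET.fromstring(doc)

# Nested elements are the children of <book>, in document order.
child_tags = [child.tag for child in book]

# The empty element has no content and no children, but keeps its attribute.
cover = book.find("cover")

print(child_tags)    # ['title', 'author', 'cover']
print(cover.text)    # None
print(cover.attrib)  # {'format': 'hard'}
```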

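[Example lost in extraction: an XML document with an internal DTD (ending in "]>"). Surviving text content: Tove; Jani; Reminder; Don’t forget me this weekend]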

XML schema

Example XML for Company

[DTD and example of a valid XML document; markup lost in extraction. Surviving values: 123456789, John, B432, 1234; 987654321, Jim, B123]

DTD: The Content Model

• Content model:

– Complex = a regular expression over other elements

– Text-only = #PCDATA

– Empty = EMPTY

– Any = ANY

– Mixed content = (#PCDATA | A | B | C)*

• CDATA vs. #PCDATA

– Character data that is not parsed (CDATA) or is parsed (#PCDATA) by the parser

– Tags inside #PCDATA will be treated as markup

– Note: element content models use #PCDATA; CDATA (written without #) is an attribute type

DTD: Regular Expressions

[Slide body lost in extraction]
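To make "a regular expression over other elements" concrete, here is a small sketch in Python. It is not DTD validation (`xml.etree.ElementTree` does not validate). It only checks an element's sequence of child tags against a hand-translated regex. The content model `(name, phone*, email?)`, the `person` element, and the helper `children_match` are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content model, which a DTD would write as
#   <!ELEMENT person (name, phone*, email?)>
# translated by hand into a regex over space-terminated child tag names.
PERSON_MODEL = re.compile(r"(name )(phone )*(email )?")

def children_match(element, model):
    """Return True if the element's child tags, in order, match the model."""
    sequence = "".join(child.tag + " " for child in element)
    return model.fullmatch(sequence) is not None

ok = ET.fromstring(
    "<person><name>John</name><phone>1234</phone><phone>5678</phone></person>"
)
bad = ET.fromstring("<person><phone>1234</phone><name>John</name></person>")

print(children_match(ok, PERSON_MODEL))   # True: name followed by two phones
print(children_match(bad, PERSON_MODEL))  # False: phone appears before name
```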

Processing instructions

• <?xml version="1.0" encoding="UTF-8"?>

• This is the first line of an XML document

– Declaring that the following is an XML doc…

– that follows standard version 1.0

– and whose encoding is UTF-8

Agenda

• XML:

– What is it and why do we care?

– Data model

– Query language: XPath

Querying XML Data

• XPath = simple navigation through the tree

• XQuery = the SQL of XML

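<bib>
  <book price="35">
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
    <author> <first-name>Rick</first-name> <last-name>Hull</last-name> </author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
    <price>38.8</price>
  </book>
  <book price="55">
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>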

Data Model for XPath

Document node
  bib (the root element)
    book
      publisher: Addison-Wesley
      author: Serge Abiteboul
      . . . .
    book
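The ordered-tree view can be inspected with Python's standard `xml.etree.ElementTree`. This is only a sketch: the `BIB_XML` string is an assumed reconstruction of the bibliography example, with tag names inferred from the XPath expressions on the following slides, and `outline` is a hypothetical helper. Note that ElementTree exposes the root element directly and stores text in `.text` rather than as separate text nodes, so it only approximates the XPath data model.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the bibliography document (markup is a guess).
BIB_XML = """
<bib>
  <book price="35">
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
    <price>38.8</price>
  </book>
  <book price="55">
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>
"""

root = ET.fromstring(BIB_XML)  # the root element, <bib>

def outline(element, depth=0):
    """List tag names depth-first, keeping children in document order."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(root)
print("\n".join(tree_lines))
```

The printed outline starts with `bib`, then `book`, then that book's children in the order they appear in the document, which is what "ordered tree" refers to.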

XPath: Simple Expressions

/bib/book/year

Result: 1995, 1998

/bib/paper/year

Result: empty (there were no papers)
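A sketch of the same two queries using the XPath subset in Python's standard `xml.etree.ElementTree`. The trimmed `BIB_XML` document is an assumption. Because `root` is already the `bib` element, the absolute path `/bib/book/year` is written as the relative path `./book/year`.

```python
import xml.etree.ElementTree as ET

# Assumed, trimmed version of the bibliography document.
BIB_XML = """
<bib>
  <book><title>Foundations of Databases</title><year>1995</year></book>
  <book><title>Principles of Database and Knowledge Base Systems</title><year>1998</year></book>
</bib>
"""

root = ET.fromstring(BIB_XML)  # root is the <bib> element

# /bib/book/year
years = [year.text for year in root.findall("./book/year")]

# /bib/paper/year: no <paper> elements exist, so nothing matches
paper_years = root.findall("./paper/year")

print(years)        # ['1995', '1998']
print(paper_years)  # []
```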

//: finding descendants

//author

Result: Serge Abiteboul; Rick Hull; Victor Vianu; Jeffrey D. Ullman

/bib//first-name

Result: Rick
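The descendant step can be sketched with `xml.etree.ElementTree`, where `.//tag` searches all levels below the context element. The `BIB_XML` string is an assumed reconstruction containing only the author data.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the author data in the bibliography document.
BIB_XML = """
<bib>
  <book>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
  </book>
  <book>
    <author>Jeffrey D. Ullman</author>
  </book>
</bib>
"""

root = ET.fromstring(BIB_XML)

# //author: every author element at any depth; itertext() gathers the text
# of the element and of its descendants.
authors = [" ".join(a.itertext()) for a in root.findall(".//author")]

# /bib//first-name: first-name elements anywhere below <bib>
first_names = [e.text for e in root.findall(".//first-name")]

print(authors)      # ['Serge Abiteboul', 'Rick Hull', 'Victor Vianu', 'Jeffrey D. Ullman']
print(first_names)  # ['Rick']
```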

Select Child by Index

• Index of children starts from 1

• //author[1]

• /bib/book[2]/author
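A point that is easy to miss: `//author[1]` selects every author that is the first author child of its parent, not just the first author in the whole document. The sketch below shows this with `xml.etree.ElementTree`; the `BIB_XML` string is an assumed reconstruction of the example data.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the author data in the bibliography document.
BIB_XML = """
<bib>
  <book>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
  </book>
  <book>
    <author>Jeffrey D. Ullman</author>
  </book>
</bib>
"""

root = ET.fromstring(BIB_XML)

# //author[1]: the first author child within each parent (one per book here)
first_authors = [a.text for a in root.findall(".//author[1]")]

# /bib/book[2]/author: all authors of the second book (indexing starts at 1)
second_book_authors = [a.text for a in root.findall("./book[2]/author")]

print(first_authors)        # ['Serge Abiteboul', 'Jeffrey D. Ullman']
print(second_book_authors)  # ['Jeffrey D. Ullman']
```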

XPath: Text Nodes

/bib/book/author/text()

Result: Serge Abiteboul; Victor Vianu; Jeffrey D. Ullman

Rick Hull doesn’t appear because his author element contains first-name and last-name elements rather than a text node

Node tests in XPath:

– text() = matches text nodes

– * = matches only element nodes

– node() = matches any node (element or text)
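Python's `xml.etree.ElementTree` does not implement the `text()` node test, so the sketch below approximates `/bib/book/author/text()` by reading each author's own `.text`. The `BIB_XML` string is an assumed reconstruction, written without whitespace inside the Rick Hull author so that the element has no direct text.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the author data in the bibliography document.
BIB_XML = """
<bib>
  <book>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
  </book>
  <book>
    <author>Jeffrey D. Ullman</author>
  </book>
</bib>
"""

root = ET.fromstring(BIB_XML)

# Approximation of /bib/book/author/text(): keep only the text stored
# directly in each author element, skipping authors that have none.
text_nodes = [
    a.text
    for a in root.findall("./book/author")
    if a.text and a.text.strip()
]

print(text_nodes)  # ['Serge Abiteboul', 'Victor Vianu', 'Jeffrey D. Ullman']
```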

XPath: Wildcard

//author/*

Result: Rick; Hull

* matches any element

XPath: Attribute Nodes

/bib/book/@price

Result: ['35', '55']

@price means that price has to be an attribute

Is it the same as /bib/book[@price]?

XPath: Attribute Nodes

• /bib/book/@*

– Returns all attribute nodes of book elements

• Result:

– ['35', '55']
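The difference between selecting attribute nodes and filtering elements by attribute can be sketched in Python. `xml.etree.ElementTree` supports the predicate form `book[@price]` but not attribute steps such as `/@price` or `/@*`, so those are emulated here through `.get()` and `.attrib`. The `BIB_XML` string is an assumed, trimmed reconstruction of the example document.

```python
import xml.etree.ElementTree as ET

# Assumed, trimmed reconstruction: two books, each with a price attribute.
BIB_XML = """
<bib>
  <book price="35"><title>Foundations of Databases</title></book>
  <book price="55"><title>Principles of Database and Knowledge Base Systems</title></book>
</bib>
"""

root = ET.fromstring(BIB_XML)
books = root.findall("./book")

# /bib/book[@price]: returns book ELEMENTS that have a price attribute
books_with_price = root.findall("./book[@price]")
selected_tags = [b.tag for b in books_with_price]

# Emulation of /bib/book/@price: the price attribute values themselves
prices = [b.get("price") for b in books if "price" in b.attrib]

# Emulation of /bib/book/@*: all attribute values of all book elements
all_attribute_values = [value for b in books for value in b.attrib.values()]

print(selected_tags)         # ['book', 'book']
print(prices)                # ['35', '55']
print(all_attribute_values)  # ['35', '55']
```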

XPath: Predicates

/bib/book/author[first-name]

Return author elements (under /bib/book) which have a child element called “first-name”

Result: the author element for Rick Hull
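The existence predicate `[first-name]` is part of the XPath subset in `xml.etree.ElementTree`, so this query can be sketched directly. The `BIB_XML` string is an assumed reconstruction of the example data.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the author data in the bibliography document.
BIB_XML = """
<bib>
  <book>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
  </book>
  <book>
    <author>Jeffrey D. Ullman</author>
  </book>
</bib>
"""

root = ET.fromstring(BIB_XML)

# /bib/book/author[first-name]: authors that have a first-name child element
matches = root.findall("./book/author[first-name]")
names = [(a.findtext("first-name"), a.findtext("last-name")) for a in matches]

print(names)  # [('Rick', 'Hull')]
```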

XPath: More Predicates

/bib/book/author[firstname][address[//zip][city]]/lastname

Return lastname of author elements which have child element firstname and child element “address” which itself has …

Result:

XPath: More Predicates

/bib/book[@price < 60]

/bib/book[author/@age < 25]

/bib/book[author/text()]

Return books under bib that have an author element with a text node

XPath: More Predicates

/bib/book[contains(author, 'Ullman')]

Return books under bib whose (first) author subelement contains the word 'Ullman' in its text node (note contains is case-sensitive)

What about //book/author[contains(., "Ullman")] ?

XPath: More Predicates

• /bib/book[author = "Victor Vianu"]

• /bib/book[author/text() = "Victor Vianu"]

• /bib/book/author[. = 'Victor Vianu']

XPath: More Predicates

• /bib/book[price > 30 or year > 1995]

• /bib/book[price > 30 and year >= 1995]

• /bib/book[not(price > 30)]

• Note: and, or, and not must be written in lowercase

• Parentheses are required for not, e.g. not(price > 30)
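Python's `xml.etree.ElementTree` does not evaluate numeric comparisons or boolean operators inside predicates, so the sketch below expresses the three filters in plain Python instead. The `BIB_XML` string is an assumed reconstruction and `number` is a hypothetical helper. A missing child maps to NaN so that every comparison on it is false, which mirrors how a comparison against a missing element is false in XPath.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction: only the first book has a <price> child element.
BIB_XML = """
<bib>
  <book>
    <title>Foundations of Databases</title>
    <year>1995</year>
    <price>38.8</price>
  </book>
  <book>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>
"""

root = ET.fromstring(BIB_XML)
books = root.findall("./book")

def number(book, tag):
    """Numeric value of a child element, or NaN when the child is missing."""
    text = book.findtext(tag)
    return float(text) if text is not None else float("nan")

# /bib/book[price > 30 or year > 1995]
either = [b.findtext("title") for b in books
          if number(b, "price") > 30 or number(b, "year") > 1995]

# /bib/book[price > 30 and year >= 1995]
both = [b.findtext("title") for b in books
        if number(b, "price") > 30 and number(b, "year") >= 1995]

# /bib/book[not(price > 30)]
negated = [b.findtext("title") for b in books
           if not (number(b, "price") > 30)]

print(either)   # both titles
print(both)     # ['Foundations of Databases']
print(negated)  # ['Principles of Database and Knowledge Base Systems']
```

The second book has no price element, so `price > 30` is false for it and `not(price > 30)` is true.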

XPath: More Predicates

• /bib/book[not(publisher)]

• What about /bib/book[author[not(node())]]?

XPath: Alternatives

/bib/book|/bib/cd

Return book and cd elements under /bib

Questions

What do these return?

//*

//@*

Resources

• Comparison of SGML and XML

– https://www.w3.org/TR/NOTE-sgml-xml-971215/

• XML

– http://www.w3schools.com/xml/default.asp

• XPath

– http://www.w3schools.com/xml/xml_xpath.asp

Resources

• Testers

– https://codebeautify.org/Xpath-Tester (no support for alternation such as “/bib/(book|cd)”, but /bib/book|/bib/cd is OK)

– https://www.freeformatter.com/xpath-tester.html (no support for “contains”, but supports both forms of alternation above)

– http://www.xpathtester.com/xpath
