CSE 214: Data Structures for Information Systems

CSE 3241: XML
Extensible Markup Language

Structured, Semistructured,
and Unstructured Data
XML Hierarchical (Tree) Data Model
XML Documents
DTD (Document Type Definition)
XML Schema

Databases are data sources for e-commerce
E-commerce apps / Internet DB applications need to interact with
Users via Web
Need common format to display content and formatting / hypertext documents
HTML can be used for formatting and structuring Web documents but not for data specifications

Structured, Semistructured,
and Unstructured Data
Structured data
Represented in a strict format
Example: information stored in databases
Semistructured data
Has a certain structure
Not all information collected will have identical structure

Structured, Semistructured,
and Unstructured Data (cont’d.)
Self-describing data
Schema information mixed in with data values
May be displayed as a directed graph
Labels or tags on directed edges represent:
Schema names
Names of attributes
Object types (or entity types or classes)
Relationships
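
A minimal sketch of what "schema information mixed in with data values" can look like in practice. The two records and their labels (Ssn, Last_name, Hours) are illustrative assumptions; the point is only that each record carries its own labels, so records need not share an identical structure.

```python
# Each record carries its own labels (the "schema") alongside its values.
# The label names here are assumptions chosen for illustration.
workers = [
    {"Ssn": "123456789", "Last_name": "Smith", "Hours": "32.5"},
    {"Ssn": "453453453", "Hours": "15.5"},  # no Last_name: structure differs
]

# Labels that appear in at least one record.
all_labels = set().union(*(record.keys() for record in workers))

# Labels that appear in every record.
shared_labels = set(workers[0]).intersection(
    *(record.keys() for record in workers[1:])
)

# Labels that only some records have: the "semi" in semistructured.
optional_labels = all_labels - shared_labels
```

No separate schema is consulted here; the structure is discovered from the data itself.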

Unstructured Data
Limited indication of the type of data
Example: a text document that contains information embedded within it
HTML documents
Do not include schema information about type of data
Static HTML page
All information to be displayed explicitly spelled out as fixed text in HTML file

Unstructured Data
HTML uses a large number of predefined tags
Text that appears between angled brackets: <...>
Tag with a slash: </...> (marks an end tag)

Semistructured Data

Semistructured Data: XML
Data sources
Database storing data for Internet applications
Hypertext documents
Common method of specifying contents and formatting of Web pages

What is XML?
XML – The eXtensible Markup Language
What’s a Markup Language?
Language used to annotate a document for some purpose
Uses tags that are distinguished from the content of the document to provide that annotation
HTML (HyperText Markup Language) and LaTeX
Both examples of document publishing languages
Tags used to indicate formatting
Tags follow a defined structure to keep them separate from the content of the document

What is XML?
XML provides a framework to define a structure for data
An XML document is a collection of related data items
Document is “marked up” with tags known as elements
Elements are used to provide structure to the data

XML Hierarchical (Tree) Data Model
Elements and attributes
Main structuring concepts used to construct an XML document
Complex elements
Constructed from other elements hierarchically
Simple elements
Contain data values
XML tag names
Describe the meaning of the data elements in the document

XML Hierarchical (Tree) Data Model (cont’d.)
XML attributes
Describe properties and characteristics of the elements (tags) within which they appear
May reference another element in another part of the XML document
Common to use attribute values in one element as the references

The XML Data Model
Attributes vs. Elements
Data can be stored as the contents of an element OR as an attribute of an element

<Project number="1">
  <Name>Product X</Name>
  <Location>Bellaire</Location>
  <Dept_no>5</Dept_no>
</Project>

Why pick one over the other?
Best practice:
Attributes – describe/modify the element
Elements – hold the actual data values
Much like in HTML:
Element (tag) contents are the data to be displayed
Attributes (generally) modify/describe how it is to be displayed
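
A small sketch of the two storage choices, using Python's standard xml.etree.ElementTree module. The all-attributes variant and its lowercase attribute names (name, location, dept_no) are assumptions made up for comparison; they do not come from the slides.

```python
import xml.etree.ElementTree as ET

# Data values stored as element contents; "number" is kept as an attribute.
as_elements = """
<Project number="1">
  <Name>Product X</Name>
  <Location>Bellaire</Location>
  <Dept_no>5</Dept_no>
</Project>
"""

# Hypothetical alternative: the same data stored entirely in attributes.
as_attributes = (
    '<Project number="1" name="Product X" location="Bellaire" dept_no="5"/>'
)

by_element = ET.fromstring(as_elements)
by_attribute = ET.fromstring(as_attributes)

# Element contents are read with findtext(); attributes with get().
name_from_element = by_element.findtext("Name")
name_from_attribute = by_attribute.get("name")
project_number = by_element.get("number")
```

Both forms carry the same information; the best-practice bullets above are about which form reads more naturally, not about what is possible.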

What does XML have to do with databases?
Recall: What is a database?
A logically coherent collection of data with some specific meaning that has been designed for a specific purpose.
Structured and semi-structured data files vs. database?
More practically, XML is used as a data exchange framework
Moving data from one application to another, from one database to another
Taking data from a database and turning it into a website, a report, or other human readable document
There are even some “native XML” database implementations
XML as the “back end” storage instead of relations
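
A hedged sketch of the data exchange idea: rows that might have come from a relational query are turned into an XML document. The rows list stands in for a query result, and its column names are assumptions; a real application would fetch the rows from its own database.

```python
import xml.etree.ElementTree as ET

# Stand-in for a query result; the column names are illustrative assumptions.
rows = [
    {"number": 1, "name": "Product X", "location": "Bellaire", "dept_no": 5},
]

def rows_to_xml(rows):
    """Build a Projects document with one Project element per row."""
    root = ET.Element("Projects")
    for row in rows:
        # The project number describes the element, so it becomes an attribute.
        project = ET.SubElement(root, "Project", number=str(row["number"]))
        # The remaining columns become simple child elements.
        ET.SubElement(project, "Name").text = row["name"]
        ET.SubElement(project, "Location").text = row["location"]
        ET.SubElement(project, "Dept_no").text = str(row["dept_no"])
    return ET.tostring(root, encoding="unicode")

xml_text = rows_to_xml(rows)
```

The receiving application can parse xml_text without knowing anything about the table it came from.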

The XML Data Model

XML uses a hierarchical model
Also known as a tree model
Documents can be represented as trees
Each simple element contains one data value
Leaves of the tree
Complex elements can contain multiple child elements
Internal nodes of the tree
Each element other than the root belongs to exactly one complex parent element
Parent node of the tree
One root element contains everything else
Root of the tree
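
The tree vocabulary above can be sketched with Python's standard xml.etree.ElementTree module. The document below is an assumed example; in particular, the child elements of Worker (Ssn, Hours) are illustrative names.

```python
import xml.etree.ElementTree as ET

# Assumed sample document; the Worker child names are illustrative.
doc = """
<Projects>
  <Project number="1">
    <Name>Product X</Name>
    <Location>Bellaire</Location>
    <Dept_no>5</Dept_no>
    <Workers>
      <Worker><Ssn>123456789</Ssn><Hours>32.5</Hours></Worker>
    </Workers>
  </Project>
</Projects>
"""

root = ET.fromstring(doc)

def classify(element):
    """An element with child elements is complex; otherwise it is simple."""
    return "complex" if len(element) > 0 else "simple"

# Internal nodes are complex elements, leaves are simple elements.
kinds = {element.tag: classify(element) for element in root.iter()}

# Every element except the root has exactly one parent.
parent_of = {child: parent for parent in root.iter() for child in parent}
```

The root is the only element missing from parent_of, which is what makes the document a tree rather than a general graph.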

A sample XML tree
Internal nodes are complex elements
Leaf nodes are simple elements
The root node is the root element
Root element contains all other elements within it

[Figure: tree of the sample document; leaf values include “Product X”, “Bellaire”, “123456789”, and “453453453”]

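A sample of XML

<?xml version="1.0" standalone="yes"?>
<Projects>
  <Project number="1">
    <Name>Product X</Name>
    <Location>Bellaire</Location>
    <Dept_no>5</Dept_no>
    <Workers>
      <Worker>
        <Ssn>123456789</Ssn>
        <Last_name>Smith</Last_name>
        <Hours>32.5</Hours>
      </Worker>
      <Worker>
        <Ssn>453453453</Ssn>
        <Hours>15.5</Hours>
      </Worker>
    </Workers>
  </Project>
</Projects>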


A sample of XML
XML declaration: the first line of the document


A sample of XML
Root element: Projects


A sample of XML
Root element: Projects
Beginning of root element: the <Projects> start tag
End of root element: the </Projects> end tag


A sample of XML
First child element of root: Project
(Other child elements are possible here; they do not even need to be “Project” elements)


A sample of XML
The first Project element has an attribute named “number” with a value of “1”


A sample of XML
First child element of the Project element where number=“1”:
Simple element with a name of “Name” and a value of “Product X”


A sample of XML
Second child element of the Project element where number=“1”:
Simple element with a name of “Location” and a value of “Bellaire”


A sample of XML
Third child element of the Project element where number=“1”:
Simple element with a name of “Dept_no” and a value of “5”


A sample of XML
Fourth child element of the Project element where number=“1”:
Complex element with a name of “Workers”


A sample of XML
First child element of Projects/Project[number=“1”]/Workers:
Complex element with a name of “Worker”
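
The walk through the sample document can be reproduced with the limited XPath support in Python's standard xml.etree.ElementTree module. This is a sketch under assumptions: the child element names of Worker (Ssn, Last_name, Hours) are illustrative, and the slide notation Project[number=“1”] is written as Project[@number='1'] in XPath.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the sample; Worker child names are illustrative.
doc = """<?xml version="1.0" standalone="yes"?>
<Projects>
  <Project number="1">
    <Name>Product X</Name>
    <Location>Bellaire</Location>
    <Dept_no>5</Dept_no>
    <Workers>
      <Worker>
        <Ssn>123456789</Ssn>
        <Last_name>Smith</Last_name>
        <Hours>32.5</Hours>
      </Worker>
      <Worker>
        <Ssn>453453453</Ssn>
        <Hours>15.5</Hours>
      </Worker>
    </Workers>
  </Project>
</Projects>"""

root = ET.fromstring(doc)  # the root element, Projects

# The Project element whose "number" attribute has the value "1".
project = root.find("Project[@number='1']")

# Its children in document order, as (name, value) pairs.
children = [(child.tag, (child.text or "").strip()) for child in project]

# First child element of Projects/Project[number="1"]/Workers.
first_worker = root.find("Project[@number='1']/Workers/Worker")
all_workers = root.findall("Project[@number='1']/Workers/Worker")
```

Paths are relative to the element they are called on, so the root name Projects does not appear in the path strings.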


XML Hierarchical (Tree) Data Model (cont’d.)
Tree model or hierarchical model
Main types of XML documents
Data-centric XML documents
Document-centric XML documents
Hybrid XML documents
Schemaless XML documents
Do not follow a predefined schema of element names and corresponding tree structure

XML Document Types – Data Centric XML
Data-centric XML
Highly structured
Many small data items
Often used for data exchange purposes
Transfer data from one system to another
Also used to create web pages dynamically from databases
Generally follow a schema document that determines their structure

XML Document Types –
Document-Centric XML
Few structural elements
Large amounts of text
Articles, blog entries, books
May have a schema document, but not required
Schema may be very limited in semantics
What’s a title?
What’s a chapter?
What’s a paragraph?

More XML Document Types
Hybrid XML
Some parts are highly structured
Some parts mostly blocks of text and/or unstructured
May or may not have a predefined schema
Schemaless XML documents
Semi-structured documents without a predefined schema
Typically indicated by the attribute ‘standalone=“yes”’ in the XML declaration on the top line (the document does not depend on an external DTD)

An XML document is considered valid if:
It is well-formed

To be continued after this definition…

Well-formed XML
An XML document is well-formed when it follows certain conditions:
It should start with an XML declaration line (required in XML 1.1; optional but recommended in XML 1.0), for example:
<?xml version="1.0"?>
It must form a tree:
Must start with a single root element
Every child element must have start and end tags that are contained completely within a parent element:
Good: properly nested tags
Bad: overlapping tags
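
A minimal sketch of a well-formedness check, assuming Python's standard xml.etree.ElementTree parser as the judge: it raises ParseError for documents that do not form a tree. The three test documents are made-up examples.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# Child tags contained completely within the parent.
good = '<?xml version="1.0"?><Project><Name>Product X</Name></Project>'

# Name is opened inside Project but closed outside it.
bad_nesting = '<?xml version="1.0"?><Project><Name>Product X</Project></Name>'

# Two top-level elements, so there is no single root.
two_roots = '<?xml version="1.0"?><Project/><Project/>'

results = [is_well_formed(t) for t in (good, bad_nesting, two_roots)]
```

This checks well-formedness only; it says nothing about whether the document is valid against a DTD or XML Schema.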

An XML document is considered valid if:
It is well-formed, and …
It follows a particular schema in a standard definition language
A DTD document (Document Type Definition)
An XML schema document
DTDs are the original, older technology
XML Schema documents: newer; became a W3C Recommendation in 2001

DTD – Document Type Definition
Original method of specifying a schema definition
Still in widespread use
A very simple schema definition language
Each possible element in the document is defined
What children must it have?
What children can it (optionally) have?
What kinds of attributes can/must it have?
If it is a leaf element, what kinds of values can it have?

XML Documents, DTD, and XML Schema (cont’d.)
Notation for specifying elements
Limitations of XML DTD
Data types in DTD are not very general
Special syntax
Requires specialized processors
Child elements must always follow the ordering specified in the DTD
Unordered elements not permitted

A sample XML document and DTD
We declare that we want to use a DTD by putting the DOCTYPE declaration at the top of our XML file:

<!DOCTYPE Projects SYSTEM "proj.dtd">

Projects: the name of our DTD’s root node
SYSTEM: indicates that this is an external DTD
“proj.dtd”: the filename (or URL) of the DTD
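
A short sketch of how the three parts of a DOCTYPE declaration can be read back, assuming Python's standard xml.dom.minidom module. It records the declaration but does not load or apply proj.dtd, so nothing is validated here. The document body is a made-up fragment.

```python
from xml.dom import minidom

# Made-up document that declares an external DTD named proj.dtd.
text = '''<?xml version="1.0"?>
<!DOCTYPE Projects SYSTEM "proj.dtd">
<Projects>
  <Project number="1"><Name>Product X</Name></Project>
</Projects>'''

document = minidom.parseString(text)
doctype = document.doctype

root_name = doctype.name          # the name of the DTD's root node
dtd_location = doctype.systemId   # the filename (or URL) after SYSTEM

# The declared root name should match the document's actual root element.
root_matches = document.documentElement.tagName == root_name
```

Checking the document against the DTD itself needs a validating parser, which the Python standard library does not provide.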

A sample DTD
root element comes first












A sample DTD
Name of element












A sample DTD
List of children

Regular expression-like syntax:
+ – indicates 1 or more of this child
* – indicates 0 or more of this child
? – indicates 0 or 1 of this child
No symbol – indicates exactly one child

So (Project+) in the declaration <!ELEMENT Projects (Project+)> indicates that 1 or more Project children are required












A sample DTD
List of children

Regular expression-like syntax:
+ – indicates 1 or more of this child
* – indicates 0 or more of this child
? – indicates 0 or 1 of this child
No symbol – indicates exactly one child

Dept_no? in the list of children of Project indicates that Dept_no is an optional field, but there can be only one of them
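
Because the child list uses regular-expression-like symbols, a flat child list can be checked by translating it into an actual regular expression. The sketch below is a toy, not a DTD validator: it handles only comma-separated sequences with +, *, or ? suffixes. The content model assumed for Project, (Name, Location, Dept_no?, Workers), is an illustration and is not taken from the slides.

```python
import re
import xml.etree.ElementTree as ET

def model_to_regex(model):
    """Translate a flat sequence such as "(Name, Dept_no?)" into a regex
    over a string of child tag names, each followed by one space."""
    parts = []
    for item in model.strip("() ").split(","):
        item = item.strip()
        suffix = item[-1] if item[-1] in "+*?" else ""  # no symbol: exactly one
        name = item.rstrip("+*?")
        parts.append(f"(?:{re.escape(name)} ){suffix}")
    return re.compile("".join(parts))

def children_match(element, model):
    """Check the element's child tags, in order, against the model."""
    child_names = "".join(child.tag + " " for child in element)
    return model_to_regex(model).fullmatch(child_names) is not None

PROJECT_MODEL = "(Name, Location, Dept_no?, Workers)"  # assumed model

with_dept = ET.fromstring(
    "<Project><Name/><Location/><Dept_no/><Workers/></Project>")
without_dept = ET.fromstring(
    "<Project><Name/><Location/><Workers/></Project>")
two_depts = ET.fromstring(
    "<Project><Name/><Location/><Dept_no/><Dept_no/><Workers/></Project>")
no_projects = ET.fromstring("<Projects/>")
```

Under these assumptions, children_match accepts with_dept and without_dept against PROJECT_MODEL, rejects two_depts, and rejects no_projects against "(Project+)". Choices (|), nested groups, and #PCDATA are not handled.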












A sample DTD
Project has an attribute named “number”








