CSE 214: Data Structures for Information Systems
CSE 3241: XML
Extensible Markup Language
Structured, Semistructured, and Unstructured Data
XML Hierarchical (Tree) Data Model
XML Documents
DTD (Document Type Definition)
XML Schema
Databases are data sources for e-commerce
E-commerce and Internet database applications need to interact with users via the Web
They need a common format for specifying the content and formatting of hypertext documents
HTML can format and structure Web documents, but it is not suitable for specifying structured data
Structured, Semistructured, and Unstructured Data
Structured data
Represented in a strict format
Example: information stored in databases
Semistructured data
Has a certain structure
Not all information collected will have identical structure
Structured, Semistructured, and Unstructured Data (cont’d.)
Self-describing data
Schema information mixed in with data values
May be displayed as a directed graph
Labels or tags on directed edges represent:
Schema names
Names of attributes
Object types (or entity types or classes)
Relationships
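As a small illustration of self-describing data, the sketch below holds two records whose structure differs and lists them as labeled edges of a directed graph. The field names and values are hypothetical, chosen only for the example.

```python
# Semistructured data: each record carries its own field names (schema
# information mixed in with the data values), and records need not match.
# All names and values here are hypothetical.
workers = [
    {"name": "Smith", "ssn": "123456789", "hours": 32.5},
    {"name": "Joyce", "ssn": "453453453", "location": "Bellaire"},
]

def edges(record_id, record):
    """Yield (source, label, target) edges of the directed-graph view."""
    for label, value in record.items():
        yield (record_id, label, value)

graph = [edge for i, rec in enumerate(workers) for edge in edges(f"worker{i}", rec)]

# Fields that both records happen to share; the rest differ per record.
shared = set(workers[0]) & set(workers[1])
print(graph)
print(shared)
```

The edge labels play the role of attribute names, which is why no separate schema is needed to interpret each record.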
Unstructured Data
Limited indication of the type of data
Example: a text document that contains information embedded within it
HTML documents
Do not include schema information about type of data
Static HTML page
All information to be displayed explicitly spelled out as fixed text in HTML file
Unstructured Data
HTML uses a large number of predefined tags
Text that appears between angled brackets: <...>
End tag, written with a slash: </...>
Semistructured Data
Semistructured Data: XML
Data sources
Database storing data for Internet applications
Hypertext documents
Common method of specifying contents and formatting of Web pages
What is XML?
XML – The eXtensible Markup Language
What’s a Markup Language?
Language used to annotate a document for some purpose
Uses tags that are distinguished from the content of the document to provide that annotation
HTML (HyperText Markup Language) and LaTeX
Both examples of document publishing languages
Tags used to indicate formatting
Tags follow a defined structure to keep them separate from the content of the document
What is XML?
XML provides a framework to define a structure for data
An XML document is a collection of related data items
Document is “marked up” with tags known as elements
Elements are used to provide structure to the data
XML Hierarchical (Tree) Data Model
Elements and attributes
Main structuring concepts used to construct an XML document
Complex elements
Constructed from other elements hierarchically
Simple elements
Contain data values
XML tag names
Describe the meaning of the data elements in the document
XML Hierarchical (Tree) Data Model (cont’d.)
XML attributes
Describe properties and characteristics of the elements (tags) within which they appear
May reference another element in another part of the XML document
Common to use attribute values in one element as the references
The XML Data Model
Attributes vs. Elements
Data can be stored as the contents of an element OR as an attribute of an element
Why pick one over the other?
Best practice:
Attributes – describe/modify the element
Elements – hold the actual data values
Much like in HTML:
Element (tag) contents are the data to be displayed
Attributes (generally) modify/describe how it is to be displayed
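A minimal sketch of the distinction, using Python's standard `xml.etree.ElementTree` parser. The `Hours` element and its `unit` attribute are hypothetical names invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the attribute describes the element,
# the element content holds the actual data value.
fragment = '<Hours unit="per_week">32.5</Hours>'

elem = ET.fromstring(fragment)
print(elem.tag)     # element name
print(elem.attrib)  # attributes as a dictionary
print(elem.text)    # element content (the data value)
```

The parser keeps the two apart: attributes arrive as a dictionary on the element, while the content is reached through `text`.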
What does XML have to do with databases?
Recall: What is a database?
A logically coherent collection of data with some specific meaning that has been designed for a specific purpose.
Structured and semi-structured data files vs. database?
More practically, XML is used as a data exchange framework
Moving data from one application to another, from one database to another
Taking data from a database and turning it into a website, a report, or other human readable document
Even some implementations of “XML native” DBs
XML as the “back end” storage instead of relations
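One way the data exchange role might look in practice: the sketch below reads rows from a database and writes them out as XML. It assumes an in-memory SQLite database with a hypothetical `project` table; the table layout, the second row, and the element names are illustrative choices, not something the slides specify.

```python
import sqlite3
import xml.etree.ElementTree as ET

# In-memory database with a hypothetical PROJECT table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project (number INTEGER, name TEXT, location TEXT)")
conn.executemany(
    "INSERT INTO project VALUES (?, ?, ?)",
    [(1, "Product X", "Bellaire"), (2, "Product Y", "Sugarland")],
)

# Turn each row into a Project element under a single root element.
root = ET.Element("Projects")
for number, name, location in conn.execute(
    "SELECT number, name, location FROM project ORDER BY number"
):
    project = ET.SubElement(root, "Project", number=str(number))
    ET.SubElement(project, "Name").text = name
    ET.SubElement(project, "Location").text = location

xml_text = ET.tostring(root, encoding="unicode")
conn.close()
print(xml_text)
```

The resulting text can be handed to another application, which only needs an XML parser rather than access to the original database.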
The XML Data Model
XML uses a hierarchical model
Also known as a tree model
Documents can be represented as trees
Each simple element contains one data value
Leaves of the tree
Complex elements can contain multiple child elements
Internal nodes of the tree
Each element other than the root belongs to exactly one complex parent element
Parent node of the tree
One root element contains everything else
Root of the tree
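The tree view can be sketched by walking a parsed document and labeling each node as complex (internal node) or simple (leaf). The tiny document below is an assumed example built from element names that appear in these slides.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<Projects>"
    "<Project><Name>Product X</Name><Location>Bellaire</Location></Project>"
    "</Projects>"
)

def classify(element, depth=0, out=None):
    """Collect (depth, tag, kind) for every node, root first."""
    out = [] if out is None else out
    # An element with child elements is complex; one without is simple.
    kind = "complex" if len(element) > 0 else "simple"
    out.append((depth, element.tag, kind))
    for child in element:
        classify(child, depth + 1, out)
    return out

nodes = classify(doc)
for depth, tag, kind in nodes:
    print("  " * depth + f"{tag} ({kind})")
```

The first entry is always the root element, and every simple element ends up as a leaf of the printed tree.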
A sample XML tree
Internal nodes are complex elements
Leaf nodes are simple elements
The root node is the root element
Root element contains all other elements within it
[Figure: sample XML tree; leaf values shown include “Product X”, “Bellaire”, “123456789”, and “453453453”]
A sample of XML
XML declaration
Root element
Beginning of root element
End of root element
A sample of XML
First child element of root
(Other child elements possible in here – do not even need to be “Project” elements necessarily)
A sample of XML
The first Project element has an attribute with a value of “1”
First child element of the Project element: a simple element with a name of “Name” and a value of “Product X”
Second child element of the Project element: a simple element with a name of “Location” and a value of “Bellaire”
Third child element of the Project element: a simple element with a name of “Dept_no” and a value of “5”
Fourth child element of the Project element: a complex element with a name of “Workers”
First child element of Projects/Project[number=“1”]/Workers: a complex element with a name of “Worker”
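The walkthrough above can be pieced together into one document. The sketch below is a reconstruction, not the original slide content: the attribute name `number` is taken from the path shown in the slides, while the `Ssn` child of each `Worker` is an assumption made to give the two nine-digit values a home.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the sample document described in the slides.
sample = (
    '<?xml version="1.0" standalone="yes"?>'
    "<Projects>"
    '<Project number="1">'
    "<Name>Product X</Name>"
    "<Location>Bellaire</Location>"
    "<Dept_no>5</Dept_no>"
    "<Workers>"
    "<Worker><Ssn>123456789</Ssn></Worker>"
    "<Worker><Ssn>453453453</Ssn></Worker>"
    "</Workers>"
    "</Project>"
    "</Projects>"
)

root = ET.fromstring(sample)

# ElementTree writes an attribute test as [@number='1'].
project = root.find("./Project[@number='1']")
ssns = [w.findtext("Ssn") for w in project.findall("./Workers/Worker")]

print(root.tag, project.findtext("Name"), project.findtext("Location"))
print(ssns)
```

Navigation follows the same parent-to-child steps as the slide annotations: root element, then the Project with the matching attribute, then Workers, then each Worker.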
XML Hierarchical (Tree) Data Model (cont’d.)
Tree model or hierarchical model
Main types of XML documents
Data-centric XML documents
Document-centric XML documents
Hybrid XML documents
Schemaless XML documents
Do not follow a predefined schema of element names and corresponding tree structure
XML Document Types – Data Centric XML
Data-centric XML
Highly structured
Many small data items
Often used for data exchange purposes
Transfer data from one system to another
Also used to create web pages dynamically from databases
Generally follow a schema document that determines their structure
XML Document Types – Document-Centric XML
Few structural elements
Large amounts of text
Articles, blog entries, books
May have a schema document, but not required
Schema may be very limited in semantics
What’s a title?
What’s a chapter?
What’s a paragraph?
More XML Document Types
Hybrid XML
Some parts are highly structured
Some parts mostly blocks of text and/or unstructured
May or may not have a predefined schema
Schemaless XML documents
Semi-structured documents without a predefined schema
Often marked with ‘standalone=“yes”’ in the XML declaration on the top line (strictly, this attribute states that the document does not depend on external markup declarations)
An XML document is considered valid if:
It is well-formed
To be continued after this definition…
Well-formed XML
An XML document is well-formed when it follows certain conditions:
It should start with an XML declaration line (XML 1.0 recommends the declaration but does not strictly require it)
It must form a tree:
Must start with a single root element
Every child element must have start and end tags that are contained completely within its parent element (properly nested tags are good; overlapping tags are bad)
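A quick way to see the nesting rule in action is to hand a parser one properly nested document and one with overlapping tags. The two strings below are made-up examples; `is_well_formed` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

good = '<?xml version="1.0"?><Project><Name>Product X</Name></Project>'
# Overlapping tags: <Name> is opened inside <Project> but closed outside it.
bad = '<?xml version="1.0"?><Project><Name>Product X</Project></Name>'

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(good))
print(is_well_formed(bad))
```

The same check rejects a document with two root elements, since that also fails to form a single tree.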
An XML document is considered valid if:
It is well-formed, and …
It follows a particular schema in a standard definition language
A DTD document (Document Type Definition)
An XML schema document
DTDs are the original, older technology
XML Schema documents – the newer technology, a W3C Recommendation since 2001
DTD – Document Type Definition
Original method of specifying a schema definition
Still in widespread use
A very simple schema definition language
Each possible element in the document is defined
What children must it have?
What children can it (optionally) have?
What kinds of attributes can/must it have?
If it is a leaf element, what kinds of values can it have?
XML Documents, DTD, and XML Schema (cont’d.)
Notation for specifying elements
XML Document Type Definition
Data types in DTD are not very general
Special syntax
Requires specialized processors
All DTD elements always forced to follow the specified ordering of the document
Unordered elements not permitted
A sample XML document and DTD
We declare that we want to use a DTD by putting the DOCTYPE declaration at the top of our XML file
The DOCTYPE declaration gives the name of our DTD’s root node, a keyword indicating that this is an external DTD, and “proj.dtd”, the filename (or URL)
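A sketch of what such a declaration might look like, read back with Python's standard `xml.dom.minidom`. The root name `Projects` is assumed from the path used earlier in the slides, and `SYSTEM` is the standard XML keyword for an external DTD identified by a filename or URL. The parser here only records the declaration; it does not fetch `proj.dtd` or validate against it.

```python
from xml.dom import minidom

# Assumed DOCTYPE line: root name, the SYSTEM keyword, then the DTD location.
text = (
    '<?xml version="1.0"?>'
    '<!DOCTYPE Projects SYSTEM "proj.dtd">'
    "<Projects/>"
)

doc = minidom.parseString(text)
print(doc.doctype.name)      # name of the root element
print(doc.doctype.systemId)  # filename (or URL) of the external DTD
```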
A sample XML document and DTD
A sample DTD
root element comes first
A sample DTD
Name of element
A sample DTD
List of children
Regular expression-like syntax:
+ – indicates 1 or more of this child
* – indicates 0 or more of this child
? – indicates 0 or 1 of this child
No symbol – indicates exactly one child
So this indicates that 1 or more Project children are required
This indicates that Dept_no is an optional field, but there can be only one of them
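Because the occurrence indicators mirror regular-expression operators, a child list can be translated into a regular expression over the sequence of child tags. The sketch below does that for two content models that are assumed here (the slides' actual DTD text is not available); it handles only comma-separated names with an optional indicator and is not a real DTD validator.

```python
import re
import xml.etree.ElementTree as ET

# Content models as they might appear in the DTD (assumed, not from the slides):
#   <!ELEMENT Projects (Project+)>
#   <!ELEMENT Project (Name, Location, Dept_no?, Workers)>
MODELS = {
    "Projects": "Project+",
    "Project": "Name, Location, Dept_no?, Workers",
}

def model_to_regex(model):
    """Translate a comma-separated child list into a regex over child tags."""
    parts = []
    for item in model.split(","):
        item = item.strip()
        name = item.rstrip("+*?")
        indicator = item[len(name):]  # "", "+", "*" or "?"
        parts.append(f"(?:{re.escape(name)} ){indicator}")
    return re.compile("".join(parts))

def children_match(element):
    """Check one element's children, in order, against its content model."""
    sequence = "".join(child.tag + " " for child in element)
    return model_to_regex(MODELS[element.tag]).fullmatch(sequence) is not None

ok = ET.fromstring("<Project><Name/><Location/><Workers/></Project>")
twice = ET.fromstring(
    "<Project><Name/><Location/><Dept_no/><Dept_no/><Workers/></Project>"
)
empty = ET.fromstring("<Projects/>")
print(children_match(ok), children_match(twice), children_match(empty))
```

Matching the whole sequence also enforces the declared ordering of children, so a Project whose Location comes before its Name is rejected.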
A sample DTD
Project has an attribute named