CS157A: Introduction to Database Management Systems
Chapter 11: XML
Suneuy Kim
1
Semi-structured data representation
• A database of semi-structured data is a hierarchical collection of nodes.
• The root represents the entire database.
• The immediate children of the root represent central entities.
• Leaf nodes hold data.
• A label on an arc from node N to node M is either
– the name of an attribute or sub-element of N, or
– the name of a relationship between N and M.
2
Semi-Structured Data Model
3
StarMovieData.xml – XML data corresponding to the Semi-Structured Data Model on p. 3.
4
XML (Extensible Markup Language)
• Standard for data representation and exchange
• Basic constructs
– Tagged elements (can be nested)
– Attributes
– Text
• Tags
– Play the same role as the labels on the arcs of a semi-structured data graph.
– HTML tags describe formatting
– XML tags describe content, that is, meaning of data
5
Example: XML
6
Semantic Tag
• Tags are normally matched pairs: an opening tag and a closing tag.
• An opening tag can have attributes.
• Tags may be nested arbitrarily.
• Element – a pair of matching tags and everything that comes between them.
• A single (self-closing) tag is used for an element that has no content, that is, no text and no sub-elements. A single tag can still have attributes.
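The points on this slide can be seen in one small document. This is only a sketch: the tag and attribute names (Movies, Movie, Title, Year, Color, genre, value) are made up for illustration and are not taken from the course files.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Movies is an element: a matched pair of tags and everything between them. -->
<Movies>
    <!-- The opening tag of Movie carries an attribute named genre. -->
    <Movie genre="sciFi">
        <!-- Elements nest: Title and Year are sub-elements of Movie. -->
        <Title>Star Wars</Title>
        <Year>1977</Year>
        <!-- A single tag: no content, but it may still have attributes. -->
        <Color value="true"/>
    </Movie>
</Movies>
```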
7
Attributes
• An alternative way to represent a leaf node
• Identifier of an element
• To connect elements
8
Namespaces
Namespaces distinguish among different vocabularies for tags used in the same document.
URI: typically a URL that refers to a document describing the meaning of the tags in the namespace.
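A minimal sketch of two vocabularies sharing one document follows. The prefixes m and b, the element names, and both example.com URIs are hypothetical; they only show how a prefix bound with xmlns keeps two tags with the same local name apart.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- xmlns:m and xmlns:b bind two prefixes to two (hypothetical) namespace URIs. -->
<catalog xmlns:m="http://example.com/ns/movies"
         xmlns:b="http://example.com/ns/books">
    <!-- Both vocabularies define "title"; the prefix tells them apart. -->
    <m:title>Star Wars</m:title>
    <b:title>Dune</b:title>
</catalog>
```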
9
XML with and without a Schema
• Well-formed XML
– You can invent your own tags
– No predefined schema
– The nesting rule for tags must be obeyed.
• Valid XML
– Conforms to a certain DTD (Document Type Definition) or an XML Schema
– The DTD/XML Schema specifies the allowable tags and a grammar for how they may be nested.
10
Well-formed XML
An XML document is called well-formed if it satisfies the following rules, specified by the W3C.
• Every start tag must have a corresponding end tag.
• Elements must be properly nested: an element opened inside another element must be closed before the enclosing element is closed.
• Within a single element, no two attributes may have the same name.
• Markup characters such as < and & must be escaped (as &lt; and &amp;) when they appear as data.
• An XML document must contain exactly one root element. The root element occurs only once and does not appear as a child of any other element.
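The rules above can be checked against a small document. This is an illustrative sketch; the element names and data are assumptions, and the violations are shown only inside comments so that the document itself stays well-formed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Exactly one root element (Stars); every start tag has a matching end tag. -->
<Stars>
    <!-- Proper nesting: Name is closed before Star is closed.
         Not well-formed: <Star><Name>Carrie Fisher</Star></Name> -->
    <!-- Attribute names are distinct within one tag.
         Not well-formed: <Star id="cf" id="mh"> -->
    <Star id="cf">
        <Name>Carrie Fisher</Name>
        <!-- The markup character & is written as &amp; inside text. -->
        <Note>Fisher &amp; Hamill starred together</Note>
    </Star>
</Stars>
```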
11
Well-Formed XML
[Figure: an XML document is passed to an XML parser, which either outputs parsed XML or reports that the document is not well-formed.]
12
Valid XML
• Adheres to basic structural requirements
• Adheres to content-specific specification
– Document Type Definition (DTD)
– XML Schema (XSD)
13
Valid XML
[Figure: an XML document and its DTD/XML Schema are passed to a validating XML parser, which either outputs parsed XML or reports that the document is not well-formed or not valid.]
14
Valid vs. Well-formed XML
• Valid XML – benefit of typing
– Application programs can assume structure
– DTD/XSD can serve as specification for data exchange
– Documentation
• Well-formed XML – benefit of no typing
– Flexibility, ease of change
– DTD/XSD can be messy for irregular data
15
DTD (Document Type Definitions)
• A language for describing the schema of XML documents by specifying elements, attributes, nesting, ordering, and number of occurrences
• Also provides special attribute types that act like keys and foreign keys: ID and IDREF(S)
16
The form of a DTD
<!DOCTYPE root-tag [
<!ELEMENT element-name (components)>
… more elements…
]>
17
DTD Elements
• The description of an element consists of its name (tag) and a parenthesized list of any nested tags.
• Subtags must appear in the order shown.
• Each tag may be followed by its multiplicity.
• A*: any number of times, including 0
• A+: one or more times
• A?: either zero or one time, but no more
• Symbol | can connect alternative sequences of tags. Example: (A|B) means A or B, but not both.
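A small sketch of these multiplicity symbols in an internal DTD follows. The element names and the sample data are assumptions chosen for illustration, not the contents of the course's example files.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE Stars [
    <!-- Star* : any number of Star elements, including none. -->
    <!ELEMENT Stars (Star*)>
    <!-- In order: one Name, one or more Address, an optional Birthdate,
         then either a Movie or a Show, but not both. -->
    <!ELEMENT Star (Name, Address+, Birthdate?, (Movie | Show))>
    <!ELEMENT Name (#PCDATA)>
    <!ELEMENT Address (#PCDATA)>
    <!ELEMENT Birthdate (#PCDATA)>
    <!ELEMENT Movie (#PCDATA)>
    <!ELEMENT Show (#PCDATA)>
]>
<Stars>
    <Star>
        <Name>Mark Hamill</Name>
        <Address>456 Oak Rd., Brentwood</Address>
        <Address>789 Palm Dr., Malibu</Address>
        <Movie>Star Wars</Movie>
    </Star>
</Stars>
```

Swapping Name and Address, or omitting Address entirely, would make the document invalid against this DTD even though it would remain well-formed.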
18
DTD Elements: #PCDATA and EMPTY
• Leaves (text elements) have #PCDATA (Parsed Character DATA) in place of nested tags
– The element has a text value and no nested element within it.
• EMPTY in place of the nested tags means the element has no content: no text and no nested elements.
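As a hedged illustration of the two kinds of leaf declaration, the sketch below declares one text leaf and one empty element. The names Movie, Title, and Sequel are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE Movie [
    <!ELEMENT Movie (Title, Sequel?)>
    <!-- Title is a leaf: it holds text and no nested elements. -->
    <!ELEMENT Title (#PCDATA)>
    <!-- Sequel is declared EMPTY: it may have neither text nor nested elements. -->
    <!ELEMENT Sequel EMPTY>
]>
<Movie>
    <Title>Star Wars</Title>
    <Sequel/>
</Movie>
```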
19
Example: DTD Elements
20
Using a DTD
1. Set standalone = "no".
2. Either:
   a) Internally include the DTD as a preamble of the XML document, or
   b) Follow DOCTYPE and the root tag with SYSTEM and the path to the file containing the DTD.
21
Example: (a) InternalDTD.xml
[Figure: InternalDTD.xml, in which the DTD appears in the DOCTYPE preamble, followed by the XML document itself.]
22
Example: (b) ExternalDTD.xml
• Assume the Stars DTD is in the file default.dtd.
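A sketch of what a document using an external DTD might look like is given below. Only the file name default.dtd comes from the slide; the declarations listed in the comment and the sample data are assumptions about what that file could contain.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- The DTD is not in this file; SYSTEM names the file that holds it.
     default.dtd is assumed to sit next to this document and to contain
     declarations such as:
         <!ELEMENT Stars (Star*)>
         <!ELEMENT Star (Name)>
         <!ELEMENT Name (#PCDATA)>
-->
<!DOCTYPE Stars SYSTEM "default.dtd">
<Stars>
    <Star>
        <Name>Carrie Fisher</Name>
    </Star>
</Stars>
```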
23
Internal vs. External DTD
External DTDs are often preferable because:
– Definitions can be shared between XML documents
– Documents that share the same DTD are more uniform and easier to retrieve
24
Attributes
• Opening tags in XML can have attributes.
• In a DTD, <!ATTLIST E … > declares attributes for element E, along with their data types.
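A minimal sketch of an attribute declaration and a document that uses it follows. The attribute names year and genre and the allowed genre values are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE Movies [
    <!ELEMENT Movies (Movie*)>
    <!ELEMENT Movie (Title)>
    <!ELEMENT Title (#PCDATA)>
    <!-- year is a required text attribute; genre is optional and limited to three values. -->
    <!ATTLIST Movie
        year  CDATA #REQUIRED
        genre (comedy | drama | sciFi) #IMPLIED>
]>
<Movies>
    <Movie year="1977" genre="sciFi">
        <Title>Star Wars</Title>
    </Movie>
    <Movie year="1942">
        <Title>Casablanca</Title>
    </Movie>
</Movies>
```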
25
Attributes
DTD:
XML:
26
Example: ATTLIST in DTD
• MoviesWithAttribute.dtd
• MoviesWithAttribute.xml
27
DTD types: ID and IDREF
DTD:
XML:
28
DTD types: ID and IDREF
DTD:
XML:
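The sketch below shows how ID and IDREFS attributes can connect stars and movies. The attribute names (starId, starredIn, movieId, starsOf) and the ID values are assumptions for illustration; the actual StarMovieData files may differ.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE StarMovieData [
    <!ELEMENT StarMovieData (Star*, Movie*)>
    <!ELEMENT Star (Name)>
    <!ELEMENT Name (#PCDATA)>
    <!ELEMENT Movie (Title)>
    <!ELEMENT Title (#PCDATA)>
    <!-- starId is an ID: its value must be unique in the whole document. -->
    <!-- starredIn is IDREFS: a space-separated list of ID values found in the document. -->
    <!ATTLIST Star
        starId    ID     #REQUIRED
        starredIn IDREFS #IMPLIED>
    <!-- movieId is an ID; starsOf points back at Star elements. -->
    <!ATTLIST Movie
        movieId ID     #REQUIRED
        starsOf IDREFS #IMPLIED>
]>
<StarMovieData>
    <Star starId="cf" starredIn="sw">
        <Name>Carrie Fisher</Name>
    </Star>
    <Star starId="mh" starredIn="sw">
        <Name>Mark Hamill</Name>
    </Star>
    <Movie movieId="sw" starsOf="cf mh">
        <Title>Star Wars</Title>
    </Movie>
</StarMovieData>
```

Note that a DTD only checks that each IDREF value matches some ID in the document; it cannot require that starredIn point at a Movie rather than at another Star.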
29
Example: ID and IDREF
• StarMovieData.dtd
• StarMovieData.xml
30
Structure of an XML-Schema Document
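<?xml version = ... ?>
<xs:schema xmlns:xs = "http://www.w3.org/2001/XMLSchema">
…
</xs:schema>
• xmlns:xs defines “xs” to be the namespace described in the URL shown.
• The xs: prefix means that schema is interpreted as part of the namespace xs.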
31
Elements of XML Schema
• xs:element defines an element; its main attributes are
– name: the tag name of the element being defined.
– type: the type of the element.
• Simple types, e.g., xs:string, xs:integer, and xs:boolean
• Complex types and restricted simple types that are defined in the document itself
• Use the minOccurs and maxOccurs attributes to control the number of occurrences of an xs:element.
32
minOccurs and maxOccurs
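• minOccurs: the element occurs no fewer than minOccurs times
• maxOccurs: the element occurs no more than maxOccurs times
• If there is more than one occurrence, they must all appear consecutively.
• maxOccurs = "unbounded": no upper limit
• The default for both is 1, that is, exactly one occurrence.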
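A minimal sketch of how these occurrence attributes might look in a schema follows. The element names Movie, Title, Year, and Star are hypothetical, and the enclosing xs:complexType and xs:sequence are used here only to give the child elements a place to live.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="Movie">
        <xs:complexType>
            <xs:sequence>
                <!-- No minOccurs/maxOccurs: Title appears exactly once (the default). -->
                <xs:element name="Title" type="xs:string"/>
                <!-- Year is optional: zero or one occurrence. -->
                <xs:element name="Year" type="xs:integer" minOccurs="0"/>
                <!-- One or more Star elements, appearing consecutively. -->
                <xs:element name="Star" type="xs:string" minOccurs="1" maxOccurs="unbounded"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>
```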
33
xs:element
In XML Schema:
XML Elements:
34
User-defined Types
• Complex Types – to define a complex type using existing types
• Restricted Simple Type – to define a simple type by restricting a base type
– enumerations
– range-restricted base types
35
Complex Types
There are several ways to construct a complex type:
• xs:sequence – the child elements must appear in the order given
• xs:all – the child elements can appear in any order, each at most once
• xs:choice – exactly one of the child elements appears
36
Complex Types
[Figure: an xs:complexType definition, with callouts marking the name of the complex type and a typical sub-element of the complex type.]
Note: a complex type needs a name if it is to be used as the type of multiple elements.
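A sketch of a named complex type that is reused by two elements is shown below. The names movieType, Movie, and Remake are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <!-- movieType is a named complex type, so several elements can reuse it. -->
    <xs:complexType name="movieType">
        <xs:sequence>
            <!-- Typical sub-elements of the complex type. -->
            <xs:element name="Title" type="xs:string"/>
            <xs:element name="Year" type="xs:integer"/>
        </xs:sequence>
    </xs:complexType>

    <!-- Two different elements share the same type by name. -->
    <xs:element name="Movie" type="movieType"/>
    <xs:element name="Remake" type="movieType"/>
</xs:schema>
```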
37
Alternative: Complex Types defined in an Element
[Figure: an xs:element for Movies with no type attribute; the xs:complexType nested inside it is the type of element Movies.]
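The alternative can be sketched as follows: neither Movies nor Movie has a type attribute, and each gets its type from an unnamed xs:complexType nested directly inside it. The sub-element names Title and Year are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <!-- No type attribute: the nested, unnamed complexType is the type of Movies. -->
    <xs:element name="Movies">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="Movie" minOccurs="0" maxOccurs="unbounded">
                    <xs:complexType>
                        <xs:sequence>
                            <xs:element name="Title" type="xs:string"/>
                            <xs:element name="Year" type="xs:integer"/>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>
```

An unnamed type keeps the definition next to its only use, at the cost of not being reusable elsewhere in the schema.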
38
A DTD for Movies
]>
39
Example
• MoviesValidatedBySchema.xml
• MoviesValidatedBySchema.xsd
• MoviesValidatedBySchema.dtd
40
Example: xs:all
• Defines an element named “person” which must contain the “firstname” and the “lastname” elements. They can appear in any order, but each must occur exactly once.
• If specified on a child element, maxOccurs must be 1, but minOccurs can be either 0 or 1.
• With minOccurs="0", the element can appear zero or one time.
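A sketch of the “person” element described above follows. The optional nickname child is a hypothetical addition that shows minOccurs="0" inside xs:all, and xs:string is an assumed type for all three children.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="person">
        <xs:complexType>
            <!-- Children of xs:all may appear in any order. -->
            <xs:all>
                <!-- Default occurrence: each of these must appear exactly once. -->
                <xs:element name="firstname" type="xs:string"/>
                <xs:element name="lastname" type="xs:string"/>
                <!-- minOccurs="0": nickname may be omitted or appear once. -->
                <xs:element name="nickname" type="xs:string" minOccurs="0"/>
            </xs:all>
        </xs:complexType>
    </xs:element>
</xs:schema>
```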
41
Example: xs:choice
• Defines an element named “person” which must contain either an “employee” element or a “member” element, but not both.
• minOccurs and maxOccurs can be defined per element.
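The choice described above might be written as in the sketch below. The type xs:string for both alternatives is an assumption; the slide does not say what types employee and member have.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="person">
        <xs:complexType>
            <!-- Exactly one of the alternatives below appears inside person. -->
            <xs:choice>
                <xs:element name="employee" type="xs:string"/>
                <xs:element name="member" type="xs:string"/>
            </xs:choice>
        </xs:complexType>
    </xs:element>
</xs:schema>
```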
42
Example
• Persons.xsd
• Persons.xml
43
xs:attribute
• xs:attribute elements can be used within a complex type to indicate attributes of elements of that type.
• Attributes of xs:attribute:
– name
– type
– use = “required” or “optional”.
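A minimal sketch of xs:attribute inside a complex type follows. The attribute names year and genre and their types are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="Movie">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="Title" type="xs:string"/>
            </xs:sequence>
            <!-- Attribute declarations follow the content model of the complex type. -->
            <xs:attribute name="year" type="xs:integer" use="required"/>
            <xs:attribute name="genre" type="xs:string" use="optional"/>
        </xs:complexType>
    </xs:element>
    <!-- A conforming element: <Movie year="1977"><Title>Star Wars</Title></Movie> -->
</xs:schema>
```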
44
With xs:attribute
45
With sub-elements
46
Example
• MoviesWithAttribute.xsd
• MoviesWithAttribute.dtd
• MoviesWithAttribute.xml
47
Restricted Simple Type
• A restricted simple type can be the type of elements or attributes.
• xs:simpleType can describe enumerations and range-restricted base types.
• name is an attribute of xs:simpleType.
• xs:restriction is a subelement of xs:simpleType.
48
• The attribute base of xs:restriction gives the simple type to be restricted, e.g., xs:integer.
• Subelements of xs:restriction
– xs:{min, max}{Inclusive, Exclusive} are four subelements, each with an attribute value, that give a lower or upper bound on a numerical range.
or
– xs:enumeration is a subelement with an attribute value; a list of them defines an enumerated type.
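Both kinds of restriction can be sketched in one schema. The type names movieYearType and genreType, the bounds, and the enumerated values are all assumptions chosen for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <!-- Range restriction: integers from 1915 up to, but not including, 2100. -->
    <xs:simpleType name="movieYearType">
        <xs:restriction base="xs:integer">
            <xs:minInclusive value="1915"/>
            <xs:maxExclusive value="2100"/>
        </xs:restriction>
    </xs:simpleType>

    <!-- Enumeration: only the three listed strings are allowed. -->
    <xs:simpleType name="genreType">
        <xs:restriction base="xs:string">
            <xs:enumeration value="comedy"/>
            <xs:enumeration value="drama"/>
            <xs:enumeration value="sciFi"/>
        </xs:restriction>
    </xs:simpleType>

    <!-- The restricted types are then used like any built-in simple type. -->
    <xs:element name="Year" type="movieYearType"/>
    <xs:element name="Genre" type="genreType"/>
</xs:schema>
```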
49
Example (a)
50
Example (b)
51
Example
• MoviesWithSimpleType.xml
• MoviesWithSimpleType.xsd
52
Keys in XML Schema
• An xs:element can have an xs:key subelement.
…
. . .
• The key element MUST contain the following (in order):
– one and only one selector element
– one or more field elements that together form the key. Each field can be a subelement or an attribute of the last element on the selector path.
53
Keys in XML Schema
• Selector: XPath expression that selects the elements to which the key applies
• Field: XPath expression, relative to each selected element, naming an attribute or subelement whose value (or combination of values) must be unique among the selected elements
54
Example
Note: The key name “movieKey” is needed when a foreign key (xs:keyref) refers to this key.
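A sketch of a key named movieKey is given below. Only the key name comes from the slide; the element structure and the choice of Title plus Year as the key fields are assumptions and may differ from MoviesWithKey.xsd.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="Movies">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="Movie" minOccurs="0" maxOccurs="unbounded">
                    <xs:complexType>
                        <xs:sequence>
                            <xs:element name="Title" type="xs:string"/>
                            <xs:element name="Year" type="xs:integer"/>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
        </xs:complexType>
        <!-- The key is declared inside Movies, after its type definition. -->
        <xs:key name="movieKey">
            <!-- Selector: the Movie children of Movies are the keyed elements. -->
            <xs:selector xpath="Movie"/>
            <!-- Fields: Title and Year together must be present and unique. -->
            <xs:field xpath="Title"/>
            <xs:field xpath="Year"/>
        </xs:key>
    </xs:element>
</xs:schema>
```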
55
Example
• MoviesWithKey.xsd
• MoviesWithKey.xml
[Figure: element tree with Movies, MovieSeries, and Movie nodes.]
56
xs:key vs xs:unique
• xs:key
The field must exist.
• xs:unique
The field might not exist; the only constraint is that the values are unique where they do exist.
57
            Relational Model              XML
Structure   Tables                        Hierarchical tree
Schema      Fixed in advance, required    Flexible, “self-describing”, optional
Queries     SQL                           XPath, XQuery, XSLT
Ordering    None                          Implied ordering
58