www.cardiff.ac.uk/medic/irg-clinicalepidemiology
(eXtensible Markup Language)
Information modelling
& database systems
markup languages
XML & its basic concepts
structuring data with XML
learning outcomes
describe the XML data model & outline its basic features
understand the advantages of the XML approach
to data management
Text on Web 2.0
XML: design goals
separate syntax from semantics to provide a common framework for structuring information
represent semi–structured data (data that are structured, but do not fit the relational model)
offer more flexibility than databases, while still providing some
of the database functionality
allow tailor–made markup for any imaginable application domain
support internationalisation (Unicode) and
platform independence
Unicode is a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world’s writing systems.
What is XML?
XML = eXtensible Markup Language
a World Wide Web Consortium (W3C) standard
XML 1.0 was published as a W3C Recommendation in February 1998
W3C: the main international standards organisation
for the World Wide Web
XML = “SGML for the Web”
SGML (Standard Generalized Markup Language):
an ISO-standard technology for defining generalised markup languages for text documents
Markup language
markup language: a system for annotating text in a way
that is syntactically distinguishable from the text itself
three types of electronic markup:
presentational: achieve a visual effect, e.g. HTML markup that displays the text “red and bold” in red, bold type
procedural: how to process the text, e.g. in LaTeX
\sum_{i=1}^{\infty}\frac{1}{i}
descriptive: provide additional information, e.g. in XML
XML is a markup language that is used to store
data in a self-descriptive manner
making the data “self-descriptive” is achieved by tagging (annotating or marking up) information
unlike delimited files or database tables, XML documents are structured by tags
tags look like this: <TAG>
tags indicate the beginning & ending of the tagged data – text–based & position–independent
e.g. in a delimited file the data simply need to be separated by a consistent character (a comma [,], but any symbol such as a TAB or ~ could be specified and used), so each value is identified only by its position
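The contrast between position-dependent delimited data and tagged data can be sketched in a few lines of Python. This is only an illustration: the record and the tag names (patient, surname, forename, age) are made up for the example and do not come from the slides.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical record, stored once as a comma-delimited line and once as XML.
delimited = "Smith,John,42\n"
tagged = "<patient><surname>Smith</surname><forename>John</forename><age>42</age></patient>"
# The same XML record with its child elements in a different order.
reordered = "<patient><age>42</age><surname>Smith</surname><forename>John</forename></patient>"

# Delimited file: the meaning of a value depends on its position in the row.
row = next(csv.reader(io.StringIO(delimited)))
age_from_csv = row[2]

# XML: a value is found by the name of its tag, wherever the tag appears.
age_from_xml = ET.fromstring(tagged).findtext("age")
age_from_reordered = ET.fromstring(reordered).findtext("age")

print(age_from_csv, age_from_xml, age_from_reordered)
```

Reordering the columns of the delimited file would silently change what row[2] means, whereas the lookup by tag name is unaffected by the order of the elements.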
the basic structure of XML files:
there are tags: …
tags surround data, or other tags:
NOTE: tags can only be nested within other tags,
i.e. they cannot partially overlap!
therefore, XML documents have
hierarchical or tree-like structure
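A minimal sketch of the nesting rule, using Python's standard xml.etree.ElementTree parser. The library/book document is a hypothetical example; the outline helper is introduced here only to make the tree visible.

```python
import xml.etree.ElementTree as ET

def outline(element, depth=0):
    """Return one indented line per element, showing the tree structure."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

# Properly nested tags form a tree: library > book > (title, year).
nested = "<library><book><title>XML basics</title><year>2010</year></book></library>"
tree_lines = outline(ET.fromstring(nested))
print("\n".join(tree_lines))

# Partially overlapping tags: <b> opens inside <a> but closes outside it.
overlapping = "<a><b>text</a></b>"
try:
    ET.fromstring(overlapping)
    overlap_accepted = True
except ET.ParseError:
    overlap_accepted = False
print("overlapping tags accepted:", overlap_accepted)
```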
XML tags – example
XML element
an XML element, e.g. <TAG>some data here</TAG>
the delimiters: “<” and “>” (special characters in XML)
the generic identifier/name: the “TAG” enclosed in the two delimiters
the opening & closing tags: “<TAG>” and “</TAG>”
(note the forward slash in the closing tag)
the content: “some data here”
XML attribute
XML attribute: specifies additional information about an XML element
an attribute for an element appears within the opening tag:
attributes are means of specialising generic elements
e.g.
attribute vs. element:
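The choice between an attribute and a child element can be illustrated with a short sketch. The person record and the names gender and name are assumptions made for this example; the slides' own example is not recoverable from the text.

```python
import xml.etree.ElementTree as ET

# The same hypothetical fact encoded in two ways.
as_attribute = '<person gender="female"><name>Anna</name></person>'
as_element = "<person><gender>female</gender><name>Anna</name></person>"

person_a = ET.fromstring(as_attribute)
person_b = ET.fromstring(as_element)

# An attribute is a name/value pair inside the opening tag.
gender_from_attribute = person_a.get("gender")
# A child element carries the same information as nested content.
gender_from_element = person_b.findtext("gender")

print(gender_from_attribute, gender_from_element)
```

Both encodings are well-formed; which one to prefer is a design decision, since an attribute holds a single plain value while a child element can itself contain further structure.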
XML special characters
some characters have a special meaning in XML
e.g. “<” is always interpreted as the start of a new element
this will generate an XML error:
replace a special character with an entity reference:
5 predefined entity references in XML:
&lt;     <    less than
&gt;     >    greater than
&amp;    &    ampersand
&apos;   '    apostrophe
&quot;   "    quotation mark
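As a sketch of the error and its fix, the snippet below feeds unescaped and escaped content to Python's standard parser. The message element and the sample text are hypothetical; escape comes from the standard xml.sax.saxutils module and replaces &, < and > by default.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "salary < 1000 & age > 30"

# Unescaped "<" and "&" inside the content make the document ill-formed.
try:
    ET.fromstring(f"<message>{raw}</message>")
    raw_parsed = True
except ET.ParseError:
    raw_parsed = False

# escape() replaces &, < and > with the entity references &amp;, &lt; and &gt;.
safe = escape(raw)
message = ET.fromstring(f"<message>{safe}</message>")

print(raw_parsed)
print(safe)
print(message.text)  # the parser turns the entity references back into characters
```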
XML structure – summary
an XML document is an ordered, labelled tree
each node, i.e. XML element:
must have a name
may have attributes, each consisting of
a name & a value
may have content, which may include child nodes
the XML code must be syntactically correct or the XML parser will report an error
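The parser's behaviour on syntactically incorrect input can be tried out directly. The sketch below uses Python's standard xml.etree.ElementTree; the helper name is_well_formed and the sample documents are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text, False if it reports an error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

checks = {
    "correct document": is_well_formed("<note><to>Ann</to></note>"),
    "unescaped ampersand": is_well_formed("<note>Fish & chips</note>"),
    "tag names differ in case": is_well_formed("<Note>hello</note>"),
    "missing closing tag": is_well_formed("<note><to>Ann</note>"),
    "two root elements": is_well_formed("<a/><b/>"),
}

for label, accepted in checks.items():
    print(f"{label}: {'accepted' if accepted else 'rejected'}")
```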
Well–formed XML documents
an XML document is a text which is well–formed if
it conforms to the XML syntax rules:
it contains only properly encoded legal Unicode characters
none of the special syntax characters (<, >, ", ', &)
appear except when performing their markup roles
XML elements are correctly nested, with none missing
& none overlapping
opening & closing tags match exactly (XML tags are case–sensitive)
there is a single “root” element, which contains all
other elements
What is XML?
http://bioinformatics.oxfordjournals.org/content/early/2010/02/21/bioinformatics.btq069.abstract
XML vs. HTML
HTML vs. XHTML
XML is a markup language where documents must be marked up correctly & well–formed
XML Schema
DTD vs. XML Schema
DTDs have been around for over twenty years as a part of SGML
XML data exchange
XML advantages
XML trade offs
XML is a W3C standard meta-language for defining markup languages
XML uses documents as a data exchange mechanism
not a language but a meta-language, i.e. a framework for defining markup languages (or dialects)
no fixed collection of markup tags, so XML is flexible
each XML language tuned for a specific application,
e.g. MathML is an application of XML for describing mathematical notations
all XML languages share common features, which enables the building of generic tools for processing XML data
all XML languages can be processed by a single lightweight parser
XML is intended for machine processing, but it is still a
human–readable format, mostly because the data are structured in tags that use everyday language
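The point that one generic parser handles every XML language can be sketched as follows. The MathML fragment uses the standard MathML namespace; the address record and its tag names are hypothetical, and element_names is a helper defined only for this example.

```python
import xml.etree.ElementTree as ET

# Two documents in different XML languages: a MathML fragment and a
# made-up address record (the tag names of the latter are hypothetical).
mathml = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    "<mi>x</mi><mo>+</mo><mn>1</mn></math>"
)
address = "<address><street>1 High Street</street><city>Cardiff</city></address>"

def element_names(text):
    """Parse any well-formed XML text and list its element names in document order."""
    # ElementTree stores namespaced names as "{namespace}name"; keep the local part.
    return [element.tag.split("}")[-1] for element in ET.fromstring(text).iter()]

mathml_names = element_names(mathml)
address_names = element_names(address)
print(mathml_names)
print(address_names)
```

The same function processes both documents because it relies only on the shared XML syntax, not on the vocabulary of either language.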
XML: defines logical structure only
HTML: the same intention, but has evolved into a presentation language; a markup language for a
specific purpose – display in browsers
unlike HTML, XML by itself conveys only content & structure, not presentation, behaviour or meaning
these can still be associated with XML, but this requires additional mechanisms such as stylesheets, scripts, namespaces, etc.
XHTML (eXtensible HyperText Markup Language): a family of XML markup languages that mirror or extend versions of the widely used HTML
XHTML is a stricter & cleaner version of HTML
XHTML is HTML re–designed as an XML language
Why XHTML?
many web pages on the Internet contain “bad” HTML:
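What counts as “bad” HTML can be illustrated by handing markup to a strict XML parser. The two fragments below are hypothetical examples, not the ones from the original slide: the first relies on habits that browsers tolerate (an unclosed br, an unquoted attribute value, tag names that differ in case), the second follows XHTML-style rules.

```python
import xml.etree.ElementTree as ET

def parses_as_xml(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Tolerated by browsers, but not well-formed XML.
bad_html = "<BODY><p align=center>First line<br>Second line</p></body>"
# The same content with quoted attributes, closed elements and consistent case.
xhtml_style = '<body><p align="center">First line<br/>Second line</p></body>'

bad_accepted = parses_as_xml(bad_html)
xhtml_accepted = parses_as_xml(xhtml_style)
print(bad_accepted, xhtml_accepted)
```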
How are different XML languages or dialects specified?
XML schema = syntax definition (i.e. grammar) of an XML language – describes the structure of an XML document
formal languages for expressing XML schemas:
Document Type Definition (DTD)
XML Schema
they use very different syntax to achieve the same
task of creating documentation:
what elements an XML document can contain
how they should be used
what interactions may take place between parts
of a document
NOTE: neither DTD nor XML Schema is strictly required for XML development!
both DTDs & XML Schemas are important parts of the XML toolbox
they make XML descriptions readable to
automated processors such as parsers,
editors & other XML–based tools
a well–formed XML document is valid if it conforms to the associated schema specified in DTD or XML Schema
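The difference between well-formed and valid can be sketched with a toy stand-in for a schema. Note the assumptions: ALLOWED_CHILDREN is a hypothetical dictionary invented for this example, not a DTD or an XML Schema, and is_valid is a hand-written check rather than a real validator (Python's standard library does not validate against DTDs or XML Schemas).

```python
import xml.etree.ElementTree as ET

# Hypothetical toy "schema": for each element name, the child names it may contain.
# This only imitates what a DTD or XML Schema would express formally.
ALLOWED_CHILDREN = {
    "note": {"to", "from", "body"},
    "to": set(),
    "from": set(),
    "body": set(),
}

def is_valid(text):
    """Return True if text is well-formed and every element obeys ALLOWED_CHILDREN."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False  # not even well-formed, so it cannot be valid
    for element in root.iter():
        if element.tag not in ALLOWED_CHILDREN:
            return False  # element name unknown to the toy schema
        for child in element:
            if child.tag not in ALLOWED_CHILDREN[element.tag]:
                return False  # child not allowed inside this element
    return True

valid_doc = "<note><to>Ann</to><from>Bob</from><body>Hi</body></note>"
well_formed_only = "<note><to>Ann</to><price>5</price></note>"

print(is_valid(valid_doc), is_valid(well_formed_only))
```

The second document parses without error, so it is well-formed, but it uses an element the toy schema does not allow, so it is not valid with respect to that schema.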
XML Schema is more recent (a W3C Recommendation since May 2001)
XML Schema is itself an XML language!
XML standardises the concrete syntax of data exchange in a text–based notation designed to be obvious to both people & machines
XML uses documents as the transfer mechanism for data
the XML publishing model decouples data from processing, which isolates changes in large systems, making them more flexible & reliable
XML is suitable for transactional processing in a heterogeneous, asynchronous, distributed environment such as the Web
data representation is text–based & position–independent
open & extensible
platform & language independent, hence portable
interoperability of content, style & behaviour
human & machine readable
self–descriptive data
no dependence on large software vendors
no binding to specific tools
XML is not a slim format: using tags makes data bigger & more complex than a flat file
performance: relational databases are still much faster
no centralised control of data: potential problems
with data integrity
uniformity: too many different formats
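The size trade-off can be made concrete with a rough sketch that stores the same records as a flat comma-delimited text and as XML, then compares the byte counts. The records and the field names are made up for the illustration, and the exact sizes depend on the tag names chosen.

```python
import xml.etree.ElementTree as ET

# Hypothetical records and field names, used only for this comparison.
rows = [("Smith", "John", "42"), ("Jones", "Mary", "37")]
fields = ("surname", "forename", "age")

# Flat file: values separated by commas, one record per line.
flat = "\n".join(",".join(row) for row in rows)

# XML: every value is wrapped in an opening and a closing tag.
root = ET.Element("patients")
for row in rows:
    patient = ET.SubElement(root, "patient")
    for name, value in zip(fields, row):
        ET.SubElement(patient, name).text = value
tagged = ET.tostring(root, encoding="unicode")

flat_size = len(flat.encode("utf-8"))
xml_size = len(tagged.encode("utf-8"))
print(flat_size, xml_size)
```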
markup language: a system for annotating text
XML is used to store data in a self-descriptive manner using XML tags
XML documents are structured using XML elements, which must have a name, may have attributes, and may have content, which may include other XML elements
an XML document is an ordered, labelled tree of XML elements
an XML document is well-formed if it is syntactically correct according to the W3C specification
different markup languages are specified using XML schemas
an XML schema can be expressed in a formal language such as DTD or XML Schema
a well-formed XML document is valid if it conforms to the associated XML schema