learning outcomes
describe the XML data model & outline its basic
features
understand the advantages of the XML approach to data management
W3C: the main international standards organisation for the World Wide Web
XML = “SGML for the Web”
SGML (Standard Generalized Markup Language):
an ISO-standard technology for defining generalised markup languages for text documents
3/12/2016
XML
(eXtensible Markup Language)
CMT207 Information modelling & database systems
XML: design goals
separate syntax from semantics to provide a common framework for structuring information
represent semi–structured data (data that are structured, but do not fit relational model)
offer more flexibility than databases, but still do some of the database functionality
allow tailor–made markup for any imaginable application domain
support internationalisation (Unicode) and platform independence
Lecture content
markup languages
XML & its basic concepts
structuring data with XML
What is XML?
XML = eXtensible Markup Language
XML 1.0 became a W3C Recommendation in 1998 (first working draft published in 1996)
a World Wide Web Consortium (W3C) standard
Text on Web 2.0
Markup language
markup language: a system for annotating text in a way that is syntactically distinguishable from the text itself
three types of electronic markup:
1. presentational: achieve a visual effect, e.g. in HTML <font color="red"><b>red and bold</b></font> is displayed as red and bold text
2. procedural: how to process the text, e.g. in LaTeX \sum_{i=1}^{\infty}\frac{1}{i}
3. descriptive: provide additional information, e.g. in XML
XML tags
XML is a markup language that is used to store data in a self-descriptive manner
making the data “self-descriptive” is achieved by tagging (annotating or marking up) information
unlike delimited files or database tables, XML documents are structured by tags
tags look like this: <TAG> (open) and </TAG> (close)
tags indicate the beginning & ending of the tagged data – text–based & position–independent
XML element
an XML element, e.g. <TAG>some data here</TAG>, consists of:
1. the delimiters: "<" and ">" (special characters in XML)
2. the generic identifier/name: the "TAG" enclosed in the two delimiters
3. the opening & closing tags: "<TAG>" and "</TAG>"
4. the content: "some data here"
XML tags
the basic structure of XML files:
there are tags: …
tags surround data, or other tags:
NOTE: tags can only be nested within other tags, i.e. they cannot overlap partially!
hierarchical or tree-like structure
XML tags – example
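the five predefined entity references:
1. &lt; stands for < (less than)
2. &gt; stands for > (greater than)
3. &amp; stands for & (ampersand)
4. &apos; stands for ' (apostrophe)
5. &quot; stands for " (quotation mark)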
the same!
XML attribute
XML attribute: specifies additional information about an XML element
an attribute for an element appears within the opening tag:
attributes are means of specialising generic elements
e.g.
attribute vs. element:
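The choice between an attribute and a child element can be sketched with Python's standard `xml.etree.ElementTree` module. The names `person`, `name` and `gender` and the sample values are made up for illustration; both documents carry the same information, once as an attribute and once as a child element.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the same fact stored in two different ways.
as_attribute = '<person gender="female"><name>Anna</name></person>'
as_element = '<person><gender>female</gender><name>Anna</name></person>'

person_a = ET.fromstring(as_attribute)
person_b = ET.fromstring(as_element)

# An attribute is read from the opening tag of the element ...
gender_a = person_a.get("gender")
# ... whereas a child element is read from the element's content.
gender_b = person_b.findtext("gender")

print(gender_a, gender_b)
```

Either form can be queried, so the decision is mostly one of modelling: attributes suit short metadata about an element, child elements suit data that may later need structure of its own.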
XML special characters
some characters have a special meaning in XML
e.g. "<" is always interpreted as the start of a new element
this will generate an XML error:
replace a special character with an entity reference:
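As a minimal sketch of this rule, the standard-library parser can be used to show both the error and the fix. The element name `message` and its text are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = "if salary < 1000 then"   # made-up content containing "<"

broken = f"<message>{raw_text}</message>"          # "<" left as it is
fixed = f"<message>{escape(raw_text)}</message>"   # "<" becomes "&lt;"

try:
    ET.fromstring(broken)
    broken_is_accepted = True
except ET.ParseError:
    # The parser treats "<" as the start of a new element and gives up.
    broken_is_accepted = False

# The parser turns the entity reference back into the original character.
recovered_text = ET.fromstring(fixed).text

print(broken_is_accepted, fixed, recovered_text)
```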
2. none of the special syntax characters (<, >, ", ', &) appear except when performing their markup roles
3. XML elements are correctly nested, with none missing & none overlapping
4. the XML tags are case–sensitive
5. there is a single “root” element, which contains all other elements
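The nesting, case-sensitivity and single-root rules can be checked with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(document: str) -> bool:
    """Return True if the standard-library parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

samples = {
    "correctly nested": "<a><b>text</b></a>",
    "overlapping tags": "<a><b>text</a></b>",      # breaks rule 3
    "case mismatch": "<a><B>text</b></a>",         # breaks rule 4
    "two root elements": "<a>one</a><a>two</a>",   # breaks rule 5
}
results = {label: is_well_formed(doc) for label, doc in samples.items()}

for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'NOT well-formed'}")
```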
many web pages on the Internet contain “bad” HTML:
XML is a markup language where documents must be marked up correctly & be well-formed
XML structure – summary
an XML document is an ordered, labelled tree
each node, i.e. XML element:
must have a name
may have attributes, each consisting of a name & a value
may have content, which may include child nodes
the XML code must be syntactically correct or the XML parser will report an error
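The tree view can be made concrete with a short traversal. This is only a sketch: the `bookstore` document is a made-up stand-in and `describe` is a hypothetical helper, not part of any library.

```python
import xml.etree.ElementTree as ET

document = """
<bookstore>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <price>29.99</price>
  </book>
</bookstore>
"""

def describe(element, depth=0):
    """Return one line per element: its name, attributes and direct text."""
    text = (element.text or "").strip()
    lines = [f"{'  ' * depth}{element.tag} {element.attrib} {text!r}"]
    for child in element:              # children are kept in document order
        lines.extend(describe(child, depth + 1))
    return lines

outline = describe(ET.fromstring(document))
print("\n".join(outline))
```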
XML vs. HTML
XML: defines logical structure only
HTML: the same intention, but has evolved into a presentation language; a markup language for a specific purpose – display in browsers
unlike HTML, XML by itself conveys only content & structure, not presentation, behaviour or meaning
these can still be associated with XML, but this requires additional mechanisms such as stylesheets, scripts, namespaces, etc.
XHTML (eXtensible HyperText Markup Language): a family of XML markup languages that mirror or extend versions of the widely used HTML
Well–formed XML documents
an XML document is a text which is well–formed if it conforms to the XML syntax rules:
1. it contains only properly encoded legal Unicode characters
HTML vs. XHTML
XHTML is a stricter & cleaner version of HTML
XHTML is HTML re-designed as an XML language
Why XHTML?
What is XML?
not a language but a meta-language, i.e. a framework for defining markup languages (or dialects)
no fixed collection of markup tags → XML is flexible
each XML language tuned for a specific application, e.g. MathML is an application of XML for describing
mathematical notations
all XML languages share common features → enables building of generic tools for processing XML data
all XML languages can be processed by a single lightweight parser
XML is intended for machine processing, but it is still a human readable format mostly because the data are structured in tags that use common language, e.g.
XML Schema
they make XML descriptions readable to automated processors such as parsers, editors & other XML–based tools
a well–formed XML document is valid if it conforms to the associated schema specified in DTD or XML Schema
interoperability of content, style & behaviour
human & machine readable
self–descriptive data
no dependence on large software vendors
no binding to specific tools
XML schema
How are different XML languages or dialects specified?
XML schema = syntax definition (i.e. grammar) of an XML language; it describes the structure of an XML document
formal languages for expressing XML schemas:
Document Type Definition (DTD)
XML Schema
they use very different syntax to achieve the same task of creating documentation:
what elements an XML document can contain
how they should be used
what interactions may take place between parts of a document
XML data exchange
XML standardises the concrete syntax of data exchange in a text–based notation designed to be obvious to both people & machines
XML uses documents as the transfer mechanism for data
XML publishing model decouples data from processing, which isolates changes in large systems, making them more flexible & reliable
XML is suitable for transactional processing in a heterogeneous, asynchronous, distributed environment such as the Web
XML schema
NOTE: neither DTD nor XML Schema is strictly required for XML development!
both DTDs & XML Schemas are important parts of the XML toolbox
XML advantages
data representation is text-based & position-independent
open & extensible
platform & language independent → portable
DTD vs. XML Schema
DTD or XML Schema = "grammar"
XML document = "sentence"
XML trade offs
XML is not a slim format: using tags makes data bigger & more complex than a flat file
performance: relational databases are still much faster
no centralised control of data: potential problems with data integrity
uniformity: too many different formats
Summary
XML is a W3C standard meta-language for defining markup languages
markup language: a system for annotating text
XML is used to store data in a self-descriptive manner
using XML tags
XML documents are structured using XML elements, which must have a name, may have attributes, and may have content, which may include other XML elements
an XML document is an ordered, labelled tree of XML elements
Summary
XML uses documents as a data exchange mechanism
an XML document is well-formed if it is syntactically
correct according to the W3C specification
different markup languages are specified using XML
schemas
an XML schema can be expressed in a formal language such as DTD or XML Schema
a well-formed XML document is valid if it conforms to the associated XML schema
Querying XML
CMT207 Information modelling & database systems
Lecture
in this module we learnt about
structuring data using a relational data model
querying the data stored in relational databases
in the previous lecture we learnt about using XML to structure data using tags
in this lecture we will learn how to query such data
we will cover two languages:
XPath: a language for navigating through an XML document
XQuery: a language for querying XML data
XPath
XPath is used to navigate through elements and attributes in an XML document
XPath uses path expressions to select nodes in an XML document
they look very much like the expressions used when working with a traditional computer file system
XPath also includes over 100 built-in functions
string values, numeric values, date and time comparison, node and QName manipulation, sequence manipulation, Boolean values, etc.
Nodes
in XPath, there are seven kinds of nodes:
element, attribute, text, namespace, processing instruction, comment, document node
XML documents are treated as trees of nodes
the topmost element of the tree is called the root element
[figure: an XML document drawn as a tree of nodes, with the root, element and attribute nodes labelled]
Relationships between nodes
parent, child, sibling, ancestor, descendant
Atomic values
atomic values are nodes with no children or parent, e.g.
J K. Rowling
"en"
Path expressions
Expression   Description
nodename     select all nodes with the name nodename
/            select from the root node
//           select all nodes descending from the current node that match the selection criteria
.            select the current node
..           select the parent of the current node
@            select attribute
Examples
Path expression    Comment
bookstore          select all nodes with the name bookstore
/bookstore         select the root element bookstore
bookstore/book     select all book elements that are children of bookstore
//book             select all book elements no matter where they are
bookstore//book    select all book elements that are descendants of the bookstore element
//@lang            select all attributes that are named lang
XPath syntax
a node is selected by following a path
we will use the following example to illustrate the use
of paths:
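A small document in the same spirit can be queried from Python. Everything in this sketch is an assumption: the `BOOKS_XML` content is a made-up stand-in built from names used in these slides (bookstore, book, title, lang, price), and `xml.etree.ElementTree` implements only a subset of XPath. Its paths are evaluated relative to an element, so `/bookstore/book` is written `book` relative to the root and `//title` is written `.//title`; attribute selection such as `//@lang` has to be emulated.

```python
import xml.etree.ElementTree as ET

BOOKS_XML = """
<bookstore>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>
"""
bookstore = ET.fromstring(BOOKS_XML)   # the root element

# /bookstore/book : book elements that are children of bookstore
child_books = bookstore.findall("book")
# //title : title elements anywhere below the current node
all_titles = [t.text for t in bookstore.findall(".//title")]
# //@lang : emulated by reading the attribute from every element that has it
langs = [e.get("lang") for e in bookstore.iter() if "lang" in e.attrib]
# .. : the parent of each title element
title_parents = [p.tag for p in bookstore.findall(".//title/..")]

print(len(child_books), all_titles, langs, title_parents)
```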
Predicates
predicates are used to find:
a specific node, or
a node that contains a specific value
predicates are embedded in square brackets
e.g. /bookstore/book[1] selects the first book element that is a child of the bookstore element
Examples
Path expression                       Comment
/bookstore/book[last()]               select the last book element that is a child of the bookstore element
/bookstore/book[position()<3]         select the first two book elements that are children of the bookstore element
//title[@lang]                        select all title elements that have an attribute named lang
//title[@lang='en']                   select all title elements that have a lang attribute with a value of "en"
/bookstore/book[price>35.00]/title    select all title elements of the book elements of the bookstore element that have a price element with a value >35.00
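The predicates above can be tried out with Python's `xml.etree.ElementTree`, which supports positional and attribute predicates but not numeric comparisons. The sketch below is therefore an approximation: the sample document and the helper `titles` are hypothetical, and the price filter is done in Python instead of inside the path.

```python
import xml.etree.ElementTree as ET

bookstore = ET.fromstring("""
<bookstore>
  <book><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book><title lang="en">Learning XML</title><price>39.95</price></book>
  <book><title>Untitled draft</title><price>12.00</price></book>
</bookstore>
""")

def titles(path):
    """Evaluate a path relative to bookstore and return the matching title texts."""
    return [e.findtext("title") if e.tag == "book" else e.text
            for e in bookstore.findall(path)]

first = titles("book[1]")                  # /bookstore/book[1]
last = titles("book[last()]")              # /bookstore/book[last()]
with_lang = titles(".//title[@lang]")      # //title[@lang]
english = titles(".//title[@lang='en']")   # //title[@lang='en']

# /bookstore/book[price>35.00]/title : the comparison is done in Python,
# because ElementTree predicates cannot compare numbers.
expensive = [b.findtext("title") for b in bookstore.findall("book")
             if float(b.findtext("price")) > 35.00]

print(first, last, with_lang, english, expensive)
```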
Unknown nodes
XPath wildcards can be used to select unknown XML nodes
Wildcard   Description
*          match any element node
@*         match any attribute node
node()     match any node
e.g.
Path expression   Comment
/bookstore/*      select all elements that are children of the bookstore element
//*               select all elements in the document
//title[@*]       select all title elements that have at least one attribute
Multiple paths
the operator | can be used within an XPath expression to select multiple paths, e.g.
Path expression                    Comment
//book/title | //book/price        select all title AND price elements of all book elements
//title | //price                  select all title AND price elements in the document
/bookstore/book/title | //price    select all title elements of the book elements of the bookstore element AND all the price elements in the document
XPath axis
an axis defines a node-set relative to the current node
Axis name            Description
self                 the current node
attribute            all attributes of the current node
namespace            all namespace nodes of the current node
parent               the parent of the current node
child                all children of the current node
ancestor             all ancestors of the current node
ancestor-or-self     as above + the current node itself
descendant           all descendants of the current node
descendant-or-self   as above + the current node itself
following            everything in the document after the closing tag of the current node
following-sibling    all siblings after the current node
preceding            all nodes that appear before the current node in the document, except ancestors, attribute nodes and namespace nodes
preceding-sibling    all siblings before the current node
Location path
a location path can be absolute or relative
an absolute location path starts with a slash ( / )
a relative location path does not start with a slash
a location path consists of one or more steps, each separated by a slash, e.g.
/step/step/…   absolute location path
step/step/…    relative location path
each step is evaluated against the nodes in the current node-set
a step in a location path consists of:
an axis
a node-test … identifies a node-set within an axis
≥0 predicates … to further refine the selected node-set
the syntax for a location step is:
axisname::nodetest[predicate]
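What an axis selects can be illustrated without an XPath engine. The sketch below emulates three axes by hand on top of `xml.etree.ElementTree`, which has no axis syntax of its own; the function names and the sample document are assumptions made for this example and cover element nodes only.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<bookstore><book><title>A</title><author>B</author>"
    "<price>1</price></book></bookstore>"
)
# ElementTree elements do not store their parent, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestors(node):
    """ancestor:: axis, nearest ancestor first."""
    found = []
    while node in parent_of:
        node = parent_of[node]
        found.append(node.tag)
    return found

def following_siblings(node):
    """following-sibling:: axis, in document order."""
    siblings = list(parent_of[node])
    return [s.tag for s in siblings[siblings.index(node) + 1:]]

def descendants(node):
    """descendant:: axis, in document order."""
    return [e.tag for e in node.iter() if e is not node]

author = root.find("book/author")   # the current node
print(ancestors(author), following_siblings(author), descendants(root))
```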
Examples
Location path           Comment
child::book             all book nodes that are children of the current node
attribute::lang         the lang attribute of the current node
attribute::*            all attributes of the current node
child::node()           all children of the current node
child::*                all elements that are children of the current node
child::text()           all text node children of the current node
descendant::book        all book nodes that are descendants of the current node
child::*/child::price   all price grandchildren of the current node
XPath operators
an XPath expression can return: node-set, string, Boolean, number
these returned values may be combined using the XPath operators
Operator   Description                Example
|          union of two node-sets     //book | //cd
+          addition                   6+4
-          subtraction                6-4
*          multiplication             6*4
div        division                   8 div 4
mod        division remainder         5 mod 2
=          equal                      price=9.80
!=         not equal                  price!=9.80
<          less than                  price<9.80
<=         less than or equal to      price<=9.80
>          greater than               price>9.80
>=         greater than or equal to   price>=9.80
or         logical or                 price=9.80 or price=9.70
and        logical and                price>9.00 and price<9.90
XQuery
a language for finding and extracting elements and attributes from XML documents
XQuery is to XML what SQL is to database tables
designed to query XML data
built on XPath expressions
e.g. select all books with a price greater than £30 from the book collection stored in books.xml:
for $x in doc("books.xml")/bookstore/book
where $x/price>30
order by $x/title
return $x/title
XQuery syntax
case–sensitive
elements, attributes and variables must be valid XML names
a string value can be in single (') or double quotes (")
a variable is defined with a $ followed by a name, e.g. $bookstore
comments are delimited by (: and :), e.g. (: XQuery comment :)
Working example – books.xml
Selecting nodes
XQuery uses:
functions … to extract data from XML documents
path expressions … to navigate through elements in an XML document
predicates … to limit the extracted data from XML documents
e.g. doc("books.xml")/bookstore/book[price<30]
function: doc("books.xml"), path: /bookstore/book, predicate: [price<30]
FLWOR expressions
e.g. path expression:
doc("books.xml")/bookstore/book[price>30]/title
the following FLWOR expression does exactly the same:
for $x in doc("books.xml")/bookstore/book
where $x/price>30
return $x/title
FLWOR expressions
with FLWOR we can sort the result, e.g.
for $x in doc("books.xml")/bookstore/book
where $x/price>30
order by $x/title
return $x/title
FLWOR expressions
a FLWOR expression is to XQuery what the SELECT statement is to SQL
FLWOR stands for For, Let, Where, Order by, Return
return is mandatory, together with at least one for or let clause; where and order by are optional
Clause     Description
for        binds a variable to each item returned by the in expression
let        assigns variables
where      specifies search criteria
order by   specifies the sort order of the result
return     specifies what to return in the result
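Python's standard library has no XQuery engine, so the sketch below is not XQuery. It only mirrors the for / where / order by / return steps of the FLWOR expression with plain Python over an in-memory document; the book data is a made-up stand-in for books.xml.

```python
import xml.etree.ElementTree as ET

books = ET.fromstring("""
<bookstore>
  <book><title>XQuery Kick Start</title><price>49.99</price></book>
  <book><title>Harry Potter</title><price>29.99</price></book>
  <book><title>Learning XML</title><price>39.95</price></book>
</bookstore>
""")

# for $x in .../bookstore/book  where $x/price > 30
selected = [x for x in books.findall("book")
            if float(x.findtext("price")) > 30]
# order by $x/title
ordered = sorted(selected, key=lambda x: x.findtext("title"))
# return $x/title
result = [x.findtext("title") for x in ordered]

print(result)
```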
The for clause
the for clause binds a variable to each item returned by the in expression
multiple for clauses can be used in the same FLWOR expression
the for clause results in iteration
the at keyword can be used to count the iteration, e.g.
for $x at $i in doc("books.xml")/bookstore/book/title
return
The let clause
the let clause allows variable assignments
… to avoid repeating the same expression many times
the let clause does not result in iteration
example:
let $x := (1 to 5)
return
The return clause
the return clause specifies what is to be returned
example:
for $x in doc("books.xml")/bookstore/book
return $x/title
The where clause
the where clause is used to specify one or more criteria for the result
example:
for $x in doc("books.xml")/bookstore/book
where $x/price>30 and $x/price<100
return $x/title
Conditional expressions
if-then-else expressions are allowed in XQuery
parentheses around the if expression are required
else is required, but it can be just else ()
e.g.
for $x in doc("books.xml")/bookstore/book
return if ($x/@category="CHILDREN") then
result:
note that we can add elements and attributes (XML or HTML) to the result
The order clause
the order by clause is used to specify the sort order of the result
e.g. order the result by category and title:
for $x in doc("books.xml")/bookstore/book
order by $x/@category, $x/title
return $x/title
Comparisons
there are two ways of comparing values:
1. general comparison: =, !=, <, <=, >, >=
2. value comparison: eq, ne, lt, le, gt, ge
examples:
$bookstore//book/@q > 10
returns true if any q attributes have a value >10
$bookstore//book/@q gt 10
returns true if there is only one q attribute returned by the expression, and its value is >10
if more than one q is returned, an error occurs
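The difference between the two kinds of comparison can be mimicked in a few lines of Python. This is only an emulation of the semantics described above, not XQuery itself: `general_gt` and `value_gt` are hypothetical helper names, lists stand in for the sequences of q values, and returning `None` for an empty list is a simplification of XQuery's empty-sequence result.

```python
def general_gt(values, threshold):
    """General comparison (>): true if ANY item satisfies the comparison."""
    return any(v > threshold for v in values)

def value_gt(values, threshold):
    """Value comparison (gt): compares a single item; more than one is an error."""
    if len(values) > 1:
        raise TypeError("gt expects at most one value")
    if not values:
        return None            # stands in for the empty sequence
    return values[0] > threshold

q_values = [5, 12]             # made-up q attributes of two books

print(general_gt(q_values, 10))    # one of the values exceeds 10
print(value_gt([12], 10))          # exactly one value, so gt is defined
```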
Functions
XQuery and XPath share the same data model and support the same functions and operators
Function type        Example                          Comment
accessor             fn:base-uri(node)                returns the value of the base-uri property of the specified node
error and trace      fn:trace(value, label)           used to debug queries
numeric              fn:round(num)                    rounds the number argument to the nearest integer
string               fn:concat(string, string, …)     returns the concatenation of the strings
anyURI               fn:resolve-uri(relative, base)   takes a base URI and a relative URI as arguments, and constructs an absolute URI
Boolean              fn:not(arg)                      logical not
duration/date/time   fn:dateTime(date, time)          converts the arguments to a date and a time
QName                fn:QName(uri, name)              takes a namespace URI and a qualified name as arguments, and constructs a QName value
node                 fn:root(node)                    returns the root of the tree to which the specified node belongs
sequence             fn:reverse((item, item, …))      returns the reversed order of the items specified
context              fn:position()                    returns the index position of the node that is currently being processed
User-defined functions
users can also define their own functions in XQuery:
declare function prefix:function_name($parameter as datatype) as returnDatatype
{
  … function code here …
};
use the declare function keyword
the name of the function must be prefixed
the data types are defined in XML Schema
the function body must be surrounded by curly braces
User-defined functions
example:
declare function local:minPrice($p as xs:decimal?, $d as xs:decimal?)
as xs:decimal?
{
  let $disc := ($p * $d) div 100
  return ($p - $disc)
};
function call:
{local:minPrice($book/price, $book/discount)}
JSON
(JavaScript Object Notation)
CMT207 Information modelling & database systems
Lecture
in the previous two lectures we learnt about XML, a markup language used to structure data
we pointed out that the XML format is "fat" and briefly mentioned JSON as a "slim" format that does the same job
in this lecture we will learn more about JSON
JSON
JSON = JavaScript Object Notation
pronounced like the name Jason
JSON is a syntax for storing and exchanging data
text-based
light-weight
human readable
language independent
JSON is an open standard specified in RFC 4627 (since superseded by RFC 8259) https://www.ietf.org/rfc/rfc4627.txt
History
JavaScript is a high-level, dynamic, untyped and interpreted programming language
alongside HTML and CSS, JavaScript is one of the three essential technologies of the Web
programmers need an easy way to transfer data on the Web
JSON format is syntactically identical to the code for creating JavaScript objects
instead of using a parser (like XML does), JavaScript can use standard functions to convert JSON data into native objects, e.g.
var json = JSON.parse(text);
where json is the resulting object and text is the JSON string
JSON string
{
  "name": "David Jones",
  "age": 23,
  "address": {
    "streetAddress": "5 The Parade",
    "city": "Cardiff"
  },
  "phoneNumber": [
    {
      "type": "home",
      "number": "029 1234 5678"
    },
    {
      "type": "mobile",
      "number": "077 8765 4321"
    }
  ]
}
an object starts with { and ends with }, an array starts with [ and ends with ]
each key is followed by a value, e.g. "name" has a string value and "age" has a number value
JSON object in JavaScript
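once the string is parsed, individual values can be read from the object, e.g. the name (David Jones), the street address (5 The Parade), the city (Cardiff), the first phone number (029 1234 5678) and the type of the second phone number (mobile)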
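The same kind of access works outside JavaScript. As a sketch in Python, `json.loads` plays the role that `JSON.parse` plays in JavaScript; the variable names are arbitrary and the data repeats the David Jones example.

```python
import json

text = """
{
  "name": "David Jones",
  "age": 23,
  "address": {"streetAddress": "5 The Parade", "city": "Cardiff"},
  "phoneNumber": [
    {"type": "home", "number": "029 1234 5678"},
    {"type": "mobile", "number": "077 8765 4321"}
  ]
}
"""
person = json.loads(text)   # JSON string -> native dicts, lists, str, int

name = person["name"]
street = person["address"]["streetAddress"]
city = person["address"]["city"]
first_number = person["phoneNumber"][0]["number"]
second_type = person["phoneNumber"][1]["type"]

print(name, street, city, first_number, second_type, sep=" | ")
```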
JSON and JavaScript
JSON is considered as a subset of JavaScript
… but that does not mean that JSON cannot be used with other languages
JSON uses JavaScript syntax, but the JSON format is text only… just like XML
JSON is language independent
it works well with most of the modern programming languages
e.g. PHP, Perl, Python, Ruby, Java and many more
JSON on the Web
serialization is the process of converting an object into a format suitable to be stored in a file or memory buffer and/or transmitted
JSON is often used to serialize and transfer data over a network connection
e.g. between web server and a web application
note: XML serves the same purpose!
Web services and APIs use JSON format to provide public data
e.g. Flickr and Twitter
JSON syntax
JSON syntax is derived from JavaScript object notation syntax:
data is in name/value pairs    "name":"value"
data is separated by commas    ,
curly braces hold objects      { object }
square brackets hold arrays    [ array ]
JSON data
JSON data is written as name/value pairs
a name/value pair consists of:
1. field name (in double quotes)
2. colon
3. value
e.g. "firstName":"John" (name, colon, value)
JSON data
JSON values can be of the following data types:
Type      Description                                             Example
number    double-precision floating-point format in JavaScript    {"marks": 97}
string    double-quoted Unicode with backslash escaping           {"name": "John"}
Boolean   true or false                                           {"name": "John", "marks": 97, "distinction": true}
object    an unordered collection of key:value pairs              {"name": "John", "marks": 97, "distinction": true}
array     an ordered sequence of values                           { "books": [ {"title":"Game"}, {"title":"Set"}, {"title":"Match"} ] }
null      empty
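Serialization and the JSON data types can be seen together in a short round trip. This sketch uses Python's standard `json` module; the `record` content is made up, loosely following the examples in the table above.

```python
import json

record = {
    "name": "John",                      # string
    "marks": 97,                         # number
    "distinction": True,                 # Boolean
    "books": ["Game", "Set", "Match"],   # array
    "supervisor": None,                  # null
}

text = json.dumps(record)       # serialise: Python object -> JSON string
restored = json.loads(text)     # deserialise: JSON string -> Python object

# Python type that each JSON value is mapped to after parsing.
kinds = {key: type(value).__name__ for key, value in restored.items()}

print(text)
print(kinds)
```

The JSON text is what would be stored or sent over the network; both ends only need a JSON library for their own language.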
JSON data
JSON objects are written inside curly braces
just like JavaScript, JSON objects can contain multiple name/value pairs, e.g.
{"firstName":"John", "lastName":"Doe"}
JSON arrays are written inside square brackets
just like JavaScript, a JSON array can contain multiple objects, e.g.
"employees":[
  {"firstName":"John", "lastName":"Doe"},
  {"firstName":"Anna", "lastName":"Smith"},
  {"firstName":"Peter", "lastName":"Jones"}
]
JavaScript
JSON syntax is derived from JavaScript object notation
in JavaScript, an array of objects can be created like this:
var employees = [
  {"firstName":"John", "lastName":"Doe"},
  {"firstName":"Anna", "lastName":"Smith"},
  {"firstName":"Peter", "lastName":"Jones"}
];
an element of the JavaScript object array can be accessed like this:
employees[0].firstName + " " + employees[0].lastName;
or
employees[0]["firstName"] + " " + employees[0]["lastName"];
JavaScript
an element of the JavaScript object array can be modified like this:
employees[0].firstName = "Gilbert";
or
employees[0]["firstName"] = "Gilbert";
result:
var employees = [
  {"firstName":"Gilbert", "lastName":"Doe"},
  {"firstName":"Anna", "lastName":"Smith"},
  {"firstName":"Peter", "lastName":"Jones"}
];
JSON within JavaScript
JSON syntax is derived from JavaScript object notation
very little extra software is needed to work with JSON within JavaScript
JSON is commonly used to read data from a web server, and display the data in a web page
for simplicity, we will demonstrate such use with a JSON string as input (instead of a file):
var text = '{ "employees" : [' +
  '{ "firstName":"John" , "lastName":"Doe" },' +
  '{ "firstName":"Anna" , "lastName":"Smith" },' +
  '{ "firstName":"Peter", "lastName":"Jones" } ]}';
JSON within JavaScript
the JavaScript function JSON.parse() can be used to convert a JSON string into a JavaScript object:
var obj = JSON.parse(text);
the new JavaScript object can now be used in the web page
JSON Schema
Example
schema:
{
  "title": "Example Schema",
  "type": "object",
  "properties": {
    "firstName": {"type": "string"},
    "lastName": {"type": "string"},
    "age": {
      "description": "Age in years",
      "type": "integer",
      "minimum": 0
    }
  },
  "required": ["firstName", "lastName"]
}
JSON:
{"firstName":"Peter","lastName":"Pan","age":12}
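To show what validating against this schema involves, here is a deliberately tiny checker in plain Python. It is a sketch under stated assumptions, not a JSON Schema implementation: the function `errors` is hypothetical, it understands only the `type`, `properties`, `required` and `minimum` keywords used above, and it ignores details such as Python treating `True` as an integer.

```python
SCHEMA = {
    "title": "Example Schema",
    "type": "object",
    "properties": {
        "firstName": {"type": "string"},
        "lastName": {"type": "string"},
        "age": {"description": "Age in years", "type": "integer", "minimum": 0},
    },
    "required": ["firstName", "lastName"],
}
PYTHON_TYPES = {"string": str, "integer": int, "object": dict}

def errors(instance, schema):
    """Return a list of problems found; an empty list means the instance is valid."""
    expected = schema.get("type")
    if expected and not isinstance(instance, PYTHON_TYPES[expected]):
        return [f"expected {expected}"]
    problems = []
    if "minimum" in schema and instance < schema["minimum"]:
        problems.append(f"below minimum {schema['minimum']}")
    if isinstance(instance, dict):
        for name in schema.get("required", []):
            if name not in instance:
                problems.append(f"missing {name}")
        for name, sub_schema in schema.get("properties", {}).items():
            if name in instance:
                problems += [f"{name}: {p}" for p in errors(instance[name], sub_schema)]
    return problems

valid = errors({"firstName": "Peter", "lastName": "Pan", "age": 12}, SCHEMA)
invalid = errors({"firstName": "Peter", "age": -1}, SCHEMA)
print(valid, invalid)
```

In practice a dedicated validator library would be used; the point here is only that a schema is data which a program interprets.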
Hello, World!
in JSON Schema, an empty object, i.e. { }, is a valid schema that accepts any valid JSON
Declaring a unique identifier
it is also good practice to include an id property as a unique identifier for each schema, e.g.
{ "id": "http://yourdomain.com/schemas/myschema.json" }
JSON Schema
JSON Schema is a specification for a JSON-based format for defining the structure of JSON data
JSON Schema itself is written in JSON
schema is data itself, not a computer program
it is just a declarative format for “describing the structure of other data”
JSON data can be validated against a schema using a computer program
for documentation see: http://json-schema.org/
Declaring a JSON Schema
JSON Schema is itself JSON
it is not always easy to tell when something is JSON Schema or just JSON
the $schema keyword is used to declare that something is JSON Schema, e.g.
{ "$schema": "http://json-schema.org/schema#" }
it is generally good practice to include it, though it is not required
The type keyword
the most common thing to do in a JSON Schema is to restrict to a specific type, e.g.
{ "type": "string" }
only strings are accepted
Metadata
JSON Schema includes keywords: title, description and default
not used for validation
used to describe parts of a schema
title will provide a short description
description will provide a more lengthy explanation about the purpose of the data described by the schema
neither is required, but they are encouraged as good practice
default specifies a default value for an item
Enumerated values
the enum keyword is used to restrict a value to a fixed set of values
it must be an array with at least one element, where each element is unique
enum can be used without a type, to accept values of different types
Combining schemas
JSON schemas can be combined
this does not necessarily mean combining schemas from multiple files
it may be as simple as allowing data to be validated against multiple criteria
anyOf is used to say that the given data may be valid against any of the given sub-schemas
as long as a value validates against any of the sub-schemas, it is considered valid against the entire combined schema
Combining schemas
keywords used to combine schemas are:
allOf   must be valid against all sub-schemas
anyOf   must be valid against any sub-schema
oneOf   must be valid against exactly one of the sub-schemas
these keywords must be set to an array, where each item is a schema
in addition, there is:
not     must not be valid against the given schema
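The meaning of enum and of the combining keywords can be written down as a few lines of Python. This is an illustrative toy, not a conforming validator: `is_valid` is a hypothetical function that knows only `type` (for three types), `enum`, `allOf`, `anyOf`, `oneOf` and `not`, and the sample schemas are made up.

```python
TYPES = {"string": str, "number": (int, float), "null": type(None)}

def is_valid(value, schema):
    """Tiny evaluator for type, enum, allOf, anyOf, oneOf and not."""
    def count_matches(sub_schemas):
        return sum(is_valid(value, s) for s in sub_schemas)

    if "type" in schema and not isinstance(value, TYPES[schema["type"]]):
        return False
    if "enum" in schema and value not in schema["enum"]:
        return False
    if "allOf" in schema and count_matches(schema["allOf"]) != len(schema["allOf"]):
        return False
    if "anyOf" in schema and count_matches(schema["anyOf"]) == 0:
        return False
    if "oneOf" in schema and count_matches(schema["oneOf"]) != 1:
        return False
    if "not" in schema and is_valid(value, schema["not"]):
        return False
    return True

colour = {"enum": ["red", "amber", "green"]}
string_or_number = {"anyOf": [{"type": "string"}, {"type": "number"}]}
exactly_one = {"oneOf": [{"type": "number"}, {"enum": [1, 2]}]}

print(is_valid("red", colour), is_valid("blue", colour))
print(is_valid(3.5, string_or_number), is_valid(None, string_or_number))
print(is_valid(7, exactly_one), is_valid(1, exactly_one))
```

The value 1 fails `exactly_one` because it matches both sub-schemas, which is the difference between oneOf and anyOf.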
JSON vs. XML
JSON
XML
{"students":[
  {"name":"John", "age":"23", "city":"Cardiff"},
  {"name":"Steve", "age":"28", "city":"Swansea"},
  {"name":"Peter", "age":"32", "city":"Bristol"}
]}
notice how the use of an array removes the need for the nested "element"
JSON vs. XML
similarities
both are self–describing
(human–readable)
both are hierarchical (values within values)
both can be parsed and used by many programming languages
both can be fetched with an XMLHttpRequest
differences
JSON does not use end tag
JSON is shorter
JSON is quicker to read and write
JSON can use arrays
biggest difference:
XML has to be parsed with an XML parser
JSON can be parsed by a standard JavaScript function
JSON vs. XML
for AJAX applications, JSON is faster and easier than XML
using XML:
1. fetch an XML document
2. use the XML DOM to traverse through the document
3. extract values and store in variables
using JSON:
1. fetch a JSON string
2. JSON.parse the JSON string
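The two sets of steps can be compared side by side. The sketch below uses Python instead of browser JavaScript, skips the fetch step by using literal strings, and assumes an XML layout (`students`/`student`/`name`/`city`) invented for this example.

```python
import json
import xml.etree.ElementTree as ET

xml_text = (
    "<students>"
    "<student><name>John</name><city>Cardiff</city></student>"
    "<student><name>Steve</name><city>Swansea</city></student>"
    "</students>"
)
json_text = (
    '{"students": [{"name": "John", "city": "Cardiff"},'
    ' {"name": "Steve", "city": "Swansea"}]}'
)

# XML: parse, traverse the tree, then extract values into variables.
root = ET.fromstring(xml_text)
cities_from_xml = [s.findtext("city") for s in root.findall("student")]

# JSON: parse once; the result is already made of native lists and dicts.
cities_from_json = [s["city"] for s in json.loads(json_text)["students"]]

print(cities_from_xml, cities_from_json)
```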
JSON vs. XML
XML: there are several specifications to define schema for XML, e.g. DTD and XML Schema
JSON: JSON Schema does the same for JSON, but it is not as widely used
XML: for selecting specific parts of an XML document, there is a standard specification called XPath
JSON: JSONPath does the same for JSON, but is not as widely used
XML: XML has the XQuery specification for querying XML data
JSON: JSON has JAQL, JSONiq etc., but they are not as widely used
XML: XML has the XSLT specification, which may be used to apply style to an XML document
JSON: JSON does not have any such thing
NoSQL (Not Only SQL)
CMT207 Information modelling & database systems
Big data
modern data collection technologies (social media, smartphones, sensors, etc.) act as force multipliers for data growth
big data is a broad term for datasets so large or complex that traditional data processing applications (e.g. RDBMS) are inadequate
a big data project is defined by 3V + C:
1. velocity: data is streamed at an unprecedented speed and must be dealt with in near-real time
2. variety: data in various formats: structured, semi-structured and unstructured
3. volume: data that involves many terabytes or petabytes
4. complexity: data coming from multiple sources need to be connected and correlated
Lecture
in this module we learnt about different types of databases
relational
object-oriented
object-relational
in this lecture we will learn about the latest type(s) of databases
NoSQL
Velocity
How fast is the data produced/processed?
gathering data quickly is of no benefit if we analyse it only once a week
real–time analytics is about using very current data to provide information that will help improve a service or respond to demand swiftly
Big data
Variety
data comes in all types of formats
from structured, numeric data in traditional
databases …
… to unstructured text documents, email, video, audio, stock ticker data and financial transactions
Volume
How much data?
Complexity
today’s data comes from multiple sources in a variety of formats
this makes it difficult to link, match, cleanse and transform data across systems
however, it is necessary to connect and correlate relationships, hierarchies and multiple data linkages
otherwise, data can quickly spiral out of control
[figure: data volume scale from MB through GB and TB to PB]
the use of languages and interfaces that are “not only”
anywhere in the world
… while scaling and delivering performance across massive data sets and millions of users
today, almost every organisation has the need to deliver applications that utilise Web, mobile and IoT technologies
3/12/2016
What is NoSQL?
Not Only SQL databases AKA cloud databases, non–relational databases, Big Data databases …
a non–relational and largely distributed database system that enables rapid, ad–hoc organisation and analysis of extremely high–volume, disparate data types
developed in response to the sheer volume of data being generated, stored and analysed by modern users and their applications
NoSQL databases have become the first alternative to relational databases, with scalability, availability and fault tolerance being key deciding factors
Why NoSQL?
moving away from using databases as integration points
encapsulating databases with applications and integrating using services instead
the rise of the web as a platform brought a vital change in data storage requirements
need to support large volumes of data by running on clusters
relational databases were not designed to run efficiently on clusters
NoSQL databases
schema–less data model
horizontal scalability
distributed architectures
Why NoSQL?
NoSQL technology was originally created and used by Internet leaders such as Facebook, Google, Amazon and others
they required a DBMS that could write and read data
Why NoSQL?
impedance mismatch between the relational data structures and the in–memory data structures of the application
with NoSQL databases, developers do not have to convert in–memory structures to relational structures
Aggregate data model
relational data modelling is vastly different from the types of data structures that application developers use
movement away from relational modelling and towards aggregate models
an aggregate is a collection of data that we interact with as a unit
the unit of data can reside on any machine and, when retrieved, brings all the related data along with it
aggregates make it easier for the database to manage data storage over clusters
… but aggregate–oriented databases make inter–aggregate relationships more difficult to handle
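A minimal sketch of the aggregate idea, assuming a hypothetical online-shop order (the table and field names are illustrative and do not come from the lecture). The relational form spreads one order over several tables, whereas the aggregate keeps everything that is read and written together in one nested unit:

```python
# Relational form: one logical order is split over several tables
# and has to be re-assembled with joins (all names are hypothetical).
orders = [{"order_id": 1, "customer_id": 7}]
customers = [{"customer_id": 7, "name": "Ann"}]
order_lines = [
    {"order_id": 1, "product": "book", "qty": 2},
    {"order_id": 1, "product": "pen", "qty": 5},
]


def build_aggregate(order_id):
    """Join the three tables into a single nested unit (an aggregate)."""
    order = next(o for o in orders if o["order_id"] == order_id)
    customer = next(
        c for c in customers if c["customer_id"] == order["customer_id"]
    )
    lines = [
        {"product": line["product"], "qty": line["qty"]}
        for line in order_lines
        if line["order_id"] == order_id
    ]
    return {
        "order_id": order_id,
        "customer": {"name": customer["name"]},
        "lines": lines,
    }


# The aggregate can now be stored and retrieved as one unit.
order_aggregate = build_aggregate(1)
print(order_aggregate)
```

A relationship between two such aggregates (for example, two orders of the same customer) is not captured inside either unit, which is the difficulty the last bullet refers to.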
Distribution models
aggregate–oriented databases make distribution of data easier, since all related data is contained in the aggregate
the distribution mechanism only has to move the aggregate and does not have to worry about related data
two styles of distributing data:
1. sharding distributes different data across multiple servers, so each server acts as the single source for a subset of data
2. replication copies data across multiple servers, so each bit of data can be found in multiple places
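The two styles can be contrasted with a small sketch. It assumes hash-based placement of keys on three hypothetical servers; this is only one possible placement scheme and is not prescribed by the lecture:

```python
import hashlib

SERVERS = ["server-0", "server-1", "server-2"]  # hypothetical server names


def shard_for(key, servers=SERVERS):
    """Sharding: pick the single server responsible for a key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]


def replicas_for(key, copies=2, servers=SERVERS):
    """Replication: pick `copies` distinct servers that all hold the key."""
    start = servers.index(shard_for(key, servers))
    return [servers[(start + i) % len(servers)] for i in range(copies)]


print(shard_for("order:1"))
print(replicas_for("order:1"))
```

With sharding each key lives on exactly one server, so capacity grows with the number of servers; with replication the same key is found on several servers, so a read can still be served if one of them fails.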
Distribution model: sharding
Master–slave replication
Distribution model: replication
data are copied across multiple servers
two forms of replication:
1. master–slave replication makes one node the authoritative copy that handles writes while slaves synchronise with the master and may handle reads
2. peer–to–peer replication allows writes to any node; the nodes coordinate to synchronise their copies
master–slave replication reduces the chance of update conflicts
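The two forms of replication can be sketched with in-memory dictionaries standing in for servers. The class names are hypothetical, and the "most recent write wins" rule used to resolve conflicts in the peer-to-peer case is an assumption made for the sketch, not something stated in the lecture:

```python
class MasterSlaveCluster:
    """Toy master-slave replication: the master takes all writes."""

    def __init__(self, n_slaves=2):
        self.master = {}
        self.slaves = [{} for _ in range(n_slaves)]

    def write(self, key, value):
        self.master[key] = value  # only the master accepts writes

    def synchronise(self):
        for slave in self.slaves:  # slaves copy the authoritative state
            slave.clear()
            slave.update(self.master)

    def read(self, key, slave_index=0):
        return self.slaves[slave_index].get(key)  # reads served by a slave


class PeerToPeerCluster:
    """Toy peer-to-peer replication: any node accepts writes."""

    def __init__(self, n_nodes=3):
        self.nodes = [{} for _ in range(n_nodes)]
        self.clock = 0  # simplistic global counter used to order writes

    def write(self, node_index, key, value):
        self.clock += 1
        self.nodes[node_index][key] = (self.clock, value)

    def synchronise(self):
        # Assumed conflict rule: the most recent write wins on every node.
        merged = {}
        for node in self.nodes:
            for key, (stamp, value) in node.items():
                if key not in merged or stamp > merged[key][0]:
                    merged[key] = (stamp, value)
        for node in self.nodes:
            node.clear()
            node.update(merged)

    def read(self, node_index, key):
        entry = self.nodes[node_index].get(key)
        return None if entry is None else entry[1]
```

In the master-slave sketch two clients can never write conflicting values to different nodes, which is why update conflicts are less likely; the peer-to-peer sketch needs an explicit rule to reconcile them.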
peer–to–peer replication avoids loading all writes onto a single server, which would create a single point of failure
Peer–to–peer replication
Types of NoSQL databases
1. key–value store – all data consists of an indexed key and a value
2. document database – expands on the basic idea of key–value stores: documents contain more complex data and each document is assigned a unique key
3. column family store – stores data tables as sections of columns of data, rather than rows of data
4. graph database – designed for data whose relations are well represented as a graph
Key–value store
key value
simplest NoSQL database to use from an API perspective
store data in a schema–less way
the value is a blob that is just stored, without caring what’s inside; it is the responsibility of the application to understand what was stored
the client can either get the value for the key, put a value for a key or delete a key from the store
key–value stores always use primary–key access, so they generally have great performance and can be scaled easily
e.g. Cassandra, Memcached, Berkeley DB, Amazon DynamoDB, Couchbase
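The get/put/delete interface described above can be sketched as follows. `KeyValueStore` is a hypothetical in-memory stand-in and not the API of any of the listed products; the choice of JSON as the value encoding is likewise an assumption made by the example application:

```python
import json


class KeyValueStore:
    """Toy key-value store: values are opaque blobs to the store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value: bytes):
        self._data[key] = value  # stored as-is, never inspected

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()

# The application, not the store, decides how the blob is encoded.
store.put("user:42", json.dumps({"name": "Ann"}).encode("utf-8"))
profile = json.loads(store.get("user:42").decode("utf-8"))
print(profile)
```

Because every operation goes through the key, the store never has to look inside a value, which is what keeps the interface this small.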
Column–family store
AKA column store or wide–column store
a column family is a group of related data that is often accessed together
e.g. for a customer, we would often access their profile information at the same time, but not their orders
column–family databases store data in column families as rows that have many columns associated with a row key
very high performance and a highly scalable architecture
e.g. Cassandra, HBase, HyperTable, Amazon DynamoDB
Key–value store
[figure: four keys, each mapped to a value]
Document database
document databases expand on the basic idea of key–value stores by storing documents in the value part
they store, retrieve & manage documents
semi–structured data
e.g. XML, JSON, BSON, etc.
self–describing, hierarchical tree data structures which can consist of maps, collections and scalar values
e.g. MongoDB, CouchDB, Terrastore, OrientDB, RavenDB
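A minimal sketch of the difference from a plain key–value store: the database can look inside the stored value. `DocumentStore` and its methods are hypothetical names for an in-memory stand-in and do not reproduce the API of MongoDB or any other listed product:

```python
import uuid


class DocumentStore:
    """Toy document database: values are documents the store can inspect."""

    def __init__(self):
        self._docs = {}

    def insert(self, document):
        doc_id = str(uuid.uuid4())  # each document is assigned a unique key
        self._docs[doc_id] = document
        return doc_id

    def get(self, doc_id):
        return self._docs.get(doc_id)

    def find(self, field, value):
        """Return documents whose top-level `field` equals `value`."""
        return [d for d in self._docs.values() if d.get(field) == value]


db = DocumentStore()

# Documents are self-describing and need not share a structure.
ann_id = db.insert({"name": "Ann", "city": "Cardiff", "tags": ["staff"]})
bob_id = db.insert({"name": "Bob", "city": "Cardiff", "address": {"postcode": "CF24"}})
db.insert({"name": "Cat"})

in_cardiff = db.find("city", "Cardiff")
print([d["name"] for d in in_cardiff])
```

The query on `city` works even though the third document has no such field, which illustrates the semi-structured nature of the data.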
Column–family store vs. relational database
similarity: each column family corresponds to a container of rows in a table where the key identifies the row and the row consists of multiple columns
difference: different rows do not have to have the same columns, and columns can be added to any row at any time without having to add them to other rows
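The similarity and the difference can be illustrated with nested dictionaries. The `profile` column family and its columns are hypothetical, and the nested mapping is only a conceptual model, not the storage layout of any real product:

```python
from collections import defaultdict

# One column family: row key -> {column name: value}
profile = defaultdict(dict)  # hypothetical "profile" column family

profile["cust-1"]["name"] = "Ann"
profile["cust-1"]["email"] = "ann@example.org"

profile["cust-2"]["name"] = "Bob"
profile["cust-2"]["phone"] = "555-0100"  # a column that cust-1 does not have

# Adding a column to one row leaves every other row untouched.
profile["cust-1"]["city"] = "Cardiff"

for row_key, columns in profile.items():
    print(row_key, sorted(columns))
```

As in a relational table, the row key identifies the row; unlike a relational table, there is no shared column list that every row must follow.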
Graph database
graph databases store entities and relationships between these entities
entities are also known as nodes, which have properties
relationships are known as edges, which can also have properties
edges have directional significance
the organisation of the graph lets the data be stored once and then interpreted in different ways based on relationships
e.g. Neo4j, InfiniteGraph, OrientDB
NoSQL vs. SQL summary
Types
SQL databases: One type with minor variations
NoSQL databases: Many different types
Development history
SQL databases: Developed in the 1970s to deal with the first wave of data storage applications
NoSQL databases: Developed in the late 2000s to deal with limitations of SQL databases, especially scalability, multi-structured data, geo-distribution and agile development
Examples
SQL databases: MySQL, PostgreSQL, Microsoft SQL Server, Oracle
NoSQL databases: MongoDB, Cassandra, HBase, Neo4j
Data storage models
SQL databases: Related data are stored in separate tables, and then joined together when more complex queries are executed.
NoSQL databases: Varies based on database type. Key-value stores function similarly to SQL databases, but have only two columns (key & value). Document databases store all relevant data together in a single document, e.g. in JSON or XML, which can nest values hierarchically.
Schemas
SQL databases: Structure and data types are fixed in advance.
NoSQL databases: Typically dynamic. Applications can add new fields on the fly, and unlike SQL table rows, dissimilar data can be stored together as necessary.
Scaling
SQL databases: Vertically, meaning a single server must be made increasingly powerful in order to deal with increased demand.
NoSQL databases: Horizontally, meaning that to add capacity, a database administrator can simply add more commodity servers or cloud instances. The database automatically spreads data across servers as necessary.
Development model
SQL databases: Mix of open-source (e.g. PostgreSQL, MySQL) and closed-source (e.g. Oracle)
NoSQL databases: Open-source
Supports transactions
SQL databases: Yes, updates can be configured to complete entirely or not at all
NoSQL databases: In certain circumstances and at certain levels (e.g. document level vs. database level)
Data manipulation
SQL databases: Specific language (SQL) using SELECT, INSERT and UPDATE statements
NoSQL databases: Through object-oriented APIs
Consistency
SQL databases: Can be configured for strong consistency
NoSQL databases: Depends on product. Some provide strong consistency (e.g. MongoDB) whereas others offer eventual consistency (e.g. Cassandra).
Graph database
most of the value of graph databases comes from the relationships and their properties
relationships are first–class citizens in graph databases
there is no limit to the number and types of relationships a node can have
relationships have a type, a start node and an end node, but can also have properties of their own
these properties can be used to query the graph
e.g. when did they become friends, what is the distance between the nodes, what aspects are shared between the nodes, etc.
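A minimal sketch of querying a graph by relationship type and relationship properties, using plain Python lists and dictionaries. The node names, relationship types and the `since` property are hypothetical, and the helper does not reproduce the query language of any real graph database:

```python
# Nodes with properties (all data is illustrative).
nodes = {"ann": {"age": 31}, "bob": {"age": 29}, "cat": {"age": 40}}

# Each relationship has a type, a start node, an end node
# and may carry properties of its own (here: "since").
edges = [
    {"type": "FRIEND", "start": "ann", "end": "bob", "since": 2012},
    {"type": "FRIEND", "start": "bob", "end": "cat", "since": 2015},
    {"type": "WORKS_WITH", "start": "ann", "end": "cat", "since": 2014},
]


def relationships(start, rel_type=None, **properties):
    """Edges leaving `start`, optionally filtered by type and property values."""
    return [
        e
        for e in edges
        if e["start"] == start
        and (rel_type is None or e["type"] == rel_type)
        and all(e.get(k) == v for k, v in properties.items())
    ]


# "When did they become friends?" is answered from the edge, not the nodes.
friends_since = {e["end"]: e["since"] for e in relationships("ann", "FRIEND")}
print(friends_since)
```

The same stored edges answer different questions depending on which type or property is used as the filter.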
NoSQL vs. SQL databases
designed to support different application requirements
they typically co-exist in most enterprises
it is not a question of either … or!
NoSQL vs. SQL databases
key decision points on when to use which:
Use SQL when you need/have…
Centralised applications (e.g. business management)
Moderate to high availability
Moderate velocity data
Data coming in from one/few locations
Primarily structured data
Complex/nested transactions
Primary concern is scaling reads
Philosophy of scaling up for more users/data
To maintain moderate data volumes with purge
Use NoSQL when you need/have…
Decentralised applications (e.g. Web, mobile and IoT)
Continuous availability; no downtime
High velocity data (devices, sensors, etc.)
Data coming in from many locations
Structured, with semi/unstructured
Simple transactions
Concern is to scale both writes and reads
Philosophy of scaling out for more users/data
To maintain high data volumes; retain forever
A history of databases in No–tation
1970: NoSQL = We have no SQL
1980: NoSQL = Know SQL
2000: NoSQL = No SQL!
2005: NoSQL = Not only SQL
2013: NoSQL = No, SQL!