
􏰁 learning outcomes
􏰁 describe the XML data model & outline its basic
features
􏰁 understand the advantages of the XML approach to data management
􏰁 W3C: the main international standards organisation for the World Wide Web
􏰁 XML = “SGML for the Web”
􏰁 SGML (Standard Generalized Markup Language):
an ISO-standard technology for defining generalised markup languages for text documents
3/12/2016
XML
(eXtensible Markup Language)
CMT207 Information modelling & database systems
1
XML: design goals
􏰁 separate syntax from semantics to provide a common framework for structuring information
􏰁 represent semi–structured data (data that are structured, but do not fit relational model)
􏰁 offer more flexibility than databases, but still do some of the database functionality
􏰁 allow tailor–made markup for any imaginable application domain
􏰁 support internationalisation (Unicode) and platform independence
4
Lecture
􏰁 content
􏰁 markup languages
􏰁 XML & its basic concepts 􏰁 structuring data with XML
What is XML?
􏰁 XML = eXtensible Markup Language
􏰁 first published as a W3C Recommendation (XML 1.0) in February 1998
􏰁 a World Wide Web Consortium (W3C) standard
2
5
Text on Web 2.0
3
Markup language
􏰁 markup language: a system for annotating text in a way that is syntactically distinguishable from the text itself
􏰁 three types of electronic markup:
1. presentational: achieve a visual effect, e.g. in HTML red and bold red and bold
2. procedural: how to process the text, e.g. in LaTeX \sum_{i=1}^{\infty}\frac{1}{i}
3. descriptive: provide additional information, e.g. in XML

a
new example
6
1

XML tags
􏰁 XML is a markup language that is used to store data in a self-descriptive manner
􏰁 making the data “self-descriptive” is achieved by tagging (annotating or marking up) information
􏰁 unlike delimited files or database tables, XML documents are structured by tags
􏰁 tags look like this: <TAG>some data here</TAG>
(<TAG> … open, </TAG> … close)
􏰁 tags indicate the beginning & ending of the tagged data – text–based & position–independent
XML element
􏰁 an XML element, e.g. <TAG>some data here</TAG>, consists of:
1. the delimiters: “<” and “>” (special characters in XML)
2. the generic identifier/name: the “TAG” enclosed in the two delimiters
3. the opening & closing tags: “<TAG>” & “</TAG>” (note the forward slash in the closing tag)
4. the content: “some data here”
􏰁 e.g. March 4080
XML tags
􏰁 the basic structure of XML files:
􏰁 there are tags: … tagged data
􏰁 tags surround data, or other tags:
nested tag
􏰁 NOTE: tags can only be nested within other tags, i.e. they cannot overlap partially!
􏰁 therefore, XML documents have a hierarchical or tree-like structure
XML tags – example

John C Doe

87 Victoria Road Wisbech Cambridgeshire PE13 2QL



entity reference … character … meaning
1. &lt; … < … less than
2. &gt; … > … greater than
3. &amp; … & … ampersand
4. &apos; … ' … apostrophe
5. &quot; … " … quotation mark
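A minimal sketch of entity escaping, assuming Python's standard library as the tool (the slides do not prescribe one). `escape` handles `&`, `<` and `>` by default; the extra mapping for the two quote characters is supplied here by hand.

```python
from xml.sax.saxutils import escape, unescape

raw = "if salary < 1000 then"

# escape() replaces &, < and > by default; the optional mapping adds
# the apostrophe and quotation mark entity references.
escaped = escape(raw, {"'": "&apos;", '"': "&quot;"})
print(escaped)

# unescape() reverses the substitution for &lt;, &gt; and &amp;.
print(unescape(escaped))
```

The escaped string can be placed inside an element without being mistaken for markup.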
7 10
XML attribute
􏰁 XML attribute: specifies additional information about an XML element
􏰁 an attribute for an element appears within the opening tag:
􏰁 attributes are means of specialising generic elements
􏰁 e.g. 28
􏰁 attribute vs. element:
28C

8 11
XML special characters
􏰁 some characters have a special meaning in XML
􏰁 e.g. “<” is always interpreted as the start of a new element
􏰁 this will generate an XML error: if salary < 1000 then
􏰁 replace a special character with an entity reference: if salary &lt; 1000 then
􏰁 5 predefined entity references in XML:

2. none of the special syntax characters (<, >, “, ‘, &) appear except when performing their markup roles
3. XML elements are correctly nested, with none missing & none overlapping
4. the XML tags are case–sensitive
5. there is a single “root” element, which contains all other elements
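The well-formedness rules can be checked mechanically by any XML parser. The sketch below assumes Python's `xml.etree.ElementTree`; the helper name `is_well_formed` and the sample strings are illustrative only.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    "correctly nested": "<a><b>text</b></a>",
    "overlapping tags": "<a><b>text</a></b>",      # breaks rule 3
    "case mismatch": "<Note>text</note>",          # breaks rule 4
    "two root elements": "<a/><b/>",               # breaks rule 5
    "bare special character": "<a>x < 1000</a>",   # breaks rule 2
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))
```

Only the first sample parses; each of the others violates one of the rules listed above.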
􏰁 many web pages on the Internet contain “bad” HTML:
􏰁 XML is a markup language where documents must be marked up correctly & well–formed 17
3/12/2016
XML structure – summary
􏰁 an XML document is an ordered, labelled tree 􏰁 each node, i.e. XML element:
􏰁 must have a name
􏰁 may have attributes, each consisting of a name & a value
􏰁 may have content, which may include child nodes
􏰁 the XML code must be syntactically correct or the XML parser will report an error
13
XML vs. HTML
􏰁 XML: defines logical structure only
􏰁 HTML: the same intention, but has evolved into a presentation language; a markup language for a specific purpose – display in browsers
􏰁 unlike HTML, XML by itself conveys only content & structure, not presentation, behaviour or meaning
􏰁 these can still be associated with XML, but this requires additional mechanisms such as stylesheets, scripts, namespaces, etc.
􏰁 XHTML (eXtensible HyperText Markup Language): a family of XML markup languages that mirror or extend versions of the widely used HTML
16
Well–formed XML documents
􏰁 an XML document is a text which is well–formed if it conforms to the XML syntax rules:
1. it contains only properly encoded legal Unicode characters
HTML vs. XHTML
􏰁 XHTML is a stricter & cleaner version of HTML
􏰁 XHTML is HTML re–designed as an XML language 􏰁 Why XHTML?
14
What is XML?
􏰁 not a language but a meta-language, i.e. a framework for defining markup languages (or dialects)
􏰁 no fixed collection of markup tags → XML is flexible
􏰁 each XML language tuned for a specific application, e.g. MathML is an application of XML for describing
mathematical notations
􏰁 all XML languages share common features → enables building of generic tools for processing XML data
􏰁 all XML languages can be processed by a single lightweight parser
􏰁 XML is intended for machine processing, but it is still a human readable format mostly because the data are structured in tags that use common language, e.g. 15
XML Schema
3

􏰁 they make XML descriptions readable to automated processors such as parsers, editors & other XML–based tools
􏰁 a well–formed XML document is valid if it conforms to the associated schema specified in DTD or XML Schema
20
􏰁 interoperability of content, style & behaviour 􏰁 human & machine readable
􏰁 self–descriptive data
􏰁 no dependence on large software vendors
􏰁 no binding to specific tools
3/12/2016
XML schema
􏰁 How are different XML languages or dialects specified? 􏰁 XML schema = syntax definition (i.e. grammar) of an XML
language – describes the structure of an XML document 􏰁 formal languages for expressing XML schemas:
􏰁 Document Type Definition (DTD)
􏰁 XML Schema
􏰁 they use very different syntax to achieve the same task of creating documentation:
􏰁 what elements an XML document can contain
􏰁 how they should be used
􏰁 what interactions may take place between parts
of a document 19
XML data exchange
􏰁 XML standardises the concrete syntax of data exchange in a text–based notation designed to be obvious to both people & machines
􏰁 XML uses documents as the transfer mechanism for data
􏰁 XML publishing model decouples data from processing, which isolates changes in large systems, making them more flexible & reliable
􏰁 XML is suitable for transactional processing in a heterogeneous, asynchronous, distributed environment such as the Web
22
XML schema
􏰁 NOTE: neither DTD nor XML Schema are strictly required for XML development!
􏰁 both DTDs & XML Schemas are important parts of the XML toolbox
XML advantages
􏰁 data representation is text–based & position–independent 􏰁 open & extensible
􏰁 platform & language independent → portable
23
DTD vs. XML Schema
DTD
“grammar”
XML Schema
“grammar”
21
XML
“sentence”
XML trade offs
􏰁 XML is not a slim format: using tags makes data bigger & more complex than a flat file
􏰁 performance: relational databases are still much faster
􏰁 no centralised control of data: potential problems with data integrity
􏰁 uniformity: too many different formats
Summary
􏰁 XML is a W3C standard meta-language for defining markup languages
􏰁 markup language: a system for annotating text
􏰁 XML is used to store data in a self-descriptive manner
using XML tags
􏰁 XML documents are structured using XML elements, which must have a name, may have attributes, and may have content, which may include other XML elements
􏰁 an XML document is an ordered, labelled tree of XML elements
25
Summary
􏰁 XML uses documents as a data exchange mechanism
􏰁 an XML document is well-formed if it is syntactically
correct according to the W3C specification
􏰁 different markup languages are specified using XML
schemas
􏰁 an XML schema can be expressed in a formal language such as DTD or XML Schema
􏰁 a well-formed XML document is valid if it conforms to the associated XML schema

structure data using tags
􏰁 in this lecture we will learn how to query such data 􏰁 we will cover two languages:
􏰁 XPath a language for navigating through an XML document
􏰁 XQuery a language for querying XML data
􏰁 text 􏰁 document node
􏰁 namespace
􏰁 XML documents are treated as trees of nodes
􏰁 the topmost element of the tree is called the root element
Querying XML
CMT207 Information modelling & database systems
1
XPath
􏰁 XPath is used to navigate through elements and attributes in an XML document
􏰁 XPath uses path expressions to select nodes in an XML document
􏰁 they look very much like the expressions used when working with a traditional computer file system
􏰁 XPath also includes over 100 built–in functions
􏰁 string values, numeric values, date and time comparison, node and QName manipulation, sequence manipulation, Boolean values, etc.
4
Lecture
􏰁 in this module we learnt about
􏰁 structuring data using a relational data model
􏰁 querying the data stored in relational databases
􏰁 in the previous lecture we learnt about using XML to
Nodes
􏰁 in XPath, there are seven kinds of nodes:
􏰁 element 􏰁 processing instruction 􏰁 attribute 􏰁 comment
2
5
XPath
root
element

attribute
Nodes


Harry Potter J K. Rowling 2005 29.99


6
1

Relationships between nodes
􏰁 parent 􏰁 child
􏰁 sibling

Harry Potter J K. Rowling 2005
Path expression
bookstore
/bookstore
bookstore/book
Comment
select all nodes with the name bookstore
select the root element bookstore
Atomic values

Harry Potter J K. Rowling 2005 29.99

􏰁 atomic values are nodes with no children or parent 􏰁 e.g.
􏰁 J K. Rowling 􏰁 “en”
7
Path expressions
Expression
Description
nodename
select all nodes with the name nodename
/
select from the root node
//
select all nodes descending from the current node that match the selection criteria
.
select the current node
..
selects the parent of the current node
@
select attribute
10
Examples
􏰁 ancestor 29.99

􏰁 descendant

selects all book elements that are children of bookstore
8
bookstore//book
//book
selects all book elements that are descendant of the bookstore element
select all book elements no matter where they are
//@lang select all attributes that are named lang
11
XPath syntax
􏰁 a node is selected by following a path
􏰁 we will use the following example to illustrate the use
of paths:


Harry Potter 29.99


Learning XML 39.95


9
Predicates
􏰁 predicates are used to find :
􏰁 a specific node, or
􏰁 a node that contains a specific value
􏰁 predicates are embedded in square brackets
􏰁 e.g. selects the first book element that is the child
of the bookstore element /bookstore/book[1]
Examples
Path expression … Comment
/bookstore/book[last()] … select the last book element that is the child of the bookstore element
/bookstore/book[position()<3] … select the first two book elements that are children of the bookstore element
//title[@lang] … select all title elements that have an attribute named lang
//title[@lang='en'] … select all title elements that have a “lang” attribute with a value of “en”
/bookstore/book[price>35.00]/title … select all title elements of the book elements of the bookstore element that have a price element with a value >35.00
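The path expressions and predicates above can be tried out in Python. This is a sketch under stated assumptions: `xml.etree.ElementTree` implements only a subset of XPath (relative paths, `[n]`, `[last()]`, `[@attr]`, `[@attr='value']`), so `position()<3` and `price>35.00` are emulated in plain Python. The sample document is a stand-in for books.xml; the categories and the prices of Everyday Italian and XQuery Kick Start are assumed values.

```python
import xml.etree.ElementTree as ET

# Stand-in for books.xml (categories and two of the prices are assumptions).
BOOKS_XML = """
<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title><price>30.00</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title><price>29.99</price>
  </book>
  <book category="WEB">
    <title lang="en">XQuery Kick Start</title><price>49.99</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title><price>39.95</price>
  </book>
</bookstore>
"""

root = ET.fromstring(BOOKS_XML)  # root is the bookstore element


def titles(books):
    """Return the title text of each book element."""
    return [book.findtext("title") for book in books]


all_books = root.findall("book")                # /bookstore/book
first = root.findall("book[1]")                 # /bookstore/book[1]
last = root.findall("book[last()]")             # /bookstore/book[last()]
first_two = all_books[:2]                       # /bookstore/book[position()<3]
with_lang = root.findall(".//title[@lang]")     # //title[@lang]
english = root.findall(".//title[@lang='en']")  # //title[@lang='en']
# /bookstore/book[price>35.00]/title, with the comparison done in Python
pricey = [b for b in all_books if float(b.findtext("price")) > 35.00]

print(titles(first), titles(last))
print(titles(first_two))
print(len(with_lang), len(english))
print(titles(pricey))
```

A full XPath engine would evaluate every expression in the table directly; the emulation here only serves to show what each one selects.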
13
Axis name
Description
self
the current node
attribute
all attributes of the current node
namespace
all namespace nodes of the current node
parent
the parent of the current node
child
all children of the current node
ancestor
all ancestors of the current node
ancestor-or-self
as above + the current node itself
descendant
all descendants of the current node
descendant-or-self
as above + the current node itself
following
everything in the document after the closing tag of the current node
following-sibling
all siblings after the current node
XPath axis
􏰁 an axis defines a node–set relative to the current node
Unknown nodes
􏰁 XPath wildcards can be used to select unknown XML nodes
Wildcard
*
@*
node()
Description
match any element node
match any attribute node
match any node
preceding
all nodes that appear before the current node in the document, except ancestors, attribute nodes and namespace nodes
preceding-sibling
all siblings before the current node
Location path
􏰁 a location path can be absolute or relative
􏰁 an absolute location path starts with a slash ( / )
􏰁 a relative location path does not start with a slash
􏰁 a location path consists of one or more steps, each separated by a slash, e.g.
􏰁 /step/step/… absolute location path
􏰁 step/step/… relative location path
􏰁 each step is evaluated against the nodes in the current node–set
17
􏰁 e.g.
//title[@*] … selects all title elements that have at least one attribute
Path expression
Comment
/bookstore/*
selects all elements that are children of the bookstore element
//*
selects all elements in the document
Multiple paths
􏰁 operator | can be used within an XPath expression to select multiple paths, e.g.
/bookstore/book/title | //price … select all title elements of the book elements of the bookstore element AND all the price elements in the document
Path expression
Comment
//book/title | //book/price
select all title AND price elements of all book elements
//title | //price
selects all title AND price elements in the document
15
Location path
􏰁 a step in a location path consists of:
􏰁 an axis
􏰁 a node–test … identifies a node–set within an axis
􏰁 ≥0 predicates … to further refine the selected node–set
􏰁 the syntax for a location step is:
axisname::nodetest[predicate]
18
3

􏰁 Boolean
􏰁 number
􏰁 these returned values may be combined using the XPath operators
􏰁 built on XPath expressions
Select all books with a price greater than £30 from the book collection stored in books.xml
for $x in doc("books.xml")/bookstore/book
where $x/price>30
order by $x/title
return $x/title
Examples
Location path
Comment
child::book
all book nodes that are children of the current node
attribute::lang
the lang attribute of the current node
attribute::*
all attributes of the current node
child::node()
all children of the current node
child::*
all elements that are children of the current node
child::text()
all text node children of the current node
descendant::book
all book nodes that are descendants of the current node
child::*/child::price
all price grandchildren of the current node
19
XQuery
XPath operators
􏰁 an XPath expression can return: 􏰁 node–set
􏰁 string
XQuery
􏰁 a language for finding and extracting elements and attributes from XML documents
􏰁 XQuery is to XML what SQL is to database tables 􏰁 designed to query XML data
20
23
Operator
Description
Example
|
union of two node–sets
//book | //cd
+
addition
6+4
-
subtraction
6-4
*
multiplication
6*4
div
division
8 div 4
mod
division remainder
5 mod 2
=
equal
price=9.80
!=
not equal
price!=9.80
<
less than
price<9.80
<=
less than or equal to
price<=9.80
>
greater than
price>9.80
>=
greater than or equal to
price>=9.80
or
logical or
price=9.80 or price=9.70
and
logical and
price>9.00 and price<9.90
XQuery syntax
􏰁 case–sensitive
􏰁 elements, attributes and variables must be valid XML names
􏰁 string value can be in single (') or double quotes (")
􏰁 variable is defined with a $ followed by a name, e.g. $bookstore
􏰁 comments are delimited by (: and :), e.g. (: XQuery comment :)
Working example – books.xml
FLWOR expressions
􏰁 with FLWOR we can sort the result, e.g.
for $x in doc("books.xml")/bookstore/book
where $x/price>30
order by $x/title
return $x/title
􏰁 result:
Learning XML
XQuery Kick Start
28
Selecting nodes
􏰁 XQuery uses:
􏰁 functions … to extract data from XML
documents
􏰁 path expressions … to navigate through elements in an XML document
􏰁 predicates … to limit the extracted data from XML documents
􏰁 e.g. doc("books.xml")/bookstore/book[price<30]
doc("books.xml") … function, /bookstore/book … path, [price<30] … predicate
FLWOR expressions
􏰁 a FLWOR expression is to XQuery what the SELECT statement is to SQL
􏰁 FLWOR stands for For, Let, Where, Order by, Return
􏰁 return is mandatory, together with at least one for or let clause; where and order by are optional
Clause
Description
for
binds a variable to each item returned by the in expression
let
assigns variables
where
specifies search criteria
order by
specifies the sort order of the result
return
specifies what to return in the result
FLWOR expressions
􏰁 e.g. path expression: doc("books.xml")/bookstore/book[price>30]/title
􏰁 result:
XQuery Kick Start
Learning XML
􏰁 the following FLWOR expression does exactly the same:
for $x in doc("books.xml")/bookstore/book
where $x/price>30
return $x/title
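For readers more familiar with general-purpose languages, the for / where / order by / return clauses map closely onto a loop with a filter, a sort and a projection. The sketch below uses Python's `xml.etree.ElementTree` rather than an XQuery engine, and an inline stand-in for books.xml in which the prices of Everyday Italian and XQuery Kick Start are assumed values.

```python
import xml.etree.ElementTree as ET

# Stand-in for books.xml (two of the prices are assumptions).
root = ET.fromstring("""
<bookstore>
  <book><title>Everyday Italian</title><price>30.00</price></book>
  <book><title>Harry Potter</title><price>29.99</price></book>
  <book><title>XQuery Kick Start</title><price>49.99</price></book>
  <book><title>Learning XML</title><price>39.95</price></book>
</bookstore>
""")

# for $x in doc("books.xml")/bookstore/book
# where $x/price>30
# return $x/title
unsorted_titles = [
    book.findtext("title")                  # return
    for book in root.findall("book")        # for
    if float(book.findtext("price")) > 30   # where
]

# Adding "order by $x/title" corresponds to sorting the same result.
sorted_titles = sorted(unsorted_titles)

print(unsorted_titles)
print(sorted_titles)
```

Without order by, the titles come back in document order; with it, they are sorted alphabetically.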
27
The for clause
􏰁 the for clause binds a variable to each item returned
by the in expression
􏰁 multiple for clauses can be used in the same FLWOR
expression
􏰁 the for clause results in iteration
􏰁 the at keyword can be used to count the iteration, e.g.
for $x at $i in doc("books.xml")/bookstore/book/title
return {$i}. {data($x)}
1. Everyday Italian
2. Harry Potter
3. XQuery Kick Start
4. Learning XML
The let clause
􏰁 the let clause allows variable assignments
􏰁 … to avoid repeating the same expression many times 􏰁 the let clause does not result in iteration
􏰁 example:
let $x := (1 to 5)
return {$x}
1 2 3 4 5
31
The return clause
􏰁 the return clause specifies what is to be returned
􏰁 example:
for $x in doc("books.xml")/bookstore/book
return $x/title
Everyday Italian
Harry Potter
XQuery Kick Start
Learning XML
34
The where clause
􏰁 the where clause is used to specify one or more criteria
for the result
􏰁 example:
for $x in doc("books.xml")/bookstore/book
where $x/price>30 and $x/price<100
return $x/title
Conditional expressions
􏰁 if–then–else expressions are allowed in XQuery
􏰁 parentheses around the if expression are required
􏰁 else is required, but it can be just else ()
􏰁 e.g.
for $x in doc("books.xml")/bookstore/book
return if ($x/@category="CHILDREN")
then {data($x/title)}
else {data($x/title)}
􏰁 result: Everyday Italian Harry Potter
XQuery Kick Start Learning XML
note that we can add elements and attributes (XML or HTML) to the result
The order clause
􏰁 the order clause is used to specify the sort order of the
result
􏰁 e.g. order the result by category and title:
for $x in doc("books.xml")/bookstore/book
order by $x/@category, $x/title
return $x/title
Harry Potter
Everyday Italian
Learning XML
XQuery Kick Start
33
Comparisons
􏰁 there are two ways of comparing values:
1. general comparison =, !=, <, <=, >, >=
2. value comparison eq, ne, lt, le, gt, ge
􏰁 examples:
􏰁 $bookstore//book/@q > 10
􏰁 returns true if any q attributes have a value >10 􏰁 $bookstore//book/@q gt 10
􏰁 returns true if there is only one q attribute returned by the expression, and its value is >10
􏰁 if more than one q is returned, an error occurs
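The difference between the two comparison styles can be modelled in a few lines of Python. This is a sketch of the semantics only: the function names are hypothetical, a Python list stands in for the sequence of q attribute values, and an empty sequence is mapped to `None`.

```python
from typing import Optional, Sequence


def general_gt(values: Sequence[float], threshold: float) -> bool:
    """Model of `values > threshold`: true if ANY item satisfies it."""
    return any(v > threshold for v in values)


def value_gt(values: Sequence[float], threshold: float) -> Optional[bool]:
    """Model of `values gt threshold`: defined for at most one item."""
    if len(values) == 0:
        return None  # assumption: empty sequence in, empty result out
    if len(values) > 1:
        raise TypeError("value comparison requires a single item")
    return values[0] > threshold


print(general_gt([5, 12], 10))  # several items are fine
print(value_gt([12], 10))       # exactly one item is fine
try:
    value_gt([5, 12], 10)       # more than one item is an error
except TypeError as err:
    print("error:", err)
```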
Functions
􏰁 XQuery and XPath share the same data model and support the same functions and operators
User–defined functions
􏰁 users can also define their own functions in XQuery:
declare function prefix:function_name($parameter as datatype) as returnDatatype
{
… function code here…
};
􏰁 use the declare function keyword
􏰁 the name of the function must be prefixed
􏰁 the data types are defined in XML Schema
􏰁 the function body must be surrounded by curly braces
User–defined functions
􏰁 example:
declare function local:minPrice($p as xs:decimal?, $d as xs:decimal?)
as xs:decimal?
{
let $disc := ($p * $d) div 100
return ($p - $disc)
};
􏰁 function call:
{local:minPrice($book/price, $book/discount)}
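The arithmetic inside local:minPrice is a percentage discount. A rough Python counterpart is sketched below; the name `min_price` is hypothetical, `Decimal` stands in for xs:decimal, and `None` stands in for the empty sequence permitted by the `?` in the parameter types.

```python
from decimal import Decimal
from typing import Optional


def min_price(price: Optional[Decimal],
              discount: Optional[Decimal]) -> Optional[Decimal]:
    """Return the price after deducting a discount given in percent."""
    if price is None or discount is None:
        return None  # assumption: a missing argument gives an empty result
    disc = (price * discount) / Decimal(100)  # let $disc := ($p * $d) div 100
    return price - disc                       # return ($p - $disc)


print(min_price(Decimal("29.99"), Decimal("10")))
```

With a price of 29.99 and a discount of 10, the deduction is 2.999, so the discounted price is 26.991.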

38
41
Function type
Example
Comment
accessor
fn:base-uri(node)
returns the value of the base–uri property of the specified node
error and trace
fn:trace(value, label)
used to debug queries
numeric
fn:round(num)
rounds the number argument to the nearest integer
string
fn:concat(string, string, …)
returns the concatenation of the strings
anyUri
fn:resolve-uri(relative, base)
takes a base URI and a relative URI as arguments, and constructs an absolute URI
Boolean
fn:not(arg)
logical not
duration/date/ time
fn:dateTime(date,time)
converts the arguments to a date and a time
QName
fn:QName(uri, name)
takes a namespace URI and a qualified name as arguments, and constructs a QName value
node
fn:root(node)
returns the root of the tree to which the specified node belongs.
sequence
fn:reverse((item, item, …))
returns the reversed order of the items specified
context
fn:position()
returns the index position of the node that is currently being processed 39
JSON
(JavaScript Object Notation)
CMT207 Information modelling & database systems
1
History
􏰁 JavaScript is a high–level, dynamic, untyped and interpreted programming language
􏰁 alongside HTML and CSS, JavaScript is one of the three essential technologies of the Web
􏰁 programmers need an easy way to transfer data on the Web
􏰁 JSON format is syntactically identical to the code for creating JavaScript objects
􏰁 instead of using a parser (like XML does), JavaScript can use standard functions to convert JSON data into native objects, e.g. var json = JSON.parse(text); (json … object, text … string)
4
Lecture
􏰁 in the previous two lectures we learnt about XML, a markup language used to structure data
􏰁 we pointed out that the XML format is “fat” and briefly mentioned JSON as a “slim” format that does the same job
JSON string
{
  "name": "David Jones",
  "age": 23,
  "address": {
    "streetAddress": "5 The Parade",
    "city": "Cardiff"
  },
  "phoneNumber": [
    {
      "type": "home",
      "number": "029 1234 5678"
    },
    {
      "type": "mobile",
      "number": "077 8765 4321"
    }
  ]
}
each entry is a key: value pair; "name" has a string value and "age" a number value; { and } start and end an object; [ and ] start and end an array
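The same JSON string can be read outside JavaScript as well. A minimal sketch in Python, assuming the standard `json` module: `json.loads` plays the role of `JSON.parse`, and the variable names are illustrative.

```python
import json

text = """
{
  "name": "David Jones",
  "age": 23,
  "address": {"streetAddress": "5 The Parade", "city": "Cardiff"},
  "phoneNumber": [
    {"type": "home", "number": "029 1234 5678"},
    {"type": "mobile", "number": "077 8765 4321"}
  ]
}
"""

person = json.loads(text)  # JSON object -> dict, JSON array -> list

print(person["name"])                      # David Jones
print(person["address"]["streetAddress"])  # 5 The Parade
print(person["address"]["city"])           # Cardiff
print(person["phoneNumber"][0]["number"])  # 029 1234 5678
print(person["phoneNumber"][1]["type"])    # mobile
```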
5
job
􏰁 in this lecture we will learn more about JSON
2
JSON
􏰁 JSON = JavaScript Object Notation
􏰁 pronounced like the name Jason
􏰁 JSON is a syntax for storing and exchanging data
􏰁 text–based
􏰁 light–weight
􏰁 human readable
􏰁 language independent
􏰁 JSON is an open standard specified in RFC 4627
􏰁 https://www.ietf.org/rfc/rfc4627.txt
3
JSON object in JavaScript

􏰁 try it online here
// David Jones // 5 The Parade // Cardiff
// 029 1234 5678 // mobile
6
1

JSON and JavaScript
􏰁 JSON is considered a subset of JavaScript
􏰁 … but that does not mean that JSON cannot be used with other languages
􏰁 JSON uses JavaScript syntax, but the JSON format is text only… just like XML
􏰁 JSON is language independent
􏰁 it works well with most of the modern programming languages
􏰁 e.g. PHP, Perl, Python, Ruby, Java and many more
JSON on the Web
􏰁 serialization is the process of converting an object into a format suitable to be stored in a file or memory buffer and/or transmitted
􏰁 JSON is often used to serialize and transfer data over a network connection
􏰁 e.g. between a web server and a web application
􏰁 note: XML serves the same purpose!
􏰁 Web services and APIs use JSON format to provide public data
􏰁 e.g. Flickr and Twitter
JSON data
􏰁 JSON data is written as name/value pairs
􏰁 a name/value pair consists of:
1. field name (in double quotes)
2. colon
3. value
􏰁 e.g. "firstName":"John" ("firstName" … name, : … colon, "John" … value)
JSON data
􏰁 JSON values can be of the following data types:
Type … Description … Example
number … double–precision floating–point format in JavaScript … {"marks": 97}
string … double–quoted Unicode with backslash escaping … {"name": "John"}
Boolean … true or false … {"name": "John", "marks": 97, "distinction": true}
object … an unordered collection of key:value pairs … {"name": "John", "marks": 97, "distinction": true}
array … an ordered sequence of values … { "books": [ {"title":"Game"}, {"title":"Set"}, {"title":"Match"} ] }
null … empty
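To see the JSON data types from the consumer's side, the sketch below parses one document with Python's standard `json` module and prints the native type each value becomes. The field names "average" and "supervisor" are made up for illustration.

```python
import json

text = (
    '{"marks": 97, "average": 68.5, "name": "John", "distinction": true, '
    '"books": [{"title": "Game"}, {"title": "Set"}, {"title": "Match"}], '
    '"supervisor": null}'
)

doc = json.loads(text)  # the outer JSON object becomes a dict

for key, value in doc.items():
    print(key, "->", type(value).__name__)

# Serialization goes the other way: native values back to JSON text.
print(json.dumps({"distinction": True, "supervisor": None}))
```

Note that Python distinguishes integers from floating-point numbers, whereas JSON has a single number type.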
JSON syntax
􏰁 JSON syntax is derived from JavaScript object notation syntax:
􏰁 data is in name/value pairs
􏰁 data is separated by commas 􏰁 curly braces hold objects
􏰁 square brackets hold arrays
“name”:”value” ,
{ object }
[ array ]
9
JSON data
􏰁 JSON objects are written inside curly braces
􏰁 just like JavaScript, JSON objects can contain multiple
name/values pairs, e.g.
{"firstName":"John", "lastName":"Doe"}
􏰁 JSON arrays are written inside square brackets
􏰁 just like JavaScript, a JSON array can contain multiple objects, e.g.
"employees":[
{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"},
{"firstName":"Peter","lastName":"Jones"}
]
JavaScript
􏰁 JSON syntax is derived from JavaScript object notation
􏰁 in JavaScript, an array of objects can be created like this:
var employees = [
{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"},
{"firstName":"Peter","lastName":"Jones"}
];
􏰁 an element of the JavaScript object array can be accessed like this:
employees[0].firstName + " " + employees[0].lastName;
􏰁 or
employees[0]["firstName"] + " " + employees[0]["lastName"];
13
JSON within JavaScript
􏰁 the JavaScript function JSON.parse() can be used to convert a JSON string into a JavaScript object:
var obj = JSON.parse(text);
􏰁 the new JavaScript object can now be used in the web page, e.g.


16
JavaScript
􏰁 an element of the JavaScript object array can be modified like this:
employees[0].firstName = "Gilbert";
􏰁 or
JSON Schema
employees[0]["firstName"] = "Gilbert";
􏰁 result:
var employees = [
{"firstName":"Gilbert", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"},
{"firstName":"Peter","lastName":"Jones"}
];
14
JSON within JavaScript
􏰁 JSON syntax is derived from JavaScript object notation
􏰁 very little extra software is needed to work with JSON
within JavaScript
􏰁 JSON is commonly used to read data from a web server, and display the data in a web page
􏰁 for simplicity, we will demonstrate such use with a JSON string as input (instead of a file):
var text = '{ "employees" : [' +
'{ "firstName":"John" , "lastName":"Doe" },' +
'{ "firstName":"Anna" , "lastName":"Smith" },' +
'{ "firstName":"Peter", "lastName":"Jones" } ]}';
15
Example
schema
{
  "title": "Example Schema",
  "type": "object",
  "properties": {
    "firstName": {"type": "string"},
    "lastName": {"type": "string"},
    "age": {
      "description": "Age in years",
      "type": "integer",
      "minimum": 0
    }
  },
  "required": ["firstName", "lastName"]
}
JSON
{"firstName":"Peter","lastName":"Pan","age":12}
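To make concrete what validating data against the schema above involves, here is a deliberately small, hand-written checker in Python. It is a sketch only: it understands just the keywords used in the example (type, properties, required, minimum), the function names are hypothetical, and it is not a substitute for a real JSON Schema validator.

```python
from typing import Any, List

SCHEMA = {
    "title": "Example Schema",
    "type": "object",
    "properties": {
        "firstName": {"type": "string"},
        "lastName": {"type": "string"},
        "age": {"description": "Age in years", "type": "integer", "minimum": 0},
    },
    "required": ["firstName", "lastName"],
}

PYTHON_TYPES = {"object": dict, "array": list, "string": str,
                "boolean": bool, "null": type(None)}


def has_type(value: Any, name: str) -> bool:
    """Check a value against a JSON Schema type name (simplified)."""
    if name == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    if name == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return isinstance(value, PYTHON_TYPES[name])


def validate(value: Any, schema: dict) -> List[str]:
    """Return a list of error messages; an empty list means valid."""
    if "type" in schema and not has_type(value, schema["type"]):
        return [f"expected {schema['type']}"]
    errors = []
    if "minimum" in schema and has_type(value, "number") and value < schema["minimum"]:
        errors.append(f"value is below the minimum of {schema['minimum']}")
    if isinstance(value, dict):
        for name in schema.get("required", []):
            if name not in value:
                errors.append(f"missing required property {name!r}")
        for name, subschema in schema.get("properties", {}).items():
            if name in value:
                errors.extend(f"{name}: {e}" for e in validate(value[name], subschema))
    return errors


print(validate({"firstName": "Peter", "lastName": "Pan", "age": 12}, SCHEMA))
print(validate({"firstName": "Peter", "age": -1}, SCHEMA))
```

The first call reports no errors; the second reports the missing lastName and the negative age. The metadata keywords title and description are ignored, since they are not used for validation.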
3

Hello, World!
􏰁 in JSON Schema, an empty object is a valid schema that will accept any valid JSON, e.g. { }
􏰁 accepts any valid JSON, e.g.
Declaring a unique identifier
􏰁 it is also good practice to include an id property as a unique identifier for each schema, e.g.
{ "id": "http://yourdomain.com/schemas/myschema.json" }
3/12/2016
JSON Schema
􏰁 JSON Schema is a specification for JSON–based format for defining the structure of JSON data
􏰁 JSON Schema itself is written in JSON
􏰁 schema is data itself, not a computer program
􏰁 it is just a declarative format for “describing the structure of other data”
􏰁 JSON data can be validated against a schema using a computer program
􏰁 for documentation see: http://json-schema.org/
19
Declaring a JSON Schema
􏰁 JSON Schema is itself JSON
􏰁 it is not always easy to tell when something is JSON
Schema or just JSON
􏰁 the $schema keyword is used to declare that something is JSON Schema, e.g.
{ "$schema": "http://json-schema.org/schema#" }
􏰁 it is generally good practice to include it, though it is not required
22
20
23
The type keyword
􏰁 the most common thing to do in a JSON Schema is to
restrict to a specific type, e.g.
􏰁 only strings are accepted, e.g.
21
Metadata
􏰁 JSON Schema includes keywords: title, description and default
􏰁 not used for validation
􏰁 used to describe parts of a schema
􏰁 title will provide a short description
􏰁 description will provide a more lengthy explanation about the purpose of the data described by the schema
􏰁 neither are required, but they are encouraged as good practice
􏰁 default specifies a default value for an item
Enumerated values
􏰁 the enum keyword is used to restrict a value to a fixed set of values
􏰁 it must be an array with at least one element, where each element is unique, e.g.
25
Combining schemas
􏰁 keywords used to combine schemas are:
􏰁 allOf … must be valid against all sub–schemas
􏰁 anyOf … must be valid against any sub–schema
􏰁 oneOf … must be valid against exactly one of the sub–schemas
􏰁 these keywords must be set to an array, where each item is a schema
􏰁 in addition, there is:
􏰁 not must not be valid against the given schema
28
Enumerated values
􏰁 enum can be used without a type, to accept values of different types, e.g.
JSON vs. XML
26
Combining schemas
􏰁 JSON schemas can be combined
􏰁 this does not necessarily mean combining schemas from multiple files
􏰁 it may be as simple as allowing data to be validated against multiple criteria
􏰁 anyOf is used to say that the given data may be valid against any of the given sub–schemas
􏰁 as long as a value validates
against any of the sub–schemas,
it is considered valid against the
entire combined schema 27
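The enum keyword and the combining keywords (allOf, anyOf, oneOf, not) reduce to simple Boolean logic over sub-schemas. The sketch below models that logic in Python; `matches` and `has_type` are hypothetical helpers that cover only these keywords plus type, not the full specification.

```python
from typing import Any

PYTHON_TYPES = {"string": str, "number": (int, float), "boolean": bool,
                "object": dict, "array": list, "null": type(None)}


def has_type(value: Any, name: str) -> bool:
    """Check a value against a JSON Schema type name (simplified)."""
    if name == "number" and isinstance(value, bool):
        return False  # Python treats bool as an int; JSON does not
    return isinstance(value, PYTHON_TYPES[name])


def matches(value: Any, schema: dict) -> bool:
    """Return True if the value is valid against the (simplified) schema."""
    if "type" in schema and not has_type(value, schema["type"]):
        return False
    if "enum" in schema and value not in schema["enum"]:
        return False
    if "allOf" in schema and not all(matches(value, s) for s in schema["allOf"]):
        return False
    if "anyOf" in schema and not any(matches(value, s) for s in schema["anyOf"]):
        return False
    if "oneOf" in schema and sum(matches(value, s) for s in schema["oneOf"]) != 1:
        return False
    if "not" in schema and matches(value, schema["not"]):
        return False
    return True


string_or_number = {"anyOf": [{"type": "string"}, {"type": "number"}]}
colours = {"type": "string", "enum": ["red", "amber", "green"]}
mixed = {"enum": ["red", 23, None]}  # enum without a type

print(matches("hello", string_or_number), matches({"a": 1}, string_or_number))
print(matches("red", colours), matches("blue", colours))
print(matches(None, mixed), matches(0, mixed))
```

A value passes anyOf as soon as one sub-schema accepts it, whereas oneOf additionally rejects values accepted by more than one sub-schema.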
JSON vs. XML
JSON
XML
{"students":[
{"name":"John", "age":"23", "city":"Cardiff"},
{"name":"Steve", "age":"28", "city":"Swansea"},
{"name":"Peter", "age":"32", "city":"Bristol"}
]}
notice how the use of an array removes the need for the nested “element”
John 23 Cardiff

Steve 28 Swansea


Peter 32 Bristol

JSON vs. XML
􏰁 similarities
􏰁 both are self–describing
(human–readable)
􏰁 both are hierarchical (values within values)
􏰁 both can be parsed and used by many programming languages
􏰁 both can be fetched with an XMLHttpRequest
􏰁 differences
􏰁 JSON does not use end tag
􏰁 JSON is shorter
􏰁 JSON is quicker to read and write
􏰁 JSON can use arrays 􏰁 biggest difference
􏰁 XML has to be parsed with an XML parser
􏰁 JSON can be parsed by a standard JavaScript
function
31
34
JSON vs. XML
􏰁 for AJAX applications, JSON is faster and easier than XML
􏰁 using XML
1. fetch an XML document
2. use the XML DOM to traverse through the document
3. extract values and store in variables
􏰁 using JSON
1. fetch a JSON string
2. parse the JSON string with JSON.parse()
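The two workflows can be compared side by side. The sketch below uses Python's standard library instead of browser JavaScript, with inline strings standing in for the fetched documents; the element names students, student, name and city are assumptions made for the illustration.

```python
import json
import xml.etree.ElementTree as ET

xml_text = (
    "<students>"
    "<student><name>John</name><city>Cardiff</city></student>"
    "<student><name>Steve</name><city>Swansea</city></student>"
    "</students>"
)
json_text = (
    '{"students": [{"name": "John", "city": "Cardiff"}, '
    '{"name": "Steve", "city": "Swansea"}]}'
)

# XML: parse the document, traverse the tree, extract values into variables.
root = ET.fromstring(xml_text)
xml_names = [student.findtext("name") for student in root.findall("student")]

# JSON: one parse call yields native lists and dictionaries.
json_names = [student["name"] for student in json.loads(json_text)["students"]]

print(xml_names)
print(json_names)
```

Both routes produce the same list of names; the JSON route simply skips the explicit tree traversal.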
32
JSON vs. XML
XML
JSON
there are several specifications to define schema for XML, e.g. DTD and XML Schema
JSON Schema does the same for JSON, but it is not as widely used
for selecting specific parts of an XML document, there is standard specification called XPath
JSONPath does the same for JSON, but is not as widely used
XML has XQuery specification for querying XML data
JSON has JAQL, JSONiq etc, but they are not as widely used
XML has XSLT specification, which may be used to apply style to an XML document
JSON does not have any such thing
NoSQL (Not Only SQL)
CMT207 Information modelling & database systems
1
Big data
􏰁 modern data collection technologies (social media, smartphones, sensors, etc.) act as force multipliers for data growth
􏰁 big data is a broad term for datasets so large or complex that traditional data processing applications (e.g. RDBMS) are inadequate
􏰁 a big data project is defined by 3V + C:
1. velocity … data is streamed at an unprecedented speed and must be dealt with in near–real time
2. variety … data in various formats: structured, semi–structured and unstructured
3. volume … data that involves many terabytes or petabytes
4. complexity … data coming from multiple sources need to be connected and correlated
4
Lecture
􏰁 in this module we learnt about different types of databases
􏰁 relational
􏰁 object–oriented 􏰁 object–relational
􏰁 in this lecture we will learn about the latest type(s) of databases
􏰁 NoSQL
Velocity
􏰁 How fast is the data produced/processed?
􏰁 gathering data quickly is of no benefit if we analyse it only once a week
􏰁 real–time analytics is about using very current data to provide information that will help improve a service or respond to demand swiftly
Variety
􏰁 data comes in all types of formats
􏰁 from structured, numeric data in traditional databases …
􏰁 … to unstructured text documents, email, video, audio, stock ticker data and financial transactions
Volume
􏰁 How much data?
Complexity
􏰁 today’s data comes from multiple sources in a variety of formats
􏰁 this makes it difficult to link, match, cleanse and transform data across systems
􏰁 however, it is necessary to connect and correlate relationships, hierarchies and multiple data linkages
􏰁 otherwise, data can quickly spiral out of control
[figure: data volume scale, from MB through GB and TB to PB]
What is NoSQL?
􏰁 Not Only SQL databases, AKA cloud databases, non–relational databases, Big Data databases …
􏰁 a non–relational and largely distributed database system that enables rapid, ad–hoc organisation and analysis of extremely high–volume, disparate data types
􏰁 developed in response to the sheer volume of data being generated, stored and analysed by modern users and their applications
􏰁 NoSQL databases have become the first alternative to relational databases, with scalability, availability and fault tolerance being key deciding factors
Why NoSQL?
􏰁 NoSQL technology was originally created and used by Internet leaders such as Facebook, Google, Amazon and others
􏰁 they required a DBMS that could write and read data anywhere in the world
􏰁 … while scaling and delivering performance across massive data sets and millions of users
􏰁 today, almost every organisation has the need to deliver applications that utilise Web, mobile and IoT technologies
Why NoSQL?
􏰁 impedance mismatch between the relational data structures and the in–memory data structures of the application
􏰁 with NoSQL databases, developers do not have to convert in–memory structures to relational structures
Why NoSQL?
􏰁 moving away from using databases as integration points
􏰁 encapsulating databases with applications and integrating using services instead
􏰁 the rise of the web as a platform brought a major change in data storage
􏰁 need to support large volumes of data by running on clusters
􏰁 relational databases were not designed to run efficiently on clusters
NoSQL databases
􏰁 schema–less data model
􏰁 horizontal scalability
􏰁 distributed architectures
􏰁 the use of languages and interfaces that are “not only” SQL
Aggregate data model
􏰁 relational data modelling is very different from the types of data structures that application developers use
􏰁 movement away from relational modelling and towards aggregate models
􏰁 an aggregate is a collection of data that we interact with as a unit
􏰁 the unit of data can reside on any machine and, when retrieved, brings all its related data along with it
􏰁 aggregates make it easier for the database to manage data storage over clusters
􏰁 … but aggregate–oriented databases make inter–aggregate relationships more difficult to handle
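To make the contrast concrete, here is a minimal Python sketch of the same order held relationally (rows in two separate lists that must be joined) and as a single aggregate. The order data, field names and helper functions are all invented for illustration.

```python
# Relational view: one order is spread over two "tables" (lists of rows).
orders = [{"order_id": 1, "customer_id": 7}]
order_lines = [
    {"order_id": 1, "product": "keyboard", "quantity": 1, "price": 30.0},
    {"order_id": 1, "product": "mouse", "quantity": 2, "price": 10.0},
]


def load_order_relational(order_id):
    """Reassemble the order by joining rows from both tables."""
    order = next(o for o in orders if o["order_id"] == order_id)
    lines = [line for line in order_lines if line["order_id"] == order_id]
    return {"order": order, "lines": lines}


# Aggregate view: the same information is stored and retrieved as one unit.
order_aggregates = {
    1: {
        "customer_id": 7,
        "lines": [
            {"product": "keyboard", "quantity": 1, "price": 30.0},
            {"product": "mouse", "quantity": 2, "price": 10.0},
        ],
    }
}


def order_total(aggregate):
    """Everything needed for the total travels with the aggregate."""
    return sum(line["quantity"] * line["price"] for line in aggregate["lines"])


print(order_total(order_aggregates[1]))  # prints 50.0
```

Because the whole order is one value under one key, it can be placed on any machine in a cluster without splitting related rows across servers. A question that cuts across aggregates, such as "which orders contain a mouse?", would have to inspect every aggregate.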
Distribution models
􏰁 aggregate–oriented databases make distribution of data easier, since all related data is contained in the aggregate
􏰁 the distribution mechanism only has to move the aggregate and does not have to worry about related data
􏰁 two styles of distributing data:
1. sharding distributes different data across multiple servers, so each server acts as the single source for a subset of data
2. replication copies data across multiple servers, so each bit of data can be found in multiple places
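The two styles can be contrasted with a small in-memory Python sketch. The class names are hypothetical, each "server" is just a dictionary, and routing keys by hash is an assumption (the slides do not say how data is assigned to servers). Real systems also replicate asynchronously and handle failures, which is omitted here.

```python
import hashlib


class ShardedStore:
    """Sharding: each key lives on exactly one server."""

    def __init__(self, n_servers):
        self.servers = [{} for _ in range(n_servers)]

    def _shard_for(self, key):
        # Assumed routing rule: a stable hash of the key picks the server.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.servers)

    def put(self, key, value):
        self.servers[self._shard_for(key)][key] = value

    def get(self, key):
        return self.servers[self._shard_for(key)].get(key)


class ReplicatedStore:
    """Replication: every server holds a copy of every item."""

    def __init__(self, n_servers):
        self.servers = [{} for _ in range(n_servers)]

    def put(self, key, value):
        for server in self.servers:
            server[key] = value

    def get(self, key, server_index=0):
        # Any server can answer a read.
        return self.servers[server_index].get(key)


sharded = ShardedStore(3)
sharded.put("user:1", "Ada")
sharded_copies = sum("user:1" in server for server in sharded.servers)

replicated = ReplicatedStore(3)
replicated.put("user:1", "Ada")
replicated_copies = sum("user:1" in server for server in replicated.servers)

print(sharded_copies, replicated_copies)  # prints 1 3
```

Sharding spreads the load because each server only holds part of the data, while replication gives every read a choice of servers at the cost of keeping the copies in step.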
Distribution model: sharding
Distribution model: replication
􏰁 data are copied across multiple servers
􏰁 two forms of replication:
1. master–slave replication makes one node the authoritative copy that handles writes while slaves synchronise with the master and may handle reads
2. peer–to–peer replication allows writes to any node; the nodes coordinate to synchronise their copies
􏰁 master–slave replication reduces the chance of update conflicts
􏰁 peer–to–peer replication avoids loading all writes onto a single server, which would create a single point of failure
Master–slave replication
Peer–to–peer replication
Types of NoSQL databases
1. key–value store – all data consists of an indexed key and a value
2. document database – expands on the basic idea of key–value stores
􏰁 documents contain more complex data and each document is assigned a unique key
3. column family store – store data tables as sections of columns of data, rather than rows of data
4. graph database – designed for data whose relations are well represented as a graph
Key–value store
􏰁 simplest NoSQL database to use from an API perspective
􏰁 store data in a schema–less way
􏰁 the value is a blob that is just stored, without caring what’s inside; it is the responsibility of the application to understand what was stored
􏰁 the client can either get the value for the key, put a value for a key or delete a key from the store
􏰁 key–value stores always use primary–key access, so they generally have great performance and can be scaled easily
􏰁 e.g. Cassandra, Memcached, Berkeley DB, Amazon DynamoDB, Couchbase
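The get/put/delete interface described above is small enough to sketch in a few lines of Python. `KeyValueStore` is a hypothetical in-memory stand-in, not the API of any product listed; the point is that the store treats the value as opaque bytes and only the application knows that it is JSON.

```python
import json


class KeyValueStore:
    """A toy key-value store: values are opaque blobs looked up by key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The store never inspects the value; it simply keeps the bytes.
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()

# The application decides how to encode the value (JSON is an assumption here).
session = {"user": "ada", "basket": ["keyboard", "mouse"]}
store.put("session:42", json.dumps(session).encode("utf-8"))

# Only the application knows how to interpret what comes back.
restored = json.loads(store.get("session:42").decode("utf-8"))
print(restored["basket"])  # prints ['keyboard', 'mouse']

store.delete("session:42")
print(store.get("session:42"))  # prints None
```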
Column–family store
􏰁 AKA column store or wide–column store
􏰁 column family is a group of related data that is often accessed together
􏰁 e.g. for a customer, we would often access their profile information at the same time, but not their orders
􏰁 column–family databases store data in column families as rows that have many columns associated with a row key
􏰁 very high performance and a highly scalable architecture
􏰁 e.g. Cassandra, HBase, HyperTable, Amazon DynamoDB
Document database
􏰁 document databases expand on the basic idea of key–value stores by storing documents in the value part
􏰁 they store, retrieve & manage documents
􏰁 semi–structured data
􏰁 e.g. XML, JSON, BSON, etc.
􏰁 self–describing, hierarchical tree data structures which can consist of maps, collections and scalar values
􏰁 e.g. MongoDB, CouchDB, Terrastore, OrientDB, RavenDB
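As a rough illustration of how this differs from a plain key–value store, the Python sketch below keeps JSON-like documents under unique keys and, unlike a key–value store, can look inside them. `DocumentStore` and its methods are hypothetical and do not mirror the API of any product named above.

```python
import json


class DocumentStore:
    """A toy document store: each key maps to a self-describing document."""

    def __init__(self):
        self._docs = {}

    def insert(self, key, document):
        # A JSON round trip mimics serialisation and stores an independent copy.
        self._docs[key] = json.loads(json.dumps(document))

    def get(self, key):
        return self._docs.get(key)

    def find(self, field, value):
        """Return the keys of documents whose top-level field equals value."""
        return [key for key, doc in self._docs.items() if doc.get(field) == value]


db = DocumentStore()

# Documents are hierarchical and need not share the same fields.
db.insert("c1", {"name": "Ada", "city": "Cardiff",
                 "orders": [{"product": "keyboard", "quantity": 1}]})
db.insert("c2", {"name": "Bob", "city": "Bristol", "phone": "555-0100"})
db.insert("c3", {"name": "Cat", "city": "Cardiff"})

print(db.find("city", "Cardiff"))  # prints ['c1', 'c3']
print(db.get("c1")["orders"][0]["product"])  # prints keyboard
```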
Column–family store vs. relational database
􏰁 similarity: each column family corresponds to a container of rows in a table where the key identifies the row and the row consists of multiple columns
􏰁 difference: various rows do not have to have the same columns, and columns can be added to any row at any time without having to add them to other rows
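A minimal Python sketch of this layout, assuming nested dictionaries of the form family → row key → columns. `ColumnFamilyStore` and the customer data are made up for the example; real column–family stores add persistence, ordering and distribution on top of this idea.

```python
from collections import defaultdict


class ColumnFamilyStore:
    """A toy column-family store: family -> row key -> {column: value}."""

    def __init__(self):
        self._families = defaultdict(lambda: defaultdict(dict))

    def put(self, family, row_key, column, value):
        # A column can be added to one row without touching any other row.
        self._families[family][row_key][column] = value

    def get_row(self, family, row_key):
        """Fetch all columns of one row within a single column family."""
        return dict(self._families[family][row_key])


store = ColumnFamilyStore()

# Profile data is kept in one family, orders in another.
store.put("profile", "c1", "name", "Ada")
store.put("profile", "c1", "email", "ada@example.org")
store.put("profile", "c2", "name", "Bob")  # row c2 has no email column
store.put("orders", "c1", "order:1001", "keyboard")

print(store.get_row("profile", "c1"))  # prints {'name': 'Ada', 'email': 'ada@example.org'}
print(store.get_row("profile", "c2"))  # prints {'name': 'Bob'}
```

Reading the profile of customer `c1` touches only the `profile` family, so the order data is never loaded.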
Graph database
􏰁 graph databases store entities and relationships between these entities
􏰁 entities are also known as nodes, which have properties
􏰁 relationships are known as edges, which can also have properties
􏰁 edges have directional significance
􏰁 the organisation of the graph lets the data be stored once and then interpreted in different ways based on relationships
􏰁 e.g. Neo4j, InfiniteGraph, OrientDB
NoSQL vs. SQL summary
Types
SQL databases: One type with minor variations
NoSQL databases: Many different types
Development history
SQL databases: Developed in the 1970s to deal with the first wave of data storage applications
NoSQL databases: Developed in the late 2000s to deal with limitations of SQL databases, especially scalability, multi-structured data, geo-distribution and agile development
Examples
SQL databases: MySQL, PostgreSQL, Microsoft SQL Server, Oracle
NoSQL databases: MongoDB, Cassandra, HBase, Neo4j
Data storage models
SQL databases: Related data are stored in separate tables, and then joined together when more complex queries are executed.
NoSQL databases: Varies based on database type. Key-value stores function similarly to SQL databases, but have only two columns (key & value). Document databases store all relevant data together in a single document, e.g. in JSON or XML, which can nest values hierarchically.
NoSQL vs. SQL summary (cont.)
Schemas
SQL databases: Structure and data types are fixed in advance.
NoSQL databases: Typically dynamic. Applications can add new fields on the fly, and unlike SQL table rows, dissimilar data can be stored together as necessary.
Scaling
SQL databases: Vertically, meaning a single server must be made increasingly powerful in order to deal with increased demand.
NoSQL databases: Horizontally, meaning that to add capacity, a database administrator can simply add more commodity servers or cloud instances. The database automatically spreads data across servers as necessary.
Development model
SQL databases: Mix of open-source (e.g. PostgreSQL, MySQL) and closed-source (e.g. Oracle)
NoSQL databases: Open-source
Supports transactions
SQL databases: Yes, updates can be configured to complete entirely or not at all
NoSQL databases: In certain circumstances and at certain levels (e.g. document level vs. database level)
Data manipulation
SQL databases: Specific language (SQL) using SELECT, INSERT and UPDATE statements
NoSQL databases: Through object-oriented APIs
Consistency
SQL databases: Can be configured for strong consistency
NoSQL databases: Depends on product. Some provide strong consistency (e.g. MongoDB) whereas others offer eventual consistency (e.g. Cassandra).
Graph database
􏰁 most of the value of graph databases comes from the relationships and their properties
􏰁 relationships are first–class citizens in graph databases
􏰁 there is no limit to the number and types of relationships a node can have
􏰁 relationships have a type, a start node and an end node, and can also have properties of their own
􏰁 these properties can be used to query the graph
􏰁 these properties can be used to query the graph
􏰁 e.g. when did they become friends, what is the distance between the nodes, what aspects are shared between the nodes, etc.
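The idea of querying by relationship properties can be sketched with a tiny in-memory property graph in Python. The `Graph` class, the `FRIEND` relationship type and the `since` property are all invented for illustration; a real graph database would offer a query language and indexes instead of a list scan.

```python
class Graph:
    """A toy property graph: nodes and directed, typed edges with properties."""

    def __init__(self):
        self.nodes = {}  # node id -> properties
        self.edges = []  # (start node, type, end node, properties)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, start, rel_type, end, **properties):
        self.edges.append((start, rel_type, end, properties))

    def outgoing(self, start, rel_type):
        """Follow edges of one type that leave the given start node."""
        return [(end, props) for s, t, end, props in self.edges
                if s == start and t == rel_type]


g = Graph()
g.add_node("anna", name="Anna")
g.add_node("bob", name="Bob")
g.add_node("carl", name="Carl")

# Each relationship has a type, a start node, an end node and its own properties.
g.add_edge("anna", "FRIEND", "bob", since=2010)
g.add_edge("anna", "FRIEND", "carl", since=2015)

# "When did they become friends?" is answered from the edge properties.
old_friends = [end for end, props in g.outgoing("anna", "FRIEND")
               if props["since"] < 2012]
print(old_friends)  # prints ['bob']
```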
NoSQL vs. SQL databases
􏰁 designed to support different application requirements
􏰁 they typically co-exist in most enterprises
􏰁 it is not a question of either … or!
NoSQL vs. SQL databases
􏰁 key decision points on when to use which:
Use SQL when you need/have…
Centralised applications (e.g. business management)
Moderate to high availability
Moderate velocity data
Data coming in from one/few locations
Primarily structured data
Complex/nested transactions
Primary concern is scaling reads
Philosophy of scaling up for more users/data
To maintain moderate data volumes with purge
Use NoSQL when you need/have…
Decentralised applications (e.g. Web, mobile and IoT)
Continuous availability; no downtime
High velocity data (devices, sensors, etc.)
Data coming in from many locations
Structured, with semi/unstructured
Simple transactions
Concern is to scale both writes and reads
Philosophy of scaling out for more users/data
To maintain high data volumes; retain forever
A history of databases in No–tation
􏰁 1970: NoSQL = We have no SQL
􏰁 1980: NoSQL = Know SQL
􏰁 2000: NoSQL = No SQL!
􏰁 2005: NoSQL = Not only SQL
􏰁 2013: NoSQL = No, SQL!