程序代写代做代考 database SQL hbase Excel Java flex javascript COMP5338 – Advanced Data Models

COMP5338 – Advanced Data Models

Dr. Ying Zhou
School of Information Technologies

COMP5338 – Advanced Data Models
Week 2: Document Store: Data Model and Simple Query

Administrative

 Most labs are not full at the moment
 If you wish to move lab but cannot do it online, please go to the lab you want

to attend and let the tutor know
 Students allocated in SIT118

 If you wish to attend Wednesday labs, please attend SIT116 if your sid ends
with even number and SIT117 if your sid ends with odd number

 You may attend one of the Tuesday evening labs as well.

COMP5338 “Advanced Data Models” – 2018 ( Y. Zhou) 02 -2

Time Room Capacity Tutor
Tue 8-9pm SIT114 30 Dai
Tue 8-9pm SIT115 30 Andrian
Tue 8-9pm SIT117 20 Heming (Taurus)
Tue 8-9pm SIT118 20 Chenhao
Wed 5-6pm SIT116 20 Givanna
Wed 5-6pm SIT117 20 Heming(Taurus)
Wed 5-6pm SIT118 20 Will be closed

 Lab arrangement

COMP5338 “Advanced Data Models” – 2018 ( Y. Zhou)

Outline
 Overview of Document Databases

 MongoDB Data Model

 MongoDB CRUD Operations

02 -3

Structured and Unstructured Data
 Relational Database System is designed to store

structured data in tabular format, e.g. each pieces of data
is stored in a predefined field (attribute)

 Unstructured data does not follow any predefined “model”
or “format” that is aware to the underlying system .
Examples include data stored in various files, e.g word
document

COMP5338 “Advanced Data Models” – 2018 ( Y. Zhou) 02 -4

8703 Heinz 0293514287

8731 Edgell 0378301294

8927 Kraft 0299412020

9031 CSR 0720977632

Supplier Table:
SuppID Name Phone

Semi-structured Data
 Many data have some structure but should not be

constrained by a predefined and rigid schema
 E.g. if some suppliers have multiple phone numbers, it is hard to

capture such information in a relational model effectively
 Self-describing capability is the key characteristics of semi-

structured data
 schema/structure is an integral part of the data, instead of a separate

declaration
 in database system, the structure is “declared” when you create a

table. All rows need to follow the structure
 in CSV and Excel, the structure is “declared” in the header row. All

subsequent rows are supposed to follow that
 XML and JSON are two types of semi-structured data

COMP5338 “Advanced Data Models” – 2018 ( Y. Zhou) 02 -5

A Self-describing XML document

COMP5338 “Advanced Data Models” – 2018 ( Y. Zhou) 02 -6


1

John

Sydney

123
1

metadata/structure information data

Another invoice with slightly different structure

2

John

Sydney

12345678

123
1

456
2

COMP5338 “Advanced Data Models” – 2018 ( Y. Zhou) 02 -7

JSON Data Format
 JSON (JavaScript Object Notation) is a simple way to

represent JavaScript objects as strings.
 There are many tools to serialize objects in other programming

language as JSON
 JSON was introduced in 1999 as an alternative to XML for

data exchange.
 Each JSON object is represented as a list of property

names and values contained in curly braces, in the following
format:

{ propertyName1 : value1, propertyName2 : value2 }

 Arrays are represented in JSON with square brackets in the
following format:

[ value1, value2, value3 ]

COMP5338 “Advanced Data Models” – 2018 ( Y. Zhou) 02 -8

JSON format example

COMP5338 “Advanced Data Models” – 2018 ( Y. Zhou) 02 -9

Invoice _1= {
order-id: 1,
customer: {name: “John”, address: “Sydney”},
products:[ { code: “123”, quantity: 1}]

}

Invoice _3= {
order_id: 3,
customer: {name: “Smith”,

address: “Melbourne”,
contact: “12345”},

products: [{ code: “123”, quantity: 20},
{ code: “456”, quantity:2}]

delivery: “express”
}

Document Databases
 Document database stores data in semi-structured

documents
 Document structure is flexible

 Provide own query syntax (different to standard SQL)
 Usually has powerful index support
 Examples:

 XML based database
 JSON based database: MongoDB, CouchDB

COMP5338 “Advanced Data Models” – 2018 ( Y. Zhou) 02 -10

Outline
 Overview of Document Databases

 MongoDB Data Model

 MongoDB CRUD operations

COMP5338 “Advanced Data Models” – 2018 ( Y. Zhou) 02 -11

Matching Terms in SQL and MongoDB

COMP5338 “Advanced Data Models” – 2018 ( Y. Zhou)

SQL MongoDB
Database Database
Table Collection
Index Index
Row BSON document
Column BSON field
Primary key _id field
Join Embedding and referencing

$lookup in aggregation (since
3.2)

02 -12

MongoDB Document Model

TFN Name Email age

12345 Joe Smith joe@gmail.com 30

54321 Mary Sharp mary@gmail.com 27

{ _id: 12345,
name: “Joe Smith”,
email: “joe@gmail.com”,
age: 30

}
{ _id: 54321,

name: “Mary Sharp”,
email: “mary@gmail.com”,
age: 27

}

COMP5338 “Advanced Data Models” – 2018 ( Y. Zhou)

users table in RDBMS

users collection in MongoDB

02-13

two rows

two documents

Column name is part of schema

Field name is part
of data

Repeated in every
document

Native Support for Array

COMP5338 “Advanced Data Models” – 2018 ( Y. Zhou)

{ _id: 12345,
name: “Joe Smith”,
emails: [“joe@gmail.com”, “joe@ibm.com”],
age: 30

}
{ _id: 54321,

name: “Mary Sharp”,
email: “mary@gmail.com”,
age: 27

}

02 -14

TFN Name Email age

12345 Joe Smith joe@gmail.com 30

54321 Mary Sharp mary@gmail.com 27
, joe@ibm.com ??

Native Support for Embedded
Document

COMP5338 “Advanced Data Models” – 2018 ( Y. Zhou)

{ _id: 12345,
name: “Joe Smith”,
email: [“joe@gmail.com”, “joe@ibm.com”],
age: 30

}

{ _id: 54321,
name: “Mary Sharp”,
email: “mary@gmail.com”,
age: 27,
address: { number: 1,

name: “cleveland street”,
suburb: “chippendale”,
zip: 2008

}
}

02 -15

TFN Name Email age address

12345 Joe Smith joe@gmail.com 30

54321 Mary Sharp mary@gmail.com 27 1 cleveland street,
chippendale, NSW
2008

MongoDB data types
 Primitive types

 String, integer, boolean (true/false), double, null
 Predefined special types

 Date, object id, binary data, regular expression, timestamp, and a
few more

 DB Drivers implement them in language-specific way
 The interactive shell provides constructors for all

 ISODate(“2012-09-11 18:00:00”)

 Array and object
 Field name is of string type with certain restrictions

 “_id” is reserved for primary key
 cannot start with “$”, cannot contain “.” or null

COMP5338 “Advanced Data Models” – 2018 ( Y. Zhou) 02 -16

http://docs.mongodb.org/manual/reference/bson-types/

http://docs.mongodb.org/manual/reference/bson-types/

Data Modelling
 Key design decision in MongoDB data modelling involves

how to represents relationship between data
 How many collections should we use
What is the rough document structure in each collection

 Embedding or Referencing
Which object should have its own Collection

 And reference the id in other collection
Which object can be embedded in other object

COMP5338 “Advanced Data Models” – 2018 ( Y. Zhou)
http://www.mongodb.org/display/DOCS/Schema+Design

02 -17

http://www.mongodb.org/display/DOCS/Schema+Design

Referencing
 References store the relationships between data by

including links or references from one document to another.

COMP5338 “Advanced Data Models” – 2018 ( Y. Zhou) 02 -18

Embedding
 Embedded documents capture relationships between data

by storing related data in a single document structure.

COMP5338 “Advanced Data Models” – 2018 ( Y. Zhou) 02 -19

_id is not required

“Schema” Design Example
 A fully normalized relational

model would have the
following tables:
User
Post
Comment
PostLink

COMP5338 “Advanced Data Models” – 2018 ( Y. Zhou)

http://docs.mongodb.org/manual/applications/data-models/

02-20

http://docs.mongodb.org/manual/applications/data-models/

MongoDB schema design
 Using three collections

 User collection
 Post collection (with links to User, Comment, and Post itself)
 Comment Collection(with links to User)

 Using two collections
 User collection
 Post collection (with embedded Comment object and links to User and

Post itself

COMP5338 “Advanced Data Models” – 2018 ( Y. Zhou) 02-21

Two Collections Schema
 Two collections

 User collection
 Post collection (with embedded Comment object and links to User and

Post itself )

COMP5338 “Advanced Data Models” – 2018 ( Y. Zhou) 02 -22

{ _id: “p1”,
author: “u1” ,
title: “A nice day”,
date: 2012-09-10,
comments: [

{ author: “u2”,
content: “nice here too”,
date: 2012-09-11,

}
]
backlinks: [“p2”]

}

{ _id: “u1”,
name: “user1”,
password: “bq7e0dx…”,
email: “user1@gmail.com”

}

{ _id: “u2”,
name: “user2”,
password: “mb8xfv…”,
email: “user2@gmail.com”

}

{ _id: “p2”,
author: “u2”
title: “NoSQL is dead”,
date: 2012-09-11,
tags: [“MongoDB”, “HBase”],
comments: [

{ author: “u1”,
content: “nonsense”
date: 2012-09-11

}
]

}

User collection:
Post collection:

This post does not have tags, so no “tags” field
This post does not have links pointing to
it, so no “backlink” field

Each user profile is saved as a JSON document

An array of Comment objects
Tags and backlinks are stored as
array

Three Collections Schema

COMP5338 “Advanced Data Models” – 2018 ( Y. Zhou) 02 -23

 Three collections
 User collection
 Post collection (with links to User, Comment, and Post itself)
 Comment Collection(with links to User)

{ _id: “u1”,
name: “user1”,
password: “bq7e0dx…”,
email: “user1@gmail.com”

}

{ _id: “u2”,
name: “user2”,
password: “mb8xfv…”,
email: “user2@gmail.com”

}

User collection: Post collection:

{ _id: “p1”,
author: “u1” ,
title: “A nice day”,
date: 2012-09-10,
comments: [“c2”],
backlinks: [“p2”]

}

{ _id: “p2”,
author: “u2”
title: “NoSQL is dead”,
date: 2012-09-11,
tags: [“MongoDB”, “HBase”],
comments: [ “c1” ]

}

Comment collection:

{ _id: “c1”,
author: “u1” ,
content: “nonsense”,
date: 2012-09-11,

}

{ _id: “c2”,
author: “u2” ,
content: “nice here too”,
date: 2012-09-11,

}

Two Collections vs. Three Collections
 Which one is better?

 Hard to tell by schema itself, we need to look at the actual application to
understand
 Typical data feature

• What would happen if a post attracts lots of comments?
 Typical queries

• Do we want to show all comments when showing a post, or only the latest few, or not at
all?

• Do we need to produce statistics based on comment itself?
 Atomicity consideration

• Is there “all or nothing” update requirement with respect to post and comment

 Other design variation?
 In three collection schema, store post-comment link information in Comment collection

instead of Post collection?
 Embed the recent comments in Post?
 One User collection with embedded Post and Comment objects?
 One User collection with user, post and comment documents?

COMP5338 “Advanced Data Models” – 2018 ( Y. Zhou) 02 -24

General Schema Design Guideline
 Depends on data and intended use cases

 “independent” object should have its own collection
 composition relationship are generally modelled as embedded relation

 Eg. ShoppingOrder and LineItems, Polygon and Points belonging to it
 aggregation relationship are generally modelled as links (references)

 Eg. Department and Employee
 Many-to-Many relationship are generally modelled as links (references)

 Eg. Course and Students enrolled in a course
 If part-objects are always required when whole-object is queried, embed the

part-object
 We always want to display line items when displaying shopping order
 We always want to display Comments along with the blog Post;
 We always want to get Credit Card billing address when querying credit card

information;
 But we might not always want to get all students enrolled when querying about a

course.

COMP5338 “Advanced Data Models” – 2018 ( Y. Zhou) 02 -25

Course information page

COMP5338 “Advanced Data Models” – 2018 ( Y. Zhou) 02 -26

Outline
 Overview of Document Databases

 MongoDB Data Model

 MongoDB CRUD Operations

COMP5338 “Advanced Data Models” – 2018 ( Y. Zhou) 02 -27

MongoDB Queries
 In MongoDB, a read query targets a specific collection. It

specifies criteria, and may include a projection to specify
fields from the matching documents; it may include modifier
to limit, skip, or sort the results.

 A write query may create, update or delete data. One query
modifies the data of a single collection. Update and delete
query can specify query criteria

COMP5338 “Advanced Data Models” – 2018 ( Y. Zhou) 02 -28

http://docs.mongodb.org/manual/core/crud-introduction/

Read Operation Interface
 db.collection.find()

COMP5338 “Advanced Data Models” – 2018 ( Y. Zhou) 02 -29

Find at most 5 documents in the users collection with age field
greater than 18, return only the name and address field of
each document.

Read Query Example

COMP5338 “Advanced Data Models” – 2018 ( Y. Zhou) 02 -30

Find documents in the users collection with age field greater
than 18, sort the results in ascending order by age

Read Query Features
 Users can find data using any criteria in MongoDB

 Does not require indexing
 Indexing can improve performance (week 4)

 Query criteria are expressed as BSON document (query object)
 Individual condition is expressed using predefined selection operator, eg. $gt is the

operator for “greater than”

 Query projection are expressed as BSON document as well

COMP5338 “Advanced Data Models” – 2018 ( Y. Zhou)

SQL MongoDB Query in Shell
select * from user db.user.find() or db.user.find({})

select name, age from user db.user.find({},{name:1,age:1,_id:0})

select * from user
where name = “Joe Smith”

db.user.find({name: “Joe Smith”})

select * from user
where age < 30 db.user.find({age: {$lt:30}}) 02 -31 Querying Array field  MongoDB provide various features for querying array field  https://docs.mongodb.com/manual/tutorial/query-arrays/  The syntax are similar to querying simple type field  db.users.find({emails: “joe@gmail.com”})  Find user(s) whose email include “joe@gmail.com”.  db.users.find({“emails.0”: “joe@gmail.com”})  Find user(s) whose first email is “joe@gmail.com”.  db.users.find({emails: {$size:2}})  Find user(s) with 2 emails COMP5338 "Advanced Data Models" - 2018 ( Y. Zhou) { _id: 12345, name: “Joe Smith”, emails: [“joe@gmail.com”, “joe@ibm.com”], age: 30} { _id: 54321, name: “Mary Sharp”, email: “mary@gmail.com”, age: 27} 02 -32 Querying Embedded Document  Embedded Document can be queried as a whole, or by individual field, or by combination of individual fields  db.user.find({address: {number: 1, name: “pine street”, suburb: “chippendale”, zip: 2008}})  db.user.find({“address.suburb”: “chippendale”})  db.user.find({“address.name”: “pine street”, “address.suburb”: “chippendale”}) COMP5338 "Advanced Data Models" - 2018 ( Y. Zhou) { _id: 12345, name: “Joe Smith”, email: [“joe@gmail.com”, “joe@ibm.com”], age: 30, address: {number: 1, name: “pine street”, suburb: “chippendale”, zip: 2008 } } { _id: 54321, name: “Mary Sharp”, email: “mary@gmail.com”,age: 27, address: { number: 1, name: “cleveland street”,suburb: “chippendale”,zip: 2008 } } http://docs.mongodb.org/manual/tutorial/query-documents/#embedded-documents 02 -33 http://docs.mongodb.org/manual/tutorial/query-documents/ Write Query- Insert Operation COMP5338 "Advanced Data Models" - 2018 ( Y. Zhou) 02 -34 Insert a new document in users collection. Insert Example  db.user.insertOne({_id: 12345, name: “Joe Smith”, emails: [“joe@gmail.com”, “joe@ibm.com”],age: 30}) COMP5338 "Advanced Data Models" - 2018 ( Y. Zhou) user collection { _id: 12345, name: “Joe Smith”, emails: [“joe@gmail.com”, “joe@ibm.com”], age: 30 } { _id: 54321, name: “Mary Sharp”, email: “mary@gmail.com”, age: 27, address: { number: 1, name: “cleveland street”, suburb: “chippendale”, zip: 2008 } }  db.user.insertOne({ _id: 54321, name: “Mary Sharp”, email: “mary@gmail.com”, age: 27, address: { number: 1, name: “cleveland street”, suburb: “chippendale”, zip: 2008}}) 02 -35 Insert Behavior  If the collection does not exist, the operation will create one  If the new document does not contain an “_id” field, the system will adds an “_id” field and assign a unique value to it.  If the new document does contain an “_id” field, it should have a unique value  Two other operations:  insertMany  Insert many documents  Insert  Major language APIs only support insertOne and insertMany COMP5338 "Advanced Data Models" - 2018 ( Y. Zhou) 02 -36 Write Query – Update Operation COMP5338 "Advanced Data Models" - 2018 ( Y. Zhou) 02 -37 Has the same effect as the following SQL: Two other operations: updateOne, replaceOne Updates operators  Modifying simple field: $set, $unset  db.user.updateOne({_id: 12345}, {$set: {age: 29}})  db.user.updateOne({_id:54321}, {$unset: {email:1}}) // remove the field  Modifying array elements: $push, $pull, $pullAll  db.user.updateOne({_id: 12345}, {$push: {emails: “joe@hotmail.com”}})  db.user.updateOne({_id: 54321}, {$push: {emails: {$each: [“mary@gmail.com”, “mary@microsoft.com”]}}})  db.user.updateOne({_id: 12345}, {$pull: {emails: “joe@ibm.com”}}) COMP5338 "Advanced Data Models" - 2018 ( Y. Zhou) { _id: 12345, name: “Joe Smith”, emails: [“joe@gmail.com”, “joe@ibm.com”], age: 30} { _id: 54321, name: “Mary Sharp”, email: “mary@gmail.com”, age: 27} { _id: 12345, name: “Joe Smith”, emails: [“joe@gmail.com”, “joe@hotmail.com”], age: 29} { _id: 54321, name: “Mary Sharp”, emails: [“mary@gmail.com”, “mary@microsoft.com”] age: 27} http://www.mongodb.org/display/DOCS/Updating 02 -38 http://www.mongodb.org/display/DOCS/Updating Write Operation - Delete  db.user.deleteMany();  Remove all documents in user collection  db.user.deleteMany({age: {$gt:18}})  Remove all documents matching a certain condition  db.user.deleteOne({_id: 12345})  Remove one document matching a certain condition COMP5338 "Advanced Data Models" - 2018 ( Y. Zhou) 02 -39 Isolation of write operation  The modification of a single document is always atomic  It does not leave a document as partially updated.  A concurrent read will not see a partially updated document  This is true even if the operation modifies multiple embedded documents within a single document  Read Uncommitted  Concurrent read operation may see document that has been updated but not yet committed, or not durable  If a write operation is subsequently rolled back, a concurrent read may return the updated value before it is rolled back COMP5338 "Advanced Data Models" - 2018 ( Y. Zhou) 02 -40 Single Document Atomicity COMP5338 "Advanced Data Models" - 2018 ( Y. Zhou) 02 -41 db.inventory.insertMany( [ { item: "canvas", qty: 100, size: { h: 28, w: 35.5, uom: "cm" }, status: "A" }, { item: "journal", qty: 25, size: { h: 14, w: 21, uom: "cm" }, status: "A" }, { item: "paper", qty: 100, size: { h: 8.5, w: 11, uom: "in" }, status: "D" }]); db.inventory.find({item: "paper"}) db.inventory.updateOne( { item: "paper" }, { $set: { "size.uom": "cm", status: "P" } } ) { item: "paper", qty: 100, size: { h: 8.5, w: 11, uom: "in" }, status: "D" }]); { item: "paper", qty: 100, size: { h: 8.5, w: 11, uom: “cm" }, status: “P" }]); { item: "paper", qty: 100, size: { h: 8.5, w: 11, uom: “cm" }, status: “D" }]); Isolation of write operation  If a write operation modifies multiple documents (insertMany, updateMany, deleteMany), the operation as a whole is not atomic, and other operations may interleave.  Multi-Document Transactions is supported in version 4.0  Other mechanisms were used in earlier versions  The $isolated operator can prevents a write operation that affects multiple documents from yielding to other reads or writes once the first document is written  All those mechanisms have great performance impact and are recommended to avoid if possible, document embedding is recommended as an alternative COMP5338 "Advanced Data Models" - 2018 ( Y. Zhou) 02 -42 Write Operation – interleaving Scenario COMP5338 "Advanced Data Models" - 2018 ( Y. Zhou) 02 -43 {age: 21, status: “U”} {age: 23, status: “S”} {age: 17, status: “E”} {age: 25, status: “R”} {age: 15, status: “S”} {age: 16, status: “C”} {age: 19, status: “O”} {age: 22, status: “L”} db.users.updateMany( { age: { $gt: 18 } }, { $set: { status: “A” } } ) users collection {age: 21, status: “U”} {age: 23, status: “S”} {age: 25, status: “R”} {age: 19, status: “O”} {age: 22, status: “L”} {age: 21, status: “A”} {age: 23, status: “A”} {age: 25, status: “A”} {age: 19, status: “O”} {age: 22, status: “L”} db.users.find( { age: { $gt: 20 } } ) {age: 21, status: “A”} {age: 23, status: “A”} {age: 25, status: “A”} {age: 22, status: “L”} Read returned documents write is on going, a read query comes {age: 21, status: “A”} {age: 23, status: “A”} {age: 25, status: “A”} {age: 19, status: “A”} {age: 22, status: “A”} Write finishes A write query comes Write Operation – Isolation Scenario COMP5338 "Advanced Data Models" - 2018 ( Y. Zhou) 02 -44 {age: 21, status: “U”} {age: 23, status: “S”} {age: 17, status: “E”} {age: 25, status: “R”} {age: 15, status: “S”} {age: 16, status: “C”} {age: 19, status: “O”} {age: 22, status: “L”} users collection {age: 21, status: “U”} {age: 23, status: “S”} {age: 25, status: “R”} {age: 19, status: “O”} {age: 22, status: “L”} {age: 21, status: “A”} {age: 23, status: “A”} {age: 25, status: “A”} {age: 19, status: “O”} {age: 22, status: “L”} db.users.find( { age: { $gt: 20 } } ) {age: 21, status: “A”} {age: 23, status: “A”} {age: 25, status: “A”} {age: 22, status: “A”} Read has to wait write is on going, a read query comes {age: 21, status: “A”} {age: 23, status: “A”} {age: 25, status: “A”} {age: 19, status: “A”} {age: 22, status: “A”} Write finishes db.users.updateMany( { age: { $gt: 18 } }, { $set: { status: “A”, $isolated: 1 } } ) A write query comes Read returns the results References  MongoDB online documents: Mongo DB Data Models  http://docs.mongodb.org/manual/core/data-modeling-introduction/ MongoDB CRUD Operations  http://docs.mongodb.org/manual/core/crud-introduction/  Pramod J. Sadalage, Martin Fowler NoSQL distilled, Addison- Wesley Professional; 1 edition (August 18, 2012)  https://www.amazon.com/NoSQL-Distilled-Emerging-Polyglot- Persistence/dp/0321826620 COMP5338 "Advanced Data Models" - 2018 ( Y. Zhou) 02 -45 http://docs.mongodb.org/manual/core/data-modeling-introduction/ http://docs.mongodb.org/manual/core/crud-introduction/ https://www.amazon.com/NoSQL-Distilled-Emerging-Polyglot-Persistence/dp/0321826620 COMP5338 – Advanced Data Models Administrative Outline Structured and Unstructured Data Semi-structured Data A Self-describing XML document Another invoice with slightly different structure JSON Data Format JSON format example Document Databases Outline Matching Terms in SQL and MongoDB MongoDB Document Model Native Support for Array Native Support for Embedded Document MongoDB data types Data Modelling Referencing Embedding “Schema” Design Example MongoDB schema design Two Collections Schema Three Collections Schema Two Collections vs. Three Collections General Schema Design Guideline Course information page Outline MongoDB Queries Read Operation Interface Read Query Example Read Query Features Querying Array field Querying Embedded Document Write Query- Insert Operation Insert Example Insert Behavior Write Query – Update Operation Updates operators Write Operation - Delete Isolation of write operation Single Document Atomicity Isolation of write operation Write Operation – interleaving Scenario Write Operation – Isolation Scenario References