Microsoft PowerPoint – 22- NoSQL Introduction
© 2018 A. Alawini & A. Parameswaran
NoSQL Overview &
MongoDB Basics
Abdu Alawini
University of Illinois at Urbana-Champaign
CS411: Database Systems
November 25, 2018
1
© 2018 A. Alawini & A. Parameswaran
Announcements
•HW 5 will be posted today (due on 12/5)
•PT1 Stage 5 final demos 11/28 to 12/4
•The deadline for signing up is Tuesday 11/27.
•PT1 report and video: due on 12/3
•PT2 Stage 2 report and slides: due on 12/3
2
© 2018 A. Alawini & A. Parameswaran
Outline
•NoSQL Introduction
•Relational-NoSQL Trade-offs
•MongoDB
•Model and simple queries
•Join
•Aggregation and Map-reduce (next time)
3
© 2018 A. Alawini & A. Parameswaran
The evolution of data models
•Hierarchical (IBM IMS) – 60’s-70’s
•Network, CODASYL (Backman, IDS) – 60’s
•Relational – 70’s
•Object-relational (Stonebraker, et al) – 90’s
•OODBMS (Atkinson, et al) – 90’s
•Array databases (MonetDB, SciDB, …) – 90’s
•XML (document-oriented) – 2000’s
•NoSQL – 2010’s
•NewSQL – 2011-present
4
© 2018 A. Alawini & A. Parameswaran
Why NoSQL?
•Databases are no longer one-size-fits-all
•The needs of modern applications do not always
match what relational databases provide.
•Every large web platform (e.g. Google, Facebook,
LinkedIn) has developed some sort of custom
solution to scale.
5
© 2018 A. Alawini & A. Parameswaran
Four V’s of Big Data
6
Src: https://greenzoneinc.com/what-we-do/big-data-solutions/
© 2018 A. Alawini & A. Parameswaran
7
“Big Data” is two problems
•The analysis problem
•How to extract useful info, using modeling, ML and stats.
•The storage problem
•How to store and manipulate huge amounts of data to facilitate
fast queries and analysis
•Problems with traditional (relational) storage
•Not flexible
•Hard to partition, i.e. place different segments on different
machines
•NoSQL solutions address these problems.
© 2018 A. Alawini & A. Parameswaran
Need for flexibility: E-Commerce
•Problem: Product catalogs store different types of
objects with different sets of attributes.
•This is not easily done within the relational model, need a
more “flexible schema”
•Relational Solutions
•Create a table for each product category
• Put everything in one table
•Use inheritance
• Entity-Attribute-Value
• Put everything in a BLOB
8
© 2018 A. Alawini & A. Parameswaran
RDBMS (1): Table per Product
9
CREATE TABLE `product_audio_album`
( `sku` char(8) NOT NULL, …
`artist` varchar(255) DEFAULT NULL,
`genre_0` varchar(255) DEFAULT NULL,
`genre_1` varchar(255) DEFAULT NULL, …
PRIMARY KEY(`sku`)) …
CREATE TABLE `product_film`
( `sku` char(8) NOT NULL, …
`title` varchar(255) DEFAULT NULL,
`rating` char(8) DEFAULT NULL, …
PRIMARY KEY(`sku`)) …
© 2018 A. Alawini & A. Parameswaran
RDBMS (2): Single table for all
10
CREATE TABLE `product`
( `sku` char(8) NOT NULL, …
`artist` varchar(255) DEFAULT NULL,
`genre_0` varchar(255) DEFAULT NULL,
`genre_1` varchar(255) DEFAULT NULL, …
`title` varchar(255) DEFAULT NULL,
`rating` char(8) DEFAULT NULL, …
PRIMARY KEY(`sku`))
© 2018 A. Alawini & A. Parameswaran
RDBMS (3): Inheritance
11
CREATE TABLE `product`
( `sku` char(8) NOT NULL,
`title` varchar(255) DEFAULT NULL,
`description` varchar(255) DEFAULT NULL,
`price`, …
PRIMARY KEY(`sku`))
CREATE TABLE `product_audio_album`
( `sku` char(8) NOT NULL, …
`artist` varchar(255) DEFAULT NULL,
`genre_0` varchar(255) DEFAULT NULL,
`genre_1` varchar(255) DEFAULT NULL, …
PRIMARY KEY(`sku`),
FOREIGN KEY(`sku`) REFERENCES `product`(`sku`))
….
© 2018 A. Alawini & A. Parameswaran
RDBMS (4): Entity Attribute Value
12
Entity Attribute Value
sku_00e8da9b Type Audio Album
sku_00e8da9b Title A Love Supreme
sku_00e8da9b … …
sku_00e8da9b Artist John Coltrane
sku_00e8da9b Genre Jazz
sku_00e8da9b Genre General
© 2018 A. Alawini & A. Parameswaran
NoSQL solution: flexible schema
•“Key-value store”
13
{ sku: “00e8da9b”,
type: “Audio Album”,
title: “A Love Supreme”,
description: “by John Coltrane”,
shipping: { weight: 6,
dimensions: { width: 10, height: 10, depth: 1 } },
pricing: { list: 1200, retail: 1100, savings: 100},
details: { title: “A Love Supreme [Original Recording]”,
artist: “John Coltrane”,
genre: [ “Jazz”, “General” ]}
}
© 2018 A. Alawini & A. Parameswaran
The analysis problem…
•So far, we’ve focused on the storage problem – flexible
schemas. There is also the analysis problem, which
requires scalability.
•Relational databases typically scale by getting bigger
servers
• Scaling across multiple servers is complicated
•NoSQL solutions are all about scaling across multiple
servers (e.g. cloud instances)
•Data can be automatically distributed across nodes/servers
• “Map-reduce” spreads computation across a cluster
a14
© 2018 A. Alawini & A. Parameswaran
Types of NoSQL solutions
•Key-value stores:
•Column-oriented:
•Document:
•Graph: Neo4J
15
© 2018 A. Alawini & A. Parameswaran
Outline
NoSQL Introduction
•Relational-NoSQL Trade-offs
•MongoDB
•Model and simple queries
•Join
16
© 2018 A. Alawini & A. Parameswaran
Relational-NoSQL Trade-offs
Fundamentally, there are several different trade-offs
• Schema vs. no schema
• Schema performance, no schema flexibility but parse overhead
(can have partial schemas like in XML)
• Replication, data partitioning
• Replicas mean faster queries, slower (consistent) updates
• Level of abstraction
• High-level queries – parsing, optimization, etc. – vs. low-level
operations
• Primitives: index lookup, joins, etc.
•Consistency
• What does the database do on concurrent updates, especially when
distributed?
17
© 2018 A. Alawini & A. Parameswaran
Consistency and NoSQL
•When an object is updated from different sites or in
different transactions, what happens?
•Eventual consistency
• “Eventually the latest write will be the winner, and every write
has an option to see if the data has changed while it was busy”
•Relational-style DBMSs generally have stronger
options, like serializability
•See blog post by Daniel Abbadi at
http://dbmsmusings.blogspot.com/2017/04/
18
© 2018 A. Alawini & A. Parameswaran
Outline
NoSQL Introduction
Relational-NoSQL Trade-offs
•MongoDB
•Model and simple queries
•Join
19
© 2018 A. Alawini & A. Parameswaran
Overview of MongoDB
•MongoDB is an example of a document-oriented
NoSQL solution
•The query language is limited, and oriented
around “collection” (relation) at a time
processing
• Joins are done via a query language
•The power of the solution lies in the distributed,
parallel nature of query processing
•Replication and sharding
20
© 2018 A. Alawini & A. Parameswaran
MondoDB Data Model
21
•A MongoDB deployment hosts a number of databases. A
database holds a set of collections. A collection holds a
set of documents. A document is a set of key-value pairs.
RDBMS MongoDB
Table
Row(s)
Index
Join
Partition
Partition Key
Collection
JSON Document
Index
Embedding & Linking
Shard
Shard Key
© 2018 A. Alawini & A. Parameswaran
Basic data types
•Null
•Boolean
•Integer (32- and 64-bit)
•Floating point
•String
•Date
•ObjectId
•Code (JavaScript)
•Array
•Embedded document
22
© 2018 A. Alawini & A. Parameswaran
Sample Document
23
mydoc = {
_id: 1,
name: { first: “John”, last: “Backus” },
birthyear: 1924,
contribs: [ “Fortran”, “ALGOL”, “Backus-Naur Form”, “FP” ],
awards: [ { award_id: “NMS001”,
year: 1975 },
{ award_id: “TA99”,
year: 1977} ]
}
> db.people.insertOne(mydoc)
Array of
documents
Always indexed, automatically assigned unless
provided
© 2018 A. Alawini & A. Parameswaran
Core MongoDB operations
•CRUD: create, read, update, and delete
• Insert
•One at a time: db.people.insert(mydoc)
•New (version 3.2):
db.collection.insertOne(),
db.collection.insertMany()
•Delete
• Documents that match some predicate, e.g. to remove the document
in the previous slide:
db.people.deleteOne({“_id”: 1})
db.people.deleteMany({birthyear: 1924})
• All documents in a collection: db.people.deleteMany()
• The collection still remains, with indexes
• Remove a collection (faster): db.people.drop()
24
© 2018 A. Alawini & A. Parameswaran
•Update documents in a collection
• db.collection.updateOne(), db.collection.updateMany()
Core MongoDB operations, cont.
25
db.people.updateMany( {birthyear: 1924}, {$set: {birthyear: 1925}})
db.people.updateMany( {birthyear: 1924}, {$set: {type: “Deceased”}})
© 2018 A. Alawini & A. Parameswaran
Querying
•Use find( ) function and a query document
•Ranges, set inclusion, inequalities using $ conditionals
•Complex queries using $where clause
•Queries return a database cursor
•Meta-operations on cursor include skipping some
number of results, limiting the number of results
returned, sorting results.
26
© 2018 A. Alawini & A. Parameswaran
Another sample document
27
d={
_id : ObjectId(“4c4ba5c0672c685e5e8aabf3”),
author : “Kevin”,
date : new Date(“February 2, 2012”),
text : “About MongoDB…”,
birthyear: 1980,
tags : [ “tech”, “databases” ]
}
> db.posts.insert(d)
© 2018 A. Alawini & A. Parameswaran
Find
Return entire collection in posts:
Return posts that match condition (conjunction):
28
{ _id : ObjectId(“4c4ba5c0672c685e5e8aabf3”), author : “Kevin”,
date : Date(“February 2, 2012”), birthyear: 1980,
text : “About MongoDB…”, tags : [ “tech”, “databases” ]}
db.posts.find({author: “Kevin”, birthyear: 1980})
db.posts.find( )
© 2018 A. Alawini & A. Parameswaran
“Pretty” format
• If you want to be able to read the result:
29
{
_id : ObjectId(“4c4ba5c0672c685e5e8aabf3”),
author : “Kevin”,
date : Date(“February 2, 2012”),
birthyear: 1980,
text : “About MongoDB…”,
tags : [ “tech”, “databases” ]
}
db.posts.find({author: “Kevin”, birthyear: 1980}).pretty()
© 2018 A. Alawini & A. Parameswaran
Specifying which keys to return
30
{
_id: 1,
name: { first: “John”, last: “Backus” },
contribs: [ “Fortran”, “ALGOL”, “Backus-Naur Form”, “FP” ]
}
{
name: { first: “John”, last: “Backus” }
}
db.people.find({}, {name:1, contribs:1})
db.people.find({}, {_id: 0, name:1})
© 2018 A. Alawini & A. Parameswaran
Ranges, Negation, OR-clauses
•Comparison operators: $lt, $lte, $gt, $gte
• db.posts.find({birthyear: {$gte: 1970, $lte: 1990}})
•Negation: $ne
• db.posts.find({birthyear: {$ne: 1982}})
•Or queries: $in (single key), $or (different keys)
• db.posts.find({birthyear: {$in: [1982, 1985]}})
• db.posts.find({$or: [{birthyear: 1982}, {author: “John”}]})
31
© 2018 A. Alawini & A. Parameswaran
Arrays
•db.posts.find({tags: “tech”})
•Print complete information about posts which are tagged
“tech”
•db.posts.find({tags: {$all: [“tech”, “databases”]}},
{author:1, tags:1})
•Print author and tags of posts which are tagged with both
“tech” and “databases” (among other things)
•Contrast this with:
db.posts.find({tags: [“databases”, “tech”]})
32
© 2018 A. Alawini & A. Parameswaran
Querying Embedded Documents
•db.people.find({“name.first”: “John”})
• Finds all people with first name John
•db.people.find({“name.first”: “John”, “name.last”:
“Smith”})
• Finds all people with first name John and last name Smith.
•Contrast with
db.people.find({“name”: {“first”: “John”, “last”: “Smith”}})
33
© 2018 A. Alawini & A. Parameswaran
Limits, Skips, Sort, Count
•db.posts.find().limit(3)
• Limits the number of results to 3
•db.posts.find().skip(3)
• Skips the first three results and returns the rest
•db.posts.find().sort({author:1, title: -1})
• Sorts by author ascending (1) and title descending (-1)
•db.people.find(…).count()
•Counts the number of documents in the people collection
matching the find(…)
34
© 2018 A. Alawini & A. Parameswaran
Outline
NoSQL Introduction
Relational-NoSQL Trade-offs
oMongoDB
Model and simple queries
•Join
35