Dr. Ying Zhou
School of Information Technologies
COMP5338 – Advanced Data Models
Week 4: MongoDB – Advanced Features
Outline
◼ Indexing
◼ Replication
◼ Sharding
Review: DBMS Components
[Figure: DBMS components, built on a disk-based storage system]
Page 6, H. Garcia-Molina, J. D. Ullman, J. Widom, Database Systems: The Complete Book
http://infolab.stanford.edu/~ullman/dscb.html
Storage Engine
◼ The storage engine is responsible for managing how data is
stored in memory and on disk
◼ MongoDB supports multiple storage engines
WiredTiger is the default one since version 3.2
◼ Some prominent features of WiredTiger
Document level concurrency
MultiVersion Concurrency Control (MVCC)
A snapshot is provided at the start of an operation
Snapshots are written to disk (creating checkpoints) at intervals of 60
seconds or 2GB of journal data
Journal
Write-ahead transaction log
Compression
The primitive operations of a query
◼ Read query
Load the element of interest from disk to main-memory buffer(s) if it is not
already there
Read the content into the client’s address space
◼ Write query
The new value is created in the client’s address space
It is copied to the appropriate buffers representing the database in
memory
The buffer content is flushed to the disk
◼ Both operations involve data movement between disk and memory and
between memory spaces
◼ Typically disk access is the predominant performance cost in
single-node settings; network communication adds to the cost in
cluster settings
◼ We want to reduce the number of disk I/Os in read and write queries
Typical Solutions to minimize Disk I/O
◼ Queries that involve reading data from the database
Minimize the amount of data that needs to be moved from disk to memory
Use index and data distribution information to decide on a query plan
◼ Queries that involve writing data to the database
Minimize the amount of disk I/O in the write path
Avoid flushing memory content to disk immediately after each write
Push non-essential writes out of the write path, e.g. perform them
asynchronously
To ensure durability, a write-ahead log/journal/operation log is always
necessary
Appending to a log is much faster than updating the actual database file
The DB system may acknowledge once the data is updated in memory
and appended in the WAL
Updates to replicas can be done asynchronously, i.e. not in the write
path
Indexing
◼ An index on an attribute/field A of a table/collection is a data
structure that makes it efficient to find those rows (documents)
that have a required value for attribute/field A.
◼ An index consists of records (called index entries), each of
which has a value for the attribute(s) and a pointer to the data
record, e.g. of the form (attr. value, pointer to data record)
◼ Index files are typically much smaller than the original file
MongoDB Basic Indexes
◼ The _id index
_id field is automatically indexed for all collections
The _id index enforces uniqueness for its keys
◼ Indexing on other fields
Index can be created on any other field or combination of fields
db.<collection>.createIndex({<fieldName>: <1 or -1>})
fieldName can be a simple field, an array field or a field of an embedded document
(using dot notation)
• db.blog.createIndex({author:1})
• db.blog.createIndex({tags:1})
• db.blog.createIndex({“comments.author”:1})
the number specifies the direction of the index (1: ascending; -1: descending)
Additional properties can be specified for an index
Sparseness, uniqueness, background, ..
◼ Most MongoDB indexes are organized as a B-tree structure
http://www.mongodb.org/display/DOCS/Indexes
Single field Index
https://docs.mongodb.com/manual/core/index-single/
Compound Index
◼ Compound Index is a single index structure that holds references to
multiple fields within a collection
◼ The order of fields in a compound index is very important
Index entries are sorted by the value of the first field, then the second, the third, …
E.g. a compound index on (userid, score) supports queries like
db.users.find({userid: “ca2”, score: {$gt:30} })
db.users.find({userid: “ca2”})
But not queries like
db.users.find({score: 75})
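A minimal mongo shell sketch of creating such a compound index (the users collection and the descending direction on score follow the running example used later in this lecture):
db.users.createIndex( { userid: 1, score: -1 } )
The two queries above can then be answered with an index scan, while a query on score alone cannot, because score is not a prefix of the index.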
https://docs.mongodb.com/manual/core/index-compound/
Use Index to Sort (single field)
◼ A sort operation may obtain the order from an index or sort the
results in memory
◼ An index can be traversed in either direction
◼ Sort with a single field index
For single field index, sorting by that field can always use the index
regardless of the sort direction
E.g. db.records.createIndex( { a: 1 } ) supports both
db.records.find().sort({a:1}) and
db.records.find().sort({a: -1})
https://docs.mongodb.com/manual/tutorial/sort-results-with-indexes/
Use Index to Sort (multiple fields)
◼ Sort on multiple fields
A compound index may be used to sort on multiple fields.
There are constraints on fields and directions:
Sort keys must be listed in the same order as they appear in the index
All sort fields must follow the same traversal direction, either forwards or
backwards along the index
E.g. with the index {userid:1, score:-1}, the sorts {userid:1, score:-1} and
{userid:-1, score:1} can use the index, but not {userid:1, score:1}
Use Index to Sort (multiple fields)
◼ Sort and Index Prefix
If the sort keys correspond to the index keys or an index prefix,
MongoDB can use the index to sort the query results.
E.g. db.data.createIndex( { a:1, b: 1, c: 1, d: 1 } )
Supported query:
db.data.find().sort( { a: -1 } )
db.data.find().sort( { a: 1, b: 1 } )
db.data.find( { a: { $gt: 4 } } ).sort( { a: 1, b: 1 } )
◼ Sort and Non-prefix Subset of an Index
An index can support sort operations on a non-prefix subset of the
index key pattern if the query includes equality conditions on all the
prefix keys that precede the sort keys.
E.g. supported queries: db.data.find( { a: 5 } ).sort( { b: 1, c: 1 } )
db.data.find( { a: 5, b: { $lt: 3} } ).sort( { b: 1 } )
Running Example
◼ Suppose we have a users collection with the following 6
documents stored in the order of _id values
{ _id: 1, userid: "aa1", score: 45 }
{ _id: 2, userid: "ca2", score: 55 }
{ _id: 3, userid: "nb1", score: 30 }
{ _id: 4, userid: "ca2", score: 75 }
{ _id: 5, userid: "xyz", score: 90 }
{ _id: 6, userid: "ca2", score: 30 }
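A minimal mongo shell sketch to load this running example into a users collection (insertMany is the standard bulk-insert helper):
db.users.insertMany([
  { _id: 1, userid: "aa1", score: 45 },
  { _id: 2, userid: "ca2", score: 55 },
  { _id: 3, userid: "nb1", score: 30 },
  { _id: 4, userid: "ca2", score: 75 },
  { _id: 5, userid: "xyz", score: 90 },
  { _id: 6, userid: "ca2", score: 30 }
])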
Index Entries
◼ Now we create a compound index on the userid and score fields:
db.users.createIndex( { userid: 1, score: -1 } )
◼ With the current data, the index has six entries because we have 6
unique values for (userid, score) in the collection
A new entry will be added each time we insert a document with a
(userid, score) combination different from the ones already there
◼ Our index entry structure would look like this: the index entries usually
form a doubly linked list to facilitate bi-directional traversal
start → ("aa1", 45) → ("ca2", 75) → ("ca2", 55) → ("ca2", 30) → ("nb1", 30) → ("xyz", 90) → end
Each entry points to the document with that (userid, score) pair: _id 1, 4, 2, 6, 3 and 5 respectively.
Using index to find documents
◼ For queries that are able to use an index, the first step is to find
the boundary entries on the list based on the given query
condition
◼ For instance, if we want to look for userid greater than "b"
but less than "s"
db.users.find({userid: {$gt: "b", $lt: "s"}})
◼ This query is able to use the compound index and the two
bounds are ("ca2", 75) and ("nb1", 30), inclusive at both
ends
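To confirm that the index is used and to inspect the computed bounds, the query can be run through explain() (a sketch; the output described below is abbreviated and illustrative, and its exact shape varies by MongoDB version):
db.users.find( { userid: { $gt: "b", $lt: "s" } } ).explain("executionStats")
// winningPlan (abbreviated): a FETCH stage fed by an IXSCAN stage over
// the index { userid: 1, score: -1 }, with index bounds of roughly
// userid: ("b", "s") and score: [MaxKey, MinKey]
// executionStats reports the number of keys and documents examined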
Using index to find documents
◼ The four documents with _id equal to 4, 2, 6 and 3 are the
result of the above query
("aa1", 45) → [ ("ca2", 75) → ("ca2", 55) → ("ca2", 30) → ("nb1", 30) ] → ("xyz", 90)
The bracketed entries fall within the query bounds and point to the documents with _id 4, 2, 6 and 3.
Using Index to sort
◼ If our query also includes a sort criterion
db.users.find({userid: {$gt: "b", $lt: "s"}}).sort({userid: 1, score: -1})
◼ As before, the db engine will start from the lower bound,
follow the forward links to the upper bound and return
all documents pointed to by the entries
◼ They are:
{_id: 4, userid: "ca2", score: 75}, {_id: 2, userid: "ca2", score: 55},
{_id: 6, userid: "ca2", score: 30}, {_id: 3, userid: "nb1", score: 30}
The results satisfy the condition and are in the correct order
Traversal order along the index (1 → 2 → 3 → 4): ("ca2", 75) → ("ca2", 55) → ("ca2", 30) → ("nb1", 30), returning the documents with _id 4, 2, 6 and 3.
Sorting that cannot use index
◼ If our query instead includes a different sort criterion
db.users.find({userid: {$gt: "b", $lt: "s"}}).sort({userid: 1, score: 1})
◼ We can still use the index to find the bounds and the four
documents satisfying the query condition, but we are not
able to follow a single forward or backward link to get the
correct order of the documents
Sorting that cannot use index
◼ If we want to use the index entry list to obtain the correct order, we would start from a
mysterious position ("ca2", 30), follow the backward links to ("ca2", 75), and
make a magic jump to the entry ("nb1", 30).
Complexity involved:
How do we find the start point in between the lower and upper bounds?
How do we decide when and where to jump in another direction?
The complexity of such an algorithm makes it less attractive than an in-memory sort of the actual
documents.
General rules
◼ If the list between the lower and upper bounds, as determined by
your query condition, can be traversed in one direction to obtain
the order specified in the sort condition, the index will be used to
sort the result
◼ Otherwise you may still use the index to obtain the results but
have to sort them in memory
BTree motivation
◼ Finding the boundaries could be time consuming if we only
have the list structure and can only start from one of the two
ends
◼ A B-tree structure is built on top of the index entries to
accelerate locating the boundaries.
Multi key index
◼ An index can be created on an array field; the key set includes each
element in the array. Otherwise it behaves the same as a
single-field index (see the example below)
◼ There are restrictions on including multikey indexes in
compound indexes
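A minimal sketch of a multikey index in the mongo shell (the blog collection and tags array field follow the earlier createIndex examples; the query value is illustrative):
db.blog.createIndex( { tags: 1 } )    // one index entry per array element
db.blog.find( { tags: "mongodb" } )   // matches documents whose tags array contains "mongodb"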
Text Indexes
◼ Text indexes support efficient text search of string content in
documents of a collection
◼ To create a text index
db.<collection>.createIndex({<field>: "text"})
A text index tokenizes and stems the terms in the indexed fields for the index
entries.
◼ To perform a text query
db.<collection>.find({$text: {$search: <search string>}})
No field name is specified
◼ Restrictions:
A collection can have at most one text index, but it can include text
from multiple fields
Different fields can have different weights in the index; results can be
sorted using a text score based on the weights (see the sketch below)
Sort operations cannot obtain sort order from a text index
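A minimal sketch, assuming a blog collection with title and body fields and illustrative weights:
db.blog.createIndex(
  { title: "text", body: "text" },
  { weights: { title: 10, body: 1 } }
)
db.blog.find(
  { $text: { $search: "mongodb index" } },
  { score: { $meta: "textScore" } }
).sort( { score: { $meta: "textScore" } } )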
Other Indexes
◼ Geospatial Index
MongoDB can store and query spatial data on a flat or spherical
surface
2d indexes and 2dsphere indexes
◼ Hash indexes
Index the hash value of a field
Only supports equality matches, not range queries
Mainly used in hash-based sharding (see the sketches below)
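Minimal sketches of these index types (the places and location names are illustrative; "2dsphere" and "hashed" are the documented index type keywords):
db.places.createIndex( { location: "2dsphere" } )   // spherical geospatial index
db.users.createIndex( { userid: "hashed" } )        // hashed index, e.g. for hash-based sharding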
Indexing properties
◼ Similar to indexes in an RDBMS, extra properties can be
specified for an index
◼ We can enforce the uniqueness of a field by creating a unique
index
db.members.createIndex( { "user_id": 1 }, { unique: true } )
◼ We can reduce the index storage by specifying an index as
sparse
Only documents with the indexed field will have entries in the index
By default, a non-sparse index contains entries for all documents;
documents without the indexed field are treated as having a
null value.
◼ MongoDB also supports TTL indexes and partial indexes (see the sketches below)
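Minimal sketches of these properties (the collection and field names are illustrative; sparse and expireAfterSeconds are the documented options):
db.members.createIndex( { last_login: 1 }, { sparse: true } )              // skip documents without last_login
db.sessions.createIndex( { createdAt: 1 }, { expireAfterSeconds: 3600 } )  // TTL: remove documents one hour after createdAt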
Indexing strategy
◼ Indexing cost
Storage, memory, write latency
◼ Performance consideration
In general, MongoDB only uses one index to fulfil a specific query
An $or query on different fields may use different indexes
MongoDB may use an intersection of multiple indexes
When the index fits in memory, you get the most performance gain
◼ Build index if the performance gain can justify the cost
Understand the query
Understand the index behaviour
Performance Monitoring Tools
◼ Profiler
Collects execution information about queries running on a database
It can be used to identify various underperforming queries
Slowest queries
Queries not using any index
Queries running slower than some threshold
Custom tagged queries, e.g. by commenting
And more
◼ Explain method
Collect detailed information about a particular query
How the query is performed
What execution plans are evaluated
Detailed execution statistics, e.g. how many index entries or documents
have been examined
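A minimal mongo shell sketch of both tools (the 100 ms threshold and the users query are illustrative; setProfilingLevel, system.profile and explain() are the documented interfaces):
db.setProfilingLevel(1, 100)                          // profile operations slower than 100 ms
db.system.profile.find().sort( { ts: -1 } ).limit(5)  // inspect the most recent profiled operations
db.users.find( { userid: "ca2" } ).explain("executionStats")  // plans considered plus keys/documents examined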
How to Use the MongoDB Profiler and explain() to Find Slow Queries
Outline
◼ Indexing
◼ Replication
◼ Sharding
Replication
◼ MongoDB uses replication to achieve durability, availability and/or read
scalability.
◼ A basic master/slave replication component in MongoDB is called a
replica set
http://www.mongodb.org/display/DOCS/Replica+Sets+-+Basics
[Figure: a replica set, with a primary and secondary members; all members are copies of each other]
Asynchronous updates to secondary members
By default all reads/writes are sent to the primary member only; this achieves
strong consistency
The user may indicate that it is safe to read from a secondary (slave) member;
strong consistency cannot then be guaranteed, only eventual consistency
MongoDB applies database operations on the primary and then records the
operations in the primary’s oplog (operation log). The secondary members
then replicate this log and apply the operations to themselves in an
asynchronous process
http://docs.mongodb.org/manual/reference/glossary/
Replica Set
◼ Data Integrity
Single Master (primary)
Writes happen only on the master
Reads from a secondary (slave) member may return a previous value
Reads may return uncommitted values
◼ Primary Election
May be triggered at different times
Newly formed replica set
Primary is down
…
Replica set members send heartbeats (pings) to each other every 2
seconds.
The first member to receive votes from a majority of members in a set
becomes the next primary until the next election
A replica set needs to have an odd number of voting members
An arbiter is a member of the replica set that does not hold data but is able to vote
during primary elections
http://docs.mongodb.org/manual/core/replication-internals/
Replica Set – cont’d
◼ Network Partition
Members in a replica set may belong to different racks or different
data centers to maximize durability and availability
During a primary election, if a network partition happens and neither side
of the partition has a majority on its own, the set will not elect a new
primary and will become read only
Replica Set Read/Write Options
◼ By default, read operations are answered by the primary
member and always return the latest value being written
◼ By default, replication to the secondary member happens
asynchronously
◼ Client can specify “Read Preference” to read from different
members
primary (default), secondary, nearest, etc.
◼ To maintain consistency requirements, client can specify
different levels of “Write Concern”
By default, write is considered successful when it is written on the
primary member
This can be changed to include write operations on secondary
members.
Verify Write to Replica Set
http://docs.mongodb.org/manual/core/replica-set-write-concern/
Timeout mechanism
is used to prevent
blocking indefinitely
db.products.insert(
  { item: "envelopes", qty: 100, type: "Clasp" },
  { writeConcern: { w: 2, wtimeout: 5000 } }
)
Read Preference
◼ Read preference describes how MongoDB clients route
read operations to the members of a replica set.
Mode – Description
primary – Default. All operations read from the primary member.
primaryPreferred – In most situations, operations read from the primary, but if it is
unavailable, operations read from secondary members.
secondary – All operations read from the secondary members of the replica set.
secondaryPreferred – In most situations, operations read from secondary members, but if
no secondary members are available, operations read from the primary.
nearest – Operations read from the member of the replica set with the least
network latency, irrespective of the member’s type.
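A minimal sketch of specifying a read preference, either per query in the mongo shell or in the connection string (the host and replica set names are illustrative):
db.users.find( { userid: "ca2" } ).readPref("secondaryPreferred")
mongodb://host1,host2,host3/?replicaSet=rs0&readPreference=nearest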
Read Isolation (Read Concern)
◼ How a read operation is carried out inside MongoDB with a
replica set, to control consistency and availability
◼ There are many levels
◼ New releases may introduce new levels to satisfy growing
consistency requirements
◼ To understand what sort of consistency you will get, all three
properties need to be looked at
Write Concern
Read Preference
Read Concern
Read Concern Levels
◼ local: the query returns data from the instance with no guarantee that the
data has been written to a majority of the replica set members (i.e. may
be rolled back)
Default for reads against the primary, or reads against secondaries if the reads
are associated with causally consistent sessions
◼ available: the query returns data from the instance with no guarantee
that the data has been written to a majority of the replica set members
Default for reads against secondaries if the reads are not associated
with causally consistent sessions
◼ majority: The query returns the data that has been acknowledged by a
majority of the replica set members. The documents returned by the
read operation are durable, even in the event of failure.
◼ linearizable
◼ snapshot
Read-uncommitted behaviour may happen with the local and available levels
Default Behaviour
◼ Write concern:
Write is considered successful when it is written on the primary member
replication to the secondary members happens asynchronously
There is no rollback once the write is applied successfully in the primary
◼ Read Preference:
primary: All read operations are sent to the primary
◼ Read Concern
Local: returns data from the instance (in this case, the primary) with no
guarantee that the data has been written to a majority of the replica set
members
◼ What we get with default setting
The strongest consistency level: strong consistency at single document level
◼ What are the trade-offs
Availability and latency
All writes/reads happen at the primary; the secondaries serve little live traffic
Customized Behaviour: Write: majority
◼ Write concern: “majority”
Requests acknowledgement that write operations have propagated
to the majority of voting nodes, including the primary
https://docs.mongodb.com/manual/reference/read-concern-local/#readconcern.%22local%22
Assumptions for the Write0 example that follows:
• All writes prior to Write0 have been successfully replicated to all members.
• Writeprev is the previous write before Write0.
• No other writes have occurred after Write0.
Write: majority example case
Time / event, followed by each member’s most recent write and most recent w: "majority" write:
t0: Primary applies Write0.
    Most recent write: Primary: Write0; Secondary1: Writeprev; Secondary2: Writeprev
    Most recent w: "majority" write: Primary: Writeprev; Secondary1: Writeprev; Secondary2: Writeprev
t1: Secondary1 applies Write0.
    Most recent write: Primary: Write0; Secondary1: Write0; Secondary2: Writeprev
    Most recent w: "majority" write: Primary: Writeprev; Secondary1: Writeprev; Secondary2: Writeprev
t2: Secondary2 applies Write0.
    Most recent write: Primary: Write0; Secondary1: Write0; Secondary2: Write0
    Most recent w: "majority" write: Primary: Writeprev; Secondary1: Writeprev; Secondary2: Writeprev
t3: Primary is aware of successful replication to Secondary1 and sends acknowledgement to the client.
    Most recent write: Primary: Write0; Secondary1: Write0; Secondary2: Write0
    Most recent w: "majority" write: Primary: Write0; Secondary1: Writeprev; Secondary2: Writeprev
t4: Primary is aware of successful replication to Secondary2.
    Most recent write: Primary: Write0; Secondary1: Write0; Secondary2: Write0
    Most recent w: "majority" write: Primary: Write0; Secondary1: Writeprev; Secondary2: Writeprev
t5: Secondary1 receives notice (through the regular replication mechanism) to update its snapshot of its most recent w: "majority" write.
    Most recent write: Primary: Write0; Secondary1: Write0; Secondary2: Write0
    Most recent w: "majority" write: Primary: Write0; Secondary1: Write0; Secondary2: Writeprev
t6: Secondary2 receives notice (through the regular replication mechanism) to update its snapshot of its most recent w: "majority" write.
    Most recent write: Primary: Write0; Secondary1: Write0; Secondary2: Write0
    Most recent w: "majority" write: Primary: Write0; Secondary1: Write0; Secondary2: Write0
Read Concern: local example
Read target / time / state of data:
Primary, after t0: data reflects Write0
Secondary1, before t1: data reflects Writeprev
Secondary1, after t1: data reflects Write0
Secondary2, before t2: data reflects Writeprev
Secondary2, after t2: data reflects Write0
Applies under read preference primary, primaryPreferred, secondaryPreferred or nearest.
Read uncommitted: before t3, a read may return Write0 even though it could still be rolled back.
Read concern "available" has similar behaviour.
Read Concern: majority example
Read target / time / state of data:
Primary, before t3: data reflects Writeprev
Primary, after t3: data reflects Write0
Secondary1, before t5: data reflects Writeprev
Secondary1, after t5: data reflects Write0
Secondary2, before or at t6: data reflects Writeprev
Secondary2, after t6: data reflects Write0
Applies under read preference primary, primaryPreferred, secondaryPreferred or nearest.
The primary has had the most recent update Write0 since t0, but before t3 it knows that a
majority of the replicas still hold the previous value Writeprev, so a majority read returns Writeprev.
Consequence
◼ When the write concern is set to majority,
Read concern "local" can return the latest value as soon as it is
applied locally; it carries the danger of read uncommitted, e.g. returning a
value that should not exist if the write is rolled back
Read concern "majority" will return the old value for some time after the
write happens, even if the target is the primary node; it does not
return uncommitted values regardless of the target node
◼ Customized settings can give better scalability by allowing
reads to happen at secondary nodes (see the sketch below)
There are various trade-offs depending on the actual setting
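A minimal mongo shell sketch combining the three properties discussed above (the collection and values are illustrative):
db.products.insert(
  { item: "envelopes", qty: 100 },
  { writeConcern: { w: "majority", wtimeout: 5000 } }
)
db.products.find( { item: "envelopes" } )
  .readPref("secondaryPreferred")
  .readConcern("majority")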
Outline
◼ Indexing
◼ Replication
◼ Sharding
Sharding
◼ MongoDB uses a sharding mechanism to scale out
◼ The main database engine, mongod, is not distributed
◼ Sharding is achieved by running an extra coordinator service, mongos,
together with a set of config servers on top of mongod
http://www.mongodb.org/display/DOCS/Sharding+Introduction
http://docs.mongodb.org/manual/core/sharding/
Shard
◼ Each shard is a standalone
mongod server or a replica
set (with one primary and a
few secondary members)
◼ Each shard stores a portion
of a large collection, partitioned
by a shard key
◼ Primary Shard
Every database has a primary
shard that holds all the unsharded
collections of that database
Shard Keys
◼ The shard key determines the distribution of the
collection’s documents among the cluster’s shards.
◼ Data stored in each shard is organized as fixed-size chunks (usually
64MB)
Chunks are the basic unit of data distribution (chunks are moved between
shards to balance the data)
Sharding strategy
◼ Hash Sharding vs. Range Sharding
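A minimal mongo shell sketch of the two strategies (the database, collection and field names are illustrative; sh.enableSharding and sh.shardCollection are the documented helpers):
sh.enableSharding("mydb")
sh.shardCollection("mydb.users", { userid: 1 })            // range sharding on userid
sh.shardCollection("mydb.events", { deviceId: "hashed" })  // hash sharding on deviceId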
Shard key selection
◼ The ideal shard key should distribute data and query evenly in shards
High cardinality
Gender is not a good sharding key candidate
Distribution not skewed
A key with a Zipf value distribution is not a good sharding key candidate
Change pattern
Timestamp is perhaps not a very good shard key candidate
A monotonically increasing shard key would create an
insert hot spot
A shard key with a skewed distribution would create a
query hot spot
Example of a good sharding key: a user collection partitioned by the field "state" as
the shard key
Config Server
◼ Config servers maintain the shard metadata in a config
database.
Chunks and their locations in shard
◼ Config servers do not run as a replica set; they run a two-phase
commit protocol to ensure strong consistency among the
copies
Three servers are recommended as the optimal setting
More instances would increase the coordination cost among the config
servers.
[Figure: a user collection partitioned by the field "name" as the shard key, stored as
chunks on different shards]
Routing Processes — mongos
◼ In a sharded cluster, mongos is the front end for client
requests
When receiving client requests, the mongos process routes the
request to the appropriate server(s) and merges any results to be
sent back to the client
It has no persistent state; the metadata is pulled from the config
servers
There is no limit on the number of mongos processes; they are
independent of each other
◼ Query types
Targeted at a single shard or a limited group of shards based on the
shard key.
Broadcast to all shards in the cluster that hold documents in a
collection.
Targeted and Global Operations
◼ Assuming shard key is field x
Operation – Type – Execution
db.foo.find( { x: 300 } ) – Targeted – query a single shard
db.foo.find( { x: 300, age: 40 } ) – Targeted – query a single shard
db.foo.find( { age: 40 } ) – Global – query all shards
db.foo.find() – Global – query all shards, sequentially
db.foo.find(…).count() – Variable – same as the corresponding find() operation
db.foo.count() – Global – parallel counting on each shard, results merged on mongos
db.foo.insert( { … } ) – Targeted – insert into the single shard determined by the shard key
db.foo.createIndex(…) – Global – parallel indexing on each shard
Sharding Restrictions and Limitations
◼ When the shard key is not the _id key, the uniqueness of the
_id values can only be guaranteed at the application level
◼ Certain operations are not supported in sharded
environments
◼ Shard Key Limitation
The shard key cannot use a multikey, text or geospatial index
Shard key is immutable
Shard key value in a document is immutable
http://docs.mongodb.org/manual/reference/limits/
Summary
◼ MongoDB is a general purpose NoSQL storage system
Lots of resemblance with RDBMS
Indexing, ad-hoc queries
It supports spatial queries
Single-document updates are always atomic
The latest version has support for multi-document transactions
Application-level two-phase commit can be implemented for earlier versions
◼ Key Features
Flexible schema
Collection and Document
Documents are stored in binary JSON format
Natural support for object style query (array and dot notation)
Scalability
Sharding and Replication
Various consistency levels achieved through combinations of the write concern,
read preference and read concern properties