COMP9313: Big Data Management
Course web site: http://www.cse.unsw.edu.au/~cs9313/
Chapter 8: NoSQL and HBase
Part 1: Introduction to NoSQL
What does RDBMS provide?
❖ Relational model with schemas
❖ Powerful, flexible query language (SQL)
❖ Transactional semantics: ACID
❖ Rich ecosystem, lots of tool support (MySQL, PostgreSQL, etc.)
What is NoSQL?
❖ The name stands for Not Only SQL
❖ Does not use SQL as the query language
❖ Class of non-relational data storage systems
❖ The term NOSQL was introduced by Eric Evans when an event was organized to discuss open-source distributed databases
❖ It’s not a replacement for an RDBMS but complements it
❖ All NoSQL offerings relax one or more of the ACID properties (will talk about the CAP theorem)
What is NoSQL?
❖ Key features (advantages):
➢ non-relational
➢ don’t require strict schema
➢ data are replicated to multiple nodes (so, identical & fault-tolerant) and can be partitioned
   down nodes are easily replaced
   no single point of failure
➢ horizontally scalable
➢ cheap, easy to implement (open-source)
➢ massive write performance
➢ fast key-value access
❖ Web apps have different needs (than the apps that RDBMSs were designed for)
➢ Low and predictable response time (latency)
➢ Scalability & elasticity (at low cost!)
➢ High availability
➢ Flexible schemas / semi-structured data
➢ Geographic distribution (multiple datacenters)
❖ Web apps can (usually) do without
➢ Transactions / strong consistency / integrity
➢ Complex queries
Who is Using NoSQL?
❖ Google (BigTable)
❖ LinkedIn (Voldemort)
❖ Facebook (Cassandra)
❖ Twitter (HBase, Cassandra)
❖ Baidu (HyperTable)
Three Major Papers for NoSQL
❖ Three major papers were the seeds of the NoSQL movement
➢ BigTable (Google)
➢ Dynamo (Amazon)
   Ring partitioning and replication
   Gossip protocol (discovery and error detection)
   Distributed key-value data store
   Eventual consistency
➢ CAP Theorem (discussed in the next few slides)
CAP Theorem
❖ Suppose three properties of a distributed system (sharing data)
➢ Consistency: all copies have the same value
➢ Availability: reads and writes always succeed
➢ Partition-tolerance: system properties (consistency and/or availability) hold even when network failures prevent some machines from communicating with others
CAP Theorem
❖ Brewer’s CAP Theorem:
➢ For any system sharing data, it is “impossible” to guarantee all three of these properties simultaneously
➢ You can have at most two of these three properties for any shared-data system
❖ Very large systems will “partition” at some point:
➢ That leaves either C or A to choose from (a traditional DBMS prefers C over A and P)
➢ In almost all cases, you would choose A over C (except in specific applications such as order processing)
CAP Theorem: Consistency
❖ All clients always have the same view of the data
❖ Once a writer has written, all readers will see that write
❖ Two kinds of consistency:
➢ strong consistency – ACID (Atomicity, Consistency, Isolation, Durability)
➢ weak consistency – BASE (Basically Available, Soft-state, Eventual consistency)
ACID & CAP
❖ A DBMS is expected to support “ACID transactions,” processes that are:
➢ Atomicity: either the whole process is done or none is
➢ Consistency: only valid data are written
➢ Isolation: one operation at a time
➢ Durability: once committed, it stays that way
❖ Under CAP, the corresponding properties are:
➢ Consistency: all replicas of the data across the cluster are the same
➢ Availability: the cluster always accepts reads and writes
➢ Partition tolerance: guaranteed properties are maintained even when network failures prevent some machines from communicating with others
Consistency Model
❖ A consistency model determines rules for visibility and apparent order of updates
❖ Example:
➢ Row X is replicated on nodes M and N
➢ Client A writes row X to node N
➢ Some period of time t elapses
➢ Client B reads row X from node M
➢ Does client B see the write from client A?
➢ Consistency is a continuum with tradeoffs
➢ For NOSQL, the answer would be: “maybe”
➢ CAP theorem states: “strong consistency can’t be achieved at the same time as availability and partition-tolerance”
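To make this “maybe” concrete, here is a minimal, hypothetical sketch in Java (all names invented) of the scenario above: writes land on node N and propagate to node M asynchronously, so a read from M during the propagation window returns a stale value.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical two-replica store illustrating the "maybe" above.
public class EventuallyConsistentStore {
    private final Map<String, String> nodeN = new ConcurrentHashMap<>();
    private final Map<String, String> nodeM = new ConcurrentHashMap<>();
    private final ScheduledExecutorService replicator =
            Executors.newSingleThreadScheduledExecutor();

    // Client A writes row X to node N; replication to M happens later.
    public void writeToN(String key, String value) {
        nodeN.put(key, value);
        replicator.schedule(() -> nodeM.put(key, value), 100, TimeUnit.MILLISECONDS);
    }

    // Client B reads row X from node M: before replication completes,
    // this returns the old value (or nothing), i.e. a stale read.
    public Optional<String> readFromM(String key) {
        return Optional.ofNullable(nodeM.get(key));
    }
}
```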
Eventual Consistency
❖ When no updates occur for a long period of time, eventually all updates will propagate through the system and all the nodes will be consistent
❖ For a given accepted update and a given node, eventually either the update reaches the node or the node is removed from service
❖ Known as BASE (Basically Available, Soft state, Eventual consistency), as opposed to ACID
❖ http://en.wikipedia.org/wiki/Eventual_consistency
Eventual Consistency
❖ The types of large systems based on CAP aren’t ACID; they are BASE (http://queue.acm.org/detail.cfm?id=1394128):
➢ Basically Available – system seems to work all the time
➢ Soft State – it doesn’t have to be consistent all the time
➢ Eventually Consistent – becomes consistent at some later time
❖ Everyone who builds big applications builds them on CAP and BASE: Google, Yahoo, Facebook, Amazon, eBay, etc.
CAP Theorem: Availability
❖ The system is available during software and hardware upgrades and node failures
❖ Traditionally thought of as the server/process being available five 9’s (99.999%)
➢ However, for a large node system, at almost any point in time there’s a good chance that a node is either down or there is a network disruption among the nodes
➢ We want a system that is resilient in the face of network disruption
CAP Theorem: Partition Tolerance
❖ The system can continue to operate in the presence of network partitions
CAP Theorem
❖ You can have at most two of these three properties for any shared-data system
NoSQL Taxonomy
❖ Key-value stores
➢ Simple K/V lookups (DHT)
❖ Column stores
➢ Each key is associated with many attributes (columns)
➢ NoSQL column stores are actually hybrid row/column stores
   Different from “pure” relational column stores!
❖ Document stores
➢ Store semi-structured documents (JSON)
❖ Graph databases
➢ Neo4j, etc.
➢ Not exactly NoSQL
   can’t satisfy the requirements for High Availability and Scalability/Elasticity very well
Key-Value Stores
❖ Focus on scaling to huge amounts of data
❖ Designed to handle massive load
❖ Based on Amazon’s Dynamo paper
❖ Data model: (global) collection of key-value pairs
❖ Dynamo ring partitioning and replication
❖ Example: (DynamoDB)
➢ items having one or more attributes (name, value)
➢ An attribute can be single-valued or multi-valued (like a set)
➢ items are combined into a table
❖ Basic API access:
➢ get(key): extract the value given a key
➢ put(key, value): create or update the value given its key
➢ delete(key): remove the key and its associated value
➢ execute(key, operation, parameters): invoke an operation on the value (given its key), which is a special data structure (e.g., List, Set, Map, etc.)
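As a rough sketch, this basic API can be captured by a Java interface like the one below (a generic illustration, not the actual client API of any particular product):

```java
import java.util.Optional;

// Generic key-value store API sketch mirroring the operations above.
public interface KeyValueStore<K, V> {
    Optional<V> get(K key);      // extract the value given a key
    void put(K key, V value);    // create or update the value given its key
    void delete(K key);          // remove the key and its associated value

    // Invoke an operation on a value that is a richer data structure
    // (e.g., append to a List, add to a Set, update a Map entry).
    Object execute(K key, String operation, Object... parameters);
}
```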
❖ Pros:
➢ very fast
➢ very scalable (horizontally distributed to nodes based on key)
➢ simple data model
➢ eventual consistency
➢ fault-tolerance
❖ Cons:
➢ Can’t model more complex data structures such as objects
❖ Representative key-value stores:

Name      | Producer             | Data model                                                                   | Querying
SimpleDB  | Amazon               | set of couples (key, {attribute}), where attribute is a couple (name, value) | restricted SQL; select, delete, GetAttributes, and PutAttributes operations
Redis     | Salvatore Sanfilippo | set of couples (key, value), where value is a simple typed value, list, ordered (according to ranking) or unordered set, or hash value | primitive operations for each value type
Dynamo    | Amazon               | like SimpleDB                                                                | simple get operation and put in a context
Voldemort | LinkedIn             | like SimpleDB                                                                | similar to Dynamo
Document Stores
❖ Can model more complex objects
❖ Inspired by Lotus Notes
❖ Data model: collection of documents
❖ Document: JSON (JavaScript Object Notation is a data model of key-value pairs, which supports objects, records, structs, lists, arrays, maps, dates, Booleans, with nesting), XML, other semi-structured formats
❖ Example: (MongoDB) document
➢ { Name: “Jaroslav”,
    Address: “Malostranske nám. 25, 118 00 Praha 1”,
    Grandchildren: { Claire: “7”, Barbara: “6”, Magda: “3”, Kirsten: “1”, Otis: “3”, Richard: “1” },
    Phones: [ “123-456-7890”, “234-567-8963” ] }
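To show how such a nested document is traversed programmatically, here is a small sketch using the Jackson JSON library (the traversal pattern is generic; a document store would additionally index and query these fields on the server side):

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

// Parse and navigate the nested MongoDB-style document shown above.
public class DocumentExample {
    public static void main(String[] args) throws Exception {
        String json = """
            { "Name": "Jaroslav",
              "Address": "Malostranske nam. 25, 118 00 Praha 1",
              "Grandchildren": { "Claire": "7", "Barbara": "6", "Magda": "3" },
              "Phones": [ "123-456-7890", "234-567-8963" ] }""";
        JsonNode doc = new ObjectMapper().readTree(json);
        // Objects, arrays, and scalars can be nested arbitrarily.
        System.out.println(doc.get("Grandchildren").get("Claire").asText()); // 7
        System.out.println(doc.get("Phones").get(0).asText()); // 123-456-7890
    }
}
```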
❖ Representative document stores:

Name    | Producer | Data model                                                                                        | Querying
MongoDB | 10gen    | object-structured documents stored in collections; each object has a primary key called ObjectId  | manipulations with objects in collections (find object or objects via simple selections and logical expressions, delete, update)
CouchDB | Apache   | document as a list of named (structured) items (JSON document)                                    | by key and key range, views via Javascript and MapReduce
Column Stores
❖ Based on Google’s BigTable paper
❖ Like column-oriented relational databases (store data in column order), but with a twist
❖ Tables similar to those in an RDBMS, but they handle semi-structured data
❖ Data model:
➢ Collection of Column Families
➢ Column family = (key, value), where value = set of related columns (standard, super)
➢ indexed by row key, column key and timestamp
❖ One column family can have a variable number of columns
❖ Cells within a column family are sorted “physically”
❖ Very sparse; most cells have null values
❖ Comparison: RDBMS vs column-based NoSQL
➢ Query on multiple tables
   RDBMS: must fetch data from several places on disk and glue it together
   Column-based NoSQL: only fetch the column families of those columns that are required by a query (all columns in a column family are stored together on disk, so multiple rows can be retrieved in one read operation: data locality)
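For example, with the HBase Java client (HBase is introduced in Part 2), a scan can be restricted to a single column family, so the other families of the same rows are never read from disk; the table and family names below are assumed examples:

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Scan only the "contents" column family: data locality in action.
public class FamilyScan {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection();
             Table table = conn.getTable(TableName.valueOf("webtable"))) {
            Scan scan = new Scan().addFamily(Bytes.toBytes("contents"));
            try (ResultScanner results = table.getScanner(scan)) {
                for (Result row : results) {
                    System.out.println(Bytes.toString(row.getRow()));
                }
            }
        }
    }
}
```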
❖ Representative column stores:

Name       | Producer                     | Data model                                                       | Querying
BigTable   | Google                       | set of couples (key, {value})                                    | selection (by combination of row, column, and time stamp ranges)
HBase      | Apache                       | groups of columns (a BigTable clone)                             | JRuby IRB-based shell (similar to SQL)
Hypertable | Hypertable                   | like BigTable                                                    | HQL (Hypertext Query Language)
Cassandra  | Apache (originally Facebook) | columns, groups of columns corresponding to a key (supercolumns) | simple selections on key, range queries, column or columns ranges
PNUTS      | Yahoo                        | (hashed or ordered) tables, typed arrays, flexible schema        | selection and projection from a single table (retrieve an arbitrary single record by primary key, range queries, complex predicates, ordering, top-k)
Graph Databases
❖ Focus on modeling the structure of data (interconnectivity)
❖ Scales to the complexity of the data
❖ Inspired by mathematical Graph Theory (G = (V, E))
❖ Data model:
➢ (Property Graph) nodes and edges
   Nodes may have properties (including ID)
   Edges may have labels or roles
➢ Key-value pairs on both
❖ Interfaces and query languages vary
❖ Single-step vs path expressions vs full recursion
❖ Example:
➢ Neo4j, FlockDB, InfoGrid, …
NoSQL Pros/Cons
❖ Advantages
➢ Massive scalability
➢ High availability
➢ Lower cost (than competitive solutions at that scale)
➢ (usually) predictable elasticity
➢ Schema flexibility, sparse & semi-structured data
❖ Disadvantages
➢ Don’t fully support relational features
no join, group by, order by operations (except within partitions)
no referential integrity constraints across partitions
➢ No declarative query language (e.g., SQL) → more programming
➢ Eventual consistency is not intuitive to program for
Makes client applications more complicated
➢ No easy integration with other applications that support SQL
➢ Relaxed ACID (see the CAP theorem) → fewer guarantees
Conclusion
❖ NOSQL databases cover only a part of data-intensive cloud applications (mainly Web applications)
❖ Problems with cloud computing:
➢ SaaS (Software as a Service or on-demand software) applications require enterprise-level functionality, including ACID transactions, security, and other features associated with commercial RDBMS technology, i.e. NOSQL should not be the only option in the cloud
➢ Hybrid solutions:
   Voldemort with MySQL as one of its storage backends
   deal with NOSQL data as semi-structured data → integrating RDBMS and NOSQL via SQL/XML
Part 2: Introduction to HBase
What is HBase?
❖ HBase is an open-source, distributed, column-oriented database built on top of HDFS based on BigTable
➢ Distributed – uses HDFS for storage
➢ Row/column store
➢ Column-oriented – nulls are free
➢ Multi-Dimensional (Versions)
➢ Untyped – stores byte[]
❖ HBase is part of Hadoop
❖ HBase is the Hadoop application to use when you require real-time read/write random access to very large datasets
➢ Aim to support low-latency random access
❖ A sparse, distributed, persistent multi-dimensional sorted map
❖ Sparse
➢ Sparse data is supported with no waste of costly storage space
➢ HBase can handle the fact that we don’t (yet) know that information
➢ HBase is a schema-less data store; that is, it’s fluid: we can add to, subtract from, or modify the schema as we go along
❖ Distributed and persistent
➢ Persistent simply means that the data you store in HBase will persist or remain after your program or session ends
➢ Just as HBase is an open source implementation of BigTable, HDFS is an open source implementation of GFS.
➢ HBase leverages HDFS to persist its data to disk storage.
➢ By storing data in HDFS, HBase offers reliability, availability, seamless scalability and high performance — all on cost effective distributed servers.
What is HBase? (Cont’)
❖ Multi-dimensional sorted map
➢ A map (also known as an associative array) is an abstract collection of key-value pairs, where the key is unique
➢ The keys are stored in HBase and sorted.
➢ Each value can have multiple versions, which makes the data model multidimensional. By default, data versions are implemented with a timestamp.
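One way to picture this “multi-dimensional sorted map” is as nested sorted maps in plain Java: row key → column → timestamp → value, with timestamps sorted in descending order so that the newest version comes first (a conceptual sketch only; the data shown is made up):

```java
import java.util.Comparator;
import java.util.TreeMap;

// Conceptual model: a sorted map of row keys, each holding a sorted map of
// columns, each holding versions keyed by timestamp (newest first).
public class SortedMapModel {
    public static void main(String[] args) {
        TreeMap<String, TreeMap<String, TreeMap<Long, String>>> table = new TreeMap<>();
        table.computeIfAbsent("com.cnn.www", r -> new TreeMap<>())
             .computeIfAbsent("anchor:cnnsi.com",
                              c -> new TreeMap<>(Comparator.reverseOrder()))
             .put(1273516197868L, "CNN");
        // The newest version of a cell is the first entry of the inner map.
        System.out.println(
            table.get("com.cnn.www").get("anchor:cnnsi.com").firstEntry().getValue());
    }
}
```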
HBase: Part of Hadoop’s Ecosystem
HBase is built on top of HDFS, alongside YARN in the Hadoop stack
HBase files are internally stored in HDFS
HBase vs. HDFS
❖ Both are distributed systems that scale to hundreds or thousands of nodes
❖ HDFS is good for batch processing (scans over big files)
➢ Not good for record lookup
➢ Not good for incremental addition of small batches
➢ Not good for updates
❖ HBase is designed to efficiently address the above points
➢ Fast record lookup
➢ Support for record-level insertion
➢ Support for updates (not in place)
❖ HBase updates are done by creating new versions of values
HBase vs. HDFS
If the application has neither random reads nor writes ➔ stick to HDFS
HBase Characteristics
❖ Tables have one primary index, the row key.
❖ No join operators.
❖ Scans and queries can select a subset of available columns, perhaps by using a wildcard.
❖ There are three types of lookups:
➢ Fast lookup using row key and optional timestamp
➢ Full table scan
➢ Range scan from region start to end
❖ Limited atomicity and transaction support.
➢ HBase supports multiple batched mutations of single rows only
➢ Data is unstructured and untyped
❖ Not accessed or manipulated via SQL
➢ Programmatic access via Java, HBase shell, Thrift (Ruby, Python, Perl, C++, ..) etc.
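A sketch of the three lookup types using the HBase 2.x Java client (the table name and row keys are assumed examples):

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

// The three lookup types, expressed with the Java client API.
public class Lookups {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection();
             Table table = conn.getTable(TableName.valueOf("webtable"))) {
            // 1. Fast lookup by row key (optionally restricted by timestamp).
            Result row = table.get(new Get(Bytes.toBytes("com.cnn.www")));
            // 2. Full table scan.
            try (ResultScanner all = table.getScanner(new Scan())) { /* ... */ }
            // 3. Range scan from a start row (inclusive) to a stop row (exclusive).
            Scan range = new Scan().withStartRow(Bytes.toBytes("com.a"))
                                   .withStopRow(Bytes.toBytes("com.d"));
            try (ResultScanner some = table.getScanner(range)) { /* ... */ }
        }
    }
}
```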
Too Big, or Not Too Big
❖ Two types of data: too big, or not too big
❖ If data is not too big, a relational database should be used
➢ The model is less likely to change as your business needs change. You may want to ask different questions over time, but if you got the logical model correct, you’ll have the answers.
❖ What if the data is too big?
➢ The relational model doesn’t acknowledge scale
➢ You need to:
Add indexes
Write really complex, messy SQL
Denormalize
➢ How can NoSQL/HBase help?
HBase Data Model
❖ Table: design-time namespace, has multiple sorted rows
❖ Row:
➢ Atomic key/value container, with one row key
➢ Rows are sorted alphabetically by the row key as they are stored
   This stores data in such a way that related rows are near each other (e.g., rows for one website domain)
➢ A column in HBase consists of a column family and a column qualifier, which are delimited by a : (colon) character
❖ A table’s schema only defines its Column Families
➢ Column families physically co-locate a set of columns and their values
   Column: a key in the k/v container inside a row
   Value: a time-versioned value in the k/v container
➢ Each column consists of any number of versions
➢ Each column family has a set of storage properties, such as whether its values should be cached in memory, etc.
➢ Columns within a family are sorted and stored together
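A minimal sketch of declaring such a schema with the HBase 2.x Admin API; note that only the column families are defined (the table and family names are assumed examples):

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

// Create a table whose schema is nothing more than its column families.
public class CreateTable {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection();
             Admin admin = conn.getAdmin()) {
            admin.createTable(
                TableDescriptorBuilder.newBuilder(TableName.valueOf("webtable"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("contents"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("anchor"))
                    .build());
            // Column qualifiers (e.g., anchor:cnnsi.com) need no declaration;
            // they are created implicitly on first write.
        }
    }
}
```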
HBase Data Model (Cont’)
❖ Column qualifier
➢ A column qualifier is added to a column family to provide the index for a given piece of data
➢ Given a column family content, a column qualifier might be content:html, and another might be content:pdf
➢ Column families are fixed at table creation, but column qualifiers are mutable and may differ greatly between rows
❖ Timestamp: long milliseconds, sorted descending
➢ A timestamp is written alongside each value, and is the identifier for a given version of a value
➢ By default, the timestamp represents the time on the RegionServer when the data was written, but you can specify a different timestamp value when you put data into the cell
❖ Cell: a combination of row, column family, and column qualifier; it contains a value and a timestamp, which represents the value’s version
❖ (Row, Family:Qualifier, Timestamp) → Value
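A sketch of writing one cell at exactly these coordinates with the HBase Java client; the explicit timestamp argument is optional (if omitted, the RegionServer’s time is used), and all names and the timestamp are assumed examples:

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Write one versioned cell: (row, family:qualifier, timestamp) -> value.
public class PutCell {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection();
             Table table = conn.getTable(TableName.valueOf("webtable"))) {
            Put put = new Put(Bytes.toBytes("com.cnn.www"));
            put.addColumn(Bytes.toBytes("anchor"), Bytes.toBytes("cnnsi.com"),
                          1273516197868L,          // explicit version timestamp
                          Bytes.toBytes("CNN"));
            table.put(put);
        }
    }
}
```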
HBase Data Model
❖ HBase is based on Google’s Bigtable model
[Figure: the canonical “webtable” example: rows “com.cnn.www” and “com.apache.www”; a column family “contents:” holding page content, and a column family “anchor:” with column qualifiers such as anchor:cnnsi.com, anchor:my.look.ca, and anchor:apache.com holding values such as “CNN.com”; each cell is versioned by a time stamp]
❖ Row key
➢ Byte array
➢ Serves as the primary key for the table
➢ Indexed for fast lookup
❖ Column Family
➢ Has a name (string)
➢ Contains one or more related columns
❖ Column Qualifier
➢ Belongs to one column family
➢ Included inside the row
   familyName:columnName
HBase Data Model
[Figure: the same “webtable” example, annotated with the version number stored for each row]
❖ Version Number
➢ Unique within each row
➢ By default → the system’s timestamp
➢ Data type is Long
❖ Value
➢ Byte array
HBase Data Model
[Figure: a logical table with rows “enclosure1” and “enclosure2”, a column family animal: (columns animal:type and animal:size), and a column family repairs: (column repairs:cost)]
❖ Storage: every “cell” (i.e., the time-versioned value of one column in one row) is stored “fully qualified” (with its full row key, column family, column name, etc.) on disk
➢ Column family animal:
   (enclosure1, t2, animal:type)
   (enclosure1, t1, animal:size)
   (enclosure1, t1, animal:type)
➢ Column family repairs:
   (enclosure1, t1, repairs:cost)
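A conceptual sketch (plain Java, no HBase API) of how the fully qualified cells of one column family sort on disk: by row key, then column, then descending timestamp, so the newest version of each cell is encountered first:

```java
import java.util.Comparator;
import java.util.TreeMap;

// Fully qualified cell keys, sorted the way a column family stores them.
public class CellOrder {
    record CellKey(String row, String column, long ts) {}

    public static void main(String[] args) {
        Comparator<CellKey> order = Comparator
                .comparing(CellKey::row)
                .thenComparing(CellKey::column)
                .thenComparing(Comparator.comparingLong(CellKey::ts).reversed());
        TreeMap<CellKey, String> animalFamily = new TreeMap<>(order);
        animalFamily.put(new CellKey("enclosure1", "animal:type", 2), "…");
        animalFamily.put(new CellKey("enclosure1", "animal:type", 1), "…");
        animalFamily.put(new CellKey("enclosure1", "animal:size", 1), "…");
        // Prints animal:size@t1, then animal:type@t2, then animal:type@t1.
        animalFamily.keySet().forEach(System.out::println);
    }
}
```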
HBase Data Model (Cont’)
➢ The “row” is atomic, and gets flushed to disk periodically. But it doesn’t have to be flushed into just a single file!
➢ It can be broken up into different files with different properties, and reads can look at just a subset