Non-relational data and NoSQL
Relational Databases
Relational DBs replaced file-based data processing and offered huge improvements
Invented by E. F. Codd in the 1970s
Very rigid two-dimensional design for data organization
Relies heavily on SQL
Only works with structured data
Great for management of transactional records
Departing from traditional design
Growth of Internet
Numerous applications
Hugely increased demand for information
Variety of types of data
Change in data storage
Processing speeds
Volume of data
Web3 ideas
https://www.freecodecamp.org/news/what-is-web3/
Storage Capacity Terms
Business Intelligence Systems
Business intelligence (BI) systems are information systems that:
assist managers and other professionals in the analysis of current and past activities and in the prediction of future events
do not support operational activities, such as the recording and processing of orders
these are supported by transaction processing systems
support management assessment, analysis, planning and control
BI systems fall into two broad categories:
reporting systems that sort, filter, group, and make elementary calculations on operational data
data mining applications that perform sophisticated analyses on data, usually involving complex statistical and mathematical processing
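A reporting system of the first kind mostly filters, groups, calculates, and sorts operational data. The minimal Python sketch below shows those four steps on a few order records; the `orders` data and the `sales_report` function are hypothetical, invented only for illustration.

```python
from collections import defaultdict

# Hypothetical operational records: (region, product, amount)
orders = [
    ("East", "Widget", 120.0),
    ("West", "Widget", 80.0),
    ("East", "Gadget", 200.0),
    ("West", "Gadget", 50.0),
    ("East", "Widget", 30.0),
]

def sales_report(rows, min_amount=0.0):
    """Filter small orders, group by region, sum amounts, sort by total (descending)."""
    totals = defaultdict(float)
    for region, _product, amount in rows:
        if amount >= min_amount:          # filter
            totals[region] += amount      # group + elementary calculation
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)  # sort

for region, total in sales_report(orders, min_amount=40.0):
    print(f"{region}: {total:.2f}")
```

With `min_amount=40.0` the 30.0 order is filtered out, so East totals 320.00 and West 130.00. Anything beyond this kind of elementary arithmetic (e.g., statistical modeling) falls into the data mining category.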
The Relationship Among Operational and BI Applications
Components of a Data Warehouse
Data preparation
Problems with Operational Data
Examples of “dirty data” include:
“G” for gender, “213” for age
Missing values, inconsistent data
Nonintegrated data (data from multiple sources)
Incorrect format (e.g., too many or not enough digits)
Too much data (e.g., an excess number of columns)
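Problems like these are usually caught by validation rules run during data preparation. The sketch below is a minimal example under stated assumptions: the allowed gender codes, the 0 to 130 age range, and the 5-digit ZIP format are hypothetical rules chosen for illustration, and ages are assumed to be integers.

```python
# Hypothetical validation rules; a real warehouse would define its own.
VALID_GENDERS = {"M", "F"}
AGE_RANGE = range(0, 131)   # assumes integer ages

def find_problems(record):
    """Return a list of data-quality problems found in one customer record."""
    problems = []
    gender = record.get("gender")
    age = record.get("age")
    zip_code = record.get("zip")
    if gender is None or age is None or zip_code is None:
        problems.append("missing value")
    if gender is not None and gender not in VALID_GENDERS:
        problems.append(f"invalid gender {gender!r}")
    if age is not None and age not in AGE_RANGE:
        problems.append(f"implausible age {age}")
    # Hypothetical format rule: exactly five digits
    if zip_code is not None and not (len(zip_code) == 5 and zip_code.isdigit()):
        problems.append(f"bad ZIP format {zip_code!r}")
    return problems

print(find_problems({"gender": "G", "age": 213, "zip": "1234"}))
```

The sample record trips three rules at once: the “G” gender code, the age of 213, and a ZIP code with too few digits.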
Data may need to be transformed for use in a data warehouse.
{CountryCode} → {CountryName}, e.g., “US” → “United States”
Email address → email domain, e.g., “somewhere.com”
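Both transformations are simple per-value mappings. The sketch below assumes a hypothetical, hard-coded `COUNTRY_NAMES` lookup table (a real pipeline would load a full code list) and a hypothetical sample address; unknown codes are passed through unchanged.

```python
# Hypothetical lookup table; a real warehouse would load a complete code list.
COUNTRY_NAMES = {"US": "United States", "CA": "Canada"}

def country_name(code):
    """Replace a country code with its full name (unknown codes pass through)."""
    return COUNTRY_NAMES.get(code, code)

def email_domain(address):
    """Reduce an email address to its lowercase domain, e.g. for grouping by organization."""
    return address.rsplit("@", 1)[-1].lower()

print(country_name("US"))                      # United States
print(email_domain("someone@somewhere.com"))   # somewhere.com
```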
Adding dimension to the data
Aggregated datasets
Multidimensional models
The term “NoSQL” was first used in 1998 to name a lightweight, open-source “relational” database that did not use SQL.
Created in response to the need to process semi-structured, unstructured, and other kinds of data
Departed from the two-dimensional (table) view of data
Can process
Structured
Unstructured
Unstructured Big Data
Used by companies such as Google, Twitter, LinkedIn, and Facebook
Capabilities and Advantages
Can be purpose-built to specific data models
“Tableless” and opaque data storage
Can manage unstructured or multi-structured data
No need for a predefined schema
Better manage abstract data
Support graph data modeling
Support document-oriented data store
Less strict consistency models (e.g., eventual consistency)
Better operational performance
Require fewer computing resources
More horizontal and vertical scalability
CAP Theorem (Brewer’s theorem)
Applies to distributed data stores, including non-relational databases:
The three guarantees that cannot be met simultaneously are:
Consistency
Availability
Partition Tolerance
CAP Principle
Consistency: The data within the database remains consistent, even after an operation has been executed. For instance, after an update, all clients see the same data.
Availability: The system is constantly on (always available), with no downtime.
Partition Tolerance: The system continues to function even if communication among the servers becomes unreliable, i.e., when the servers are partitioned into multiple groups that can’t communicate with each other.
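The trade-off only bites once a partition happens: the system must then either refuse writes (keeping consistency) or accept them on one side (keeping availability). The toy Python sketch below simulates this with two in-memory replicas; `TwoNodeStore` and its "CP"/"AP" modes are hypothetical names for illustration, not a real database API.

```python
class TwoNodeStore:
    """Toy store with two replicas (a and b) illustrating the CAP trade-off."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = None            # value held by replica a
        self.b = None            # value held by replica b
        self.partitioned = False

    def write(self, value):
        """Write through replica a; return True if the write was accepted."""
        if not self.partitioned:
            self.a = self.b = value   # network is fine: replicate to both
            return True
        if self.mode == "CP":
            return False              # refuse: stay consistent, give up availability
        self.a = value                # accept locally: stay available, b is now stale
        return True

cp, ap = TwoNodeStore("CP"), TwoNodeStore("AP")
for store in (cp, ap):
    store.write("v1")
    store.partitioned = True          # simulate a network partition between a and b
    accepted = store.write("v2")
    print(store.mode, "accepted:", accepted, "| a:", store.a, "| b:", store.b)
```

This prints `CP accepted: False | a: v1 | b: v1` and then `AP accepted: True | a: v2 | b: v1`: the CP store stays consistent but rejects the write, while the AP store stays available but its replicas disagree.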
ACID and BASE Provide Consistency
ACID = Atomicity + Consistency + Isolation + Durability
This concept is used with non-relational DBs as well!
The BASE (basically available, soft state, eventually consistent) approach is used for aggregate data stores and is a less rigid alternative to ACID
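Atomicity, the “A” in ACID, means a transaction's changes are applied all together or not at all. The sketch below uses Python's built-in `sqlite3` module (a relational engine, used here only because it is readily available) with a hypothetical `accounts` table; a `CHECK` constraint makes an overdrawing transfer fail, and the connection's context manager rolls back the already-applied credit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts atomically: both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False

transfer(conn, "alice", "bob", 30)    # succeeds
transfer(conn, "alice", "bob", 500)   # debit violates CHECK, so the credit is rolled back too
print(dict(conn.execute("SELECT name, balance FROM accounts")))
```

The final balances are alice 70 and bob 30: the failed transfer left no partial credit behind.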
BASE design
Accepts a certain rate of failure across the partitioned databases
Data is decomposed into functional groups
Supports a much higher volume of transactions
Allows for a decentralized DB approach
Scalability and cost efficiency benefits
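Eventual consistency can be illustrated with replicas that accept writes independently and later exchange updates. The toy sketch below assumes explicitly supplied, hypothetical timestamps and a simple last-writer-wins merge rule; real systems use various conflict-resolution strategies, and ties are simply ignored here.

```python
class EventualReplica:
    """Toy replica: accepts writes locally, converges later via last-writer-wins merge."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (timestamp, value)

    def put(self, key, value, ts):
        """Keep the write only if it is newer than what this replica holds (ties ignored)."""
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def get(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other):
        """Anti-entropy step: adopt any newer entries held by another replica."""
        for key, (ts, value) in other.data.items():
            self.put(key, value, ts)

r1, r2 = EventualReplica("r1"), EventualReplica("r2")
r1.put("cart:42", ["book"], ts=1)
r2.put("cart:42", ["book", "pen"], ts=2)       # later write lands on another replica
print(r1.get("cart:42"), r2.get("cart:42"))    # soft state: replicas temporarily disagree
r1.sync_from(r2)
r2.sync_from(r1)
print(r1.get("cart:42"), r2.get("cart:42"))    # both now hold the ts=2 value
```

Both replicas stayed available for writes the whole time; consistency was reached only after the sync step, which is the BASE trade-off in miniature.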
Non-relational data storage
Schema-free and non-relational
Allows rapid changes and replication
Horizontally scalable
NoSQL uses data stores optimized for specific purposes
Four storage categories
Key-Value storage
Document storage
Wide Column storage
Graph database
Key-Value stores
Each key is associated with only one value in a collection
Dictionary of key-value pairs
Values can be of a variety of data types
Simplest type of NoSQL database
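At its core a key-value store behaves like a dictionary: one value per key, looked up only by key. The minimal in-memory sketch below uses a hypothetical `KeyValueStore` class and made-up keys; production stores add persistence, distribution, and expiry on top of this interface.

```python
class KeyValueStore:
    """Minimal in-memory key-value store: each key maps to exactly one value."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value      # overwrites any previous value for the key

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)    # deleting a missing key is a no-op

store = KeyValueStore()
store.put("session:abc123", {"user": "alice", "cart_items": 3})  # value can be a dict...
store.put("page_views:home", 1024)                               # ...or an int
store.put("page_views:home", 1025)   # same key: value replaced, not duplicated
print(store.get("page_views:home"))  # 1025
```

The store never inspects values, so a dict and an int can sit side by side; the trade-off is that queries by anything other than the key are not supported.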
Document-oriented stores
Focus is on storing document-oriented information (semi-structured data)
Pairs each key with a document-type data structure, mapping keys to documents
Does not require the data to be split across tables
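The sketch below models a document store with plain Python dicts: each key maps to a nested, JSON-like document, related data (customer, line items) lives inside one document instead of separate tables, and documents in the same collection may carry different fields. The `documents` collection and the `find` helper are hypothetical, not the API of any particular product.

```python
import json

# Hypothetical collection: key -> nested document (no fixed schema)
documents = {
    "order:1001": {"customer": {"name": "Alice", "city": "Boston"},
                   "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
                   "status": "shipped"},
    "order:1002": {"customer": {"name": "Bob", "city": "Denver"},
                   "items": [{"sku": "A1", "qty": 5}],
                   "status": "pending",
                   "gift_note": "Happy birthday!"},  # extra field, no schema change needed
}

def find(collection, **criteria):
    """Return ids of documents whose top-level fields equal all given criteria."""
    return [doc_id for doc_id, doc in collection.items()
            if all(doc.get(field) == value for field, value in criteria.items())]

print(find(documents, status="pending"))            # ['order:1002']
print(json.dumps(documents["order:1001"], indent=2))
```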
Wide-Column Stores
Uses tables with rows and columns
Column names and data formats can vary between rows in the same table
Groups columns into column families (hence the name Column Family DBs)
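A wide-column table can be pictured as a sparse, nested map: row key → column family → column → value. The Python sketch below assumes that simplified model (it ignores versioning, storage layout, and distribution); the row keys, family names, and values are hypothetical.

```python
from collections import defaultdict

# row key -> column family -> column name -> value
table = defaultdict(lambda: defaultdict(dict))

def put(row_key, family, column, value):
    table[row_key][family][column] = value

def read_family(row_key, family):
    """Return one column family of a row (empty dict if absent, without creating it)."""
    return dict(table.get(row_key, {}).get(family, {}))

# Rows in the same table need not share the same columns.
put("user:alice", "profile", "name", "Alice")
put("user:alice", "profile", "email", "alice@example.com")
put("user:alice", "activity", "last_login", "2024-01-15")
put("user:bob", "profile", "name", "Bob")
put("user:bob", "profile", "phone", "555-0100")

print(read_family("user:bob", "profile"))  # {'name': 'Bob', 'phone': '555-0100'}
```

Bob's row has a `phone` column and no `email` or `activity` data, yet no schema change was needed; grouping columns into families lets related columns be read together.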
Graph Stores
Uses graph-based structures
Uses nodes, edges, and properties to organize and store data
Each node can represent an entity (object) and is connected by an edge (edges) to other nodes to form relationships
Each node has a unique identifier
Each edge also has a unique identifier
Allows a network of connections to be established
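The sketch below represents a tiny graph the way these bullets describe it: nodes and edges each have a unique identifier, nodes carry properties, and each edge links two nodes with a relationship type. A breadth-first traversal then finds everything connected to a starting node. All identifiers and data are hypothetical, and a real graph database would use indexes rather than scanning every edge.

```python
from collections import deque

# Hypothetical nodes (id -> properties) and edges (id -> (source, relationship, target))
nodes = {
    "n1": {"label": "Person", "name": "Alice"},
    "n2": {"label": "Person", "name": "Bob"},
    "n3": {"label": "Person", "name": "Carol"},
    "n4": {"label": "Company", "name": "Acme"},
}
edges = {
    "e1": ("n1", "KNOWS", "n2"),
    "e2": ("n2", "KNOWS", "n3"),
    "e3": ("n3", "WORKS_AT", "n4"),
}

def neighbors(node_id):
    """Outgoing neighbors of a node, following directed edges."""
    return [dst for src, _rel, dst in edges.values() if src == node_id]

def reachable(start):
    """Breadth-first traversal: every node connected to start by a path of edges."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in neighbors(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}

print(sorted(nodes[n]["name"] for n in reachable("n1")))  # ['Acme', 'Bob', 'Carol']
```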
Relational Vs. Non-relational DB
| Relational | Non-relational |
|---|---|
| Very rigid structure | Flexibility in structure |
| Does not accommodate all modern needs for data organization | Plenty of flexible options |
| Uses primary key to foreign key (PK-FK) connections, relies on atomic attributes, stronger mechanisms to enforce business logic | Allows for complex and custom data stores, relies on key-value pairs where the value does not have to be an atomic attribute |
| Built to enforce referential integrity | Cannot enforce relationships between items |
| Harder to scale | Easy horizontal scalability |
Popular Non-Relational/NoSQL Databases
Amazon DynamoDB
IBM Cloudant
And many more …
https://www.trustradius.com/nosql-databases
Database classification
“Hadoop is a highly scalable analytics platform for processing large volumes of structured and unstructured data. Multiple petabytes of data spread across hundreds or thousands of physical storage servers or nodes.”
Future of databases
References:
https://docs.microsoft.com/en-us/azure/architecture/data-guide/big-data/non-relational-data
https://www.trustradius.com/nosql-databases
https://aws.amazon.com/nosql/