Preparation for the workshop – ready, set ……
▪ connect to Flux – flux.qa and be ready to answer questions
▪ login to Oracle:
–SQL Developer (MoVE or Local) OR
–ORDS https://ora-fit.ocio.monash.edu:8441/ords/sql-developer
Non Relational Databases Big Data
Workshop 2022 S1
Data Growth 2021
Source: https://www.domo.com/learn/data-never-sleeps-9
Data Growth
Source: https://www.seagate.com/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf
Railway In Mining
Pilbara region, WA
Trains perform round trips from the mining site to the port
Loaded minerals and ores
Length: > 2KM Load: > 10 Ton/car Speed: 5-10 Km/hr
Instrumented Ore Car (IOC) Expensive Sensors
Trained Professionals to maintain the sensors
Solution adopted: Network Structure
External Server
Sensor Node Central Node
Central Node Process
[Figure: the central node combines the GPS position (lng: ###) with the readings (acceler, strain) from sensor nodes 00:11:14, 00:11:1A and 00:11:1D into one message, with the sensors entry keyed by node address]
Central Node
127.0.0.1:9999
Note format of data key:value pairs – JSON format
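A minimal sketch of how a central node might merge per-sensor readings into one JSON message of key:value pairs. The node addresses and the field names acceler and strain come from the slide; the buildMessage helper, the lat field and every numeric value are hypothetical.

```javascript
// Hypothetical readings received from three sensor nodes (values invented).
const readings = [
  { node: "00:11:14", acceler: 0.12, strain: 3.4 },
  { node: "00:11:1A", acceler: 0.09 },
  { node: "00:11:1D", acceler: 0.15 },
];

// Merge the GPS fix and the per-node readings into one message keyed by node address.
function buildMessage(gps, nodeReadings) {
  const sensors = {};
  for (const { node, ...values } of nodeReadings) {
    sensors[node] = values;
  }
  return { gps, sensors };
}

const message = buildMessage({ lat: -21.63, lng: 116.7 }, readings);
const payload = JSON.stringify(message); // the text that would be sent onwards
console.log(payload);
```

Each sensor contributes only the keys it actually measured, which is why a key:value format suits this data.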
How Big is the Data?
Welcome to Ubuntu 14.04.3 LTS (GNU/Linux 3.13.0-46-generic x86_64)
* Documentation: https://help.ubuntu.com/
MongoDB shell version: 3.0.4
connecting to: test
2015-11-06T11:49:56.337+1100 I CONTROL [initandlisten]
2015-11-06T11:49:56.337+1100 I CONTROL [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
2015-11-06T11:49:56.337+1100 I CONTROL [initandlisten] **        We suggest setting it to 'never'
2015-11-06T11:49:56.337+1100 I CONTROL [initandlisten]
> db.sensordata.find().pretty()
{
    "_id" : ObjectId("5663ce2ce4b099b72ceca8c2"),
    "gps" : { "GPSLat" : -21.63893238, "GPSLon" : 116.70659242 },
    "SomatTime" : 74711, "CarOrient" : 30.2, "EorL" : 1,
    "Direction" : "ToPort ", "minSND" : 0, "iSegment" : 5876, "maxSND" : 0,
    "PipeA" : 0, "maxCFB" : 0, "minCFB" : 0, "Bounce" : 0, "minCFA" : 0, "maxCFA" : 0, "kmh" : 30.2, "PipeB" : 0, "Rock" : 0, "accR3" : 0, "accR4" : 0, "maxBounce" : 0, "LATACC" : 0
}
Type "it" for more
>
Quantity              Data Returned
Timestamp             12-Jun-2015; 09:35:15
Geo-location          N35°43.57518, W078°49.78314
Direction             ToPort
Acceleration
Pressure
Ambient temperature   73 degrees F
Surface temperature   78 degrees F
Humidity              35%
16 Sensors 200
25 Records Per Second
16 * 200 * 25 = 80,000 records/sec
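The arithmetic above can be checked and extended to a daily figure with a short sketch. The slide does not label the factor of 200; the variable name units below is an assumption, as is the extrapolation to a full day of continuous operation.

```javascript
const sensorsPerUnit = 16;            // from the slide
const units = 200;                    // unlabelled on the slide; assumed to be instrumented units
const recordsPerSensorPerSecond = 25; // from the slide

const recordsPerSecond = sensorsPerUnit * units * recordsPerSensorPerSecond;
const recordsPerDay = recordsPerSecond * 60 * 60 * 24; // assumes 24 hours of operation

console.log(recordsPerSecond); // 80000
console.log(recordsPerDay);    // 6912000000
```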
Big Data Processing
Data Acquisition Data Retrieval
Two main problems:
(1) How to receive data … massive amount of data
(2) How to retrieve data … very fast
Advanced metering infrastructure (AMI) – Smart Meters
https://www.victorianenergysaver.vic.gov.au/get-help-with-your-bills/smart-meters-and-how-they-work
https://www.energynetworks.com.au/news/energy-insider/get-smart-when-will-australia-realise-the-benefits-of-smart-meters-and-iot/
Q1. Which of the following is NOT a characteristic of Big Data (multiple selections are possible):
A. Scaling Up
C. Veracity
D. Variety
F. Velocity
Big Data Characteristics
▪ Volume
– The quantity of data to be stored
▪ Velocity
– The speed at which data enters the system and must be processed
▪ Variety
– Variations in the structure of the data to be stored
Big Data Characteristics: Volume
▪ Scaling up: keeping the same number of systems but migrating each one to a larger system
▪ Scaling out: when the workload exceeds server capacity, it is spread out across a number of servers
Scaling continued
▪ Big players, notably Google and Amazon chose Scale Out
– Lots and lots of smaller boxes (“commodity” servers)
– Non relational structure
– Google: Bigtable
• https://research.google/pubs/pub27898/
• https://cloud.google.com/bigtable/docs/overview
• Used for a wide range of apps: Gmail, Google Earth, YouTube
– Amazon: Dynamo
• http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/decandia07dynamo.pdf
• DynamoDB, based on Dynamo: https://aws.amazon.com/dynamodb/
Scaling continued
▪ Term “NoSQL” coined in 2009 in a call for a …”free meetup about “open source, distributed, non relational databases” or NOSQL for short”…
– http://blog.oskarsson.nu/post/22996139456/nosql-meetup
▪ Characteristics
– Non relational,
– mostly open source,
– distributed (cluster friendly),
– schema-less (no fixed storage schema)
Big Data Characteristics: Velocity
▪ Stream processing: focuses on input processing and requires analysis of data stream as it enters the system
– CERN Large Hadron Collider: 600TB per second of raw data, filtered down to 1GB per second
▪ Feedback loop processing: analysis of data to produce actionable results
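A minimal sketch of stream processing, assuming the decision to keep or drop a record is made as each record arrives rather than after it has been stored. The field names iSegment and kmh are borrowed from the sensor document shown earlier; the readings, the threshold and the function names are hypothetical.

```javascript
// Hypothetical incoming readings; in practice these would arrive continuously.
function* incomingReadings() {
  yield { iSegment: 5876, kmh: 30.2 };
  yield { iSegment: 5877, kmh: 8.5 };
  yield { iSegment: 5878, kmh: 41.0 };
}

// Decide per record, at arrival time, whether it is passed on for storage.
function* keepFast(stream, minKmh) {
  for (const reading of stream) {
    if (reading.kmh >= minKmh) {
      yield reading;
    }
  }
}

const stored = [...keepFast(incomingReadings(), 30)];
console.log(stored.length); // 2
```

Only the filtered records reach storage, which is the same idea as reducing a very large raw stream to a much smaller stored one.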
Big Data Characteristics: Variety
▪ Structured data: fits into a predefined data model
– Relational databases
– Incoming data decomposed under normalisation rules to fit the data model
▪ Unstructured data: does not fit into a predefined model
– Big Data requires that the data is captured in its natural format as generated without imposing a data model on it
▪ Semi structured data: combines elements of both
▪ Hadoop is not a database
– De facto standard for most Big Data storage and processing
– Java-based framework for distributing and processing very large data sets across clusters of computers
– https://www.geeksforgeeks.org/hadoop-ecosystem/
▪ Important components
– Distribution
• Hadoop Distributed File System (HDFS): low-level distributed file processing system that can be used directly for data storage
– Processing
• MapReduce: programming model that supports processing large data sets
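The MapReduce programming model can be sketched in a few lines. This is an in-memory illustration of the map, shuffle and reduce phases on one machine, not Hadoop's API; the input text and function names are hypothetical.

```javascript
// Input split into chunks, as if each chunk were stored on a different node.
const chunks = ["to port to mine", "to port"];

// Map: each chunk independently emits [key, 1] pairs.
function mapChunk(text) {
  return text.split(/\s+/).filter(Boolean).map((word) => [word, 1]);
}

// Shuffle: group all emitted values by key.
function shuffle(pairs) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return groups;
}

// Reduce: collapse the values of each key into a single result.
function reduceGroups(groups) {
  const totals = {};
  for (const [key, values] of groups) {
    totals[key] = values.reduce((sum, v) => sum + v, 0);
  }
  return totals;
}

const wordCounts = reduceGroups(shuffle(chunks.flatMap(mapChunk)));
console.log(wordCounts); // { to: 3, port: 2, mine: 1 }
```

Because mapChunk needs only its own chunk, the map phase can run on many machines at once; only the shuffle moves data between them.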
Q2. The four main categories of NoSQL databases are (select multiple answers):
A. Aggregate-aware
B. Key-Value
C. Document
E. Column-oriented
NoSQL Data Models
▪ Key-value store
– Each item stored consists of a key and value pair (the value may be a number, a document, an image etc)
– Oracle NoSQL database (community edition available)
• https://www.oracle.com/au/database/technologies/related/nosql.html
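A minimal sketch of the key-value model using a plain JavaScript Map in place of a real store. The keys and values are hypothetical; the point is that the store treats the value as opaque and retrieves it only by key.

```javascript
const store = new Map();

// put: the store does not interpret the value, it only associates it with the key.
store.set("drone:102", JSON.stringify({ model: "hypothetical-model", cost_per_hour: 40 }));
store.set("visits", 17);

// get: lookup is by exact key only; there is no query on the contents of the value.
const raw = store.get("drone:102");
const drone102 = JSON.parse(raw); // the application, not the store, decodes the value
console.log(drone102.cost_per_hour); // 40
console.log(store.get("missing"));   // undefined
```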
NoSQL Data Models continued
▪ Document
– Each item is stored as a document (normally BSON or JSON document, but could be XML)
– Note the variable structure and embedded documents
MongoDB – https://www.mongodb.com/
NoSQL Data Models continued
▪ Column Family (also called Wide Column Store)
– Key points to a set of multiple column values containing related data arranged by column family
Cassandra (used at eBay): https://cassandra.apache.org/
https://www.dummies.com/programming/big-data/columnar-data-in-nosql/
NoSQL Data Models continued
▪ Graph – based on a graph structure
– Unlike the previous three which are aggregation oriented, the graph model views data at a highly non aggregated level
– Based on graph theory
– Navigate via relationships (edges) between nodes
– Examples
• Neo4j
• HyperGraphDB
https://neo4j.com/docs/stable/cypher-cookbook-friend-finding.html
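Navigating via relationships can be illustrated with a small friend-finding sketch. This uses a plain adjacency list rather than a graph database or Cypher; the people and the friendsOfFriends function are hypothetical.

```javascript
// Nodes are people; each edge is a FRIEND relationship (names invented).
const edges = [
  ["Ann", "Bob"],
  ["Bob", "Cy"],
  ["Bob", "Dee"],
  ["Ann", "Dee"],
];

// Build an undirected adjacency list from the edge list.
const friends = new Map();
for (const [a, b] of edges) {
  if (!friends.has(a)) friends.set(a, new Set());
  if (!friends.has(b)) friends.set(b, new Set());
  friends.get(a).add(b);
  friends.get(b).add(a);
}

// Follow two FRIEND edges from the start node, excluding the start and its direct friends.
function friendsOfFriends(person) {
  const direct = friends.get(person) ?? new Set();
  const result = new Set();
  for (const friend of direct) {
    for (const candidate of friends.get(friend)) {
      if (candidate !== person && !direct.has(candidate)) result.add(candidate);
    }
  }
  return [...result].sort();
}

console.log(friendsOfFriends("Ann")); // [ 'Cy' ]
```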
NoSQL Databases
▪ Comparison of NoSQL databases
– https://hostingdata.co.uk/nosql-database/ currently lists 200+ NoSQL databases, including some outside these four models.
▪ Data is distributed on multiple machines via:
– Sharding
• One copy of the data spread across multiple machines, or
– Replication
• Same data is copied to multiple machines, giving increased availability and resilience
– Mixtures of Sharding/Replication
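A minimal sketch of the two distribution strategies, assuming hash-based placement. The hash function, the shard count and the rule of replicating to the next shard are illustrative assumptions; real systems choose placement differently.

```javascript
const SHARD_COUNT = 3; // hypothetical cluster size

// Simple deterministic string hash (illustrative only, not a production hash).
function hashKey(key) {
  let hash = 0;
  for (const ch of String(key)) {
    hash = (hash * 31 + ch.codePointAt(0)) % 1000003;
  }
  return hash;
}

// Sharding: each key lives on exactly one shard.
function shardFor(key) {
  return hashKey(key) % SHARD_COUNT;
}

// Sharding plus replication: the home shard and the following shards each hold a copy.
function replicasFor(key, copies) {
  const first = shardFor(key);
  return Array.from({ length: copies }, (_, i) => (first + i) % SHARD_COUNT);
}

console.log(shardFor("drone:102"), replicasFor("drone:102", 2));
```

With one copy, losing a shard loses its keys; with two copies, any single shard can fail and every key is still available on another machine.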
▪ Lots of interesting questions and research around consistency vs availability
▪ Document Database
– Community edition available for download (not required for FIT2094 but may install – see Applied class)
• https://www.mongodb.com/download-center/community
– MongoDB Shell
• https://docs.mongodb.com/manual/tutorial/getting-started/
– Database
– show dbs, use dbname, db.dropDatabase(), db (show current db)
• Contains collections – show collections
» collection contains documents
MongoDB – Database
Documents -> Collections -> Database
Document structure
▪ MongoDB stores data records as BSON documents (binary JSON documents)
▪ Document composed of field-values pairs
– Field names may be enclosed in quotes (allows spaces in name)
– “groups” field above holds an array of strings, marked with [ ]
Relationships – structuring documents
Denormalised – Embedded Documents
Normalised – References
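The two ways of structuring a relationship can be compared side by side. The documents below are hypothetical, including the pairing of drone 102 with type DIN2; plain arrays stand in for collections.

```javascript
// Denormalised: the drone type is embedded inside the drone document.
const embeddedDrone = {
  drone_id: 102,
  type: { code: "DIN2", model: "hypothetical-model" },
};

// Normalised: the drone holds only a reference; the type lives in its own collection.
const droneTypes = [{ _id: "DIN2", model: "hypothetical-model" }];
const referencedDrone = { drone_id: 102, type_code: "DIN2" };

// Embedded: one read returns everything.
const modelFromEmbedded = embeddedDrone.type.model;

// Referenced: a second lookup is needed to resolve the reference.
const modelFromReference = droneTypes.find(
  (t) => t._id === referencedDrone.type_code
).model;

console.log(modelFromEmbedded === modelFromReference); // true
```

Embedding favours fast reads of the whole aggregate; references avoid repeating the type details in every drone document.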
Document sample – Drone Rentals
▪ type – sub document
▪ RentalInfo – array of sub documents
Generate JSON object from Oracle Select – JSON functions
-- set the output page size, prevents the output heading appearing every 5 lines
set pagesize 50
SELECT
    JSON_OBJECT(
        'drone_id' VALUE drone_id,
        'type' VALUE JSON_OBJECT (
            'code' VALUE dt_code,
            'model' VALUE dt_model,
            'manufacturer' VALUE manuf_name
        ),
        'carrying_capacity' VALUE dt_carry_kg,
        'pur_date' VALUE to_char(drone_pur_date,'YYYY-MM-DD'),
        'pur_price' VALUE drone_pur_price,
        'total_flighttime' VALUE drone_flight_time,
        'cost_per_hour' VALUE drone_cost_hr,
        'RentalInfo' VALUE JSON_ARRAYAGG (
            JSON_OBJECT (
                'rent_no' VALUE rent_no,
                'bond' VALUE rent_bond,
                'rent_out' VALUE to_char(rent_out,'YYYY-MM-DD'),
                'rent_in' VALUE to_char(rent_in ,'YYYY-MM-DD'),
                'custtrain_id' VALUE ct_id
            )
            ORDER BY rent_no
        ) FORMAT JSON )
FROM … GROUP BY … ORDER BY
https://docs.oracle.com/en/database/oracle/oracle-database/12.2/adjsn/generation.html
Collect DRONE data from Oracle
1. Create a text document dronedata.txt using the output via Visual Studio Code for the Web
2. Format as JSON and save as dronedata.json
mongoDB – CRUD: CREATE
▪ create collection by inserting documents
– db.collection.insertOne ( ….. JSON ….. );
– db.collection.insertMany – insert an array of JSON documents
• insertMany ([ JSON1, JSON2, …]);
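A sketch of the documents that insertOne and insertMany would receive. The field names follow the Drone Rentals sample; every value is invented, and the mongosh calls are shown as comments because they need a running MongoDB with the drone database selected.

```javascript
// Hypothetical document shaped like the Drone Rentals sample (all values invented).
const firstDrone = {
  drone_id: 100,
  type: { code: "DIN2", model: "hypothetical-model", manufacturer: "hypothetical-maker" },
  carrying_capacity: 5,
  cost_per_hour: 45,
  RentalInfo: [{ rent_no: 1, rent_out: "2022-03-01", rent_in: null }],
};

// Further hypothetical documents, collected in an array for insertMany.
const remainingDrones = [
  { ...firstDrone, drone_id: 101, RentalInfo: [] },
  { ...firstDrone, drone_id: 102, carrying_capacity: 3 },
];

// In mongosh these would be loaded with:
//   db.dronerent.insertOne(firstDrone);
//   db.dronerent.insertMany(remainingDrones);
// insertMany takes a single array argument, so the documents are wrapped in [ ].
console.log(Array.isArray(remainingDrones), remainingDrones.length); // true 2
```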
Add the first document to MongoDB
https://www.mongodb.com/docs/v4.4/tutorial/getting-started/
Which version of MongoDB is running: db.version()
Which databases do you have access to: show dbs
Create/use the drone database: use drone
Which database am I in: db
What collections do I have in this database: show collections
Add the first document to MongoDB continued
Now insert the remainder in one insertMany (note the use of an array [ ] to contain the set of documents)
mongoDB – CRUD: RETRIEVE
▪ Documents retrieved by find method on collection
– db.dronerent.find ({}); or db.dronerent.find ({}).pretty()
• find all
mongoDB – CRUD: RETRIEVE continued
▪ Limit output to specified fields (project fields): 1 = display, 0 = suppress
▪ count documents returned
mongoDB – CRUD: RETRIEVE continued
▪ Find some documents
– Predicate Operators: https://docs.mongodb.com/manual/reference/operator/query/
• Example: {
mongoDB – CRUD: RETRIEVE continued
db.dronerent.find ({})
db.dronerent.find ({}).count()
db.dronerent.find ({},{_id: 0, “drone_id”: 1, “type.model”: 1})
Find some documents
a. find the details of drone id 102
b. find the details of all drones of type DIN2
c. find the details of all drones which have a carrying capacity > 4
➢ display drone id, model and cost per hour
d. find the details of all drones which have a carrying capacity <= 5 and a cost per hour of < 50
➢ display drone id, carrying capacity and cost per hour
e. how many drones are still on loan?
f. which drones are still out on loan
➢ display drone id, when the drone went out, check your answer by doing a count first
mongoDB - CRUD: RETRIEVE continued
Find some documents
a. find the details of drone id 102
db.dronerent.find ({"drone_id": {$eq: 102}})
b. find the details of all drones of type DIN2
db.dronerent.find ({"type.code": {$eq: "DIN2"}})
c. find the details of all drones which have a carrying capacity > 4
➢ display drone id, model and cost per hour
db.dronerent.find (
{"carrying_capacity": {$gt: 4}},
{"drone_id": 1, "type.model": 1, "cost_per_hour": 1, "_id": 0}
)
mongoDB – CRUD: RETRIEVE continued
d. find the details of all drones which have a carrying capacity <= 5 and a cost per hour of < 50
➢ display drone id, carrying capacity and cost per hour
db.dronerent.find (
{ $and: [{"carrying_capacity": {$lte: 5}},{"cost_per_hour": {$lt: 50}}]},
{"drone_id": 1, "carrying_capacity": 1, "cost_per_hour": 1, "_id": 0}
)
e. how many drones are still on loan?
db.dronerent.find ({"RentalInfo.rent_in": {$eq: null}}).count()
f. which drones are still out on loan
➢ display drone id, when the drone went out, check your answer by doing a count first
db.dronerent.find (
{"RentalInfo.rent_in": {$eq: null}},
{"drone_id":1, "RentalInfo.rent_out":1, "_id":0 }
)
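To see why a dotted path such as "RentalInfo.rent_in" reaches inside the array, the sketch below re-implements a small part of the matching logic on in-memory documents. It is a simplification written for illustration, not MongoDB's implementation: it supports only $eq, $gt, $lt, $lte and $and in the operator form used above, and does not model edge cases such as missing or empty arrays. The sample documents are hypothetical.

```javascript
// Collect every value reachable via a dotted path, descending into arrays on the way.
function valuesAt(doc, path) {
  let current = [doc];
  for (const part of path.split(".")) {
    const next = [];
    for (const item of current) {
      const targets = Array.isArray(item) ? item : [item];
      for (const target of targets) {
        if (target !== null && typeof target === "object") {
          next.push(target[part]); // undefined when the field is missing
        }
      }
    }
    current = next;
  }
  return current;
}

const operators = {
  $eq: (value, arg) => (arg === null ? value == null : value === arg),
  $gt: (value, arg) => value != null && value > arg,
  $lt: (value, arg) => value != null && value < arg,
  $lte: (value, arg) => value != null && value <= arg,
};

// A document matches when every condition is met by at least one value on its path.
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    if (field === "$and") return condition.every((sub) => matches(doc, sub));
    const values = valuesAt(doc, field);
    return Object.entries(condition).every(([op, arg]) =>
      values.some((value) => operators[op](value, arg))
    );
  });
}

// Hypothetical drones: 101 has one returned rental and one still open.
const drones = [
  { drone_id: 100, carrying_capacity: 5, cost_per_hour: 45,
    RentalInfo: [{ rent_out: "2022-03-01", rent_in: "2022-03-05" }] },
  { drone_id: 101, carrying_capacity: 3, cost_per_hour: 60,
    RentalInfo: [{ rent_out: "2022-03-02", rent_in: "2022-03-04" },
                 { rent_out: "2022-04-10", rent_in: null }] },
];

const onLoan = drones.filter((d) => matches(d, { "RentalInfo.rent_in": { $eq: null } }));
console.log(onLoan.map((d) => d.drone_id)); // [ 101 ]
```

Matching is per document: one open rental is enough for the whole drone document to match, so the projected "RentalInfo.rent_out" values include the drone's earlier rentals as well.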
mongoDB - CRUD: UPDATE
▪ Update documents via update or updateOne
– uses $set to assign value
– updateOne ({query condition},{update to carry out})
mongoDB - CRUD: UPDATE
▪ Update within an array
– $ placeholder to update the first element that matches the query condition
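The effect of the $ placeholder can be mimicked on an in-memory document. The setOnFirstMatch helper and the sample data are hypothetical; the comment shows the mongosh statement it is meant to imitate, assuming the dronerent collection used above.

```javascript
// Imitates, for one in-memory document:
//   db.dronerent.updateOne(
//     {"drone_id": 101, "RentalInfo.rent_in": null},
//     {$set: {"RentalInfo.$.rent_in": "2022-05-01"}}
//   )
function setOnFirstMatch(doc, arrayField, elementFilter, field, value) {
  const index = doc[arrayField].findIndex((element) =>
    Object.entries(elementFilter).every(([k, v]) => element[k] === v)
  );
  if (index === -1) return false;        // nothing matched, nothing changed
  doc[arrayField][index][field] = value; // only the first matching element is updated
  return true;
}

// Hypothetical drone with two open rentals.
const drone = {
  drone_id: 101,
  RentalInfo: [
    { rent_no: 7, rent_in: null },
    { rent_no: 9, rent_in: null },
  ],
};

setOnFirstMatch(drone, "RentalInfo", { rent_in: null }, "rent_in", "2022-05-01");
console.log(drone.RentalInfo.map((r) => r.rent_in)); // [ '2022-05-01', null ]
```

Even though both elements satisfy the condition, only the first one changes, which is the behaviour to keep in mind when an array can hold several matching sub documents.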
mongoDB - CRUD: UPDATE Result
▪ Update within an array
BEFORE AFTER
mongoDB - CRUD: DELETE
▪ Delete a document
– via db.dronerent.deleteOne or db.dronerent.deleteMany
▪ Remove current database (local client only) – db.dropDatabase()