
Database Systems Infrastructure

Copyright © 2012, SAS Institute Inc. All rights reserved.
For this week and next, we will be looking at Big Data. This should give you a basic idea of how everything is connected together.

There are extra materials for this lecture – all the text should be on slides or on the video.

This week, we will talk about understanding Big Data. Next week, we will talk about Hadoop, MapReduce, and Spark, the technologies on which Big Data is built.

INFS5710 Week 1

Big Data
What is Big Data?
Buzz Word!
Cannot fit into a USB flash drive
A large and complex dataset
Social media
IoT streaming of data
Capturing of Media
3Vs and more Vs
Big Data is classified into three types:
Structured  
Unstructured  
Semi-Structured 

©2017 Cengage Learning®. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
What is Big Data?
Some people would say it is a buzz word dreamed up by consultants.

Some would say it is data that cannot fit onto a USB flash drive; some might say it is data that takes more than an hour to retrieve; and others might express it in terms of disk size, such as more than 1 GB, 10 GB, or 1 TB.

Therefore, in layman's terms, Big Data can be described as a large and complex dataset. What is considered large today will be small in ten years' time, as the amount of data captured by any means will continue to grow. Thus, Big Data, or data in general, is not going away.
 
Social media capturing text
Internet of Things (IoT) capturing interval data and timestamps
Capturing of media (such as pictures)

Social media and captured media, which we define as unstructured data, are the key contributors to the growth of Big Data. Unstructured data can be described as data with no formal pattern.

We will cover more on the Vs, and also on three types of Big Data which are:
Structured  
Unstructured  
Semi-Structured 

As stated on the slide…

Only a few years ago, this is what people thought…

As stated on the slide…

Today, everything is Big Data!

Big Data
(1) Volume: Quantity of data to be stored
Scaling up is keeping the same number of systems but migrating each one to a larger system
Scaling out means when the workload exceeds server capacity, it is spread out across a number of servers
(2) Velocity: Speed at which data is entered into system and must be processed
Stream processing focuses on input processing and requires analysis of data stream as it enters the system
Feedback loop processing refers to the analysis of data to produce actionable results

(e.g., 100 GB to 100 TB)
storage issue
storage issue; data need to be processed rapidly
(e.g., 600 TB per sec of raw data, only 1 GB per sec stored)

The first of 3 Vs is volume…

As you accumulate data, you need more storage, and you will add more servers to overcome the issue. Eventually, it comes to a stage where the amount of disk space required doubles every two weeks, and you have to think of a better way.

The speed of accessing and processing the data decreases as more data needs to be processed.

As for stream processing, it requires input processing as the data comes in…
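
As a rough illustration of stream processing, the sketch below examines each reading as it arrives and keeps only the ones worth storing, rather than storing everything and analysing it later. The sensor readings, the threshold value, and the function name `filter_stream` are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Iterator


def filter_stream(readings: Iterable[float], threshold: float) -> Iterator[float]:
    """Yield only the readings above the threshold, one at a time, as they arrive."""
    for value in readings:
        # The decision is made on input, before anything is written to storage.
        if value > threshold:
            yield value


# Hypothetical sensor readings arriving one by one.
incoming = [0.2, 5.1, 0.3, 7.8, 0.1, 0.4]
stored = list(filter_stream(incoming, threshold=1.0))
print(stored)  # [5.1, 7.8]
```

Only two of the six readings are kept, which mirrors the idea that far less data is stored than is produced.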

Big Data
(3) Variety: Variations in the structure of data to be stored
Structured data fits into a predefined data model
Unstructured data does not fit into a predefined model
Other characteristics:
Variability: Changes in meaning of data based on context
Sentiment analysis attempts to determine attitude
Veracity: Trustworthiness of data
Value: Degree data can be analyzed for meaningful insight
Visualization: Ability to graphically present data to make it understandable to users

relational DB
(maps, images, emails, texts, tweets, videos, …)
sarcasm (does “good” really mean good?)
accuracy

The last V is Variety. Variety refers to the different structures of data being stored: structured and unstructured data.

Other Vs are: variability, veracity, value (mentioned in the previous example), and visualisation. (Ask Hijab; it is her Master's thesis.)

[Figure: numbered steps 1 to 5; labels: "predict", "modify prediction"]

A user clicks on a link to view the details of a book. For example, before I purchase a textbook, I look at its table of contents, feedback on the book, and a sample chapter.

Data is captured about which book the user requested.

The exact book is found, and an attempt is also made to predict which other books and products the user may want to read or be interested in purchasing. Previously, the system only returned the details of the requested book; now, to add value to the business, it encourages you to purchase more by showing you other books. This is done by looking at the search history patterns of other users. For example, User One looked at Book A and then at Book B. User Two looked at Book A and then at Book B. User Three looked at Book B and then at Book A, and so on. You do not have to be Einstein to work out that if you look at Book A, it is very likely you will be interested in Book B. So if a user looks at Book A, you show that person Book B.

The details of the requested book are returned along with the details of other recommended books.

Hopefully, the user will click on one of the recommendations shown.
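
The prediction step described above can be sketched as a simple count of which books appear together in other users' histories. This is only a minimal illustration, not how any particular bookshop implements recommendations; the histories (including the extra "User Four"), the `co_views` structure, and the `recommend` function are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical browsing histories: the books each user viewed, in order.
histories = {
    "User One": ["Book A", "Book B"],
    "User Two": ["Book A", "Book B"],
    "User Three": ["Book B", "Book A"],
    "User Four": ["Book A", "Book C"],
}

# For each book, count how often every other book appears in the same history.
co_views = defaultdict(Counter)
for books in histories.values():
    for book in set(books):
        for other in set(books):
            if other != book:
                co_views[book][other] += 1


def recommend(book, top_n=1):
    """Return the books most often viewed together with the requested book."""
    return [title for title, _ in co_views[book].most_common(top_n)]


print(recommend("Book A"))  # ['Book B']
```

If the user then clicks on a recommendation, that click can be added to the histories and the counts recomputed, which is one simple way to picture the prediction being modified over time.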

(1) Structured Data
Any data that is clearly defined and can be stored, accessed, and processed in a fixed format can be described as structured data.
A good example is data stored in a table in a normalised database. You can easily search and retrieve the data from a table using SQL tools. For instance, in the Sales_Person table, we can find that the Year of Hire for Sales_Person No 101, Cookie Biscuit, is 1995.
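
As a small sketch of this kind of lookup, the example below builds a Sales_Person table in an in-memory SQLite database and retrieves the Year of Hire for salesperson 101. The column names, the second row, and the reading of "Cookie Biscuit" as the salesperson's name are assumptions made for illustration; SQLite is used only because it ships with Python.

```python
import sqlite3

# In-memory database so the example leaves no file behind.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales_Person ("
    " Sales_Person_No INTEGER PRIMARY KEY,"
    " Name TEXT,"
    " Year_Of_Hire INTEGER)"
)
# Hypothetical rows; only number 101 and the year 1995 come from the slide.
conn.executemany(
    "INSERT INTO Sales_Person VALUES (?, ?, ?)",
    [(101, "Cookie Biscuit", 1995), (102, "Example Person", 2001)],
)

# Because the format is fixed, one SQL statement finds the value directly.
row = conn.execute(
    "SELECT Name, Year_Of_Hire FROM Sales_Person WHERE Sales_Person_No = ?",
    (101,),
).fetchone()
print(row)  # ('Cookie Biscuit', 1995)
conn.close()
```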

Now that we have finished talking about the Vs, we will look at structured, unstructured, and semi-structured data.

As stated on the slide…

(2) Unstructured Data

Unstructured data can simply be described as data that is not structured; that is, anything that cannot be described as structured data.
Examples of unstructured data include free text, videos, images, etc. The ability to analyse social media (such as Facebook, Twitter, and WeChat) and images is among the key drivers behind the growth of Big Data.

As stated on the slide…

Differences between Structured Data and Unstructured Data

Ref: https://www.datamation.com/big-data/structured-vs-unstructured-data.html

As stated on the slide…

(3) Semi-Structured
Semi-Structured data is a cross between Structured Data and Unstructured Data, i.e., it has both forms of data. Examples include Electronic Data Interchange (EDI), Extensible Markup Language (XML), and the open standard JSON (JavaScript Object Notation).
For example, an XML document is organised in a hierarchy with opening and closing tags, and its encoding rules define a format that is both human- and machine-readable.
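
To make the idea concrete, the sketch below expresses one record as XML and as JSON and reads a field from each using Python's standard library. The record, the tag names, and the key names are hypothetical and chosen only for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record expressed as XML: nested opening and closing tags.
xml_text = """
<salesperson id="101">
  <name>Cookie Biscuit</name>
  <yearOfHire>1995</yearOfHire>
</salesperson>
"""

# The same hypothetical record expressed as JSON.
json_text = '{"id": 101, "name": "Cookie Biscuit", "yearOfHire": 1995}'

# XML: navigate the hierarchy by tag name; text values arrive as strings.
root = ET.fromstring(xml_text)
xml_year = int(root.find("yearOfHire").text)

# JSON: parse into a dictionary; numbers keep their type.
record = json.loads(json_text)
json_year = record["yearOfHire"]

print(root.tag, root.attrib["id"], xml_year)  # salesperson 101 1995
print(record["name"], json_year)  # Cookie Biscuit 1995
```

The tags and keys give the data some structure, but nothing forces every record to carry the same fields, which is what separates semi-structured data from a fixed table.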

As stated on the slide…

[Figure: database systems infrastructure diagram. Labels: Oracle; Oracle Database; Flat Files; External Data (e.g. Excel); Relational Database; Entity Relationship Model (ERM); Normalisation; ETL (Data Cleansing); Data Warehouse; Star Schema; (De-Normalised); Big Data; (“Not Normalised”); Hadoop; Hadoop Distributed File System (HDFS) and MapReduce; Spark and NoSQL (and other tools); Unstructured Data (Social Media); Structured Data (Internet of Things (IoT)); Data Streaming; SQL; Machine Learning; Data (DW to BD, or vice versa or both); Reporting (Business Intelligence and Visualisation) and Business Analysis (End Users)]
Prepared by Vincent Pang, Feb. 2021. Note: In-Memory Database (e.g. SAP HANA) is an alternative data model not shown here.
