CS w186
Introduction to Database Systems
Prof. Joe Hellerstein
Operated this semester by:
Prof. Josh Hug
Lakshya Jain

Essential Queries
Why take this class?
What is this class all about?
Who is running this?
How will this class work?

Why? Reason #1: Utility
This class is very, very useful
Data processing backs essentially every app
Databases of one form or another back most apps
The principles taught in this class back nearly everything in computing

Where shall I eat, Database?
Each ratings star added on a Yelp restaurant review translated to anywhere from a 5% to a 9% effect on revenues.
—Harvard Business School, 2011

http://hbswk.hbs.edu/item/the-yelp-factor-are-consumer-reviews-good-for-business

What am I missing, Database?

https://blog.bufferapp.com/instagram-analytics
https://instagram-press.com/blog/2018/07/02/introducing-youre-all-caught-up-in-feed/

What am I missing, Database?

Hey Instagram: that’s Chez Panisse in Berkeley, CA!

Who should I be with, Database?

Funny Tinder Profiles That Will Make You Look Twice

https://www.gotinder.com/press

How does Science work? Database.

Jim Gray
Turing Award Winner
First Berkeley CS PhD

How does Science work? Database.

Experimental

Theoretical

Simulation
Data Intensive

Astronomy in the 4th Paradigm

Sloan Digital Sky Survey (SDSS)

Database Systems + Sky Server

http://skyserver.sdss.org

Science in the 4th Paradigm

Astronomy

Connectomics

Cosmological Physics

Genomics

Oceanography

Your career…
The fundamentals of this class are (and will remain) central to participating in this new and more data-centric world
Many of the details and technologies will change in the coming years
Be prepared to generalize from what you learn here
Keep learning new things

Why? Reason #1: Utility
This class is very, very useful
Data processing backs essentially every app
Databases of one form or another back most apps
The principles taught in this class back nearly everything in computing
This material will empower you.

Why? Reason #2: Centrality
Data is at the center of modern society
Data is unique in its nature and significance
Particular and voluminous
Often asymmetric
low value in isolation, high value when aggregated
Difficult to protect

At the center of major issues
Privacy
National Security
Online Misinformation (including Fake News)

National Security Data: 2010

Numbers from the Guardian
XKeyscore is the latest system (built on federated MySQL servers); it replaced Marianas

National Security Data: 2018

Data Integrity: Not all Data is Correct
“Any user can change any entry, and if enough users agree with them, it becomes true.”
– Colbert Report 7/31/2007
Colbert asked viewers to update the Wikipedia page on elephants to reflect a tripling of the population, forcing Wikipedia to lock the page.
Yet a 2005 Nature study found Wikipedia science articles to be similar in accuracy to Encyclopedia Britannica.

COMEDY CENTRAL VIDEO ARCHIVE VIA WIKIPEDIA
https://en.wikipedia.org/wiki/Reliability_of_Wikipedia
http://www.nature.com/nature/journal/v438/n7070/full/438900a.html
http://www.cc.com/video-clips/z1aahs/the-colbert-report-the-word—wikiality

Data Integrity: Not all Data is Correct

(From the Guardian, Dec 2016)

A Syllogism of Quotes
“information is knowledge”
— Albert Einstein
“knowledge is power”
— Sir Francis Bacon
“with great power comes great responsibility”
— Uncle Ben (Spiderman)

“I could go on and on about all of the amazing work that is happening around the world using data to make lives better everyday, but we also have to address where data is causing more harm than good.”
“Data is such an incredible lever arm for change, we need to make sure that the change that is coming, is the one we all want to see.
So how do we do it? First, there is no single voice that determines these choices. This MUST be community effort.”
https://medium.com/@dpatil/a-code-of-ethics-for-data-science-cda27d1fac1
https://www.oreilly.com/ideas/doing-good-data-science

Berkeley’s New Data Science Major

https://data.berkeley.edu/degrees/data-science-ba

Why? Reason #2: Centrality
Data is at the center of modern society.
Unprecedented in its nature and significance
Particular and voluminous
Often asymmetric
low value in isolation, high value when aggregated
Difficult to protect
The infrastructure determines what’s possible

Why? Reason #3: The Core of Computing
Data growth will continue to outpace computation
Systems for Data at Scale: the core of modern computing

https://www.domo.com/learn/data-never-sleeps-5

Every Minute!

Scale of Scientific Data
Large Hadron Collider, CERN
Raw data: 1 MB/event, 600,000,000 events/sec
= 1.9×10^22 bytes/year = 19 ZettaBytes/year
Downsampled: 25 GB/sec = 7.88×10^17 bytes/year = 788 PetaBytes/year
Downsampled further: 1050 MB/sec = 3.3×10^16 bytes/year = 33 PetaBytes/year
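
The yearly totals on this slide come from multiplying each data rate by the number of seconds in a year. A minimal Python sketch of that arithmetic follows. It assumes decimal units (1 MB = 10^6 bytes) and a 365-day year, and the function name `bytes_per_year` is made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year (31,536,000 s)

MB, GB = 10**6, 10**9  # decimal units (an assumption, not stated on the slide)


def bytes_per_year(bytes_per_second):
    """Convert a sustained data rate into a yearly volume in bytes."""
    return bytes_per_second * SECONDS_PER_YEAR


# Rates taken from the slide: 1 MB/event at 600,000,000 events/sec, and so on.
raw = bytes_per_year(1 * MB * 600_000_000)
downsampled = bytes_per_year(25 * GB)
downsampled_further = bytes_per_year(1050 * MB)

for label, total in [
    ("raw", raw),
    ("downsampled", downsampled),
    ("downsampled further", downsampled_further),
]:
    print(f"{label}: {total:.3g} bytes/year")
```

Under these assumptions the three results round to the 19 ZB, 788 PB, and 33 PB figures quoted above.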

https://home.cern/about/computing/processing-what-record

Forces Driving Data Growth
Ubiquitous sensors and reporting:
Cameras, mobile computing, social media, …
Large collaborative science projects
Philosophy: More Data → More Value?
Enabling Technology
Cheap, Scalable Data Management Systems

http://hyperboleandahalf.blogspot.com

Why? Reason #3: The Core of Computing
Data growth will continue to outpace computation
Systems for Data at Scale: the core of modern computing
Techniques you learn in this class underlie many topics in computing

Essential Queries
Why take this class?
What is this class all about?
Who is running this?
How will this class work?

What is this class all about?
Databases?
What is a database?
Database Management Systems?

Universal Symbol for a Database

Why the Symbol?

Looks Like?

Platters on a Disk Drive

Why the Symbol?

1956: IBM MODEL 350 RAMAC
First Commercial Disk Drive
5MB @ 1 ton
http://www.computerhistory.org/storageengine/first-commercial-hard-disk-drive-shipped
“…We must immediately…attack accounting problems under the philosophy of handling each business transaction as it occurs, rather than under the present condition of batching techniques….”
— F. J. Wesley IBM Senior Manager

Looks Like?

Is This a Database?
Rolodex
Alphabetically ordered cards
Indexed access by first letter
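
The Rolodex idea (sorted cards plus a jump-to-letter index) can be sketched in a few lines of Python. This is only an illustration: the card names and the helper functions `build_rolodex` and `lookup` are made up here and are not part of the course material.

```python
from collections import defaultdict

# Hypothetical contact cards, invented for this example.
cards = [
    "Codd, Edgar",
    "Gray, Jim",
    "Chamberlin, Don",
    "Stonebraker, Michael",
    "Selinger, Pat",
]


def build_rolodex(names):
    """Sort the cards alphabetically and index them by first letter."""
    index = defaultdict(list)
    for name in sorted(names):
        index[name[0].upper()].append(name)
    return index


def lookup(index, letter):
    """Jump straight to one letter's cards instead of scanning every card."""
    return index.get(letter.upper(), [])


rolodex = build_rolodex(cards)
print(lookup(rolodex, "c"))  # cards filed under C, in alphabetical order
print(lookup(rolodex, "z"))  # no cards filed under Z
```

The index trades a little extra bookkeeping at insert time for a lookup that touches only one letter's cards.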

Is This a Database?
A database + “business logic” + user interface?
Most of Tinder’s value is the database itself.

Is This a Database?
Airline reservation systems were one of the earliest pervasive consumer uses of database systems.
IBM/American Airlines’ SABRE system, 1964.
“Semi-Automated Business Research Environment”
Travelocity.com is a direct descendant of SABRE
Acquired by Expedia, 1/2015

What is a Database?
Let’s not split hairs.
A database is a large, organized collection of data.
Sometimes confused with a Database Management System (DBMS)
A DBMS is software that stores, manages, and facilitates access to data.

Berkeley Roots!
Ingres / Postgres
Sybase
Informix

UC Berkeley
Oracle
IBM

Relational DBMSs
Traditionally DBMS referred to relational databases

RDBMS is a more appropriate term
SQL data description and manipulation language
ACID transaction consistency
Durable writes (prevent data loss)
Mature technologies …
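
To make the SQL and ACID bullets concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in RDBMS. The `accounts` table, its columns, and the `transfer` helper are hypothetical names chosen for illustration. The database is in-memory, so the sketch shows atomic rollback but not durability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory: nothing here survives the process

# Data description: declare a schema (hypothetical table and column names).
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
# Data manipulation: insert some rows and commit them.
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money atomically: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back.
        return False


ok = transfer(conn, "alice", "bob", 30)
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
print(ok, failed, balances)
```

The second transfer credits bob before the debit fails, yet the final balances show no trace of that credit. That all-or-nothing behavior is the atomicity part of ACID.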

Ranking of DBMS Technologies 2019
http://db-engines.com/en/ranking
Based on number of mentions (e.g., Stack Overflow), Google Trends, job postings, profile data on LinkedIn, tweets …

Relational Database Market
Big Market > 41B
http://www.infoworld.com/article/2916057/open-source-software/open-source-threatens-to-eat-the-database-market.html

What is happening here?

Hadoop & NoSQL

Relational Database Market

http://www.infoworld.com/article/2916057/open-source-software/open-source-threatens-to-eat-the-database-market.html

Market Trends
Cloud DBMS disrupting on-premises vendors
Cloud is less relational-centric
But fastest-growing services at AWS are RDBMSs
“One size doesn’t fit all”
Main-memory DBMS
Graph DBMS
TimeSeries DBMS
Key-Value Stores (NoSQL)
Analytics Platforms (Spark, Hadoop)
Tools for working with data
Business Intelligence (charting tools)
ML/Data Science platforms
Data preparation and next-gen data integration (ETL)

Reasons for Change
Hardware trends: RAM, SSDs, NVRAM, GPUs, …
Platform trends: cloud and elastic computing
Need to scale: storage and transactions
New data types: text, JSON, image, video…
New workloads: machine learning & advanced analytics

Change = Opportunity!
The DBMS world is rapidly changing
We will discuss these changes toward the end of the course
Our textbook is rather out of date (2003!)
Opportunity!
You can shape the future of DBMSs

We won’t follow the textbook slavishly.

Instead…
Focus: Foundational System Principles
Basic ideas and components
How to compose those components into a technology stack

Goal:
You will be able to use existing & build new DBMS technologies!

You will learn…
Data Oriented Programming with SQL (a la 61A)
Foundations of Data System Design
Storage, indexing
Query processing and optimization
Transactions
Concurrency, Consistency, Recovery
Data Modeling
Application-level representations of data

Principles
Data Independence
Declarative Programming
Rendezvous in Time and Space
Isolation and consistency
Data representations

Systems
We will examine various levels of a DBMS
Concurrency Control
Recovery

Database Management System layers (diagram, top to bottom):
Query Parsing & Optimization
Relational Operators
Files and Index Management
Buffer Management
Disk Space Management
Database

What is this class all about?
Databases?
What is a database?
Database Management Systems?
Implementation?
Big Ideas in Database Management Systems
Principles and Algorithms
System Designs
The heart of scalable CS
