代写代考 INFO20003 Database Systems

INFO20003 Database Systems
Dr Renata Borovica-Gajic
Lecture 22
Adaptive databases for the future Introducing research avenues (non-examinable)

Copyright By PowCoder代写 加微信 powcoder

Data, data everywhere…
www.tableausoftware.com
[The Economist]
50-fold from 2010-2020*
* ¡°The Digital Universe in 2020: Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East¡°, 2012, IDC
http://reportlogix.com/reporting.html
And grows exponentially… 2
Data Amounts

Finding useful information
Equals to finding the needle in a haystack 3

Data analysis with databases
Database systems (DB):
4 decades of research, predominant data analysis tool
Process Access Store
SQL Queries
select val from sales where id = max;

From data to knowledge
Too expensive
query optimize
prepare store
The luxury is long gone
Database (DB) System

Unless…..

Data analysis with databases
Process Access
SQL Queries

Store data carefully
Storage tiering in private and public clouds
Hot Warm Cold
[VLDB¡¯16, ADMS¡¯17]
Memory Disk
Performance

Store data carefully
Storage tiering in private and public clouds
[VLDB¡¯16, ADMS¡¯17]
Storing 100TB of data [Horison, 2015]
Hot Warm Cold
Cold Storage Cold Storage
Memory Disk
Trad. Cold 3-tier 2-tier
Embrace new technology
Performance
Cost (x1000$)

Cost benefit without performance penalty
Setting: multitenant enterprise datacenter, clients: TPCH 50, Q12, CSD: shared, layout: one client per group
5 4 3 2 1 0
Cold storage
PostgreSQL
Skipper on Storage
12345 Number of clients
Average exec. time (x1000 sec)

Skipper to the rescue
Virtualised enterprise data center
Multi-way joins: VM2 VM3
Hash Scan B
PostgreSQL Opportunistic execution
triggered upon data arrival
Hash Scan A
Hash Scan C
Cache Management
I/O Scheduler object-group map.
Novel ranking algorithm: Balances access efficiency across groups and fairness
Progress driven caching: Favors caching of objects to maximise query progress
across clients
Cold Storage

Embrace new HW technology

Data analysis
Process Access Store
SQL Queries

Choose access strategy on-the-fly
[DBTest¡¯12, ICDE¡¯15, VLDBJ¡®18] Find ¡®age¡¯ from people where name = ¡®Smith¡¯
Index Scan
Smooth Scan
Result size
Adapt to data
Execution time

Morphing mechanism
1. Index Access: Traditional index access
2. Entire Page Probe: Index access probes entire page
3. Gradual Flattening Access: Probe adjacent region(s)
HEAP PAGES

Smooth Scan in action
Setting: Micro-benchmark, 25GB table, Order by, Selectivity 0-100% 100000
10000 1000 100 10 1 0.1
Index Scan
Smooth Scan
0 0.001 0.01
0.1 1 Selectivity(%)
Near-optimal over entire selectivity range 16
Execution time (sec) (log scale)

Learn from data

Data analysis with databases
Process Access Store
SQL Queries

Process instantly
[SIGMOD¡¯12, VLDB¡¯12, CACM¡¯15]
80% reuse within 3 hours
[Facebook]
Adapt to queries
File querying NoDB (MR)
Time Loading

PostgresRaw: NoDB from idea to practice
1. Positional
Pointers to end of tuples
1|Supplier#01|17|335-1736|5755.94|each slyly…
2|Supplier#02|5|861-2259|4032.68| slyly bold… 3|Supplier#03|1|516-1199|4192.40|blithely… 4|Supplier#04|15|787-7479|4641.08|riously eve… 5|Supplier#05|11|21-151-690-3663|-283.84|. Slyly… 6|Supplier#06|14|24-696-997- 4969|1365.79|final…..
3. Statistics
Pointers to
attributes
NationKey Name
Supplier#01
Supplier#02
1 3 5 7 9 111315
Adjust to queries = progressively cheaper 20

Q20 Q19 Q18 Q17 Q16 Q15 Q14 Q13 Q12 Q11 Q10 Q9 Q8 Q7 Q6 Q5 Q4 Q3 Q2 Q1 Load
PostgresRaw in action
Setting: 7.5M tuples, 150 attributes, 11GB file
Queries: 10 arbitrary attributes per query, vary selectivity
Data-to-insight time halved with PostgresRaw
PostgresRaw
MySQL CSV Engine DBMS X DBMS X PostgreSQL
MySQL w/ external files
Per query performance comparable to traditional DBMS
Execution time (sec)

Learn from queries

Self-designing systems for data analysis
¡°It is not the strongest species that survive, nor the most intelligent, but the ones most responsive to change.¡±
[SIGMOD¡¯12] [VLDB¡¯12] [CACM¡¯15]
Adapt Refine
Fast response
[DBTest¡¯12] [ICDE¡¯15] [VLDBJ¡¯18]
[VLDB¡¯16] [ADMS¡¯17]
Thank you!
Anyone can be a data scientist with self-driving DB

程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com