INFO20003 Database Systems
Dr Renata Borovica-Gajic
Lecture 22
Adaptive databases for the future Introducing research avenues (non-examinable)
Copyright By PowCoder代写 加微信 powcoder
Data, data everywhere…
www.tableausoftware.com
[The Economist]
50-fold from 2010-2020*
* ¡°The Digital Universe in 2020: Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East¡°, 2012, IDC
http://reportlogix.com/reporting.html
And grows exponentially… 2
Data Amounts
Finding useful information
Equals to finding the needle in a haystack 3
Data analysis with databases
Database systems (DB):
4 decades of research, predominant data analysis tool
Process Access Store
SQL Queries
select val from sales where id = max;
From data to knowledge
Too expensive
query optimize
prepare store
The luxury is long gone
Database (DB) System
Unless…..
Data analysis with databases
Process Access
SQL Queries
Store data carefully
Storage tiering in private and public clouds
Hot Warm Cold
[VLDB¡¯16, ADMS¡¯17]
Memory Disk
Performance
Store data carefully
Storage tiering in private and public clouds
[VLDB¡¯16, ADMS¡¯17]
Storing 100TB of data [Horison, 2015]
Hot Warm Cold
Cold Storage Cold Storage
Memory Disk
Trad. Cold 3-tier 2-tier
Embrace new technology
Performance
Cost (x1000$)
Cost benefit without performance penalty
Setting: multitenant enterprise datacenter, clients: TPCH 50, Q12, CSD: shared, layout: one client per group
5 4 3 2 1 0
Cold storage
PostgreSQL
Skipper on Storage
12345 Number of clients
Average exec. time (x1000 sec)
Skipper to the rescue
Virtualised enterprise data center
Multi-way joins: VM2 VM3
Hash Scan B
PostgreSQL Opportunistic execution
triggered upon data arrival
Hash Scan A
Hash Scan C
Cache Management
I/O Scheduler object-group map.
Novel ranking algorithm: Balances access efficiency across groups and fairness
Progress driven caching: Favors caching of objects to maximise query progress
across clients
Cold Storage
Embrace new HW technology
Data analysis
Process Access Store
SQL Queries
Choose access strategy on-the-fly
[DBTest¡¯12, ICDE¡¯15, VLDBJ¡®18] Find ¡®age¡¯ from people where name = ¡®Smith¡¯
Index Scan
Smooth Scan
Result size
Adapt to data
Execution time
Morphing mechanism
1. Index Access: Traditional index access
2. Entire Page Probe: Index access probes entire page
3. Gradual Flattening Access: Probe adjacent region(s)
HEAP PAGES
Smooth Scan in action
Setting: Micro-benchmark, 25GB table, Order by, Selectivity 0-100% 100000
10000 1000 100 10 1 0.1
Index Scan
Smooth Scan
0 0.001 0.01
0.1 1 Selectivity(%)
Near-optimal over entire selectivity range 16
Execution time (sec) (log scale)
Learn from data
Data analysis with databases
Process Access Store
SQL Queries
Process instantly
[SIGMOD¡¯12, VLDB¡¯12, CACM¡¯15]
80% reuse within 3 hours
[Facebook]
Adapt to queries
File querying NoDB (MR)
Time Loading
PostgresRaw: NoDB from idea to practice
1. Positional
Pointers to end of tuples
1|Supplier#01|17|335-1736|5755.94|each slyly…
2|Supplier#02|5|861-2259|4032.68| slyly bold… 3|Supplier#03|1|516-1199|4192.40|blithely… 4|Supplier#04|15|787-7479|4641.08|riously eve… 5|Supplier#05|11|21-151-690-3663|-283.84|. Slyly… 6|Supplier#06|14|24-696-997- 4969|1365.79|final…..
3. Statistics
Pointers to
attributes
NationKey Name
Supplier#01
Supplier#02
1 3 5 7 9 111315
Adjust to queries = progressively cheaper 20
Q20 Q19 Q18 Q17 Q16 Q15 Q14 Q13 Q12 Q11 Q10 Q9 Q8 Q7 Q6 Q5 Q4 Q3 Q2 Q1 Load
PostgresRaw in action
Setting: 7.5M tuples, 150 attributes, 11GB file
Queries: 10 arbitrary attributes per query, vary selectivity
Data-to-insight time halved with PostgresRaw
PostgresRaw
MySQL CSV Engine DBMS X DBMS X PostgreSQL
MySQL w/ external files
Per query performance comparable to traditional DBMS
Execution time (sec)
Learn from queries
Self-designing systems for data analysis
¡°It is not the strongest species that survive, nor the most intelligent, but the ones most responsive to change.¡±
[SIGMOD¡¯12] [VLDB¡¯12] [CACM¡¯15]
Adapt Refine
Fast response
[DBTest¡¯12] [ICDE¡¯15] [VLDBJ¡¯18]
[VLDB¡¯16] [ADMS¡¯17]
Thank you!
Anyone can be a data scientist with self-driving DB
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com