INFO20003 Database Systems
Dr Renata Borovica-Gajic
Lecture 14
Query Optimization Part II
Copyright By PowCoder代写 加微信 powcoder
INFO20003 Database Systems © University of Melbourne
Remember this? Components of a DBMS
Query processing module
Parser/ Compiler
Optimizer Executor
Plan enumeration
Concurrency control module
Transaction mgr.
Crash recovery module
Concurrency
control module
Storage module
File and access methods mgr.
Buffer pool mgr. Disk space mgr.
This is one of several possible architectures; each system has its own slight variations.
Index files Heap files
INFO20003 Database Systems © University of Melbourne
Enumeration of Alternative Plans
• When enumerating alternative plans, there are two main cases:
– Single-relation plans
– Multiple-relation plans (joins)
• For queries over a single relation:
‒ Each available access path (file scan / index) is considered, and
the one with the lowest estimated cost is chosen
‒ Heap scan is always one alternative
‒ Each index can be another alternative (if matching selection predicates)
‒ Other operations can be performed on top of access paths, but they typically don’t incur additional cost since they are done on the fly (e.g. projections, additional non-matching predicates)
INFO20003 Database Systems © University of Melbourne
Cost Estimates for Single-Relation Plans
1. Sequential(heap)scanofdatafile:
Cost = NPages(R)
2. Indexselectionoveraprimarykey(justasingletuple): Cost(B+Tree)=Height(I)+1, Height is the index height Cost(HashIndex)= ProbeCost(I)+1, ProbeCost(I)~1.2
Cost(B+Tree)=(NPages(I) + NPages(R))*∏ 𝑹𝑹𝑹𝑹 𝒊𝒊=𝟏𝟏..𝒏𝒏 𝒊𝒊
3. Clusteredindexmatchingoneormorepredicates:
Cost(HashIndex)= NPages(R)*∏𝒊𝒊=𝟏𝟏..𝒏𝒏 𝑹𝑹𝑹𝑹𝒊𝒊 ∗ 𝟐𝟐. 𝟐𝟐
4. Non-clusteredindexmatchingoneormorepredicates:
Cost(B+Tree)=(NPages(I) + NTuples(R))*∏ 𝑹𝑹𝑹𝑹 𝒊𝒊=𝟏𝟏..𝒏𝒏 𝒊𝒊
Cost(HashIndex)= NTuples(R)*∏𝒊𝒊=𝟏𝟏..𝒏𝒏 𝑹𝑹𝑹𝑹𝒊𝒊 ∗ 𝟐𝟐. 𝟐𝟐
INFO20003 Database Systems © University of Melbourne
Let’s say that Sailors(S) has 500 pages, 40000 tuples, NKeys(rating) = 10
• Result size = (1/NKeys(rating)) * NTuples(S) = (1/10)*40000 =4000 tuples
If we have I(rating), NPages(I) = 50:
– Clustered index:
Cost = (1/NKeys(rating))*(NPages(I)+NPages(S))=(1/10)*(50+500) = 55 I/O
– Unclustered index:
Cost = (1/NKeys(rating))*(NPages(I)+NTuples(S))=(1/10)*(50+40000) = 4005 I/O
If we have an I(sid), NPages(I)= 50:
– Cost = ?, Result size = ?
– Would have to retrieve all tuples/pages. With a clustered index, the cost is 50+500, with unclustered index, 50+40000
Doing a file scan:
– Cost = NPages(S) = 500
SELECT S.sid FROM Sailors S WHERE S.rating=8
INFO20003 Database Systems © University of Melbourne
Plan Enumeration for multi-relation plans
1. Select order of relations
– E.g. SxRxB, or SxBxR or RxSxB… – maximumpossibleorderings=N!
2. Foreachjoin,selectjoinalgorithm – E.g. Hash join, Sort-Merge Join…
3. Foreachinputrelation,selectaccessmethod – Heap Scan, or various index alternatives
Q: How many plans are there for a query over N relations?
Back-of-envelope calculation:
With 3 join algorithms, I indexes per relation: # plans ≈ [N!] * [3(N-1)] * [(I + 1)N]
SupposeN=3,I=2:#plans≈3!*32 *33 =1458plans
This is just for illustration – you don’t need to remember this
INFO20003 Database Systems © University of Melbourne
Queries Over Multiple Relations
• As number of joins increases, number of alternative plans grows rapidlyneed to restrict search space
• Fundamental decision in System R (first DBMS): only left-deep join trees are considered
– Left-deep trees allow us to generate all fully pipelined plans •Intermediate results are not written to temporary files
INFO20003 Database Systems © University of Melbourne
Plan Enumeration Example
• Let’s assume:
–Two join algorithms to choose from:
•Hash-Join
•NL-Join (page-oriented)
–Clustered B+Tree index: I(R.sid); NPages(I) = 50 –No other indexes
–S: NPages(S) = 500, NTuplesPerPage(S)= 80 –R: NPages(R) = 1000, NTuplesPerPage(R) = 100 –B: NPages(B) = 10
–100 R S tuples fit on a page
SELECT S.sname, B.bname, R.day FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid = B.bid
INFO20003 Database Systems © University of Melbourne
Candidate Plans
1. Enumeraterelationorderings: B
BxR SRRSBS
* Prune plans with cross-products immediately!
SELECT S.sname, B.bname, R.day FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid = B.bid
INFO20003 Database Systems © University of Melbourne
Candidate Plans
2. Enumeratejoinalgorithmchoices:
NLJ B HJ B
+ do the same for other plans
SELECT S.sname, B.bname, R.day FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid = B.bid
INFO20003 Database Systems © University of Melbourne
Candidate Plans
3. Enumerateaccessmethodchoices:
B (heapscan) (heap scan)
+ do same for other plans
(heap scan)
(heap scan) (INDEX scan on R.sid)
(heap scan)
SELECT S.sname, B.bname, R.day FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid = B.bid
INFO20003 Database Systems © University of Melbourne
Now estimate the cost of each plan
(heap scan)
S: NPages(S) = 500, NTuplesPerPage(S)= 80
R: NPages(R) = 1000, NTuplesPerPage(R) = 100 (heap scan) (heap scan) B: NPages(B) = 10
100 R S tuples fit on a page All 3 relations are Heap Scan
Calculating cost:
Cost (SxR) = 500 + 500*1000 = 500500
Result size (SxR) = 40000*100000 *1/40000 = 100000 tuples = 1000 pages Cost(xB) = 1000 + 1000*10 = 10000
Already read – left deep plans apply pipelining
Total Cost = 500 + 500*1000 + 1000 * 10 = 510500 I/O
SELECT S.sname, B.bname, R.day FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid = B.bid
INFO20003 Database Systems © University of Melbourne
Now estimate the cost of each plan
S: NPages(S) = 500, NTuplesPerPage(S)= 80
R: NPages(R) = 1000, NTuplesPerPage(R) = 100 B: NPages(B) = 10
100 R S tuples fit on a page
All 3 relations are Heap Scan
(heap scan)
Calculating cost:
(heap scan) (heap scan)
Cost (SxR) = 500 + 500*1000 = 500500
Result size (SxR) = 100000*40000 *1/40000 = 100000 tuples = 1000 pages
Cost(xB) = 3*1000 + 3*10 = 2*1000 + 3*10 = 2030
Already read once – left deep plans apply pipelining
Total Cost = 500 + 500*1000 + 2*1000+ 3*10 = 502530 I/O
INFO20003 Database Systems © University of Melbourne
S: NPages(S) = 500, NTuplesPerPage(S)= 80
R: NPages(R) = 1000, NTuplesPerPage(R) = 100 B: NPages(B) = 10
100 R S tuples fit on a page
All 3 relations are Heap Scan
(heap scan) (heap scan) (heap scan)
SR Plan 4:
Calculating cost:
Cost (P3) = ? Cost (P4) = ?
(heap scan) (heap scan) (heap scan)
INFO20003 Database Systems © University of Melbourne
S: NPages(S) = 500, NTuplesPerPage(S)= 80
R: NPages(R) = 1000, NTuplesPerPage(R) = 100 B: NPages(B) = 10, NTuplesPerPage(B) = 10 SMJ : 2 passes, RxB: 10 tuples per page
I(S.sid); NPages(I) = 50
(heap scan) (heap scan)
(INDEX scan)
(heap scan)
(heap scan)
(heap scan)
(heap scan)
(heap scan) (heap scan)
INFO20003 Database Systems © University of Melbourne
What’s examinable
• Understand plan enumeration and cost various plans • Important for Assignment 3 as well
INFO20003 Database Systems © University of Melbourne
Next Lecture
• Normalization
INFO20003 Database Systems © Univers-it-y of Melbourne
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com