程序代写代做代考 C algorithm concurrency database compiler INFO20003 Database Systems

INFO20003 Database Systems
Dr Renata Borovica-Gajic
Lecture 14
Query Optimization Part II
Week 7
1
INFO20003 Database Systems © University of Melbourne

Remember this? Components of a DBMS
DBMS
Query processing module
Parser/ Compiler
Optimizer Executor
TODAY
Plan enumeration
Concurrency control module
Transaction mgr.
Lock mgr.
Crash recovery module
Concurrency
control module
Log mgr.
Storage module
File and access methods mgr.
Buffer pool mgr. Disk space mgr.
This is one of several possible architectures; each system has its own slight variations.
Index files Heap files
Database
INFO20003 Database Systems © University of Melbourne
4

Enumeration of Alternative Plans
• When enumerating alternative plans, there are two main cases:
– Single-relation plans
– Multiple-relation plans (joins)
• For queries over a single relation:
‒ Each available access path (file scan / index) is considered, and
the one with the lowest estimated cost is chosen
‒ Heap scan is always one alternative
‒ Each index can be another alternative (if matching selection predicates)
‒ Other operations can be performed on top of access paths, but they typically don’t incur additional cost since they are done on the fly (e.g. projections, additional non-matching predicates)
INFO20003 Database Systems © University of Melbourne
5

Cost Estimates for Single-Relation Plans
1. Sequential(heap)scanofdatafile:
Cost = NPages(R)
2. Indexselectionoveraprimarykey(justasingletuple): Cost(B+Tree)=Height(I)+1, Height is the index height Cost(HashIndex)= ProbeCost(I)+1, ProbeCost(I)~1.2
Cost(B+Tree)=(NPages(I) + NPages(R))*∏ 𝑹𝑹𝑹𝑹 𝒊𝒊=𝟏𝟏..𝒏𝒏 𝒊𝒊
3. Clusteredindexmatchingoneormorepredicates:
Cost(HashIndex)= NPages(R)*∏𝒊𝒊=𝟏𝟏..𝒏𝒏 𝑹𝑹𝑹𝑹𝒊𝒊 ∗ 𝟐𝟐. 𝟐𝟐
4. Non-clusteredindexmatchingoneormorepredicates:
Cost(B+Tree)=(NPages(I) + NTuples(R))*∏ 𝑹𝑹𝑹𝑹 𝒊𝒊=𝟏𝟏..𝒏𝒏 𝒊𝒊
Cost(HashIndex)= NTuples(R)*∏𝒊𝒊=𝟏𝟏..𝒏𝒏 𝑹𝑹𝑹𝑹𝒊𝒊 ∗ 𝟐𝟐. 𝟐𝟐
INFO20003 Database Systems © University of Melbourne
6

Examples
Let’s say that Sailors(S) has 500 pages, 40000 tuples, NKeys(rating) = 10
• Result size = (1/NKeys(rating)) * NTuples(S) = (1/10)*40000 =4000 tuples
1.
If we have I(rating), NPages(I) = 50:
– Clustered index:
Cost = (1/NKeys(rating))*(NPages(I)+NPages(S))=(1/10)*(50+500) = 55 I/O
– Unclustered index:
Cost = (1/NKeys(rating))*(NPages(I)+NTuples(S))=(1/10)*(50+40000) = 4005 I/O
If we have an I(sid), NPages(I)= 50:
– Cost = ?, Result size = ?
– Would have to retrieve all tuples/pages. With a clustered index, the cost is 50+500, with unclustered index, 50+40000
Doing a file scan:
– Cost = NPages(S) = 500
2. 3.
SELECT S.sid FROM Sailors S WHERE S.rating=8
INFO20003 Database Systems © University of Melbourne
7

Plan Enumeration for multi-relation plans
Steps:
1. Select order of relations
– E.g. SxRxB, or SxBxR or RxSxB… – maximumpossibleorderings=N!
2. Foreachjoin,selectjoinalgorithm – E.g. Hash join, Sort-Merge Join…
3. Foreachinputrelation,selectaccessmethod – Heap Scan, or various index alternatives
Q: How many plans are there for a query over N relations?
Back-of-envelope calculation:

• •
With 3 join algorithms, I indexes per relation: # plans ≈ [N!] * [3(N-1)] * [(I + 1)N]
SupposeN=3,I=2:#plans≈3!*32 *33 =1458plans
This is just for illustration – you don’t need to remember this
INFO20003 Database Systems © University of Melbourne
8

Queries Over Multiple Relations
• As number of joins increases, number of alternative plans grows rapidlyneed to restrict search space
• Fundamental decision in System R (first DBMS): only left-deep join trees are considered
– Left-deep trees allow us to generate all fully pipelined plans •Intermediate results are not written to temporary files
D
C
AB AB
D C
ABCD
INFO20003 Database Systems © University of Melbourne
9

Plan Enumeration Example
• Let’s assume:
–Two join algorithms to choose from:
•Hash-Join
•NL-Join (page-oriented)
–Clustered B+Tree index: I(R.sid); NPages(I) = 50 –No other indexes
–S: NPages(S) = 500, NTuplesPerPage(S)= 80 –R: NPages(R) = 1000, NTuplesPerPage(R) = 100 –B: NPages(B) = 10
–100 R S tuples fit on a page
SELECT S.sname, B.bname, R.day FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid = B.bid
INFO20003 Database Systems © University of Melbourne
10

Candidate Plans
1. Enumeraterelationorderings: B
BxR SRRSBS
x
* Prune plans with cross-products immediately!
R SBRBBR
SS
SELECT S.sname, B.bname, R.day FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid = B.bid
INFO20003 Database Systems © University of Melbourne
11

Candidate Plans
2. Enumeratejoinalgorithmchoices:
B SR
NLJ HJ
NLJ NLJ
B SRSR
HJ HJ
NLJ B HJ B
SRSR
B
+ do the same for other plans
SELECT S.sname, B.bname, R.day FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid = B.bid
INFO20003 Database Systems © University of Melbourne
12

Candidate Plans
3. Enumerateaccessmethodchoices:
NLJ NLJ
NLJ NLJ
B SR
SR
B (heapscan) (heap scan)
+ do same for other plans
NLJ
NLJ B
(heap scan)
SR
(heap scan) (INDEX scan on R.sid)
(heap scan)
SELECT S.sname, B.bname, R.day FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid = B.bid
INFO20003 Database Systems © University of Melbourne
13

Now estimate the cost of each plan
NLJ NLJ
B
(heap scan)
S: NPages(S) = 500, NTuplesPerPage(S)= 80
R: NPages(R) = 1000, NTuplesPerPage(R) = 100 (heap scan) (heap scan) B: NPages(B) = 10
100 R S tuples fit on a page All 3 relations are Heap Scan
SR
Calculating cost:
SxR
Cost (SxR) = 500 + 500*1000 = 500500
(SxR)xB
Result size (SxR) = 40000*100000 *1/40000 = 100000 tuples = 1000 pages Cost(xB) = 1000 + 1000*10 = 10000
Already read – left deep plans apply pipelining
Total Cost = 500 + 500*1000 + 1000 * 10 = 510500 I/O
SELECT S.sname, B.bname, R.day FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid = B.bid
INFO20003 Database Systems © University of Melbourne
14

Now estimate the cost of each plan
HJ NLJ
B
S: NPages(S) = 500, NTuplesPerPage(S)= 80
R: NPages(R) = 1000, NTuplesPerPage(R) = 100 B: NPages(B) = 10
100 R S tuples fit on a page
All 3 relations are Heap Scan
(heap scan) (heap scan)
SR
(heap scan)
Calculating cost:
SxR
Cost (SxR) = 500 + 500*1000 = 500500
(SxR)xB
Result size (SxR) = 100000*40000 *1/40000 = 100000 tuples = 1000 pages
Cost(xB) = 3*1000 + 3*10 = 2*1000 + 3*10 = 2030
2
Already read once – left deep plans apply pipelining
Total Cost = 500 + 500*1000 + 2*1000+ 3*10 = 502530 I/O
INFO20003 Database Systems © University of Melbourne
15

Your turn
Plan 3:
NLJ HJ
B
S: NPages(S) = 500, NTuplesPerPage(S)= 80
R: NPages(R) = 1000, NTuplesPerPage(R) = 100 B: NPages(B) = 10
100 R S tuples fit on a page
All 3 relations are Heap Scan
(heap scan) (heap scan) (heap scan)
SR Plan 4:
Calculating cost:
Cost (P3) = ? Cost (P4) = ?
HJ HJ
B
(heap scan) (heap scan) (heap scan)
SR
INFO20003 Database Systems © University of Melbourne
16

At home
S: NPages(S) = 500, NTuplesPerPage(S)= 80
R: NPages(R) = 1000, NTuplesPerPage(R) = 100 B: NPages(B) = 10, NTuplesPerPage(B) = 10 SMJ : 2 passes, RxB: 10 tuples per page
I(S.sid); NPages(I) = 50
1)
SMJ
RB
(heap scan) (heap scan)
SMJ
SMJ
RB
2)
SMJ
RB
NLJ
SMJ
S
S
(INDEX scan)
3)
(heap scan)
(heap scan)
(heap scan)
S
(heap scan)
(heap scan) (heap scan)
INFO20003 Database Systems © University of Melbourne
17

What’s examinable
• Understand plan enumeration and cost various plans • Important for Assignment 3 as well
INFO20003 Database Systems © University of Melbourne
18

Next Lecture
• Normalization
INFO20003 Database Systems © Univers-it-y of Melbourne
19