程序代写代做代考 concurrency database SQL algorithm compiler INFO20003 Database Systems

INFO20003 Database Systems

INFO20003 Database Systems 1© University of Melbourne 2018

INFO20003 Database Systems

Lecture 14

Query Optimization Part II

Semester 2 2018, Week 7

Dr Renata Borovica-Gajic

INFO20003 Database Systems 2© University of Melbourne 2018

MST

• When: Tuesday 11/09/2018 @ 2:45 – 4:00pm

–Seated at 2:45pm, the test starts at 3:00pm and runs for 45min

• Where: Wilson Hall

• Rules:

–Bring student ID, and pen/pencil (no cheat sheet)

–Cover [Lec1, Lec9]: Modelling, Relational Algebra, SQL

• Best way to prepare:

–There is a sample test in “Practice on your own”

–Look at mistakes you’ve made in Assignment 1 and learn from them

–Solution for Assignment 1 is in Resources

–Attempt first 5,6 questions of Assignment 2 (maximize the effort) and

look at the provided labs material plus the given practice study (in

Practice on your own)

–For simple joins with filter predicates try to express them with RA

INFO20003 Database Systems 4© University of Melbourne 2018

Remember this? Components of a DBMS

DBMS

Index files

Heap files

Database

Concurrency

control module

File and access

methods mgr.

Buffer pool mgr.

Disk space mgr.

Storage module Concurrency

control module

Transaction

mgr.

Lock mgr.

Crash recovery

module

Log mgr.

Query processing module

Parser/

Compiler
Optimizer Executor

This is one of several possible architectures;

each system has its own slight variations.

TODAY

Plan enumeration

INFO20003 Database Systems 5© University of Melbourne 2018

Enumeration of Alternative Plans

• When enumerating alternative plans, there are two main

cases:

–Single-relation plans

–Multiple-relation plans (joins)

• For queries over a single relation:

‒ Each available access path (file scan / index) is considered, and

the one with the lowest estimated cost is chosen

‒ Heap scan is always one alternative

‒ Each index can be another alternative (if matching selection

predicates)

‒ Other operations can be performed on top of access paths, but

they typically don’t incur additional cost since they are done on the

fly (e.g. projections, additional non-matching predicates)

INFO20003 Database Systems 6© University of Melbourne 2018

Cost Estimates for Single-Relation Plans

1. Sequential (heap) scan of data file:
Cost = NPages(R)

2. Index selection over a primary key (just a single tuple):
Cost(B+Tree)=Height(I)+1, Height is the index height

Cost(HashIndex)= ProbeCost(I)+1, ProbeCost(I)~1.2

3. Clustered index matching one or more predicates:

Cost(B+Tree)=(NPages(I) + NPages(R))*ς𝒊=𝟏..𝒏𝑹𝑭𝒊
Cost(HashIndex)= NPages(R)*ς𝒊=𝟏..𝒏𝑹𝑭𝒊 ∗ 𝟐. 𝟐

4. Non-clustered index matching one or more predicates:

Cost(B+Tree)=(NPages(I) + NTuples(R))*ς𝒊=𝟏..𝒏𝑹𝑭𝒊
Cost(HashIndex)= NTuples(R)*ς𝒊=𝟏..𝒏𝑹𝑭𝒊 ∗ 𝟐. 𝟐

INFO20003 Database Systems 7© University of Melbourne 2018

Examples

Let’s say that Sailors(S) has 500 pages, 40000 tuples, NKeys(rating) = 10

• Result size = (1/NKeys(I)) * NTuples(S) = (1/10)*40000 =4000 tuples

1. If we have I(rating), NPages(I) = 50:

– Clustered index:

Cost = (1/NKeys(I))*(NPages(I)+NPages(S))=(1/10)*(50+500) = 55 I/O

– Unclustered index:

Cost = (1/NKeys(I))*(NPages(I)+NTuples(S))=(1/10)*(50+40000) = 4005 I/O

2. If we have an I(sid), NPages(I)= 50:

– Cost = ?, Result size = ?

– Would have to retrieve all tuples/pages. With a clustered index, the

cost is 50+500, with unclustered index, 50+40000

3. Doing a file scan:

– Cost = NPages(S) = 500

SELECT S.sid FROM Sailors S WHERE S.rating=8

INFO20003 Database Systems 8© University of Melbourne 2018

Plan Enumeration

Steps:

1. Select order of relations

– E.g. SxRxB, or SxBxR or RxSxB…

– maximum possible orderings = N!

2. For each join, select join algorithm

– E.g. Hash join, Sort-Merge Join…

3. For each input relation, select access method

– Heap Scan, or various index alternatives

Q: How many plans are there for a query over N relations?

Back-of-envelope calculation:
• With 3 join algorithms, I indexes per relation:

# plans ≈ [N!] * [3(N-1)] * [(I + 1)N]

• Suppose N = 3, I = 2: # plans ≈ 3! * 32 * 33 = 1458 plans

• This is just for illustration – you don’t need to remember this

INFO20003 Database Systems 9© University of Melbourne 2018

Queries Over Multiple Relations

• As number of joins increases, number of alternative

plans grows rapidly  need to restrict search space

• Fundamental decision in System R (first DBMS): only

left-deep join trees are considered

–Left-deep trees allow us to generate all fully pipelined plans

•Intermediate results are not written to temporary files

BA

C

D

BA

C

D

C DBA

INFO20003 Database Systems 10© University of Melbourne 2018

• Let’s assume:

–Two join algorithms to choose from:

•Hash-Join

•NL-Join (page-oriented)

–Unclustered B+Tree index: I(R.sid); NPages(I) = 50

–No other indexes

–S: NPages(S) = 500, NTuplesPerPage(S)= 80

–R: NPages(R) = 1000, NTuplesPerPage(R) = 100

–B: NPages(B) = 10

–100 R S tuples fit on a page

Plan Enumeration Example

SELECT S.sname, B.bname, R.day

FROM Sailors S, Reserves R, Boats B

WHERE S.sid = R.sid AND R.bid = B.bid

INFO20003 Database Systems 11© University of Melbourne 2018

Candidate Plans

1. Enumerate relation orderings:

RS

B

SELECT S.sname, B.bname, R.day

FROM Sailors S, Reserves R, Boats B

WHERE S.sid = R.sid AND R.bid = B.bid

BS

R

SR

B

BR

S

RB

S
x

SB

R
x

* Prune plans with cross-products immediately!

INFO20003 Database Systems 12© University of Melbourne 2018

2. Enumerate join algorithm choices:

RS

B

SELECT S.sname, B.bname, R.day

FROM Sailors S, Reserves R, Boats B

WHERE S.sid = R.sid AND R.bid = B.bid

RS

B

HJ

HJ

RS

B

HJ

NLJ

RS

B

NLJ

HJ

RS

B

NLJ

NLJ

+ do the same

for other plans

Candidate Plans

INFO20003 Database Systems 13© University of Melbourne 2018

3. Enumerate access method choices:

SELECT S.sname, B.bname, R.day

FROM Sailors S, Reserves R, Boats B

WHERE S.sid = R.sid AND R.bid = B.bid

RS

B

NLJ

NLJ

+ do same for

other plans

RS

B

NLJ

NLJ

(heap scan)

(heap scan)

(heap scan)

RS

B

NLJ

NLJ

(INDEX scan on R.sid)

(heap scan)

(heap scan)

Candidate Plans

INFO20003 Database Systems 14© University of Melbourne 2018

Now estimate the cost of each plan

RS

B

NLJ

NLJ

S: NPages(S) = 500, NTuplesPerPage(S)= 80

R: NPages(R) = 1000, NTuplesPerPage(R) = 100

B: NPages(B) = 10

100 R S tuples fit on a page

All 3 relations are Heap Scan

Calculating cost:

SxR

Cost (SxR) = 500 + 500*1000 = 500500

(SxR)xB

Result size (SxR) = 100000*40000 *1/40000 = 100000 tuples = 1000 pages

Cost(xB) = 1000 + 1000*10 = 10000

Total Cost = 500 + 500*1000 + 1000 * 10 = 510500 I/O

Already read – left deep plans apply pipelining

(heap scan) (heap scan)

(heap scan)

INFO20003 Database Systems 15© University of Melbourne 2018

Now estimate the cost of each plan

S: NPages(S) = 500, NTuplesPerPage(S)= 80

R: NPages(R) = 1000, NTuplesPerPage(R) = 100

B: NPages(B) = 10

100 R S tuples fit on a page

All 3 relations are Heap Scan

Calculating cost:

SxR

Cost (SxR) = 500 + 500*1000 = 500500

(SxR)xB

Result size (SxR) = 100000*40000 *1/40000 = 100000 tuples = 1000 pages

Cost(xB) = 3*1000 + 3*10 = 2*1000 + 3*10 = 2030

Total Cost = 500 + 500*1000 + 2*1000+ 3*10 = 502530 I/O

Already read once – left deep plans apply pipelining

RS

B

HJ

NLJ

2

(heap scan) (heap scan)

(heap scan)

INFO20003 Database Systems 16© University of Melbourne 2018

Your turn

RS

B

NLJ

HJ

Plan 4:

RS

B

HJ

HJ

Plan 3:
S: NPages(S) = 500, NTuplesPerPage(S)= 80

R: NPages(R) = 1000, NTuplesPerPage(R) = 100

B: NPages(B) = 10

100 R S tuples fit on a page

All 3 relations are Heap Scan

Cost (P3) = ?

Cost (P4) = ?

Calculating cost:

(heap scan) (heap scan)

(heap scan)

(heap scan) (heap scan)

(heap scan)

INFO20003 Database Systems 17© University of Melbourne 2018

At home

R B

SMJ

(heap scan)(heap scan)

S

NLJ

(heap scan)

R B

SMJ

(heap scan)(heap scan)

S

SMJ

(heap scan)

R B

SMJ

(heap scan)(heap scan)

S

SMJ

(INDEX scan)

1) 2)

3)

S: NPages(S) = 500, NTuplesPerPage(S)= 80

R: NPages(R) = 1000, NTuplesPerPage(R) = 100

B: NPages(B) = 10, NTuplesPerPage(B) = 10

SMJ : 2 passes, RxB: 10 tuples per page

I(S.sid); NPages(I) = 50

INFO20003 Database Systems 18© University of Melbourne 2018

What’s examinable

• Understand plan enumeration and cost various plans

• Important for Assignment 3 as well

INFO20003 Database Systems 19© University of Melbourne 2018

Next Lecture

– –

• Normalization