Query Evaluation & Query Optimization (cont.)
Ed Knorr, Fall 2020
Chapters 12 & 14 Combined (Part 2 of 2):
Based on:
Ramakrishnan & Gehrke (textbook):
(a) Sections 12.4.1 to 12.6.2 (on pages 405-417);
(b) Sections 14.4.2 to 14.7; but don’t worry about: the 2B refinement (page 462), double buffering (page 463), and hybrid hash join (pages 465-466).
Other References:
UBC Database Faculty Members,
“Readings in Database Systems” by Hellerstein & Stonebraker
1
Learning Goals
Compute the cost of evaluating a given query using sort-merge join (SMJ).
Justify the use of metadata in estimating the cost of computing relational queries. Explain why the System R model has stood the test of time for query evaluation.
Provide examples of—and explain—how “out of date” catalog statistics can provide horrendous query performance.
Compare and contrast the roles that sorting and hashing have (for intermediate results) when evaluating a query.
Compute the cost of evaluating a given query using a hash join (HJ).
Identify possible uses of HJ, and explain how they scale to larger datasets.
Justify which type of join to perform (i.e., PNL, BNL, INL, SMJ, HJ) when evaluating a query, given a list of indexes, catalog statistics, reduction factors, and other assumptions.
Construct a query evaluation tree for evaluating an SQL query.
Explain the purpose of left-deep joins and pipelining. Relate these to the complexity of evaluating a query plan when joining multiple tables.
2
Learning Goals (cont.)
Estimate the number of page I/Os required to evaluate a query using various plans, both with and without pipelining, for: (a) single-relation queries, and (b) multiple-relation queries (where “multiple” is a very small positive integer like 2 or 3).
Analyze and explain (in high-level terms) the additional complexity that multiple-relation queries bring to query evaluation and optimization. For example, demonstrate how additional tables, joins, indexes, reduction factors, etc. complicate the process.
Suggest additional forms of metadata (that are currently not available) that would be useful when performing query evaluation.
[Link to an earlier unit] Justify the role that self-tuning RDBMSs may have in improving query performance. Provide examples of DBA activity (relating to this chapter) that can be simplified using autonomic computing.
Describe, using a diagram, how HJ can improve BNL join performance (over SMJ).
Explain the conditions under which you should use HJ instead of SMJ.
3
Schemas for Our Examples
(You may wish to bookmark this page.)
Sailors (sid: integer, sname: string, rating: integer, age: real)
Reserves (sid: integer, bid: integer, day: date, rname: string)
Sailors: each tuple is 4+38+4+4 = 50 bytes long, 80 tuples per page; NTuplesS = 40,000; NPagesS = 500
Reserves: each tuple is 4+4+10+22 = 40 bytes long, 100 tuples per page; NTuplesR = 100,000; NPagesR = 1,000
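The catalog numbers above can be cross-checked with a little arithmetic; here is a minimal sketch (Python is used purely for illustration):

```python
# Tuple sizes from the schema on this slide.
SAILORS_TUPLE_BYTES = 4 + 38 + 4 + 4    # sid + sname + rating + age = 50
RESERVES_TUPLE_BYTES = 4 + 4 + 10 + 22  # sid + bid + day + rname = 40

def npages(ntuples, tuples_per_page):
    """Pages needed to hold ntuples at the stated packing density."""
    return ntuples // tuples_per_page

sailors_pages = npages(40_000, 80)     # NPagesS = 500
reserves_pages = npages(100_000, 100)  # NPagesR = 1000
```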
4
Sort-Merge Join (SMJ): R ⋈ S on R.i = S.j
What if neither R nor S has an index on the relevant columns?
One solution: Sort both R and S on the column(s) to be joined, then scan them to do a merge on the join column(s), and finally output the joined tuples.
1. Scan R until the current R-tuple ≥ the current S-tuple; then scan S until the current S-tuple ≥ the current R-tuple.
• Join tuples only if the relevant column(s) in the current R tuple = those in the current S tuple.
• All R tuples with this same value in Ri (current R group) and all S tuples with this same value in Sj (current S group) match.
• Output all possible pairings of the current R group and the current S group.
2. Go to Step (1) to resume scanning R and S, until end-of-file.
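The merge logic above can be sketched in a few lines of Python, assuming both inputs are already in memory and sorted on the join key (the function and parameter names here are mine, not the textbook's):

```python
def sort_merge_join(r, s, key_r=lambda t: t[0], key_s=lambda t: t[0]):
    """Merge two relations already sorted on their join columns."""
    out = []
    i = j = 0
    while i < len(r) and j < len(s):
        # Step 1: advance each scan until the current keys line up.
        if key_r(r[i]) < key_s(s[j]):
            i += 1
        elif key_r(r[i]) > key_s(s[j]):
            j += 1
        else:
            k = key_r(r[i])
            # Collect the current R group and S group sharing key k.
            i2 = i
            while i2 < len(r) and key_r(r[i2]) == k:
                i2 += 1
            j2 = j
            while j2 < len(s) and key_s(s[j2]) == k:
                j2 += 1
            # Output all possible pairings of the two groups.
            for rt in r[i:i2]:
                for st in s[j:j2]:
                    out.append(rt + st)
            # Step 2: resume scanning past both groups.
            i, j = i2, j2
    return out
```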
5
Sort-Merge Join (R ⋈ S on R.i = S.j) (cont.)
R is scanned once; each S group is scanned once per matching R tuple
Multiple scans of an S group are likely to find the needed pages already in the buffer pool.
Compare the expected and worst-case complexities.
6
Example of Sort-Merge Join

Sailors (sorted on sid):
sid  sname   rating  age
22   dustin  7       45.0
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0

Reserves (sorted on sid):
sid  bid  day       rname
28   103  12/4/96   guppy
28   103  11/3/96   yuppy
31   101  10/10/96  dustin
31   102  10/12/96  lubber
31   101  10/11/96  lubber
58   103  11/12/96  dustin
The cost of the merging scan is O(M+N) page I/Os, but it could be O(M*N) in the worst case (e.g., if all tuples share a single join value), even though that’s very unlikely.
7
Example of Sort-Merge Join (cont.)
For the Sailors and Reserves tables whose schemas are shown on Slide 4:
With B = 35, 100, or 300 buffer pages, both Reserves and Sailors can be sorted in 2 passes, and therefore the total join cost = 7500 page I/Os.
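The 7500 figure decomposes into two 2-pass external sorts plus one merging scan of each relation; a minimal sketch of that arithmetic:

```python
def two_pass_sort_cost(npages):
    # Each pass reads and writes every page: 2 passes -> 2 * 2 * npages I/Os.
    return 2 * 2 * npages

def smj_cost(m, n):
    # Sort each relation in 2 passes, then merge with one scan of each.
    return two_pass_sort_cost(m) + two_pass_sort_cost(n) + (m + n)

total = smj_cost(1000, 500)  # Reserves = 1000 pages, Sailors = 500 pages
```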
8
Highlights of IBM’s System R Optimizer
From the 1970s. Its impact on relational DBMSs? Huge!
Currently, the most widely used model; works well for < 10 joins.
Estimates the “cost” of a query (often, we can only approximate the cost ... but a relative ranking is sufficient).
Statistics, maintained in DBMS catalogs, are used to estimate the cost of operations and their result sizes.
Considers a combination of CPU and I/O costs, but mostly the latter.
Deals with probabilities and assumptions.
Plan Space: Usually too large; so, it must be pruned.
Sometimes, only left-deep plans are considered.
• Left-deep plans allow the output of each operator to be pipelined into the next operator, without storing the intermediate result in a temp. table.
Avoid large intermediate relations, if possible.
9
Left-Deep Plans and Pipelining
Outer table is the left table in a join expression Left table can be a temporary relation
Right, too – if we don’t restrict ourselves to left-deep plans Examples:
10
Highlights of Optimizers in General
Optimizing a query that joins n relations is NP-hard.
Plan Space:
Bushy trees may be fine, not just left-deep ones.
• Today, most optimizers consider them.
• They can yield some excellent/efficient plans.
• We will deviate from the System R simplification, and consider bushy trees in some of our analysis.
A plan can include all kinds of joins and early/late selections and projections, including:
• Table scan, index scan, BNL, INL, SMJ, hash join (HJ), duplicate elimination, grouped aggregation, etc.
11
Size Estimation and Reduction Factors (aka Selectivity Estimation)
Consider a query block:
SELECT attribute list
FROM relation list
WHERE term1 AND ... AND termk
The maximum # tuples in the result is the product of the cardinalities of the relations in the FROM clause.
A reduction factor (RF) associated with each term reflects the impact of the term in reducing the result size.
RF = fraction of tuples (in the table) that satisfies a conjunct
The smaller the RF, the more selective the access path.
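The estimate above can be sketched directly; the function name is illustrative, not from any real optimizer:

```python
from math import prod

def estimated_result_size(cardinalities, reduction_factors):
    """Max result size is the product of the FROM-clause cardinalities;
    each WHERE conjunct scales it down by its reduction factor (RF)."""
    return prod(cardinalities) * prod(reduction_factors)
```

For example, one relation of 100,000 tuples with a single conjunct of RF = 2/26 yields roughly 7692 qualifying tuples, matching the selection example on the next slide.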
12
Selectivity Estimation (cont.)
SELECT attributelist FROM Reserves WHERE rname < ‘C%’
Let’s revisit Chapter 12’s index example, for the Selection operation.
With RF = 2/26, recall:
• Clustered B+ tree: We need at least 2/26 of 1000 data pages, i.e., 77 pages.
• Unclustered B+ tree: This case is trickier. As a rule of thumb, in the worst case, we need at least 2/26 of 100,000 tuples, i.e., 7692 tuples, so at least 7692 I/Os may be needed—assuming that we do not have enough buffer space to retain the pages after they’ve been read in during the first time.
• With a sufficient number of buffers, we might still be able to sort the RIDs before retrieving the records; but, for large RID lists, the total number of I/Os to sort, etc., may still greatly exceed the # of data pages.
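The 77-page and 7692-tuple figures can be reproduced as follows (rounding as the slide does):

```python
NPAGES, NTUPLES = 1_000, 100_000
RF = 2 / 26  # rname < 'C%': 2 of 26 equally likely leading letters

# Clustered B+ tree: qualifying tuples are packed on contiguous pages.
clustered_ios = round(RF * NPAGES)      # ~77 data pages

# Unclustered B+ tree, worst case: up to one I/O per qualifying tuple.
unclustered_ios = round(RF * NTUPLES)   # ~7692 tuple fetches
```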
13
Unclustered Indexes: Optimization for Retrieval of Tuples
Example: If 10,000 tuples qualify using an unclustered index, and we have a small number of buffer pool pages, the tuple retrieval cost could be 10,000 page I/Os.
Important refinement for unclustered indexes:
1. Find the qualifying data entries.
2. Sort the rids of the data records to be retrieved (e.g., put page numbers in ascending sequence).
3. Fetch the records in rid order (i.e., by ascending page number).
• This ensures that each data page is read just once, with the hope that multiple qualifying records can be fetched with one I/O.
• But ... we’ll need to have an adequate number of buffers to do the sorting.
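A minimal sketch of the refinement, assuming rids are (page number, slot number) pairs (the helper name is mine):

```python
def pages_in_fetch_order(rids):
    """Sort rids by page number and return the distinct pages to read.

    Each data page is then requested exactly once, in ascending order,
    no matter how many qualifying records it holds.
    """
    pages = []
    for page_no, _slot in sorted(rids):
        if not pages or pages[-1] != page_no:
            pages.append(page_no)
    return pages  # one I/O per distinct page, in sequence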
14
Multiple Indexes: Optimization Using Intersection of RIDs
How to handle something like:
day < '2003-08-19' AND bid = 5 AND sid = 3
First approach: Find the most selective access path, retrieve tuples using it, and then apply any remaining terms that don’t match the index.
Second approach: Get sets of rids of data records using each matching index.
Then, intersect these sets of rids.
Then, retrieve the records and apply any remaining terms.
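The second approach boils down to a set intersection over the rid lists returned by each matching index (names here are illustrative):

```python
def rid_intersection(rid_sets):
    """Intersect the rid sets obtained from each matching index.

    Only the surviving rids need to be fetched; remaining terms that
    matched no index are applied to those tuples afterwards.
    """
    result = set(rid_sets[0])
    for s in rid_sets[1:]:
        result &= set(s)
    return result
```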
15
When Are We Better Off Performing a Table Scan Instead of Using an Index?
Clustered case:
cost = expected # of internal pages to get to the leaf level
+ expected # of qualifying leaf pages
+ expected # of data pages (qualifying tuples / # tuples per page)
• The last term is the number of qualifying data pages.
• For B+ trees, note that “# of internal pages to get to the leaf level” is the height of the B+ tree; but, in the case of hash indexes, we’ll assume it’s used for loading the directory.
Use a table scan if the above cost is ≥ # of pages in the table.
Unclustered case:
How would the above calculation change?
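The clustered-case formula above can be written out directly; the parameter names (and the sample numbers in the comments) are mine, for illustration:

```python
import math

def clustered_index_cost(internal_pages, leaf_pages,
                         qualifying_tuples, tuples_per_page):
    """Cost formula from this slide, for the clustered case."""
    data_pages = math.ceil(qualifying_tuples / tuples_per_page)
    return internal_pages + leaf_pages + data_pages

def prefer_table_scan(index_cost, table_pages):
    """Use a full scan whenever the index path is no cheaper."""
    return index_cost >= table_pages

# e.g., a B+ tree of height 2, 5 qualifying leaf pages,
# 770 qualifying tuples at 100 tuples/page -> 2 + 5 + 8 = 15 I/Os.
cost = clustered_index_cost(2, 5, 770, 100)
```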
16
Example of an Alt. 2 Clustered Case:
17
Motivating Example: Introduction
SELECT S.sname
FROM Reserves R, Sailors S WHERE R.sid = S.sid AND
R.bid = 100 AND S.rating > 5
What is the relational algebra expression for the above SQL query?
18
Motivating Example (cont.)

SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5

(1) Query Tree (bottom-up): Reserves ⋈ Sailors on sid; then the selections bid=100 and rating > 5; then the projection onto sname.

(2) Query Plan: Assuming no indexes, but 3 buffer pages:
Use PNL (BNL) join, with Reserves as the outer.
Apply the selections and the projection on-the-fly, pipelining the results.
Cost estimate = 1000 + 1000 * 500 = 501,000 page I/Os
19
Motivating Example (cont.)

SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5

Same plan: PNL join of Reserves and Sailors on sid, with the selections bid=100 and rating > 5 and the projection onto sname applied on-the-fly above the join.

But this misses several opportunities: selections could have been ‘pushed’ earlier, no use is made of any available indexes, etc.

Goal of optimization: Find more efficient plans that compute the same answer.
20
Alternative Plan 1, (Still) No Indexes

Main difference: push selects. Assume B = 5 buffer pages.

Plan: scan Reserves, apply bid=100, and write the result to temp T1; scan Sailors, apply rating > 5, and write the result to temp T2; join T1 and T2 on sid with sort-merge join; apply the projection onto sname on-the-fly.

Cost of plan:
Scan Reserves (1000) + write temp T1 (10 pages, if we have 100 boats, uniform distribution)
Scan Sailors (500) + write temp T2 (250 pages, if we have 10 ratings)
Sort T1 (2*2*10), sort T2 (2*4*250), merge (10+250)
Total: 4060 page I/Os (Final output cost is usually ignored. Why?)
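The 4060 total can be reproduced line by line (the pass counts follow the slide: 2 sorting passes for T1, 4 for T2 with B = 5):

```python
# Cost arithmetic for Alternative Plan 1 (push selects, SMJ, B = 5).
scan_reserves, write_t1 = 1000, 10   # 100 boats, uniform -> T1 = 10 pages
scan_sailors, write_t2 = 500, 250    # 10 ratings -> T2 = 250 pages
sort_t1 = 2 * 2 * 10                 # 2-pass sort of T1: read+write per pass
sort_t2 = 2 * 4 * 250                # 4-pass sort of T2 with only 5 buffers
merge = 10 + 250                     # one merging scan of each temp
total = (scan_reserves + write_t1 + scan_sailors + write_t2
         + sort_t1 + sort_t2 + merge)
```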
21
Alternative Plan 1, No Indexes (cont.)

Same plan, but join T1 and T2 with BNL instead of SMJ: the join cost becomes 10 + 4*250 = 1010, so the total cost is now 2770 page I/Os.
22
Alternative Plan 1, No Indexes (cont.)

Can we do better? Suppose we ‘push’ the projections as well:
T1 keeps only sid; T2 keeps only sid and sname.
T1 now fits in 1 page. How?
23
Alternative Plan 2, With Indexes

Plan: use the clustered hash index on bid of Reserves to retrieve the bid=100 tuples; join with Sailors via INL (index nested loops, with pipelining), probing the hash index on sid of Sailors; apply rating > 5 and the projection onto sname on-the-fly. The outer is not materialized (do not write the result to a temp).

With the clustered hash index on bid of Reserves, we get RF * 100,000 tuples qualifying = 1/100 * 100,000 tuples = 1,000 tuples ... or, equivalently, 1/100 * 1000 pages = 10 pages @ 100 tuples each.

Join column sid is a PK for Sailors: there is at most one matching tuple, so an unclustered index on sid is OK.
24
Alternative Plan 2, With Indexes (cont.)

The decision not to push rating > 5 before the join is based on the availability of the unclustered sid index on Sailors.

Cost: Selecting the Reserves tuples using the clustered hash index costs 2 - 1 + 4 + 10 = 15 I/Os; then, for each of the 1000 qualifying tuples, we must get the unique, matching Sailors tuple using the Sailors hash index on sid: 1000*(1.2+1) = 2200 more I/Os.

Grand total = 15 + 2200 = 2215 page I/Os

We’ll post details of the calculations in a separate PDF document.
25
Block Nested Loop (BNL) Join with B Buffers: An Optimization

Performance improvement with a hash table:
For each block of B-2 pages of R:
• Create an in-memory hash table for that block.
• Then, for each page of S (read into the input buffer):
  • For each tuple s in the current page of S:
    • For each matching tuple r in the block’s hash table: add the joined tuple to the output buffer (the join result).
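The loop structure above can be sketched in memory; a real BNL join works on blocks of B-2 buffer pages, whereas this sketch simply blocks over a Python list (names are mine):

```python
from collections import defaultdict

def bnl_join_with_hash(r, s, key_r, key_s, block_size):
    """BNL join: for each block of R, build an in-memory hash table so
    each S tuple probes for matches directly instead of scanning the block."""
    out = []
    for start in range(0, len(r), block_size):
        block = r[start:start + block_size]
        table = defaultdict(list)          # hash table for this block of R
        for rt in block:
            table[key_r(rt)].append(rt)
        for st in s:                       # one scan of S per block of R
            for rt in table.get(key_s(st), []):
                out.append(rt + st)
    return out
```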
26
Hash Join

(1) Partition Phase: Partition both relations using hash function h1: R tuples in partition i will only match S tuples in partition i during the join. With B main-memory buffers, 1 input buffer reads the original relation and B-1 output buffers collect the hashed partitions, which are written back to disk.

(2) Join Phase: Read in a partition of R and build an in-memory hash table for it using h2 (!= h1). Scan the matching partition i of S, hashing each tuple with h2 to search for matches; use 1 input buffer for Si and 1 output buffer for the join result.
27
Hash Join: Details

(1) Partition Phase. Suppose we have B = 101 buffers, R = 1000 pages, S = 500 pages, and hash function h1(k).
a) Hash R’s tuples on the join key using h1(k). After hashing all 1000 pages of R, the 100 partitions for R (i.e., R1, R2, …, R100) each have about 10 pages.
b) Temporarily store each Ri partition on disk.
c) Repeat (a) and (b) for S using the same hash function.
• What is the average size of each of S’s partitions, Si?
28
Hash Join: Details (cont.)
(2) Join Phase. Hash again, this time with a different hash function h2(k):
For i = 1 to B-1 do:
a) Take partition Ri’s 10 pages (approx.) and create a hash table for all its tuples in memory. (Most buckets in this new hash table contain few entries.)
b) For each page in Si , read the page, and for each tuple on that page, apply h2(k) to see which bucket of Ri to search, for a possible join.
c) Output the joined tuples, if any.
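The two phases can be sketched in memory as follows; a Python dict plays the role of the h2 hash table, and the partition count is illustrative:

```python
from collections import defaultdict

def hash_join(r, s, key_r, key_s, num_partitions=4):
    h1 = lambda k: hash(k) % num_partitions  # partitioning hash function
    # (1) Partition phase: split both relations with the same h1.
    r_parts = defaultdict(list)
    s_parts = defaultdict(list)
    for rt in r:
        r_parts[h1(key_r(rt))].append(rt)
    for st in s:
        s_parts[h1(key_s(st))].append(st)
    # (2) Join phase: per partition, build an in-memory table on Ri
    # (standing in for hashing with h2), then probe it with Si's tuples.
    out = []
    for i in range(num_partitions):
        table = defaultdict(list)
        for rt in r_parts[i]:
            table[key_r(rt)].append(rt)
        for st in s_parts[i]:
            for rt in table.get(key_s(st), []):
                out.append(rt + st)
    return out
```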
29
Observations about Hash Join
When we build an in-memory hash table during the merge phase, often a little more memory is needed.
e.g., a 10-20% “fudge factor”
If the hash function does not partition uniformly, one or more R partitions may not fit in memory.
If so, we can apply the hash-join technique recursively (another round) to do the join of an Ri partition with its corresponding Si partition.
If a table fits into memory, there’s no need to partition.
Stop partitioning when the partitions fit into memory.
Note that hashing can also be used to quickly:
Detect duplicates (e.g., during projection)
Perform other kinds of set operations (e.g., set difference)
30
Cost of Hash Join
Summary from textbook: As long as the number of buffers B > sqrt(X), where X is the number of pages in the smaller relation, and we allow a small fudge factor, then…
In the partitioning phase, we read and write both relations; so, we use 2(M+N) page I/Os so far.
In the merging (matching) phase, we read both relations; so, we use M+N more page I/Os.
To join Reserves & Sailors, this is a total of: 3(1000 + 500) = 4500 page I/Os
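The 4500 figure is just 3(M+N); as a sketch of the two phases’ costs:

```python
def hash_join_cost(m, n):
    # Partition phase reads and writes both relations: 2*(M+N) I/Os.
    # Join (matching) phase reads both relations once more: M+N I/Os.
    return 2 * (m + n) + (m + n)

total = hash_join_cost(1000, 500)  # Reserves + Sailors
```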
31
Cost of Sort-Merge Join
Recall 2PMMS from Chapter 13.
Given a sufficient number of buffers, if we don’t count the final output stage (for a plan), then SMJ can finish in 3(M+N) I/Os, too.
Optional summary of a long explanation from the textbook (and you don’t have to remember this): If B > sqrt(Y), where Y is the size of the larger relation, and if we use a sorting refinement that produces runs of length 2*B in Pass 0 (see the textbook), then we can do SMJ in 3(M+N) I/Os.
32
Summary of SMJ vs. HJ
HJ is superior if the relation sizes differ greatly.
SMJ has some parallelizable aspects, but HJ is highly parallelizable.
SMJ is less sensitive to data skew.
With SMJ, the result is sorted. Why might this be useful?
Rule of thumb: Cost of HJ < SMJ if the number of buffers is between sqrt(X) and sqrt(Y), where X and Y are the sizes of the relations to be joined.
33
Summary of the Two Chapters on Query Evaluation & Optimization
There are several alternative evaluation algorithms for each relational operator.
Two parts to optimizing a query:
Consider a set of alternative plans containing different access paths.
Estimate the cost of each plan that is considered.
• Key issues: Statistics, indexes, operator implementations
• Type of index, and clustered vs. unclustered, are important
We’ve used and analyzed several kinds of joins:
Page Nested Loop (PNL) Join (= BNL with only 3 buffer pages)
Block Nested Loop (BNL) Join
Index Nested Loop (INL) Join
Sort-Merge Join (SMJ)
Hash Join (HJ)
34
Summary of the Two Chapters (cont.)
A query is evaluated by converting it to a tree of operators and then evaluating the operators in the tree.
We must understand query optimization in order to fully understand the performance impact of a given database design (relations, indexes) on a given workload (set of queries).
For a given query, human experts can sometimes beat a query optimizer by manually determining the best plan, or by cleverly defining the tuning parameters for the optimizer to follow; but these days, it’s tough to beat a good optimizer.
e.g., IBM DB2’s “autonomic” computing initiative (e.g., self-tuning databases)
DBAs can update some parts of the system catalog manually, to force the optimizer into making a particular decision. This is not done very often.
35