
Query Evaluation & Query Optimization (cont.)
Ed Knorr, Fall 2020
Chapters 12 & 14 Combined (Part 2 of 2):
Based on:
Ramakrishnan & Gehrke (textbook):
(a) Sections 12.4.1 to 12.6.2 (on pages 405-417);
(b) Sections 14.4.2 to 14.7; but, don’t worry about: the 2B refinement (page 462), double buffering (page 463), and hybrid hash join (pages 465-466).
Other References:
UBC Database Faculty Members,
“Readings in Database Systems” by Hellerstein & Stonebraker
1

Learning Goals
 Compute the cost of evaluating a given query using sort-merge join (SMJ).
 Justify the use of metadata in estimating the cost of computing relational queries. Explain why the System R model has stood the test of time for query evaluation.
 Provide examples of—and explain—how “out of date” catalog statistics can provide horrendous query performance.
 Compare and contrast the roles that sorting and hashing have (for intermediate results) when evaluating a query.
 Compute the cost of evaluating a given query using a hash join (HJ).
 Identify possible uses of HJ, and explain how they scale to larger datasets.
 Justify which type of join to perform (i.e., PNL, BNL, INL, SMJ, HJ) when evaluating a query, given a list of indexes, catalog statistics, reduction factors, and other assumptions.
 Construct a query evaluation tree for evaluating an SQL query.
 Explain the purpose of left-deep joins and pipelining. Relate these to the complexity of evaluating a query plan when joining multiple tables.
2

Learning Goals (cont.)
 Estimate the number of page I/Os required to evaluate a query using various plans, both with and without pipelining, for: (a) single-relation queries, and (b) multiple-relation queries (where “multiple” is a very small positive integer like 2 or 3).
 Analyze and explain (in high-level terms) the additional complexity that multiple-relation queries bring to query evaluation and optimization. For example, demonstrate how additional tables, joins, indexes, reduction factors, etc. complicate the process.
 Suggest additional forms of metadata (that are currently not available) that would be useful when performing query evaluation.
 [Link to an earlier unit] Justify the role that self-tuning RDBMSs may have in improving query performance. Provide examples of DBA activity (relating to this chapter) that can be simplified using autonomic computing.
 Describe, using a diagram, how HJ can improve BNL join performance (over SMJ).
 Explain the conditions under which you should use HJ instead of SMJ.
3

Schemas for Our Examples
(You may wish to bookmark this page.)
Sailors (sid: integer, sname: string, rating: integer, age: real)
Reserves (sid: integer, bid: integer, day: date, rname: string)

 Sailors: NTuplesS = 40,000; NPagesS = 500
 Each tuple is 4+38+4+4 = 50 bytes long; 80 tuples per page
 Reserves: NTuplesR = 100,000; NPagesR = 1,000
 Each tuple is 4+4+10+22 = 40 bytes long; 100 tuples per page
4
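As a quick sanity check on these catalog statistics, the page counts follow directly from the tuple counts and tuples-per-page figures (a minimal sketch; variable names are illustrative):

```python
# Sanity-check the slide's catalog statistics:
# NPages = NTuples / (tuples per page).
ntuples_sailors, sailors_per_page = 40_000, 80      # 50-byte tuples
ntuples_reserves, reserves_per_page = 100_000, 100  # 40-byte tuples

npages_sailors = ntuples_sailors // sailors_per_page      # expect 500
npages_reserves = ntuples_reserves // reserves_per_page   # expect 1000
```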

Sort-Merge Join (SMJ): R ⋈ S (join condition: Ri = Sj)
 What if neither R nor S has an index on the relevant columns?
 One solution: Sort both R and S on the column(s) to be joined, then scan them to do a merge on the join column(s), and finally output the joined tuples.
1. Scan R until current R-tuple ≥ current S-tuple, then scan S until current S-tuple ≥ current R-tuple.
• Join tuples only if the relevant column(s) in the current R tuple = those in the current S tuple.
• All R tuples with this same value in Ri (current R group) and all S tuples with this same value in Sj (current S group) match.
• Output all possible pairs that match.
2. Go to Step (1) to resume scanning R and S, until end-of-file.
5
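The merge loop above can be sketched in Python (a simplified, in-memory version over sorted lists of tuples; the function and parameter names are illustrative, not from the textbook):

```python
# Minimal in-memory sketch of the sort-merge join's merge phase.
# r_key / s_key extract the join column from a tuple.
def sort_merge_join(R, S, r_key, s_key):
    R = sorted(R, key=r_key)
    S = sorted(S, key=s_key)
    result = []
    i = j = 0
    while i < len(R) and j < len(S):
        if r_key(R[i]) < s_key(S[j]):
            i += 1                      # advance R until R-tuple >= S-tuple
        elif r_key(R[i]) > s_key(S[j]):
            j += 1                      # advance S until S-tuple >= R-tuple
        else:
            # Found a matching value: delimit the current R group and S group.
            k = r_key(R[i])
            i2 = i
            while i2 < len(R) and r_key(R[i2]) == k:
                i2 += 1
            j2 = j
            while j2 < len(S) and s_key(S[j2]) == k:
                j2 += 1
            # Output all possible pairs from the two groups.
            for r in R[i:i2]:
                for s in S[j:j2]:
                    result.append(r + s)
            i, j = i2, j2
    return result
```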

Sort-Merge Join, R ⋈ S on Ri = Sj (cont.)
 R is scanned once; each S group is scanned once per matching R tuple
 Multiple scans of an S group are likely to find needed pages in buffer
 Compare the expected and worst-case complexities.
6

Example of Sort-Merge Join

Sailors (sorted on sid):
sid  sname   rating  age
22   dustin  7       45.0
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0

Reserves (sorted on sid):
sid  bid  day       rname
28   103  12/4/96   guppy
28   103  11/3/96   yuppy
31   101  10/10/96  dustin
31   102  10/12/96  lubber
31   101  10/11/96  lubber
58   103  11/12/96  dustin

 The cost of scanning is O(M+N), but it could be O(MN) in the worst case, although that is very unlikely.
7

Example of Sort-Merge Join (cont.)
 For the Sailors and Reserves tables whose schemas are shown on Slide 4:
 With B = 35, 100, or 300 buffer pages, both Reserves and Sailors can be sorted in 2 passes; so, the total join cost = sorting Reserves (2*2*1000 = 4000) + sorting Sailors (2*2*500 = 2000) + merging (1000 + 500 = 1500) = 7500 page I/Os.
8
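One way to check the 7500 figure is to compute the external-sort passes directly. A sketch, assuming the basic external merge sort from Chapter 13 (Pass 0 builds ceil(N/B) runs; each later pass merges B-1 runs; every pass reads and writes the whole file):

```python
from math import ceil, log

def external_sort_cost(n_pages, B):
    # Pass 0 produces ceil(N/B) sorted runs; each subsequent pass merges
    # B-1 runs. Every pass costs 2 I/Os per page (one read, one write).
    runs = ceil(n_pages / B)
    passes = 1 if runs <= 1 else 1 + ceil(log(runs, B - 1))
    return 2 * n_pages * passes

def smj_cost(M, N, B):
    # Sort both relations, then merge them with one read of each.
    return external_sort_cost(M, B) + external_sort_cost(N, B) + (M + N)
```

With M = 1000 (Reserves), N = 500 (Sailors), any of B = 35, 100, or 300 yields 2 sorting passes per relation, reproducing the slide's 7500.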

Highlights of IBM’s System R Optimizer
 From the 1970s. Its impact on relational DBMSs? Huge!
 Currently, the most widely used model
 Works well for < 10 joins
 Estimates the “cost” of a query (often, we can only approximate the cost ... but a relative ranking is sufficient)
 Statistics, maintained in DBMS catalogs, are used to estimate the cost of operations and their result sizes.
 Considers a combination of CPU and I/O costs, but mostly the latter
 Deals with probabilities and assumptions
 Plan Space: Usually too large; so, must be pruned
 Sometimes, only left-deep plans are considered.
• Left-deep plans allow the output of each operator to be pipelined into the next operator, without storing the intermediate result in a temporary table.
 Avoid large intermediate relations, if possible.
9

Left-Deep Plans and Pipelining
 Outer table is the left table in a join expression
 Left table can be a temporary relation
 Right, too – if we don’t restrict ourselves to left-deep plans
 Examples:
10

Highlights of Optimizers in General
 Optimizing a query that joins n relations is NP-hard.
 Plan Space
 Bushy trees may be fine, not just the left-deep ones.
• Today, most optimizers consider them.
• They can yield some excellent/efficient plans.
• We will deviate from the System R simplification, and consider bushy trees in some of our analysis.
 A plan can include all kinds of joins and early/late selections and projections, including:
• Table scan, index scan, BNL, INL, SMJ, hash join (HJ), duplicate elimination, grouped aggregation, etc.
11

Size Estimation and Reduction Factors (aka Selectivity Estimation)
 Consider a query block:
SELECT attribute list
FROM relation list
WHERE term1 AND ... AND termk
 The maximum # of tuples in the result is the product of the cardinalities of the relations in the FROM clause.
 A reduction factor (RF) associated with each term reflects the impact of the term in reducing the result size.
 RF = fraction of tuples (in the table) that satisfies a conjunct
 The smaller the RF, the more selective the access path.
12

Selectivity Estimation (cont.)
SELECT attributelist
FROM Reserves
WHERE rname < ‘C%’
 Let’s revisit Chapter 12’s index example, for the Selection operation.
 With RF = 2/26, recall:
• Clustered B+ tree: We need at least 2/26 of 1000 data pages, i.e., ≈ 77 pages.
• Unclustered B+ tree: This case is trickier. As a rule of thumb, in the worst case, we need at least 2/26 of 100,000 tuples, i.e., ≈ 7692 tuples; so, at least 7692 I/Os may be needed—assuming that we do not have enough buffer space to retain the pages after they’ve been read in the first time.
• With a sufficient number of buffers, we might still be able to sort the RIDs before retrieving the records; but, for large RID lists, the total number of I/Os to sort, etc., may still greatly exceed the # of data pages.
13

Unclustered Indexes: Optimization for Retrieval of Tuples
 Example: If 10,000 tuples qualify using an unclustered index, and we have a small number of buffer pool pages, the tuple retrieval cost could be 10,000 page I/Os.
 Important refinement for unclustered indexes:
1. Find the qualifying data entries.
2. Sort the rids of the data records to be retrieved (e.g., put page numbers in ascending sequence).
3. Fetch the rids in order.
• This ensures that each data page is read just once, with the hope that multiple records can be fetched with one I/O.
• But ... we’ll need an adequate number of buffers to do the sorting.
14

Multiple Indexes: Optimization Using Intersection of RIDs
 How to handle something like: day < 2003-08-19 AND bid=5 AND sid=3
 First approach: Find the most selective access path, retrieve tuples using it, and then apply any remaining terms that don’t match the index.
 Second approach: Get sets of rids of data records using each matching index.
 Then, intersect these sets of rids.
 Then, retrieve the records and apply any remaining terms.
15

When Are We Better Off Performing a Table Scan Instead of Using an Index?
 Clustered case:
cost = (expected # of internal index pages to get to the leaf level)
+ (expected # of qualifying leaf pages)
+ (expected # of data pages, i.e., # of qualifying tuples / # of tuples per page)
• The last term is the number of qualifying data pages.
• For B+ trees, note that the “# of internal pages to get to the leaf level” is the height of the B+ tree; but, in the case of hash indexes, we’ll assume it’s the cost of loading the directory.
 Use a table scan if the above cost is ≥ # of pages in the table.
 Unclustered case:
 How would the above calculation change?
16

Example of an Alt. 2 Clustered Case:
17

Motivating Example: Introduction
SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5
 What is the relational algebra expression for the above SQL query?
18
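Before deriving plans for this query, note how the reduction factors from the earlier slide predict its result size. A minimal sketch; the RF values below (1/40,000 for the equi-join on the key sid, 1/100 for bid = 100) are the usual System R-style uniformity assumptions, not figures stated on this slide:

```python
# System R-style result-size estimation: max size is the product of the
# FROM-clause cardinalities; each WHERE conjunct scales it by its RF.
def estimate_result_size(cardinalities, reduction_factors):
    size = 1.0
    for card in cardinalities:
        size *= card
    for rf in reduction_factors:
        size *= rf
    return round(size)

# Reserves x Sailors, with assumed RF(sid = sid) = 1/40000 and
# RF(bid = 100) = 1/100 (the rating > 5 conjunct is omitted for simplicity).
estimate = estimate_result_size([100_000, 40_000], [1 / 40_000, 1 / 100])
```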

Motivating Example (cont.)

SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid = S.sid AND
      R.bid = 100 AND S.rating > 5

(1) Query Tree:
π sname
 σ bid=100 ∧ rating>5
  ⋈ sid=sid
   Reserves   Sailors

(2) Query Plan:
π sname (On-the-fly)
 σ bid=100 ∧ rating>5 (On-the-fly)
  ⋈ sid=sid (BNL Join)
   Reserves   Sailors

 Assuming no indexes, but 3 buffer pages
 Use PNL (BNL) Join
 Pipeline the results
 Cost estimate = 1000 + 1000 * 500 = 501,000 page I/Os
19
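The 501,000 figure is just the page-nested-loop formula with Reserves as the outer (a small arithmetic check; M and N come from Slide 4):

```python
M, N = 1000, 500   # pages in Reserves (outer) and Sailors (inner)
# Page Nested Loop: scan the outer once, plus scan the inner once per outer page.
pnl_cost = M + M * N
```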

Motivating Example (cont.)

SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid = S.sid AND
      R.bid = 100 AND S.rating > 5

Same Tree:
π sname
 σ bid=100 ∧ rating>5
  ⋈ sid=sid
   Reserves   Sailors

Same Plan:
π sname (On-the-fly)
 σ bid=100 ∧ rating>5 (On-the-fly)
  ⋈ sid=sid (PNL Join)
   Reserves   Sailors

 But, this misses several opportunities: selections could have been ‘pushed’ earlier, no use is made of any available indexes, etc.
 Goal of optimization: Find more efficient plans that compute the same answer.
20

Alternative Plan 1, (Still) No Indexes

 Assume B = 5 buffer pages
 Main difference: push selects

Plan:
π sname (On-the-fly)
 ⋈ sid=sid (Sort-Merge Join)
  T1: σ bid=100 Reserves (Scan; write to temp T1)
  T2: σ rating>5 Sailors (Scan; write to temp T2)

 Cost of plan:
 Scan Reserves (1000) + write temp T1 (10 pages, if we have 100 boats, uniform distribution)
 Scan Sailors (500) + write temp T2 (250 pages, if we have 10 ratings)
 Sort T1 (2*2*10), sort T2 (2*4*250), merge (10+250)
 Total: 4060 page I/Os (Final output cost is usually ignored. Why?)
21
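The 4060 total breaks down as follows (a worked check using the slide's assumptions: 100 boats, and rating > 5 keeping half the Sailors pages when there are 10 uniformly distributed ratings):

```python
scan_reserves, write_t1 = 1000, 10   # bid=100 keeps 1/100 of 1000 pages
scan_sailors, write_t2 = 500, 250    # rating>5 keeps 1/2 of 500 pages
sort_t1 = 2 * 2 * 10                 # 2 passes over T1's 10 pages
sort_t2 = 2 * 4 * 250                # 4 passes over T2's 250 pages with B = 5
merge = 10 + 250                     # read each sorted temp once
total = (scan_reserves + write_t1 + scan_sailors + write_t2
         + sort_t1 + sort_t2 + merge)
```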

Alternative Plan 1, No Indexes (cont.)

 If we use BNL join instead, the join cost is 10 + 4*250; so, the total cost is now 2770 page I/Os.

Plan (as before, but with BNL in place of SMJ):
π sname (On-the-fly)
 ⋈ sid=sid (BNL Join)
  T1: σ bid=100 Reserves (Scan; write to temp T1)
  T2: σ rating>5 Sailors (Scan; write to temp T2)
22

Alternative Plan 1, No Indexes (cont.)

 Can we do better?
 Suppose we ‘push’ the projections:
 T1 has only sid; T2 has only sid and sname
 T1 now fits in 1 page. How?

Plan:
π sname (On-the-fly)
 ⋈ sid=sid (BNL Join)
  T1: π sid (σ bid=100 Reserves) (Scan; write to temp T1)
  T2: π sid, sname (σ rating>5 Sailors) (Scan; write to temp T2)
23

Alternative Plan 2 With Indexes

Plan:
π sname (On-the-fly)
 σ rating>5 (On-the-fly)
  ⋈ sid=sid (Index Nested Loops, with pipelining)
   σ bid=100 Reserves (Use hash index; do not write result to temp)
   Sailors (Hash index on sid)

 With a clustered hash index on bid of Reserves, we get RF * 100,000 tuples qualifying = 1/100 * 100,000 tuples = 1,000 tuples ... or, equivalently, 1/100 * 1000 pages = 10 pages @ 100 tuples each
 INL with pipelining (outer is not materialized)
 Join column sid is a PK for Sailors
– At most one matching tuple; so, an unclustered index on sid is OK
24

Alternative Plan 2 With Indexes (cont.)

(Same plan as on the previous slide.)

 The decision not to push rating > 5 below the join is based on the availability of the unclustered sid index on Sailors.
 Cost: Selecting the Reserves tuples using the clustered hash index costs 2-1 + 4 + 10 = 15 I/Os; then, for each of the 1,000 qualifying tuples, we must get the unique, matching Sailors tuple using the Sailors hash index on sid, at a cost of 1000 * (1.2 + 1) = 2,200 more I/Os.
 Grand total = 15 + 2200 = 2215 page I/Os
 We’ll post details of the calculations in a separate PDF document.
25
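The 2215 total can be checked the same way (using the slide's per-probe figures; 1.2 I/Os is the usual textbook estimate for a hash-index probe):

```python
select_reserves = 15          # 2 - 1 + 4 + 10, per the slide
# One index probe (1.2 I/Os) plus one data page (1 I/O) per qualifying tuple.
probes = 1000 * (1.2 + 1)
total = select_reserves + round(probes)
```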

Block Nested Loop (BNL) Join with B Buffers: An Optimization

 Performance improvement with a hash table:
 For each block of B-2 pages of R:
• Build a hash table for that block.
• Then, for each page of S:
• For each tuple s in the current page of S:
• For each matching tuple r in R’s hash table:
• Add the joined tuple <r, s> to the result.

(Figure: B buffers, with B-2 pages holding the hash table for the current block of R, one input buffer for S, and one output buffer for the join result.)
26
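The nested loops above can be sketched in Python. A minimal in-memory version, assuming lists of pages stand in for disk-resident relations and B buffers leave B-2 pages for the outer block; names are illustrative:

```python
from collections import defaultdict

# BNL join where each outer block is hashed on the join key, so probing
# a tuple of S costs a dictionary lookup instead of a scan of the block.
def bnl_hash_join(R_pages, S_pages, B, r_key, s_key):
    result = []
    block_size = B - 2                    # pages available for the outer block
    for b in range(0, len(R_pages), block_size):
        # Build a hash table on the join key for this block of R.
        table = defaultdict(list)
        for page in R_pages[b:b + block_size]:
            for r in page:
                table[r_key(r)].append(r)
        # Scan all of S once per block, probing the hash table.
        for page in S_pages:
            for s in page:
                for r in table.get(s_key(s), []):
                    result.append(r + s)
    return result
```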

Hash Join

 (1) Partition Phase: Partition both relations using hash function h1: R tuples in partition i will only match S tuples in partition i during the join. Each relation streams through an input buffer and is written out to B-1 hashed partitions on disk.

 (2) Join Phase: Read in a partition of R and hash it in memory using h2 (≠ h1). Scan the matching partition i of S, probing the hash table for matches, and write out the join result.

(Figures: in the partition phase, the original relation passes through 1 input buffer, and h1 distributes its tuples across B-1 output buffers, one per partition; in the join phase, a hash table built with h2 holds partition Ri, an input buffer reads Si, and an output buffer collects the join result.)
27

Hash Join: Details

(1) Partition Phase. Suppose we have B = 101 buffers, R = 1000 pages, S = 500 pages, and hash function h1(k).
a) After hashing all 1000 pages of R, the 100 partitions for R (i.e., R1, R2, …, R100) each have about 10 pages.
b) Temporarily store each Ri partition on disk.
c) Repeat (a) and (b) for S using the same hash function.
• What is the average size of each of S’s partitions, Si?
28

Hash Join: Details (cont.)
(2) Join Phase. Hash again, this time with a different hash function h2(k):
 For i = 1 to B-1, do:
a) Take partition Ri’s 10 pages (approx.) and create a hash table for all its tuples in memory. (Most buckets in this new hash table contain few entries.)
b) For each page in Si , read the page, and for each tuple on that page, apply h2(k) to see which bucket of Ri to search, for a possible join.
c) Output the joined tuples, if any.
29
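Both phases can be sketched together in Python. An in-memory simplification: lists stand in for the disk-resident partitions, and Python's dict plays the role of the h2 hash table; function names and the partition count are illustrative:

```python
from collections import defaultdict

# Two-phase hash join: h1 partitions both relations, then each Ri is
# hashed in memory (standing in for h2) and probed with Si.
def hash_join(R, S, r_key, s_key, n_partitions):
    # (1) Partition phase: one pass over each relation with h1.
    h1 = lambda k: hash(k) % n_partitions
    R_parts, S_parts = defaultdict(list), defaultdict(list)
    for r in R:
        R_parts[h1(r_key(r))].append(r)
    for s in S:
        S_parts[h1(s_key(s))].append(s)
    # (2) Join phase: build a table on Ri, probe it with Si's tuples.
    result = []
    for i in range(n_partitions):
        table = defaultdict(list)
        for r in R_parts[i]:
            table[r_key(r)].append(r)
        for s in S_parts[i]:
            for r in table.get(s_key(s), []):
                result.append(r + s)
    return result
```

Note that an R tuple and an S tuple can only join if h1 sends them to the same partition, which is why the join phase never has to compare tuples across partitions.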

Observations about Hash Join
 When we build an in-memory hash table during the merge phase, often a little more memory is needed.
 e.g., a 10-20% “fudge factor”
 If the hash function does not partition uniformly, one or more R partitions may not fit in memory.
 If so, we can apply the hash-join technique recursively (another round) to do the join of an Ri partition with its corresponding Si partition.
 If a table fits into memory, there’s no need to partition.
 Stop partitioning when the partitions fit into memory.
 Note that hashing can also be used to quickly:
 Detect duplicates (e.g., during projection)
 Perform other kinds of set operations (e.g., set difference)
30

Cost of Hash Join
 Summary from textbook: As long as the number of buffers B > sqrt(X), where X is the number of pages in the smaller relation, and we allow a small fudge factor, then…
 In the partitioning phase, we read and write both relations; so, we use 2(M+N) page I/Os so far.
 In the merging (matching) phase, we read both relations; so, we use M+N more page I/Os.
 To join Reserves & Sailors, this is a total of: 3(1000 + 500) = 4500 page I/Os
31
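The 4500 figure is just the 3(M+N) formula applied to our tables (a small check):

```python
# Partition phase reads and writes both relations: 2(M + N) I/Os.
# Join (matching) phase reads both once more: M + N I/Os.
def hash_join_cost(M, N):
    return 3 * (M + N)

cost = hash_join_cost(1000, 500)   # Reserves + Sailors
```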

Cost of Sort-Merge Join
 Recall 2PMMS from Chapter 13.
 Given a sufficient number of buffers, if we don’t count the final output stage (for a plan), then SMJ can finish in 3(M+N) I/Os, too.
 Optional summary of a long explanation from the textbook (and you don’t have to remember this): If B > sqrt(Y), where Y is the size of the larger relation, and if we use a sorting refinement that produces runs of length 2*B in Pass 0 (see the textbook), then we can do SMJ in 3(M+N) I/Os.
32

Summary of SMJ vs. HJ
 HJ is superior if the relation sizes differ greatly.
 SMJ has some parallelizable aspects; but HJ is highly parallelizable.
 SMJ is less sensitive to data skew.
 With SMJ, the result is sorted. Why might this be useful?
 Rule of thumb: Cost of HJ < cost of SMJ if the number of buffers is between sqrt(X) and sqrt(Y), where X and Y are the sizes of the smaller and larger relations, respectively, to be joined.
33

Summary of the Two Chapters on Query Evaluation & Optimization
 There are several alternative evaluation algorithms for each relational operator.
 Two parts to optimizing a query:
 Consider a set of alternative plans containing different access paths.
 Estimate the cost of each plan that is considered.
• Key issues: Statistics, indexes, operator implementations
• The type of index, and clustered vs. unclustered, are important.
 We’ve used and analyzed several kinds of joins:
 Page Nested Loop (PNL) Join (= BNL with only 3 buffer pages)
 Block Nested Loop (BNL) Join
 Index Nested Loop (INL) Join
 Sort-Merge Join (SMJ)
 Hash Join (HJ)
34

Summary of the Two Chapters (cont.)
 A query is evaluated by converting it to a tree of operators and then evaluating the operators in the tree.
 We must understand query optimization in order to fully understand the performance impact of a given database design (relations, indexes) on a given workload (set of queries).
 For a given query, human experts can sometimes beat a query optimizer by manually determining the best plan, or by cleverly defining the tuning parameters for the optimizer to follow; but these days, it’s tough to beat a good optimizer.
 e.g., IBM DB2’s “autonomic” computing initiative (e.g., self-tuning databases)
 DBAs can update some parts of the system catalog manually, to force the optimizer into making a particular decision. This is not done very often.
35
 Rule of thumb: Cost of HJ < SMJ if the number of buffers is between sqrt(X) and sqrt(Y), where X and Y are the sizes of the relations to be joined. 33 Summary of the Two Chapters on Query Evaluation & Optimization  There are several alternative evaluation algorithms for each relational operator.  Two parts to optimizing a query:  Consider a set of alternative plans containing different access paths.  Estimate the cost of each plan that is considered. • Key issues: Statistics, indexes, operator implementations • Type of index, and clustered vs. unclustered, are important  We’ve used and analyzed several kinds of joins:  Page Nested Loop (PNL) Join (= BNL with only 3 buffer pages)  Block Nested Loop (BNL) Join  Index Nested Loop (INL) Join  Sort-Merge Join (SMJ)  Hash Join (HJ) 34 Summary of the Two Chapters (cont.)  A query is evaluated by converting it to a tree of operators and then evaluating the operators in the tree.  We must understand query optimization in order to fully understand the performance impact of a given database design (relations, indexes) on a given workload (set of queries).  For a given query, human experts can sometimes beat a query optimizer by manually determining the best plan, or by cleverly defining the tuning parameters for the optimizer to follow; but these days, it’s tough to beat a good optimizer.  e.g., IBM DB2’s “autonomic” computing initiative (e.g., self-tuning databases)  DBAs can update some parts of the system catalog manually, to force the optimizer into making a particular decision. This is not done very often. 35