
Query Evaluation 4
John Edgar

[Diagram: an input buffer for R, an input buffer for S, and an output buffer]

■ Assume that both tables to be joined are sorted on the join attribute
▪ The tables may be joined with one pass
▪ Like merging two sorted runs – cost = B(R) + B(S)
■ Read in pages of R and S – join on x
■ While xr != xs
▪ If xr < xs move to the next R record
▪ Else move to the next S record
■ If xr == xs
▪ Concatenate r and s, and
▪ Add to the output buffer
■ Repeat until all records have been read
▪ Example ids – R: 11, 17, 18, 23, 34, 41, 44, 47, 49, 53, 61, ...; S: 11, 16, 20, 23, 32, 40, 41, 42, 46, 49, 52, ...
■ But R and S may not be sorted on the join attribute

■ The sort-merge join (aka sort-join) combines the join operation with the merge step of external merge sort
▪ R and S are processed independently
▪ The first pass makes sorted runs of R and S of size M
▪ Merge runs of R and S, as in external merge sort, until the combined number of sorted runs is less than M
▪ If M is large, or R and S are small, this step may not be necessary
▪ The final merge phase of the external sort algorithm is combined with the join, by comparing the runs of R and S
▪ Records that do not meet the join condition are discarded
▪ Records that meet the condition are concatenated and output

■ Given sufficient main memory, sort-merge join can be performed in two passes
▪ For a cost of 3(B(R) + B(S)) – the cost to write out the final result is not included
■ Main memory must be large enough to allow an input buffer for each sorted run of both R and S
▪ Main memory must be greater than √(B(R) + B(S)) to perform the join in two passes
▪ The initial pass produces B(R)/M + B(S)/M sorted runs of size M
▪ If M is greater than √(B(R) + B(S)) then B(R)/M + B(S)/M must be less than M
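The one-pass merge logic above can be sketched in a few lines. This is a minimal version that assumes the join attribute is unique within each input (as in the id example above), so duplicate handling is omitted:

```python
def merge_join(r, s):
    """One-pass merge join of two inputs already sorted on the join attribute.

    Assumes join values are unique within each input; with duplicates, the
    merge step must pair every matching r record with every matching s record.
    """
    out = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i] < s[j]:          # xr < xs: move to the next R record
            i += 1
        elif r[i] > s[j]:        # xr > xs: move to the next S record
            j += 1
        else:                    # xr == xs: concatenate and output
            out.append((r[i], s[j]))
            i += 1
            j += 1
    return out

# The id columns from the example above
r_ids = [11, 17, 18, 23, 34, 41, 44, 47, 49, 53, 61]
s_ids = [11, 16, 20, 23, 32, 40, 41, 42, 46, 49, 52]
print(merge_join(r_ids, s_ids))  # [(11, 11), (23, 23), (41, 41), (49, 49)]
```

Each input is scanned exactly once, which is where the B(R) + B(S) cost comes from.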
■ Example: sort-merge join with B(R) = 49 and B(S) = 28
▪ With M = 10 > √(B(R) + B(S)) ≈ 8.8, there is an input page for each sorted run of R and S after the initial sort pass, and the join completes in two passes
▪ With M = 8 < √(B(R) + B(S)), there are insufficient frames for a page per sorted run, and another merge pass must be performed

■ If both relations have a primary tree index on the join attribute a zig-zag join can be performed
▪ Scan the leaves of the two B+ trees in order from the left
▪ i.e. from the record with the smallest value for the join attribute
▪ When the search key value of one index is higher, scan the other index
▪ When both indexes contain the same search key value, matching records are retrieved and concatenated
▪ Recall that an index is typically much smaller than the file
▪ Cost = blocks of leaves of both indexes + blocks of matching records

■ The hash join algorithm has two phases
▪ Partitioning, and
▪ Probing
■ Partitioning R (similarly for S; diagram example with M = 7)
▪ Both relations are partitioned using the same hash function, h, on the join attribute
▪ Records in one partition of R can only match records in the matching partition of S
▪ One input buffer page and M − 1 output buffer pages are used to make M − 1 partitions for each relation
▪ If the largest partition of one relation does not fit in main memory, the relations must be further partitioned
■ Probing
▪ Read in one partition of R, where R is the smaller relation
▪ To reduce CPU costs, build an in-memory hash table using a hash function h2 (h2 ≠ h)
▪ Read the corresponding partition of S into an input buffer one page at a time
▪ Join matching records using the hash table
▪ Repeat for each partition of R
■ Cost
▪ If each partition of one relation fits in main memory the overall cost is 3(B(R) + B(S))
■ Relations must be partitioned until the largest partition of the smaller relation (S) fits in main memory
■ Ideally only one partitioning step is required
▪ Which requires that M − 2 ≥ √B(S)
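The partition-then-probe structure can be sketched in memory (an illustration only; a real hash join writes the partitions to disk, and h2 is here played by a plain Python dict):

```python
from collections import defaultdict

def hash_join(r, s, key, n_partitions=6):
    """Hash join sketch: partition both inputs with the same hash function h,
    then, for each partition, build an in-memory table on one input and
    probe it with the matching partition of the other."""
    h = lambda rec: hash(key(rec)) % n_partitions
    r_parts, s_parts = defaultdict(list), defaultdict(list)
    for rec in r:
        r_parts[h(rec)].append(rec)
    for rec in s:
        s_parts[h(rec)].append(rec)

    out = []
    for p in range(n_partitions):
        table = defaultdict(list)        # stands in for the h2 hash table
        for rec in r_parts[p]:           # build on the partition of R
            table[key(rec)].append(rec)
        for rec in s_parts[p]:           # probe with the matching S partition
            for match in table[key(rec)]:
                out.append((match, rec))
    return out

print(sorted(hash_join([1, 2, 3, 4], [2, 4, 6], key=lambda x: x)))  # [(2, 2), (4, 4)]
```

Records from different partitions are never compared, which is what makes the partitioning phase pay for itself.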
▪ Buffers for R input and for output are needed
▪ Partitioning produces M − 1 partitions, of average size B(S) / (M − 1), assuming partitions are the same size
▪ If M − 2 ≥ √B(S) the cost of hash join is 3(B(R) + B(S))
■ If M < √B(S) then B(S) / M must be larger than M, and the partitions are larger than main memory
▪ Therefore the relations must be further partitioned

■ Hybrid hash join can be used if M is large
▪ Retain an entire partition of the smaller relation (S) in main memory during the partitioning phase
▪ Eliminating the need to write out that partition, and read it back in during the probing phase
▪ Matching R records are joined and written out to the result when R is partitioned
▪ Hence the records of both R and S belonging to that partition are only read once
■ This approach can be generalized to retain more than one partition where possible

■ Partition S (the smaller relation) into k partitions
▪ Retain t partitions, S1, ..., St, in main memory
▪ The remaining k − t partitions, St+1, ..., Sk, are written to disk
■ Partition R into k partitions
▪ The first t partitions are joined to S, since those t partitions of S are still in main memory
▪ The remaining k − t partitions are written to disk
■ Join the remaining k − t partitions as normal
■ Cost is B(R) + B(S) + 2 × ((k − t)/k) × (B(R) + B(S))
▪ = (3 − 2t/k)(B(R) + B(S))
▪ ≈ (3 − 2M/B(S))(B(R) + B(S)), since t/k can be at most M/B(S) – the cost improvement is incremental

■ There must be 1 main memory buffer for each partition
▪ So k ≤ M (k = the number of partitions, t = the number of partitions retained in main memory)
▪ Hybrid hash join is only used where M > √B(S), such that B(S)/k < M
■ The ratio t/k, the fraction of S kept in main memory, should be as large as possible
▪ And t/k × B(S) + k − t ≤ M: the retained partitions must fit in main memory, with sufficient buffers for the other k − t partitions
■ One approach: retain one partition and make as few partitions as possible, i.e. t = 1 and k as small as possible

■ Example statistics
▪ B(R) = 100,000
▪ B(S) = 1,000
▪ M = 200 – note that √B(S) ≈ 31.6
■ Choose values for k and t
▪ Select t = 1
▪ k should be as small as possible while still allowing one partition to be retained in main memory, one output page for each of the other k − t partitions, and one input page
▪ With k = 6 each partition is 167 blocks; 1 is retained, leaving 33 blocks for 1 input buffer and 5 output buffers for the other partitions
■ Partition S – read in all of S, write out (k − t)/k = 5/6 of S, and retain one partition, s1, in main memory
■ Partition R – read in all of R, write out 5/6 of R, and join partition 1 of R with the retained s1
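With these statistics the total can be checked numerically; this helper just evaluates the formula (3 − 2t/k)(B(R) + B(S)) and is not part of the slides:

```python
def hybrid_hash_cost(b_r, b_s, k, t):
    """I/O cost of hybrid hash join: (3 - 2t/k)(B(R) + B(S)).

    The write-out of the final result is not included, as in the slides.
    """
    return (3 - 2 * t / k) * (b_r + b_s)

print(round(hybrid_hash_cost(100_000, 1_000, k=6, t=1)))  # 269333
```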
■ Probing – read in the second partition of S, then scan the second partition of R and join; repeat for the remaining four partitions of R and S

■ Cost, with B(R) = 100,000, B(S) = 1,000, k = 6, t = 1
▪ Read all of S – cost = B(S) = 1,000
▪ Write out 5/6 of S – cost = B(S) × 5/6 = 833
▪ Read all of R – cost = B(R) = 100,000
▪ Write out 5/6 of R – cost = B(R) × 5/6 = 83,333
▪ Read the remaining partitions of S back in – cost = 833
▪ Scan and probe with the remaining partitions of R – cost = 83,333
▪ Total cost = B(R) + B(S) + 2 × (5/6) × (B(R) + B(S)) = 269,333

■ If the smaller relation fits in main memory, the costs of hybrid hash join and block nested loop are identical
▪ The smaller relation is read once, and the larger relation is scanned once to join the records – cost = B(R) + B(S)
■ Otherwise hybrid hash join is more efficient
▪ Block nested loop reads R once, but S once for each clump of R
▪ Hybrid hash join reads one partition of R and S once
▪ Reads the other partitions twice and writes them once
▪ And the records of both R and S belonging to a particular partition are only read once, after the partitioning phase

■ Sort-merge join vs hash join, with M = 10, B(R) = 60, B(S) = 80
▪ Sort-join in 2 passes requires M > √(B(R) + B(S)) ≈ 11.8 – insufficient frames!
▪ Hash join in 2 passes requires M > √B(smaller) ≈ 7.7 – OK!
■ But
▪ sort-join is not sensitive to data skew
▪ sort-join results are sorted on the join attribute
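The two-pass feasibility checks in this comparison can be expressed directly (helper names are mine):

```python
from math import sqrt

def sort_join_two_pass(m, b_r, b_s):
    """Two-pass sort-merge join needs M > sqrt(B(R) + B(S))."""
    return m > sqrt(b_r + b_s)

def hash_join_two_pass(m, b_r, b_s):
    """Two-pass hash join needs M > sqrt(B of the smaller relation)."""
    return m > sqrt(min(b_r, b_s))

print(sort_join_two_pass(10, 60, 80))  # False: sqrt(140) = 11.8 > 10
print(hash_join_two_pass(10, 60, 80))  # True: sqrt(60) = 7.7 < 10
```

The asymmetry is the point: hash join only needs enough memory for the smaller relation's square root, sort-merge needs it for the combined size.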

■ Simple nested loop join (read S for each record)
▪ Retains the original order of R  Index nested loop join
▪ Retains the original order of R  Sort join
▪ Ordered by the join attribute  Zig-zag join
▪ Ordered by the join attribute  All other join methods
▪ No order
But an awful algorithm …
Order might make an upstream operation is more efficient
Such as a join with a third table on the same join attribute
John Edgar
18

■ The join process is more complex if the join condition is not simple equality on one attribute
■ For equalities over several attributes
▪ Sort-merge and hash join must sort (or hash) over all of the attributes in the selection
▪ An index that matches one of the equalities may be used for the index nested loop join
■ For inequalities (<, ≤, etc.)
▪ Hash indexes cannot be used for index nested loop joins
▪ Sort-merge and hash joins are not possible
▪ Other join algorithms are unaffected

SELECT fName, lName FROM Patient
INTERSECT
SELECT fName, lName FROM Doctor

πfName,lName(Patient) ∩ πfName,lName(Doctor)
Note that set operations, unlike other operations, remove duplicates by default
■ Intersection R ∩ S
▪ A join where the condition is equality on all attributes
▪ i.e. πfName,lName(Patient) ⋈ πfName,lName(Doctor)
■ Cartesian product R × S
▪ A special case of join where there is no join condition
▪ All records are joined to each other

■ Union using sorting
▪ Sort R and S using all fields
▪ Scan and merge the results while removing duplicates
■ Union using hashing
▪ Partition R and S using a hash function h
▪ For each partition of the smaller relation (S)
▪ Build an in-memory hash table (using h2)
▪ Scan the corresponding partition of R, and for each record probe the hash table – if it is not in the table, add it
■ Set difference
▪ Similar to union, except that for R − S, a record is added to the result if it is not in the hash table for S (the result is kept separate from the S hash table)
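The hashing versions of union and difference can be sketched in memory (partition counts are illustrative; a real implementation partitions to disk first):

```python
def hash_union(r, s, n_partitions=4):
    """Union via hashing: partition both inputs, then deduplicate each
    partition independently with an in-memory set."""
    parts = [set() for _ in range(n_partitions)]
    for rec in list(r) + list(s):
        parts[hash(rec) % n_partitions].add(rec)
    return [rec for p in parts for rec in p]

def hash_difference(r, s, n_partitions=4):
    """R - S: build a per-partition table from S; emit each distinct R record
    whose partition table does not contain it.  The emitted set is kept
    separate from S's table, mirroring the note above."""
    s_parts = [set() for _ in range(n_partitions)]
    for rec in s:
        s_parts[hash(rec) % n_partitions].add(rec)
    emitted = [set() for _ in range(n_partitions)]
    out = []
    for rec in r:
        p = hash(rec) % n_partitions
        if rec not in s_parts[p] and rec not in emitted[p]:
            emitted[p].add(rec)
            out.append(rec)
    return out

print(sorted(hash_union([1, 2, 3], [2, 3, 4])))  # [1, 2, 3, 4]
print(hash_difference([1, 2, 2, 3], [3, 4]))     # [1, 2]
```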

Memory Requirements

One-pass and nested loop algorithms:

Operation    | Algorithm   | M Requirement   | Disk I/O
σ, π         | scan        | 1               | B
γ, δ         | scan        | B               | B
∪, ∩, −, ×   | scan        | min(B(R), B(S)) | B(R) + B(S)
⋈            | nested loop | min(B(R), B(S)) | B(R) + B(S)
⋈            | nested loop | M ≥ 2           | B(R) + B(R) × B(S) / M

δ = duplicate removal, γ = grouping
The cost is greater if the M requirement is not met
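The last row (block nested loop) reads R once and S once per memory-sized clump of R. A small sketch of that estimate, using the table's approximation of M pages per clump (a real implementation reserves a page or two for input and output, so the clump is slightly smaller):

```python
from math import ceil

def block_nested_loop_io(b_r, b_s, m):
    """Approximate block nested loop I/O: B(R) + ceil(B(R) / M) * B(S)."""
    return b_r + ceil(b_r / m) * b_s

print(block_nested_loop_io(1_000, 500, 100))  # 1000 + 10 * 500 = 6000
```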

Sort-based algorithms:

Operation   | M Requirement    | Disk I/O         | Notes
γ, δ        | √B               | 3B               |
∪, ∩, −     | √(B(R) + B(S))   | 3(B(R) + B(S))   |
⋈           | √(B(R) + B(S))   | 3(B(R) + B(S))   | sort-merge join

The cost is greater if the M requirement is not met

Hash-based algorithms (assume B(S) ≤ B(R), and B(S) > M):

Operation   | M Requirement | Disk I/O                  | Notes
γ, δ        | √B            | 3B                        |
∪, ∩, −     | √B(S)         | 3(B(R) + B(S))            | B(S) is the smaller relation
⋈           | √B(S)         | 3(B(R) + B(S))            |
⋈           | > √B(S)       | (3 − 2t/k)(B(R) + B(S))*  | hybrid hash join

The cost is greater if the M requirement is not met
* the reduction is dependent on the relative sizes of M and R

■ Hash indexes can be used if there is an equality condition for every attribute in the search key
▪ e.g. a single hash index on {city, street, number}
▪ σcity=”London” ∧ street=”Baker” ∧ number=221(Detective) can be used
▪ σcity=”Los Angeles” ∧ street=”Cahuenga”(Detective) cannot
■ Tree indexes can be used if there is a selection on each of the first n attributes of the search key
▪ e.g. B+ index on {city, street, number}
▪ σcity=”London” ∧ street=”Baker” ∧ number=221(Detective) can be used
▪ σcity=”Los Angeles” ∧ street=”Cahuenga”(Detective) can be used
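The two matching rules can be written as small predicates (a sketch; function names and the attribute encoding are mine):

```python
def hash_index_matches(search_key, eq_attrs):
    """A hash index matches only if every search key attribute has an
    equality condition in the selection."""
    return set(search_key) <= set(eq_attrs)

def tree_index_matches(search_key, eq_attrs):
    """A tree index matches if some non-empty prefix of the search key
    attributes is constrained, i.e. at least the first attribute."""
    return search_key[0] in eq_attrs

key = ("city", "street", "number")
print(hash_index_matches(key, {"city", "street", "number"}))  # True
print(hash_index_matches(key, {"city", "street"}))            # False
print(tree_index_matches(key, {"city", "street"}))            # True
```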

■ If an index matches a subset of the conjuncts
▪ Use the index to return a result that contains some unwanted records
▪ Scan the result for matches to the other conjuncts
▪ σcity=”London” ∧ street=”Baker” ∧ number=221 ∧ fName=”Sherlock”(Detective)
▪ Use the address index and scan the result for Sherlocks
■ If more than one index matches a conjunct
▪ Either use the most selective index, then scan the result, discarding records that fail to match the other criteria
▪ Or use all matching indexes and retrieve the rids
▪ Then take the intersection of the rids and retrieve those records

■ Consider the relation and selection shown below
▪ Detective = {id, fName, lName, age, city, street, number, author}
▪ σcity=”New York” ∧ author=”Spillane” ∧ lName=”Hammer”(Detective)
■ With indexes
▪ Secondary hash index, {city, street, number} – cannot be used
▪ Secondary B+ tree index, {lName, fName} – can be used
▪ Secondary hash index, {author} – can be used
■ There are two strategies:
▪ Use the most selective of the two matching indexes, and search the results for the remaining criteria
▪ Use both indexes, and take the intersection of the rids
What if the B+ tree index is primary?

■ Consider the selections shown below
▪ σ(author=”King” ∨ age>35) ∧ (lName=”Tam” ∨ id=11)(Detective)
▪ σ(author=”King”) ∧ (lName=”Tam” ∨ id=11)(Detective)
■ Indexes on the relation
▪ Secondary B+ tree index, {lName, fName}
▪ Secondary hash index, {author}
■ Compare the two selections
▪ In the first selection each conjunct contains a disjunct on an attribute without an index (age, id), so a file scan is required
▪ In the second selection the index on author can be used, and records that don’t meet the other criteria are then removed

■ If the smaller relation fits in main memory the costs are identical
▪ The smaller relation is read once, and the larger relation is scanned once to join the records
▪ Cost = B(R) + B(S)
■ Otherwise hybrid hash join is more efficient – e.g. with B(R) = B(S) = n, R read in 5 clumps, and 1 of 5 partitions retained
▪ Block nested loop reads R once, but S once for each clump of R
▪ Cost = B(R) + 5 × B(S) = 6n
▪ Hybrid hash join reads one partition of R and S once, and reads the other partitions twice and writes them once
▪ Cost = (B(R) + B(S)) / 5 + (B(R) + B(S)) × 12 / 5 = 2n/5 + 24n/5 = 26n/5 = 5.2n
▪ And the records of both R and S belonging to a particular partition are only read once, after the partitioning phase
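The numbers in this comparison can be reproduced directly; the 5 clumps and the 1-of-5 retained partition are the slide's assumptions:

```python
def bnl_cost(n, clumps=5):
    """Block nested loop with B(R) = B(S) = n: read R once, S once per clump."""
    return n + clumps * n                  # 6n

def hybrid_cost(n, t_over_k=1 / 5):
    """Hybrid hash join: (3 - 2t/k)(B(R) + B(S)) with B(R) = B(S) = n."""
    return (3 - 2 * t_over_k) * 2 * n      # 5.2n

print(bnl_cost(100))     # 600
print(hybrid_cost(100))  # 520.0
```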