Query Evaluation 3
Projections
Nested Loop Joins
Naively, projection with duplicate removal entails three steps

SELECT DISTINCT fname, lname FROM Customer

▪ Scan the table, remove unwanted attributes, and write the result back
▪ cost 2B disk I/Os; the cost to write the result is actually less than B, since the unwanted attributes have been removed
▪ Sort the result, using all of its attributes as a compound sort key
▪ cost 4B, possibly more if the file is very large; again, less than 4B, since the relation is now smaller than B
▪ Scan the sorted result, removing adjacent duplicates as they are encountered; cost B, and again the relation size is now less than B
▪ The cost to write out the result of this last stage is not included; it may be the last operation or may be pipelined into another operation
It appears that the total cost is 7B disk I/Os, but this process can be much improved by combining multiple steps
The initial scan is performed as follows
▪ Read M pages and remove unwanted attributes
▪ Sort the records in main memory, and remove any duplicates
▪ Write the sorted run to disk
▪ Repeat for the rest of the file, for a total cost of 2B
▪ Actually less than 2B, since the result is smaller than B once the unwanted attributes are removed
Perform merge passes as required on the output from the first stage
▪ Remove any duplicates as they are encountered
▪ If only one merge pass is required the cost is ≈ 1B
For a total cost of ≈ 3B, unless more merge passes are required
The final result size can be estimated using the system catalog and other statistics
A sketch of the combined algorithm follows
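A minimal Python sketch of this combined algorithm, assuming records are in-memory dicts and that a "page" holds PAGE_SIZE records (both simplifications; a real system reads and writes disk pages):

import heapq

PAGE_SIZE = 100   # records per page (assumed for illustration)
M = 10            # pages of main memory (assumed for illustration)

def make_sorted_runs(records, attrs):
    # Pass 1: read M pages' worth of records, project onto attrs,
    # sort, and drop duplicates within the run.
    runs = []
    chunk = M * PAGE_SIZE
    for i in range(0, len(records), chunk):
        projected = [tuple(r[a] for a in attrs) for r in records[i:i + chunk]]
        runs.append(sorted(set(projected)))
    return runs

def merge_runs(runs):
    # Pass 2: a single merge pass over the sorted runs, removing
    # adjacent duplicates as they are encountered.
    result, last = [], None
    for rec in heapq.merge(*runs):
        if rec != last:
            result.append(rec)
            last = rec
    return result

customers = [{"fname": "Ann", "lname": "Lee", "city": "Burnaby"},
             {"fname": "Ann", "lname": "Lee", "city": "Surrey"}]
print(merge_runs(make_sorted_runs(customers, ["fname", "lname"])))
# [('Ann', 'Lee')] - the duplicate produced by the projection is removed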
Duplicates can also be identified by using hashing
Duplicate removal by hashing has two stages
▪ Partitioning and probing
In the partitioning stage
▪ Partition the file into M-1 partitions using a hash function, h
▪ With an output buffer for each partition, and one input buffer
▪ The file is read into main memory one page at a time, with each record being hashed to the appropriate output buffer
▪ Output buffers are written out when full
▪ Records in different partitions have different attribute values, so duplicates always land in the same partition
▪ Duplicates are eliminated in the next stage
The duplicate elimination stage uses a second hash function, h2 (h2 ≠ h), to reduce main memory costs
▪ An in-memory hash table is built using h2
▪ If two records hash to the same location they are checked to see if they are duplicates
▪ Duplicates can, instead, be removed using in-memory sorting
If each partition produced in the partitioning stage fits in main memory the cost is
▪ Partitioning stage: 2B
▪ Duplicate elimination stage: B, for a total cost of 3B
This is the same cost as projection using sorting; a sketch of both stages follows
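A Python sketch of both stages, assuming records have already been projected to tuples; hash() stands in for h, and a salted hash stands in for h2 (names and sizes are illustrative only):

NUM_PARTITIONS = 9   # M - 1 output buffers (assumed)

def partition(records):
    # Partitioning stage: hash each record with h into one of M - 1 partitions.
    partitions = [[] for _ in range(NUM_PARTITIONS)]
    for rec in records:
        partitions[hash(rec) % NUM_PARTITIONS].append(rec)
    return partitions

def eliminate_duplicates(part):
    # Duplicate elimination stage: build an in-memory hash table with h2;
    # records that hash to the same location are checked to see whether
    # they really are duplicates.
    table = {}
    for rec in part:
        bucket = table.setdefault(hash(("h2", rec)), [])
        if rec not in bucket:
            bucket.append(rec)
    return [r for bucket in table.values() for r in bucket]

records = [("Ann", "Lee"), ("Bob", "Ng"), ("Ann", "Lee")]
print([r for p in partition(records) for r in eliminate_duplicates(p)])
# [('Ann', 'Lee'), ('Bob', 'Ng')], in partition order

Because duplicates always hash to the same partition, each partition can be processed independently, one at a time.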
Sort and hash projection have the same approximate cost (3B)
▪ The actual cost is less, since the result is smaller than the original file
If M > √B, sorting, and hence sort projection, can be performed in two passes
▪ The first pass produces B/M sorted runs
▪ If there are fewer than M-1 of them, only one merge pass is required
Hash projection partitions may be of different sizes
▪ If just one partition is larger than M-1 pages, further partitioning is required
▪ Regardless of the overall size of the file
A worked example follows
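A worked example, with numbers assumed for illustration: if B = 10,000 pages and M = 101 buffer pages, the first pass produces ⌈10,000 / 101⌉ = 100 sorted runs; since 100 ≤ M - 1 = 100, a single merge pass suffices. Note that M = 101 > √10,000 = 100, matching the M > √B condition.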
Aggregations without groups are simple to compute
▪ Scan the file and calculate the aggregate value
▪ Requires one input buffer and a variable for the result
▪ Aggregations can usually be pipelined from a previous operation
Aggregations with groups require more memory
▪ To keep track of the grouped data
▪ They can be calculated by sorting or hashing on the group attribute(s)
▪ Or by using an index with all the required attributes
SELECT MIN(gpa) FROM Student
SELECT AVG(income) FROM Doctor GROUP BY specialty
The table is sorted on the group attribute(s)
The results of the sort are scanned and the aggregate operation computed
▪ These two processes can be combined in a similar way to the sort-based projection algorithm; see the sketch below
The cost is driven by the sort cost
▪ 3B(R) if the table can be sorted in one merge pass
▪ The final result is typically much smaller than the sorted table
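A Python sketch of the sort-based approach for the AVG query above, assuming the table fits in memory so that sorted() can stand in for the external sort (the scan-and-aggregate step is the same either way):

from itertools import groupby

doctors = [("cardiology", 150.0), ("gp", 90.0), ("cardiology", 170.0)]

# Sort on the grouping attribute; records of each group become adjacent.
doctors.sort(key=lambda rec: rec[0])

# Scan the sorted records and compute the aggregate for each group.
for specialty, group in groupby(doctors, key=lambda rec: rec[0]):
    incomes = [income for _, income in group]
    print(specialty, sum(incomes) / len(incomes))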
In the hash-based approach an in-memory hash table is built on the grouping attribute(s)
▪ Hash table entries consist of
▪ ⟨grouping-value, running-information⟩
▪ e.g. count and sum for AVG
The table is scanned and, for each record
▪ Probe the hash table to find the entry for the group that the record belongs to, and
▪ Update the running information for that group
Once the table has been scanned the grouped results are computed using the hash table entries
▪ If the hash table fits in main memory the cost is B(R); a sketch follows
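The same AVG query with hashing, as a Python sketch; the dict stands in for the in-memory hash table of ⟨grouping-value, running-information⟩ entries (data values are assumed):

doctors = [("cardiology", 150.0), ("gp", 90.0), ("cardiology", 170.0)]

running = {}   # specialty -> (count, sum): the running information for AVG
for specialty, income in doctors:            # single scan of the table: B(R)
    count, total = running.get(specialty, (0, 0.0))
    running[specialty] = (count + 1, total + income)

# Once the table has been scanned, compute each group's result.
for specialty, (count, total) in running.items():
    print(specialty, total / count)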
It may be possible to satisfy an aggregate query using just the data entries of an index
▪ The search key must include all of the attributes required for the query
▪ The data entries may be sorted or hashed, and
▪ No access to the actual records is required
▪ If the GROUP BY clause is a prefix of a tree index's search key, the data entries can be retrieved in the grouping order
▪ The actual records may also be retrieved in this order
This is an example of an index-only plan
This may seem unlikely, but an index may be created specifically for this use
SELECT *
FROM Customer NATURAL INNER JOIN Account

Customer ⋈ Account = σ C.sin = A.sin (Customer × Account)

A join is defined as a Cartesian product followed by a selection
▪ Where the selection is the join condition
▪ A natural join's condition is equality on all attributes in common
Cartesian products typically result in much larger tables than joins
▪ It is important to be able to efficiently implement joins; the definition is illustrated below
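The definition can be written directly as code; a toy Python sketch with assumed data, where sin is the common attribute:

customers = [(1, "Ann"), (2, "Bob")]            # (sin, name)
accounts  = [(1, "chequing"), (1, "savings")]   # (sin, type)

product = [(c, a) for c in customers for a in accounts]  # Customer x Account
join = [(c, a) for (c, a) in product if c[0] == a[0]]    # sigma: C.sin = A.sin
print(len(product), len(join))   # 4 records in the product, 2 in the join

Evaluating a join this way materializes the full product, which is exactly what efficient join algorithms avoid.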
[Figure: the join result R ⋈ S as a small region of the full Cartesian product R × S]
How do we find joined records without searching the entire space?
The candidate algorithms are nested loop joins, sort-merge join, and hash join
Suppose T(R) = 10,000 and T(S) = 4,000
▪ Assume S has a foreign key that references R
▪ So records in S relate to at most one record in R
The sizes of the join and the Cartesian product are
▪ Cartesian product – 40,000,000 records
▪ Natural join – 4,000 (if every s in S relates to an r in R)
There are three nested loop join algorithms that compare each record in one relation to each record in the other
▪ They differ in how often the inner relation is read

Pseudocode for R ⋈ R.i=S.j S
for each record r ∈ R
    for each record s ∈ S
        if r.i = s.j then
            add ⟨r, s⟩ to result

Tuple nested loop join – Cost = B(R) + (T(R) × B(S))
▪ Read one page of R at a time
▪ For each record in R
▪ Scan S and compare to all S records
▪ Result has the same ordering as R
Improved nested loop join – Cost = B(R) + (B(R) × B(S))
▪ As tuple nested loop join, but scan and compare S for records of R one page at a time; a sketch follows
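A Python sketch of the page-at-a-time version, modelling pages as lists of records; r[0] and s[0] stand in for the join attributes R.i and S.j (all names and sizes assumed):

def pages(table, page_size):
    return [table[i:i + page_size] for i in range(0, len(table), page_size)]

def page_nested_loop_join(R, S, page_size=2):
    # S is scanned once per page of R (B(R) x B(S) page reads),
    # rather than once per record of R (T(R) x B(S)).
    result = []
    for r_page in pages(R, page_size):        # read one page of R at a time
        for s_page in pages(S, page_size):    # rescan S for this page of R
            for r in r_page:
                for s in s_page:
                    if r[0] == s[0]:          # join condition r.i = s.j
                        result.append((r, s))
    return result

R = [(1, "a"), (2, "b"), (1, "c")]
S = [(1, "x"), (3, "y")]
print(page_nested_loop_join(R, S))   # matches on key 1, in R's order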
The simple nested loop join algorithms do not make effective use of main memory
▪ Both require only two input buffers and one output buffer
The algorithm can be improved by making the input buffer for R as large as possible
▪ Use M – 2 pages as an input buffer for the outer relation
▪ 1 page as an input buffer for the inner relation, and
▪ 1 page as an output buffer
▪ If the smaller relation fits in M – 2 pages the cost is B(R) + B(S)
▪ CPU costs are reduced by building an in-memory hash table on R, using the join attribute for the hash function
What if the smaller relation is larger than M – 2 pages?
▪ Break R, the outer relation, into blocks of M – 2 pages
▪ I refer (somewhat flippantly) to these blocks as clumps; M – 2 is the clump size
▪ Scan S once for each clump of R
▪ Insert concatenated records ⟨r, s⟩ that match the join condition into the output buffer
▪ S is read ⌈B(R)/(M-2)⌉ times
▪ The total cost is B(R) + (⌈B(R)/(M-2)⌉ × B(S))
Disk seek time can be reduced by increasing the size of the input buffer for S
▪ Which may increase the number of times that S is scanned
A sketch follows the figure below
[Figure: R and S on disk; with four clumps of R, S is scanned four times]
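A Python sketch of the clump-based algorithm, including the in-memory hash table on the join attribute that reduces CPU cost (buffer and page sizes are assumed):

M = 4            # buffer pages (assumed), so the clump size M - 2 = 2
PAGE_SIZE = 2    # records per page (assumed)

def block_nested_loop_join(R, S):
    clump_records = (M - 2) * PAGE_SIZE
    result = []
    for i in range(0, len(R), clump_records):    # one clump of R at a time
        clump_index = {}                          # hash table on join attribute
        for r in R[i:i + clump_records]:
            clump_index.setdefault(r[0], []).append(r)
        for s in S:                               # one full scan of S per clump
            for r in clump_index.get(s[0], []):   # probe rather than compare all
                result.append((r, s))
    return result

R = [(1, "a"), (2, "b"), (1, "c"), (3, "d"), (2, "e")]
S = [(1, "x"), (2, "y")]
print(block_nested_loop_join(R, S))
# Here B(R) = 3 pages and M - 2 = 2, so S is scanned ceil(3/2) = 2 times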
Indexes can be used to compute a join where one relation has an index on the join attribute
▪ The indexed relation is made the inner relation (call it S)
▪ Scan the outer relation: B(R) reads
▪ While retrieving matching records of S using the index: the index lookup cost
The inner relation is never scanned
▪ Only records that satisfy the join condition are retrieved
▪ Unlike the other nested loop joins, this algorithm does not compare every record in R to every record in S
Cost depends on the size of R and the type of index
▪ B(R) + (T(R) × index cost); a sketch follows
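A Python sketch of index nested loop join, with a dict standing in for S's index on the join attribute (names and data assumed):

S = [(1, "x"), (1, "y"), (2, "z")]

# Build (or reuse) the index on S's join attribute: value -> matching records.
index_on_S = {}
for s in S:
    index_on_S.setdefault(s[0], []).append(s)

R = [(1, "a"), (3, "b")]
result = []
for r in R:                               # scan the outer relation: B(R) reads
    for s in index_on_S.get(r[0], []):    # index lookup for each record of R
        result.append((r, s))             # only matching records are retrieved
print(result)   # r = (3, "b") has no match and costs only the index probe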
The cost of index nested loop join depends on the type of index and the number of matching records
The outer relation is scanned and, for each record of R, matching records of S are retrieved using the index
▪ Search the index for matching RIDs – access a leaf or bucket
▪ If there are no matching records, move on to the next record of R
▪ Retrieve the matching records
▪ One disk read if a single S record matches one R record
▪ If multiple S records match a single R record, the cost depends on the number of records and on whether the index is primary or secondary
The system catalog records statistics used to estimate these costs