CS W186 Introduction to Database Systems
Spring 2020 Josh Hug, Michael Ball DIS 6
1 Assorted Joins
• Companies: (company_id, industry, ipo_date) • NYSE: (company_id, date, trade, quantity)
We have 20 pages of memory, and we want to join two tables Companies and NYSE on C.company_id = N.company_id. Attribute company_id is the primary key for Companies. For every tuple in Companies, assume there are 4 matching tuples in NYSE.
NYSE contains [N] = 100 pages, NYSE holds pN = 100 tuples per page. Companies contains [C] = 50 pages, C holds pC = 50 tuples per page.
There are unclustered B+ indexes on C.company_id and N.company_id – for both indexes, assume it takes 2 I/Os to access a leaf.
(a) How many disk I/Os are needed to perform a simple nested loops join?
(b) How many disk I/Os are needed to perform a block nested loops join?
(c) How many disk I/Os are needed to perform an index nested loops join?
(d) For this part only, assume the index on NYSE.company_id is clustered. What is the cost of an index nested loops join using companies as the outer relation?
CS W186, Spring 2020, DIS 6 1
(e)
(f)
2
How many disk I/Os are needed to perform a sort merge join? This is ordinary sort-merge, not optimized sort-merge.
How many disk I/Os are needed to perform a hash join?
Grace Hash Join
We have 2 tables – Catalog and Transactions.
Catalog has a total of 100 pages and 20 tuples per page. Transactions has a total of 50 pages and 50 tuples per page. Assume that the distribution among the key that we are joining on is uniform for the two tables.
(a) If we had 10 buffer pages, how many partitioning phases would we require for grace hash join? Consider which table we should build the hash table in the probing phase on.
(b) What is the I/O cost for the grace hash join then?
(c) For the above question, if we only had 8 buffer pages, how many number of partition phases would there be?
(d) What will be the I/O cost?
CS W186, Spring 2020, DIS 6 2