CS W186 Introduction to Database Systems
Spring 2020 Josh Hug, Michael Ball DIS 5
1 General External Merge Sort
Given – N pages to sort, B buffer pages in memory
Pass 0 – Use B buffer pages. Produce N sorted runs of B pages each.
B Further passes – Merge B − 1 runs.
Last pass – Produces 1 run of N pages.
Total I/O cost – 2N (# of passes) = 2N (1 + ⌈log ⌈ N ⌉⌉)
B−1 B
(a) You have 4 buffer pages and your file has a total of 108 pages of records to sort. How many
passes would it take to sort the file?
(b) How many runs would each pass produce?
(c) What is the total cost for this sort process in terms of I/O?
(d) If the pages were already sorted individually, how many passes would it take to sort the file and how many I/Os would it be instead?
(e) If we wanted to sort N pages in at most p total passes, write an expression relating the minimum # of buffer pages B needed with N and p. What do you notice about B when p = 1?
CS W186, Spring 2020, DIS 5 1
2 Hashing
Given – N pages to hash, B buffer pages in memory
Initial partitioning – Read in N pages and hash into B − 1 partitions. Write out
B−1
∑ (# of pages in partition i) i=1
Recursive partitioning – For each individual partition, recursively partition if its size s > B. Building in-memory hash table – Once a partition’s size s ≤ B, read in s pages, build in-memory hash table, and write out s pages.
(a) What are some use-cases in which hashing is preferred over sorting?
(b) SupposewehaveBbufferpagesandcanprocessB(B−1)pagesofdatawithExternalHashing in two passes. For this case, fill in the blanks with the appropriate # of pages.
______ input buffer(s)
______ partitions after pass 0
______ pages per partition
(c) If you are processing exactly B(B − 1) pages of data with external hashing, is it likely that you’ll have to perform recursive external hashing? Why or why not?
(d) We want to hash N = 100 pages using B = 10 buffer pages. Suppose in the initial partitioning pass, the pages are unevenly hashed into partitions of 10, 20, 20, and 50 pages. Assuming uniform hash functions are used for every partitioning pass after this pass, what is the total I/O cost for External Hashing?
CS W186, Spring 2020, DIS 5 2