CS5481 Data Engineering Tutorial 6
1. Suppose you need to sort a relation of 40 gigabytes, with 4 kilobyte blocks, using a memory size of 40 megabytes. Assume that tS = 5ms and tT = 0.1ms. Find the cost of sorting the relation, in seconds, with bb = 1, 9, 30, 300, and 900, where bb is the number of buffer blocks allocated to each input run and output run.
2.
Apply merge-join to the following numerical example used in the lecture. nstudent =5,000 bstudent =100
ntakes =10,000 btakes =400
What will be the cost if the relations are not sorted and the memory size is still 3 blocks?
What will be the cost if the memory size is increased to 25 blocks and the relations are not sorted?
In order to reduce the number of seeks (without increasing the number of block transfers), what is the number of buffer blocks (i.e., bb) that should be allocated
(i) to each run and the output in the merge step of sorting and,
(ii) for buffering each relation and the output in the join step?
What will be the cost of part (c)?
(a) (b) (c)
(d)
3.
tuples fit on one block. For r2, it has 45,000 tuples and 30 tuples fit on one block. Which join algorithm gives the lowest worst-case cost estimate for r1 r2? Consider only number of block transfers in this exercise. Specify the minimum amount of memory in number of blocks for the worst-case estimates.
Consider two relations r1(A, B, C) and r2(C, D, E). For r1, it has 20,000 tuples and 25