CS5481
Data Engineering Tutorial 4
1. a)
Insert the search-key values 31, 41, 40 in order to the following B+-tree.
For the resultant B+-tree in a), show the form of the tree after deleting 7, 11, 43, 37
13
23
43
7
43
47
2
3
5
7
11
13
17
19
23
29
37
13
40
7
23
31
43
31
37
2
3
5
7
11
40
41
43
47
13
17
19
23
29
b)
and 41 in order.
13
23
31
31
40
47
13
17
19
23
29
2
3
5
c) Re-create the resultant B+-tree in a), i.e., rebuild the tree from an empty tree, using bottom-up B+-tree construction.
29
7
17
40
43
40
41
2
3
5
7
11
13
43
47
17
19
23
29
31
37
CS5481 Data Engineering Tutorial 4
2. Consider a hash function on integer search keys i defined by h(i) = i2 mod B, where B is the number of buckets. What is wrong with this hash function if B = 10?
Since each integer can be represented as 10a + b where 0 b < 10 and so its square modulo 10 is the same as b2 modulo 10.
Thesquaresof0to9modulo10are:0,1,4,9,6,5,6,9,4,1,andsothebucketsfor 2,3,7,8 will always be empty and buckets 1, 4, 6, 9 are twice as likely to be hit.
So, this function is not uniform.
3. A PARTS file with Part# as hash key includes records with the following Part# values: 2369, 3760, 4692, 4871, 5659, 1821, 1074, 7115, 1620, 2428, 3943, 4750, 6975, 4981, 9208. The file uses 8 buckets, numbered 0 to 7. Each bucket is one disk block and holds two records. Load these records into the file in the given order using the hash function h(K)=K mod 8. Calculate the average number of block accesses for a random retrieval on Part#.
The records will hash to the following buckets: K h(K) (bucket number)
2369 1
3760 0
4692 4
4871 7
5659 3
1821 5
1074 2
7115 3
1620 4
2428 4 overflow 3943 7
4750 6
6975 7 overflow 4981 5
9208 0
Two records out of 15 are in overflow buckets, which require an additional block access. The other records require only one block access. Hence, the average time to retrieve a random record is:
(1 * (13/15)) + (2 * (2/15)) = 0.867 + 0.266 = 1.133 block accesses