Hash Functions
Last time we discussed two ways to store tables:
– Direct (very memory inefficient) – Hash table (2D, list of collisions)
Open addressing keeps things in one dimension (so don¡¯t need to grow/shrink lists) but has key limit
Open addressing
Much like direct addressing, we have a simple 1D array (starts empty):
Much like hash tables, when putting
data in we use a hash function:
234
insert 234 h(234) = 4
012345
Open addressing
If this is just a 1D array, where would you put a collision?
012345 insert 584
h(584) = 4
Where do you put the data ¡°584¡±?
234
Open addressing
Probably the most straight forward answer is: the next open spot
012345 insert 584
h(584) = 4
To do this we use a modified hash: g(k, i) = h(k) + i (mod m)
234
584
Open addressing
234
584
012345 insert 584
h(584) = 4
First try to put this data in: g(k,0)
If this is already occupied try: g(k,1) Next try: g(k, 2)… and so on until you find an open spot
Open addressing
Using the next open spot… what issues happen over the long term?
012345 insert 584
h(584) = 4
234
584
Open addressing
This tends to create long consecutive filled entries; 3 compounding issues:
– If a key collides, it extends the length of filled consecutive filled – Longer consecutive filled means more runtime to find an open spot – More keys in array means more chance of collision on next added
Open addressing
You might be tempted to ¡°space out¡± these keys by using:
g(k, i) = h(k) + 2i (mod m)
g(584,1) g(584,0)
584
234
012345 insert 584
h(584) = 4
(skip)
Open addressing
This actually makes the problem worse… consider adding more keys:
g(74,0) g(74,1) g(74,2) g(74,3)
012345 insert 74
h(74) = 4
We g(k,i) can¡¯t insert key 74 despite
having open space
584
8884
234
Open addressing
This issue comes as the size of our array (6) and our ¡°space out¡± (2) are not relatively prime
Relatively prime: if greatest common divisor (GCD) is one
6 divisors = {1, 2, 3, 6} 2 divisors = {1, 2}
GCD(6,2) = 2
2 ¡ 1, so (6,2)
not relatively prime
Open addressing
One easy way to fix this is to have ¡°m¡± a prime number and the ¡°space out¡± anything smaller than ¡°m¡±
This still doesn¡¯t fix the issues that the ¡°space out¡± does not fix our ¡°consecutive filled¡± issue
Open addressing
This can be fixed by changing the ¡°space out¡± to be different for every input:
g(k, i) = h(k) + h2(k)*i (mod m)
Just need to ensure the possible outputs of h2() is less than ¡°m¡±
Open addressing
Set m=7 (array size = prime number)
h(k) = k %10, h (k) = k%4+1 2
need
¡°space out¡±
to be at least 1 or g(k,0)=g(k,1)
6
g(10,0)
g(10,1)
584
8884
234
012345
insert(10) h(10) = 0 h2(10) = 3
Open addressing
Set m=7 (array size = prime number)
h(k) = k %10, h (k) = k%4+1 2
need
¡°space out¡±
to be at least 1 or g(k,0)=g(k,1)
6
g(10,0)
g(10,1)
584
8884
10
234
012345
insert(10) h(10) = 0 h2(10) = 3
Open addressing
Set m=7 (array size = prime number)
h(k) = k %10, h (k) = k%4+1 g(20,0) 2
need
¡°space out¡±
to be at least 1 or g(k,0)=g(k,1)
6
g(10,0) g(20,1)
g(10,1)
584
8884
10
234
012345
insert(10) h(10) = 0 h2(10) = 3
insert(20) h(20) = 0 h2(20) = 1
Open addressing
Set m=7 (array size = prime number)
h(k) = k %10, h (k) = k%4+1 g(20,0) 2
need
¡°space out¡±
to be at least 1 or g(k,0)=g(k,1)
6
g(10,0) g(20,1)
g(10,1)
584
20
8884
10
234
012345
insert(10) h(10) = 0 h2(10) = 3
insert(20) h(20) = 0 h2(20) = 1
Open addressing