COMP 250
INTRODUCTION TO COMPUTER SCIENCE
Week 13-2 : Hashing
Giulia Alberini, Fall 2020 Slides adapted from Michael Langer’s
WHAT ARE WE GOING TO DO IN THIS VIDEO?
Hash Maps
RECALL: MAP
keys (type K)
values (typeV)
Each (key, value) pairs is an “entry”. For each key, there is at most one value.
ARRAYS OF VALUES
3
6
8
14
4 type V (value) and have O(1) 9
Then, we could use an array of
access.
This would work well is keys are small integers.
12
22
JAVA HASHCODE()
keys K
int
(32 bits)
TODAY: MAPCOMPOSITION
positive integers in small range
3
6
8
14
0 hashcode :
keys (K) int (32 bits)
{−231,…0,…,231 − 1}
4
:
9
:
12
:
22
m-1
values (V)
COMPRESSION MAP
3
6
8
14
keys (K)
int
4
:
9
:
12
:
22
hashcode
0 :
compression
(32 bits)
{−231,…0,…,231 − 1}
m-1
values (V)
COMPRESSION MAP
compression: 𝑖 → 𝑖 𝑚𝑜𝑑 𝑚, where 𝑚 is the length of the array.
3
6
8
14
hashcode
compression (many to 1)
0 :
4
:
9
:
12
:
22
keys (K)
int
(32 bits)
{−231,…0,…,231 − 1}
m-1
values (V)
HASH FUNCTION
“hash values” hash function : keys {0, …, m-1}
3
6
8
14
hashcode
compression (many to 1)
0 :
4
:
9
:
12
:
22
keys (K)
int
(32 bits)
{−231,…0,…,231 − 1}
m-1
values (V)
EXAMPLE
Let m = 7
“hash function” ≡ compression ° hashCode hash code hash value (hash code % 7)
41 6 16 2 25 4 21 0 36 1 35 0 53 4
0 :
6
8
TERMINOLOGY
A “hashCode” maps keys to int
A “hash function” maps keys to “hash values”
We use values both to refer to the values of the hash function as well as the values in the key-value pairs of the map we want to represent!
PROBLEM : COLLISIONS
Two or more keys can map to the same hash value.
“hash values”
3
6
8
14
hashcode()
0 :
“compression”4 :
keys (K)
int
9
:
12
:
22
Collisions can happen in two ways.
m-1
SOLUTION : HASH TABLE (OR HASH MAP)
Each array slot holds a singly linked list of entries
3
6
8
14
hashcode()
“hash values”
0 :
“compression”4 :
keys (K)
int
9
:
12
:
22
m-1
BUCKETS
Each array slot + linked list is called a bucket. This map has m buckets.
0 :
4
:
9
:
12
:
22
3
6
8
14
m-1
Note simpler linked list notation here.
OBSERVATIONS
Why is it necessary to store (key, value) pairs in the linked list? Why not just the values?
LOAD FACTOR
≡
numberof key,value pairsinmap numberofbuckets, m
One typically keeps the load factor below 1.
In the Java HashMap class, the default MAXIMUM
load factor is 0.75
EXAMPLE OF A “GOOD HASH”
0
:
4
:
9
:
12
:
22
m-1
3
6
8
14
EXAMPLE OF A “BAD HASH”
0 :
:
9
:
:
22
m-1
3
6
8
14
EXAMPLE
h : K {0, 1, …, m-1} Example: Suppose keys are McGill Student IDs,
e.g. 260745918.
How many buckets to choose ? Good hash function?
Bad hash function ?
EXAMPLE
h : K {0, 1, …, m-1} Example: Suppose keys are McGill Student IDs,
e.g. 260745918.
How many buckets to choosenumber of entries Good hash function?
Bad hash function ?
EXAMPLE
h : K {0, 1, …, m-1} Example: Suppose keys are McGill Student IDs,
e.g. 260745918.
How many buckets to choose ?number of entries Good hash function? rightmost 5 digits
Bad hash function ?
EXAMPLE
h : K {0, 1, …, m-1} Example: Suppose keys are McGill Student IDs,
e.g. 260745918.
How many buckets to choosenumber of entries Good hash function? rightmost 5 digits
Bad hash function ? leftmost 5 digits
PERFORMANCE OF HASH MAPS
put(key, value) get(key)
remove(key)
If load factor is less than 1 and if hash function is good, then operations are O(1) “in practice”. This beats all potential map data structures we discussed last video.
If we have a bad hash, we can choose a different hash function.
PERFORMANCE OF HASH MAPS
put(key, value)
get(key)
remove(key) contains(value) ?
PERFORMANCE OF HASH MAPS
put(key, value) get(key)
remove(key)
contains(value)
We will need to look through each of the m buckets (i.e. search each linked list for that value)
PERFORMANCE OF HASH MAPS
put(key, value) get(key)
remove(key)
contains(value) getKeys()
getValues()
These last three methods all require traversing the hash table which takes time 𝑂(𝑛 + 𝑚) where 𝑛 is the number of entries and 𝑚 is the number of buckets.
JAVAHashMap
In constructor, you can specify initial number m of buckets, and maximum load factor
(by default m = 16, and max load factor = 0.75)
How is hash function specified ?
JAVAHashMap
In constructor, you can specify initial number m of buckets, and maximum load factor
(by default m = 16, and max load factor = 0.75)
How is hash function specified ?
Use key’s hashCode(), take absolute value, and compress it by taking mod of the number of buckets.
𝑖 → 𝑖 𝑚𝑜𝑑𝑚
JAVAHashSet
Similar to HashMap, but there are no values. Just use it to store a set of
objects of some type. Operations: add(e)
contains( e)
remove( e)
…
If hash function is good, then these operations are O(1). Note that this is not a list! There’s no order in the elements and elements must be unique.
JAVA HashSet
“hash values”
0 :
“compression”4 :
9
:
12
:
22
3
6
8
14
elements E (e.g. Shape)
int
m-1
In the next videos: Graphs