程序代写代做代考 database data structure cache interpreter go C chain Java game Hash Tables (Chapter 11 in the textbook)

Hash Tables (Chapter 11 in the textbook)
Based on slides of Dan Suciu
1

Dictionary ADT
create : insert : find : delete :
 dictionary
dictionary  key  values  dictionary dictionary  key  values
dictionary  key  dictionary
kim chi
spicy cabbage
Kreplach
tasty stuffed dough
Kiwi
Australian fruit
insert(kohlrabi, upscale tuber)
find(kreplach) kreplach: tasty stuffed dough
2

Implementations So Far
unsorted list
sorted array
Binary Search Trees;
Red-Black – worst case
Array of size n where keys are 0,…,n-1
insert
find+O(1)
O(n)
O(log n)
find
O(n)
O(log n)
O(log n)
delete
find+ O(1)
O(n)
O(log n)
If the keys are 0, 1, …, n-1 then we can do all three in O(1) !
But typically the “space” of keys is much larger than n.
3

Hash Tables: Basic Idea
• Useakey(arbitrarystringornumber)toindex directly into an array – O(1) time to access records
– A[h(“kreplach”)] = “tasty stuffed dough”
– Need a hash function, h, to convert the key to an integer
Key
Data
0 = h(“kim chi”)
“kim chi”
spicy cabbage
1 = h(“kreplach”)
“kreplach”
tasty stuffed dough
2 = h(“kiwi”)
“kiwi”
Australian fruit
4


Applications
When log(n) is just too big…
– Symbol tables in interpreters
– Real-time databases
• air traffic control
• packet routing
When associative memory is needed…
(standard memory: give location, get value at that location;
associative memory: give value, get locations where the value is stored.)
– Dynamic programming
• cache results of previous computation
• Chess endgames
– Many text processing applications – e.g. Web

5

Properties of Good Hash Functions
• Mustreturnnumber0,…,tablesize-1
• Shouldbeefficientlycomputable:O(1)time
• Shouldnotwastespaceunnecessarily
– For every index, there is at least one key that hashes to it
• Load factor lambda  = (number of keys / TableSize)
• Shouldminimizecollisions
= different keys hashing to same index
6

Integer Keys
• Hash(x) = x % TableSize (if the key x is a number)
• In theory it is a good idea to make TableSize prime. Why? Keys often have some pattern
• mostly even
• mostly multiples of 10
• in general: mostly multiples of some k
If k is a factor of TableSize, then only (TableSize/k) slots will ever be used!
To be safe: choose TableSize = a prime.
7

String Keys – converting to integers
• If keys are strings, can get an integer by adding up ASCII values of characters
in key
• Problem 1: What if TableSize is 10,000 and all keys are 8 or less characters long?
• Problem 2: What if keys often contain the same characters (“abc”, “bca”, etc.)?
8
for (i=0;i1/2
22

Closed Hashing II: Quadratic Probing • MainIdea:Spreadoutthesearchforanemptyslot–
Increment by i2 instead of i
• hi(X)=(Hash(X)+i2)%TableSize h0(X) = Hash(X) % TableSize
h1(X) = Hash(X) + 1 % TableSize h2(X) = Hash(X) + 4 % TableSize h3(X) = Hash(X) + 9 % TableSize
23

Quadratic Probing Example
insert(14) insert(8) insert(21) insert(2) 14%7 = 0 8%7 = 1 21%7 =0 2%7 = 2
0000 1111 2222 3333 4444 5555 6666
probes: 1 1 3 1 24
14
14
8
14
8
21
14
8
2
21

Problem With Quadratic Probing
insert(14) insert(8) insert(21) insert(2) 14%7 = 0 8%7 = 1 21%7 =0 2%7 = 2
insert(7) 7%7 = 0
0 1 2 3 4 5 6
14
14
8
14
8
21
14
8
2
21
14
8
2
21
0000 1111 2222 3333 4444 5555 6666
probes:1 1 3 1 ??25

Load Factor in Quadratic Probing
• The problem is called secondary clustering (the set of
filled slots ‘bounces’ around the array in a fixed pattern).
• Theorem: If TableSize is prime and   1⁄2, quadratic probing
will find an empty slot; for greater , might not
• With load factors near 1⁄2 the expected number of probes is empirically near optimal – no exact analysis known
26

Closed Hashing III: Double Hashing • Idea:Spreadoutthesearchforanemptyslotby
using a second hash function
– No primary or secondary clustering
• hi(X)=(Hash1(X)+(i-1)*Hash2(X))mod
TableSize
for i = 0, 1, 2, …
• Good choice of Hash2(X) can guarantee does not get “stuck” as long as  < 1 – Integer keys: Hash2(X) = R – (X mod R) where R is a prime smaller than TableSize 27 Double Hashing Example insert(14) insert(8) insert(21) insert(2) 14%7 = 0 8%7 = 1 21%7 =0 2%7 = 2 insert(7) 7%7 = 0 5-(7%5)=3 0 1 2 3 4 5 6 5-(21%5)=4 0000 1111 2222 3333 4444 5555 6666 14 14 8 14 8 21 14 8 2 21 14 8 2 21 probes:1 1 2 1 ??28 Load Factor in Double Hashing • For any  < 1, double hashing will find an empty slot (given appropriate table size and hash2) • Search cost approaches optimal (random re-hash): – unsuccessful search: – successful search:  1 1ln 1 1 Note natural logarithm! 1 • No primary clustering and no secondary clustering • Still becomes costly as  nears 1. 29 What to do when the hash table is too full: Rehash: • Build a new table with size > 2 * size of old table, and a prime number.
•Take a new hash function (appropriate for the new size). •Insert all the elements from the old table in the new table.
30

Deletion with Separate Chaining
No problem – simply delete element from the linked list
31

Deletion in Closed Hashing
delete(2) find(7) 00
11 22 33 44 55 66
Where is it?!
0
1
2
7
0
1
7
What should we do instead?
32

Lazy Deletion
delete(2) find(7) 00
11 22 33 44 55 66
But now what is the problem?
Indicates deleted value: if you find it, probe again
0
1
2
7
0
1
#
7
33