程序代写代做代考 data structure chain COMP251: Hashing

COMP251: Hashing
Jérôme Waldispühl School of Computer Science
McGill University
Based on (Cormen et al., 2002)

Problem Definition Table S with n records x:
X
Satellite data
We want a data structure to store and retrieve these data. Operations:
Key[x]
Information or data associated with x
• insert(S,x):S←S∪{x} • delete(S,x):S←S\{x} • search(S,k)
Dynamic set

Direct Address Table
Illustration (CLR, 2005)
• If there is an element x with key k, then T[k] contains a pointer to x.
• If T [k] is empty, represented by NIL.
All operations in O(1), but if n (#keys) < m (#slots), lot of wasted space. • Each slot, or position, corresponds to a key in U . Hash Tables • Reduce storage to O(n) keys. • Resolve conflicts by chaining. • Search time in O(1) time in average, but not the worst case. Hash function: h:U →{0,1,...,m−1} h(k1) h(k4) Analysis of Hashing with Chaining Insertion: O(1) time (Insert at the beginning of the list). Deletion: Search time + O(1) if we use a double linked list. Search: • Worst case: Worst search time to is O(n). Search time = time to compute hash function + time to search the list. Assuming the time to compute hash function is O(1). Worsttimehappenswhenallkeysgothesameslot(listof sizen), and we need to scan the full list => O(n).
• Average case: It depends how keys are distributed among slots.

Average case Analysis Assume a simple uniform hashing: n keys are distributed
uniformly among m slots.
Let n be the number of keys, and m the number of slots.
Average number of element per linked list?
α = mn
Theorem:
The expected time of a search is Θ(1 + α). Note: O(1) if α < 1, but O(n) if α is O(n). Load factor: Average case Analysis Theorem: The expected time of a search is Θ(1 + α). Proof? Distinguish two cases: • search is unsuccessful • search is successful Unsuccessful search • Assume that we can compute the hash function in O(1) time. • An unsuccessful search requires to scan all the keys in the list. Average search time = O( 1 + average length of lists ) Let ni be the length of the list attached to slot i. Average value of ni ? E ( n i ) = α = mn Þ O( 1 ) + O( α ) = O( 1 + α ) (Load factor) Successful search • Assume the position of the searched key x is equally likely to be any of the elements stored in the list. • New keys inserted at the head of the list ⇒ Keys scanned after finding x have been inserted in the hash table before x. • Use indicator to count the number of collisions: m X =I h(k)=h(k ) ;E(X )= 1 (probabilityofacollision) ij{i j}ij Successful search & 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑘𝑒𝑦𝑠 𝑖𝑛𝑠𝑒𝑟𝑡𝑒𝑑 𝑎𝑓𝑡𝑒𝑟 𝑥 = 1 + ; 𝑋#! !"#$% 1&& 𝑒𝑥𝑝𝑒𝑐𝑡𝑒𝑑 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑠𝑐𝑎𝑛𝑛𝑒𝑑 𝑘𝑒𝑦𝑠 = 𝐸 (1∑n " ∑n %+ 1∑n " ∑n % E * $ 1 + X i j ' - = $ 1 + E () X i j +, ' *n$ '-n$ ' 𝑛 ; #"% 1 + ; !"#$% 𝑋#! ) i=1 # j=i+1 &, i=1 # j=i+1 & 1∑n" ∑n 1% Search time: Θ 1+1+𝛼−𝛼 =Θ(1+𝛼) 2 2𝑛 = $1+ ' n$m' i=1 # j=i+1 & =1+𝛼−𝛼 2 2𝑛 Supplementary material Designing a hash function Properties: 1. Uniform distribution of keys into slots 2. Regularity in key disturb should not affect uniformity. List of functions: • Division method • Multiplication methods • Open addressing: • Linear probing • Quadratic probing • Double hashing Binary Numbers (reminder) Each integer x accepts a unique decomposition x = ∑ai ⋅ 2i where0≤ai <2 Example: x =11=1⋅20 +1⋅21 +0⋅22 +1⋅23 i The binary number representation of an integer x is its (reversed) sequence of a’s. 23 22 21 1, 0,1,1 01 20 1 1 Example: x =11→ →1011 Binary number operations: 101101 >> 1 = 10110 (right shift) : quotient of division by 2k 101101 << 2 = 10110100 (left shift) : multiplication by 2k 101101 mod 22 = 01 (modulo 2k) : remainder of division by 2k Division Method h(k)=kmodd d must be chosen carefully! Example 1: d = 2 and all keys are even? Odd slots are never used... Example 2: d = 2r k = 100010110101101011 r = 2 -> 11
r = 3 -> 011 r = 4 -> 1011
keeps only r last bits…
Good heuristic: Choose d prime not too close from a power of 2. Note: Easy to implement, but division is slow…

Multiplication method
h(k) = (A ⋅ k mod 2w )>> (w − r) 2w-1 { 1, … , m } Universe of keys probe number slot
Constraints:
• 𝑛 ≤ 𝑚 (i.e. more slots than keys to store)
• Deletion is difficult
Challenge: How to build the hash function?

Open addressing
Illustration: Where to store key 282?
Full!
index
key
1
355
2
3
567
4
233
5
282
6
799
7
h(282,0)=3 h(282,1)=1 h(282,2)=5
Note: Search must use the same probe sequence.

Linear & Quadratic probing Linear probing:
h(k,i)=(h'(k)+i)modm Note: tendency to create clusters.
Quadratic probing:
h(k,i)= h'(k)+c ⋅i+c ⋅i2 modm (12)
Remarks:
• We must ensure that we have a full permutation of ⟨
0, … , m-1 ⟩.
• Secondary clustering: 2 distinct keys have the same
hʹ value, if they have the same probe sequence.

Double hashing
h(k,i)=(h1(k)+i⋅h2(k))modm
Must have h2(k) be “relatively” prime to m to guarantee that
the probe sequence is a full permutation of ⟨0, 1, . . . , m −1⟩.
Examples:
• m power of 2 and h2 returns odd numbers • mprimenumberand1,
• a = denotes a sequence of r+1 elements randomly chosen from {0, 1, … , m – 1}.
The class H defined by:
H=!a {ha}withha(x)=åi=0tor aixi modm
is an universal function.

Cost of Universal Hashing
Theorem:
Using chaining and universal hashing on key k:
• If k is not in the table T, the expected length of the list that k
hashes to is £ a.
• If k is in the table T, the expected length of the list that k
hashes to is £ 1+a.
Proof:
Xk = # of keys (≠k) that hash to the same slot as k. Ckl = I{h(k)=h(l)}; E[Ckl] = Pr{h(k)=h(l)} ≤ 1/m.
#&1 Xk = ∑ Ckl, andE[CXk]= E% ∑ Ckl(= ∑ E[Ckl]≤ ∑ m
l∈T \{k} $l∈T \{k} ‘ l∈T \{k} l∈T∧l≠k If k∉T,E[Xk]≤n/m=α.
Ifk∈T, E[Xk]+1≤(n−1)/m+1=1+α−1/m<1+α. Proof (universal hashing function) Let X = x0, x1,..., xr and Y = y0, y1,..., yr be 2 distinct keys. They differ at (at least) one position. WLOG let 0 be this position. For how many h do X and Y collide? rr ∑aixi ≡∑aiyi(modm) i=0 i=0 r ∑ai(xi −yi)≡0(modm) i=0 r a0(x0 −y0)≡−∑ai(xi −yi)(modm) For any choice of < a1, a2, ... , ar> there is only one choice of a0 s.t. X and Y collide.
#{h that collide} = m × m × … × m × 1
= mr = |H|/m
i=1 ⎛∑r ⎞
a0 ≡⎜⎝− ai(xi −yi)⎟⎠⋅(x0 −y0)−1 (modm) i=1