
CSE 440: Advanced Algorithms
Lecture 7: Hash Tables (CLRS Ch. 11, Sections 11.1-11.4)

Direct Address Tables
• Works well when the universe of keys is small
• Compare with:
• Arrays?
• Linked Lists?

Hash Tables
• Observation:
• If |U| is too large, then a direct-address table is
prohibitively expensive (memory)
• If |K| << |U|, then too much space is wasted
• Dual goal:
• Space requirement: Θ(|K|) space used
• Expected time for dictionary operations: O(1)
• Idea
• h: U → {0,...,m-1} // m = Θ(|K|)

Hash Tables
• Differences in design with direct-address tables
• Hash functions
• Key k is stored in slot h(k) -- we call h(k) the hash value of k
• Has to be deterministic!!
• Possible collision: an efficient hash function should reduce its probability
• Collision resolution
• Closed addressing
• Open addressing

Hash Function
• Goal:
• Simple uniform hashing: each key is equally likely to hash to any of the
m slots, independently of where any other key has hashed to.
• Challenges
• Requires understanding the distribution from which keys are drawn
• Keys might not be drawn independently
• Division method
• Interpret keys as natural numbers
• h(k) = k mod m
• Simple solution, but most effective if m is a prime not too close to an
exact power of 2
• More complex solutions (used for security more than balancing)
• Universal hashing
• SHA-256

Closed Addressing (Chaining)
• Insert: O(1) if no check for duplicates
• Delete: O(1) if doubly linked list and the input is the element, not the key
• Main issue is in search

Analysis of Chaining
• Let n = |K| and m = size of the array
• Load factor: α(T) = α = n/m
• Note: because of chaining, α(T) may be greater than 1
• In fact, α(T) is a good estimate of the number of elements stored in a chain
• Analysis of search
• Worst case?
• Best case?
• Average case?

Analysis of Chaining
• Assume uniform hashing: any given element is equally likely to hash into any of the m slots
• nj is the length of the list T[j]
• E[nj] = α
• If the key does not exist
• Average-case time = Θ(1+α)
• What if the key exists?
• We will show in the next slides that it is also Θ(1+α)
• If n = O(m)
• Θ(1+α) = O(1)

Analysis of Chaining
• Number of elements examined
• = number of elements that appear before x in x's list
• which is the number of elements inserted after x
• Definitions
• xi = "i-th element added to the table"
• ki = xi.key
• Xij = I{h(ki) = h(kj)}
• Thus
• E[Xij] = Pr(h(ki) = h(kj)) = 1/m
• Expected number of elements examined:
E[(1/n) Σi=1..n (1 + Σj=i+1..n Xij)] = 1 + (1/n) Σi=1..n Σj=i+1..n (1/m)
= 1 + (1/(nm)) (n(n-1)/2) = 1 + α/2 − α/(2n)

Analysis of Chaining
Which is Θ(1+α)

Open Addressing
• Probing within the array itself
• No pointers at all
• α is never greater than 1
• Hash function includes the probe number
• h: U × {0,1,...,m-1} → {0,1,...,m-1}
• Probe sequence: <h(k,0), h(k,1), ..., h(k,m-1)>
• Has to be a permutation of <0, 1, ..., m-1>
• Better if uniform hashing: equally likely to be any of the m! permutations

What About Deletion?
• Cannot simply replace the entry with Nil
• Probing would no longer work: search stops at the first Nil slot
• Delete(T, k)
    i = Hash-Search(T, k)
    T[i] = "Delete"
• We modify Hash-Insert accordingly: a slot is free if
    T[j] == nil or T[j] == "Delete"
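The tombstone idea can be sketched as follows. This is a minimal illustration assuming linear probing and a trivial h'(k) = k mod m (the class and marker names are assumptions):

```python
# Open addressing with tombstone ("Delete") markers.
# A deleted slot is reusable by insert, but does NOT stop a search probe.
NIL, DELETED = None, "DELETED"

class OpenAddressTable:
    def __init__(self, m=11):
        self.m = m
        self.slots = [NIL] * m

    def _probe(self, key, i):
        # linear probing: h(k, i) = (h'(k) + i) mod m, with h'(k) = k mod m
        return (key + i) % self.m

    def insert(self, key):
        for i in range(self.m):
            j = self._probe(key, i)
            if self.slots[j] is NIL or self.slots[j] == DELETED:  # reuse tombstones
                self.slots[j] = key
                return j
        raise RuntimeError("hash table overflow")

    def search(self, key):
        for i in range(self.m):
            j = self._probe(key, i)
            if self.slots[j] is NIL:   # a true NIL ends the probe sequence;
                return None            # DELETED does not, so we keep probing
            if self.slots[j] == key:
                return j
        return None

    def delete(self, key):
        j = self.search(key)
        if j is not None:
            self.slots[j] = DELETED    # cannot set NIL: later keys in the
                                       # same probe sequence would be lost
```

For example, with m = 11 the keys 3 and 14 collide, so 14 lands one slot later; deleting 3 must leave a tombstone or 14 becomes unreachable.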

Probe Sequencing
• Linear probing
• h(k,i) = (h'(k) + i) mod m
• Main drawback: primary clustering
• Long runs of occupied cells, which increase average search time
• Quadratic probing
• h(k,i) = (h'(k) + c1*i + c2*i*i) mod m
• Main drawback: secondary clustering
• Initial probe determines the entire sequence.
• Double hashing
• h(k,i) = (h1(k) + i*h2(k)) mod m
• Better performance: more possible probe sequences, less clustering
• Still not as good as ideal uniform hashing
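The three probe strategies can be written as small functions. A minimal sketch: the table size M, the constants c1 and c2, and the choice h2(k) = 1 + (k mod (M-1)) are illustrative assumptions, not fixed by the slides:

```python
M = 13  # table size; prime, so double hashing visits every slot

def linear(k, i):
    # primary clustering: occupied runs grow and merge
    return (k % M + i) % M

def quadratic(k, i, c1=1, c2=3):
    # secondary clustering: the initial probe fixes the whole sequence
    return (k % M + c1 * i + c2 * i * i) % M

def double_hash(k, i):
    # h2(k) must be nonzero and relatively prime to M (guaranteed if M prime)
    h2 = 1 + (k % (M - 1))
    return (k % M + i * h2) % M

# Keys 5 and 18 share the initial slot (5 mod 13), so their quadratic
# sequences are identical, while double hashing usually separates them.
print(sorted(double_hash(5, i) for i in range(M)) == list(range(M)))  # → True
```

With a prime M, the double-hashing sequence (h1(k) + i·h2(k)) mod M is a full permutation of the slots, which is why the check above prints True.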

Analysis of Open Addressing
• Assume uniform hashing
• Not true for our probing techniques, but simplifies
analysis
• Unsuccessful search: E[number of probes] ≤ 1/(1-α)
• Informal justification (formal proof in textbook -- skipped)
• 1/(1-α) = 1 + α + α² + α³ + ...
The first probe always happens
The second probe happens with probability α
The third probe happens with probability α²
• Successful search is even better
• E[X] ≤ (1/α) ln(1/(1-α)) (proof skipped)
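Plugging a few load factors into these two bounds shows how quickly the cost blows up as α approaches 1 (a small numeric illustration):

```python
import math

# Expected-probe bounds for open addressing under uniform hashing:
#   unsuccessful search: 1 / (1 - alpha)
#   successful search:   (1 / alpha) * ln(1 / (1 - alpha))
for alpha in (0.5, 0.9, 0.99):
    unsuccessful = 1 / (1 - alpha)
    successful = (1 / alpha) * math.log(1 / (1 - alpha))
    print(f"alpha={alpha}: unsuccessful <= {unsuccessful:.2f}, "
          f"successful <= {successful:.2f}")
```

At α = 0.5 an unsuccessful search costs at most 2 probes; at α = 0.99 it costs up to 100, while a successful search still stays under 5.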

Back to Dynamic Tables
Discussion

Discussion
• What about hash tables with closed addressing (e.g., chaining)?
• Same idea, with a slight change in the definition of ⍺
• Expand/shrink the table based on a threshold on bucket sizes
• Examples:
• Maintain per-bucket/per-table metadata with each insert/delete
• Sample the sizes of randomly selected buckets
• Possible improvement
• Resize each bucket independently
• Example:
Kevin Williams, Joe Foster, Athicha Srivirote, Ahmed Hassan, Joseph Tassarotti, Lewis Tseng, Roberto Palmieri “On Building Modular and Elastic Data Structures with Bulk Operations”, ICDCN’21
• What about hash tables with open addressing?
• No difference for expansion
• Contraction depends on the way we handle deletes

Discussion
• What about concurrent dynamic tables
• Although the expansion cost is amortized, resizing blocks any concurrent
operation until it is finished
• Solutions?
• Amortize the actual resizing not its cost
• Allocate new table eagerly, move data to new table lazily
• Key idea: resizing is a structural change not a semantic change.
• Depends on implementation details
• Example: Yujie Liu, Kunlong Zhang, and Michael Spear “Dynamic-Sized Nonblocking Hash Tables”, PODC’14
• Freeze objects in the old hash table
• A new operation (insert/delete/lookup) on a frozen object x first copies x to
the new hash table
• Another resize first completes the previous resize (hopefully by then most
elements will already have been moved)
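The "allocate eagerly, migrate lazily" idea can be sketched in a simplified single-threaded form (the class, the BATCH constant, and the growth threshold are illustrative assumptions; the real concurrent designs cited above are far more involved):

```python
# Lazy-resize chained hash table: when the table grows, the new array is
# allocated immediately, but buckets are drained from the old array a few
# at a time, so no single operation pays the full O(n) migration cost.
class LazyResizeTable:
    BATCH = 2                                  # buckets migrated per insert

    def __init__(self, m=4):
        self.table = [[] for _ in range(m)]    # current (possibly new) table
        self.old = None                        # table being drained, if any
        self.next_bucket = 0                   # migration cursor into self.old
        self.n = 0

    def _migrate_some(self):
        if self.old is None:
            return
        for _ in range(self.BATCH):
            if self.next_bucket == len(self.old):
                self.old = None                # migration finished
                return
            for k, v in self.old[self.next_bucket]:
                self.table[hash(k) % len(self.table)].append((k, v))
            self.old[self.next_bucket] = []    # bucket fully moved
            self.next_bucket += 1

    def insert(self, key, value):
        self._migrate_some()
        self.table[hash(key) % len(self.table)].append((key, value))
        self.n += 1
        # grow when alpha > 1, but only after any previous migration is done
        if self.old is None and self.n > len(self.table):
            self.old, self.next_bucket = self.table, 0
            self.table = [[] for _ in range(2 * len(self.old))]

    def search(self, key):
        for k, v in self.table[hash(key) % len(self.table)]:
            if k == key:
                return v
        if self.old is not None:               # key may still live in the
            for k, v in self.old[hash(key) % len(self.old)]:
                if k == key:                   # not-yet-migrated part
                    return v
        return None
```

This mirrors the structural-vs-semantic point above: lookups remain correct throughout because every key is in exactly one of the two arrays, so resizing never changes the table's visible contents.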

Thanks!