程序代写 Topic 6(2): Balancing BST, AVL, Hash Table 􏰀 Balanced Binary Search Trees:

Topic 6(2): Balancing BST, AVL, Hash Table 􏰀 Balanced Binary Search Trees: AVL Trees (See the note below)
􏰀 Hash Tables (CLRS ch 11.1-3)
􏰀 Note: AVL trees is one of the approaches to balanced BSTs. CLRS leaves it as a problem on p333. AVL trees use the same tree rotation method given in CLRS ch 13.2. Good explanations of AVL trees can be found online too, e.g., https://www.geeksforgeeks.org/avl-tree-set-1-insertion/

Balancing Binary Search Trees
􏰀 Previous discussion shows that runtime of Insert()/Delete() is O(h). 􏰀 However, in the worst case, the height of the tree is linear Θ(n).
􏰀 But a balanced or near balanced BST has height O(log n). 􏰀 We need balancing BST
􏰀 AVL-trees (we focus on this method in this course)
􏰀 Red-Black trees
􏰀 both rely on tree rotations with cases how a tree rotation can achieve the
desired effect in each
􏰀 These maintain the tree “shallow” — i.e., height = O(log(#nodes))
Topic 6(2): Balancing BST, AVL, Hash Table

Balanced Binary Search Tree: Rotations
􏰀 Two types: Left-rotation and Right-rotation
􏰀 Right-Rotate: the old-root becomes the right child of the new root. 􏰀 Left-Rotate: the old-root becomes the left child of the new root.
Topic 6(2): Balancing BST, AVL, Hash Table

Topic 6(2): Balancing BST, AVL, Hash Table
Right rotate pseudo-code is symmetric.

􏰀 (Recall) Definition: Given a rooted tree T, its height is the max path-length from the root to a leaf.
􏰀 Notation: Given a rooted tree T, we denote hL and hR as the heights of the left subtree and the right-subtree (respectively) of T’s root. If a subtree is nil we say its height is −1.
􏰀 So T.height = max{T.hL, T.hR} + 1.
􏰀 Ideally: at every node we have hL = hR
􏰀 Too restrictive. The only trees that satisfy this are complete trees (all layers are full). Such trees can only hold 2k − 1 nodes.
􏰀 Definition: An AVL tree is a BST where for any node we have |hL − hR| ≤ 1.
􏰀 May not be fully balanced, but almost
Topic 6(2): Balancing BST, AVL, Hash Table

􏰀 Claim: If T is an AVL tree of height h then the number of nodes in T is at least F(h) (= Fibonacci number
􏰀 Proof: Let N(h) be the minimal number of nodes in a AVL-tree of height h.
􏰀 Fix h. Fix a tree of size h. It has left- and right- subtrees of heights hL and
hR, and WLOG (without loss of generality) hL ≥ hR.
􏰀 Hence, hL = h − 1.
􏰀 Because of AVL-property, hR ≥ hL −1 = h−2.
􏰀 N(h) = 1+N(hL)+N(hR) ≥ 1+N(h−1)+N(h−2), with N(0) = 1
and N(1) = 2.
􏰀 This solves to N(h) ≥ F(h) = Θ((1.618…)h).
􏰀 Corollary: h ≤ O(log(n))
􏰀 We know that n ≥ F(h) ≥ 1.5h (in fact, F(h) ≈ 1.618h), so
h ≤ log1.5(n) = O(log(n))
􏰀 So we want to keep our tree with the AVL property.
􏰀 Note: any tree with 1 or 2 nodes is an AVL-tree (or of height 0 or 1). But there are trees with 3 nodes that aren’t AVL-trees…
Topic 6(2): Balancing BST, AVL, Hash Table

How to Maintain the AVL Property
􏰀 We can use the two tree rotation operations to balance any subtree that has lost the “height-balance” property after an insertion (or deletion). There are four cases that need to be handled and are depicted on the next two slides
􏰀 We can write a procedure that will balance a node as required (using Rotation); it can be then shown that the tree is kept an AVL after each Insert and Delete operation and the cost of each of these operations remains Θ(h).
􏰀 Since in an AVL tree the height is always O(log n) then all the update operations for AVL tree will take at most O(log n).
Topic 6(2): Balancing BST, AVL, Hash Table

How to Maintain the AVL Property
􏰀 Notice how a tree’s height is reduced. (In the second picture below, the subscriptsof tree lables on the right are incorrectly reversed; just change them back to the same order T0, T1, T2, T3.)
Topic 6(2): Balancing BST, AVL, Hash Table

How to Maintain the AVL Property
Topic 6(2): Balancing BST, AVL, Hash Table

Hash-Table:
􏰀 A data structure
􏰀 Supports insert(x) (insert an element pointed to by x), find(key) and remove(x) in time O(1) under reasonable assumptions.
􏰀 Doesn’t keep the keys sorted. (Good or bad?)
􏰀 When is it useful?
􏰀 When there’s a large universe U of potential keys, but we use only n keys.
􏰀 E.g., think of the warehouse of “Ramazon,” a new online shopping website. You type in a product’s serial number and it checks whether the product exists in the Ramazon’s warehouse.
􏰀 E.g., student IDs for UofA — I just want to access the record of student #4413928
􏰀 E.g., think of your C compiler: there are many variables, the universe of potential names for variables and functions is huge — but your code only has a moderate amount of variables and function names.
Topic 6(2): Balancing BST, AVL, Hash Table

Hash-Table Illustration
The main idea: use a easy-to-compute Hash Function that maps
h : U → {1,2…,m}. (Ideally m = n, but we may have m = O(n).)
Topic 6(2): Balancing BST, AVL, Hash Table

Hash-Table:
􏰀 Since multiple keys may be mapped to the same value
h(key1) = h(key2) = v, T[v] stores a list / an array of elements.
􏰀 So insert(x), find(key) and remove(x) all work by 1. Compute v = h(key) (or v = h(x.key))
2. Goto T[v] (in O(1))
3. Traverse the list in T[v] to add/search/remove
􏰀 Runtime: O(computing h(k)) + O(length of list in T [v])
􏰀 We use a simple h, so assume computing h(k) takes O(1)
The dominating factor is the length of the longest list in T
􏰀 Since we assume |U| ≫ m we can’t have an injective h.
􏰀 In fact, worst-case — all n elements have the same hash-value and the table is reduced to a linked-list, ….
Topic 6(2): Balancing BST, AVL, Hash Table

Collision Resolution by Chaining:
􏰀 Place all elements hashed to the same slot into the same linked list 􏰀 insert(T,x): insert x at the head of list T[h(x.key)];
–(worst-case) runtime O(1) if assuming x is not in T.
􏰀 search(T,k): search for an element with key k in list T[h(k)];
–runtime proportional to the length of the list
􏰀 delete(T,x): delete x (a pointer to an object) from the list
T [h(x.key)];
– runtime O(1) if using doubly linked lists
Topic 6(2): Balancing BST, AVL, Hash Table

Analysis of Hashing by Chaining
􏰀 Load factor: α = n/m, with n elements and m slots – average
#elements per slot.
􏰀 Simple uniform hashing: Assume any element is equally likely to hash into any of the m slots, independently of where any other element has hashed to.
􏰀 Assume O(1) time to compute h(k).
􏰀 Theorem:(CLRS p.259) In a hash table in which collisions are resolved by chaining, under the assumption of simple uniform hashing,
􏰀 an unsuccessful search takes average-case time Θ(1 + α) 􏰀 a successful search takes average-case time Θ(1 + α).
􏰀 Its proof (p.260 of CLRS) will be discussed in class.
􏰀 Implication: Suppose n = O(m). Then,
α = n/m = O(m)/m = O(1) Searching takes constant time on average!
Topic 6(2): Balancing BST, AVL, Hash Table

Hash Functions:
􏰀 Simple uniform hashing: we typically have no way to check this condition, as we in general rarely know the probability distribution from which the keys are drawn
􏰀 h that works reasonably well in practice:
􏰀 The division method: map h(k) = (k mod m)
􏰀 If distribution of keys is uniform over {1, 2, …, |U |}, then
distribution of h(k)s is ∼uniform on {0, 1, 2, …, m − 1}. 􏰀 Fixed hash functions are subject to adversarial attacks.
Topic 6(2): Balancing BST, AVL, Hash Table

Topic 6(2): Balancing BST, AVL, Hash Table
Universal Hashing (Optional, p.265-268 of CLRS):
􏰀 At each execution, select the hash function randomly from a carefully designed family of hash functions.
􏰀 Different behavior even for the same input; the probability of worst-case scenario is small.
􏰀 Def: A collection of hash functions H = {h : U → [m]} is said to be universal if ∀k,l ∈ U, k ̸= l: Prh∈H[h(k) = h(l)] ≤ 1/m.
I.e., the probability of k and l to be hashed into the same value is no larger than 1/m.
􏰀 Under the chaining method to resolve collisions, we have
Theorem: Let H be a universal collection of hash functions. Suppose h is chosen randomly from H and has been used to hash n keys into a table T of size m. If key k is not in T, then the expected length E[nh(k)] of the list that k hashed to is ≤ the load factor
α = n/m. If key k is in T, then the expected length E[nh(k)]of the list containing k is ≤ 1 + α.

A Hash Function with Provable Guarantees:
􏰀 Pick some prime p > |U|
􏰀 Pick a uniformly at random from {1, 2, …, p − 1} and b uniformly at
random from {0, 1, 2, .., p − 1}
􏰀 Definehab(k)=􏰂(a·k+b) modp􏰃 modm
􏰀 Let Hpm be the collection of such functions hab.
􏰀 Theorem: The family of hash functions Hpm is universal.
􏰀 Proof (p. 267-268 of CLRS)
Topic 6(2): Balancing BST, AVL, Hash Table

程序代写 CS代考加微信: powcoder QQ: 1823890830 Email: powcoder@163.com

Related Posts