Topic 6(2): Balancing BST, AVL, Hash Table Balanced Binary Search Trees: AVL Trees (See the note below)
Hash Tables (CLRS ch 11.1-3)
Note: AVL trees is one of the approaches to balanced BSTs. CLRS leaves it as a problem on p333. AVL trees use the same tree rotation method given in CLRS ch 13.2. Good explanations of AVL trees can be found online too, e.g., https://www.geeksforgeeks.org/avl-tree-set-1-insertion/
Copyright By PowCoder代写 加微信 powcoder
Balancing Binary Search Trees
Previous discussion shows that runtime of Insert()/Delete() is O(h). However, in the worst case, the height of the tree is linear Θ(n).
But a balanced or near balanced BST has height O(log n). We need balancing BST
AVL-trees (we focus on this method in this course)
Red-Black trees
both rely on tree rotations with cases how a tree rotation can achieve the
desired effect in each
These maintain the tree “shallow” — i.e., height = O(log(#nodes))
Topic 6(2): Balancing BST, AVL, Hash Table
Balanced Binary Search Tree: Rotations
Two types: Left-rotation and Right-rotation
Right-Rotate: the old-root becomes the right child of the new root. Left-Rotate: the old-root becomes the left child of the new root.
Topic 6(2): Balancing BST, AVL, Hash Table
Topic 6(2): Balancing BST, AVL, Hash Table
Right rotate pseudo-code is symmetric.
(Recall) Definition: Given a rooted tree T, its height is the max path-length from the root to a leaf.
Notation: Given a rooted tree T, we denote hL and hR as the heights of the left subtree and the right-subtree (respectively) of T’s root. If a subtree is nil we say its height is −1.
So T.height = max{T.hL, T.hR} + 1.
Ideally: at every node we have hL = hR
Too restrictive. The only trees that satisfy this are complete trees (all layers are full). Such trees can only hold 2k − 1 nodes.
Definition: An AVL tree is a BST where for any node we have |hL − hR| ≤ 1.
May not be fully balanced, but almost
Topic 6(2): Balancing BST, AVL, Hash Table
Claim: If T is an AVL tree of height h then the number of nodes in T is at least F(h) (= Fibonacci number
Proof: Let N(h) be the minimal number of nodes in a AVL-tree of height h.
Fix h. Fix a tree of size h. It has left- and right- subtrees of heights hL and
hR, and WLOG (without loss of generality) hL ≥ hR.
Hence, hL = h − 1.
Because of AVL-property, hR ≥ hL −1 = h−2.
N(h) = 1+N(hL)+N(hR) ≥ 1+N(h−1)+N(h−2), with N(0) = 1
and N(1) = 2.
This solves to N(h) ≥ F(h) = Θ((1.618…)h).
Corollary: h ≤ O(log(n))
We know that n ≥ F(h) ≥ 1.5h (in fact, F(h) ≈ 1.618h), so
h ≤ log1.5(n) = O(log(n))
So we want to keep our tree with the AVL property.
Note: any tree with 1 or 2 nodes is an AVL-tree (or of height 0 or 1). But there are trees with 3 nodes that aren’t AVL-trees…
Topic 6(2): Balancing BST, AVL, Hash Table
How to Maintain the AVL Property
We can use the two tree rotation operations to balance any subtree that has lost the “height-balance” property after an insertion (or deletion). There are four cases that need to be handled and are depicted on the next two slides
We can write a procedure that will balance a node as required (using Rotation); it can be then shown that the tree is kept an AVL after each Insert and Delete operation and the cost of each of these operations remains Θ(h).
Since in an AVL tree the height is always O(log n) then all the update operations for AVL tree will take at most O(log n).
Topic 6(2): Balancing BST, AVL, Hash Table
How to Maintain the AVL Property
Notice how a tree’s height is reduced. (In the second picture below, the subscriptsof tree lables on the right are incorrectly reversed; just change them back to the same order T0, T1, T2, T3.)
Topic 6(2): Balancing BST, AVL, Hash Table
How to Maintain the AVL Property
Topic 6(2): Balancing BST, AVL, Hash Table
Hash-Table:
A data structure
Supports insert(x) (insert an element pointed to by x), find(key) and remove(x) in time O(1) under reasonable assumptions.
Doesn’t keep the keys sorted. (Good or bad?)
When is it useful?
When there’s a large universe U of potential keys, but we use only n keys.
E.g., think of the warehouse of “Ramazon,” a new online shopping website. You type in a product’s serial number and it checks whether the product exists in the Ramazon’s warehouse.
E.g., student IDs for UofA — I just want to access the record of student #4413928
E.g., think of your C compiler: there are many variables, the universe of potential names for variables and functions is huge — but your code only has a moderate amount of variables and function names.
Topic 6(2): Balancing BST, AVL, Hash Table
Hash-Table Illustration
The main idea: use a easy-to-compute Hash Function that maps
h : U → {1,2…,m}. (Ideally m = n, but we may have m = O(n).)
Topic 6(2): Balancing BST, AVL, Hash Table
Hash-Table:
Since multiple keys may be mapped to the same value
h(key1) = h(key2) = v, T[v] stores a list / an array of elements.
So insert(x), find(key) and remove(x) all work by 1. Compute v = h(key) (or v = h(x.key))
2. Goto T[v] (in O(1))
3. Traverse the list in T[v] to add/search/remove
Runtime: O(computing h(k)) + O(length of list in T [v])
We use a simple h, so assume computing h(k) takes O(1)
The dominating factor is the length of the longest list in T
Since we assume |U| ≫ m we can’t have an injective h.
In fact, worst-case — all n elements have the same hash-value and the table is reduced to a linked-list, ….
Topic 6(2): Balancing BST, AVL, Hash Table
Collision Resolution by Chaining:
Place all elements hashed to the same slot into the same linked list insert(T,x): insert x at the head of list T[h(x.key)];
–(worst-case) runtime O(1) if assuming x is not in T.
search(T,k): search for an element with key k in list T[h(k)];
–runtime proportional to the length of the list
delete(T,x): delete x (a pointer to an object) from the list
T [h(x.key)];
– runtime O(1) if using doubly linked lists
Topic 6(2): Balancing BST, AVL, Hash Table
Analysis of Hashing by Chaining
Load factor: α = n/m, with n elements and m slots – average
#elements per slot.
Simple uniform hashing: Assume any element is equally likely to hash into any of the m slots, independently of where any other element has hashed to.
Assume O(1) time to compute h(k).
Theorem:(CLRS p.259) In a hash table in which collisions are resolved by chaining, under the assumption of simple uniform hashing,
an unsuccessful search takes average-case time Θ(1 + α) a successful search takes average-case time Θ(1 + α).
Its proof (p.260 of CLRS) will be discussed in class.
Implication: Suppose n = O(m). Then,
α = n/m = O(m)/m = O(1) Searching takes constant time on average!
Topic 6(2): Balancing BST, AVL, Hash Table
Hash Functions:
Simple uniform hashing: we typically have no way to check this condition, as we in general rarely know the probability distribution from which the keys are drawn
h that works reasonably well in practice:
The division method: map h(k) = (k mod m)
If distribution of keys is uniform over {1, 2, …, |U |}, then
distribution of h(k)s is ∼uniform on {0, 1, 2, …, m − 1}. Fixed hash functions are subject to adversarial attacks.
Topic 6(2): Balancing BST, AVL, Hash Table
Topic 6(2): Balancing BST, AVL, Hash Table
Universal Hashing (Optional, p.265-268 of CLRS):
At each execution, select the hash function randomly from a carefully designed family of hash functions.
Different behavior even for the same input; the probability of worst-case scenario is small.
Def: A collection of hash functions H = {h : U → [m]} is said to be universal if ∀k,l ∈ U, k ̸= l: Prh∈H[h(k) = h(l)] ≤ 1/m.
I.e., the probability of k and l to be hashed into the same value is no larger than 1/m.
Under the chaining method to resolve collisions, we have
Theorem: Let H be a universal collection of hash functions. Suppose h is chosen randomly from H and has been used to hash n keys into a table T of size m. If key k is not in T, then the expected length E[nh(k)] of the list that k hashed to is ≤ the load factor
α = n/m. If key k is in T, then the expected length E[nh(k)]of the list containing k is ≤ 1 + α.
A Hash Function with Provable Guarantees:
Pick some prime p > |U|
Pick a uniformly at random from {1, 2, …, p − 1} and b uniformly at
random from {0, 1, 2, .., p − 1}
Definehab(k)=(a·k+b) modp modm
Let Hpm be the collection of such functions hab.
Theorem: The family of hash functions Hpm is universal.
Proof (p. 267-268 of CLRS)
Topic 6(2): Balancing BST, AVL, Hash Table
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com