Topic 6(1): Quicksort, Sorting Lower Bound, BST
Quicksort and Analysis (CLRS ch 7.1-2, 7.4)
Sorting Lower Bounds (CLRS ch 8.1)
Binary Search Trees (BST): a review (CLRS ch 12.1-3)
QuickSort: Another sorting meets divide-and-conquer
The ideas:
Pick one key (pivot), compare it to all others.
Rearrange A to be: [elements ≤ pivot, pivot, elements > pivot]
Recursively sort subarrays before and after the pivot. Pseudocode:
procedure QuickSort(A, p, r)
  ∗∗ sorts the subarray A[p..r]
  if (p < r) then
    q ← Partition(A, p, r)
    ∗∗ Partition returns q such that
    ∗∗ (i) A[q] = pivot,
    ∗∗ (ii) all elements ≤ pivot appear in A[p..(q − 1)],
    ∗∗ (iii) all elements > pivot appear in A[(q + 1)..r]
    QuickSort(A, p, q − 1)
    QuickSort(A, q + 1, r)
How will you prove QuickSort’s correctness?
By induction on n = #elements in A = r − p + 1
Your base cases are n = 0 and n = 1. (Why?)
The induction step is easy if we know Partition() is correct
Partition(A, p, r):
procedure Partition(A, p, r)
  ∗∗ the last element, A[r], is picked as the pivot key of this partition
  pivot ← A[r]
  i ← p − 1   ∗∗ i is the location of the last element known to be ≤ pivot
  for (j from p to r − 1) do
    if (A[j] ≤ pivot) then
      i ← i + 1
      exchange A[i] ↔ A[j]
  exchange A[i + 1] ↔ A[r]
  return i + 1
Example: A[1..8] = {3,1,7,6,4,8,2,5}, p = 1, r = 8, pivot = A[8] = 5
Array after each step (pivot = 5):
initial:       3 1 7 6 4 8 2 5
j=1 (3 ≤ 5):   3 1 7 6 4 8 2 5   (i = 1)
j=2 (1 ≤ 5):   3 1 7 6 4 8 2 5   (i = 2)
j=3 (7 > 5):   3 1 7 6 4 8 2 5
j=4 (6 > 5):   3 1 7 6 4 8 2 5
j=5 (4 ≤ 5):   3 1 4 6 7 8 2 5   (i = 3)
j=6 (8 > 5):   3 1 4 6 7 8 2 5
j=7 (2 ≤ 5):   3 1 4 2 7 8 6 5   (i = 4)
final swap:    3 1 4 2 5 8 6 7   (A[5] ↔ A[8]; return q = 5)
i = 4, j = 8 (for-loop ended)
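The pseudocode above translates directly into Python; the sketch below uses 0-based indices, so the example array occupies a[0..7] and the returned pivot position 4 corresponds to q = 5 in the 1-based slides.

```python
def partition(a, p, r):
    # Lomuto partition: the last element a[r] is the pivot.
    pivot = a[r]
    i = p - 1  # index of the last element known to be <= pivot
    for j in range(p, r):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]  # put the pivot between the two parts
    return i + 1

def quicksort(a, p, r):
    # sorts the subarray a[p..r] in place
    if p < r:
        q = partition(a, p, r)
        quicksort(a, p, q - 1)
        quicksort(a, q + 1, r)
```

On the slides' example a = [3, 1, 7, 6, 4, 8, 2, 5], partition(a, 0, 7) returns 4 and leaves [3, 1, 4, 2, 5, 8, 6, 7], matching the trace above.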
Partition(A, p, r):
The pivot happens to be A[r].
Works by traversing the keys of the array from p to r − 1. j indicates the current element we are considering.
i is the index of the last known element which is ≤ pivot.
The invariant:
A[p..i] contains keys ≤ pivot
A[(i + 1)..(j − 1)] contains keys > pivot
A[j..(r − 1)] contains keys yet to be compared to pivot
A[r] is the pivot
A[j] is the current key
If A[j] > pivot — no need to change i or exchange keys; the invariant is maintained.
If A[j] ≤ pivot, exchange A[j] with the first larger-than-pivot element (A[i + 1]) and increment i, maintaining the fact that A[p..j] consists of two consecutive subarrays: elements ≤ pivot followed by elements > pivot.
At the end, exchange A[r] ↔ A[i + 1] such that:
A[p..i] contains keys ≤ pivot
A[i + 1] contains pivot
A[(i + 2)..r] contains keys > pivot
QuickSort Analysis
Why we study QuickSort and its analysis:
very efficient, widely used
divide-and-conquer
huge literature
a model for analysis of algorithms
a terrific example of the usefulness of randomization
Observations:
(Again) key comparison is the dominant operation
Counting KC — only need to know (at each call) the rank
of the split key
QuickSort recursion tree:
root = pivot, left and right children = trees for the first and second recursive calls
An example:
[Figure: QuickSort recursion tree, with pivot keys at the internal nodes and the keys 02 05 07 12 at the leaves]
More observations:
In the resulting recursion tree, at each node
(all keys in left subtree) ≤ (key in this node) < (all keys in right subtree)
QuickSort recursion tree −→ binary search tree
QuickSort Running Time
Like before - dominated by #KC.
The pivot is compared with every other key: (n − 1) KC
Recurrence:
T(n) = 0,  if n ≤ 1
T(n) = T(n1) + T(n − 1 − n1) + (n − 1),  if n ≥ 2
where 0 ≤ n1 ≤ n − 1 is the size of the left subarray
This raises the question: how do we estimate what n1 is going to be?
There is no single answer.
QuickSort WC running time:
Recurrence:
T(n) = 0,  if n ≤ 1
T(n) = T(n1) + T(n − 1 − n1) + (n − 1),  if n ≥ 2
Notice that when both subarrays are non-empty, then #KC in next level is
(n1 −1)+(n−1−n1 −1) = (n−3)
But when one subarray is empty then #KC in next level is (n − 2).
WC recurrence:
T (n) = T (0) + T (n − 1) + (n − 1) = T (n − 1) + (n − 1),
Solving the recurrence — Master Theorem doesn’t apply
T(n) = T(n−1)+(n−1)=T(n−2)+(n−2)+(n−1) = ...
= T(1) + 1 + 2 + ... + (n − 1) = n(n − 1)/2
So, T(n) ∈ Θ(n²)
What is a worst-case instance for QuickSort?
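One candidate is an already-sorted array: with the last element as pivot, every partition leaves the right subarray empty. A small counting sketch (quicksort_kc is an illustrative helper, not from the slides) confirms the n(n − 1)/2 total:

```python
def quicksort_kc(a, p, r):
    # QuickSort that returns the number of key comparisons (KC) it makes
    if p >= r:
        return 0
    pivot = a[r]
    i = p - 1
    kc = 0
    for j in range(p, r):
        kc += 1  # one KC per loop iteration: a[j] vs pivot
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    q = i + 1
    return kc + quicksort_kc(a, p, q - 1) + quicksort_kc(a, q + 1, r)
```

On a sorted array of n = 50 keys this returns 50 · 49 / 2 = 1225, i.e. the Θ(n²) worst case.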
QuickSort Almost-WC running time:
Recurrence:
T(n) = 0,  if n ≤ 1
T(n) = T(n1) + T(n − 1 − n1) + (n − 1),  if n ≥ 2
Let’s try an almost-WC situation: at every step, we find a pivot for which n1 = n − 2.
Almost-WC recurrence:
T(n) = T(1) + T(n − 2) + (n − 1) = T(n − 2) + (n − 1),
Solving the recurrence —
T(n) = T(n − 2) + (n − 1) = T(n − 4) + (n − 3) + (n − 1) = ...
= T(1) + 1 + 3 + ... + (n − 3) + (n − 1)
Clearly T(n) ≤ (n/2) · (n − 1) ∈ O(n²), and also T(n) ≥ (n/4) · (n/2) ∈ Ω(n²). So, T(n) ∈ Θ(n²).
QuickSort BC running time:
Recurrence:
T(n) = 0,  if n ≤ 1
T(n) = T(n1) + T(n − 1 − n1) + (n − 1),  if n ≥ 2
When both subarrays are non-empty, we are saving 1 KC ...
Best case: each partition is a bipartition!
Saving as many KC as possible at every level ...
The recursion tree is as short as possible ...
Recurrence:
T(n) = 2 × T((n − 1)/2) + (n − 1)
Solving the recurrence — the Master Theorem applies to T(n) = 2T(n/2) + (n − 1), which
solves to Θ(n log n).
Question: What is the best-case array for the case of {1, 2, ..., 7}?
QuickSort Almost-BC running time:
Let’s assume that at each round we get an approximate bipartition. Namely, each split is (3/4)n and (1/4)n, resulting in the recurrence
T(n) = T(3n/4) + T(n/4) + cn
where, for simplicity, we just write cn to mean linear time in n
We may even consider a more extreme case where the split is (9/10)n and (1/10)n, resulting in the recurrence
T(n) = T(9n/10) + T(n/10) + cn
In both cases, the running time is O(n log n), and in fact Θ(n log n) considering time Ω(n log n) for BC.
In fact, for any split of constant proportionality, the running time remains Θ(n log n) (see CLRS pp. 175-176).
QuickSort Average Case running time:
The recurrence for running time is:
T(n) = 0,  if n = 0, 1
T(n) = T(n1) + T(n − 1 − n1) + (n − 1),  if n ≥ 2
Average case: always ask “average over what input distribution?”
Here, we assume each possible input equiprobable, i.e. uniform distribution.
Key observation: equiprobable inputs imply that, for each key, every rank among the keys seen so far is equiprobable.
So, n1 can be 0, 1, 2, ..., n − 2, n − 1, each with probability 1/n
Solving T(n):
As Pr[n1 = i] = 1/n for every i, we get
T(0) = 0, T(1) = 0,
T(2) = (2 − 1) + (1/2)(T(0) + T(2 − 1 − 0)) + (1/2)(T(1) + T(2 − 1 − 1)),
T(n) = (n − 1) + (1/n)(T(0) + T(n − 1))
     + (1/n)(T(1) + T(n − 2))
     + ...
     + (1/n)(T(n − 2) + T(1))
     + (1/n)(T(n − 1) + T(0))
     = (n − 1) + (2/n) Σ_{i=0}^{n−1} T(i).
The Master Theorem does NOT apply here, but it can be shown that
T(n) ≤ 2(n + 1)[H(n + 1) − 1], where the Harmonic number
H(n) = Σ_{i=1}^{n} 1/i ≈ ln n + γ, with γ ≈ 0.577.
(The details are omitted here, but see, e.g., https://en.wikipedia.org/w/index.php?title=Quicksort&action=edit&section=13)
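The bound can be sanity-checked numerically by unrolling the recurrence exactly (a sketch; avg_case_kc and harmonic are illustrative names, not from the slides):

```python
from math import fsum

def avg_case_kc(nmax):
    # exact values of T(n) = (n - 1) + (2/n) * sum_{i=0}^{n-1} T(i),
    # with T(0) = T(1) = 0
    T = [0.0, 0.0]
    for n in range(2, nmax + 1):
        T.append((n - 1) + (2.0 / n) * fsum(T))  # fsum(T) = T(0) + ... + T(n-1)
    return T

def harmonic(n):
    # H(n) = 1 + 1/2 + ... + 1/n
    return fsum(1.0 / i for i in range(1, n + 1))
```

For every n up to 300 the computed T(n) stays below 2(n + 1)[H(n + 1) − 1], consistent with the stated bound.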
Sorting Algorithms: Running Time Comparison
Algorithm         | WC           | AC           | BC
InsertionSort     | Θ(n²)        | Θ(n²)        | Θ(n)
MergeSort         | Θ(n log n)   | Θ(n log n)   | Θ(n log n)
HeapSort          | Θ(n log n)   | Θ(n log n)   | Θ(n log n)
QuickSort         | Θ(n²)        | Θ(n log n)   | Θ(n log n)
Random QuickSort  | Θ(n log n)∗  | Θ(n log n)∗  | Θ(n log n)
What is “∗”?
QuickSort Improvement and space requirement:
QuickSort is considered an in-place sorting algorithm:
extra space required at each recursive call is only constant.
whereas in MergeSort, at each recursive call up to Θ(n) extra
space is required.
To improve the algorithm we use a random pivot
Difference between a Randomized Algorithm and AC Analysis
AC analysis means we make an assumption on the input
No guarantee that the assumption holds
Input is chosen once: on average we might have a good running time, but once the input is given our running time is determined.
A randomized algorithm works for any input (WC analysis)
Randomness in the coins we toss (not in the input) so we control the distribution of the coin toss
We can always start the algorithm anew if it takes too long; or run it multiple times in parallel
Randomized QuickSort
We invoke RandomPartition rather than Partition
Pseudocode
procedure RandomPartition(A, p, r)
  i ← uniformly chosen random integer in {p, ..., r}
  exchange A[i] ↔ A[r]
  return Partition(A, p, r)
Q: How do we analyze the #KC of Random QuickSort? Depending on which pivots we pick, we may end up comparing any two elements. And of course, there is a chance we pick the worst pivot in every iteration ...
A: We analyze the expected WC #KC (As always, proportional to Expected WC running time)
Conclusion: the expected WC running time for random quicksort is Θ(n log n). (We ignore the details in this course.)
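In Python, the same idea reads as follows (a sketch with 0-based indices; partition is the Lomuto procedure from earlier in these notes):

```python
import random

def partition(a, p, r):
    # Lomuto partition with a[r] as the pivot
    pivot = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1

def random_partition(a, p, r):
    # move a uniformly chosen element into the pivot slot a[r], then partition
    k = random.randint(p, r)  # inclusive on both ends
    a[k], a[r] = a[r], a[k]
    return partition(a, p, r)

def random_quicksort(a, p, r):
    if p < r:
        q = random_partition(a, p, r)
        random_quicksort(a, p, q - 1)
        random_quicksort(a, q + 1, r)
```

Note that an already-sorted input is no longer a worst case: the quadratic behavior now depends only on the algorithm's coin tosses, not on the input.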
Sorting lower bound:
So far: we looked at BC runtime — for lower-bound purposes. It serves as a lower bound for a specific algorithm.
E.g., “Even in the best case, my algorithm makes ... KC.” So this is a lower bound of the form
∃ algorithm A s.t. ∀ input I, runtime(A(I)) ≥ ....
We now give a lower bound for the problem of sorting.
A lower bound on any algorithm for sorting - even those not invented yet. This is a lower bound of the form
∀ algorithm A ∃ input I, runtime(A(I)) ≥ ....
Q: Can we derive a lower bound of the form
∀ algorithm A and ∀ input I, runtime(A(I)) ≥ ....?
A: Not a very informative bound, since for every input I0 we can always “massage” any algorithm into an algorithm that first checks for I0.
If (input= I0) return solution(I0) else ... (run original algorithm)
Two useful trees in algorithm analysis:
1. Recursion tree
node ←→ recursive call
describes algorithm execution for one particular input by showing all
calls made
one algorithm execution ←→ all nodes (a tree)
useful in analysis: sum the numbers of operations over all nodes
[Figure: recursion-tree example, with the keys 02 05 07 12 at the leaves]
Two useful trees in algorithm analysis:
2. Decision tree
node ←→ algorithm decision
describes algorithm execution for all possible inputs by showing all
possible algorithm decisions
one algorithm execution ←→ one root-to-leaf path
useful in analysis: sum the numbers of operations over the nodes on one root-to-leaf path
Sorting lower bound:
Consider comparison-based sorting algorithms. These algorithms map to decision trees in which every internal node (a key comparison) has exactly 2 children.
Binary tree facts:
Suppose there are t leaves and k levels. Then, t ≤ 2^(k−1).
So, log t ≤ (k − 1)
Equivalently, k ≥ 1 + log t
— a binary tree with t leaves has at least (1 + log t) levels.
Comparison-based sorting algorithm facts:
Look at its Decision Tree. It’s a binary tree.
It should contain every possible output: every permutation of the
positions {1,2,...,n}.
So, it contains at least n! leaves ...
Equivalently, it has at least 1 + log(n!) levels.
The longest root-to-leaf path has length at least log(n!).
So in the worst case, the algorithm makes at least log(n!) KC, and
log(n!) ∈ Θ(n log n)
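The two-sided estimate log(n!) ∈ Θ(n log n) is easy to sanity-check numerically: the upper bound follows since each of the n factors is ≤ n, the lower bound since the top n/2 factors are each ≥ n/2. A sketch (log2_factorial is an illustrative helper):

```python
import math

def log2_factorial(n):
    # log2(n!) via the log-gamma function: ln(n!) = lgamma(n + 1)
    return math.lgamma(n + 1) / math.log(2)
```

For example, log2(8!) ≈ 15.3, which lies between (8/2) · log2(8/2) = 8 and 8 · log2(8) = 24.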
Now let us turn to some data structures and study their role in the design of algorithms:
Binary Search Trees (BST): a review (CLRS ch 12.1-3)
Balanced Binary Search Trees: AVL Trees
Hash Tables (CLRS ch 11.1-3)
Binary Search Trees
Definition: A rooted tree is a data-structure defined recursively as:
The empty rooted tree, nil
A special node, the root, which has children — pointers to distinct and
unique rooted trees.
Unique: a non-nil tree cannot be pointed to more than once
A tree is implemented like a doubly linked list: a node contains pointers to its child nodes and to its parent (nil if non-existent), and a node holds a key (possibly with other satellite data); if needed there is also an attribute T.height. (Review CLRS ch 10.2 for linked lists.)
A node u is a descendant of node v (equivalently, node v is an ancestor of u) if there is a path from v to u that uses only child pointers (equivalently, a path from u to v that uses only parent pointers).
A leaf is a rooted tree with no children (all children are nil)
A binary rooted tree is a rooted tree in which all nodes have at most two
children, left and right.
The length (= # edges) of the longest path from the root of the tree to a
leaf is called the height of the tree.
The i-th layer in a tree is the set of all nodes whose distance (= # edges) to the root is precisely i.
Binary Search Trees
Definition: A binary rooted-tree is called a binary search tree if for every node v we have the following two properties
1. All keys stored in v’s left subtree are ≤ v.key
2. All keys stored in v’s right subtree are > v.key
On the same set of nodes/keys, there could be many binary trees of different heights.
Like in the case of heaps: n ≤ 2^(height+1) − 1 ...
... but (as opposed to heaps) it is wrong to assume 2^height ≈ n
In fact, in the worst case, height ≈ n
Binary Search Trees
In a BST, finding a key x is rather simple:
procedure Find(T, x)
  if (T = nil) then
    return nil
  if (T.root.key = x) then
    return T   ∗∗ alternatively, return T.root
  else if (x < T.root.key) then
    return Find(T.root.left, x)
  else
    return Find(T.root.right, x)
Runtime for a tree of height h is
T(h) = O(1),  if h = 0
T(h) = O(1) + T(h − 1),  o/w
I.e., the runtime is O(h).
Note: While we use T to denote a BST and T.root to refer to the root node of T, CLRS uses a root node to represent a tree, e.g., on p.290, the procedure TREE-SEARCH(x, k) is to find key k in the BST whose root node is x.
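A minimal Python sketch of Find, following the CLRS convention of representing a (sub)tree by its root node; the Node class and find are illustrative names, not from the slides:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def find(node, x):
    # returns the node holding key x, or None (the pseudocode's nil)
    if node is None:
        return None
    if node.key == x:
        return node
    if x < node.key:
        return find(node.left, x)
    return find(node.right, x)
```

For example, on the tree rooted at Node(8, Node(3, Node(1), Node(6)), Node(10, None, Node(14))), searching for 6 descends left then right and succeeds, while searching for 7 falls off a nil pointer and returns None.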
Binary Search Trees: Find Min
Of particular importance is finding the min/max in a tree.
procedure FindMin(T)
  ∗∗ precondition: T isn’t nil
  if (T.root.left = nil) then
    return T   ∗∗ alternatively, return T.root
  else
    return FindMin(T.root.left)
How do we prove correctness of this code?
Clearly, by induction. But on what?
Answer: induction on hL = the height of the left subtree.
Base case: hL = 0. This means that there are no left descendants. So, by
definition, all keys in the right subtree are greater than T.root.key so by
returning T.root we return the element with the smallest key.
Induction step: Fix any hL ≥ 0. Assuming that for any tree with left-height hL FindMin() returns the min key of this tree, we show FindMin() returns
the minimum of any tree whose left-height is hL + 1.
Let T be any tree whose left-subtree’s height is hL + 1 > 0. So FindMin(T) returns FindMin(T.root.left), and by IH we return the smallest of all the keys that are ≤ T.root.key. All other keys in the tree are either T.root.key or the keys stored in the right subtree which are greater than T.root.key. Hence, the minimum of the left subtree is indeed the minimum of the whole tree.
What’s the code for FindMax(T)?
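FindMin keeps walking left; FindMax is the mirror image, walking right. A sketch (iterative for brevity, same O(h) cost; the Node class is the same illustrative one as before):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def find_min(node):
    # precondition: node is not None; the leftmost node holds the minimum
    while node.left is not None:
        node = node.left
    return node

def find_max(node):
    # precondition: node is not None; the rightmost node holds the maximum
    while node.right is not None:
        node = node.right
    return node
```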
Binary Search Tree: Insert
In a BST, inserting a key x is done at the leaf:
procedure Insert(T, x)
  Tnew ← a new tree
  Tnew.root.key ← x
  Tnew.root.left ← nil
  Tnew.root.right ← nil
  if (T = nil) then
    ∗∗ if this is the first item in the tree
    replace T with Tnew
  else
    InsertTree(T, Tnew)
procedure InsertTree(T, Tnew)
  ∗∗ precondition: T isn’t nil
  if (Tnew.root.key ≤ T.root.key) then
    if (T.root.left = nil) then
      T.root.left ← Tnew
      Tnew.root.parent ← T
    else
      InsertTree(T.root.left, Tnew)
  else   ∗∗ i.e., Tnew.root.key > T.root.key
    if (T.root.right = nil) then
      T.root.right ← Tnew
      Tnew.root.parent ← T
    else
      InsertTree(T.root.right, Tnew)
Insert() takes O(height of T).
The correctness statement: after invoking Insert(T, x), the resulting tree is a BST that contains all previous keys and x.
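A compact Python sketch of insertion. It simplifies the slides' InsertTree by returning the subtree root instead of maintaining parent pointers; duplicates go left, matching the ≤ rule, and is_bst is an illustrative checker for the correctness statement:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    # returns the root of the subtree after inserting key at a leaf position
    if root is None:
        return Node(key)
    if key <= root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def is_bst(node, lo=None, hi=None):
    # checks: keys in the left subtree are <= node.key, right subtree keys are > node.key
    if node is None:
        return True
    if lo is not None and node.key <= lo:
        return False
    if hi is not None and node.key > hi:
        return False
    return is_bst(node.left, lo, node.key) and is_bst(node.right, node.key, hi)
```

Inserting 8, 3, 10, 1, 6, 14 and then a duplicate 6 yields a valid BST in which the second 6 sits in the left subtree of the first.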
Binary Search Tree: Delete
How to do deletion of a node x?
Easy case: x is a leaf – remove it. Done.
How to delete a node with only a single child?
What about deleting a node with two non-nil children?
Find a different node y that can replace x — delete y and put y instead of x.
Which node y shall we look for? The largest in the left subtree (or the smallest in the right subtree; both work), and delete y.
Binary Search Tree: Delete
An example of the simple deletion case:
An example of the more complex deletion case:
Binary Search Tree: Delete
(WC) Runtime Analysis?
Naïve first attempt: write the runtime as a recursive relation, T(h) = O(1) + O(h) + T(l), where l = the depth of the max node (with T(0) = O(1)).
But note: invoking the recursive call DeleteRoot() on the largest element in the left-subtree means that this node must not have a right child!
Hence, the recursive call is invoked at most once.
Thus, T(h) = O(1) + O(h) + O(1) = O(h).
Exercise: Depict the tree after each instruction:
(Delete(x) refers to DeleteRoot(Find(x)))
Insert(1), Insert(2), Insert(3), Insert(5), Insert(4), Delete(1), Delete(5), Delete(3), Insert(1), Insert(5), Delete(2)
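The deletion rule above (replace a two-child node's key with the largest key in its left subtree, then delete that node; being rightmost, it has no right child, so the easy case applies) can be sketched in Python. The helper names are illustrative and the sketch assumes distinct keys, as in the exercise; running the exercise's sequence leaves exactly the keys 1, 4, 5:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    if root is None:
        return Node(key)
    if key <= root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def delete(root, key):
    # returns the subtree root after removing one node holding key (if present);
    # assumes distinct keys
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    elif root.left is None:      # zero or one child: splice the node out
        return root.right
    elif root.right is None:
        return root.left
    else:                        # two children: use the in-order predecessor
        pred = root.left
        while pred.right is not None:
            pred = pred.right
        root.key = pred.key
        root.left = delete(root.left, pred.key)  # easy case: pred has no right child
    return root

def keys_in_order(root):
    if root is None:
        return []
    return keys_in_order(root.left) + [root.key] + keys_in_order(root.right)
```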
Binary Search Tree: Outputting the Sorted Sequence
How to output all keys held in the tree from smallest to largest? In-order traversal:
procedure In-Order(T)
  if (T ̸= nil) then
    In-Order(T.root.left)
    Print(T.root.key)
    In-Order(T.root.right)
Runtime: T(n) = O(1) + T(nL) + T(nR), where nL, nR denote the number of nodes in the left/right subtree respectively. (Of course, T(0) = O(1).)
Solves to T (n) = O(n).
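In Python this is natural to write as a generator, so keys stream out in sorted order without building a list (the hand-built Node tree is illustrative):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def in_order(node):
    # left subtree, then the node's key, then right subtree;
    # visits each node exactly once, so O(n) total
    if node is not None:
        yield from in_order(node.left)
        yield node.key
        yield from in_order(node.right)
```

On the tree rooted at Node(8, Node(3, Node(1), Node(6)), Node(10, None, Node(14))), the traversal yields 1, 3, 6, 8, 10, 14, i.e. the keys from smallest to largest.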