
Lecture 8: Quicksort and Linear-Time Selection
Divide and Conquer Randomized Algorithms

1. Quicksort & Partition


2. Randomization
3. Analysis of Randomized Quicksort
4. Randomized Selection

Quicksort: The “dual” of merge sort
Mergesort(𝐴, 𝑝, 𝑟):
if 𝑝 = 𝑟 then return
𝑞 ← ⌊(𝑝 + 𝑟)/2⌋
Mergesort(𝐴, 𝑝, 𝑞)
Mergesort(𝐴, 𝑞 + 1, 𝑟)
Merge(𝐴, 𝑝, 𝑞, 𝑟)
First call: Mergesort(𝐴,1,𝑛)
Example: 5 2 4 7 1 3 2 6 → halves 5 2 4 7 and 1 3 2 6 → sorted halves 2 4 5 7 and 1 2 3 6 → merged 1 2 2 3 4 5 6 7

Quicksort: The “dual” of merge sort [II]
Mergesort(𝐴, 𝑝, 𝑟):
if 𝑝 = 𝑟 then return
𝑞 ← ⌊(𝑝 + 𝑟)/2⌋
Mergesort(𝐴, 𝑝, 𝑞)
Mergesort(𝐴, 𝑞 + 1, 𝑟)
Merge(𝐴, 𝑝, 𝑞, 𝑟)
First call: Mergesort(𝐴,1,𝑛)
Quicksort(𝐴, 𝑝, 𝑟):
if 𝑝 ≥ 𝑟 then return
𝑞 ← Partition(𝐴, 𝑝, 𝑟)
Quicksort(𝐴, 𝑝, 𝑞 − 1)
Quicksort(𝐴, 𝑞 + 1, 𝑟)
First call: Quicksort(𝐴,1,𝑛)
Example: 5 2 4 7 1 2 6 3 → partitioned around pivot 3 into 2 1 2 3 and 5 4 7 6 → recursively sorted to 1 2 2 3 and 4 5 6 7 → 1 2 2 3 4 5 6 7

Quicksort: The “dual” of merge sort [III]
Quicksort chooses an item as the pivot.
It partitions the array so that all items less than or equal to the pivot are on the left and all items greater than the pivot are on the right.
It then recursively quicksorts the left and right sides.
Example: 5 2 4 7 1 2 6 3 → partitioned around pivot 3 into 2 1 2 3 and 5 4 7 6 → recursively sorted to 1 2 2 3 and 4 5 6 7 → 1 2 2 3 4 5 6 7
Quicksort(𝐴, 𝑝, 𝑟):
if 𝑝 ≥ 𝑟 then return
𝑞 ← Partition(𝐴, 𝑝, 𝑟) // new pivot position
Quicksort(𝐴, 𝑝, 𝑞 − 1)
Quicksort(𝐴, 𝑞 + 1, 𝑟)
First call: Quicksort(𝐴,1,𝑛)

Partition with the last element as the pivot
Partition(𝐴, 𝑝, 𝑟):
𝑥 ← 𝐴[𝑟]
𝑖 ← 𝑝 − 1
for 𝑗 ← 𝑝 to 𝑟 − 1
  if 𝐴[𝑗] ≤ 𝑥 then
    𝑖 ← 𝑖 + 1
    swap 𝐴[𝑖] and 𝐴[𝑗]
swap 𝐴[𝑖+1] and 𝐴[𝑟]
return 𝑖 + 1 // pivot position
Working space: 𝑂(1) (in-place algorithm)

Partition: Example

Running time.
Pivot selection is crucial:
• [Best case.] Select the median (middle value) element as the pivot: quicksort runs in Θ(𝑛 log 𝑛) time (the problem is that we don’t know the median).
• [Worst case.] Select the smallest (or the largest) element as the pivot: quicksort runs in Θ(𝑛²) time.
Q: How to make the running time independent of the input?
A: Randomly choose an element as the pivot
(swap it with the last item in the array and then run Partition)
This is what’s known as a randomized algorithm:
algorithm makes a random choice each time it chooses a pivot.
It might help to think of the algorithm as trying to fool an evil adversary who is attempting to create a bad input by forcing bad pivots all the time. By making random pivot choices, the algorithm makes it impossible for the adversary to force a bad pivot, since the adversary can’t know which key will be chosen as a pivot.
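A minimal sketch of this randomized scheme in 0-indexed Python (a uniformly random element is swapped into the last slot, then the Lomuto partition runs unchanged):

```python
import random

def partition(A, p, r):
    # Lomuto partition around pivot A[r]; returns the pivot's final index.
    x = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

def randomized_quicksort(A, p, r):
    if p >= r:
        return
    # Random pivot: the adversary cannot know which key will be chosen,
    # so it cannot force bad pivots.
    s = random.randint(p, r)
    A[s], A[r] = A[r], A[s]
    q = partition(A, p, r)
    randomized_quicksort(A, p, q - 1)
    randomized_quicksort(A, q + 1, r)

A = [5, 2, 4, 7, 1, 3, 2, 6]
randomized_quicksort(A, 0, len(A) - 1)
print(A)  # [1, 2, 2, 3, 4, 5, 6, 7]
```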

Analysis for Randomized Algorithms
Worst case almost never happens: every pivot would have to be the minimum or maximum, which occurs with very low probability.
Expected running time is 𝑂(𝑛 log 𝑛) for any input of size 𝑛.
Average case analysis
• Used for deterministic algorithms
• Assumes the input is chosen randomly from some distribution
• Depends on assumptions about the input: a weaker guarantee
Expected case analysis
• Used for randomized algorithms
• Must hold for any input
• Randomization is inherent within the algorithm: a stronger guarantee
Our Randomized QuickSort Analysis

Analysis of Randomized Quicksort: The binary tree representation
Assumption: All elements are distinct. Relabel the elements from small to large as z1, z2, …, zn. We measure the running time via the expected number of comparisons performed.
Create a Binary tree corresponding to the Quicksort partitioning. The root of the tree will be the first pivot. All items to the left (right) of the root will be in its left (right) subtree.

Analysis of Randomized Quicksort: The binary tree representation
Create a Binary tree corresponding to the Quicksort partitioning
Pivots will be nodes: items in subarray to left of pivot will be in left subtree; items to right of pivot, in right subtree

Analysis of Randomized Quicksort: The binary tree representation

Analysis of Randomized Quicksort: The binary tree representation

Analysis of quicksort: The binary tree representation
Keep going.
At the end we have a binary search tree that corresponds to the choice of pivots
[Figure: the resulting binary search tree on z1, z2, …, z17 determined by the pivot choices]

Analysis of quicksort: The binary tree representation
The tree structure completely encodes the running of quicksort.
Let Ti be the subtree rooted at zi.
Ex: T5 is the subtree rooted at z5.
(*) zi is compared with zj by Qsort if and only if
(**) in the tree, zi is an ancestor of zj or zj is an ancestor of zi.
This means that when zi was a pivot, its subarray contained exactly the items in Ti .
Those items are then partitioned around zi (corresponding to being placed in the left and right subtrees).

Summary of Observations
Elements zi and zj are compared (at most once) by Qsort iff, in the binary tree:
1. zi is an ancestor of zj, i.e., zi is the first pivot chosen among {zi, zi+1, …, zj}, or
2. zj is an ancestor of zi, i.e., zj is the first pivot chosen among {zi, zi+1, …, zj}.
Otherwise, if another element zk ∈ {zi, zi+1, …, zj} is chosen as a pivot first, zi and zj go to different subtrees (subarrays) rooted at zk and are never compared.
1. Define indicator random variables: Xij = 1 if zi is compared with zj, and Xij = 0 otherwise. Then the # of comparisons is ∑_{i<j} Xij.
2. By linearity of expectation, E[# of comparisons made by Qsort] = ∑_{i<j} E[Xij] = ∑_{i<j} Pr[zi, zj compared].
3. Will show that Pr[zi, zj compared] = 2/(j−i+1).
4. And also show that E[# of comparisons made by Qsort] = ∑_{i<j} 2/(j−i+1) = O(n log n).

Analysis of quicksort
(*) zi and zj are compared iff zi or zj is chosen as the first pivot among {zi, zi+1, …, zj}
Consider the running of Quicksort.
1. For any pair i,j, consider first time that an item in Z= {zi, zi+1, …, zj} is chosen as a pivot.
2. At each step of Qsort, every item in the current subarray is equally likely to be chosen as a pivot.
3. Because Qsort chooses pivots randomly, when the first pivot in Z is chosen it is equally likely to be any item in Z (which has size j-i+1).
=> Pr[zi is first pivot chosen in Z] = 1/(j−i+1) = Pr[zj is first pivot chosen in Z]
=> Pr[zi and zj are compared] = Pr[either zi or zj is first pivot chosen in Z] = 2/(j−i+1)
4. E[# of comparisons made by Qsort] = ∑_{i<j} E[Xij] = ∑_{i<j} Pr[zi, zj compared] = ∑_{i<j} 2/(j−i+1)
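The 2/(j−i+1) probability can be sanity-checked by simulation. The sketch below models quicksort's comparisons abstractly (each non-pivot item in a subarray is compared with that subarray's pivot exactly once); the helper name `compared_pairs` is an assumption of this sketch, not lecture code:

```python
import random

def compared_pairs(n):
    """Run randomized quicksort on the values 0..n-1 and return the set
    of unordered pairs of values that get compared."""
    pairs = set()

    def qsort(items):
        if len(items) <= 1:
            return
        x = random.choice(items)                    # uniform random pivot
        for y in items:
            if y != x:
                pairs.add((min(x, y), max(x, y)))   # y compared with pivot x
        qsort([y for y in items if y < x])
        qsort([y for y in items if y > x])

    qsort(list(range(n)))
    return pairs

# Values 2 and 7 play the roles of z3 and z8, so j - i + 1 = 6 and the
# predicted probability of their being compared is 2/6 = 1/3.
random.seed(0)
trials = 20000
hits = sum((2, 7) in compared_pairs(10) for _ in range(trials))
print(hits / trials)  # close to 1/3
```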

Analysis of quicksort (continued)
Theorem. Expected # of comparisons is O(n log n).
Pf. It remains to show that ∑_{i<j} 2/(j−i+1) = O(n log n).
• Idea is to write out the summation as items in an upper-triangular matrix, with row i holding the terms 2/(j−i+1) for j = i+1, …, n,
• and then show that each of the n rows has sum O(log n).
Each row sums to at most 2(1 + 1/2 + … + 1/n) − 1, where 1 + 1/2 + … + 1/n is the harmonic series, which is Θ(log n).
Since every row has sum O(log n), the entire matrix has sum O(n log n).
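The row-wise harmonic bound can be checked numerically; the helper names `expected_comparisons` and `harmonic` below are assumptions of this sketch, standing for the two sums involved:

```python
def expected_comparisons(n):
    """Exact value of the sum over 1 <= i < j <= n of 2/(j - i + 1)."""
    return sum(2.0 / (j - i + 1)
               for i in range(1, n + 1) for j in range(i + 1, n + 1))

def harmonic(n):
    # H_n = 1 + 1/2 + ... + 1/n, the harmonic series
    return sum(1.0 / k for k in range(1, n + 1))

n = 1000
s = expected_comparisons(n)
# Each of the n rows is bounded by 2*H_n, so the whole sum is at most 2*n*H_n.
print(s, 2 * n * harmonic(n))
```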

Quicksort in practice
Why does quicksort work very well in practice?
𝑂(𝑛 log 𝑛) time in expectation on any input
– Actually, it’s 𝑂(𝑛 log 𝑛) time with very high probability
Small hidden constants and cache-efficient
Even though it has a Θ(𝑛²) worst-case running time, it beats the Θ(𝑛 log 𝑛) worst-case Mergesort “on average”.
Real Algorithmic Engineering
• Start with quicksort
• When the recursion gets too deep, switch to insertion sort or heap sort (discussed later)
• Use better ways of choosing the “random” pivot
• Implemented in the C++ Standard Template Library (STL)
• For details, see
Engineering a Sort Function, J. L. Bentley & M.D. McIlroy
SOFTWARE—PRACTICE AND EXPERIENCE, VOL. 23(11), 1249–1265 (NOV 1993)

Randomized Selection: Main Idea
Selection. Given an unsorted array 𝐴 of 𝑛 elements and an integer 𝑖, return the 𝑖-th smallest element of 𝐴.
• Trivial cases: If 𝑖 = 1, return the minimum. If 𝑖 = 𝑛, return the maximum.
• Special case: If 𝑖 = 𝑛/2, return the median.
1. Choose a pivot 𝑥 from 𝐴[1..𝑛]
2. Partition 𝐴 around 𝑥 (linear time)
3. After partitioning, pivot 𝑥 will be at a known location 𝑞
• If 𝑖 = 𝑞: then 𝑥 is the actual solution
• If 𝑖 < 𝑞: then the 𝑖-th element of 𝐴[1..𝑛] is the 𝑖-th element of 𝐴[1..𝑞−1]. Solve recursively.
• If 𝑖 > 𝑞: then the 𝑖-th element of 𝐴[1..𝑛] is the 𝑘 = (𝑖 − 𝑞)-th element of 𝐴[𝑞+1..𝑛]. Solve recursively.

Example of Randomized Selection
Call Selection(A, 1, 15, 10)
p = 1, r = 15, i = 10, q = 13, k = 13
return Selection(A, 1, 12, 10)
Goal is to find 10th item in A[1..15].
Current problem is to find 10th item in A[1..15]
Yellow item is current pivot; Grey items have been thrown away. Search is only done in white (+yellow) subarray.
After Pivoting, problem is reduced to finding 10th item in A[1..12]

Example (cont.)
Call Selection(A, 1, 12, 10)
p = 1, r = 12, i = 10, q = 5, k = 5
return Selection(A, 6, 12, 5)
Goal is to find 10th item in A[1..15].
Current problem is to find 10th item in A[1..12]
Yellow item is current pivot; Grey items have been thrown away. Search is only done in white (+yellow) subarray.
After Pivoting, problem is reduced to finding 5th item in A[6..12]

p = 6, r = 12, i = 5, q = 9, k = 4
Call Selection(A, 6, 12, 5)
return Selection(A, 10, 12, 1)
Goal is to find 10th item in A[1..15].
Current problem is to find 5th item in A[6..12]
Yellow item is current pivot; Grey items have been thrown away. Search is only done in white (+yellow) subarray.
After Pivoting, problem is reduced to finding 1st item in A[10..12]

p = 10, r = 12, i = 1, q = 11, k = 2
Call Selection(A, 10, 12, 1)
return Selection(A, 10, 10, 1)
Goal is to find 10th item in A[1..15].
Current problem is to find 1st item in A[10..12]
Yellow item is current pivot; Grey items have been thrown away. Search is only done in white (+yellow) subarray.
After Pivoting, problem is reduced to finding 1st item in A[10..10]

p = 10, r = 10, i = 1
Call Selection(A, 10, 10, 1)
return A[10]
Goal is to find 10th item in A[1..15].
Current problem is to find 1st item in A[10..10]
Yellow item is current pivot; Grey items have been thrown away. Search is only done in white (+yellow) subarray.
After Pivoting, problem is solved!

Randomized Selection: Pseudocode
Selection. Given an array 𝐴 of 𝑛 distinct elements and an integer 𝑖, return the 𝑖-th smallest element of 𝐴.
Goal: Want to do better than sorting, i.e., linear time.
Select(𝐴, 𝑝, 𝑟, 𝑖): // 𝑖-th smallest element of 𝐴[𝑝..𝑟]
if 𝑝 = 𝑟 then return 𝐴[𝑝]
Randomly choose an element in 𝐴[𝑝..𝑟] as the pivot and swap it with 𝐴[𝑟]
𝑞 ← Partition(𝐴, 𝑝, 𝑟) // pivot position in 𝐴[𝑝..𝑟] after partition
𝑘 ← 𝑞 − 𝑝 + 1 // relative position of pivot in 𝐴[𝑝..𝑟]
if 𝑖 = 𝑘 then return 𝐴[𝑞]
else if 𝑖 < 𝑘 then return Select(𝐴, 𝑝, 𝑞 − 1, 𝑖)
else return Select(𝐴, 𝑞 + 1, 𝑟, 𝑖 − 𝑘) // 𝑖 − 𝑘 is the relative position in 𝐴[𝑞+1..𝑟]
First call: Select(𝐴, 1, 𝑛, 𝑖)

Mathematical Revision
Suppose you have a fair coin with probability 1/2 of turning up Heads and 1/2 of Tails. Start flipping. Let X be the number of flips until a Head is seen (we called it the waiting time in a previous lecture). We already saw E[X] = 1/p = 2.
Keep flipping, marking every time a new Head is seen. Let stage 𝑖 be the time between the 𝑖-th Head and the (𝑖+1)-st Head (not including the 𝑖-th and including the (𝑖+1)-st), 𝑖 = 0, 1, 2, …
X0 = length of stage 0 = # of flips until the first Head is seen
Xi = length of stage 𝑖 = # of flips from after the 𝑖-th Head is seen until the (𝑖+1)-st Head is seen
From above, for all 𝑖, E[Xi] = 2.

Analysis of randomized selection
If pivot 𝑥 falls within the (yellow) middle half:
• either everything to the left of 𝑥, including box L, is thrown away, or
• everything to the right of 𝑥, including box R, is thrown away.
So, if pivot 𝑥 falls within the (yellow) middle half, at least 1/4 of the items are thrown away.
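The Select pseudocode translates directly to 0-indexed Python; a sketch, not the lecture's reference implementation:

```python
import random

def partition(A, p, r):
    # Lomuto partition around pivot A[r]; returns the pivot's final index.
    x = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

def select(A, p, r, i):
    """Return the i-th smallest (1-indexed) element of A[p..r]."""
    if p == r:
        return A[p]
    s = random.randint(p, r)           # random pivot, swapped into the last slot
    A[s], A[r] = A[r], A[s]
    q = partition(A, p, r)
    k = q - p + 1                      # pivot's rank within A[p..r]
    if i == k:
        return A[q]
    elif i < k:
        return select(A, p, q - 1, i)
    else:
        return select(A, q + 1, r, i - k)

A = [9, 3, 7, 1, 8, 2, 5]
print(select(A, 0, len(A) - 1, 4))  # 4th smallest element, i.e. 5
```

Unlike quicksort, only one side of the partition is recursed into, which is what drives the expected cost down from O(n log n) to O(n).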
Call a pivot “good” if it’s between the 25%- and 75%-percentile of 𝐴, otherwise “bad”.
 The probability of a random pivot x being good is 1/2.
 Each good pivot reduces size of array by at least 1/4.

Analysis of randomized selection
Theorem: The expected running time of randomized selection is O(n).
Pf: Call a pivot “good” if it’s between the 25%- and 75%-percentile of 𝐴, otherwise “bad”.
• The probability of a random pivot being good is 1/2.
• Each good pivot reduces the size of the array by at least 1/4.
After 𝑖 good pivots, the total # of items left to process is at most (3/4)^𝑖 · 𝑛
– i.e., after throwing away 1/4 of the remaining items 𝑖 times.
• Let the 𝑖-th stage be the time between the 𝑖-th good pivot (not including) and the (𝑖+1)-st good pivot (including), 𝑖 = 0, 1, 2, …
– In the 𝑖-th stage, the # of items being processed in the array is at most (3/4)^𝑖 · 𝑛.

Analysis of randomized selection
Theorem: The expected running time of randomized selection is O(n).
Pf: A pivot is “good” if it is between the 25%- and 75%-percentile of 𝐴.
The probability of a random pivot being good is 1/2. After 𝑖 good pivots, the total # of items left to process is at most (3/4)^𝑖 · 𝑛.
Ti = the running time of the 𝑖-th stage
Xi = the # of pivots (recursive calls) in the 𝑖-th stage
From the mathematical revision, E[Xi] = 2 (waiting time), and each call in the 𝑖-th stage costs at most c · (3/4)^𝑖 · 𝑛, so E[Ti] ≤ 2c · (3/4)^𝑖 · 𝑛.
=> Expected total running time ≤ ∑_i 2c · (3/4)^𝑖 · 𝑛 = 8c𝑛 = O(𝑛).
Remark: A deterministic linear-time selection algorithm exists, but it is complicated.
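The geometric sum behind the O(n) bound can be checked numerically (assuming unit cost per item per stage, so the stage-𝑖 bound is 2 · (3/4)^𝑖 · n):

```python
# Each stage i costs at most 2 * (3/4)**i * n in expectation,
# and the stages form a convergent geometric series.
n = 1_000_000
total = sum(2 * (0.75 ** i) * n for i in range(200))
print(total / n)  # approaches 2 / (1 - 3/4) = 8, so the total work is O(n)
```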

Exercise on Sorting and Selection
You are given two sorted arrays A1 and A2 of sizes m and n. Design an algorithm to find the kth smallest element in the union of the elements in A1 and A2 (k ≤ m+n). Analyze the time complexity of your algorithm.
1] Very naïve method
Merge A1 and A2 into a sorted array of size m+n. Return the kth element in the merged array. Cost: Θ(m+n).
2] Naïve method
Start merging A1 and A2 (using the merge function of merge sort). Stop when you reach the kth element in the merged array. Cost: Θ(k).

Exercise on Sorting and Selection (cont)
D&C Method
Let’s call the method: kth(arrays A1, A2, integers start1, end1, start2, end2, k)
Main Idea: Compare elements A1[k/2] and A2[k/2].
If A1[k/2] < A2[k/2]:
Eliminate the first half of A1
Return kth(A1, A2, k/2+1, end1, start2, end2, k/2)
Else (A1[k/2] > A2[k/2]):
Eliminate the first half of A2
Return kth(A1, A2, start1, end1, k/2+1, end2, k/2)
Cost: Θ(log k)
