Selecting the kth smallest element
Nishant Mehta March 4th, 2021
M. C. Escher (1948): “Drawing Hands”
Selecting Medians and Order Statistics
•
• •
• •
•
Fundamental problem:
Select the kth smallest element in an unsorted sequence
Definition: An element x is the kth order statistic of a sequence S if x is the kth smallest element of S
Selection Problem:
Given an array S of n elements and k ∈ {1, 2, …, n}, Return the kth order statistic of S
Example: If n is odd and k = (n+1)/2, we get the median
Selecting Medians and Order Statistics
•
• •
• •
•
Fundamental problem:
Select the kth smallest element in an unsorted sequence
Definition: An element x is the kth order statistic of a sequence S if x is the kth smallest element of S
Selection Problem:
Given an array S of n elements and k ∈ {1, 2, …, n}, Return the kth order statistic of S
Example: If n is odd and k = (n+1)/2, we get the median
For simplicity we’ll assume all n elements are distinct (no big ideas are needed for the general case)
A naïve solution
A sorting-based approach:
1. Sort S in increasing order
2. Output the kth element of the sorted sequence
How long does this take? Is this the best possible?
A naïve solution
A sorting-based approach:
1. Sort S in increasing order
2. Output the kth element of the sorted sequence
How long does this take? O(n log n) Is this the best possible?
A naïve solution
A sorting-based approach:
1. Sort S in increasing order
2. Output the kth element of the sorted sequence
How long does this take? O(n log n) Is this the best possible? No!
O(n) is possible!!
Quickselect: Quicksort with pruning
Goal: Select the 6th smallest element
15 elements
7 elements
7 elements
S
pivot0
pivot0
pivot0
Quickselect: Quicksort with pruning
Goal: Select the 6th smallest element
15 elements
S
7 elements
pivot0
pivot0
7 elements
pivot0
Quickselect: Quicksort with pruning
Goal: Select the 6th smallest element
15 elements
7 elements
pivot0
pivot0
7 elements
S
pivot0
8th smallest element and larger
Quickselect: Quicksort with pruning
Goal: Select the 6th smallest element
15 elements
S
Prune!
7 elements
pivot0
pivot0
7 elements
pivot0
8th smallest element and larger
Quickselect: Quicksort with pruning
Goal: Select the 6th smallest element
15 elements
Prune!
7 elements
pivot0
pivot0
7 elements
Quickselect with k=6
8th smallest element and larger
S
pivot0
Quickselect: Quicksort with pruning
Goal: Select the 6th smallest element
15 elements
7 elements
pivot0
pivot0
7 elements
pivot0
3 elements
3 elements
pivot1
pivot1
pivot1
S
Quickselect: Quicksort with pruning
Goal: Select the 6th smallest element
15 elements
7 elements
pivot0
pivot0
7 elements
pivot0
S
3 elements
pivot1
pivot1
3 elements
pivot1
Quickselect: Quicksort with pruning
Goal: Select the 6th smallest element
15 elements
7 elements
pivot0
pivot0
7 elements
pivot0
S
3 elements
pivot1
pivot1
3 elements
pivot1
4th smallest element and smaller
Quickselect: Quicksort with pruning
Goal: Select the 6th smallest element
15 elements
7 elements
pivot0
pivot0
7 elements
Prune!
pivot0
S
3 elements
pivot1
pivot1
3 elements
pivot1
4th smallest element and smaller
Quickselect: Quicksort with pruning
Goal: Select the 6th smallest element
15 elements
7 elements
pivot0
pivot0
7 elements
Prune!
pivot0
S
3 elements
pivot1
pivot1
3 elements
pivot1
4th smallest element and smaller
Quickselect with k=(6-4)=2
[L, G] = Partition(S, p) If k <= length(L)
Return Quickselect(L, k) ElseIf k == (length(L) + 1)
Return p
Else // k > (length(L) + 1)
↳3 p
Quickselect
Quickselect(S, k): If S.length() == 1
Return S[0]
p = PickPivot(S) // how to pick pivot? To be explained later!
Return Quickselect(G, k – length(L) – 1)
Quickselect – Optimistic analysis
•
• •
OR
(b) returns the kth order statistic itself
Suppose we always take the pivot to be the first element in the sequence and are so lucky that it always is the median
Then PickPivot(S) just returns S[0] and so costs 1
Quickselect on sequence of length n either:
(a) calls Quickselect on sequence of length at most bn/2c
Quickselect – Optimistic analysis
•
• •
OR
(b) returns the kth order statistic itself
Suppose we always take the pivot to be the first element in the sequence and are so lucky that it always is the median
Then PickPivot(S) just returns S[0] and so costs 1
Quickselect on sequence of length n either:
(a) calls Quickselect on sequence of length at most bn/2c
So:
T(n) T(n/2) + cn
sah constant ( maybe
pantin
Quick Select when given arrays of length n
5or 10. . . )
Quickselect – Optimistic analysis
T(n) T(n/2) + cn
T(n/4) + cn/2 + cn
T(n/8) + cn/4 + cn/2 + cn
. .
⇒. j=0
geometric
series r– I
T(1)+(cn)X1 2 j
= T(1) + 2cn = O(n)
Quickselect – Optimistic analysis
T(n) T(n/2) + cn
T(n/4) + cn/2 + cn
T(n/8) + cn/4 + cn/2 + cn
.
T(1)+(cn)X1 2 j j=0
= T(1) + 2cn = O(n)
This is great! But we cheated by assuming that taking the pivot as the first element always gives us the median
Quickselect – Worst-case analysis
Goal: Select the 6th smallest element
15 elements
14 elements
empty!
pivot0
S
13 elements
pivot0 pivot0
empty!
pivot1
pivot1
pivot1
Quickselect – Worst-case analysis
T(n) = T(n 1) + cn
= T(n 2) + c(n 1) + cn
= T(n 3) + c(n 2) + c(n 1) + cn
.
Xn j=2
j
= T (1) + c
= O(1) + c · ✓n(n + 1) 1◆
2 = ⌦(n2)
• • •
•
Picking a good pivot
Median pivots are the best possible choice
But if we knew how to get the median, we would be done! Idea: Try to find an “approximate median” using less work
Find the median of a well-chosen subset of the sequence
Picking a good pivot
Definition: Let satisfy 1/2 1 . We say that an element m of sequence S is a -approximate median of S if:
L E An
EP n
#
At most n elements of S are greater than m
elements
in
G
At most n elements of S are less than m and
# elements in
Picking a good pivot
Definition: Let satisfy 1/2 1 . We say that an element m of sequence S is a -approximate median of S if:
At most n elements of S are less than m and
#
elements
in
L E An
At most n elements of S are greater than m # elements in
G
EP n
set of all -approximate medians when = 3/4
Is a β-approximate median a good pivot?
•
•
If the pivot is a -approximate median, then calling Quickselect on a sequence of n points leads to both L and G that each are of size at most n
If Quickselect always uses a -approximate median, then at level j of Quickselect (i.e. inside the jth recursive call), both L and G each can have size at most j n
Is a β-approximate median a good pivot?
•
•
If the pivot is a -approximate median, then calling Quickselect on a sequence of n points leads to both L and G that each are of size at most n
If Quickselect always uses a -approximate median, then at level j of Quickselect (i.e. inside the jth recursive call), both L and G each can have size at most j n
T(n) T( n) + cn
T( 2n) + c n + cn
T( 3n) + c 2n + c n + cn
. .
If k
X→
= O(1) +
a-
– Assume
I
A
T(1) + cn
j
cn
j=0
1
Quickselect with β-approximate median
So, runtime of Quickselect using a -approximate median is
T(n)=O(1)+ cn =O(n) 1
Quickselect with β-approximate median
So, runtime of Quickselect using a -approximate median is T(n)=O(1)+ cn =O(n)
1 This is great, but we are still cheating…
We need a way to find -approximate median AND must account for the computational cost for doing so
• •
Find the median of each segment.
Find the median of the n/5 medians (somehow)
Computing a β-approximate median
•
For simplicity, we ignore the fact that the last segment
Partition sequence into n/5 segments, each of size 5
•
might have size less than 5.
Median of medians
Median of medians
Median of medians
Median of medians
median of medians
Median of medians
median of medians
Quickselect with median-of-medians pivot
•
Sort to get median. Cost: O(1)
•
What does it cost to compute all the medians?
• For each segment of length 5:
# n/5 such segments Total cost: O(n)
Two outstanding problems:
(1) Is median of medians a good pivot,
i.e. is it a -approximate median?
(2) How do you efficiently compute median of medians?
Quickselect with median-of-medians pivot
•
Is median of medians a good pivot, i.e. is it a -approximate median?
m⇤
(median of medians)
Quickselect with median-of-medians pivot
•
Is median of medians a good pivot, i.e. is it a -approximate median?
•
n/5 medians, of which n/10 of are less than or equal to m⇤
m⇤
(median of medians)
Quickselect with median-of-medians pivot
•
Is median of medians a good pivot, i.e. is it a -approximate median?
• •
n/5 medians, of which n/10 of are less than or equal to m⇤
For each such median, 2 more points are less than m⇤
m⇤
(median of medians)
Quickselect with median-of-medians pivot
•
Is median of medians a good pivot, i.e. is it a -approximate median?
• •
•
n/5 medians, of which n/10 of are less than or equal to m⇤
For each such median, 2 more points are less than m⇤
m⇤
(median of medians)
At least (3n/10) 1 elements less than m⇤
Quickselect with median-of-medians pivot
•
Is median of medians a good pivot, i.e. is it a -approximate median?
• •
• •
n/5 medians, of which n/10 of are less than or equal to m⇤
For each such median, 2 more points are less than m⇤
m⇤
(median of medians)
At least (3n/10) 1 elements less than m⇤
Hence, at most 7n/10 elements are greater than m⇤
Quickselect with median-of-medians pivot
•
Is median of medians a good pivot, i.e. is it a -approximate median?
• •
• •
•
n/5 medians, of which n/10 of are less than or equal to m⇤
For each such median, 2 more points are less than m⇤
m⇤
(median of medians)
At least (3n/10) 1 elements less than m⇤
Hence, at most 7n/10 elements are greater than m⇤
By symmetry, at most 7n/10 elements are less than m⇤
Quickselect with median-of-medians pivot
•
Is median of medians a good pivot, i.e. is it a -approximate median?
• •
n/5 medians, of which n/10 of are less than or equal to m⇤
For each such median, 2 more points are less than m⇤
m⇤
(median of medians)
(• •
At least (3n/10) 1 elements less than m⇤
•
Hence, at most 7n/10 elements
m⇤ is a -approximate median (for = 7/10)
are greater than m⇤
By symmetry, at most 7n/10
elements are less than m⇤
Quickselect with median-of-medians pivot
⇒ don’t want to sort
rs
• •-
them
!
How to compute median of medians?
Idea! Recursively call Quickselect(medians, (n/5)/2)
•
Original sequence was of length n
Can this really work?
K
– –
• • •
Sequence of medians is of length only n/5 Smells like divide-and-conquer
Run-time Analysis
We could use substitution method to analyze complexity Instead, let’s use “stack of bricks” view of the recursion tree Set↵=1/5 and =7/10
< Mtn
T(n) = T(n/5) + T(7n/10) + cn
Run-time Analysis
We could use substitution method to analyze complexity Instead, let’s use “stack of bricks” view of the recursion tree Set↵=1/5 and =7/10
£
T(n) = T(n/5) + T(7n/10) + cn
cn
Run-time Analysis
T(n) = T(n/5) + T(7n/10) + cn
We could use substitution method to analyze complexity Instead, let’s use “stack of bricks” view of the recursion tree Set↵=1/5 and =7/10
sum: c(↵ + )n
cn
c↵n
c n
Run-time Analysis
T(n) = T(n/5) + T(7n/10) + cn
We could use substitution method to analyze complexity Instead, let’s use “stack of bricks” view of the recursion tree Set↵=1/5 and =7/10
cn
tc ↵n 1.
c ↵ n
t
c ↵ n
b
c n c 2n
sum: c(↵ + )n sum: c(↵ + )2n
c↵2n
Run-time Analysis
T(n) = T(n/5) + T(7n/10) + cn
We could use substitution method to analyze complexity
Instead, let’s use “stack of bricks” view of the recursion tree
I
:e(0 Sum atl) n
i
sum: c(↵ + )n sum: c(↵ + )2n
c↵2n X1 j cn
Set↵=1/5and =7/10
att- ftIo-
E.+ Fo-- fo al
ff cn c↵n , c n
ftft
c 2n
c ↵ n c ↵ n
T(n)O(1)+cn (↵+ ) = 1 (↵+ ) =10cn=O(n) j=0
Upper bound on runtime of Quickselect with median of medians pivot
Theorem
Quickselect(S,k) using the median of medians pivot returns the kth order statistic in time at most O(n).
•
•
•
•
Suppose Bob tells Alice he has an algorithm that can select the kth order statistic in sublinear time.
A lower bound
Alice is dubious that Bob’s algorithm is correct, because a sublinear algorithm cannot look at all n elements.
Alice cooks up a length-n sequence of n distinct integers and observes which value Bob’s algorithm does not look at.
She then changes that value so that it becomes the kth order statistic, rendering Bob’s algorithm wrong on this adjusted input.
Selecting the kth order statistic takes time ⌦(n)
A lower bound
•
Alice observes that on her current input, Bob’s
Example: nth order statistic (the maximum).
•
algorithm does not look at the first element.
•
Alice adjusts S via S[0] = 1 + max{S[1], S[2], ..., S[n]}
Optimality of Quickselect with median of medians
Worst-case runtime for selecting the kth order statistic
Quickselect with the median of medians pivot has worst-case runtime of O(n)
The worst-case lower bound for any algorithm is ⌦(n)
Quickselect with the median of medians pivot has worst-case runtime ⇥(n), and this is optimal
Asymptotic optimality isn’t everything
•
•
• •
Quickselect with median of medians pivots is clever and asymptotically optimal in the worst-case
BUT: In practice, Quickselect with a random pivot can be much faster. Why?
Quickselect spends a lot of time computing its pivot The constants hidden by the O(n) actually matter.