CS61B
Lecture 34: Sorting IV
● Sorting Summary
● Math Problems out of Nowhere
● Theoretical Bounds on Sorting
Other Desirable Sorting Properties: Stability
A sort is said to be stable if the order of equivalent items is preserved.
sort(studentRecords, BY_NAME); sort(studentRecords, BY_SECTION);

Sorted by name:           Then stably sorted by section:
Bas       3               Lara      1
Fikriyya  4               Sigurd    2
Jana      3               Bas       3
Jouni     3               Jana      3
Lara      1               Jouni     3
Nikolaj   4               Rosella   3
Rosella   3               Fikriyya  4
Sigurd    2               Nikolaj   4

Equivalent items don't 'cross over' when being stably sorted.
Other Desirable Sorting Properties: Stability
A sort is said to be stable if the order of equivalent items is preserved.
sort(studentRecords, BY_NAME); sort(studentRecords, BY_SECTION);

Sorted by name:           Then unstably sorted by section:
Bas       3               Lara      1
Fikriyya  4               Sigurd    2
Jana      3               Jouni     3
Jouni     3               Rosella   3
Lara      1               Bas       3
Nikolaj   4               Jana      3
Rosella   3               Fikriyya  4
Sigurd    2               Nikolaj   4

Sorting instability can be really annoying! Here we wanted students listed alphabetically within each section.
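The stable behavior shown above can be reproduced with Java's Arrays.sort, which is stable for object arrays. A minimal sketch, with a hypothetical Student record standing in for the slide's studentRecords:

```java
import java.util.Arrays;
import java.util.Comparator;

public class StableSortDemo {
    // Hypothetical record type for illustration; not from the lecture code.
    record Student(String name, int section) { }

    // Sort by name, then by section. Because Arrays.sort on objects is
    // stable (TimSort), ties in section keep their alphabetical order.
    static Student[] bySectionThenName(Student[] records) {
        Student[] out = records.clone();
        Arrays.sort(out, Comparator.comparing(Student::name));
        Arrays.sort(out, Comparator.comparingInt(Student::section));
        return out;
    }

    public static void main(String[] args) {
        Student[] records = {
            new Student("Bas", 3), new Student("Fikriyya", 4),
            new Student("Jana", 3), new Student("Jouni", 3),
            new Student("Lara", 1), new Student("Nikolaj", 4),
            new Student("Rosella", 3), new Student("Sigurd", 2)
        };
        for (Student s : bySectionThenName(records)) {
            System.out.println(s.name() + " " + s.section());
        }
        // Within each section, names stay alphabetical, matching the
        // stable column of the table above.
    }
}
```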
Arrays.sort
In Java, Arrays.sort(someArray) uses:
● Mergesort (specifically the TimSort variant) if someArray consists of Objects.
● Quicksort if someArray consists of primitives. Why? (See also the A level problems.)
  ○ When you are using a primitive value, equal values are the 'same': a 4 is a 4. An unstable sort has no observable effect.
  ○ By contrast, objects can have many properties, e.g. section and name, so equivalent items CAN be differentiated.
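One way to see the distinction: a comparator can make Integers that compare as equal still distinguishable, while equal ints never are. A small sketch (class and method names are made up for illustration):

```java
import java.util.Arrays;
import java.util.Comparator;

public class TieDemo {
    // Sorting Integers by absolute value: -5 and 5 are equivalent under
    // the comparator but distinguishable objects, so stability is
    // observable. Arrays.sort on objects is stable, so within each tie
    // the original input order survives.
    public static Integer[] sortByAbs(Integer[] a) {
        Integer[] copy = a.clone();
        Arrays.sort(copy, Comparator.comparingInt(Math::abs));
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortByAbs(new Integer[]{-5, 2, 5, -2})));
        // -> [2, -2, -5, 5]: ties keep their input order.

        // With primitives there is no such comparator, and a 4 is a 4,
        // so an unstable sort has no observable effect:
        int[] prims = {4, 1, 4, 3};
        Arrays.sort(prims);  // dual-pivot quicksort under the hood
        System.out.println(Arrays.toString(prims));  // [1, 3, 4, 4]
    }
}
```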
Sorting
Sorting is a foundational problem.
● Obviously useful for putting things in order.
● But can also be used to solve other tasks, sometimes in non-trivial ways.
  ○ Sorting improves duplicate finding from a naive N² to N log N.
  ○ Sorting improves 3SUM from a naive N³ to N².
● There are many ways to sort an array, each with its own interesting tradeoffs and algorithmic features.
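The duplicate-finding speedup above can be sketched as follows (hypothetical helper class, not from the lecture code):

```java
import java.util.Arrays;

public class DuplicateFinder {
    // Naive duplicate finding compares all pairs: Theta(N^2).
    // Sorting first makes any duplicates adjacent, so one linear scan
    // suffices, for N log N overall.
    public static boolean hasDuplicate(int[] a) {
        int[] copy = a.clone();       // avoid mutating the caller's array
        Arrays.sort(copy);            // N log N
        for (int i = 1; i < copy.length; i++) {
            if (copy[i] == copy[i - 1]) {  // duplicates are now neighbors
                return true;
            }
        }
        return false;
    }
}
```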
Today we'll discuss the fundamental nature of the sorting problem itself: How hard is it to sort?
Sorts Summary

                 Memory      # Compares            Notes                               Stable?
Heapsort         Θ(1)        Θ(N log N)            Bad caching (61C)                   No
Insertion        Θ(1)        Θ(N²)                 Best for almost sorted and N < 15   Yes
Mergesort        Θ(N)        Θ(N log N)            Fastest stable sort                 Yes
Quicksort LTHS   Θ(log N)*   Θ(N log N) expected   Fastest sort                        No

* This is due to the cost of tracking recursive calls by the computer, and is also an "expected" amount. The difference between log N and constant memory is trivial.
You can create a stable Quicksort. However, using unstable partitioning schemes (like Hoare partitioning) and using randomness to avoid bad pivots tend to yield better runtimes.
Math Problems out of Nowhere

A Math Problem out of Nowhere
Consider the functions N! and (N/2)^(N/2). Is N! ∈ Ω((N/2)^(N/2))? Prove your answer.
● Recall that ∈ Ω can informally be interpreted to mean ≥.
● In other words, does factorial grow at least as quickly as (N/2)^(N/2)?
A Math Problem out of Nowhere
Consider the functions N! and (N/2)^(N/2). Is N! ∈ Ω((N/2)^(N/2))? Prove your answer.
10!
● 10 * 9 * 8 * 7 * 6 * ... * 1
5^5
● 5 * 5 * 5 * 5 * 5
Each of the first five terms of 10! is at least 5, so 10! > 5^5. More generally, the first N/2 terms of N! are each at least N/2.
N! > (N/2)^(N/2) for large N, therefore N! ∈ Ω((N/2)^(N/2)).
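The inequality can be sanity-checked exactly for small N with BigInteger arithmetic (hypothetical class name; checks the same 10! vs 5^5 comparison as the slide):

```java
import java.math.BigInteger;

public class FactorialVsPower {
    // Exact check of N! > (N/2)^(N/2) for even N, mirroring the
    // 10! = 3628800 vs 5^5 = 3125 comparison above.
    public static boolean factorialDominates(int n) {
        BigInteger fact = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            fact = fact.multiply(BigInteger.valueOf(i));
        }
        BigInteger power = BigInteger.valueOf(n / 2).pow(n / 2);
        return fact.compareTo(power) > 0;
    }

    public static void main(String[] args) {
        System.out.println(factorialDominates(10));  // true: 3628800 > 3125
    }
}
```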
Another Math Problem
Given: N! > (N/2)^(N/2), which we used to prove our answer to the previous problem.
Show that log(N!) ∈ Ω(N log N).
● Recall: log means an unspecified base.
Another Math Problem
Given that N! > (N/2)^(N/2), show that log(N!) ∈ Ω(N log N).
We have that N! > (N/2)^(N/2).
● Taking the log of both sides, we have that log(N!) > log((N/2)^(N/2)).
● Bringing down the exponent, we have that log(N!) > (N/2) log(N/2).
● Discarding the unnecessary constant, we have log(N!) ∈ Ω(N log(N/2)).
● From there, we have that log(N!) ∈ Ω(N log N), since log(N/2) is asymptotically the same as log(N).
In other words, log(N!) grows at least as quickly as N log N.
Last Math Problem
In the previous problem, we showed that log(N!) ∈ Ω(N log N). Now show that N log N ∈ Ω(log(N!)).

Last Math Problem
Show that N log N ∈ Ω(log(N!)). Proof:
● log(N!) = log(N) + log(N-1) + log(N-2) + ... + log(1)
● N log N = log(N) + log(N) + log(N) + ... + log(N)
● Each term of N log N is at least as large as the corresponding term of log(N!), therefore N log N ∈ Ω(log(N!)).
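The two proofs together sandwich log(N!) between (N/2) log(N/2) and N log N, which can be checked numerically (hypothetical class name, using natural logs since the base is unspecified):

```java
public class LogFactorial {
    // log(N!) computed directly as a sum of logs, avoiding overflow.
    public static double logFactorial(int n) {
        double sum = 0;
        for (int i = 2; i <= n; i++) {
            sum += Math.log(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1000;
        double lower = (n / 2.0) * Math.log(n / 2.0);  // from the first proof
        double upper = n * Math.log(n);                // from the second proof
        double lf = logFactorial(n);
        System.out.println(lower <= lf && lf <= upper);  // true
    }
}
```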
Omega and Theta: yellkey.com/out
Given:
● N log N ∈ Ω(log(N!))
● log(N!) ∈ Ω(N log N)
Which of the following can we say?
A. N log N ∈ Θ(log(N!))
B. log(N!) ∈ Θ(N log N)
C. Both A and B
D. Neither
Omega and Theta
Given:
● N log N ∈ Ω(log(N!))    Informally: N log N ≥ log(N!)
● log(N!) ∈ Ω(N log N)    Informally: log(N!) ≥ N log N
Which of the following can we say?
A. N log N ∈ Θ(log(N!))
B. log(N!) ∈ Θ(N log N)
C. Both A and B
D. Neither
Informally: N log N = log(N!), so the answer is C, both A and B.
Summary
We've shown that log(N!) ∈ Θ(N log N).
● In other words, these two functions grow at the same rate asymptotically.
As for why we did this, we will see in a little while...
Theoretical Bounds on Sorting

Sorting
We have shown several sorts to require Θ(N log N) worst case time.
● Can we build a better comparison sorting algorithm? By comparison sort, I mean one that uses e.g. the compareTo method in Java to make decisions.
Let the ultimate comparison sort (TUCS) be the asymptotically fastest possible comparison sorting algorithm, possibly yet to be discovered, and let R(N) be its worst case runtime.
Give the best Ω and O bounds you can for R(N).
It might seem strange to give Ω and O bounds for an algorithm whose details are completely unknown, but you can, I promise!
Sorting
We have shown several sorts to require Θ(N log N) worst case time.
● Can we build a better comparison sorting algorithm? By comparison sort, I mean one that uses e.g. the compareTo method in Java to make decisions.
Let the ultimate comparison sort (TUCS) be the asymptotically fastest possible comparison sorting algorithm, possibly yet to be discovered, and let R(N) be its worst case runtime.
[Figure: number line showing the TUCS worst case Θ runtime bounded between Ω(1) and O(N log N)]
● Worst case run-time of TUCS, R(N), is O(N log N).
  ○ Obvious: Mergesort is Θ(N log N), so R(N) can't be worse!
● Worst case run-time of TUCS, R(N), is Ω(1).
  ○ Obvious: The problem doesn't get easier with N.
  ○ Can we make a stronger statement than Ω(1)?
Sorting
Let TUCS be the asymptotically fastest possible comparison sorting algorithm, possibly yet to be discovered.
● Worst case run-time of TUCS, R(N), is O(N log N). Why?
● Worst case run-time of TUCS, R(N), is also Ω(N).
  ○ Have to at least look at every item.
[Figure: number line showing the TUCS worst case Θ runtime bounded between Ω(N) and O(N log N)]
Sorting
We know that TUCS "lives" between N and N log N.
● Worst case asymptotic runtime of TUCS is between Θ(N) and Θ(N log N).
[Figure: number line showing the TUCS worst case Θ runtime bounded between Ω(N) and O(N log N)]
● Can we make an even stronger statement on the lower bound?
  ○ With a clever argument, yes (as we'll soon see).
    ■ Spoiler alert: It will turn out to be Ω(N log N).
  ○ This lower bound means that across the infinite space of all possible ideas that any human might ever have for sorting using sequential comparisons, NONE has a worst case runtime that is better than Θ(N log N).
The Game of Puppy, Cat, Dog
Suppose we have a puppy, a cat, and a dog, each in an opaque soundproof box labeled A, B, and C. We want to figure out which is which using a scale.
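The game previews the decision-tree argument behind the spoiler above: with N! possible orderings and comparisons that each give at most two useful outcomes, at least ceil(log2(N!)) comparisons are needed in the worst case. A sketch (hypothetical class name; assumes two-outcome weighings):

```java
public class ComparisonLowerBound {
    // N items have N! possible orderings, and each two-outcome
    // comparison at best halves the set of orderings still possible,
    // so at least ceil(log2(N!)) comparisons are needed in the worst case.
    public static int minComparisons(int n) {
        double log2Factorial = 0;
        for (int i = 2; i <= n; i++) {
            log2Factorial += Math.log(i) / Math.log(2);
        }
        return (int) Math.ceil(log2Factorial);
    }

    public static void main(String[] args) {
        // Puppy, cat, dog: 3! = 6 orderings, log2(6) ~ 2.58, so at
        // least 3 weighings are needed in the worst case.
        System.out.println(minComparisons(3));  // 3
    }
}
```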