CS61B
Lecture 36: The End of Sorting
¡ñ An Intuitive, Analytical, and Empirical look at Radix vs. Comparison
Sorting
¡ñ The Just-In-Time Compiler
¡ñ Radix Sorting Integers
¡ñ Summary
datastructur.es
Intuitive: Radix Sort vs. Comparison Sorting
datastructur.es
Merge Sort Runtime yellkey.com/wall Merge Sort requires ¦¨(N log N) compares.
What is Merge Sort¡¯s runtime on strings of length W?
datastructur.es
Merge Sort Runtime
Merge Sort requires ¦¨(N log N) compares.
What is Merge Sort¡¯s runtime on strings of length W?
¡ñ It depends!
¡ð ¦¨(N log N) if each comparison takes constant time.
¡ö Example: Strings are all different in top character. ¡ð ¦¨(WN log N) if each comparison takes ¦¨(W) time.
¡ö Example: Strings are all equal.
datastructur.es
LSD vs. Merge Sort
The facts.
¡ñ Treating alphabet size as constant, LSD Sort has runtime ¦¨(WN).
¡ñ Merge Sort has runtime between ¦¨(N log N) and ¦¨(WN log N). Which is better? It depends.
¡ñ When might LSD sort be faster?
¡ñ When might Merge Sort be faster?
datastructur.es
LSD vs. Merge Sort (Your Answer)
The facts:
¡ñ Treating alphabet size as constant, LSD Sort has runtime ¦¨(WN).
¡ñ Merge Sort has runtime between ¦¨(N log N) and ¦¨(WN log N). Which is better? It depends.
¡ñ When might LSD sort be faster?
¡ð Intuitively if W < log N. If N is really really big.
¡ð If the Strings are relatively samish.
¡ñ When might Merge Sort be faster?
¡ð If the Strings are very different.
datastructur.es
LSD vs. Merge Sort (My Answer)
The facts:
¡ñ Treating alphabet size as constant, LSD Sort has runtime ¦¨(WN).
¡ñ Merge Sort is between ¦¨(N log N) and ¦¨(WN log N). Which is better? It depends.
¡ñ When might LSD sort be faster?
¡ð Sufficiently large N.
¡ð If strings are very similar to each other.
¡ö Each Merge Sort comparison costs ¦¨(W) time. ¡ñ When might Merge Sort be faster?
¡ð If strings are highly dissimilar from each other. ¡ö Each Merge Sort comparison is very fast.
AAAAAAAAAAAAA.......AB
AAAAAAAAAAAAA.......AA
AAAAAAAAAAAAA.......AQ
...
IUYQWLKJASHLEIUHAD...
LIUHLIUHRGLIUEHWEF...
OZIUHIOHLHLZIEIUHF...
...
datastructur.es
Cost Model: Radix Sort vs. Comparison Sorting
datastructur.es
Alternate Approach: Picking a Cost Model
An alternate approach is to pick a cost model.
¡ñ We¡¯ll use number of characters examined.
¡ñ By ¡°examined¡±, we mean:
¡ð Radix Sort: Calling charAt in order to count occurrences of each character.
¡ð Merge Sort: Calling charAt in order to compare two Strings.
datastructur.es
MSD vs. Mergesort
Suppose we have 100 strings of 1000 characters each.
¡ñ Estimate the total number of characters examined by MSD Radix Sort if all strings are equal.
datastructur.es
MSD vs. Mergesort
Suppose we have 100 strings of 1000 characters each.
¡ñ Estimate the total number of characters examined by MSD Radix Sort if all
strings are equal.
For MSD Radix Sort, in the worst case (all strings equal), every character is examined exactly once. Thus, we have exactly 100,000 total character examinations.
datastructur.es
MSD vs. Mergesort
Suppose we have 100 strings of 1000 characters each.
¡ñ Estimate the total number of characters examined by Merge Sort if all strings are equal.
datastructur.es
MSD vs. Mergesort
Suppose we have 100 strings of 1000 characters each.
¡ñ Estimate the total number of characters examined by Merge Sort if all strings
are equal.
Merging 100 items, assuming equal items results in always picking left: ¡ñ Comparing A[0] to A[50]: 2000 character examinations.
Left half
0 1 2 97 98 99
Right half
AAA...AA
AAA...AA
AAA...AA
...
AAA...AA
AAA...AA
AAA...AA
i=0 j=50
datastructur.es
MSD vs. Mergesort
Suppose we have 100 strings of 1000 characters each.
¡ñ Estimate the total number of characters examined by Merge Sort if all strings
are equal.
Merging 100 items, assuming equal items results in always picking left:
¡ñ Comparing A[0] to A[50]: 2000 character examinations.
¡ñ Comparing A[1] to A[50]: 2000 character examinations.
Left half
0 1 2 97 98 99
Right half
AAA...AA
AAA...AA
AAA...AA
...
AAA...AA
AAA...AA
AAA...AA
i=1 j=50
datastructur.es
MSD vs. Mergesort
Suppose we have 100 strings of 1000 characters each.
¡ñ Estimate the total number of characters examined by Merge Sort if all strings
are equal.
Merging 100 items, assuming equal items results in always picking left:
¡ñ Comparing A[0] to A[50]: 2000 character examinations.
¡ñ Comparing A[1] to A[50]: 2000 character examinations.
¡ñ ... Comparing A[49] to A[50]: 2000 character examinations.
¡ñ Total characters examined: 50 * 2000 = 100000.
Left half
0 1 2
N/2 * 2000 = 1000N
Right half
97 98 99
AAA...AA
AAA...AA
AAA...AA
...
AAA...AA
AAA...AA
AAA...AA
i=49 j=50
datastructur.es
MSD vs. Mergesort
Suppose we have 100 strings of 1000 characters each.
¡ñ Estimate the total number of characters examined by Merge Sort if all strings are equal.
¡ñ From previous slide: Merging N strings of 1000 characters requires N/2 * 2000 = 1000N examinations.
datastructur.es
MSD vs. Mergesort
Suppose we have 100 strings of 1000 characters each.
¡ñ Estimate the total number of characters examined by Merge Sort if all strings are equal.
¡ñ From previous slide: Merging N strings of 1000 characters requires N/2 * 2000 = 1000N examinations.
In total, we must examine approximately 1000N log2 N total characters.
¡ñ 100000 + 50000*2 + 25000 * 4 + ... = ~660,000 characters.
merge(100): 100000
merge(50): 50000 merge(50): 50000
merge(25): 25000 merge(25): 25000 merge(25): 25000 merge(25): 25000
datastructur.es
MSD vs. Mergesort Character Examinations
For N equal strings of length 1000, we found that:
¡ñ MSD radix sort will examine ~1000N characters (For N= 100: 100,000).
¡ñ Merge sort will examine ~1000Nlog2(N) characters (For N=100: 660,000).
If character examination are an appropriate cost model, we¡¯d expect Merge Sort
to be slower by a factor of log2N.
To see if we¡¯re right, we¡¯ll need to do a computational experiment.
datastructur.es
Highly Dissimilar Very Long Strings
Mergesort vs. MSD:
¡ñ Runtimes: MSD < Mergesort < LSD: student analysis.
LSD will be really slow, because it¡¯s going to consider a bunch of meaningless low order characters.
Hfalwieuhflawiuehfblfevauwh Uzifyhaliweyflaizweyfliuaywef Wleiuryalweiuryalbwieuryawe
¡ñ MSD splits problem into roughly 26 subproblems each time (assuming 26 character alphabet). Total character examined will be NK, where K is the depth we get to. K will be roughly log_26(N). Overall runtime N log_26(N).
datastructur.es
¡ñ Mergesort: Number of characters needed to resolve compareTo. If we were
Highly Dissimilar Very Long Strings
Mergesort vs. MSD:
¡ñ Runtimes: MSD < Mergesort < LSD: student analysis.
LSD will be really slow, because it¡¯s going to consider a bunch of meaningless low order characters.
Mergesort is:
MSD is between: theta(N + R) [best case, all done up front] and theta(WN + WR) [worst case]
¡ñ In the random case, we¡¯ll never hit the best case for large N, because of the pigeonhole principle. If you have 27 strings, for example, with an alphabet of size 26, at least the front character must match for two strings.
datastructur.es
Empirical Study: Radix Sort vs. Comparison Sorting
datastructur.es
Computational Experiment Results
Computational experiment for W = 100.
¡ñ MSD and merge sort implementations are highly optimized versions taken from our optional algorithms textbook.
¡ñ Does our data match our runtime hypothesis? Merge MSD
N
Runtime
# chars
10,000
0.05
13,801,600
100,000
0.26
170,780,800
1,000,000
0.87
2,013,286,400
10,000,000
13.8
23,757,632,000
N
Runtime
# chars
10,000
0.05
1,000,000
100,000
1.98
10,000,000
1,000,000
30.21
100,000,000
10,000,000
370.26
1,000,000,000
As we expected, Merge sort considers log2N times as many characters, e.g. log2(10,000,000) = 23.25
datastructur.es
Computational Experiment Results
Computational experiment for W = 100.
¡ñ MSD and merge sort implementations are highly optimized versions taken from our optional algorithms textbook.
¡ñ Does our data match our runtime hypothesis? No! Merge ¡ð Any guesses as to why not? MSD
N
Runtime
# chars
10,000
0.05
13,801,600
100,000
0.26
170,780,800
1,000,000
0.87
2,013,286,400
10,000,000
13.8
23,757,632,000
N
Runtime
# chars
10,000
0.05
1,000,000
100,000
1.98
10,000,000
1,000,000
30.21
100,000,000
10,000,000
370.26
1,000,000,000
datastructur.es
Computational Experiment Results (Your Answers)
Computational experiment for W = 100.
¡ñ MSD and merge sort implementations are highly optimized versions taken from our optional algorithms textbook.
¡ñ Does our data match our runtime hypothesis? No! Why not?
¡ð MSD needs more memory allocation. (I think not).
¡ð MSD across all the strings. Merge, going two adjacent strings at once.
Merge
Could be caching related. MSD
N
Runtime
# chars
10,000
0.05
13,801,600
100,000
0.26
170,780,800
1,000,000
0.87
2,013,286,400
10,000,000
13.8
23,757,632,000
N
10,000
100,000
1,000,000
10,000,000
Runtime
0.05
1.98
30.21
370.26
# chars
1,000,000
10,000,000
100,000,000
1,000,000,000 data
structur.es
Computational Experiment Results (Your Answers)
Computational experiment for W = 100.
¡ñ MSD and merge sort implementations are highly optimized versions taken from our optional algorithms textbook.
¡ñ Does our data match our runtime hypothesis? No! Why not?
Merge
¡ð ¡ð
Mergesort: Uses the built in compareTo method -- this could be an
issue, could be optimized in some crazy way.
There are potentially extra copy operations in MSD.
MSD
N
Runtime
# chars
10,000
0.05
13,801,600
100,000
0.26
170,780,800
1,000,000
0.87
2,013,286,400
10,000,000
13.8
23,757,632,000
N
10,000
100,000
1,000,000
10,000,000
Runtime
0.05
1.98
30.21
370.26
# chars
1,000,000
10,000,000
100,000,000
1,000,000,000 data
structur.es
Computational Experiment Results (Your Answers)
Computational experiment for W = 100.
¡ñ MSD and merge sort implementations are highly optimized versions taken from our optional algorithms textbook.
¡ñ Does our data match our runtime hypothesis? No! Why not?
¡ð There are potentially extra copy operations in MSD. Our cost model
Merge
may not capture everything that is important.
MSD
N
Runtime
# chars
10,000
0.05
13,801,600
100,000
0.26
170,780,800
1,000,000
0.87
2,013,286,400
10,000,000
13.8
23,757,632,000
N
10,000
100,000
1,000,000
10,000,000
Runtime
0.05
1.98
30.21
370.26
# chars
1,000,000
10,000,000
100,000,000
1,000,000,000 data
structur.es
Computational Experiment Results (My Answers)
Computational experiment for W = 100.
¡ñ MSD and merge sort implementations are highly optimized versions taken from our optional algorithms textbook.
¡ñ Does our data match our runtime hypothesis? No! Why not?
¡ð ¡ð
Our cost model isn¡¯t representative of everything that is happening. One particularly thorny issue: The ¡°Just In Time¡± Compiler.
MSD
Merge
N
Runtime
# chars
10,000
0.05
13,801,600
100,000
0.26
170,780,800
1,000,000
0.87
2,013,286,400
10,000,000
13.8
23,757,632,000
N
10,000
100,000
1,000,000
10,000,000
Runtime
0.05
1.98
30.21
370.26
# chars
1,000,000
10,000,000
100,000,000
1,000,000,000 data
structur.es
An Unexpected Factor: The Just-In-Time Compiler
Java¡¯s Just-In-Time Compiler secretly optimizes your code when it runs.
¡ñ The code you write is not necessarily the code that executes!
¡ñ As your code runs, the ¡°interpreter¡± is watching everything that happens.
¡ð If some segment of code is called many times, the interpreter actually studies and re-implements your code based on what it learned by watching WHILE ITS RUNNING (!!).
¡ö ¡ö
Hello.java
Example: Performing calculations whose results are unused. See this video if you¡¯re curious.
Compiler
Interpreter
javac Hello.class java
stuff happens
datastructur.es
JIT Example
The code below creates Linked Lists, 1000 at a time.
¡ñ
Repeating this 500 times yields an interesting result.
public class JITDemo1 {
static final int NUM_LISTS = 1000;
public static void main(String[] args) { for (int i = 0; i < 500; i += 1) {
long startTime = System.nanoTime();
for (int j = 0; j < NUM_LISTS; j += 1) {
LinkedList
long endTime = System.nanoTime();
System.out.println(i + ¡°: ¡° + endTime – startTime); }
}
Create 1000 linked lists and print total time it takes.
}
datastructur.es
JIT Example
The code below creates Linked Lists, 1000 at a time.
¡ñ Repeating this 500 times yields an interesting result.
¡ñ First optimization: Not sure what it does.
¡ñ Second optimization: Stops creating linked lists since we¡¯re not actually
using them.
Warmup
Optimization 2
Optimization 1
datastructur.es
Rerunning Our Empirical Study With No JIT
datastructur.es
Computational Experiments Results
Results with JIT disabled (using the -Xint option).
N
Runtime
# chars
1,000
10,000
100,000
N
Runtime
# chars
1,000
10,000
100,000
Merge MSD
datastructur.es
Computational Experiments Results
Results with JIT disabled (using the -Xint option).
¡ñ Both sorts are MUCH MUCH slower than before.
¡ñ Merge sort is slower than MSD (though not by as much as we predicted).
¡ñ What this tells us: The JIT was somehow able to massively optimize the
compareTo calls.
¡ð Makes some intuitive sense: Comparing ¡°AAA…A¡± to ¡°AAA…A¡± over
and over is redundant. I have no idea what it did specifically.
N
Runtime
# chars
1,000
0.05
1,000,800
10,000
1.41
13,801,600
100,000
16.58
170,780,800
N
Runtime
# chars
1,000
0.04
100,000
10,000
0.4
1,000,000
100,000
5.90
10,000,000
Merge
MSD
datastructur.es
… So Which is Better? MSD or MergeSort?
We showed that if the JIT is enabled, merge sort is much faster for the case of
equal strings, and slower if JIT is disabled.
¡ñ Since JIT is usually on, I¡¯d say merge sort is better for this case.
Many other possible cases to consider:
¡ñ Almost equal strings (maybe the trick used by the JIT won¡¯t work?).
¡ñ Randomized strings.
¡ñ Real world data from some dataset of interest.
See code in lectureCode repo if you want to try running experiments yourself.
¡ñ In real world applications, you¡¯d profile different implementations on real data and pick the best one.
datastructur.es
Bottom Line: Algorithms Can Be Hard to Compare
Comparing algorithms that have the same order of growth is challenging.
¡ñ Have to perform computational experiments.
¡ñ In modern programming environments, experiments can be tricky due to
optimizations like the JIT in Java.
Note: There¡¯s always the chance that some small optimization to an algorithm can make it significantly faster.
¡ñ Example: Change to Quicksort suggested by Vladimir Yaroslavskiy that we mentioned briefly in the quicksort lecture.
datastructur.es
JIT Compilers Are Always Evolving
The JIT is a fantastically complex and important piece of code.
¡ñ Active area of research and development in the field of compilers.
¡ñ The old JIT compiler called C2 is so complicated that its code base is AFAIK
being abandoned: ¡°However, C2 has been delivering diminishing returns in recent years and no major improvements have been implemented in the compiler in the last several years. Not only that, but the code in C2 has become very hard to maintain and extend, and it is very hard for any new engineer to get up to speed with the codebase, which is written in a specific dialect of C++.¡± (from this site)
¡ñ If you like this stuff, try taking CS164 and maybe even try to get involved in compiler research.
datastructur.es
Radix Sorting Integers (61C Preview)
datastructur.es
Sorting Integers
Wow videos looked like trash in 2007.
datastructur.es
Linear Time Sorting
As we¡¯ve seen, estimating radix sort vs. comparison sort performance is very hard.
¡ñ But in the very large N limit, it¡¯s easy. Radix sort is simply faster!
¡ð Treating alphabet size as constant, LSD Sort has runtime ¦¨(WN).
¡ð Comparison sorts have runtime ¦¨(N log N) in the worst case.
datastructur.es
Linear Time Sorting
As we¡¯ve seen, estimating radix sort vs. comparison sort performance is very hard.
¡ñ But in the very large N limit, it¡¯s easy. Radix sort is simply faster!
¡ð Treating alphabet size as constant, LSD Sort has runtime ¦¨(WN).
¡ð Comparison sorts have runtime ¦¨(N log N) in the worst case.
Issue: We don¡¯t have a charAt method for integers.
¡ñ How would you LSD radix sort an array of integers?
datastructur.es
Linear Time Sorting (Your Answers)
As we¡¯ve seen, estimating radix sort vs. comparison sort performance is very hard.
¡ñ But in the very large N limit, it¡¯s easy. Radix sort is simply faster!
¡ð Treating alphabet size as constant, LSD Sort has runtime ¦¨(WN).
¡ð Comparison sorts have runtime ¦¨(N log N) in the worst case.
Issue: We don¡¯t have a charAt method for integers.
¡ñ How would you LSD radix sort an array of integers?
¡ð Mods and floorMods and division operations and whatever ¡ú write
yourself a getDigit method for integer.
¡ö This approach will be faster and more general.
¡ð Make them into a String. We get an alphabet size 10 String. ¡ö REductions style approach.
datastructur.es
Linear Time Sorting (My Answer)
As we¡¯ve seen, estimating radix sort vs. comparison sort performance is very hard.
¡ñ But in the very large N limit, it¡¯s easy. Radix sort is simply faster!
¡ð Treating alphabet size as constant, LSD Sort has runtime ¦¨(WN).
¡ð Comparison sorts have runtime ¦¨(N log N) in the worst case.
Issue: We don¡¯t have a charAt method for integers.
¡ñ How would you LSD radix sort an array of integers?
¡ð Could convert into a String and treat as a base 10 number. Since
maximum Java int is 2,000,000,000, W is also 10.
¡ð Could modify LSD radix sort to work natively on integers.
¡ö Instead of using charAt, maybe write a helper method like
getDthDigit(int N, int d). Example: getDthDigit(15009, 2) = 5.
datastructur.es
LSD Radix Sort on Integers
Note: There¡¯s no reason to stick with base 10!
¡ñ Could instead treat as a base 16, base 256, base 65536 number.
Example: 512,312 in base 16 is a 5 digit number:
¡ñ 51231210 = (7 x 164) + (13 x 163) + (1 x 162) + (3 x 161) + (8 x 160)
Note this digit is greater than 9! That¡¯s OK, because we¡¯re in base 16.
Example: 512,312 in base 256 is a 3 digit number:
¡ñ 51231210 = (7 x 2562) + (209 x 2561) + (56 x 2560)
Note these digit are greater than 9! That¡¯s OK, because we¡¯re in base 256.
datastructur.es
Relationship Between Base and Max # Digits
For Java integers:
¡ñ R=10, treat as a base 10 number. Up to 10 digits.
¡ñ R=16, treat as a base 16 number. Up to 8 digits.
¡ñ R=256, treat as a base 256 number. Up to 4 digits.
¡ñ R=65336, treat as a base 65536 number. Up to 2 digits.
¡ñ R=2147483647, treat as a base 2147483647 number (this is equivalent to
counting sort). Has exactly 1 digit.
Interesting fact: Runtime depends on the alphabet size.
¡ñ As we saw with city sorting last time, R = 2147483647 will result in a very
slow radix sort (since it¡¯s just counting sort).
datastructur.es
Another Computational Experiment
Results of a computational experiment:
¡ñ Treating as a base 256 number (4 digits), LSD radix sorting integers easily defeats Quicksort.
Sort
Base
# of Digits
Runtime
Java QuickSort
N/A
N/A
10.9 seconds
LSD Radix Sort
2^4 = 16
8
3.6 seconds
LSD Radix Sort
2^8 = 256
4
2.28 seconds
LSD Radix Sort
2^16 = 65536
2
3.66 seconds
LSD Radix Sort
2^30 = 1073741824
2
20 seconds
Sorting 100,000,000 integers
datastructur.es
Sorting Summary
datastructur.es
Sorting Landscape
Below, we see the landscape of the sorting algorithms we¡¯ve studied.
¡ñ Three basic flavors: Comparison, Alphabet, and Radix based.
¡ñ Each can be useful in different circumstances, but the important part was
the analysis and the deep thought!
¡ð Hoping to teach you how to approach problems in general.
Comparison Based Sorting Algorithms:
If heapify first Heapsort
Radix Sorting Algorithms: (require a sorting subroutine)
Counting
Selection
Merge
LSD
MSD
Partition
Insertion
If insert into BST, equiv. to
Small-Alphabet (e.g. Integer) Sorting Algorithms: Counting
datastructur.es
Sorting vs. Searching
We¡¯ve now concluded our study of the ¡°sort problem.¡±
¡ñ During the data structures part of the class, we studied what we called the ¡°search problem¡±: Retrieve data of interest.
¡ñ There are some interesting connections between the two.
Name
Storage Operation(s)
Primary Retrieval Operation
Retrieve By:
List
add(key)
insert(key, index)
get(index)
index
Map
put(key, value)
get(key)
key identity
Set
add(key)
containsKey(key)
key identity
PQ
add(key)
getSmallest()
key order (a.k.a. key size)
Disjoint Sets
connect(int1, int2)
isConnected(int1, int2)
two int values
Partial list of search problem data structures.
datastructur.es
Search-By-Key-Identity Data Structures:
2-3 Tree
RedBlack
Set
Map
Separate Chaining
Searches using compareTo() Analogous to Comparison-Based
Searches using hashCode() and equals() Roughly Analogous to Integer Sorting
Tries
Searches digit-by-digit
Roughly Analogous to Radix Sorting
Comparison Based Sorting Algorithms:
If heapify first Heapsort
Selection
Merge
Partition
Insertion
If insert into BST, equiv. to
Radix Sorting Algorithms: (require a sorting subroutine)
Counting
LSD
MSD
Small-Alphabet (e.g. Integer) Sorting Algorithms: Counting
datastructur.es
Going Even Further
There¡¯s plenty more to explore!
Many of these ideas can be mixed and matched with others. Examples:
¡ñ What if we use quicksort as a subroutine for MSD radix sort instead of counting sort?
¡ñ Implementing the comparable interface means an object can be stored in our compareTo-based data structures (e.g. TreeSet), or sorted with our comparison based sorts. Is there a single equivalent interface that would allow storage in a trie AND radix sorting? What would that interface look like?
¡ñ If an object has both digits AND is comparable, could we somehow use an LLRB to improve radix sort in some way?
datastructur.es
As an Example of Mixing and Matching [came up in Live Q&A]
Modern Java Hash maps are quite interesting in how they store buckets.
¡ñ Small buckets are just linked lists (doesn¡¯t really help to have theta(log(N)) performance for your bucket vs theta(1) if the number of items is small).
¡ñ Once a bucket gets kind of full, it converts the bucket into a Tree of the items in the HashMap.
¡ð If the items are comparable, this is trivial, you just insert them all into a red black tree.
¡ð If the items are not comparable, then we use the HashCode is the value upon which to compare, e.g. root has some hashCode, items on the left all have smaller hashCodes, items on the right have larger hashCodes.
¡ñ QUESTIONS:
¡ð Why not just do a tree? Trees are fast for small AND large buckets.
¡ð Why bother using the item¡¯s Comparable method instead of hashCode? datastructur.es