
Announcements
We are now in Phase 3 of the course: Algorithms and Software Engineering. Lectures in this phase:
● Algorithms.
● 3 software engineering lectures (we already did #1).
Optional textbook for software engineering lectures: "A Philosophy of Software Design" by John Ousterhout.
datastructur.es

Announcements
Phase 3 of the course starts today: Algorithms and Software Engineering. Assignments in this phase:
● Finish Gitlet.
  ○ First chance at software engineering and a little design.
● Project 3: Build Your Own World.
  ○ Second chance to do some software engineering.
  ○ Lots more design practice.
  ○ You'll decide your own task and approach.
  ○ Partner project, so try to find a partner soon.
datastructur.es

CS61B, 2021
Lecture 29: Sorting I
● The Sorting Problem
● Selection Sort, Heapsort
● Mergesort
● Insertion Sort
● Shell's Sort (Extra)
datastructur.es

Our Major Focus for Several Lectures: Sorting
For many of our remaining lectures, we'll discuss the sorting problem.
● Informally: Given items, put them in order.
This is a useful task in its own right. Examples:
● Equivalent items are adjacent, allowing rapid duplicate finding.
● Items are in increasing order, allowing binary search.
● Can be converted into various balanced data structures (e.g. BSTs, KdTrees).
Sorting also provides interesting case studies for how to approach basic computational problems.
● Some of the solutions will involve using data structures we've studied.
datastructur.es

Sorting – Definitions (from Knuth's TAOCP)
An ordering relation < for keys a, b, and c has the following properties:
● Law of Trichotomy: Exactly one of a < b, a = b, b < a is true.
● Law of Transitivity: If a < b and b < c, then a < c.
An ordering relation with the properties above is also known as a "total order".
A sort is a permutation (re-arrangement) of a sequence of elements that puts the keys into non-decreasing order relative to a given ordering relation:
● x1 ≤ x2 ≤ x3 ≤ ... ≤ xN
datastructur.es

Example: String Length
Example of an ordering relation: the length of strings.
● Law of Trichotomy: Exactly one of the following is true:
  ○ len(a) < len(b)
  ○ len(a) = len(b)
  ○ len(b) < len(a)
● Law of Transitivity: If len(a) < len(b) and len(b) < len(c), then len(a) < len(c).
Two valid sorts for ["cows", "get", "going", "the"] for the ordering relation above:
● ["the", "get", "cows", "going"]
● ["get", "the", "cows", "going"]
Under this relation, "the" is considered = to "get", since len("the") = len("get"). (Here = means equal under the relation, not the Java idea of .equals.)
datastructur.es

Java Note
Ordering relations are typically given in the form of compareTo or compare methods.

import java.util.Comparator;

public class LengthComparator implements Comparator<String> {
    public int compare(String x, String b) {
        return x.length() - b.length();
    }
}
Note that with respect to the order defined by the method above, "the" = "get".
● This usage of = is not the same as the equals given by the String method.
datastructur.es
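Not from the slides: a minimal sketch of how the comparator above might be used to sort an array, assuming the LengthComparator class is compiled alongside it; the class name SortDemo and the sample data are made up for this illustration.

import java.util.Arrays;

public class SortDemo {
    public static void main(String[] args) {
        String[] words = {"cows", "get", "going", "the"};
        // Sort by length using the LengthComparator defined above.
        Arrays.sort(words, new LengthComparator());
        // Prints [get, the, cows, going]. Since "get" and "the" are equal under
        // this relation, ["the", "get", ...] would be an equally valid sort;
        // Arrays.sort happens to be stable, so the two keep their input order.
        System.out.println(Arrays.toString(words));
    }
}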

Sorting: An Alternate Viewpoint
An inversion is a pair of elements that are out of order with respect to <.
● Example: 0 1 1 2 3 4 8 6 9 5 7 has 6 inversions out of 55 max: 8-6, 8-5, 8-7, 6-5, 9-5, 9-7.
Another way to state the goal of sorting:
● Given a sequence of elements with Z inversions.
● Perform a sequence of operations that reduces inversions to 0.
datastructur.es

Performance Definitions
Characterizations of the runtime efficiency of an algorithm are sometimes called the time complexity of the algorithm. Example:
● Dijkstra's has time complexity O(E log V).
Characterizations of the "extra" memory usage of an algorithm are sometimes called the space complexity of the algorithm.
● Dijkstra's has space complexity Θ(V) (for queue, distTo, edgeTo).
  ○ Note that the graph takes up space Θ(V + E), but we don't count this as part of the space complexity of Dijkstra's, since the graph itself already exists and is an input to Dijkstra's.
datastructur.es

Selection Sort and Heapsort
datastructur.es

Selection Sort
We've seen this already.
● Find smallest item.
● Swap this item to the front and 'fix' it.
● Repeat for unfixed items until all items are fixed.
● Demo: https://goo.gl/g14Cit
Sort properties:
● Θ(N²) time if we use an array (or similar data structure).
Seems inefficient: We look through the entire remaining array every time to find the minimum.
datastructur.es

Naive Heapsort: Leveraging a Max-Oriented Heap
Idea: Instead of rescanning the entire array looking for the minimum, maintain a heap so that getting the minimum is fast!
For reasons that will become clear soon, we'll use a max-oriented heap.
● A min heap would work as well, but wouldn't be able to take advantage of the fancy trick in a few slides.
Naive Heapsort Demo: https://goo.gl/EZWwSJ
datastructur.es

Naive heapsorting N items:
● Insert all items into a max heap, and discard input array. Create output array.
● Repeat N times:
  ○ Delete largest item from the max heap.
  ○ Put largest item at the end of the unused part of the output array.

Naive Heapsort Runtime: http://yellkey.com/couple
Naive heapsorting N items:
● Insert all items into a max heap, and discard input array. Create output array.
● Repeat N times:
  ○ Delete largest item from the max heap.
  ○ Put largest item at the end of the unused part of the output array.
What is the TOTAL runtime of naive heapsort?
A. Θ(N)
B. Θ(N log N)
C. Θ(N²), but faster than selection sort.
datastructur.es

Heapsort Runtime Analysis
Use the magic of the heap to sort our data.
● Getting items into the heap: O(N log N) time.
● Selecting largest item: Θ(1) time.
● Removing largest item: O(log N) for each removal.
Overall runtime is O(N log N) + Θ(N) + O(N log N) = O(N log N).
● Far better than selection sort!
Memory usage is Θ(N) to build the additional copy of all of our data.
● Worse than selection sort, but probably no big deal (??).
● Can eliminate this extra memory cost with the same fancy trickery.
datastructur.es

In-place Heapsort
Alternate approach: treat the input array as a heap!
● Rather than inserting into a new array of length N + 1, use a process known as "bottom-up heapification" to convert the array into a heap.
  ○ To bottom-up heapify, just sink nodes in reverse level order.
● Avoids need for extra copy of all data.
● Once heapified, the algorithm is almost the same as naive heapsort.
● For this algorithm we don't leave spot 0 blank.
Example: heapification turns [32, 15, 2, 17, 19, 26, 41, 17, 17] into the max heap [41, 19, 32, 17, 15, 26, 2, 17, 17].
In-place heap sort: Demo
datastructur.es
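Not from the slides: a minimal Java sketch of in-place heapsort as described above (bottom-up heapification by sinking nodes in reverse level order, then repeatedly swapping the max into place). The class and helper names (InPlaceHeapSort, sink, swap) are made up for this illustration.

import java.util.Arrays;

public class InPlaceHeapSort {
    /** Sorts a in ascending order using in-place heapsort (0-indexed heap, spot 0 used). */
    public static void sort(int[] a) {
        int n = a.length;
        // Bottom-up heapification: sink nodes in reverse level order.
        for (int i = n / 2 - 1; i >= 0; i -= 1) {
            sink(a, i, n);
        }
        // Repeatedly move the max to the end of the unsorted region, then restore the heap.
        for (int end = n - 1; end > 0; end -= 1) {
            swap(a, 0, end);
            sink(a, 0, end);
        }
    }

    /** Sinks the item at index k within the first `size` entries of a (max-heap order). */
    private static void sink(int[] a, int k, int size) {
        while (2 * k + 1 < size) {
            int child = 2 * k + 1;                       // left child
            if (child + 1 < size && a[child + 1] > a[child]) {
                child += 1;                              // right child is larger
            }
            if (a[k] >= a[child]) {
                break;
            }
            swap(a, k, child);
            k = child;
        }
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    public static void main(String[] args) {
        int[] a = {32, 15, 2, 17, 19, 26, 41, 17, 17};
        sort(a);
        System.out.println(Arrays.toString(a));  // [2, 15, 17, 17, 17, 19, 26, 32, 41]
    }
}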
In-place Heapsort Runtime: http://yellkey.com/response
Use the magic of the heap to sort our data.
● Bottom-up Heapification: O(???) time.
● Selecting largest item: Θ(1) time.
● Removing largest item: O(log N) for each removal.
Give the time complexity of in-place heapsort in big O notation.
A. O(N)
B. O(N log N)
C. O(N²)
datastructur.es

In-place Heapsort Runtime
Use the magic of the heap to sort our data.
● Bottom-up Heapification: O(N log N) time.
● Selecting largest item: Θ(1) time.
● Removing largest item: O(log N) for each removal.
Give the time complexity of in-place heapsort in big O notation.
A. O(N log N)
Bottom-up heapification is N sink operations, each taking no more than O(log N) time, so the overall runtime for heapification is O(N log N).
● Extra for experts: show that bottom-up heapification is Θ(N) in the worst case.
● More extra for experts: show heapsort is Θ(N log N) in the worst case.
datastructur.es

In-place Heapsort: http://yellkey.com/tell
What is the memory complexity of Heapsort?
● Also called "space complexity".
A. Θ(1)   B. Θ(log N)   C. Θ(N)   D. Θ(N log N)   E. Θ(N²)
datastructur.es

In-place Heapsort: http://yellkey.com/tell
What is the memory complexity of Heapsort?
● Also called "space complexity".
A. Θ(1)   B. Θ(log N)   C. Θ(N)   D. Θ(N log N)   E. Θ(N²)
Incorrect answer given by a student during lecture: Θ(N), for creating N spots for a min heap.
● Actually I'm not: I'm reusing the same array that I was given. In other words, the algorithm is in-place.
datastructur.es

In-place Heapsort
What is the memory complexity of Heapsort?
● Also called "space complexity".
A. Θ(1)   B. Θ(log N)   C. Θ(N)   D. Θ(N log N)   E. Θ(N²)
The only extra memory we need is a constant number of instance variables, e.g. size.
● Unimportant caveat: If we employ recursion to implement various heap operations, space complexity is Θ(log N) due to the need to track recursive calls. The difference between Θ(log N) and Θ(1) space is effectively nothing.
datastructur.es

Sorts So Far
                      Best Case Runtime   Worst Case Runtime   Space   Demo   Notes
Selection Sort        Θ(N²)               Θ(N²)                Θ(1)    Link
Heapsort (in place)   Θ(N)*               Θ(N log N)           Θ(1)    Link   Bad cache (61C) performance.
*: An array of all duplicates yields linear runtime for heapsort.
datastructur.es

Mergesort
datastructur.es

Mergesort
We've seen this one before as well. Mergesort: Demo
● Split items into 2 roughly even pieces.
● Mergesort each half (steps not shown; this is a recursive algorithm!).
● Merge the two sorted halves to form the final result.
Time complexity, analysis from previous lecture: Θ(N log N) runtime.
● Space complexity with aux array: costs Θ(N) memory.
Also possible to do in-place mergesort, but the algorithm is very complicated, and runtime performance suffers by a significant constant factor.
datastructur.es
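Not from the slides: a minimal Java sketch of top-down mergesort with a Θ(N) auxiliary array, matching the description above. The class name MergeSortDemo is made up for this illustration.

import java.util.Arrays;

public class MergeSortDemo {
    /** Sorts a in ascending order using top-down mergesort with an auxiliary array. */
    public static void sort(int[] a) {
        int[] aux = new int[a.length];
        sort(a, aux, 0, a.length - 1);
    }

    private static void sort(int[] a, int[] aux, int lo, int hi) {
        if (lo >= hi) {
            return;
        }
        int mid = lo + (hi - lo) / 2;
        sort(a, aux, lo, mid);        // sort left half
        sort(a, aux, mid + 1, hi);    // sort right half
        merge(a, aux, lo, mid, hi);   // merge the two sorted halves
    }

    /** Merges the sorted ranges a[lo..mid] and a[mid+1..hi]. */
    private static void merge(int[] a, int[] aux, int lo, int mid, int hi) {
        System.arraycopy(a, lo, aux, lo, hi - lo + 1);
        int i = lo, j = mid + 1;
        for (int k = lo; k <= hi; k += 1) {
            if (i > mid) {
                a[k] = aux[j++];
            } else if (j > hi) {
                a[k] = aux[i++];
            } else if (aux[j] < aux[i]) {
                a[k] = aux[j++];
            } else {
                a[k] = aux[i++];      // take from the left on ties, keeping the merge stable
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {32, 15, 2, 17, 19, 26, 41, 17, 17};
        sort(a);
        System.out.println(Arrays.toString(a));  // [2, 15, 17, 17, 17, 19, 26, 32, 41]
    }
}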
Sorts So Far
                      Best Case Runtime   Worst Case Runtime   Space   Demo   Notes
Selection Sort        Θ(N²)               Θ(N²)                Θ(1)    Link
Heapsort (in place)   Θ(N)*               Θ(N log N)           Θ(1)    Link   Bad cache (61C) performance.
Mergesort             Θ(N log N)          Θ(N log N)           Θ(N)    Link   Faster than heapsort.
*: An array of all duplicates yields linear runtime for heapsort.
datastructur.es

Insertion Sort
datastructur.es

Insertion Sort
General strategy:
● Starting with an empty output sequence.
● Add each item from input, inserting into output at right point.
Naive approach, build entirely new output: Demo (Link)
● Input:  32 15 2 17 19 26 41 17 17
● Output: 2 15 17 17 17 19 26 32 41
datastructur.es

Insertion Sort
General strategy:
● Starting with an empty output sequence.
● Add each item from input, inserting into output at right point.
For the naive approach, if the output sequence contains k items, the worst cost to insert a single item is k.
● Might need to move everything over.
More efficient method:
● Do everything in place using swapping: Demo (Link)
datastructur.es

In-place Insertion Sort
Two more examples (per-step traces omitted; see demo):
● P O T A T O: sorted with 7 swaps.
● S O R T E X A M P L E: sorted with 36 swaps.
In the animation: purple is the element being moved left (with swaps), black marks elements that got swapped with purple, and grey elements are not considered this iteration.
datastructur.es

Insertion Sort Runtime: http://yellkey.com/arrive
What is the runtime of insertion sort?
A. Ω(1), O(N)
B. Ω(N), O(N)
C. Ω(1), O(N²)
D. Ω(N), O(N²)
E. Ω(N²), O(N²)
datastructur.es

Insertion Sort Runtime
What is the runtime of insertion sort?
A. Ω(1), O(N)
B. Ω(N), O(N)
C. Ω(1), O(N²)
D. Ω(N), O(N²)  (answer)
E. Ω(N²), O(N²)
You may recall Ω is not "best case". So technnnniically you could also say Ω(1).
datastructur.es

Picking the Best Sort: http://yellkey.com/machine
Suppose we do the following:
● Read 1,000,000 integers from a file into an array of length 1,000,000.
● Mergesort these integers.
● Select one integer randomly and change it.
● Sort using algorithm X of your choice.
Which sorting algorithm would be the fastest choice for X?
A. Selection Sort
B. Heapsort
C. Mergesort
D. Insertion Sort
datastructur.es

Observation: Insertion Sort on Almost Sorted Arrays
For arrays that are almost sorted, insertion sort does very little work.
● Left array: 5 inversions, so only 5 swaps.
● Right array: 3 inversions, so only 3 swaps.
datastructur.es

Picking the Best Sort (Poll Everywhere)
Suppose we do the following:
● Read 1,000,000 integers from a file into an array of length 1,000,000.
● Mergesort these integers.
● Select one integer randomly and change it.
● Sort using algorithm X of your choice.
● In the worst case, we have 999,999 inversions: Θ(N) inversions.
Which sorting algorithm would be the fastest choice for X? Worst case run-times:
A. Selection Sort: Θ(N²)
B. Heapsort: Θ(N log N)
C. Mergesort: Θ(N log N)
D. Insertion Sort: Θ(N)
datastructur.es

Insertion Sort Sweet Spots
On arrays with a small number of inversions, insertion sort is extremely fast.
● One exchange per inversion (and number of comparisons is similar). Runtime is Θ(N + K) where K is the number of inversions.
● Define an almost sorted array as one in which the number of inversions ≤ cN for some c. Insertion sort is excellent on these arrays.
Less obvious: For small arrays (N < 15 or so), insertion sort is fastest.
● More of an empirical fact than a theoretical one.
● Theoretical analysis beyond scope of the course.
● Rough idea: Divide and conquer algorithms like heapsort / mergesort spend too much time dividing, but insertion sort goes straight to the conquest.
● The Java implementation of Mergesort does this (Link).
datastructur.es
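Not from the slides: a minimal Java sketch of the in-place, swapping version of insertion sort described above; the class name InsertionSortDemo is made up for this illustration. Each swap fixes exactly one inversion, which is why the runtime is Θ(N + K) for K inversions.

import java.util.Arrays;

public class InsertionSortDemo {
    /** Sorts a in ascending order using in-place insertion sort (swapping version). */
    public static void sort(int[] a) {
        for (int i = 1; i < a.length; i += 1) {
            // Swap a[j] leftwards until it is no smaller than its left neighbor.
            // Each swap fixes exactly one inversion, so total swaps = number of inversions.
            for (int j = i; j > 0 && a[j] < a[j - 1]; j -= 1) {
                int tmp = a[j];
                a[j] = a[j - 1];
                a[j - 1] = tmp;
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {32, 15, 2, 17, 19, 26, 41, 17, 17};
        sort(a);
        System.out.println(Arrays.toString(a));  // [2, 15, 17, 17, 17, 19, 26, 32, 41]
    }
}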
Sorts So Far
                            Best Case Runtime   Worst Case Runtime   Space   Demo   Notes
Selection Sort              Θ(N²)               Θ(N²)                Θ(1)    Link
Heapsort (in place)         Θ(N)*               Θ(N log N)           Θ(1)    Link   Bad cache (61C) performance.
Mergesort                   Θ(N log N)          Θ(N log N)           Θ(N)    Link   Fastest of these.
Insertion Sort (in place)   Θ(N)                Θ(N²)                Θ(1)    Link   Best for small N or almost sorted.
*: An array of all duplicates yields linear runtime for heapsort.
datastructur.es

Shell's Sort (Extra) (Not on Exam)
datastructur.es

Optimizing Insertion Sort: Shell's Sort
Big idea: Fix multiple inversions at once.
● Instead of comparing adjacent items, compare items that are one stride length h apart.
● Start with a large stride, and decrease towards 1.
● Example: h = 7, 3, 1.

Optimizing Insertion Sort: Shell's Sort
h = 1 is just insertion sort.
Example trace for S O R T E X A M P L E (per-step animation omitted):
● 7-sorting: 3 swaps; several single swaps fix more than one inversion.
● 3-sorting: 5 swaps.
● 1-sorting: 6 swaps, yielding A E E L M O P R S T X.
Total swaps: 14 (vs. 31)
Example from Algorithms 4th Edition

Shell's Sort: Generalization and Performance
h=1 is just normal insertion sort.
● By using large strides first, we fix most of the inversions.
We used 7, 3, 1. Can generalize to strides of the form 2^k – 1, from some k down to 1.
● Requires Θ(N^1.5) time in the worst case (see CS170).
● Other stride patterns can be faster.
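Not from the slides: a minimal Java sketch of Shell's sort using the 2^k – 1 stride sequence (..., 7, 3, 1) discussed above; the class name ShellSortDemo is made up for this illustration. A different stride pattern would only change how h is initialized and decreased.

import java.util.Arrays;

public class ShellSortDemo {
    /** Sorts a in ascending order using Shell's sort with strides of the form 2^k - 1. */
    public static void sort(int[] a) {
        int n = a.length;
        // Largest stride of the form 2^k - 1 that is less than n (e.g. 7 for n = 9 or 11).
        int h = 1;
        while (2 * h + 1 < n) {
            h = 2 * h + 1;
        }
        // h-sort the array for each stride, ending with h = 1 (plain insertion sort).
        for (; h >= 1; h = (h - 1) / 2) {
            for (int i = h; i < n; i += 1) {
                // Insertion sort on the subsequence of items h apart.
                for (int j = i; j >= h && a[j] < a[j - h]; j -= h) {
                    int tmp = a[j];
                    a[j] = a[j - h];
                    a[j - h] = tmp;
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {32, 15, 2, 17, 19, 26, 41, 17, 17};
        sort(a);
        System.out.println(Arrays.toString(a));  // [2, 15, 17, 17, 17, 19, 26, 32, 41]
    }
}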

Sorts So Far
                            Best Case Runtime   Worst Case Runtime   Space   Demo   Notes
Selection Sort              Θ(N²)               Θ(N²)                Θ(1)    Link
Heapsort (in place)         Θ(N)*               Θ(N log N)           Θ(1)    Link   Bad cache (61C) performance.
Mergesort                   Θ(N log N)          Θ(N log N)           Θ(N)    Link   Fastest of these.
Insertion Sort (in place)   Θ(N)                Θ(N²)                Θ(1)    Link   Best for small N or almost sorted.
Shell's Sort                Θ(N)                Ω(N log N), O(?)     Θ(1)    N/A    Rich theory!
*: An earlier version of this slide assumed that Heapsort was always given an array with no duplicate elements. If we omit this assumption, heapsort's best case is Θ(N).
datastructur.es

Citations
Lego man:
http://thetechnicgear.com/wp-content/uploads/2014/02/sorting-lego.jpg
Keynote Animations of Algorithms courtesy of Kevin Wayne
datastructur.es