12_Merge_Sort
EECS 281:
Data Structures
and Algorithms
Lecture 12
Merge Sort
Merge Sort
Data Structures & Algorithms
A Different Idea For Sorting
• Quicksort works by dividing a file into two
parts
– k smallest elements
– n – k largest elements
• Merge Sort combines two ordered files to
make one larger ordered file
3
4
Comparing Quicksort to Merge sort
Algorithm quicksort(array)
partition(array)
quicksort(lefthalf)
quicksort(righthalf)
Algorithm merge_sort(array)
merge_sort(lefthalf)
merge_sort(righthalf)
merge(lefthalf, righthalf)
• Much in common
• Top-down
approach
– Divide work
– Combine work
• Nothing gets
done until
recursive calls
get work done
Important Concerns For Sorting
External Sort
• File to be sorted is on tape or disk
• Items are accessed sequentially or in large blocks
Memory Efficiency
• Sort in place with no extra memory
• Sort in place, but have pointers to or indices
(n items need an additional n pointers or indices)
• Need enough extra memory for an additional
copy of items to be sorted
6
C++ Syntax: Ternary Operator
c[k] = (a[i] <= b[j]) ? a[i++] : b[j++];
1 if (a[i] <= b[j]) {
2 c[k] = a[i];
3 ++i;
4 } // if
5 else {
6 c[k] = b[j];
7 ++j;
8 } // else
is equivalent to:
7
1. conditional
2. do if true
3. do if false
8
Merging Sorted Ranges
1 void mergeAB(Item c[], Item a[], size_t size_a,
2 Item b[], size_t size_b) {
3 size_t i = 0, j = 0;
4 for (size_t k = 0; k < size_a + size_b; ++k) {
5 if (i == size_a)
6 c[k] = b[j++];
7 else if (j == size_b)
8 c[k] = a[i++];
9 else
10 c[k] = (a[i] <= b[j]) ? a[i++] : b[j++];
11 } // for
12 } // mergeAB()
• Append smallest remaining
item from a or b onto c until
all items are in c
• Q(size_a + size_b) time for
both arrays and linked lists
Example of mergeAB()
D D F G HA B
G H B D FA Da b
c
1 23 01 20
31 20 5 64
k = 6: (j = 3) == size_b ? Yes, i = 4
k = 0: (a[i = 0] = A) <= (b[j = 0] = B) ? Yes, i = 1
k = 1: (a[i = 1] = D) <= (b[j = 0] = B) ? No, j = 1
k = 2: (a[i = 1] = D) <= (b[j = 1] = D) ? Yes, i = 2
k = 3: (a[i = 2] = G) <= (b[j = 1] = D) ? No, j = 2
k = 4: (a[i = 2] = G) <= (b[j = 2] = F) ? No, j = 3
size_a = 4
size_b = 3
k = 5: (j = 3) == size_b ? Yes, i = 3
9
10
Top-down Merge Sort (Recursive)
1 void merge_sort(Item a[], size_t left, size_t right) {
2 if (right < left + 2) // base case: 0 or 1 item(s)
3 return;
4 size_t mid = left + (right - left) / 2;
5 merge_sort(a, left, mid); // [left, mid)
6 merge_sort(a, mid, right); // [mid, right)
7 merge(a, left, mid, right);
8 } // merge_sort()
• Prototypical ‘combine and conquer’ algorithm
• Recursively call until sorting array of size 0 or 1
• Then merge sorted lists larger and larger
• Is it safe to use recursion here?
Modified merge()
1 void merge(Item a[], size_t left, size_t mid, size_t right) {
2 size_t n = right - left;
3 vector
4
5 for (size_t i = left, j = mid, k = 0; k < n; ++k) {
6 if (i == mid)
7 c[k] = a[j++];
8 else if (j == right)
9 c[k] = a[i++];
10 else
11 c[k] = (a[i] <= a[j]) ? a[i++] : a[j++];
12 } // for
13
14 copy(begin(c), end(c), &a[left]);
15 } // merge()
11
Top-down Merge Sort
Advantages (compare to Quicksort)
• Fast: O(n log n)
• Stable when a stable merge is used (it’s
always used! See std::stable_sort<>)
• Normally implemented to access data
sequentially
– Does not require random access
– Great for linked lists, external-memory and
parallel sorting
12
Disadvantages
• Best case performance W(n log n)
is slower than some elementary sorts
– Insensitive to input
• Q(n) additional memory, while Quicksort
is in-place
– Also extra data movement to/from copy
• Slower than Quicksort on typical inputs
13
Top-down Merge Sort
14
Bottom-up Merge Sort
1 void merge_sortBU(Item a[], size_t left, size_t right) {
2 for (size_t size = 1; size <= right - left; size *= 2) 3 for (size_t i = left; i <= right - size; i += 2 * size) 4 merge(a, i, i + size, std::min(i + 2 * size, right)); 5 } // merge_sortBU() Prototypical ‘combine and conquer’ algorithm • View original file as n ordered sublists of size 1 • Perform 1-by-1 merges to produce n/2 ordered sublists of size 2 • Perform 2-by-2 merges to produce n/4 ordered sublists of size 4 • … Job Interview Questions Q: In a file with 100M elements, how would you find the most frequent element? Q: What is the time complexity? Q: What is the space complexity? 16 Merge Sort Data Structures & Algorithms Questions for Self-study • Can merge-sort (both versions) be implemented on linked lists? – How will this affect runtime complexity? – Can the merge step be done in-place? • Show that both merge-sorts are stable iff the merge step is stable • Why is the best-case complexity of merge- sort worse than linear? – How can it be improved? 18