Sorting
Dr Timothy Kimber
January 2018
The Sorting Problem
Sorting data is one of the most thoroughly explored computing
problems.
Problem (Sort)
Input: a sequence A of values ⟨a1, a2, . . . , aN⟩
Output: a permutation (reordering) ⟨a′1, a′2, . . . , a′N⟩ of A
such that a′1 ≤ a′2 ≤ · · · ≤ a′N
Sorting is an important problem. It is part of the solution to many
other problems.
Understanding the complexity of sorting algorithms helps design good
solutions to these other problems.
Algorithms (580) Sorting January 2018 2 / 32
Incremental Sorting Design
Incremental Sorting
The incremental approach of Simple Search can be applied to sorting.
Proceed from left
Gradually grow a sorted region (note loop invariant)
EXERCISE
Invent an incremental sorting algorithm.
There are two options:
1 Add the next element a[i] to the sorted region
2 Add the lowest element outside the sorted region
Option 1 leads to the Insertion Sort algorithm
Insertion Sort
Insertion Sort divides a into a sorted part, initially just a[0], and the
remaining unsorted part
Elements from the unsorted part are then inserted into the sorted part
Insertion Sort (Input: sequence a = [a0, . . . , aN−1])
    For i = 1 to N-1                      // grow the sorted region a[0,..,i]
        next = a[i]                       // save: a[i] may be overwritten
        j = i
        While j > 0 and next < a[j-1]     // use invariant: a[0,..,i-1] is sorted
            a[j] = a[j-1]
            j = j - 1
        EndWhile
        a[j] = next
    EndFor
The sorted region can be initialised to contain a[0]
Do not need to compare next with all a[0,..,i-1] (sorted)
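A runnable Python sketch of the pseudocode above (the function name is mine; the slides specify only pseudocode):

```python
def insertion_sort(a):
    """Sort list a in place, mirroring the pseudocode above."""
    for i in range(1, len(a)):                 # a[0..i-1] is the sorted region
        next_val = a[i]                        # save a[i]; it may be overwritten
        j = i
        while j > 0 and next_val < a[j - 1]:   # shift larger elements right
            a[j] = a[j - 1]
            j -= 1
        a[j] = next_val                        # insert into its correct slot
    return a
```

For example, `insertion_sort([5, 2, 4, 6, 1, 3])` returns `[1, 2, 3, 4, 5, 6]`.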
Time Complexity
What is the worst case input?
What is the best case input?
What is the time complexity in the best and worst cases?
Worst Case
Running time of Insertion Sort has two dimensions:
Number of insertions
‘Size’ of insertion
Informally: both dimensions are Θ(N), so T(N) = Θ(N^2)
More formally, the total number of iterations of the inner loop is
t(N) = 1 + 2 + · · · + (N − 1)
     = ((N − 1) + 1) × (N − 1) / 2
     = (N^2 − N) / 2
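A quick numerical check of this closed form (the helper name is mine):

```python
def insertion_sort_worst_case_iterations(N):
    """Total inner-loop iterations on worst-case input: 1 + 2 + ... + (N-1)."""
    return sum(range(1, N))

# agrees with the closed form (N^2 - N) / 2 for every N
```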
So, the worst case time complexity is aN^2 + bN + c = Θ(N^2)
Best Case
In the best case
t(N) = N − 1
So, T(N) = aN + b = Θ(N)
Other Properties
Insertion Sort works in place (space complexity is Θ(1))
Assuming randomised input, the average case is T(N) = Θ(N^2)
For any input, T(N) = O(N^2)
Divide and Conquer
Will a divide and conquer approach work?
Divide into subproblems
Solve the subproblems
Combine into overall solution
EXERCISE
Design a combining algorithm.
Combining Sorted Sequences
Merge (Input: array a, indices l, m and r, where r > m ≥ l)
    left = a[l,…,m-1]
    right = a[m,…,r-1]
    i = j = 0, k = l
    While k < r
        If i >= (m-l) or (j < (r-m) and right[j] < left[i])
            a[k] = right[j]
            j = j + 1
        Else
            a[k] = left[i]
            i = i + 1
        EndIf
        k = k + 1
    EndWhile

The procedure takes Θ(N) time for N total elements

Divide and Conquer
Time to combine subproblem solutions is Θ(N)
EXERCISE
What is the worst case time complexity of the divide and conquer algorithm?
    Write the recurrence (assume N = 2^a so no floors)
    Solve using the master theorem

Time Complexity
The proposed algorithm divides the problem (in constant time) into 2
subproblems of size N/2, solves both, and combines the solutions in Θ(N)
time, so the time complexity is:

T(N) = Θ(1)               if N = 1
T(N) = 2T(N/2) + Θ(N)     if N > 1
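A runnable Python sketch of the Merge procedure (function name is mine; indices follow the same half-open convention, merging the sorted runs a[l:m] and a[m:r]):

```python
def merge(a, l, m, r):
    """Merge the sorted runs a[l:m] and a[m:r] back into a[l:r]."""
    left = a[l:m]                     # copies: this is the Θ(N) extra space
    right = a[m:r]
    i = j = 0
    k = l
    while k < r:
        # take from right if left is exhausted, or right's head is smaller
        if i >= len(left) or (j < len(right) and right[j] < left[i]):
            a[k] = right[j]
            j += 1
        else:
            a[k] = left[i]
            i += 1
        k += 1
```

For example, with `a = [1, 3, 5, 2, 4, 6]`, calling `merge(a, 0, 3, 6)` leaves `a` as `[1, 2, 3, 4, 5, 6]`.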
Here a = 2, b = 2 and f(N) = Θ(N). So N^(log_b a) = N^(log_2 2) = N^1 = N, and therefore
f(N) = Θ(N^(log_b a)) = Θ(N^(log_b a) × (log_2 N)^0)
and Case 2 of the master theorem, with k = 0, applies:
T(N) = Θ(N^(log_b a) × (log_2 N)^1) = Θ(N log2N)
The divide and conquer algorithm is faster than Insertion Sort. Surprised?
Time Complexity
Alternative informal view of time complexity: recursion tree
Each level of the tree contributes cN
There are log2N + 1 levels
MergeSort
You have invented Mergesort
MergeSort (Input: array a, index l, index r)
    If r - l < 2   // 0 or 1 elements: already sorted
        return
    m = (l + r) / 2
    MergeSort(a, l, m)
    MergeSort(a, m, r)
    Merge(a, l, m, r)
    return

The sorting appears to be happening in place, but the list is copied during Merge
What is the best case?

Properties of Merge Sort
Space complexity is Θ(N)
Time complexity is Θ(N log2N)
Faster than Insertion Sort for large, unsorted lists
Slower than Insertion Sort if the list is already sorted
Slower than Insertion Sort for small N

Quicksort Design
Alternative Divide and Conquer
Merge Sort divides the data in half and sorts the halves
Alternative approach: do a quick (rough) sort into two groups
ap is called the pivot and the division ensures that
∀a ∈ Alow (a < ap) and ∀a ∈ Ahigh (a ≥ ap)
Left with the subproblems of sorting Alow and Ahigh
No combining needed

Quicksort
This procedure sorts the array a[l, ..., r − 1]
Quicksort (Input: array a, index l, index r)
    If r < l + 2   // 0 or 1 elements to sort
        return
    p = Partition(a, l, r)
    Quicksort(a, l, p)
    Quicksort(a, p + 1, r)
    return

The Quicksort divide step is called partitioning
The Partition procedure must return the final index of the pivot
The base case must work for an empty array

Suggested Partition Design
Use the last element as the pivot
Maintain two subarrays which grow:
    elements before j must be less than the pivot
    elements j, ..., i-1 must be equal to or greater than the pivot
    elements i, ... are unseen so far
Tutorial Exercise
Write the Partition procedure.
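A runnable Python sketch of the MergeSort pseudocode (function name and default arguments are mine; the recursion, base case, and half-open [l, r) convention follow the slides, with the merge inlined):

```python
def merge_sort(a, l=0, r=None):
    """Sort a[l:r] in place, following the slides' half-open convention."""
    if r is None:
        r = len(a)
    if r - l < 2:                     # base case: 0 or 1 elements
        return
    m = (l + r) // 2
    merge_sort(a, l, m)
    merge_sort(a, m, r)
    left, right = a[l:m], a[m:r]      # the Θ(N) copying happens here
    i = j = 0
    for k in range(l, r):             # merge the two sorted halves back
        if i >= len(left) or (j < len(right) and right[j] < left[i]):
            a[k] = right[j]
            j += 1
        else:
            a[k] = left[i]
            i += 1
```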
Lomuto Partitioning

Partition
This procedure partitions the array a[l, ..., r − 1]
Partition (Input: array a, index l, index r)
    i = j = l   // both partitions are empty
    p = r - 1
    While i < p
        If a[i] < a[p]
            swap(a, i, j)
            j = j + 1
        EndIf
        i = i + 1
    EndWhile
    swap(a, p, j)
    return j

The time complexity is Θ(N) (where N = r − l)

Quicksort Performance
Question
What is the worst case time complexity of Quicksort? And why?

Quicksort Worst Case
The given partition procedure:
    will only remove one element from sorted or reverse-sorted data
    will only remove one element from data with many duplicates
This leads to incremental execution resembling Insertion Sort
    N levels of recursion
    N − i elements to partition at level i
So the worst case time complexity is Θ(N^2)

Quicksort Best Case
Fewest levels of recursion when the partitioning is balanced
Subproblems are no larger than N/2
Same bound as Mergesort: Θ(N log2N)
As the recursion tree suggests, the constants are smaller than Mergesort's

Quicksort Performance
With rather unbalanced partitioning, performance is still O(N log2N)
Any constant dividing factor will introduce a logN term, e.g. 9 to 1 (Cormen p. 176)

Randomised Quicksort
If the partition procedure is altered to choose the pivot at random, then
the worst case behaviour becomes a matter of chance for all inputs.
Partition (Input: array a, index l, index r)
    x = random(l, r)   // random integer in [l, r-1]
    swap(a, x, r-1)
    ... as before
Assuming N distinct values:
The probability of choosing the worst pivot in every call is 1/N!
This becomes vanishingly small as N increases
Randomised Quicksort is the algorithm of choice if N is more than ∼10

Expected Performance
Possible to give an average case time complexity for Randomised Quicksort
Assume N distinct values again
Randomisation means all inputs are equally likely
Time depends on the number of comparisons performed
Probability of comparing a[i] with a[j] is determined by their rank by value
(see books for a full explanation)
Average number of comparisons is the sum of these probabilities over all i, j
Average case complexity is Θ(N log2N)
This is called the expected running time of the randomised algorithm

Partition Variations
There are many ways to implement partitioning, e.g.
Hoare partitioning
    Partitions grow inwards from the ends
    Handles duplicates better
Three-way partitioning
    Includes a region for values equal to the pivot
    Handles duplicates better
Median of 3
    Choose the pivot as the median of three random elements
    Better balance between subproblems
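A runnable Python sketch bringing together Lomuto partitioning, Quicksort, and the randomised pivot (function names and the `randomised` flag are my own; the logic follows the pseudocode above):

```python
import random

def partition(a, l, r):
    """Lomuto partition of a[l:r] with a[r-1] as pivot; returns pivot's final index."""
    i = j = l                     # invariant: a[l:j] < pivot <= a[j:i]
    p = r - 1
    while i < p:
        if a[i] < a[p]:
            a[i], a[j] = a[j], a[i]
            j += 1
        i += 1
    a[p], a[j] = a[j], a[p]       # place the pivot between the two regions
    return j

def randomised_partition(a, l, r):
    """Swap a randomly chosen element into the pivot slot, then partition as before."""
    x = random.randrange(l, r)    # random index in [l, r-1]
    a[x], a[r - 1] = a[r - 1], a[x]
    return partition(a, l, r)

def quicksort(a, l=0, r=None, randomised=True):
    """Sort a[l:r] in place; randomised=True picks pivots at random."""
    if r is None:
        r = len(a)
    if r < l + 2:                 # base case: 0 or 1 elements
        return
    p = randomised_partition(a, l, r) if randomised else partition(a, l, r)
    quicksort(a, l, p, randomised)
    quicksort(a, p + 1, r, randomised)
```

With `randomised=True`, already-sorted input no longer triggers the Θ(N^2) behaviour of the fixed last-element pivot except with vanishing probability.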