COMP20007 Design of Algorithms
Input Enhancement Part 1: Distribution Sorting
Daniel Beck
Lecture 17
Semester 1, 2020
1
Simple Distribution Sort
1406532
2
Simple Distribution Sort
Looks Θ(n) even in worst case! Is it really?
1406532
2
Simple Distribution Sort
10 41 02 64 53 39 27
3
Simple Distribution Sort
Θ(n + k) worst case.
10 41 02 64 53 39 27
3
Simple Distribution Sort
10 41 02 10 41 10 10
4
Simple Distribution Sort
Use the auxiliary array to store counts.
10 41 02 10 41 10 10
4
Counting Sort
633810879253531876512153
5
Counting Sort
633810879253531876512153
Key 0123456789 Counts 1425042231
5
Counting Sort
633810879253531876512153
Key 0123456789 Counts 1425042231 Counts(shift) 01425042231
5
Counting Sort
633810879253531876512153
Key 0123456789 Counts 1425042231 Counts(shift) 01425042231
Key 0123456789 Counts(acc) 0 1 5 712121618202324
5
Counting Sort
633810879253531876512153
Key 0123456789 Counts 1425042231 Counts(shift) 01425042231
Key 0123456789 Counts(acc) 0 1 5 712121618202324
011112233333555566778889
5
Counting Sort
function Counting Sort(A[0..n − 1]) for j ← 0 to k do
C[j] ← 0
for i ← 0 to n − 1 do
C[A[i] + 1] ← C[A[i] + 1] + 1 for j ← 1 to k do
C[j] = C[j] + C[j − 1]
for i ← 0 to n − 1 do B[C[A[i]]] ← A[i] C[A[i]] ← C[A[i]] + 1
return B[1..n]
◃ (shift)
6
Counting Sort
Questions!
7
Counting Sort
Questions!
• Stable? • In-place?
7
Counting Sort
8
Counting Sort
• Stable: Yes, but depends on how it is implemented.
8
Counting Sort
• Stable: Yes, but depends on how it is implemented. • In-place: No, requires Θ(n + k) memory.
8
Counting Sort
• Stable: Yes, but depends on how it is implemented. • In-place: No, requires Θ(n + k) memory.
Take-home message: Counting Sort only works for integer keys and it works best when the key range is small.
8
Bucket Sort
9
Bucket Sort
• Generalisation of Counting Sort.
9
Bucket Sort
• Generalisation of Counting Sort. • Split data into k buckets.
9
Bucket Sort
• Generalisation of Counting Sort. • Split data into k buckets.
• Sort each bucket separately.
9
Bucket Sort
• Generalisation of Counting Sort. • Split data into k buckets.
• Sort each bucket separately.
• Concatenate results.
9
Bucket Sort
• Generalisation of Counting Sort. • Split data into k buckets.
• Sort each bucket separately.
• Concatenate results.
If K is the maximum key value and k = K , then Bucket Sort becomes Counting Sort.
9
Bucket Sort
633810879253531876512153
10
Bucket Sort
633810879253531876512153
Bucket1(A[i]<3): 1021121 Bucket2(3<=A[i]<6): 335353553 Bucket3(A[i]>=6): 68879876
10
Bucket Sort
633810879253531876512153
Bucket1(A[i]<3): 1021121 Bucket2(3<=A[i]<6): 335353553 Bucket3(A[i]>=6): 68879876
SortedBucket1(A[i]<3): 0111122 SortedBucket2(3<=A[i]<6): 333335555 SortedBucket3(A[i]>=6): 66778889
10
Bucket Sort
633810879253531876512153
Bucket1(A[i]<3): 1021121 Bucket2(3<=A[i]<6): 335353553 Bucket3(A[i]>=6): 68879876
SortedBucket1(A[i]<3): 0111122 SortedBucket2(3<=A[i]<6): 333335555 SortedBucket3(A[i]>=6): 66778889
011112233333555566778889
10
Bucket Sort
function Bucket Sort(A[0..n − 1], k ) K ← max key value
for j ← 0 to k − 1 do
Initialise(B [j ]) for i ← 0 to n − 1 do
Insert(B [⌊k × A[i ]/K ⌋], A[i ]) for j ← 0 to k − 1 do
AuxSort(B [j ])
return Concatenate(B[0..k − 1])
11
Bucket Sort
function Bucket Sort(A[0..n − 1], k ) K ← max key value
for j ← 0 to k − 1 do
Initialise(B [j ]) for i ← 0 to n − 1 do
Insert(B [⌊k × A[i ]/K ⌋], A[i ]) for j ← 0 to k − 1 do
AuxSort(B [j ])
return Concatenate(B[0..k − 1])
Stable?
11
Bucket Sort
12
Bucket Sort
Some properties depend on AuxSort:
12
Bucket Sort
Some properties depend on AuxSort: • Stability.
12
Bucket Sort
Some properties depend on AuxSort:
• Stability.
• Worst case complexity. (assuming k = Θ(n))
12
Bucket Sort
Some properties depend on AuxSort: • Stability.
• Worst case complexity. (assuming k = Θ(n))
Average complexity: Θ(n + n2 + k). Linear if k = Θ(n). k
12
Bucket Sort
Some properties depend on AuxSort: • Stability.
• Worst case complexity. (assuming k = Θ(n)) Average complexity: Θ(n + n2 + k). Linear if k = Θ(n).
k
Take-home message: Compared to Counting Sort, Bucket Sort provides more control over how much memory to use, but it is slower in the worst case.
12
Radix Sort
63310725353176512153
13
Radix Sort
63310725353176512153
110 011 011 001 000 111 010 101 011 101 011 001 111 110 101 001 010 001 101 011
13
Radix Sort
63310725353176512153
110 011 011 001 000 111 010 101 011 101 011 001 111 110 101 001 010 001 101 011 Bucket 1: 110 000 010 110 010
Bucket 2: 011 011 001 111 101 011 101 011 001 111 101 001 001 101 011
13
Radix Sort
63310725353176512153
110 011 011 001 000 111 010 101 011 101 011 001 111 110 101 001 010 001 101 011 Bucket 1: 110 000 010 110 010
Bucket 2: 011 011 001 111 101 011 101 011 001 111 101 001 001 101 011
110 000 010 110 010 011 011 001 111 101 011 101 011 001 111 101 001 001 101 011
13
Radix Sort
63310725353176512153
110 011 011 001 000 111 010 101 011 101 011 001 111 110 101 001 010 001 101 011 Bucket 1: 110 000 010 110 010
Bucket 2: 011 011 001 111 101 011 101 011 001 111 101 001 001 101 011
110 000 010 110 010 011 011 001 111 101 011 101 011 001 111 101 001 001 101 011 Bucket 1: 000 001 101 101 001 101 001 001 101
Bucket 2: 110 010 110 010 011 011 111 011 011 111 011
13
Radix Sort
63310725353176512153
110 011 011 001 000 111 010 101 011 101 011 001 111 110 101 001 010 001 101 011 Bucket 1: 110 000 010 110 010
Bucket 2: 011 011 001 111 101 011 101 011 001 111 101 001 001 101 011
110 000 010 110 010 011 011 001 111 101 011 101 011 001 111 101 001 001 101 011 Bucket 1: 000 001 101 101 001 101 001 001 101
Bucket 2: 110 010 110 010 011 011 111 011 011 111 011
000 001 101 101 001 101 001 001 101 110 010 110 010 011 011 111 011 011 111 011
13
Radix Sort
63310725353176512153
110 011 011 001 000 111 010 101 011 101 011 001 111 110 101 001 010 001 101 011 Bucket 1: 110 000 010 110 010
Bucket 2: 011 011 001 111 101 011 101 011 001 111 101 001 001 101 011
110 000 010 110 010 011 011 001 111 101 011 101 011 001 111 101 001 001 101 011 Bucket 1: 000 001 101 101 001 101 001 001 101
Bucket 2: 110 010 110 010 011 011 111 011 011 111 011
000 001 101 101 001 101 001 001 101 110 010 110 010 011 011 111 011 011 111 011 Bucket 1: 000 001 001 001 001 010 010 011 011 011 011 011
Bucket 2: 101 101 101 101 110 110 111 111
13
Radix Sort
63310725353176512153
110 011 011 001 000 111 010 101 011 101 011 001 111 110 101 001 010 001 101 011 Bucket 1: 110 000 010 110 010
Bucket 2: 011 011 001 111 101 011 101 011 001 111 101 001 001 101 011
110 000 010 110 010 011 011 001 111 101 011 101 011 001 111 101 001 001 101 011 Bucket 1: 000 001 101 101 001 101 001 001 101
Bucket 2: 110 010 110 010 011 011 111 011 011 111 011
000 001 101 101 001 101 001 001 101 110 010 110 010 011 011 111 011 011 111 011 Bucket 1: 000 001 001 001 001 010 010 011 011 011 011 011
Bucket 2: 101 101 101 101 110 110 111 111
000 001 001 001 001 010 010 011 011 011 011 011 101 101 101 101 110 110 111 111
13
Radix Sort
63310725353176512153
110 011 011 001 000 111 010 101 011 101 011 001 111 110 101 001 010 001 101 011 Bucket 1: 110 000 010 110 010
Bucket 2: 011 011 001 111 101 011 101 011 001 111 101 001 001 101 011
110 000 010 110 010 011 011 001 111 101 011 101 011 001 111 101 001 001 101 011 Bucket 1: 000 001 101 101 001 101 001 001 101
Bucket 2: 110 010 110 010 011 011 111 011 011 111 011
000 001 101 101 001 101 001 001 101 110 010 110 010 011 011 111 011 011 111 011 Bucket 1: 000 001 001 001 001 010 010 011 011 011 011 011
Bucket 2: 101 101 101 101 110 110 111 111
000 001 001 001 001 010 010 011 011 011 011 011 101 101 101 101 110 110 111 111
01111223333355556677
13
Radix Sort
Assumptions:
14
Radix Sort
Assumptions:
• Maximum key length is known in advance.
14
Radix Sort
Assumptions:
• Maximum key length is known in advance.
• Keys can be sorted in lexicographical order (strings).
14
Radix Sort
Assumptions:
• Maximum key length is known in advance.
• Keys can be sorted in lexicographical order (strings).
Start sorting from least to the most significant digit. (also possible to do in reverse)
14
Radix Sort
Assumptions:
• Maximum key length is known in advance.
• Keys can be sorted in lexicographical order (strings).
Start sorting from least to the most significant digit. (also possible to do in reverse)
By using Bucket Sort with max buckets we have guaranteed Θ(n) performance per pass.
14
Radix Sort
Assumptions:
• Maximum key length is known in advance.
• Keys can be sorted in lexicographical order (strings).
Start sorting from least to the most significant digit. (also possible to do in reverse)
By using Bucket Sort with max buckets we have guaranteed Θ(n) performance per pass.
Total worst case performance is Θ(n × len(k))
14
Radix Sort
function Radix Sort(A[0..n − 1], k ) for j ← 0 to len(k) do
A ← AuxSort(A,k[j])
15
Radix Sort
function Radix Sort(A[0..n − 1], k ) for j ← 0 to len(k) do
A ← AuxSort(A,k[j])
• Typically, AuxSort is Bucket Sort with max buckets but
can be any sorting algorithm as long as it is stable (why?)
15
Radix Sort
function Radix Sort(A[0..n − 1], k ) for j ← 0 to len(k) do
A ← AuxSort(A,k[j])
• Typically, AuxSort is Bucket Sort with max buckets but
can be any sorting algorithm as long as it is stable (why?)
Take-home message: Radix Sort can be very fast (faster than comparison sorting) if keys are short (need to known in advance).
15
Summary
• Distribution Sorting is a sorting paradigm that trades memory for speed.
• Relies on more assumptions, unlike Comparison Sorting algorithms:
• Counting Sort, Bucket Sort: positive integer keys, with max bound known.
• Radix Sort: more general but max key length must be known and keys should have lexicographical order.
16
In Practice
17
In Practice
• Distribution Sort is not as widely used as Comparison Sort:
17
In Practice
• Distribution Sort is not as widely used as Comparison Sort:
• Less general.
• Require more memory.
• In practice, good sorting algorithms can be very close to
Θ(n) already (Timsort).
17
In Practice
• Distribution Sort is not as widely used as Comparison Sort:
• Less general.
• Require more memory.
• In practice, good sorting algorithms can be very close to
Θ(n) already (Timsort).
• However, they can be very useful as part of a more complex algorithm.
17
In Practice
• Distribution Sort is not as widely used as Comparison Sort:
• Less general.
• Require more memory.
• In practice, good sorting algorithms can be very close to
Θ(n) already (Timsort).
• However, they can be very useful as part of a more complex algorithm.
• Radix Sort is used to construct suffix arrays.
• Controlled environment with guaranteed short key size.
17
In Practice
• Distribution Sort is not as widely used as Comparison Sort:
• Less general.
• Require more memory.
• In practice, good sorting algorithms can be very close to
Θ(n) already (Timsort).
• However, they can be very useful as part of a more complex algorithm.
• Radix Sort is used to construct suffix arrays.
• Controlled environment with guaranteed short key size.
Next lecture: string matching revisited.
17