Lecture 10: Sorting in Linear Time
1. Lower Bounds and Decision Trees
2. The Lower Bound for Comparison-Based Sorting
3. Linear-Time Sorting
Counting Sort
Radix Sort
4. Sorting Reprise
Running time of sorting algorithms
Do you still remember what these statements mean?
Sorting algorithm A runs in O(f(n)) time.
Sorting algorithm A runs in Ω(f(n)) time.
So far, all our sorting algorithms have running time O(n²). Merge Sort and Heapsort also have running time O(n log n).
Q: Is it possible to design an algorithm that runs faster than O(n log n)?
A: No, if the algorithm is comparison-based. We will use the decision-tree model of computation to show that any comparison-based sorting algorithm requires Ω(n log n) time.
Remark: A comparison-based sorting algorithm is any algorithm that makes decisions only by using comparisons between items (and not by referencing their actual values).
Thus, Merge Sort and Heapsort are asymptotically optimal comparison-based algorithms.
The decision-tree model (i)
An algorithm in the decision-tree model
Solves the problem by asking questions with binary (Yes/No) answers
– For sorting algorithms, instead of binary questions we have comparisons with 2 outcomes (≤, >).
The worst-case running time is the height of the decision tree.
– Height is length of longest path from root to a leaf
[Example on the slide: a decision tree of height 3.]
The decision-tree model (ii)
FACT: A binary tree with N leaves must have height ≥ log₂ N.
⇒ An algorithm in the decision-tree model that has n different outputs has running time Ω(log n).
[Example on the slide: a decision tree of height 3.]
Heights of Binary Trees
FACT: A binary tree with N leaves must have height ≥ log₂ N.
Proof: Consider a binary tree with N leaves and height h.
(i) A binary tree of height h has at most 2^h leaves.
(Known fact: easy to prove by induction on h.)
(ii) Hence N ≤ 2^h.
(iii) Taking logarithms, h ≥ log₂ N. ∎
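Step (i) can be unrolled explicitly; writing L(h) for the maximum number of leaves of a binary tree of height h (the notation L is ours):

```latex
L(h) \le 2\,L(h-1) \le 2^2\,L(h-2) \le \cdots \le 2^h\,L(0) = 2^h ,
\qquad\text{so}\qquad
N \le 2^h \;\Longrightarrow\; h \ge \log_2 N .
```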
The decision-tree for binary search
Theorem: Any algorithm for finding the location of a given element in a sorted array of size n must have running time Ω(log n) in the decision-tree model.
Proof: The algorithm must have at least n + 1 different outputs: one for each element of the array, and one to report search failure.
⇒ The decision tree has at least n + 1 leaves.
Any binary tree with n + 1 leaves must have height ≥ log₂(n + 1).
⇒ The algorithm has running time Ω(log n). ∎
Exercise on Coins
1. You are given a set of n coins, and are told that at most one (possibly none) of the n coins is either too heavy or too light (but you do not know which). You must determine which of the n coins is defective, or report that none is defective. To do this test you have a scale; you place some of the coins on the left side of the scale and some of the coins on the right side. The scale indicates either (1) the left side is heavier, (2) the right side is heavier, or (3) both subsets have the same weight. It does not indicate how much heavier or lighter.
2. Use a decision-tree argument to prove that the minimum number of scale comparisons is ⌈log₃(1 + 2n)⌉.
3. Present a method to determine the defective coin using at most c·log₃ n scale measurements, where c is a constant (independent of n). Assume that n is a power of 3.
Exercise on Coins – Number of Comparisons
• The decision tree comes from the outcomes of using the scale.
• The total number of possible outcomes is 1+2n:
• If there is no defective coin, there is one possible outcome.
• Otherwise, there are n coins, among which one is defective. There are n choices of the defective one. Since the defective coin can be either heavier or lighter, there are in all 2n possibilities for the case that there is one defective coin.
• Thus, the number of leaf nodes in the decision tree is 1+2n.
• Since there are three possible outcomes for each comparison (balanced, heavier left, heavier right), the decision tree is ternary. A ternary tree with 1 + 2n leaf nodes has height at least log₃(1 + 2n). Thus, we need at least ⌈log₃(1 + 2n)⌉ comparisons.
Exercise on Coins – Algorithm
• Present a method to determine the defective coin using at most clog3n scale measurements, where c is a constant (independent of n). Assume that n is a power of 3.
[Decision tree on the slide: weigh C1 against C2, then C1 against C3; the combined outcomes (<, =, >) of the two weighings distinguish the seven cases C1 L, C1 H, C2 L, C2 H, C3 L, C3 H, and No Fake.]
• Split the coins into three equal sets C1, C2, C3. With 2 comparisons you can determine which of the 7 possible outcomes holds: C1 H, C1 L, C2 H, C2 L, C3 H, C3 L, or no defective coin (on board).
• If there is no defective coin, then stop. Otherwise, repeat step 1 recursively on the set that contains the defective coin.
• Analysis. Since with 2 comparisons we reduce the size of the problem by a factor of 3, we have: T(n) = T(n/3) + 2 = 2 log₃ n.
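The recurrence can be checked numerically; a small sketch (the helper name `weighings` is ours):

```python
import math

def weighings(n):
    """Unroll T(n) = T(n/3) + 2 with T(1) = 0: two weighings per round,
    each round shrinking the candidate set by a factor of 3."""
    count = 0
    while n > 1:
        count += 2
        n //= 3
    return count

# For n a power of 3, T(n) = 2*log3(n)
for p in range(1, 8):
    n = 3 ** p
    assert weighings(n) == 2 * round(math.log(n, 3))
```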
The decision-tree for sorting
Theorem: Any algorithm for sorting n elements must have running time Ω(n log n) in the decision-tree model.
Proof: A sorting algorithm has at least n! different outputs (one for each possible permutation of the n items)
⇒ the decision tree has at least n! leaves
⇒ the height of the decision tree is at least log₂(n!) = Ω(n log n) (Stirling’s approximation – exercise solved in Lecture 2)
⇒ any comparison-based sorting algorithm requires Ω(n log n) time.
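The Stirling step can also be done with an elementary bound, since at least n/2 of the factors in n! are at least n/2:

```latex
\log_2(n!) \;\ge\; \log_2\!\left(\left(\tfrac{n}{2}\right)^{n/2}\right)
\;=\; \tfrac{n}{2}\,(\log_2 n - 1)
\;=\; \Omega(n \log n).
```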
Can we do better?
Implication of the lower bound
Anything “better” must be a non-comparison-based algorithm.
Integer sorting
Assumes that the elements are integers from 1 to k.
Running time is expressed as a function of BOTH n and k.
Depending on k, running times both better and worse than Θ(n log n) are possible.
Functions of two variables might not be comparable to each other.
We will now see Counting Sort.
It sorts n integers in the range [1, k] in Θ(n + k) time.
Counting-Sort(A, B, k)
Input: A[1..n], where A[j] ∈ {1, 2, ..., k}
Output: B[1..n], sorted
let C[1..k] be a new array
for i ← 1 to k do
    C[i] ← 0;                    // initialize counters
end
for j ← 1 to n do
    C[A[j]] ← C[A[j]] + 1;       // C[i] = |{key = i}|
end
for i ← 2 to k do
    C[i] ← C[i] + C[i−1];        // C[i] = |{key ≤ i}|
end
for j ← n downto 1 do
    B[C[A[j]]] ← A[j];           // move items into proper location
    C[A[j]] ← C[A[j]] − 1;
end
Range of Integers [1,4]
We will work through the algorithm, showing that the initial array A[1..5] gets sorted into B[1..5]. Pay attention to the fact that the algorithm moves the red entries on top into the red entries on the bottom and the blue entries on top into the blue entries on the bottom.
A sorting algorithm is STABLE if two items with the same value appear in the same order in the sorted array as they did in the initial array.
Counting Sort is stable
Radix Sort (our next sorting algorithm) will depend upon the stability of Counting Sort.
1st Loop: Counter Initialization
for i ← 1 to k do C[i] ← 0;
2nd Loop: Count # of each item
for j ← 1 to n do C[A[j]] ← C[A[j]] + 1;   // C[i] = |{key = i}|
3rd Loop: Aggregation
for i ← 2 to k do C[i] ← C[i] + C[i−1];   // C[i] = |{key ≤ i}|
4th Loop: Place items properly
for j ← n downto 1 do
    B[C[A[j]]] ← A[j];
    C[A[j]] ← C[A[j]] − 1;
Walk through A from end to front. At step j, i.e., when scanning x = A[j], the current value C[x] holds the appropriate location in B of the rightmost currently unplaced x.
Counting sort
Counting-Sort(A, B):
let C[0..k] be a new array
for i ← 0 to k
    C[i] ← 0
for j ← 1 to n
    C[A[j]] ← C[A[j]] + 1
// C[i] now contains the number of i’s
for i ← 1 to k
    C[i] ← C[i] + C[i−1]
// C[i] now contains the number of elements ≤ i
for j ← n downto 1
    B[C[A[j]]] ← A[j]
    C[A[j]] ← C[A[j]] − 1
Running time: Θ(n + k). Working space: Θ(n + k).
Counting Sort is a stable sorting algorithm.
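The pseudocode above translates directly to running code; a minimal Python sketch (0-based indices, keys in [0, k]):

```python
def counting_sort(A, k):
    """Stable counting sort of a list A of integers in the range [0, k]."""
    n = len(A)
    C = [0] * (k + 1)
    for x in A:                  # C[i] = |{key = i}|
        C[x] += 1
    for i in range(1, k + 1):    # C[i] = |{key <= i}|
        C[i] += C[i - 1]
    B = [None] * n
    for x in reversed(A):        # back-to-front scan preserves stability
        C[x] -= 1                # 0-based: last free slot for key x
        B[C[x]] = x
    return B
```

Because the final loop scans A from the back and fills positions from the back, equal keys keep their original relative order, which is exactly the stability Radix Sort relies on.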
Exercise on Range Queries
• You are given an array A of n integers in the range [1..k]. Describe a pre-processing algorithm that generates a new array C such that, using C, you can answer any range query of the form “how many elements are in the range (a..b]?” in constant time.
• Solution: Generate the same array C[1..k] as in counting sort, i.e., C[i] is the total number of elements less than or equal to i. The answer to the query is C[b] − C[a].
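A sketch of the pre-processing and query in Python (the names `preprocess` and `range_count` are ours; C is indexed 0..k so that a = 0 is a valid left endpoint):

```python
def preprocess(A, k):
    """Build C[0..k] where C[i] = number of elements of A that are <= i."""
    C = [0] * (k + 1)
    for x in A:
        C[x] += 1
    for i in range(1, k + 1):   # prefix sums turn counts into <=-counts
        C[i] += C[i - 1]
    return C

def range_count(C, a, b):
    """Number of elements in the half-open range (a, b], in O(1) time."""
    return C[b] - C[a]
```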
Radix sort
Radix-Sort(A, d):
for i ← 1 to d
    use Counting Sort to sort array A on digit i
Input: An array of n numbers; each number has d digits; each digit is in {0, 1, ..., k−1}.
Output: A sorted array.
Radix sort: Correctness
Proof: (induction on digit position i)
Assume that the numbers are sorted by their low-order i − 1 digits.
Sort on digit i. After the sort:
– Two numbers that differ on digit i are correctly sorted.
– Because of the STABILITY of counting sort, two numbers that have the same i-th digit are in the same order in the output as they were in the input
⇒ they remain correctly sorted by their low-order i − 1 digits.
Radix sort: Running time analysis
Q: What is the running time of Radix Sort?
n = # integers, k = # of values each digit can take, d = # digits
Counting sort takes Θ(n + k) time.
Counting sort is run d times
⇒ total running time is Θ(d(n + k)).
Example: n 16-bit binary words, sorted bit by bit.
A bit takes only two values, so k = 2 and d = 16.
The algorithm takes Θ(16(n + 2)) = Θ(n) time.
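The analysis above can be exercised in code; a sketch of LSD radix sort over r-bit digits (the parameter choices d = 4, r = 4 for 16-bit words are ours):

```python
def radix_sort(A, d=4, r=4):
    """Sort non-negative integers that fit in d digits of r bits each
    (d=4, r=4 covers 16-bit words), using a stable counting pass per digit."""
    k = 1 << r                      # each digit takes k = 2^r values
    mask = k - 1
    for i in range(d):              # least-significant digit first
        shift = i * r
        C = [0] * k
        for x in A:                 # count occurrences of each digit value
            C[(x >> shift) & mask] += 1
        for v in range(1, k):       # prefix sums: C[v] = # of digits <= v
            C[v] += C[v - 1]
        B = [None] * len(A)
        for x in reversed(A):       # back-to-front keeps the pass stable
            dgt = (x >> shift) & mask
            C[dgt] -= 1
            B[C[dgt]] = x
        A = B
    return A
```

Each pass is the counting sort from earlier in the lecture, restricted to one digit; stability of each pass is what makes the d passes compose correctly.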
Summary of sorting algorithms

Algorithm        Running time          Randomized   Working space   Comparison-based
Insertion sort   Θ(n²)                 no           in place        yes
Merge sort       Θ(n log n)            no           Θ(n)            yes
Quicksort        Θ(n log n) expected   yes          in place        yes
Heapsort         Θ(n log n)            no           in place        yes
Radix sort       Θ(d(n + k))           no           Θ(n + k)        no

Other properties compared: stability, cache performance, parallelization.

Note: Our versions of Insertion Sort and Mergesort are stable. It is possible to write unstable versions. Also, our version of Quicksort is unstable. If allowed extra memory space, it is possible to write a stable version of Quicksort.
Exercise on List Merging
You are given n/k lists such that:
(i) Each list contains k real numbers.
(ii) For i = 1 to n/k − 1, the elements in list i are all less than all the elements in list i + 1.
The obvious algorithm to fully sort these n items is to sort each list separately and then concatenate the sorted lists together.
• What is the running time of this approach (in terms of comparisons)?
• Sorting each list requires O(k log k) comparisons. Since there are n/k lists, the running time is O((n/k) · k log k) = O(n log k).
• Use the decision tree model to show that this approach is asymptotically optimal.
Exercise on List Merging (cont)
We have to compute the number of leaves of the corresponding decision tree.
• How many possible arrangements (permutations) exist for each list? k!
• How many possible arrangements exist in total? (k!)^(n/k)
• What is the minimum height of a binary tree that accommodates all possible arrangements?
• log((k!)^(n/k)) = (n/k) log(k!) = (n/k) Θ(k log k) = Θ(n log k), so Ω(n log k) comparisons are required and the approach is asymptotically optimal.
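The upper-bound half of the argument is the algorithm itself; a minimal Python sketch (the function name and the sample data are ours):

```python
def sort_block_lists(lists):
    """Sort each length-k list separately, then concatenate: since every
    element of list i is less than every element of list i+1, the
    concatenation is fully sorted."""
    out = []
    for lst in lists:
        out.extend(sorted(lst))   # O(k log k) comparisons per list
    return out

blocks = [[3, 1, 2], [6, 4, 5], [9, 7, 8]]   # n = 9, k = 3
assert sort_block_lists(blocks) == list(range(1, 10))
```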