COMP3600/6466 – Algorithms
Probabilistic Analysis Cont. [CLRS sec. 7.1, 7.3, 7.4]
Hanna Kurniawati
https://cs.anu.edu.au/courses/comp3600/ comp_3600_6466@anu.edu.au
Tutorial Group
• We have open a new face-to-face group: G9 Friday 13:00-15:00 in CSIT N115/116
• We have reopened tutorial group
registration until this Wednesday 4pm.
• If you like to change your tutorial group, please do so before this Wednesday (19 Aug) 4pm!!!
Assignment 1
• Due today 23:59
• Grace period until tomorrow 13:00
• Save as draft in wattle, to keep reuploading
• We will take the latest draft you have in wattle as your official submission
Final Project
• Description will be out this evening
• Thanks to CECS ANU, we’ll have prizes for
top 3 projects
1st prize
2nd prize
3rd prize
Milestone-1: Project Proposal
• Due 7 September, 23:59
• Grace period until 8 September, 13:00
Clarification
• The membership test (use of limit) for
asymptotic notation works one way:
• If the limit definition is satisfied, then the function belongs to the particular asymptotic notation
• But, if we know a function belongs to a particularasymptoticnotation(e.g.,𝑓𝑛 = Θ(log 𝑛)), it does not imply that the limit
condition (in our example, lim $(!) = 𝑐) holds !→# ‘() !
Recap on Average Case Time Complexity of Insertion Sort
Probabilistic Analysis of Deterministic Algorithms w. An Example on Insertion Sort
• To compute the expected number of inversion in a given input
• Let 𝑋 be the number of inversion in a given input
• Let I 𝑆!,# =%1 𝑖,𝑗 𝑖𝑠𝑎𝑛𝑖𝑛𝑣𝑒𝑟𝑠𝑖𝑜𝑛 0 𝑜𝑡h𝑒𝑟𝑤𝑖𝑠𝑒
• Then, 𝑋 = ∑&’% ∑&
!$% #$!(%
I 𝑆 !,#
•And𝐸𝑋 =𝐸∑&’%∑&
!$% #$!(%
= ∑&’% ∑& 𝐸 I 𝑆!,# !$% #$!(%
I𝑆 !,#
= ∑&’% ∑&
!$% #$!(%
=∑&’%∑&
!$% #$!(% )
𝑃 I 𝑆 !,#
= 1
% =% 1+1+⋯+1
=%.&&’% =&&’% ) ) *
) 𝑛 2
= &! =&&’% &’)! )! &’) ! ) &’) !
InsertionSort(A)
Cost
Times
1
for j = 2 to A.length
c1
n
2
Key = A[j]
c2
n-1
3
i = j-1
c3
n-1
4
While i > 0 and A[i] > key
c4
$
Replace with:
c5 $𝑛(𝑛 − 1)
𝑐 +𝑐 +𝑐!𝑡 −1
!𝑡! !”#
!”# +𝑐! 𝑛−1!”#
!4
5
A[i+1] = A[i]
6
i = i-1
c6
$
! 𝑡! − 1 !”#
7
A[i+1] = key
c7
n-1
Total: sum of cost*times Note: The total 𝑡# (i.e., total #swaps) is Best case: Θ(𝑛) different for different types of inputs.
Worst case: Θ(𝑛))
But, what’s the average?
𝑇 𝑛 𝑛(𝑛 − 1)
=𝑐%𝑛+ 𝑐)+𝑐,+𝑐* 𝑛−1 + 𝑐*+𝑐-+𝑐. 4 +𝑐/ 𝑛−1
=𝑛) 𝑐*+𝑐-+𝑐. +𝑛 𝑐%+𝑐)+𝑐,+𝑐*− 𝑐*+𝑐-+𝑐. +𝑐/ 44
− 𝑐)+𝑐,+𝑐*+𝑐/ = 𝑛)𝐶 + 𝑛𝐶0 − 𝐶′′
Where 𝑐* +𝑐- +𝑐. 𝐶=4
You might want to compare with the worst-case analysis in slide 29 of https://cs.anu.edu.au/c ourses/comp3600/wee k1-introduction- aftClass.pdf
𝐶0=𝑐%+𝑐)+𝑐,+,1!− 1″(1# +𝑐/
00 ** 𝐶 =𝑐) +𝑐, +𝑐* +𝑐/
Therefore, the average running time of insertion sort is Θ(𝑛)). In terms of asymptotic bound, this average case is the same as its worst case, but the coefficient is lower.
Topics
üProbability Refresher
üProbabilistic Analysis of Deterministic Algorithms • Randomized Algorithm, ex.: RandQuickSort
• Probabilistic Analysis of Randomized Algorithm
What is Randomized Algorithm?
• Recall the definition of algorithms from lecture 1: • A well-defined procedure that transforms input to output
• Randomized algorithm is an algorithm whose behavior is determined not just by its input but also by random-numbers
• Behavior can vary even for the same input
Input Random number
Randomized Algorithms
Output
Use of Randomization in Algorithms
• Provide better scalability (running time & space) • Abundance of witnesses
• Provide a compact representation of a large space (population)
• Random sampling
• Foiling an adversary
• Randomization adds unpredictability on the input—output
mapping.
• Can be viewed as having a distribution on many deterministic algorithms.
• Avoid pathological input that causes worst case scenario for the algorithm
Topics
üProbability Refresher
üProbabilistic Analysis of Deterministic Algorithms üRandomized Algorithm, ex.: RandQuickSort
• Probabilistic Analysis of Randomized Algorithm
Example: RandQuick Sort (Avoid pathological cases)
RandQuickSort(A, p, r) 1. If p < r
2. q = RandPartition(A, p, r)
3. RandQuickSort(A, p, q-1)
4. RandQuickSort(A, q+1, r)
RandPartition(A, p, r)
1. i = random(p, r)
2. Swap A[r] with A[i]
3. Return Partition(A, p, r)
Partition(A, p, r)
random(p, r): Pick an index between p and r (inclusive with equal probability
1. 2. 3. 4. 5. 6. 7. 8.
x=A[r]
i=p-1 Forj=ptor-1
If A[j] ≤ x i=i+1
Swap A[i] with A[j] Swap A[i+1] with A[r] Return i+1
RandQuickSort Illustration
• Input:
• Select a pivot uniformly at random (RandPartition)
• Set the pivot to be at the end
pr
20
5
10
8
7
15
12
3
20
5
10
8
7
15
12
3
20
5
3
8
7
15
12
10
RandQuickSort Illustration
• Split the array (Partition)
• i: index of the last element of the first whose value is smaller
than the pivot
• j: index of the element being examined at the moment
pr j
j ij
20
5
3
8
7
15
12
10
i i
20
5
3
8
7
15
12
10
5
20
3
8
7
15
12
10
RandQuickSort Illustration
ij
ij
⋮
i
5
20
3
8
7
15
12
10
5
3
20
8
7
15
12
10
5
3
8
7
20
15
12
10
ij
5
3
8
7
10
15
12
20
Example: RandQuick Sort (Avoid pathological cases)
RandQuickSort(A, p, r) 1. If p < r
2. q = RandPartition(A, p, r)
3. RandQuickSort(A, p, q-1)
4. RandQuickSort(A, q+1, r)
RandPartition(A, p, r)
1. i = random(p, r)
2. Swap A[r] with A[i]
3. Return Partition(A, p, r)
Partition(A, p, r)
random(p, r): Pick an index between p and r (inclusive with equal probability)
1. 2. 3. 4. 5. 6. 7. 8.
x=A[r]
i=p-1 Forj=ptor-1
If A[j] ≤ x i=i+1
Swap A[i] with A[j] Swap A[i+1] with A[r] Return i+1
Probabilistic Analysis Example:
A Randomized Algorithm (RandQuickSort)
• The majority of the work is in Partition, line 4-6
• It boils down to the #comparisons (line 4 of Partition)
• Without loss of generality, let’s assume the input are unique numbers
• Let’s first rename the input into 𝐵 = 𝑏%, 𝑏),...,𝑏& , where 𝑏! is the ith smallest number.
• Suppose 𝐵!,# is a subsequence of 𝐵 from index 𝑖 to 𝑗 (inclusive), i.e.,𝐵!,# = 𝑏!,𝑏!(%,...,𝑏#
• Notice that 𝑏! will only be compared with 𝑏# if the first pivot selected in 𝐵!,# is either 𝑏! 𝑜𝑟 𝑏#. For any other number 𝑘, 𝐵!,# will be split at 𝑘,𝑏! < 𝑘 < 𝑏#, and 𝑏! will never be compared with 𝑏#
• We need to compute the expected total #comparisons performed in 𝐵%,&
Probabilistic Analysis Example:
A Randomized Algorithm (RandQuickSort)
• Let 𝑋*,,= I 𝑆*,, = 01 𝑏* 𝑎𝑛𝑑 𝑏, 𝑎𝑟𝑒 𝑐𝑜𝑚𝑝𝑎𝑟𝑒𝑑 0 𝑜𝑡h𝑒𝑟𝑤𝑖𝑠𝑒
• Suppose 𝑋 is the total #comparisons over the entire run of RandQuickSort(A, 1, n)
• Since each pair of numbers in A is compared at most once (comparison is only to pivot and once a pivot is used it’ll never be compared again), we can define 𝑋
as 𝑋 = ∑!/. ∑! 𝑋*,, *-. ,-*0.
•Whichmeans,𝐸𝑋 =∑!/.∑!
*-. ,-*0.
= ∑!/. ∑!
*-. ,-*0.
𝐸[𝑋*,,] 𝑃(𝑋*,,)
Probabilistic Analysis Example:
A Randomized Algorithm (RandQuickSort)
• What is 𝑃(𝑋*,,)? (to be clear 𝑃(𝑋*,,) means 𝑃(𝑋*,, = 1)) • 𝑏! and 𝑏# will only be compared if 𝑏! or 𝑏# is selected as the
first pivot chosen from 𝐵!,#
• 𝑃 𝑏! 𝑖𝑠𝑡h𝑒𝑓𝑖𝑟𝑠𝑡𝑝𝑖𝑣𝑜𝑡𝑖𝑛𝐵!,# = %
• 𝑃 𝑏# 𝑖𝑠𝑡h𝑒𝑓𝑖𝑟𝑠𝑡𝑝𝑖𝑣𝑜𝑡𝑖𝑛𝐵!,# = % #'!(%
• Since the two events above are mutually exclusive, 𝑃(𝑋!,#) = 𝑃 𝑏! 𝑖𝑠 𝑡h𝑒 𝑓𝑖𝑟𝑠𝑡 𝑝𝑖𝑣𝑜𝑡 𝑖𝑛 𝐵!,# +
𝑃 𝑏# 𝑖𝑠𝑡h𝑒𝑓𝑖𝑟𝑠𝑡𝑝𝑖𝑣𝑜𝑡𝑖𝑛𝐵!,# = ) #'!(%
#'!(%
Probabilistic Analysis Example:
A Randomized Algorithm (RandQuickSort)
𝐸𝑋 =∑&'%∑&
!$% #$!(%
𝑃(𝑋 ) !,#
) !$% #$!(% #'!(%
= ∑&'% ∑&
= ∑&'% ∑&'! ) !$% 2$% 2(%
< ∑&'% ∑& ) !$% 2$% 2(%
< ∑&'% 2 ∑& % !$% 2$% 2
= ∑&'% 2 (l𝑛 𝑛 + 𝛾) !$%
= ∑&'% 2(345 & + 𝛾) !$% 345 6
< 𝑐. 𝑛 log 𝑛 = O(𝑛 log 𝑛)
𝛾 is Euler’s constant (~0.5772)
• •
• •
• • •
• •
Can we do better?
• Can we get a tighter bound on the expected running time?
• Let’s look at the best-case scenario
• This happens when the pivot selected always divide the
(sub)array by half
• The total time taken is then a recurrence
𝑇 𝑛 =2𝑇 &) +Θ 𝑛 =2𝑇 &) +𝑐𝑛forapositiveconstant𝑐
• We have computed the above recurrence (of https://cs.anu.edu.au/courses/comp3600/week3-recurrence- aftClass.pdf , slide 31): 𝑇 𝑛 = Θ 𝑛 log 𝑛
• Since the average case is O(𝑛 log 𝑛) and the best-case is Θ 𝑛 log𝑛 the average case must be Θ 𝑛 log𝑛 too.
Topics
üProbability Refresher
üProbabilistic Analysis of Deterministic Algorithms üRandomized Algorithm, ex.: RandQuickSort üProbabilistic Analysis of Randomized Algorithm
Next: Empirical Analysis