APS1070, Fall 2021: Lecture 2
Based on course material.

APS1070 Class Representatives
If you have any complaints or suggestions about this course, please talk to me or email me directly. Alternatively, talk to one of the class reps who will then talk to me and the teaching team.
▶ Morning Section
▶ Afternoon Section
What is Lecture 2 about?
▶ An introduction to the analysis of algorithms and data structures.
▶ Asymptotic complexity analysis. Sorting algorithms.
▶ Hashing and the dictionary abstract data type.
▶ Prerequisite: fundamental concepts from discrete mathematics (sets, functions, inequalities, limits), covered as "background" in this slide deck.
▶ This is a “theory” lecture but we also assess ability to implement these abstract structures and algorithms in projects.
Background: Mathematics review
Sets
▶ A set is an unordered collection of objects (called elements).
▶ Important sets:
▶ N = {0, 1, 2, ...}, the set of natural numbers.
▶ R, the set of real numbers.
▶ ∅ = {}, the empty set having no elements.
▶ Notation: X = {2, 3, 5, 7, 11} or X = {x ∈ N : x is prime and x < 12}.
▶ Operations:
▶ A ∩ B = {x : x ∈ A and x ∈ B}
▶ A ∪ B = {x : x ∈ A or x ∈ B}
▶ |A| is the number of elements of A.
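These operations map directly onto Python's built-in set type; a quick illustration with made-up values:

    A = {2, 3, 5, 7, 11}
    B = {1, 2, 3, 4}
    print(A & B)   # intersection A ∩ B: {2, 3}
    print(A | B)   # union A ∪ B: {1, 2, 3, 4, 5, 7, 11}
    print(len(A))  # |A|, the number of elements: 5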
Functions
▶ A function is a mapping f from a set X (the domain) to a set Y (the codomain) such that every x ∈ X maps to a unique y ∈ Y.
▶ Important functions from R to R:
▶ Power functions f(x) = x, f(x) = x², f(x) = x³, etc.
▶ Exponential functions f(x) = 2^x, f(x) = (1.5)^x, etc.
▶ Logarithm (inverse of exponential) has a different domain.
▶ Ceiling rounds up to the nearest integer, e.g. ⌈3.7⌉ = 4. Floor rounds down, e.g. ⌊3.7⌋ = 3 = ⌊3⌋.
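Floor and ceiling are available in Python's math module; a quick check:

    import math
    print(math.ceil(3.7))   # 4
    print(math.floor(3.7))  # 3
    print(math.floor(3))    # 3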
Basic properties of important functions
▶ For a > 1 the exponential f(x) = a^x is increasing and positive, and satisfies a^(x+y) = a^x a^y for all x, y ∈ R.
▶ For a > 1 the logarithm log_a is increasing and satisfies log_a(xy) = log_a x + log_a y for all x, y > 0.
▶ We write ln = log_e and lg = log₂. Note that log_a x = (log_a b)(log_b x).
▶ Derivatives: d/dx e^x = e^x, d/dx ln x = 1/x.
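A quick numerical check of these identities in Python (the example values a = 2, b = 10, x = 8, y = 32 are ours):

    import math
    a, b, x, y = 2, 10, 8, 32
    print(math.log(x * y, a), math.log(x, a) + math.log(y, a))  # log_a(xy) = log_a x + log_a y (both ≈ 8)
    print(math.log(x, a), math.log(b, a) * math.log(x, b))      # change of base: log_a x = (log_a b)(log_b x)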
Sums
▶ A sequence is a function f : N → R.
▶ Notation: f₀, f₁, f₂, ... where fᵢ = f(i).
▶ Sum: f_m + f_{m+1} + · · · + f_n = ∑_{i=m}^{n} fᵢ.
▶ Important sums:
▶ ∑_{i=1}^{n} i = n(n + 1)/2 (arithmetic series).
▶ ∑_{i=m}^{n} aⁱ = (a^{n+1} − a^m)/(a − 1) (geometric series, a ≠ 1).
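Both formulas are easy to sanity-check in Python (example values are ours):

    n, m, a = 100, 3, 2
    print(sum(range(1, n + 1)), n * (n + 1) // 2)  # arithmetic series: both 5050
    print(sum(a**i for i in range(m, n + 1)),
          (a**(n + 1) - a**m) // (a - 1))          # geometric series: both match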
Part 1
Asymptotic Analysis of Algorithms
Warm call: Why should we analyse algorithms?
What is an algorithm?
▶ An algorithm is a sequence of clearly stated rules that specify a step-by-step method for solving a given problem.
▶ The rules should be unambiguous and sufficiently detailed that they can be carried out without creativity.
▶ Examples of algorithms: a (sufficiently detailed) cake recipe; the primary-school method for multiplying decimal integers; quicksort.
▶ Algorithms predate electronic computers by thousands of years (example: Euclid’s greatest common divisor algorithm).
▶ A program is a sequence of computer instructions implementing the algorithm.
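For instance, Euclid's algorithm fits in a few lines of Python (a minimal sketch):

    def gcd(a, b):
        # Euclid: repeatedly replace (a, b) by (b, a mod b) until the remainder is 0.
        while b != 0:
            a, b = b, a % b
        return a

    print(gcd(1071, 462))  # 21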
Why analyse an algorithm?
▶ Experience shows that enormously more performance gains can be achieved by optimizing the algorithm than by optimizing other factors such as:
▶ processor
▶ language
▶ compiler
▶ human programmer
▶ The analysis process often results in us discovering simpler algorithms.
▶ Many algorithms have parameters that must be set before implementation. Analysis allows us to set the optimal values.
▶ Algorithms that have not been analysed for correctness often lead to major bugs in programs.
Fibonacci numbers
This sequence is recursively defined by

    F(n) = n                        if n = 0 or n = 1;
    F(n) = F(n − 1) + F(n − 2)      if n ≥ 2.

This immediately suggests a recursive algorithm.
Algorithm 1 Slow method for computing Fibonacci numbers
function slowfib(integer n)
    if n < 0 then return 0
    else if n = 0 then return 0
    else if n = 1 then return 1
    else return slowfib(n − 1) + slowfib(n − 2)
Improving over slowfib
▶ The algorithm slowfib is obviously correct, but does a lot of repeated computation. With a small (fixed) amount of extra space, we can do better, by working from the bottom up instead of from the top down.
Algorithm 2 Fast method for computing Fibonacci numbers
function fastfib(integer n)
    if n < 0 then return 0
    else if n = 0 then return 0
    else if n = 1 then return 1
    else
        a ← 1                       ▷ stores F(i) at bottom of loop
        b ← 0                       ▷ stores F(i−1) at bottom of loop
        for i ← 2 to n do
            t ← a
            a ← a + b
            b ← t
        return a
Analysis of the fast algorithm
▶ Even a bad implementation of fastfib in a slow interpreted language on an ancient machine will beat the best implementation of slowfib, once n becomes big enough.
Colab time
▶ Implement fastfib and slowfib in Python. Which is faster for which values of n? What is the maximum value of n for which each gives a result in a reasonable time?
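A minimal Python sketch of both algorithms, with a rough timing harness (function and variable names are ours; timings will vary by machine):

    import time

    def slowfib(n):
        # Direct translation of the recursive definition.
        if n < 0:
            return 0
        if n <= 1:
            return n
        return slowfib(n - 1) + slowfib(n - 2)

    def fastfib(n):
        # Bottom-up: a holds F(i), b holds F(i-1) at the bottom of the loop.
        if n < 0:
            return 0
        if n <= 1:
            return n
        a, b = 1, 0
        for _ in range(2, n + 1):
            a, b = a + b, a
        return a

    for f in (fastfib, slowfib):
        start = time.perf_counter()
        f(30)
        print(f.__name__, time.perf_counter() - start)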
How to measure running time?
Basic performance measures
There are three main characteristics of an algorithm designed to solve a given problem.
▶ Domain of definition: the set of legal inputs.
▶ Correctness: it gives correct output for each legal input. This depends on the problem we are trying to solve, and can be tricky to prove.
▶ Resource use: usually computing time and memory space.
▶ This depends on the input, and on the implementation (hardware, programmer skill, compiler, language, ...).
▶ It usually grows as the input size grows.
▶ There is a tradeoff between resources (for example, time vs space).
▶ Running time is usually more important than space use.
In this lecture we mainly consider how to estimate resource use of an algorithm, ignoring implementation as much as possible.
Warm call: How to compare algorithms?
▶ Given an algorithm A, the actual running time on a given input ι depends on many implementation details. Can we compare algorithms when we do not know the exact input and details of implementation?
▶ The running time usually grows with the size of the input. Running time for very small inputs is not usually important; it is large inputs that cause problems if the algorithm is inefficient.
Running time: input
▶ We define a notion of input size on the data. This is a positive integer.
▶ Example: number of records in a database to be sorted.
Elementary operations
▶ We use the concept of elementary operation as our basic measuring unit of running time. This is any operation whose execution time does not depend on the size of the input.
▶ The running time T (ι) of algorithm A on input ι is the number of elementary operations used when ι is fed into A.
Running time

Growth of running time with input size n (each entry shown relative to the n = 10 column):

    Function          Notation     n = 10    n = 100    n = 1000    n = 10^7
    Constant          1            1         1          1           1
    Logarithmic       log n        1         2          3           7
    Linear            n            1         10         100         10^6
    "Linearithmic"    n log n      1         20         300         7 × 10^6
    Quadratic         n²           1         100        10000       10^12
    Cubic             n³           1         1000       10^6        10^18
    Exponential       2ⁿ           1         10^27      10^298      10^3010296

Note: there are about 3 × 10^18 nanoseconds in a century.
Warm call
▶ Algorithm A takes n2 elementary operations to sort a file of n lines, while Algorithm B takes 50n log n. Which algorithm is better when n = 10?
▶ when n = 106? How do we decide which algorithm to use?
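A quick check in Python (taking log base 2; any other base only changes the constant):

    import math

    def ops_A(n): return n * n
    def ops_B(n): return 50 * n * math.log2(n)

    for n in (10, 10**6):
        print(n, ops_A(n), ops_B(n))
    # At n = 10, A wins (100 vs ~1661); at n = 10^6, B wins (10^12 vs ~10^9).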
How to measure running time?
Running time techniques: easy
▶ Running time of disjoint blocks adds.
▶ Running time of nested loops with non-interacting variables multiplies.
▶ Example: single, double, triple loops with fixed number of elementary operations inside the inner loop yields linear, quadratic, cubic running time.
Algorithm 3 Swapping two elements in an array
Require: 0 ≤ i ≤ j ≤ n − 1
function swap(array a[0..n − 1], integer i, integer j)
    t ← a[i]
    a[i] ← a[j]
    a[j] ← t
    return a

▶ Running time?
▶ This is a constant time algorithm.
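In Python, the same constant-time swap is a one-line tuple assignment (a sketch for Python lists):

    def swap(a, i, j):
        # Tuple assignment performs the three data moves implicitly.
        a[i], a[j] = a[j], a[i]
        return a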
Algorithm 4 Finding the maximum in an array
function findmax(array a[0..n − 1])
    k ← 0                                   ▷ location of maximum so far
    for j ← 1 to n − 1 do
        if a[k] < a[j] then k ← j
    return k

▶ Running time?
▶ This is a linear time algorithm, since it makes one pass through the array and does a constant amount of work each time.
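A direct Python translation (a sketch; assumes a non-empty list):

    def findmax(a):
        k = 0  # location of maximum so far
        for j in range(1, len(a)):
            if a[k] < a[j]:
                k = j
        return k

    print(findmax([3, 1, 4, 1, 5, 9, 2]))  # 5 (index of the value 9)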
Snippet: other loop increments

Algorithm 5 Example: exponential change of variable in loop
i ← 1
while i ≤ n do
    i ← 2 ∗ i
    print i

▶ Running time?
▶ This runs in logarithmic time because i doubles about lg n times until reaching n.
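Counting iterations in Python confirms the logarithmic behaviour (n = 10^6 is an arbitrary example):

    n = 10**6
    i, count = 1, 0
    while i <= n:
        i = 2 * i
        count += 1
    print(count)  # 20, close to lg(10^6) ≈ 19.9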
Example: nested loops

Algorithm 6 Snippet: nested loops
for i ← 1 to n do
    for j ← i to n do
        print i + j

▶ Running time?
▶ The first iteration of the outer loop takes n elementary operations, the second takes n − 1 operations, and so forth. Therefore, the algorithm takes n + (n − 1) + · · · + 1 = n(n + 1)/2 elementary operations for input size n.
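A Python check that the operation count matches n(n + 1)/2:

    n = 100
    count = sum(1 for i in range(1, n + 1) for j in range(i, n + 1))
    print(count, n * (n + 1) // 2)  # both 5050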
Warm call
▶ What do we know about the running time T(n) of slowfib for term n of the Fibonacci sequence?
▶ slowfib makes on the order of F(n) function calls, each of which involves a constant number of elementary operations. It turns out that F(n) grows exponentially in n, so this is an exponential time algorithm.
Asymptotic notation
Asymptotic comparison of functions
▶ In order to compare running times of algorithms we want a way of comparing the growth rates of functions.
▶ We want to see what happens for large values of n — small ones are not relevant.
▶ We are not usually interested in constant factors and only want to consider the dominant term.
▶ The standard mathematical approach is to use asymptotic notation O, Ω, Θ which we will now describe.
Big-O notation
▶ Suppose that f and g are functions from N to R, which take on nonnegative values.
▶ Say f is O(g) ("f is Big-Oh of g") if there is some C > 0 and some n₀ ∈ N such that for all n ≥ n₀, f(n) ≤ Cg(n). Informally, f grows at most as fast as g.
▶ Say f is Ω(g) ("f is big-Omega of g") if g is O(f). Informally, f grows at least as fast as g.
▶ Say f is Θ(g) ("f is big-Theta of g") if f is O(g) and g is O(f). Informally, f grows at the same rate as g.
▶ Note that we could always reduce n₀ at the expense of a bigger C, but it is often easier not to.
Asymptotic comparison — examples
▶ Every linear function f(n) = an + b, a > 0, is O(n).
▶ Proof: an + b ≤ an + |b| ≤ (a + |b|)n for n ≥ 1.
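The bound can be checked numerically in Python (our own example values a = 3, b = 7, so C = a + |b| = 10 and n₀ = 1):

    a, b = 3, 7
    C = a + abs(b)
    assert all(a * n + b <= C * n for n in range(1, 10**4))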
What happens if there are many inputs of a given size?
▶ We usually don’t want to have to consider the distribution of running time over all possible inputs of a given size. There may be (infinitely) many inputs of a given size, and running time may vary widely on these.
▶ For example, for sorting the integers 1, . . . , n, there are n! possible inputs, and this is large even for n = 10.
▶ We consider statistics of T(ι) such as worst-case W(n) or average-case A(n) running time for instances ι of size n.
What are the pros and cons of worst and average case analysis?
▶ Worst-case bounds are valid for all instances: this is important for mission-critical applications.
▶ Worst-case bounds are often easier to derive mathematically.
▶ Worst-case bounds often hugely exceed typical running time and have little predictive or comparative value.
▶ Average-case running time is often more realistic. Quicksort is a classic example.
▶ Average-case analysis requires a good understanding of the probability distribution of the inputs.
▶ Conclusion: a good worst-case bound is always useful, but it is just a first step and we should aim to refine the analysis for important algorithms. Average-case analysis is often more practically useful, provided the algorithm will be run on “random” data.
Why can constants often be ignored?
▶ A linear time algorithm when implemented will take at most An + B seconds to run on an instance of size n, for some implementation-specific constants A, B.
▶ For large n, this is well approximated by An. Small n are not usually of interest anyway, since almost any algorithm is good enough for tiny instances.
▶ No matter what A is, we can easily work out how the running time scales with increasing problem size (linearly!).
▶ The difference between a linear and a quadratic time algorithm is usually huge, no matter what the constants are. For large enough n, a linear time algorithm will always beat a quadratic time one.
▶ Conclusion: in practice we often need to make only crude distinctions. We only need to know whether the running time scales like n, n², n³, n log n, 2ⁿ, .... If we need finer distinctions, we can do more analysis.
Can we always ignore constants?
▶ When we want to choose between two good algorithms for the same problem (“is my linear-time algorithm faster than your linear-time algorithm?”), we may need to know constants. These must be determined empirically.
▶ For important algorithms that will be used many times, it is worth being more precise about the constants. Even small savings will be worth the trouble.
▶ An algorithm with running time 10^−10 n² is probably better in practice than one with running time 10^10 n: the latter will eventually beat the former, but only on instances of size at least 10^20, which are rarely met in practice.
▶ Conclusion: we should have at least a rough feel for the constants of competing algorithms. However, in practice the constants are usually of moderate size.
Summary
▶ Our goal is to find an asymptotic approximation for the (worst or average case) running time of a given algorithm. Ideally we can find a simple function f and prove that the running time is Θ(f(n)).
▶ The main f(n) occurring in applications are log n, n, n log n, n², n³, 2ⁿ, and each grows considerably faster than the previous one. The gap between n and n log n is the smallest.
Part 2
The problem of sorting
Sorting
▶ Given n keys a1, . . . , an from a totally ordered set, put them in increasing order. The keys may be just part of a larger data record.
▶ Common examples: integers with ≤, words with alphabetical (lexicographic) order. More complicated examples can often be reduced to these.
▶ Sorting data makes many other problems easier, e.g. selection, finding duplicates. Sorting is ubiquitous in computing.
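In Python, sorting records by a key field looks like this (the record layout is a made-up example):

    records = [("bob", 42), ("alice", 17), ("carol", 29)]
    records.sort(key=lambda r: r[1])  # sort by the numeric key; the rest of the record travels with it
    print(records)  # [('alice', 17), ('carol', 29), ('bob', 42)]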
Analysis of sorting algorithms
▶ We use comparisons and swaps as our elementary operations.
▶ We can swap the position of elements (at positions i and j say). Each swap requires 3 data moves (updates of variables).
Mergesort and quicksort
▶ Each algorithm splits the input list into two sublists, recursively sorts each sublist, and then combines the sorted sublists to sort the original list.
▶ Mergesort (J. von Neumann, 1945): splitting is very easy, most of the work is in combining. We divide the list into left and right halves, and merge the recursively sorted lists with a procedure merge.
▶ Quicksort (C. A. R. Hoare, 1962): most of the work is in the splitting, combining is very easy. We choose a pivot element, and use a procedure partition to alter the list so that the pivot is in its correct position. Then the left and right sublists on each side of the pivot are sorted recursively.
Algorithm 7 Mergesort
function mergesort(list a[0..n − 1])
    if n = 1 then return a
    else
        m ← ⌊(n − 1)/2⌋                     ▷ median index of list
        l ← mergesort(a[0..m])              ▷ sort left half
        r ← mergesort(a[m + 1..n − 1])      ▷ sort right half
        return merge(l, r)                  ▷ merge both halves
Linear-time merging
▶ If L1 and L2 are sorted lists, they can be merged into one sorted list in linear time.
▶ Start pointers at the beginning of each list.
▶ Compare the elements being pointed to and choose the lesser one to start the sorted list. Increment that pointer.
▶ Iterate until one pointer reaches the end of its list, then copy the remainder of the other list to the end of the sorted list.
▶ In any case the number of comparisons is in Θ(n).
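A minimal Python sketch of merge and mergesort as described (list slicing is used for the split):

    def merge(l1, l2):
        out, i, j = [], 0, 0
        # Repeatedly move the lesser front element to the output.
        while i < len(l1) and j < len(l2):
            if l1[i] <= l2[j]:
                out.append(l1[i]); i += 1
            else:
                out.append(l2[j]); j += 1
        # One list is exhausted; copy the remainder of the other.
        out.extend(l1[i:])
        out.extend(l2[j:])
        return out

    def mergesort(a):
        if len(a) <= 1:
            return a
        m = (len(a) - 1) // 2  # median index of list
        return merge(mergesort(a[:m + 1]), mergesort(a[m + 1:]))

    print(mergesort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]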
Mergesort analysis
▶ The number of comparisons C(n) satisfies (approximately)

    C(n) = C(⌈n/2⌉) + C(⌊n/2⌋) + n    if n > 1;
    C(n) = 0                          if n = 1.

▶ Thus if n is a power of 2 we have C(n) = 2C(n/2) + n. We need to solve this recurrence!
Solving mergesort recurrence
▶ Consider the mergesort recurrence C(n) = 2C(n/2) + n (makes sense for n a power of 2, n = 2^k).
▶ We can apply the recurrence to itself to find a general pattern.
▶ Applying 2 steps of the recurrence, C(n/2) = 2C(n/4) + n/2, so C(n) = 4C(n/4) + 2n.
▶ Applying 3 steps of the recurrence, C(n/4) = 2C(n/8) + n/4, so C(n) = 8C(n/8) + 3n.
▶ If we continue this for k steps, we get C(n) = 2^k C(n/2^k) + kn.
▶ Setting k = lg n, so that n/2^k = 1, we get C(n) = nC(1) + n lg n = n lg n.
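We can corroborate this empirically by instrumenting the Python mergesort sketch above to count comparisons (the instrumentation is ours):

    import math, random

    def merge_count(l1, l2):
        out, i, j, comps = [], 0, 0, 0
        while i < len(l1) and j < len(l2):
            comps += 1
            if l1[i] <= l2[j]:
                out.append(l1[i]); i += 1
            else:
                out.append(l2[j]); j += 1
        out.extend(l1[i:]); out.extend(l2[j:])
        return out, comps

    def mergesort_count(a):
        if len(a) <= 1:
            return a, 0
        m = len(a) // 2
        left, cl = mergesort_count(a[:m])
        right, cr = mergesort_count(a[m:])
        merged, cm = merge_count(left, right)
        return merged, cl + cr + cm

    n = 2**10
    _, comps = mergesort_count(random.sample(range(n), n))
    print(comps, n * math.log2(n))  # comps is close to (and at most about) n lg n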
Running time of mergesort
▶ We proved that the running time of mergesort for input size n is O(n log n).
Quicksort

Algorithm 8 Quicksort – basic
Require: 0 ≤ i ≤ j ≤ n − 1
function quicksort(list a[0..n − 1], integer i, integer j)
    if i < j then
        p ← partition(a, i, j)      ▷ pivot now in its correct position p
        quicksort(a, i, p − 1)
        quicksort(a, p + 1, j)
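A minimal Python sketch of this pivot/partition scheme, assuming the common textbook choice of the last element as pivot (Lomuto partition):

    def partition(a, i, j):
        # Place the pivot a[j] into its correct sorted position and return that position.
        pivot = a[j]
        p = i
        for k in range(i, j):
            if a[k] <= pivot:
                a[p], a[k] = a[k], a[p]
                p += 1
        a[p], a[j] = a[j], a[p]
        return p

    def quicksort(a, i=0, j=None):
        if j is None:
            j = len(a) - 1
        if i < j:
            p = partition(a, i, j)
            quicksort(a, i, p - 1)  # sort elements left of the pivot
            quicksort(a, p + 1, j)  # sort elements right of the pivot
        return a

    print(quicksort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]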