CSE 101 Exam 2 Review
Shortest Paths (Ch 4)
BFS
Dijkstra
Priority Queues
Bellman-Ford
Shortest Paths in DAGs
Fundamental Shortest Paths Formula
dist(w) = min over edges (v,w) ∈ E of [ dist(v) + ℓ(v,w) ], with dist(s) = 0.
[Diagram: a shortest path from s to w consists of a shortest path from s to v followed by the edge (v,w).]
Observation
If there is a length ≤d s-v path, then there is some w adjacent to v with a length ≤(d-1) s-w path.
Proof: w is the next to last vertex on the path.
This means that if we know all of the vertices at distance ≤(d-1), we can find all of the vertices at distance ≤d.
BFS(G,s)
For v ∈ V, dist(v) ← ∞
Initialize Queue Q
Q.enqueue(s)
dist(s) ← 0
While(Q nonempty)
u ← Q.dequeue()
For (u,v) ∈ E
If dist(v) = ∞
dist(v) ← dist(u)+1
Q.enqueue(v)
v.prev ← u
Total runtime:
O(|V|+|E|)
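A quick Python sketch of the BFS pseudocode above (the adjacency-dict representation and function name are my own choices, not from the slides):

```python
from collections import deque

def bfs(adj, s):
    """Unweighted shortest distances from s; adj maps vertex -> list of neighbors."""
    dist = {v: float('inf') for v in adj}
    prev = {v: None for v in adj}
    dist[s] = 0
    q = deque([s])
    while q:
        u = q.popleft()                      # dequeue the front vertex
        for v in adj[u]:
            if dist[v] == float('inf'):      # first time v is discovered
                dist[v] = dist[u] + 1
                prev[v] = u
                q.append(v)
    return dist, prev
```

Each vertex is enqueued at most once and each edge examined once, giving the O(|V|+|E|) bound.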
Edge Lengths
The number of edges in a path is not always the right measure of distance. Sometimes, taking several shorter steps is preferable to taking a few longer ones.
We assign each edge (u,v) a non-negative length ℓ(u,v). The length of a path is the sum of the lengths of its edges.
Problem: Shortest Paths
Problem: Given a Graph G with vertices s and t and a length function ℓ, find the shortest path from s to t.
Algorithm
Distances(G,s,ℓ)
dist(s) ← 0
While(not all distances found)
Find minimum over (v,w) ∈ E
with v discovered w not
of dist(v)+ℓ(v,w)
dist(w) ← dist(v)+ℓ(v,w)
prev(w) ← v
Why does this work?
Claim: Whenever the algorithm assigns a distance to a vertex v, that distance is the length of the shortest path from s to v.
Proof by Induction:
dist(s) = 0 [the empty path has length 0]
When assigning distance to w, assume that all previously assigned distances are correct.
Inductive Step
[Diagram: a bubble around s containing the vertices with correctly assigned distances; v is inside with dist(v), and the edge (v,w) of length ℓ(v,w) leaves the bubble. The minimizing dist(v)+ℓ(v,w) is the shortest path from s to any vertex outside the bubble.]
Priority Queue
A Priority Queue is a data structure that stores elements sorted by a key value.
Operations:
Insert – adds a new element to the PQ.
DecreaseKey – Changes the key of an element of the PQ to a specified smaller value.
DeleteMin – Finds the element with the smallest key and removes it from the PQ.
Dijkstra(G,s,ℓ)
Initialize Priority Queue Q
For v ∈ V
dist(v) ← ∞
Q.Insert(v)
dist(s) ← 0
While(Q not empty)
v ← Q.DeleteMin()
For (v,w) ∈ E
If dist(v)+ℓ(v,w) < dist(w)
dist(w) ← dist(v)+ℓ(v,w)
Q.DecreaseKey(w)
Runtime:
O(|V|) Inserts +
O(|V|) DeleteMins +
O(|E|) DecreaseKeys
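A Python sketch of Dijkstra using the standard-library heapq. Note one deliberate substitution: heapq has no DecreaseKey, so this version pushes duplicate entries and skips stale ones on DeleteMin, which gives the same distances (the graph representation and names are my own):

```python
import heapq

def dijkstra(adj, s):
    """adj maps v -> list of (w, length) pairs; lengths must be non-negative."""
    dist = {v: float('inf') for v in adj}
    dist[s] = 0
    pq = [(0, s)]
    while pq:
        d, v = heapq.heappop(pq)                  # DeleteMin
        if d > dist[v]:
            continue                              # stale entry: v was already finalized
        for w, l in adj[v]:
            if dist[v] + l < dist[w]:
                dist[w] = dist[v] + l
                heapq.heappush(pq, (dist[w], w))  # stand-in for DecreaseKey
    return dist
```

With a binary heap every operation costs O(log n), matching the operation counts above.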
Binary Heap
Store elements in a balanced binary tree with each element having smaller key than its children.
[Diagram: a binary heap on keys 0, 1, 3, 7, 5, 8, 6; the smallest key is at the root, and the tree has log(n) levels.]
Operations
Insert
Add new element at bottom
Bubble up
DecreaseKey
Change key
Bubble up
DeleteMin
Remove root
Move bottom to root
Bubble down
Runtime
O(log(n)) per operation
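A minimal array-backed sketch of the binary heap operations described above (Insert with bubble up, DeleteMin with bubble down); the class and method names are illustrative:

```python
class MinHeap:
    """Array-backed binary min-heap; children of index i live at 2i+1 and 2i+2."""
    def __init__(self):
        self.a = []

    def insert(self, key):
        self.a.append(key)                     # add new element at the bottom
        i = len(self.a) - 1
        while i > 0 and self.a[(i - 1) // 2] > self.a[i]:   # bubble up
            self.a[i], self.a[(i - 1) // 2] = self.a[(i - 1) // 2], self.a[i]
            i = (i - 1) // 2

    def delete_min(self):
        root = self.a[0]
        self.a[0] = self.a[-1]                 # move bottom element to the root
        self.a.pop()
        i = 0
        while True:                            # bubble down
            c = 2 * i + 1
            if c >= len(self.a):
                break
            if c + 1 < len(self.a) and self.a[c + 1] < self.a[c]:
                c += 1                         # compare against the smaller child
            if self.a[i] <= self.a[c]:
                break
            self.a[i], self.a[c] = self.a[c], self.a[i]
            i = c
        return root
```

Both loops walk at most one root-to-leaf path, hence the O(log(n)) cost per operation.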
d-ary Heap
Like binary heap, but each node has d children
Only log(n)/log(d) levels.
Bubble up faster!
Bubble down slower – need to compare to more children.
Summary of Priority Queues
Negative Edge Weights
Unusual case, but some applications
Dijkstra no longer works
Sometimes need to be able to look ahead to find savings
Negative Weight Cycles
Definition: A negative weight cycle is a cycle where the total weight of edges is negative.
If G has a negative weight cycle reachable from s, then shortest paths to some vertices do not exist.
Go around the cycle over and over.
Note: For undirected G, a single negative weight edge gives a negative weight cycle by going back and forth on it.
Fundamental Shortest Paths Formula
dist(w) = min over edges (v,w) ∈ E of [ dist(v) + ℓ(v,w) ]
System of equations to solve for distances.
When ℓ ≥ 0, Dijkstra gives an order to solve in.
With ℓ < 0, there might be no solution.
Algorithm Idea
Instead of finding shortest paths (which may not exist), find shortest paths of length at most k.
dist_k(w) = min over edges (v,w) ∈ E of [ dist_{k-1}(v) + ℓ(v,w) ]
[A shortest path to w with at most k edges is a shortest path to some v with at most k-1 edges, followed by the edge (v,w).]
Algorithm
Bellman-Ford(G,s,ℓ)
dist_0(v) ← ∞ for all v
// can't reach yet
dist_0(s) ← 0
For k = 1 to n
For w ∈ V
dist_k(w) ← min over (v,w) ∈ E of ( dist_{k-1}(v) + ℓ(v,w) )
dist_k(s) ← min(dist_k(s), 0)
// s has the trivial path
Each round takes O(|E|).
What value of k do we use?
Analysis
Proposition: If n ≥ |V|-1 and if G has no negative weight cycles, then for all v,
dist(v) = dist_n(v).
If there is a negative weight cycle, there may be no shortest path.
If not, we only need to run our algorithm for |V| rounds, for a final runtime O(|V||E|).
Proof
If we have a path with |V| edges:
Must repeat some vertex
Contains a loop
Removing loop makes path shorter
Thus, there is a shortest path with at most |V|-1 edges.
Cycle Detection
Proposition: For any n ≥ |V| - 1, there are no negative weight cycles reachable from s if and only if for every v ∈ V
dist_n(v) = dist_{n+1}(v)
Detect by running one more round of Bellman-Ford.
Need to see if any v’s distance changes.
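A Python sketch of Bellman-Ford plus the cycle-detection round described above. One deliberate simplification: this uses the common in-place variant (a single dist array updated every round) rather than separate dist_k arrays; in-place updates only let distances converge faster, so the result is the same. Names and the edge-list representation are my own:

```python
def bellman_ford(vertices, edges, s):
    """edges: list of (v, w, length) triples. Returns (dist, has_negative_cycle)."""
    INF = float('inf')
    dist = {v: INF for v in vertices}
    dist[s] = 0
    for _ in range(len(vertices)):        # |V| rounds suffice: shortest paths use <= |V|-1 edges
        for v, w, l in edges:
            if dist[v] + l < dist[w]:
                dist[w] = dist[v] + l
    # One extra round: any further change means a negative cycle is reachable from s.
    for v, w, l in edges:
        if dist[v] + l < dist[w]:
            return dist, True
    return dist, False
```

Each round scans every edge once, giving the O(|V||E|) total runtime.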
Proof of “Only If”
Suppose no negative weight cycles.
For any n ≥ |V| - 1, dist_n(v) = dist(v).
So
dist_n(v) = dist(v) = dist_{n+1}(v)
Proof of “If”
Suppose dist_n(v) = dist_{n+1}(v) for all v.
So dist_m(v) = dist_n(v) for every m ≥ n; the distances never change in later rounds.
But if there were a negative weight cycle reachable from s, the distances would have to keep decreasing.
Algorithm
ShortestPathsInDAGs(G,s,ℓ)
TopologicalSort(G)
For w ∈ V in topological order
If w = s, dist(w) ← 0
Else
dist(w) ← min over (v,w) ∈ E of ( dist(v)+ℓ(v,w) )
\\ dist(v) for all upstream v already computed
Runtime
O(|V|+|E|)
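A Python sketch of the DAG shortest-paths algorithm above, with the topological sort done by reversing a DFS post-order (representation and names are my own choices):

```python
def dag_shortest_paths(adj, s):
    """adj maps v -> list of (w, length); G must be a DAG. Negative lengths are fine."""
    order, seen = [], set()
    def dfs(v):
        seen.add(v)
        for w, _ in adj[v]:
            if w not in seen:
                dfs(w)
        order.append(v)                    # post-order: v comes after everything downstream
    for v in adj:
        if v not in seen:
            dfs(v)
    order.reverse()                        # reversed post-order = topological order

    dist = {v: float('inf') for v in adj}
    dist[s] = 0
    for v in order:                        # all upstream dist values are already final
        if dist[v] == float('inf'):
            continue
        for w, l in adj[v]:
            dist[w] = min(dist[w], dist[v] + l)
    return dist
```

Both the sort and the relaxation pass touch each vertex and edge once: O(|V|+|E|).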
Divide & Conquer (Ch 2)
General Technique
Master Theorem
Karatsuba Multiplication
Merge Sort
Order Statistics
Binary Search
Closest Pair of Points
Divide and Conquer
This is the first of our three major algorithmic techniques.
Break problem into pieces
Solve pieces recursively
Recombine pieces to get answer
Master Theorem
Theorem: Let T(n) be given by the recurrence
T(n) = a·T(n/b) + O(n^d)
for constants a > 0, b > 1, d ≥ 0. Then we have that
T(n) = O(n^d) if d > log_b(a)
T(n) = O(n^d · log(n)) if d = log_b(a)
T(n) = O(n^(log_b(a))) if d < log_b(a)
Karatsuba Multiplication
Want to multiply N and M:
Let X ≈ √(N+M) be a power of 2.
Write N = AX+B, M = CX+D
This can be done by just taking the high and low bits.
N·M = AC·X² + (AD+BC)·X + BD
= AC·X² + [(A+B)(C+D) − AC − BD]·X + BD
The multiplications by X are just bit shifts.
Algorithm
KaratsubaMult(N,M)
If N+M<99, Return Product(N,M)
Let X ← 2^⌊log(N+M)/2⌋
Write N = AX + B, M = CX + D
P1 ← KaratsubaMult(A,C)
P2 ← KaratsubaMult(B,D)
P3 ← KaratsubaMult(A+B,C+D)
Return P1·X² + [P3−P1−P2]·X + P2
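A runnable Python version of the pseudocode above. Two choices of my own: the base case uses a bit-length threshold rather than the slide's, and X is computed via bit shifts so the splits and the multiplications by X are shift-and-mask operations:

```python
def karatsuba(N, M):
    """Multiply non-negative integers using 3 recursive half-size multiplications."""
    if N < 2**32 and M < 2**32:        # small enough: use hardware multiply
        return N * M
    half = max(N.bit_length(), M.bit_length()) // 2
    X = 1 << half                      # X = 2^half, so multiplying by X is a bit shift
    A, B = N >> half, N & (X - 1)      # N = A·X + B (high and low bits)
    C, D = M >> half, M & (X - 1)      # M = C·X + D
    P1 = karatsuba(A, C)               # AC
    P2 = karatsuba(B, D)               # BD
    P3 = karatsuba(A + B, C + D)       # (A+B)(C+D), so P3 - P1 - P2 = AD + BC
    return (P1 << (2 * half)) + ((P3 - P1 - P2) << half) + P2
```

The three recursive calls on half-size inputs are exactly what drives the recurrence analyzed next.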
Runtime Recurrence
Karatsuba multiplication on inputs of size n spends O(n) time, and then makes three recursive calls to problems of (approximately) half the size.
If T(n) is the runtime for n-bit inputs, we have the recursion:
T(n) = 3·T(n/2) + O(n)
Master Theorem: T(n) = O(n^(log_2(3))) ≈ O(n^1.585)
Note
In divide and conquer, it is important that the recursive subcalls are a constant fraction of the size of the original.
Note on Proving Correctness
There’s a general procedure for proving correctness of a D&C algorithm:
Use Induction: Prove correctness by induction on problem size.
Base Case: Your base case will be the non-recursive case of your algorithm (which your algorithm does need to have).
Inductive Step: Assuming that the (smaller) recursive calls are correct, show that algorithm works.
Sorting
Problem: Given a list of n numbers, return those numbers in ascending order.
Divide and Conquer
How do we make a divide and conquer algorithm?
Split into Subproblems
Recursively Solve
Combine Answers
- Split list in two
- Sort each sublist
- ???
Merge
Problem: Given two sorted lists, combine them into a single sorted list.
Merge
Merge(A,B)
C ← List of length Len(A)+Len(B)
a ← 1, b ← 1
For c = 1 to Len(C)
If (b > Len(B))
C[c] ← A[a], a++
Else if (a > Len(A))
C[c] ← B[b], b++
Else if A[a] < B[b]
C[c] ← A[a], a++
Else
C[c] ← B[b], b++
Return C
Runtime: O(|A|+|B|)
MergeSort
MergeSort(L)
If Len(L) = 1 \\ Base Case
Return L
Split L into equal L1 and L2
A ← MergeSort(L1)
B ← MergeSort(L2)
Return Merge(A,B)
Runtime
T(n) = 2T(n/2)+O(n)
T(n) = O(n log(n))
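The two pseudocode blocks above translate directly to Python (function names are my own):

```python
def merge(A, B):
    """Combine two sorted lists into one sorted list in O(|A|+|B|) time."""
    C, a, b = [], 0, 0
    while a < len(A) or b < len(B):
        # take from A when B is exhausted, or when A's next element is smaller
        if b == len(B) or (a < len(A) and A[a] < B[b]):
            C.append(A[a]); a += 1
        else:
            C.append(B[b]); b += 1
    return C

def merge_sort(L):
    if len(L) <= 1:                    # base case
        return L
    mid = len(L) // 2                  # split list in two
    return merge(merge_sort(L[:mid]), merge_sort(L[mid:]))
```

The slicing and merging do O(n) work per level, giving the T(n) = 2T(n/2)+O(n) recurrence.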
Order Statistics
Problem: Given a list L of numbers and an integer k, find the kth largest element of L.
Divide Step
Select a pivot x ∈ L. Compare it to the other elements of L.
= x
< x
> x
Which list is our answer in?
Answer is > x if there are ≥ k elements bigger than x.
Answer is x if there are < k elements bigger and ≥ k elements bigger than or equal to x.
Otherwise answer is less than x.
Order Statistics
Select(L,k)
Pick x ∈ L
Sort L into L>x, L=x, L<x
If Len(L>x) ≥ k
Return Select(L>x,k)
Else if Len(L>x)+Len(L=x) ≥ k
Return x
Else
Return Select(L<x, k−Len(L>x)−Len(L=x))
Randomization
Fix: Pick a random pivot.
[Diagram: L in sorted order; the middle half lies between the n/4 smallest and the n/4 largest elements, and we hope the pivot x lands inside it.]
There’s a 50% chance that x is selected in the middle half.
If so, no matter where the answer is, recursive call of size at most 3n/4.
On average need two tries to reduce call.
Runtime
T(n) = T(3n/4) + O(n)
So…
T(n) = O(n)
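A Python sketch of Select with the random-pivot fix (names are illustrative; "k-th largest" here means k = 1 returns the maximum):

```python
import random

def select(L, k):
    """Return the k-th largest element of L (1 <= k <= len(L))."""
    x = random.choice(L)                       # random pivot
    bigger  = [y for y in L if y > x]
    equal   = [y for y in L if y == x]
    smaller = [y for y in L if y < x]
    if len(bigger) >= k:                       # answer is strictly bigger than x
        return select(bigger, k)
    if len(bigger) + len(equal) >= k:          # answer is x itself
        return x
    # otherwise the answer is smaller than x; adjust k by the elements discarded
    return select(smaller, k - len(bigger) - len(equal))
```

The three-way split costs O(n) per call, and the expected shrink to 3n/4 gives expected O(n) total.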
Note
The algorithm discussed does give the correct answer in expected O(n) time.
There are deterministic O(n) algorithms using similar ideas, but they are substantially more complicated.
Search
Problem: Given a sorted list L and a number x, find the location of x in L.
Binary Search
BinarySearch(L,i,j,x)
\\Search between L[i] and L[j]
If j < i, Return ‘error’
k ← ⌊(i+j)/2⌋
If L[k] = x, Return k
If L[k] > x
Return BinarySearch(L,i,k-1,x)
If L[k] < x
Return BinarySearch(L,k+1,j,x)
Runtime
T(n) = T(n/2) + O(1)
So…
T(n) = O(log(n))
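An iterative Python version of the binary search above (same logic, with the two recursive calls replaced by updating the endpoints; returning None plays the role of 'error'):

```python
def binary_search(L, x):
    """Return an index k with L[k] == x, or None if x is not in sorted list L."""
    i, j = 0, len(L) - 1
    while i <= j:
        k = (i + j) // 2          # midpoint of the current search range
        if L[k] == x:
            return k
        if L[k] > x:
            j = k - 1             # search the left half
        else:
            i = k + 1             # search the right half
    return None
```

Each iteration halves the range, so the loop runs O(log(n)) times.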
Binary Search Puzzles
You have 27 coins, one of which is heavier than the others, and a balance. Determine the heavy coin in 3 weighings.
Lots of puzzles have binary-search-like answers. As long as you can spend constant time to divide your search space in half (or in thirds), you can use binary search to solve the puzzle in O(log(n)) time.
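A small Python simulation of the coin puzzle's ternary-search strategy (weigh the first third against the second; the heavier pan, or the leftover third if they balance, contains the heavy coin). The function and setup are my own illustration:

```python
def find_heavy(coins):
    """coins: list of weights, exactly one strictly heavier than the rest.
    Returns (index of heavy coin, number of weighings used)."""
    lo, hi, weighings = 0, len(coins), 0
    while hi - lo > 1:
        third = max(1, (hi - lo) // 3)
        a = coins[lo:lo + third]               # first third on the left pan
        b = coins[lo + third:lo + 2 * third]   # second third on the right pan
        weighings += 1
        if sum(a) > sum(b):
            hi = lo + third                    # heavy coin is in the first third
        elif sum(b) > sum(a):
            lo, hi = lo + third, lo + 2 * third  # heavy coin is in the second third
        else:
            lo = lo + 2 * third                # pans balance: it's in the last third
    return lo, weighings
```

Since 27 = 3³, three weighings always suffice for 27 coins.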
Closest Pair of Points (Ex 2.32)
Problem: Given n points in the plane (x1,y1)…(xn,yn) find the pair (xi,yi) and (xj,yj) whose Euclidean distance is as small as possible.
Divide and Conquer Outline
Divide points into two sets by drawing a line.
Compute closest pair on each side.
What about pairs that cross the divide?
Observation
Suppose closest pair on either side at distance δ.
Only need to care about points within δ of dividing line.
Need to know if some pair closer than δ.
Main Idea
Proposition: Take the points within δ of the dividing line and sort them by y-coordinate. Any one of these points can only be within δ of the 8 closest points on either side of it.
Proof
Nearby points must have y-coordinate within δ.
Subdivide region into δ/2-sided squares.
At most one point in each square.
Algorithm
CPP(S)
If |S| ≤ 3
Return closest distance
Find line L evenly dividing points
Sort S into S_left, S_right
δ ← min(CPP(S_left), CPP(S_right))
Let T be points within δ of L
Sort T by y-coordinate
Compare each element of T to 8 closest
on either side. Let min dist be δ’.
Return min(δ,δ’)
Runtime
We have a recurrence
T(n) = O(n log(n)) + 2 T(n/2).
This is not quite covered by the Master Theorem, but can be shown to give
T(n) = O(n log²(n)).
Alternatively, if you are more careful and have CPP take points already sorted by y-coordinate, you can reduce to O(n log(n)).
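A Python sketch of the simpler O(n log²(n)) variant of CPP, which re-sorts the strip by y-coordinate inside each call (the function name and point representation are my own; each strip point is compared against the next 8 points by y, matching the proposition):

```python
import math

def closest_pair(points):
    """points: list of (x, y) tuples, len >= 2. Returns the smallest pairwise distance."""
    if len(points) <= 3:                          # base case: brute force
        return min(math.dist(p, q)
                   for i, p in enumerate(points) for q in points[i + 1:])
    points = sorted(points)                       # sort by x-coordinate
    mid = len(points) // 2
    x_line = points[mid][0]                       # dividing vertical line
    delta = min(closest_pair(points[:mid]), closest_pair(points[mid:]))
    # Points within delta of the line, sorted by y-coordinate.
    strip = sorted((p for p in points if abs(p[0] - x_line) < delta),
                   key=lambda p: p[1])
    for i, p in enumerate(strip):
        for q in strip[i + 1:i + 9]:              # only the next 8 points can be closer
            delta = min(delta, math.dist(p, q))
    return delta
```

The per-call sorting is what adds the extra log factor; passing pre-sorted-by-y points down the recursion removes it.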