
ANLY550 – Spring 2017 Midterm Review

1 Format

You will have 75 minutes to complete the exam. The exam will have true/false questions, multiple choice, example/counterexample
problems, run-this-algorithm problems, and Problem Set style present-and-prove problems.

The exam will be difficult. Comparable exams at other universities have seen average scores of about 60%.

2 Topics Covered

2.1 Math fundamentals

• Induction. If P (n) is a statement, P (1) is true, and for all integers n > 0, P (n) =⇒ P (n + 1), then for all integers
n > 0, P (n) is true.

• Recurrence Relations. We can solve simple ones by hand (or perhaps by writing out a few terms). More complicated
recurrences can be solved using the Master Theorem. You should be able to state and apply the Master Theorem
during a closed book exam.
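As a refresher, here is one common statement of the Master Theorem (memorize whichever version was stated in lecture; this form assumes the work per level is polynomial):

```latex
T(n) = a\,T(n/b) + O(n^d), \quad a \ge 1,\ b > 1,\ d \ge 0
\;\implies\;
T(n) = \begin{cases}
O(n^d) & \text{if } d > \log_b a,\\
O(n^d \log n) & \text{if } d = \log_b a,\\
O(n^{\log_b a}) & \text{if } d < \log_b a.
\end{cases}
```

For example, Mergesort's recurrence T(n) = 2T(n/2) + O(n) has a = b = 2 and d = 1 = log_2 2, giving O(n log n).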

• You should be comfortable with the definitions for O, o, Ω, ω, and Θ. You should be able to correctly answer
questions resembling Problem 2 from Assignment 1 (the one with the big table). Recall that, informally,

– f(n) = O(g(n)) is the asymptotic version of f(n) ≤ g(n).
– f(n) = o(g(n)) is the asymptotic version of f(n) < g(n).
– f(n) = Θ(g(n)) is the asymptotic version of f(n) = g(n).
– f(n) = Ω(g(n)) is the asymptotic version of f(n) ≥ g(n).
– f(n) = ω(g(n)) is the asymptotic version of f(n) > g(n).

2.2 Data Structures

• Arrays, Linked Lists, Stacks, and Queues.

– A stack is a LIFO data structure, a queue is FIFO.

• Heaps and Priority Queues.

– Recall that a heap is a data structure to enable fast deletion of the maximal or minimal element in a dynamic
list. Consequently, we often use them to implement priority queues. The following are operations and runtimes
for a binary Max-Heap:

1. Build-Heap: Turn an (unordered) list of elements into a heap in time O(n).
2. Max: Identify the maximal element in time O(1).
3. Remove-Max: Remove the maximal element in time O(log n).
4. Insert: Insert a new element in time O(log n).

– While we often think of heaps as a tree, note that they can also be stored in memory as an array.
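These operations and the array representation can be sketched with Python's heapq module (heapq is a min-heap, so a standard trick for a max-heap is to negate the keys):

```python
import heapq

# Build-Heap: turn an unordered list into a heap in O(n).
# heapq is a min-heap, so we store negated keys to simulate a max-heap.
items = [5, 1, 9, 3, 7]
heap = [-x for x in items]
heapq.heapify(heap)

# Max: the maximal element sits at the root (index 0) -- O(1).
print(-heap[0])               # 9

# Remove-Max: pop the root and restore the heap property -- O(log n).
largest = -heapq.heappop(heap)

# Insert: push a new element -- O(log n).
heapq.heappush(heap, -8)
print(-heap[0])               # 8
```

Under the hood, the array layout places the children of index i at indices 2i+1 and 2i+2, which is why no explicit tree pointers are needed.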

• Disjoint Set (aka Union-Find) Data Structure, including the “union by rank” implementation of the merge procedure.

– The disjoint-set data structure enables us to efficiently perform operations such as placing elements into sets,
querying whether two elements are in the same set, and merging two sets together. Formally, the data structure
must implement the following operations:


1. MAKESET(x) – create a new set containing the single element x.
2. UNION(x, y) – replace sets containing x and y by their union.
3. FIND(x) – return the name of the set containing x.

– The implementation of the data structure that we covered represents each set as a tree (one tree per set), where
the root of the tree is the canonical “name” of the tree. To ensure O(log n) runtime of FIND, we try to ensure that
the underlying tree structure is balanced. We do so by implementing UNION via the union-by-rank heuristic,
which keeps the resulting rank as small as possible by making the root of larger rank the parent.
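A minimal sketch of this representation in Python, with union by rank only (path compression is omitted):

```python
class DisjointSet:
    def __init__(self):
        self.parent = {}
        self.rank = {}

    def makeset(self, x):
        # MAKESET(x): new singleton set whose root (name) is x itself.
        self.parent[x] = x
        self.rank[x] = 0

    def find(self, x):
        # FIND(x): walk parent pointers up to the root of x's tree.
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, x, y):
        # UNION(x, y): make the root of larger rank the parent, so the
        # tree height (and hence FIND) stays O(log n).
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
```

With union by rank alone, every FIND takes O(log n); adding path compression improves the amortized bound further.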

2.3 Algorithms

You should be able to run any of these algorithms by hand on the exam.

• Mergesort. Divide and Conquer. Split a list into two halves, recursively sort each half, and then merge the two sorted
sublists to complete the sort of the full input.
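A sketch of this divide-and-conquer structure in Python:

```python
def mergesort(a):
    # Base case: lists of length 0 or 1 are already sorted.
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left = mergesort(a[:mid])     # recursively sort each half
    right = mergesort(a[mid:])
    # Merge the two sorted halves in linear time.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]
```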

• Breadth First Search. Visit the vertices in the order of how close they are to the starting vertex. Runtime: O(|V |+
|E|).
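A sketch of BFS in Python, assuming a hypothetical dict-of-lists adjacency representation, returning distances from the start vertex:

```python
from collections import deque

def bfs_distances(adj, start):
    # adj: dict mapping each vertex to a list of its neighbors.
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:           # visit each vertex at most once
                dist[v] = dist[u] + 1   # one edge farther than u
                queue.append(v)
    return dist
```

Each vertex enters the queue at most once and each edge is examined a constant number of times, which is where the O(|V| + |E|) bound comes from.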

• Depth First Search. Keep moving along edges of the graph until the only vertices you can reach have already been
visited, and then start backtracking. Runtime is O(|V |+ |E|). You should know what tree edges, forward edges, back
edges, and cross edges are!

• Topological Sort. Useful for ordering nodes in a Directed Acyclic Graph. The algorithm we saw for this uses DFS.
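One way to sketch the DFS-based algorithm in Python: run DFS and output vertices in reverse order of their finishing times (this assumes the input is a DAG given as a dict of out-neighbor lists):

```python
def topological_sort(adj):
    # adj: dict mapping each vertex of a DAG to its out-neighbors.
    visited, order = set(), []

    def dfs(u):
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                dfs(v)
        order.append(u)        # u finishes after all its descendants

    for u in adj:
        if u not in visited:
            dfs(u)
    return order[::-1]         # reverse finishing order = topological order
```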

• Dijkstra’s Algorithm.

– Solves single-source shortest paths, but it only works for positive edge weights.
– Can be viewed as an efficient “simulation” of the following algorithm: break each edge of weight w into w edges of weight 1, and then run BFS on the resulting graph.

– Dijkstra’s uses a priority queue to ensure the simulation runs fast.
– Runtime is O(|E| · log |V |) if the priority queue is implemented via a binary heap.
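A sketch of the heap-based implementation in Python (this version uses lazy deletion, skipping stale heap entries, rather than a decrease-key operation; both give the O(|E| log |V|) bound):

```python
import heapq

def dijkstra(adj, s):
    # adj: dict mapping u -> list of (v, weight) pairs; weights must be >= 0.
    dist = {s: 0}
    heap = [(0, s)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue               # stale entry: u was already finalized
        done.add(u)
        for v, w in adj[u]:
            if v not in dist or d + w < dist[v]:
                dist[v] = d + w    # relax edge (u, v)
                heapq.heappush(heap, (dist[v], v))
    return dist
```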

• Bellman Ford. Solves single-source shortest paths even when edge weights may be negative. Bonus: detects negative
cycles!
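A sketch of Bellman-Ford in Python over an edge list, including the extra pass that detects negative cycles:

```python
def bellman_ford(n, edges, s):
    # n vertices labeled 0..n-1; edges: list of (u, v, weight) triples.
    INF = float("inf")
    dist = [INF] * n
    dist[s] = 0
    for _ in range(n - 1):            # n-1 rounds of relaxing every edge
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # Bonus pass: if any edge still relaxes, a negative cycle is reachable.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("negative cycle reachable from source")
    return dist
```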

• Prim’s Algorithm. Build a minimum spanning tree by “growing one out of a single seed vertex”. Uses the Cut
Property to justify choosing the smallest edge leaving the current group of nodes reachable from the seed vertex.

• Kruskal’s Algorithm. Build a tree by sorting the edges, considering them one by one, and adding each edge to the tree as long as it doesn’t create a cycle. Uses the Disjoint Set aka Union-Find data structure.
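A sketch of Kruskal’s in Python, with a bare-bones Union-Find inlined (no rank heuristic here, just to keep the sketch short):

```python
def kruskal(n, edges):
    # n vertices labeled 0..n-1; edges: list of (weight, u, v) triples.
    parent = list(range(n))

    def find(x):                       # root of x's set
        while parent[x] != x:
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):      # consider edges by increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                   # adding (u, v) creates no cycle
            parent[ru] = rv
            mst.append((w, u, v))
    return mst
```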

2.4 Algorithm Strategies

• Greedy. Pick the best option at each step. e.g., the algorithm we saw for finding a satisfying assignment for any Horn
Formula, if one exists.

• Divide and Conquer. Split the problem into smaller subproblems (often in half), solve the sub-problems, and
combine the results. e.g., Mergesort, Karatsuba’s Algorithm for Integer Multiplication, Strassen’s algorithm for
matrix multiplication.

• Dynamic Programming. Useful for when there are many overlapping subproblems. e.g., Edit Distance, Floyd-
Warshall (all pairs shortest paths).


3 Some Practice Problems

3.1 Asymptotic Notation

Exercise. Rank the following six functions by increasing order of growth. That is, find any arrangement g1, g2, g3, g4, g5, g6 of the functions so that g1 = O(g2), g2 = O(g3), g3 = O(g4), g4 = O(g5), g5 = O(g6).

f1(n) = n^4 + log n, f2(n) = n + log^4 n, f3(n) = n log n, f4(n) = (n choose 3), f5(n) = 2^n, f6(n) = n^{log n}.

Solution. n + log^4 n = O(n log n). n log n = O((n choose 3)). (n choose 3) = O(n^4 + log n). n^4 + log n = O(n^{log n}). n^{log n} = O(2^n).

3.2 Practice with Graphs

Exercise. Given a connected graph G, determine if it’s bipartite, and if it is, find a partition that establishes that. A bipartite
graph is a graph whose vertices can be divided into two disjoint sets such that no two vertices in the same set are connected
by an edge.

Solution. We modify BFS when run on any starting vertex v. First, we put v in partition 1 (imagine it being colored Red),
and put all neighbors u of v into partition 2 (imagine them being colored Blue). Then, for every neighbor u, place each of
their neighbors in partition 1 (imagine coloring them Red); continue the algorithm until no vertices are unexplored. If at any
time we encounter a neighboring vertex that will be in the same partition as the vertex being processed (aka neighboring
vertices have the same color), then we output “not bipartite”. Otherwise, output bipartite.

Proof of correctness: If the algorithm outputs bipartite, it has actually found a partition of the graph such that no edges
cross between the two sides of the partition, so the graph is clearly bipartite in this case.

If the algorithm outputs “not bipartite”, then we claim that the graph contains a cycle with an odd number of edges in
it, and it is easily seen that there is no way to partition the nodes in such a cycle into two sets so that no edge in the cycle
crosses between the two sets.

To see that the graph contains an odd cycle, observe that the algorithm must have found two vertices u and w
that are neighbors, and are either both at even distance from v (this is the case if they are both in partition 1), or both at
odd distance from v (this is the case if they are both in partition 2). One obtains a cycle with an odd number of edges by
following the shortest path from v to u, then following edge (u,w), and then following the shortest path from w back to v.
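The two-coloring procedure described above can be sketched in Python (assuming a dict-of-lists adjacency representation of a connected graph):

```python
from collections import deque

def bipartition(adj, start):
    # adj: dict mapping each vertex of a connected graph to its neighbors.
    # Returns (True, coloring) if bipartite, else (False, None).
    color = {start: 0}        # 0 = Red (partition 1), 1 = Blue (partition 2)
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in color:
                color[v] = 1 - color[u]   # neighbors get the opposite color
                queue.append(v)
            elif color[v] == color[u]:    # neighbors share a color:
                return False, None        # odd cycle, not bipartite
    return True, color
```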

3.3 Minimum Spanning Trees

Exercise. Let G = (V,E) be a connected, undirected graph with distinct edge weights. Consider a cycle v1, v2, . . . , vk, v1
in G, and let e = (vi, vi+1) be the edge in the cycle with the largest edge weight. Prove that e is not in any minimum
spanning tree of G.

Solution. Consider any MST T including e. If we remove e from T , T splits into two connected components. In addition to e, there must exist another edge e′ from the cycle that crosses between these two connected components (make sure you see why this is true). Consider the spanning tree (T \ {e}) ∪ {e′}. The cost of this spanning tree is strictly less than the cost of T , since the weight of e′ is strictly less than the weight of e (e is the largest-weight edge in the cycle, and edge weights are distinct). Hence, T could not have been a minimum spanning tree.

3.4 Shortest Paths

Exercise. Let G = (V,E) be a weighted, directed graph with exactly one negative weight edge and no negative-weight
cycles. Give an algorithm to find the shortest distance from s to all other vertices in V that has the same running time as
Dijkstra.

Solution. Let’s say the negative-weight edge is (u, v). First, remove the edge and run Dijkstra from s on this modified graph. Then, letting ds[u] denote the distance from s to u in the modified graph, check if ds[u] + w(u, v) < ds[v]. If not, then this means the shortest path from u to v in G does not use the edge (u, v). Hence, we are done in this case: the shortest path from s to any other node t in the modified graph is the same as the shortest path from s to t in the original graph G (make sure you understand why this is true). If yes, then run Dijkstra starting from v on the modified graph. Then, for any node t, its shortest distance from s will be min(ds[t], ds[u] + w(u, v) + dv[t]) (this holds because ds[t] captures the length of the shortest path from s to t that avoids the edge (u, v), while ds[u] + w(u, v) + dv[t] captures the length of the shortest path from s to t that uses the edge (u, v). Make sure you see why this is true).

3.5 Divide and Conquer

Exercise. You’re given an array A of n distinct numbers that is sorted up to a cyclic shift, i.e., A = (a1, a2, . . . , an) where there exists a k such that ak < ak+1 < · · · < an < a1 < · · · < ak−1, but you don’t know what k is. Give an O(log n) time algorithm to find the largest number in this array.

Solution. Here’s a recursive solution based on binary search that runs in O(log n) time. If a1 < an, then the array is actually sorted (make sure you see why this is true), so we can return the last element an, which will be the greatest. Otherwise, if a1 > an, then let am denote the middle element of the array. If am > a1, then it is guaranteed that the maximum element of the array is in the right half (make sure you see why this is true), and the right half is also sorted up to a cyclic shift, so we recursively solve the problem on the subarray consisting of the right half of the input.

If am < a1, then the maximum element of the array is in the left half, and the left half of the array is also sorted up to a cyclic shift, so we recursively solve the problem on the subarray consisting of the left half of the input. Every stage of the recursion performs O(1) work and cuts the input size by a factor of 2, so the runtime is O(log n). One can also derive this time bound using the Master Theorem and the fact that the runtime satisfies the recurrence T(n) ≤ T(n/2) + O(1).

3.6 Dynamic Programming

3.6.1 Wood Cutting (From MIT 6.006, Spring 2011)

Given a log of wood of length k, Woody the woodcutter will cut it once, in any place you choose, for the price of k dollars. Suppose you have a log of length L, marked to be cut in n different locations labeled 1, 2, . . . , n. For simplicity, let indices 0 and n + 1 denote the left and right endpoints of the original log of length L. Let the distance of mark i from the left end of the log be di, and assume that 0 = d0 < d1 < d2 < · · · < dn < dn+1 = L.

Exercise. Give an O(n^3) time algorithm to determine the sequence of cuts to the log that will (1) cut the log at all the marked places, and (2) minimize your total payment to Woody.

Solution (a bit sketchy). Dynamic programming, where there is a subproblem for every pair (i, j) with 0 ≤ i < j ≤ n + 1. The value of the subproblem for pair (i, j), denoted min-cost(i, j), is the minimum cost of a sequence of cuts that fully cuts the sub-log whose left endpoint is point i and whose right endpoint is point j. Observe that min-cost(i, j) = (dj − di) + min_{k : i < k < j} (min-cost(i, k) + min-cost(k, j)), with base case min-cost(i, i + 1) = 0 (no marks inside, so no cuts are needed). There are O(n^2) subproblems, each of which can be evaluated in O(n) time given the values of smaller subproblems, for a total runtime of O(n^3).
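The wood-cutting DP above can be sketched in Python with memoized recursion (the function names and the use of lru_cache are implementation choices, not from the problem statement):

```python
from functools import lru_cache

def min_cutting_cost(d):
    # d[0..n+1]: positions of the two endpoints and the n marks, increasing.
    n = len(d) - 2

    @lru_cache(maxsize=None)
    def min_cost(i, j):
        # Cheapest way to fully cut the sub-log between marks i and j.
        if j == i + 1:
            return 0                    # no marks inside: nothing to cut
        # Any cut of this sub-log costs its length, d[j] - d[i]; then
        # recurse on the two pieces, choosing the best first cut k.
        return (d[j] - d[i]) + min(min_cost(i, k) + min_cost(k, j)
                                   for k in range(i + 1, j))

    return min_cost(0, n + 1)
```

For example, a log of length 10 with marks at positions 2 and 5 is cheapest to cut at 5 first (cost 10) and then at 2 (cost 5), for a total of 15.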

3.7 Additional Practice Problems

Two good sources for practice problems (with solutions) are:

http://courses.csail.mit.edu/6.006/fall11/rec/rec10_review1.pdf

http://courses.csail.mit.edu/6.006/fall11/rec/rec18.pdf

Note that these sources cover some topics (such as Newton’s Method and the Rabin-Karp algorithm) that we will either
not cover at all, or that we will only cover after the midterm.

4 Some Study Tips (These Will Be Useful For the Final As Well)

In general, you can probably break down the types of questions you’ll be asked into two broad categories: those that test you more on the specific algorithms we’ve covered (see Section 2.3), and those that test your ability to apply more general algorithmic tools/paradigms (see Section 2.4).

4.1 When studying specific algorithms

• Understand the main idea behind the algorithm, the runtime, and the space complexity for any algorithm that we’ve
covered.

– Trying to boil down what the algorithm is doing to a 1-3 sentence summary can help to ensure you understand
the key point of the algorithm.

– For space and time complexity, you want to be able to identify what the most expensive operations are or what
the recurrence leading to the complexity is.

• Understand the constraints for what types of problems that algorithm solves.

– For example, if we have an algorithm on directed acyclic graphs (DAGs), do you know why it wouldn’t work
on graphs with cycles? (Try thinking about what would happen to the algorithm if you invalidated one of the
assumptions; what step(s) of the analysis would go wrong in this case? Could you adjust the algorithm to fix
them?)

• Focus on the big ideas in the analysis of the algorithms more than on the details.

• Try thinking about variations of the algorithms. For example, if you have a major step in the algorithm, try thinking
about what would happen if you tweaked that step. Alternatively, if the algorithm uses one data structure, what would
happen if you replaced it with a different one?

4.2 When studying more general tools/paradigms

• Practice applying them.

– Apart from the problems we’ve released, you can find others in textbooks and online.
– One thing that’s helpful here is to also solve problems where you don’t know what tool you’re supposed to be using, so that you can practice choosing the right tool.

• Understand what the key characteristics are that make a tool work well. For example, dynamic programming tends to
work better when you can break your problem down into slightly smaller subproblems (say one size smaller), whereas
divide and conquer works when you can break your problem down into significantly smaller problems (say half the
size).

• When you’re working on the problems, a lot of times the most difficult piece is not the details of the analysis/proof, but
the initial choices you make in the setup.

– For dynamic programming and divide-and-conquer, the first step is to choose a subproblem. However, a lot of
times the difficulty in getting to the answer is not in the analysis after this point, but in choosing the subproblem
itself (so if you find yourself getting stuck, try using a different subproblem).
