代写 data structure algorithm math graph ANLY550 – Spring 2017 Midterm Review

ANLY550 – Spring 2017 Midterm Review
1 Format
You will have 75 minutes to complete the exam. The exam will have true/false questions, multiple choice, example/counterexample problems, run-this-algorithm problems, and Problem Set style present-and-prove problems.
2 2.1
• •

The exam will be difficult. Comparable exams at other universities have seen average scores of about 60%.
Topics Covered Math fundamentals
Induction. If P (n) is a statement, P (1) is true, and for all integers n > 0, P (n) =⇒ P (n + 1), then for all integers n > 0, P (n) is true.
Recurrence Relations. We can solve simple ones by hand (or perhaps by writing out a few terms). More complicated recurrences can be solved using the Master Theorem. You should be able to state and apply the Master Theorem during a closed book exam.
You should be comfortable with the definitions for O, o, Ω, ω, and Θ. You should be able to correctly answer questions resembling Problem 2 from Assignment 1 (the one with the big table). Recall that, informally,
– f(n) = O(g(n)) is the asymptotic version of f(n) ≤ g(n). – f(n) = o(g(n)) is the asymptotic version of f(n) < g(n). – f(n) = Θ(g(n)) is the asymptotic version of f(n) = g(n). – f(n) = Ω(g(n)) is the asymptotic version of f(n) ≥ g(n). – f(n) = ω(g(n)) is the asymptotic version of f(n) > g(n).
Data Structures
Arrays, Linked Lists, Stacks, and Queues.
– A stack is a LIFO data structure, a queue is FIFO.
Heaps and Priority Queues.
– Recall that a heap is a data structure to enable fast deletion of the maximal or minimal element in a dynamic list. Consequently, we often use them to implement priority queues. The following are operations and runtimes for a binary Max-Heap:
1. Build-Heap: Turn an (unordered) list of elements into a heap in time O(n). 2. Max: Identify the maximal element in time O(1).
3. Remove-Max: Remove the maximal element in time O(log n).
4. Insert: Insert a new element in time O(log n).
– While we often think of heaps as a tree, note that they can also be stored in memory as an array.
Disjoint Set (aka Union-Find) Data Structure, including the “union by rank” implementation of the merge procedure.
– The disjoint-set data structure enables us to efficiently perform operations such as placing elements into sets, querying whether two elements are in the same set, and merging two sets together. Formally, the data structure must implement the following operations:
2.2



1

1. MAKESET(x) – create a new set containing the single element x. 2. UNION(x, y) – replace sets containing x and y by their union.
3. FIND(x) – return the name of the set containing x.
– The implementation of the data structure that we covered represents each set as a tree (one tree per set), where the root of the tree is the canonical “name” of the tree. To ensure O(log n) runtime of FIND, we try to ensure that the underlying tree structure is balanced. We do so by implementing UNION via the union-by-rank heuristic, which tries to preserve as low rank as possible by making the set with larger rank the parent.
2.3 Algorithms
You should be able to run any of these algorithms by hand on the exam.
• • •
• •
• • •
2.4
• •

Mergesort. Divide and Conquer. Split a list into two halves, recursively sort each half, and then merge the two sorted sublists to complete the sort of the full input.
Breadth First Search. Visit the vertices in the order of how close they are to the starting vertex. Runtime: O(|V | + |E |).
Depth First Search. Keep moving along edges of the graph until the only vertices you can reach have already been visited, and then start backtracking. Runtime is O(|V |+|E|). You should know what tree edges, forward edges, back edges, and cross edges are!
Topological Sort. Useful for ordering nodes in a Directed Acyclic Graph. The algorithm we saw for this uses DFS. Dijkstra’s Algorithm.
– Solves single-source shortest paths, but it only works for positive edge weights.
– Can be viewed as an efficient “simulation” of the following algorithm: Break each edge of weight w into w
edges of weight 1, and then run BFS on the resulting graph.
– Djikstra’s uses a priority queue to ensure the simulation runs fast.
– Runtime is O(|E| · log |V |) if the priority queue is implemented via a binary heap.
Bellman Ford. Solves single-source shortest paths even when edge weights may be negative. Bonus: detects negative cycles!
Prim’s Algorithm. Build a minimum spanning tree by “growing one out of a single seed vertex”. Uses the Cut Property to justify choosing the smallest edge leaving the current group of nodes reachable from the seed vertex.
Kruskal’s Algorithm. Build a tree by sorting the edges, considering them one by one, and adding it to the tree as long as it doesn’t create a cycle. Uses the Disjoint Set aka Union-Find data structure.
Algorithm Strategies
Greedy. Pick the best option at each step. e.g., the algorithm we saw for finding a satisfying assignment for any Horn
Formula, if one exists.
Divide and Conquer. Split the problem into smaller subproblems (often in half), solve the sub-problems, and combine the results. e.g., Mergesort, Karatsuba’s Algorithm for Integer Multiplication, Strassen’s algorithm for matrix multiplication.
Dynamic Programming. Useful for when there are many overlapping subproblems. e.g., Edit Distance, Floyd- Warshall (all pairs shortest paths).
2

3 Some Practice Problems 3.1 Asymptotic Notation
Exercise. Rank the following six functions by increasing order of growth. That is, find any arrangement g1, g2, g3, g4, g5, g6 of the functions so that g1 = O(g2), g2 = O(g3), g3 = O(g4), g4 = O(g5), g5 = O(g6).
f1(n)=n4 +logn,f2(n)=n+log4n,f3(n)=nlogn,f4(n)=􏰀n􏰁,f5(n)=2n,f6(n)=nlogn. 3
Solution.n+log4n=O(nlogn).nlogn=O(􏰀n􏰁).􏰀n􏰁=O(n4 +logn).n4 +logn=O(nlogn).nlogn =O(2n). 33
3.2 Practice with Graphs
Exercise. Given a connected graph G, determine if it’s bipartite, and if it is, find a partition that establishes that. A bipartite graph is a graph whose vertices can be divided into two disjoint sets such that no two vertices in the same set are connected by an edge.
Solution. We modify BFS when run on any starting vertex v. First, we put v in partition 1 (imagine it being colored Red), and put all neighbors u of v into partition 2 (imaging them being colored Blue). Then, for every neighbor u, place each of their neighbors in partition 1 (imagine coloring them Red); continue the algorithm until no vertices are unexplored. If at any time we encounter a neighboring vertex that will be in the same partition as the vertex being processed (aka neighboring vertices have the same color), then we output “not bipartite”. Otherwise, output bipartite.
Proof of correctness: If the algorithm outputs bipartite, it has actually found a partition of the graph such that no edges cross between the two sides of the partition, so the graph is clearly bipartite in this case.
If the algorithm outputs “not bipartite”, then we claim that the graph contains a cycle with an odd number of edges in it, and it is easily seen that there is no way to partition the nodes in such a cycle into two sets so that no edge in the cycle crosses between the two sets.
To see that the graph contains an odd cycle, observe that the the algorithm must have found two vertices u and w that are neighbors, and are either both at even distance from v (this is the case if they are both in partition 1), or both at odd distance from v (this is the case if they are both in partition 2). One obtains a cycle with an odd number of edges by following the shortest path from v to u, then following edge (u, w), and then following the shortest path from w back to v.
3.3 Minimum Spanning Trees
Exercise. Let G = (V, E) be a connected, undirected graph with distinct edge weights. Consider a cycle v1, v2, . . . , vk, v1 in G, and let e = (vi,vi+1) be the edge in the cycle with the largest edge weight. Prove that e is not in any minimum spanning tree of G.
Solution. Consider any MST T including e. If we remove e from T , T splits into two connected components. In addition to e, there must exist another edge e′ from the cycle that crosses between these two connected components (make sure you see why this is true). Consider the spanning tree (T \ {e}) ∪ {e′ }. The cost of this spanning tree is strictly less than the cost of e, since e is the largest weight edge in the cycle. Hence, T could not have been a minimum spanning tree.
3.4 Shortest Paths
Exercise. Let G = (V, E) be a weighted, directed graph with exactly one negative weight edge and no negative-weight cycles. Give an algorithm to find the shortest distance from s to all other vertices in V that has the same running time as Dijkstra.
Solution. Let’s say the negative-weight edge is (u, v). First, remove the edge and run Dijkstra from s on this modified graph. Then, letting ds[u] denote the distance from s to u in the modified graph, check if ds[u] + w(u, v) < ds[v]. If not, 3 then this means the shortest path from u to v in G does not use the edge (u, v). Hence, we are done in this case: the shortest path from s to any other node t in the modified graph is the same as the shortest path from s to t in the original graph G (make sure you understand why this is true). If yes, then run Dijkstra starting from v on the modified graph. Then, for any node t, its shortest distance from s will be min(ds[t], ds[u]+w(u, v)+dv[t]) (this holds because ds[t] captures the length of the shortest path from s to t that avoids the edge (u, v), while ds[u] + w(u, v) + dv[t] captures the length of the shortest path from s to t that uses the edge (u, v). Make sure you see why this is true). 3.5 Divide and Conquer Exercise. You’re given an array A of n distinct numbers that is sorted up to a cyclic shift, i.e., A = (a1, a2, . . . , an) where thereexistsaksuchthatak an, then let m denote the middle element of the array. If am > a1, then it is guaranteed that the maximum element of the array is in the right half (make sure you see why this is true), and the right half is also sorted up to a cyclic shift, so we recursively solve the problem on the subarray consisting of the right half of the input.
If am < a1, then the maximum element of the array is in the left half, and the left half of the array is also sorted up to a cyclic shift, so we recursively solve the problem on the subarray consisting of theleft half of the input. Every stage of the recursion performs O(1) work and cuts the input size by a factor of 2 so the runtime is O(log n). One can also derive this time bound using the master theorem and the fact that the runtime satisfies the recurrence T (n) ≤ T (n/2) + O(1). 3.6 Dynamic Programming 3.6.1 Wood Cutting (From MIT 6.006, Spring 2011) Given a log of wood of length k, Woody the woodcutter will cut it once, in any place you choose, for the price of k dollars. Suppose you have a log of length L, marked to be cut in n different locations labeled 1, 2, . . . , n. For simplicity, let indices 0 and n + 1 denote the left and right endpoints of the original log of length L. Let the distance of mark i from the leftendofthelogbedi,andassumethat0=d0 0, Opt(i, pi) = minposes p Opt(i−1, p)+D(p, pi). 3.7 Additional Practice Problems
Two good sources for practice problems (with solutions) are:
http://courses.csail.mit.edu/6.006/fall11/rec/rec10_review1.pdf
http://courses.csail.mit.edu/6.006/fall11/rec/rec18.pdf
Note that these sources cover some topics (such as Newton’s Method and the Rabin-Karp algorithm) that we will either not cover at all, or that we will only cover after the midterm.
4 Some Study Tips (These Will Be Useful For the Final As Well)
In general, you can probably break down the types of questions you?ll be asked into two broad categories: those that test you on more on specific algorithms we’ve covered (see Section 2.3), and those that test you on your ability to apply more general algorithmic tools/paradigms (see Section 2.4).
4.1 When studying specific algorithms
• Understand the main idea behind the algorithm, the runtime, and the space complexity for any algorithm that we’ve covered.
– Trying to boil down what the algorithm is doing to a 1-3 sentence summary can help to ensure you understand the key point of the algorithm.
– For space and time complexity, you want to be able to identify what the most expensive operations are or what the recurrence leading to the complexity is.
• Understand the constraints for what types of problems that algorithm solves.
– For example, if we have an algorithm on directed acyclic graphs (DAGs), do you know why it wouldn’t work
on graphs with cycles? (Try thinking about what would happen to the algorithm if you invalidated one of the 5

• •
4.2



assumptions; what step(s) of the analysis would go wrong in this case? Could you adjust the algorithm to fix them?)
Focus on the big ideas of the analysis of the algorithms above all the details.
Try thinking about variations of the algorithms. For example, if you have a major step in the algorithm, try thinking about what would happen if tweaked that step. Alternatively, if the algorithm uses one data structure, what would happen if you replaced it with a different one?
When studying more general tools/paradigms
Practice applying them.
– Apart from the problems we’ve released, you can find others in textbooks and online.
– One thing that’s helpful here is to also solve problems where you don’t know what tool you’re supposed to be using, so that you can practice choosing the right tool.
Understand what the key characteristics are that make a tool work well. For example, dynamic programming tends to work better when you can break your problem down into slightly smaller subproblems (say one size smaller), whereas divide and conquer works when you can break your problem down into significantly smaller problems (say half the size).
When you’re working on the problems, a lot of times the most difficult piece is not the details of analysis/proof, but the initial choices you make in set.
– For dynamic programming and divide-and-conquer, the first step is to choose a subproblem. However, a lot of times the difficulty in getting to the answer is not in the analysis after this point, but in choosing the subproblem itself (so if you find yourself getting stuck, try using a different subproblem).
6