CS计算机代考程序代写 algorithm Midterm practice problems

Midterm practice problems
March 8, 2021
Due date: Just before the exam.
Remember: For any question you answer “I do not know” you get 20% of the grade associated with this question. A totally wrong answer gets 0.
All solutions should be typed.
Exercise 1 (20 points): k-diameters: Given a set X of n points in Rd and an integer k, find a partition of X into k clusters {X1,…,Xk} (that is, 􏰀ki=1Xi = X and Xi ∩Xj = ∅ for all i ̸= j) such that the objective
k
􏰁 diam(Xi) i=1
is minimized. Here diam(Z) = maxx∈Z,y∈Z ∥x − y∥1 is the diameter of the set Z ⊆ X.
1. Design an optimal polynomial-time algorithm for the k-diameters problem when d = 1. The running
time of your algorithm should be linear assuming that your data points are sorted. (10 points)
2. Prove that your algorithm is optimal. (10 points)
Exercise 2 (35 points) Assume you are given d-dimensional data points X = {x1, . . . , xn} and you want to partition them into k clusters, C1 , . . . , Ck , such that if ri is the representative of cluster Ci , then the error function
k E1(C1,…,Ck)=􏰁 􏰁 ||x−ri||1
i=1 x∈Ci
is minimized.
For this exercise, assume a clustering algorithm A that solves the above problem in time O(n2). Ad-
ditionally, assume that A is an α-approximation algorithm; that is it produces solutions with k clusters
C1′,…,Ck′ such that
where E∗ is the cost of the optimal solution.
E 1 ( C 1′ , . . . , C k′ ) ≤ α · E ∗ ,
1. Design a clustering algorithm that uses A as a subroutine and runs in time less than O(n2) and is
polynomial in k (15 points).
2. Show that your algorithm has a constant approximation bound. This approximation factor might
depend on α. Your answer should consist of both the value of the factor as well as a proof (20 points). 1

Exercise 3 (20 points): Consider a set of n data points x1, . . . , xn.
1. Assume a random number generator R() that generates values in the interval (0,1]. Let distance function dR between two points xi and xj with xi ̸= xj be dR(xi,xj) = R(). Also dR(xi,xi) = 0 for every xi. Prove or disprove that dR is a metric. (10 points)
2. Construct a graph G = (V,E), where a point xi is represented by node vi ∈ V. For every pair of nodes vi,vj, there exists an undirected edge in G with weight wij = dR(xi,xj). We define the distance function between two nodes vi and vj, denoted by dG(vi,vj), be the weight of the shortest path between vi and vj in graph G. Prove or disprove that dG is a metric. (15 points)
Exercise 4 (25 points): In class we have defined the Edit Distance between two strings x and y, of length n and m respectively to be the minimum (weighted) number of insertions, deletions and substitutions that transform string x to string y. We also demonstrated that assuming different deletion, insertion and substitution costs for every letter (or pairs of letters), the following dynamic-programming recursion computes the edit distance between x and y:
D(x(1…i−1),y(1…j))+delete(x[i]), 
D(x(1…i),y(1…j))=min D(x(1…i),y(1…j−1))+insert(y[j]),
D(x(1 . . . i − 1), y(1 . . . j − 1)) + substitute(x[i], y[j]).
In the above equation x(1…i) (resp. y(1…j)) is the substring of x (resp. of y) that consists of the first i (resp. j) symbols appearing in x (resp. y). Also, for symbol a, delete(a), insert(a) correspond to the cost of deleting or inserting a respectively. All these costs are greater than 0. Finally, for symbols a, b, substitute(a, b) corresponds to the cost of substituting symbol a with symbol b – the substitution cost is non-zero when a ̸= b and zero when a = b.
1. (15 points:) Prove or disprove that the edit distance function as defined above is a metric.
2. (10 points:) Find two instantiations of the edit-distance function that are metrics. An instantiation of the edit distance function is defined by a specific way of allocating costs to operations such as deletions, insertions and substitutions.
2