ANLY550_asst3
ANLY550_asst3
Ruhan Wang
March 15, 2017
1.
Suppose there’re two minimum spanning tree A,B from a connected undirected graph where the edges weights
are unique. Also, there should be at least one edge e = (u, v) in A but not in B.
Considering the fact that minimum spanning tree do not have cycle, when we cut e, A should be split into 2
components U and V. If we use the edge e′ other than e to union the components U and V, the e′ must have
wight different with e, therefore, the total weight in B would be different with A, which means either A or B
is not minimum spanning tree for graph.
Therefore, there is a unique spanning tree for a connected undirected graph.
2.
The best replacement of jobs should satisfy
best_load > max(maxi=1,2,…,n(ji),
1
2
n∑
i=1
ji)
Therefore, we need to prove the load of given greedy algorithm should have
greedy_load ≤ max(
3
2
maxi=1,2,…,n(ji),
3
4
n∑
i=1
ji)
Case 1: If maxi=1,2,…,n(ji) > 12
∑n
i=1 ji),
considering the worst case for greedy algorithm is to divide all jobs except the job with max running time
evenly into to machine, and then add the max one into either one. In this case, the load time should satisfy
greedy_load =
1
2
(
n∑
i=1
ji −maxi=1,2,…,n(ji)) + maxi=1,2,…,n(ji) <
3
2
maxi=1,2,...,n(ji)
Case 2: If maxi=1,2,...,n(ji) < 12
∑n
i=1 ji),
considering the worst case for greedy algorithm is to divide all jobs except the job with max running time
evenly into to machine, and then add the max one into eithr one. In this case, the load time should satisfy
greed_load =
1
2
(
n∑
i=1
ji −maxi=1,2,...,n(ji)) + maxi=1,2,...,n(ji) <
3
4
n∑
i=1
ji
Therefore, the greedy algorithm yields a completion time which is always within a factor of 3/2 of the best
possible replacement of jobs.
1
3.
(a) Algorithm to multiply 2-n digit numbers x,y by spliting them into 3-parts each with n/3 digits
xy = 10
4
3 nad + 10n(bd + ae) + 10
2
3 n(cd + be + af) + 10
n
3 (ce + bf) + cf
= 10
4
3 nad+10n[(a+b)(d+e)−ad−be]+10
2
3 n{(a+b+c)(d+e+f)−(a+b)(d+e)−(b+c)(e+f)+2be}+10
n
3 [(b+c)(e+f)−be−cf ]+cf =
The algorithm should be
Step 1: split x,y into 3 parts
x = 10
2
3 na + 10
n
3 b + c
y = 10
2
3 nd + 10
n
3 e + f
Step 2: calculate multiplication including:
p1 = ad, p2 = be, p3 = cf
p4 = (a + b)(d + e)
p5 = (b + c)(e + f)
p6 = (a + b + c)(d + e + f)
Step 3: calculate answer for x*y
xy = 10
4
3 np1 + 10n(p4 − p1 − p2) + 10
2
3 n(p6 − p4 − p5 + 2p2) + 10
n
3 (p5 − p2 − p3) + p3
(b) Since we split each n-digit number into three parts, each with n/3 digits, the asymptotic running time
should satisfy
T (n) = 6T (
n
3
) + O(n)
According to master theorem, the asymptotic running time should be O(nlog36 ∼ O(log1.63093)
If we split it into two part with three multiplications on smallest parts, the asymptotic running time should
satisfy
T (n) = 3T (
n
2
) + O(n)
According to master theorem, the asymptotic running time should be O(nlog23 ∼ O(log1.584963)
Therefore, I would rather split it into two parts since the time complexity would be better than spliting into
three parts.
(c) Suppose I could only use five multiplications instead of six, the running time for spliting into 2 and 3
parts should be as follow:
T (n)2parts = 3T (
n
2
) + O(n) = O(log23) ∼ O(log1.584963)
T (n)3parts = 5T (
n
3
) + O(n) = O(log35) ∼ O(log1.464974)
In this case, I would rather use the algorithm which split number into 2 parts.
2
4.
Algorithm to find index i,j in array A that max sum for subarray A[i : j].
def max_sub_index(A):
"""find index i,j for 1-d array A that has max sum for subarray A[i:j]"""
max_ending_here = max_so_far = A[0]
i_meh = j_meh = 0
i_msf = j_msf = 0
for k in range(1,len(A)):
# if previous elements adds to negative, we throw them away
if max_ending_here < 0:
# i indicate start index of max subarray ending here
i_meh = k
j_meh = k
# if previous elements adds to positive
else: # if max_ending_here > 0
j_meh = k
max_ending_here = max(A[k], max_ending_here + A[k])
# if max subarray ending here is larger than previous record
if max_so_far < max_ending_here:
# j indicate the end index of max subarray current hold
i_msf = i_meh
j_msf = j_meh
max_so_far = max(max_so_far, max_ending_here)
return i_msf,j_msf
Proof of correctness:
We treaverse for the whole array A, each time we met a new element x in A, we record max_ending_here
as the max sum of subarray including x, the max_ending_here should either be the x plus itself or just x
depending on whether max_ending_here is positive or not. On the other hand, max_so_far records the
max sum of subarray but do not necessary to include x. Therefore, the larger one from them should be the
max subarray for A.
Analysis of Running time:
Since the algorithm would always traverse whole the array A, running time for this algorithm is depend on
the size of A, therefore, the algorithm runs in time O(n).
5.
Algorithm to calculate minimum imbalance for array A with n numbers into K+1 parts.
def imbalance(A,n,k):
"""return the partition with minimal imbalance given A,n and k"""
"""return the start point of each partition except the first one"""
3
# create k + 1 array s[1], s[2],..., s[n]
# s[k][i,j] indicate current minimum imbalance for kth part as A[i,j]
s = {}
partition = [0]*k
for i in range(1,k+2):
s[i] = np.zeros((n,n))
s[i][:] = np.inf
ave = sum(A)/(k+1)
# initialize imbalance for the first part
for j1 in range(len(A)):
s[1][0,j1] = abs(sum(A[0:j1]) - ave)
for p in range(2,k+2): # each part
for i in range(1,len(A)):
for j in range(i,len(A)):
s[p][i,j] = max([min(s[p-1][:,i-1]),abs(sum(A[i:(j+1)])/(k+1)-ave)])
# start from the (k+1)th part, find the best partition point (possible result)
# with imbalance s.t the minimum imbalance
split_p = n # split should be the start of each partition except the first one
for p in range(k+1,1,-1):
split_p = [i for i,ele in enumerate(s[p][:, (split_p-1)])
if ele == min(s[p][:, (split_p-1)])][0]
partition[p-2] = split_p
return partition
Proof of correctness:
As we create k+1 matrix s1, s2, ..., sk+1 with n*n entries in each matrix, with each entry ski, j indicate
the current minimal imbalance when the kth part is A[i:j]. And the current minimal imbalance should
either be updated by weight of subarray A[i:j] or same as the minimal imbalance from A[:i-1]. Finally, the
min(s[k+1][:,n-1]) should be the minimal imbalance and we retrieve partition from that.
If imbalance redefined, the algorithm should change as follow, specifically, the current minimal imbalance
should sum of all weight of subarray re-defined.
def imbalance(A,n,k):
"""return the partition with minimal imbalance given A,n and k"""
# create k + 1 array s[1], s[2],..., s[n]
# s[k][i,j] indicate current minimum imbalance for kth part as A[i,j]
s = {}
partition = [0]*k
for i in range(1,k+2):
4
s[i] = np.zeros((n,n))
s[i][:] = np.inf
ave = sum(A)/(k+1)
# initialize imbalance for the first part
for j1 in range(len(A)):
s[1][0,j1] = abs(sum(A[0:j1]) - ave)
for p in range(2,k+2): # each part
for i in range(1,len(A)):
for j in range(i,len(A)):
# changed part: current minimal imbalance = sum of weight for all subarrays
s[p][i,j] = min(s[p-1][:,i-1]) + abs(sum(A[i:(j+1)])/(k+1)-ave)
# start from the (k+1)th part, find the best partition point (possible result)
# with imbalance s.t the minimum imbalance
split_p = n # split should be the start of each part except the first one
for p in range(k+1,1,-1):
split_p = [i for i,ele in enumerate(s[p][:, (split_p-1)])
if ele == min(s[p][:, (split_p-1)])][0]
partition[p-2] = split_p
return partition
6
If we put words i + 1, ..., j on the same line, the extra space at the end of line is
s(i, j) = M − j + i + 1−
j∑
k=i
lk
Therefore, the penalty for correspondent line should be
p(i, j) =
∞ if s(i,j)<0
0 if j=n and s(i,j) >= 0
s(i, j)3 otherwise
Dynamic programming Algorithm:
Let C[j] denote the optimal solution for printing words from 1 to j such that words j ends in a line, then we
have
Base case C[0] = 0 and C[j] = min0≤i
5
def penalty(words_lis,i,j,M):
“””return penalty when put word i+1,…,j on the same line”””
s = M – j + i + 1 – sum([len(ele) for ele in words_lis[i:j]])
if s<0:
return math.inf
elif j == len(words_lis)-1 and s>=0:
return 0
else:
return s**3
def neatness(raw_text,M):
“””output neatness print style for given text”””
words_lis = raw_text.split()
Cost = [0]*(len(words_lis)+1)
split = []
for j in range(1,len(words_lis)+1):
cost_lis = [Cost[i] + penalty(words_lis,i,j,M) for i in range(j)]
Cost[j] = min(cost_lis)
if cost_lis.index(min(cost_lis)) not in split:
split.append(cost_lis.index(min(cost_lis)))
# create print style
print_style = []
for i in range(len(split)-1):
print_style.append(” “.join(words_lis[split[i]:split[i+1]]))
print(” “.join(words_lis[split[i]:split[i+1]]))
print_style.append(” “.join(words_lis[split[-1]:]))
print(” “.join(words_lis[split[-1]:]))
return print_style,Cost[-1]
# test code to print text split into line
def main():
con = open(“BuffyReview.txt”,’r’)
raw_text = con.readline().strip(‘\n’)
p_1,min_p1 = neatness(raw_text,40)
print(min_p1)
p_2, min_p2 = neatness(raw_text, 72)
print(min_p2)
Proof of Correctness
The optimal cost of printing the words from 1 to j can be computed as the optimal cost of printing words
from 1 to i plus penalty of printing words i+1,. . . ,j. The process Cost(i) to compute optimal cost C[i] is
defined recursively. Recursive definition of values of the optimal solution should be
Cost(j) = min0≤i
According to programming, the minimal penalty for nearly printingthe review for case M = 40 and M = 72 is
6
3973 and 6382 individually.
7
Since the graph G = (V,E) is a tree with a root r, which means each vertex in graph only have one edge
point to it. Therefore, for each non-leaf vertex in tree, the maximum-sized independent size should be either
1) union of all sub max-independent sets rooted at u’s children;
2) u union all the sub max-independent sets root at u’s grandchildren
Therefore. the recursive definition should be
MLS(u) = max{
∑
w: children of u
MLS(w), 1 +
∑
w: grandchildren of u
MLS(w)}
MLS(u) indicates the size of max-independent set rooted at u.
Recurive definition:
def MLS(G,u):
“””return size and max independent set in given tree graph rooted at u”””
if len(G.adj(u)) == 0: # leaf node does not have child
return 1
set_1 = []
set_2 = []
for w in G.adj(u): # child node of u
set_1.join(MLS(G,w))
for grad_c in G.adj(w): # grad children of u
set_2.join(MLS(G,grad_c))
if len(set_1) > len(set_2) + 1:
return set_1
return set_2
Since we want to find the maximum-sized independent set in linear time, we would like to process all MLS
size in bottom-up order, which means we calculate MLS for each vertex exactly once.
Algorithm with linear complexity:
Step 1: Run Depth-first Search on tree graph G starting from root r, and return the topological order of
vertices.
Step 2: Starting from the vertex v with least dependence (leaf node), retrieve Max-independence set from G
rooted at v, and store the result in a dictionary mle.
Step 3: return mle[r] as the Max-independence set in G and the size should be len(mle[r]).
Proof of Correctness:
According to the algorithm above, when we want to calculate Max-independence set for vertex v, we have
already calculate Max-independence set for all v’s children and grandchildren, therefore, we could always
retreive result from dictionary mle with constant time complexity.
Time Complexity for algorithm:
7
In this case, we first perform topological sort which takes Θ(V + E) times. Each time, we calculate Max-
independence set for vertex v, since we retieve all mle for v’s children and grandchildren from dictionary mle,
the time complexity should be O(1). Therefore, the time complexity should be
T (n) = O(V + E) + O(V ) = O(V + E)
satisfying linear time complexity.
8
1.
2.
3.
4.
5.
6
7