"""
CSCC11 – Introduction to Machine Learning, Winter 2021, Assignment 4 – Clustering
B. Chan, S. Wei, D. Fleet
"""

%%%%%%%%%%
%% Step 0
%%%%%%%%%%

1) What is the average sparsity of input vectors? (in [0, 1])

2) Find the 10 most common terms and the 10 least common terms. (comma-separated lists)

3) What is the average frequency of the non-zero vector entries in a document?
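
A minimal Python sketch of the Step 0 statistics, assuming the documents are loaded as a dense NumPy array `counts` of shape (num_docs, vocab_size) holding term counts, plus a list `vocab` of the corresponding terms (both hypothetical names). It also assumes "sparsity" means the fraction of zero entries in a vector and "common" means the highest total count over all documents; the assignment may define these differently.

```python
import numpy as np


def step0_stats(counts, vocab, top_k=10):
    """Return (avg_sparsity, most_common, least_common, avg_nonzero_freq).

    counts: dense array of shape (num_docs, vocab_size), one row per document.
    vocab:  list of terms; vocab[j] labels column j.
    """
    counts = np.asarray(counts, dtype=float)
    nonzero = counts != 0

    # Sparsity of one vector = fraction of its entries that are zero;
    # average that over all documents.
    avg_sparsity = float(np.mean(1.0 - nonzero.mean(axis=1)))

    # Rank terms by their total count across all documents.
    totals = counts.sum(axis=0)
    order = np.argsort(totals, kind="stable")  # ascending
    least_common = [vocab[j] for j in order[:top_k]]
    most_common = [vocab[j] for j in order[::-1][:top_k]]

    # Mean of the non-zero entries within each document, then averaged over
    # the documents that contain at least one term.
    nnz_per_doc = nonzero.sum(axis=1)
    has_terms = nnz_per_doc > 0
    per_doc_mean = counts.sum(axis=1)[has_terms] / nnz_per_doc[has_terms]
    avg_nonzero_freq = float(per_doc_mean.mean())

    return avg_sparsity, most_common, least_common, avg_nonzero_freq
```

For example, with `counts = [[2, 0, 0, 1], [0, 0, 3, 0]]`, `vocab = ["a", "b", "c", "d"]` and `top_k=2`, this returns an average sparsity of 0.625, most common terms `["c", "a"]`, least common terms `["b", "d"]` and an average non-zero frequency of 2.25. When many terms tie on total count, the order within the tie is by column index, so the "least common" list is somewhat arbitrary.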

%%%%%%%%%%
%% Step 1
%%%%%%%%%%

1) Can you categorize the topic of each cluster? (comma-separated list)

2) What factors make clustering difficult?

3) Will we get better results with a lucky initial guess for the cluster centers?
(yes/no, with a short explanation of why)
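
A small sketch of K-Means (Lloyd's algorithm) with random initialization, assuming "random initialization" means picking K distinct data points uniformly at random; the assignment's own scheme may differ. The helper names `kmeans` and `random_init` are hypothetical. Running it from different starting centers on the same data shows that Lloyd's algorithm only reaches a local minimum of the total squared error, which is why the initial guess matters.

```python
import numpy as np


def kmeans(X, K, init_centers, max_iters=100):
    """Lloyd's algorithm from the given centers.

    Returns (centers, labels, total_error), where total_error is the sum of
    squared Euclidean distances from each point to its assigned center.
    """
    X = np.asarray(X, dtype=float)
    centers = np.array(init_centers, dtype=float)
    labels = None
    for _ in range(max_iters):
        # Assignment step: squared distances of shape (N, K).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each center to the mean of its points;
        # an empty cluster keeps its previous center.
        for k in range(K):
            members = X[labels == k]
            if len(members) > 0:
                centers[k] = members.mean(axis=0)
    # Recompute assignments against the final centers.
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    total_error = float(dists[np.arange(len(X)), labels].sum())
    return centers, labels, total_error


def random_init(X, K, rng):
    """Random initialization: K distinct data points chosen uniformly."""
    idx = rng.choice(len(X), size=K, replace=False)
    return np.asarray(X, dtype=float)[idx]
```

For instance, on the 1-D points 0, 1, 10, 11, 20, 21 with K = 3, starting from centers 0, 10, 20 converges to a total error of 1.5, while starting from 0, 1, 10 stops at 101.0 with two centers stuck on the first pair. Note that the (N, K, d) difference array is only practical for small examples; a large vocabulary would call for a more memory-efficient distance computation.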

%%%%%%%%%%
%% Step 2
%%%%%%%%%%

1) What problem from step 1 is solved now?

2) What are the topics of the clusters?

3) Is the result better or worse than step 1? (give a short explanation as well)

%%%%%%%%%%
%% Step 3
%%%%%%%%%%

1) What are the topics of the clusters?

2) Why is the clustering better now?

3) What is the general lesson you learned about clustering sparse, high-dimensional
data?

%%%%%%%%%%
%% Step 5
%%%%%%%%%%

1) What is the difference in total error between K-Means++ and random center initialization?

2) What are the mean and variance of the total errors after running K-Means++ 5 times?

3) Do the training errors appear to be more consistent?

4) Do the topics appear to be more meaningful?
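
K-Means++ changes only the seeding: after a uniformly random first center, each further center is sampled with probability proportional to the squared distance from a point to its nearest already-chosen center. A minimal sketch with hypothetical helper names, assuming the returned centers are then refined by the same Lloyd iterations used with random initialization. `error_stats` reports the population variance (NumPy's default `ddof=0`); use `errors.var(ddof=1)` instead if the sample variance is wanted.

```python
import numpy as np


def kmeans_pp_init(X, K, rng):
    """K-Means++ seeding: returns a (K, d) array of initial centers."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # First center: a data point chosen uniformly at random.
    first = X[rng.integers(n)]
    centers = [first]
    # Squared distance from every point to its nearest chosen center.
    d2 = ((X - first) ** 2).sum(axis=1)
    for _ in range(1, K):
        total = d2.sum()
        if total == 0.0:
            # Every point coincides with a chosen center; fall back to uniform.
            idx = rng.integers(n)
        else:
            # Sample proportionally to squared distance (D^2 weighting).
            idx = rng.choice(n, p=d2 / total)
        centers.append(X[idx])
        d2 = np.minimum(d2, ((X - X[idx]) ** 2).sum(axis=1))
    return np.array(centers)


def total_error(X, centers):
    """Sum of squared distances from each point to its nearest center."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(dists.min(axis=1).sum())


def error_stats(total_errors):
    """Mean and population variance of the total errors from several runs."""
    errors = np.asarray(total_errors, dtype=float)
    return float(errors.mean()), float(errors.var())
```

Because an already-chosen point has zero weight, distinct data points are never picked twice, so the seeds tend to spread out across the data. Collecting the final total error from each of the 5 runs into a list and passing it to `error_stats` gives the requested mean and variance.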

%%%%%%%%%%
%% K-Means vs GMM
%%%%%%%%%%

1) Under what scenarios do the methods find drastically different clusters? Why?

2) What happens to GMM as we increase the dimensionality of the input features? Does K-Means suffer from the same problem?
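
One way to see the difference is to count free parameters. K-Means estimates K centers (K·d numbers), while a GMM with full covariance matrices also estimates a symmetric d × d covariance per component (d(d+1)/2 numbers each) plus K − 1 mixing weights, so its parameter count grows quadratically in d. In addition, the sample covariance of n points in d dimensions has rank at most n − 1, so once d exceeds the number of points in a component the covariance estimate is singular. The sketch below is illustrative only; the function names and the diagonal/spherical options are assumptions, not part of the assignment.

```python
import numpy as np


def kmeans_param_count(K, d):
    # K centers, each a d-dimensional vector.
    return K * d


def gmm_param_count(K, d, covariance="full"):
    means = K * d
    weights = K - 1  # mixing weights are constrained to sum to 1
    if covariance == "full":
        covs = K * d * (d + 1) // 2  # symmetric d x d matrix per component
    elif covariance == "diag":
        covs = K * d  # one variance per dimension per component
    elif covariance == "spherical":
        covs = K  # a single variance per component
    else:
        raise ValueError(f"unknown covariance type: {covariance}")
    return means + weights + covs


def sample_cov_rank(n, d, seed=0):
    """Rank of the sample covariance of n random points in d dimensions."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    return int(np.linalg.matrix_rank(np.cov(X, rowvar=False)))
```

For example, with K = 5 and d = 1000, `kmeans_param_count` gives 5,000 while `gmm_param_count` gives 2,507,504, and `sample_cov_rank(5, 10)` returns 4 rather than 10. Restricting to diagonal covariances, or adding a small ridge to each covariance, is a common way to keep GMMs usable in high dimensions; K-Means only needs cluster means, which are well defined for any non-empty cluster.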