Assignment 9
Due: 4/5
Note: Show all your work.
Problem 1 (20 points). Consider the following two clusters:
[Figure: scatter plot of the two clusters, points labeled a through e; both axes run from 0 to 9]
Compute the distance between the two clusters (1) using minimum distance, (2) using average distance, and (3) using Ward’s method. Use the Euclidean distance measure when calculating the distance between two objects.
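As an illustration of the three measures, here is a minimal Python sketch. The point coordinates are hypothetical and are not the points in the figure. Ward's distance is taken here to be the increase in the within-cluster sum of squared errors when the two clusters are merged, which is one common definition; use the definition given in class if it differs.

```python
from itertools import product
from math import dist  # Euclidean distance between two points (Python 3.8+)


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sse(points):
    """Sum of squared Euclidean distances from each point to the centroid."""
    c = centroid(points)
    return sum(dist(p, c) ** 2 for p in points)


def min_distance(a, b):
    """Smallest distance over all pairs (one point from each cluster)."""
    return min(dist(p, q) for p, q in product(a, b))


def average_distance(a, b):
    """Mean distance over all pairs (one point from each cluster)."""
    return sum(dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


def ward_distance(a, b):
    """Increase in SSE caused by merging the two clusters (assumed definition)."""
    return sse(a + b) - sse(a) - sse(b)


# Hypothetical clusters, NOT the ones from the assignment figure.
cluster_1 = [(0.0, 0.0), (0.0, 2.0)]
cluster_2 = [(3.0, 0.0), (3.0, 4.0)]

print(min_distance(cluster_1, cluster_2))
print(average_distance(cluster_1, cluster_2))
print(ward_distance(cluster_1, cluster_2))
```

For these hypothetical points the minimum distance is 3, the average distance is about 3.80, and the Ward distance is 10.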
Problem 2 (20 points). Use the provided a9-p2.arff dataset for this problem. This dataset contains the calories and total fat content of 75 candy bars.
Problem 2-1 Run the SimpleKMeans algorithm of Weka on this dataset with k = 2, 3, 4, 5, 6, and 7. For each k, record the value of the within cluster sum of squared errors (which you can find in Weka’s cluster output window) and plot a graph where the x-axis is k and the y-axis is the within cluster sum of squared errors. Then, determine an optimal number of clusters using the elbow method that we discussed in class.
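To make the quantity being plotted concrete, here is a minimal Python sketch of the within cluster sum of squared errors for a given assignment of points to clusters. The data points and labels are hypothetical, and the function name is chosen only for illustration. Weka's reported value may be on a different scale than a hand computation on raw values, for example if attributes are normalized before clustering.

```python
from collections import defaultdict


def within_cluster_sse(points, labels):
    """Sum, over all clusters, of squared distances from each point to its cluster mean."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    total = 0.0
    for members in clusters.values():
        n = len(members)
        center = [sum(coords) / n for coords in zip(*members)]
        for point in members:
            total += sum((x - m) ** 2 for x, m in zip(point, center))
    return total


# Hypothetical data, NOT taken from a9-p2.arff.
points = [(1.0, 1.0), (1.0, 3.0), (5.0, 5.0), (7.0, 5.0)]

sse_k1 = within_cluster_sse(points, [0, 0, 0, 0])  # everything in one cluster
sse_k2 = within_cluster_sse(points, [0, 0, 1, 1])  # two clusters

print(sse_k1, sse_k2)
```

For these hypothetical points the value drops from 38 with one cluster to 4 with two. The elbow method looks for the k after which further drops of this kind become small.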
Problem 2-2 Using the optimal number of clusters that you determined in Problem 2-1, run SimpleKMeans again and characterize the generated clusters using the two attribute values. The following is an example of a characterization of clusters:
Cluster 0:
Calories is mostly between 1000 and 2000, mean of Calories is 1500
Totalfat is mostly between 10 and 20, mean of Totalfat is 14
Cluster 1:
Calories is mostly between 2000 and 3000, mean of Calories is 2600
Totalfat is mostly between 15 and 25, mean of Totalfat is 20
...
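One way to produce a summary like the example above is sketched below in Python. It assumes that "mostly between" means the interquartile range (first to third quartile), which is only one possible reading; the values are hypothetical and the function name is illustrative.

```python
from statistics import mean, quantiles


def characterize(values):
    """Return the interquartile range and the mean of one attribute in one cluster."""
    # Assumption: "mostly between" is read as first quartile to third quartile.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {"low": q1, "high": q3, "mean": mean(values)}


# Hypothetical Calories values for the members of one cluster.
calories = [100, 200, 300, 400, 500]
summary = characterize(calories)

print(f"Calories is mostly between {summary['low']} and {summary['high']}, "
      f"mean of Calories is {summary['mean']}")
```

The same function can be applied to each attribute of each cluster in turn.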
Problem 3 (10 points). Follow the instructions in the JMP-clustering-assignment.pdf file.
Include the required screenshots and your answers to the questions in your submission.
Submission:
Include all answers in a single file and name it LastName_FirstName_HW9.EXT, where “EXT” is an appropriate file extension (e.g., docx or pdf). If you have multiple files, combine them into a single archive file and name it LastName_FirstName_HW9.EXT, where “EXT” is an appropriate archive file extension (e.g., zip or rar). Upload the file to Blackboard.