Workshop Week 10
COMP20008 2021S2 Clustering

Q1: Consider the 1-dimensional data set with 10 data points {1, 2, 3, …, 10}. Show the iterations of the k-means algorithm using Euclidean distance when k = 2 and the random seeds (initial centroids) are {1, 2}.

• Iteration 1 Data points: [ 1 2 3 4 5 6 7 8 9 10]
Assignments: [0, 1, 1, 1, 1, 1, 1, 1, 1, 1] Centroids: [1.0, 6.0]
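• In the assignments, 0 means cluster 1 and 1 means cluster 2. The centroids listed are the values recomputed after the assignment step.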
• Iteration 2 Data points: [ 1 2 3 4 5 6 7 8 9 10]
Assignments: [0, 0, 0, 1, 1, 1, 1, 1, 1, 1] Centroids: [2.0, 7.0]
• Iteration 3 Data points: [ 1 2 3 4 5 6 7 8 9 10]
Assignments: [0, 0, 0, 0, 1, 1, 1, 1, 1, 1] Centroids: [2.5, 7.5]

• Iteration 4 Data points: [ 1 2 3 4 5 6 7 8 9 10]
Assignments: [0, 0, 0, 0, 0, 1, 1, 1, 1, 1] Centroids: [3.0, 8.0]
• Iteration 5 Data points: [ 1 2 3 4 5 6 7 8 9 10]
Assignments: [0, 0, 0, 0, 0, 1, 1, 1, 1, 1] Centroids: [3.0, 8.0]
• Assignments and centroids are unchanged from iteration 4, so the algorithm has converged.
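
The iterations above can be reproduced with a short script. This is a minimal sketch in plain Python, not the workshop's own code: the function name `kmeans_1d`, the tie-breaking rule (a point equidistant from both centroids goes to the lower-numbered cluster, as point 5 does in iteration 4), and the handling of empty clusters are all assumptions made for illustration.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Run k-means on 1-D data and record (assignments, centroids) per iteration."""
    history = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid.
        # min() returns the first minimum, so ties go to the lower index (assumption).
        assignments = [
            min(range(len(centroids)), key=lambda c: abs(p - centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            # Keep the old centroid if a cluster ends up empty (assumption).
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        history.append((assignments, new_centroids))
        if new_centroids == centroids:
            break  # centroids did not move, so the algorithm has converged
        centroids = new_centroids
    return history


points = list(range(1, 11))
history = kmeans_1d(points, [1.0, 2.0])
for step, (assignments, cents) in enumerate(history, start=1):
    print(f"Iteration {step}: assignments={assignments} centroids={cents}")
```

In one dimension the Euclidean distance is just the absolute difference, which is why `abs(p - centroids[c])` is enough here.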

Q2: Repeat Exercise 1 using agglomerative hierarchical clustering and Euclidean distance, with the single linkage (min) criterion.

Initially, how many clusters do we have? Each observation starts as its own cluster, so there are 10.
Step 1: Calculate the Euclidean distance between every pair of observations to build the dissimilarity (inter-point distance) matrix.
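
As a rough illustration of Step 1, the dissimilarity matrix for this data set can be built with a nested list comprehension. The variable names `points` and `dist` are chosen here for illustration only.

```python
points = list(range(1, 11))

# In one dimension the Euclidean distance reduces to the absolute difference.
dist = [[abs(a - b) for b in points] for a in points]

# Print the 10 x 10 matrix, one row per observation.
for row in dist:
    print(" ".join(f"{d:2d}" for d in row))
```

The matrix is symmetric with zeros on the diagonal, so only the upper (or lower) triangle carries information.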

[Figure: dendrogram plot. X-axis: observations (1 to 10); Y-axis: distances (0.8 to 3).]
Step 2: Choose the two most similar (closest) observations to merge, i.e. the pair with the minimum distance in the dissimilarity matrix.
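
A possible way to carry out Step 2 in code is to scan the upper triangle of the matrix for the smallest entry. Every adjacent pair in this data set is at distance 1, so the choice depends on tie-breaking; picking the pair with the lowest indices is an assumption made here, and it selects observations 1 and 2.

```python
points = list(range(1, 11))
dist = [[abs(a - b) for b in points] for a in points]

# Scan the upper triangle only: the matrix is symmetric and the diagonal is zero.
pairs = [
    (dist[i][j], i, j)
    for i in range(len(points))
    for j in range(i + 1, len(points))
]

# min() on tuples compares distance first, then indices, so ties go to the
# lowest-indexed pair (assumption).
d_min, i, j = min(pairs)
print(f"Merge observations {points[i]} and {points[j]} at distance {d_min}")
```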

Step 3: Update the dissimilarity matrix: calculate the distance between Cluster12 and every other observation, using the min (single linkage) criterion.

How many clusters do we have now? Merging observations 1 and 2 into Cluster12 leaves 9 clusters.
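
The single-linkage update in Step 3 can be sketched as follows. The names `merged`, `others`, and `updated` are illustrative; the only rule taken from the text is that the distance from Cluster12 to another observation is the minimum over the cluster's members.

```python
points = list(range(1, 11))
merged = (1, 2)  # Cluster12, formed by the first merge
others = [p for p in points if p not in merged]

# Single linkage: cluster-to-point distance is the minimum over cluster members.
updated = {p: min(abs(p - m) for m in merged) for p in others}
for p, d in updated.items():
    print(f"d(Cluster12, {p}) = {d}")

# One merged cluster plus the remaining singletons.
n_clusters = 1 + len(others)
print("clusters remaining:", n_clusters)
```

Because observation 2 is always the closer member, each updated distance equals the distance to 2.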

[Figure: extended dendrogram plot with the updated dissimilarity (distance) matrix. X-axis: observations (1 to 10); Y-axis: distances (0.8 to 3).]
Repeat Step 2: Choose the two most similar (closest) observations or clusters to merge, i.e. the pair with the minimum distance in the updated dissimilarity matrix.

Repeat Step 3: Update the dissimilarity matrix: calculate the distance between Cluster12 and every other observation (single linkage, using min).

Let’s see some Python code.
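
The workshop's own code is not reproduced here, so the following is only a minimal sketch of the whole procedure in plain Python, repeating Steps 2 and 3 until one cluster remains. The function name `single_linkage` and the tie-breaking rule (first pair found wins) are assumptions. In practice a library routine such as `scipy.cluster.hierarchy.linkage` with `method='single'` would normally be used instead.

```python
def single_linkage(points):
    """Agglomerative clustering on 1-D points; returns the list of merges."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: smallest distance between any two members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                # Strict '<' keeps the first pair found on ties (assumption).
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    return merges


merges = single_linkage(list(range(1, 11)))
for left, right, d in merges:
    print(f"merge {left} + {right} at distance {d}")
```

With evenly spaced points every merge happens at distance 1, so the order of merges depends entirely on tie-breaking. This sketch recomputes linkage distances from the raw points each round rather than updating a stored matrix, which is simpler but slower.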
