Evolutionary Computing
Artificial Intelligence
Unsupervised Learning:
The K-means Clustering Algorithm
Roadmap
Unsupervised Learning
Clustering algorithms
K-means
Fuzzy c-means
Unsupervised learning
Definition 1
Supervised: human effort involved (labeled training data)
Unsupervised: no human effort (unlabeled data only)
Definition 2
Supervised: learning conditional distribution P(Y|X), X: features, Y: classes
Unsupervised: learning distribution P(X), X: features
Slide credit: Min Zhang
Clustering
What is clustering?
Why clustering for unsupervised learning?
Clustering
Definition
Assignment of a set of observations into subsets so that observations in the same subset are similar in some sense
Cluster analysis is the process of partitioning a data set into subsets of objects (classes, clusters) which have similar properties.
Cluster Analysis
A natural grouping method
Cluster analysis algorithms: the K-means algorithm, the ISODATA algorithm, the fuzzy c-means algorithm, etc.
Clustering
Hard vs. Soft
Hard: an object can belong to only a single cluster
Soft: an object can belong to multiple clusters, with degrees of membership
Examples of soft clustering: Gaussian mixture models, fuzzy logic
Slide credit: Min Zhang
Clustering
Flat vs. Hierarchical
Flat: clusters form a single, unstructured partition of the data
Hierarchical: clusters form a tree
Agglomerative
Divisive
Hierarchical clustering
Agglomerative (Bottom-up)
Compute all pair-wise pattern-pattern similarity coefficients
Place each of n patterns into a class of its own
Merge the two most similar clusters into one
Replace the two clusters with the new cluster
Re-compute inter-cluster similarity scores w.r.t. the new cluster
Repeat the above steps until there are k clusters left (k can be 1); a code sketch follows below
Slide credit: Min Zhang
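To make the merge loop concrete, here is a minimal Python sketch of the agglomerative procedure above. The function name `agglomerative` and the single-linkage choice for the inter-cluster score are illustrative assumptions, not part of the slides.

```python
import numpy as np

def agglomerative(points, k, linkage="single"):
    """Bottom-up clustering sketch: start with singletons, merge until k clusters remain."""
    clusters = [[i] for i in range(len(points))]   # each pattern in a class of its own
    # pair-wise pattern-pattern distances (smaller = more similar)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

    def cluster_dist(a, b):
        sub = d[np.ix_(a, b)]
        # single linkage: closest pair of patterns; "max" would give complete linkage
        return sub.min() if linkage == "single" else sub.max()

    while len(clusters) > k:
        # find the two most similar clusters
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: cluster_dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]    # merge the two most similar clusters
        del clusters[j]                            # replace the two clusters with the new one
    return clusters

pts = np.array([[1.0, 1.0], [1.2, 1.1], [5.0, 5.0], [5.1, 4.9], [9.0, 1.0]])
print(agglomerative(pts, k=2))
```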
Hierarchical clustering
Agglomerative (Bottom up)
[Figure: five successive agglomerative iterations, merging the two most similar clusters at each step (merges labeled 1 through 9), until finally k clusters are left]
Slide credit: Marc Pollefeys, COMP 256, 2003
Dendrograms: The basics
Hierarchical clustering
Divisive (Top-down)
Start at the top with all patterns in one cluster
The cluster is split using a flat clustering algorithm.
This procedure is applied recursively until each pattern is in its own singleton cluster.
Slide credit: Min Zhang
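A minimal sketch of the top-down procedure, assuming 2-means as the flat clustering "subroutine"; the helper names `two_means` and `divisive` are hypothetical, and the full split tree is not tracked here.

```python
import numpy as np

def two_means(points, iters=10):
    """Flat 2-way split used as the divisive 'subroutine'."""
    centers = points[np.random.choice(len(points), 2, replace=False)].astype(float)
    for _ in range(iters):
        labels = np.argmin(np.linalg.norm(points[:, None] - centers[None], axis=-1), axis=1)
        for c in (0, 1):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    return labels

def divisive(points, idx=None):
    """Top-down clustering: recursively split until every pattern is a singleton."""
    if idx is None:
        idx = np.arange(len(points))
    if len(idx) <= 1:
        return [list(idx)]
    labels = two_means(points[idx])
    left, right = idx[labels == 0], idx[labels == 1]
    if len(left) == 0 or len(right) == 0:   # degenerate split: stop recursing
        return [list(idx)]
    return divisive(points, left) + divisive(points, right)
```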
Bottom-up vs. Top-down
Which one is more complex?
Top-down, because a flat clustering algorithm is needed as a “subroutine”.
Which one is more efficient?
Top-down: for a fixed number of top levels, using an efficient flat algorithm like K-means, divisive algorithms are linear in the number of patterns and clusters, while agglomerative algorithms are at least quadratic.
Which one is more accurate?
Top-down: bottom-up methods make clustering decisions based on local patterns without initially taking the global distribution into account, and these early decisions cannot be undone, whereas top-down clustering benefits from complete information about the global distribution when making top-level partitioning decisions.
K-means clustering Algorithm
Choose a fixed number of clusters (i.e. K)
Choose cluster centers and point-cluster allocations to minimize the error
The necessary conditions for a minimizer follow from the objective function $J_K$:
$$J_K(A) = \sum_{i=1}^{c} \sum_{x_j \in C_i} \|x_j - a_i\|^2 \qquad (1)$$
where $C_i$ is the set of points allocated to cluster $i$
What are $x_j$, $A$, and $a_i$?
K-means clustering Algorithm
Choose K data points (or vectors) to be initial cluster centers
Until the clustering is satisfactory
Assign each data point to the cluster that has the nearest cluster center
Ensure each cluster has at least one data point
Update the cluster centers with the means of the elements in the clusters
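A runnable NumPy sketch of this loop; the function name `kmeans` and the `np.allclose` convergence test are implementation choices, not prescribed by the slides.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Minimal K-means: X is an (n, d) array, k the number of clusters.
    Returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # choose K data points as the initial cluster centers
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # assign each data point to the cluster with the nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # update centers with the means of the elements in each cluster;
        # an empty cluster keeps its old center
        new_centers = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                                else centers[c] for c in range(k)])
        if np.allclose(new_centers, centers):   # clustering is satisfactory: means unchanged
            break
        centers = new_centers
    return centers, labels
```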
K-means clustering Algorithm
Slide credit: LIESMARS, Wuhan University
K-means clustering algorithm:
Basic Procedure
Step 1: Randomly Select K cluster centers from the data space.
Step 2: Assign each element of the data set to the cluster based on the minimum distance measure.
Step 3: Re-calculate new cluster centers for each cluster after the assignment in step 2.
Step 4: Continue this process until the cluster centers do not change.
Flow Diagram for K-means Algorithm
Step 1: Initialization
Choose K initial Cluster centers
M1(1), M2(1), … , MK(1)
Method 1 – First K samples
Method 2 – K data samples selected randomly
Method 3 – K random vectors
Set iteration m = 1 and Go To Step 2
Step 2: Determine New Clusters
Based on Cluster centers Distribute pattern vectors using minimum distance.
Method 1 – Use Euclidean distance
Method 2 – Use other distance measures
Assign data sample $x_j$ to class $k$ if $\|x_j - M_k(m)\| \le \|x_j - M_i(m)\|$ for all $i \ne k$
Go to Step 3
Step 3: Compute New Cluster Centers
Using the new cluster assignment
Clk(m), k = 1, 2, … , K
compute the new cluster centers
Mk(m+1), k = 1, 2, … , K
using
$$M_k(m+1) = \frac{1}{N_k} \sum_{x_j \in Cl_k(m)} x_j$$
where $N_k$, k = 1, 2, … , K, is the number of pattern vectors in Clk(m)
Go to Step 4
Step 4: Check for Convergence
Using Cluster centers from step 3 check for convergence
Convergence occurs if the means do not change
If convergence occurs, clustering is complete and the results are given.
If No Convergence then Go to Step 5
Step 5: Check for Maximum Number of Iterations
Define MAXIT as the maximum number of iterations that is acceptable.
If m = MAXIT Then display no convergence
and Stop.
If m < MAXIT Then m=m+1 (increment m)
and Return to Step 2
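Pulling Steps 1 through 5 together, a compact sketch with the MAXIT safeguard; the names M, cl, and MAXIT mirror the slides, while the rest is an illustrative assumption.

```python
import numpy as np

def kmeans_maxit(X, K, MAXIT=100, seed=0):
    rng = np.random.default_rng(seed)
    M = X[rng.choice(len(X), K, replace=False)].astype(float)   # Step 1: initial centers M_k(1)
    for m in range(1, MAXIT + 1):
        d = np.linalg.norm(X[:, None] - M[None], axis=-1)        # Step 2: minimum-distance assignment
        cl = d.argmin(axis=1)
        M_new = np.array([X[cl == k].mean(axis=0) if np.any(cl == k) else M[k]
                          for k in range(K)])                    # Step 3: new centers M_k(m+1)
        if np.allclose(M_new, M):                                # Step 4: means do not change
            return M_new, cl, m
        M = M_new
    print("no convergence")                                      # Step 5: m reached MAXIT
    return M, cl, MAXIT
```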
Example: K-means cluster algorithm
Given the following set of pattern vectors
[Figure: plot of the data points in the given set of samples]
Do the following: (a) solve the 2-class case; (b) solve the 3-class case
(a) Solution – 2-class case
[Figure: plot of the data points with the initial cluster centers]
Initial cluster centers; distances from all samples to the cluster centers.
First cluster assignment: Cl1, Cl2, Cl1, Cl2, Cl2, Cl2, Cl2 (with a tie, select randomly).
[Figure: plot of the data points, marking the samples closest to x1 and those closest to x2]
Compute new cluster centers from the first cluster assignment.
[Figure: plot of the data points with the new cluster centers]
Distances from all samples to the cluster centers M1(2), M2(2).
Second cluster assignment: Cl1, Cl1, Cl1, Cl2, Cl2, Cl2, Cl2.
[Figure: plot of the new clusters, showing the new centers M1(2), M2(2) and the old cluster centers]
Compute new cluster centers M1(3), M2(3).
[Figure: plot of the new clusters with cluster centers M1(3), M2(3)]
Distances from all samples to the cluster centers M1(3), M2(3); third cluster assignment: Cl2, Cl2, Cl2, Cl1, Cl1, Cl2, Cl1 (labels as laid out in the original figure).
Compute new cluster centers M1(4), M2(4).
Exercise using the K-means algorithm
Cluster the points [1, 1], [2, 1], [2, 2], [2, 3], [3, 3], [4, 3] with the initial cluster centers [1, 1] and [4, 3].
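A quick numeric check of this exercise; the expected results in the comments follow from working the two iterations by hand, and the loop itself is a hypothetical helper, not part of the slides.

```python
import numpy as np

X = np.array([[1, 1], [2, 1], [2, 2], [2, 3], [3, 3], [4, 3]], dtype=float)
M = np.array([[1, 1], [4, 3]], dtype=float)   # given initial cluster centers

while True:
    # assign each point to the nearest center, then recompute the means
    labels = np.linalg.norm(X[:, None] - M[None], axis=-1).argmin(axis=1)
    M_new = np.array([X[labels == k].mean(axis=0) for k in (0, 1)])
    if np.allclose(M_new, M):
        break
    M = M_new

print(labels)   # expected: [0 0 0 1 1 1], i.e. {[1,1],[2,1],[2,2]} and {[2,3],[3,3],[4,3]}
print(M)        # expected centers: [5/3, 4/3] and [3, 3]
```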
(b) Solution – 3-class case
[Figure: plot of the data points with the initial cluster centers]
(b) Solution: 3-class case
Select initial cluster centers.
First cluster assignment using distances from the pattern vectors to the initial cluster centers.
Compute new cluster centers.
Second cluster assignment using distances from the pattern vectors to the new cluster centers.
At the next step we have convergence, as the cluster centers do not change; thus the final cluster assignment becomes:
[Figure: plot of the data points showing the final cluster centers and the final 3-class clusters Cl1, Cl2, Cl3]
The K-means Algorithm
The objective function can be defined as below, where the membership $\mu_{ij}$ is either 0 or 1:
$$J_{KM}(A, \mu) = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}\,\|x_j - a_i\|^2, \qquad \sum_{i=1}^{c} \mu_{ij} = 1 \ \text{for all } j$$
K-means algorithm
Select k sample points to be the initial cluster centers.
Assign each sample point to the cluster whose center is nearest in Euclidean distance.
Once all the sample points are assigned, re-calculate the new cluster centers.
Repeat steps 2 and 3 until the clustering process converges.
K-means example …
K-means…
Advantages:
Fast.
Easy to implement.
Disadvantages:
Works only for simple data sets, i.e. clusters that do not overlap.
Depends heavily on the choice of initial cluster centers.
Does not perform well in the presence of outliers.
Apply the K-means for image segmentation
Step 1: Choose K and their initial cluster centers.
Step 2: Assign each pixel to one of the clusters based on the similarity measure.
Step 3: Recalculate the cluster centers.
Step 4: Compare the current cluster centers and previous cluster centers (or maximum number of iterations) to determine if the algorithm converges.
Step 5: Classify each pixel to produce the image segmentation.
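A sketch of Steps 1 through 5 on an RGB image, using color features only; the `segment` helper is hypothetical, and the slides next compare intensity-only against color-only features.

```python
import numpy as np

def segment(image, K=3, max_iter=50, seed=0):
    """K-means image segmentation sketch: image is an (H, W, 3) array."""
    H, W, C = image.shape
    pixels = image.reshape(-1, C).astype(float)            # one feature vector per pixel
    rng = np.random.default_rng(seed)
    centers = pixels[rng.choice(len(pixels), K, replace=False)]   # Step 1
    for _ in range(max_iter):
        # Step 2: assign each pixel to the most similar cluster center
        labels = np.linalg.norm(pixels[:, None] - centers[None], axis=-1).argmin(axis=1)
        # Step 3: recalculate the cluster centers
        new = np.array([pixels[labels == k].mean(axis=0) if np.any(labels == k)
                        else centers[k] for k in range(K)])
        if np.allclose(new, centers):                      # Step 4: convergence check
            break
        centers = new
    # Step 5: classify each pixel; painting it with its center is vector quantization
    return centers[labels].reshape(H, W, C), labels.reshape(H, W)
```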
K-means clustering using intensity alone and color alone
[Figure: original image; clusters on intensity; clusters on color]
Slide credit: Marc Pollefeys, COMP 256, 2003
Results of K-means Clustering:
I gave each pixel the mean intensity or mean color of its cluster; this is basically just vector quantizing the image intensities/colors. Notice that there is no requirement that clusters be spatially localized, and they are not.
K-means
Disadvantages
Dependent on initialization: different initial centers can lead to different final clusterings
Select random seeds that are at least Dmin apart
Or, run the algorithm many times and keep the best result (a restarts sketch follows below)
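The "run many times" remedy as a sketch; this reuses the `kmeans` sketch from earlier, and the name `kmeans_restarts` is hypothetical.

```python
import numpy as np

def kmeans_restarts(X, k, restarts=10):
    """Run K-means with several seeds and keep the run with the lowest objective."""
    best, best_J = None, np.inf
    for seed in range(restarts):
        centers, labels = kmeans(X, k, seed=seed)     # kmeans sketch defined earlier
        J = ((X - centers[labels]) ** 2).sum()        # within-cluster squared error
        if J < best_J:
            best, best_J = (centers, labels), J
    return best, best_J
```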
K-means
Disadvantages
Dependent on initialization
Sensitive to outliers
Use K-medoids: the medoid of a cluster is the member that minimizes the total distance to the other members; unlike the mean, it is always an actual data point and is far less affected by outliers (a sketch follows below)
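A sketch of the medoid computation underlying K-medoids; the `medoid` helper and the toy data are illustrative.

```python
import numpy as np

def medoid(cluster_points):
    """Return the cluster member with minimal total distance to the other members."""
    d = np.linalg.norm(cluster_points[:, None] - cluster_points[None], axis=-1)
    return cluster_points[d.sum(axis=1).argmin()]

pts = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.2], [50.0, 50.0]])  # one outlier
print(pts.mean(axis=0))   # the mean is dragged toward the outlier
print(medoid(pts))        # the medoid stays at an actual central point
```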
K-means
Disadvantages
Dependent on initialization
Sensitive to outliers (K-medoids)
Can deal only with clusters with spherically symmetric point distributions; the kernel trick (clustering in an implicit feature space) can relax this
Deciding K
Deciding K
Try several values of K and compare
When k = 1, the objective function is 873.0
When k = 2, the objective function is 173.1
When k = 3, the objective function is 133.6
We can plot the objective function values for k = 1 to 6
The abrupt change at k = 2 is highly suggestive of two clusters: “knee finding” or “elbow finding”
Note that the results are not always as clear-cut as in this toy example
Image: Henry Lin
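A sketch of the knee/elbow plot; only the slide's three reported objective values are used, since the values for k = 4 to 6 are not given in the text.

```python
import matplotlib.pyplot as plt

# K-means objective J(k) from the slides' toy example (k = 1..3)
ks = [1, 2, 3]
J = [873.0, 173.1, 133.6]

plt.plot(ks, J, "o-")
plt.xlabel("k (number of clusters)")
plt.ylabel("objective J(k)")
plt.title("Abrupt change (knee/elbow) at k = 2 suggests two clusters")
plt.show()
```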
The Fuzzy C-means Algorithm
The objective function can be defined as below, where the membership $u_{ij}$ is between 0 and 1:
$$E(U, V) = \sum_{i=1}^{k} \sum_{j=1}^{n} u_{ij}^{m}\,\|x_j - v_i\|^2$$
Fuzzy C-means
Minimize
$$E(U, V) = \sum_{i=1}^{k} \sum_{j=1}^{n} u_{ij}^{m}\,\|x_j - v_i\|^2$$
subject to
$$\sum_{i=1}^{k} u_{ij} = 1, \quad \forall\, j = 1, \dots, n$$
How to solve this constrained optimization problem?
Introduce Lagrange multipliers:
$$L(U, V, \alpha) = \sum_{i=1}^{k} \sum_{j=1}^{n} u_{ij}^{m}\,\|x_j - v_i\|^2 + \sum_{j=1}^{n} \alpha_j \left( \sum_{i=1}^{k} u_{ij} - 1 \right)$$
Fuzzy c-means
Setting the derivatives of the Lagrangian to zero gives an iterative optimization:
Fix V, optimize w.r.t. U:
$$u_{ij} = \left[ \sum_{l=1}^{k} \left( \frac{\|x_j - v_i\|}{\|x_j - v_l\|} \right)^{2/(m-1)} \right]^{-1}$$
Fix U, optimize w.r.t. V:
$$v_i = \frac{\sum_{j=1}^{n} u_{ij}^{m}\, x_j}{\sum_{j=1}^{n} u_{ij}^{m}}$$
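The two alternating updates translate directly into a NumPy sketch; the name `fuzzy_cmeans`, the tolerance test, and the m = 2 default are implementation assumptions.

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Minimal FCM: X is (n, d), c the number of clusters, m > 1 the fuzzifier.
    Returns (centers V, memberships U) with U of shape (c, n)."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)                                    # memberships sum to 1 per point
    for _ in range(max_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)      # fix U, optimize w.r.t. V
        dist = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=-1)   # (c, n)
        dist = np.fmax(dist, 1e-12)                       # guard against division by zero
        # fix V, optimize w.r.t. U: u_ij = 1 / sum_l (d_ij / d_lj)^(2/(m-1))
        U_new = 1.0 / ((dist[:, None, :] / dist[None, :, :]) ** (2.0 / (m - 1))).sum(axis=1)
        if np.abs(U_new - U).max() < tol:
            return V, U_new
        U = U_new
    return V, U
```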
Application to image segmentation
Original images
Segmentations
Homogeneous intensity corrupted by 5% Gaussian noise
Sinusoidal inhomogeneous intensity corrupted by 5% Gaussian noise
Image: Dao-Qiang Zhang, Song-Can Chen
Accuracy = 96.02%
Accuracy = 94.41%
FCM applied to segmentation
Original images
FCM: Accuracy = 96.02%
KFCM: Accuracy = 96.51%
SKFCM: Accuracy = 100.00%
SFCM: Accuracy = 99.34%
Image: Dao-Qiang Zhang, Song-Can Chen
Homogeneous intensity corrupted by 5% Gaussian noise
FCM applied to segmentation
Original images
FCM: Accuracy = 94.41%
KFCM: Accuracy = 91.11%
SKFCM: Accuracy = 99.88%
SFCM: Accuracy = 98.41%
Image: Dao-Qiang Zhang, Song-Can Chen
Sinusoidal inhomogeneous intensity corrupted by 5% Gaussian noise
FCM applied to segmentation
Original MR image corrupted by 5% Gaussian noise
FCM result
KFCM result
SFCM result
SKFCM result
Image: Dao-Qiang Zhang, Song-Can Chen
Summary
K-means clustering algorithm
The Fuzzy C-means algorithm is a popular soft-clustering extension based on fuzzy membership.
Questions & Suggestions?