Beacon Conference of Undergraduate Research
Unsupervised Learning: Clustering
Lingqiao Liu
University of Adelaide
Outline
• Overview of Unsupervised Learning and Clustering
• K-means clustering
• Gaussian mixture models (GMM)
– Distribution modeling
– GMM
– Latent variable
– EM algorithm
Unsupervised learning
• Learning without supervision
– Find the distribution of data
– Learning to generate samples
– Clustering
– Anomaly detection
– Feature learning
• Clustering
– One of the most important unsupervised learning tasks
– Clustering is the process of identifying groups, or clusters, of
data points in a (usually) multidimensional space.
– Connection to distribution modeling: related to mixture models
Clustering
Application of clustering: Segmentation
Superpixel
Each pixel is represented as X = (r, g, b, x, y): its colour plus its image coordinates (see the sketch below).
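As a rough sketch of how such features might be built (a minimal illustration assuming a NumPy image array; the helper name is ours):

```python
import numpy as np

def pixel_features(image):
    """Represent each pixel as X = (r, g, b, x, y):
    its colour plus its image coordinates."""
    h, w, _ = image.shape
    ys, xs = np.mgrid[0:h, 0:w]              # per-pixel coordinates
    return np.column_stack([
        image.reshape(-1, 3).astype(float),  # r, g, b
        xs.reshape(-1, 1),                   # x
        ys.reshape(-1, 1),                   # y
    ])
```

Clustering these 5-D vectors (e.g., with k-means) groups pixels that are close in both colour and position, which is the idea behind superpixels.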
Application of clustering: Vector quantization in the bag-of-features model
• Bag-of-features model
– Extended from the bag-of-words model
Application of clustering: Vector quantization
Vector quantization: build a dictionary with k centres.
For a vector, find its closest centre and use its ID as its “visual word” (see the sketch below).
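A minimal sketch of these two operations (assuming NumPy arrays; the helper names are ours):

```python
import numpy as np

def visual_word(vector, centres):
    """Return the ID of the centre closest to the vector."""
    return int(np.linalg.norm(centres - vector, axis=1).argmin())

def bag_of_features(vectors, centres):
    """Histogram of visual-word IDs over all local features."""
    ids = [visual_word(v, centres) for v in vectors]
    return np.bincount(ids, minlength=len(centres))
```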
Ingredients for Clustering
Dissimilarity function
Standardization
Standardization is not always helpful
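In its standard form, standardization rescales each dimension d by its sample mean and standard deviation:

$$\tilde{x}_{nd} = \frac{x_{nd} - \bar{x}_d}{\sigma_d}$$

This puts all dimensions on a common scale before computing dissimilarities. The standard caveat, matching the slide above: if the separation between clusters lies mainly in a naturally high-variance dimension, rescaling shrinks exactly the direction that distinguishes the clusters.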
Outline
• Overview of Unsupervised Learning and Clustering
• K-means clustering
• Gaussian mixture models (GMM)
– Distribution modeling
– GMM
– Latent variable
– EM algorithm
K-means clustering
• One of the most commonly used
clustering methods
• Input: data points and the number of
clusters, k
• Basic idea
– Each sample can only fall into one of the
k groups
– For each group, we can calculate the mean vector, that is, the average of the samples falling into that group
– Any sample should be closer to the mean vector of its own group than to the mean vectors of the other groups
K-means clustering
• How can we find the assignment and the mean vectors of each group?
• It is a chicken-and-egg problem
– If we know the assignment, we can easily calculate the mean
vectors
– If we know the mean vectors, we can easily calculate the
assignment
• The k-means algorithm takes an iterative approach to solving this problem
K-means algorithm
• E-step: fix the current mean vectors of each group. If a sample is closest to the i-th mean vector, assign it to the i-th group
• M-step: fix the assignment and recalculate the mean vector of each group
• Iterate the E-step and M-step until convergence (see the sketch below)
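A minimal NumPy sketch of this loop (illustrative: initialization and the stopping test are simplified):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain k-means: alternate the E-step (assign points to the
    closest mean) and the M-step (recompute each group's mean)."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=k, replace=False)]  # init from data
    assign = np.zeros(len(X), dtype=int)
    for _ in range(n_iters):
        # E-step: assign each sample to its closest mean vector
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # M-step: recompute the mean vector of each group
        # (keep the old mean if a group happens to be empty)
        new_means = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j) else means[j]
            for j in range(k)
        ])
        if np.allclose(new_means, means):  # converged
            break
        means = new_means
    return means, assign
```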
K-means example
Questions
• Why does iterating these two steps lead to a converged result?
• Why do we always calculate mean vectors?
• What is the objective function of the k-means clustering algorithm?
– An objective function gives a measurement of the goodness of the clustering results
K-means algorithm: revisit
K-means loss function
Each point is assigned to one of the prototypes; the distance between the point and its prototype is the cost of that assignment. We seek the optimal assignment that minimizes the cost.
We also seek the optimal prototypes that minimize the total cost.
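Written out, with $r_{nk}$ the one-hot assignment and $\mu_k$ the prototypes, the standard k-means objective is:

$$J = \sum_{n=1}^{N}\sum_{k=1}^{K} r_{nk}\,\lVert x_n - \mu_k\rVert^2, \qquad r_{nk}\in\{0,1\},\quad \sum_{k} r_{nk}=1$$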
K-means algorithm
E-step: Fix $\{\mu_k\}$, solve $\min_{\{r_{nk}\}} J$.
Equivalent to solving, independently for each point: set $r_{nk}=1$ for $k=\arg\min_j \lVert x_n-\mu_j\rVert^2$ and $r_{nk}=0$ otherwise.
We also know that $J(r^{\text{new}},\mu)\le J(r^{\text{old}},\mu)$.
After the E-step, the objective value will decrease (at least not increase).
M-step: Fix $\{r_{nk}\}$, solve $\min_{\{\mu_k\}} J$.
Equivalent to solving $\partial J/\partial\mu_k = 0$ for each $k$, which gives $\mu_k=\dfrac{\sum_n r_{nk}\,x_n}{\sum_n r_{nk}}$, the mean of the samples assigned to group $k$.
Thus, we also know that $J(r,\mu^{\text{new}})\le J(r,\mu^{\text{old}})$.
After the M-step, the objective value will decrease (at least not increase).
K-means example: loss function
The loss function decreases monotonically over the iterations
How to choose K?
• Unlike supervised learning, where we can evaluate validation accuracy, unsupervised learning requires different ways of evaluating the quality of what is learned
• Cross-validation: partition the data into two sets. Estimate prototypes on one and use them to compute the loss function on the other.
– Choose the K that leads to the smallest loss value on the validation set (see the sketch below)
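A minimal sketch of this procedure, reusing the kmeans function sketched earlier (the helper name is ours):

```python
import numpy as np

def choose_k(X, k_values, seed=0):
    """Fit prototypes on one half of the data; score each K by the
    k-means loss on the held-out half, and pick the smallest."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    train, val = X[idx[: len(X) // 2]], X[idx[len(X) // 2:]]
    losses = {}
    for k in k_values:
        means, _ = kmeans(train, k)  # the sketch above
        d = np.linalg.norm(val[:, None, :] - means[None, :, :], axis=2)
        losses[k] = float((d.min(axis=1) ** 2).sum())
    return min(losses, key=losses.get), losses
```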
Limitations of k-means
• Hard assignment of data points to clusters means that a small perturbation of a data point can flip it to another cluster
• K-means assumes spherical clusters and equal probabilities for each cluster
• These limitations can be addressed by GMM-based clustering
Outline
• Overview of Unsupervised Learning and Clustering
• K-means clustering
• Gaussian mixture models (GMM)
– Distribution modeling
– GMM
– Latent variable
– EM algorithm
Distribution Modeling
• Distribution modeling
– Find a way to characterize the distribution of the data
– Modelling the probability density function $p(x)$
Distribution Modeling
• What is the form of $p(x;\theta)$?
– $\theta$: the unknown model parameters
• The function family
– Based on assumptions
– Based on prior knowledge about the data
– Model it as a general function: parametric or non-parametric
• Once the function form is known, distribution modeling boils down to a parameter estimation problem
– MLE: maximum likelihood estimation
Assume samples are independently and identically distributed (i.i.d.)
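Under the i.i.d. assumption, the likelihood factorizes and MLE takes the standard form:

$$\theta^{*} = \arg\max_{\theta} \prod_{n=1}^{N} p(x_n;\theta) = \arg\max_{\theta} \sum_{n=1}^{N} \log p(x_n;\theta)$$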
Commonly used assumptions
• Multivariate Gaussian distribution
• Gaussian mixture model (GMM)
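In standard notation, for $D$-dimensional data:

$$\mathcal{N}(x;\mu,\Sigma)=\frac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}}\exp\!\left(-\tfrac12 (x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right)$$

$$p(x)=\sum_{k=1}^{K}\pi_k\,\mathcal{N}(x;\mu_k,\Sigma_k),\qquad \pi_k\ge 0,\quad \sum_{k=1}^{K}\pi_k=1$$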
Gaussian Mixture Model (GMM)
A GMM can characterize a distribution that contains K clusters.
What is the relationship between the GMM PDF and the clusters in the data?
Warmup practice
• Two bags:
– Bag A: 60 black balls, 40 white balls
– Bag B: 50 black balls, 50 white balls
– Chance to select Bag A: 0.6 Bag B: 0.4
– What's the probability of selecting a black ball (and, similarly, a white ball) from this process? (Worked out below.)
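By the law of total probability, marginalizing over the bag choice:

$$P(\text{black}) = 0.6\times 0.6 + 0.4\times 0.5 = 0.56,\qquad P(\text{white}) = 0.6\times 0.4 + 0.4\times 0.5 = 0.44$$

This two-stage "pick a bag, then draw" process has exactly the structure of a mixture model.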
Gaussian Mixture Model (GMM)
• Generative process:
1. Randomly select the k-th Gaussian distribution with probability $\pi_k$
2. Randomly sample x from the
selected Gaussian distribution
Imagine this process as a piece of a program. Once written, it generates a sample each time we run it; the output x is a random variable.
Let $z$ denote which Gaussian was selected. The distribution of x?
Marginalizing over $z$: $p(x)=\sum_{k=1}^{K} p(z=k)\,p(x\mid z=k)=\sum_{k=1}^{K}\pi_k\,\mathcal{N}(x;\mu_k,\Sigma_k)$, which is exactly the GMM PDF.
The selection made inside the generative process is an internal variable, hidden from us. We call it a latent variable.
Latent variable
• Intermediate results inside a generative process
• Each sample is associated with a latent variable
• We do not know its exact value
• But we can infer how likely each possible value is
• Example
We can tell that this point is more likely to belong to the blue cluster
Latent variable and Inference
• Given an observation, we can estimate the likelihood of the latent variable. In the case of GMM, we try to estimate $p(z=k\mid x)$
• Because the latent variable in GMM indicates the membership of a sample in a cluster, $p(z=k\mid x)$ can be seen as a soft membership (soft clustering)
Latent variable and Inference
• The difference between $p(z=k\mid x)$ and $p(z=k)$:
– $p(z=k\mid x)$: posterior probability, the likelihood about the component after x is observed
– $p(z=k)$: prior probability, the likelihood before any observation
• The process of calculating $p(z=k\mid x)$, or the most likely assignment, is called inference
Inference in GMM
• Inference can be done using Bayes' theorem:
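Concretely, with the GMM defined above:

$$p(z=k\mid x)=\frac{\pi_k\,\mathcal{N}(x;\mu_k,\Sigma_k)}{\sum_{j=1}^{K}\pi_j\,\mathcal{N}(x;\mu_j,\Sigma_j)}$$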
Parameter estimation for GMM
• Use MLE (maximum likelihood estimation)
• Unfortunately, it is difficult to solve directly
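The difficulty is visible in the i.i.d. log-likelihood, where the sum over components sits inside the logarithm and couples all the parameters:

$$\log p(X;\theta)=\sum_{n=1}^{N}\log\sum_{k=1}^{K}\pi_k\,\mathcal{N}(x_n;\mu_k,\Sigma_k)$$

Setting the gradient to zero no longer yields closed-form estimates, which motivates the EM algorithm below.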
EM Algorithm solution
• What if we knew the latent variable for each sample, say, $r_i$?
– Let's use one-hot encoding (as in k-means) to represent the selection result, e.g., $r_{ik}=1$ if the k-th Gaussian is selected for the i-th sample
– Then the parameter estimation problem becomes easy, because we know which sample is assigned to which Gaussian
– Solution (written out below):
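With known one-hot assignments $r_{ik}$, the MLE has the standard closed form (each cluster is fit from its own samples):

$$\pi_k=\frac{1}{N}\sum_{i=1}^{N} r_{ik},\qquad \mu_k=\frac{\sum_i r_{ik}\,x_i}{\sum_i r_{ik}},\qquad \Sigma_k=\frac{\sum_i r_{ik}\,(x_i-\mu_k)(x_i-\mu_k)^{\top}}{\sum_i r_{ik}}$$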
EM Algorithm solution
• However, we do not know the assignment; we only know its probability
• We can actually use the expectation of $r_{ik}$ instead: $\gamma_{ik}=\mathbb{E}[r_{ik}]=p(z_i=k\mid x_i)$
EM algorithm solution to GMM
• E-step: calculate the posterior probability of the latent variable: $\gamma_{ik}=p(z_i=k\mid x_i)$
• M-step: estimate the parameters based on the expectations of the latent variables (see the sketch below)
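A minimal NumPy/SciPy sketch of this loop (illustrative: initialization and stopping are simplified, and a small regularizer keeps the covariances well conditioned):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, n_iters=100, seed=0):
    """EM for a GMM: the E-step computes soft memberships, the
    M-step re-estimates weights, means, and covariances from them."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(k, 1.0 / k)
    mu = X[rng.choice(n, size=k, replace=False)]
    sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d)] * k)
    for _ in range(n_iters):
        # E-step: gamma[i, j] = p(z_i = j | x_i), via Bayes' rule
        gamma = np.column_stack([
            pi[j] * multivariate_normal.pdf(X, mu[j], sigma[j])
            for j in range(k)
        ])
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: weighted MLE using the soft memberships
        nk = gamma.sum(axis=0)
        pi = nk / n
        mu = (gamma.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - mu[j]
            sigma[j] = (gamma[:, j, None] * diff).T @ diff / nk[j]
            sigma[j] += 1e-6 * np.eye(d)  # numerical stability
    return pi, mu, sigma, gamma
```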
EM algorithm (More general case)
• The above solution is a special case of a more general method called the EM algorithm
• It is widely used for learning probabilistic models with latent variables inside their generative processes
• It iterates between an E-step and an M-step
– We can theoretically prove that each step leads to a non-increase of the loss function (the negative log-likelihood)
– The iterative process will converge to a local minimum
EM algorithm in more general form
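In its standard general form, for a model with observed data $x$, latent variables $z$, and parameters $\theta$:

E-step: compute $Q(\theta,\theta^{\text{old}})=\mathbb{E}_{z\sim p(z\mid x,\theta^{\text{old}})}\!\left[\log p(x,z;\theta)\right]$

M-step: $\theta^{\text{new}}=\arg\max_{\theta} Q(\theta,\theta^{\text{old}})$

Each iteration can be shown not to decrease the log-likelihood $\log p(x;\theta)$.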
Connection to K-means
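The standard way to make the connection: if every component shares a spherical covariance $\epsilon I$ with equal weights, then as $\epsilon\to 0$ the posteriors $p(z=k\mid x)$ collapse to hard 0/1 assignments, and the EM updates reduce exactly to the k-means E-step and M-step.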
GMM example
Summary
• Unsupervised learning and clustering
• Clustering
– Grouping data
– Estimating a mixture model
• K-means clustering
– How it works
– Its limitations
• GMM
– Connection between a distribution and group assignments: a generative process
– Parameter estimation: the EM algorithm