Unsupervised Learning: Clustering

Lingqiao Liu
University of Adelaide

Outline

• Overview of Unsupervised Learning and Clustering

• K-means clustering

• Gaussian mixture models (GMM)

– Distribution modeling

– GMM

– Latent variable

– EM algorithm

Unsupervised learning

• Learning without supervision
– Find the distribution of data

– Learning to generate samples

– Clustering

– Anomaly detection

– Feature learning

• Clustering
– One of the most important unsupervised learning tasks

– Clustering is the process of identifying groups, or clusters, of
data points in a (usually) multidimensional space.

– Connection to distribution modeling: clustering is closely related to mixture models

Clustering

Application of clustering: Segmentation

Superpixel

Each pixel is represented by a feature vector x = (r, g, b, x, y), i.e., its colour (r, g, b)
together with its image coordinates (x, y); clustering these vectors groups nearby,
similarly coloured pixels into superpixels.

Application of clustering: Vector
quantization in the bag-of-features model

• Bag-of-features model

– Extended from bag-of-words model

Application of clustering: Vector quantization (cont.)

Vector quantization: build a dictionary with k centres (e.g., obtained by k-means).
For a vector, find its closest centre and use that centre's ID as its "visual word".

Ingredients for Clustering

Dissimilarity function

Standardization
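A minimal sketch of the usual meaning of standardization for clustering, namely z-scoring each feature so that no single feature dominates the dissimilarity measure (this sketch assumes a NumPy data matrix X; it is an illustration, not the slide's own code):

import numpy as np

def standardize(X):
    # Z-score each feature (column) of an (N, D) data matrix:
    # subtract the per-feature mean and divide by the per-feature standard deviation.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant features: avoid division by zero
    return (X - mean) / std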

Standardization is not always helpful

Outline

• Overview of Unsupervised Learning and Clustering

• K-means clustering

• Gaussian mixture models (GMM)

– Distribution modeling

– GMM

– Latent variable

– EM algorithm

K-means clustering

• One of the most commonly used
clustering methods

• Input: data points and the number of
clusters, k

• Basic idea

– Each sample can only fall into one of the
k groups

– From each group, we can calculate a mean vector, that is, the
average of the samples falling into that group

– Any sample should be closer to the mean vector of its own group
than to the mean vectors of the other groups

K-means clustering

• How could we find the assignment and mean vectors of
each group?

• It is a chicken-and-egg problem

– If we know the assignment, we can easily calculate the mean
vectors

– If we know the mean vectors, we can easily calculate the
assignment

• The k-means algorithm takes an iterative approach to solving
this problem

K-means algorithm

• E-step: fix the current mean vectors of each group. If a
sample is closest to the i-th mean vector, the sample
is assigned to the i-th group

• M-step: fix the assignment and recalculate the mean vector
of each group

• Iterate the E-step and M-step until convergence (a minimal implementation sketch follows below)
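A minimal NumPy sketch of these two steps (the function name, initialisation, and stopping rule are illustrative choices, not prescribed by the lecture):

import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    # Basic k-means: X is an (N, D) data matrix, k the number of clusters.
    rng = np.random.default_rng(seed)
    # Initialise the mean vectors with k randomly chosen samples.
    mu = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # E-step: assign each sample to its closest mean vector.
        dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (N, k)
        assign = dists.argmin(axis=1)
        # M-step: recompute each mean as the average of its assigned samples.
        new_mu = np.array([X[assign == j].mean(axis=0) if np.any(assign == j) else mu[j]
                           for j in range(k)])
        if np.allclose(new_mu, mu):  # converged: the means no longer change
            break
        mu = new_mu
    return mu, assign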

K-means example

Questions

• Why does iterating these two steps lead to a converged result?

• Why do we always calculate mean vectors?

• What is the objective function of the k-means clustering
algorithm?

– The objective function gives a measure of the goodness of the
clustering result

K-means algorithm: revisit

K-means loss function

Each point is assigned to one of the prototypes; the distance between
the point and its prototype is the cost of that assignment. We seek the
optimal assignment that minimizes this cost.

We also seek the optimal prototypes that minimize the total
cost.
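Written out in the standard notation (with $r_{nk} \in \{0, 1\}$ the one-hot assignment of sample $\mathbf{x}_n$ to prototype $\boldsymbol{\mu}_k$, and $\sum_k r_{nk} = 1$), the k-means objective is
$$J(\{r_{nk}\}, \{\boldsymbol{\mu}_k\}) = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk}\, \|\mathbf{x}_n - \boldsymbol{\mu}_k\|^2 ,$$
which is minimized jointly over the assignments and the prototypes.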

K-means algorithm

E-step: fix the prototypes $\{\boldsymbol{\mu}_k\}$ and solve for the assignments $\{r_{nk}\}$.

Equivalent to solving, for each sample $\mathbf{x}_n$ independently: set $r_{nk} = 1$ for
$k = \arg\min_{j} \|\mathbf{x}_n - \boldsymbol{\mu}_j\|^2$ and $r_{nk} = 0$ otherwise.

We also know that each term of $J$ is minimized independently, so the new assignments
cannot give a larger objective value than the old ones.

After the E-step, the objective value will decrease (at least not increase).

M-step: fix the assignments $\{r_{nk}\}$ and solve for the prototypes $\{\boldsymbol{\mu}_k\}$.

Equivalent to solving $\min_{\boldsymbol{\mu}_k} \sum_{n} r_{nk}\, \|\mathbf{x}_n - \boldsymbol{\mu}_k\|^2$ for each $k$.

Solve by setting the derivative with respect to $\boldsymbol{\mu}_k$ to zero, which gives the mean of
the samples assigned to cluster $k$:
$$\boldsymbol{\mu}_k = \frac{\sum_{n} r_{nk}\, \mathbf{x}_n}{\sum_{n} r_{nk}}.$$

Thus, we also know that the new prototypes cannot give a larger objective value than the old ones.

After the M-step, the objective value will decrease (at least not increase).

K-means example: loss function

The loss function will monotonically decrease

K-means algorithm

How to choose K?

How to choose K?

• Unlike in supervised learning, where we can evaluate validation
accuracy, in unsupervised learning we need other ways to
evaluate the quality of what has been learned

• Cross-validation: Partition data into two sets. Estimate
prototypes on one and use these to compute the loss
function on the other.

– Choose the K that leads to the smallest loss value on the validation set (see the sketch below)
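A minimal sketch of this cross-validation procedure, reusing the kmeans function sketched earlier (the split ratio and candidate range are illustrative choices):

import numpy as np

def choose_k(X, candidate_ks, val_fraction=0.3, seed=0):
    # Randomly split the data into an estimation set and a validation set.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(val_fraction * len(X))
    X_val, X_est = X[idx[:n_val]], X[idx[n_val:]]
    losses = {}
    for k in candidate_ks:
        mu, _ = kmeans(X_est, k)  # estimate prototypes on one set
        # k-means loss of the held-out points under these prototypes
        d = ((X_val[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        losses[k] = d.min(axis=1).sum()
    best_k = min(losses, key=losses.get)  # K with the smallest validation loss
    return best_k, losses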

Limitations of k-means

• Hard assignment of data points to clusters means that a
small perturbation of a data point can flip it to another
cluster

• Assumes spherical clusters and equal probabilities for
each cluster

• These limitations can be addressed by GMM-based
clustering

Outline

• Overview of Unsupervised Learning and Clustering

• K-means clustering

• Gaussian mixture models (GMM)

– Distribution modeling

– GMM

– Latent variable

– EM algorithm

Distribution Modeling

• Distribution Modeling

– Find a way to characterize the distribution of data

– Modelling: characterize the data with a density $p(\mathbf{x} \mid \theta)$

– Example:

Distribution Modeling

• What is the form of $p(\mathbf{x} \mid \theta)$?

– $\theta$: the unknown model parameters

• The function family

– Based on assumptions

– Based on prior knowledge about the data

– Model it as a general function: parametric or nonparametric

• Once the functional form is known, distribution modeling
boils down to a parameter estimation problem

– MLE: maximum likelihood estimation

Assume the samples are independently and identically distributed (i.i.d.); the MLE then maximizes the (log-)likelihood of the data, as written below.
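Written out, the estimate is
$$\hat{\theta} = \arg\max_{\theta} \prod_{i=1}^{N} p(\mathbf{x}_i \mid \theta) = \arg\max_{\theta} \sum_{i=1}^{N} \log p(\mathbf{x}_i \mid \theta).$$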

Commonly used assumptions

• Multivariate Gaussian distribution

• Gaussian Mixture model

Gaussian Mixture Model (GMM)

A GMM can characterize a distribution that contains K clusters.

What is the relationship between the GMM PDF and the clusters of data?
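For reference, the GMM probability density function has the standard form
$$p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k), \qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1,$$
with one Gaussian component (mean $\boldsymbol{\mu}_k$, covariance $\boldsymbol{\Sigma}_k$, mixing weight $\pi_k$) per cluster.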

Warmup practice

• Two bags:

– Bag A: 60 black balls, 40 white balls

– Bag B: 50 black balls, 50 white balls

– Chance to select Bag A: 0.6 Bag B: 0.4

– What’s the probability of selecting a black ball (and a white ball)
from this process?
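Worked solution (total probability over the two bags):
$$P(\text{black}) = 0.6 \times 0.6 + 0.4 \times 0.5 = 0.56, \qquad P(\text{white}) = 0.6 \times 0.4 + 0.4 \times 0.5 = 0.44.$$
This two-stage process (pick a bag, then draw a ball) mirrors the GMM generative process described next.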

Gaussian Mixture Model (GMM)

• Generative process:

1. Randomly select the k-th Gaussian
distribution according to the mixing weights $\pi_k$ (i.e., component $k$ is chosen with probability $\pi_k$)

2. Randomly sample $\mathbf{x}$ from the
selected Gaussian distribution $\mathcal{N}(\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$

Imagine this process as a piece of a program. Once written, it generates
a sample each time we run it, and the output $\mathbf{x}$ is a random variable. A minimal sketch of this process is given below.
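A minimal sketch of that program (the parameter names pi, mu, Sigma are illustrative):

import numpy as np

def sample_gmm(pi, mu, Sigma, rng=None):
    # pi: (K,) mixing weights, mu: (K, D) means, Sigma: (K, D, D) covariances.
    if rng is None:
        rng = np.random.default_rng(0)
    k = rng.choice(len(pi), p=pi)                 # step 1: pick component k with probability pi[k]
    x = rng.multivariate_normal(mu[k], Sigma[k])  # step 2: sample x ~ N(mu[k], Sigma[k])
    return x, k                                   # k is the latent variable; hidden if we only observe x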

Gaussian Mixture Model (GMM)

• Generative process:

1. Randomly select the k-th Gaussian
distribution according to the mixing weights $\pi_k$

2. Randomly sample $\mathbf{x}$ from the
selected Gaussian distribution $\mathcal{N}(\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$

Let $c$ denote the index of the selected component, so $p(c = k) = \pi_k$.

The distribution of $\mathbf{x}$? Marginalizing over the selection gives
$p(\mathbf{x}) = \sum_{k=1}^{K} p(c = k)\, p(\mathbf{x} \mid c = k) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$, which is exactly the GMM density.

Gaussian Mixture Model (GMM)

• Generative process:

1. Randomly select the k-th Gaussian
distribution according to the mixing weights $\pi_k$

2. Randomly sample $\mathbf{x}$ from the
selected Gaussian distribution $\mathcal{N}(\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$

The selection made inside the generative
process is an internal variable, hidden from
us. We call it a latent variable.

Latent variable

• Intermediate results inside a generative process

• Each sample is associated with a latent variable

• We do not know its exact value

• But we can infer how likely each possible value is

• Example

We may know this point is more likely
to belong to the blue cluster

Latent variable and Inference

• Given an observation, we can estimate the likelihood of
the latent variable. In the case of GMM, we try to
estimate $p(c = k \mid \mathbf{x})$

• Because the latent variable in GMM indicates the
membership of a sample in a cluster, $p(c = k \mid \mathbf{x})$
can be seen as a soft membership (soft clustering)

Latent variable and Inference

• Given an observation, we can estimate the likelihood of
the latent variable. In the case of GMM, we try to
estimate $p(c = k \mid \mathbf{x})$

• Note the difference between $p(c = k \mid \mathbf{x})$ and $p(c = k) = \pi_k$:

– $p(c = k \mid \mathbf{x})$ is the posterior probability: the likelihood about $c$ after $\mathbf{x}$ is observed

– $p(c = k) = \pi_k$ is the prior probability: the likelihood before any observation

• The process of calculating $p(c = k \mid \mathbf{x})$, or finding the most likely
assignment, is called inference

Inference in GMM

• Inference can be done using Bayes' theorem, as written below
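In the GMM notation, this posterior is
$$p(c = k \mid \mathbf{x}) = \frac{\pi_k\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}.$$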

Parameter estimation for GMM

• Use MLE (maximum likelihood estimation): maximize the log-likelihood
$$\log p(X \mid \theta) = \sum_{i=1}^{N} \log \Big( \sum_{k=1}^{K} \pi_k\, \mathcal{N}(\mathbf{x}_i \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) \Big)$$

• Unfortunately, it is difficult to solve directly: the sum inside the logarithm couples
all the components, so setting the derivatives to zero does not give a closed-form solution

EM Algorithm solution

• What if we knew the latent variable for each sample, say $r_{ik}$?

– Let's use one-hot encoding (as in the k-means case) to
represent the selection result, e.g., $r_{ik} = 1$ if the k-th Gaussian is selected
for the i-th sample and $r_{ik} = 0$ otherwise

– Then the parameter estimation problem becomes easy, because
we know which sample is assigned to which Gaussian

– Solution:
$$\pi_k = \frac{\sum_{i} r_{ik}}{N}, \qquad \boldsymbol{\mu}_k = \frac{\sum_{i} r_{ik}\, \mathbf{x}_i}{\sum_{i} r_{ik}}, \qquad \boldsymbol{\Sigma}_k = \frac{\sum_{i} r_{ik}\, (\mathbf{x}_i - \boldsymbol{\mu}_k)(\mathbf{x}_i - \boldsymbol{\mu}_k)^{\top}}{\sum_{i} r_{ik}}$$

EM Algorithm solution

• However, we do not know the assignment $r_{ik}$; we only know
its probability

• We can actually use the expectation of $r_{ik}$ instead:
$\mathbb{E}[r_{ik}] = p(c_i = k \mid \mathbf{x}_i) \equiv \gamma_{ik}$, plugged into the formulas above in place of $r_{ik}$

EM algorithm solution to GMM

• E-step: calculate the posterior probability of the
latent variable: $\gamma_{ik} = p(c_i = k \mid \mathbf{x}_i)$

• M-step: re-estimate the parameters based on the expectations of the
latent variables, i.e., the update formulas above with $r_{ik}$ replaced by $\gamma_{ik}$ (a minimal code sketch follows below)
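A minimal NumPy/SciPy sketch of these two steps (the initialisation, the small ridge term, and the fixed iteration count are illustrative choices, not prescribed by the lecture):

import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=100, seed=0):
    N, D = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)                        # start with equal mixing weights
    mu = X[rng.choice(N, size=K, replace=False)]    # K random samples as initial means
    Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(D) for _ in range(K)])
    for _ in range(n_iters):
        # E-step: responsibilities gamma[i, k] = p(c_i = k | x_i)
        dens = np.stack([pi[k] * multivariate_normal.pdf(X, mu[k], Sigma[k])
                         for k in range(K)], axis=1)         # (N, K)
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M-step: the "known assignment" formulas with r_ik replaced by gamma_ik
        Nk = gamma.sum(axis=0)                                # effective number of points per cluster
        pi = Nk / N
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
    return pi, mu, Sigma, gamma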

EM algorithm (More general case)

• The above solution is a special case of a more
general method called the EM algorithm

• It is widely used for learning probabilistic models
with latent variables inside their generative process

• It iterates between an E-step and an M-step

– We can theoretically prove that each step does not increase the loss
function (equivalently, does not decrease the likelihood)

– The iterative process therefore converges to a local minimum of the loss

EM algorithm in more general form
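In its usual textbook form (a summary, not necessarily the slide's exact notation): for a model $p(\mathbf{x}, \mathbf{z} \mid \theta)$ with latent variable $\mathbf{z}$, EM repeats
$$\text{E-step: } Q(\theta, \theta^{\text{old}}) = \mathbb{E}_{\mathbf{z} \sim p(\mathbf{z} \mid \mathbf{x}, \theta^{\text{old}})}\!\left[ \log p(\mathbf{x}, \mathbf{z} \mid \theta) \right], \qquad \text{M-step: } \theta^{\text{new}} = \arg\max_{\theta} Q(\theta, \theta^{\text{old}}),$$
and each iteration cannot decrease the data log-likelihood $\log p(\mathbf{x} \mid \theta)$.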

Connection to K-means
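A common way to state the connection (not necessarily how the slide derives it): if every GMM component is constrained to a shared spherical covariance $\boldsymbol{\Sigma}_k = \epsilon I$ and we let $\epsilon \to 0$, the responsibilities $\gamma_{ik}$ collapse to hard 0/1 assignments to the nearest mean, and the EM updates reduce to the k-means E-step and M-step.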

GMM example

Summary

• Unsupervised learning and clustering

• Clustering

– Grouping data

– Estimating a mixture model

• K-means clustering

– How the algorithm works

– Its limitations

• GMM

– Connection between a distribution and group assignment: a
generative process

– Parameter estimation: the EM algorithm
