CSE 3521: Unsupervised Learning
[Many slides are adapted from UC Berkeley CS-188 and the previous CSE 5521 course at OSU.]
Recap: K-means
An iterative clustering algorithm
Pick K random cluster centers μ_1, …, μ_K
For t = 1..T: [or, stop if the assignments don't change]
  for each point x_i: [update cluster assignments] y_i ← argmin_k ‖x_i − μ_k‖²
  for each cluster k: [update cluster centers] μ_k ← mean of the points currently assigned to cluster k
(A code sketch of this loop follows below.)
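A minimal NumPy sketch of the recap loop above, assuming the usual Euclidean distance; helper names and defaults such as T and the random seed are my own choices, not from the slides:

```python
import numpy as np

def kmeans(X, K, T=100, seed=0):
    """Plain K-means: X is an (N, d) array, K the number of clusters."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)]   # pick K random cluster centers
    labels = np.zeros(len(X), dtype=int)
    for _ in range(T):
        # assignment step: each point goes to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)  # (N, K)
        labels = dists.argmin(axis=1)
        # update step: each center moves to the mean of its assigned points
        new_centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                                else centers[k] for k in range(K)])
        if np.allclose(new_centers, centers):            # stop if centers don't change
            break
        centers = new_centers
    return centers, labels
```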
Recap: K-means: animation (animation figures omitted)
Recap: (One) bad case for “hard assignments”?
K-means and agglomerative clustering do hard assignments
However, clusters may overlap
Some clusters may be “wider” than others
Distances can be deceiving!
Need other distance metrics
Feature normalization
Other algorithms
Gaussian mixture models (GMM): probabilistic clustering
Probabilistic clustering
High-level idea. Data: unlabeled points {x_1, …, x_N}
K-means:
  What to find? Cluster centers
  How to cluster? By distance to the nearest center
  Iterative algorithm: K-means
Probabilistic clustering:
  What to find? A probabilistic model: P(y) and P(x|y)
  How to cluster? By the posterior P(y|x)
  Iterative algorithm: TODAY
Probabilistic clustering
We can use a probabilistic model to do assignment!
Change from distance to likelihood: ‖x − μ_k‖ vs. P(x | y = k)
Allows overlaps, clusters of different "radii", etc.
Question: How can we find P(y) and P(x | y)?
Recall: probabilistic models for classification
If the training dataset is labeled, {(x_1, y_1), …, (x_N, y_N)}
  That is, if we know the label assignment y_i of each x_i
We can do MLE by maximizing ∏_i P(x_i, y_i)
  Assume each instance is generated by P(x, y) = P(y) P(x | y)
Estimate parameters (MLE): P(y = c) and P(x | y = c) for each class c
Prediction for a new x: ŷ = argmax_c P(y = c | x) = argmax_c P(y = c) P(x | y = c)
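For concreteness, the supervised MLE estimates look like the following when the class-conditionals are assumed Gaussian; these are the standard formulas, stated here only to set up the weighted versions used by EM later:

```latex
% Supervised MLE (labels known), with Gaussian class-conditionals:
\hat{P}(y=c) = \frac{N_c}{N}, \qquad
\hat{\mu}_c = \frac{1}{N_c} \sum_{i:\, y_i = c} x_i, \qquad
\hat{\Sigma}_c = \frac{1}{N_c} \sum_{i:\, y_i = c} (x_i - \hat{\mu}_c)(x_i - \hat{\mu}_c)^\top
% The EM M-step later replaces the hard counts with soft weights P(y = c \mid x_i).
```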
Probabilistic clustering (continued)
We can use a probabilistic model to do the assignment, but how can we find P(y) and P(x|y)?
  If the training dataset were labeled, we could do MLE by maximizing ∏_i P(x_i, y_i), as above
Challenges:
  Like K-means, we only have the data {x_1, …, x_N}, and don't know y_i for each x_i
  We need to estimate the model parameters without the labels!
What probabilistic models should we use?
Depends on x
Mixture of [SOMETHING]
  P(y): categorical over clusters {1, …, K}
  P(x | y): depends on the form of x
    Can be a multi-dimensional Gaussian
    Can be multiple one-dimensional Gaussians (e.g., one per feature dimension)
    Can be multiple Bernoullis (for binary features)
Why is it called a Mixture of [SOMETHING]?
What is the probability of x?
  P(x) = Σ_y P(y) P(x | y): a weighted "mixture" of the component distributions P(x | y)
Example: what is the probability of the points scored in this play?
  Ways to score: shoot a 3-pointer, shoot a 2-pointer, free throw, others
  P(points) = Σ_way P(way) P(points | way)
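A toy numeric illustration of the mixture formula above; the probabilities below are made-up numbers for illustration only, not values from the slides:

```python
# P(x) = sum_y P(y) * P(x | y), with y = the way a play ends (made-up numbers).
p_way = {"three": 0.2, "two": 0.5, "free_throw": 0.2, "other": 0.1}                  # P(y), sums to 1
p_score_given_way = {"three": 0.35, "two": 0.50, "free_throw": 0.75, "other": 0.05}  # P(score | y)

p_score = sum(p_way[w] * p_score_given_way[w] for w in p_way)
print(f"P(score in this play) = {p_score:.3f}")   # 0.2*0.35 + 0.5*0.5 + 0.2*0.75 + 0.1*0.05 = 0.475
```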
Gaussian mixture models (GMM)
P(Y): There are K components (classes)
P(X|Y): Each component generates data from a Gaussian with mean μk and covariance matrix Σk
Each data point is sampled from a generative process:
y ∼ P(Y): choose a component k
x ∼ N(μ_k, Σ_k): sample the data point from the corresponding Gaussian
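A minimal NumPy sketch of this generative process; the mixture weights, means, and covariances below are illustrative assumptions, not values from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2-D GMM with K = 2 components (made-up parameters).
pi = np.array([0.4, 0.6])                        # P(Y): component priors
mus = np.array([[0.0, 0.0], [3.0, 3.0]])         # means mu_k
Sigmas = np.array([np.eye(2), 0.5 * np.eye(2)])  # covariances Sigma_k

def sample_gmm(n):
    ys = rng.choice(len(pi), size=n, p=pi)                      # y ~ P(Y): choose a component
    xs = np.array([rng.multivariate_normal(mus[y], Sigmas[y])   # x ~ N(mu_y, Sigma_y)
                   for y in ys])
    return xs, ys

X, Y = sample_gmm(500)
```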
How to estimate the parameters?
θ: all model parameters
  e.g., class probabilities π_k, means μ_k, covariances Σ_k, …
But we don't know the labels y_i, only the data x_i!!!
Maximize the marginal likelihood of seeing the data: ∏_i P(x_i; θ) = ∏_i Σ_k P(y = k; θ) P(x_i | y = k; θ)
How do we optimize? Closed form?
Maximize the marginal likelihood: argmax_θ ∏_i P(x_i; θ) = argmax_θ ∏_i Σ_k P(y = k; θ) P(x_i | y = k; θ)
Almost always a hard problem!
Usually no closed form solution
CANNOT separate the estimation of P(y) and P(x|y) as in Naïve Bayes
  There, you know each y_i, so there is no sum over y inside the log; here, the unknown y_i forces a Σ_k inside the log that couples all the parameters
Gradient descent?
  Several parameters have constraints (e.g., the mixing weights must sum to 1)
Lots of local minima
Learning general mixtures of Gaussians
Likelihood, for data {x_1, …, x_N}: L(θ) = ∏_i Σ_k π_k N(x_i; μ_k, Σ_k)
Need to differentiate log L and solve for the μ_k, Σ_k, and π_k
There will be no closed-form solution
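A short derivation sketch of why there is no closed form (standard GMM algebra; the responsibility notation γ_ik is introduced here and is not from the slides):

```latex
\log L(\theta) = \sum_{i=1}^{N} \log \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i; \mu_k, \Sigma_k)

% Setting the gradient w.r.t. \mu_k to zero:
\frac{\partial \log L}{\partial \mu_k}
  = \sum_{i=1}^{N}
    \underbrace{\frac{\pi_k \mathcal{N}(x_i;\mu_k,\Sigma_k)}
                     {\sum_{j} \pi_j \mathcal{N}(x_i;\mu_j,\Sigma_j)}}_{\gamma_{ik} \,=\, P(y=k \mid x_i)}
    \Sigma_k^{-1}(x_i - \mu_k) = 0
\;\;\Rightarrow\;\;
\mu_k = \frac{\sum_i \gamma_{ik}\, x_i}{\sum_i \gamma_{ik}}

% But \gamma_{ik} itself depends on \mu_k (and on all other parameters),
% so this is a fixed-point condition, not a closed-form solution.
% EM iterates it: the E-step computes \gamma_{ik}, the M-step re-solves for \mu_k.
```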
The EM (expectation-maximization) algorithm
A widely-used, clever method for maximizing marginal likelihood:
A type of gradient ascent that can be easy to implement (e.g., no line search, learning rates, etc.)
An “iterative” algorithm that alternates between two steps:
Compute an expectation (E-step)
Compute a maximization (M-step)
Not magic: still optimizing a “non-convex” function with lots of local optima
The computations are just easier (often, significantly!)
EM: 2 easy steps
Objective: argmax_θ Σ_i log Σ_k P(x_i, y = k; θ)
E-step: Compute expectations to "fill in" the missing values according to the current parameters θ^(t)
  For all examples x_i and possible class assignments k, compute: P(y = k | x_i; θ^(t))
  Like K-means, but with "soft assignments"
M-step: Re-estimate the parameters with "weighted" MLE estimates
  Given the current P(y = k | x_i) for each x_i
  Treat x_i as belonging to class k with proportion P(y = k | x_i)
  Now you get a "pseudo-labeled" (weighted) dataset
Especially useful when the E and M steps have closed-form solutions!!!
GMM: animation (animation figures omitted)
Simple example: learn the means only
1-D data, N points x_1, …, x_N
Mixture of K Gaussians
Let us fix the variances to σ² (σ = 1 in the example below) and use equal priors
Need to estimate the means μ_1, …, μ_K and the soft assignments P(y = k | x_i)
EM for GMMs: Learn means only
Iterate: on the t-th iteration, let our estimates be λ_t = {μ_1^(t), …, μ_K^(t)}
E-step: compute the "expected" classes of all data points: P(y = k | x_i, λ_t) ∝ exp(−(x_i − μ_k^(t))² / 2σ²) P(y = k)
M-step: compute the most likely new μ_k given the class expectations, i.e., weighted ML estimates (a weighted mean!): μ_k^(t+1) = Σ_i P(y = k | x_i, λ_t) x_i / Σ_i P(y = k | x_i, λ_t)
Overall: pick K random cluster centers μ_1, …, μ_K; for t = 1..T, repeat the E-step and M-step above (a code sketch follows)
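A minimal NumPy sketch of this means-only EM loop, assuming σ = 1 and uniform priors (consistent with the worked example that follows); function and variable names are my own:

```python
import numpy as np

def em_means_only(x, K, T=50, sigma=1.0, seed=0):
    """EM for a 1-D GMM where only the means are learned (fixed sigma, uniform priors)."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=K, replace=False).astype(float)     # pick K random cluster centers
    resp = np.full((len(x), K), 1.0 / K)
    for _ in range(T):
        # E-step: P(y=k | x_i) ∝ exp(-(x_i - mu_k)^2 / (2 sigma^2));
        # uniform priors cancel in the normalization.
        logits = -0.5 * ((x[:, None] - mu[None, :]) / sigma) ** 2   # (N, K)
        resp = np.exp(logits)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: weighted means
        mu = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)
    return mu, resp
```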
EM for GMMs
Data: x_1 = −1, x_2 = 0, x_3 = 2
Initialization, random means and σ = 1: μ_1 = −1, μ_2 = 0
E-step:
P(y=1 | x_1) ∝ exp(−0.5 × (−1 + 1)²) = 1
P(y=2 | x_1) ∝ exp(−0.5 × (−1 − 0)²) ≈ 0.6
  ⇒ P(y=1 | x_1) ≈ 0.63, P(y=2 | x_1) ≈ 0.37
P(y=1 | x_2) ∝ exp(−0.5 × (0 + 1)²) ≈ 0.6
P(y=2 | x_2) ∝ exp(−0.5 × (0 − 0)²) = 1
  ⇒ P(y=1 | x_2) ≈ 0.37, P(y=2 | x_2) ≈ 0.63
P(y=1 | x_3) ∝ exp(−0.5 × (2 + 1)²) ≈ 0.011
P(y=2 | x_3) ∝ exp(−0.5 × (2 − 0)²) ≈ 0.135
  ⇒ P(y=1 | x_3) ≈ 0.07, P(y=2 | x_3) ≈ 0.93
M-step (weighted means):
μ_1 = (0.63×(−1) + 0.37×0 + 0.07×2) / (0.63 + 0.37 + 0.07) ≈ −0.45
μ_2 = (0.37×(−1) + 0.63×0 + 0.93×2) / (0.37 + 0.63 + 0.93) ≈ 0.77
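A quick numeric sanity check of the E-step and M-step above, using the data points x = [−1, 0, 2] inferred from the slide's computations (exact arithmetic, so the results match the slide only up to rounding):

```python
import numpy as np

x = np.array([-1.0, 0.0, 2.0])
mu = np.array([-1.0, 0.0])          # initialization mu_1 = -1, mu_2 = 0, sigma = 1

# E-step: responsibilities P(y=k | x_i) ∝ exp(-0.5 * (x_i - mu_k)^2)
resp = np.exp(-0.5 * (x[:, None] - mu[None, :]) ** 2)
resp /= resp.sum(axis=1, keepdims=True)
print(np.round(resp, 2))            # ~[[0.62 0.38] [0.38 0.62] [0.08 0.92]]

# M-step: weighted means
mu_new = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)
print(np.round(mu_new, 2))          # ~[-0.44  0.76]  (matches the slide up to rounding)
```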
Learning continues; when do we stop? (e.g., when the parameters or the log-likelihood stop changing)
EM for general GMMs
Iterate: on the t-th iteration, let our estimates be λ_t = {π_1^(t), …, π_K^(t), μ_1^(t), …, μ_K^(t), Σ_1^(t), …, Σ_K^(t)}
E-step: compute the "expected" classes of all data points for each class: P(y = k | x_i, λ_t) ∝ π_k^(t) N(x_i; μ_k^(t), Σ_k^(t))
M-step: compute the weighted MLE of λ given the expected classes above:
  π_k^(t+1) = (1/N) Σ_i P(y = k | x_i, λ_t)
  μ_k^(t+1) = Σ_i P(y = k | x_i, λ_t) x_i / Σ_i P(y = k | x_i, λ_t)
  Σ_k^(t+1) = Σ_i P(y = k | x_i, λ_t) (x_i − μ_k^(t+1))(x_i − μ_k^(t+1))ᵀ / Σ_i P(y = k | x_i, λ_t)
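In practice, a library implementation of this EM loop is typically used; a brief sketch with scikit-learn's GaussianMixture (K = 2 and the placeholder data are arbitrary choices, not from the slides):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# X: (N, d) array of unlabeled data, e.g., the samples drawn in the earlier sketch.
X = np.random.default_rng(0).normal(size=(500, 2))

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(X)                      # runs EM: alternating E-steps and M-steps

resp = gmm.predict_proba(X)     # E-step quantities: soft assignments P(y=k | x_i)
print(gmm.weights_)             # learned priors pi_k
print(gmm.means_)               # learned means mu_k
print(gmm.covariances_)         # learned covariances Sigma_k
```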
What if we do hard assignments and learn the means only?
  Compute "expected" classes → keep only the most likely class (argmax)
  Re-compute each cluster mean
  Compute the most likely new μ_k → average over the hard assignments
With hard assignments and unit variance, EM is equivalent to the K-means clustering algorithm!!!
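To see the equivalence concretely, here is a hard-assignment variant of the earlier means-only sketch; it is exactly 1-D K-means (an illustrative modification, not code from the slides):

```python
import numpy as np

def hard_em_means_only(x, K, T=50, seed=0):
    """Means-only EM with hard assignments: this is exactly 1-D K-means."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=K, replace=False).astype(float)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(T):
        labels = np.argmin((x[:, None] - mu[None, :]) ** 2, axis=1)    # hard "E-step": argmax class
        mu = np.array([x[labels == k].mean() if np.any(labels == k) else mu[k]
                       for k in range(K)])                             # "M-step": per-cluster mean
    return mu, labels
```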
Comparison
Marginal likelihood: Σ_i log Σ_k P(x_i, y = k; θ) (what we actually want to maximize; labels unknown)
Joint likelihood: Σ_i log P(x_i, y_i; θ) (the supervised case; labels known, so it separates)
"Expected" joint likelihood: Σ_i Σ_k P(y = k | x_i; θ^(t)) log P(x_i, y = k; θ) (what the EM M-step maximizes)
Recap: high-level idea. Data: unlabeled points {x_1, …, x_N}
K-means:
  What to find? Cluster centers
  How to cluster? By distance to the nearest center
  Iterative algorithm: K-means
Probabilistic clustering:
  What to find? A probabilistic model: P(y) and P(x|y)
  How to cluster? By the posterior P(y|x)
  Iterative algorithm: EM (today)