
Data Mining and Machine Learning
Statistical Modelling (2)
Peter Jančovič
Slide 1

Objectives
 In part 1 of this topic we:
– Reviewed the univariate Gaussian PDF
– Introduced the multivariate Gaussian PDF
– Introduced maximum likelihood (ML) estimation of Gaussian PDF parameters
 In this part, we will:
– Introduce Gaussian Mixture Models (GMMs)
– Introduce ML estimation of GMM parameters
Slide 2

Fitting a Gaussian PDF to Data
 Suppose y = y1,…,yt,…,yT is a set of T data values
 For a Gaussian PDF p with mean μ and variance σ², define:

  $p(y \mid \mu, \sigma) = \prod_{t=1}^{T} p(y_t \mid \mu, \sigma)$

 The 'best fitting' Gaussian maximises p(y | μ, σ)
 Maximising p(y | μ, σ) with respect to μ, σ is called Maximum Likelihood (ML) estimation of μ, σ. The resulting estimates are:

  $\mu = \frac{1}{T}\sum_{t=1}^{T} y_t, \qquad \sigma^2 = \frac{1}{T}\sum_{t=1}^{T} (y_t - \mu)^2$

Slide 3
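As a concrete illustration of these ML estimates, here is a minimal NumPy sketch; the function name and the sample data are mine, for illustration only:

```python
import numpy as np

def fit_gaussian_ml(y):
    """ML estimates of the mean and variance of a univariate Gaussian."""
    y = np.asarray(y, dtype=float)
    mu = y.mean()                 # mu = (1/T) * sum_t y_t
    var = ((y - mu) ** 2).mean()  # sigma^2 = (1/T) * sum_t (y_t - mu)^2 (ML form, divides by T)
    return mu, var

mu, var = fit_gaussian_ml([-3.0, 2.0, 7.0, 1.5, 0.2])
print(mu, var)
```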

Multi-modal distributions
 In practice the distributions of many naturally occurring phenomena do not follow the simple bell-shaped Gaussian curve
 For example, if the data arises from several different sources, there may be several distinct peaks (e.g. the distribution of heights of adults)
 These peaks are the modes of the distribution, and such a distribution is called multi-modal
Slide 4

Gaussian Mixture PDFs
 Gaussian Mixture PDFs, or Gaussian Mixture Models (GMMs), are used to model multi-modal and other non-Gaussian distributions.
 A GMM is just a weighted average of several Gaussian PDFs, called the component PDFs
 For example, if p1 and p2 are Gaussian PDFs, then p(y) = w1p1(y) + w2p2(y)
defines a 2-component Gaussian mixture PDF
Slide 5

Gaussian Mixture – Example
 2-component mixture model
– Component 1: μ = 0, σ = 0.1
– Component 2: μ = 2, σ = 1
– w1 = w2 = 0.5
[Plot: the component PDFs N(0, 0.1) and N(2, 1) and the resulting mixture PDF]
Slide 6

Example 2
 2 component mixture model – Component1:=0,=0.1
– Component2:=2,=1 – w1 = 0.2 w2=0.8
1.4 1.2 1 0.8 0.6 0.4 0.2 0
-4 -2-0.20 2 4 6
N(0,0.1) N(2,1) Mixture
Slide 7
Data Mining and Machine Learning

Example 3
 2-component mixture model
– Component 1: μ = 0, σ = 0.1
– Component 2: μ = 2, σ = 1
– w1 = 0.2, w2 = 0.8
[Plot: the component PDFs N(0, 0.1) and N(2, 1) and the resulting mixture PDF]
Slide 8

Example 4
 5-component Gaussian mixture PDF
[Plot: the component PDFs N(0, 0.1), N(2, 1), N(3, 0.2), N(3, 0.2), N(3, 0.2) and the resulting mixture PDF]
Slide 9

Gaussian Mixture Model
 In general, an M-component Gaussian mixture PDF is defined by:

  $p(y) = \sum_{m=1}^{M} w_m\, p_m(y)$

  where each pm is a Gaussian PDF and $0 \le w_m \le 1$, $\sum_{m=1}^{M} w_m = 1$

Slide 10
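To make the definition concrete, here is a minimal NumPy sketch that evaluates a one-dimensional GMM density. The parameters come from the 2-component example above, treating the second argument of N(·,·) as a standard deviation; the function names are mine:

```python
import numpy as np

def gaussian_pdf(y, mu, sigma):
    """Univariate Gaussian density N(y; mu, sigma^2)."""
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def gmm_pdf(y, weights, means, sigmas):
    """Mixture density p(y) = sum_m w_m * p_m(y)."""
    return sum(w * gaussian_pdf(y, mu, s)
               for w, mu, s in zip(weights, means, sigmas))

# 2-component example: N(0, 0.1) and N(2, 1) with w1 = w2 = 0.5
print(gmm_pdf(np.array([-0.1, 0.0, 2.0]), [0.5, 0.5], [0.0, 2.0], [0.1, 1.0]))
```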

Relationship with Clustering
 Both model data using a set of centroids / means
 In clustering there is no parameter that specifies the ‘spread’ of a cluster. In a GMM component this is done by the covariance matrix.
 In clustering we assign a sample to the closest centroid. In a GMM a sample is assigned to all components with varying probability.
Slide 11

Estimating the parameters of a Gaussian mixture model
 A Gaussian Mixture Model with M components has
– M means: μ1,…,μM
– M variances: σ1,…,σM
– M mixture weights: w1,…,wM
 Given y = y1,…,yT, how do we estimate these parameters?
 I.e. how do we find a maximum likelihood estimate of μ1,…,μM, σ1,…,σM, w1,…,wM?
Slide 12

Parameter Estimation
 If we knew which component each sample yt came from, then parameter estimation would be easy (see the sketch after this slide):
– Set μm to be the average value of the samples which belong to the mth component
– Set σm to be the variance of the samples which belong to the mth component
– Set wm to be the proportion of samples which belong to the mth component
 But we don’t know which component each sample belongs to
Slide 13
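A minimal sketch of that 'easy' case, assuming a hypothetical labels array records which component each sample belongs to (names are mine):

```python
import numpy as np

def estimate_with_known_assignments(y, labels, M):
    """ML estimates of GMM parameters when each sample's component is known."""
    y, labels = np.asarray(y, dtype=float), np.asarray(labels)
    means, variances, weights = [], [], []
    for m in range(M):
        ym = y[labels == m]                               # samples belonging to component m
        means.append(ym.mean())                           # mu_m: average of those samples
        variances.append(((ym - ym.mean()) ** 2).mean())  # sigma_m^2: their variance
        weights.append(len(ym) / len(y))                  # w_m: proportion of samples in component m
    return means, variances, weights
```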

The E-M Algorithm
 Step 1: Choose the number of GMM components, M, and the initial GMM parameters:

  $\mu_1^{(0)},\ldots,\mu_M^{(0)},\ \sigma_1^{(0)},\ldots,\sigma_M^{(0)},\ w_1^{(0)},\ldots,w_M^{(0)}$

Slide 14

The E-M Algorithm
 Step 2: For each sample yt and each GMM component m, calculate P(m | yt) using Bayes' theorem and the current set of parameters (see next slide)
 Step 3: Define the new estimates of the GMM parameters, $\mu_m^{(1)}$ and $\sigma_m^{(1)}$, as:

  $\mu_m^{(1)} = \frac{1}{P_m} \sum_{t=1}^{T} P(m \mid y_t)\, y_t$

  $\sigma_m^{(1)\,2} = \frac{1}{P_m} \sum_{t=1}^{T} P(m \mid y_t)\, (y_t - \mu_m^{(1)})^2$

  where $P_m = \sum_{t=1}^{T} P(m \mid y_t)$

REPEAT (Steps 2 and 3)
Slide 15

E-M continued
 From Bayes' theorem:

  $P(m \mid y_t) = \frac{p(y_t \mid m)\, P(m)}{p(y_t)} = \frac{w_m\, p_m(y_t)}{\sum_{k=1}^{M} w_k\, p_k(y_t)}$

  where pm(yt) is calculated from the mth Gaussian component, wm is the mth weight, and the denominator sums over all components
 P(m | yt) is a measure of how much yt 'belongs to' the mth component
Slide 16
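Putting Steps 2 and 3 together, here is a minimal one-dimensional E-M iteration sketch following the formulas above. The names are mine, and the weight update w_m = P_m / T is an assumption based on the 'proportion of samples' idea from the parameter estimation slide rather than an explicit formula on these slides:

```python
import numpy as np

def gaussian_pdf(y, mu, sigma):
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def em_step(y, weights, means, sigmas):
    """One E-M iteration (Steps 2 and 3) for a univariate GMM."""
    y = np.asarray(y, dtype=float)

    # Step 2: responsibilities P(m | y_t) = w_m p_m(y_t) / sum_k w_k p_k(y_t)
    comp = np.array([w * gaussian_pdf(y, mu, s)
                     for w, mu, s in zip(weights, means, sigmas)])  # shape (M, T)
    resp = comp / comp.sum(axis=0, keepdims=True)

    # Step 3: re-estimate the parameters from the responsibilities
    P = resp.sum(axis=1)                                   # P_m = sum_t P(m | y_t)
    new_means = (resp @ y) / P
    new_sigmas = np.sqrt((resp * (y - new_means[:, None]) ** 2).sum(axis=1) / P)
    new_weights = P / len(y)                               # assumed: proportion 'belonging to' m
    return new_weights, new_means, new_sigmas
```

Calling em_step repeatedly, feeding each output back in until the parameters stop changing, implements the REPEAT loop of Steps 2 and 3.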

Example
g2 g1x0 g2x0.054
g1
P1|x 00.3 0 00.30.0540.7
P 2 | x   1
Slide 17
x = -3
Data Mining and Machine Learning

Example
g2 g1x0.176 g2x0
g1
P1| x 0.1760.3 1 0.1760.300.7
P2|x0
Slide 18
x= 7
Data Mining and Machine Learning

Example
g1x0.027 g2x0.004
P1| x 0.0270.3  0.723 0.027  0.3  0.004  0.7
P2|x 0.0040.7 0.277 0.027  0.3  0.004  0.7
g2
g1
Slide 19
x =2
Data Mining and Machine Learning

Example (continued)
 So, given these initial estimates of g1 and g2, and data points X = {x1, x2, x3} = {-3, 2, 7}, the new values of μ1 and μ2 are:
m 0x 0.723x 1x 0(3)0.7232174.9
μ1 1
123
00.7231 1.723
m 1x 0.277x 0x 1(3)0.277207 1.92
μ2 2
123
1 0.277  0 1.277
Slide 20
Data Mining and Machine Learning
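A quick numerical check of this update, using the responsibilities from the three example slides above:

```python
x  = [-3.0, 2.0, 7.0]
r1 = [0.0, 0.723, 1.0]    # P(1 | x_t) from the example slides
r2 = [1.0, 0.277, 0.0]    # P(2 | x_t)

mu1 = sum(r * xt for r, xt in zip(r1, x)) / sum(r1)   # approx  4.9
mu2 = sum(r * xt for r, xt in zip(r2, x)) / sum(r2)   # approx -1.92
print(mu1, mu2)
```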

Example
[Plot: components g1 and g2, showing the original means, the data points, and the new means]
Slide 21

E-M and k-means clustering
 Consider:
– Estimating GMM component means in E-M, and
– Estimating centroids in k-means clustering
 Notation
– GMM component means μ1,…,μM
– Cluster centroids c1,…,cM
 Given a sample y
– E-M: Calculate P(m | y) for each GMM component m
– K-means: Calculate d(cm, y) for each centroid cm
 Re-estimation
– E-M: For each m, allocate P(m | yt)·yt to the re-estimation of μm
– K-means: Allocate all of yt to the closest centroid (min{d(cm, yt)})
Slide 22

E-M and k-means clustering
 In some implementations of E-M, y is used only to re-estimate the mean μm for the most probable GMM component m (i.e. max{P(m|y)})
 If the GMM component variances are all equal, and all of the component weights wm are equal, then the following are equivalent (see the sketch after this slide):
– m = argmin{d(y, cm)} (cm is the closest centroid to y)
– m = argmax{P(m|y)} (i.e. m is the most probable GMM component)
Slide 23
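A minimal sketch of this equivalence, assuming equal variances and equal weights; the values and names are mine, for illustration:

```python
import numpy as np

def gaussian_pdf(y, mu, sigma):
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

means = np.array([0.0, 2.0, 5.0])    # component means, also used as the k-means centroids
sigma = 1.0                          # equal variance for every component
w = 1.0 / len(means)                 # equal weights

y = 1.3
post = np.array([w * gaussian_pdf(y, mu, sigma) for mu in means])
post /= post.sum()                   # P(m | y)

m_em = int(np.argmax(post))                  # most probable GMM component
m_km = int(np.argmin(np.abs(means - y)))     # closest centroid
print(m_em, m_km)                            # the two indices coincide
```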

The E-M algorithm
p(y | )
local optimum
(0)… (i)
Parameter set 
Slide 24
Data Mining and Machine Learning

Example – initial model
[Plot: the initial model with components m1 and m2, and a sample y6 with responsibilities P(m1|y6) = λ1 and P(m2|y6) = λ2]
Slide 25

Example – after 1st iteration of E-M
Slide 26

Example – after 2nd iteration of E-M
Slide 27

Example – after 4th iteration of E-M
Slide 28

Example – after 10th iteration of E-M
Slide 29

Summary
 Gaussian mixture PDFs (GMMs)
 Maximum likelihood (ML) parameter estimation – the E-M algorithm
 Comparison of E-M for GMMs with k-means clustering
Slide 30