Introduction to Machine Learning Maximum Likelihood Estimates:
(Spherical) Gaussians and GMMs
Prof. Kutty
MLE for single distributions
use this link for in-class exercises
https://forms.gle/jqAdK1sSMhcx6zDHA
Generative Models
• Goal here is to understand the internal structure of the process that generated the dataset
• We assume data are generated i.i.d. from an unknown distribution that has parameter(s) θ̄
For data that is:
• binary → Bernoulli
• continuous from R → univariate Gaussian
• continuous from R^d → spherical Gaussian
• more complex → mixtures of Gaussians
Maximum Likelihood Estimation (MLE)
MLE for Bernoulli
Given S_n = {x^(i)}_{i=1}^n where each x^(i) ∈ {0, 1}
• each x^(i) ~ Bern(x; θ)
  i.e., each x^(i) = 1 with probability θ and
  x^(i) = 0 with probability 1 − θ
  (identically distributed)
• ∀i ≠ j: p(x^(i), x^(j)) = Bern(x^(i); θ) Bern(x^(j); θ)
  (independently distributed)
e.g., p(x^(1) = 1, x^(2) = 0, x^(3) = 1, x^(4) = 1) = θ^3 (1 − θ)
Consequently
p(S_n) = ∏_{i=1}^n p(x^(i))
Goal: Determine θ_MLE; e.g., θ_MLE = (number of 1s) / n
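As a quick sanity check (a sketch, not part of the original slides; the dataset is made up for illustration), the Bernoulli MLE is just the fraction of 1s in the sample:

```python
import numpy as np

# hypothetical binary dataset
x = np.array([1, 0, 1, 1, 0, 1, 1, 0])

# Bernoulli MLE: the sample mean, i.e., the fraction of 1s
theta_mle = x.mean()
print(theta_mle)  # 0.625
```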
Maximum Likelihood Estimate: intuition
each datapoint was drawn from the same ‘bell curve’
we assume data are generated i.i.d. from an unknown Gaussian
distribution that has parameters μ, σ²
Use MLE to determine the likeliest parameter values, given the dataset
N(x | μ, σ²) = (1 / √(2πσ²)) exp(−(x − μ)² / (2σ²))
generative story with i.i.d. assumption for the univariate Gaussian
Given S_n = {x^(i)}_{i=1}^n, each x^(i) ∈ R drawn i.i.d.
• each x^(i) ~ N(x | μ, σ²) (identically distributed)
• ∀i ≠ j: p(x^(i), x^(j)) = N(x^(i) | μ, σ²) N(x^(j) | μ, σ²)
  (independently distributed)
Consequently,
p(S_n) = ∏_{i=1}^n N(x^(i) | μ, σ²)
Goal: Determine μ, σ²
• Want to maximize p(S_n) wrt μ
• Want to maximize p(S_n) wrt σ²
MLE for the univariate Gaussian
• Given S_n = {x^(i)}_{i=1}^n drawn i.i.d.
p(S_n) = ∏_{i=1}^n N(x^(i) | μ, σ²)
where N(x | μ, σ²) = (1 / √(2πσ²)) exp(−(x − μ)² / (2σ²))
• Maximizing p(S_n) wrt μ gives μ_MLE = (1/n) ∑_{i=1}^n x^(i)
• Maximizing p(S_n) wrt σ² gives σ²_MLE = (1/n) ∑_{i=1}^n (x^(i) − μ_MLE)²
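To make the closed forms concrete, here is a small sketch (dataset made up for illustration) computing both MLEs directly; note the variance divides by n, not n − 1:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # hypothetical sample

mu_mle = x.mean()                     # (1/n) * sum of x^(i)
var_mle = ((x - mu_mle) ** 2).mean()  # (1/n) * sum of (x^(i) - mu_MLE)^2
print(mu_mle, var_mle)  # 5.0 4.0
```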
Multivariate Gaussian Distribution
for x̄ ∈ R^d, d ≥ 2
Multivariate Gaussian (normal) Distribution
μ̄: d × 1 mean vector
Σ: d × d covariance matrix
N(x̄ | μ̄, Σ) = (1 / ((2π)^{d/2} |Σ|^{1/2})) exp[−(1/2)(x̄ − μ̄)^T Σ^{−1} (x̄ − μ̄)]
e.g., for d = 2
μ̄ = E[x̄] = (μ₁, μ₂)^T
Σ = E[(x̄ − μ̄)(x̄ − μ̄)^T]
Σ_ij measures the covariance between x_i and x_j
What does the pdf look like?
e.g., for d=2
Contour Plots
μ̄: d × 1 mean vector
Σ: d × d covariance matrix
N(x̄ | μ̄, Σ) = (1 / ((2π)^{d/2} |Σ|^{1/2})) exp[−(1/2)(x̄ − μ̄)^T Σ^{−1} (x̄ − μ̄)]
e.g., for d=2
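The density can be evaluated directly from the formula; the sketch below (the function name `mvn_pdf` is my own, made up for illustration) implements it with NumPy:

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Evaluate the multivariate Gaussian density N(x | mu, Sigma)."""
    d = len(mu)
    diff = x - mu
    # normalizing constant (2*pi)^{d/2} |Sigma|^{1/2}
    norm_const = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    # quadratic form (x - mu)^T Sigma^{-1} (x - mu)
    quad = diff @ np.linalg.inv(Sigma) @ diff
    return np.exp(-0.5 * quad) / norm_const

# at the mean of a standard 2-D Gaussian, the density is 1/(2*pi)
mu = np.array([0.0, 0.0])
print(mvn_pdf(mu, mu, np.eye(2)))
```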
Spherical Gaussian Distribution
Maximum Likelihood Estimate
We assume data are generated i.i.d. from an unknown spherical Gaussian distribution that has parameters μ̄, σ²
Use MLE to determine the likeliest parameter values, given the dataset
Spherical Gaussian: a multivariate Gaussian with
Σ = σ² I_d
so the covariance matrix has one free parameter, σ²
Likelihood of the Spherical Gaussian
N(x̄ | μ̄, σ²) = (1 / (2πσ²)^{d/2}) exp(−||x̄ − μ̄||² / (2σ²))
• Given S_n = {x̄^(i)}_{i=1}^n drawn i.i.d. according to N(x̄ | μ̄, σ²)
Want to maximize p(S_n) wrt parameters θ̄ = (μ̄, σ²)
p(S_n) = ∏_{i=1}^n p(x̄^(i)) = ∏_{i=1}^n (1 / (2πσ²)^{d/2}) exp(−(1/(2σ²)) ||x̄^(i) − μ̄||²)
Instead of maximizing p(S_n) wrt θ̄, we maximize ln p(S_n) wrt θ̄:
since ln is a monotonically increasing function, both have the same maximizer.
Log Likelihood of the Spherical Gaussian
N(x̄ | μ̄, σ²) = (1 / (2πσ²)^{d/2}) exp(−||x̄ − μ̄||² / (2σ²))
ℓ(S_n; μ̄, σ²) = ln p(S_n) = ln ∏_{i=1}^n p(x̄^(i))
 = ln ∏_{i=1}^n (1 / (2πσ²)^{d/2}) exp(−(1/(2σ²)) ||x̄^(i) − μ̄||²)
Recall: ln(AB) = ln A + ln B, ln(A^c) = c ln A, ln(1/A) = −ln A
 = ∑_{i=1}^n [ ln(1 / (2πσ²)^{d/2}) + ln exp(−(1/(2σ²)) ||x̄^(i) − μ̄||²) ]
 = ∑_{i=1}^n [ −(d/2) ln(2πσ²) − (1/(2σ²)) ||x̄^(i) − μ̄||² ]
Spherical Gaussian: MLE of the mean μ̄
Data S_n = {x̄^(i)}_{i=1}^n drawn i.i.d. Log likelihood:
ℓ(S_n; μ̄, σ²) = ∑_{i=1}^n [ −(d/2) ln(2πσ²) − (1/(2σ²)) ||x̄^(i) − μ̄||² ]
∂ℓ(S_n; μ̄, σ²)/∂μ̄ = ∑_{i=1}^n [ −(d/2) ∂ln(2πσ²)/∂μ̄ − (1/(2σ²)) ∂||x̄^(i) − μ̄||²/∂μ̄ ]
 = −(1/(2σ²)) ∑_{i=1}^n ∂||x̄^(i) − μ̄||²/∂μ̄
 = −(1/(2σ²)) ∑_{i=1}^n 2(x̄^(i) − μ̄)(−1) = (1/σ²) ∑_{i=1}^n (x̄^(i) − μ̄)
Set ∂ℓ(S_n; μ̄, σ²)/∂μ̄ = (1/σ²) ∑_{i=1}^n (x̄^(i) − μ̄) = 0 and solve for μ̄:
μ̄_MLE = (1/n) ∑_{i=1}^n x̄^(i)
Spherical Gaussian: MLE of the variance σ²
Data S_n = {x̄^(i)}_{i=1}^n drawn i.i.d. Log likelihood (let β = σ²):
ℓ(S_n; μ̄, β) = ∑_{i=1}^n [ −(d/2) ln(2πβ) − (1/(2β)) ||x̄^(i) − μ̄||² ]
∂ℓ(S_n; μ̄, β)/∂β = ∑_{i=1}^n [ −d/(2β) + (1/(2β²)) ||x̄^(i) − μ̄||² ]
Set ∂ℓ(S_n; μ̄, β)/∂β = 0 and solve for β:
σ²_MLE = (1/(nd)) ∑_{i=1}^n ||x̄^(i) − μ̄_MLE||²
MLE for the spherical Gaussian
• Given S_n = {x̄^(i)}_{i=1}^n drawn i.i.d.
p(S_n) = ∏_{i=1}^n p(x̄^(i))
• Maximizing p(S_n) wrt μ̄ gives μ̄_MLE = (1/n) ∑_{i=1}^n x̄^(i)
• Maximizing p(S_n) wrt σ² gives σ²_MLE = (1/(nd)) ∑_{i=1}^n ||x̄^(i) − μ̄_MLE||²
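As an illustrative check (not from the slides; the dataset is simulated), one can sample from a spherical Gaussian and verify that these formulas recover the true parameters. The divisor n·d in the variance estimate arises because the single σ² is shared across all d coordinates:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10_000, 3
true_mu, true_var = np.array([1.0, -2.0, 0.5]), 0.25

# spherical Gaussian: covariance = true_var * I_d, so coordinates are independent
X = rng.normal(true_mu, np.sqrt(true_var), size=(n, d))

mu_mle = X.mean(axis=0)                        # (1/n) sum of x^(i)
var_mle = ((X - mu_mle) ** 2).sum() / (n * d)  # (1/(n d)) sum of ||x^(i) - mu_MLE||^2
print(mu_mle, var_mle)
```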
try it yourself
>>> import numpy as np
>>> mu, sigma = 0, 0.1  # mean and standard deviation
>>> s = np.random.normal(mu, sigma, 1000)
What is the MLE of the mean of the data you generated?
What (if any) changes do you observe for different settings of sigma?
use this link for in-class exercises
https://forms.gle/jqAdK1sSMhcx6zDHA
Mixture Distributions
Mixture Models
Why Mixture of Distributions?
A single Gaussian is not a good fit for this dataset
Mixture of Distributions
In this model, each datapoint x̄^(i) is assumed to be generated from a mixture of k distributions.
MLE of a single Gaussian: intuition
MLE of GMM with known labels: intuition
parameter types:
• mixing coefficients of the k Gaussians
• means of the k Gaussians
• variances of the k Gaussians
can also determine relative chance of each Gaussian
MLE of GMM with known labels: Example
2.1, 0, 3.5, −1, 1.5, 2.5, −0.5, 0.05, 1, −2, 0, 1, −2, 1.1, −0.5, −0.03
MLE for GMMs with known labels
Define the indicator function
δ(j|i) = 1 if x̄^(i) belongs to cluster j,
         0 otherwise
Log likelihood objective:
∑_{i=1}^n ∑_{j=1}^k δ(j|i) [ ln γ_j + ln N(x̄^(i) | μ̄^(j), σ²_j) ]
MLE solution (given "cluster labels"):
n̂_j = ∑_{i=1}^n δ(j|i)                        number of points assigned to cluster j
γ̂_j = n̂_j / n                                 fraction of points assigned to cluster j
μ̄^(j) = (1/n̂_j) ∑_{i=1}^n δ(j|i) x̄^(i)        mean of points in cluster j
σ²_j = (1/(n̂_j d)) ∑_{i=1}^n δ(j|i) ||x̄^(i) − μ̄^(j)||²    spread in cluster j
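A sketch with a tiny made-up labeled dataset (points and labels are my own, for illustration), applying the four formulas above cluster by cluster:

```python
import numpy as np

# hypothetical 1-D points with known cluster labels z (encoding delta(j|i))
X = np.array([[2.1], [0.0], [3.5], [-1.0], [1.5], [2.5]])
z = np.array([0, 1, 0, 1, 0, 0])
k = 2
n, d = X.shape

for j in range(k):
    mask = (z == j)
    n_j = mask.sum()                                   # points assigned to cluster j
    gamma_j = n_j / n                                  # mixing coefficient
    mu_j = X[mask].mean(axis=0)                        # mean of points in cluster j
    var_j = ((X[mask] - mu_j) ** 2).sum() / (n_j * d)  # spherical variance of cluster j
    print(j, gamma_j, mu_j, var_j)
```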
MLE for GMMs with known labels
In general, δ(j|i) is unknown!
Expectation Maximization for GMMs
MLE of GMM with unknown labels: intuition
initial guess
do soft clustering
MLE of GMM with unknown labels: soft clustering intuition
MLE of GMM with unknown labels: recompute parameters intuition
…and repeat
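The loop above (initial guess → soft clustering → recompute parameters → repeat) can be sketched for a 1-D GMM as follows. This is a minimal illustration, not production code: the function name and the quantile-based initialization are my own choices, and there is no convergence check or numerical safeguard.

```python
import numpy as np

def em_gmm(x, k, n_iter=50):
    """Minimal EM sketch for a 1-D Gaussian mixture with k components."""
    n = len(x)
    gamma = np.full(k, 1.0 / k)                    # mixing coefficients
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))  # spread-out initial guesses for means
    var = np.full(k, x.var())                      # initial variances

    for _ in range(n_iter):
        # E-step: soft assignments p(j | x^(i)) — the "responsibilities"
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = gamma * dens
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: known-labels formulas, with soft counts replacing delta(j|i)
        n_j = resp.sum(axis=0)
        gamma = n_j / n
        mu = (resp * x[:, None]).sum(axis=0) / n_j
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / n_j

    return gamma, mu, var
```

Each E-step performs the soft clustering; each M-step reuses the known-labels MLE formulas with the responsibilities in place of the indicator δ(j|i).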