Introduction to Machine Learning Maximum Likelihood Estimates:
(Spherical) Gaussians and GMMs
Prof. Kutty
MLE for single distributions
use this link for in-class exercises
https://forms.gle/jqAdK1sSMhcx6zDHA
Generative Models
• Goal here is to understand the internal structure of the process that generated the dataset
• We assume data are generated i.i.d. from an unknown distribution that has parameter(s) θ̄
For data that is:
• binary → Bernoulli
• continuous from R → univariate Gaussian
• continuous from R^d → spherical Gaussian
• more complex → mixtures of Gaussians
Maximum Likelihood Estimation (MLE)
MLE for Bernoulli
Given S_n = {x^(i)}_{i=1}^n where each x^(i) ∈ {0, 1}
• each x^(i) ~ Bern(x; θ)
  i.e., each x^(i) = 1 with probability θ and
  x^(i) = 0 with probability 1 − θ
  (identically distributed)
• ∀i ≠ j: p(x^(i), x^(j)) = Bern(x^(i); θ) Bern(x^(j); θ)
  (independently distributed)
e.g., p(x^(1) = 1, x^(2) = 0, x^(3) = 1, x^(4) = 1) = θ^3 (1 − θ)
Consequently
p(S_n) = ∏_{i=1}^n p(x^(i))
Goal: Determine θ_MLE; e.g., θ_MLE = (number of 1s) / n
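As a quick sanity check (a sketch, not part of the original slides; the dataset is made up for illustration), the Bernoulli MLE is just the fraction of 1s in the sample:

```python
import numpy as np

# hypothetical binary dataset
x = np.array([1, 0, 1, 1, 0, 1, 1, 0])

# Bernoulli MLE: the sample mean, i.e., the fraction of 1s
theta_mle = x.mean()
print(theta_mle)  # 0.625
```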
Maximum Likelihood Estimate: intuition
each datapoint was drawn from the same ‘bell curve’
we assume data are generated i.i.d. from an unknown Gaussian
distribution that has parameters μ, σ²
Use MLE to determine the likeliest parameter values, given the dataset
N(x | μ, σ²) = (1 / √(2πσ²)) exp(−(x − μ)² / (2σ²))
generative story with i.i.d. assumption for the univariate Gaussian
Given S_n = {x^(i)}_{i=1}^n, each x^(i) ∈ R drawn i.i.d.
• each x^(i) ~ N(x | μ, σ²) (identically distributed)
• ∀i ≠ j: p(x^(i), x^(j)) = N(x^(i) | μ, σ²) N(x^(j) | μ, σ²)
  (independently distributed)
Consequently,
p(S_n) = ∏_{i=1}^n N(x^(i) | μ, σ²)
Goal: Determine μ, σ²
• Want to maximize p(S_n) wrt μ
• Want to maximize p(S_n) wrt σ²
MLE for the univariate Gaussian
• Given S_n = {x^(i)}_{i=1}^n drawn i.i.d.
p(S_n) = ∏_{i=1}^n N(x^(i) | μ, σ²)
where N(x | μ, σ²) = (1 / √(2πσ²)) exp(−(x − μ)² / (2σ²))
• Maximizing p(S_n) wrt μ gives μ_MLE = (1/n) ∑_{i=1}^n x^(i)
• Maximizing p(S_n) wrt σ² gives σ²_MLE = (1/n) ∑_{i=1}^n (x^(i) − μ_MLE)²
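To make the closed forms concrete, here is a small sketch (dataset made up for illustration) computing both MLEs directly; note the variance divides by n, not n − 1:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # hypothetical sample

mu_mle = x.mean()                     # (1/n) * sum of x^(i)
var_mle = ((x - mu_mle) ** 2).mean()  # (1/n) * sum of (x^(i) - mu_MLE)^2
print(mu_mle, var_mle)  # 5.0 4.0
```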
Multivariate Gaussian Distribution
for x̄ ∈ R^d, d ≥ 2
Multivariate Gaussian (normal) Distribution
μ̄: d × 1 mean vector
Σ: d × d covariance matrix
N(x̄ | μ̄, Σ) = (1 / ((2π)^{d/2} |Σ|^{1/2})) exp[−(1/2)(x̄ − μ̄)^T Σ^{−1} (x̄ − μ̄)]
e.g., for d = 2
μ̄ = E[x̄] = (μ₁, μ₂)^T
Σ = E[(x̄ − μ̄)(x̄ − μ̄)^T]
Σ_ij measures the covariance between x_i and x_j
What does the pdf look like?
e.g., for d=2
Contour Plots
μ̄: d × 1 mean vector
Σ: d × d covariance matrix
N(x̄ | μ̄, Σ) = (1 / ((2π)^{d/2} |Σ|^{1/2})) exp[−(1/2)(x̄ − μ̄)^T Σ^{−1} (x̄ − μ̄)]
e.g., for d=2
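The density can be evaluated directly from the formula; the sketch below (the function name `mvn_pdf` is my own, made up for illustration) implements it with NumPy:

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Evaluate the multivariate Gaussian density N(x | mu, Sigma)."""
    d = len(mu)
    diff = x - mu
    # normalizing constant (2*pi)^{d/2} |Sigma|^{1/2}
    norm_const = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    # quadratic form (x - mu)^T Sigma^{-1} (x - mu)
    quad = diff @ np.linalg.inv(Sigma) @ diff
    return np.exp(-0.5 * quad) / norm_const

# at the mean of a standard 2-D Gaussian, the density is 1/(2*pi)
mu = np.array([0.0, 0.0])
print(mvn_pdf(mu, mu, np.eye(2)))
```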
Spherical Gaussian Distribution
Maximum Likelihood Estimate
We assume data are generated i.i.d. from an unknown spherical Gaussian distribution that has parameters μ̄, σ²
Use MLE to determine the likeliest parameter values, given the dataset
Spherical Gaussian: a multivariate Gaussian with
Σ = σ² I_d
so the covariance matrix has one free parameter, σ²
Likelihood of the Spherical Gaussian
N(x̄ | μ̄, σ²) = (1 / (2πσ²)^{d/2}) exp(−||x̄ − μ̄||² / (2σ²))
• Given S_n = {x̄^(i)}_{i=1}^n drawn i.i.d. according to N(x̄ | μ̄, σ²)
Want to maximize p(S_n) wrt parameters θ̄ = (μ̄, σ²)
p(S_n) = ∏_{i=1}^n p(x̄^(i)) = ∏_{i=1}^n (1 / (2πσ²)^{d/2}) exp(−(1/(2σ²)) ||x̄^(i) − μ̄||²)
Instead of maximizing p(S_n) wrt θ̄, we maximize ln p(S_n) wrt θ̄:
since ln is a monotonically increasing function, both have the same maximizer.
Log Likelihood of the Spherical Gaussian
N(x̄ | μ̄, σ²) = (1 / (2πσ²)^{d/2}) exp(−||x̄ − μ̄||² / (2σ²))
ℓ(S_n; μ̄, σ²) = ln p(S_n) = ln ∏_{i=1}^n p(x̄^(i))
 = ln ∏_{i=1}^n (1 / (2πσ²)^{d/2}) exp(−(1/(2σ²)) ||x̄^(i) − μ̄||²)
Recall: ln(AB) = ln A + ln B, ln(A^c) = c ln A, ln(1/A) = −ln A
 = ∑_{i=1}^n [ ln(1 / (2πσ²)^{d/2}) + ln exp(−(1/(2σ²)) ||x̄^(i) − μ̄||²) ]
 = ∑_{i=1}^n [ −(d/2) ln(2πσ²) − (1/(2σ²)) ||x̄^(i) − μ̄||² ]
Spherical Gaussian: MLE of the mean μ̄
Data S_n = {x̄^(i)}_{i=1}^n drawn i.i.d. Log likelihood:
ℓ(S_n; μ̄, σ²) = ∑_{i=1}^n [ −(d/2) ln(2πσ²) − (1/(2σ²)) ||x̄^(i) − μ̄||² ]
∂ℓ(S_n; μ̄, σ²)/∂μ̄ = ∑_{i=1}^n [ −(d/2) ∂ln(2πσ²)/∂μ̄ − (1/(2σ²)) ∂||x̄^(i) − μ̄||²/∂μ̄ ]
 = −(1/(2σ²)) ∑_{i=1}^n ∂||x̄^(i) − μ̄||²/∂μ̄
 = −(1/(2σ²)) ∑_{i=1}^n 2(x̄^(i) − μ̄)(−1) = (1/σ²) ∑_{i=1}^n (x̄^(i) − μ̄)
Set ∂ℓ(S_n; μ̄, σ²)/∂μ̄ = (1/σ²) ∑_{i=1}^n (x̄^(i) − μ̄) = 0 and solve for μ̄:
μ̄_MLE = (1/n) ∑_{i=1}^n x̄^(i)
Spherical Gaussian: MLE of the variance σ²
Data S_n = {x̄^(i)}_{i=1}^n drawn i.i.d. Log likelihood (let β = σ²):
ℓ(S_n; μ̄, β) = ∑_{i=1}^n [ −(d/2) ln(2πβ) − (1/(2β)) ||x̄^(i) − μ̄||² ]
∂ℓ(S_n; μ̄, β)/∂β = ∑_{i=1}^n [ −d/(2β) + (1/(2β²)) ||x̄^(i) − μ̄||² ]
Set ∂ℓ(S_n; μ̄, β)/∂β = 0 and solve for β:
σ²_MLE = (1/(nd)) ∑_{i=1}^n ||x̄^(i) − μ̄_MLE||²
MLE for the spherical Gaussian
• Given S_n = {x̄^(i)}_{i=1}^n drawn i.i.d.
p(S_n) = ∏_{i=1}^n p(x̄^(i))
• Maximizing p(S_n) wrt μ̄ gives μ̄_MLE = (1/n) ∑_{i=1}^n x̄^(i)
• Maximizing p(S_n) wrt σ² gives σ²_MLE = (1/(nd)) ∑_{i=1}^n ||x̄^(i) − μ̄_MLE||²
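As an illustrative check (not from the slides; the dataset is simulated), one can sample from a spherical Gaussian and verify that these formulas recover the true parameters. The divisor n·d in the variance estimate arises because the single σ² is shared across all d coordinates:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10_000, 3
true_mu, true_var = np.array([1.0, -2.0, 0.5]), 0.25

# spherical Gaussian: covariance = true_var * I_d, so coordinates are independent
X = rng.normal(true_mu, np.sqrt(true_var), size=(n, d))

mu_mle = X.mean(axis=0)                        # (1/n) sum of x^(i)
var_mle = ((X - mu_mle) ** 2).sum() / (n * d)  # (1/(n d)) sum of ||x^(i) - mu_MLE||^2
print(mu_mle, var_mle)
```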
try it yourself
>>> import numpy as np
>>> mu, sigma = 0, 0.1  # mean and standard deviation
>>> s = np.random.normal(mu, sigma, 1000)
What is the MLE of the mean of the data you generated?
What (if any) changes do you observe for different settings of sigma?
use this link for in-class exercises
https://forms.gle/jqAdK1sSMhcx6zDHA
Mixture Distributions
Mixture Models
Why Mixture of Distributions?
A single Gaussian is not a good fit for this dataset
Mixture of Distributions
In this model, each datapoint x̄^(i) is assumed to be generated from a mixture of k distributions.
MLE of a single Gaussian: intuition
MLE of GMM with known labels: intuition
parameter types:
• mixing coefficients of the k Gaussians
• means of the k Gaussians
• variances of the k Gaussians
can also determine relative chance of each Gaussian
MLE of GMM with known labels: Example
2.1, 0, 3.5, −1, 1.5, 2.5, −0.5, 0.05, 1, −2, 0, 1, −2, 1.1, −0.5, −0.03
MLE for GMMs with known labels
Define the indicator function
δ(j|i) = 1 if x̄^(i) belongs to cluster j,
         0 otherwise
Log likelihood objective:
∑_{i=1}^n ∑_{j=1}^k δ(j|i) [ ln γ_j + ln N(x̄^(i) | μ̄^(j), σ²_j) ]
MLE solution (given "cluster labels"):
n̂_j = ∑_{i=1}^n δ(j|i)                        number of points assigned to cluster j
γ̂_j = n̂_j / n                                 fraction of points assigned to cluster j
μ̄^(j) = (1/n̂_j) ∑_{i=1}^n δ(j|i) x̄^(i)        mean of points in cluster j
σ²_j = (1/(n̂_j d)) ∑_{i=1}^n δ(j|i) ||x̄^(i) − μ̄^(j)||²    spread in cluster j
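A sketch with a tiny made-up labeled dataset (points and labels are my own, for illustration), applying the four formulas above cluster by cluster:

```python
import numpy as np

# hypothetical 1-D points with known cluster labels z (encoding delta(j|i))
X = np.array([[2.1], [0.0], [3.5], [-1.0], [1.5], [2.5]])
z = np.array([0, 1, 0, 1, 0, 0])
k = 2
n, d = X.shape

for j in range(k):
    mask = (z == j)
    n_j = mask.sum()                                   # points assigned to cluster j
    gamma_j = n_j / n                                  # mixing coefficient
    mu_j = X[mask].mean(axis=0)                        # mean of points in cluster j
    var_j = ((X[mask] - mu_j) ** 2).sum() / (n_j * d)  # spherical variance of cluster j
    print(j, gamma_j, mu_j, var_j)
```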
MLE for GMMs with known labels
In general, δ(j|i) is unknown!
Expectation Maximization for GMMs
MLE of GMM with unknown labels: intuition
initial guess
do soft clustering
MLE of GMM with unknown labels: soft clustering intuition
MLE of GMM with unknown labels: recompute parameters intuition
…and repeat
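The loop above (initial guess → soft clustering → recompute parameters → repeat) can be sketched for a 1-D GMM as follows. This is a minimal illustration, not production code: the function name and the quantile-based initialization are my own choices, and there is no convergence check or numerical safeguard.

```python
import numpy as np

def em_gmm(x, k, n_iter=50):
    """Minimal EM sketch for a 1-D Gaussian mixture with k components."""
    n = len(x)
    gamma = np.full(k, 1.0 / k)                    # mixing coefficients
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))  # spread-out initial guesses for means
    var = np.full(k, x.var())                      # initial variances

    for _ in range(n_iter):
        # E-step: soft assignments p(j | x^(i)) — the "responsibilities"
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = gamma * dens
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: known-labels formulas, with soft counts replacing delta(j|i)
        n_j = resp.sum(axis=0)
        gamma = n_j / n
        mu = (resp * x[:, None]).sum(axis=0) / n_j
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / n_j

    return gamma, mu, var
```

Each E-step performs the soft clustering; each M-step reuses the known-labels MLE formulas with the responsibilities in place of the indicator δ(j|i).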