UNSUPERVISED LEARNING
TODAY: K-MEANS, Mixture of Gaussians, EM
Supervised setting vs. unsupervised setting: unsupervised IS HARDER than supervised TECHNIQUES.
Unsupervised: no labels. We allow stronger ASSUMPTIONS and accept weaker GUARANTEES. The IDEAS ARE VALUABLE.
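K-MEANS
Given: $x^{(1)}, \dots, x^{(n)} \in \mathbb{R}^d$ and an integer $k$ (the number of clusters).
e.g. $k = 2$, $d = 2$.
[Figure: example point cloud for $k = 2$, $d = 2$.]
Do: find an assignment of each $x^{(i)}$ to ONE of $k$ clusters.
$C^{(i)} = j$ means point $i$ is in cluster $j$ (in the example, some points have $C^{(i)} = 1$ while others have $C^{(i)} = 2$).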
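How do WE find these clusters? An ITERATIVE approach.
[Figure: the points with the current cluster centers $\mu^{(1)}, \mu^{(2)}$ marked.]
1. Randomly init $\mu^{(1)}, \dots, \mu^{(k)} \in \mathbb{R}^d$.
2. Assign every point to the closest cluster: for each $i = 1, \dots, n$,
   $C^{(i)} = \arg\min_{j} \| x^{(i)} - \mu^{(j)} \|^2$.
3. Compute NEW cluster CENTERS: for $j = 1, \dots, k$,
   $\mu^{(j)} = \frac{1}{|\Omega_j|} \sum_{i \in \Omega_j} x^{(i)}$, where $\Omega_j = \{ i : C^{(i)} = j \}$.
REPEAT steps 2 and 3 UNTIL no points change cluster.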
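The three steps can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function name `kmeans`, the choice to initialise the centers at randomly picked data points, and the toy `points` are all assumptions made here.

```python
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Iterative k-means on a list of equal-length tuples (illustrative sketch)."""
    rng = random.Random(seed)
    # Step 1: initialise centers at k distinct data points (one common choice).
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [None] * len(points)
    for _ in range(max_iter):
        # Step 2: assign every point to its closest center (squared distance).
        new_assign = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        if new_assign == assign:  # no point changed cluster: stop
            break
        assign = new_assign
        # Step 3: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, c in zip(points, assign) if c == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centers

# Hypothetical toy data: two well-separated groups in 2-d.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
assign, centers = kmeans(points, k=2)
```

On data this well separated, the first three points should share one label and the last three the other; which group gets label 0 depends on the random init.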
Comments
DOES K-MEANS TERMINATE? Yes.
$J(C, \mu) = \sum_{i=1}^{n} \| x^{(i)} - \mu^{(C^{(i)})} \|^2$ decreases monotonically (SEE NOTES).
Does it find A GLOBAL minimum? Not necessarily: the problem is NP-HARD.
k-means++ (from great Stanford students) IMPROVED the approximation ratio through clever initialization. It is the DEFAULT IN SKLEARN.
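A rough sketch of the seeding idea behind k-means++: pick the first center uniformly, then pick each further center with probability proportional to its squared distance from the nearest center chosen so far. The helper name `kmeanspp_init` and the toy data are hypothetical. In scikit-learn the same idea is selected with `KMeans(init="k-means++")`.

```python
import random

def kmeanspp_init(points, k, seed=0):
    """Distance-weighted seeding (sketch); assumes at least k distinct points."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # first center: uniform over the data
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [
            min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
            for p in points
        ]
        # Next center: sampled with probability proportional to d2,
        # so already-chosen points (d2 = 0) are never picked again.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

# Hypothetical toy data with three loose groups.
points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.8), (10.0, 0.0), (9.8, 0.3)]
centers = kmeanspp_init(points, k=3)
```

The returned centers would then replace the random init in step 1 of the iteration.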
SIDE NOTE
How do you choose $k$? There is no RIGHT ANSWER.
[Figure: the same points grouped as 2 clusters and as 4 clusters.]
It is a MODELING question.
Mixture of Gaussians
Toy Astronomy Example, BASED ON A paper from UW.
GIVEN: quasars and stars ARE sources of light. Both emit light AND we observe photons.
[Figure: observed photons (x marks) scattered around the light sources.]
DO: assign EACH photon to a light source. $P(z^{(i)} = j)$ is the probability that point $i$ belongs to object $j$. Unlike k-means, this is A SOFT assignment.
Challenges
MANY SOURCES (say WE know $k$, the number of sources).
Sources have different intensities.
Assumed
1. Sources ARE well modeled by a Gaussian $(\mu_j, \sigma_j^2)$.
2. WE DO NOT ASSUME an equal number of points per source (UNKNOWN MIXTURE).
NB: Physics folks CAN check whether the RECOVERED VALUES MAKE SENSE.
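MIXTURE OF GAUSSIANS MODEL
SETUP (1-d for simplicity)
WE OBSERVE POINTS $x^{(i)} \in \mathbb{R}$ without their sources.
OBSERVATION 1: If WE KNEW the cluster labels, we could solve with GDA (compute $\mu$ and $\sigma$ AND BE DONE). CHALLENGE: WE don't.
Given: $x^{(1)}, \dots, x^{(n)} \in \mathbb{R}$ AND a positive integer $k$ (the number of clusters).
DO: find $P(z^{(i)} = j)$ for $i = 1, \dots, n$ and $j = 1, \dots, k$, a soft assignment according to the GMM model.
MODEL
$P(x^{(i)}, z^{(i)}) = P(x^{(i)} \mid z^{(i)}) \, P(z^{(i)})$ for $i = 1, \dots, n$ (Bayes' Rule).
$z^{(i)} \sim \mathrm{Multinomial}(\phi)$ with $\phi_j \ge 0$ and $\sum_{j=1}^{k} \phi_j = 1$ (which source).
$x^{(i)} \mid z^{(i)} = j \sim \mathcal{N}(\mu_j, \sigma_j^2)$ for $j = 1, \dots, k$ (a Gaussian in each source).
The PARAMETERS TO BE found ARE $\phi$, $\mu$, $\sigma^2$ (highlighted in the original).
WE call $z$ A hidden or LATENT VARIABLE: $z$ is NOT directly observed.
It is helpful to think in terms of sampling.
[Figure: 1-d sampling example with two sources; the legible values appear to be $\phi_1 = 0.7$, $\phi_2 = 0.3$, $\mu_1 = 1$, $\mu_2 = 2$.]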
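Thinking in terms of sampling can be made concrete with a short sketch: first draw the hidden source $z$, then draw $x$ from that source's Gaussian. The function name `sample_gmm` and the parameter values (in particular $\sigma = 0.2$ for both sources) are illustrative assumptions.

```python
import random

def sample_gmm(n, phi, mu, sigma, seed=0):
    """Draw n points from a 1-d mixture of Gaussians (sketch)."""
    rng = random.Random(seed)
    xs, zs = [], []
    for _ in range(n):
        # z ~ Multinomial(phi): which source emitted this point.
        z = rng.choices(range(len(phi)), weights=phi, k=1)[0]
        # x | z ~ N(mu_z, sigma_z^2): where the point lands.
        x = rng.gauss(mu[z], sigma[z])
        xs.append(x)
        zs.append(z)
    return xs, zs

# Assumed example parameters; in the real problem zs would be hidden from us.
xs, zs = sample_gmm(1000, phi=[0.7, 0.3], mu=[1.0, 2.0], sigma=[0.2, 0.2])
frac_source_0 = zs.count(0) / len(zs)
```

With these weights, roughly 70% of the sampled points should come from source 0.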
GMM Algorithm
1. (E-STEP) Guess the LATENT VALUES of $z^{(i)}$ FOR EACH POINT.
2. (M-STEP) UPDATE the PARAMETERS.
This mirrors k-means.
ABSTRACTLY, this is OUR FIRST EXAMPLE of the EM Algorithm (Expectation Maximization), a famous class of algorithms.
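E-STEP
Given: DATA and the CURRENT guess AT the PARAMETERS $(\phi, \mu, \sigma^2)$.
DO: PREDICT the LATENT VARIABLE $z^{(i)}$ for $i = 1, \dots, n$, i.e. compute
$w_j^{(i)} = P(z^{(i)} = j \mid x^{(i)}; \phi, \mu, \sigma)$.
By Bayes' Rule,
$w_j^{(i)} = \frac{P(x^{(i)} \mid z^{(i)} = j; \mu, \sigma) \, P(z^{(i)} = j; \phi)}{\sum_{l=1}^{k} P(x^{(i)} \mid z^{(i)} = l; \mu, \sigma) \, P(z^{(i)} = l; \phi)}$,
where
$P(x^{(i)} \mid z^{(i)} = j; \mu, \sigma) = \frac{1}{\sigma_j \sqrt{2\pi}} \exp\left( -\frac{(x^{(i)} - \mu_j)^2}{2 \sigma_j^2} \right)$ (how likely is $x^{(i)}$ according to Gaussian $j$),
$P(z^{(i)} = j; \phi) = \phi_j$ (how likely is a point from cluster $j$).
Key Point: WE can compute all terms. Return $w$.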
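A minimal sketch of the E-step for a 1-d mixture, computing the posterior weight of each source for each point via Bayes' rule. The names `gaussian_pdf` and `e_step` and the example numbers are assumptions for illustration.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def e_step(xs, phi, mu, sigma):
    """Return w[i][j] = P(z_i = j | x_i) under the current parameters."""
    w = []
    for x in xs:
        # Numerator of Bayes' rule for each source j: P(x | z = j) * P(z = j).
        joint = [phi[j] * gaussian_pdf(x, mu[j], sigma[j]) for j in range(len(phi))]
        total = sum(joint)  # denominator: P(x), summed over all sources
        w.append([p / total for p in joint])
    return w

# Assumed current guess at the parameters and three example points.
w = e_step([0.9, 1.5, 2.1], phi=[0.7, 0.3], mu=[1.0, 2.0], sigma=[0.2, 0.2])
```

Each row of `w` sums to 1. The middle point is equidistant from both means with equal variances, so its weights reduce to the priors.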
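M-STEP
GIVEN: $w_j^{(i)}$, OUR CURRENT estimate of $P(z^{(i)} = j)$ for $i = 1, \dots, n$.
DO: estimate the observed PARAMETERS using MLE for $j = 1, \dots, k$.
e.g. $\phi_j = \frac{1}{n} \sum_{i=1}^{n} w_j^{(i)}$ (the fraction of elements in cluster $j$),
$\mu_j = \frac{\sum_{i=1}^{n} w_j^{(i)} x^{(i)}}{\sum_{i=1}^{n} w_j^{(i)}}$, and similarly for $\sigma_j^2$.
These come from MLE; let's MAKE that RIGOROUS.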
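A sketch of the M-step as weighted maximum-likelihood updates. The variance update used here is the standard weighted MLE for a 1-d Gaussian, which is an assumption in the sense that the notes do not spell it out; the name `m_step` and the toy inputs are also hypothetical.

```python
def m_step(xs, w):
    """Re-estimate (phi, mu, sigma) from soft assignments w[i][j] (sketch)."""
    n, k = len(xs), len(w[0])
    phi, mu, sigma = [], [], []
    for j in range(k):
        nj = sum(w[i][j] for i in range(n))  # effective number of points in cluster j
        m = sum(w[i][j] * xs[i] for i in range(n)) / nj  # weighted mean
        var = sum(w[i][j] * (xs[i] - m) ** 2 for i in range(n)) / nj  # weighted variance
        phi.append(nj / n)  # fraction of the data attributed to cluster j
        mu.append(m)
        sigma.append(var ** 0.5)
    return phi, mu, sigma

# Hard labels are a special case of soft weights (each row is 0/1).
xs = [0.0, 2.0, 10.0, 12.0]
w = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
phi, mu, sigma = m_step(xs, w)
```

With these 0/1 weights the updates reduce to the per-cluster fraction, mean, and standard deviation.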
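Detour: Convexity & JENSEN
This IS A key result; we'll go slowly.
A SET $\Omega$ IS CONVEX if for any $a, b \in \Omega$ the LINE joining $a$ and $b$ is in $\Omega$ as well.
IN symbols: $\forall \lambda \in [0, 1]$, $a, b \in \Omega \implies \lambda a + (1 - \lambda) b \in \Omega$.
[Figure: a convex set and a NOT convex set; NEED to check that the segment from $a$ to $b$ stays inside the set.]
Given a function $f$, the graph of $f$, $G_f$, is defined as $G_f = \{ (x, y) : y \ge f(x) \}$.
A function $f$ is convex if its graph is convex (AS A SET).
IN symbols: $\forall \lambda \in [0, 1]$, $f(\lambda a + (1 - \lambda) b) \le \lambda f(a) + (1 - \lambda) f(b)$.
Every chord is ABOVE the function.
[Figure: a convex function with the chord from $(a, f(a))$ to $(b, f(b))$; at $z = \lambda a + (1 - \lambda) b$ the chord height $\lambda f(a) + (1 - \lambda) f(b)$ lies above $f(z)$.]
If $f$ is twice differentiable and $f''(x) \ge 0$ for all $x$, then $f$ is convex.
Proof: let $z = \lambda a + (1 - \lambda) b$. By Taylor's theorem, for some $\xi_a$ between $a$ and $z$ and some $\xi_b$ between $b$ and $z$,
$f(a) = f(z) + f'(z)(a - z) + \frac{1}{2} f''(\xi_a)(a - z)^2 \ge f(z) + f'(z)(a - z)$,
$f(b) = f(z) + f'(z)(b - z) + \frac{1}{2} f''(\xi_b)(b - z)^2 \ge f(z) + f'(z)(b - z)$.
Hence $\lambda f(a) + (1 - \lambda) f(b) \ge f(z) + f'(z)(\lambda a + (1 - \lambda) b - z) = f(z)$,
i.e. $\lambda f(a) + (1 - \lambda) f(b) \ge f(z)$, which is the definition of convexity.
WE say $f$ is strongly convex if $f''(x) > 0$ for all $x \in \mathrm{Dom}\, f$.
e.g. $f(x) = x^2$ has $f''(x) = 2 > 0$, so it is strongly convex.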
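JENSEN's Inequality: $\mathbb{E}[f(X)] \ge f(\mathbb{E}[X])$ for convex $f$.
Example: $X$ takes VALUE $a$ with prob $\lambda$ and value $b$ with prob $1 - \lambda$. Then
$\mathbb{E}[f(X)] = \lambda f(a) + (1 - \lambda) f(b)$ and $f(\mathbb{E}[X]) = f(\lambda a + (1 - \lambda) b)$,
so for convex $f$ the definition implies the theorem IN this CASE.
One can prove the result for finitely supported distributions by induction.
Stronger: if $f$ is strongly convex AND $\mathbb{E}[f(X)] = f(\mathbb{E}[X])$, then $X$ is a constant (almost surely).
WE NEED CONCAVE functions: $g$ is concave iff $-g$ is convex, and then $\mathbb{E}[g(X)] \le g(\mathbb{E}[X])$ (chords lie BELOW the function).
e.g. $g(x) = \log(x)$: $g''(x) = -\frac{1}{x^2}$ is negative on $(0, \infty)$, so $g$ is concave.
WHAT About $h(x) = ax + b$? It is CONVEX and concave since $h''(x) = 0$.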
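A quick numeric check of Jensen's inequality on a finitely supported distribution, covering a convex, a concave, and an affine function. The support `values`, the probabilities `probs`, and the helper `expectation` are made up for illustration.

```python
import math

def expectation(values, probs, f=lambda v: v):
    """E[f(X)] for a finitely supported X."""
    return sum(p * f(v) for v, p in zip(values, probs))

# Hypothetical distribution: X takes these values with these probabilities.
values, probs = [1.0, 4.0, 9.0], [0.2, 0.5, 0.3]
mean = expectation(values, probs)

# Convex f(x) = x^2: expect E[f(X)] >= f(E[X]).
lhs_convex = expectation(values, probs, lambda v: v ** 2)
rhs_convex = mean ** 2

# Concave g(x) = log x: expect E[g(X)] <= g(E[X]).
lhs_concave = expectation(values, probs, math.log)
rhs_concave = math.log(mean)

# Affine h(x) = 2x + 1 is both convex and concave: expect equality.
lhs_affine = expectation(values, probs, lambda v: 2 * v + 1)
rhs_affine = 2 * mean + 1
```

The concave case with $\log$ is the one used in the EM derivation.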
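EM Algorithm AS Max Likelihood
$\ell(\theta) = \sum_{i=1}^{n} \log P(x^{(i)}; \theta)$, where the $x^{(i)}$ are the DATA and $\theta$ the PARAMETERS.
WE Assume $P(x; \theta) = \sum_{z} P(x, z; \theta)$, as in the GMM, with $z$ the LATENT VARIABLE.
Picture of the Algorithm
[Figure: the curve $\ell(\theta)$ with a lower bound $L_t(\theta)$ that touches it at $\theta^{(t)}$; maximizing $L_t$ gives $\theta^{(t+1)}$.]
$L_t(\theta) \le \ell(\theta)$ (lower bound) and $L_t(\theta^{(t)}) = \ell(\theta^{(t)})$ (tight).
Hope: $L_t$ is easier to optimize than $\ell(\theta)$.
Rough Algo (EM)
STEP 1 (E-STEP): GIVEN $\theta^{(t)}$, find $L_t$.
STEP 2 (M-STEP): GIVEN $L_t$, SET $\theta^{(t+1)} = \arg\max_{\theta} L_t(\theta)$.
How do we construct $L_t$?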
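Let's look at a single POINT $x$.
$\log P(x; \theta) = \log \sum_{z} P(x, z; \theta)$
$= \log \sum_{z} Q(z) \frac{P(x, z; \theta)}{Q(z)}$ for any $Q$ such that $\sum_{z} Q(z) = 1$ AND $Q(z) \ge 0$ (WE PICK $Q$)
$= \log \mathbb{E}_{z \sim Q}\left[ \frac{P(x, z; \theta)}{Q(z)} \right]$ (DEF of $\mathbb{E}$)
$\ge \mathbb{E}_{z \sim Q}\left[ \log \frac{P(x, z; \theta)}{Q(z)} \right]$ (JENSEN, $\log$ is concave)
$= \sum_{z} Q(z) \log \frac{P(x, z; \theta)}{Q(z)}$ (DEF of $\mathbb{E}$).
This GIVES A family of lower BOUNDS, ONE for each CHOICE of $Q$. The key step holds for any such $Q$.
How do we make it tight? Select $Q$ to make the inequality tight.
What if $\log \frac{P(x, z; \theta)}{Q(z)} = c$ for some constant $c$? Then JENSEN's holds with EQUALITY.
So take $Q(z) = P(z \mid x; \theta)$. Then $\log \frac{P(x, z; \theta)}{Q(z)} = \log \frac{P(z \mid x; \theta) \, P(x; \theta)}{P(z \mid x; \theta)} = \log P(x; \theta)$, which does not depend on $z$, so it is constant.
NB: $Q$ does depend ON $\theta$ and $x$. WE will select a $Q^{(i)}(z)$ for every POINT INDEPENDENTLY.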
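WE DEFINE the EVIDENCE-BASED lower BOUND (ELBO), a sum OVER $z$:
$\mathrm{ELBO}(x; Q, \theta) = \sum_{z} Q(z) \log \frac{P(x, z; \theta)}{Q(z)}$.
WE'VE SHOWN, using $P(x, z; \theta) = P(z \mid x; \theta) \, P(x; \theta)$:
$\ell(\theta) \ge \sum_{i=1}^{n} \mathrm{ELBO}(x^{(i)}; Q^{(i)}, \theta)$ for ANY $Q^{(i)}$ (lower BOUND),
$\ell(\theta^{(t)}) = \sum_{i=1}^{n} \mathrm{ELBO}(x^{(i)}; Q^{(i)}, \theta^{(t)})$ for the CHOICE $Q^{(i)}(z) = P(z \mid x^{(i)}; \theta^{(t)})$.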
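Both properties of the ELBO can be checked numerically for a single point: any valid $Q$ gives a lower bound on $\log P(x; \theta)$, and the posterior makes the bound tight. The joint values below are hypothetical numbers chosen only for illustration, as is the helper name `elbo`.

```python
import math

def elbo(joint, q):
    """sum_z Q(z) * log(P(x, z) / Q(z)) for one point; joint[z] = P(x, z; theta)."""
    return sum(qz * math.log(pz / qz) for pz, qz in zip(joint, q) if qz > 0)

# Hypothetical values of P(x, z; theta) for z = 0, 1, 2 at one fixed x.
joint = [0.12, 0.03, 0.05]
log_px = math.log(sum(joint))  # log P(x; theta), marginalising over z

posterior = [p / sum(joint) for p in joint]  # Q(z) = P(z | x; theta)
uniform = [1 / 3] * 3                        # some other valid choice of Q

elbo_posterior = elbo(joint, posterior)  # should match log_px (tight)
elbo_uniform = elbo(joint, uniform)      # should fall strictly below log_px
```

The gap between `log_px` and `elbo_uniform` is what the E-step closes by choosing the posterior.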
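WRAP UP
1. (E-STEP) SET $Q^{(i)}(z) = P(z \mid x^{(i)}; \theta^{(t)})$ for $i = 1, \dots, n$.
2. (M-STEP) $\theta^{(t+1)} = \arg\max_{\theta} L_t(\theta)$, where $L_t(\theta) = \sum_{i=1}^{n} \mathrm{ELBO}(x^{(i)}; Q^{(i)}, \theta)$.
WHY DOES this terminate? $\ell(\theta^{(t+1)}) \ge L_t(\theta^{(t+1)}) \ge L_t(\theta^{(t)}) = \ell(\theta^{(t)})$, so the likelihood never decreases.
IS IT Globally optimal? NOPE, SEE the Picture.
WE DERIVED HARD and SOFT CLUSTERING methods (EM Algorithm) IN terms of MLE.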
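To see the monotonicity argument in action, here is a compact sketch of EM for a 1-d mixture of Gaussians that records the log-likelihood after every iteration. The synthetic data, the initial parameters, and the names `pdf`, `log_likelihood`, and `em_gmm_1d` are assumptions made for illustration; the M-step uses the standard closed-form weighted MLE updates.

```python
import math
import random

def pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def log_likelihood(xs, phi, mu, sigma):
    """l(theta) = sum_i log sum_j phi_j N(x_i; mu_j, sigma_j^2)."""
    return sum(
        math.log(sum(phi[j] * pdf(x, mu[j], sigma[j]) for j in range(len(phi))))
        for x in xs
    )

def em_gmm_1d(xs, phi, mu, sigma, iters=25):
    n, k = len(xs), len(phi)
    history = [log_likelihood(xs, phi, mu, sigma)]
    for _ in range(iters):
        # E-step: Q_i(z) = P(z | x_i; theta_t), stored as w[i][j].
        w = []
        for x in xs:
            joint = [phi[j] * pdf(x, mu[j], sigma[j]) for j in range(k)]
            s = sum(joint)
            w.append([p / s for p in joint])
        # M-step: closed-form maximiser of the lower bound L_t.
        nj = [sum(w[i][j] for i in range(n)) for j in range(k)]
        phi = [nj[j] / n for j in range(k)]
        mu = [sum(w[i][j] * xs[i] for i in range(n)) / nj[j] for j in range(k)]
        sigma = [
            math.sqrt(sum(w[i][j] * (xs[i] - mu[j]) ** 2 for i in range(n)) / nj[j])
            for j in range(k)
        ]
        history.append(log_likelihood(xs, phi, mu, sigma))
    return phi, mu, sigma, history

# Synthetic data from two assumed sources, mixed 150 : 50.
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(150)] + [rng.gauss(5.0, 1.0) for _ in range(50)]
phi, mu, sigma, history = em_gmm_1d(
    xs, phi=[0.5, 0.5], mu=[min(xs), max(xs)], sigma=[1.0, 1.0]
)
```

The entries of `history` should be non-decreasing, matching the chain of inequalities; a different init can still end at a different local optimum.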