
Lecture 6: Multivariate Gaussians CS 189 (CDSS offering)
2022/01/31

Today’s lecture


Why a whole lecture on multivariate Gaussians?
• Multivariate Gaussians (MVGs, or MVNs for multivariate normals) show up a lot throughout this course and in machine learning in general
• Tractability: sometimes, they are useful for simplifying analyses
• Ubiquity: despite their simplicity, MVGs accurately describe many complex phenomena thanks to the central limit theorem (CLT)
• Also, familiarity with this material will generally improve mathematical maturity and understanding of concepts such as covariance

MVGs: the definition
recall the univariate Gaussian distribution:
$$\mathcal{N}(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
the multivariate (“d-dimensional”) extension for $x \in \mathbb{R}^d$:
$$\mathcal{N}(x; \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu)\right)$$
where $\Sigma$ is a PSD (positive semidefinite) matrix
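To make the formula concrete, here is a minimal numpy sketch that evaluates this density directly (the helper name mvg_pdf and the example numbers are ours, not from the lecture), cross-checked against scipy:

import numpy as np
from scipy.stats import multivariate_normal

def mvg_pdf(x, mu, Sigma):
    # evaluate the d-dimensional Gaussian density N(x; mu, Sigma)
    d = mu.shape[0]
    diff = x - mu
    # normalization constant (2*pi)^(d/2) * |Sigma|^(1/2)
    norm_const = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    # quadratic form (x - mu)^T Sigma^{-1} (x - mu)
    quad = diff @ np.linalg.solve(Sigma, diff)
    return np.exp(-0.5 * quad) / norm_const

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
x = np.array([0.3, 0.7])
print(mvg_pdf(x, mu, Sigma))                  # our formula
print(multivariate_normal(mu, Sigma).pdf(x))  # scipy agrees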

Covariance and level sets
• The covariance matrix $\Sigma$ contains covariances (variances along the diagonal): $\Sigma_{ij} = \operatorname{cov}(X_i, X_j) = \mathbb{E}\left[(X_i - \mathbb{E}[X_i])(X_j - \mathbb{E}[X_j])\right]$
For Gaussian RVs $X_i, X_j$: $X_i \perp X_j$ if, and only if, $\operatorname{cov}(X_i, X_j) = 0$
• When we visualize/draw two-dimensional MVGs, we will often draw level sets
$\{(x_1, x_2) : p(x_1, x_2) = c\}$
• How do we visualize higher dimensions? We don’t, really…
• For MVGs, level sets are ellipsoids: $(x - \mu)^\top \Sigma^{-1} (x - \mu) = c'$ (a 2-D sketch follows below)
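A small sketch of how one might draw these level sets for a 2-D MVG, assuming matplotlib and scipy are available (the particular μ and Σ below are made-up example values):

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 1.0], [1.0, 1.0]])

# evaluate the density on a grid and draw a few level sets {x : p(x) = c}
xs, ys = np.meshgrid(np.linspace(-4, 4, 200), np.linspace(-4, 4, 200))
density = multivariate_normal(mu, Sigma).pdf(np.dstack([xs, ys]))

plt.contour(xs, ys, density, levels=5)  # elliptical contours
plt.gca().set_aspect("equal")
plt.show()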

Intuition: shifting, scaling, and rotating
some properties of MVGs under addition of / multiplication by constants:
if $X \sim \mathcal{N}(\mu, \Sigma)$, then $X + b \sim \mathcal{N}(\mu + b, \Sigma)$ and $AX \sim \mathcal{N}(A\mu, A\Sigma A^\top)$
because $\Sigma$ is PSD, we can invoke the following (spectral) decomposition:
$\Sigma = Q \Lambda Q^\top$, where $Q$ is orthogonal and contains the eigenvectors, and $\Lambda$ is diagonal and contains the eigenvalues
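A quick numpy sanity check of this decomposition (the example Σ is made up for illustration):

import numpy as np

Sigma = np.array([[2.0, 1.0], [1.0, 1.0]])   # any symmetric PSD covariance

# spectral decomposition: Sigma = Q @ Lambda @ Q.T
eigvals, Q = np.linalg.eigh(Sigma)           # eigh handles symmetric matrices
Lambda = np.diag(eigvals)

print(np.allclose(Q @ Lambda @ Q.T, Sigma))  # True: reconstruction matches
print(np.allclose(Q.T @ Q, np.eye(2)))       # True: Q is orthogonal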

Intuition: shifting, scaling, and rotating
• So, $\mathcal{N}(\mu, \Sigma) = Q \Lambda^{1/2}\, \mathcal{N}(0, I) + \mu$, where $\Sigma = Q \Lambda Q^\top$
• Starting from $\mathcal{N}(0, I)$, to get to $\mathcal{N}(\mu, \Sigma)$, we (see the sampling sketch after this list):
• scale the dimensions by $\Lambda^{1/2}$
• rotate the axes by Q
• shift everything by !
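Here is one way to sample from $\mathcal{N}(\mu, \Sigma)$ by transforming standard normal draws with exactly this scale-rotate-shift recipe, as a rough numpy sketch (the example μ and Σ are arbitrary):

import numpy as np

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 1.0], [1.0, 1.0]])

eigvals, Q = np.linalg.eigh(Sigma)           # Sigma = Q diag(eigvals) Q^T

# start from z ~ N(0, I), then scale by Lambda^{1/2}, rotate by Q, shift by mu
z = np.random.default_rng(0).standard_normal((10000, 2))
x = (z * np.sqrt(eigvals)) @ Q.T + mu

# empirical mean and covariance should be close to mu and Sigma
print(x.mean(axis=0))
print(np.cov(x, rowvar=False))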

Marginal and conditional distributions
suppose I have a pair of RVs jointly distributed as
$$\begin{bmatrix} X \\ Y \end{bmatrix} \sim \mathcal{N}\!\left( \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \right)$$
the marginal and conditional distributions of each RV are also Gaussian:
$$X \sim \mathcal{N}(\mu_1, \Sigma_{11}), \qquad Y \sim \mathcal{N}(\mu_2, \Sigma_{22})$$
$$X \mid Y \sim \mathcal{N}\!\left( \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(Y - \mu_2),\ \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} \right)$$
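A small numpy sketch of computing the conditional parameters from the block decomposition (all block values below are made up, and kept 1-D just for brevity):

import numpy as np

# made-up block parameters for the joint distribution of [X; Y]
mu1, mu2 = np.array([0.0]), np.array([1.0])
S11 = np.array([[2.0]])
S12 = np.array([[0.8]])
S21 = S12.T
S22 = np.array([[1.0]])

y = np.array([2.0])   # observed value of Y

# parameters of the Gaussian conditional X | Y = y
cond_mean = mu1 + S12 @ np.linalg.solve(S22, y - mu2)
cond_cov = S11 - S12 @ np.linalg.solve(S22, S21)
print(cond_mean, cond_cov)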

Entropy and KL divergence
the entropy of an MVG $\mathcal{N}(\mu, \Sigma)$ is given by:
$$H = \frac{d}{2} + \frac{d}{2}\ln 2\pi + \frac{1}{2}\ln|\Sigma|$$
the KL divergence between two MVGs $\mathcal{N}(\mu_1, \Sigma_1)$ and $\mathcal{N}(\mu_2, \Sigma_2)$ is:
$$D_{\mathrm{KL}}\!\left(\mathcal{N}(\mu_1, \Sigma_1)\,\|\,\mathcal{N}(\mu_2, \Sigma_2)\right) = \frac{1}{2}\left[ \operatorname{tr}\!\left(\Sigma_2^{-1}\Sigma_1\right) + (\mu_2 - \mu_1)^\top \Sigma_2^{-1} (\mu_2 - \mu_1) - d + \ln\frac{|\Sigma_2|}{|\Sigma_1|} \right]$$
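These two formulas translate directly into numpy; a sketch (the function names mvg_entropy and mvg_kl are ours, not from the lecture):

import numpy as np

def mvg_entropy(Sigma):
    # H = d/2 + (d/2) ln(2 pi) + (1/2) ln|Sigma|
    d = Sigma.shape[0]
    return 0.5 * d + 0.5 * d * np.log(2 * np.pi) + 0.5 * np.linalg.slogdet(Sigma)[1]

def mvg_kl(mu1, S1, mu2, S2):
    # KL( N(mu1, S1) || N(mu2, S2) )
    d = mu1.shape[0]
    S2_inv = np.linalg.inv(S2)
    diff = mu2 - mu1
    return 0.5 * (np.trace(S2_inv @ S1)
                  + diff @ S2_inv @ diff
                  - d
                  + np.linalg.slogdet(S2)[1] - np.linalg.slogdet(S1)[1])

print(mvg_entropy(np.eye(2)))                                     # standard 2-D Gaussian
print(mvg_kl(np.zeros(2), np.eye(2), np.ones(2), 2 * np.eye(2)))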

MLE for MVG parameters
we are given $\mathcal{D} = \{x_1, \dots, x_N\}$
assume each data point is generated i.i.d. as $X_i \sim \mathcal{N}(\mu, \Sigma)$. what are the maximum likelihood estimates $\mu_{\mathrm{MLE}}, \Sigma_{\mathrm{MLE}}$?
$$\mu_{\mathrm{MLE}}, \Sigma_{\mathrm{MLE}} = \arg\max_{\mu, \Sigma} \sum_{i=1}^{N} \log \mathcal{N}(x_i; \mu, \Sigma) = \arg\min_{\mu, \Sigma} \sum_{i=1}^{N} \left[ \frac{1}{2}(x_i - \mu)^\top \Sigma^{-1} (x_i - \mu) + \frac{1}{2}\log|\Sigma| \right]$$

MLE for MVG parameters
$$= \arg\min_{\mu, \Sigma} \sum_{i=1}^{N} \left[ \frac{1}{2}(x_i - \mu)^\top \Sigma^{-1} (x_i - \mu) + \frac{1}{2}\log|\Sigma| \right]$$
use the trace trick to reorder the quadratic form, then look up (or derive) the matrix derivatives of the trace and log-determinant terms; setting the gradients to zero yields
$$\mu_{\mathrm{MLE}} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \Sigma_{\mathrm{MLE}} = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu_{\mathrm{MLE}})(x_i - \mu_{\mathrm{MLE}})^\top$$
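A quick numpy check of these estimators on synthetic data (the true parameters below are arbitrary):

import numpy as np

# synthetic data: N i.i.d. draws from a 2-D Gaussian with arbitrary true parameters
rng = np.random.default_rng(0)
true_mu = np.array([1.0, -1.0])
true_Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
X = rng.multivariate_normal(true_mu, true_Sigma, size=5000)

mu_mle = X.mean(axis=0)                 # (1/N) sum_i x_i
diff = X - mu_mle
Sigma_mle = diff.T @ diff / X.shape[0]  # divides by N, not N - 1

print(mu_mle)
print(Sigma_mle)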

(A form of) the central limit theorem
• Consider a sequence of RVs X1, …, XN drawn i.i.d. from any distribution with mean ! and covariance matrix #
• Informally, the CLT says: the average $\frac{1}{N}\sum_{i=1}^{N} X_i$ is approximately Gaussian, for large $N$
• Straightforward extensions are possible to RVs that are not identically distributed
• The CLT motivates, e.g., modeling noise in linear regression as Gaussian
• The CLT explains the “bell curve” phenomenon observed all throughout nature, e.g., heights, shoe sizes, exam scores, … (a quick simulation follows below)
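The simulation referenced above: a rough sketch that averages draws from a decidedly non-Gaussian distribution (uniform) and produces an approximately Gaussian histogram; all sizes and seeds are arbitrary:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# each X_i is uniform on [0, 1] (far from Gaussian); their average is nearly Gaussian
N = 50
averages = rng.uniform(0.0, 1.0, size=(100000, N)).mean(axis=1)

plt.hist(averages, bins=100, density=True)  # should look like a bell curve
plt.show()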
