Lecture 3: Maximum likelihood estimation (1)
CS 189 (CDSS offering)
2022/01/24
Today’s lecture
• Today we’ll cover one of the main principles for “learning” (fitting, estimating, …) good parameters for machine learning (and, more generally, statistical) models
• The principle is referred to as maximum likelihood estimation (MLE)
• We will go over the core concepts in this lecture and provide examples
• Next lecture, we will detail how the principle applies to machine learning and discuss some connections to other approaches
Recall: what is machine learning?
• Machine learning has three core components: model, optimization, and data
• The model has parameters that will be optimized (learned)
• The optimization algorithm finds (learns) parameters that are a good fit for the training data
• But how do we define “a good fit”?
• We need a loss function that measures how good parameters are
• The learning objective will then be to find parameters that minimize the loss
Loss functions and objectives
data: $\{(x_i, y_i)\}_{i=1}^N$    model: $f_\theta(x)$

loss function: could be, e.g., the squared loss $\ell(y_i, f_\theta(x_i)) = (y_i - f_\theta(x_i))^2$

objective: find parameters that minimize the average loss

$\hat{\theta} = \arg\min_\theta \frac{1}{N} \sum_{i=1}^N \ell(y_i, f_\theta(x_i))$

linear model: $f_\theta(x) = \theta^\top x$
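A minimal sketch of this objective in code (not from the original slides; the dataset, parameter values, and function name are made up for illustration):

```python
# Minimal sketch (illustrative only): average squared loss for a linear model
# f_theta(x) = theta^T x on a tiny synthetic dataset.
import numpy as np

def average_squared_loss(theta, X, y):
    """Average of (y_i - theta^T x_i)^2 over the dataset."""
    predictions = X @ theta          # f_theta(x_i) for every i
    return np.mean((y - predictions) ** 2)

# tiny synthetic example (made-up numbers)
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # each row is an x_i
y = np.array([0.5, 1.5, 2.5])                        # targets y_i
theta = np.array([0.5, 1.0])                         # a candidate parameter vector
print(average_squared_loss(theta, X, y))             # 0.0 for this choice of theta
```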
The maximum likelihood principle
• First, we require that the model defines a probability distribution over the data, and that this distribution is controlled by the model parameters
• We will focus heavily in this course on such probabilistic models
• We assume that some setting of the parameters generated the observed data
• That is, we assume that the observed data were sampled from a distribution corresponding to the specified model, with some unknown setting of the parameters
• The maximum likelihood principle says: we can recover these parameters through optimization, by maximizing the likelihood of the observed data
Probabilistic models
what does a probabilistic model look like? e.g., from linear to linear-Gaussian: instead of predicting a single output $f_\theta(x) = \theta^\top x$, the model defines a distribution over outputs, $p(y \mid x, \theta) = \mathcal{N}(y;\, \theta^\top x,\, \sigma^2)$

we now talk in terms of output probabilities
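A small sketch of what a linear-Gaussian model computes, assuming the standard Gaussian density formula (the function name and example values are illustrative, not from the lecture):

```python
# Minimal sketch (illustrative only): a linear-Gaussian model assigns each input x
# a density over outputs y, p(y | x, theta) = Normal(y; theta^T x, sigma^2).
import numpy as np

def linear_gaussian_density(y, x, theta, sigma2):
    """Density of y under a Gaussian with mean theta^T x and variance sigma2."""
    mean = x @ theta
    return np.exp(-(y - mean) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

x = np.array([1.0, 2.0])       # a single input (made up)
theta = np.array([0.5, 0.25])  # candidate parameters (made up)
print(linear_gaussian_density(y=1.0, x=x, theta=theta, sigma2=0.5))
```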
MLE for machine learning?
• MLE is a general tool for statistical inference, and much of machine learning is really just statistical inference
• However, machine learning focuses specifically on inference within models that go from inputs to outputs, rather than, e.g., a population-level model of the data
• We also don’t tend to think as hard about whether the model is misspecified (though, perhaps we should)
• Let’s start with a non-machine-learning overview of MLE, for simplicity
MLE: the basic (non-machine-learning) setup
given data $\mathcal{D} = \{x_1, \ldots, x_N\}$

assume a set (family) of distributions on $x$: $\{p_\theta(x) : \theta \in \Theta\}$

assume $\mathcal{D}$ was sampled i.i.d. from a member of this family, i.e., $x_i \sim p_{\theta^\star}$ for some unknown $\theta^\star \in \Theta$

goal: recover $\theta^\star$ from $\mathcal{D}$

objective/definition: $\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta \in \Theta} p_\theta(\mathcal{D}) = \arg\max_{\theta \in \Theta} \prod_{i=1}^N p_\theta(x_i)$
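As a hedged illustration of this objective (not from the slides), the sketch below numerically maximizes the log-likelihood for an assumed family $p_\theta(x) = \theta e^{-\theta x}$ (an exponential distribution), whose closed-form MLE is $1/\bar{x}$; the data and bounds are made up:

```python
# Minimal sketch (illustrative only): maximize the log-likelihood numerically by
# minimizing the negative log-likelihood over theta.
import numpy as np
from scipy.optimize import minimize_scalar

data = np.array([0.8, 1.3, 0.4, 2.1, 0.9])  # made-up observations x_1, ..., x_N

def negative_log_likelihood(theta):
    # -log prod_i p_theta(x_i) = -sum_i [log(theta) - theta * x_i]
    return -(np.log(theta) * len(data) - theta * np.sum(data))

result = minimize_scalar(negative_log_likelihood, bounds=(1e-6, 100.0), method="bounded")
print(result.x, 1.0 / np.mean(data))  # the two values should agree closely
```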
Example: MLE for a univariate Gaussian
• Suppose you were given a set of height measurements, and you believe (justifiably) that the underlying distribution of heights is Gaussian
• How would you estimate the mean and variance of this Gaussian from the data?
• Our first example of MLE in action!
• However, this example isn’t really machine learning — why?
Example: MLE for a univariate Gaussian
assume each data point is generated i.i.d. as $x_i \sim \mathcal{N}(\mu, \sigma^2)$

we are given $\mathcal{D} = \{x_1, \ldots, x_N\}$, and the parameters are $\theta = (\mu, \sigma^2)$

our goal is to find $\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\mu, \sigma^2} \prod_{i=1}^N \mathcal{N}(x_i; \mu, \sigma^2)$
how? take the derivative and set to zero (and check second derivatives)
but doing this for a product of terms is tricky and messy…
Example: MLE for a univariate Gaussian
log is a monotonically increasing function, so maximizing the likelihood is equivalent to maximizing the log-likelihood

we wish to find $\hat{\mu}_{\mathrm{MLE}}, \hat{\sigma}^2_{\mathrm{MLE}} = \arg\max_{\mu, \sigma^2} \log \prod_{i=1}^N \mathcal{N}(x_i; \mu, \sigma^2) = \arg\max_{\mu, \sigma^2} \sum_{i=1}^N \log \mathcal{N}(x_i; \mu, \sigma^2)$
we will proceed by taking (partial) derivatives and setting equal to zero
Example: MLE for a univariate Gaussian
the partial derivative with respect to $\mu$,
$\frac{\partial}{\partial \mu} \sum_{i=1}^N \log \mathcal{N}(x_i; \mu, \sigma^2) = \sum_{i=1}^N \frac{x_i - \mu}{\sigma^2}$,
is equal to zero when $\mu = \hat{\mu}_{\mathrm{MLE}} = \frac{1}{N} \sum_{i=1}^N x_i$

the empirical average

next, set the partial derivative with respect to $\sigma^2$,
$\frac{\partial}{\partial \sigma^2} \sum_{i=1}^N \log \mathcal{N}(x_i; \mu, \sigma^2) = \sum_{i=1}^N \left( \frac{(x_i - \mu)^2}{2\sigma^4} - \frac{1}{2\sigma^2} \right)$,
to zero and multiply by $\frac{2\sigma^4}{N}$ to get $\sigma^2 = \frac{1}{N} \sum_{i=1}^N (x_i - \mu)^2$

finally, substitute $\hat{\mu}_{\mathrm{MLE}}$ for $\mu$: $\hat{\sigma}^2_{\mathrm{MLE}} = \frac{1}{N} \sum_{i=1}^N (x_i - \hat{\mu}_{\mathrm{MLE}})^2$
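A minimal sketch of these closed-form estimates applied to made-up height data (the numbers are illustrative, not from the lecture):

```python
# Minimal sketch (illustrative only): the closed-form Gaussian MLE from the
# derivation above, applied to made-up height measurements (in cm).
import numpy as np

heights = np.array([172.0, 168.5, 181.2, 175.3, 169.9, 178.0])  # made-up data

mu_mle = np.mean(heights)                       # (1/N) * sum_i x_i
sigma2_mle = np.mean((heights - mu_mle) ** 2)   # (1/N) * sum_i (x_i - mu_mle)^2
# note: this is the MLE (divide by N), not the unbiased estimate (divide by N - 1)

print(mu_mle, sigma2_mle)
```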
Example: MLE for a multinomial distribution
• Suppose I am rolling a loaded (biased) six-sided die, and I want to estimate the probability of each side being on top
• After rolling the die many times, we are now ready to use MLE
• This is still not a machine learning example! We will do that next time
• But this is a more involved mathematical exercise involving Lagrange multipliers
Example: MLE for a multinomial distribution
assume each data point is generated i.i.d. as $x_i \sim \mathrm{Multinomial}(\theta_1, \ldots, \theta_6)$

$p_\theta(x_i) = \prod_{k=1}^6 \theta_k^{\mathbb{1}[x_i = k]}$, so $p_\theta(\mathcal{D}) = \prod_{i=1}^N p_\theta(x_i) = \prod_{k=1}^6 \theta_k^{n_k}$, where $n_k$ is the number of rolls that came up $k$

$\hat{\theta}_{\mathrm{MLE}} = \arg\max_\theta \log \prod_{k=1}^6 \theta_k^{n_k} = \arg\max_\theta \sum_{k=1}^6 n_k \log \theta_k$ subject to $\sum_{k=1}^6 \theta_k = 1$

Lagrangian functions/multipliers for solving constrained optimization problems:

$\mathcal{L}(\theta, \lambda) = \sum_{k=1}^6 n_k \log \theta_k + \lambda \left( 1 - \sum_{k=1}^6 \theta_k \right)$
Example: MLE for a multinomial distribution
we solve Lagrangian functions by looking for the stationary points:
$\frac{\partial \mathcal{L}}{\partial \lambda} = 1 - \sum_{k=1}^6 \theta_k = 0$   (just the constraint)

$\frac{\partial \mathcal{L}}{\partial \theta_k} = \frac{n_k}{\theta_k} - \lambda = 0 \;\Rightarrow\; \theta_k = \frac{n_k}{\lambda}$

plugging this into the constraint gives $\lambda = \sum_{k=1}^6 n_k = N$, so $\hat{\theta}_{k,\mathrm{MLE}} = \frac{n_k}{N}$

just the empirical frequencies
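A minimal sketch of this result in code (not from the slides; the rolls below are made up): the MLE is just the vector of empirical face frequencies.

```python
# Minimal sketch (illustrative only): the multinomial MLE is the empirical
# frequency of each face, theta_k = n_k / N.
import numpy as np

rolls = np.array([3, 6, 6, 2, 5, 6, 1, 6, 4, 6, 3, 6])  # made-up die rolls
counts = np.array([np.sum(rolls == k) for k in range(1, 7)])  # n_1, ..., n_6

theta_mle = counts / counts.sum()   # theta_k = n_k / N
print(theta_mle)                    # sums to 1; face 6 gets the largest probability
```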