Inference and simulation CompSci 369, 2022
School of Computer Science, University of Auckland
Last lecture
- Reviewing expectation and variance
- More common distributions
  - Uniform (discrete and continuous)
  - Exponential
This lecture
- Modelling systems
- Likelihood
- Inference via maximum likelihood
- A few words on Bayesian inference
- Intro to simulation
Modelling systems
Think of a model of a real process as a black box with parameters θ which produces some outcome D:

[Figure: parameters θ feed into the model, which produces the outcome D]

The model is not deterministic, so the same input can produce different outputs. The distribution of outcomes D for given parameters θ has density P(D|θ).
The likelihood
P(D|θ) is known as the likelihood.
If D is discrete, P(D|θ) is a probability distribution function for D; if D is continuous, it is a probability density function. It sums/integrates to 1 over all possible values of D.
Since D is fixed and known, and θ is unknown, consider the likelihood as a function of θ.
Often write
\[ L(\theta; D) = P(D|\theta) \]
to emphasise that it is a function of θ.
Likelihood example: Manufacturing defects
Suppose we are counting non-critical defects in manufactured items.
We uniformly sample 25 items, count the number of defects in each, and get the data vector
D = (4, 4, 5, 8, 3, 8, 9, 6, 6, 2, 3, 3, 6, 4, 7, 5, 6, 11, 4, 7, 9, 3, 6, 3, 4).
Model the number of defects as Poisson distributed.
The Poisson distribution has one parameter, the rate λ. So here θ = λ.
Important assumption: Assume each item is independent of all others. This means we can factorise the likelihood as
\[ P(D|\lambda) = \prod_{i=1}^{25} P(D_i|\lambda). \]
Likelihood example cont…
Since each D_i is Poisson distributed,
\[ P(D_i|\lambda) = \frac{\lambda^{D_i} e^{-\lambda}}{D_i!}, \]
so
\[ L(\lambda; D) = P(D|\lambda) = \prod_{i=1}^{25} P(D_i|\lambda) = \prod_{i=1}^{25} \frac{\lambda^{D_i} e^{-\lambda}}{D_i!}. \]
Remember that we think of the likelihood as a function of λ.
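As a minimal sketch (not from the original slides; Python with NumPy and SciPy assumed), this likelihood can be evaluated numerically for any candidate λ:

```python
import numpy as np
from scipy.stats import poisson

# Defect counts for the 25 sampled items (data from the slide).
D = np.array([4, 4, 5, 8, 3, 8, 9, 6, 6, 2, 3, 3, 6, 4,
              7, 5, 6, 11, 4, 7, 9, 3, 6, 3, 4])

def likelihood(lam, data):
    """L(lam; data): product of independent Poisson probabilities."""
    return np.prod(poisson.pmf(data, lam))

print(likelihood(5.0, D))  # likelihood near the true rate
print(likelihood(2.0, D))  # much smaller: lambda = 2 fits the data poorly
```

Evaluating at a few values of λ already shows that some rates explain the data far better than others.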
Visualising the likelihood
The high values of the likelihood should be close to the true value of the parameter (here, the true value is 5).
As we add more data the likelihood function becomes more peaked:
Study another 15 items for defects, so that the full data is now
(4, 4, 5, 8, 3, 8, 9, 6, 6, 2, 3, 3, 6, 4, 7, 5, 6, 11, 4, 7, 9, 3, 6, 3, 4, 4, 1, 4, 8, 4, 5, 5, 5, 3, 7, 6, 7, 2, 6, 4).
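A hedged sketch of how such likelihood curves could be produced (Python assumed; the plotting choices are illustrative only, not from the slides):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson

D25 = np.array([4, 4, 5, 8, 3, 8, 9, 6, 6, 2, 3, 3, 6, 4,
                7, 5, 6, 11, 4, 7, 9, 3, 6, 3, 4])
extra = [4, 1, 4, 8, 4, 5, 5, 5, 3, 7, 6, 7, 2, 6, 4]
D40 = np.concatenate([D25, extra])

lam = np.linspace(0.5, 12, 400)
L25 = np.array([np.prod(poisson.pmf(D25, l)) for l in lam])
L40 = np.array([np.prod(poisson.pmf(D40, l)) for l in lam])

# Scale each curve to its own maximum so the two shapes are comparable.
plt.plot(lam, L25 / L25.max(), label="n = 25")
plt.plot(lam, L40 / L40.max(), label="n = 40")
plt.xlabel("lambda")
plt.ylabel("relative likelihood")
plt.legend()
plt.show()
```

Scaling each curve to its own maximum makes it easy to see that the n = 40 curve is more tightly peaked around the same region.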
Maximum likelihood
One choice for the best estimate of θ is the one that maximises the likelihood function.
These are the parameters under which the observed data is most likely according to the model.
Called the maximum likelihood estimate and often denoted θ̂:
\[ \hat{\theta} = \arg\max_{\theta} P(D|\theta) = \arg\max_{\theta} L(\theta; D). \]
It ignores any other information that we might have about θ.
Finding the maximum likelihood estimate
The likelihood is just a function, L(θ), so we can use standard techniques to maximise it, such as:
- Take the derivative and find the zeros of the derivative (analytically, or numerically with, e.g., bisection or Newton's method)
- Hill-climbing: start with an initial guess θ, choose a new random θ′ close to θ, and if L(θ′) > L(θ), set θ ← θ′. Stop when no longer moving. (A sketch of this appears below.)
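A minimal sketch of the hill-climbing idea for the Poisson defect example (Python assumed; the step size and the fixed iteration budget standing in for "stop when no longer moving" are arbitrary choices, not from the slides):

```python
import numpy as np
from scipy.stats import poisson

D = np.array([4, 4, 5, 8, 3, 8, 9, 6, 6, 2, 3, 3, 6, 4, 7, 5, 6, 11, 4,
              7, 9, 3, 6, 3, 4, 4, 1, 4, 8, 4, 5, 5, 5, 3, 7, 6, 7, 2, 6, 4])

def log_lik(lam):
    """Log-likelihood of the data under a Poisson(lam) model."""
    return np.sum(poisson.logpmf(D, lam))

rng = np.random.default_rng(369)
lam = 1.0                                # initial guess
for _ in range(5000):                    # fixed budget instead of a stopping rule
    prop = lam + rng.normal(scale=0.1)   # new random value close to the current one
    if prop > 0 and log_lik(prop) > log_lik(lam):
        lam = prop                       # accept only uphill moves

print(lam)        # hill-climbing estimate
print(D.mean())   # for the Poisson model the analytic MLE is the sample mean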
Work with the log-likelihood rather than the likelihood
The log-likelihood is the logarithm of the likelihood function. Write L for the likelihood and l(θ) = log L(θ) for the log-likelihood.
The likelihood can be a very small number, often of the form c e^{−x} where x is large. We want to avoid numerical underflow.
The log-likelihood log(c e^{−x}) = log c − x is much more manageable, and the likelihood and log-likelihood have the same maxima.
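A small numerical illustration of why logs matter (Python assumed; data from the defect example):

```python
import numpy as np
from scipy.stats import poisson

D = np.array([4, 4, 5, 8, 3, 8, 9, 6, 6, 2, 3, 3, 6, 4, 7, 5, 6, 11, 4,
              7, 9, 3, 6, 3, 4, 4, 1, 4, 8, 4, 5, 5, 5, 3, 7, 6, 7, 2, 6, 4])
probs = poisson.pmf(D, 5.0)

print(np.prod(probs))         # likelihood: a tiny number of the form c * e^{-x}
print(np.sum(np.log(probs)))  # log-likelihood: an ordinary-sized negative number
```

With many more observations, the raw product would eventually underflow to exactly 0.0, while the sum of logs stays well-behaved.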
Bayesian inference
Another framework for inference is based on Bayes’ theorem:
We are given data and want to estimate the model parameters.
So we want P(θ|D), the probability distribution for the parameters given the data. Contrast this with P(D|θ), the likelihood.
P(θ|D) is the posterior distribution of θ.
To get it, use Bayes' theorem:
\[ P(\theta|D) = \frac{P(D|\theta) P(\theta)}{P(D)}, \]
where
- P(θ) is the prior distribution of θ
- P(D|θ) is the likelihood
- P(D) is a normalisation constant.
Bayesian inference provides a powerful framework for combining multiple forms of data. If we had a couple more weeks we would explore it further in this course but we are a bit short of time.
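Still, as a hedged sketch only (not covered further in the course; Python assumed), the theorem can be applied numerically to the defect example by evaluating an unnormalised posterior on a grid with a flat prior:

```python
import numpy as np
from scipy.stats import poisson

D = np.array([4, 4, 5, 8, 3, 8, 9, 6, 6, 2, 3, 3, 6, 4,
              7, 5, 6, 11, 4, 7, 9, 3, 6, 3, 4])
lam_grid = np.linspace(0.01, 15, 1000)

prior = np.ones_like(lam_grid)                                  # flat prior P(theta)
lik = np.array([np.prod(poisson.pmf(D, l)) for l in lam_grid])  # P(D|theta)
post = lik * prior                                              # numerator of Bayes' theorem
post /= post.sum() * (lam_grid[1] - lam_grid[0])                # dividing by P(D) normalises it

print(lam_grid[np.argmax(post)])  # posterior mode: close to the sample mean here
```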
Simulation
Why simulate?
Gain an intuitive understanding of how a process operates
The distribution, or properties of the distribution, are often analytically unavailable
e.g. in Bayesian inference, the posterior distribution P(θ|D)
involves an integral (the normalisation constant P(D) = ∫ P(D|θ) P(θ) dθ) which is, typically, impossible to calculate exactly.
Even when the distribution is available, finding some property of it, such as an expectation, involves another large integral.
Integration is hard!
Why simulate?
The basic idea is to obtain a sample of values {θ^(1), θ^(2), θ^(3), …, θ^(n)} from the distribution of interest, π(θ), and use this sample to estimate properties of the distribution.
Example: The mean of the distribution is
\[ E[\theta] = \int_{\theta \in \Omega} \theta\, \pi(\theta)\, d\theta. \]
This can be approximated by
\[ E[\theta] \approx \bar{\theta} = \frac{1}{n} \sum_{i=1}^{n} \theta^{(i)}. \]
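A minimal sketch in Python (the Gamma distribution here is a stand-in chosen purely for illustration; the slides do not name π(θ)):

```python
import numpy as np

rng = np.random.default_rng(369)

# Stand-in for pi(theta): a Gamma distribution, chosen only so we have
# something concrete to sample from.
theta = rng.gamma(shape=2.0, scale=2.0, size=1000)

print(theta.mean())  # Monte Carlo estimate of E[theta]
print(2.0 * 2.0)     # exact mean of Gamma(shape=2, scale=2), for comparison
```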
Suppose θ is a univariate random variable. We can simulate values of θ
but cannot find the distribution function analytically.
How could we study it?
Let's draw 1000 samples, {θ^(1), θ^(2), θ^(3), …, θ^(1000)}.
The first 20 look like 5.673, 5.222, 3.185, 7.715, 6.249, 1.546, 1.943, 2.45, 2.754, 1.62, 0.575, 5.922, 2.04, 6.3, 5.161, 2.611, 1.993, 4.518, 1.55, 0.392, etc.
Now make a histogram to get an idea of the shape of the distribution:
Let's approximate the mean of the distribution by the average of the sampled values:
\[ E[\theta] \approx \bar{\theta} = \frac{1}{1000} \sum_{i=1}^{1000} \theta^{(i)}. \]
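A sketch of both steps (Python assumed; the Gamma sample is the same illustrative stand-in as above, since the true distribution is not named):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(369)
theta = rng.gamma(shape=2.0, scale=2.0, size=1000)  # illustrative stand-in sample

plt.hist(theta, bins=30, density=True)  # shape of the (supposedly unknown) distribution
plt.xlabel("theta")
plt.ylabel("density")
plt.show()

print(theta.mean())  # the sample average approximates E[theta]
```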
How good are these estimates?
The sample is random, so estimates based on it will vary between different samples.
As the number of samples increases, the approximation gets better.
In general, if we want to estimate E[g(θ)] by
\[ \bar{g} = \frac{1}{n} \sum_{i=1}^{n} g(\theta^{(i)}), \]
then \bar{g} is approximately normally distributed with mean E[g(θ)] and variance var(g)/n.
When stated formally, this is known as a central limit theorem (CLT).
So with lots of samples, arbitrarily good approximations to quantities of interest can be made.
Interactive example of the CLT here
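A quick numerical check of the var(g)/n behaviour (Python; the Exponential distribution is an arbitrary illustrative choice, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# The spread of the estimator g_bar should shrink like var(g)/n.
# Estimate the mean of an Exponential(scale=2) many times at two sample sizes.
for n in (100, 10000):
    means = [rng.exponential(scale=2.0, size=n).mean() for _ in range(2000)]
    print(n, np.var(means), 4.0 / n)  # empirical var of g_bar vs var(g)/n (var = scale^2 = 4)
```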