
5 Likelihood inference and first order asymptotics
5.1 Why asymptotics
5.2 Convergence concepts in asymptotics
5.3 Consistency and asymptotic normality of MLE
5.4 Additional comments on asymptotic properties of MLE
5.5 Delta method

5.1 Why asymptotics
We have so far focused on obtaining unbiased estimators for parameters with very strong finite-sample optimality properties. The existence of unbiased estimators with uniformly minimum variance, the so-called UMVUE class of estimators, needed the sufficient statistics of the statistical model to be complete.
Also, the derivation of the UMVUE is sometimes tricky, especially when the Cramér–Rao bound is not attainable.
Fortunately, methods such as the MLE are easier and more universally applicable and they share some of the optimality properties of the UMVUE asymptotically, i.e., when the sample size n is large.
Let us now recall the definition of the Likelihood Function.

Denote by X the variable of interest with a pdf f(x, θ), where θ ∈ R^k. Next we denote by (X_1, X_2, …, X_n) a random sample of size n drawn on X. The pdf of a random sample, f(x_1, x_2, …, x_n, θ), can be obtained from the knowledge we have about f and, therefore, it depends on θ.
Then the Likelihood Function is mathematically equivalent to the joint density function of the sample. However, instead of being a function of the X_i's, we treat it as a function of θ after the observed values X_i = x_i of the sample have been substituted.
We will always assume that the X_i's are mutually independent. Then the Likelihood Function can be easily obtained as:
L(x, θ) = ∏_{i=1}^n f(x_i, θ).
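Purely as an aside (not part of the original slides), here is a minimal Python sketch of how the likelihood and the log-likelihood of an i.i.d. sample can be evaluated; the exponential density and the data values are hypothetical placeholders. In practice the log-likelihood is preferred because the raw product of densities can underflow for large n.

```python
import numpy as np

def likelihood(x, theta, f):
    """Product of the individual densities f(x_i, theta)."""
    return np.prod(f(x, theta))

def log_likelihood(x, theta, f):
    """Sum of log-densities; numerically safer than the raw product."""
    return np.sum(np.log(f(x, theta)))

# Hypothetical example: f(x; theta) = theta * exp(-theta * x), x > 0
f = lambda x, theta: theta * np.exp(-theta * x)
x = np.array([0.8, 1.5, 0.3, 2.1, 0.9])   # illustrative observed values
print(likelihood(x, 1.2, f), log_likelihood(x, 1.2, f))
```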

There are essentially two different settings when trying to derive MLEs:
Regular case: when the log-likelihood function is differentiable with respect to the parameter, the MLE will be a stationary point, i.e., we need to solve the equation
∂/∂θ log L(x, θ) = 0.
Then we need to investigate whether this stationary point θ̂ indeed delivers a maximum. Typically we check the sign of the second derivative of the log-likelihood function at the stationary point.
When θ is a vector rather than a scalar parameter, we set the gradient of log L(x, θ) to zero to find the stationary point and again use further arguments (by examining the Hessian matrix) to verify that the stationary point maximises the log-likelihood.
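When the stationary-point equation has no convenient closed form, the regular-case recipe can also be carried out numerically. The sketch below (my addition, not from the slides) minimises the negative log-likelihood with scipy; the exponential model, the data, and the search bounds are illustrative assumptions only.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical model: X_i ~ Exponential(theta) with density theta * exp(-theta * x)
x = np.array([0.8, 1.5, 0.3, 2.1, 0.9])   # illustrative observations

def neg_log_likelihood(theta):
    # maximising log L is equivalent to minimising -log L
    return -(len(x) * np.log(theta) - theta * np.sum(x))

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 100.0), method="bounded")
print("numerical MLE        :", res.x)
print("closed form n/sum(x) :", len(x) / np.sum(x))   # for this model the MLE is 1/x-bar
```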

Example 5.36
Let X_1, X_2, …, X_n be an i.i.d. sample of observations with support in [0, 1] and density function

f(x; θ) = θ x^{θ−1},  0 < x < 1,

where θ > 0 is an unknown parameter to be estimated. Find the MLE of θ.

Solution:
1) Write down the Likelihood function:
L(x; θ) = θ^n (∏_{i=1}^n x_i)^{θ−1}.

2) Write down the log-Likelihood function:

ln L(x; θ) = n ln θ + (θ − 1) ∑_{i=1}^n ln x_i.
3) Calculate the partial derivative and set it to 0 to find a stationary point:

∂/∂θ ln L(x; θ) = n/θ + ∑_{i=1}^n ln x_i = 0

gives the root

θ̂ = θ̂_mle = −n / ∑_{i=1}^n ln x_i.

Note that, as expected, θ̂ > 0.

4) Find the second partial derivative
∂²/∂θ² ln L(x; θ) = −n/θ².

At the stationary point we get the value −n/θ̂² < 0, which implies that the stationary point θ̂ delivers a maximum of the log-likelihood function, i.e., it is the MLE.
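As a quick numerical check (my addition, not in the slides), one can simulate from this density and evaluate the closed-form MLE above. Since the cdf of f(x; θ) = θ x^{θ−1} on (0, 1) is x^θ, a draw can be generated as U^{1/θ} with U ∼ Uniform(0, 1); the chosen θ and the sample sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 2.5                          # arbitrary "true" parameter for the check

for n in (10, 100, 10_000):
    u = rng.uniform(size=n)
    x = u ** (1.0 / theta_true)           # inverse-cdf draw from f(x; theta) = theta * x**(theta - 1)
    theta_hat = -n / np.sum(np.log(x))    # closed-form MLE derived above
    print(n, round(theta_hat, 3))
```

The estimate settles near the chosen θ as n grows, previewing the consistency results discussed later in this chapter.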
Exercise 5.20
Let X ∼ Poisson(λ), where the parameter λ > 0 and the pmf is

f(x, λ) = λ^x e^{−λ} / x!,  x ∈ {0, 1, 2, …}.
Let (X_1, …, X_n) denote a random sample of size n drawn on X. Derive λ̂, the MLE of λ.

Irregular case: sometimes the likelihood function L(x, θ) is not smooth (and is possibly discontinuous) with respect to the parameter and it is not possible to calculate derivatives for each value of θ. In this case, we need to examine L(x, θ) directly to find the argument θ̂ that maximises it.

Example 5.37
Typically, irregular cases occur when the support of the distribution depends on the unknown parameter. Consider deriving the MLE for the parameter θ of the continuous uniform distribution on [0, θ] using a sample of n observations from this distribution. Show that the MLE is equal to the largest of the observations in the sample, i.e., θ̂ = x_{(n)}.

Clearly the right-hand endpoint of the support depends on the parameter. We have investigated L(x, θ) before. In indicator function notation, it equals:
L(x, θ) = (1/θ^n) I_{[x_{(n)}, ∞)}(θ).
Considered as a function of θ, this is a discontinuous function. It is equal to zero when θ < x_{(n)}. From x_{(n)} onwards, L(x, θ) coincides with the function 1/θ^n. Since this is monotonically decreasing in θ, the maximum of L(x, θ) over the interval [0, ∞) is attained at θ̂ = x_{(n)}. That is, the largest of the n observations in the sample is the MLE of θ in this example.
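To see the irregular case numerically, the following sketch (my addition, not from the slides) evaluates the Uniform(0, θ) likelihood of Example 5.37 on a grid of θ values; the simulated data, the true θ, and the grid are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true = 3.0
x = rng.uniform(0.0, theta_true, size=20)        # hypothetical sample from Uniform(0, theta)

def uniform_likelihood(theta, x):
    # L(x, theta) = theta**(-n) if theta >= x_(n), and 0 otherwise
    return theta ** (-len(x)) if theta >= x.max() else 0.0

grid = np.linspace(0.01, 6.0, 2000)
values = np.array([uniform_likelihood(t, x) for t in grid])
print("grid maximiser :", grid[values.argmax()])
print("x_(n) = max(x) :", x.max())               # the two should essentially coincide
```

The likelihood jumps from zero to its largest value exactly at x_{(n)} and decreases afterwards, which is why no derivative argument is used here.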
Exercise 5.21
Let X_1, X_2, …, X_n be a sample from the density f(x; θ) = θ x^{−2} I_{[θ, ∞)}(x), where θ > 0. Find the MLE of θ.

The likelihood function is:
L(x; θ) = θ^n (∏_{i=1}^n x_i^{−2}) I_{[θ, ∞)}(x_{(1)}).
Note that now the smallest of the n observations, x_{(1)}, comes into play. Indeed, since θ is now the left-hand endpoint of the support, we need all observations to be no smaller than θ, which logically implies that even the smallest of them should be no smaller than θ.
We consider L as a function of θ after the sample has been substituted. As θ moves along the positive half-axis, this function first grows monotonically (while θ moves between 0 and x_{(1)}) and then drops to zero, since the indicator becomes equal to zero. Hence L is a discontinuous function of θ and its maximum is attained at x_{(1)}. This means that θ̂_mle = X_{(1)}.

5.2 Convergence concepts in asymptotics
We realised that finding the UMVUE for a fixed sample size n could be difficult in some cases, especially when the Cramér–Rao (CR) bound is not attainable.
Finding them requires some skills, and there is no easy-to-follow constructive algorithm for their determination.
On the other hand, MLEs are typically easy to construct by following a general recipe: directly optimise either the likelihood or the log-likelihood function.
Both the regular and the irregular case for deriving the MLE are clearly formulated algorithmically and are typically easy to deal with.

It should be pointed out that the MLE can sometimes be biased or, even if unbiased, may fail to attain the CR bound outside the exponential family setting.
Yet, it is simpler to work with MLEs and, as shown in the examples below, the UMVUE is usually just a bias-corrected MLE.

Some examples to confirm this statement follow. These examples have either already been discussed and are just summarised here, or are left as an easy-to-do exercise for you.
The UMVUE for the variance of Bernoulli trials is (n/(n−1)) X̄(1 − X̄), whereas the MLE is X̄(1 − X̄).
The UMVUE for the endpoint θ of the Uniform(0, θ) distribution is ((n+1)/n) X_{(n)}, whereas the MLE is X_{(n)} (see the short numerical sketch after these comparisons).
The UMVUE for the probability of no occurrence based on n independent Poisson random variables is (1 − 1/n)^{n X̄}, whereas the MLE is exp(−X̄).
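A small numerical illustration (my addition) of the Uniform(0, θ) comparison above: the UMVUE (n+1)/n · X_{(n)} and the MLE X_{(n)} differ only by a factor that tends to one, so the two estimates become practically indistinguishable as n grows; the value of θ and the sample sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
theta = 5.0
for n in (5, 50, 5000):
    x = rng.uniform(0.0, theta, size=n)
    mle = x.max()                    # MLE: the sample maximum
    umvue = (n + 1) / n * mle        # bias-corrected version
    print(n, round(mle, 4), round(umvue, 4))
```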

The bias-correction itself tends to be negligible as the sample size increases. Therefore the UMVUEs are either MLEs or almost MLEs. Hence it is justified to seek strong general backing for the properties of MLEs.
This can be done using asymptotic arguments, i.e., by looking at the performance of MLEs as n → ∞, that is, by letting the amount of information become arbitrarily large.
Statistical folklore then says that "nothing can beat the MLE asymptotically".
When trying to defend the MLE on asymptotic grounds, we need some concepts about convergence of random variables and random vectors (since the MLE is a random variable in the univariate case and a random vector in the multivariate case).

We briefly discuss some stochastic convergence concepts first. An estimator T_n of the parameter θ is said to be:
i) consistent (or weakly consistent) if
lim_{n→∞} P_θ(|T_n − θ| > ε) = 0

for all θ ∈ Θ and for every fixed ε > 0. We denote this by T_n →^P θ and say that T_n tends to θ in probability.
In other words, considering that the complementary probability P_θ(|T_n − θ| ≤ ε) tends to one as n → ∞, consistency means that the probability of the estimator lying in an ε-neighbourhood of the true unknown parameter tends to one, no matter how small ε > 0 is, as long as the sample size is large enough (see the simulation sketch after these definitions).

ii) strongly consistent if

P_θ{ lim_{n→∞} T_n = θ } = 1  for all θ ∈ Θ;

iii) mean-square consistent if

MSE_θ(T_n) → 0 as n → ∞  for all θ ∈ Θ.
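The following small simulation (my addition, not in the slides) illustrates definitions (i) and (iii) for the sample mean of a N(θ, σ²) sample: the empirical P_θ(|T_n − θ| > ε) and the empirical MSE both shrink as n grows. The parameter values, ε, and the number of replications are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, sigma, eps, reps = 1.0, 2.0, 0.1, 10_000

for n in (10, 100, 1000):
    xbar = rng.normal(theta, sigma, size=(reps, n)).mean(axis=1)   # replications of X-bar_n
    tail = np.mean(np.abs(xbar - theta) > eps)                     # empirical P(|T_n - theta| > eps)
    mse = np.mean((xbar - theta) ** 2)                             # empirical MSE_theta(T_n)
    print(n, tail, round(mse, 5))
```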

Example 5.38
For a random sample of size n from a N(θ, σ²) population, the sample mean X̄_n is proposed as an estimator of θ. In this example we will show that X̄_n is a consistent estimator of θ.
We know that the exact distribution of X̄_n is N(θ, σ²/n). After standardisation, this implies that

Z = √n (X̄ − θ) / σ

is standard normally distributed (Z ∼ N(0, 1)).

If Φ denotes the cdf of the standard normal distribution then we have
P_θ(|X̄ − θ| > ε) = 2 P_θ( √n (X̄ − θ)/σ > √n ε/σ ) = 2 P( Z > √n ε/σ ) = 2 (1 − Φ(√n ε/σ)).

Since clearly Φ(√n ε/σ) → 1 as n → ∞, we get the consistency statement P_θ(|X̄ − θ| > ε) → 0. Hence X̄_n →^P θ, and X̄_n is consistent for θ.
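Since the tail probability here has the exact form 2(1 − Φ(√n ε/σ)), it can also be evaluated directly; the sketch below (my addition) does so for a few sample sizes with arbitrary ε and σ.

```python
import numpy as np
from scipy.stats import norm

eps, sigma = 0.1, 1.0
for n in (10, 100, 1000, 10_000):
    p = 2 * (1 - norm.cdf(np.sqrt(n) * eps / sigma))   # exact P_theta(|X-bar - theta| > eps)
    print(n, p)
```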

Remark 5.13
It is important to note that if an estimator is mean-square consistent then it is also consistent. This relation probably has the most important practical consequences.
The reason is that, most often, we are interested in weak consistency, and a common method that often works for proving it is to show mean-square consistency first.
Note also that strong consistency implies weak consistency.

To justify the relation between mean-square consistency and consistency we can use the Chebyshev Inequality. It states that for any random variable X and any ε > 0 it holds for the k-th moment:
P(|X| > ε) ≤ E(|X|^k) / ε^k.
Applying this inequality with X being T_n − θ and k = 2, we get

0 ≤ P_θ(|T_n − θ| > ε) ≤ MSE_θ(T_n) / ε².
Therefore, if an estimator T_n is mean-square consistent, the right-hand side tends to zero, so the left-hand side also tends to zero, implying consistency.

Example 5.39
Let X ∼ Uniform(0, 1) with pdf f(x) = 1 for 0 < x < 1. Consider the sample mean X̄_n as an estimator of a point θ in the parameter space. Since E[X̄_n] = 1/2 and Var(X̄_n) = 1/(12n), the mean squared error is MSE_θ(X̄_n) = 1/(12n) + (1/2 − θ)², and the Chebyshev bound above gives

lim_{n→∞} P_θ(|X̄_n − θ| > ε) ≤ lim_{n→∞} MSE_θ(X̄_n)/ε² = (θ² − θ + 1/4)/ε².

It follows from the definition of convergence in probability that
X̄_n →^P 1/2
and, due to uniqueness of the limit, X̄_n cannot converge in probability to any other point in the parameter space. Thus X̄_n is a consistent estimator of θ_0 = 1/2.
Additionally, we have E[X] = 1/2. Thus the sample mean X̄_n is a consistent estimator of the population mean.
The above example is suggestive of a more general result encapsulated in a set of theorems known as the Law of Large Numbers.

Let {X_n}_{n=1}^∞ be a sequence of mutually independent and identically distributed random variables with finite mean μ. Then the sample mean X̄_n = (1/n) ∑_{i=1}^n X_i converges in probability to μ as n → ∞.
There is one more form of convergence of random variables. It is called convergence in distribution and is the weakest form of convergence.
It is implied by any of the three convergences discussed above. Not surprisingly, it is called weak convergence (or convergence in distribution).

Assume that the random variables X_1, X_2, …, X_n, … in the sequence have cumulative distribution functions F_1, F_2, …, F_n, …, respectively. Assume the continuous random variable X has a cdf F and that it holds for each argument x ∈ R that

lim_{n→∞} F_n(x) = F(x).
Then we say that the sequence of random variables {X_n}, n = 1, 2, …, converges weakly (or in distribution) to X and denote this fact by
X_n →^d X.
In practice, however, the distribution of a sequence of random variables is often not available. In this case, if the variables in the sequence are used to form sums and averages, the limiting distribution can often be derived by applying the Central Limit Theorem.

Theorem 5.19 (Central Limit Theorem)
Let the random variables in the sequence {X_n}_{n=1}^∞ be independent and identically distributed, each with finite mean μ and finite variance σ². Then the random variable
(S_n − nμ) / (√n σ) = √n (X̄ − μ) / σ

converges in distribution to a random variable Z ∼ N(0, 1). Here S_n = ∑_{i=1}^n X_i = n X̄.
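A minimal simulation sketch of the theorem (my addition, not part of the slides): standardised means of i.i.d. Exponential(1) draws, which have μ = σ = 1, are compared with the standard normal cdf at a few points. The distribution, the sample size n, the evaluation points, and the number of replications are arbitrary choices.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
mu, sigma, n, reps = 1.0, 1.0, 200, 50_000

# reps replications of the standardised statistic sqrt(n) * (X-bar - mu) / sigma
z = np.sqrt(n) * (rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1) - mu) / sigma
for t in (-1.0, 0.0, 1.0, 2.0):
    print(t, np.mean(z <= t), norm.cdf(t))   # empirical cdf of Z_n versus Phi(t)
```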

Example 5.40
Let X ∼ Uniform(0, 1); the Uniform distribution on the interval (0, 1) has pdf

f(x) = 1 for 0 < x < 1.