ECONOMETRICS I ECON GR5411
Lecture 21 – MLE
by Seyhan Erden, Columbia University, MA in Economics
Maximum Likelihood Estimation:
The probability density function (pdf) of a random variable $y$, conditioned on a set of parameters $\theta$, is denoted $f(y \mid \theta)$. This function identifies the data-generating process that underlies an observed sample of data and, at the same time, provides a mathematical description of the data that the process will produce.
The joint density of $n$ independent and identically distributed (i.i.d.) observations from this process is the product of the individual densities,
$$f(y_1,\dots,y_n \mid \theta) = \prod_{i=1}^{n} f(y_i \mid \theta) = L(\theta \mid y)$$
This joint density is the likelihood function.
Note that we write the joint density as a function of the data conditioned on the parameters, $f(y \mid \theta)$, whereas we write the likelihood function in reverse, as a function of the parameters conditioned on the data, $L(\theta \mid y)$. Reason?
Reason?
The two functions, the joint density and the likelihood function, are the same, but we write the likelihood as a function of the parameters conditioned on the data, $L(\theta \mid y)$, in order to emphasize our interest in the parameters and the information about them that is contained in the observed data.
It is simpler to work with the log of the likelihood function,
$$\ln L(\theta \mid y) = \sum_{i=1}^{n} \ln f(y_i \mid \theta)$$
Again, to emphasize our interest in the parameters given the observed data, we write $L(\theta \mid \text{data}) = L(\theta \mid y)$ for the likelihood function and its logarithm, evaluated at $\theta$.
Sometimes this is denoted simply $L(\theta)$ and $\ln L(\theta)$, respectively, or, when the context is clear, just $L$ and $\ln L$.
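To make these definitions concrete, here is a minimal Python sketch (assuming NumPy and SciPy are available; the sample and the choice of a Poisson density are illustrative and anticipate Example 1 below) that evaluates the i.i.d. log-likelihood $\ln L(\theta \mid y) = \sum_i \ln f(y_i \mid \theta)$ at a candidate parameter value.

```python
import numpy as np
from scipy import stats

def log_likelihood(theta, y, log_pdf):
    """i.i.d. log-likelihood: sum of log densities evaluated at theta."""
    return np.sum(log_pdf(y, theta))

# Illustrative data and density (Poisson), anticipating Example 1.
y = np.array([5, 0, 1, 1, 0, 3, 2, 3, 4, 1])
poisson_logpmf = lambda y, theta: stats.poisson.logpmf(y, mu=theta)

print(log_likelihood(2.0, y, poisson_logpmf))   # ln L at theta = 2
print(log_likelihood(1.0, y, poisson_logpmf))   # smaller: theta = 1 fits the sample worse
```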
Example 1: Poisson Distribution
The probability density function of the Poisson distribution is
$$f(y_i \mid \theta) = \frac{e^{-\theta}\theta^{y_i}}{y_i!}$$
Consider a random sample of the following 10 observations from a Poisson distribution: 5, 0, 1, 1, 0, 3, 2, 3, 4, 1.
Because the observations are independent, their joint density, which is the likelihood function for this sample, is
$$f(y_1, y_2, \dots, y_{10} \mid \theta) = \prod_{i=1}^{10} f(y_i \mid \theta) = \frac{e^{-10\theta}\theta^{\sum_i y_i}}{\prod_{i=1}^{10} y_i!} = \frac{e^{-10\theta}\theta^{20}}{207{,}360}$$
Then the log likelihood is
$$\ln L(\theta \mid y) = -n\theta + \ln\theta \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} \ln y_i!$$
We maximize with respect to $\theta$. The F.O.C. is
$$\frac{\partial \ln L(\theta \mid y)}{\partial\theta} = -n + \frac{1}{\theta}\sum_{i=1}^{n} y_i = 0$$
$$-n + \frac{1}{\theta}\sum_{i=1}^{n} y_i = 0 \quad\Longrightarrow\quad \hat\theta_{ML} = \bar y$$
For the random sample of 10 observations from the Poisson distribution, 5, 0, 1, 1, 0, 3, 2, 3, 4, 1, the joint density is
$$f(y_1, \dots, y_n \mid \theta) = L(\theta \mid y) = \frac{e^{-n\theta}\theta^{\sum_i y_i}}{\prod_{i=1}^{n} y_i!}$$
$$\ln L(\theta \mid y) = -n\theta + \ln\theta \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} \ln y_i! = -10\theta + 20\ln\theta - 12.242$$
The F.O.C. is
$$\frac{\partial\ln L(\theta \mid y)}{\partial\theta} = -10 + \frac{20}{\theta} = 0 \quad\Longrightarrow\quad \hat\theta_{ML} = 2$$
The S.O.C. is
$$\frac{d^2\ln L(\theta \mid y)}{d\theta^2} = -\frac{20}{\theta^2} < 0 \quad\Longrightarrow\quad \text{this is a maximum}$$
Lectures 21 - GR5411 by Seyhan Erden 8
Example 2: Normal Distribution
The probability density function (pdf) of a normally distributed random variable is
$$f(y_i \mid \theta) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left[-\frac{(y_i-\mu)^2}{2\sigma^2}\right]$$
The joint pdf can then be written as
$$f(y_1, y_2, \dots, y_n \mid \theta) = \prod_{i=1}^{n} f(y_i \mid \theta) = (2\pi\sigma^2)^{-n/2}\exp\!\left[-\frac{\sum_{i=1}^{n}(y_i-\mu)^2}{2\sigma^2}\right]$$
The log-likelihood is
$$\ln L(\theta \mid y) = \sum_{i=1}^{n}\ln f(y_i \mid \theta) = -\frac{1}{2}\sum_{i=1}^{n}\left[\ln\sigma^2 + \ln 2\pi + \frac{(y_i-\mu)^2}{\sigma^2}\right]$$
$$= -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln\sigma^2 - \frac{\sum_{i=1}^{n}(y_i-\mu)^2}{2\sigma^2}$$
We maximize with respect to $\theta$ (the parameters are $\mu$ and $\sigma^2$ here). The F.O.C.s are
$$\frac{\partial\ln L(\theta \mid y)}{\partial\mu} = 0 \qquad\text{and}\qquad \frac{\partial\ln L(\theta \mid y)}{\partial\sigma^2} = 0$$
$$\frac{\partial\ln L(\theta \mid y)}{\partial\mu} = \frac{1}{2\sigma^2}\sum 2(y_i-\mu) = \frac{\sum(y_i-\mu)}{\sigma^2} = 0 \qquad (1)$$
$$\frac{\partial\ln L(\theta \mid y)}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{\sum(y_i-\mu)^2}{2(\sigma^2)^2} = 0 \qquad (2)$$
From (1),
$$\frac{\sum(y_i-\mu)}{\sigma^2} = 0 \;\Longrightarrow\; \sum(y_i-\mu) = 0 \;\Longrightarrow\; \sum y_i - n\mu = 0 \;\Longrightarrow\; \hat\mu_{ML} = \frac{\sum y_i}{n}$$
From (2),
$$-\frac{n}{2\sigma^2} + \frac{\sum(y_i-\mu)^2}{2(\sigma^2)^2} = 0 \;\Longrightarrow\; -n + \frac{\sum(y_i-\mu)^2}{\sigma^2} = 0 \;\Longrightarrow\; \hat\sigma^2_{ML} = \frac{\sum(y_i-\hat\mu_{ML})^2}{n}$$
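A minimal numerical sketch of Example 2 (assuming NumPy and SciPy; the simulated data are purely illustrative): the closed-form ML estimates $\hat\mu_{ML} = \bar y$ and $\hat\sigma^2_{ML} = \frac{1}{n}\sum(y_i-\bar y)^2$ can be checked against a generic numerical optimizer.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
y = rng.normal(loc=1.5, scale=2.0, size=500)   # illustrative sample

def neg_log_lik(params):
    mu, sigma2 = params
    return 0.5 * np.sum(np.log(2 * np.pi * sigma2) + (y - mu) ** 2 / sigma2)

res = minimize(neg_log_lik, x0=[0.0, 1.0], bounds=[(None, None), (1e-8, None)])
print(res.x)                          # numerical MLE of (mu, sigma^2)
print(y.mean(), y.var(ddof=0))        # closed form: sample mean and 1/n variance
```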
Example 3: Exponential Distribution
Let $x_i$ be i.i.d. for $i = 1, 2, \dots, n$ with $f(x_i) = \theta e^{-\theta x_i}$, with restrictions $\Omega = \{\theta \mid \theta > 0\}$ and $x > 0$. Find the maximum likelihood estimator of $\theta$.
The joint density is
$$f(x_1, x_2, \dots, x_n \mid \theta) = \theta^n e^{-\theta\sum_i x_i}$$
which is also the likelihood function,
$$L(\theta \mid x) = \theta^n e^{-\theta\sum_i x_i}$$
The log-likelihood function is
$$\ln L(\theta \mid x) = n\ln\theta - \theta\sum_i x_i$$
F.O.C.:
$$\frac{\partial\ln L(\theta \mid x)}{\partial\theta} = \frac{n}{\theta} - \sum_i x_i = 0$$
$$\frac{n}{\theta} = \sum_i x_i \;\Longrightarrow\; \theta = \frac{n}{\sum_i x_i} \;\Longrightarrow\; \hat\theta_{ML} = \frac{1}{\bar x}$$
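A short check of Example 3 (a sketch assuming NumPy; the rate and sample size below are illustrative): the MLE of the exponential rate is the reciprocal of the sample mean, $\hat\theta_{ML} = 1/\bar x$.

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true = 0.5
x = rng.exponential(scale=1.0 / theta_true, size=1000)   # f(x) = theta * exp(-theta * x)

theta_hat = 1.0 / x.mean()      # MLE: reciprocal of the sample mean
print(theta_hat)                # close to 0.5 in large samples
```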
Properties of MLE:
MLEs are most attractive because of their large-sample, or asymptotic, properties.
The MLE has the following properties:
M1. Consistency: $\operatorname{plim}\,\hat\theta = \theta_0$
M2. Asymptotic normality: $\hat\theta \overset{a}{\sim} N\!\left[\theta_0, \{I(\theta_0)\}^{-1}\right]$, where
$$I(\theta_0) = -E_0\!\left[\frac{\partial^2\ln L}{\partial\theta_0\,\partial\theta_0'}\right]$$
(a small simulation sketch after this list illustrates M1 and M2)
Properties of MLE:
M3. Asymptotic efficiency: $\hat\theta$ is asymptotically efficient and achieves the Cramer-Rao lower bound for consistent estimators, given in M2 and Theorem C2 (the Cramer-Rao lower bound theorem).
M4. Invariance: The maximum likelihood estimator of $\gamma_0 = c(\theta_0)$ is $c(\hat\theta)$ if $c(\theta_0)$ is a continuous and continuously differentiable function.
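A small simulation sketch of M1 and M2 for the exponential model of Example 3 (my own illustration, assuming NumPy; the true rate, sample sizes, and replication count are arbitrary): as $n$ grows, $\hat\theta = 1/\bar x$ concentrates around $\theta_0$ and its sampling variance approaches $[I(\theta_0)]^{-1} = \theta_0^2/n$.

```python
import numpy as np

rng = np.random.default_rng(2)
theta0, reps = 0.5, 5000

for n in (10, 100, 1000):
    x = rng.exponential(scale=1.0 / theta0, size=(reps, n))
    theta_hat = 1.0 / x.mean(axis=1)              # MLE in each simulated sample
    print(n, theta_hat.mean(), theta_hat.var())   # mean -> theta0, variance -> theta0**2 / n
```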
Cramer Rao Lower Bound Theorem:
Minimum variance unbiased estimator.
No matter what estimator you use, it will not have a smaller variance than the CRLB.
No unbiased estimator has a smaller variance than
$$\frac{1}{-E\!\left[\dfrac{\partial^2\ln L(\theta)}{\partial\theta^2}\right]}$$
The quantity in the denominator is known as Fisher's information number, $\boldsymbol{I(\theta)}$.
Cramer Rao Lower Bound Theorem:
We will also show that the negative of the expected second derivative equals the expected square of the first derivative. Hence the Cramer-Rao lower bound states that the asymptotic variance of a consistent and asymptotically normal estimator of a parameter $\theta$ will always be at least as large as
$$[I(\theta)]^{-1} = \left\{-E\!\left[\frac{\partial^2\ln L(\theta)}{\partial\theta^2}\right]\right\}^{-1} = \left\{E\!\left[\left(\frac{\partial\ln L(\theta)}{\partial\theta}\right)^{2}\right]\right\}^{-1}$$
To achieve this result, we need three regularity conditions on the derivatives of the log-likelihood function of the distribution at hand (see the regularity conditions listed below).
Example for Cramer-Rao Lower Bound
In sampling from an exponential distribution, the log-likelihood function was
$$\ln L(\theta \mid x) = n\ln\theta - \theta\sum_{i=1}^{n} x_i$$
The first and second derivatives are
$$\frac{\partial\ln L(\theta \mid x)}{\partial\theta} = \frac{n}{\theta} - \sum_{i=1}^{n} x_i, \qquad \frac{\partial^2\ln L(\theta)}{\partial\theta^2} = -\frac{n}{\theta^2}$$
From
$$\frac{\partial^2\ln L(\theta)}{\partial\theta^2} = -\frac{n}{\theta^2},$$
the variance bound is
$$[I(\theta)]^{-1} = \left\{-E\!\left[\frac{\partial^2\ln L(\theta)}{\partial\theta^2}\right]\right\}^{-1} = \frac{\theta^2}{n}$$
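Because the second derivative in this example does not involve the data, the bound can also be checked symbolically. A minimal SymPy sketch (SymPy assumed available; `S` is my placeholder symbol for $\sum_i x_i$):

```python
import sympy as sp

theta, n, S = sp.symbols("theta n S", positive=True)   # S stands for sum of x_i
logL = n * sp.log(theta) - theta * S                    # exponential log-likelihood

d2 = sp.diff(logL, theta, 2)
print(d2)                    # -n/theta**2 (nonrandom, so no expectation is needed)
print(sp.simplify(-1 / d2))  # theta**2/n, the Cramer-Rao variance bound
```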
In many situations, the second derivative is a random variable with a distribution of its own. The following example shows such a case:
Another Example for Cramer-Rao Lower Bound
Variance bound for the Poisson distribution. The probability density function of the Poisson distribution is
$$f(y_i \mid \theta) = \frac{e^{-\theta}\theta^{y_i}}{y_i!}$$
so that
$$L(\theta \mid y) = \frac{e^{-n\theta}\theta^{\sum_i y_i}}{\prod_{i=1}^{n} y_i!}, \qquad \ln L(\theta \mid y) = -n\theta + \ln\theta\sum_{i=1}^{n} y_i - \sum_{i=1}^{n}\ln y_i!$$
$$\frac{\partial\ln L(\theta \mid y)}{\partial\theta} = -n + \frac{1}{\theta}\sum_{i=1}^{n} y_i$$
Then
$$\frac{\partial^2\ln L(\theta)}{\partial\theta^2} = -\frac{1}{\theta^2}\sum_{i=1}^{n} y_i$$
The sum of $n$ identical Poisson variables has a Poisson distribution with parameter equal to $n$ times the parameter of the individual variables. Because of this,
$$E\!\left[\sum_{i=1}^{n} y_i\right] = \sum_{i=1}^{n} E(y_i) = nE(y_i) = n\theta$$
Then, the variance bound of the Poisson distribution is
$$[I(\theta)]^{-1} = \left\{-E\!\left[\frac{\partial^2\ln L(\theta)}{\partial\theta^2}\right]\right\}^{-1} = \left\{-\left(-\frac{1}{\theta^2}\sum_{i=1}^{n}E(y_i)\right)\right\}^{-1} = \frac{\theta^2}{n\theta} = \frac{\theta}{n}$$
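A simulation sketch of this last result (my own illustration, assuming NumPy; the parameter value, sample size, and replication count are arbitrary): for the Poisson model the MLE is $\hat\theta = \bar y$, and its sampling variance is essentially the bound $\theta/n$.

```python
import numpy as np

rng = np.random.default_rng(3)
theta0, n, reps = 2.0, 50, 20000

y = rng.poisson(lam=theta0, size=(reps, n))
theta_hat = y.mean(axis=1)                 # MLE in each replication

print(theta_hat.var())                     # simulated Var(theta_hat)
print(theta0 / n)                          # Cramer-Rao bound theta/n = 0.04
```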
Multivariate case Cramer Rao Lower Bound:
If $\theta$ is a vector of parameters and $I(\theta)$ is the information matrix, the Cramer-Rao theorem states that the difference between the covariance matrix of any unbiased estimator and the inverse of the information matrix,
$$[I(\theta)]^{-1} = \left\{-E\!\left[\frac{\partial^2\ln L(\theta)}{\partial\theta\,\partial\theta'}\right]\right\}^{-1} = \left\{E\!\left[\frac{\partial\ln L(\theta)}{\partial\theta}\,\frac{\partial\ln L(\theta)}{\partial\theta'}\right]\right\}^{-1},$$
will be a nonnegative definite matrix. Note that there is no negative sign in the last term (why? see the Information Matrix Equality below).
About the efficiency rule and CRLB:
If an estimator attains the Cramer Rao lower bound, then that estimator is efficient.
If a given estimator does not attain the CR variance bound then we do not know whether this estimator is efficient or not.
However, as we said, to achieve the result given by the Cramer-Rao lower bound we need three regularity conditions: (i) the density function has to be twice differentiable; (ii) the sample space does not depend on the parameters; (iii) we can interchange differentiation and integration when required.
Moments of the Derivatives of the log likelihood:
D1. $\ln f(y_i \mid \theta)$, $g_i = \dfrac{\partial\ln f(y_i \mid \theta)}{\partial\theta}$, and $H_i = \dfrac{\partial^2\ln f(y_i \mid \theta)}{\partial\theta\,\partial\theta'}$, $i = 1, \dots, n$, are all random samples of random variables. The notation $g_i(\theta_0)$ and $H_i(\theta_0)$ indicates the derivatives evaluated at $\theta_0$.
D2. $E[g_i(\theta_0)] = 0$.
D3. $\operatorname{Var}[g_i(\theta_0)] = -E[H_i(\theta_0)]$.
Score and Information Matrix in multivariate case:
From $\ln f(y_i \mid \theta)$, define for $i = 1, \dots, n$
the score vector (the first-derivative vector):
$$g(\theta) = \frac{\partial\ln L(\theta \mid y)}{\partial\theta} = \sum_{i=1}^{n}\frac{\partial\ln f(y_i \mid \theta)}{\partial\theta} = \sum_{i=1}^{n} g_i(\theta)$$
where
$$g_i = \frac{\partial\ln f(y_i \mid \theta)}{\partial\theta}$$
Score and Information Matrix in multivariate case:
Because we are adding terms, it follows, from D1 and D2, that at $\theta_0$
$$E\!\left[\frac{\partial\ln L(\theta_0 \mid y)}{\partial\theta_0}\right] = E[g(\theta_0)] = \mathbf{0}$$
This is known as the likelihood equation.
The Information Matrix equality:
The Hessian of the log-likelihood is
$$H(\theta) = \frac{\partial^2\ln L(\theta \mid y)}{\partial\theta\,\partial\theta'} = \sum_{i=1}^{n}\frac{\partial^2\ln f(y_i \mid \theta)}{\partial\theta\,\partial\theta'} = \sum_{i=1}^{n} H_i(\theta)$$
Evaluating at $\theta_0$,
$$E\!\left[g(\theta_0)\,g(\theta_0)'\right] = E[g_0 g_0'] = E\!\left[\sum_{i=1}^{n}\sum_{j=1}^{n} g_{0i}\,g_{0j}'\right]$$
Because of D1, we can drop terms with unequal subscripts.
The Information Matrix equality:
Because of D1, we can drop terms with unequal subscripts, and we obtain
$$E\!\left[g(\theta_0)\,g(\theta_0)'\right] = E\!\left[\sum_{i=1}^{n} g_{0i}\,g_{0i}'\right] = E\!\left[\sum_{i=1}^{n}(-H_{0i})\right] = -E[H_0]$$
So that,
Information matrix: $E\!\left[g(\theta_0)\,g(\theta_0)'\right] = -E[H_0]$,
i.e.,
$$\operatorname{Var}\!\left[\frac{\partial\ln L(\theta_0 \mid y)}{\partial\theta_0}\right] = E\!\left[\frac{\partial\ln L(\theta_0 \mid y)}{\partial\theta_0}\,\frac{\partial\ln L(\theta_0 \mid y)}{\partial\theta_0'}\right] = -E\!\left[\frac{\partial^2\ln L(\theta_0 \mid y)}{\partial\theta_0\,\partial\theta_0'}\right]$$
The Information Matrix equality:
Let $y_1, y_2, \dots, y_n$ be i.i.d. with density $f(y_i \mid \theta)$.
(Regularity conditions: the density function has to be twice differentiable; the sample space does not depend on the parameters (for example, the CRLB does not work with the uniform distribution because the sample space depends on the parameters); and we can interchange differentiation and integration when required.)
$$\int\!\!\int\!\cdots\!\int f(y_1, y_2, \dots, y_n \mid \theta)\, dy_1 \cdots dy_n = 1, \qquad \text{i.e.,} \qquad \int L(\theta \mid y)\, dy = 1$$
Differentiating both sides,
$$\int \frac{\partial L(\theta \mid y)}{\partial\theta}\,dy = 0,$$
or
$$\int \frac{\partial\ln L(\theta \mid y)}{\partial\theta}\,L(\theta \mid y)\,dy = 0.$$
Note that
$$\frac{\partial\ln f(\theta)}{\partial\theta} = \frac{1}{f(\theta)}\,f'(\theta) \;\Longrightarrow\; f'(\theta) = \frac{\partial\ln f(\theta)}{\partial\theta}\,f(\theta)$$
Differentiating again (using the product rule),
$$\int\left[\frac{\partial^2\ln L(\theta \mid y)}{\partial\theta\,\partial\theta'}\,L(\theta \mid y) + \frac{\partial\ln L(\theta \mid y)}{\partial\theta}\,\frac{\partial L(\theta \mid y)}{\partial\theta'}\right]dy = 0$$
The integral of a sum is the sum of the integrals. Hence,
$$\int\frac{\partial^2\ln L(\theta \mid y)}{\partial\theta\,\partial\theta'}\,L(\theta \mid y)\,dy + \int\frac{\partial\ln L(\theta \mid y)}{\partial\theta}\,\frac{\partial L(\theta \mid y)}{\partial\theta'}\,dy = 0$$
or
$$\int\frac{\partial\ln L(\theta \mid y)}{\partial\theta}\,\frac{\partial\ln L(\theta \mid y)}{\partial\theta'}\,L(\theta \mid y)\,dy = -\int\frac{\partial^2\ln L(\theta \mid y)}{\partial\theta\,\partial\theta'}\,L(\theta \mid y)\,dy$$
Rewriting the last line from the previous slide,
$$\int\frac{\partial\ln L(\theta \mid y)}{\partial\theta}\,\frac{\partial\ln L(\theta \mid y)}{\partial\theta'}\,L(\theta \mid y)\,dy = -\int\frac{\partial^2\ln L(\theta \mid y)}{\partial\theta\,\partial\theta'}\,L(\theta \mid y)\,dy$$
or
$$\operatorname{Var}\!\left[\frac{\partial\ln L(\theta \mid y)}{\partial\theta}\right] = E\!\left[\frac{\partial\ln L(\theta \mid y)}{\partial\theta}\,\frac{\partial\ln L(\theta \mid y)}{\partial\theta'}\right] = -E\!\left[\frac{\partial^2\ln L(\theta \mid y)}{\partial\theta\,\partial\theta'}\right]$$
$$\operatorname{Var}(\text{score}) = E(\text{score}^2) = -E\!\left[\frac{\partial^2\ln L(\theta \mid y)}{\partial\theta\,\partial\theta'}\right]$$
Fundamental identity of maximum likelihood theory (Information Matrix Equality):
$$\operatorname{Var}(\text{score}) = E(\text{score}^2) = -E\!\left[\frac{\partial^2\ln L(\theta \mid y)}{\partial\theta\,\partial\theta'}\right]$$
This is known as the Information Matrix Equality, and it states that the variance of the first derivative of $\ln L$ equals the expected value of its square, which in turn equals the negative of the expected value of the second derivative.
The last term is the information matrix $I(\theta)$: the negative of the expected value of the second derivative of the log-likelihood function.
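A Monte Carlo sketch of the information matrix equality (my own illustration, assuming NumPy), using the Poisson model of Example 1: across simulated samples, the variance of the score, the mean of its square, and the negative of the mean second derivative all estimate the same quantity, $n/\theta_0$.

```python
import numpy as np

rng = np.random.default_rng(4)
theta0, n, reps = 2.0, 30, 50000

y = rng.poisson(lam=theta0, size=(reps, n))
score   = -n + y.sum(axis=1) / theta0          # d lnL / d theta, evaluated at theta0
hessian = -y.sum(axis=1) / theta0**2           # d2 lnL / d theta2, evaluated at theta0

print(score.mean())        # approximately 0        (likelihood equation)
print(score.var())         # approximately n/theta0 = 15
print((score**2).mean())   # approximately n/theta0
print(-hessian.mean())     # approximately n/theta0 (information matrix equality)
```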
How do we estimate the variance?
The MLE $\hat\theta$ is asymptotically efficient and achieves the Cramer-Rao lower bound, and its asymptotic variance can be estimated by
$$[\hat I(\hat\theta)]^{-1} = \left\{-E\!\left[\frac{\partial^2\ln L(\hat\theta)}{\partial\hat\theta\,\partial\hat\theta'}\right]\right\}^{-1} = \left[\sum_{i=1}^{n}\hat g_i\,\hat g_i'\right]^{-1}$$
The estimator is computed simply by evaluating the actual (not expected) second-derivatives matrix of the log-likelihood function at the maximum likelihood estimates.
Recall that $g_i = \dfrac{\partial\ln f(y_i \mid \theta)}{\partial\theta}$.
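A sketch of both variance estimators for the Poisson example (my own illustration, assuming NumPy): the inverse of the negative Hessian evaluated at $\hat\theta$, and the outer-product-of-gradients (BHHH) form built from the individual scores $\hat g_i$.

```python
import numpy as np

y = np.array([5, 0, 1, 1, 0, 3, 2, 3, 4, 1])
n = len(y)
theta_hat = y.mean()                          # Poisson MLE from Example 1

# (i) inverse of the negative (actual) Hessian at theta_hat
hessian = -y.sum() / theta_hat**2
var_hessian = 1.0 / (-hessian)                # equals theta_hat / n here

# (ii) BHHH: inverse of the sum of squared individual scores
g_i = -1.0 + y / theta_hat                    # score contribution of each observation
var_bhhh = 1.0 / np.sum(g_i**2)

print(var_hessian, var_bhhh)                  # 0.2 and a similar, though not identical, value
```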
Hypothesis Testing:
$$H_0: \; c(\theta) = 0$$
The three tests below are asymptotically equivalent under $H_0$ (however, they behave differently in finite samples).
1. Likelihood ratio test: if the restriction is valid, then imposing it should not cause a large reduction in $\ln L$; thus $\ln L_U \approx \ln L_R$ (a numerical sketch follows this list).
2. Wald test: if the restriction is valid, $c(\hat\theta_U)$ should be close to zero, since $\hat\theta_U \xrightarrow{p} \theta$ (consistency of the MLE). Reject $H_0$ if it is significantly different from zero.
3. Lagrange multiplier test: if the restriction is valid, the restricted estimator should be near the point that maximizes the log-likelihood; thus the slope of the log-likelihood evaluated at $\hat\theta_R$ should be close to zero.
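As an illustration of the first test (my own sketch, assuming NumPy and SciPy; the null value $\theta = 3$ is only for illustration), consider testing $H_0: \theta = 3$ in the Poisson example. The LR statistic $-2(\ln L_R - \ln L_U)$ is compared with a $\chi^2_1$ critical value.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import chi2

y = np.array([5, 0, 1, 1, 0, 3, 2, 3, 4, 1])
n = len(y)

def log_lik(theta):
    return -n * theta + np.log(theta) * y.sum() - gammaln(y + 1).sum()

theta_u = y.mean()          # unrestricted MLE
theta_r = 3.0               # value imposed by the (illustrative) restriction

lr_stat = -2.0 * (log_lik(theta_r) - log_lik(theta_u))
print(lr_stat, chi2.ppf(0.95, df=1))   # reject H0 if lr_stat exceeds the critical value
```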
The Likelihood Ratio Test:
$\theta$ is the vector of parameters to be estimated.
$H_0$ specifies some sort of restrictions on these parameters.
$\hat\theta_U$ is the unconstrained ML estimator of $\theta$ (ignoring the restrictions).
$\hat\theta_R$ is the constrained (restricted) MLE.
$\hat L_U$ is the likelihood evaluated at $\hat\theta_U$, and $\hat L_R$ is the likelihood evaluated at $\hat\theta_R$.
Then the likelihood ratio is
$$\lambda = \frac{\hat L_R}{\hat L_U}$$
The Likelihood Ratio Test:
It must be true that $0 < \lambda < 1$.