Economics 430
Likelihood Inference
Today’s Class
• Introduction
• The Likelihood Function
• Maximum Likelihood Estimation
• Inferences Based on the MLE
– Confidence Intervals
– Tests of Hypotheses
• Distribution-Free Methods
– Method of Moments
– Bootstrapping
Introduction
• A point estimate is a reasonable value of a population parameter.
• X1, X2,…, Xn are random variables.
• Functions of these random variables (e.g., s²) are also random variables, called statistics.
• Statistics have their own unique distributions, which are called sampling distributions.
• A point estimate of some population parameter 𝜃 is a single numerical value 𝜃̂ of a statistic. The statistic Θ̂ is called the point estimator.
Introduction
Wish List for a Good Estimate
• Unbiased
• Efficient
• Consistent
• Small Variance
The Likelihood Function
Introduction
• Example: Estimation by guessing: Suppose an
urn contains 1 million marbles, a fraction of
which are blue.
– Denote the unknown fraction of blue marbles by 𝜋.
– We draw 3 marbles at random from the urn and obtain: green(G), blue(B), green(G)
• Q: What is the probability of this (G B G) sequence?
The Likelihood Function
Introduction
• A: We can guess how likely the sequence G B G is for different values of 𝜋. Let L denote the probability of observing the sequence G B G, then
• If 𝜋 = 0.2 → L = 0.8 × 0.2 × 0.8 = 0.128
• If 𝜋 = 0.3 → L = 0.7 × 0.3 × 0.7 = 0.147
• If 𝜋 = 0.4 → L = 0.6 × 0.4 × 0.6 = 0.144
• If 𝜋 = 0.5 → L = 0.5 × 0.5 × 0.5 = 0.125
• If 𝜋 = 0.6 → L = 0.4 × 0.6 × 0.4 = 0.096
• If 𝜋 = 0.7 → L = 0.3 × 0.7 × 0.3 = 0.063
Which one would you choose for 𝜋?
• If 𝜋 = 1/3 → L = 2/3 × 1/3 × 2/3 ≈ 0.148
Note: Likelihood inference about 𝜋 is based on this ordering: values of 𝜋 with higher likelihood are better supported by the data (see the grid sketch below).
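A minimal R sketch of this grid idea (assuming the G B G data above): evaluate L(𝜋) = (1 − 𝜋)𝜋(1 − 𝜋) over a fine grid of candidate values and pick the best-supported one.
# Likelihood of the observed sequence G B G as a function of pi
L <- function(p) (1 - p) * p * (1 - p)
p <- seq(0, 1, by = 0.001)   # grid of candidate values for pi
p[which.max(L(p))]           # ~0.333, i.e., pi-hat = 1/3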
The Likelihood Function
Introduction
• Example: Analytical Estimate: Could we have derived 𝜋 = 1/3 analytically instead of by trial and error?
• Solution: L(𝜋) = (1 − 𝜋)𝜋(1 − 𝜋) = 𝜋(1 − 𝜋)². Using calculus, the max can be obtained by setting
dL/d𝜋 = (1 − 𝜋)² − 2𝜋(1 − 𝜋) = (3𝜋 − 1)(𝜋 − 1) = 0
→ 𝜋 = 1/3 or 𝜋 = 1; since 𝜋 = 1 gives L = 0 (a minimum), the maximum is at 𝜋 = 1/3.
This example illustrates that the MLE is simply the best guess for 𝜋 given the data, and the procedure for obtaining it is the foundation of Maximum Likelihood Estimation.
The Likelihood Function
• Def: The likelihood function L(•|x) is determined by the model and the data (x), where L(𝜃|x) represents the probability of
observing the data (x) given that 𝜃 is true. L(𝜃|x) is referred to as the likelihood of 𝜃.
The Likelihood Function
• Example (1 obs): Let's revisit the original insurance claims problem → Suppose that you work for an insurance company, and historically it has found that the number of claims it receives can be described by a Poisson distribution, but the parameter value (i.e., 𝜃) is unknown.
– Recall that: P(X = x|𝜃) = 𝜃^x e^(−𝜃) / x!, x = 0, 1, 2, …
• They assign you the task of figuring out the probability that 6 claims will be made tomorrow, given that today 4 were received.
• Q: What can we learn from the respective likelihood function?
The Likelihood Function
• We can start by first plugging in the value x = 4 into P(X = 4|𝜃) = (1/24) 𝜃⁴ e^(−𝜃) → L(𝜃) = (1/24) 𝜃⁴ e^(−𝜃)
• There are many values of 𝜃 that we can choose, so instead we can plot L(𝜃):
[Figure: L(𝜃) plotted against 𝜃 for 𝜃 from 0 to 14; the curve peaks near 𝜃 = 4.]
According to the plot, the value 𝜃 = 4 is most likely to produce x = 4.
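A short R sketch that reproduces this plot (the likelihood of the single observation x = 4):
L <- function(theta) theta^4 * exp(-theta) / factorial(4)  # L(theta) for x = 4
theta <- seq(0.01, 14, length = 1000)
plot(theta, L(theta), type = 'l', ylab = "L")  # curve peaks at theta = 4
theta[which.max(L(theta))]                     # ~4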
The Likelihood Function
• Example (2 observations): Same problem as before, but this time we observe X = 4 and X = 2, i.e., x⃗ = (4, 2).
• We need to compute P(X = 4|𝜃) and P(X = 2|𝜃):
→ P(X = 4|𝜃) = 𝜃⁴ e^(−𝜃) / 4! and P(X = 2|𝜃) = 𝜃² e^(−𝜃) / 2!
→ P(X = (4, 2)|𝜃) = P(X = 4|𝜃) P(X = 2|𝜃) = (𝜃⁴ e^(−𝜃) / 4!)(𝜃² e^(−𝜃) / 2!) = 𝜃⁶ e^(−2𝜃) / 48
→ L(𝜃|(4, 2)) = 𝜃⁶ e^(−2𝜃) / 48
Can you guess the value of 𝜃?
The Likelihood Function
According to our result, the value 𝜃 = 3 maximizes L(𝜃).
Q1: How is 𝜃=3 related to our data (4,2)?
Q2: How would you generalize the previous result if you had X =(x1,…, xn) observations?
→ L(𝜃|(x₁, …, xₙ)) = ∏ᵢ₌₁ⁿ 𝜃^(xᵢ) e^(−𝜃) / xᵢ!
[Figure: L(𝜃) plotted against 𝜃 for the data (4, 2); the curve peaks at 𝜃 = 3.]
Maximum Likelihood Estimation (MLE)
Introduction
• MLE consists of finding the parameter value(s) 𝜃 that maximize the likelihood function L(𝜃), given the data.
• Note: Since MLE requires taking derivatives of products, for convenience we often take the log of L(𝜃) and differentiate a sum instead.
• Def: Log-Likelihood = 𝑙(𝜃) = ln(𝐿(𝜃))
Maximum Likelihood Estimation (MLE)
Steps for MLE
1. Find L(𝜃)
2. Compute 𝑙(𝜃) = ln(L(𝜃))
3. Compute S(𝜃|x) = d/d𝜃 [𝑙(𝜃)]
– Note: S(𝜃|x) is known as the Score Function
4. Solve the score equation S(𝜃|x) = 0 for 𝜃
– The solution gives us the maximum likelihood estimator of 𝜃, which we call 𝜃̂
– As a sanity check, we should verify that d²𝑙(𝜃)/d𝜃² < 0 at 𝜃̂ (see the numerical sketch below)
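A minimal numerical version of these steps in R (hypothetical Poisson counts; optimize() maximizes 𝑙(𝜃) directly, which is handy when the score equation has no closed-form solution):
x <- c(4, 2, 5, 3)  # hypothetical observed counts
l <- function(theta) sum(dpois(x, lambda = theta, log = TRUE))  # log-likelihood
optimize(l, interval = c(0.01, 20), maximum = TRUE)$maximum     # ~mean(x) = 3.5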
Maximum Likelihood Estimation (MLE)
• Example: Assume X ~ Poisson(𝜃), find the MLE of 𝜃.
Step 1: Find L(𝜃) → Start with the PMF P(X = x|𝜃) = 𝜃^x e^(−𝜃) / x!
→ L(𝜃) = (𝜃^(x₁) e^(−𝜃) / x₁!) ··· (𝜃^(xₙ) e^(−𝜃) / xₙ!)
= 𝜃^(x₁ + ··· + xₙ) e^(−n𝜃) / (x₁! ··· xₙ!)
= 𝜃^(Σᵢ₌₁ⁿ xᵢ) e^(−n𝜃) / (x₁! ··· xₙ!)
Maximum Likelihood Estimation (MLE)
• Example: Assume X ~ Poisson(𝜃), find the MLE of 𝜃.
Step 2: Compute 𝑙(𝜃) = ln(L(𝜃)) →
ln(L(𝜃)) = ln( 𝜃^(Σᵢ₌₁ⁿ xᵢ) e^(−n𝜃) / (x₁! ··· xₙ!) )
= ln(𝜃) Σᵢ₌₁ⁿ xᵢ − n𝜃 − ln(x₁! ··· xₙ!)
Maximum Likelihood Estimation (MLE)
• Example: Assume X ~ Poisson(𝜃), find the MLE of 𝜃.
Step 3: Compute S(𝜃|x) →
S(𝜃|x) = d/d𝜃 [ ln(𝜃) Σᵢ₌₁ⁿ xᵢ − n𝜃 − ln(x₁! ··· xₙ!) ]
= (1/𝜃) Σᵢ₌₁ⁿ xᵢ − n
Maximum Likelihood Estimation (MLE)
• Example: Assume X ~ Poisson(𝜃), find the MLE of 𝜃.
Step 4: Solve S(𝜃|x) = 0 →
(1/𝜃) Σᵢ₌₁ⁿ xᵢ − n = 0
→ 𝜃 = (1/n) Σᵢ₌₁ⁿ xᵢ
→ 𝜃̂ = x̄
Note: Sanity check → d²𝑙(𝜃)/d𝜃² evaluated at 𝜃̂ equals −n/x̄ < 0
Maximum Likelihood Estimation (MLE)
• Example: Assume X ~ Exp(𝜆), find the MLE of 𝜆.
L(𝜆) = ∏ᵢ₌₁ⁿ 𝜆 e^(−𝜆xᵢ) = 𝜆ⁿ e^(−𝜆 Σᵢ₌₁ⁿ xᵢ)
ln L(𝜆) = n ln(𝜆) − 𝜆 Σᵢ₌₁ⁿ xᵢ
d ln L(𝜆)/d𝜆 = n/𝜆 − Σᵢ₌₁ⁿ xᵢ = 0
→ 𝜆̂ = n / Σᵢ₌₁ⁿ xᵢ = 1/x̄
Maximum Likelihood Estimation (MLE)
• Example: Assume X ~ N(𝜇, 𝜎²), find the MLE of 𝜇 and 𝜎².
L(𝜇, 𝜎²) = ∏ᵢ₌₁ⁿ (1/(𝜎√(2π))) e^(−(xᵢ−𝜇)²/(2𝜎²)) = (2π𝜎²)^(−n/2) e^(−(1/(2𝜎²)) Σᵢ₌₁ⁿ (xᵢ−𝜇)²)
ln L(𝜇, 𝜎²) = −(n/2) ln(2π𝜎²) − (1/(2𝜎²)) Σᵢ₌₁ⁿ (xᵢ−𝜇)²
∂ln L(𝜇, 𝜎²)/∂𝜇 = (1/𝜎²) Σᵢ₌₁ⁿ (xᵢ−𝜇) = 0
∂ln L(𝜇, 𝜎²)/∂𝜎² = −n/(2𝜎²) + (1/(2𝜎⁴)) Σᵢ₌₁ⁿ (xᵢ−𝜇)² = 0
→ 𝜇̂ = x̄ and 𝜎̂² = (1/n) Σᵢ₌₁ⁿ (xᵢ − x̄)²
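A small R sketch (simulated data, so the targets are known) maximizing the Normal log-likelihood over both parameters at once with optim(); the log-sd parameterization keeps 𝜎 positive during the search:
set.seed(1)
x <- rnorm(200, mean = 5, sd = 2)  # simulated sample
negll <- function(par) -sum(dnorm(x, mean = par[1], sd = exp(par[2]), log = TRUE))
fit <- optim(c(0, 0), negll)       # minimize the negative log-likelihood
fit$par[1]                         # ~mean(x)
exp(fit$par[2])                    # ~sqrt(mean((x - mean(x))^2))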
Maximum Likelihood Estimation (MLE)
• Example: Assume X ~ Weibull(𝛿, 𝛽), find the MLE of 𝛿 and 𝛽.
L(𝛿, 𝛽) = ∏ᵢ₌₁ⁿ f(xᵢ) = ∏ᵢ₌₁ⁿ (𝛽/𝛿)(xᵢ/𝛿)^(𝛽−1) e^(−(xᵢ/𝛿)^𝛽)
ln L(𝛿, 𝛽) = n ln(𝛽) − n𝛽 ln(𝛿) + (𝛽−1) Σᵢ₌₁ⁿ ln(xᵢ) − Σᵢ₌₁ⁿ (xᵢ/𝛿)^𝛽
∂ln L(𝛿, 𝛽)/∂𝛿 = −𝛽n/𝛿 + (𝛽/𝛿^(𝛽+1)) Σᵢ₌₁ⁿ xᵢ^𝛽 = 0
∂ln L(𝛿, 𝛽)/∂𝛽 = n/𝛽 − n ln(𝛿) + Σᵢ₌₁ⁿ ln(xᵢ) − Σᵢ₌₁ⁿ (xᵢ/𝛿)^𝛽 ln(xᵢ/𝛿) = 0
Solving the first equation for 𝛿 and substituting into the second:
𝛿̂ = [ (1/n) Σᵢ₌₁ⁿ xᵢ^𝛽 ]^(1/𝛽)
𝛽̂ = [ Σᵢ₌₁ⁿ xᵢ^𝛽 ln(xᵢ) / Σᵢ₌₁ⁿ xᵢ^𝛽 − (1/n) Σᵢ₌₁ⁿ ln(xᵢ) ]^(−1)
Since 𝛽̂ appears on both sides of the last equation, it must be solved numerically.
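A hedged R sketch of that numerical step (simulated data with known parameters), using uniroot() on the profile score for 𝛽:
set.seed(2)
x <- rweibull(500, shape = 2, scale = 3)  # true beta = 2, delta = 3
g <- function(b) sum(x^b * log(x)) / sum(x^b) - 1/b - mean(log(x))  # profile score
beta.hat <- uniroot(g, c(0.1, 10))$root     # ~2
delta.hat <- mean(x^beta.hat)^(1/beta.hat)  # ~3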
Maximum Likelihood Estimation (MLE)
• Example (when it doesn't work): Assume X ~ Gamma(r, 𝜆), find the MLE of r and 𝜆.
ln L(r, 𝜆) = ln( ∏ᵢ₌₁ⁿ 𝜆^r xᵢ^(r−1) e^(−𝜆xᵢ) / Γ(r) )
= nr ln(𝜆) + (r−1) Σᵢ₌₁ⁿ ln(xᵢ) − n ln[Γ(r)] − 𝜆 Σᵢ₌₁ⁿ xᵢ
∂ln L(r, 𝜆)/∂r = n ln(𝜆) + Σᵢ₌₁ⁿ ln(xᵢ) − n Γ′(r)/Γ(r) = 0 (1)
∂ln L(r, 𝜆)/∂𝜆 = nr/𝜆 − Σᵢ₌₁ⁿ xᵢ = 0 → 𝜆̂ = r̂/x̄ (2)
→ There is no closed-form solution: substituting (2) into (1) leaves an equation in r involving Γ′(r)/Γ(r) that must be solved numerically.
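A hedged R sketch of the numerical solution (simulated data): substituting (2) into (1) gives a one-dimensional root-finding problem in r, with Γ′(r)/Γ(r) available in R as digamma():
set.seed(3)
x <- rgamma(500, shape = 2, rate = 0.5)  # true r = 2, lambda = 0.5
h <- function(r) log(r) - log(mean(x)) + mean(log(x)) - digamma(r)  # (1) with lambda = r/xbar
r.hat <- uniroot(h, c(0.01, 50))$root  # ~2
lambda.hat <- r.hat / mean(x)          # ~0.5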
Maximum Likelihood Estimation (MLE)
• Textbook Example: R Code:
L <- function(theta){
# likelihood to be maximized (as given in the text)
return((exp(-(theta-1)^2)/2)+3*exp(-((theta-2)^2)/2))
}
# Generate 1000 equispaced points
theta <- seq(-10, 10, length=1000)
# MLE: the grid point where L is largest
theta[which(L(theta)==max(L(theta)))]  # MLE = 1.87
# Plot:
plot(theta, log(L(theta)), type='l', main="Log Likelihood Function")
Inferences Based on the MLE
Standard Errors, Bias, and Consistency
• Q: Now that we have our MLE of 𝜃, what do we do with it?
• A: Typically we use it to find functions of 𝜃, such as the mean, median, variance, quartiles, etc. We denote them in general as 𝜓(𝜃).
• These functions represent characteristics of the underlying population that we wish to estimate; therefore, we denote such estimates as 𝜓̂(x) = 𝜓(𝜃̂(x)).
Inferences Based on the MLE
Standard Errors, Bias, and Consistency
• Q: How reliable is 𝜓̂(x) = 𝜓(𝜃̂(x))?
• A: We can look at the sampling distribution of 𝜓̂(x) for every value of 𝜃.
• Theorem: Let 𝜃̂ be an MLE of 𝜃, and let g(𝜃) be a function of 𝜃. Then an MLE of g(𝜃) is g(𝜃̂). (This is the invariance property of MLEs.)
• Def: The Mean Squared Error (MSE) of the estimator T(x) of 𝜓(𝜃) is given by MSE_𝜃(T) = E_𝜃[(T − 𝜓(𝜃))²].
• Note: In practice we instead evaluate MSE_𝜃̂(x)(T).
Inferences Based on the MLE
Standard Errors, Bias, and Consistency
• Def: The bias in the estimator T of 𝜓(𝜃) is given by Eθ(T) − 𝜓(𝜃) whenever Eθ(T) exists. When the bias in an estimator T is 0 for every 𝜃, we call T an unbiased estimator of 𝜓, i.e., T is unbiased whenever Eθ(T) = 𝜓(𝜃).
• Note: MSE_𝜃̂(x)(T) = Var_𝜃̂(x)(T) (for unbiased estimators), and the respective standard error is given by SD_𝜃̂(x) = √(Var_𝜃̂(x)(T)).
• Def: A sequence of estimates T₁, T₂, … is said to be consistent (in probability) for 𝜓(𝜃) if Tₙ → 𝜓(𝜃) in probability (P_𝜃) as n → ∞ for every 𝜃 ∈ Ω.
– As we increase the amount of data we collect, the sequence of estimates should converge to the true value of 𝜓(𝜃).
Inferences Based on the MLE
• Let Θ̂ be a point estimator of a parameter 𝜃:
• Consistency: the more data we get, the sequence of estimates should converge to the true value of 𝜓(𝜃).
• Bias of the estimator: E[Θ̂] − 𝜃
• Efficiency: an estimator is efficient if it has the lowest possible variance among all unbiased estimators (its MSE is the lowest).
• Variance of the estimator: V(Θ̂) = E[(Θ̂ − E[Θ̂])²]
• MSE of the estimator: MSE(Θ̂) = E[(Θ̂ − 𝜃)²] = V(Θ̂) + (E[Θ̂] − 𝜃)²
Inferences Based on the MLE
• The MSE is an important criterion for comparing two estimators.
Def: Relative Efficiency = MSE(Θ̂₁) / MSE(Θ̂₂)
• If the relative efficiency is less than 1, we conclude that the 1st estimator is superior to the 2nd estimator.
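For illustration (a simulation sketch, not from the slides): for Normal data, the sample mean beats the sample median as an estimator of 𝜇, with relative efficiency around 0.64 < 1:
set.seed(4)
samples <- replicate(5000, rnorm(25))  # 5000 samples of size n = 25, true mean 0
mse <- function(est) mean((est - 0)^2)
mse(apply(samples, 2, mean)) / mse(apply(samples, 2, median))  # ~0.64 < 1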
Inferences Based on the MLE
• A biased estimator can be preferred to an unbiased estimator if it has a smaller MSE.
• Biased estimators are occasionally used in linear regression.
• An estimator whose MSE is smaller than that of any other estimator is called an optimal estimator.
[Figure: a biased estimator Θ̂₁ with a smaller variance than the unbiased estimator Θ̂₂.]
Inferences Based on the MLE
• Example: The sample mean X̄ is an unbiased estimator of μ.
X is a random variable with mean μ and variance σ². Let X₁, X₂, …, Xₙ be a random sample of size n. Then E[X̄] = (1/n) Σᵢ₌₁ⁿ E[Xᵢ] = μ.
Inferences Based on the MLE
• Example: The sample variance S² = (1/(n−1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)² is an unbiased estimator of σ².
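A quick R simulation sketch (for illustration) checking both claims, and showing that the MLE of the variance (divide by n) is biased low:
set.seed(5)
n <- 10
samples <- replicate(10000, rnorm(n, mean = 10, sd = 2))  # mu = 10, sigma^2 = 4
mean(apply(samples, 2, mean))  # ~10: the sample mean is unbiased for mu
mean(apply(samples, 2, var))   # ~4: S^2 (divide by n-1) is unbiased for sigma^2
mean(apply(samples, 2, function(x) mean((x - mean(x))^2)))  # ~3.6 = (n-1)/n * 4: biased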
Maximum Likelihood Estimation (MLE)
Advantages of MLE
• Often yields good estimates, especially for large sample sizes.
• Invariance property of MLEs.
• The asymptotic distribution of the MLE is Normal.
• Most widely used estimation technique.
• Usually they are consistent estimators.
Disadvantages of MLE
• Requires that the PDF or PMF be known except for the values of the parameters.
• The MLE may not exist or may not be unique.
• The MLE may not be obtainable explicitly (numerical or search methods may be required); it is sensitive to the choice of starting values when using numerical estimation.
• MLEs can be heavily biased for small samples.
Inferences Based on the MLE
Accuracy and Precision
[Figure: four panels illustrating accuracy vs. precision:]
a. Both accurate and precise.
b. Precise but not accurate.
c. Accurate but not precise.
d. Neither accurate nor precise.
Inferences Based on the MLE
Accuracy and Precision, Bias and Standard Error
• Bias is a measure of accuracy.
– If only basketball players are measured to estimate the proportion of Americans who are taller than 6 feet, then there is a bias for a larger proportion.
• Standard Error is a measure of precision.
– If the sample size is only three, the estimate of the proportion of tall people using the sample is likely to be far from the proportion of tall people in the US. The standard error will be large.
Inferences Based on the MLE
Confidence Intervals
• Q: Now that we have our 𝜓(𝜃), can we find a lower and an upper bound for it with some degree of confidence?
– Note: Tradeoff between the length of the interval and our degree of confidence that the interval will contain the true value 𝜓(𝜃).
– The length of the interval is a measure of how accurately the data allows us to know the true value of 𝜓(𝜃).
Inferences Based on the MLE
Confidence Intervals
• Increasing the level of confidence increases the margin of error.
• Decreasing the level of confidence decreases the margin of error.
Inferences Based on the MLE
Confidence Interval Interpretation
(e.g., for the population proportion and 95% level of confidence)
• For every random sample that can be taken from a population there corresponds a 95% confidence interval. 95% of these confidence intervals will successfully contain the population proportion and 5% will not.
• Q: Given a 95% CI for a population proportion from a particular sample, what is the probability that it contains the true population proportion?
Inferences Based on the MLE
Confidence Intervals
[Figure: '50 Random Samples': the confidence intervals computed from 50 random samples, plotted side by side; most, but not all, of the intervals cover the true value (near 3.0).]
Inferences Based on the MLE
Confidence Intervals
• Def: An interval C(x) = (l(x), u(x)) is a 𝛾-confidence interval for 𝜓(𝜃) if P𝜃(𝜓(𝜃) ∊ C(x)) = P𝜃(l(x)≤ 𝜓(𝜃)≤ u(x)) ≥ 𝛾 for every 𝜃 ∊ Ω, where 𝛾 is referred to as the level of confidence.
• Popular values for 𝛾 are 0.99, 0.95, and 0.90.
• Let 𝛼 = 1 − 𝛾 → CI = 𝜃̂ ± (t_(ν,α/2) or z_(α/2)) · se(𝜃̂), where the ± term is the Margin of Error E.
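A minimal R sketch of a z-based interval (hypothetical summary numbers, assuming known σ):
xbar <- 51.3; sigma <- 2; n <- 25  # hypothetical sample summary
gamma <- 0.95; alpha <- 1 - gamma
E <- qnorm(1 - alpha/2) * sigma / sqrt(n)  # margin of error, ~0.78
c(xbar - E, xbar + E)                      # ~(50.52, 52.08)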
Inferences Based on the MLE
Confidence Intervals
• Example: X ~ N(𝜇, 𝜎²), and 𝛷 = CDF of the standard Normal:
P(X ≤ a) = P((X − μ)/σ ≤ (a − μ)/σ) = P(Z ≤ (a − μ)/σ) = 𝛷((a − μ)/σ)
P(X > a) = P((X − μ)/σ > (a − μ)/σ) = P(Z > (a − μ)/σ) = 1 − 𝛷((a − μ)/σ)
P(a ≤ X ≤ b) = P((a − μ)/σ ≤ Z ≤ (b − μ)/σ) = 𝛷((b − μ)/σ) − 𝛷((a − μ)/σ)
Inferences Based on the MLE
Tests of Hypotheses
• A statistical hypothesis is a statement about the parameters of one or more populations.
• Hypothesis-testing procedures rely on using the information in a random sample from the population of interest.
• If this information is consistent with the hypothesis, then we will conclude that the hypothesis is true; if this information is inconsistent with the hypothesis, we will conclude that the hypothesis is false.
Inferences Based on the MLE
Tests of Hypotheses
• The hypothesis that the characteristic of interest 𝜓(𝜃) = 𝜓₀ is often referred to as the 'Null Hypothesis', denoted as H₀. This hypothesis is associated with a treatment having no effect.
• When the observations do not support H₀, we then favor the 'Alternative Hypothesis', denoted as H₁.
Inferences Based on the MLE
Decisions in Hypothesis Testing
• Fail to reject H₀ when H₀ is true: Correct decision.
• Reject H₀ when H₀ is true: Type I Error (observe a difference when none exists); P(Type I Error) = 𝛼.
• Fail to reject H₀ when H₀ is false: Type II Error (fail to observe a difference when one exists); P(Type II Error) = 𝛽.
• Reject H₀ when H₀ is false: Correct decision.
Note: The Type I error probability is called the significance level, or the 𝛼-error, or the size of the test.
Inferences Based on the MLE
Decisions in Hypothesis Testing
• The power of a statistical test is the probability of rejecting the null hypothesis H0 when the alternative hypothesis is true.
• The power is computed as 1 – β, and power can be interpreted as the probability of correctly rejecting a false null hypothesis.
• The P-value is the smallest level of significance that would lead to rejection of the null hypothesis H0 with the given data.
• P-value is the observed significance level.
Inferences Based on the MLE Application: Two-sided hypothesis test on the mean with known variance
• Two-sided hypothesis test on the mean: Test H0: μ = μ0 against H1: μ ≠ μ0
The test statistic (known variance) is: Z₀ = (X̄ − μ₀) / (σ/√n)
Decision Rule: Reject H₀ if the observed value of the test statistic z₀ is either z₀ > z_(α/2) or z₀ < −z_(α/2)
→ same as P-value < α
Fail to reject H₀ if the observed value of the test statistic z₀ satisfies −z_(α/2) ≤ z₀ ≤ z_(α/2)
→ same as P-value ≥ α
Inferences Based on the MLE Application: Two-sided hypothesis test on the mean with known variance
• Finding the Probability of a Type II Error 𝛽
– Suppose the null hypothesis is false and the true value of the mean is μ = μ₀ + δ, where δ > 0.
• The test statistic is: Z₀ = (X̄ − μ₀)/(σ/√n) = (X̄ − (μ₀ + δ))/(σ/√n) + δ√n/σ
• Hence, the distribution of Z₀ when H₁ is true is: Z₀ ~ N(δ√n/σ, 1)
and finally, 𝛽 = 𝛷(z_(α/2) − δ√n/σ) − 𝛷(−z_(α/2) − δ√n/σ)
Inferences Based on the MLE Application: Two-sided hypothesis test on the mean with known variance
• Example: (a) Test the null that the mean hourly wage for recent MBA graduates is $50. Assume α = 0.05, σ = 2. From a sample of 25 graduates we obtain x̄ = 51.3.
Six-Step Solution
1. Parameter of interest: μ, mean hourly wage for recent MBA graduates
2. Null hypothesis: H₀: μ = 50
3. Alternative hypothesis: H₁: μ ≠ 50
4. Test statistic: Z₀ = (x̄ − μ₀)/(σ/√n) = (51.3 − 50)/(2/√25) = 3.25
Inferences Based on the MLE Application: Two-sided hypothesis test on the mean with known variance
Six-Step Solution (continued)
5. Find the critical value z_(α/2): z_(α/2) = 1.96
6. Conclusion:
(Method 1) Compare z₀ against z_(α/2): z₀ = 3.25 > z_(α/2) = 1.96 → Reject H₀
(Method 2) Compare the P-value against 𝛼: P-value = 2[1 − 𝛷(3.25)] = 0.0012 < 𝛼 = 0.05 → Reject H₀
Interpretation: The mean hourly wage for recent MBA graduates differs from $50, based on a sample of 25 individuals.
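The same two-sided z-test in a few lines of R (a sketch using the slide's numbers):
xbar <- 51.3; mu0 <- 50; sigma <- 2; n <- 25; alpha <- 0.05
z0 <- (xbar - mu0) / (sigma / sqrt(n))  # 3.25
z0 > qnorm(1 - alpha/2)                 # TRUE -> reject H0
2 * (1 - pnorm(z0))                     # P-value ~0.0012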
Inferences Based on the MLE Application: Two-sided hypothesis test on the mean with known variance
• Example: (b) Find the probability of a Type II error (𝛽) for the two-sided test with α = 0.05, assuming that μ = $49. Note: Here δ = 1 and z_(α/2) = 1.96.
𝛽 = 𝛷(z_(α/2) − δ√n/σ) − 𝛷(−z_(α/2) − δ√n/σ)
→ 𝛽 = 𝛷(1.96 − (1)√25/2) − 𝛷(−1.96 − (1)√25/2)
= 𝛷(−0.54) − 𝛷(−4.46) = 0.295 ≈ 0.30
– The probability is about 0.3 that the test will fail to reject the null hypothesis when the true mean hourly wage for recent MBA graduates is $49.
• Interpretation: A sample size of n = 25 results in reasonable, but not great, power = 1 − 𝛽 = 1 − 0.3 = 0.70.
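Checking 𝛽 in R (one line; here δ√n/σ = (1)(5)/2 = 2.5):
pnorm(1.96 - 2.5) - pnorm(-1.96 - 2.5)  # ~0.2946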
Inferences Based on the MLE Application: Two-sided hypothesis test on the mean with known variance
• Finding the Sample Size for a Test
– Sample Size for a Two-Sided Test: n ≈ (z_(α/2) + z_𝛽)² σ² / δ²
– Sample Size for a One-Sided Test: n = (z_α + z_𝛽)² σ² / δ²
where δ = μ − μ₀
Inferences Based on the MLE Application: Two-sided hypothesis test on the mean with known variance
• Example: (c) Suppose that an analyst wishes to design the test so that if the true mean hourly wage for recent MBA graduates differs from $50 by as much as $1, the test will detect this (i.e., reject H₀: μ = 50) with a high probability, say, 0.90. How large should the sample size be?
Note: σ = 2, δ = 51 − 50 = 1, α = 0.05, and 𝛽 = 0.10.
• Since z_(α/2) = z_0.025 = 1.96 and z_𝛽 = z_0.10 = 1.28, the sample size required to detect this departure from H₀: μ = 50 is found by
n ≈ (z_(α/2) + z_𝛽)² σ² / δ² = (1.96 + 1.28)² 2² / 1² ≈ 42
• The approximation is good here, since 𝛷(−z_(α/2) − δ√n/σ) = 𝛷(−5.20) ≈ 0, which is small relative to 𝛽.
• Interpretation: To achieve the much higher power of 0.90 we need a considerably larger sample size, n = 42 instead of n = 25.
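The arithmetic in R, with the slide's rounded z-values:
ceiling((1.96 + 1.28)^2 * 2^2 / 1^2)  # 42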
Distribution Free Methods
Method of Moments
• Let X1, X2,...,Xn be a random sample from the probability distribution f(x) (PMF or PDF) with m unknown parameters 𝜃1, 𝜃2, ..., 𝜃m.
– The kth population moment is 𝜇ₖ = E[Xᵏ], k = 1, 2, …
– The kth sample moment is Mₖ = (1/n) Σⱼ₌₁ⁿ Xⱼᵏ, k = 1, 2, …
• The moment estimators ⇥b1,⇥b2,...,⇥bm are found by equating the first m population moments to the first m sample moments, and then solving the resulting simultaneous equations for the unknown parameters.
Distribution Free Methods
Method of Moments
• Example: Suppose that X₁, X₂, …, Xₙ is a random sample from a Gamma distribution with parameters r and λ, where E(X) = r/λ and E(X²) = r(r+1)/λ².
r/λ = E(X) = X̄ (the mean)
r(r+1)/λ² = E(X²), so r/λ² = E(X²) − [E(X)]² (the variance)
Now solving for r and λ:
r̂ = X̄² / ( (1/n) Σᵢ₌₁ⁿ Xᵢ² − X̄² )
λ̂ = X̄ / ( (1/n) Σᵢ₌₁ⁿ Xᵢ² − X̄² )
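A short R sketch (simulated data with known r and λ) applying these moment estimators:
set.seed(6)
x <- rgamma(1000, shape = 2, rate = 0.5)  # true r = 2, lambda = 0.5
v <- mean(x^2) - mean(x)^2                # (1/n) sum(x^2) - xbar^2
mean(x)^2 / v                             # r-hat, ~2
mean(x) / v                               # lambda-hat, ~0.5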
Distribution Free Methods
Method of Moments
• Example: For the exponential distribution: f(x) = λe^(−λx), x ≥ 0.
Since E[X] = 1/λ: X̄ = 1/λ̂ → λ̂ = 1/X̄ = n / Σᵢ₌₁ⁿ Xᵢ
Distribution Free Methods
Method of Moments
• Example: For the Pareto distribution:
f(x) = β / x^(β+1), x > 1, where μ = β/(β − 1)
Since M₁ = X̄, setting μ₁ = M₁ → β̂ = X̄ / (X̄ − 1)
Distribution Free Methods
Method of Moments
• Example (continued): Applying β̂ = X̄/(X̄ − 1) to many samples simulated from a Pareto distribution with true β = 3.0:
[Figure: 'Histogram of betahat': the sampling distribution of β̂, roughly between 1.5 and 4.0. Sample mean of the β̂ values = 3.05; true value: β = 3.0.]
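A hedged R sketch of the simulation behind such a histogram (inverse-CDF sampling for this Pareto; 1000 replications of size n = 100 are assumed):
set.seed(7)
beta <- 3
betahat <- replicate(1000, {
  x <- runif(100)^(-1/beta)  # Pareto(beta) draws via the inverse CDF, x > 1
  mean(x) / (mean(x) - 1)    # method-of-moments estimate
})
hist(betahat, main = "Histogram of betahat")
mean(betahat)  # close to 3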
Distribution Free Methods
Method of Moments
• Example (when it doesn't work): Consider the uniform distribution, where X ~ U(a, b).
𝜇₁ = E[X] = (a + b)/2, and 𝜇₂ = E[X²] = (a² + ab + b²)/3. We can solve for (a, b) →
â = M₁ − √(3(M₂ − M₁²)) and b̂ = M₁ + √(3(M₂ − M₁²))
Assume we obtain the sample {0, 0, 0, 1} from a U(0, 1). Our estimate for, e.g., b, based on the sample moments would be b̂ = 0.94. This suggests it would be impossible to have obtained this sample from the fitted uniform, since the observed value 1 exceeds b̂: moment estimates need not be consistent with the data or the parameter space.
Distribution Free Methods
Advantages of the Method of Moments
• They are often simple to derive.
• They are consistent estimators when θ₁, …, θₖ are continuous functions of μ₁, …, μₖ.
• They provide starting values in the search for maximum likelihood estimates.
Disadvantages of the Method of Moments
• They may not be unique in a given set of data (multiple solutions to the set of equations); also, they need not exist.
• When they exist, they are not necessarily constrained to fall in the parameter space (variance component estimation is an example of this situation).
• They may be inefficient; sometimes this is because they violate the Sufficiency Principle.
Distribution Free Methods
Bootstrapping
• There are many instances when we do not know the properties of the estimator of interest.
• Bootstrap is founded on the idea of performing a simulation on our data to get an estimate of the sampling distribution of the estimator.
• We repeatedly resample the data with replacement and calculate the estimate each time. The distribution of these bootstrap estimates approximates the sampling distribution of the estimate.
Distribution Free Methods
Bootstrapping
• The bootstrap mean, standard error, and confidence interval: Assume X₁, X₂, …, Xₙ is a sample from an unknown distribution with CDF F_𝜃, such that E_𝜃(F̂(x)) = F_𝜃(x), and we are interested in estimating 𝜓(𝜃) = T(F_𝜃).
• Mean: 𝜓̄ = (1/m) Σᵢ₌₁ᵐ 𝜓̂ᵢ
• Standard Error: ŝe(𝜓̂) = √(V̂ar_F̂(𝜓̂)) = √( (1/(m−1)) Σᵢ₌₁ᵐ (𝜓̂ᵢ − 𝜓̄)² )
• Confidence Interval: 𝜓̂ ± t_((1+𝛾)/2, n−1) √(V̂ar_F̂(𝜓̂))
Distribution Free Methods
Bootstrapping
• Example: Find the distribution of R² from the regression: mpg = 𝛽₁ disp + 𝛽₂ weight + e
– Sample 1000 times (with replacement) from the original dataset
– Run the regression and store R² for each one
– Plot the histogram of the R²'s; compute the mean and respective confidence interval across all 1000 estimates of R² (see the R sketch below)
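A hedged R sketch of this procedure using the 'boot' library and the built-in mtcars data (assuming the model mpg ~ wt + disp, as in the results that follow):
library(boot)
rsq <- function(data, indices) {
  fit <- lm(mpg ~ wt + disp, data = data[indices, ])  # regression on the resample
  summary(fit)$r.squared
}
results <- boot(data = mtcars, statistic = rsq, R = 1000)
mean(results$t)                  # bootstrap mean of R^2
boot.ci(results, type = "perc")  # percentile confidence interval
hist(results$t, main = "Bootstrapped R-squared")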
Distribution Free Methods
Coefficient of Determination: mpg ~ wt + disp
• Mean from 1000 bootstrapped R²'s = 0.788
• 95% Confidence Interval = [0.645, 0.859]
[Figure: density histogram of the 1000 bootstrapped R² values, ranging roughly from 0.5 to 0.9.]
Note: We can easily perform the same estimates for the regression coefficients.
Note: Use the R library 'boot'.