Interval estimation: Part 2
(Module 4)
Statistics (MAST20005) & Elements of Statistics (MAST90058) Semester 2, 2022
1 Confidence intervals
1.1 Less common scenarios
1.2 General techniques
1.3 Properties
1.4 Choice of confidence level
1.5 Interpretation
1.6 Summary
2 Prediction intervals
3 Sample size determination
Aims of this module
• Explain some less common scenarios where confidence intervals are used
• Describe some general aspects of confidence intervals
• Introduce prediction intervals, an interval estimator in the context of predicting a future value
• Explain how to calculate the sample size required for a study
1 Confidence intervals

1.1 Less common scenarios
Less common scenarios: overview
• One-sided CIs
• CIs based on discrete statistics
One-sided confidence intervals
We can construct one-sided confidence intervals, e.g. just an upper or a lower bound. For example, if we sample from N(μ, σ2) with known σ:

Pr( (X̄ − μ) / (σ/√n) < c ) = 0.95  ⇒  Pr( μ > X̄ − c σ/√n ) = 0.95

where c = Φ−1(0.95), so (X̄ − c σ/√n, ∞) is a one-sided 95% CI for μ. If σ is unknown, substitute the sample standard deviation S and take c from the t distribution; for example, with n = 30 observations, c = 1.699 is the 0.95 quantile from t29.

In R, one-sided intervals are obtained via the alternative argument of t.test():

> t.test(butterfat,
+        conf.level = 0.90,
+        alternative = "less")
90 percent confidence interval:
     -Inf 534.146

> t.test(butterfat,
+        conf.level = 0.90,
+        alternative = "greater")
90 percent confidence interval:
 480.854      Inf
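
These bounds can be reproduced by hand. A minimal sketch, assuming butterfat is the numeric data vector (n = 30) used above:

n <- length(butterfat)
se <- sd(butterfat) / sqrt(n)
mean(butterfat) + qt(0.90, df = n - 1) * se   # one-sided 90% upper bound (534.146)
mean(butterfat) - qt(0.90, df = n - 1) * se   # one-sided 90% lower bound (480.854)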
Confidence intervals based on discrete statistics*
Our starting point has been probability intervals like:
Pr (a(θ) < T < b(θ)) = 0.95
What if T is discrete? For example, T ∼ Bi(n, θ)
Limitation: a() and b() can only take specific (discrete) values. ⇒ Cannot guarantee an exact probability (confidence level).
⇒ Inversion is messy.
Usually aim for something close, with ‘at least’ probability. For example,
Pr( a(θ) ≤ T ≤ b(θ) ) ≥ 0.95

where:
• a(θ) is the largest value of x such that Pr(x ≤ T | θ) ≥ 0.975
• b(θ) is the smallest value of x such that Pr(T ≤ x | θ) ≥ 0.975
How do we invert these?
For an observed value tobs (of T), we have:
• c is such that Pr(tobs ≤ T | θ = c) = 0.025
• d is such that Pr(T ≤ tobs | θ = d) = 0.025

Then, the ‘at least’ 95% confidence interval is (c, d).
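
As an illustration of this inversion, here is a minimal R sketch for a binomial proportion with hypothetical data (tobs = 6 successes in n = 25 trials). The two endpoints solve the tail-probability equations above, and R's binom.test() returns the same (Clopper–Pearson) interval:

y <- 6; n <- 25   # hypothetical observed count and sample size

# c solves Pr(T >= y | theta = c) = 0.025 (lower endpoint)
lower <- uniroot(function(p) 1 - pbinom(y - 1, n, p) - 0.025,
                 interval = c(1e-6, 1 - 1e-6))$root

# d solves Pr(T <= y | theta = d) = 0.025 (upper endpoint)
upper <- uniroot(function(p) pbinom(y, n, p) - 0.025,
                 interval = c(1e-6, 1 - 1e-6))$root

c(lower, upper)

# The same interval from the built-in exact test:
binom.test(y, n)$conf.int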
1.2 General techniques
CIs from MLEs
Maximum likelihood estimators have many convenient properties. We will cover some of the theory later in the semester. For now, it is useful to know the following...
V(θ) = −∂² ln L / ∂θ²

This is known as the observed information function. It can be used to estimate the standard deviation of the MLE:

se(θ̂) = 1 / √V(θ̂)
Moreover, the MLE is asymptotically unbiased and asymptotically normally distributed. Therefore, for large sample sizes, we can construct approximate CIs using:
θ̂ ± c / √V(θ̂)
where c = Φ−1(1 − α/2).
Example (approximate CI from MLE)
Sampling (iid) from: X ∼ Exp(θ). Previously we found that θ̂ = X̄ and

∂ ln L / ∂θ = −n/θ + Σxi/θ²

Differentiate once more,

∂² ln L / ∂θ² = n/θ² − 2Σxi/θ³

and so we have,

se(θ̂) = ( −n/θ̂² + 2Σxi/θ̂³ )^(−1/2) = θ̂/√n, using Σxi = nX̄ = nθ̂,

and an approximate 95% confidence interval is given by θ̂ ± 1.96 se(θ̂).
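
As a quick numerical illustration, here is a minimal R sketch under hypothetical settings (n = 40 observations, true θ = 5; note that rexp() in R is parameterised by rate = 1/θ):

set.seed(1)
theta <- 5; n <- 40
x <- rexp(n, rate = 1/theta)       # sample with mean theta
theta.hat <- mean(x)               # MLE of theta
se <- theta.hat / sqrt(n)          # from the observed information
theta.hat + c(-1, 1) * 1.96 * se   # approximate 95% CI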
Review of general methods for constructing CIs
• Invert a probability interval based on a known sampling distribution (use a pivot)
• Use the asymptotic MLE result
Common approximations:
• Normality (based on the CLT or the asymptotic MLE)
• Substitute parameter estimates into the expression for the standard deviation of the estimator
1.3 Properties
CIs are random intervals
Recall: the CI estimator is a random interval.
A CI consists of two statistics: the lower bound and the upper bound of the interval. They both have sampling distributions.
The random elements are therefore the endpoints, not the parameter: Pr(L < θ < U) = 0.95
Contrast this with a probability statement for a statistic:
Pr(l < T < u) = 0.95
The coverage or coverage probability of a confidence interval (estimator) is the probability it contains the true value of the parameter,
C = Pr(L < θ < U)
Usually this is equal to the confidence level, which is also known as the nominal coverage probability.
However, due to various approximations we use, the actual coverage achieved may vary from the confidence level.
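
To see this concretely, here is a minimal simulation sketch under hypothetical settings (the simple normal-approximation (Wald) CI for a Bernoulli proportion, n = 25, true θ = 0.1, 10,000 replications). The estimated actual coverage typically falls well below the nominal 95%:

set.seed(1)
theta <- 0.1; n <- 25; nrep <- 10000
covered <- replicate(nrep, {
  p.hat <- mean(rbinom(n, 1, theta))          # sample proportion
  se <- sqrt(p.hat * (1 - p.hat) / n)         # estimated standard error
  (p.hat - 1.96 * se < theta) && (theta < p.hat + 1.96 * se)
})
mean(covered)   # estimated actual coverage, vs nominal 0.95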
Example: Bernoulli sampling, n = 25, quadratic approximation CI

[Figure: actual coverage probability of the CI plotted against θ from 0.0 to 1.0 (vertical axis: coverage, 0.85 to 1.00), compared with the nominal 95% level.]
More detail about the quadratic approximation will be shown in the tutorials and lab classes.
1.4 Choice of confidence level
Choice of confidence level
This is somewhat arbitrary. If very high:
• More likely to capture the true value.
• Impractically wide: won’t act as a useful guide for showing plausible values based on the data.
• It will place too much emphasis on tails of the sampling distribution, which aren’t actually all that likely.
If very low:
• More ‘useful’ in the sense of being more selective about the possible values of the parameter.
• This comes at the expense of the loss of ‘confidence’, i.e. not as certain about whether the true value is captured inside the interval.
Choice of confidence level: some guidelines
• 95% is a very common convention. If you follow this, it will rarely be questioned. Others may be expecting
this, so always be clear if you deviate from it.
• 90% can also be a reasonable choice.
• 50% is sometimes useful, due to easy interpretation. A good use case: plotting a large number of overlapping intervals, to reduce visual clutter.
• The choice can vary by application, and you may even use different choices for the same problem (e.g. 50% for a particular plot, but 95% when reporting a headline result in text).
• Whatever you choose, remember that the true value is never guaranteed to be inside the interval. There is always a chance it will be outside.
1.5 Interpretation
Explaining CIs
The probability associated with a CI (i.e. the confidence level) relates to the sampling procedure. In particular, it
refers to hypothetical repeated samples.
Once a specific sample is observed and a CI is calculated, the confidence level cannot be interpreted probabilistically
in the context of the specific data at hand. It is incorrect to say things like:
• This CI has a 95% chance of including the true value
• We can be ‘95% confident’ that this CI includes the true value

Don’t do it!
The probability only has a meaning when considering potential replications of the whole sampling and estimation procedure.
We can only say something like:
• If we were to repeat this experiment, then 95% of the time the CI we calculate will cover the true value.
(This is a bit of a mouthful...)

In practice:
• If you are reporting results to people who know what they are, you can just state that the “95% confidence interval is...”
• If people want to know what this means, use an intuitive notion like, “it is the set of plausible values of the parameter that are consistent with the data”. (Note: this is not actually true in general, but will be accurate enough for all of the examples we cover this semester.)
• If you need to actually explain what a CI is precisely, you need to explain it in terms of repeated sampling. (No shortcuts!)
Communicating results: general tips
• Describe the extent of your uncertainty
• Emphasise a range of plausible values
• Phrase results in terms of the degree of evidence (e.g. ‘strong/modest/weak evidence of...’)
1.6 Summary
Confidence intervals: summary
• Interval estimates are the most common way to quantify uncertainty.
• Confidence intervals are the most common type of interval estimate.
• Confidence intervals are straightforward to construct if we know or can approximate the sampling distribution of the statistic and can construct a pivot.
• We have looked at some well known (and widely used) examples for means, variances and proportions.
• We can derive CIs, whether exact or approximate, for a variety of scenarios, and have techniques for constructing
them in general.
• 95% CIs are the most common convention.
2 Prediction intervals
Prediction intervals
Suppose we want to estimate the value of a future observation, rather than a parameter of the distribution. We usually call this prediction rather than ‘estimation’.
We have available data that arose from the same probability distribution. Can we use this to come up with an interval estimate?
Yes. Easiest to see with an example...
Example (prediction interval)
Random sample (iid): X1,...,Xn on X ∼ N(μ,1)
Let X∗ be a future observation on X, independent of those currently observed. By independence, we have:

X̄ ∼ N(μ, 1/n) and X∗ ∼ N(μ, 1)

Therefore we can write,

X̄ − X∗ ∼ N(0, 1 + 1/n)

and hence,

Pr( −1.96 √(1 + 1/n) < X̄ − X∗ < 1.96 √(1 + 1/n) ) = 0.95

Rearranging gives a 95% prediction interval for X∗:

X̄ ± 1.96 √(1 + 1/n)
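
A minimal R sketch of this calculation, using hypothetical simulated data (n = 20 observations from N(10, 1), with σ = 1 known as in the example):

set.seed(42)
x <- rnorm(20, mean = 10, sd = 1)
n <- length(x)
mean(x) + c(-1, 1) * 1.96 * sqrt(1 + 1/n)   # 95% prediction interval for X*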