Chapter 4
Some Elementary Statistical Inference
4.1 Sampling and Statistics
1/53
Boxiang Wang
Chapter 4 STAT 4101 Spring 2021
Outline
1 Two basic sampling methods. Slide 5 – 8.
2 Two general statistical models. Slide 10 – 12.
3 Definition and properties of a statistic. Slide 14 – 16.
Basic Sampling Methods
Two Basic Sampling Methods
Target: get a sample from the distribution that we are interested in.
Sampling with replacement.
Sampling without replacement.
Sampling with replacement
An urn contains m balls, labeled from 1 to m.
At the first trial, we take one ball out of the urn at random and let X1 denote the number. Because we cannot know the number in advance, X1 is a random variable.
What is the distribution of X1?
Then, we put the first ball back in the urn, mix the balls, take
another ball out and let X2 denote the number.
What is the distribution of X2?
We can repeat this procedure n times and get n random variables X1, X2, · · · , Xn.
X1, · · · , Xn are independent.
Each Xi has the same distribution (uniform on {1, . . . , m}).
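A minimal sketch (not from the slides) of this urn scheme in Python; the urn size m = 10 and sample size n = 5 are illustrative values:

```python
# Sampling WITH replacement from an urn with m balls labeled 1..m.
# Each draw is uniform on {1, ..., m} and independent of the others,
# so X1, ..., Xn are iid.
import random

random.seed(1)  # for reproducibility
m, n = 10, 5    # illustrative urn size and sample size
sample = [random.randint(1, m) for _ in range(n)]
print(sample)   # n iid draws, each uniform on {1, ..., m}
```

Because each ball is returned before the next draw, every Xi has the same uniform distribution and the draws are independent.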
Definition (4.1.1) of random sample
If the random variables X1, X2, . . . , Xn are independent and identically distributed (iid), then these random variables constitute a random sample of size n from the common distribution.
Example:
Suppose X1, . . . , Xn constitute a random sample from N(θ, 1). What is the joint pdf of X1,…,Xn?
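By independence, the joint pdf factors into the product of the N(θ, 1) marginal densities:

```latex
f(x_1,\ldots,x_n;\theta)
  = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}}\, e^{-(x_i-\theta)^2/2}
  = (2\pi)^{-n/2} \exp\!\Big(-\frac{1}{2}\sum_{i=1}^{n}(x_i-\theta)^2\Big).
```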
Sampling without replacement
At the first trial, we take one ball out of the urn and let X1 denote the number.
Then, we don’t put the first ball back and take the second ball out of the urn. The number is denoted by X2.
What is the distribution of X2?
Are X1 and X2 independent?
We can repeat this sampling method n times and get n random variables X1, · · · , Xn.
Typically, X1, · · · , Xn are NOT independent.
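A small enumeration (illustrative, with m = 5 balls) showing that the marginal of X2 is still uniform but X1 and X2 are dependent:

```python
# Sampling WITHOUT replacement: enumerate all equally likely ordered
# pairs (x1, x2) with x1 != x2 from an urn with m = 5 balls.
import itertools

m = 5
pairs = [(x1, x2) for x1, x2 in itertools.permutations(range(1, m + 1), 2)]
p_x2_eq_3 = sum(1 for _, x2 in pairs if x2 == 3) / len(pairs)
p_x2_eq_3_given_x1_eq_3 = (
    sum(1 for x1, x2 in pairs if x1 == 3 and x2 == 3)
    / sum(1 for x1, _ in pairs if x1 == 3)
)
print(p_x2_eq_3)                # 0.2: marginally, X2 is uniform on 1..5
print(p_x2_eq_3_given_x1_eq_3)  # 0.0: conditioning on X1 changes the distribution
```

So X2 has the same marginal distribution as X1, yet the two are not independent: knowing X1 rules out one value of X2.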
Two General Statistical Models
In general, statistical problems assume some structure (called a model) on the data.
Every model assumes that data are random variables following some distribution / density.
Such distribution or density is unknown, and needs to be estimated from the data.
Two models are studied in this class: non-parametric and parametric models.
Non-parametric and parametric models
Non-parametric model:
The pdf f(x) or pmf p(x) of a random variable we are interested in is completely unknown.
Parametric model:
The form of f(x) or p(x) is known up to a parameter θ (or a
vector θ).
Usually the pdf (or pmf) is denoted by f (x; θ) (or p(x; θ)).
With θ ∈ Ω for a specified set Ω, Ω is called the parameter space.
We will focus on the parametric model in STAT 4101.
Examples of parameter space
Normal distribution N(μ, 1), μ ≡ θ, μ ∈ R ≡ Ω.
Exponential distribution exp(β), β ≡ θ, β ∈ R+ ≡ Ω.
Binomial distribution Binom(n, p), p ≡ θ, p ∈ (0, 1) ≡ Ω.
Normal distribution N(μ, σ2) has a θ = (μ, σ2), Ω = R × R+, where R+ is the positive half of the real line.
Statistic
Definition (4.1.2) of statistic
Suppose now that we want to use a random sample to gain information about an unknown parameter θ.
Definition: Let X1, . . . , Xn denote a sample on a random variable X. Let T = T(X1,X2,…,Xn) be a function of the sample. Then T is called a statistic.
Example:
1 Sample mean: T(X1, X2, . . . , Xn) = X̄ = (X1 + · · · + Xn)/n.
2 Sample variance: T(X1, X2, . . . , Xn) = S² = Σ_{i=1}^n (Xi − X̄)²/(n − 1).
Estimator or estimate:
Once the sample x1, . . . , xn is drawn, t = T(x1, . . . , xn) is called the realization of T. The random variable T is called an estimator of θ and t an estimate of θ.
Properties of statistic
Given a parameter θ of interest, every statistic can be called an estimator of θ.
We need to choose “good” estimators according to some criteria.
For example, unbiased estimator, consistent estimator, estimator that has minimum MSE, etc.
Definition (4.1.3) of unbiasedness
Let X1, . . . , Xn denote a random sample on a random variable X with pdf f(x;θ), θ ∈ Ω. Let T = T(X1,X2,…,Xn) be a statistic. We say that T is an unbiased estimator of θ if E(T ) = θ.
Example
Let X1, . . . , Xn denote a random sample on a random variable X with mean μ and variance σ2.
1 Show that X̄ is an unbiased estimator of μ and find its variance.
2 Show that (1/3)X1 + (2/3)X2 is an unbiased estimator of μ and find its variance.
3 Show that S² = (1/(n − 1)) Σ_{i=1}^n (Xi − X̄)² is an unbiased estimator of σ².
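A quick Monte Carlo sketch (with made-up μ, σ², n; assumes NumPy is available) of claims 1 and 3: the sample mean is unbiased for μ, and dividing by n − 1 rather than n makes S² unbiased for σ²:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, n, B = 2.0, 4.0, 10, 200_000    # illustrative values
X = rng.normal(mu, np.sqrt(sigma2), size=(B, n))

xbar = X.mean(axis=1)                # B realizations of the sample mean
s2 = X.var(axis=1, ddof=1)           # ddof=1 -> divide by n - 1
print(xbar.mean())                   # close to mu = 2
print(s2.mean())                     # close to sigma^2 = 4
print(X.var(axis=1, ddof=0).mean())  # n divisor: biased, close to (n-1)/n * 4 = 3.6
```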
Chapter 4
Some Elementary Statistical Inference
4.2 Confidence Intervals
Outline
1 Motivation and definition of confidence intervals.
2 Examples of CI in different situations.
A detailed table of contents appears on Slide 52.
Motivation
Suppose the parameter θ is estimated by a statistic θ̂ = θ̂(X1, . . . , Xn).
When the sample is drawn, it is unlikely the value of θˆ is exactly the true value of θ.
We need an estimate of the error of the estimation θˆ, i.e., by how much did θˆ miss θ?
Confidence interval…
Definition (4.2.1) of confidence interval
Let X1, . . . , Xn be a sample from a distribution that involves a parameter θ whose value is unknown. Let L = L(X1, . . . , Xn) and U = U(X1, . . . , Xn) be two statistics. We say that the interval
(L, U ) is a (1 − α)100% confidence interval for θ if
1 − α = Pθ[θ ∈ (L, U)], for all θ.
The probability that the interval includes θ is 1 − α, which is called confidence coefficient (or confidence level) of the confidence interval.
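The frequency meaning of this definition can be checked by simulation (illustrative μ, σ, n; assumes NumPy): the random interval traps the true μ in about (1 − α)100% of repeated samples:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(42)
mu, sigma, n, B = 5.0, 2.0, 25, 100_000   # illustrative values
z = NormalDist().inv_cdf(0.975)           # z_{alpha/2} for alpha = 0.05

xbar = rng.normal(mu, sigma, size=(B, n)).mean(axis=1)
half = z * sigma / np.sqrt(n)
covered = (xbar - half < mu) & (mu < xbar + half)
print(covered.mean())                     # close to 0.95
```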
CI for μ under normality (σ known)
Suppose that X1, . . . , Xn is a random sample from N(μ, σ2) distribution. Assume that σ2 is known and μ is unknown. A (1 − α)100% confidence interval for μ is given by
(X̄ − zα/2 σ/√n, X̄ + zα/2 σ/√n).
Remark:
Pivot random variable: X̄ ∼ N(μ, σ²/n) ⇒ (X̄ − μ)/(σ/√n) ∼ N(0, 1).
Critical point: P(Z > zα/2) = α/2, Z ∼ N(0, 1).
Confidence level: P(X̄ − zα/2 σ/√n < μ < X̄ + zα/2 σ/√n) = 1 − α.
2 P(−2 ≤ Z ≤ 2). 3 P(|Z| > 1.645).
CI for μ under normality (σ known) (cont’d)
When α = 0.05, zα/2 = 1.96.
The 95% CI
(X̄ − 1.96 σ/√n, X̄ + 1.96 σ/√n)
is a random interval.
Once the sample is drawn, denote the observed value of the statistic X̄ by x̄; then the value of the CI is
(x̄ − 1.96 σ/√n, x̄ + 1.96 σ/√n).
Interpretation: if we draw B samples independently from the underlying distribution and construct B confidence intervals for μ, we would expect about B(1 − α) successful confidence intervals that trap μ.
Thus we are (1 − α)100% confident that θ is within the confidence interval.
A measure of efficiency: the expected length.
Suppose (L1, U1) and (L2, U2) are two confidence intervals for θ at the same confidence level. Then we say (L1, U1) is more efficient than (L2, U2) if
Eθ(U1 − L1) ≤ Eθ(U2 − L2), ∀θ ∈ Ω.
Selecting a higher confidence level (1 − α) leads to a wider confidence interval.
Choosing a larger sample size n leads to a narrower confidence interval.
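Both trade-offs can be read off the half-width zα/2 · σ/√n directly; a stdlib-only sketch with σ = 1 for illustration:

```python
# Half-width of the known-sigma CI for several confidence levels and
# sample sizes: it widens with the level and shrinks like 1/sqrt(n).
from statistics import NormalDist
import math

sigma = 1.0
half_widths = {}
for conf in (0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    half_widths[conf] = [z * sigma / math.sqrt(n) for n in (25, 100, 400)]
    print(conf, [round(w, 3) for w in half_widths[conf]])
```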
Example (4.2.1) CI for μ under normality (σ unknown)
Suppose that X1, . . . , Xn is a random sample from N(μ, σ2) distribution. Assume that σ2 is unknown and μ is unknown. A (1 − α)100% confidence interval for μ is given by
(X̄ − tα/2,n−1 S/√n, X̄ + tα/2,n−1 S/√n).
Remark:
Pivot random variable (Student's theorem, Thm 3.6.1): (X̄ − μ)/(S/√n) ∼ tn−1 (requires the normality of X).
Critical point: P(T > tα/2,n−1) = α/2, T ∼ tn−1.
Confidence level: P(X̄ − tα/2,n−1 S/√n < μ < X̄ + tα/2,n−1 S/√n) = 1 − α.
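A sketch of computing this interval from data (the data values are made up; assumes NumPy and SciPy are available):

```python
import numpy as np
from scipy.stats import t

x = np.array([4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4, 4.7])  # made-up data
n = len(x)
xbar, s = x.mean(), x.std(ddof=1)
tcrit = t.ppf(0.975, df=n - 1)     # t_{alpha/2, n-1} for alpha = 0.05
half = tcrit * s / np.sqrt(n)
lo, hi = xbar - half, xbar + half
print(round(lo, 3), round(hi, 3))  # the realized 95% CI for mu
```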
See Example 3.6.1.
Properties of t-Distribution
The density function of the t-distribution is symmetric, bell-shaped, and centered at 0.
The variance of the t-distribution is larger than that of the standard normal distribution.
The tails of the t-distribution are heavier (larger kurtosis).
Student’s Theorem (Thm 3.6.1)
Suppose X1, · · · , Xn are iid N(μ, σ2) random variables. Define the random variables,
X̄ = (1/n) Σ_{i=1}^n Xi and S² = (1/(n − 1)) Σ_{i=1}^n (Xi − X̄)².
Then
1 X̄ ∼ N(μ, σ²/n);
2 X̄ and S² are independent;
3 (n − 1)S²/σ² ∼ χ²(n − 1);
4 The random variable T = (X̄ − μ)/(S/√n) has a t-distribution with n − 1 degrees of freedom.
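A Monte Carlo sketch (illustrative n; assumes NumPy) of parts 2–4: X̄ and S² are uncorrelated, (n − 1)S²/σ² has the χ²(n − 1) mean, and T has the t_{n−1} variance (n − 1)/(n − 3):

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma, n, B = 0.0, 1.0, 7, 200_000   # illustrative values
X = rng.normal(mu, sigma, size=(B, n))
xbar, s = X.mean(axis=1), X.std(axis=1, ddof=1)

print(np.corrcoef(xbar, s**2)[0, 1])  # close to 0 (consistent with part 2)
q = (n - 1) * s**2 / sigma**2
print(q.mean())                       # close to n - 1 = 6 (chi-square mean, part 3)
T = (xbar - mu) / (s / np.sqrt(n))
print(T.var())                        # close to (n-1)/(n-3) = 1.5 (t_6 variance, part 4)
```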
CI for σ2
Suppose that X1, . . . , Xn is a random sample from N(μ, σ2) distribution. Assume that both μ and σ2 are unknown. For
0 < α < 1, define χ²_{r,α/2} as the upper α/2 critical point of a χ²(r) distribution. With r = n − 1, a (1 − α)100% confidence interval for σ² is given by
( (n − 1)S²/χ²_{n−1,α/2}, (n − 1)S²/χ²_{n−1,1−α/2} ).
Remark: Pivot random variable: (n − 1)S²/σ² ∼ χ²(n − 1), so
P( χ²_{n−1,1−α/2} < (n − 1)S²/σ² < χ²_{n−1,α/2} ) = 1 − α.
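A sketch of computing this interval (made-up data; assumes NumPy and SciPy). Note that the upper χ² critical point appears in the lower CI endpoint, and vice versa:

```python
import numpy as np
from scipy.stats import chi2

x = np.array([2.1, 1.8, 2.5, 2.0, 2.3, 1.9, 2.4, 2.2, 1.7, 2.6])  # made-up data
n, alpha = len(x), 0.05
s2 = x.var(ddof=1)
lo = (n - 1) * s2 / chi2.ppf(1 - alpha / 2, df=n - 1)  # divide by upper critical point
hi = (n - 1) * s2 / chi2.ppf(alpha / 2, df=n - 1)      # divide by lower critical point
print(round(lo, 4), round(hi, 4))                      # the realized 95% CI for sigma^2
```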
Central Limit Theorem
Theorem (4.2.1): Central limit theorem
Let X1, . . . , Xn denote the observations of a random sample from a distribution that has mean μ and finite variance σ². Then the distribution function of the random variable Zn = (X̄ − μ)/(σ/√n) converges to Φ, the distribution function of the N(0, 1) distribution, as n → ∞ (n > 30 in practice).
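A simulation sketch (illustrative n and B; assumes NumPy) with a clearly non-normal parent, the Exponential(1) distribution (mean 1, variance 1):

```python
import numpy as np

rng = np.random.default_rng(3)
n, B = 50, 100_000             # illustrative values
mu, sigma = 1.0, 1.0           # mean and sd of Exponential(1)
X = rng.exponential(1.0, size=(B, n))
Z = (X.mean(axis=1) - mu) / (sigma / np.sqrt(n))
print(Z.mean(), Z.var())       # close to 0 and 1
print((Z <= 1.96).mean())      # near Phi(1.96) = 0.975, despite the skewed parent
```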
When σ is known:
The pivot random variable Z = (X̄ − μ)/(σ/√n) ∼ N(0, 1), when X is normally distributed.
With the CLT, Z = (X̄ − μ)/(σ/√n) is approximately N(0, 1), regardless of the distribution of X.
When σ is unknown:
By Student's theorem, the pivot random variable T = (X̄ − μ)/(S/√n) follows tn−1, when X is normally distributed.
With the CLT and Slutsky's theorem (discussed in Chapter 5), T = (X̄ − μ)/(S/√n) is approximately N(0, 1), regardless of the distribution of X.
Large-sample CI for μ with known σ
Suppose X1, X2, . . . , Xn is a random sample on a random variable X with mean μ and known variance σ², but the distribution of X is not normal. Since (X̄ − μ)/(σ/√n) is approximately N(0, 1),
P(X̄ − zα/2 σ/√n < μ < X̄ + zα/2 σ/√n) ≈ 1 − α,
so (X̄ − zα/2 σ/√n, X̄ + zα/2 σ/√n) is an approximate (1 − α)100% confidence interval for μ.