Chapter 4
Some Elementary Statistical Inference
4.5 Introduction to Hypothesis Testing
1/50
Boxiang Wang
Chapter 4 STAT 4101 Spring 2021
Motivating example
Suppose you suspect that students in Boxiang’s STAT 4101 sleep fewer than 6 hours per day on average.
To support your hypothesis, you randomly sample 20 students and find that their average sleep time is 5.9 hours.
The sample average is less than 6 hours. Does this observation imply that the average sleep time of all students is less than 6 hours?
2/50
The null and alternative hypotheses
How to find statistical evidence in favor of a certain belief?
Assume X has a density function f(x; θ), where θ ∈ Ω. We test H0: θ ∈ ω0 vs H1: θ ∈ ω1.
Suppose ω0 ∪ ω1 = Ω (exhaustive) and ω0 ∩ ω1 = ∅ (disjoint). H0 is referred to as the null hypothesis.
H1 is referred to as the alternative hypothesis.
Goal: use the data (X1, · · · , Xn) to choose between H0 and H1.
3/50
Example (1)
You are testing whether students in Boxiang’s STAT 4101 course sleep less than 6 hours on average. Let μ be the average sleep time in hours.
H0 : μ = 6 vs H1 : μ < 6.
4/50
Example (2)
Let θ denote the proportion of defective items in a large shipment. Someone buying this shipment might be interested in testing
H0: θ ≤ 0.01 vs H1: θ > 0.01.
Example (3)
Let (X1, ···, Xn) be a random sample from N(μ1, 1) and (Y1, ···, Yn) a random sample from N(μ2, 1). Assume (X1, ···, Xn) and (Y1, ···, Yn) are independent.
H0: μ1 − μ2 = 0 vs H1: μ1 − μ2 > 0.
4/50
Critical region and decision rule
How to conduct hypothesis tests?
Assume that X1,…,Xn is a random sample from a distribution with the pdf f (x; θ).
Sample space: D, the set of possible values of (X1, …, Xn).
Test: H0: θ ∈ ω0 vs H1: θ ∈ ω1.
Critical region (rejection region): a subset C of D.
Decision rule (test):
Reject H0 (accept H1) if (X1, …, Xn) ∈ C. Fail to reject H0 if (X1, …, Xn) ∈ Cᶜ.
5/50
Test statistic
Usually, we use a statistic to help us describe the critical region. For example, the critical region may look like:
C = {(X1, ···, Xn) : T(X1, ···, Xn) < k}.
Here, T(X1, ···, Xn) is called a test statistic for the hypotheses.
Example
Suppose that X1, …, Xn is a random sample from an N(μ, σ²) distribution with unknown mean μ and known variance σ². Suppose we want to test
H0: μ = 0 vs H1: μ ≠ 0
It is reasonable to reject H0 if |X̄| > k for some k > 0. Therefore, C = {(x1, …, xn) : |x̄| > k}.
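A minimal Python sketch of this decision rule, assuming σ is known. The helper names (reject_h0, size_of_test, k_for_size) are illustrative and not from the slides. k_for_size anticipates one common way to pick the cutoff: choose k so that the probability of rejecting a true H0 equals a target α, using X̄ ∼ N(0, σ²/n) under H0. The values σ = 2 and n = 16 are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def reject_h0(sample, k):
    """Decision rule for H0: mu = 0 vs H1: mu != 0: reject iff |x-bar| > k."""
    return abs(fmean(sample)) > k


def size_of_test(k, sigma, n):
    """P_{mu=0}(|X-bar| > k), using X-bar ~ N(0, sigma^2 / n) under H0."""
    z = k * math.sqrt(n) / sigma
    return 2 * (1 - NormalDist().cdf(z))


def k_for_size(alpha, sigma, n):
    """Cutoff k that makes size_of_test equal alpha: k = z_{alpha/2} * sigma / sqrt(n)."""
    return NormalDist().inv_cdf(1 - alpha / 2) * sigma / math.sqrt(n)


# Hypothetical setting: sigma = 2, n = 16, target alpha = 0.05.
k = k_for_size(0.05, sigma=2.0, n=16)
print(round(k, 4))                          # cutoff for |x-bar|
print(round(size_of_test(k, 2.0, 16), 4))   # recovers the target 0.05
print(reject_h0([1.5] * 16, k))             # x-bar = 1.5 exceeds k, so reject
```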
6/50
Type I error and type II error
How to come up with a good decision rule?
Decision table for a hypothesis test:
                    H0 is true          H1 is true
Reject H0           Type I Error        Correct Decision
Not reject H0       Correct Decision    Type II Error
Type I Error: H0 is rejected when it is true (i.e., θ ∈ ω0).
Type II Error: H0 is not rejected when it is not true (H1 is true, i.e., θ ∈ ω1).
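To make the two error types concrete, here is a small Monte Carlo sketch in Python for the rule "reject H0: μ = 0 if |x̄| > k". The sample size n = 25, σ = 1, cutoff k = 0.392, alternative μ = 0.5, replication count, and seed are all arbitrary illustrative choices. The Type I error rate is estimated from data simulated with H0 true (μ = 0); the Type II error rate from data simulated with μ = 0.5.

```python
import random
from statistics import fmean


def rejects(sample, k):
    """Reject H0: mu = 0 when |x-bar| > k."""
    return abs(fmean(sample)) > k


def error_rates(mu_alt, k, n=25, sigma=1.0, reps=20_000, seed=1):
    """Monte Carlo estimates of P(Type I error) and P(Type II error at mu_alt)."""
    rng = random.Random(seed)
    # Data generated under H0 (mu = 0): every rejection is a Type I error.
    type1 = sum(
        rejects([rng.gauss(0.0, sigma) for _ in range(n)], k) for _ in range(reps)
    ) / reps
    # Data generated under H1 (mu = mu_alt): every non-rejection is a Type II error.
    type2 = sum(
        not rejects([rng.gauss(mu_alt, sigma) for _ in range(n)], k)
        for _ in range(reps)
    ) / reps
    return type1, type2


t1, t2 = error_rates(mu_alt=0.5, k=0.392)
print(f"Type I ~ {t1:.3f}, Type II ~ {t2:.3f}")
```

With k = 0.392 ≈ 1.96/√25, the exact Type I error probability is about 0.05 and the exact Type II error probability at μ = 0.5 is about 0.295, so the simulated rates should land near those values.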
7/50
https://effectsizefaq.com/2010/05/31/i-always-get-confused-about-type-i-and-ii-errors-can-you-show-me-something-to-help-me-remember-the-diff
8/50
Power function
Definition. The power function of a critical region C is defined to be
γC(θ) = Pθ[(X1, ···, Xn) ∈ C], θ ∈ Ω.
1. For each fixed C, the power is a function of the parameter θ.
2. Relationship with Type I and Type II errors:
γC(θ) = P(Type I error) when θ ∈ ω0;
γC(θ) = 1 − P(Type II error) when θ ∈ ω1.
9/50
How to come up with a good decision rule?
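Ideally, among all possible critical regions, we would find one that minimizes the probabilities of both Type I and Type II errors. But, in general, this is impossible.
Consider the case C = ∅: we never reject H0, so the probability of a Type I error is 0, but the probability of a Type II error is 1 for every θ ∈ ω1.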
See another example in the next slide.
10/50
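Suppose that X1, …, X100 is a random sample from an N(μ, 1) distribution with unknown mean μ. Suppose we want to test
H0: μ = 0 versus H1: μ ≠ 0.
Let Ck = {(X1, …, Xn) : |X̄| > k} for k > 0. Since X̄ ∼ N(μ, 1/100), the standard deviation of X̄ is 1/10. Then
$$\gamma_{C_k}(\mu) = 1 - P_\mu\left(-k < \bar X < k\right) = 1 - P_\mu\left(\frac{-k-\mu}{1/10} < \frac{\bar X-\mu}{1/10} < \frac{k-\mu}{1/10}\right) = 1 - \Phi\left(\frac{k-\mu}{1/10}\right) + \Phi\left(\frac{-k-\mu}{1/10}\right).$$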
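The power function above can be evaluated directly. This Python sketch (the function name power is ours) tabulates γ_{C_k}(μ) for a few cutoffs k. It illustrates the trade-off: increasing k lowers the Type I error probability γ(0) but raises the Type II error probability 1 − γ(μ) at an alternative such as μ = 0.2.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf


def power(mu, k, n=100):
    """gamma_{C_k}(mu) = P_mu(|X-bar| > k) when X-bar ~ N(mu, 1/n)."""
    sd = 1 / n ** 0.5  # = 1/10 for n = 100
    return 1 - PHI((k - mu) / sd) + PHI((-k - mu) / sd)


for k in (0.1, 0.196, 0.3):
    # gamma(0) is the Type I error probability; 1 - gamma(0.2) is the Type II error at mu = 0.2
    print(k, round(power(0.0, k), 4), round(1 - power(0.2, k), 4))
```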
19/50
Example (4.5.4): (Exact) Test for the mean
Let X1, …, Xn be a random sample from a normal distribution with mean μ and unknown finite variance σ². For a pre-specified μ0, test the hypotheses
H0: μ = μ0 vs H1: μ > μ0.
20/50
Example
Suppose that X1, …, Xn form a random sample from a uniform distribution on the interval [0, θ], θ > 0, and the following hypotheses are to be tested:
H0: θ ≥ 2 vs H1: θ < 2.
Let Yn = max(X1, …, Xn), and consider a test procedure whose critical region contains all the outcomes for which Yn ≤ 1.5. Assume that n = 10.
1. Determine and sketch the power function of the test.
2. Determine the size of the test.
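One way to check your answers numerically. Since the Xi are independent Uniform[0, θ], Pθ(Yn ≤ y) = (y/θ)^n for 0 ≤ y ≤ θ, so γ(θ) = 1 for θ ≤ 1.5 and γ(θ) = (1.5/θ)^10 for θ > 1.5. The Python sketch below (power_uniform is our own helper) evaluates this, approximates the size sup_{θ ≥ 2} γ(θ) on a grid (γ is decreasing in θ, so the supremum is attained at θ = 2), and adds a Monte Carlo check whose seed and replication count are arbitrary.

```python
import random


def power_uniform(theta, c=1.5, n=10):
    """gamma(theta) = P_theta(Y_n <= c), where Y_n = max of n iid Uniform[0, theta]."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    return 1.0 if theta <= c else (c / theta) ** n


# Size = sup of gamma over H0 (theta >= 2); gamma is decreasing, so scan a grid from 2 up.
size = max(power_uniform(2 + 0.01 * i) for i in range(1000))
print(round(size, 4))  # equals power_uniform(2) = 0.75 ** 10

# Monte Carlo sanity check of gamma(2): simulate Y_10 with theta = 2.
rng = random.Random(0)
reps = 100_000
hits = sum(max(rng.uniform(0, 2) for _ in range(10)) <= 1.5 for _ in range(reps))
print(hits / reps)
```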
21/50
Chapter 4
Some Elementary Statistical Inference
4.6 Additional Comments About Statistical Tests
22/50
One-sided hypothesis tests
Let X1, X2, …, Xn be a random sample from a normal distribution with unknown mean μ and unknown variance. For a specified value of μ0:
H0: μ = μ0 vs. H1: μ > μ0. Reject H0 in favor of H1 if
$$T = \frac{\bar X - \mu_0}{S/\sqrt{n}} > t_{\alpha,n-1}.$$
For a specified value of μ0:
H0: μ = μ0 vs. H1: μ < μ0.
Reject H0 in favor of H1 if
$$T = \frac{\bar X - \mu_0}{S/\sqrt{n}} < t_{1-\alpha,n-1}.$$
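A Python sketch of the one-sided t test, applied to the sleep example with a made-up sample of n = 10 sleep times (the data are hypothetical). The standard library has no t quantile function, so the table value t_{0.05,9} ≈ 1.833 is passed in by hand; for the left-sided alternative the sketch uses t_{1−α,n−1} = −t_{α,n−1}, which follows from the symmetry of the t distribution.

```python
import math
from statistics import mean, stdev


def t_statistic(sample, mu0):
    """T = (x-bar - mu0) / (S / sqrt(n)), with S the sample sd (n - 1 divisor)."""
    return (mean(sample) - mu0) / (stdev(sample) / math.sqrt(len(sample)))


def one_sided_t_test(sample, mu0, t_crit, alternative):
    """Reject H0: mu = mu0 at level alpha; t_crit is the upper-alpha point t_{alpha,n-1}."""
    t = t_statistic(sample, mu0)
    if alternative == "greater":
        return t > t_crit
    if alternative == "less":
        return t < -t_crit  # t_{1-alpha,n-1} = -t_{alpha,n-1}
    raise ValueError("alternative must be 'greater' or 'less'")


# Hypothetical sleep times (hours) for n = 10 students; H1: mu < 6.
hours = [5.5, 6.1, 5.8, 5.2, 6.0, 5.7, 5.9, 5.4, 6.2, 5.6]
print(round(t_statistic(hours, 6.0), 3))
print(one_sided_t_test(hours, 6.0, t_crit=1.833, alternative="less"))
```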
23/50
Let X1, X2, …, Xn be a random sample from a normal distribution with unknown mean μ and unknown variance. For a specified value of μ0:
H0: μ = μ0 vs. H1: μ ≠ μ0. Reject H0 in favor of H1 if X̄ ≤ h or X̄ ≥ k.
Size:
$$\alpha = P_{H_0}\{\bar X \le h \text{ or } \bar X \ge k\} = P_{H_0}\{\bar X \le h\} + P_{H_0}\{\bar X \ge k\}.$$
Choose h and k so that $P_{H_0}\{\bar X \le h\} = \alpha/2$ and $P_{H_0}\{\bar X \ge k\} = \alpha/2$. Then reject H0 in favor of H1 if
$$\frac{|\bar X - \mu_0|}{S/\sqrt{n}} \ge t_{\alpha/2,n-1}.$$
24/50
Two-sided tests and confidence intervals
How to connect hypothesis testing with confidence intervals?
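Size α test: reject H0 in favor of H1 if
$$\frac{|\bar X - \mu_0|}{S/\sqrt{n}} \ge t_{\alpha/2,n-1}.$$
A (1 − α)100% confidence interval for μ is
$$(L, U) = \left(\bar X - t_{\alpha/2,n-1}\frac{S}{\sqrt n},\ \bar X + t_{\alpha/2,n-1}\frac{S}{\sqrt n}\right).$$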
Conclusion: we reject H0 at significance level α if and only if μ0 is not in the (1 − α)100% confidence interval for μ.
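The equivalence can be checked numerically. This Python sketch uses hypothetical data and the table value t_{0.025,9} ≈ 2.262, supplied by hand because the standard library has no t quantile function. It runs the size-0.05 two-sided test over a grid of μ0 values and confirms that the test rejects exactly when μ0 lies outside the 95% interval.

```python
import math
from statistics import mean, stdev


def two_sided_t_test(sample, mu0, t_half):
    """Reject H0: mu = mu0 iff |x-bar - mu0| / (S / sqrt(n)) >= t_half = t_{alpha/2,n-1}."""
    se = stdev(sample) / math.sqrt(len(sample))
    return abs(mean(sample) - mu0) / se >= t_half


def t_interval(sample, t_half):
    """(1 - alpha)100% CI: x-bar -/+ t_{alpha/2,n-1} * S / sqrt(n)."""
    xbar = mean(sample)
    half_width = t_half * stdev(sample) / math.sqrt(len(sample))
    return xbar - half_width, xbar + half_width


hours = [5.5, 6.1, 5.8, 5.2, 6.0, 5.7, 5.9, 5.4, 6.2, 5.6]  # hypothetical data
t_half = 2.262  # t_{0.025, 9} from a t table (alpha = 0.05)
lo, hi = t_interval(hours, t_half)

# Duality: for each candidate mu0, "reject H0" agrees with "mu0 outside (lo, hi)".
for mu0 in [5.0 + 0.01 * i for i in range(150)]:
    assert two_sided_t_test(hours, mu0, t_half) == (not lo < mu0 < hi)
print(round(lo, 3), round(hi, 3))
```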
25/50
Example 4.6.2
Let independent random samples be taken from N(μ1, σ²) and N(μ2, σ²), respectively. Say these have the respective sample characteristics n1, X̄, S1², and n2, Ȳ, S2². Let n = n1 + n2 denote the combined sample size and let
$$S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n-2}$$
be the pooled estimator of the common variance.
A (1 − α)100% confidence interval for ∆ = μ1 − μ2 is given by
$$(\bar X - \bar Y) \pm t_{\alpha/2,n-2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$
At level α, reject H0: μ1 = μ2 and accept H1: μ1 ≠ μ2 if
$$|T| = \frac{|\bar X - \bar Y - 0|}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \ge t_{\alpha/2,n-2}.$$
26/50
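At level α, reject H0: μ1 ≤ μ2 and accept H1: μ1 > μ2 if
$$T = \frac{\bar X - \bar Y - 0}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} > t_{\alpha,n-2}.$$
At level α, reject H0: μ1 ≥ μ2 and accept H1: μ1 < μ2 if
$$T = \frac{\bar X - \bar Y - 0}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} < t_{1-\alpha,n-2}.$$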
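A minimal Python sketch of the pooled two-sample statistic, on two small hypothetical samples. The data are made up, and the table values t_{0.025,6} ≈ 2.447 (two-sided) and t_{0.05,6} ≈ 1.943 (one-sided) are supplied by hand for comparison rather than computed.

```python
import math
from statistics import mean, variance


def pooled_t(x, y):
    """Pooled two-sample T for H0: mu1 = mu2 (equal variances assumed); returns (T, df)."""
    n1, n2 = len(x), len(y)
    # S_p^2 = ((n1 - 1) S1^2 + (n2 - 1) S2^2) / (n - 2), with n = n1 + n2
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


x = [5, 7, 6, 8]  # hypothetical sample 1
y = [4, 5, 3, 4]  # hypothetical sample 2
t, df = pooled_t(x, y)
print(round(t, 3), df)
# Compare |t| with t_{0.025,6} = 2.447 (H1: mu1 != mu2) or t with t_{0.05,6} = 1.943 (H1: mu1 > mu2).
```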
The smaller the p-value is, the stronger the evidence is against H0.
A small number α (the significance level of the test), e.g., α = 0.05, is often selected in advance. If the p-value ≤ α, we reject H0 in favor of H1.
In other words, the p-value is the smallest significance level α at which we would reject the null hypothesis.
30/50
Example
Suppose that X1, …, Xn is a random sample from an N(μ, 1) distribution with unknown mean μ. Suppose we want to test
H0: μ = 1 vs H1: μ ≠ 1.
Let
$$C_k = \left\{(X_1,\dots,X_n) : \left|\frac{\bar X - 1}{1/\sqrt n}\right| > k\right\}$$
for k > 0.
Assume that the observed value of the test statistic is z = (x̄ − 1)/(1/√n) = 1.98.
$$\text{p-value} = P_{\mu=1}\left(\left|\frac{\bar X - 1}{1/\sqrt n}\right| > 1.98\right) = P[|Z| > 1.98] = 2\times 0.024 = 0.048.$$
For a pre-specified α = 0.05, say, we reject H0 because the p-value is less than 0.05.
The hypothesis H0 would be rejected for every level of significance α ≥ 0.048.
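A short Python check of this p-value using the standard normal cdf (two_sided_p_value is our own helper). The unrounded value is about 0.0477; the 0.048 above comes from first rounding Φ(−1.98) to 0.024. The loop shows the "smallest α" reading: H0 is rejected exactly for those α with p-value ≤ α.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """p-value = P(|Z| > |z|) for Z ~ N(0, 1)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


p = two_sided_p_value(1.98)
print(round(p, 4))
# Reject H0 exactly when alpha >= p.
for alpha in (0.01, 0.04, 0.05, 0.10):
    print(alpha, "reject" if p <= alpha else "do not reject")
```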
31/50
Example 4.6.5
Let X1, …, X25 be a random sample from a distribution with mean μ and variance 4. The significance level of the test is 0.05. Suppose the observed sample mean is x̄ = 76.1. Use the p-value method to do the following tests:
1. H0: μ = 77 vs H1: μ < 77.
2. H0: μ = 77 vs H1: μ > 77.
3. H0: μ = 77 vs H1: μ ≠ 77.
For all three tests, the test statistic is
$$Z = \frac{\bar X - 77}{0.4},$$
which under H0 is approximately N(0, 1) by the central limit theorem (here σ/√n = 2/5 = 0.4).
The observed value of the test statistic is
z = (76.1 − 77)/0.4 = −2.25.
Given Φ(−2.25) = 0.012, the p-values are Φ(−2.25) = 0.012, 1 − Φ(−2.25) = 0.988, and 2Φ(−2.25) = 0.024, respectively.
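The three p-values can be reproduced with the standard normal cdf. This Python sketch relies on the same CLT-based normal approximation as above and applies the level-0.05 decision rule to each alternative.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf

xbar, mu0, sigma, n = 76.1, 77, 2, 25
z = (xbar - mu0) / (sigma / n ** 0.5)  # (76.1 - 77) / 0.4 = -2.25

p_less = PHI(z)                       # H1: mu < 77
p_greater = 1 - PHI(z)                # H1: mu > 77
p_two = 2 * min(PHI(z), 1 - PHI(z))   # H1: mu != 77
for name, p in [("<", p_less), (">", p_greater), ("!=", p_two)]:
    print(name, round(p, 4), "reject" if p <= 0.05 else "do not reject")
```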
32/50
Review
1. How to find statistical evidence in favor of a certain belief? (Slide 3)
Hypothesis testing, null hypothesis, alternative hypothesis.
2. How to conduct hypothesis tests? (Slide 5)
Critical region, decision rule, test statistic.
3. How to come up with a good decision rule? (Slide 7)
Type I error, type II error, power function, size (significance level), better tests.
4. How to connect hypothesis testing with confidence intervals? (Slide 25)
One-sided tests, two-sided tests.
5. How to quantify statistical significance? (Slide 29)
p-values.
33/50
Chapter 4
Some Elementary Statistical Inference
4.7 Chi-Square Tests
34/50
Example 1: Flipping a Coin
35/50
Flipping a coin once results in one of two possible outcomes: head or tail.
Suppose one tosses a coin 100 times independently. Let X1 denote the number of heads and X2 the number of tails. (Note that X2 = 100 − X1.)
Question. Is it a balanced coin?
We need to test the following hypotheses, where p denotes the probability of heads:
H0: p = 0.5 vs H1: p ≠ 0.5.
36/50
If H0 is true, for 100 flips we expect about 100 × 0.5 = 50 heads and 50 tails
(because X1 ∼ Bin(100, 0.5) and X2 = 100 − X1).
For each category, head or tail, we compute the quantity
$$\frac{(x_i - 100\times 0.5)^2}{100\times 0.5}.$$
This quantity measures the divergence between the observed counts (xi) and the expected counts (50) in each category.
Let
$$Q_1 = \sum_{i=1}^{2}\frac{(x_i - 50)^2}{50}.$$
By large-sample theory, under H0, Q1 is approximately distributed as χ²₁ (df = 2 − 1 = 1).
Suppose we want the significance level of the test to be α. Then we can use the table of the χ² distribution to find a constant c such that
P(Q1 > c) = α.
Accordingly, the size α rejection region is
{(x1, x2) : Q1 > c}.
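A small Python sketch of this test for a hypothetical outcome of 60 heads in 100 flips (the count is made up for illustration). Instead of a χ² table, it uses the fact that if Z ∼ N(0, 1) then Z² ∼ χ²₁, so the upper-α point of χ²₁ is z²_{α/2}.

```python
from statistics import NormalDist


def q1_statistic(heads, n):
    """Q1 = sum over {heads, tails} of (observed - expected)^2 / expected, with p = 0.5."""
    expected = n * 0.5
    tails = n - heads
    return (heads - expected) ** 2 / expected + (tails - expected) ** 2 / expected


alpha = 0.05
# Upper-alpha point of chi-square(1) is z_{alpha/2}^2, since Z^2 ~ chi-square(1).
c = NormalDist().inv_cdf(1 - alpha / 2) ** 2
q1 = q1_statistic(60, 100)  # hypothetical outcome: 60 heads in 100 flips
print(round(c, 4), q1, "reject H0" if q1 > c else "do not reject H0")
```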
37/50
Why does Q1 approximately follow a χ²₁ distribution?
Since X1 ∼ Bin(n, p1), by the central limit theorem
$$Y = \frac{X_1 - np_1}{\sqrt{np_1(1-p_1)}} \sim N(0,1)$$
approximately. Thus Q1 = Y² is approximately χ²₁.
Let X2 = n − X1 and p2 = 1 − p1. Using 1/(p1(1 − p1)) = 1/p1 + 1/(1 − p1),
$$Q_1 = \frac{(X_1-np_1)^2}{np_1(1-p_1)} = \frac{(X_1-np_1)^2}{np_1} + \frac{(X_1-np_1)^2}{n(1-p_1)} = \frac{(X_1-np_1)^2}{np_1} + \frac{(X_2-np_2)^2}{np_2},$$
where the last step is due to
$$(X_1-np_1)^2 = (n - X_2 - n + np_2)^2 = (X_2-np_2)^2.$$
How about more than two outcomes?
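Both the identity and the quality of the χ²₁ approximation can be checked exactly for n = 100 and p1 = 0.5, because X1 is binomial. This Python sketch (helper names are ours) verifies the two-term form of Q1 for every possible count x1, then sums binomial probabilities over the rejection region {Q1 > c}, with c the upper 5% point of χ²₁. The true size comes out close to, but not exactly, the nominal 0.05 (about 0.057 here), reflecting the discreteness of X1 and the approximate nature of the χ² limit.

```python
import math
from statistics import NormalDist

n, p1 = 100, 0.5
p2 = 1 - p1


def q1_two_term(x1):
    """(X1 - n p1)^2 / (n p1) + (X2 - n p2)^2 / (n p2), with X2 = n - X1."""
    x2 = n - x1
    return (x1 - n * p1) ** 2 / (n * p1) + (x2 - n * p2) ** 2 / (n * p2)


def q1_one_term(x1):
    """(X1 - n p1)^2 / (n p1 (1 - p1)), the square of the standardized binomial."""
    return (x1 - n * p1) ** 2 / (n * p1 * (1 - p1))


# The two forms agree for every possible count x1 = 0, ..., n.
assert all(math.isclose(q1_two_term(x), q1_one_term(x)) for x in range(n + 1))

c = NormalDist().inv_cdf(0.975) ** 2  # upper 5% point of chi-square(1), about 3.841
exact_size = sum(
    math.comb(n, x) * p1 ** x * p2 ** (n - x) for x in range(n + 1) if q1_two_term(x) > c
)
print(round(exact_size, 4))
```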
38/50
Example 2: Rolling a Die
39/50
Rolling a die once results in one of six possible outcomes: {1,2,3,4,5,6}.
Suppose one rolls a die 60 times independently. Let Xi denote the number of times that face i appears in the 60 trials. (Note that X6 = 60 − X1 − ··· − X5.)
Question. Is it a fair die?
Need to test the following hypotheses:
H0 : p1 = 1/6,··· ,p6 = 1/6 vs H1 : any other cases.
40/50
If H0 is true, for 60 rolls we expect each number to appear about 60 × 1/6 = 10 times, because jointly
(X1, ···, X6) ∼ Multinomial(60, 1/6, 1/6, 1/6, 1/6, 1/6, 1/6), or marginally, Xi ∼ Bin(60, 1/6).
For each category, we compute the quantity
$$\frac{(x_i - 60\times\frac16)^2}{60\times\frac16}.$$
This quantity measures the divergence between the observed counts (xi) and the expected counts (10) in each category.
Let
$$Q_5 = \sum_{i=1}^{6}\frac{(x_i - 10)^2}{10}.$$
By large-sample theory, under H0, Q5 is approximately distributed as χ²₅ (df = 6 − 1 = 5). Using the table of the χ² distribution, we can find a constant c such that
P(Q5 > c) = α,
where α is the desired significance level of the test.
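A Python sketch of the fair-die test on hypothetical counts from 60 rolls (the counts are invented for illustration). The critical value c = 11.070, the upper 5% point of χ²₅, is taken from a χ² table rather than computed.

```python
def chi_square_uniform(counts):
    """Q = sum (x_i - E)^2 / E with E = n / k, i.e. H0: all k faces equally likely."""
    n, k = sum(counts), len(counts)
    expected = n / k
    return sum((x - expected) ** 2 / expected for x in counts)


counts = [8, 12, 9, 11, 14, 6]  # hypothetical counts of faces 1-6 in 60 rolls
q5 = chi_square_uniform(counts)
c = 11.070  # upper 5% point of chi-square(5), from a chi-square table
print(round(q5, 3), "reject H0" if q5 > c else "do not reject H0")
```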
41/50
Chi-Square Test
Generally speaking, the classical chi-square test is used to test whether the observed data (sample) came from a specific theoretical probability distribution (the underlying distribution).
There are several variations of the chi-square test. We discuss two commonly used tests in the next two subsections.
1. (Slide 43) The theoretical probability distribution (the underlying distribution) is fully specified. For example, the classical goodness-of-fit test (as in Examples 1 and 2).
2. (Slide 45) The theoretical probability distribution (the underlying distribution) is not fully specified. Instead, we only know that the underlying distribution belongs to a certain collection of distributions. The chi-square tests used in many contingency-table problems belong to this case.
42/50
Classical Goodness of Fit Test
1. Let A denote the sample space of a random experiment.
2. A = A1 ∪ A2 ∪ ··· ∪ Ak, where A is the union of k mutually exclusive events (subsets of the sample space), like “head” and “tail”.
3. Let pi denote the probability that event Ai occurs in one experiment. That is, pi = P(Ai), i = 1, 2, ···, k.
4. The random experiment is repeated n independent times, and Xi, i = 1, 2, ···, k, denotes the number of times that event Ai has occurred.
Clearly, Xk = n − X1 − ··· − Xk−1.
5. Jointly, (X1, X2, ···, Xk) ∼ Multinomial(n, p1, ···, pk).
43/50
6. Test the hypotheses:
H0: pi = pi0 for all i = 1, …, k vs H1: pi ≠ pi0 for at least one i.
7. The test statistic is
$$Q = \sum_{i=1}^{k}\frac{(X_i - np_{i0})^2}{np_{i0}}.$$
Under H0, the distribution of Q can be approximated by a χ²ₖ₋₁ distribution when n is large enough.
8. The decision rule:
(1) Size α critical region: reject H0 if Q ≥ c, where c satisfies P(Q ≥ c) = α.
(2) p-value: let q denote the observed value of Q. The p-value equals P(Q ≥ q) ≈ 1 − F(q), where F stands for the cdf of a χ²ₖ₋₁ distribution.
If the p-value ≤ α, then reject H0.
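A self-contained Python sketch of steps 7 and 8. The standard library has no χ² cdf, so chi2_sf (our own helper) uses the closed-form upper-tail series for integer degrees of freedom: for even df, P(Q ≥ q) = e^{−q/2} Σ_{j=0}^{df/2−1} (q/2)^j / j!; for odd df, it adds terms to 2(1 − Φ(√q)). The null probabilities (1/4, 1/2, 1/4) and the counts (30, 45, 25) are a hypothetical illustration.

```python
import math
from statistics import NormalDist


def chi2_sf(q, df):
    """P(Q >= q) for Q ~ chi-square(df), integer df >= 1 (closed-form series)."""
    if q <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-q/2) * sum_{j=0}^{df/2 - 1} (q/2)^j / j!
        term = math.exp(-q / 2)
        total = term
        for j in range(1, df // 2):
            term *= (q / 2) / j
            total += term
        return total
    # Odd df: 2(1 - Phi(sqrt q)) + sqrt(2q/pi) e^{-q/2} [1 + q/3 + q^2/(3*5) + ...]
    total = 2 * (1 - NormalDist().cdf(math.sqrt(q)))
    term = math.sqrt(2 * q / math.pi) * math.exp(-q / 2)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= q / (2 * j + 1)
    return total


def goodness_of_fit(counts, probs):
    """Return (q, df, p-value) for H0: p_i = p_i0, with Q = sum (X_i - n p_i0)^2 / (n p_i0)."""
    n = sum(counts)
    q = sum((x - n * p) ** 2 / (n * p) for x, p in zip(counts, probs))
    df = len(counts) - 1
    return q, df, chi2_sf(q, df)


q, df, p_value = goodness_of_fit([30, 45, 25], [0.25, 0.5, 0.25])
alpha = 0.05
print(q, df, round(p_value, 4), "reject H0" if p_value <= alpha else "do not reject H0")
```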
44/50
Contingency Tables
Let A and B be two categorical random variables with categories A1, …, Aa and B1, …, Bb, and let pij = P(Ai ∩ Bj):
A \ B | B1   B2   ···  Bb  | Total
A1    | p11  p12  ···  p1b | p1. = Σ_{j=1}^{b} p1j
A2    | p21  p22  ···  p2b | p2. = Σ_{j=1}^{b} p2j
⋮     | ⋮    ⋮         ⋮   | ⋮
Aa    | pa1  pa2  ···  pab | pa. = Σ_{j=1}^{b} paj
Total | p.1  p.2  ···  p.b | 1
where p.j = Σ_{i=1}^{a} pij, j = 1, …, b.
Now perform the random experiment n times. Let Xij be the frequency of the event Ai ∩ Bj. Data collected on these two variables can be summarized in an a × b contingency table:
A \ B | B1   B2   ···  Bb  | Total
A1    | X11  X12  ···  X1b | X1. = Σ_{j=1}^{b} X1j
A2    | X21  X22  ···  X2b | X2. = Σ_{j=1}^{b} X2j
⋮     | ⋮    ⋮         ⋮   | ⋮
Aa    | Xa1  Xa2  ···  Xab | Xa. = Σ_{j=1}^{b} Xaj
Total | X.1  X.2  ···  X.b | n
where X.j = Σ_{i=1}^{a} Xij, j = 1, …, b.
45/50
Now we use this contingency table to decide whether A and B are independent. So we have the following hypotheses:
H0: pij = pi. p.j for all i = 1, ···, a; j = 1, ···, b
H1: pij ≠ pi. p.j for at least one pair (i, j).
How to use the chi-square statistic to do the test?
46/50
The following table summarizes the fate of the passengers and crew when the Titanic sank on Monday, April 15, 1912. It has two variables.
A row variable (Survival Status), which indicates whether the person survived or died.
A column variable (Gender/Age), which lists the demographic categories: men, women, boys, girls.
Question. Are the two random variables independent?
47/50
To test independence, we use the following formula to construct the chi-square statistic Q:
$$Q = \sum_{i=1}^{a}\sum_{j=1}^{b}\frac{(X_{ij} - E_{ij})^2}{E_{ij}}.$$
Here,
Xij = the observed count in the (i, j)th cell, and
Eij = the expected count in the (i, j)th cell if H0 were true
= (the corresponding row total × the corresponding column total)/n.
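A minimal Python sketch of this computation on a small hypothetical 2 × 3 table (the counts are invented; they are not the Titanic data). It builds Eij from row and column totals, sums the cell contributions, and uses df = (a − 1)(b − 1). For df = 2 the χ² upper tail has the closed form P(Q ≥ q) = e^{−q/2}, which the sketch uses for the p-value.

```python
import math


def independence_test(table):
    """Return (q, df, expected) for the chi-square test of independence in an a x b table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # E_ij = (row total x column total) / n, the expected count if H0 were true
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    q = sum(
        (x - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for x, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return q, df, expected


table = [[20, 30, 50],   # hypothetical counts, row category 1
         [30, 20, 50]]   # hypothetical counts, row category 2
q, df, expected = independence_test(table)
# For df = 2 only: P(Q >= q) = exp(-q / 2).
print(q, df, round(math.exp(-q / 2), 4))
```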
48/50
49/50
We use a 0.05 significance level to test whether a person’s survival is independent of whether the person is a man, woman, boy, or girl.
H0: Whether a person survived is independent of whether the person is a man, woman, boy, or girl.
H1: Surviving the Titanic sinking and being a man, woman, boy, or girl are dependent.
The observed value q of the chi-square statistic Q is
$$q = \frac{(332 - 537.360)^2}{537.360} + \cdots + \frac{(18 - 30.709)^2}{30.709} = 507.084.$$
The df of this chi-square statistic is (2 − 1) × (4 − 1) = 3.
With 3 degrees of freedom, the p-value is P(Q > 507.084) ≪ 0.05.
On the basis of this small p-value, we reject H0 and conclude that there is sufficient sample evidence to reject independence.
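The full table of observed counts is not reproduced in this text, so q cannot be recomputed here; given the reported q = 507.084 and df = 3, however, the p-value can be evaluated. For df = 3 the χ² upper tail has the closed form P(Q ≥ q) = 2(1 − Φ(√q)) + √(2q/π) e^{−q/2}. This Python sketch (the helper name is ours) evaluates it, first at the tabled 5% critical value 7.815 as a sanity check.

```python
import math
from statistics import NormalDist


def chi2_sf_df3(q):
    """P(Q >= q) for Q ~ chi-square(3): 2(1 - Phi(sqrt q)) + sqrt(2q/pi) * exp(-q/2)."""
    return 2 * (1 - NormalDist().cdf(math.sqrt(q))) + math.sqrt(2 * q / math.pi) * math.exp(-q / 2)


print(chi2_sf_df3(7.815))    # close to 0.05: 7.815 is the tabled upper 5% point for df = 3
print(chi2_sf_df3(507.084))  # the reported q; vanishingly small
```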
50/50