Paired T test and One-sample T test
Miaoyan Wang
Department of Statistics UW Madison
Two-Sample Studies
Two-sample studies aim to compare two populations. For example, the goals are to:
A. Compare milk yield of cows on two different diets.
B. Compare timber volumes of two species of trees.
C. Compare heart rates of patients before and after a drug treatment.
D. Compare test scores of 7th graders before and after the summer break.
Paired vs. Independent Two Samples
There are two types of two-sample studies:
Two samples are paired.
Two samples are independent or unpaired.
A paired two-sample study (or experiment) is a study (or experiment) with two levels of a factor (or treatment) where each observation on one level of the factor (or treatment) is naturally paired with an observation on the other level of the factor (or treatment).
An independent two-sample study (or experiment) is a study (or experiment) with two levels of a factor (or treatment) where there is no relationship between the observations on the two levels of the factor (or treatment).
Paired vs. Independent Two Samples
The choice of a paired-sample study versus an independent two-sample study is an important design issue.
When to use which study? For example, consider
Heart rates of 10 patients before and after a drug treatment.
Heart rates of 10 patients before the drug treatment and heart rates of
another 10 patients after the drug treatment.
Which study would be better for detecting the drug effect?
Paired studies are usually preferred because of increased precision (i.e., reduced variability) in estimating the population mean difference.
The method we use for data analysis should follow the study design.
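The precision claim for paired studies can be made concrete with a short variance calculation. This is a sketch under assumed notation that does not appear on the slide: Y1i and Y2i are the two measurements on unit i, D̄ is the mean of the differences Y1i − Y2i, σ1² and σ2² are the variances of the two measurements, ρ is the correlation within a pair, and the independent design is taken to have n units per group.

```latex
% Assumed symbols (not from the slides): \sigma_1^2, \sigma_2^2, \rho.
\begin{align*}
\text{Paired:}\quad
\operatorname{Var}(\bar{D})
  &= \frac{\operatorname{Var}(Y_{1i} - Y_{2i})}{n}
   = \frac{\sigma_1^2 + \sigma_2^2 - 2\rho\,\sigma_1\sigma_2}{n}, \\
\text{Independent:}\quad
\operatorname{Var}(\bar{Y}_1 - \bar{Y}_2)
  &= \frac{\sigma_1^2 + \sigma_2^2}{n}.
\end{align*}
```

When the two measurements on the same unit are positively correlated (ρ > 0), as is typical for before/after measurements on the same patient, the paired variance is the smaller of the two.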
Example: Lake Clarity 1980 vs. 1990
Lake               1980    1990
1                  2.11    3.67
2                  1.79    1.72
3                  2.71    3.46
4                  1.89    2.60
5                  1.69    2.03
6                  1.71    2.10
7                  2.01    3.01
…                  …       …
17                 1.47    2.43
18                 1.67    1.91
19                 2.31    3.06
20                 1.76    2.26
21                 1.58    1.48
22                 2.55    2.35
sample mean        1.854   2.351
sample variance    0.168   0.354
sample sd          0.410   0.595
Question of interest: Is the population mean in 1990 the same as that in 1980?
In general, how do we perform statistical inference on two population means?
Null Hypothesis vs. Alternative Hypothesis
Y1i: Random variable of Secchi depth of the ith lake in 1990 for i = 1, . . . , n.
Y2i: Random variable of Secchi depth of the ith lake in 1980 for i = 1, . . . , n.
μ1 = E(Y1i): Population mean Secchi depth in 1990.
μ2 = E(Y2i): Population mean Secchi depth in 1980.
Our goal is to test H0 : μ1 = μ2 vs. HA : μ1 ≠ μ2.
Under the null hypothesis H0 : μ1 = μ2
The null hypothesis H0 is generally the claim initially favored or believed to be true.
Under the alternative hypothesis HA : μ1 ≠ μ2
The alternative hypothesis HA is generally the departure from H0 that one wishes to be able to detect.
Null Hypothesis vs. Alternative Hypothesis
Di = Y1i − Y2i: Secchi depth difference of the ith lake between 1990 and 1980.
μD = E(Di) = μ1 − μ2: Population mean Secchi depth difference between 1990 and 1980.
Equivalent to testing H0 : μ1 = μ2 vs. HA : μ1 ≠ μ2, we now consider testing
H0 : μD = 0 vs. HA : μD ≠ 0.
A statistic is the sample mean Secchi depth difference $\bar{D}$ based on an i.i.d. sample of size n = 22 (D1, D2, …, D22).
The sample average is $\bar{D} = \frac{1}{n}\sum_i D_i = 0.497$ and the sample variance is $\frac{1}{n-1}\sum_i (D_i - \bar{D})^2 = 0.19$.
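As a rough illustration of how the differences, their sample average, and their sample variance are computed, the sketch below uses only the 13 lakes whose values are listed in the table (lakes 1 to 7 and 17 to 22, as read from the table). Lakes 8 to 16 are elided there, so the results will not reproduce the full-sample values 0.497 and 0.19. The variable names are illustrative.

```python
from statistics import mean, variance

# Secchi depths for the 13 lakes listed in the table (lakes 1-7 and 17-22).
# ASSUMPTION: lakes 8-16 are elided in the source, so this is a partial sample
# and the output will differ from the full-sample summary statistics.
depth_1980 = [2.11, 1.79, 2.71, 1.89, 1.69, 1.71, 2.01,
              1.47, 1.67, 2.31, 1.76, 1.58, 2.55]
depth_1990 = [3.67, 1.72, 3.46, 2.60, 2.03, 2.10, 3.01,
              2.43, 1.91, 3.06, 2.26, 1.48, 2.35]

# D_i = Y_1i - Y_2i, i.e. the 1990 depth minus the 1980 depth for each lake.
diffs = [y1 - y2 for y1, y2 in zip(depth_1990, depth_1980)]

d_bar = mean(diffs)      # sample average of the differences
s2_d = variance(diffs)   # sample variance, dividing by n - 1

print(f"n = {len(diffs)}, mean difference = {d_bar:.3f}, sample variance = {s2_d:.3f}")
```

Note that `statistics.variance` divides by n − 1, matching the sample variance formula above.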
Test Statistic
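Assume that H0 : μD = 0 holds. Assume that $D_i \sim$ i.i.d. $N(0, \sigma_D^2)$. What is the distribution of $\bar{D}$?
$$\bar{D} \sim N\left(0, \frac{\sigma_D^2}{n}\right).$$
Why not $\bar{D} \sim N(0, 0.190/22)$?
Because $\sigma_D^2$ is unknown, we plug in the estimator $S_D^2 := \frac{1}{n-1}\sum_i (D_i - \bar{D})^2$ in place of $\sigma_D^2$. The extra randomness in $S_D^2$ means that the standardized statistic is no longer exactly normal.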
The test statistic is a function of the data that is useful in testing and leads to a value that can be directly interpreted by using an appropriate statistical table:
$$T = \frac{\bar{D}}{S_D/\sqrt{n}}.$$
T Distribution
The test statistic T follows a T distribution with n − 1 degrees of freedom under the null hypothesis.
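[Figure: probability densities of the T distribution with df = 3, df = 6, and df = 18, compared with N(0,1). Horizontal axis: observation, from −4 to 4. Vertical axis: probability density.]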
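The T densities for df = 3, 6, and 18 and the N(0,1) density can be evaluated directly from their standard formulas. The sketch below uses only the Python standard library; the function names `t_pdf` and `normal_pdf` are illustrative.

```python
import math

def t_pdf(x, df):
    """Density of the T distribution with df degrees of freedom, evaluated at x."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def normal_pdf(x):
    """Density of the standard normal distribution N(0,1), evaluated at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Compare the densities at the center (x = 0) and in the tail (x = 2, x = 4).
for x in (0.0, 2.0, 4.0):
    values = [t_pdf(x, df) for df in (3, 6, 18)] + [normal_pdf(x)]
    print(f"x = {x}: df=3, df=6, df=18, N(0,1) ->", [round(v, 4) for v in values])
```

At x = 0 the T densities are lower than the normal density, while at x = 4 they are higher: the T distribution has heavier tails, and it approaches N(0,1) as the degrees of freedom grow.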
P-value: tail probability under the null
Example: Lake Clarity 1980 vs. 1990
From the summary statistics, we have n = 22, $\bar{d} = 0.497$, and $s_D = 0.435$.
The standard error is:
$$s_D/\sqrt{n} = 0.435/\sqrt{22} = 0.0927$$
The observed test statistic is:
$$t = \frac{\bar{d} - 0}{s_D/\sqrt{n}} = \frac{0.497 - 0}{0.0927} = 5.357$$
Compute a p-value, defined as the probability of observing a value as extreme or more extreme than what we observed if H0 is true:
2 × P(T21 ≥ 5.357), which is less than 0.002.
Interpretation: If H0 is true, then we observed a very rare event. In other words, we have strong evidence against H0.
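The calculation in this example can be checked numerically. The sketch below starts from the rounded summary statistics (n = 22, mean difference 0.497, standard deviation 0.435) and obtains the tail probability of the T distribution by integrating its density with Simpson's rule, so that only the Python standard library is needed. The helper names `t_pdf` and `t_cdf` are illustrative. Because the inputs are rounded, the computed statistic is about 5.36 and may differ slightly in the last digits from the value quoted above.

```python
import math

def t_pdf(x, df):
    """Density of the T distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

# Rounded summary statistics from the example.
n, d_bar, s_d = 22, 0.497, 0.435

se = s_d / math.sqrt(n)          # standard error of the mean difference
t_obs = (d_bar - 0) / se         # observed test statistic under H0: mu_D = 0
p_two_sided = 2 * (1 - t_cdf(abs(t_obs), n - 1))  # both tails count as extreme

print(f"se = {se:.4f}, t = {t_obs:.3f}, two-sided p-value = {p_two_sided:.2e}")
```

The two-sided p-value doubles the upper-tail probability because the alternative HA : μD ≠ 0 treats large departures in either direction as evidence against H0.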
Interpretation of the p-value
The p-value can be interpreted as evidence against H0. The smaller the p-value, the stronger the evidence against H0.
In classical hypothesis testing, a threshold value α is determined and the p-value is compared against it.
If the p-value is less than α, then we reject H0.
If the p-value is greater than α, then we do not reject H0.
Lake clarity 1980 vs. 1990 example: Reject H0 at the 5% level. There is very strong evidence that the mean Secchi depths in 1980 and 1990 are different.
Another Example: Lake Clarity 1980 vs. 1990
How about testing
H0 : μ1 = μ2 + 0.5 vs. HA : μ1 > μ2 + 0.5?
The test statistic is:
$$T = \frac{\bar{D} - 0.5}{S_D/\sqrt{n}} \sim T_{n-1}$$
The standard error is: $s_D/\sqrt{n} = 0.435/\sqrt{22} = 0.0927$
The observed test statistic is: $t = \frac{\bar{d} - 0.5}{s_D/\sqrt{n}} = \frac{0.497 - 0.5}{0.435/\sqrt{22}} = -0.0294$
The p-value is: P(T21 ≥ −0.0294) = 1 − pt(−0.0294, df = 21), which is about 0.51 (from a calculator, R, or a T table). The upper tail is used because the alternative is HA : μ1 > μ2 + 0.5.
The conclusion is: Do not reject H0 at the 5% level. There is no evidence against the H0 that the mean Secchi depths differ by 0.5 m between 1990 and 1980.
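A test with a nonzero null value follows the same steps as before, with the hypothesized difference subtracted from the sample mean difference. The sketch below again starts from the rounded summary statistics and uses a Simpson's rule integration of the T density (the helpers `t_pdf` and `t_cdf` and the name `delta0` are illustrative). It reports both tail probabilities: the upper tail P(T21 ≥ t) is the p-value for HA : μ1 > μ2 + 0.5, and the lower tail P(T21 ≤ t) is what `pt(t, df = 21)` returns in R. With rounded inputs the statistic comes out at roughly −0.03, so the last digits may differ from the values quoted above.

```python
import math

def t_pdf(x, df):
    """Density of the T distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

# Rounded summary statistics; delta0 is the difference hypothesized under H0.
n, d_bar, s_d, delta0 = 22, 0.497, 0.435, 0.5

se = s_d / math.sqrt(n)
t_obs = (d_bar - delta0) / se

lower_tail = t_cdf(t_obs, n - 1)   # P(T <= t), the analogue of R's pt(t, df)
upper_tail = 1 - lower_tail        # P(T >= t), p-value for HA: mu_1 > mu_2 + 0.5

alpha = 0.05
print(f"t = {t_obs:.4f}, P(T <= t) = {lower_tail:.3f}, P(T >= t) = {upper_tail:.3f}")
print("reject H0" if upper_tail < alpha else "do not reject H0")
```

Because the observed statistic is so close to zero, both tail probabilities are near 0.5 and far above 0.05, so H0 is not rejected whichever one-sided alternative is considered.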