Computer Lab Week 10: solutions
STAT221
Hypothesis testing using a parametric t-test
Consider the sample, xi ∈ {1.5, 3.0, 2.2, 2.5, 4.1, 3.7}.
1. Test the hypothesis
H1: μ>2
by writing the R code needed to perform a t-test, but without using the R function t.test(). Can
the null-hypothesis be rejected at a type-I-error rate of α = 0.05?
2. Modify the above code to test the alternative hypothesis
H1: μ̸=2.
Compare with the previous one-sided hypothesis and test decision.
## [1] TRUE
## [1] FALSE
Hypothesis testing using a permutation test
Generate two normally distributed random samples
Xi ∼ Normal(μX = 0,σX = 1)
and
x = c(1.5, 3.0, 2.2, 2.5, 4.1, 3.7)
stat = (mean(x) – 2) / (sd(x)/sqrt(length(x)))
pval = pt(stat, df = length(x) – 1, lower.tail = FALSE) pval < 0.05
pval2 <- 2*pt(abs(stat), df = length(x) - 1, lower.tail = FALSE) pval2 < 0.05
Yj ∼ Normal(μY = 1.5, σY = 2)
1. Perform a one-sided t-test (using α = 0.05) to test if μX < μY without using the function t.test().
with sample sizes nX = nY = 4.
2. Write the R code to perform a permutation test for the t-test statistic and compare the resulting
p-value with the p-value of the t-test above.
3. By creating a suitable plot, compare the sampling distribution of the test statistic (plotted as a histogram, say) with the density of a t-distribution (plotted by adding a curve to the histogram).
4. Create plots for different sample sizes, nX and nY , e.g. 5, 10, 25, 50, 100, and with a range of different seeds used for the random number generation.
x = rnorm(4, 0, 1)
y = rnorm(4, 1.5, 2)
stat = (mean(y) - mean(x)) / sqrt(var(x)/length(x) + var(y)/length(y)) df = (var(x)/length(x) + var(y)/length(y))^2 / ((var(x)/length(x))^2 /
(length(x)-1) + (var(y)/length(y))^2 / (length(y)-1)) (pval = pt(stat, df = df, lower.tail = FALSE))
## [1] 0.2731349
1
pval < 0.05 ## [1] FALSE
## [1] 0.273
permstat <- function(x, y){
xy = sample(c(x, y), replace = FALSE)
xp = xy[1:length(x)]
yp = xy[-(1:length(x))]
stat = (mean(yp) - mean(xp)) / sqrt(var(xp)/length(xp) + var(yp)/length(yp)) return(stat)
}
sdist = replicate(1000, permstat(x, y)) mean(sdist > stat)
hist(sdist, freq = FALSE, breaks = 100) xc = seq(-5, 5, length = 100)
tdens = dt(xc, df = df)
lines(xc, tdens, col = “blue3”)
Histogram of sdist
−3 −2 −1 0 1 2 3
sdist
2
Density
0.0 0.2 0.4 0.6 0.8