留学生代考 MAST20005/MAST90058"

title: “Week 8 Lab, MAST20005/MAST90058”
author: “School of Mathematics and Statistics, University of Melbourne”
date: “Semester 2, 2022”
institute: “University of Melbourne”

fontsize: 11pt
classoption: a4paper
html_document:
toc_float:
collapsed: false

**Goals:** (i) Practise various parametric and non-parametric tests for
independent samples; (ii) Goodness-of-fit test; (iii) Association tests between
categorical variables.

**Data for the [Goodness-of-fit test] section:** Butterfat data
(`butterfat.txt`). Measurements of the amount of butterfat (in pounds)
produced by 90 cows during a 305-day milk production period following their
first calf. The data file can be obtained from the shared folder in the
computer labs, or from the LMS.

“`{r echo=FALSE}
set.seed(2339)

# Two-sample tests

Let $X$ and $Y$ be, respectively, the blood volumes in milliliters of a male
who is a paraplegic and participates in vigorous physical activities and of
a male who is able-bodied and participates in everyday, ordinary activities.
Assume $X \sim \mathrm{N}(\mu_X, \sigma_X^2)$ and $Y \sim \mathrm{N}(\mu_Y,
\sigma_Y^2)$. The following are observations on $X$ and $Y$:
x <- c(1612, 1352, 1256, 922, 1560, 1456, 2324) y <- c(1082, 1300, 1092, 1040, 910 , 1248, 1092, 1040, 1092, 1288) Test the null hypothesis $H_0\colon \mu_X = \mu_Y$ against the two-sided alternative $H_1\colon \mu_X \neq \mu_Y$ at the $\alpha = 0.05$ level of significance. t.test(x, y, var.equal = TRUE) Since the p-value is smaller than $\alpha = 0.05$, we reject $H_0$ and conclude that there is sufficient evidence to show that the population means are the different. Let us check the assumptions for the above test. Note that by using the option `var.equal = TRUE` we are assuming $\sigma_X = \sigma_Y$. We can check this assumption by testing $H_0\colon \sigma_X = \sigma_Y$ against the alternative $H_1\colon \sigma_X \neq \sigma_Y$ as follows: var.test(x, y) We reject $H_0$ at the usual significance level of 5\%. We can see that there is substantial evidence that the two variances differ. A modification of the t-test known as Welch's test can help us correct for this problem by estimating both variances, and adjusting the degrees of freedom to use in the test. (This is the default procedure, if we do not specify `var.equal = TRUE` argument.) t.test(x, y) Now the p-value is larger than $\alpha = 0.05$, so we cannot reject $H_0\colon \mu_X = \mu_Y$. It turns out that our earlier test was too optimistic: once we factor in the additional uncertainty from estimating two unknown variances it is harder to reject the null hypothesis. Another assumption used in the t-test is normality. The following creates two normal QQ-plots: ```{r fig.width=6, fig.height=4} par(mfrow = c(1, 2)) qqnorm(x, main = "X", col = 4) qqline(x, lty = 2) qqnorm(y, main = "Y", col = 4) qqline(y, lty = 2) In the $X$ sample there is a potential outlier in the right tail of the distribution, suggesting some degree of incompatibility with the normality assumption. The $Y$ sample shows other deviations from the straight line. The previous analyses suggest that a distribution-free test could suit these data better than the t-test. Consider a two-sample Wilcoxon test for the null hypothesis $H_0\colon m_X = m_Y$ (population medians are equal) against the alternative $H_0\colon m_X \neq m_Y$. # Wilcoxon test with normal approximation wilcox.test(x, y, exact = FALSE) The option `exact` specifies whether we wish to use a normal approximation for the distribution of the test statistic instead of the exact distribution. The Wilcoxon test is said to be a robust test because outliers do not affect its results very much. For example, replace the smallest observation in the $X$ sample by some more extreme number. What happens to the p-value? ```{r results='hide'} y2[5] <- 470 wilcox.test(x, y2, exact = FALSE) # Goodness-of-fit test Consider the butterfat dataset. Let $X$ be the amount of butterfat. Let's test the hypothesis that $X \sim \mathrm{N}(\mu, \sigma^2)$. We first need to fit the model. To this end, compute the sample mean $\bar{x}$ and standard deviation $s_x$. butterfat <- read.table("butterfat.txt")[, 1] x.bar <- mean(butterfat) s <- sd(butterfat) Next, partition the data into a few discrete classes and compute observed frequencies. b <- c(0, seq(374, 624, 50), 1000) # class boundaries O <- table(cut(butterfat, breaks = b)) # observed counts We could compute expected frequencies using: prob <- rep(0, 7) prob[1] <- pnorm(b[2], x.bar, s) - pnorm(b[1], x.bar, s) prob[7] <- pnorm(b[8], x.bar, s) - pnorm(b[7], x.bar, s) E <- prob * length(butterfat) Let's package this up into a loop: prob <- rep(0, 7) for (k in 1:7) prob[k] <- pnorm(b[k + 1], x.bar, s) - pnorm(b[k], x.bar, s) E <- prob * length(butterfat) # expected frequencies and then carry out the test: C <- sum((O - E)^2 / E) # chi-squared statistic d1 <- length(O) # number of classes qchisq(0.95, d1 - 3) # critical value 1 - pchisq(C, d1 - 3) # p-value cbind(O, E) We could also conduct the chi-squared test using: chisq.test(O, p = prob) but we need to remember that the final degrees of freedom should be $k - p - 1$, where $k$ is the number of classes and $p$ the number of estimated parameters. That means we need to recalculate the p-value using the given value of the statistic and the correct degrees of freedom: 1 - pchisq(1.0144, d1 - 3) Have a go at using the `qqnorm()` function to illustrate your result. # Chi-squared test for association The following data describe the state of grief for a sample of mothers who had suffered a neonatal death. The table relates this to the amount of support given to these women: | Grief state | Good support | Adequate support | Poor support | |:-----------:|-------------:|-----------------:|-------------:| | I | 171 | 93 | 82 | | II | 62 | 54 | 11 | | III | 31 | 55 | 43 | | IV | 11 | 27 | 57 | We enter these data in R as follows: X.raw <- c(171, 93, 82, 62, 54, 11, 31, 55, 43, 11, 27, 57) X <- matrix(X.raw, nrow = 4, byrow = TRUE) test <- chisq.test(X) The p-value is very small so we reject the null hypothesis that the rows and columns are independent. Hence, we conclude that we have strong evidence that the amount of support is associated with the grief state of mothers. The following shows the expected frequencies: test$expected In R, show how to compute the first cell in the table of expected frequencies and give the first term in the chi-squared statistic: E.1 <- sum(X[, 1]) * sum(X[1, ]) / sum(X) O.1 <- 171 (O.1 - E.1)^2 / E.1 # Exercises Do each of the tutorial problems but now using R. 1. Do question 1 from the tutorial problems. # Insert answer here... 2. Do question 2 from the tutorial problems. # Insert answer here... # Insert answer here... # Insert answer here... # Insert answer here... 3. Do question 3 from the tutorial problems. # Insert answer here... 4. Do question 4 from the tutorial problems. # Insert answer here... 程序代写 CS代考加微信: powcoder QQ: 1823890830 Email: powcoder@163.com

Related Posts