---
title: "Lab session 1Rep: Solutions"
author: "RMIT University-MATH1302"
date: "02/03/2022"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

**Library**
```{r message=FALSE, warning=FALSE}
library(mosaic) # favstats()
library(ggplot2)
library(tidyverse) # summarise()
library(car) # leveneTest(), ncvTest()
library(randtests) # runs.test()
library(stats) # TukeyHSD()
library(effectsize) # sd_pooled()
```

# Question 1:
##### Prior to 1990 it was thought that the average oral human body temperature of a healthy adult was 37°Celsius (C). Investigators at that time were interested to know if this mean was correct. They gathered a sample of 130 adults and measured their oral body temperature. The dataset, Body_temp.csv, can be downloaded from the data repository. The descriptive statistics and a box plot of the data produced using R are reported below.
```{r}
Body <- read.csv("Body_temp.csv")
head(Body)
```

**Display the structure of the data.**

###### Calculate all the descriptive statistics.

**Using the mosaic package**

```{r}
favstats(~ Body_temp | Gender, data = Body)
```

The sample mean is found to be $36.81°C$. This is lower than $37°C$, but samples bring sampling error with them. The researcher needs a way to determine whether the sample provides sufficient evidence that the mean body temperature is not equal to $37°C$.

###### Prepare a boxplot that displays Body_temp.

```{r}
boxplot(Body$Body_temp, ylab = "Temperature (Celsius)")
```

**The one-sample t-test in R**

We state the hypotheses to test whether the average oral body temperature of a healthy adult is less than 37:

$H_0: \mu = 37$

$H_a: \mu < 37$

where $\mu$ denotes the population mean human body temperature. The alternative hypothesis is stated as a **lower one-tailed test**.

*We can use R to perform the one-sample t-test in a split second. We use the* `t.test()` *function for this purpose. Here's an example of the lower-tailed hypothesis test:*

```{r echo=TRUE}
t.test(~ Body_temp, data = Body, mu = 37, alternative = "less")
```

Here's an example of the two-tailed hypothesis test, where the hypotheses are written differently:

$H_0: \mu = 37$

$H_a: \mu \neq 37$

```{r}
t.test(~ Body_temp, data = Body, mu = 37, alternative = "two.sided")
```

*You can use Help to learn more about the t.test() function.*

**Two-sample t-tests - Body Temperatures VS Gender**

The two-sample t-test has the following statistical hypotheses:

$H_0: \mu_1 - \mu_2 = 0$ or $H_0: \mu_1 = \mu_2$

$H_a: \mu_1 - \mu_2 \neq 0$ or $H_a: \mu_1 \neq \mu_2$

where $\mu_1$ and $\mu_2$ refer to the population means of group 1 and group 2 respectively.
```{r}
Body$Gender <- factor(Body$Gender, levels = c(1, 2), labels = c("Male", "Female")) # Assign correct labels
favstats(~ Body_temp | Gender, data = Body)
ggplot(Body, aes(x = Gender, y = Body_temp, fill = Gender)) +
  geom_boxplot() +
  theme_classic() +
  labs(title = "Comparative Box Plot")
```

###### Test the Homogeneity of Variance

Homogeneity of variance, or the assumption of equal variance, is tested using Levene's test. Unlike some other programs, such as SPSS, R does not report this test by default. Levene's test has the following statistical hypotheses:

$H_0: \sigma^2_1 = \sigma^2_2$

$H_a: \sigma^2_1 \neq \sigma^2_2$

where $\sigma^2_1$ and $\sigma^2_2$ refer to the population variances of group 1 and group 2 respectively. Levene's test reports a p-value that is compared to the standard $0.05$ significance level. We can use the `leveneTest()` function in R to compare the variances of male and female body temperatures:

```{r}
leveneTest(Body_temp ~ Gender, data = Body)
```

The p-value for Levene's test of equal variance for body temperature between males and females was $p = 0.84$. We find $p > 0.05$; therefore, we fail to reject $H_0$. In plain language, we are safe to assume equal variance.
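To make the mechanics of the test concrete, here is a minimal base-R sketch of a Levene-type test computed by hand. It assumes the median-centred variant (absolute deviations from each group's median, followed by a one-way ANOVA), which is the documented default of `leveneTest()`. The data frame `sim` and its values are simulated for illustration and are not the lab data.

```r
# Hypothetical data standing in for Body_temp.csv (simulated, not the lab data)
set.seed(1)
sim <- data.frame(
  Body_temp = c(rnorm(65, mean = 36.7, sd = 0.4), rnorm(65, mean = 36.9, sd = 0.4)),
  Gender = factor(rep(c("Male", "Female"), each = 65), levels = c("Male", "Female"))
)

# Absolute deviation of each observation from its own group median
group_median <- ave(sim$Body_temp, sim$Gender, FUN = median)
sim$abs_dev <- abs(sim$Body_temp - group_median)

# One-way ANOVA on the absolute deviations; its F test is the Levene-type test
levene_fit <- anova(lm(abs_dev ~ Gender, data = sim))
levene_F <- levene_fit[["F value"]][1]
levene_p <- levene_fit[["Pr(>F)"]][1]
levene_p
```

A large p-value here is read the same way as in the text: no evidence against equal variances.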
**Two-sample t-test – Assuming Equal Variance**
Let's jump straight into R and perform a two-sample t-test assuming equal variance and a two-sided hypothesis test. We use the `t.test()` function.

```{r}
t.test(Body_temp ~ Gender, data = Body, var.equal = TRUE,
       alternative = "two.sided", conf.level = 0.95) # conf.level = 0.95 by default
```
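For reference, the equal-variance test is based on the standard pooled two-sample t statistic. The symbols $\bar{y}_i$, $s_i^2$ and $n_i$ (sample mean, sample variance and size of group $i$) are introduced here for illustration and do not appear elsewhere in this lab:

$$
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \qquad
t = \frac{\bar{y}_1 - \bar{y}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad
df = n_1 + n_2 - 2
$$

With the 130 adults in this sample, $df = 130 - 2 = 128$.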
**Two-sample t-test – Assuming Unequal Variance**
*If you don’t specify var.equal in the t.test() function for R, the two-sample t-test not assuming equal variance is reported by default. This test is also known as the Welch two-sample t-test.*
```{r}
t.test(Body_temp ~ Gender, data = Body, var.equal = FALSE,
       alternative = "two.sided")
```
Because the variances were very similar between males and females, the adjusted test statistic $t = -2.32$ and $df = 128$ for the Welch test are essentially the same as for the two-sample t-test assuming equal variances. This is why some recommend that you should always use the Welch test. If the variances are unequal, the test will make the required adjustment. If not, the result will be similar to a regular two-sample t-test. This means you can skip testing equal variance using Levene's test. This might make things simpler, but understanding the difference between these two versions of the two-sample t-test will help you make this decision.
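The adjustment the Welch test makes can be sketched by hand in base R. The vectors `male` and `female` below are simulated for illustration (they are not the lab data), and equal group sizes of 65 are an assumption. The sketch computes the Welch statistic and the Welch-Satterthwaite degrees of freedom, then compares them with `t.test()`.

```r
# Hypothetical samples (simulated, not the lab data)
set.seed(2)
male   <- rnorm(65, mean = 36.7, sd = 0.40)
female <- rnorm(65, mean = 36.9, sd = 0.45)

# Estimated variance of each group mean
v1 <- var(male) / length(male)
v2 <- var(female) / length(female)

# Welch statistic and Welch-Satterthwaite degrees of freedom
t_welch  <- (mean(male) - mean(female)) / sqrt(v1 + v2)
df_welch <- (v1 + v2)^2 / (v1^2 / (length(male) - 1) + v2^2 / (length(female) - 1))

# Compare with t.test(); the pooled version uses df = n1 + n2 - 2 instead
welch  <- t.test(male, female, var.equal = FALSE)
pooled <- t.test(male, female, var.equal = TRUE)
c(manual_t = t_welch, manual_df = df_welch,
  welch_df = unname(welch$parameter), pooled_df = unname(pooled$parameter))
```

When the two groups have the same size, as assumed in this sketch, the pooled and Welch t statistics coincide and only the degrees of freedom differ, which is why the two tests agree so closely when the variances are similar.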
**Two-sample t-test: Male is greater than Female**
```{r eval=FALSE, include=FALSE}
# Recode the labels as 1 and 2 so they can be used in the next t.test() call.
Body$Gender <- factor(Body$Gender, levels = c("Male", "Female"), labels = c(1, 2))
```

```{r}
t.test(Body_temp ~ Gender, data = Body, var.equal = TRUE, alternative = "greater")
```

**Two-sample t-test: Male is less than Female**

```{r}
t.test(Body_temp ~ Gender, data = Body, var.equal = TRUE, alternative = "less")
```
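The direction of a one-sided test with the formula interface depends on the order of the factor levels: the first level is treated as group 1. The sketch below illustrates this on simulated data (`sim` and `Gender_rev` are hypothetical names, not part of the lab data).

```r
# Hypothetical data (simulated); "Male" is deliberately the first factor level
set.seed(3)
sim <- data.frame(
  Body_temp = c(rnorm(65, 36.7, 0.4), rnorm(65, 36.9, 0.4)),
  Gender = factor(rep(c("Male", "Female"), each = 65), levels = c("Male", "Female"))
)

levels(sim$Gender) # the first level is group 1 in the test

# alternative = "less" tests H_a: mean(first level) - mean(second level) < 0
less_fit <- t.test(Body_temp ~ Gender, data = sim, var.equal = TRUE, alternative = "less")
names(less_fit$estimate)

# Reversing the level order flips the sign of the statistic, and so the direction tested
sim$Gender_rev <- factor(sim$Gender, levels = c("Female", "Male"))
rev_fit <- t.test(Body_temp ~ Gender_rev, data = sim, var.equal = TRUE, alternative = "less")
c(less = less_fit$p.value, reversed = rev_fit$p.value)
```

Checking `levels()` before running a one-sided test is a cheap way to confirm which hypothesis is actually being tested.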
##### Using the R base function
###### Option 2: The data are saved in two different numeric vectors:
```{r}
# Save the data in two different vectors
Male <- Body %>%
  filter(Gender == "Male") %>%
  pull(Body_temp)
Female <- Body %>%
  filter(Gender == "Female") %>%
  pull(Body_temp)
# Compute t-test
res <- t.test(Male, Female, var.equal = TRUE, alternative = "less")
```

##### (c) Find a 95 percent confidence interval on the difference in means. Provide a practical interpretation of this interval.

The two-sample t-test has the following statistical hypotheses:

$H_0: \mu_1 - \mu_2 = 0$

$H_a: \mu_1 - \mu_2 \neq 0$

```{r}
DescTools::MeanDiffCI(Male, Female, conf.level = 0.95, sides = "two.sided")
# sides can be "two.sided", "left" or "right", depending on your hypothesis
```

From the computer output, the $95\%$ confidence interval is $(-0.30214491, -0.02400893)$. This interval does not include $0$; therefore, there is evidence of a difference in mean body temperature between the two genders. In practical terms, we are $95\%$ confident that the mean male body temperature is between about $0.02$ and $0.30$ °C lower than the mean female body temperature.

##### Draw dot diagrams to assist in interpreting the results from this experiment.

```{r}
ggplot(Body, aes(x = Gender, y = Body_temp, fill = Gender)) +
  geom_dotplot(binaxis = "y", stackdir = "center") +
  # stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1),
  #              geom = "pointrange", color = "red") +
  scale_fill_manual(values = c("#999999", "#E69F00", "#56B4E9")) +
  labs(title = "Effect of gender on the body temperatures",
       subtitle = "Dotplot of body temperatures by gender",
       caption = "Data source: Link",
       x = "Gender", y = "Body Temperatures (in Celsius)",
       fill = "Body")
```

##### Check the assumption of normality of the body temperature for each gender.

```{r}
# Define the plotting region
par(mfrow = c(1, 2))
# Create Q-Q plots for both groups
qqnorm(Male, main = "Male")
qqline(Male)
qqnorm(Female, main = "Female")
qqline(Female)
```

```{r}
# Shapiro-Wilk normality test for male body temperatures
with(Body, shapiro.test(Body_temp[Gender == "Male"])) # p = 0.4818
# Shapiro-Wilk normality test for female body temperatures
with(Body, shapiro.test(Body_temp[Gender == "Female"])) # p = 0.03351
```

From the output, the p-value for males ($p = 0.4818$) is greater than the significance level $\alpha=0.05$, implying that the distribution of the data is not significantly different from a normal distribution.
On the other hand, the p-value for females ($p = 0.03351$) is less than the significance level $\alpha=0.05$, implying that the distribution of the data is significantly different from a normal distribution.

##### Find the power of this test for detecting an actual difference in means of -0.25 °C.

```{r}
s <- sd_pooled(Male, Female)
s # pooled standard deviation (needs to be checked)
power.t.test(n = 65, delta = -0.25, sd = 0.40, sig.level = 0.05, power = NULL,
             type = "two.sample", alternative = "two.sided", strict = TRUE)
```

##### What sample size would be necessary to detect an actual difference in means of 0.5 °C with a power of at least 0.85?

```{r}
power.t.test(n = NULL, delta = 0.5, sd = 0.40, sig.level = 0.05, power = 0.85,
             type = "two.sample", alternative = "two.sided", strict = TRUE)
```

This result makes intuitive sense: fewer samples are needed to detect a bigger difference.
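Because `power.t.test()` returns a fractional `n`, the sample-size calculation above still needs to be rounded up to a whole number of subjects per group. The sketch below does that and re-checks the achieved power, using the same inputs as the text (`sd = 0.40`, `delta = 0.5`, target power `0.85`). The names `size`, `n_per_group` and `achieved` are illustrative.

```r
# Inputs taken from the text: sd = 0.40, difference = 0.5, target power = 0.85
size <- power.t.test(n = NULL, delta = 0.5, sd = 0.40, sig.level = 0.05,
                     power = 0.85, type = "two.sample",
                     alternative = "two.sided", strict = TRUE)

# n is fractional; round up to a whole number of subjects per group
n_per_group <- ceiling(size$n)

# Re-check the power actually achieved with the rounded sample size
achieved <- power.t.test(n = n_per_group, delta = 0.5, sd = 0.40, sig.level = 0.05,
                         type = "two.sample", alternative = "two.sided",
                         strict = TRUE)$power
c(n_per_group = n_per_group, achieved_power = achieved)
```

Rounding up rather than to the nearest integer keeps the achieved power at or above the target.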