Lab session 1Rep: Solutions
RMIT University-MATH1302
02/03/2022
library(mosaic) # favstats()
library(ggplot2)
library(tidyverse) # filter(), pull(), summarise()
library(car) # leveneTest(), ncvTest()
library(randtests) # runs.test()
library(stats) # t.test(), power.t.test(), TukeyHSD()
library(effectsize) # sd_pooled()
Question 1:
Prior to 1990 it was thought that the average oral human body temperature of a healthy adult was 37°Celsius (C). Investigators at that time were interested to know if this mean was correct. They gathered a sample of 130 adults and measured their oral body temperature. The dataset, Body_temp.csv, can be downloaded from the data repository. The descriptive statistics and a box plot of the data produced using R are reported below.
Body<-read.csv("Body_temp.csv")
head(Body)
## Body_temp Gender Heart_rate
## 1 35.7 1 70
## 2 35.9 1 71
## 3 36.1 1 74
## 4 36.1 1 80
## 5 36.2 1 73
## 6 36.2 1 75
Display the structure of the data.
str(Body)
## 'data.frame': 130 obs. of 3 variables:
## $ Body_temp : num 35.7 35.9 36.1 36.1 36.2 36.2 36.2 36.2 36.3 36.3 ...
## $ Gender : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Heart_rate: int 70 71 74 80 73 75 82 64 69 70 ...
Calculate the descriptive statistics for each gender using the mosaic package:
favstats(~ Body_temp | Gender, data = Body)
## Gender min Q1 median Q3 max mean sd n missing
## 1 1 35.7 36.4 36.7 37.0 37.5 36.72615 0.3882158 65 0
## 2 2 35.8 36.7 36.9 37.1 38.2 36.88923 0.4127359 65 0
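The overall sample mean is 36.81°C. Because each group has 65 adults, this is the average of the two group means, 36.73°C and 36.89°C, and it is also reported in the t-test output. The mean is lower than 37°C, but samples are subject to sampling error, so a lower sample mean does not by itself show that the population mean is lower. The researchers need a way to determine whether the sample provides sufficient evidence that the mean body temperature is not equal to 37°C.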
Prepare a box plot that displays Body_temp.
boxplot(Body$Body_temp, ylab = "Temperature (Celsius)")
The one-sample t-test in R
We state the hypotheses to test whether the average oral body temperature of a healthy adult is less than 37°C:
$H_0: \mu = 37$
$H_a: \mu < 37$
where $\mu$ denotes the population mean body temperature (°C). The alternative hypothesis makes this a lower (left) one-tailed test.
R performs the one-sample t-test with the t.test() function. The one-sided formula ~ Body_temp relies on the formula interface provided by the mosaic package. In base R the equivalent call is t.test(Body$Body_temp, mu = 37, alternative = "less"). Here is the lower-tailed hypothesis test:
t.test(~ Body_temp, data = Body, mu = 37, alternative = "less")
## One Sample t-test
## data: Body_temp
## t = -5.3818, df = 129, p-value = 1.68e-07
## alternative hypothesis: true mean is less than 37
## 95 percent confidence interval:
## -Inf 36.86689
## sample estimates:
## mean of x
## 36.80769
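The t.test() output above follows from the formula $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$. The sketch below recomputes it in base R without the raw file. It assumes only the rounded group summaries from the favstats() output and recombines them into the overall mean and SD of all 130 temperatures. Results should therefore agree with t.test() up to rounding. All variable names are illustrative.

```r
# Group summaries copied (rounded) from the favstats() output: Gender 1, Gender 2
n <- c(65, 65)
m <- c(36.72615, 36.88923)
s <- c(0.3882158, 0.4127359)

# Recombine into the overall mean and SD of all 130 temperatures:
# total SS = within-group SS + between-group SS
N      <- sum(n)
xbar   <- sum(n * m) / N
ss_tot <- sum((n - 1) * s^2) + sum(n * (m - xbar)^2)
sd_all <- sqrt(ss_tot / (N - 1))

# One-sample t statistic for H0: mu = 37
mu0   <- 37
se    <- sd_all / sqrt(N)
t_obs <- (xbar - mu0) / se
dof   <- N - 1

p_less <- pt(t_obs, dof)              # Ha: mu < 37
p_two  <- 2 * pt(-abs(t_obs), dof)    # Ha: mu != 37
upper  <- xbar + qt(0.95, dof) * se   # one-sided 95% upper confidence bound

print(round(c(mean = xbar, t = t_obs, upper = upper), 5))
print(signif(c(p_less = p_less, p_two = p_two), 4))
```

The printed t of about -5.38 and upper bound of about 36.867 should match the lower-tailed output. The two-sided p-value is exactly twice the one-sided one because the statistic is the same.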
For a two-tailed test, the hypotheses are written differently:
$H_0: \mu = 37$
$H_a: \mu \neq 37$
t.test(~ Body_temp, data = Body, mu = 37, alternative = "two.sided")
## One Sample t-test
## data: Body_temp
## t = -5.3818, df = 129, p-value = 3.361e-07
## alternative hypothesis: true mean is not equal to 37
## 95 percent confidence interval:
## 36.73699 36.87839
## sample estimates:
## mean of x
## 36.80769
Use R's help (?t.test) to learn more about the t.test() function.
Two-sample t-tests: Body Temperature vs Gender
The two-sample t-test has the following statistical hypotheses:
$H_0: \mu_1 - \mu_2 = 0$, or equivalently $H_0: \mu_1 = \mu_2$
$H_a: \mu_1 - \mu_2 \neq 0$, or equivalently $H_a: \mu_1 \neq \mu_2$
where $\mu_1$ and $\mu_2$ are the population means of groups 1 and 2, respectively.
Body$Gender <- factor(Body$Gender, levels = c(1, 2),
                      labels = c("Male", "Female")) # Assign correct labels
favstats(~ Body_temp | Gender, data = Body)
## Gender min Q1 median Q3 max mean sd n missing
## 1 Male 35.7 36.4 36.7 37.0 37.5 36.72615 0.3882158 65 0
## 2 Female 35.8 36.7 36.9 37.1 38.2 36.88923 0.4127359 65 0
ggplot(Body, aes(x = Gender, y = Body_temp, fill = Gender)) +
  geom_boxplot() +
  theme_classic() +
  labs(title = "Comparative Box Plot")
Test the Homogeneity of Variance
Homogeneity of variance, or the assumption of equal variances, is tested using Levene's test. Some other programs, such as SPSS, report this test by default. In R you have to run it explicitly. Levene's test has the following statistical hypotheses:
$H_0: \sigma_1^2 = \sigma_2^2$
$H_a: \sigma_1^2 \neq \sigma_2^2$
where $\sigma_1^2$ and $\sigma_2^2$ are the population variances of groups 1 and 2, respectively. Levene's test reports a p-value that is compared with the standard 0.05 significance level. We can use the leveneTest() function from the car package to compare the variances of male and female body temperatures:
leveneTest(Body_temp ~ Gender, data = Body)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 0.0428 0.8365
## 128
The p-value of Levene's test for equal variance of body temperature between males and females is $p = 0.84$. Since $p > 0.05$, we fail to reject $H_0$. In plain language, the data give no evidence that the variances differ, so it is reasonable to assume equal variances.
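The Levene's output reports center = median. With that setting, the statistic is a one-way ANOVA F test on the absolute deviations of each observation from its own group median (the Brown-Forsythe form of Levene's test). The sketch below re-expresses that idea in base R as a hypothetical helper, levene_median(), which is not part of any package. It is checked on toy data that are not from this study: two groups with different centres but identical spread, where the F statistic should be 0.

```r
# Hypothetical helper: Levene's test with center = median
# (Brown-Forsythe form), i.e. a one-way ANOVA on |y - group median|
levene_median <- function(y, g) {
  g   <- factor(g)
  dev <- abs(y - ave(y, g, FUN = median))  # absolute deviation from own group's median
  tab <- anova(lm(dev ~ g))
  c(F = tab[["F value"]][1], p = tab[["Pr(>F)"]][1])
}

# Toy data (NOT the body-temperature data): different centres, same spread
y <- c(1, 2, 3, 11, 12, 13)
g <- rep(c("a", "b"), each = 3)
res_toy <- levene_median(y, g)
print(res_toy)
```

Applied to the lab data, levene_median(Body$Body_temp, Body$Gender) should give the same F value (about 0.043) and p-value (about 0.84) as leveneTest(), assuming car's default median-centred construction.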
Two-sample t-test – Assuming Equal Variance
Let's go straight into R and perform a two-sample t-test assuming equal variances, with a two-sided alternative hypothesis, using the t.test() function.
t.test(Body_temp ~ Gender, data = Body, var.equal = TRUE,
       alternative = "two.sided", conf.level = 0.95) # conf.level = 0.95 by default
## Two Sample t-test
## data: Body_temp by Gender
## t = -2.3204, df = 128, p-value = 0.0219
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.3021399 -0.0240139
## sample estimates:
## mean in group Male mean in group Female
## 36.72615 36.88923
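The equal-variance test pools the two sample variances into one SD, $s_p$, and uses $t = (\bar{x}_1 - \bar{x}_2)/(s_p\sqrt{1/n_1 + 1/n_2})$ with $n_1 + n_2 - 2$ degrees of freedom. The sketch below reproduces the output from the rounded favstats() summaries alone. Variable names are illustrative, and small differences from t.test() are due to rounding of the inputs.

```r
# Group summaries from favstats(): group 1 = Male, group 2 = Female
n1 <- 65;        n2 <- 65
m1 <- 36.72615;  m2 <- 36.88923
s1 <- 0.3882158; s2 <- 0.4127359

# Pooled SD and standard error of the difference (equal variances assumed)
dof <- n1 + n2 - 2
sp  <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / dof)
se  <- sp * sqrt(1 / n1 + 1 / n2)

# Test statistic, two-sided p-value and 95% CI for mu1 - mu2
d     <- m1 - m2
t_obs <- d / se
p_two <- 2 * pt(-abs(t_obs), dof)
ci    <- d + c(-1, 1) * qt(0.975, dof) * se

print(round(c(sp = sp, t = t_obs, p = p_two, lwr = ci[1], upr = ci[2]), 5))
```

The pooled SD computed here (about 0.4007) is the same quantity that sd_pooled() returns later in the lab.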
Two-sample t-test – Assuming Unequal Variance
If you do not specify var.equal in the t.test() function, R uses var.equal = FALSE by default and reports the two-sample t-test that does not assume equal variances. This test is also known as the Welch two-sample t-test.
t.test(Body_temp ~ Gender, data = Body, var.equal = FALSE,
       alternative = "two.sided")
## Welch Two Sample t-test
## data: Body_temp by Gender
## t = -2.3204, df = 127.52, p-value = 0.02191
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.30214491 -0.02400893
## sample estimates:
## mean in group Male mean in group Female
## 36.72615 36.88923
Because the two groups have equal sizes (65 each), the Welch test statistic ($t = -2.32$) is identical to the equal-variance one. Because the sample variances are also very similar, the Welch degrees of freedom ($df = 127.52$) are almost the same as those of the equal-variance test ($df = 128$), giving practically the same p-value. This is why some recommend always using the Welch test. If the variances are unequal, it makes the required adjustment. If they are not, it gives results very close to the regular two-sample t-test, so the Levene's test step can be skipped. This might make things simpler, but understanding the difference between the two versions of the two-sample t-test will help you make this decision.
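The Welch degrees of freedom come from the Welch-Satterthwaite approximation, $df = (v_1 + v_2)^2 / \left(v_1^2/(n_1-1) + v_2^2/(n_2-1)\right)$ with $v_i = s_i^2/n_i$. The sketch below computes this from the rounded favstats() summaries and compares the Welch and pooled standard errors. It is a minimal illustration, not R's internal code. With $n_1 = n_2$ the two standard errors coincide algebraically, which is why t is identical in both tests.

```r
# Group summaries from favstats(): group 1 = Male, group 2 = Female
n1 <- 65;        n2 <- 65
m1 <- 36.72615;  m2 <- 36.88923
s1 <- 0.3882158; s2 <- 0.4127359

# Welch standard error: the two variances are not pooled
v1   <- s1^2 / n1
v2   <- s2^2 / n2
se_w <- sqrt(v1 + v2)

# Welch-Satterthwaite approximation to the degrees of freedom
df_w <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
t_w  <- (m1 - m2) / se_w

# Pooled standard error, for comparison
sp   <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
se_p <- sp * sqrt(1 / n1 + 1 / n2)

print(round(c(t_welch = t_w, df_welch = df_w, se_welch = se_w, se_pooled = se_p), 5))
```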
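Two-sample t-test: Male is greater than Female
Here $H_a: \mu_1 - \mu_2 > 0$. Because Male is the first level of Gender, t.test() computes the difference as Male minus Female.
t.test(Body_temp ~ Gender, data = Body, var.equal = TRUE, alternative = "greater")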
## Two Sample t-test
## data: Body_temp by Gender
## t = -2.3204, df = 128, p-value = 0.989
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## -0.2795217 Inf
## sample estimates:
## mean in group Male mean in group Female
## 36.72615 36.88923
Two-sample t-test: Male is less than Female
Here $H_a: \mu_1 - \mu_2 < 0$.
t.test(Body_temp ~ Gender, data = Body, var.equal = TRUE, alternative = "less")
## Two Sample t-test
## data: Body_temp by Gender
## t = -2.3204, df = 128, p-value = 0.01095
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -0.04663214
## sample estimates:
## mean in group Male mean in group Female
## 36.72615 36.88923
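Both one-sided tests use the same statistic, t = -2.3204 with 128 df; only the tail of the t distribution changes. Because the observed difference (Male minus Female) is negative, the "less" alternative gets the small p-value and the "greater" alternative gets a p-value near 1. The sketch below shows this relationship using only the reported t and df.

```r
# t statistic and degrees of freedom reported by the equal-variance test
t_obs <- -2.3204
dof   <- 128

# The three alternatives use different tails of the same t distribution
p_less    <- pt(t_obs, dof)                      # Ha: mu1 - mu2 < 0
p_greater <- pt(t_obs, dof, lower.tail = FALSE)  # Ha: mu1 - mu2 > 0
p_two     <- 2 * pt(-abs(t_obs), dof)            # Ha: mu1 - mu2 != 0

print(round(c(less = p_less, greater = p_greater, two.sided = p_two), 5))
```

The two one-sided p-values sum to 1, and the two-sided p-value is twice the smaller one.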
Using the base R t.test() function
Option 2: The data are saved in two separate numeric vectors:
# Save the data in two separate vectors (filter() and pull() come from dplyr, loaded with tidyverse)
Male <- Body %>%
  filter(Gender == "Male") %>%
  pull(Body_temp)
Female <- Body %>%
  filter(Gender == "Female") %>%
  pull(Body_temp)
# Compute t-test
res <- t.test(Male, Female, var.equal = TRUE, alternative = "less")
res
## Two Sample t-test
## data: Male and Female
## t = -2.3204, df = 128, p-value = 0.01095
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -0.04663214
## sample estimates:
## mean of x mean of y
## 36.72615 36.88923
(c) Find a 95 percent confidence interval on the difference in means. Provide a practical interpretation of this interval.
The two-sample t-test has the following statistical hypotheses:
$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \neq 0$
DescTools::MeanDiffCI(Male, Female, conf.level = 0.95, sides = "two.sided") # sides: "two.sided", "left" or "right", depending on your hypothesis
## meandiff lwr.ci upr.ci
## -0.16307692 -0.30214491 -0.02400893
From the computer output, the 95% confidence interval for $\mu_1 - \mu_2$ is $(-0.30214491, -0.02400893)$. This interval does not include 0. Therefore, at the 5% significance level there is evidence of a difference in mean body temperature between males and females. In practical terms, the mean body temperature of males is estimated to be between about 0.02°C and 0.30°C lower than that of females.
Draw dot diagrams to assist in interpreting the results from this experiment.
ggplot(Body, aes(x = Gender, y = Body_temp, fill = Gender)) +
  geom_dotplot(binaxis = 'y', stackdir = 'center') +
  #stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1),
  #             geom = "pointrange", color = "red") +
  scale_fill_manual(values = c("#999999", "#E69F00", "#56B4E9")) +
  labs(title = "Effect of gender on the body temperatures",
       subtitle = "Dotplot of body temperatures by gender",
       caption = "Data source: Link",
       x = "Gender", y = "Body Temperatures (in Celsius)", fill = "Gender")
## Bin width defaults to 1/30 of the range of the data. Pick better value with `binwidth`.
Check the assumption of normality of the body temperature for each gender.
# Define the plotting region
par(mfrow = c(1, 2))
# Create a Q-Q plot for each group
qqnorm(Male, main = 'Male')
qqline(Male)
qqnorm(Female, main = 'Female')
qqline(Female)
# Shapiro-Wilk normality test for men's body temperatures
with(Body, shapiro.test(Body_temp[Gender == "Male"])) # p = 0.4818
## Shapiro-Wilk normality test
## data: Body_temp[Gender == "Male"]
## W = 0.98238, p-value = 0.4818
# Shapiro-Wilk normality test for women's body temperatures
with(Body, shapiro.test(Body_temp[Gender == "Female"])) # p = 0.03351
## Shapiro-Wilk normality test
## data: Body_temp[Gender == "Female"]
## W = 0.95981, p-value = 0.03351
The p-value for males (0.4818) is greater than the significance level $\alpha = 0.05$. This implies that the distribution of male body temperatures is not significantly different from the normal distribution. On the other hand, the p-value for females (0.03351) is less than $\alpha = 0.05$. This implies that the distribution of female body temperatures is significantly different from the normal distribution. With 65 observations per group, the t-test is generally fairly robust to a moderate departure like this, but the Q-Q plots should be checked for skewness or outliers.
Find the power of this test for detecting an actual difference in means of -0.25°C.
s <- sd_pooled(Male, Female)
s # pooled SD = sqrt((0.3882^2 + 0.4127^2) / 2), about 0.40
## [1] 0.4006635
power.t.test(n = 65, delta = -0.25, sd = 0.40, sig.level = 0.05, power = NULL,
             type = "two.sample", alternative = "two.sided", strict = TRUE) # two-sided: R uses |delta|, so the output shows 0.25
## Two-sample t test power calculation
## n = 65
## delta = 0.25
## sd = 0.4
## sig.level = 0.05
## power = 0.9425131
## alternative = two.sided
## NOTE: n is number in *each* group
What sample size would be necessary to detect an actual difference in means of 0.5°C with a power of at least 0.85?
power.t.test(n = NULL, delta = 0.5, sd = 0.40, sig.level = 0.05, power = 0.85,
             type = "two.sample", alternative = "two.sided", strict = TRUE)
## Two-sample t test power calculation
## n = 12.53213
## delta = 0.5
## sd = 0.4
## sig.level = 0.05
## power = 0.85
## alternative = two.sided
## NOTE: n is number in *each* group
Since the sample size must be a whole number, round up: at least 13 adults per group are needed. This result makes intuitive sense.
Fewer samples are needed to detect a bigger difference.
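power.t.test() computes power from the noncentral t distribution. The sketch below spells that calculation out in a hypothetical helper, power_two_sample(), which mirrors the two-sided, strict = TRUE case. It assumes the pooled SD rounded to 0.40, as in the lab. It also rounds the fractional sample size up, because 12.53 adults per group is not achievable and power increases with n.

```r
# Pooled SD from the favstats() group SDs (equal n, so average the variances)
sp <- sqrt((0.3882158^2 + 0.4127359^2) / 2)

# Hypothetical helper: power of a two-sided, two-sample t-test
# via the noncentral t distribution (mirrors power.t.test with strict = TRUE)
power_two_sample <- function(n, delta, sd, alpha = 0.05) {
  dof  <- 2 * (n - 1)
  ncp  <- abs(delta) / (sd * sqrt(2 / n))   # noncentrality parameter
  crit <- qt(1 - alpha / 2, dof)            # two-sided critical value
  pt(crit, dof, ncp = ncp, lower.tail = FALSE) + pt(-crit, dof, ncp = ncp)
}

pow_65 <- power_two_sample(n = 65, delta = -0.25, sd = 0.40)

# Sample size: round the fractional n from power.t.test() up to a whole number
n_raw    <- power.t.test(delta = 0.5, sd = 0.40, sig.level = 0.05, power = 0.85,
                         strict = TRUE)$n
n_needed <- ceiling(n_raw)

print(round(c(sp = sp, power_n65 = pow_65, n_raw = n_raw, n_needed = n_needed), 4))
```

Evaluating power_two_sample() at n = 12 and n = 13 (delta = 0.5, sd = 0.40) is a quick check that 13 per group is the first whole number reaching the 0.85 target.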