Lab session 2: Solutions
Lab session 2: Solutions
Copyright By PowCoder代写 加微信 powcoder
RMIT University-MATH1302
23/03/2022
# List of packages required for this analysis
pkg <- c("mosaic", "ggplot2", "tidyverse", "car", "randtests", "stats", "dplyr", "agricolae")
# Check if packages are not installed and assign the
# names of the packages not installed to the variable new.pkg
new.pkg <- pkg[!(pkg %in% installed.packages())]
# If there are any packages in the list that aren't installed,
# install them
if (length(new.pkg)) {
install.packages(new.pkg, repos = "http://cran.rstudio.com")
# Load the packages into R
library(mosaic) # favstats()
library(ggplot2)
library(tidyverse) # summarise()
library(car) #ncvTest()
library(randtests) # runs.test()
library(stats) # TukeyHSD()
library(dplyr) # mutate()
library(agricolae) # LSD.test()
Question 1:
The effect of three different lubricating oils on fuel economy in diesel truck engines is being studied. Fuel economy is measured using brake-specific fuel consumption after the engine has been running for 15 minutes. Five different truck engines are available for the study, and the experimenters conduct the following randomized complete block design.
a) Create a data frame called “Fuel” in R to store the above data. Use str() command to display the structure of the data.
Insert the data frame.
Fuel<- data.frame(Oil = c(1,1,1,1,1,
2,2,2,2,2,
3,3,3,3,3),
Truck = c(1,2,3,4,5,
1,2,3,4,5,
1,2,3,4,5),
observations = c(0.500, 0.634, 0.487, 0.329, 0.512,
0.535, 0.675, 0.520, 0.435, 0.540,
0.513, 0.595, 0.488, 0.400, 0.510))
head(Fuel)
## Oil Truck observations
## 1 1 1 0.500
## 2 1 2 0.634
## 3 1 3 0.487
## 4 1 4 0.329
## 5 1 5 0.512
## 6 2 1 0.535
Export the data to the desktop
write.csv(Fuel,"/Users/abdulrahmanalamri/Desktop/Data/Fuel.csv", row.names = TRUE) # to Export DataFrame as a CSV file from RStudio to desktop
Display the structure of the data.
## 'data.frame': 15 obs. of 3 variables:
## $ Oil : num 1 1 1 1 1 2 2 2 2 2 ...
## $ Truck : num 1 2 3 4 5 1 2 3 4 5 ...
## $ observations: num 0.5 0.634 0.487 0.329 0.512 0.535 0.675 0.52 0.435 0.54 ...
b) Analyse the data from this experiment. State your hypotheses (use α = 5%) and draw conclusions.
First: we should treat the Oil and Teuck variables as factors.
Fuel <- Fuel %>% mutate(Oil = factor(Oil),
Truck = factor(Truck))
## ‘data.frame’: 15 obs. of 3 variables:
## $ Oil : Factor w/ 3 levels “1”,”2″,”3″: 1 1 1 1 1 2 2 2 2 2 …
## $ Truck : Factor w/ 5 levels “1”,”2″,”3″,”4″,..: 1 2 3 4 5 1 2 3 4 5 …
## $ observations: num 0.5 0.634 0.487 0.329 0.512 0.535 0.675 0.52 0.435 0.54 …
Second: state your hypothesis testing for the lubricating oils:
\(H_0: \tau_1= \tau_2= \tau_3= 0\).
\(H_a: at \hspace{0.1cm} least \hspace{0.1cm} one \hspace{0.1cm} \tau_i \neq 0\)
where \(\tau_i\) are the treatment effects of the three different lubricating oils.
Third: we might looks into the Descriptive Statistics:
Compute the mean and the SD (standard deviation) of the reaction time by different Ingredients:
group_by(Oil) %>%
summarise(n = n(),
mean = mean(observations, na.rm = TRUE),
sd = sd(observations, na.rm = TRUE),
stderr = sd/sqrt(n),
LCL = mean – qt(1 – (0.05 / 2), n – 1) * stderr,
UCL = mean + qt(1 – (0.05 / 2), n – 1) * stderr,
median = median(observations, na.rm = TRUE),
min = min(observations, na.rm = TRUE),
max = max(observations, na.rm = TRUE),
IQR = IQR(observations, na.rm = TRUE))
## # A tibble: 3 x 11
## Oil n mean sd stderr LCL UCL median min max IQR
##
## 1 1 5 0.492 0.109 0.0486 0.357 0.627 0.5 0.329 0.634 0.0250
## 2 2 5 0.541 0.0861 0.0385 0.434 0.648 0.535 0.435 0.675 0.0200
## 3 3 5 0.501 0.0697 0.0312 0.415 0.588 0.51 0.4 0.595 0.0250
Also, you can graphically check if there is a difference between the distribution using Box plot. Note that you include the jittered dot.
ggplot(Fuel, aes(x = Oil, y = observations, fill = Oil)) +
geom_boxplot() +
geom_jitter(shape = 15,
color = “steelblue”,
position = position_jitter(0.21)) +
theme_classic() +
labs( title= “Comparative Box Plot”)
Apply Complete Block design into the linear regression model.
model_1 <- lm(formula = observations ~ Oil + Truck, data=Fuel)
anova(model_1)
## Analysis of Variance Table
## Response: observations
## Df Sum Sq Mean Sq F value Pr(>F)
## Oil 2 0.006706 0.0033529 6.3527 0.02229 *
## Truck 4 0.092100 0.0230249 43.6257 1.781e-05 ***
## Residuals 8 0.004222 0.0005278
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
Here, we might use two different approach in order to conclude on the output.
First: F-test.
\(H_0\) is rejected if \(F_0 > F_{\alpha,p-1,(p-2)(p-1)}\)
\(⸪ \hspace{0.2cm} F_0 = MS_R/ MS_Res = 6.3527\), We assumed \(\alpha = 0.05\)
qf(p=.05, df1=2, df2=8, lower.tail=FALSE) # To find F critical value
## [1] 4.45897
\(⸪ \hspace{0.2cm} F_{\alpha,p-1,(p-2)(p-1)} = F_{0.05,6,15} = 4.45897\)
\(⸫ \hspace{0.2cm} F_0 \geq F_{\alpha,p-1,(p-2)(p-1)}\)
As \(F_0 \geq F_{\alpha,p-1,(p-2)(p-1)}\), we reject the null hypothesis which means that at least one of three different lubricating oils effects is significant.
Second: p-value.
As the ANOVA table shown, the p-value of Ingredients is \(0.02229\).So, we should reject the null hypothesis which mean the Oil factor is significant.
To obtain the estimates of the treatment effects we sould use Summary.
summary(model_1)
## lm(formula = observations ~ Oil + Truck, data = Fuel)
## Residuals:
## Min 1Q Median 3Q Max
## -0.039867 -0.008967 0.003133 0.010667 0.022333
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.496867 0.015694 31.660 1.08e-09 ***
## Oil2 0.048600 0.014530 3.345 0.010157 *
## Oil3 0.008800 0.014530 0.606 0.561529
## Truck2 0.118667 0.018758 6.326 0.000226 ***
## Truck3 -0.017667 0.018758 -0.942 0.373847
## Truck4 -0.128000 0.018758 -6.824 0.000135 ***
## Truck5 0.004667 0.018758 0.249 0.809795
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
## Residual standard error: 0.02297 on 8 degrees of freedom
## Multiple R-squared: 0.959, Adjusted R-squared: 0.9283
## F-statistic: 31.2 on 6 and 8 DF, p-value: 3.958e-05
\(\tau_1= \mu = 0.496867, \tau_2= 0.048600, \tau_3= 0.008800\).
c) Construct the confidence intervals for each factor (for each level). Comment on the significance of the effect of each level based on this interval.
confint(model_1)
## 2.5 % 97.5 %
## (Intercept) 0.46067644 0.53305689
## Oil2 0.01509436 0.08210564
## Oil3 -0.02470564 0.04230564
## Truck2 0.07541107 0.16192226
## Truck3 -0.06092226 0.02558893
## Truck4 -0.17125559 -0.08474441
## Truck5 -0.03858893 0.04792226
We can check whether \(0\) is contained in the \(95\%\) confidence interval of all the estimators. Since the \(95\%\) confidence interval of Oil2 dose not contain \(0\), we do reject the null hypothesis at the significance level of \(0.05\), which mean that different lubricating oils will have a significant effect on fuel economy in diesel truck engines
d) Use the Fisher LSD method to make comparisons among the three lubricating oils to determine specifically which oils differ in brake-specific fuel consumption.
Use Fisher’s test to perform pairwise comparison.
LSD.test(y = model_1,
trt = “Oil”,
DFerror = model$df.residual,
MSerror = deviance(model)/model$df.residual,
alpha = 0.05,
group = TRUE,
console = TRUE)
## Study: model_1 ~ “Oil”
## LSD t Test for observations
## Mean Square Error: 0.0005277833
## Oil, means and individual ( 95 %) CI
## observations std r LCL UCL Min Max
## 1 0.4924 0.10865220 5 0.4687079 0.5160921 0.329 0.634
## 2 0.5410 0.08612491 5 0.5173079 0.5646921 0.435 0.675
## 3 0.5012 0.06969720 5 0.4775079 0.5248921 0.400 0.595
## Alpha: 0.05 ; DF Error: 8
## Critical Value of t: 2.306004
## least Significant Difference: 0.03350564
## Treatments with the same letter are not significantly different.
## observations groups
## 2 0.5410 a
## 3 0.5012 b
## 1 0.4924 b
As the test outlines, means that any pairs who do not share a letter are significantly different. Based on Fisher’s LSD method, All Oils have significantly different means. A more concise table that shows each pair-wise comparison is also provided below:
LSD.test(y = model_1,
trt = “Oil”,
DFerror = model$df.residual,
MSerror = deviance(model)/model$df.residual,
alpha = 0.05,
group = FALSE,
console = TRUE)
## Study: model_1 ~ “Oil”
## LSD t Test for observations
## Mean Square Error: 0.0005277833
## Oil, means and individual ( 95 %) CI
## observations std r LCL UCL Min Max
## 1 0.4924 0.10865220 5 0.4687079 0.5160921 0.329 0.634
## 2 0.5410 0.08612491 5 0.5173079 0.5646921 0.435 0.675
## 3 0.5012 0.06969720 5 0.4775079 0.5248921 0.400 0.595
## Alpha: 0.05 ; DF Error: 8
## Critical Value of t: 2.306004
## Comparison between treatments means
## difference pvalue signif. LCL UCL
## 1 – 2 -0.0486 0.0102 * -0.08210564 -0.01509436
## 1 – 3 -0.0088 0.5615 -0.04230564 0.02470564
## 2 – 3 0.0398 0.0255 * 0.00629436 0.07330564
From these intervals we can deduce that Oil \(2\) is significantly different than machines \(1\) and \(3\).
The Fisher’s LSD method shows us that the pairs \(1 – 2\) and \(2 – 3\) have means that are significantly different.
e) Analyse the residuals from this experiment and comment on model adequacy.
par(mfrow=c(2,2))
plot(model_1)
The normal probability plot shows the residuals follow an approximately straight line and that there are no outliers, verifying that the residuals are normally distributed. The residuals versus fits plots display points that have no recognisable pattern that fall randomly above and below zero, suggesting that the residuals are randomly distributed and verifying that they have constant variance and are homoscedastic.
Check assumptions statistically:
Residual Test:
a. Equality of variances – homogeneity
\(H_0\): variances are equal
\(H_1\): at least one variance is different
ncvTest(model_1)
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 1.77572, Df = 1, p = 0.18268
The corresponding p-value (\(0.18268\)) is more than \(\alpha = 0.05\). Therefore, we fail to reject the null hypothesis which means the model has a constant variance or “homogeneity”.
b. Normality of Residual
\(H_0\): Errors are normally distributed
\(H_1\): Errors are not normally distributed
y = rstudent(model_1)
shapiro.test(y)
## Shapiro-Wilk normality test
## data: y
## W = 0.82186, p-value = 0.007113
The p-Value \(0.007113\) is less than the significant level \(\alpha = 0.05\). So, we conclude that we do reject the null hypothesis test. Therefore, there is statistically significant evident that the stochastic component of this model is not normally distributed.
c. Auto Correlation Function:
\(H_0\): Errors are uncorrelated
\(H_1\): Errors are correlated
acf(model_1$residuals)
The ACF plot confirms that there is no any lags exceeded the horizontal dashed line. Therefore, we fail to reject the null hypothesis which means we have non-autocorrelation on residual model.
durbinWatsonTest(model_1)
## lag Autocorrelation D-W Statistic p-value
## 1 -0.3423932 2.682435 0.45
## Alternative hypothesis: rho != 0
The p-Value \(0.482\) is much greater than the significant level \(\alpha = 0.05\). So, we conclude not to reject the null hypothesis test. Therefore, there is statistically significant evident that this model is non autocorrelation.
d. Randomness
\(H_0\): Errors are random
\(H_1\): Errors are not random
runs.test(rstudent(model_1))
## Runs Test
## data: rstudent(model_1)
## statistic = 2.2254, runs = 12, n1 = 7, n2 = 7, n = 14, p-value =
## 0.02605
## alternative hypothesis: nonrandomness
The corresponding p-value \(0.02605\) is less than the significant level \(\alpha = 0.05\). Therefore, we reject null hypothesis. which means the independence of this model was nonrandomness.
Question 2:
The effect of five different ingredients (A, B, C, D, E) on the reaction time of a chemical process is being studied. Each batch of new material is only large enough to permit five runs to be made. Furthermore, each run requires approximately 1 and 1/2 hours, so only five runs can be made in one day. The experimenter decides to run the experiment as a Latin square so that day and batch effects may be systematically controlled. She obtains the data that follow.
Solution 2:
The aim of this question is to investigate the effect of five different ingredients (A, B, C, D, E) on the reaction time of a chemical process. Note that a Latin Square design of order \(t=5\) has been used in this experiment since we have two blocking factors and each treatment appears exactly one time in each row or column.
Note: A Latin square having the Latin letters sorted in the first row and column is called Normal or standard Latin square.
Insert the data frame.
Chemical<- data.frame(Batch = c(1,1,1,1,1,
2,2,2,2,2,
3,3,3,3,3,
4,4,4,4,4,
5,5,5,5,5), # rows
Day = c(1,2,3,4,5,
1,2,3,4,5,
1,2,3,4,5,
1,2,3,4,5,
1,2,3,4,5), # columns
Ingredients = c("A","B","D","C","E",
"C","E","A","D","B",
"B","A","C","E","D",
"D","C","E","B","A",
"E","D","B","A","C"), # Latin Letter
Time = c(8,7,1,7,3,
11,2,7,3,8,
4,9,10,1,5,
6,8,6,6,10,
4,2,3,8,8))
head(Chemical)
## Batch Day Ingredients Time
## 1 1 1 A 8
## 2 1 2 B 7
## 3 1 3 D 1
## 4 1 4 C 7
## 5 1 5 E 3
## 6 2 1 C 11
a) Use the ANOVA table to decide whether the five treatments are different. State your hypotheses and draw conclusions.
First: we should treat the Batch, Day and Ingredients variables as factors.
Chemical <- Chemical %>% mutate(Batch = factor(Batch),
Day = factor(Day),
Ingredients = factor(Ingredients))
str(Chemical)
## ‘data.frame’: 25 obs. of 4 variables:
## $ Batch : Factor w/ 5 levels “1”,”2″,”3″,”4″,..: 1 1 1 1 1 2 2 2 2 2 …
## $ Day : Factor w/ 5 levels “1”,”2″,”3″,”4″,..: 1 2 3 4 5 1 2 3 4 5 …
## $ Ingredients: Factor w/ 5 levels “A”,”B”,”C”,”D”,..: 1 2 4 3 5 3 5 1 4 2 …
## $ Time : num 8 7 1 7 3 11 2 7 3 8 …
Second: state your hypothesis testing for treatments.
\(H_0: \tau_1= \tau_2= \tau_3= \tau_4= \tau_5= 0\).
\(H_a: at \hspace{0.1cm} least \hspace{0.1cm} one \hspace{0.1cm} \tau_i \neq 0\)
where \(\tau_i\) are the treatment effects of the five different Ingredients.
Third: we might looks into the Descriptive Statistics:
Compute the mean and the SD (standard deviation) of the reaction time by different Ingredients:
Chemical %>%
group_by(Ingredients) %>%
summarise(n = n(),
mean = mean(Time, na.rm = TRUE),
sd = sd(Time, na.rm = TRUE),
stderr = sd/sqrt(n),
LCL = mean – qt(1 – (0.05 / 2), n – 1) * stderr,
UCL = mean + qt(1 – (0.05 / 2), n – 1) * stderr,
median = median(Time, na.rm = TRUE),
min = min(Time, na.rm = TRUE),
max = max(Time, na.rm = TRUE),
IQR = IQR(Time, na.rm = TRUE))
## # A tibble: 5 x 11
## Ingredients n mean sd stderr LCL UCL median min max IQR
##
## 1 A 5 8.4 1.14 0.510 6.98 9.82 8 7 10 1
## 2 B 5 5.6 2.07 0.927 3.03 8.17 6 3 8 3
## 3 C 5 8.8 1.64 0.735 6.76 10.8 8 7 11 2
## 4 D 5 3.4 2.07 0.927 0.825 5.97 3 1 6 3
## 5 E 5 3.2 1.92 0.860 0.812 5.59 3 1 6 2
Also, you can graphically check if there is a difference between the distribution using Box plot. Note that you include the jittered dot.
ggplot(Chemical, aes(x = Ingredients, y = Time, fill = Ingredients)) +
geom_boxplot() +
geom_jitter(shape = 15,
color = “steelblue”,
position = position_jitter(0.21)) +
theme_classic() +
labs( title= “Comparative Box Plot”)
Apply Latin Square design into the linear regression model.
model_2 <- lm(formula = Time ~ Batch + Day + Ingredients, data=Chemical)
anova(model_2)
## Analysis of Variance Table
## Response: Time
## Df Sum Sq Mean Sq F value Pr(>F)
## Batch 4 15.44 3.860 1.2345 0.3476182
## Day 4 12.24 3.060 0.9787 0.4550143
## Ingredients 4 141.44 35.360 11.3092 0.0004877 ***
## Residuals 12 37.52 3.127
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
Here, we might use two different approach in order to conclude on the output.
First: F-test.
\(H_0\) is rejected if \(F_0 > F_{\alpha,p-1,(p-2)(p-1)}\)
\(⸪ \hspace{0.2cm} F_0 = MS_R/ MS_Res = 11.3092\), We assumed \(\alpha = 0.05\)
qf(p=.05, df1=4, df2=12, lower.tail=FALSE) # To find F critical value
## [1] 3.259167
\(⸪ \hspace{0.2cm} F_{\alpha,p-1,(p-2)(p-1)} = F_{0.05,3,15} = 3.259167\)
\(⸫ \hspace{0.2cm} F_0 \geq F_{\alpha,p-1,(p-2)(p-1)}\)
As \(F_0 \geq F_{\alpha,p-1,(p-2)(p-1)}\), we reject the null hypothesis which means that at least one of the five Ingredient effects is significant.
Second: p-value.
As the ANOVA table shown, the p-value of Ingredients is \(0.0004877\).So, we should reject the null hypothesis which mean the Ingredient factor is significant.
To obtain the estimates of the treatment effects we sould use Summary.
summary(model_2)
## lm(formula = Time ~ Batch + Day + Ingredients, data = Chemical)
## Residuals:
## Min 1Q Median 3Q Max
## -2.24 -1.24 -0.24 0.96 2.36
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.440 1.275 6.619 2.47e-05 ***
## Batch2 1.000 1.118 0.894 0.388803
## Batch3 0.600 1.118 0.537 0.601409
## Batch4 2.000 1.118 1.788 0.098971 .
## Batch5 -0.200 1.118 -0.179 0.861049
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com