---
title: "Lab session 2: Solutions"
author: "RMIT University-MATH1302"
date: "23/03/2022"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
**Library**
```{r message=FALSE, warning=FALSE}
# List of packages required for this analysis
pkg <- c("mosaic", "ggplot2", "tidyverse", "car", "randtests", "stats", "dplyr", "agricolae")
# Identify any packages in the list that are not yet installed
new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
# If there are any packages in the list that aren't installed, install them
if (length(new.pkg)) {
  install.packages(new.pkg, repos = "http://cran.rstudio.com")
}
# Load the packages into R
library(mosaic)    # favstats()
library(ggplot2)
library(tidyverse) # summarise()
library(car)       # ncvTest()
library(randtests) # runs.test()
library(stats)     # TukeyHSD()
library(dplyr)     # mutate()
library(agricolae) # LSD.test()
```
# Question 1:
##### The effect of three different lubricating oils on fuel economy in diesel truck engines is being studied. Fuel economy is measured using brake-specific fuel consumption after the engine has been running for 15 minutes. Five different truck engines are available for the study, and the experimenters conduct the following randomized complete block design.

##### a) Create a data frame called "Fuel" in R to store the above data. Use the str() command to display the structure of the data.
**Insert the data frame.**
```{r}
Fuel <- data.frame(Oil = c(1,1,1,1,1,
                           2,2,2,2,2,
                           3,3,3,3,3),
                   Truck = c(1,2,3,4,5,
                             1,2,3,4,5,
                             1,2,3,4,5),
                   observations = c(0.500, 0.634, 0.487, 0.329, 0.512,
                                    0.535, 0.675, 0.520, 0.435, 0.540,
                                    0.513, 0.595, 0.488, 0.400, 0.510))
head(Fuel)
```
**Export the data to the desktop**
```{r eval=FALSE}
write.csv(Fuel, "/Users/abdulrahmanalamri/Desktop/Data/Fuel.csv", row.names = TRUE) # export the data frame as a CSV file from RStudio to the desktop
```
**Display the structure of the data.**
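Part (a) asks for the str() command; a minimal chunk to display the structure of the data frame:

```{r}
str(Fuel) # shows 15 observations of 3 variables
```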
##### b) Analyse the data from this experiment. State your hypotheses (use α = 5%) and draw conclusions.
- First: we should treat the Oil and Truck variables as factors.
```{r}
Fuel <- Fuel %>% mutate(Oil = factor(Oil),
                        Truck = factor(Truck))
```
- Second: state the hypothesis test for the lubricating oils:
$H_0: \tau_1 = \tau_2 = \tau_3 = 0$
$H_a: \text{at least one } \tau_i \neq 0$
where $\tau_i$ are the treatment effects of the three different lubricating oils.
- Third: we can look at the descriptive statistics.
Compute the mean and the standard deviation (SD) of the brake-specific fuel consumption for each Oil:
```{r echo=TRUE}
Fuel %>%
  group_by(Oil) %>%
  summarise(n = n(),
            mean = mean(observations, na.rm = TRUE),
            sd = sd(observations, na.rm = TRUE),
            stderr = sd/sqrt(n),
            LCL = mean - qt(1 - (0.05 / 2), n - 1) * stderr,
            UCL = mean + qt(1 - (0.05 / 2), n - 1) * stderr,
            median = median(observations, na.rm = TRUE),
            min = min(observations, na.rm = TRUE),
            max = max(observations, na.rm = TRUE),
            IQR = IQR(observations, na.rm = TRUE))
```
You can also check graphically whether the distributions differ using a box plot. Note the jittered dots.
```{r echo=TRUE}
ggplot(Fuel, aes(x = Oil, y = observations, fill = Oil)) +
  geom_boxplot() +
  geom_jitter(shape = 15,
              color = "steelblue",
              position = position_jitter(0.21)) +
  theme_classic() +
  labs(title = "Comparative Box Plot")
```
##### Apply the randomized complete block design as a linear model.
```{r}
model_1 <- lm(formula = observations ~ Oil + Truck, data = Fuel)
anova(model_1)
```
Here, we can use two different approaches to draw a conclusion from the output.
- First: F-test.
$H_0$ is rejected if $F_0 > F_{\alpha,\,a-1,\,(a-1)(b-1)}$, where $a = 3$ treatments and $b = 5$ blocks.
$\because \; F_0 = MS_{Treatments}/MS_E = 6.3527$, and we assumed $\alpha = 0.05$.
```{r}
qf(p = .05, df1 = 2, df2 = 8, lower.tail = FALSE) # to find the F critical value
```
$\because \; F_{\alpha,\,a-1,\,(a-1)(b-1)} = F_{0.05,2,8} = 4.459$
$\therefore \; F_0 > F_{0.05,2,8}$
As $F_0 > F_{0.05,2,8}$, we reject the null hypothesis, which means at least one of the three lubricating oil effects is significant.
- Second: p-value.
As the ANOVA table shows, the p-value for Oil is $0.02229$, which is below $\alpha = 0.05$. So we reject the null hypothesis, which means the Oil factor is significant.
###### To obtain the estimates of the treatment effects we should use summary().
```{r}
summary(model_1)
```
The fitted coefficients give $\hat{\mu} = 0.4969$ (the intercept, i.e. the Oil 1 baseline), $\hat{\tau}_2 = 0.0486$ and $\hat{\tau}_3 = 0.0088$ (the effects of Oils 2 and 3 relative to Oil 1).
##### c) Construct the confidence intervals for each factor (for each level). Comment on the significance of the effect of each level based on this interval.
```{r}
confint(model_1)
```
We can check whether $0$ is contained in the $95\%$ confidence interval of each estimator. Since the $95\%$ confidence interval for Oil2 does not contain $0$, we reject the null hypothesis at the $0.05$ significance level, which means the lubricating oils have a significant effect on fuel economy in diesel truck engines.
##### d) Use the Fisher LSD method to make comparisons among the three lubricating oils to determine specifically which oils differ in brake-specific fuel consumption.
**Use Fisher's test to perform pairwise comparisons.**
```{r}
LSD.test(y = model_1,
         trt = "Oil",
         DFerror = model_1$df.residual,
         MSerror = deviance(model_1)/model_1$df.residual,
         alpha = 0.05,
         group = TRUE,
         console = TRUE)
```
In this output, any pair of oils that do not share a letter have significantly different means. A more concise table that shows each pairwise comparison is provided below:
```{r}
LSD.test(y = model_1,
         trt = "Oil",
         DFerror = model_1$df.residual,
         MSerror = deviance(model_1)/model_1$df.residual,
         alpha = 0.05,
         group = FALSE,
         console = TRUE)
```
From these intervals we can deduce that Oil $2$ is significantly different from Oils $1$ and $3$.
Fisher's LSD method shows that the pairs $1 - 2$ and $2 - 3$ have means that are significantly different.
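As a cross-check, the LSD threshold itself can be computed directly from the ANOVA table (a sketch; with $b = 5$ blocks and $8$ error degrees of freedom it comes out near $0.034$, so only the differences involving Oil 2 exceed it):

```{r}
mse <- anova(model_1)["Residuals", "Mean Sq"]              # error mean square
lsd <- qt(0.975, df.residual(model_1)) * sqrt(2 * mse / 5) # LSD at alpha = 0.05
lsd # approximately 0.0335
```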
##### e) Analyse the residuals from this experiment and comment on model adequacy.
```{r}
par(mfrow = c(2, 2))
plot(model_1)
```
The normal probability plot shows the residuals following an approximately straight line with no outliers, suggesting the residuals are approximately normally distributed.
The residuals-versus-fitted plot shows no recognisable pattern, with points falling randomly above and below zero, suggesting the residuals have constant variance (homoscedasticity).
##### Check assumptions statistically:
- Residual tests:
**a. Equality of variances - homogeneity**
$H_0$: variances are equal
$H_1$: at least one variance is different
```{r}
ncvTest(model_1)
```
The corresponding p-value ($0.18268$) is greater than $\alpha = 0.05$. Therefore, we fail to reject the null hypothesis, which means the model has constant variance ("homogeneity").
**b. Normality of Residuals**
$H_0$: Errors are normally distributed
$H_1$: Errors are not normally distributed
```{r}
y <- rstudent(model_1)
shapiro.test(y)
```
The p-value $0.007113$ is less than the significance level $\alpha = 0.05$, so we reject the null hypothesis. Therefore, there is statistically significant evidence that the stochastic component of this model is not normally distributed.
**c. Autocorrelation**
$H_0$: Errors are uncorrelated
$H_1$: Errors are correlated
- ACF test
```{r}
acf(model_1$residuals)
```
The ACF plot shows that no lags exceed the horizontal dashed lines. Therefore, we fail to reject the null hypothesis, which means there is no evidence of autocorrelation in the residuals.
- Durbin-Watson test
```{r}
durbinWatsonTest(model_1)
```
The p-value $0.482$ is much greater than the significance level $\alpha = 0.05$, so we do not reject the null hypothesis. There is no statistically significant evidence of autocorrelation in this model.
**d. Randomness**
$H_0$: Errors are random
$H_1$: Errors are not random
```{r}
runs.test(rstudent(model_1))
```
The corresponding p-value $0.02605$ is less than the significance level $\alpha = 0.05$. Therefore, we reject the null hypothesis, which means the residuals of this model are not random.
————————————————————————-
# Question 2:
##### The effect of five different ingredients (A, B, C, D, E) on the reaction time of a chemical process is being studied. Each batch of new material is only large enough to permit five runs to be made. Furthermore, each run requires approximately 1 and 1/2 hours, so only five runs can be made in one day. The experimenter decides to run the experiment as a Latin square so that day and batch effects may be systematically controlled. She obtains the data that follow.

## Solution 2:
The aim of this question is to investigate the effect of five different ingredients (A, B, C, D, E) on the reaction time of a chemical process. Note that a Latin square design of order $t = 5$ has been used in this experiment, since we have two blocking factors and each treatment appears exactly once in each row and each column.
Note: a Latin square whose Latin letters are sorted in the first row and first column is called a normal (or standard) Latin square.
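As an aside, the agricolae package (already loaded above) can generate such a layout; a minimal sketch, where the particular arrangement depends on the seed:

```{r}
# Generate a randomised 5 x 5 Latin square for the five ingredients
lsd_design <- design.lsd(trt = c("A", "B", "C", "D", "E"), seed = 23)
lsd_design$sketch # the generated 5 x 5 layout of treatment letters
```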
#### Insert the data frame.
```{r}
Chemical <- data.frame(Batch = c(1,1,1,1,1,
                                 2,2,2,2,2,
                                 3,3,3,3,3,
                                 4,4,4,4,4,
                                 5,5,5,5,5), # rows
                       Day = c(1,2,3,4,5,
                               1,2,3,4,5,
                               1,2,3,4,5,
                               1,2,3,4,5,
                               1,2,3,4,5), # columns
                       Ingredients = c("A","B","D","C","E",
                                       "C","E","A","D","B",
                                       "B","A","C","E","D",
                                       "D","C","E","B","A",
                                       "E","D","B","A","C"), # Latin letters
                       Time = c(8,7,1,7,3,
                                11,2,7,3,8,
                                4,9,10,1,5,
                                6,8,6,6,10,
                                4,2,3,8,8))
head(Chemical)
```
```{r eval=FALSE, include=FALSE}
write.csv(Chemical, "/Users/abdulrahmanalamri/Desktop/Data/Chemical.csv", row.names = TRUE) # export the data frame as a CSV file
```
##### a) Use the ANOVA table to decide whether the five treatments are different. State your hypotheses and draw conclusions.
- First: we should treat the Batch, Day and Ingredients variables as factors.
```{r}
Chemical <- Chemical %>% mutate(Batch = factor(Batch),
                                Day = factor(Day),
                                Ingredients = factor(Ingredients))
str(Chemical)
```
- Second: state the hypothesis test for the treatments.
$H_0: \tau_1 = \tau_2 = \tau_3 = \tau_4 = \tau_5 = 0$
$H_a: \text{at least one } \tau_i \neq 0$
where $\tau_i$ are the treatment effects of the five different Ingredients.
- Third: we can look at the descriptive statistics.
Compute the mean and the standard deviation (SD) of the reaction time for each Ingredient:
```{r echo=TRUE}
Chemical %>%
  group_by(Ingredients) %>%
  summarise(n = n(),
            mean = mean(Time, na.rm = TRUE),
            sd = sd(Time, na.rm = TRUE),
            stderr = sd/sqrt(n),
            LCL = mean - qt(1 - (0.05 / 2), n - 1) * stderr,
            UCL = mean + qt(1 - (0.05 / 2), n - 1) * stderr,
            median = median(Time, na.rm = TRUE),
            min = min(Time, na.rm = TRUE),
            max = max(Time, na.rm = TRUE),
            IQR = IQR(Time, na.rm = TRUE))
```
You can also check graphically whether the distributions differ using a box plot. Note the jittered dots.
```{r echo=TRUE}
ggplot(Chemical, aes(x = Ingredients, y = Time, fill = Ingredients)) +
  geom_boxplot() +
  geom_jitter(shape = 15,
              color = "steelblue",
              position = position_jitter(0.21)) +
  theme_classic() +
  labs(title = "Comparative Box Plot")
```
##### Apply the Latin square design as a linear model.
```{r}
model_2 <- lm(formula = Time ~ Batch + Day + Ingredients, data = Chemical)
anova(model_2)
```
Here, we can use two different approaches to draw a conclusion from the output.
- First: F-test.
$H_0$ is rejected if $F_0 > F_{\alpha,\,p-1,\,(p-2)(p-1)}$, where $p = 5$ treatments.
$\because \; F_0 = MS_{Treatments}/MS_E = 11.3092$, and we assumed $\alpha = 0.05$.
```{r}
qf(p = .05, df1 = 4, df2 = 12, lower.tail = FALSE) # to find the F critical value
```
$\because \; F_{\alpha,\,p-1,\,(p-2)(p-1)} = F_{0.05,4,12} = 3.2592$
$\therefore \; F_0 > F_{0.05,4,12}$
As $F_0 > F_{0.05,4,12}$, we reject the null hypothesis, which means at least one of the five Ingredient effects is significant.
- Second: p-value.
As the ANOVA table shows, the p-value for Ingredients is $0.0004877$, which is below $\alpha = 0.05$. So we reject the null hypothesis, which means the Ingredients factor is significant.
###### To obtain the estimates of the treatment effects we should use summary().
```{r}
summary(model_2)
```
The fitted coefficients give $\hat{\mu} = 8.440$ (the intercept, i.e. the Ingredient A baseline), with effects relative to A of $\hat{\tau}_B = -2.800$, $\hat{\tau}_C = 0.400$, $\hat{\tau}_D = -5.000$ and $\hat{\tau}_E = -5.200$.
##### b) Are the blocking factors significant?
State the hypothesis test for each blocking factor (Batch and Day):
$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$
$H_a: \text{at least one } \beta_i \neq 0$
where $\beta_i$ are the block effects (of Batch and of Day, tested separately).
```{r}
anova(model_2)
```
Compare the Batch and Day p-values in the ANOVA table with $\alpha = 0.05$: a p-value below $0.05$ indicates that the corresponding blocking factor is significant.
##### c) Compare all pairs of treatment means by using either the Fisher or the Tukey tests.
###### Use Tukey's test to perform pairwise comparisons.
```{r}
Tukey2 <- aov(formula = Time ~ Batch + Day + Ingredients, data = Chemical)
Tuk1 <- TukeyHSD(Tukey2)
Tuk1$Ingredients
```
The Tukey method shows no significant differences between the Batch levels or the Day levels, while there are significant differences between the Ingredient levels. Ingredient $A$ is significantly different from Ingredients $D$ and $E$, and Ingredient $C$ is significantly different from Ingredients $D$ and $E$, while Ingredient $B$ is not significantly different from any other Ingredient, and Ingredient $D$ is not significantly different from Ingredient $E$. See the plot below.
```{r}
# par(mfrow=c(3,1))
plot(Tuk1)
```
###### Use Fisher's test to perform pairwise comparisons.
```{r}
LSD.test(y = model_2,
         trt = "Ingredients",
         DFerror = model_2$df.residual,
         MSerror = deviance(model_2)/model_2$df.residual,
         alpha = 0.05,
         group = TRUE,
         console = TRUE)
```
In this output, any pair of ingredients that do not share a letter have significantly different means. A more concise table that shows each pairwise comparison is provided below:
```{r}
LSD.test(y = model_2,
         trt = "Ingredients",
         DFerror = model_2$df.residual,
         MSerror = deviance(model_2)/model_2$df.residual,
         alpha = 0.05,
         group = FALSE,
         console = TRUE)
```
From these intervals we can deduce that Ingredients $D$ and $E$ are significantly different from Ingredients $A$ and $C$.
Fisher's LSD method shows that the pairs $A - B$, $A - D$, $A - E$, $B - C$, $C - D$ and $C - E$ have means that are significantly different. Unlike Tukey's test, Fisher's LSD method flags the pairs $A - B$ and $B - C$ at the $5\%$ significance level. This distinction is important: with Fisher's LSD method the family-wise error rate is inflated (the chance of judging at least one pair of means to be different when they are not increases rapidly with the number of means), and this is why the $A - B$ and $B - C$ pairs appear significant here but not under Tukey's test.
##### d) Suppose that there is an additional source of variation. A fourth factor in five levels, workplace (α, β, γ, δ, ε), needs to be considered.
- *Which design should we employ, and what will the new experiment look like?*
Since we want to block in three directions instead of two, we should use a Graeco-Latin square design in order to systematically control three sources of extraneous variability. The design allows investigation of four factors (rows, columns, Latin letters and Greek letters). So we consider a $5 \times 5$ Graeco-Latin square, whose analysis of variance is similar to that of a Latin square. Because each Greek letter appears exactly once in each row and column and exactly once with each Latin letter, the factor represented by the Greek letters is orthogonal to rows, columns and Latin-letter treatments. Therefore, a sum of squares due to the Greek-letter factor may be computed from the Greek-letter totals, and the experimental error is further reduced by this amount. The experiment is shown below.

- *Test whether the new factor is significant in this new design.*
Insert the data frame again.
```{r}
Chemical_1 <- data.frame(Batch = c(1,1,1,1,1,
                                   2,2,2,2,2,
                                   3,3,3,3,3,
                                   4,4,4,4,4,
                                   5,5,5,5,5), # rows
                         Day = c(1,2,3,4,5,
                                 1,2,3,4,5,
                                 1,2,3,4,5,
                                 1,2,3,4,5,
                                 1,2,3,4,5), # columns
                         Ingredients = c("A","B","D","C","E",
                                         "C","E","A","D","B",
                                         "B","A","C","E","D",
                                         "D","C","E","B","A",
                                         "E","D","B","A","C"), # Latin letters
                         Workplace = c("alpha","delta","beta","epsilon","gamma",
                                       "beta","epsilon","gamma","alpha","delta",
                                       "gamma","alpha","delta","beta","epsilon",
                                       "delta","beta","epsilon","gamma","alpha",
                                       "epsilon","gamma","alpha","delta","beta"), # Greek letters
                         Time = c(8,7,1,7,3,
                                  11,2,7,3,8,
                                  4,9,10,1,5,
                                  6,8,6,6,10,
                                  4,2,3,8,8))
head(Chemical_1)
```
- First: treat the variables as factors.
```{r}
Chemical_1 <- Chemical_1 %>% mutate(Batch = factor(Batch),
                                    Day = factor(Day),
                                    Ingredients = factor(Ingredients),
                                    Workplace = factor(Workplace))
```
- Second: state the hypothesis test for the Workplace factor:
$H_0: \tau_1 = \tau_2 = \tau_3 = \tau_4 = \tau_5 = 0$
$H_a: \text{at least one } \tau_i \neq 0$
where $\tau_i$ are the effects of the five different workplaces.
- Third: apply the Graeco-Latin square design as a linear model and test the Workplace factor.
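A minimal sketch of this fit, following the same modelling approach as the earlier parts (the name `model_3` is introduced here): the Workplace row of the ANOVA table is then compared against $\alpha = 0.05$.

```{r}
# Add the Greek-letter factor to the Latin-square model
model_3 <- lm(formula = Time ~ Batch + Day + Ingredients + Workplace, data = Chemical_1)
anova(model_3) # if the Workplace p-value is below 0.05, the new factor is significant
```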