24/03/2022, 14:43 Lab session 2: Solutions
Lab session 2: Solutions
RMIT University-MATH1302 23/03/2022
# List of packages required for this analysis
pkg <- c("mosaic", "ggplot2", "tidyverse", "car", "randtests", "stats", "dplyr", "agricolae")
# Check which packages are not installed and assign their
# names to the variable new.pkg
new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
# If there are any packages in the list that aren't installed,
# install them
if (length(new.pkg)) {
  install.packages(new.pkg, repos = "http://cran.rstudio.com")
}
# Load the packages into R
library(mosaic)     # favstats()
library(ggplot2)
library(tidyverse)  # summarise()
library(car)        # ncvTest()
library(randtests)  # runs.test()
library(stats)      # TukeyHSD()
library(dplyr)      # mutate()
library(agricolae)  # LSD.test()
Question 1:
The effect of three different lubricating oils on fuel economy in diesel truck engines is being studied. Fuel economy is measured using brake-specific fuel consumption after the engine has been running for 15 minutes. Five different truck engines are available for the study, and the experimenters conduct the following randomized complete block design.
file:///D:/OneDrive - RMIT University/MyWork/Mathimata/RMIT/MATH1302/2022/Lab Sessions/LAB2/Lab-session-2---Solutions.html 1/22
a) Create a data frame called “Fuel” in R to store the above data. Use str() command to display the structure of the data.
Insert the data frame.
Fuel <- data.frame(Oil = c(1,1,1,1,1,
                           2,2,2,2,2,
                           3,3,3,3,3),
                   Truck = c(1,2,3,4,5,
                             1,2,3,4,5,
                             1,2,3,4,5),
                   observations = c(0.500, 0.634, 0.487, 0.329, 0.512,
                                    0.535, 0.675, 0.520, 0.435, 0.540,
                                    0.513, 0.595, 0.488, 0.400, 0.510))
head(Fuel)
## Oil Truck observations
## 1 1 1 0.500
## 2 1 2 0.634
## 3 1 3 0.487
## 4 1 4 0.329
## 5 1 5 0.512
## 6 2 1 0.535
Export the data to the desktop.
write.csv(Fuel, "/Users/abdulrahmanalamri/Desktop/Data/Fuel.csv", row.names = TRUE) # export the data frame as a CSV file from RStudio to the desktop
Display the structure of the data.
str(Fuel)
## 'data.frame':    15 obs. of 3 variables:
##  $ Oil         : num 1 1 1 1 1 2 2 2 2 2 ...
##  $ Truck       : num 1 2 3 4 5 1 2 3 4 5 ...
##  $ observations: num 0.5 0.634 0.487 0.329 0.512 0.535 0.675 0.52 0.435 0.54 ...
b) Analyse the data from this experiment. State your hypotheses (use α = 5%) and draw conclusions.
First: we should treat the Oil and Truck variables as factors.
Fuel <- Fuel %>% mutate(Oil = factor(Oil),
Truck = factor(Truck))
str(Fuel)
## 'data.frame':    15 obs. of 3 variables:
##  $ Oil         : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 2 2 2 2 2 ...
##  $ Truck       : Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5 1 2 3 4 5 ...
##  $ observations: num 0.5 0.634 0.487 0.329 0.512 0.535 0.675 0.52 0.435 0.54 ...
Second: state the hypotheses for the lubricating oils:
$H_0: \tau_1 = \tau_2 = \tau_3 = 0$
$H_a: \tau_i \neq 0$ for at least one $i$
where $\tau_i$ are the treatment effects of the three different lubricating oils.
Third: we might look into the descriptive statistics:
Compute the mean and the SD (standard deviation) of the fuel consumption for each Oil:
Fuel %>%
  group_by(Oil) %>%
  summarise(n = n(),
            mean = mean(observations, na.rm = TRUE),
            sd = sd(observations, na.rm = TRUE),
            stderr = sd/sqrt(n),
            LCL = mean - qt(1 - (0.05 / 2), n - 1) * stderr,
            UCL = mean + qt(1 - (0.05 / 2), n - 1) * stderr,
            median = median(observations, na.rm = TRUE),
            min = min(observations, na.rm = TRUE),
            max = max(observations, na.rm = TRUE),
            IQR = IQR(observations, na.rm = TRUE))
## # A tibble: 3 x 11
##   Oil       n  mean     sd stderr   LCL   UCL median   min   max    IQR
## 1 1         5 0.492 0.109  0.0486 0.357 0.627  0.5   0.329 0.634 0.0250
## 2 2         5 0.541 0.0861 0.0385 0.434 0.648  0.535 0.435 0.675 0.0200
## 3 3         5 0.501 0.0697 0.0312 0.415 0.588  0.51  0.4   0.595 0.0250
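The per-oil intervals in the table can also be reproduced with base R alone. The following is a minimal sketch, not the lab's own method; the helper name `group_ci` and the vectors `obs` and `oil` are introduced here purely for illustration.

```r
# Fuel-consumption readings from the data frame above, five per oil.
obs <- c(0.500, 0.634, 0.487, 0.329, 0.512,
         0.535, 0.675, 0.520, 0.435, 0.540,
         0.513, 0.595, 0.488, 0.400, 0.510)
oil <- factor(rep(1:3, each = 5))

# Hypothetical helper: t-based interval for one group,
# using only that group's own standard deviation.
group_ci <- function(x, level = 0.95) {
  n <- length(x)
  half <- qt(1 - (1 - level) / 2, df = n - 1) * sd(x) / sqrt(n)
  c(mean = mean(x), LCL = mean(x) - half, UCL = mean(x) + half)
}

# One row per oil, columns mean / LCL / UCL.
ci_by_oil <- t(sapply(split(obs, oil), group_ci))
round(ci_by_oil, 3)
```

These intervals use each oil's own standard deviation, so they are wider than the pooled-error intervals that appear later in the LSD output.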
You can also check graphically whether the distributions differ between oils by using a box plot. Note that the individual observations are overlaid as jittered points.
ggplot(Fuel, aes(x = Oil, y = observations, fill = Oil)) +
  geom_boxplot() +
  geom_jitter(shape = 15,
              color = "steelblue",
              position = position_jitter(0.21)) +
  theme_classic() +
  labs(title = "Comparative Box Plot")
Fit the randomized complete block design as a linear regression model.
model_1 <- lm(formula = observations ~ Oil + Truck, data=Fuel)
anova(model_1)
## Analysis of Variance Table
##
## Response: observations
##           Df   Sum Sq   Mean Sq F value    Pr(>F)
## Oil        2 0.006706 0.0033529  6.3527   0.02229 *
## Truck      4 0.092100 0.0230249 43.6257 1.781e-05 ***
## Residuals  8 0.004222 0.0005278
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
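To see where the entries of this table come from, the sums of squares for the additive block model (observation = overall mean + oil effect + truck effect + error) can be computed by hand. This is a minimal base-R sketch under that assumption; the variable names are illustrative and not part of the lab code.

```r
# Data from the experiment: 3 oils (treatments) x 5 trucks (blocks).
obs   <- c(0.500, 0.634, 0.487, 0.329, 0.512,
           0.535, 0.675, 0.520, 0.435, 0.540,
           0.513, 0.595, 0.488, 0.400, 0.510)
oil   <- factor(rep(1:3, each = 5))
truck <- factor(rep(1:5, times = 3))

a <- nlevels(oil)    # number of treatments
b <- nlevels(truck)  # number of blocks
grand <- mean(obs)

# Sums of squares for treatments, blocks, total and error.
ss_oil   <- b * sum((tapply(obs, oil, mean) - grand)^2)
ss_truck <- a * sum((tapply(obs, truck, mean) - grand)^2)
ss_total <- sum((obs - grand)^2)
ss_error <- ss_total - ss_oil - ss_truck

# Mean squares and the F statistic for the oils.
ms_oil   <- ss_oil / (a - 1)
ms_error <- ss_error / ((a - 1) * (b - 1))
f_oil    <- ms_oil / ms_error

c(SS_oil = ss_oil, SS_truck = ss_truck, SS_error = ss_error, F_oil = f_oil)
```

The printed values should agree with the Oil, Truck and Residuals rows of the ANOVA table up to rounding.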
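Here, we might use two different approaches to draw a conclusion from the output. First: F-test.
$F_0 = MS_R / MS_{Res} = 6.3527$. We assumed $\alpha = 0.05$.
$H_0$ is rejected if $F_0 > F_{\alpha, a-1, (a-1)(b-1)}$, where $a = 3$ is the number of oils and $b = 5$ is the number of trucks.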
qf(p=.05, df1=2, df2=8, lower.tail=FALSE) # To find F critical value
## [1] 4.45897
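The critical-value comparison and the p-value approach lead to the same decision, which can be checked in a few lines. This is a small sketch that takes the F statistic 6.3527 and the degrees of freedom (2 and 8) from the ANOVA table above; the variable names are illustrative.

```r
f0  <- 6.3527   # F statistic for Oil, read from the ANOVA table
df1 <- 2        # treatment degrees of freedom
df2 <- 8        # residual degrees of freedom

# Upper 5% point of the F distribution and the upper-tail p-value.
f_crit  <- qf(0.05, df1 = df1, df2 = df2, lower.tail = FALSE)
p_value <- pf(f0, df1 = df1, df2 = df2, lower.tail = FALSE)

# Both rules give the same answer: reject when f0 exceeds the
# critical value, or equivalently when the p-value is below 0.05.
reject_by_f <- f0 > f_crit
reject_by_p <- p_value < 0.05
c(f_crit = f_crit, p_value = p_value)
```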
The critical value is $F_{0.05, 2, 8} = 4.45897$.
As $F_0 = 6.3527 > F_{0.05, 2, 8} = 4.45897$, we reject the null hypothesis, which means that at least one of the three lubricating oil effects is significant.
Second: p-value.
As the ANOVA table shows, the p-value for Oil is 0.02229, which is less than $\alpha = 0.05$. So, we should reject the null hypothesis, which means the Oil factor is significant.
To obtain the estimates of the treatment effects we should use summary().
summary(model_1)
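## Call:
## lm(formula = observations ~ Oil + Truck, data = Fuel)
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.039867 -0.008967  0.003133  0.010667  0.022333
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.496867   0.015694  31.660 1.08e-09 ***
## Oil2         0.048600   0.014530   3.345 0.010157 *
## Oil3         0.008800   0.014530   0.606 0.561529
## Truck2       0.118667   0.018758   6.326 0.000226 ***
## Truck3      -0.017667   0.018758  -0.942 0.373847
## Truck4      -0.128000   0.018758  -6.824 0.000135 ***
## Truck5       0.004667   0.018758   0.249 0.809795
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.02297 on 8 degrees of freedom
## Multiple R-squared: 0.959, Adjusted R-squared: 0.9283
## F-statistic: 31.2 on 6 and 8 DF, p-value: 3.958e-05
With R's default treatment contrasts, the intercept $\hat{\mu} = 0.496867$ is the fitted value for the baseline (Oil 1 in Truck 1), and the estimated oil effects relative to Oil 1 are $\hat{\tau}_2 = 0.048600$ and $\hat{\tau}_3 = 0.008800$.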
c) Construct the confidence intervals for each factor (for each level). Comment on the significance of the effect of each level based on this interval.
confint(model_1)
##                   2.5 %      97.5 %
## (Intercept)  0.46067644  0.53305689
## Oil2         0.01509436  0.08210564
## Oil3        -0.02470564  0.04230564
## Truck2       0.07541107  0.16192226
## Truck3      -0.06092226  0.02558893
## Truck4      -0.17125559 -0.08474441
## Truck5      -0.03858893  0.04792226
We can check whether 0 is contained in the confidence interval of each estimate. Since the confidence interval for Oil2 does not contain 0, we reject the null hypothesis at the significance level $\alpha = 0.05$, which means that the lubricating oils have significantly different effects on fuel economy in diesel truck engines. The interval for Oil3 does contain 0, so Oil 3 is not significantly different from the baseline Oil 1.
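The Oil2 interval reported by confint() can be reconstructed from quantities already shown in the output. The sketch below assumes the balanced design (five trucks per oil), so that the standard error of a difference between two oil means is sqrt(2 * MSE / b); the variable names are illustrative.

```r
mse      <- 0.0005277833  # residual mean square (8 df), from the output above
b        <- 5             # trucks (blocks) per oil
est_oil2 <- 0.048600      # estimated Oil2 effect relative to Oil 1

# Standard error of a difference between two oil means, and the
# two-sided 95% t critical value on the residual degrees of freedom.
se_diff <- sqrt(2 * mse / b)
t_crit  <- qt(0.975, df = 8)

# estimate +/- t * se
ci_oil2 <- est_oil2 + c(-1, 1) * t_crit * se_diff
ci_oil2
```

Because both limits are positive, 0 lies outside the interval, which is the basis for calling the Oil2 effect significant.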
d) Use the Fisher LSD method to make comparisons among the three lubricating oils to determine specifically which oils differ in brake-specific fuel consumption.
Use Fisher’s test to perform pairwise comparison.
LSD.test(y = model_1,
         trt = "Oil",
         DFerror = model_1$df.residual,
         MSerror = deviance(model_1)/model_1$df.residual,
         alpha = 0.05,
         group = TRUE,
         console = TRUE)
##
## Study: model_1 ~ "Oil"
##
## LSD t Test for observations
##
## Mean Square Error: 0.0005277833
##
## Oil, means and individual ( 95 %) CI
##
##   observations        std r       LCL       UCL   Min   Max
## 1       0.4924 0.10865220 5 0.4687079 0.5160921 0.329 0.634
## 2       0.5410 0.08612491 5 0.5173079 0.5646921 0.435 0.675
## 3       0.5012 0.06969720 5 0.4775079 0.5248921 0.400 0.595
##
## Alpha: 0.05 ; DF Error: 8
## Critical Value of t: 2.306004
##
## least Significant Difference: 0.03350564
##
## Treatments with the same letter are not significantly different.
##
##   observations groups
## 2       0.5410      a
## 3       0.5012      b
## 1       0.4924      b
In the grouping output, any treatments that do not share a letter are significantly different. Based on Fisher's LSD method, Oil 2 (group a) has a significantly different mean from Oils 1 and 3 (group b), while Oils 1 and 3 are not significantly different from each other. A more concise table that shows each pairwise comparison is also provided below:
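The least significant difference itself is simple to compute by hand, which makes the grouping easier to interpret. The following is a minimal base-R sketch using the oil means and the mean square error from the output above; it is an illustration of the LSD rule, not a replacement for LSD.test(), and the variable names are assumptions.

```r
means    <- c("1" = 0.4924, "2" = 0.5410, "3" = 0.5012)  # oil means
mse      <- 0.0005277833  # mean square error from the ANOVA
df_error <- 8             # residual degrees of freedom
r        <- 5             # observations per oil

# LSD = t critical value times the standard error of a difference.
lsd <- qt(1 - 0.05 / 2, df_error) * sqrt(2 * mse / r)

# All pairwise differences between oil means.
pairs <- combn(names(means), 2)
diffs <- means[pairs[1, ]] - means[pairs[2, ]]
names(diffs) <- paste(pairs[1, ], pairs[2, ], sep = " - ")

# A pair differs significantly when |difference| exceeds the LSD.
significant <- abs(diffs) > lsd
data.frame(difference = diffs, significant = significant)
```

Only the pairs involving Oil 2 exceed the LSD, which matches the letter grouping.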
LSD.test(y = model_1,
         trt = "Oil",
         DFerror = model_1$df.residual,
         MSerror = deviance(model_1)/model_1$df.residual,
         alpha = 0.05,
         group = FALSE,
         console = TRUE)
##
## Study: model_1 ~ "Oil"
##
## LSD t Test for observations
##
## Mean Square Error: 0.0005277833
##
## Oil, means and individual ( 95 %) CI
##
##   observations        std r       LCL       UCL   Min   Max
## 1       0.4924 0.10865220 5 0.4687079 0.5160921 0.329 0.634
## 2       0.5410 0.08612491 5 0.5173079 0.5646921 0.435 0.675
## 3       0.5012 0.06969720 5 0.4775079 0.5248921 0.400 0.595
##
## Alpha: 0.05 ; DF Error: 8
## Critical Value of t: 2.306004
##
## Comparison between treatments means
##
##       difference pvalue signif.         LCL         UCL
## 1 - 2    -0.0486 0.0102       * -0.08210564 -0.01509436
## 1 - 3    -0.0088 0.5615         -0.04230564  0.02470564
## 2 - 3     0.0398 0.0255       *  0.00629436  0.07330564
From these intervals we can deduce that Oil 2 is significantly different from Oils 1 and 3.
Fisher's LSD method shows us that the pairs 1-2 and 2-3 have means that are significantly different.
e) Analyse the residuals from this experiment and comment on model adequacy.
par(mfrow=c(2,2))
plot(model_1)
The normal probability plot shows the residuals following an approximately straight line with no obvious outliers, which suggests that the residuals are approximately normally distributed. The residuals versus fits plot displays points with no recognisable pattern, falling randomly above and below zero, which suggests that the residuals have constant variance (homoscedasticity). These visual impressions are checked with formal tests below.
Check assumptions statistically: Residual Test:
a. Equality of variances (homogeneity)
$H_0$: variances are equal
$H_a$: at least one variance is different
ncvTest(model_1)
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 1.77572, Df = 1, p = 0.18268
The corresponding p-value (0.18268) is more than $\alpha = 0.05$. Therefore, we fail to reject the null hypothesis, which means there is no evidence against constant variance (homogeneity).
b. Normality of residuals
$H_0$: Errors are normally distributed
$H_a$: Errors are not normally distributed
y = rstudent(model_1)
shapiro.test(y)
## Shapiro-Wilk normality test
## data: y
## W = 0.82186, p-value = 0.007113
The p-value (0.007113) is less than the significance level $\alpha = 0.05$, so we reject the null hypothesis. Therefore, there is statistically significant evidence that the stochastic component of this model is not normally distributed.
c. Autocorrelation function
$H_0$: Errors are uncorrelated
$H_a$: Errors are correlated
acf(model_1$residuals)
The ACF plot shows no lag (apart from lag 0, which is always 1) exceeding the horizontal dashed lines. Therefore, we fail to reject the null hypothesis, which means there is no evidence of autocorrelation in the residuals.
durbinWatsonTest(model_1)
## lag Autocorrelation D-W Statistic p-value
##   1      -0.3423932      2.682435    0.45
## Alternative hypothesis: rho != 0
The p-value (0.45) is much greater than the significance level $\alpha = 0.05$, so we do not reject the null hypothesis. Therefore, there is no statistically significant evidence of autocorrelation in the residuals of this model.
d. Randomness
$H_0$: Errors are random
$H_a$: Errors are not random
runs.test(rstudent(model_1))
## Runs Test
## data: rstudent(model_1)
## statistic = 2.2254, runs = 12, n1 = 7, n2 = 7, n = 14, p-value =
## 0.02605
## alternative hypothesis: nonrandomness
The corresponding p-value (0.02605) is less than the significance level $\alpha = 0.05$. Therefore, we reject the null hypothesis, which means the residuals of this model do not appear to be random.
Question 2:
The effect of five different ingredients (A, B, C, D, E) on the reaction time of a chemical process is being studied. Each batch of new material is only large enough to permit five runs to be made. Furthermore, each run requires approximately 1 and 1/2 hours, so only five runs can be made in one day. The experimenter decides to run the experiment as a Latin square so that day and batch effects may be systematically controlled. She obtains the data that follow.
Solution 2:
The aim of this question is to investigate the effect of five different ingredients (A, B, C, D, E) on the reaction time of a chemical process. Note that a Latin square design of order $p = 5$ has been used in this experiment, since we have two blocking factors (batch and day) and each treatment appears exactly once in each row and each column.
Note: A Latin square whose Latin letters are sorted in the first row and the first column is called a normal or standard Latin square.
Insert the data frame.
Chemical <- data.frame(Batch = c(1,1,1,1,1,
                                 2,2,2,2,2,
                                 3,3,3,3,3,
                                 4,4,4,4,4,
                                 5,5,5,5,5), # rows
                       Day = c(1,2,3,4,5,
                               1,2,3,4,5,
                               1,2,3,4,5,
                               1,2,3,4,5,
                               1,2,3,4,5), # columns
                       Ingredients = c("A","B","D","C","E",
                                       "C","E","A","D","B",
                                       "B","A","C","E","D",
                                       "D","C","E","B","A",
                                       "E","D","B","A","C"), # Latin letter
                       Time = c(8,7,1,7,3,
                                11,2,7,3,8,
                                4,9,10,1,5,
                                6,8,6,6,10,
                                4,2,3,8,8))
head(Chemical)
## Batch Day Ingredients Time
## 1 1 1 A 8
## 2 1 2 B 7
## 3 1 3 D 1
## 4 1 4 C 7
## 5 1 5 E 3
## 6 2 1 C 11
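The defining property of a Latin square, that every ingredient occurs exactly once in each batch and once in each day, can be verified directly from the layout. This is a small base-R sketch; the vectors `batch`, `day` and `ingredient` simply repeat the design columns entered above, and `is_latin` is an illustrative name.

```r
batch <- rep(1:5, each = 5)   # rows of the square
day   <- rep(1:5, times = 5)  # columns of the square
ingredient <- c("A","B","D","C","E",
                "C","E","A","D","B",
                "B","A","C","E","D",
                "D","C","E","B","A",
                "E","D","B","A","C")

# Cross-tabulate ingredient against each blocking factor.
by_batch <- table(batch, ingredient)
by_day   <- table(day, ingredient)

# In a Latin square every cell of both tables equals 1.
is_latin <- all(by_batch == 1) && all(by_day == 1)
is_latin
```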
a) Use the ANOVA table to decide whether the five treatments are different. State your hypotheses and draw conclusions.
First: we should treat the Batch, Day and Ingredients variables as factors.
Chemical <- Chemical %>% mutate(Batch = factor(Batch),
                                Day = factor(Day),
                                Ingredients = factor(Ingredients))
str(Chemical)
## 'data.frame':    25 obs. of 4 variables:
##  $ Batch      : Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 2 2 2 2 2 ...
##  $ Day        : Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5 1 2 3 4 5 ...
##  $ Ingredients: Factor w/ 5 levels "A","B","C","D",..: 1 2 4 3 5 3 5 1 4 2 ...
##  $ Time       : num 8 7 1 7 3 11 2 7 3 8 ...
Second: state the hypotheses for the treatments:
$H_0: \tau_1 = \tau_2 = \tau_3 = \tau_4 = \tau_5 = 0$
$H_a: \tau_i \neq 0$ for at least one $i$
where $\tau_i$ are the treatment effects of the five different Ingredients.
Third: we might look into the descriptive statistics:
Compute the mean and the SD (standard deviation) of the reaction time for each Ingredient:
Chemical %>%
  group_by(Ingredients) %>%
  summarise(n = n(),
            mean = mean(Time, na.rm = TRUE),
            sd = sd(Time, na.rm = TRUE),
            stderr = sd/sqrt(n),
            LCL = mean - qt(1 - (0.05 / 2), n - 1) * stderr,
            UCL = mean + qt(1 - (0.05 / 2), n - 1) * stderr,
            median = median(Time, na.rm = TRUE),
            min = min(Time, na.rm = TRUE),
            max = max(Time, na.rm = TRUE),
            IQR = IQR(Time, na.rm = TRUE))
## # A tibble: 5 x 11
##   Ingredients     n  mean    sd stderr   LCL   UCL median   min   max   IQR
## 1 A               5   8.4  1.14  0.510 6.98   9.82      8     7    10     1
## 2 B               5   5.6  2.07  0.927 3.03   8.17      6     3     8     3
## 3 C               5   8.8  1.64  0.735 6.76  10.8       8     7    11     2
## 4 D               5   3.4  2.07  0.927 0.825  5.97      3     1     6     3
## 5 E               5   3.2  1.92  0.860 0.812  5.59      3     1     6     2
You can also check graphically whether the distributions differ between ingredients by using a box plot. Note that the individual observations are overlaid as jittered points.
ggplot(Chemical, aes(x = Ingredients, y = Time, fill = Ingredients)) +
  geom_boxplot() +
  geom_jitter(shape = 15,
              color = "steelblue",
              position = position_jitter(0.21)) +
  theme_classic() +
  labs(title = "Comparative Box Plot")
Fit the Latin square design as a linear regression model.
model_2 <- lm(formula = Time ~ Batch + Day + Ingredients, data=Chemical)
anova(model_2)
## Analysis of Variance Table
##
## Response: Time
##             Df Sum Sq Mean Sq F value    Pr(>F)
## Batch        4  15.44   3.860  1.2345 0.3476182
## Day          4  12.24   3.060  0.9787 0.4550143
## Ingredients  4 141.44  35.360 11.3092 0.0004877 ***
## Residuals   12  37.52   3.127
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
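As with the block design, each line of this table can be rebuilt by hand. The sketch below assumes the usual additive Latin square model (overall mean + batch effect + day effect + ingredient effect + error) and uses base R only; the helper `ss_factor` and the other names are illustrative.

```r
time <- c(8,7,1,7,3, 11,2,7,3,8, 4,9,10,1,5, 6,8,6,6,10, 4,2,3,8,8)
batch <- rep(1:5, each = 5)
day   <- rep(1:5, times = 5)
ingredient <- c("A","B","D","C","E", "C","E","A","D","B",
                "B","A","C","E","D", "D","C","E","B","A",
                "E","D","B","A","C")

p <- 5               # order of the Latin square
grand <- mean(time)

# Hypothetical helper: sum of squares for a factor whose p levels
# each appear p times.
ss_factor <- function(f) p * sum((tapply(time, f, mean) - grand)^2)

ss_batch <- ss_factor(batch)
ss_day   <- ss_factor(day)
ss_ingr  <- ss_factor(ingredient)
ss_error <- sum((time - grand)^2) - ss_batch - ss_day - ss_ingr

# Error degrees of freedom are (p - 2)(p - 1) = 12.
df_error <- (p - 2) * (p - 1)
f_ingr   <- (ss_ingr / (p - 1)) / (ss_error / df_error)

c(SS_batch = ss_batch, SS_day = ss_day, SS_ingr = ss_ingr,
  SS_error = ss_error, F_ingr = f_ingr)
```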
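Here, we might use two different approaches to draw a conclusion from the output. First: F-test.
$F_0 = MS_R / MS_{Res} = 11.3092$. We assumed $\alpha = 0.05$.
$H_0$ is rejected if $F_0 > F_{\alpha, p-1, (p-2)(p-1)}$, where $p = 5$ is the order of the Latin square.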
qf(p=.05, df1=4, df2=12, lower.tail=FALSE) # To find F critical value
## [1] 3.259167
The critical value is $F_{\alpha, p-1, (p-2)(p-1)} = F_{0.05, 4, 12} = 3.259167$.
As $F_0 = 11.3092 > F_{0.05, 4, 12} = 3.259167$, we reject the null hypothesis, which means that at least one of the five Ingredient effects is significant.
Second: p-value.
As the ANOVA table shows, the p-value for Ingredients is 0.0004877, which is less than $\alpha = 0.05$. So, we should reject the null hypothesis, which means the Ingredients factor is significant.
To obtain the estimates of the treatment effects we should use summary().