---
title: "Lab session 4: Solutions"
author: "RMIT University-MATH1302"
date: "`r format(Sys.Date(), '%B %d, %Y')`"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
**Library**
```{r message=FALSE, warning=FALSE}
# List of packages required for this analysis
pkg <- c("mosaic", "ggplot2", "tidyverse", "car", "randtests", "stats", "dplyr", "agricolae","nlme","lme4", "emmeans", "crossdes", "AlgDesign", "DoE.base", "FrF2","SixSigma","ggfortify","nortest","lmtest", "DAAG", "reshape2", "Rmisc")
# Check if packages are not installed and assign the
# names of the packages not installed to the variable new.pkg
new.pkg <- pkg[!(pkg %in% installed.packages())]
# If there are any packages in the list that aren't installed,
# install them
if (length(new.pkg) > 0) {
  install.packages(new.pkg, repos = "http://cran.rstudio.com")
}
```
```{r message=FALSE, warning=FALSE}
# Load the packages into R
library(mosaic) # favstats()
library(ggplot2)
library(tidyverse) # summarise()
library(car) #ncvTest()
library(randtests) # runs.test()
library(stats) # TukeyHSD()
library(dplyr) # mutate() and gather()
library(agricolae) # LSD.test()
library(nlme)
library(lme4) # lmer()
library(emmeans) # emmeans()
library(crossdes) # find.BIB()
#library(multcomp)
library(AlgDesign) # gen.factorial()
library(DoE.base)
library(FrF2)
library(SixSigma)
library(ggfortify)
library(nortest)
library(lmtest)
library(DAAG) # press()
library(reshape2)
library(Rmisc) # summarySE()
```
# Question 1: Full Factorial $2^k$ Designs
##### A typical golfer uses the putter for about 40% of strokes. Improving one's putting is a logical and perhaps simple way to improve a golf score. An experiment was conducted to study the effects of four factors on putting accuracy. The design factors are length of putt, type of putter, breaking putt versus straight putt, and level versus downhill putt. The response variable is the distance from the ball to the centre of the cup after the ball comes to rest. One golfer performed the experiment, a $2^4$ full factorial design with seven replicates was used, and all putts were made in random order. The results are shown in the following table.
![](Q1Data.jpg)
##### Input the data into the software.
```{r}
Design <- expand.grid(Length = gl(2, 1, labels = c("10", "30")),
Type = gl(2, 1, labels = c("Mallet", "Cavity back")),
Break = gl(2, 1, labels = c("Straight", "Breaking")),
Slope = gl(2, 1, labels = c("Level", "Downhill")))
# Replicates
Q1Data <- data.frame(repl = rep(1:7, each = 16),
rbind(Design))
# Add the Response
Q1Data$response <- c(10, 0, 4, 0, 0, 5, 6.5, 16.5, 4.5, 19.5, 15, 41.5, 8, 21.5, 0, 18,
18, 16.5, 6, 10, 0, 20.5, 18.5, 4.5, 18, 18, 16, 39, 4.5, 10.5, 0, 5,
14, 4.5, 1, 34, 18.5, 18, 7.5, 0, 14.5, 16, 8.5, 6.5, 6.5, 6.5, 0, 7,
12.5, 17.5, 14.5, 11, 19.5, 20, 6, 23.5, 10, 5.5, 0, 3.5, 10, 0, 4.5, 10,
19, 20.5, 12, 25.5, 16, 29.5, 0, 8, 0, 10, 0.5, 7, 13, 15.5, 1, 32.5,
16, 17.5, 14, 21.5, 15, 19, 10, 8, 17.5, 7, 9, 8.5, 41, 24, 4, 18.5,
18.5, 33, 5, 0, 11, 10, 0, 8, 6, 36, 3, 36, 14, 16, 6.5, 8 )
head(Q1Data)
```
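As a quick sanity check (a sketch using only the data frame built above), we can confirm that each of the 16 treatment combinations appears once in every one of the 7 replicates:

```{r}
# Every cell of this 2 x 2 x 2 x 2 table should equal 7 (one run per replicate)
xtabs(~ Length + Type + Break + Slope, data = Q1Data)
```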
**Export the data to a CSV file**
```{r eval=FALSE}
write.csv(Q1Data, "Q1Data.csv", row.names = TRUE) # export the data as a CSV file from RStudio
```
**Display the structure of the data.**
```{r}
str(Q1Data)
```
##### a) Analyse the data from this experiment by fitting an ANOVA model with intercept, main effects, and interactions of any order. Which of the main effects significantly affect putting performance?
1) We fit a linear model with the intercept, all main effects, and interactions of every order.
```{r}
ourModel <- lm(formula = response ~ Length + Type + Break + Slope +
                 Length * Type + Length * Break + Length * Slope +   # two-way interactions
                 Type * Break + Type * Slope + Break * Slope +
                 Length * Type * Break + Length * Type * Slope +     # three-way interactions
                 Length * Break * Slope + Type * Break * Slope +
                 Length * Type * Break * Slope,                      # four-way interaction
               data = Q1Data)
anova(ourModel)
```
From the ANOVA table, it seems that only the main effects of Length and Type are significant. Note that if we are interested in seeing the actual model and the coefficients, we need to use the coded levels for the variables.
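A minimal sketch of such a refit is shown below; it assumes the conventional $-1/+1$ coding with the first level of each factor mapped to $-1$, and the names `codedData` and `codedModel` are introduced here only for illustration.

```{r}
# Recode the two-level factors as -1/+1 and refit the full model;
# under this coding each regression coefficient is half the corresponding factorial effect
codedData <- Q1Data
codedData[c("Length", "Type", "Break", "Slope")] <-
  lapply(Q1Data[c("Length", "Type", "Break", "Slope")],
         function(f) ifelse(f == levels(f)[1], -1, 1))
codedModel <- lm(response ~ Length * Type * Break * Slope, data = codedData)
summary(codedModel)$coefficients
```

The ANOVA table is unchanged by this recoding; only the interpretation of the coefficients becomes more direct.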
**Alternative solution for question 1 (a): generating the same design with the fac.design function.**
```{r}
plan <- fac.design(nlevels = c(2, 2, 2, 2), replications = 7, randomize = FALSE,
                   block.name = "Replications",
                   factor.names = c("Length", "Type", "Break", "Slope"))
```
Add the response to the design.
```{r}
response <- c(10, 0, 4, 0, 0, 5, 6.5, 16.5, 4.5, 19.5, 15, 41.5, 8, 21.5, 0, 18,
18, 16.5, 6, 10, 0, 20.5, 18.5, 4.5, 18, 18, 16, 39, 4.5, 10.5, 0, 5,
14, 4.5, 1, 34, 18.5, 18, 7.5, 0, 14.5, 16, 8.5, 6.5, 6.5, 6.5, 0, 7,
12.5, 17.5, 14.5, 11, 19.5, 20, 6, 23.5, 10, 5.5, 0, 3.5, 10, 0, 4.5, 10,
19, 20.5, 12, 25.5, 16, 29.5, 0, 8, 0, 10, 0.5, 7, 13, 15.5, 1, 32.5,
16, 17.5, 14, 21.5, 15, 19, 10, 8, 17.5, 7, 9, 8.5, 41, 24, 4, 18.5,
18.5, 33, 5, 0, 11, 10, 0, 8, 6, 36, 3, 36, 14, 16, 6.5, 8 )
plan$response <- response
head(plan)
```
##### (b) Analyse the residuals from this experiment. Are there any indications of model inadequacy?
```{r}
ourModel <- lm(formula = response ~ Length + Type + Break + Slope +
Length * Type + Length * Break + Length * Slope +
Type * Break + Type * Slope + Break * Slope +
Length * Type * Break + Length * Type * Slope +
Length * Break * Slope + Type * Break * Slope +
Length * Type * Break * Slope,
data = plan)
anova(ourModel)
```
Plotting the residuals
```{r}
plot(ourModel, 1) # residuals versus fitted values
plot(ourModel, 2) # normal QQ plot
```
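Another common residual check (a sketch, not required by the question) is to plot the residuals in observation order to look for drift or other time-related patterns:

```{r}
# Residuals in observation order; systematic drift would suggest a lurking time effect
plot(residuals(ourModel), type = "b",
     xlab = "Observation order", ylab = "Residual")
abline(h = 0, lty = 2)
```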
##### Check assumptions statistically:
- Residual Test:
**a. Equality of variances - homogeneity**
$H_0$: variances are equal
$H_1$: at least one variance is different
```{r}
ncvTest(ourModel)
```
The corresponding p-value ($0.000021$) is less than $\alpha = 0.05$. Therefore, we reject the null hypothesis, which means the residuals of the model do not have constant variance (no homogeneity).
**b. Normality of Residual**
$H_0$: Errors are normally distributed
$H_1$: Errors are not normally distributed
```{r}
y <- rstudent(ourModel)
shapiro.test(y)
```
The p-value ($0.0181$) is less than the significance level $\alpha = 0.05$, so we reject the null hypothesis. Therefore, there is statistically significant evidence that the stochastic component of this model is not normally distributed.
**c. Autocorrelation:**
$H_0$: Errors are uncorrelated
$H_1$: Errors are correlated
- ACF test
```{r}
acf(ourModel$residuals)
```
The ACF plot shows that only a few lags exceed the horizontal dashed lines. This suggests there may be some autocorrelation in the residuals, so we confirm with a formal test.
```{r}
durbinWatsonTest(ourModel)
```
The p-value ($0.19$) is much greater than the significance level $\alpha = 0.05$, so we fail to reject the null hypothesis. Therefore, there is no statistically significant evidence that the residuals of this model are autocorrelated.
**d. Randomness**
$H_0$: Errors are random
$H_1$: Errors are not random
```{r}
runs.test(rstudent(ourModel))
```
The corresponding p-value ($0.002386$) is less than the significance level $\alpha = 0.05$. Therefore, we reject the null hypothesis, which means the errors in this model are not random.
The analysis of the residuals indicates that some of the assumptions are not satisfied.
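One possible remedy (a sketch only, not part of the required answer) is a variance-stabilising transformation of the response; because the response contains zeros, a square-root transformation is a natural candidate, and the same diagnostics can be rerun on the transformed model:

```{r}
# Refit the full 2^4 model on the square-root scale and recheck two key assumptions
sqrtModel <- lm(sqrt(response) ~ Length * Type * Break * Slope, data = plan)
ncvTest(sqrtModel)                # constant variance
shapiro.test(rstudent(sqrtModel)) # normality of studentised residuals
```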
# Question 2: Fractional Factorial Designs $2^{k-p}$
##### A replicated fractional factorial design is used to investigate the effect of five factors on the free height of leaf springs used in an automotive application. The factors are:
A = furnace temperature,
B = heating time,
C = transfer time,
D = hold down time, and
E = quench oil temperature
The collected data are shown below.
![](Q2Data.jpg)
##### Input the data into the software.
```{r}
Q2Data <- read.csv(file = "Q2Data2.csv", header = TRUE)
head(Q2Data)
DesignAndResponse <- Q2Data[1:16, 1:6]
head(DesignAndResponse)
```
##### a) We have 5 factors and 16 runs, so this is a half fraction and we are looking for just one defining relationship.
```{r}
aliases(lm(FreeHeight ~ (.)^2, data = DesignAndResponse), code = TRUE)
```
We can see that the two-factor interactions are fully aliased, as shown above, and that D = ABC. To write down the full alias structure we include interactions up to order 5 in R and get:
```{r}
aliases(lm(FreeHeight ~ (.)^5, data = DesignAndResponse), code = TRUE)
```
We can see all the aliased relationships above (all generated from the D = ABC defining relationship).
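As a worked example of how these aliases arise: the generator D = ABC is equivalent to the defining relation $I = ABCD$, and multiplying both sides by $AB$ (using $A^2 = B^2 = I$) gives
$$AB = AB \cdot ABCD = CD,$$
so the $AB$ interaction is aliased with $CD$, consistent with the listing above.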
##### (b) What is the resolution of this design?
The resolution of a design is the minimum number of letters among the words of its defining relation. Here we have only one defining relationship (generator), D = ABC, equivalently $I = ABCD$, and thus the resolution of this design is IV.
##### (c) Analyse the data using the 3 replications of this experiment. What factors influence the free height?
```{r}
Q2Model <- lm(formula = FreeHeight ~ A + B + C + D + E + A*B + A*C + A*D + A*E +
                B*E + C*E + D*E + A*B*E + A*C*E + A*D*E,
              data = Q2Data)
anova(Q2Model)
```
No other interactions can be included, as they are aliased with effects already in the model (see the alias structure from part (a)). The factors that influence the free height are A, B, D and E.
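If the factor columns of `Q2Data` are stored as numeric $-1/+1$ codes (an assumption about the CSV file, not something stated in the exercise), the estimated factorial effects can be read directly from the fitted coefficients, each effect being twice the corresponding regression coefficient:

```{r}
# Factorial effect estimates under -1/+1 coding (drop the intercept)
2 * coef(Q2Model)[-1]
```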
##### (d) Analyse the data using the mean of the free height. What factors influence the mean free height?
Generating the average response over the 3 replicates.
```{r}
averesponse <- (Q2Data[1:16, 6] + Q2Data[17:32, 6] + Q2Data[33:48, 6]) / 3
averesponse
```
Building the data set containing the design and the average of the response over the 3 replicates.
```{r}
DesignAndAveResponse <- Q2Data[1:16, 1:5]
DesignAndAveResponse$y <- averesponse
head(DesignAndAveResponse)
```
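An equivalent way to compute these replicate means (a sketch assuming the design columns are named A to E, as used in the models above) is to group and summarise; `dplyr::summarise` is called with an explicit namespace because `plyr` (loaded as a dependency of `Rmisc`) can mask it, and the rows may come out in a different order from the manual version:

```{r}
# Average the free height over the three replicates of each design point
Q2Data %>%
  group_by(A, B, C, D, E) %>%
  dplyr::summarise(y = mean(FreeHeight), .groups = "drop")
```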
Fitting the largest model allowed by the alias structure of the design.
```{r}
Q2Model2 <- lm(formula = y ~ A + B + C + D + E + A*B + A*C + A*D + A*E + B*E + C*E + D*E,
               data = DesignAndAveResponse)
anova(Q2Model2)
```
The factors that influence the free height ($y$) are A, B and E. Factor D is no longer significant at the significance level $\alpha = 5\%$. The best way to analyse the data when we have replications is to keep all the observations across the replications, rather than summing or averaging them as we did here.
##### (e) Analyse the residuals from this experiment, and comment on your findings.
```{r}
Q2Model <- lm(formula = FreeHeight ~ A + B + C + D + E + A*B + A*C + A*D + A*E + B*E + C*E + D*E,
              data = Q2Data)
anova(Q2Model)
```
##### Check assumptions statistically:
- Residual Test:
Plots (4 in one)
```{r}
autoplot(Q2Model)
```
**a. Equality of variances - homogeneity**
$H_0$: variances are equal
$H_1$: at least one variance is different
```{r}
ncvTest(Q2Model)
```
The corresponding p-value ($0.011922$) is less than $\alpha = 0.05$. Therefore, we reject the null hypothesis, which means the residuals of the model do not have constant variance (no homogeneity).
**b. Normality of Residual**
$H_0$: Errors are normally distributed
$H_1$: Errors are not normally distributed
```{r}
shapiro.test(Q2Model$residuals)
ks.test(Q2Model$residuals, "pnorm")
ad.test(Q2Model$residuals)
```
The p-values are less than the significance level $\alpha = 0.05$, so we reject the null hypothesis. Therefore, there is statistically significant evidence that the stochastic component of this model is not normally distributed.
**c. Autocorrelation:**
$H_0$: Errors are uncorrelated
$H_1$: Errors are correlated
- ACF test
```{r}
acf(Q2Model$residuals)
```
The ACF plot shows that no lags exceed the horizontal dashed lines. Therefore, we fail to reject the null hypothesis, which means there is no evidence of autocorrelation in the residuals of the model.
```{r}
durbinWatsonTest(Q2Model)
bgtest(Q2Model, 1)
bgtest(Q2Model, 2)
```
The p-values are greater than the significance level $\alpha = 0.05$, so we fail to reject the null hypothesis. Therefore, there is no statistically significant evidence that the residuals of this model are autocorrelated.
**d. Randomness**
$H_0$: Errors are random
$H_1$: Errors are not random
```{r}
runs.test(rstudent(Q2Model2))
```
The corresponding p-value ($0.1205$) is not less than the significance level $\alpha = 0.05$. Therefore, we fail to reject the null hypothesis, which means the errors of this model can be regarded as random.
Some of the assumptions are satisfied and some are not.
##### (f) Based on the means of factors (main effects) and their interactions plots, what level of the coded factors A, B, C, D and E would you recommend using to maximise the response?
Based on the ANOVA table shown in part (d), we can reduce the model to include only factors A, B, E and the interaction between B and E. Factor D is not significant at the $\alpha = 5\%$ level but is close to the threshold, so we compare these two models (with and without D) before plotting.
The reduced model ANOVA is shown below.
```{r}
Q2Model2 <- lm(formula = y ~ A + B + E + B*E,
               data = DesignAndAveResponse)
Q2Model3 <- lm(formula = y ~ A + B + D + E + B*E,
               data = DesignAndAveResponse)
DAAG::press(Q2Model2) # library(DAAG)
DAAG::press(Q2Model3)
```
We can see that the value of the $PRESS$ statistic is smaller for the model that includes factor D as an explanatory variable, which means including factor D in the model is recommended.
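For reference, the PRESS statistic can also be computed directly from the ordinary residuals and the hat (leverage) values, which is essentially what `DAAG::press()` computes; a minimal sketch for the larger model:

```{r}
# PRESS = sum of squared leave-one-out prediction errors e_i / (1 - h_ii)
sum((residuals(Q2Model3) / (1 - hatvalues(Q2Model3)))^2)
```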
```{r}
anova(Q2Model3)
summary(Q2Model3)
MEPlot(Q2Model3) # To plot the main effects
IAPlot(Q2Model3) # To plot the interactions
```
The effects of A and D, when going from level $-1$ to level $+1$, are positive, and the effects of B and E, when going from $+1$ to $-1$, are positive. So the levels we choose to maximise the response are A(+), B(-), D(+), E(-) to increase the free height of the leaf springs. The interaction setting follows from the main-effect settings: with both B and E at their negative levels, the BE interaction term is positive (+).
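To quantify the expected response at those settings we can ask the fitted model for a prediction; this sketch assumes the design columns in `DesignAndAveResponse` are numeric $-1/+1$ codes (if they are factors, `newdata` must supply the matching factor levels instead):

```{r}
# Predicted mean free height at the recommended settings A(+), B(-), D(+), E(-)
predict(Q2Model3,
        newdata = data.frame(A = 1, B = -1, D = 1, E = -1),
        interval = "confidence")
```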
##### (g) Is this the best possible design for five factors in 16 runs? Specifically, can you find a fractional design for five factors in 16 runs with a higher resolution than this one?
We will use the FrF2 package in R. Note that FrF2 can also be used to construct a full factorial design and any regular fractional factorial design.
```{r}
BestPlan <- FrF2(16, 5)
design.info(BestPlan)
generators(BestPlan)
```
We can see that the default (best) design from FrF2 has resolution V and the defining relationship E = ABCD. Thus (as also noted in the design info), the two-factor interactions and the main effects are all clear of aliasing with each other and are estimable. That means we could fit a model with all the main effects and all the two-factor interactions if we had used this fractional factorial design rather than the one given in the data. The model would look like:
```{r eval=FALSE}
Q2Model <- lm(formula = FreeHeight ~ A + B + C + D + E +
                A*B + A*C + A*D + A*E + B*C +
                B*D + B*E + C*D + C*E + D*E,
              data = HypotheticalDataWithTheNewDesign)
```