---
title: "Lab session 4: Solutions"
author: "RMIT University-MATH1302"
date: "`r format(Sys.Date(), '%B %d, %Y')`"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
**Library**
```{r message=FALSE, warning=FALSE}
# List of packages required for this analysis
pkg <- c("mosaic", "ggplot2", "tidyverse", "car", "randtests", "stats", "dplyr", "agricolae","nlme","lme4", "emmeans", "crossdes", "AlgDesign", "DoE.base", "FrF2","SixSigma","ggfortify","nortest","lmtest", "DAAG", "reshape2", "Rmisc")
# Check if packages are not installed and assign the
# names of the packages not installed to the variable new.pkg
new.pkg <- pkg[!(pkg %in% installed.packages())]
# If there are any packages in the list that aren't installed,
# install them
if (length(new.pkg) > 0) {
  install.packages(new.pkg, repos = "http://cran.rstudio.com")
}
```
```{r message=FALSE, warning=FALSE}
# Load the packages into R
library(mosaic) # favstats()
library(ggplot2)
library(tidyverse) # summarise()
library(car) #ncvTest()
library(randtests) # runs.test()
library(stats) # TukeyHSD()
library(dplyr) # mutate() and gather()
library(agricolae) # LSD.test()
library(nlme)
library(lme4) # lmer()
library(emmeans) # emmeans()
library(crossdes) # find.BIB()
#library(multcomp)
library(AlgDesign) # gen.factorial()
library(DoE.base)
library(FrF2)
library(SixSigma)
library(ggfortify)
library(nortest)
library(lmtest)
library(DAAG) # press()
library(reshape2)
library(Rmisc) # summarySE()
```
# Question 1: Full Factorial $2^k$ Designs
##### A typical golfer uses the putter for about 40% of strokes. Improving one's putting is a logical and perhaps simple way to improve a golf score. An experiment was conducted to study the effects of four factors on putting accuracy. The design factors are length of putt, type of putter, breaking putt versus straight putt, and level versus downhill putt. The response variable is the distance from the ball to the centre of the cup after the ball comes to rest. One golfer performed the experiment, a $2^4$ full factorial design with seven replicates was used, and all putts were made in random order. The results are shown in the following table.
![](Q1Data.jpg)
##### Input the data into the software.
```{r}
Design <- expand.grid(Length = gl(2, 1, labels = c("10", "30")),
Type = gl(2, 1, labels = c("Mallet", "Cavity back")),
Break = gl(2, 1, labels = c("Straight", "Breaking")),
Slope = gl(2, 1, labels = c("Level", "Downhill")))
# Replicates
Q1Data <- data.frame(repl = rep(1:7, each = 16),
rbind(Design))
# Add the Response
Q1Data$response <- c(10, 0, 4, 0, 0, 5, 6.5, 16.5, 4.5, 19.5, 15, 41.5, 8, 21.5, 0, 18,
18, 16.5, 6, 10, 0, 20.5, 18.5, 4.5, 18, 18, 16, 39, 4.5, 10.5, 0, 5,
14, 4.5, 1, 34, 18.5, 18, 7.5, 0, 14.5, 16, 8.5, 6.5, 6.5, 6.5, 0, 7,
12.5, 17.5, 14.5, 11, 19.5, 20, 6, 23.5, 10, 5.5, 0, 3.5, 10, 0, 4.5, 10,
19, 20.5, 12, 25.5, 16, 29.5, 0, 8, 0, 10, 0.5, 7, 13, 15.5, 1, 32.5,
16, 17.5, 14, 21.5, 15, 19, 10, 8, 17.5, 7, 9, 8.5, 41, 24, 4, 18.5,
18.5, 33, 5, 0, 11, 10, 0, 8, 6, 36, 3, 36, 14, 16, 6.5, 8 )
head(Q1Data)
```
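As a quick sanity check (a sketch using only the data frame built above), we can confirm that each of the 16 treatment combinations appears once in every one of the 7 replicates:

```{r}
# Every cell of this 2 x 2 x 2 x 2 table should equal 7 (one run per replicate)
xtabs(~ Length + Type + Break + Slope, data = Q1Data)
```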
**Export the data to a CSV file**
```{r eval=FALSE}
write.csv(Q1Data, "Q1Data.csv", row.names = TRUE) # export the data as a CSV file from RStudio
```
**Display the structure of the data.**
```{r}
str(Q1Data)
```
##### a) Analyse the data from this experiment by fitting an ANOVA model with intercept, main effects, and interactions of any order. Which of the main effects significantly affect putting performance?
1) We fit a linear model with the intercept, all main effects, and interactions of every order.
```{r}
ourModel <- lm(formula = response ~ Length + Type + Break + Slope +
                 Length * Type + Length * Break + Length * Slope +   # two-way interactions
                 Type * Break + Type * Slope + Break * Slope +
                 Length * Type * Break + Length * Type * Slope +     # three-way interactions
                 Length * Break * Slope + Type * Break * Slope +
                 Length * Type * Break * Slope,                      # four-way interaction
               data = Q1Data)
anova(ourModel)
```
From the ANOVA table, it seems that only the main effects of Length and Type are significant. Note that if we are interested in seeing the actual model and the coefficients, we need to use the coded levels for the variables.
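A minimal sketch of such a refit is shown below; it assumes the conventional $-1/+1$ coding with the first level of each factor mapped to $-1$, and the names `codedData` and `codedModel` are introduced here only for illustration.

```{r}
# Recode the two-level factors as -1/+1 and refit the full model;
# under this coding each regression coefficient is half the corresponding factorial effect
codedData <- Q1Data
codedData[c("Length", "Type", "Break", "Slope")] <-
  lapply(Q1Data[c("Length", "Type", "Break", "Slope")],
         function(f) ifelse(f == levels(f)[1], -1, 1))
codedModel <- lm(response ~ Length * Type * Break * Slope, data = codedData)
summary(codedModel)$coefficients
```

The ANOVA table is unchanged by this recoding; only the interpretation of the coefficients becomes more direct.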
**Alternative solution for question 1 (a): generating the same design with the fac.design function.**
```{r}
plan <- fac.design(nlevels = c(2, 2, 2, 2), replications = 7, randomize = FALSE,
                   block.name = "Replications",
                   factor.names = c("Length", "Type", "Break", "Slope"))
```
Add the response to the design.
```{r}
response <- c(10, 0, 4, 0, 0, 5, 6.5, 16.5, 4.5, 19.5, 15, 41.5, 8, 21.5, 0, 18,
18, 16.5, 6, 10, 0, 20.5, 18.5, 4.5, 18, 18, 16, 39, 4.5, 10.5, 0, 5,
14, 4.5, 1, 34, 18.5, 18, 7.5, 0, 14.5, 16, 8.5, 6.5, 6.5, 6.5, 0, 7,
12.5, 17.5, 14.5, 11, 19.5, 20, 6, 23.5, 10, 5.5, 0, 3.5, 10, 0, 4.5, 10,
19, 20.5, 12, 25.5, 16, 29.5, 0, 8, 0, 10, 0.5, 7, 13, 15.5, 1, 32.5,
16, 17.5, 14, 21.5, 15, 19, 10, 8, 17.5, 7, 9, 8.5, 41, 24, 4, 18.5,
18.5, 33, 5, 0, 11, 10, 0, 8, 6, 36, 3, 36, 14, 16, 6.5, 8 )
plan$response <- response
head(plan)
```
##### (b) Analyse the residuals from this experiment. Are there any indications of model inadequacy?
```{r}
ourModel <- lm(formula = response ~ Length + Type + Break + Slope +
Length * Type + Length * Break + Length * Slope +
Type * Break + Type * Slope + Break * Slope +
Length * Type * Break + Length * Type * Slope +
Length * Break * Slope + Type * Break * Slope +
Length * Type * Break * Slope,
data = plan)
anova(ourModel)
```
Plotting the residuals
```{r}
plot(ourModel, 1) # residuals versus fitted values
plot(ourModel, 2) # normal QQ plot
```
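Another common residual check (a sketch, not required by the question) is to plot the residuals in observation order to look for drift or other time-related patterns:

```{r}
# Residuals in observation order; systematic drift would suggest a lurking time effect
plot(residuals(ourModel), type = "b",
     xlab = "Observation order", ylab = "Residual")
abline(h = 0, lty = 2)
```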
##### Check assumptions statistically:
- Residual Test:
**a. Equality of variances - homogeneity**
$H_0$: variances are equal
$H_1$: at least one variance is different
```{r}
ncvTest(ourModel)
```
The corresponding p-value ($0.000021$) is less than $\alpha = 0.05$. Therefore, we reject the null hypothesis, which means the residuals of the model do not have constant variance (no homogeneity).
**b. Normality of Residual**
$H_0$: Errors are normally distributed
$H_1$: Errors are not normally distributed
```{r}
y <- rstudent(ourModel)
shapiro.test(y)
```
The p-value ($0.0181$) is less than the significance level $\alpha = 0.05$, so we reject the null hypothesis. Therefore, there is statistically significant evidence that the stochastic component of this model is not normally distributed.
**c. Autocorrelation:**
$H_0$: Errors are uncorrelated
$H_1$: Errors are correlated
- ACF test
```{r}
acf(ourModel$residuals)
```
The ACF plot shows that only a few lags exceed the horizontal dashed lines. This suggests there may be some autocorrelation in the residuals, so we confirm with a formal test.
```{r}
durbinWatsonTest(ourModel)
```
The p-value ($0.19$) is much greater than the significance level $\alpha = 0.05$, so we fail to reject the null hypothesis. Therefore, there is no statistically significant evidence that the residuals of this model are autocorrelated.
**d. Randomness**
$H_0$: Errors are random
$H_1$: Errors are not random
```{r}
runs.test(rstudent(ourModel))
```
The corresponding p-value ($0.002386$) is less than the significance level $\alpha = 0.05$. Therefore, we reject the null hypothesis, which means the errors in this model are not random.
The analysis of the residuals indicates that some of the assumptions are not satisfied.
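One possible remedy (a sketch only, not part of the required answer) is a variance-stabilising transformation of the response; because the response contains zeros, a square-root transformation is a natural candidate, and the same diagnostics can be rerun on the transformed model:

```{r}
# Refit the full 2^4 model on the square-root scale and recheck two key assumptions
sqrtModel <- lm(sqrt(response) ~ Length * Type * Break * Slope, data = plan)
ncvTest(sqrtModel)                # constant variance
shapiro.test(rstudent(sqrtModel)) # normality of studentised residuals
```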
# Question 2: Fractional Factorial Designs $2^{k-p}$
##### A replicated fractional factorial design is used to investigate the effect of five factors on the free height of leaf springs used in an automotive application. The factors are:
A = furnace temperature,
B = heating time,
C = transfer time,
D = hold down time, and
E = quench oil temperature
The collected data are shown below.
![](Q2Data.jpg)
##### Input the data into the software.
```{r}
Q2Data <- read.csv(file = "Q2Data2.csv", header = TRUE)
head(Q2Data)
DesignAndResponse <- Q2Data[1:16, 1:6]
head(DesignAndResponse)
```
##### a) We have 5 factors and 16 runs, so this is a half fraction and we are looking for just one defining relationship.
```{r}
aliases(lm(FreeHeight ~ (.)^2, data = DesignAndResponse), code = TRUE)
```
We can see that the two-factor interactions are fully aliased, as shown above, and that D = ABC. To write down the full alias structure we include interactions up to order 5 in R and get:
```{r}
aliases(lm(FreeHeight ~ (.)^5, data = DesignAndResponse), code = TRUE)
```
We can see all the aliased relationships above (all generated from the D = ABC defining relationship).
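As a worked example of how these aliases arise: the generator D = ABC is equivalent to the defining relation $I = ABCD$, and multiplying both sides by $AB$ (using $A^2 = B^2 = I$) gives
$$AB = AB \cdot ABCD = CD,$$
so the $AB$ interaction is aliased with $CD$, consistent with the listing above.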
##### (b) What is the resolution of this design?
The resolution of a design is the minimum number of letters among the words of its defining relation. Here we have only one defining relationship (generator), D = ABC, equivalently $I = ABCD$, and thus the resolution of this design is IV.
##### (c) Analyse the data using the 3 replications of this experiment. What factors influence the free height?
```{r}
Q2Model <- lm(formula = FreeHeight ~ A + B + C + D + E + A*B + A*C + A*D + A*E +
                B*E + C*E + D*E + A*B*E + A*C*E + A*D*E,
              data = Q2Data)
anova(Q2Model)
```
No other interactions can be included, as they are aliased with effects already in the model (see the alias structure from part (a)). The factors that influence the free height are A, B, D and E.
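If the factor columns of `Q2Data` are stored as numeric $-1/+1$ codes (an assumption about the CSV file, not something stated in the exercise), the estimated factorial effects can be read directly from the fitted coefficients, each effect being twice the corresponding regression coefficient:

```{r}
# Factorial effect estimates under -1/+1 coding (drop the intercept)
2 * coef(Q2Model)[-1]
```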
##### (d) Analyse the data using the mean of the free height. What factors influence the mean free height?
Generating the average response over the 3 replicates.
```{r}
averesponse <- (Q2Data[1:16, 6] + Q2Data[17:32, 6] + Q2Data[33:48, 6]) / 3
averesponse
```
Building the data set containing the design and the average of the response over the 3 replicates.
```{r}
DesignAndAveResponse <- Q2Data[1:16, 1:5]
DesignAndAveResponse$y <- averesponse
head(DesignAndAveResponse)
```
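An equivalent way to compute these replicate means (a sketch assuming the design columns are named A to E, as used in the models above) is to group and summarise; `dplyr::summarise` is called with an explicit namespace because `plyr` (loaded as a dependency of `Rmisc`) can mask it, and the rows may come out in a different order from the manual version:

```{r}
# Average the free height over the three replicates of each design point
Q2Data %>%
  group_by(A, B, C, D, E) %>%
  dplyr::summarise(y = mean(FreeHeight), .groups = "drop")
```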
Fitting the largest model allowed by the alias structure of the design.
```{r}
Q2Model2 <- lm(formula = y ~ A + B + C + D + E + A*B + A*C + A*D + A*E + B*E + C*E + D*E,
               data = DesignAndAveResponse)
anova(Q2Model2)
```
The factors that influence the free height ($y$) are A, B and E. Factor D is no longer significant at the significance level $\alpha = 5\%$. The best way to analyse the data when we have replications is to keep all the observations across the replications, rather than summing or averaging them as we did here.
##### (e) Analyse the residuals from this experiment, and comment on your findings.
```{r}
Q2Model <- lm(formula = FreeHeight ~ A + B + C + D + E + A*B + A*C + A*D + A*E + B*E + C*E + D*E,
              data = Q2Data)
anova(Q2Model)
```
##### Check assumptions statistically:
- Residual Test:
Plots (4 in one)
```{r}
autoplot(Q2Model)
```
**a. Equality of variances - homogeneity**
$H_0$: variances are equal
$H_1$: at least one variance is different
```{r}
ncvTest(Q2Model)
```
The corresponding p-value ($0.011922$) is less than $\alpha = 0.05$. Therefore, we reject the null hypothesis, which means the residuals of the model do not have constant variance (no homogeneity).
**b. Normality of Residual**
$H_0$: Errors are normally distributed
$H_1$: Errors are not normally distributed
```{r}
shapiro.test(Q2Model$residuals)
ks.test(Q2Model$residuals, "pnorm")
ad.test(Q2Model$residuals)
```
The p-values are less than the significance level $\alpha = 0.05$, so we reject the null hypothesis. Therefore, there is statistically significant evidence that the stochastic component of this model is not normally distributed.
**c. Autocorrelation:**
$H_0$: Errors are uncorrelated
$H_1$: Errors are correlated
- ACF test
```{r}
acf(Q2Model$residuals)
```
The ACF plot shows that no lags exceed the horizontal dashed lines. Therefore, we fail to reject the null hypothesis, which means there is no evidence of autocorrelation in the residuals of the model.
```{r}
durbinWatsonTest(Q2Model)
bgtest(Q2Model, 1)
bgtest(Q2Model, 2)
```
The p-values are greater than the significance level $\alpha = 0.05$, so we fail to reject the null hypothesis. Therefore, there is no statistically significant evidence that the residuals of this model are autocorrelated.
**d. Randomness**
$H_0$: Errors are random
$H_1$: Errors are not random
```{r}
runs.test(rstudent(Q2Model2))
```
The corresponding p-value ($0.1205$) is not less than the significance level $\alpha = 0.05$. Therefore, we fail to reject the null hypothesis, which means the errors of this model can be regarded as random.
Some of the assumptions are satisfied and some are not.
##### (f) Based on the means of factors (main effects) and their interactions plots, what level of the coded factors A, B, C, D and E would you recommend using to maximise the response?
Based on the ANOVA table shown in part (d), we can reduce the model to include only factors A, B, E and the interaction between B and E. Factor D is not significant at the $\alpha = 5\%$ level but is close to the threshold, so we compare these two models (with and without D) before plotting.
The reduced model ANOVA is shown below.
```{r}
Q2Model2 <- lm(formula = y ~ A + B + E + B*E,
               data = DesignAndAveResponse)
Q2Model3 <- lm(formula = y ~ A + B + D + E + B*E,
               data = DesignAndAveResponse)
DAAG::press(Q2Model2) # library(DAAG)
DAAG::press(Q2Model3)
```
We can see that the value of the $PRESS$ statistic is smaller for the model that includes factor D as an explanatory variable, which means including factor D in the model is recommended.
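For reference, the PRESS statistic can also be computed directly from the ordinary residuals and the hat (leverage) values, which is essentially what `DAAG::press()` computes; a minimal sketch for the larger model:

```{r}
# PRESS = sum of squared leave-one-out prediction errors e_i / (1 - h_ii)
sum((residuals(Q2Model3) / (1 - hatvalues(Q2Model3)))^2)
```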
```{r}
anova(Q2Model3)
summary(Q2Model3)
MEPlot(Q2Model3) # To plot the main effects
IAPlot(Q2Model3) # To plot the interactions
```
The effects of A and D, when going from level $-1$ to level $+1$, are positive, and the effects of B and E, when going from $+1$ to $-1$, are positive. So the levels we choose to maximise the response are A(+), B(-), D(+), E(-) to increase the free height of the leaf springs. The interaction setting follows from the main-effect settings: with both B and E at their negative levels, the BE interaction term is positive (+).
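To quantify the expected response at those settings we can ask the fitted model for a prediction; this sketch assumes the design columns in `DesignAndAveResponse` are numeric $-1/+1$ codes (if they are factors, `newdata` must supply the matching factor levels instead):

```{r}
# Predicted mean free height at the recommended settings A(+), B(-), D(+), E(-)
predict(Q2Model3,
        newdata = data.frame(A = 1, B = -1, D = 1, E = -1),
        interval = "confidence")
```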
##### (g) Is this the best possible design for five factors in 16 runs? Specifically, can you find a fractional design for five factors in 16 runs with a higher resolution than this one?
We will use the FrF2 package in R. Note that FrF2 can also be used to construct a full factorial design and any regular fractional factorial design.
```{r}
BestPlan <- FrF2(16, 5)
design.info(BestPlan)
generators(BestPlan)
```
We can see that the default (best) design from FrF2 has resolution V and the defining relationship E = ABCD. Thus (as also noted in the design info), the two-factor interactions and the main effects are all clear of aliasing with each other and are estimable. That means we could fit a model with all the main effects and all the two-factor interactions if we had used this fractional factorial design rather than the one given in the data. The model would look like:
```{r eval=FALSE}
Q2Model <- lm(formula = FreeHeight ~ A + B + C + D + E +
                A*B + A*C + A*D + A*E + B*C +
                B*D + B*E + C*D + C*E + D*E,
              data = HypotheticalDataWithTheNewDesign)
```