代写代考 MATH1302 20/04/2022

Lab session 4: Solutions
RMIT University-MATH1302 20/04/2022
# List of packages required for this analysis
pkg <- c("mosaic", "ggplot2", "tidyverse", "car", "randtests", "stats", "dplyr", "agricolae","nlme","lme4", "emmeans", "cros sdes", "AlgDesign", "DoE.base", "FrF2","SixSigma","ggfortify","nortest","lmtest") Copyright By PowCoder代写 加微信 powcoder

# Check if packages are not installed and assign the
# names of the packages not installed to the variable new.pkg new.pkg <- pkg[!(pkg %in% installed.packages())] # If there are any packages in the list that aren't installed, # install them if (length(new.pkg)>0) {
install.packages(new.pkg, repos = “http://cran.rstudio.com”) }
# Load the packages into R library(mosaic) # favstats() library(ggplot2) library(tidyverse) # summarise() library(car) #ncvTest() library(randtests) # runs.test() library(stats) # TukeyHSD() library(dplyr) # mutate() library(agricolae) # LSD.test() library(nlme)
library(lme4) # lmer() library(emmeans) # emmeans() library(crossdes) # find.BIB() #library(multcomp) library(AlgDesign) # gen.factorial() library(DoE.base)
library(FrF2) library(SixSigma) library(ggfortify) library(nortest) library(lmtest)
Question 1: Full Factorial Designs
A typical golfer uses the putter for about 40% on strokes. Improving one’s putting is a logical and perhaps simple way to improve a golf score. An experiment was conducted to study the effects of four factors on putting accuracy. The design factors are length of putt, type of putter, breaking putt versus straight putt, and level versus downhill putt. The response variable is distance from the ball to the centre of the cup after the ball comes to rest. One golfer performs the experiment, a 24 full factorial design with seven replicates was used, and all putts are made in random order. The results are shown in the next Table.
Input the data into the software.

Export the data to csv file
Display the structure of the data.
a) (a) Analyse the data from this experiment by fitting an ANOVA model with intercept, main effects, and interactions of any order. Which of the main effects are significantly affect putting performance?
1. We need a linear model of intercept, main effects, and interactions of any order.
write.csv(Q1Data,”Q1Data.csv”, row.names = TRUE) # to Export Data as a CSV file from RStudio
str(Q1Data)
## ‘data.frame’: 112 obs. of 6 variables:
## $ Length
## $ Break
## $ Slope
## $ response: num 10 0 4 0 0 5 6.5 16.5 4.5 19.5 …
: int 1 1 1 1 1 1 1 1 1 1 …
: Factor w/ 2 levels “10”,”30″: 1 2 1 2 1 2 1 2 1 2 …
: Factor w/ 2 levels “Mallet”,”Cavity back”: 1 1 2 2 1 1 2 2 1 1 … : Factor w/ 2 levels “Straight”,”Breaking”: 1 1 1 1 2 2 2 2 1 1 … : Factor w/ 2 levels “Level”,”Downhill”: 1 1 1 1 1 1 1 1 2 2 …
ourModel <- lm(formula = response ~ Length + Type + Break + Slope + Length * Type + Length * Break + Length * Slope + anova(ourModel) Type * Break + Type * Slope + Break * Slope + Length * Type * Break + Length * Type * Slope + Length * Break * Slope + Type * Break * Slope + Length * Type * Break * Slope, data = Q1Data) ## Analysis of Variance Table ## ## Response: response ## Df 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ## Residuals 96 Sum Sq 917.1 388.1 145.1 1.4 218.7 11.9 93.8 115.0 56.4 1.6 7.3 113.0 39.5 33.8 95.6 8315.8 Mean Sq 917.15 388.15 145.15 1.40 218.68 11.90 93.81 115.02 56.43 1.63 7.25 113.00 39.48 33.77 95.65 86.62 F value 10.5878 4.4809 1.6756 0.0161 2.5245 0.1373 1.0829 1.3278 0.6515 0.0188 0.0837 1.3045 0.4558 0.3899 1.1042 Pr(>F) 0.001572 ** 0.036862 * 0.198615 0.899280 0.115377 0.711776 0.300658 0.252054 0.421588 0.891271 0.772939 0.256228 0.501207 0.533858 0.295994
## Length:Type
## Length:Break
## Length:Slope
## Type:Break
## Type:Slope
## Break:Slope
## Length:Type:Break
## Length:Type:Slope
## Length:Break:Slope
## Type:Break:Slope
## Length:Type:Break:Slope
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
SSE=sum(ourModel$residuals**2) SSE
## [1] 8315.786
From the ANOVA table, it seems that only the main effects of Length and Type are significant. Note that if we are interested in seen the actual model and the coefficients we need to use the coded levels for the variables.
Second alternative solution for question 1 (a). Two similar generations with the fac.design function.
plan <- fac.design(nlevels = c(2,2,2,2), replications=7, randomize=FALSE, block.name="Replications", factor.names=c("Length" ,"Type","Break","Slope")) Design <- expand.grid(Length = gl(2, 1, labels = c("10", "30")), Type = gl(2, 1, labels = c("Mallet", "Cavity back")), Break = gl(2, 1, labels = c("Straight", "Breaking")), Slope = gl(2, 1, labels = c("Level", "Downhill"))) # Replicates Q1Data <- data.frame(repl = rep(1:7, each = 16), rbind(Design)) # Add the Response Q1Data$response <- c(10, 0, 4, 0, 0, 5, 6.5, 16.5, 4.5, 19.5, 15, 41.5, 8, 21.5, 0, 18, 18, 16.5, 6, 10, 0, 20.5, 18.5, 4.5, 18, 18, 16, 39, 4.5, 10.5, 0, 5, 14, 4.5, 1, 34, 18.5, 18, 7.5, 0, 14.5, 16, 8.5, 6.5, 6.5, 6.5, 0, 7, 12.5, 17.5, 14.5, 11, 19.5, 20, 6, 23.5, 10, 5.5, 0, 3.5, 10, 0, 4.5, 10, 19, 20.5, 12, 25.5, 16, 29.5, 0, 8, 0, 10, 0.5, 7, 13, 15.5, 1, 32.5, 16, 17.5, 14, 21.5, 15, 19, 10, 8, 17.5, 7, 9, 8.5, 41, 24, 4, 18.5, 18.5, 33, 5, 0, 11, 10, 0, 8, 6, 36, 3, 36, 14, 16, 6.5, 8 ) ## creating full factorial with 16 runs ... run.no.std.rp Length Type Break Slope Replications 1.1 1 1 1 1 .1 2.1 2 1 1 1 .1 3.1 1 2 1 1 .1 4.1 2 2 1 1 .1 5.1 1 1 2 1 .1 6.1 2 1 2 1 .1 7.1 1 2 2 1 .1 8.1 2 2 2 1 .1 9.1 1 1 1 2 .1 10.12112 .1 11.11212 .1 12.12212 .1 13.11122 .1 14.12122 .1 15.11222 .1 16.12222 .1 1.21111 .2 2.22111 .2 3.21211 .2 4.22211 .2 5.21121 .2 6.22121 .2 7.21221 .2 8.22221 .2 9.21112 .2 10.22112 .2 11.21212 .2 12.22212 .2 13.21122 .2 14.22122 .2 15.21222 .2 16.22222 .2 1.31111 .3 2.32111 .3 3.31211 .3 4.32211 .3 5.31121 .3 6.32121 .3 7.31221 .3 8.32221 .3 9.31112 .3 10.32112 .3 11.31212 .3 12.32212 .3 13.31122 .3 14.32122 .3 15.31222 .3 16.32222 .3 1.41111 .4 2.42111 .4 3.41211 .4 4.42211 .4 5.41121 .4 6.42121 .4 7.41221 .4 8.42221 .4 9.41112 .4 10.42112 .4 11.41212 .4 12.42212 .4 13.41122 .4 14.42122 .4 15.41222 .4 16.42222 .4 1.51111 .5 2.52111 .5 3.51211 .5 4.52211 .5 5.51121 .5 6.52121 .5 7.51221 .5 8.52221 .5 9.51112 .5 10.52112 .5 11.51212 .5 12.52212 .5 13.51122 .5 14.52122 .5 15.51222 .5 16.52222 .5 1.61111 .6 2.62111 .6 3.61211 .6 4.62211 .6 5.61121 .6 Add the response to the design. b. Analyse the residuals from this experiment. Any indications of model inadequacy?. response <- c(10, 0, 4, 0, 0, 5, 6.5, 16.5, 4.5, 19.5, 15, 41.5, 8, 21.5, 0, 18, 18, 16.5, 6, 10, 0, 20.5, 18.5, 4.5, 18, 18, 16, 39, 4.5, 10.5, 0, 5, 14, 4.5, 1, 34, 18.5, 18, 7.5, 0, 14.5, 16, 8.5, 6.5, 6.5, 6.5, 0, 7, 12.5, 17.5, 14.5, 11, 19.5, 20, 6, 23.5, 10, 5.5, 0, 3.5, 10, 0, 4.5, 10, 19, 20.5, 12, 25.5, 16, 29.5, 0, 8, 0, 10, 0.5, 7, 13, 15.5, 1, 32.5, 16, 17.5, 14, 21.5, 15, 19, 10, 8, 17.5, 7, 9, 8.5, 41, 24, 4, 18.5, 18.5, 33, 5, 0, 11, 10, 0, 8, 6, 36, 3, 36, 14, 16, 6.5, 8 ) plan$response<-response ourModel <- lm(formula = response ~ Length + Type + Break + Slope + Length * Type + Length * Break + Length * Slope + anova(ourModel) Type * Break + Type * Slope + Break * Slope + Length * Type * Break + Length * Type * Slope + Length * Break * Slope + Type * Break * Slope + Length * Type * Break * Slope, data = plan) ## Analysis of Variance Table ## ## Response: response ## Df 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ## Residuals 96 Sum Sq 917.1 388.1 145.1 1.4 218.7 11.9 93.8 115.0 56.4 1.6 7.3 113.0 39.5 33.8 95.6 8315.8 Mean Sq 917.15 388.15 145.15 1.40 218.68 11.90 93.81 115.02 56.43 1.63 7.25 113.00 39.48 33.77 95.65 86.62 F value 10.5878 4.4809 1.6756 0.0161 2.5245 0.1373 1.0829 1.3278 0.6515 0.0188 0.0837 1.3045 0.4558 0.3899 1.1042 Pr(>F) 0.001572 ** 0.036862 * 0.198615 0.899280 0.115377 0.711776 0.300658 0.252054 0.421588 0.891271 0.772939 0.256228 0.501207 0.533858 0.295994
## Length:Type
## Length:Break
## Length:Slope
## Type:Break
## Type:Slope
## Break:Slope
## Length:Type:Break
## Length:Type:Slope
## Length:Break:Slope
## Type:Break:Slope
## Length:Type:Break:Slope
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
Plotting the residuals
plot(ourModel, 1) # residuals versus fitted values
## class=design, type=
## NOTE: columns run.no and run.no.std.rp are annotation, ## not part of the data frame
86 87 88 89 90 91 92 93 94 95 96 97 98 99
6.62121 .6 7.61221 .6 8.62221 .6 9.61112 .6
100 101 102 103 104 105 106 107 108 109 110 111 112
1.71111 .7 2.72111 .7 3.71211 .7 4.72211 .7 5.71121 .7 6.72121 .7 7.71221 .7 8.72221 .7 9.71112 .7
10.62112 .6 11.61212 .6 12.62212 .6 13.61122 .6 14.62122 .6 15.61222 .6 16.62222 .6
10.72112 .7 11.71212 .7 12.72212 .7 13.71122 .7 14.72122 .7 15.71222 .7 16.72222 .7 full factorial

plot(ourModel, 2) # QQ plot
Check assumptions statistically: Residual Test:
a. Equality of variances – homogeneity
: variances are equal
: at least one variance is different
The corresponding p-value ( constant variance or “homogeneity”.
b. Normality of Residual
) is less than
. Therefore, we reject the null hypothesis which means the model does not have a
ncvTest(ourModel)
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 18.09453, Df = 1, p = 2.102e-05
: Errors are normally distributed
: Errors are not normally distributed
y = rstudent(ourModel) shapiro.test(y)
50.0 = α 120000.0

The p-Value is less than the significant level . So, we conclude that we reject the null hypothesis. Therefore, there is statistically significant evident that the stochastic component of this model is not normally distributed.
c. Auto Correlation Function:
: Errors are uncorrelated : Errors are correlated
The ACF plot confirms that there are only a few lags exceeded the horizontal dashed line. Therefore, it seems that we might need to reject the null hypothesis which means we have autocorrelation on residual model.
The p-Value is much greater than the significant level . So, we fail to reject the null hypothesis. Therefore, there is no statistically significant evident that the residuals of this model have autocorrelation.
d. Randomness
: Errors are random
: Errors are not random
The corresponding p-value is less than the significant level . Therefore, we reject null hypothesis. which means the errors on this model are not random. The analysis of the residuals indicates that the assumptions are not satisfied.
Question 2: Fractional Factorial Designs
A replicated fractional factorial design is used to investigate the effect of five factors on the free height of leaf springs used in an automotive application. The factors are:
A = furnace temperature, B = heating time,
C = transfer time,
acf(ourModel$residuals)
durbinWatsonTest(ourModel)
## lag Autocorrelation D-W Statistic p-value ## 1 0.137223 1.717472 0.186 ## Alternative hypothesis: rho != 0
runs.test(rstudent(ourModel))
## Runs Test ##
## data: rstudent(ourModel)
## statistic = -3.0374, runs = 41, n1 = 56, n2 = 56, n = 112, p-value =
## 0.002386
## alternative hypothesis: nonrandomness
## Shapiro-Wilk normality test ##
## data: y
## W = 0.97185, p-value = 0.0181

D = hold down time, and
E = quench oil temperature
The collected data are as shown below
Input the data into the software.
A B C -1 -1 -1 1 -1 -1 -1 1 -1 1 1 -1 -1 -1 1 1 -1 1
DE -1 -1 1 -1 1 -1 -1 -1 1 -1 -1 -1
FreeHeight Rep 7.78 1 8.15 1 7.50 1 7.59 1 7.54 1 7.69 1
Q2Data=read.csv(file=”Q2Data2.csv”,header = TRUE) head(Q2Data)
DesignAndResponse<-Q2Data[1:16,1:6] head(DesignAndResponse) A B C -1 -1 -1 1 -1 -1 -1 1 -1 1 1 -1 -1 -1 1 1 -1 1 DE FreeHeight -1 -1 7.78 1 -1 8.15 1 -1 7.50 -1 -1 7.59 1 -1 7.54 -1 -1 7.69 a. We have 5 factors and 16 runs so that means this a half fraction so we are looking for just 1 defining relationship. aliases(lm(FreeHeight ~ (.)^2, data = DesignAndResponse), code=TRUE) ## ## ## ## ## ## ## ## ## A=A B=B C=C D=D E=E $aliases A:B = C:D A:C = B:D A:D = B:C We can see that the the 2fi are fully aliased as shown above. We can see that D=ABC.To write down all the aliased structure we set up to 5 fi in R and we get: aliases(lm(FreeHeight ~ (.)^5, data = DesignAndResponse), code=TRUE) We can see all the aliased relationships above (All generated from th D=ABC defining relationship). b. What is the resolution of this design? The resolution of the design is the minimum number of letters in all defining relationships. Here we only have one defining relationship(generator), i.e D=ABC and thus the resolution of this design is is IV. c. Analyse the data using the 3 replications of this experiment. What factors influence the free height? Q2Model <- lm(formula = FreeHeight ~ A+B+C+D+E+A*B+A*C+A*D+A*E+B*E+C*E+D*E+A*B*E+A*C*E+A*D*E, data = Q2Data) anova(Q2Model) ## Analysis of Variance ## ## Response: FreeHeight ## Residuals 32 0.62707 ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Df Sum Sq 1 0.70325 1 0.32177 1 0.02950 1 0.09992 1 0.68402 1 0.01050 1 0.00002 1 0.00630 1 0.04877 1 0.28060 1 0.01300 1 0.01880 1 0.00005 1 0.00460 1 0.04260 Mean Sq 0.70325 0.32177 0.02950 0.09992 0.68402 0.01050 0.00002 0.00630 0.04877 0.28060 0.01300 0.01880 0.00005 0.00460 0.04260 0.01960 F value 35.8878 16.4203 Pr(>F) 1.115e-06 *** 0.0003024 *** 0.2287738 0.0308933 * 1.416e-06 *** 0.4694506 0.9755153 0.5746027 0.1245000 0.0006397 *** 0.4213435 0.3346624 0.9592041 0.6312514 0.1501284
5.0990 34.9063 0.5359 0.0010 0.3216 2.4887 14.3195 0.6635 0.9595 0.0027 0.2349 2.1740
No other factor interactions can be included as they are aliased with other effects (see the alias structure from part a)). The factors that influence the free height are A, B, D and E.
d. Analyse the data using the mean of the free height. What factors influence the mean free height? Generating the average response over the 3 replicates.
Building the data including the design and the avarage of the responce over the 3 replicates.
averesponse<- (Q2Data[1:16,6] + Q2Data[17:32,6] + Q2Data[33:48,6])/3 averesponse ## [1] 7.790000 8.070000 7.520000 7.633333 7.806667 7.946667 7.506667 7.686667 ## [9] 7.290000 7.733333 7.520000 7.646667 7.400000 7.623333 7.203333 7.633333 DesignAndAveResponse <- Q2Data[1:16,1:5] DesignAndAveResponse$y <- averesponse head(DesignAndAveResponse) ## 1 ## 2 ## 3 ## 4 ## 5 ## 6 -1 -1 -1 1 -1 -1 -1 1 -1 1 1 -1 -1 -1 1 1 -1 1 -1 -1 7.790000 1 -1 8.070000 1 -1 7.520000 -1 -1 7.633333 1 -1 7.806667 -1 -1 7.946667 ## $legend ## ## A=A B=B C=C D=D E=E ## ## $aliases ## ## A = B:C:D ## B = A:C:D ## C = A:B:D ## D = A:B:C ## E = A:B:C:D:E ## A:B = C:D ## A:C = B:D ## A:D = B:C ## A:E = B:C:D:E ## B:E = A:C:D:E ## C:E = A:B:D:E ## D:E = A:B:C:E ## A:B:E = C:D:E ## A:C:E = B:D:E ## A:D:E = B:C:E Fitting the maximum model as this is determined by the alias structure of the design. Q2Model2 <- lm(formula = y ~ A+B+C+D+E+A*B+A*C+A*D+A*E+B*E+C*E+D*E, data = DesignAndAveResponse) anova(Q2Model2) ## Analysis of Variance Table ## ## Response: y Df Sum Sq Mean Sq 1 0.234417 0.234417 1 0.107256 0.107256 1 0.009834 0.009834 1 0.033306 0.033306 1 0.228006 0.228006 1 0.003501 0.003501 1 0.000006 0.000006 1 0.002101 0.002101 1 0.016256 0.016256 1 0.093534 0.093534 1 0.004334 0.004334 1 0.006267 0.006267 3 0.015752 0.005251 F value 44.6450 20.4271 ## Residuals ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 The factors that influence the free height(y) are A, B and E. Factor C ids no longer significant using a significant level a=5%. The best way to analise the data when we have replications is to apply the analysis keeping all the data over the replications and nor summing or averaging as we did here. e. Analyse the residuals from this experiment, and comment on your findings. ## Residuals 35 0.67432 ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Df Sum Sq 1 0.70325 1 0.32177 1 0.02950 1 0.09992 1 0.68402 1 0.01050 1 0.00002 1 0.00630 1 0.04877 1 0.28060 1 0.01300 1 0.01880 Mean Sq 0.70325 0.32177 0.02950 0.09992 0.68402 0.01050 0.00002 0.00630 0.04877 0.28060 0.01300 0.01880 0.01927 F value 36.5015 16.7011 Pr(>F) 6.805e-07 *** 0.0002428 *** 0.2241592 0.0289879 * 8.758e-07 *** 0.4652492 0.9752903 0.5710254 0.1206021 0.0005293 *** 0.4169236 0.3299932
Check assumptions statistically: Residual Test:
Plots (4 in one)
6.3432 43.4240 0.6667 0.0012 0.4001 3.0960 17.8136 0.8254 1.1936
Pr(>F) 0.006837 ** 0.020252 * 0.264620 0.086285 . 0.007112 ** 0.474008 0.974645 0.571967 0.176716 0.024314 * 0.430581 0.354482
Q2Model <- lm(formula = FreeHeight ~ A+B+C+D+E+A*B+A*C+A*D+A*E+B*E+C*E+D*E, data = Q2Data) anova(Q2Model) ## Analysis of Variance ## ## Response: FreeHeight 5.1862 35.5033 0.5451 0.0010 0.3271 2.5313 14.5643 0.6749 0.9759 autoplot(Q2Model) a. Equality of variances - homogeneity : variances are equal : at least one variance is different The corresponding p-value ( constant variance or “homogeneity”. b. Normality of Residual ) is less than . Therefore, we reject the null hypothesis which means the model does not have a ncvTest(Q2Model) ## Non-constant Variance Score Test ## Variance formula: ~ fitted.values ## Chisquare = 6.322484, Df = 1, p = 0.011922 : Errors are normally distributed : Errors are not normally distributed shapiro.test(Q2Model$residuals) ## Shapiro-Wilk normality test ## ## data: Q2Model$residuals ## W = 0.9373, p-value = 0.0128 ks.test(Q2Model$residuals,"pnorm") ## Warning in ks.test(Q2Model$residuals, "pnorm"): ties should not be present for ## the Kolmogorov-Smirnov test ## One-sample Kolmogorov-Smirnov test ## ## data: Q2Model$residuals ## D = 0.42425, p-value = 6.267e-08 ## alternative hypothesis: two-sided ad.test(Q2Model$residuals) ## Anderson-Darling normality test ## ## data: Q2Model$residuals ## A = 0.78702, p-value = 0.03831 The p-Values are less than the significant level significant evident that the stochastic component of this model is not normally distributed. c. Auto Correlation Function: : Errors are uncorrelated : Errors are correlated . So, we conclude that we reject the null hypothesis. Therefore, there is statistically The ACF plot confirms that there are no lags exceeded the horizontal dashed line. Therefore, it seems that we fail to reject the null hypothesis which means we do not have autocorrelation on the residuals of the model. durbinWatsonTest(Q2Model) ## lag Autocorrelation D-W Statistic 程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com