Lab session 4: Solutions
Lab session 4: Solutions
Copyright By PowCoder代写 加微信 powcoder
RMIT University-MATH1302
20/04/2022
# List of packages required for this analysis
pkg <- c("mosaic", "ggplot2", "tidyverse", "car", "randtests", "stats", "dplyr", "agricolae","nlme","lme4", "emmeans", "crossdes", "AlgDesign", "DoE.base", "FrF2","SixSigma","ggfortify","nortest","lmtest")
# Check if packages are not installed and assign the
# names of the packages not installed to the variable new.pkg
new.pkg <- pkg[!(pkg %in% installed.packages())]
# If there are any packages in the list that aren't installed,
# install them
if (length(new.pkg)>0) {
install.packages(new.pkg, repos = “http://cran.rstudio.com”)
# Load the packages into R
library(mosaic) # favstats()
library(ggplot2)
library(tidyverse) # summarise()
library(car) #ncvTest()
library(randtests) # runs.test()
library(stats) # TukeyHSD()
library(dplyr) # mutate()
library(agricolae) # LSD.test()
library(nlme)
library(lme4) # lmer()
library(emmeans) # emmeans()
library(crossdes) # find.BIB()
#library(multcomp)
library(AlgDesign) # gen.factorial()
library(DoE.base)
library(FrF2)
library(SixSigma)
library(ggfortify)
library(nortest)
library(lmtest)
Question 1: Full Factorial Designs
A typical golfer uses the putter for about 40% on strokes. Improving one’s putting is a logical and perhaps simple way to improve a golf score. An experiment was conducted to study the effects of four factors on putting accuracy. The design factors are length of putt, type of putter, breaking putt versus straight putt, and level versus downhill putt. The response variable is distance from the ball to the centre of the cup after the ball comes to rest. One golfer performs the experiment, a 24 full factorial design with seven replicates was used, and all putts are made in random order. The results are shown in the next Table.
Input the data into the software.
Design <- expand.grid(Length = gl(2, 1, labels = c("10", "30")),
Type = gl(2, 1, labels = c("Mallet", "Cavity back")),
Break = gl(2, 1, labels = c("Straight", "Breaking")),
Slope = gl(2, 1, labels = c("Level", "Downhill")))
# Replicates
Q1Data <- data.frame(repl = rep(1:7, each = 16),
rbind(Design))
# Add the Response
Q1Data$response <- c(10, 0, 4, 0, 0, 5, 6.5, 16.5, 4.5, 19.5, 15, 41.5, 8, 21.5, 0, 18,
18, 16.5, 6, 10, 0, 20.5, 18.5, 4.5, 18, 18, 16, 39, 4.5, 10.5, 0, 5,
14, 4.5, 1, 34, 18.5, 18, 7.5, 0, 14.5, 16, 8.5, 6.5, 6.5, 6.5, 0, 7,
12.5, 17.5, 14.5, 11, 19.5, 20, 6, 23.5, 10, 5.5, 0, 3.5, 10, 0, 4.5, 10,
19, 20.5, 12, 25.5, 16, 29.5, 0, 8, 0, 10, 0.5, 7, 13, 15.5, 1, 32.5,
16, 17.5, 14, 21.5, 15, 19, 10, 8, 17.5, 7, 9, 8.5, 41, 24, 4, 18.5,
18.5, 33, 5, 0, 11, 10, 0, 8, 6, 36, 3, 36, 14, 16, 6.5, 8 )
Export the data to csv file
write.csv(Q1Data,"Q1Data.csv", row.names = TRUE) # to Export Data as a CSV file from RStudio
Display the structure of the data.
str(Q1Data)
## 'data.frame': 112 obs. of 6 variables:
## $ repl : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Length : Factor w/ 2 levels "10","30": 1 2 1 2 1 2 1 2 1 2 ...
## $ Type : Factor w/ 2 levels "Mallet","Cavity back": 1 1 2 2 1 1 2 2 1 1 ...
## $ Break : Factor w/ 2 levels "Straight","Breaking": 1 1 1 1 2 2 2 2 1 1 ...
## $ Slope : Factor w/ 2 levels "Level","Downhill": 1 1 1 1 1 1 1 1 2 2 ...
## $ response: num 10 0 4 0 0 5 6.5 16.5 4.5 19.5 ...
a) (a) Analyse the data from this experiment by fitting an ANOVA model with intercept, main effects, and interactions of any order. Which of the main effects are significantly affect putting performance?
We need a linear model of intercept, main effects, and interactions of any order.
ourModel <- lm(formula = response ~ Length + Type + Break + Slope +
Length * Type + Length * Break + Length * Slope +
Type * Break + Type * Slope + Break * Slope +
Length * Type * Break + Length * Type * Slope +
Length * Break * Slope + Type * Break * Slope +
Length * Type * Break * Slope,
data = Q1Data)
anova(ourModel)
## Analysis of Variance Table
## Response: response
## Df Sum Sq Mean Sq F value Pr(>F)
## Length 1 917.1 917.15 10.5878 0.001572 **
## Type 1 388.1 388.15 4.4809 0.036862 *
## Break 1 145.1 145.15 1.6756 0.198615
## Slope 1 1.4 1.40 0.0161 0.899280
## Length:Type 1 218.7 218.68 2.5245 0.115377
## Length:Break 1 11.9 11.90 0.1373 0.711776
## Length:Slope 1 93.8 93.81 1.0829 0.300658
## Type:Break 1 115.0 115.02 1.3278 0.252054
## Type:Slope 1 56.4 56.43 0.6515 0.421588
## Break:Slope 1 1.6 1.63 0.0188 0.891271
## Length:Type:Break 1 7.3 7.25 0.0837 0.772939
## Length:Type:Slope 1 113.0 113.00 1.3045 0.256228
## Length:Break:Slope 1 39.5 39.48 0.4558 0.501207
## Type:Break:Slope 1 33.8 33.77 0.3899 0.533858
## Length:Type:Break:Slope 1 95.6 95.65 1.1042 0.295994
## Residuals 96 8315.8 86.62
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
SSE=sum(ourModel$residuals**2)
## [1] 8315.786
From the ANOVA table, it seems that only the main effects of Length and Type are significant. Note that if we are interested in seen the actual model and the coefficients we need to use the coded levels for the variables.
Second alternative solution for question 1 (a). Two similar generations with the fac.design function.
plan <- fac.design(nlevels = c(2,2,2,2), replications=7, randomize=FALSE, block.name="Replications", factor.names=c("Length","Type","Break","Slope"))
## creating full factorial with 16 runs ...
## run.no run.no.std.rp Length Type Break Slope Replications
## 1 1 1.1 1 1 1 1 .1
## 2 2 2.1 2 1 1 1 .1
## 3 3 3.1 1 2 1 1 .1
## 4 4 4.1 2 2 1 1 .1
## 5 5 5.1 1 1 2 1 .1
## 6 6 6.1 2 1 2 1 .1
## 7 7 7.1 1 2 2 1 .1
## 8 8 8.1 2 2 2 1 .1
## 9 9 9.1 1 1 1 2 .1
## 10 10 10.1 2 1 1 2 .1
## 11 11 11.1 1 2 1 2 .1
## 12 12 12.1 2 2 1 2 .1
## 13 13 13.1 1 1 2 2 .1
## 14 14 14.1 2 1 2 2 .1
## 15 15 15.1 1 2 2 2 .1
## 16 16 16.1 2 2 2 2 .1
## 17 17 1.2 1 1 1 1 .2
## 18 18 2.2 2 1 1 1 .2
## 19 19 3.2 1 2 1 1 .2
## 20 20 4.2 2 2 1 1 .2
## 21 21 5.2 1 1 2 1 .2
## 22 22 6.2 2 1 2 1 .2
## 23 23 7.2 1 2 2 1 .2
## 24 24 8.2 2 2 2 1 .2
## 25 25 9.2 1 1 1 2 .2
## 26 26 10.2 2 1 1 2 .2
## 27 27 11.2 1 2 1 2 .2
## 28 28 12.2 2 2 1 2 .2
## 29 29 13.2 1 1 2 2 .2
## 30 30 14.2 2 1 2 2 .2
## 31 31 15.2 1 2 2 2 .2
## 32 32 16.2 2 2 2 2 .2
## 33 33 1.3 1 1 1 1 .3
## 34 34 2.3 2 1 1 1 .3
## 35 35 3.3 1 2 1 1 .3
## 36 36 4.3 2 2 1 1 .3
## 37 37 5.3 1 1 2 1 .3
## 38 38 6.3 2 1 2 1 .3
## 39 39 7.3 1 2 2 1 .3
## 40 40 8.3 2 2 2 1 .3
## 41 41 9.3 1 1 1 2 .3
## 42 42 10.3 2 1 1 2 .3
## 43 43 11.3 1 2 1 2 .3
## 44 44 12.3 2 2 1 2 .3
## 45 45 13.3 1 1 2 2 .3
## 46 46 14.3 2 1 2 2 .3
## 47 47 15.3 1 2 2 2 .3
## 48 48 16.3 2 2 2 2 .3
## 49 49 1.4 1 1 1 1 .4
## 50 50 2.4 2 1 1 1 .4
## 51 51 3.4 1 2 1 1 .4
## 52 52 4.4 2 2 1 1 .4
## 53 53 5.4 1 1 2 1 .4
## 54 54 6.4 2 1 2 1 .4
## 55 55 7.4 1 2 2 1 .4
## 56 56 8.4 2 2 2 1 .4
## 57 57 9.4 1 1 1 2 .4
## 58 58 10.4 2 1 1 2 .4
## 59 59 11.4 1 2 1 2 .4
## 60 60 12.4 2 2 1 2 .4
## 61 61 13.4 1 1 2 2 .4
## 62 62 14.4 2 1 2 2 .4
## 63 63 15.4 1 2 2 2 .4
## 64 64 16.4 2 2 2 2 .4
## 65 65 1.5 1 1 1 1 .5
## 66 66 2.5 2 1 1 1 .5
## 67 67 3.5 1 2 1 1 .5
## 68 68 4.5 2 2 1 1 .5
## 69 69 5.5 1 1 2 1 .5
## 70 70 6.5 2 1 2 1 .5
## 71 71 7.5 1 2 2 1 .5
## 72 72 8.5 2 2 2 1 .5
## 73 73 9.5 1 1 1 2 .5
## 74 74 10.5 2 1 1 2 .5
## 75 75 11.5 1 2 1 2 .5
## 76 76 12.5 2 2 1 2 .5
## 77 77 13.5 1 1 2 2 .5
## 78 78 14.5 2 1 2 2 .5
## 79 79 15.5 1 2 2 2 .5
## 80 80 16.5 2 2 2 2 .5
## 81 81 1.6 1 1 1 1 .6
## 82 82 2.6 2 1 1 1 .6
## 83 83 3.6 1 2 1 1 .6
## 84 84 4.6 2 2 1 1 .6
## 85 85 5.6 1 1 2 1 .6
## 86 86 6.6 2 1 2 1 .6
## 87 87 7.6 1 2 2 1 .6
## 88 88 8.6 2 2 2 1 .6
## 89 89 9.6 1 1 1 2 .6
## 90 90 10.6 2 1 1 2 .6
## 91 91 11.6 1 2 1 2 .6
## 92 92 12.6 2 2 1 2 .6
## 93 93 13.6 1 1 2 2 .6
## 94 94 14.6 2 1 2 2 .6
## 95 95 15.6 1 2 2 2 .6
## 96 96 16.6 2 2 2 2 .6
## 97 97 1.7 1 1 1 1 .7
## 98 98 2.7 2 1 1 1 .7
## 99 99 3.7 1 2 1 1 .7
## 100 100 4.7 2 2 1 1 .7
## 101 101 5.7 1 1 2 1 .7
## 102 102 6.7 2 1 2 1 .7
## 103 103 7.7 1 2 2 1 .7
## 104 104 8.7 2 2 2 1 .7
## 105 105 9.7 1 1 1 2 .7
## 106 106 10.7 2 1 1 2 .7
## 107 107 11.7 1 2 1 2 .7
## 108 108 12.7 2 2 1 2 .7
## 109 109 13.7 1 1 2 2 .7
## 110 110 14.7 2 1 2 2 .7
## 111 111 15.7 1 2 2 2 .7
## 112 112 16.7 2 2 2 2 .7
## class=design, type= full factorial
## NOTE: columns run.no and run.no.std.rp are annotation,
## not part of the data frame
Add the response to the design.
response <- c(10, 0, 4, 0, 0, 5, 6.5, 16.5, 4.5, 19.5, 15, 41.5, 8, 21.5, 0, 18,
18, 16.5, 6, 10, 0, 20.5, 18.5, 4.5, 18, 18, 16, 39, 4.5, 10.5, 0, 5,
14, 4.5, 1, 34, 18.5, 18, 7.5, 0, 14.5, 16, 8.5, 6.5, 6.5, 6.5, 0, 7,
12.5, 17.5, 14.5, 11, 19.5, 20, 6, 23.5, 10, 5.5, 0, 3.5, 10, 0, 4.5, 10,
19, 20.5, 12, 25.5, 16, 29.5, 0, 8, 0, 10, 0.5, 7, 13, 15.5, 1, 32.5,
16, 17.5, 14, 21.5, 15, 19, 10, 8, 17.5, 7, 9, 8.5, 41, 24, 4, 18.5,
18.5, 33, 5, 0, 11, 10, 0, 8, 6, 36, 3, 36, 14, 16, 6.5, 8 )
plan$response<-response
Analyse the residuals from this experiment. Any indications of model inadequacy?.
ourModel <- lm(formula = response ~ Length + Type + Break + Slope +
Length * Type + Length * Break + Length * Slope +
Type * Break + Type * Slope + Break * Slope +
Length * Type * Break + Length * Type * Slope +
Length * Break * Slope + Type * Break * Slope +
Length * Type * Break * Slope,
data = plan)
anova(ourModel)
## Analysis of Variance Table
## Response: response
## Df Sum Sq Mean Sq F value Pr(>F)
## Length 1 917.1 917.15 10.5878 0.001572 **
## Type 1 388.1 388.15 4.4809 0.036862 *
## Break 1 145.1 145.15 1.6756 0.198615
## Slope 1 1.4 1.40 0.0161 0.899280
## Length:Type 1 218.7 218.68 2.5245 0.115377
## Length:Break 1 11.9 11.90 0.1373 0.711776
## Length:Slope 1 93.8 93.81 1.0829 0.300658
## Type:Break 1 115.0 115.02 1.3278 0.252054
## Type:Slope 1 56.4 56.43 0.6515 0.421588
## Break:Slope 1 1.6 1.63 0.0188 0.891271
## Length:Type:Break 1 7.3 7.25 0.0837 0.772939
## Length:Type:Slope 1 113.0 113.00 1.3045 0.256228
## Length:Break:Slope 1 39.5 39.48 0.4558 0.501207
## Type:Break:Slope 1 33.8 33.77 0.3899 0.533858
## Length:Type:Break:Slope 1 95.6 95.65 1.1042 0.295994
## Residuals 96 8315.8 86.62
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
Plotting the residuals
plot(ourModel, 1) # residuals versus fitted values
plot(ourModel, 2) # QQ plot
Check assumptions statistically:
Residual Test:
a. Equality of variances – homogeneity
\(H_0\): variances are equal
\(H_1\): at least one variance is different
ncvTest(ourModel)
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 18.09453, Df = 1, p = 2.102e-05
The corresponding p-value (\(0.000021\)) is less than \(\alpha = 0.05\). Therefore, we reject the null hypothesis which means the model does not have a constant variance or “homogeneity”.
b. Normality of Residual
\(H_0\): Errors are normally distributed
\(H_1\): Errors are not normally distributed
y = rstudent(ourModel)
shapiro.test(y)
## Shapiro-Wilk normality test
## data: y
## W = 0.97185, p-value = 0.0181
The p-Value \(0.0181\) is less than the significant level \(\alpha = 0.05\). So, we conclude that we reject the null hypothesis. Therefore, there is statistically significant evident that the stochastic component of this model is not normally distributed.
c. Auto Correlation Function:
\(H_0\): Errors are uncorrelated
\(H_1\): Errors are correlated
acf(ourModel$residuals)
The ACF plot confirms that there are only a few lags exceeded the horizontal dashed line. Therefore, it seems that we might need to reject the null hypothesis which means we have autocorrelation on residual model.
durbinWatsonTest(ourModel)
## lag Autocorrelation D-W Statistic p-value
## 1 0.137223 1.717472 0.186
## Alternative hypothesis: rho != 0
The p-Value \(0.19\) is much greater than the significant level \(\alpha = 0.05\). So, we fail to reject the null hypothesis. Therefore, there is no statistically significant evident that the residuals of this model have autocorrelation.
d. Randomness
\(H_0\): Errors are random
\(H_1\): Errors are not random
runs.test(rstudent(ourModel))
## Runs Test
## data: rstudent(ourModel)
## statistic = -3.0374, runs = 41, n1 = 56, n2 = 56, n = 112, p-value =
## 0.002386
## alternative hypothesis: nonrandomness
The corresponding p-value \(0.002386\) is less than the significant level \(\alpha = 0.05\). Therefore, we reject null hypothesis. which means the errors on this model are not random. The analysis of the residuals indicates that the assumptions are not satisfied.
Question 2: Fractional Factorial Designs
A replicated fractional factorial design is used to investigate the effect of five factors on the free height of leaf springs used in an automotive application. The factors are:
A = furnace temperature,
B = heating time,
C = transfer time,
D = hold down time, and
E = quench oil temperature
The collected data are as shown below
Input the data into the software.
Q2Data=read.csv(file=”Q2Data2.csv”,header = TRUE)
head(Q2Data)
## A B C D E FreeHeight Rep
## 1 -1 -1 -1 -1 -1 7.78 1
## 2 1 -1 -1 1 -1 8.15 1
## 3 -1 1 -1 1 -1 7.50 1
## 4 1 1 -1 -1 -1 7.59 1
## 5 -1 -1 1 1 -1 7.54 1
## 6 1 -1 1 -1 -1 7.69 1
DesignAndResponse<-Q2Data[1:16,1:6] head(DesignAndResponse) ## A B C D E FreeHeight ## 1 -1 -1 -1 -1 -1 7.78 ## 2 1 -1 -1 1 -1 8.15 ## 3 -1 1 -1 1 -1 7.50 ## 4 1 1 -1 -1 -1 7.59 ## 5 -1 -1 1 1 -1 7.54 ## 6 1 -1 1 -1 -1 7.69 We have 5 factors and 16 runs so that means this a half fraction so we are looking for just 1 defining relationship. aliases(lm(FreeHeight ~ (.)^2, data = DesignAndResponse), code=TRUE) ## $legend ## A=A B=B C=C D=D E=E ## $aliases ## A:B = C:D ## A:C = B:D ## A:D = B:C We can see that the the 2fi are fully aliased as shown above. We can see that D=ABC.To write down all the aliased structure we set up to 5 fi in R and we get: aliases(lm(FreeHeight ~ (.)^5, data = DesignAndR 程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com