Lab session 4: Solutions
RMIT University-MATH1302 May 13, 2022
# List of packages required for this analysis
pkg <- c("mosaic", "ggplot2", "tidyverse", "car", "randtests", "stats", "dplyr", "agricolae","nlme","lm
Copyright By PowCoder代写 加微信 powcoder
e4″, “emmeans”, “crossdes”, “AlgDesign”, “DoE.base”, “FrF2″,”SixSigma”,”ggfortify”,”nortest”,”lmtest”,
“DAAG”, “reshape2”, “Rmisc”)
# Check if packages are not installed and assign the
# names of the packages not installed to the variable new.pkg new.pkg <- pkg[!(pkg %in% installed.packages())]
# If there are any packages in the list that aren't installed, # install them
if (length(new.pkg)>0) {
install.packages(new.pkg, repos = “http://cran.rstudio.com”)
# Load the packages into R library(mosaic) # favstats() library(ggplot2)
library(tidyverse) # summarise() library(car) #ncvTest() library(randtests) # runs.test() library(stats) # TukeyHSD() library(dplyr) # mutate() and gather() library(agricolae) # LSD.test() library(nlme)
library(lme4) # lmer() library(emmeans) # emmeans() library(crossdes) # find.BIB() #library(multcomp) library(AlgDesign) # gen.factorial() library(DoE.base)
library(FrF2) library(SixSigma) library(ggfortify) library(nortest) library(lmtest) library(DAAG) # press() library(reshape2) library(Rmisc) # summarySE()
Question 1: Full Factorial Designs
A typical golfer uses the putter for about 40% on strokes. Improving one’s putting is a logical and perhaps simple way to improve a golf score. An experiment was conducted to study the effects of four factors on putting accuracy. The design factors are length of putt, type of putter, breaking putt versus straight putt, and level versus downhill putt. The response variable is distance from the ball to the centre of the cup after the ball comes to rest. One golfer performs the experiment, a 24 full factorial design with seven replicates was used, and all putts are made in random order. The results are shown in the next Table.
Input the data into the software.
Design <- expand.grid(Length = gl(2, 1, labels = c("10", "30")),
Type = gl(2, 1, labels = c("Mallet", "Cavity back")),
Break = gl(2, 1, labels = c("Straight", "Breaking")),
Slope = gl(2, 1, labels = c("Level", "Downhill")))
# Replicates
Q1Data <- data.frame(repl = rep(1:7, each = 16),
rbind(Design))
# Add the Response
Q1Data$response <- c(10, 0, 4, 0, 0, 5, 6.5, 16.5, 4.5, 19.5, 15, 41.5, 8, 21.5, 0, 18,
18, 16.5, 6, 10, 0, 20.5, 18.5, 4.5, 18, 18, 16, 39, 4.5, 10.5, 0, 5,
14, 4.5, 1, 34, 18.5, 18, 7.5, 0, 14.5, 16, 8.5, 6.5, 6.5, 6.5, 0, 7,
head(Q1Data)
## repl Length
Type Break Slope response
Level 10
Level 0
## 5 1 10 Level 0
## 6 1 30 Level 5
Export the data to csv file
Display the structure of the data.
12.5, 17.5, 14.5, 11, 19.5, 20, 6, 23.5, 10, 5.5, 0, 3.5, 10, 0, 4.5, 10,
19, 20.5, 12, 25.5, 16, 29.5, 0, 8, 0, 10, 0.5, 7, 13, 15.5, 1, 32.5,
16, 17.5, 14, 21.5, 15, 19, 10, 8, 17.5, 7, 9, 8.5, 41, 24, 4, 18.5,
18.5, 33, 5, 0, 11, 10, 0, 8, 6, 36, 3, 36, 14, 16, 6.5, 8 )
10 Cavity back Straight Level 4
30 Cavity back Straight Level 0
write.csv(Q1Data,"Q1Data.csv", row.names = TRUE) # to Export Data as a CSV file from RStudio
str(Q1Data)
a) Analyse the data from this experiment by fitting an ANOVA model with intercept, main effects, and interactions of any order. Which of the main effects are significantly affect putting performance?
1. We need a linear model of intercept, main effects, and interactions of any order.
ourModel <- lm(formula = response ~ Length + Type + Break + Slope +
anova(ourModel)
Length * Type + Length * Break + Length * Slope + # Two ineractions Type * Break + Type * Slope + Break * Slope +
Length * Type * Break + Length * Type * Slope + # three ineractions Length * Break * Slope + Type * Break * Slope +
Length * Type * Break * Slope, # four ineractions data = Q1Data)
## Analysis of Variance Table
## Response: response
Df Sum Sq Mean Sq F value Pr(>F)
1 917.1 917.15 10.5878 0.001572 **
1 388.1 388.15 4.4809 0.036862 *
1 145.1 145.15 1.6756 0.198615
1 1.4 1.40 0.0161 0.899280
1 218.7 218.68 2.5245 0.115377
1 11.9 11.90 0.1373 0.711776
1 93.8 93.81 1.0829 0.300658
1 115.0 115.02 1.3278 0.252054
1 56.4 56.43 0.6515 0.421588
1 1.6 1.63 0.0188 0.891271
1 7.3 7.25 0.0837 0.772939
1 113.0 113.00 1.3045 0.256228
1 39.5 39.48 0.4558 0.501207
1 33.8 33.77 0.3899 0.533858
## Length:Type
## Length:Break
## Length:Slope
## Type:Break
## Type:Slope
## Break:Slope
## Length:Type:Break
## Length:Type:Slope
## Length:Break:Slope
## Type:Break:Slope
## Length:Type:Break:Slope 1 95.6 95.65 1.1042 0.295994
## Residuals 96 8315.8 86.62
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
From the ANOVA table, it seems that only the main effects of Length and Type are significant. Note that if we are interested in seen the actual model and the coefficients we need to use the coded levels for the variables.
Second alternative solution for question 1 (a). Two similar generations with the fac.design function.
plan <- fac.design(nlevels = c(2,2,2,2), replications=7, randomize=FALSE, block.name="Replications", fa
ctor.names=c("Length","Type","Break","Slope"))
## 'data.frame': 112 obs. of 6 variables:
## $ Length
## $ Type
## $ Break
## $ Slope
## $ response: num 10 0 4 0 0 5 6.5 16.5 4.5 19.5 ...
: int 1 1 1 1 1 1 1 1 1 1 ...
: Factor w/ 2 levels "10","30": 1 2 1 2 1 2 1 2 1 2 ...
: Factor w/ 2 levels "Mallet","Cavity back": 1 1 2 2 1 1 2 2 1 1 ...
: Factor w/ 2 levels "Straight","Breaking": 1 1 1 1 2 2 2 2 1 1 ...
: Factor w/ 2 levels "Level","Downhill": 1 1 1 1 1 1 1 1 2 2 ...
## creating full factorial with 16 runs ...
## run.no run.no.std.rp Length Type Break Slope Replications
## 1 1 1.1 1 1 1 1 .1
## 2 2 2.1 2 1 1 1 .1
## 3 3 3.1 1 2 1 1 .1
## 4 4 4.1 2 2 1 1 .1
## 5 5 5.1 1 1 2 1 .1
## 6 6 6.1 2 1 2 1 .1
## 7 7 7.1 1 2 2 1 .1
## 8 8 8.1 2 2 2 1 .1
## 9 9 9.1 1 1 1 2 .1
10.12112 .1 11.11212 .1 12.12212 .1 13.11122 .1 14.12122 .1 15.11222 .1 16.12222 .1
1.21111 .2 2.22111 .2 3.21211 .2 4.22211 .2 5.21121 .2 6.22121 .2 7.21221 .2 8.22221 .2 9.21112 .2
10.22112 .2 11.21212 .2 12.22212 .2 13.21122 .2 14.22122 .2 15.21222 .2 16.22222 .2
1.31111 .3 2.32111 .3 3.31211 .3 4.32211 .3 5.31121 .3 6.32121 .3 7.31221 .3 8.32221 .3 9.31112 .3
10.32112 .3 11.31212 .3 12.32212 .3 13.31122 .3 14.32122 .3 15.31222 .3 16.32222 .3
1.41111 .4 2.42111 .4 3.41211 .4 4.42211 .4 5.41121 .4 6.42121 .4 7.41221 .4 8.42221 .4
## 57 57
## 58 58
## 59 59
## 60 60
## 61 61
## 62 62
## 63 63
## 64 64
## 65 65
## 66 66
## 67 67
## 68 68
## 69 69
## 70 70
## 71 71
## 72 72
## 73 73
## 74 74
## 75 75
## 76 76
## 77 77
## 78 78
## 79 79
## 80 80
## 81 81
## 82 82
## 83 83
## 84 84
## 85 85
## 86 86
## 87 87
## 88 88
## 89 89
## 90 90
## 91 91
## 92 92
## 93 93
## 94 94
## 95 95
## 96 96
## 97 97
## 98 98
## 99 99
## 100 100
## 101 101
## 102 102
## 103 103
## 104 104
## 105 105
## 106 106
## 107 107
## 108 108
## 109 109
## 110 110
## 111 111
## 112 112
## class=design, type= full factorial
9.41112 .4 10.42112 .4 11.41212 .4 12.42212 .4 13.41122 .4 14.42122 .4 15.41222 .4 16.42222 .4
1.51111 .5 2.52111 .5 3.51211 .5 4.52211 .5 5.51121 .5 6.52121 .5 7.51221 .5 8.52221 .5 9.51112 .5
10.52112 .5 11.51212 .5 12.52212 .5 13.51122 .5 14.52122 .5 15.51222 .5 16.52222 .5
1.61111 .6 2.62111 .6 3.61211 .6 4.62211 .6 5.61121 .6 6.62121 .6 7.61221 .6 8.62221 .6 9.61112 .6
10.62112 .6 11.61212 .6 12.62212 .6 13.61122 .6 14.62122 .6 15.61222 .6 16.62222 .6
1.71111 .7 2.72111 .7 3.71211 .7 4.72211 .7 5.71121 .7 6.72121 .7 7.71221 .7 8.72221 .7 9.71112 .7
10.72112 .7 11.71212 .7 12.72212 .7 13.71122 .7 14.72122 .7 15.71222 .7 16.72222 .7
Add the response to the design.
response <- c(10, 0, 4, 0, 0, 5, 6.5, 16.5, 4.5, 19.5, 15, 41.5, 8, 21.5, 0, 18,
18, 16.5, 6, 10, 0, 20.5, 18.5, 4.5, 18, 18, 16, 39, 4.5, 10.5, 0, 5,
14, 4.5, 1, 34, 18.5, 18, 7.5, 0, 14.5, 16, 8.5, 6.5, 6.5, 6.5, 0, 7,
12.5, 17.5, 14.5, 11, 19.5, 20, 6, 23.5, 10, 5.5, 0, 3.5, 10, 0, 4.5, 10,
19, 20.5, 12, 25.5, 16, 29.5, 0, 8, 0, 10, 0.5, 7, 13, 15.5, 1, 32.5,
16, 17.5, 14, 21.5, 15, 19, 10, 8, 17.5, 7, 9, 8.5, 41, 24, 4, 18.5,
18.5, 33, 5, 0, 11, 10, 0, 8, 6, 36, 3, 36, 14, 16, 6.5, 8 )
plan$response<-response
head(plan)
## Length Type Break Slope Replications response
## 1 1 1 1 1 .1 10
## 2 2 1 1 1 .1 0
## 3 1 2 1 1 .1 4
## 4 2 2 1 1 .1 0
## 5 1 1 2 1 .1 0
## 6 2 1 2 1 .1 5
(b) Analyse the residuals from this experiment. Any indications of model inadequacy?.
ourModel <- lm(formula = response ~ Length + Type + Break + Slope +
Length * Type + Length * Break + Length * Slope +
anova(ourModel)
Type * Break + Type * Slope + Break * Slope +
Length * Type * Break + Length * Type * Slope +
Length * Break * Slope + Type * Break * Slope +
Length * Type * Break * Slope,
data = plan)
## NOTE: columns run.no and run.no.std.rp are annotation,
## not part of the data frame
Plotting the residuals
plot(ourModel, 1) # residuals versus fitted values
plot(ourModel, 2) # QQ plot
## Analysis of Variance Table
## Response: response
Df Sum Sq Mean Sq F value Pr(>F)
1 917.1 917.15 10.5878 0.001572 **
1 388.1 388.15 4.4809 0.036862 *
1 145.1 145.15 1.6756 0.198615
1 1.4 1.40 0.0161 0.899280
1 218.7 218.68 2.5245 0.115377
1 11.9 11.90 0.1373 0.711776
1 93.8 93.81 1.0829 0.300658
1 115.0 115.02 1.3278 0.252054
1 56.4 56.43 0.6515 0.421588
1 1.6 1.63 0.0188 0.891271
1 7.3 7.25 0.0837 0.772939
1 113.0 113.00 1.3045 0.256228
1 39.5 39.48 0.4558 0.501207
1 33.8 33.77 0.3899 0.533858
## Length:Type
## Length:Break
## Length:Slope
## Type:Break
## Type:Slope
## Break:Slope
## Length:Type:Break
## Length:Type:Slope
## Length:Break:Slope
## Type:Break:Slope
## Length:Type:Break:Slope 1 95.6 95.65 1.1042 0.295994
## Residuals 96 8315.8 86.62
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
Check assumptions statistically: Residual Test:
a. Equality of variances – homogeneity
: variances are equal
: at least one variance is different
The corresponding p-value ( ) is less than . Therefore, we reject the null hypothesis which means the model does not have a constant variance or “homogeneity”.
b. Normality of Residual
: Errors are normally distributed
: Errors are not normally distributed
ncvTest(ourModel)
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 18.09453, Df = 1, p = 2.102e-05
y = rstudent(ourModel)
shapiro.test(y)
50.0 = α 120000.0
The p-Value is less than the significant level . So, we conclude that we reject the null hypothesis. Therefore, there is statistically significant evident that the stochastic component of this model is not normally distributed.
c. Auto Correlation Function:
: Errors are uncorrelated : Errors are correlated
The ACF plot confirms that there are only a few lags exceeded the horizontal dashed line. Therefore, it seems that we might need to reject the null hypothesis which means we have autocorrelation on residual model.
acf(ourModel$residuals)
durbinWatsonTest(ourModel)
## lag Autocorrelation D-W Statistic p-value
## 1 0.137223 1.717472 0.192
## Alternative hypothesis: rho != 0
## Shapiro-Wilk normality test
## data: y
## W = 0.97185, p-value = 0.0181
50.0 = α 1810.0
The p-Value is much greater than the significant level . So, we fail to reject the null hypothesis. Therefore, there is no statistically significant evident that the residuals of this model have autocorrelation.
d. Randomness
: Errors are random
: Errors are not random
runs.test(rstudent(ourModel))
## Runs Test
## data: rstudent(ourModel)
## statistic = -3.0374, runs = 41, n1 = 56, n2 = 56, n = 112, p-value =
## 0.002386
## alternative hypothesis: nonrandomness
The corresponding p-value is less than the significant level . Therefore, we reject null hypothesis. which means the errors on this model are not random. The analysis of the residuals indicates that some of the assumptions are not satisfied.
Question 2: Fractional Factorial Designs
A replicated fractional factorial design is used to investigate the effect of five factors on the free height of leaf springs used in an automotive application. The factors are:
A = furnace temperature,
B = heating time,
C = transfer time,
D = hold down time, and
E = quench oil temperature
The collected data are as shown below
Input the data into the software.
50.0 = α 683200.0
## A B C DE ## 1 -1 -1 -1 -1 -1 ## 2 1 -1 -1 1 -1 ## 3 -1 1 -1 1 -1 ## 4 1 1 -1 -1 -1 ## 5 -1 -1 1 1 -1 ## 6 1 -1 1 -1 -1
FreeHeight Rep
DesignAndResponse<-Q2Data[1:16,1:6]
head(DesignAndResponse)
## A B C DE FreeHeight ## 1 -1 -1 -1 -1 -1 7.78 ## 2 1 -1 -1 1 -1 8.15 ## 3 -1 1 -1 1 -1 7.50 ## 4 1 1 -1 -1 -1 7.59 ## 5 -1 -1 1 1 -1 7.54 ## 6 1 -1 1 -1 -1 7.69
a) We have 5 factors and 16 runs so that means this a half fraction so we are looking for just 1 defining relationship.
aliases(lm(FreeHeight ~ (.)^2, data = DesignAndResponse), code=TRUE)
## $legend
## A=A B=B C=C D=D E=E
## $aliases
## A:B = C:D
## A:C = B:D
## A:D = B:C
We can see that the the 2fi are fully aliased as shown above. We can see that D=ABC.To write down all the aliased structure we set up to 5 fi in R and we get:
aliases(lm(FreeHeight ~ (.)^5, data = DesignAndResponse), code=TRUE)
Q2Data=read.csv(file="Q2Data2.csv",header = TRUE)
head(Q2Data)
We can see all the aliased relationships above (All generated from th D=ABC defining relationship).
(b) What is the resolution of this design?
The resolution of the design is the minimum number of letters in all defining relationships. Here we only have one defining relationship(generator), i.e D=ABC and thus the resolution of this design is IV.
(c) Analyse the data using the 3 replications of this experiment. What factors influence the free height?
Q2Model <- lm(formula = FreeHeight ~ A+B+C+D+E+A*B+A*C+A*D+A*E+B*E+C*E+D*E+A*B*E+A*C*E+A*D*E,
data = Q2Data)
anova(Q2Model)
## Analysis of Variance Table
## Response: FreeHeight
## Residuals 32 0.62707 0.01960
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Df Sum Sq Mean Sq F value Pr(>F)
1 0.70325 0.70325 35.8878 1.115e-06 ***
1 0.32177 0.32177 16.4203 0.0003024 ***
1 0.02950 0.02950 1.5055 0.2287738
1 0.09992 0.09992 5.0990 0.0308933 *
1 0.68402 0.68402 34.9063 1.416e-06 ***
1 0.01050 0.01050 0.5359 0.4694506
1 0.00002 0.00002 0.0010 0.9755153
1 0.00630 0.00630 0.3216 0.5746027
1 0.04877 0.04877 2.4887 0.1245000
1 0.28060 0.28060 14.3195 0.0006397 ***
1 0.01300 0.01300 0.6635 0.4213435
1 0.01880 0.01880 0.9595 0.3346624
1 0.00005 0.00005 0.0027 0.9592041
1 0.00460 0.00460 0.2349 0.6312514
1 0.04260 0.04260 2.1740 0.1501284
## $legend
## A=A B=B C=C D=D E=E
## $aliases
## A = B:C:D
## B = A:C:D
## C = A:B:D
## D = A:B:C
## E = A:B:C:D:E
## A:B = C:D
## A:C = B:D
## A:D = B:C
## A:E = B:C:D:E
## B:E = A:C:D:E
## C:E = A:B:D:E
## D:E = A:B:C:E
## A:B:E = C:D:E
## A:C:E = B:D:E
## A:D:E = B:C:E
No other factor interactions can be included as they are aliased with other effects (see the alias structure from part a)). The factors that influence the free height are A, B, D and E.
(d) Analyse the data using the mean of the free height. What factors influence the mean free height? Generating the average response over the 3 replicates.
Building the data including the design and the avarage of the responce over the 3 replicates.
averesponse<- (Q2Data[1:16,6] + Q2Data[17:32,6] + Q2Data[33:48,6])/3
averesponse
## [1] 7.790000 8.070000 7.520000 7.633333 7.806667 7.946667 7.506667 7.686667
## [9] 7.290000 7.733333 7.520000 7.646667 7.400000 7.623333 7.203333 7.633333
DesignAndAveResponse <- Q2Data[1:16,1:5]
DesignAndAveResponse$y <- averesponse
head(DesignAndAveResponse)
##ABCDE y ## 1 -1 -1 -1 -1 -1 7.790000 ## 2 1 -1 -1 1 -1 8.070000 ## 3 -1 1 -1 1 -1 7.520000 ## 4 1 1 -1 -1 -1 7.633333 ## 5 -1 -1 1 1 -1 7.806667 ## 6 1 -1 1 -1 -1 7.946667
Fitting the maximum model as this is determined by the alias structure of the design.
Q2Model2 <- lm(formula = y ~ A+B+C+D+E+A*B+A*C+A*D+A*E+B*E+C*E+D*E,
data = DesignAndAveResponse)
anova(Q2Model2)
## Analysis of Variance Table
## Response: y
## Residuals 3 0.015752 0.005251
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Df Sum Sq Mean Sq F value Pr(>F)
1 0.234417 0.234417 44.6450 0.006837 **
1 0.107256 0.107256 20.4271 0.020252 *
1 0.009834 0.009834 1.8729 0.264620
1 0.033306 0.033306 6.3432 0.086285 .
1 0.228006 0.228006 43.4240 0.007112 **
1 0.003501 0.003501 0.6667 0.474008
1 0.000006 0.000006 0.0012 0.974645
1 0.002101 0.002101 0.4001 0.571967
1 0.016256 0.016256 3.0960 0.176716
1 0.093534 0.093534 17.8136 0.024314 *
1 0.004334 0.004334 0.8254 0.430581
1 0.006267 0.006267 1.1936 0.354482
The factors that influence the free height(y) are A, B and E. Factor D is no longer significant using a significant level a=5%. The best way to analise the data when we have replications is to apply the analysis keeping all the data over the replications and nor summing or averaging as we did here.
(e) Analyse the residuals from this experiment, and comment on your findings.
Q2Model <- lm(formula = FreeHeight ~ A+B+C+D+E+A*B+A*C+A*D+A*E+B*E+C*E+D*E,
data = Q2Data)
anova(Q2Model)
## Analysis of Variance Table
## Response: FreeHeight
## Residuals 35 0.67432 0.01927
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Df Sum Sq Mean Sq F value Pr(>F)
1 0.70325 0.703
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com