程序代写 MATH1302

Lab session 4: Solutions

Lab session 4: Solutions

Copyright By PowCoder代写 加微信 powcoder

RMIT University-MATH1302

20/04/2022

# List of packages required for this analysis
pkg <- c("mosaic", "ggplot2", "tidyverse", "car", "randtests", "stats", "dplyr", "agricolae","nlme","lme4", "emmeans", "crossdes", "AlgDesign", "DoE.base", "FrF2","SixSigma","ggfortify","nortest","lmtest") # Check if packages are not installed and assign the # names of the packages not installed to the variable new.pkg new.pkg <- pkg[!(pkg %in% installed.packages())] # If there are any packages in the list that aren't installed, # install them if (length(new.pkg)>0) {
install.packages(new.pkg, repos = “http://cran.rstudio.com”)

# Load the packages into R
library(mosaic) # favstats()
library(ggplot2)
library(tidyverse) # summarise()
library(car) #ncvTest()
library(randtests) # runs.test()
library(stats) # TukeyHSD()
library(dplyr) # mutate()
library(agricolae) # LSD.test()
library(nlme)
library(lme4) # lmer()
library(emmeans) # emmeans()
library(crossdes) # find.BIB()
#library(multcomp)
library(AlgDesign) # gen.factorial()
library(DoE.base)
library(FrF2)
library(SixSigma)
library(ggfortify)
library(nortest)
library(lmtest)

Question 1: Full Factorial Designs

A typical golfer uses the putter for about 40% on strokes. Improving one’s putting is a logical and perhaps simple way to improve a golf score. An experiment was conducted to study the effects of four factors on putting accuracy. The design factors are length of putt, type of putter, breaking putt versus straight putt, and level versus downhill putt. The response variable is distance from the ball to the centre of the cup after the ball comes to rest. One golfer performs the experiment, a 24 full factorial design with seven replicates was used, and all putts are made in random order. The results are shown in the next Table.

Input the data into the software.

Design <- expand.grid(Length = gl(2, 1, labels = c("10", "30")), Type = gl(2, 1, labels = c("Mallet", "Cavity back")), Break = gl(2, 1, labels = c("Straight", "Breaking")), Slope = gl(2, 1, labels = c("Level", "Downhill"))) # Replicates Q1Data <- data.frame(repl = rep(1:7, each = 16), rbind(Design)) # Add the Response Q1Data$response <- c(10, 0, 4, 0, 0, 5, 6.5, 16.5, 4.5, 19.5, 15, 41.5, 8, 21.5, 0, 18, 18, 16.5, 6, 10, 0, 20.5, 18.5, 4.5, 18, 18, 16, 39, 4.5, 10.5, 0, 5, 14, 4.5, 1, 34, 18.5, 18, 7.5, 0, 14.5, 16, 8.5, 6.5, 6.5, 6.5, 0, 7, 12.5, 17.5, 14.5, 11, 19.5, 20, 6, 23.5, 10, 5.5, 0, 3.5, 10, 0, 4.5, 10, 19, 20.5, 12, 25.5, 16, 29.5, 0, 8, 0, 10, 0.5, 7, 13, 15.5, 1, 32.5, 16, 17.5, 14, 21.5, 15, 19, 10, 8, 17.5, 7, 9, 8.5, 41, 24, 4, 18.5, 18.5, 33, 5, 0, 11, 10, 0, 8, 6, 36, 3, 36, 14, 16, 6.5, 8 ) Export the data to csv file write.csv(Q1Data,"Q1Data.csv", row.names = TRUE) # to Export Data as a CSV file from RStudio Display the structure of the data. str(Q1Data) ## 'data.frame': 112 obs. of 6 variables: ## $ repl : int 1 1 1 1 1 1 1 1 1 1 ... ## $ Length : Factor w/ 2 levels "10","30": 1 2 1 2 1 2 1 2 1 2 ... ## $ Type : Factor w/ 2 levels "Mallet","Cavity back": 1 1 2 2 1 1 2 2 1 1 ... ## $ Break : Factor w/ 2 levels "Straight","Breaking": 1 1 1 1 2 2 2 2 1 1 ... ## $ Slope : Factor w/ 2 levels "Level","Downhill": 1 1 1 1 1 1 1 1 2 2 ... ## $ response: num 10 0 4 0 0 5 6.5 16.5 4.5 19.5 ... a) (a) Analyse the data from this experiment by fitting an ANOVA model with intercept, main effects, and interactions of any order. Which of the main effects are significantly affect putting performance? We need a linear model of intercept, main effects, and interactions of any order. ourModel <- lm(formula = response ~ Length + Type + Break + Slope + Length * Type + Length * Break + Length * Slope + Type * Break + Type * Slope + Break * Slope + Length * Type * Break + Length * Type * Slope + Length * Break * Slope + Type * Break * Slope + Length * Type * Break * Slope, data = Q1Data) anova(ourModel) ## Analysis of Variance Table ## Response: response ## Df Sum Sq Mean Sq F value Pr(>F)
## Length 1 917.1 917.15 10.5878 0.001572 **
## Type 1 388.1 388.15 4.4809 0.036862 *
## Break 1 145.1 145.15 1.6756 0.198615
## Slope 1 1.4 1.40 0.0161 0.899280
## Length:Type 1 218.7 218.68 2.5245 0.115377
## Length:Break 1 11.9 11.90 0.1373 0.711776
## Length:Slope 1 93.8 93.81 1.0829 0.300658
## Type:Break 1 115.0 115.02 1.3278 0.252054
## Type:Slope 1 56.4 56.43 0.6515 0.421588
## Break:Slope 1 1.6 1.63 0.0188 0.891271
## Length:Type:Break 1 7.3 7.25 0.0837 0.772939
## Length:Type:Slope 1 113.0 113.00 1.3045 0.256228
## Length:Break:Slope 1 39.5 39.48 0.4558 0.501207
## Type:Break:Slope 1 33.8 33.77 0.3899 0.533858
## Length:Type:Break:Slope 1 95.6 95.65 1.1042 0.295994
## Residuals 96 8315.8 86.62
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1

SSE=sum(ourModel$residuals**2)

## [1] 8315.786

From the ANOVA table, it seems that only the main effects of Length and Type are significant. Note that if we are interested in seen the actual model and the coefficients we need to use the coded levels for the variables.

Second alternative solution for question 1 (a). Two similar generations with the fac.design function.

plan <- fac.design(nlevels = c(2,2,2,2), replications=7, randomize=FALSE, block.name="Replications", factor.names=c("Length","Type","Break","Slope")) ## creating full factorial with 16 runs ... ## run.no run.no.std.rp Length Type Break Slope Replications ## 1 1 1.1 1 1 1 1 .1 ## 2 2 2.1 2 1 1 1 .1 ## 3 3 3.1 1 2 1 1 .1 ## 4 4 4.1 2 2 1 1 .1 ## 5 5 5.1 1 1 2 1 .1 ## 6 6 6.1 2 1 2 1 .1 ## 7 7 7.1 1 2 2 1 .1 ## 8 8 8.1 2 2 2 1 .1 ## 9 9 9.1 1 1 1 2 .1 ## 10 10 10.1 2 1 1 2 .1 ## 11 11 11.1 1 2 1 2 .1 ## 12 12 12.1 2 2 1 2 .1 ## 13 13 13.1 1 1 2 2 .1 ## 14 14 14.1 2 1 2 2 .1 ## 15 15 15.1 1 2 2 2 .1 ## 16 16 16.1 2 2 2 2 .1 ## 17 17 1.2 1 1 1 1 .2 ## 18 18 2.2 2 1 1 1 .2 ## 19 19 3.2 1 2 1 1 .2 ## 20 20 4.2 2 2 1 1 .2 ## 21 21 5.2 1 1 2 1 .2 ## 22 22 6.2 2 1 2 1 .2 ## 23 23 7.2 1 2 2 1 .2 ## 24 24 8.2 2 2 2 1 .2 ## 25 25 9.2 1 1 1 2 .2 ## 26 26 10.2 2 1 1 2 .2 ## 27 27 11.2 1 2 1 2 .2 ## 28 28 12.2 2 2 1 2 .2 ## 29 29 13.2 1 1 2 2 .2 ## 30 30 14.2 2 1 2 2 .2 ## 31 31 15.2 1 2 2 2 .2 ## 32 32 16.2 2 2 2 2 .2 ## 33 33 1.3 1 1 1 1 .3 ## 34 34 2.3 2 1 1 1 .3 ## 35 35 3.3 1 2 1 1 .3 ## 36 36 4.3 2 2 1 1 .3 ## 37 37 5.3 1 1 2 1 .3 ## 38 38 6.3 2 1 2 1 .3 ## 39 39 7.3 1 2 2 1 .3 ## 40 40 8.3 2 2 2 1 .3 ## 41 41 9.3 1 1 1 2 .3 ## 42 42 10.3 2 1 1 2 .3 ## 43 43 11.3 1 2 1 2 .3 ## 44 44 12.3 2 2 1 2 .3 ## 45 45 13.3 1 1 2 2 .3 ## 46 46 14.3 2 1 2 2 .3 ## 47 47 15.3 1 2 2 2 .3 ## 48 48 16.3 2 2 2 2 .3 ## 49 49 1.4 1 1 1 1 .4 ## 50 50 2.4 2 1 1 1 .4 ## 51 51 3.4 1 2 1 1 .4 ## 52 52 4.4 2 2 1 1 .4 ## 53 53 5.4 1 1 2 1 .4 ## 54 54 6.4 2 1 2 1 .4 ## 55 55 7.4 1 2 2 1 .4 ## 56 56 8.4 2 2 2 1 .4 ## 57 57 9.4 1 1 1 2 .4 ## 58 58 10.4 2 1 1 2 .4 ## 59 59 11.4 1 2 1 2 .4 ## 60 60 12.4 2 2 1 2 .4 ## 61 61 13.4 1 1 2 2 .4 ## 62 62 14.4 2 1 2 2 .4 ## 63 63 15.4 1 2 2 2 .4 ## 64 64 16.4 2 2 2 2 .4 ## 65 65 1.5 1 1 1 1 .5 ## 66 66 2.5 2 1 1 1 .5 ## 67 67 3.5 1 2 1 1 .5 ## 68 68 4.5 2 2 1 1 .5 ## 69 69 5.5 1 1 2 1 .5 ## 70 70 6.5 2 1 2 1 .5 ## 71 71 7.5 1 2 2 1 .5 ## 72 72 8.5 2 2 2 1 .5 ## 73 73 9.5 1 1 1 2 .5 ## 74 74 10.5 2 1 1 2 .5 ## 75 75 11.5 1 2 1 2 .5 ## 76 76 12.5 2 2 1 2 .5 ## 77 77 13.5 1 1 2 2 .5 ## 78 78 14.5 2 1 2 2 .5 ## 79 79 15.5 1 2 2 2 .5 ## 80 80 16.5 2 2 2 2 .5 ## 81 81 1.6 1 1 1 1 .6 ## 82 82 2.6 2 1 1 1 .6 ## 83 83 3.6 1 2 1 1 .6 ## 84 84 4.6 2 2 1 1 .6 ## 85 85 5.6 1 1 2 1 .6 ## 86 86 6.6 2 1 2 1 .6 ## 87 87 7.6 1 2 2 1 .6 ## 88 88 8.6 2 2 2 1 .6 ## 89 89 9.6 1 1 1 2 .6 ## 90 90 10.6 2 1 1 2 .6 ## 91 91 11.6 1 2 1 2 .6 ## 92 92 12.6 2 2 1 2 .6 ## 93 93 13.6 1 1 2 2 .6 ## 94 94 14.6 2 1 2 2 .6 ## 95 95 15.6 1 2 2 2 .6 ## 96 96 16.6 2 2 2 2 .6 ## 97 97 1.7 1 1 1 1 .7 ## 98 98 2.7 2 1 1 1 .7 ## 99 99 3.7 1 2 1 1 .7 ## 100 100 4.7 2 2 1 1 .7 ## 101 101 5.7 1 1 2 1 .7 ## 102 102 6.7 2 1 2 1 .7 ## 103 103 7.7 1 2 2 1 .7 ## 104 104 8.7 2 2 2 1 .7 ## 105 105 9.7 1 1 1 2 .7 ## 106 106 10.7 2 1 1 2 .7 ## 107 107 11.7 1 2 1 2 .7 ## 108 108 12.7 2 2 1 2 .7 ## 109 109 13.7 1 1 2 2 .7 ## 110 110 14.7 2 1 2 2 .7 ## 111 111 15.7 1 2 2 2 .7 ## 112 112 16.7 2 2 2 2 .7 ## class=design, type= full factorial ## NOTE: columns run.no and run.no.std.rp are annotation, ## not part of the data frame Add the response to the design. response <- c(10, 0, 4, 0, 0, 5, 6.5, 16.5, 4.5, 19.5, 15, 41.5, 8, 21.5, 0, 18, 18, 16.5, 6, 10, 0, 20.5, 18.5, 4.5, 18, 18, 16, 39, 4.5, 10.5, 0, 5, 14, 4.5, 1, 34, 18.5, 18, 7.5, 0, 14.5, 16, 8.5, 6.5, 6.5, 6.5, 0, 7, 12.5, 17.5, 14.5, 11, 19.5, 20, 6, 23.5, 10, 5.5, 0, 3.5, 10, 0, 4.5, 10, 19, 20.5, 12, 25.5, 16, 29.5, 0, 8, 0, 10, 0.5, 7, 13, 15.5, 1, 32.5, 16, 17.5, 14, 21.5, 15, 19, 10, 8, 17.5, 7, 9, 8.5, 41, 24, 4, 18.5, 18.5, 33, 5, 0, 11, 10, 0, 8, 6, 36, 3, 36, 14, 16, 6.5, 8 ) plan$response<-response Analyse the residuals from this experiment. Any indications of model inadequacy?. ourModel <- lm(formula = response ~ Length + Type + Break + Slope + Length * Type + Length * Break + Length * Slope + Type * Break + Type * Slope + Break * Slope + Length * Type * Break + Length * Type * Slope + Length * Break * Slope + Type * Break * Slope + Length * Type * Break * Slope, data = plan) anova(ourModel) ## Analysis of Variance Table ## Response: response ## Df Sum Sq Mean Sq F value Pr(>F)
## Length 1 917.1 917.15 10.5878 0.001572 **
## Type 1 388.1 388.15 4.4809 0.036862 *
## Break 1 145.1 145.15 1.6756 0.198615
## Slope 1 1.4 1.40 0.0161 0.899280
## Length:Type 1 218.7 218.68 2.5245 0.115377
## Length:Break 1 11.9 11.90 0.1373 0.711776
## Length:Slope 1 93.8 93.81 1.0829 0.300658
## Type:Break 1 115.0 115.02 1.3278 0.252054
## Type:Slope 1 56.4 56.43 0.6515 0.421588
## Break:Slope 1 1.6 1.63 0.0188 0.891271
## Length:Type:Break 1 7.3 7.25 0.0837 0.772939
## Length:Type:Slope 1 113.0 113.00 1.3045 0.256228
## Length:Break:Slope 1 39.5 39.48 0.4558 0.501207
## Type:Break:Slope 1 33.8 33.77 0.3899 0.533858
## Length:Type:Break:Slope 1 95.6 95.65 1.1042 0.295994
## Residuals 96 8315.8 86.62
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1

Plotting the residuals

plot(ourModel, 1) # residuals versus fitted values

plot(ourModel, 2) # QQ plot

Check assumptions statistically:

Residual Test:

a. Equality of variances – homogeneity

\(H_0\): variances are equal

\(H_1\): at least one variance is different

ncvTest(ourModel)

## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 18.09453, Df = 1, p = 2.102e-05

The corresponding p-value (\(0.000021\)) is less than \(\alpha = 0.05\). Therefore, we reject the null hypothesis which means the model does not have a constant variance or “homogeneity”.

b. Normality of Residual

\(H_0\): Errors are normally distributed

\(H_1\): Errors are not normally distributed

y = rstudent(ourModel)
shapiro.test(y)

## Shapiro-Wilk normality test
## data: y
## W = 0.97185, p-value = 0.0181

The p-Value \(0.0181\) is less than the significant level \(\alpha = 0.05\). So, we conclude that we reject the null hypothesis. Therefore, there is statistically significant evident that the stochastic component of this model is not normally distributed.

c. Auto Correlation Function:

\(H_0\): Errors are uncorrelated

\(H_1\): Errors are correlated

acf(ourModel$residuals)

The ACF plot confirms that there are only a few lags exceeded the horizontal dashed line. Therefore, it seems that we might need to reject the null hypothesis which means we have autocorrelation on residual model.

durbinWatsonTest(ourModel)

## lag Autocorrelation D-W Statistic p-value
## 1 0.137223 1.717472 0.186
## Alternative hypothesis: rho != 0

The p-Value \(0.19\) is much greater than the significant level \(\alpha = 0.05\). So, we fail to reject the null hypothesis. Therefore, there is no statistically significant evident that the residuals of this model have autocorrelation.

d. Randomness

\(H_0\): Errors are random

\(H_1\): Errors are not random

runs.test(rstudent(ourModel))

## Runs Test
## data: rstudent(ourModel)
## statistic = -3.0374, runs = 41, n1 = 56, n2 = 56, n = 112, p-value =
## 0.002386
## alternative hypothesis: nonrandomness

The corresponding p-value \(0.002386\) is less than the significant level \(\alpha = 0.05\). Therefore, we reject null hypothesis. which means the errors on this model are not random. The analysis of the residuals indicates that the assumptions are not satisfied.

Question 2: Fractional Factorial Designs

A replicated fractional factorial design is used to investigate the effect of five factors on the free height of leaf springs used in an automotive application. The factors are:

A = furnace temperature,
B = heating time,
C = transfer time,
D = hold down time, and
E = quench oil temperature
The collected data are as shown below

Input the data into the software.

Q2Data=read.csv(file=”Q2Data2.csv”,header = TRUE)
head(Q2Data)

## A B C D E FreeHeight Rep
## 1 -1 -1 -1 -1 -1 7.78 1
## 2 1 -1 -1 1 -1 8.15 1
## 3 -1 1 -1 1 -1 7.50 1
## 4 1 1 -1 -1 -1 7.59 1
## 5 -1 -1 1 1 -1 7.54 1
## 6 1 -1 1 -1 -1 7.69 1

DesignAndResponse<-Q2Data[1:16,1:6] head(DesignAndResponse) ## A B C D E FreeHeight ## 1 -1 -1 -1 -1 -1 7.78 ## 2 1 -1 -1 1 -1 8.15 ## 3 -1 1 -1 1 -1 7.50 ## 4 1 1 -1 -1 -1 7.59 ## 5 -1 -1 1 1 -1 7.54 ## 6 1 -1 1 -1 -1 7.69 We have 5 factors and 16 runs so that means this a half fraction so we are looking for just 1 defining relationship. aliases(lm(FreeHeight ~ (.)^2, data = DesignAndResponse), code=TRUE) ## $legend ## A=A B=B C=C D=D E=E ## $aliases ## A:B = C:D ## A:C = B:D ## A:D = B:C We can see that the the 2fi are fully aliased as shown above. We can see that D=ABC.To write down all the aliased structure we set up to 5 fi in R and we get: aliases(lm(FreeHeight ~ (.)^5, data = DesignAndR 程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com