Lab session 3: Solutions
RMIT University-MATH1302 23/03/2022
# List of packages required for this analysis
pkg <- c("mosaic", "ggplot2", "tidyverse", "car", "randtests", "stats", "dplyr", "agricolae","nlme","lm
Copyright By PowCoder代写 加微信 powcoder
e4″, “emmeans”, “crossdes”, “AlgDesign”, “DoE.base”)
# Check if packages are not installed and assign the
# names of the packages not installed to the variable new.pkg new.pkg <- pkg[!(pkg %in% installed.packages())]
# If there are any packages in the list that aren't installed, # install them
if (length(new.pkg)) {
install.packages(new.pkg, repos = "http://cran.rstudio.com")
# Load the packages into R library(mosaic) # favstats() library(ggplot2) library(tidyverse) # summarise() # library(car) #ncvTest()
# library(randtests) # runs.test() library(stats) # TukeyHSD() library(dplyr) # mutate() #library(agricolae) # LSD.test() library(nlme)
library(lme4) # lmer() library(emmeans) # emmeans() library(crossdes) # find.BIB() #library(multcomp) library(AlgDesign) # gen.factorial() library(DoE.base)
Question 1: Balance Incomplete Block Design
As an example, suppose that a chemical engineer thinks that the time of reaction for a chemical process is a function of the type of catalyst employed. Four catalysts are currently being investigated. The experimental procedure consists of selecting a batch of raw material, loading the pilot plant, applying each catalyst in a separate run of the pilot plant, and observing the reaction time. Because variations in the batches of raw material may affect the performance of the catalysts, the engineer decides to use batches of raw material as blocks. However, each batch is only large enough to permit three catalysts to be run. Therefore, a randomized incomplete block design must be used. The balanced incomplete block design for this experiment, along with the observations recorded, is shown in next Table. The order in which the catalysts are run in each block is randomized.
Treatment/ Block 1 2 3 4
Treatment/ Block
a) Input the data into the software.
75 67 72 73 75 68
Catalyst <- data.frame(Treatment = c(1,1,1,
head(Catalyst)
## Treatment Batch Time
## 1 1 1 73
## 2 1 2 74
## 3 1 4 71
## 4 2 2 75
## 5 2 3 67
## 6 2 4 72
Batch = c(1,2,4,
Time = c(73, 74, 71,
75, 67, 72,
73, 75, 68,
75, 72, 75))
Export the data to the desktop
Display the structure of the data.
b) How many times each pair of treatments appears in the same block?
write.csv(Catalyst,"Catalyst.csv", row.names = TRUE) # to Export DataFrame as a CSV file from RStudio t o desktop
str(Catalyst)
## 'data.frame': 12 obs. of 3 variables:
## $ Treatment: num
## $ Batch : num
## $ Time : num
1 1 1 2 2 2 3 3 3 4 ...
1 2 4 2 3 4 1 2 3 1 ...
73 74 71 75 67 72 73 75 68 75 ...
a=4 # treatments,
b=4 # blocks,
k=3 # each block contains k treatments,
r=3 # each treatment occurs r times in the design (or is replicated r times),
lambda<- r*(k-1)/(a-1) # the number of times each pair of treatments appears in the same block lambda
c) Test if thereis a difference between the four treatments (Catalysts)? draw your conclusions using α = 5%. 1. we should treat the Treatment and Batch variables as factors.
Catalyst <- Catalyst %>% mutate(Treatment = factor(Treatment),
Batch = factor(Batch))
str(Catalyst)
## ‘data.frame’: 12 obs. of 3 variables:
## $ Treatment: Factor w/ 4 levels “1”,”2″,”3″,”4″: 1 1 1 2 2 2 3 3 3 4 …
## $ Batch : Factor w/ 4 levels “1”,”2″,”3″,”4″: 1 2 4 2 3 4 1 2 3 1 …
## $ Time : num 73 74 71 75 67 72 73 75 68 75 …
2. state your hypothesis testing for catalysts: .
where are the treatment effects of the three different lubricating oils. 3. we might looks into the Descriptive Statistics:
Compute the mean and the SD (standard deviation) of the reaction time by different Ingredients:
Catalyst %>%
group_by(Treatment) %>%
summarise(n = n(),
mean = mean(Time, na.rm = TRUE),
sd = sd(Time, na.rm = TRUE),
stderr = sd/sqrt(n),
LCL = mean – qt(1 – (0.05 / 2), n – 1) * stderr,
UCL = mean + qt(1 – (0.05 / 2), n – 1) * stderr,
median = median(Time, na.rm = TRUE),
min = min(Time, na.rm = TRUE),
max = max(Time, na.rm = TRUE),
IQR = IQR(Time, na.rm = TRUE))
## # A tibble: 4 x 11
## Treatment n mean sd stderr LCL UCL median min max IQR
##
3 72.7 1.53
3 71.3 4.04
3 72 3.61
3 74 1.73 1 69.7 78.3 75 72 75 1.5
0.882 68.9 76.5
2.33 61.3 81.4
2.08 63.0 81.0
Also, you can graphically check if there is a difference between the distribution using Box plot. Note that you include the jittered dot.
0 ≠ iτenotsaelta : aH 0= 4τ= 3τ= 2τ= 1τ: 0H
Apply BIBD design into the linear regression model.
# When Treatments (adjusted for blocks)
model <- lm(formula = Time ~ Batch + Treatment, data=Catalyst)
Treatments_adj<-anova(model)
Treatments_adj
## Analysis of Variance Table
## Response: Time
## Df Sum Sq Mean Sq F value Pr(>F)
## Batch 3 55.00 18.3333 28.205 0.001468 **
## Treatment 3 22.75 7.5833 11.667 0.010739 *
## Residuals 5 3.25 0.6500
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
ggplot(Catalyst, aes(x = Treatment, y = Time, fill = Treatment)) +
geom_boxplot() +
geom_jitter(shape = 15,
color = “steelblue”,
position = position_jitter(0.21)) +
theme_classic() +
labs( title= “Comparative Box Plot”)
## Analysis of Variance Table
## Response: Time
## Df Sum Sq Mean Sq F value Pr(>F)
## Treatment 3 11.667 3.8889 5.9829 0.0414634 *
## Batch 3 66.083 22.0278 33.8889 0.0009528 ***
## Residuals 5 3.250 0.6500
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
Did you notice anything?
One of the challenges with our previous analysis is that lm() ran the analysis and considered our block effect as a fixed effect. lm() uses ordinary least squares as the methodology for calculating the ANOVA table. In other words, we can easily calculate all the SS, MS, F, by hand if we wanted to. When adding a random effect into our models we need to take a slightly different approach. This is the first change in the methodology of our ANOVAs – moving to a REML or Restricted Maximum Likelihood methodology. This is where the lmer() package comes into play.
Let’s start by reviewing our statistical model above. Recognizing that we need to find a way to let R know that the block effect is a random effect in our model. Let’s create a new object called rmodel with our new model:
Apply BIBD design into the mixed model.
Notice how we set out the random effect of block in this model – (1|block).
to see the ANOVA table from the LMER package we use the anova() function
Here, we might use two different approach in order to conclude on the output. a. F-test.
rmodel <- lmer(Time ~ (1|Batch) + Treatment, data=Catalyst)
anova(rmodel)
## Analysis of Variance Table
## npar Sum Sq Mean Sq F value
## Treatment 3 22.247 7.4158 11.409
is rejected if
, We assumed
qf(p=.05, df1=3, df2=5, lower.tail=FALSE) # To find F critical value
## [1] 5.409451
# # When Block (adjusted for Treatments)
model_1 <- lm(formula = Time ~ Treatment + Batch, data=Catalyst)
Block_adj<-anova(model_1)
)1+b−a−N(,1−a,αF ≥ 0F 154904.5 = 5,3,50.0F = )1+b−a−N(,1−a,αF
50.0 = α 904.11 = seRSM/RSM = 0F )1+b−a−N(,1−a,αF > 0F 0H
As , we can reject the null hypothesis which means that at leat one of the treatment effects is significant.
b. p-value.
To calculate the p-value you need to add an extra piece of syntax
As the ANOVA table shown, the p-value of treatment is .So, we can reject the null hypothesis which mean at leat one of the treatment effects is significant.
d) Obtain the estimates of treatment means (i.e., the adjusted means or the least square means) and the standard error of the difference between two treatment mean estimates (i.e. the standard error of τˆi − τˆj ).
the estimates of treatment means
1-pf(11.409,3,5)
## [1] 0.0112646
emmeans(rmodel, ~ Treatment)
## Treatment emmean SE df lower.CL upper.CL
71.4 1.5 3.51
71.6 1.5 3.51
72.0 1.5 3.51
75.0 1.5 3.51
67.0 75.8
67.2 76.0
67.6 76.4
70.6 79.4
## Degrees-of-freedom method: kenward-roger
## Confidence level used: 0.95
the standard error of the difference between two treatment mean estimates
# The standard error of the difference between two treatment mean estimates is
S= sqrt((2*k*Treatments_adj$`Mean Sq`[3])/(lambda*a))
## [1] 0.698212
e) Compare all pairs of treatment means by using the Tukey tests.
Tukey2<-aov(formula = Time ~ Batch + Treatment, data=Catalyst)
Tuk3<-TukeyHSD(Tukey2)
Tuk3$Treatment
## diff lwr upr p adj
## 2-1 0.2222222 -2.2067758 2.651220 0.98524773
## 3-1 0.5555556 -1.8734425 2.984554 0.83239470
## 4-1 3.2222222 0.7932242 5.651220 0.01658273
## 3-2 0.3333333 -2.0956647 2.762331 0.95407666
## 4-2 3.0000000 0.5710020 5.428998 0.02220124
## 4-3 2.6666667 0.2376686 5.095665 0.03526978
)ESMa×λ(√ = ES k×2
)1+b−a−N(,1−a,αF ≥ 0F
We are not interested in the post hoc analysis for the blocking factors. The Tukey method shows that there is significantly difference between the levels of the Ingredients (treatment). The ingredient is significantly different from any of the other ingredients , , . Ingredient is is not significantly different from the ingredients and . Ingredient is not significantly different from and . See plot below.
#par(mfrow=c(3,1))
plot(Tuk3)
31 232 1321
f) Generate a block design with five treatments in four blocks of three elements. Generate the design in a Table and test if this design is a Balanced Incomplete Block Design (BIBD)?
my.design_1<-find.BIB(5, 4, 3)
my.design_1
Generate the design in Table:
Treatment/ Block 1 2 3 4
This design is not a BIBD because the treatments are not all repeated the same number of times in the design and we can check this with the ‘isGYD’ function. For this example:
g) Generate a block design with five treatments in seven blocks of three elements. Generate the design in a Table and test if this design is a Balanced Incomplete Block Design (BIBD)?
isGYD(my.design_1)
## [1] The design is neither balanced w.r.t. rows nor w.r.t. columns.
my.design_2 <-find.BIB(7, 7, 3)
my.design_2
## [,1] [,2] [,3]
## [1,] 1 4 5
## [2,] 1 2 3
## [3,] 1 6 7
## [4,] 3 4 7
## [5,] 2 4 6
## [6,] 3 5 6
## [7,] 2 5 7
Generate the design in Table: HW
Question 2: Full Factorial Designs a) Generate a full factorial design with two variables each with three levels.
isGYD(my.design_2)
## [1] The design is a balanced incomplete block design w.r.t. rows.
## [,1] [,2] [,3]
## [1,] 1 3 4
## [2,] 1 2 5
## [3,] 2 3 4
## [4,] 3 4 5
## X1 X2
## 1 -1 -1
## 2 0 -1
## 3 1 -1
## 4 -1 0
## 5 0 0
## 6 1 0
## 7 -1 1
## 8 0 1
## 9 1 1
gen.factorial(3,3,factors="all")
## X1 X2 X3
## 1 1 1 1
## 2 2 1 1
## 3 3 1 1
## 4 1 2 1
## 5 2 2 1
## 6 3 2 1
## 7 1 3 1
## 8 2 3 1
## 9 3 3 1
## 10 1 1 2
## 11 2 1 2
## 12 3 1 2
## 13 1 2 2
## 14 2 2 2
## 15 3 2 2
## 16 1 3 2
## 17 2 3 2
## 18 3 3 2
## 19 1 1 3
## 20 2 1 3
## 21 3 1 3
## 22 1 2 3
## 23 2 2 3
## 24 3 2 3
## 25 1 3 3
## 26 2 3 3
## 27 3 3 3
b) Generate a full factorial design with three variables, two with three levels and one with two levels. Name the variables as A, B and C, respectively
gen.factorial(c(3,3,2), varNames=c("A","B","C"))
gen.factorial(c(3,3))
c) Generate a full factorial design with their variables, one with two levels, one with three levels and one with six levels. Name the variables as A, B and C respectively
#blocking designs
fac.design(nlevels=c(2,3,6), factor.names = c("A","B", "C"), blocks=6, seed=12345)
## Warning in fac.design(nlevels = c(2, 3, 6), factor.names = c("A", "B", "C"), :
## confounding of blocks with 2-factor interactions
## Warning in fac.design(nlevels = c(2, 3, 6), factor.names = c("A", "B", "C"), :
## confounding of blocks with 2-factor interactions
## creating full factorial with 36 runs ...
## 5 0 0 -1
## 6 1 0 -1
## 7 -1 1 -1
## 8 0 1 -1
## 9 1 1 -1
## 10 -1 -1 1
## 11 0 -1 1
## 12 1 -1 1
## 13 -1 0 1
## 14 0 0 1
## 15 1 0 1
## 16 -1 1 1
## 17 0 1 1
## 18 1 1 1
d) Generate a full factorial design with two variables each with three levels, but this time with 4 replications. Name the variables as Matype and Temp, respectively
fac.design(nlevels=c(3,3), factor.names = c("Matype","Temp"),
replications=4, block.name="Replications")
## creating full factorial with 9 runs ...
## run.no run.no.std.rp Blocks A B C
## 36 36
## class=design, type= full factorial.blocked
## NOTE: columns run.no and run.no.std.rp are annotation,
## not part of the data frame
1 34.1.6
2 24.1.4
3 8.1.2
4 17.1.3
5 1.1.1
6 27.1.5
1 22 6 1 23 4 1 21 2 1 13 3 1 11 1 1 12 5
run.no run.no.std.rp Blocks A B C
7 36.2.6
8 20.2.4
9 29.2.5
10 10.2.2
11 13.2.3
12 3.2.1
223 6 221 4 213 5 222 2 211 3 212 1
run.no run.no.std.rp Blocks A B C
13 25.3.5
14 32.3.6
15 5.3.1
16 22.3.4
17 15.3.3
18 12.3.2
311 5 321 6 313 1 322 4 312 3 323 2
run.no run.no.std.rp Blocks A B C
19 33.4.6
20 7.4.2
21 28.4.5
22 18.4.3
23 23.4.4
24 2.4.1
412 6 411 2 422 5 423 3 413 4 421 1
run.no run.no.std.rp Blocks A B C
25 14.5.3
26 30.5.5
27 9.5.2
28 4.5.1
29 19.5.4
30 35.5.6
521 3 523 5 512 2 522 1 511 4 513 6
run.no run.no.std.rp Blocks A B C
16.6.3 622 3 31.6.6 611 6 26.6.5 621 5
6.6.1 623 1 21.6.4 612 4 11.6.2 613 2
e) As an example of a factorial design involving two factors, an engineer is designing a battery for use in a device that will be subjected to some extreme variations in temperature. The only design parameter that he can select at this point is the plate material for the battery, and he has three possible choices. When the device is manufactured and is shipped to the field, the engineer has no control over the temperature extremes that the device will encounter, and he knows from experience that temperature will probably affect the effective battery life. However, temperature can be controlled in the product development laboratory for the purposes of a test.
## run.no run.no.std.rp Matype Temp Replications
## 1 1 3.1 3 1 .1
## 2 2 9.1 3 3 .1
## 3 3 4.1 1 2 .1
## 4 4 8.1 2 3 .1
## 5 5 5.1 2 2 .1
## 6 6 2.1 2 1 .1
## 7 7 1.1 1 1 .1
## 8 8 6.1 3 2 .1
## 9 9 7.1 1 3 .1
## 36 36
## class=design, type= full factorial
## NOTE: columns run.no and run.no.std.rp are annotation,
## not part of the data frame
9.233 .2 6.232 .2 7.213 .2 5.222 .2 3.231 .2 8.223 .2 2.221 .2 1.211 .2 4.212 .2 2.321 .3 6.332 .3 7.313 .3 4.312 .3 9.333 .3 8.323 .3 5.322 .3 3.331 .3 1.311 .3 3.431 .4 5.422 .4 9.433 .4 2.421 .4 7.413 .4 1.411 .4 4.412 .4 6.432 .4 8.423 .4
Life<-read.csv("Life.csv")
What effects do material type and temperature have on the life of the battery?
1. we should treat the Treatment and Batch variables as factors.
Life <- Life[-1] %>% mutate(Matype = factor(Matype),
Temp = factor(Temp))
## ‘data.frame’: 36 obs. of 3 variables:
## $ Matype: Factor w/ 3 levels “1”,”2″,”3″: 1 1 1 1 2 2 2 2 3 3 …
## $ Temp : Factor w/ 3 levels “15”,”70″,”125″: 1 1 1 1 1 1 1 1 1 1 …
## $ Life : int 74 130 155 180 126 150 159 188 110 138 …
2. in the two-factor factorial, both row and column factors (or treatments), A and B, are of equal interest. Specifically, we are interested in testing hypotheses about the equality of row treatment effects, say
where are the row treatment effects.
And the equality of column treatment effects, say
where are the column treatment effects.
We are also interested in determining whether row and column treatments interact. Thus, we also wish to test
Apply Full factorial design into the linear regression model.
model <- lm(formula = Life ~ Matype + Temp + Temp*Matype, data=Life)
anova(model)
0 ≠ ji)βτ( eno tsael ta : aH 0 = ji)βτ( : 0H
0 ≠ iβ eno tsael ta : aH 0= 3β= 2β= 1β: 0H
0 ≠ iτenotsaelta : aH 0= 3τ= 2τ= 1τ: 0H
Here, we might use two different approach in order to conclude on the output. Let’s do for intraction:
a. F-test.
is rejected if
interaction between material types and temperature. Furthermore, , so the main effects of material type and temperature are also significant.
b. p-value.
As the ANOVA table shown, the p-value of all parameters are < .So, we should reject the null hypothesis which mean all the parameters are significant.
Is there a choice of material that would give uniformly long life regardless of temperature?
, We assumed
qf(p=.05, df1=4, df2=27, lower.tail=FALSE) # To find F critical value
## [1] 2.727765
, we can reject the null hypothesis which means that we conclude that there is a significant
group_by(Temp, Matype) %>%
summarize(average_life = mean(Life), .groups = ‘drop’) %>%
ggplot(aes(Temp, average_life, color = Matype, group = Matype)) +
geom_point() + geom_line()
## Analysis of Variance Table
## Response: Life
## Matype:Temp 4 9438 2359.4 3.5429 0.018973 *
## Residuals 27 17981 666.0
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
Df Sum Sq Mean Sq F value Pr(>F)
2 10633 5316.6 7.9834 0.001888 **
2 39083 19541.6 29.3438 1.694e-07 ***
131453.3 = 72,2,50.0F
)1+b−a−N(,1−a,αF ≥ 0F ))1−n(ba(,)1−b()1−a(,αF ≥ 0F 567727.2 = 72,4,50.0F = ))1−n(ba(,)1−b()1−a(,αF
50.0 = α 9245.3 = seRSM/RSM = 0F ))1−n(ba(,)1−b()1−a(,αF > 0F 0H
In general, longer life is attained at low temperature, regardless of material type. Changing from low to intermediate temperature, battery life with material type 3 may actually increase, whereas it decreases for types 1 and 2. From intermediate to high temperature, battery life decreases for material types 2 and 3 and is essentially unchanged for type 1. Material type 3 seems to give the best results if we want less loss of effective life as the temperature changes.
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com