title: ‘Lab session 3: Solutions’
author: “RMIT University-MATH1302”
date: “23/03/2022”
output: html_document
Copyright By PowCoder代写 加微信 powcoder
“`{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
**Library**
“`{r message=FALSE, warning=FALSE}
# List of packages required for this analysis
pkg <- c("mosaic", "ggplot2", "tidyverse", "car", "randtests", "stats", "dplyr", "agricolae","nlme","lme4", "emmeans", "crossdes", "AlgDesign", "DoE.base")
# Check if packages are not installed and assign the
# names of the packages not installed to the variable new.pkg
new.pkg <- pkg[!(pkg %in% installed.packages())]
# If there are any packages in the list that aren't installed,
# install them
if (length(new.pkg)) {
install.packages(new.pkg, repos = "http://cran.rstudio.com")
# Load the packages into R
library(mosaic) # favstats()
library(ggplot2)
library(tidyverse) # summarise()
# library(car) #ncvTest()
# library(randtests) # runs.test()
library(stats) # TukeyHSD()
library(dplyr) # mutate()
#library(agricolae) # LSD.test()
library(nlme)
library(lme4) # lmer()
library(emmeans) # emmeans()
library(crossdes) # find.BIB()
#library(multcomp)
library(AlgDesign) # gen.factorial()
library(DoE.base)
# Question 1: Balance Incomplete Block Design (BIBD)
##### As an example, suppose that a chemical engineer thinks that the time of reaction for a chemical process is a function of the type of catalyst employed. Four catalysts are currently being investigated. The experimental procedure consists of selecting a batch of raw material, loading the pilot plant, applying each catalyst in a separate run of the pilot plant, and observing the reaction time. Because variations in the batches of raw material may affect the performance of the catalysts, the engineer decides to use batches of raw material as blocks. However, each batch is only large enough to permit three catalysts to be run. Therefore, a randomized incomplete block design must be used. The balanced incomplete block design for this experiment, along with the observations recorded, is shown in next Table. The order in which the catalysts are run in each block is randomized.
| Treatment/ Block | 1 | 2 | 3 | 4 |
|------------------|----|----|----|----|
| 1 | 73 | 74 | | 71 |
| 2 | | 75 | 67 | 72 |
| 3 | 73 | 75 | 68 | |
| 4 | 75 | | 72 | 75 |
##### a) Input the data into the software.
Catalyst <- data.frame(Treatment = c(1,1,1,
Batch = c(1,2,4,
Time = c(73, 74, 71,
75, 67, 72,
73, 75, 68,
75, 72, 75))
head(Catalyst)
**Export the data to the desktop**
```{r eval=FALSE}
write.csv(Catalyst,"Catalyst.csv", row.names = TRUE) # to Export DataFrame as a CSV file from RStudio to desktop
**Display the structure of the data.**
str(Catalyst)
##### b) How many times each pair of treatments appears in the same block?
a=4 # treatments,
b=4 # blocks,
k=3 # each block contains k treatments,
r=3 # each treatment occurs r times in the design (or is replicated r times),
lambda<- r*(k-1)/(a-1) # the number of times each pair of treatments appears in the same block
##### c) Test if thereis a difference between the four treatments (Catalysts)? draw your conclusions using α = 5%.
1) we should treat the Treatment and Batch variables as factors.
Catalyst <- Catalyst %>% mutate(Treatment = factor(Treatment),
Batch = factor(Batch))
str(Catalyst)
2) state your hypothesis testing for catalysts:
$H_0: \tau_1= \tau_2= \tau_3= \tau_4= 0$.
$H_a: at \hspace{0.1cm} least \hspace{0.1cm} one \hspace{0.1cm} \tau_i \neq 0$
where $\tau_i$ are the treatment effects of the three different lubricating oils.
3) we might looks into the Descriptive Statistics:
Compute the mean and the SD (standard deviation) of the reaction time by different Ingredients:
“`{r echo=TRUE}
Catalyst %>%
group_by(Treatment) %>%
summarise(n = n(),
mean = mean(Time, na.rm = TRUE),
sd = sd(Time, na.rm = TRUE),
stderr = sd/sqrt(n),
LCL = mean – qt(1 – (0.05 / 2), n – 1) * stderr,
UCL = mean + qt(1 – (0.05 / 2), n – 1) * stderr,
median = median(Time, na.rm = TRUE),
min = min(Time, na.rm = TRUE),
max = max(Time, na.rm = TRUE),
IQR = IQR(Time, na.rm = TRUE))
Also, you can graphically check if there is a difference between the distribution using Box plot. Note that you include the jittered dot.
“`{r echo=TRUE}
ggplot(Catalyst, aes(x = Treatment, y = Time, fill = Treatment)) +
geom_boxplot() +
geom_jitter(shape = 15,
color = “steelblue”,
position = position_jitter(0.21)) +
theme_classic() +
labs( title= “Comparative Box Plot”)
##### Apply BIBD design into the linear regression model.
# When Treatments (adjusted for blocks)
model <- lm(formula = Time ~ Batch + Treatment, data=Catalyst)
Treatments_adj<-anova(model)
Treatments_adj
# # When Block (adjusted for Treatments)
model_1 <- lm(formula = Time ~ Treatment + Batch, data=Catalyst)
Block_adj<-anova(model_1)
**Did you notice anything?**
One of the challenges with our previous analysis is that lm() ran the analysis and considered our block effect as a fixed effect. lm() uses ordinary least squares as the methodology for calculating the ANOVA table. In other words, we can easily calculate all the SS, MS, F, by hand if we wanted to. When adding a random effect into our models we need to take a slightly different approach. This is the first change in the methodology of our ANOVAs – moving to a REML or Restricted Maximum Likelihood methodology. This is where the lmer() package comes into play.
Let’s start by reviewing our statistical model above. Recognizing that we need to find a way to let R know that the block effect is a random effect in our model. Let’s create a new object called rmodel with our new model:
##### Apply BIBD design into the mixed model.
rmodel <- lmer(Time ~ (1|Batch) + Treatment, data=Catalyst)
Notice how we set out the random effect of block in this model – (1|block).
***to see the ANOVA table from the LMER package we use the anova() function***
anova(rmodel)
Here, we might use two different approach in order to conclude on the output.
a) F-test.
$H_0$ is rejected if $F_0 > F_{\alpha,a-1,(N-a-b+1)}$
$⸪ \hspace{0.2cm} F_0 = MS_R/ MS_Res = 11.409$, We assumed $\alpha = 0.05$
qf(p=.05, df1=3, df2=5, lower.tail=FALSE) # To find F critical value
$⸪ \hspace{0.2cm} F_{\alpha,a-1,(N-a-b+1)} = F_{0.05,3,5} = 5.409451$
$⸫ \hspace{0.2cm} F_0 \geq F_{\alpha,a-1,(N-a-b+1)}$
As $F_0 \geq F_{\alpha,a-1,(N-a-b+1)}$, we can reject the null hypothesis which means that at leat one of the treatment effects is significant.
b) p-value.
***To calculate the p-value you need to add an extra piece of syntax***
1-pf(11.409,3,5)
As the ANOVA table shown, the p-value of treatment is $0.0112646$.So, we can reject the null hypothesis which mean at leat one of the treatment effects is significant.
##### d) Obtain the estimates of treatment means (i.e., the adjusted means or the least square means) and the standard error of the difference between two treatment mean estimates (i.e. the standard error of τˆi − τˆj ).
***the estimates of treatment means***
emmeans(rmodel, ~ Treatment)
***the standard error of the difference between two treatment mean estimates***
$SE=\sqrt(\frac{2 \times k}{\lambda \times a} MS_E)$
# The standard error of the difference between two treatment mean estimates is
S= sqrt((2*k*Treatments_adj$`Mean Sq`[3])/(lambda*a))
###### e) Compare all pairs of treatment means by using the Tukey tests.
Tukey2<-aov(formula = Time ~ Batch + Treatment, data=Catalyst)
Tuk3<-TukeyHSD(Tukey2)
Tuk3$Treatment
We are not interested in the post hoc analysis for the blocking factors. The Tukey method shows that there is significantly difference between the levels of the Ingredients (treatment). The ingredient $4$ is significantly different from any of the other ingredients $1$, $2$, $3$. Ingredient $1$ is is not significantly different from the ingredients $2$ and $3$. Ingredient $2$ is not significantly different from $1$ and $3$. See plot below.
#par(mfrow=c(3,1))
plot(Tuk3)
##### f) Generate a block design with five treatments in four blocks of three elements. Generate the design in a Table and test if this design is a Balanced Incomplete Block Design (BIBD)?
my.design_1<-find.BIB(5, 4, 3)
my.design_1
Generate the design in Table:
| Treatment/ Block | 1 | 2 | 3 | 4 |
|------------------|----|----|----|----|
| 1 | # | | # | |
| 2 | | | # | # |
| 3 | # | # | | # |
| 4 | | # | # | |
| 5 | # | # | | # |
This design is not a BIBD because the treatments are not all repeated the same number of times in the design and we can check this with the 'isGYD' function. For this example:
isGYD(my.design_1)
##### g) Generate a block design with five treatments in seven blocks of three elements. Generate the design in a Table and test if this design is a Balanced Incomplete Block Design (BIBD)?
my.design_2 <-find.BIB(7, 7, 3)
my.design_2
Generate the design in Table: **HW**
isGYD(my.design_2)
-------------------------------------------------------------------------
# Question 2: Full Factorial Designs
##### a) Generate a full factorial design with two variables each with three levels.
gen.factorial(c(3,3))
gen.factorial(3,3,factors="all")
##### b) Generate a full factorial design with three variables, two with three levels and one with two levels. Name the variables as A, B and C, respectively
gen.factorial(c(3,3,2), varNames=c("A","B","C"))
##### c) Generate a full factorial design with their variables, one with two levels, one with three levels and one with six levels. Name the variables as A, B and C respectively
#blocking designs
fac.design(nlevels=c(2,3,6), factor.names = c("A","B", "C"), blocks=6, seed=12345)
##### d) Generate a full factorial design with two variables each with three levels, but this time with 4 replications. Name the variables as Matype and Temp, respectively
fac.design(nlevels=c(3,3), factor.names = c("Matype","Temp"),
replications=4, block.name="Replications")
##### e) As an example of a factorial design involving two factors, an engineer is designing a battery for use in a device that will be subjected to some extreme variations in temperature. The only design parameter that he can select at this point is the plate material for the battery, and he has three possible choices. When the device is manufactured and is shipped to the field, the engineer has no control over the temperature extremes that the device will encounter, and he knows from experience that temperature will probably affect the effective battery life. However, temperature can be controlled in the product development laboratory for the purposes of a test.
![](Exp.jpg)
Life<-read.csv("Life.csv")
- **What effects do material type and temperature have on the life of the battery?**
1) we should treat the Treatment and Batch variables as factors.
Life <- Life[-1] %>% mutate(Matype = factor(Matype),
Temp = factor(Temp))
2) in the two-factor factorial, both row and column factors (or treatments), A and B, are of equal interest. Specifically, we are interested in testing hypotheses about the equality of row treatment effects, say
$H_0: \tau_1= \tau_2= \tau_3= 0$.
$H_a: at \hspace{0.1cm} least \hspace{0.1cm} one \hspace{0.1cm} \tau_i \neq 0$
where $\tau_i$ are the row treatment effects.
And the equality of column treatment effects, say
$H_0: \beta_1= \beta_2= \beta_3= 0$.
$H_a: at \hspace{0.1cm} least \hspace{0.1cm} one \hspace{0.1cm} \beta_i \neq 0$
where $\beta_i$ are the column treatment effects.
We are also interested in determining whether row and column treatments interact. Thus, we also wish to test
$H_0: (\tau \beta)ij= 0$.
$H_a: at \hspace{0.1cm} least \hspace{0.1cm} one \hspace{0.1cm} (\tau \beta)ij \neq 0$
##### Apply Full factorial design into the linear regression model.
model <- lm(formula = Life ~ Matype + Temp + Temp*Matype, data=Life)
anova(model)
Here, we might use two different approach in order to conclude on the output.
Let's do for intraction:
a) F-test.
$H_0$ is rejected if $F_0 > F_{\alpha,(a-1)(b-1),(ab(n – 1))}$
$⸪ \hspace{0.2cm} F_0 = MS_R/ MS_Res = 3.5429$, We assumed $\alpha = 0.05$
qf(p=.05, df1=4, df2=27, lower.tail=FALSE) # To find F critical value
$⸪ \hspace{0.2cm} F_{\alpha,(a-1)(b-1),(ab(n – 1))} = F_{0.05,4,27} = 2.727765$
$⸫ \hspace{0.2cm} F_0 \geq F_{\alpha,(a-1)(b-1),(ab(n – 1))}$
As $F_0 \geq F_{\alpha,a-1,(N-a-b+1)}$, we can reject the null hypothesis which means that we conclude that there is a significant interaction between material types and temperature. Furthermore, $F_{0.05,2,27}=3.354131$, so the main effects of material type and temperature are also significant.
b) p-value.
As the ANOVA table shown, the p-value of all parameters are < $0.0.05$.So, we should reject the null hypothesis which mean all the parameters are significant.
- **Is there a choice of material that would give uniformly long life regardless of temperature?**
group_by(Temp, Matype) %>%
summarize(average_life = mean(Life), .groups = ‘drop’) %>%
ggplot(aes(Temp, average_life, color = Matype, group = Matype)) +
geom_point() + geom_line()
In general, longer life is attained at low temperature, regardless of material type. Changing from low to intermediate temperature, battery life with material type 3 may actually increase, whereas it decreases for types 1 and 2. From intermediate to high temperature, battery life decreases for material types 2 and 3 and is essentially unchanged for type 1. Material type 3 seems to give the best results if we want less loss of effective life as the temperature changes.
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com