Mathematics and Statistics –
Design and Analysis of Experiments
Week 3 – Experiments with a Single Factor: The Analysis of Variance
What If There Are More Than Two Factor Levels?
• The t-test does not directly apply
• There are lots of practical situations where there are either more than two levels of interest, or there are several factors of simultaneous interest
• The analysis of variance (ANOVA) is the appropriate analysis “engine” for these types of experiments
• The ANOVA was developed by Fisher in the early 1920s, and initially applied to agricultural experiments
• Used extensively today for industrial experiments
An Example (See pg. 66)
• An engineer is interested in investigating the relationship between the RF power setting and the etch rate for this tool. The objective of an experiment like this is to model the relationship between etch rate and RF power, and to specify the power setting that will give a desired target etch rate.
• The response variable is etch rate.
• She is interested in a particular gas (C2F6) and gap (0.80 cm), and wants to test four levels of RF power: 160W, 180W, 200W, and 220W. She decided to test five wafers at each level of RF power.
• The experimenter chooses 4 levels of RF power 160W, 180W, 200W, and 220W
• The experiment is replicated 5 times – runs made in random order
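The randomization step above can be sketched in a few lines of Python. This is only an illustration: the helper name `randomized_run_order` and the seed value are assumptions, not part of the original example.

```python
import random

def randomized_run_order(levels, replicates, seed=None):
    """Return (run_number, level) pairs for a completely randomized design."""
    rng = random.Random(seed)
    # One entry per wafer: every level appears `replicates` times.
    runs = [level for level in levels for _ in range(replicates)]
    rng.shuffle(runs)  # random order of the 20 runs
    return list(enumerate(runs, start=1))

rf_power_levels = [160, 180, 200, 220]  # watts, as in the example
plan = randomized_run_order(rf_power_levels, replicates=5, seed=1)  # seed is arbitrary

for run, power in plan:
    print(f"run {run:2d}: {power} W")
```

Fixing the seed only makes the plan reproducible; in practice any source of randomness serves the same purpose.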
An Example (See pg. 66)
• Does changing the power change the mean etch rate?
• Is there an optimum level for power?
• We would like to have an objective way to answer these questions
• The t-test really doesn’t apply here – more than two factor levels
The Analysis of Variance (Sec. 3.2, pg. 68)
• In general, there will be a levels of the factor, or a treatments, and n replicates of the experiment, run in random order…a completely randomized design (CRD)
• N = a·n total runs
• We consider the fixed effects case…the random effects case will be
discussed later
• Objective is to test hypotheses about the equality of the a treatment means
The Analysis of Variance
• The name “analysis of variance” stems from a partitioning of the total variability in the response variable into components that are consistent with a model for the experiment
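• The basic single-factor ANOVA model is
$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad i = 1, 2, \ldots, a, \quad j = 1, 2, \ldots, n$$
where $\mu$ = an overall mean, $\tau_i$ = ith treatment effect, $\varepsilon_{ij}$ = experimental error, NID(0, $\sigma^2$)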
Models for the Data
There are several ways to write a model for the data:
$y_{ij} = \mu + \tau_i + \varepsilon_{ij}$ is called the effects model
Let $\mu_i = \mu + \tau_i$, then
$y_{ij} = \mu_i + \varepsilon_{ij}$ is called the means model
Regression models can also be employed
The Analysis of Variance
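• Total variability is measured by the total sum of squares:
$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{..})^2$$
• The basic ANOVA partitioning is:
$$\sum_{i=1}^{a}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{..})^2 = \sum_{i=1}^{a}\sum_{j=1}^{n}\left[(\bar{y}_{i.} - \bar{y}_{..}) + (y_{ij} - \bar{y}_{i.})\right]^2 = n\sum_{i=1}^{a}(\bar{y}_{i.} - \bar{y}_{..})^2 + \sum_{i=1}^{a}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{i.})^2$$
$$SS_T = SS_{Treatments} + SS_E$$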
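A short sketch of why the total sum of squares splits into exactly two pieces: expanding the square produces a cross-product term, and that term vanishes because deviations from a treatment mean sum to zero within each treatment. The notation follows the model above ($a$ treatments, $n$ replicates).

```latex
\begin{aligned}
\sum_{i=1}^{a}\sum_{j=1}^{n}(y_{ij}-\bar{y}_{..})^2
  &= \sum_{i=1}^{a}\sum_{j=1}^{n}\bigl[(\bar{y}_{i.}-\bar{y}_{..})+(y_{ij}-\bar{y}_{i.})\bigr]^2 \\
  &= n\sum_{i=1}^{a}(\bar{y}_{i.}-\bar{y}_{..})^2
     + \sum_{i=1}^{a}\sum_{j=1}^{n}(y_{ij}-\bar{y}_{i.})^2
     + 2\sum_{i=1}^{a}(\bar{y}_{i.}-\bar{y}_{..})\sum_{j=1}^{n}(y_{ij}-\bar{y}_{i.}), \\
\sum_{j=1}^{n}(y_{ij}-\bar{y}_{i.}) &= y_{i.} - n\,\bar{y}_{i.} = 0
  \quad\text{for every } i .
\end{aligned}
```

Here $y_{i.}$ denotes the total of the observations under treatment $i$, so the last sum in the expansion is zero and only the treatment and error sums of squares remain.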
The Analysis of Variance
$SS_T = SS_{Treatments} + SS_E$
• A large value of SSTreatments reflects large differences in treatment means
• A small value of SSTreatments likely indicates no differences in treatment means
• Formal statistical hypotheses are:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_a$
H1 : At least one mean is different
The Analysis of Variance
• While sums of squares cannot be directly compared to test the hypothesis of equal means, mean squares can be compared.
• A mean square is a sum of squares divided by its degrees of freedom:
$df_{Total} = df_{Treatments} + df_{Error}$, that is, $an - 1 = (a - 1) + a(n - 1)$
$MS_{Treatments} = \dfrac{SS_{Treatments}}{a - 1}$, $MS_E = \dfrac{SS_E}{a(n - 1)}$
• If the treatment means are equal, the treatment and error mean squares will be (theoretically) equal.
• If treatment means differ, the treatment mean square will be larger than the error mean square.
The Analysis of Variance is Summarized in a Table
Computing…see text, pp 69
The reference distribution for $F_0$ is the $F_{a-1,\,a(n-1)}$ distribution.
Reject the null hypothesis (equal treatment means) if $F_0 > F_{\alpha,\,a-1,\,a(n-1)}$
ANOVA Table – Example 3-1
The Reference Distribution:
To produce synthetic garments 5 different percentages of cotton are used. For each percentage five measurements are taken. The results are:
Cotton percentage    Observed elasticity
(A) 15%               7   7  15  11   9
(B) 20%              12  17  12  18  18
(C) 25%              14  18  18  19  19
(D) 30%              19  24  22  19  23
(E) 35%               7  10  11  15  11
Provided that the basic assumptions of ANOVA hold, can we conclude that the elasticity of the garment depends on the amount of cotton used?
We have $k = 5$ and $n_1 = n_2 = n_3 = n_4 = n_5 = 5$ with total $N = 25$ observations (balanced design).
Using our data we may calculate the following.
                         (A)    (B)    (C)    (D)    (E)
Sum $Y_{i.}$             49     77     88     107    54
Mean $\bar{Y}_{i.}$      9.8    15.4   17.6   21.4   10.8
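The sum of all observations is $Y_{..} = 375$, so $\bar{Y}_{..} = 375/25 = 15$.
For the main effects of the factor «Cotton percentage» we have that if the model $E(Y_{ij}) = \mu + \alpha_i$ is applied then $\hat{\mu} = \bar{Y}_{..} = 15$ and
$\hat{\alpha}_1 = \bar{Y}_{1.} - \bar{Y}_{..} = 9.8 - 15 = -5.2$
$\hat{\alpha}_2 = \bar{Y}_{2.} - \bar{Y}_{..} = 15.4 - 15 = 0.4$
$\hat{\alpha}_3 = \bar{Y}_{3.} - \bar{Y}_{..} = 17.6 - 15 = 2.6$
$\hat{\alpha}_4 = \bar{Y}_{4.} - \bar{Y}_{..} = 21.4 - 15 = 6.4$
$\hat{\alpha}_5 = \bar{Y}_{5.} - \bar{Y}_{..} = 10.8 - 15 = -4.2$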
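The effect estimates above can be reproduced with a minimal Python sketch. It assumes that each row of five values in the data table belongs to one cotton percentage, which is the reading consistent with the treatment sums 49, 77, 88, 107 and 54; the variable names are illustrative only.

```python
from statistics import mean

# Elasticity measurements, one list per cotton percentage (assumed layout).
elasticity = {
    "A (15%)": [7, 7, 15, 11, 9],
    "B (20%)": [12, 17, 12, 18, 18],
    "C (25%)": [14, 18, 18, 19, 19],
    "D (30%)": [19, 24, 22, 19, 23],
    "E (35%)": [7, 10, 11, 15, 11],
}

all_observations = [y for ys in elasticity.values() for y in ys]
grand_mean = mean(all_observations)  # estimate of mu

# Estimated effect of each level: treatment mean minus grand mean.
effects = {level: mean(ys) - grand_mean for level, ys in elasticity.items()}

for level, alpha_hat in effects.items():
    print(f"{level}: mean = {mean(elasticity[level]):.1f}, effect = {alpha_hat:+.1f}")
```

In a balanced design the estimated effects sum to zero, which is a quick sanity check on the arithmetic.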
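For the ANOVA we calculate
$SST = \sum_{i=1}^{5}\sum_{j=1}^{5}(Y_{ij} - \bar{Y}_{..})^2 = (7-15)^2 + (7-15)^2 + \ldots + (11-15)^2 = 618$
$SSA = \sum_{i=1}^{5}\sum_{j=1}^{5}(\bar{Y}_{i.} - \bar{Y}_{..})^2 = 5\sum_{i=1}^{5}(\bar{Y}_{i.} - \bar{Y}_{..})^2 = 5[(9.8-15)^2 + (15.4-15)^2 + \ldots + (10.8-15)^2] = 462.8$
and $SSE = \sum_{i=1}^{5}\sum_{j=1}^{5}(Y_{ij} - \bar{Y}_{i.})^2 = 155.2$
So we obtain MSA = 462.8/4 = 115.7, MSE = 155.2/20 = 7.76 and F = MSA/MSE = 14.91.
Because $F = 14.91 > 4.43 = F_{4,20}(0.01)$, the null hypothesis $H_0: \alpha_1 = \alpha_2 = \ldots = \alpha_5 = 0$
is rejected at significance level 1%, and thus we conclude that there are significant differences between the levels of the factor.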
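As a check on the hand calculation, the sums of squares and the F ratio can be recomputed with a small pure-Python sketch. The function name `one_way_anova` is hypothetical, the data layout (one list per cotton percentage) is the one implied by the treatment sums, and the critical value 4.43 is taken from the example rather than computed.

```python
def one_way_anova(groups):
    """Return sums of squares, degrees of freedom, mean squares and F."""
    observations = [y for g in groups for y in g]
    N, a = len(observations), len(groups)
    grand_mean = sum(observations) / N
    group_means = [sum(g) / len(g) for g in groups]

    ss_total = sum((y - grand_mean) ** 2 for y in observations)
    ss_treat = sum(len(g) * (m - grand_mean) ** 2
                   for g, m in zip(groups, group_means))
    ss_error = sum((y - m) ** 2
                   for g, m in zip(groups, group_means) for y in g)

    df_treat, df_error = a - 1, N - a
    ms_treat, ms_error = ss_treat / df_treat, ss_error / df_error
    return {
        "SST": ss_total, "SSA": ss_treat, "SSE": ss_error,
        "df": (df_treat, df_error),
        "MSA": ms_treat, "MSE": ms_error, "F": ms_treat / ms_error,
    }

cotton = [
    [7, 7, 15, 11, 9],      # 15%
    [12, 17, 12, 18, 18],   # 20%
    [14, 18, 18, 19, 19],   # 25%
    [19, 24, 22, 19, 23],   # 30%
    [7, 10, 11, 15, 11],    # 35%
]
table = one_way_anova(cotton)
F_CRITICAL = 4.43  # F(0.01; 4, 20), value quoted in the example

print(table)
print("reject H0" if table["F"] > F_CRITICAL else "do not reject H0")
```

The identity SST = SSA + SSE holds up to floating-point rounding, so comparing the three values is a useful self-check.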
MINITAB output
One-way ANOVA: y versus A
Source DF SS MS F P
A 4 462.80 115.70 14.91 0.000
Error 20 155.20 7.76
Total 24 618.00
S = 2.786 R-Sq = 74.89% R-Sq(adj) = 69.86%
Individual 95% CIs For Mean Based on Pooled StDev
Level  N  Mean    StDev
1      5   9.800  3.347
2      5  15.400  3.130
3      5  17.600  2.074
4      5  21.400  2.302
5      5  10.800  2.864
Pooled StDev = 2.786
Each interval is $\bar{y}_{i\cdot} \pm t_{N-k}(0.025)\, s_{pooled}/\sqrt{n_i}$
[Interval plot: 95% CI for each level mean, horizontal axis from 10.0 to 25.0]
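The individual intervals can be recomputed from the table with a minimal sketch. The critical value 2.086 is the tabulated $t_{20}(0.025)$ and is entered by hand here as an assumption, not computed; the means and MSE come from the Minitab output.

```python
from math import sqrt

T_CRIT = 2.086   # assumed table value of t_{20}(0.025)
MSE = 7.76       # error mean square from the ANOVA table
n = 5            # observations per level
means = {1: 9.8, 2: 15.4, 3: 17.6, 4: 21.4, 5: 10.8}

s_pooled = sqrt(MSE)                      # pooled standard deviation
half_width = T_CRIT * s_pooled / sqrt(n)  # same for every level (balanced design)

intervals = {level: (m - half_width, m + half_width) for level, m in means.items()}
for level, (low, high) in intervals.items():
    print(f"level {level}: ({low:.2f}, {high:.2f})")
```

Because the design is balanced, every interval has the same half-width, so the plot differs between levels only in where each interval is centred.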
Model Adequacy Checking in the ANOVA Text reference, Section 3.4, pg. 80
• Checking assumptions is important
• Normality
• Constant variance
• Independence
• Have we fit the right model?
• Later we will talk about what to do if some of these assumptions are violated
Model Adequacy Checking in the ANOVA
• Examination of residuals (see text, Sec. 3-4, pg. 80)
$$e_{ij} = y_{ij} - \hat{y}_{ij} = y_{ij} - \bar{y}_{i.}$$
• Computer software generates the residuals
• Residual plots are very useful
• Normal probability plot of residuals
Other Important Residual Plots
$\hat{\varepsilon}_{ij} = Y_{ij} - \bar{Y}_{i\cdot}$
Can be used to check the following hypotheses:
Hypothesis                                                             Plot
Independence of observations                                           Residuals vs order
Normality                                                              Normal probability plot
Equality (or “homogeneity”) of variances, called homoscedasticity      Residuals vs $\bar{Y}_{i\cdot}$
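A minimal sketch of how the residuals for these plots could be produced, using the cotton data as an assumed input (one list per percentage). Only the residuals and fitted values are computed; the run order of the original experiment is not given, so a residuals-versus-order plot cannot be reconstructed from these data.

```python
# Elasticity data, one list per cotton percentage (assumed layout).
elasticity = {
    "15%": [7, 7, 15, 11, 9],
    "20%": [12, 17, 12, 18, 18],
    "25%": [14, 18, 18, 19, 19],
    "30%": [19, 24, 22, 19, 23],
    "35%": [7, 10, 11, 15, 11],
}

rows = []  # (level, fitted value, residual) for every observation
for level, ys in elasticity.items():
    fitted = sum(ys) / len(ys)           # fitted value = treatment mean
    for y in ys:
        rows.append((level, fitted, y - fitted))

# Residuals sum to zero within each treatment.
residual_sums = {
    level: sum(r for lv, _, r in rows if lv == level) for level in elasticity
}
sse = sum(r ** 2 for _, _, r in rows)    # equals the error sum of squares

for level, fitted, residual in rows:
    print(f"{level}: fitted = {fitted:5.1f}, residual = {residual:+5.1f}")
```

Plotting the residual against the fitted value for each row gives the residuals-versus-$\bar{Y}_{i\cdot}$ display; a roughly equal vertical spread at each fitted value supports the constant-variance assumption.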
Post-ANOVA Comparison of Means
• The analysis of variance tests the hypothesis of equal treatment means
• Assume that residual analysis is satisfactory
• If that hypothesis is rejected, we don’t know which specific
means are different
• Determining which specific means differ following an ANOVA
is called the multiple comparisons problem
• There are lots of ways to do this…see text, Section 3.5, pg. 89
• We will use pairwise t-tests on means…sometimes called Fisher’s Least Significant Difference (or Fisher’s LSD) Method
Design-Expert Output
Graphical Comparison of Means Text, pg. 91
If the null hypothesis is rejected, then we can use one of the multiple comparison methods (Tukey, Duncan, LSD, Scheffe, …) to test the effects of the factor.
Grouping Information Using Fisher Method
A  N  Mean    Grouping
4  5  21.400  A
3  5  17.600
2  5  15.400
5  5  10.800
1  5   9.800
Means that do not share a letter are significantly different.
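Fisher's LSD comparisons for the cotton example can be sketched as follows. The least significant difference for a balanced design is $t_{\alpha/2,\,N-a}\sqrt{2\,MS_E/n}$; the value 2.086 for $t_{0.025,20}$ is a tabulated number entered by hand (an assumption of this sketch), and the means and MSE are taken from the ANOVA output.

```python
from itertools import combinations
from math import sqrt

T_CRIT = 2.086   # assumed table value of t_{0.025, 20}
MSE, n = 7.76, 5
means = {1: 9.8, 2: 15.4, 3: 17.6, 4: 21.4, 5: 10.8}

# Least significant difference for a balanced design.
lsd = T_CRIT * sqrt(2 * MSE / n)

# A pair of means differs significantly if their gap exceeds the LSD.
significant = {}
for (i, mean_i), (j, mean_j) in combinations(means.items(), 2):
    significant[(i, j)] = abs(mean_i - mean_j) > lsd

print(f"LSD = {lsd:.3f}")
for pair, differs in significant.items():
    print(pair, "different" if differs else "not different")
```

Pairs reported as "not different" would share a grouping letter in output of the kind shown above.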
The Regression Model
Why Does the ANOVA Work?
We are sampling from normal populations, so $SS_{Treatments}/\sigma^2 \sim \chi^2_{a-1}$ if $H_0$ is true, and $SS_E/\sigma^2 \sim \chi^2_{a(n-1)}$.
Cochran’s theorem gives the independence of these two chi-square random variables.
So $F_0 = \dfrac{SS_{Treatments}/(a-1)}{SS_E/[a(n-1)]} \sim \dfrac{\chi^2_{a-1}/(a-1)}{\chi^2_{a(n-1)}/[a(n-1)]} \sim F_{a-1,\,a(n-1)}$
Finally, $E(MS_{Treatments}) = \sigma^2 + \dfrac{n\sum_{i=1}^{a}\tau_i^2}{a-1}$ and $E(MS_E) = \sigma^2$.
Therefore an upper-tail F test is appropriate.
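The distributional claim can be illustrated by simulation. The sketch below draws normal data with all treatment means equal (so $H_0$ is true), computes $F_0$ each time, and counts how often it exceeds 4.43, the 1% critical value of $F_{4,20}$ quoted in the cotton example. The function names, the number of replications and the seed are arbitrary choices for illustration.

```python
import random

def f_statistic(groups):
    """F0 = MS_Treatments / MS_E for a balanced single-factor layout."""
    a, n = len(groups), len(groups[0])
    means = [sum(g) / n for g in groups]
    grand_mean = sum(means) / a
    ms_treat = n * sum((m - grand_mean) ** 2 for m in means) / (a - 1)
    ms_error = sum((y - m) ** 2
                   for g, m in zip(groups, means) for y in g) / (a * (n - 1))
    return ms_treat / ms_error

def null_exceedance_rate(a=5, n=5, critical=4.43, reps=20000, seed=0):
    """Fraction of simulated F0 values above `critical` when H0 is true."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(reps):
        groups = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(a)]
        if f_statistic(groups) > critical:
            exceed += 1
    return exceed / reps

rate = null_exceedance_rate()
print(f"P(F0 > 4.43 | H0) is approximately {rate:.4f}")
```

If the theory holds, the printed rate should be close to 0.01, up to Monte Carlo error.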
Sample Size Determination-Text, Section 3.7, pg. 105
• FAQ in designed experiments
• Answer depends on lots of things; including what type of experiment is being contemplated, how it will be conducted, resources, and desired sensitivity
• Sensitivity refers to the difference in means that the experimenter wishes to detect
• Generally, increasing the number of replications increases the sensitivity or it makes it easier to detect small differences in means
Sample Size Determination-Fixed Effects Case
• Can choose the sample size to detect a specific difference in means and achieve desired values of type I and type II errors
• Type I error – reject H0 when it is true (α)
• Type II error – fail to reject H0 when it is false (β)
• Power = 1 − β
• Operating characteristic curves plot β against a parameter Φ where
$$\Phi^2 = \frac{n\sum_{i=1}^{a}\tau_i^2}{a\sigma^2}$$
Sample Size Determination
Fixed Effects Case—use of OC Curves
• The OC curves (Operating Characteristic Curves) for the fixed effects model are in the Appendix, Table V
• A very common way to use these charts is to define a difference in two means D of interest, then the minimum value of $\Phi^2$ is $\Phi^2 = \dfrac{nD^2}{2a\sigma^2}$
• Typically work in terms of the ratio D/σ and try values of n until the desired power is achieved
• Most statistics software packages will perform power and
sample size calculations – see page 108
• There are some other methods discussed in the text
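The trial-and-error search over n described above can be organized with a small helper that evaluates the minimum $\Phi^2 = nD^2/(2a\sigma^2)$ and the degrees of freedom needed to enter the OC chart. The inputs below (a = 4 treatments, D/σ = 3) are hypothetical values chosen only for illustration, and the power itself still has to be read from the chart or obtained from software.

```python
from math import sqrt

def phi_squared(n, a, d_over_sigma):
    """Minimum Phi^2 = n D^2 / (2 a sigma^2), written in terms of D / sigma."""
    return n * d_over_sigma ** 2 / (2 * a)

a = 4                # hypothetical number of treatments
d_over_sigma = 3.0   # hypothetical ratio D / sigma to be detected

for n in range(3, 7):
    phi2 = phi_squared(n, a, d_over_sigma)
    # Numerator and denominator degrees of freedom for the OC chart.
    df_num, df_den = a - 1, a * (n - 1)
    print(f"n = {n}: Phi^2 = {phi2:.3f}, Phi = {sqrt(phi2):.3f}, "
          f"df = ({df_num}, {df_den})")
```

Each printed row gives the Φ value and error degrees of freedom to look up; n is increased until the chart shows an acceptable β.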
Power and sample size calculations from Minitab (Page 108)
3.8 Other Examples of Single-Factor Experiments
Conclusions?
The Random Effects Model
• There are a large number of possible levels for
the factor (theoretically an infinite number)
• The experimenter chooses a of these levels at random
• Inference will be to the entire population of levels
$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$
$V(Y_{ij}) = \sigma^2 + \sigma_\alpha^2$
Variance components
Covariance structure:
Observations (a = 3 and n = 2):
ANOVA F-test is identical to the fixed-effects case
Estimating the variance components using the ANOVA method:
• The ANOVA variance component estimators are moment estimators
• Normality not required
• They are unbiased estimators
• Finding confidence intervals on the variance components is “clumsy”
• Negative estimates can occur – this is “embarrassing”, as variances are always non-negative
• Confidence interval for the error variance:
• Confidence interval for the intraclass correlation:
Maximum Likelihood Estimation of the Variance Components
• The likelihood function is just the joint pdf of the sample observations with the observations fixed and the parameters unknown:
• Choose the parameters to maximize the likelihood function
Residual Maximum Likelihood (REML) is used to estimate variance components
Point estimates from REML agree with the moment estimates for balanced data
Confidence intervals
To produce synthetic garments 5 different percentages of cotton were randomly selected and used. For each percentage five measurements are taken. The results are:
Cotton percentage    Observed elasticity
(A) 15%               7   7  15  11   9
(B) 20%              12  17  12  18  18
(C) 25%              14  18  18  19  19
(D) 30%              19  24  22  19  23
(E) 35%               7  10  11  15  11
Provided that the basic assumptions of ANOVA hold, can we conclude that the elasticity of the garment depends on the amount of cotton used?
We have that, since $F = 14.91 > 4.43 = F_{4,20}(0.01)$, we reject the hypothesis
$H_0: \sigma_\alpha^2 = 0$
at significance level 1%, so we observe differences in the effects of the factor’s levels.
Estimation of the variance components
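$\hat{\sigma}^2 = MSE = 7.76$ and $\hat{\sigma}_\alpha^2 = \dfrac{MSA - MSE}{n_0}$, where
$$n_0 = \frac{N^2 - \sum_{i=1}^{k} n_i^2}{N(k-1)} = \frac{25^2 - 5\cdot 25}{25\cdot 4} = 5.$$
Thus, $\hat{\sigma}_\alpha^2 = \dfrac{115.7 - 7.76}{5} = 21.588$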
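The moment (ANOVA-method) estimates can be reproduced with a minimal sketch. The function name is hypothetical; the mean squares are those of the ANOVA table, and the final ratio is the usual plug-in estimate of the share of total variance due to the factor, added here only as an illustration.

```python
def anova_variance_components(ms_treat, ms_error, group_sizes):
    """ANOVA-method estimates for the one-way random effects model."""
    N, k = sum(group_sizes), len(group_sizes)
    # n0 reduces to the common group size when the design is balanced.
    n0 = (N ** 2 - sum(n ** 2 for n in group_sizes)) / (N * (k - 1))
    sigma2 = ms_error                       # error variance estimate
    sigma2_alpha = (ms_treat - ms_error) / n0
    return n0, sigma2, sigma2_alpha

n0, sigma2, sigma2_alpha = anova_variance_components(
    ms_treat=115.7, ms_error=7.76, group_sizes=[5, 5, 5, 5, 5]
)
share = sigma2_alpha / (sigma2_alpha + sigma2)  # plug-in variance share

print(f"n0 = {n0}, sigma^2 = {sigma2}, sigma_alpha^2 = {sigma2_alpha:.3f}")
print(f"estimated share of variance due to the factor = {share:.4f}")
```

Note that `sigma2_alpha` can come out negative whenever the treatment mean square is smaller than the error mean square, which is the drawback of the method mentioned earlier.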
ANOVA: y versus A
Factor Type Levels Values
A random 5 1; 2; 3; 4; 5
Analysis of Variance for y
Source DF SS MS F P
A 4 462.80 115.70 14.91 0.000
Error 20 155.20 7.76
Total 24 618.00
S = 2.78568 R-Sq = 74.89% R-Sq(adj) = 69.86%
                                             Expected Mean Square for Each
   Source   Variance component   Error term  Term (using unrestricted model)
1  A        21.588               2           (2) + 5 (1)
2  Error     7.760                           (2)
ANOVA Regression
• Suppose we have the ANOVA model
• $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$, $i = 1, 2, \ldots, k$ and $j = 1, 2, \ldots, n_i$
• Define $x_i = 1$ if the observation is from level $i$, and $x_i = 0$ otherwise
• Then $Y_{ij} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon_{ij}$
• It is easy to see that $\mu + \tau_i = \beta_0 + \beta_i$
• In matrix form the corresponding regression model is $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$
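ANOVA Regression
$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where
$$
\mathbf{Y} = \begin{pmatrix} Y_{11} \\ \vdots \\ Y_{1n_1} \\ Y_{21} \\ \vdots \\ Y_{2n_2} \\ \vdots \\ Y_{k1} \\ \vdots \\ Y_{kn_k} \end{pmatrix},\quad
\mathbf{X} = \begin{pmatrix}
1 & 1 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
1 & 1 & 0 & \cdots & 0 \\
1 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
1 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
1 & 0 & 0 & \cdots & 1 \\
\vdots & \vdots & \vdots & & \vdots \\
1 & 0 & 0 & \cdots & 1
\end{pmatrix},\quad
\boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix},\quad
\boldsymbol{\varepsilon} = \begin{pmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \vdots \\ \varepsilon_{2n_2} \\ \vdots \\ \varepsilon_{kn_k} \end{pmatrix}
$$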
• Add the assumption $n_1\beta_1 + n_2\beta_2 + \cdots + n_k\beta_k = 0$ to have a solution to the system
• The solution is:
Observe that X’X is not invertible (the sum of the indicator columns equals the 1st column)
If we set $\beta_i = \mu + \alpha_i$, then we have that $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where X is as above without the first column, so X’X will be invertible.
Alternatively, X is as above without the last column. The model is:
$Y_{ij} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{k-1} x_{k-1} + \varepsilon_{ij}$
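The reduced model (design matrix without the last indicator column) can be illustrated on the cotton data with a pure-Python least-squares fit through the normal equations. This is a sketch under stated assumptions: the data layout is one list per cotton percentage, `solve_linear_system` is a hypothetical helper written for this example, and the last level (35%) plays the role of the reference level.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

cotton = [
    [7, 7, 15, 11, 9], [12, 17, 12, 18, 18], [14, 18, 18, 19, 19],
    [19, 24, 22, 19, 23], [7, 10, 11, 15, 11],
]
k = len(cotton)

# Design matrix: intercept plus indicators x_1..x_{k-1}; the last level is the reference.
X, y = [], []
for level, observations in enumerate(cotton):
    for value in observations:
        X.append([1.0] + [1.0 if level == j else 0.0 for j in range(k - 1)])
        y.append(float(value))

# Normal equations (X'X) beta = X'y.
XtX = [[sum(row[r] * row[c] for row in X) for c in range(k)] for r in range(k)]
Xty = [sum(row[r] * yi for row, yi in zip(X, y)) for r in range(k)]
beta = solve_linear_system(XtX, Xty)

print([round(b, 3) for b in beta])
```

With this coding, $\hat\beta_0$ is the mean of the reference level and each $\hat\beta_i$ is the difference between the mean of level $i$ and the reference mean, so $\hat\beta_0 + \hat\beta_i$ recovers the treatment means of the ANOVA model.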
• To measure the pressure 4 different metal bars can hold the following data were collected
𝑥𝑥1 𝑥𝑥2 𝑥𝑥3 𝑥𝑥1 𝑥𝑥2 𝑥𝑥3
• $Y_{ij} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon_{ij}$
Analysis of variance
             DF   Sum of Squares   Mean Square
Regression    3   .31887           .10629
Residual     11   .09836           .00894
F = 11.88690   Signif F = .0009   R Square = .76426
Variables in the Equation
Variable      B          SE B      Beta       T         Sig T
x1            -.20375    .06686    -.54024    -3.047    .0111
x2            -.37667    .07222    -.90339    -5.215
x3            -.33250    .06686    -.88162    -4.973    .0004
(Constant)    2.04500    .04728               43.252    .0000
The Beta values correspond to the standardized variables
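The summary figures in this output follow from the sums of squares by simple arithmetic, which the sketch below retraces. It uses only the rounded values printed in the output (regression SS .31887 with 3 df, residual SS .09836 with 11 df, and the x3 coefficient with its standard error), so the results agree with the printed F, R Square and T only up to rounding.

```python
# Values as printed in the regression output (rounded).
ss_regression, df_regression = 0.31887, 3
ss_residual, df_residual = 0.09836, 11

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_ratio = ms_regression / ms_residual                      # overall F test
r_squared = ss_regression / (ss_regression + ss_residual)  # share of variation explained

# A coefficient's t statistic is the estimate divided by its standard error.
t_x3 = -0.33250 / 0.06686

print(f"F = {f_ratio:.3f}, R^2 = {r_squared:.5f}, t(x3) = {t_x3:.3f}")
```

Because the regression uses indicator variables for the bars, this overall F test is the same test as the one-way ANOVA F test for equality of the four means.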