Lecture 25 (Section 4.2+ extra)
Techniques for choosing predictors in MLR: best subsets, forward selection, backward elimination, Mallows’ Cp, AIC
Trade-off between fit and complexity, i.e., between bias and variance. Why complexity matters: connection to out-of-sample behavior
Penalized regression: lasso and ridge regression
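A minimal R sketch of these techniques (the response y, predictors x1, x2, x3, and data frame d are hypothetical names):

  # Best subsets with Mallows' Cp via the leaps package
  library(leaps)
  subs <- regsubsets(y ~ x1 + x2 + x3, data = d)
  summary(subs)$cp                        # Cp for the best model of each size

  # Forward/backward selection by AIC with base R's step()
  full <- lm(y ~ x1 + x2 + x3, data = d)
  back <- step(full, direction = "backward")
  fwd  <- step(lm(y ~ 1, data = d), scope = formula(full),
               direction = "forward")

  # Penalized regression with the glmnet package
  library(glmnet)
  X <- model.matrix(full)[, -1]           # predictor matrix, intercept dropped
  ridge <- cv.glmnet(X, d$y, alpha = 0)   # alpha = 0: ridge penalty
  lasso <- cv.glmnet(X, d$y, alpha = 1)   # alpha = 1: lasso penalty
  coef(lasso, s = "lambda.min")           # coefficients at the CV-chosen lambda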
Lecture 26 Review of MLR
Lecture 27 (Section 5.1 and 5.2)
Intro to one-way ANOVA
Connection to simple linear regression and using indicator variables
Randomization: randomized experiment vs observational study
Terms: experimental units, explanatory factor, levels, balance
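A short R illustration of the regression connection (data frame d with response y and factor group are hypothetical names):

  fit_lm  <- lm(y ~ group, data = d)    # R codes group as indicator variables,
  summary(fit_lm)                       # one coefficient per non-baseline level
  fit_aov <- aov(y ~ group, data = d)   # same model fit as a one-way ANOVA
  anova(fit_lm)                         # same F-test either way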
Lecture 28 (Section 5.3)
Fitting ANOVA model
Fisher/history of ANOVA
Concept of linear model and decomposition of variance: data = model + error, decomposed further as Y_ij = mu + alpha_i + epsilon_ij
grand mean + treatment effect + residual
Total variation in response = variation explained by model + unexplained variation in residuals (SSTotal = SSModel + SSE)
How to read an ANOVA table and state the relevant hypotheses in context
Understanding degrees of freedom (slide 22)
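The decomposition can be checked numerically in R (same hypothetical d, y, group as above):

  fit <- aov(y ~ group, data = d)
  summary(fit)                              # ANOVA table: df, SS, MS, F, p-value
  SST <- sum((d$y - mean(d$y))^2)           # total variation
  SSM <- sum((fitted(fit) - mean(d$y))^2)   # variation explained by model
  SSE <- sum(residuals(fit)^2)              # unexplained variation in residuals
  all.equal(SST, SSM + SSE)                 # TRUE: the decomposition holds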
Lecture 29 (Section 5.4 and 5.5)
ANOVA conditions: check with residual plots (constant variance, normality, independence); code sketch below
Review of F-test
Confidence intervals and hypothesis tests for pairwise differences in means
Multiple testing issue: what is it?
Fisher’s Least Significant Difference
Effect Size: measure how big an effect is by how many standard deviations apart the group means are: difference in means / sqrt(MSE)
Both pieces come from the fitted ANOVA: MSE from the ANOVA table, and the difference in group means is easy to compute
— why effect sizes are important (they convey the actual size of the difference, not just its significance)
— not influenced by sample size (unlike p-values, which get very small for very large samples)
— good practice to report both the p-value and the effect size
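A base-R sketch of the condition checks and effect size (group levels "A" and "B" are hypothetical):

  fit <- aov(y ~ group, data = d)
  plot(fit, which = 1)    # residuals vs. fitted: check constant variance
  plot(fit, which = 2)    # normal QQ plot of residuals: check normality

  # Effect size: difference in means / sqrt(MSE)
  mse   <- sigma(fit)^2                     # MSE from the fitted ANOVA
  means <- tapply(d$y, d$group, mean)
  (means["A"] - means["B"]) / sqrt(mse)

  # Unadjusted (Fisher LSD style) CI for one pairwise difference
  n  <- table(d$group)
  se <- sqrt(mse * (1 / n["A"] + 1 / n["B"]))
  (means["A"] - means["B"]) + c(-1, 1) * qt(0.975, df.residual(fit)) * se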
Lecture 30 (Section 5.7 and briefly 5.6)
Types of error rates: Type I and Type II
— individual error rate
— family-wise error rate
Approach when doing ANOVA to compare multiple groups
(1) If ANOVA F-test is not significant, stop
(2) If significant, test the pairwise differences, but adjust the p-values to avoid the multiple-testing issue
E.g. Bonferroni (simplest but very conservative), Fisher’s LSD, Tukey’s HSD (R sketch below)
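In R, continuing the hypothetical one-way fit from above:

  summary(fit)                             # step (1): overall F-test
  pairwise.t.test(d$y, d$group, p.adjust.method = "none")         # Fisher's LSD
  pairwise.t.test(d$y, d$group, p.adjust.method = "bonferroni")   # Bonferroni
  TukeyHSD(fit)                            # Tukey's HSD intervals, adjusted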
Transformations: can transform response to sometimes handle issues with conditions, e.g. square, square root, log, reciprocal
E.g. use a log transform if the data are right-skewed and the variance increases with the mean
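E.g., refitting on the log scale in R and rechecking the residual plots:

  fit_log <- aov(log(y) ~ group, data = d)   # log handles right skew and
  plot(fit_log, which = 1)                   # variance growing with the mean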
Lecture 31 Chapter 6
Why it’s important to randomize when designing an experiment (pg 204, 205 of book)
Randomized complete block designs
Terms: treatments, experimental units, responses, parameters of interest
What are blocks? Different kinds of blocks
Completely randomized design
Randomized block design (summary on slide 9)
Blocks by subdividing
Blocks by grouping (matched subjects design)
Blocks by reuse (within-subjects design, repeated measures)
Randomized complete block design vs. complete block design
Visualization: side-by-side dot plots, looking at data after accounting for blocks
Two-way ANOVA model: what it is (be able to write it out) and how to fit it
Two-way ANOVA model w/ interaction (sketch below)
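A sketch in R (response y, treatment factor trt, blocking factor blk, and crossed factors A and B are hypothetical names):

  # Additive two-way model for a randomized complete block design:
  # Y = grand mean + block effect + treatment effect + residual
  fit_block <- aov(y ~ blk + trt, data = d)
  summary(fit_block)

  # Two-way model with interaction: A * B expands to A + B + A:B
  fit_int <- aov(y ~ A * B, data = d)
  summary(fit_int)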
Lecture 32 Chapter 7
Two-way factorial experiment and interaction
Visualization: cell means plot or interaction plot (sketch below)
ANOVA table w/ interactions. How to state hypotheses
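In base R, an interaction (cell means) plot, using the hypothetical factors A and B from above:

  # Roughly parallel profiles suggest no interaction; crossing or
  # non-parallel lines suggest an A:B interaction
  with(d, interaction.plot(x.factor = A, trace.factor = B, response = y))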
Confidence intervals for difference in means in the presence of a significant interaction (slide 26)
— if no significant interaction, just a usual CI for difference in means
— if significant interaction, need CIs for conditional differences in means, conditioning on the level of the interacting factor
Compare difference in means under condition A (e.g. weight gain difference w/ antibiotics)
Compare difference in means under condition B (e.g. weight gain difference w/o antibiotics)
(R code for conditional means on Slide 27)
Effect sizes: similarly need to be conditional if there is a significant interaction.
E.g. w/ antibiotics the effect size of treatment is X, w/o antibiotics the effect size of treatment is Y
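A base-R alternative to the slide 27 code, conditioning by subsetting (factor levels are hypothetical; assumes A has two levels):

  with_ab <- subset(d, B == "antibiotic")   # condition on one level of B
  no_ab   <- subset(d, B == "none")
  t.test(y ~ A, data = with_ab)$conf.int    # CI for difference, w/ antibiotics
  t.test(y ~ A, data = no_ab)$conf.int      # CI for difference, w/o antibiotics

  # Conditional effect sizes: difference in means / sqrt(MSE) in each subset
  mse <- sigma(aov(y ~ A * B, data = d))^2
  diff(tapply(with_ab$y, with_ab$A, mean)) / sqrt(mse)
  diff(tapply(no_ab$y, no_ab$A, mean)) / sqrt(mse)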
Lecture 33 Chapter 7 case study
Dinosaurs/ iridium concentration data
R code and example of how to use two-way ANOVA and consider interactions, visualizations, etc.
Lecture 34 How we can get fooled by randomness even when we think we are being careful about multiple testing
Making choices along the way in an analysis and then reporting only the final conclusions can lead to problems (more apparent significance than is warranted).
Need to be careful about this.
Active area of research (“post selection inference”)
Some ideas:
— declare your hypotheses before you begin doing an analysis (to avoid “snooping” for significance)
— divide your data into parts: use one part for model choice, then base your conclusions on a second part, using the model you chose from the first (see the sketch below)
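A minimal R sketch of the data-splitting idea (data frame d and its columns are hypothetical):

  set.seed(1)
  idx     <- sample(nrow(d), floor(nrow(d) / 2))
  explore <- d[idx, ]                          # half 1: used only to pick a model
  confirm <- d[-idx, ]                         # half 2: reserved for conclusions

  chosen <- step(lm(y ~ ., data = explore))    # model selection on half 1 only
  final  <- lm(formula(chosen), data = confirm)
  summary(final)                               # inference not biased by selection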