STAT 300: Final Week Review
Practice Problems (obviously not exhaustive; go over previous exams, lecture notes and book for more).
Write the model you would fit to answer the research questions below:
1. Can we predict a state’s vote share (percentage) for Trump in the 2016 election using state GDP, the percentage of residents who are white, and the share of the state’s population that lives in urban centers?
2. After accounting for age, is there a difference in weight loss between people assigned to limit fat in their diet and people assigned to limit carbs in their diet?
3. How does adding different levels of fertilizer (none, light, medium, heavy) and irrigation (none, light, moderate) contribute to crop yield for corn in an Iowa county?
4. What is the relationship between years of higher education and salary at age 40?
Body fat percentage is a difficult quantity to measure, but we can use easy-to-measure quantities to make reasonable predictions. The dataset BodyFat (in library(Lock5Data)) includes body measurements from 100 men, each of whom had his percent body fat estimated by an underwater weighing technique.
Give a five-number summary for the body fat of individuals in the dataset.
What is the estimated multiple regression equation for predicting bodyfat using Age, Weight, Height, and Abdomen?
Explain how you would verify that the conditions are met for statistical inference with multiple linear regression.
Are you at all concerned about multicollinearity for this example? Explain how you could assess whether it would be an issue.
Which of the variables are significant predictors of BodyFat in your model? Give p-values to support your answers.
Calculate and interpret a 95% prediction interval for the body fat of a 30-year-old male who is 72 inches tall, weighs 200 pounds, and has an abdomen circumference of 100 cm.
Calculate and interpret a 95% confidence interval for the mean body fat of males with those characteristics.
We would like to add one variable to the model. Explain how we could determine which variable is best to add.
Use R to follow your advice above and decide whether it’s better to add Chest, Wrist, or Ankle to the model. What is the new fitted model with this added variable?
Did any of the variables change from significant to not significant or vice versa? Specify which.
Explain why this happened.
Interpret the coefficient of ‘Abdomen’ in your (new) model.
Find and interpret the R-squared for the (new) model. Make sure you provide an interpretation in context.
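The R workflow for the body fat questions above could be set up roughly as follows. This is only a starting sketch, not a worked solution: it assumes the Lock5Data package is installed and that the response column is named Bodyfat (check with names(BodyFat)); the object names fit, new_man, pred_int, conf_int, and adj_r2 are made up for illustration. Comparing candidates by adjusted R-squared is one reasonable criterion; a nested F-test or the added variable's t-test would be alternatives.

```r
# Assumes install.packages("Lock5Data") has already been run.
library(Lock5Data)
data(BodyFat)

# Five-number summary of the response (column name Bodyfat is an assumption).
fivenum(BodyFat$Bodyfat)

# Four-predictor multiple regression model.
fit <- lm(Bodyfat ~ Age + Weight + Height + Abdomen, data = BodyFat)
summary(fit)  # coefficients, p-values, R-squared

# Condition checks: residuals vs. fitted values and a normal Q-Q plot.
plot(fit, which = 1:2)

# One way to look for multicollinearity: pairwise correlations of the predictors.
cor(BodyFat[, c("Age", "Weight", "Height", "Abdomen")])

# Prediction and confidence intervals for the individual described in the problem.
new_man <- data.frame(Age = 30, Weight = 200, Height = 72, Abdomen = 100)
pred_int <- predict(fit, new_man, interval = "prediction", level = 0.95)
conf_int <- predict(fit, new_man, interval = "confidence", level = 0.95)
pred_int
conf_int

# Refit with each candidate variable added and compare adjusted R-squared.
candidates <- c("Chest", "Wrist", "Ankle")
adj_r2 <- sapply(candidates, function(v) {
  fit_v <- update(fit, as.formula(paste(". ~ . +", v)))
  summary(fit_v)$adj.r.squared
})
adj_r2
```

The prediction interval should come out wider than the confidence interval, since it describes one individual rather than a mean.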
We would like to predict body temperature using a person’s pulse and sex (male = 0, female = 1; this variable is named Gender in these data). Use the dataset BodyTemp50.
What is the difference in mean body temperature for males and females?
What is the fitted regression equation for females?
Is there evidence that we should model the relationship between pulse and body temperature differently for males and females? Support your answer with information from your model.
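A possible way to set up the body temperature questions in R is sketched below. It assumes Lock5Data is installed and that BodyTemp50 has columns named BodyTemp, Pulse, and Gender (0 = male, 1 = female, as the problem states); the names fit_add, fit_int, and female_intercept are illustrative only.

```r
# Assumes install.packages("Lock5Data") has already been run.
library(Lock5Data)
data(BodyTemp50)

# Mean body temperature by group (Gender assumed coded 0 = male, 1 = female).
aggregate(BodyTemp ~ Gender, data = BodyTemp50, FUN = mean)

# Parallel-lines model: common Pulse slope, different intercepts.
fit_add <- lm(BodyTemp ~ Pulse + Gender, data = BodyTemp50)
b <- coef(fit_add)

# For females (Gender = 1) the intercept is shifted by the Gender coefficient.
female_intercept <- b[["(Intercept)"]] + b[["Gender"]]
female_intercept
b[["Pulse"]]

# Interaction model: lets the Pulse slope differ by group as well.
fit_int <- lm(BodyTemp ~ Pulse * Gender, data = BodyTemp50)
summary(fit_int)          # t-test on the Pulse:Gender term
anova(fit_add, fit_int)   # equivalent nested F-test
```

The significance of the interaction term is what speaks to whether the pulse relationship should be modeled differently for the two groups.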
We would like to understand the difference in city miles per gallon (a measure of fuel efficiency for in-town driving) between different categories of car (Hatchback, Sedan, Sporty, SUV, Wagon, and 7Pass). Use the dataset Cars2015, which contains information about many aspects of new car models from 2015.
What is the average city miles per gallon for each type of car?
What is the standard deviation of city miles per gallon for each type of car?
Do you think that the ANOVA constant variance condition is met? Why or why not? We will proceed with an ANOVA test whether or not the condition is met.
What are the null and alternative hypotheses for performing an ANOVA test for the difference in group means?
Give a completed ANOVA table (including a p-value) to test if there is a difference in city miles per gallon for different types of cars.
Interpret the conclusion of the ANOVA test in context of the problem.
Which two groups do you think are most likely to have a significant difference in CityMPG (based on EDA)?
Which two groups do you think are least likely to have a significant difference in CityMPG (based on EDA)?
Build a confidence interval for the difference in means between those two groups (the ones least likely to have a difference). Use a t-multiplier of 1.98.
Based on that confidence interval, is there a significant difference between those two groups?
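The ANOVA questions above could be approached in R along these lines. This is a hedged sketch: it assumes Lock5Data is installed and that Cars2015 has columns named Type and CityMPG. The helper ci_diff is hypothetical, and it uses the pooled standard deviation from the ANOVA (the square root of the mean square error), which may differ from the formula used in lecture. The pair "Sedan" and "Wagon" is only a placeholder, not a suggested answer.

```r
# Assumes install.packages("Lock5Data") has already been run.
library(Lock5Data)
data(Cars2015)

# Group means, standard deviations, and sample sizes by car type.
group_means <- tapply(Cars2015$CityMPG, Cars2015$Type, mean)
group_sds   <- tapply(Cars2015$CityMPG, Cars2015$Type, sd)
group_n     <- tapply(Cars2015$CityMPG, Cars2015$Type, length)
group_means
group_sds

# Common rule of thumb for constant variance: largest SD under twice the smallest.
max(group_sds) / min(group_sds)

# One-way ANOVA table, including the F statistic and p-value.
fit_aov <- aov(CityMPG ~ Type, data = Cars2015)
summary(fit_aov)

# Mean square error from the ANOVA, used as the pooled variance estimate.
mse <- sum(residuals(fit_aov)^2) / df.residual(fit_aov)

# Hypothetical helper: CI for the difference in two group means,
# using the pooled SD and the t-multiplier given in the problem.
ci_diff <- function(g1, g2, t_star = 1.98) {
  diff <- group_means[[g1]] - group_means[[g2]]
  se   <- sqrt(mse * (1 / group_n[[g1]] + 1 / group_n[[g2]]))
  c(lower = diff - t_star * se, upper = diff + t_star * se)
}

# Placeholder pair; substitute the two groups chosen from the EDA.
ci_diff("Sedan", "Wagon")
```

If the resulting interval contains 0, the data do not show a significant difference between those two group means at the 5% level.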