HW03: Below are the first six observations from the prostate data set in the faraway library. Use help(prostate) to describe the data set and its variables.
obs     lcavol  lweight  age      lbph  svi       lcp  gleason  pgg45      lpsa
  1  -0.579819   2.7695   50  -1.38629    0  -1.38629        6      0  -0.43078
  2  -0.994252   3.3196   58  -1.38629    0  -1.38629        6      0  -0.16252
  3  -0.510826   2.6912   74  -1.38629    0  -1.38629        7     20  -0.16252
  4  -1.203973   3.2828   58  -1.38629    0  -1.38629        6      0  -0.16252
  5  0.7514161   3.4324   62  -1.38629    0  -1.38629        6      0   0.37156
  6  -1.049822   3.2288   50  -1.38629    0  -1.38629        6      0   0.76547
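For reference, the table above and the variable descriptions can be reproduced with a few lines of R (this assumes the faraway package is installed):

library(faraway)        # install.packages("faraway") if needed
help(prostate)          # variable descriptions
head(prostate)          # first six observations, as shown above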
a) Generate a scatterplot of lpsa versus lcavol and fit the least squares line to the data.
b) Use the lm() function to perform a simple linear regression with lpsa as the response and lcavol as the predictor.
c) Use the summary() command to determine whether there is a relationship between the predictor and the response.
d) What is the predicted lpsa associated with an lcavol of -0.80?
e) What is the predicted lpsa associated with an lcavol of 80?
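A minimal R sketch for parts (a) through (e); fit is just a placeholder object name:

library(faraway)
# (a) Scatterplot of lpsa versus lcavol with the least squares line added
plot(lpsa ~ lcavol, data = prostate)
fit <- lm(lpsa ~ lcavol, data = prostate)   # (b) simple linear regression
abline(fit)
# (c) Is there a relationship between the predictor and the response?
summary(fit)
# (d), (e) Predicted lpsa at lcavol = -0.80 and at lcavol = 80
predict(fit, newdata = data.frame(lcavol = c(-0.80, 80)))

Note that lcavol = 80 lies far outside the observed range of the data, so the prediction in part (e) is an extrapolation.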
HW04: Simulate a data set named sim_data containing 50 observations using the following R code:
set.seed(13245)

# Pre-allocate the predictor and response vectors
X <- numeric(50)
Y <- numeric(50)

for (i in 1:50) {
  rnorm(1, mean = 0, sd = 3)   # discarded draw from the original code; kept so the RNG sequence (and hence the data) matches
  X[i] <- i * 0.5
  Y[i] <- 2 + 4 * (i * 0.5) + rnorm(1, mean = 0, sd = 3)
}

sim_data <- data.frame(cbind(X, Y))
sim_data
Note: sim_data contains 50 observations of a response Y that is linearly related to X, with intercept B0 = 2 and slope B1 = 4, and with error terms that are normally distributed with mean 0 and standard deviation 3.
a) Perform a regression of Y on X and provide the output of the summary function.
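A sketch of part (a), assuming sim_data was created with the code above; sim_fit is a placeholder name:

# (a) Regress Y on X for the simulated data
sim_fit <- lm(Y ~ X, data = sim_data)
summary(sim_fit)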
Simulate another data set of 50 observations in which the error terms follow a uniform distribution on the interval [-2, 2]. Note that the error terms in this case are not normally distributed but still have expectation 0.
b) Perform a regression of Y on X using the simulated data set and provide the output of the summary function.
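One possible way to build the uniform-error data and complete part (b); the object name sim_data_unif, the reuse of seed 13245, and generating the errors with runif(min = -2, max = 2) are illustrative choices, not prescribed by the assignment:

set.seed(13245)                                  # seed choice is arbitrary here
X <- (1:50) * 0.5
Y <- 2 + 4 * X + runif(50, min = -2, max = 2)    # Unif(-2, 2) errors, expectation 0
sim_data_unif <- data.frame(X, Y)

# (b) Regress Y on X for the uniform-error data
summary(lm(Y ~ X, data = sim_data_unif))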
HW05
See Exercise 7, parts a, b, and c, in Chapter 2. The R code below works through the exercise, starting from the distance calculation in part (a).
#Ch2Ex7.R
#install these packages if they are not already installed on your computer
library(tidyverse)
library(knitr)
library(kableExtra)
data <- data.frame(X1 = c(0, 2, 0, 0, -1, 1),
                   X2 = c(3, 0, 1, 1, 0, 1),
                   X3 = c(0, 0, 3, 2, 1, 1),
                   Y  = c('Red', 'Red', 'Red', 'Green', 'Green', 'Red'),
                   stringsAsFactors = FALSE)
colnames(data) <- c('$X_{1}$', '$X_{2}$', '$X_{3}$', '$Y$')
kable(data, row.names = TRUE) %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
##--------##
Z <- c(0, 0, 0)
data$dist_X_Z <- sqrt((data$`$X_{1}$` - Z[1])^2 +
                      (data$`$X_{2}$` - Z[2])^2 +
                      (data$`$X_{3}$` - Z[3])^2) %>%
  round(4)
colnames(data)[5] <- '$d(X, Z)$'
kable(data, row.names = TRUE, escape = FALSE) %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE) %>%
  column_spec(6, background = "lightgreen")
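Parts (b) and (c) of the exercise ask for the nearest-neighbor prediction at Z with K = 1 and K = 3. A possible continuation using the distances computed above; dist_to_Z, labels, and k3 are helper names introduced here:

dist_to_Z <- data[['$d(X, Z)$']]
labels    <- data[['$Y$']]

# (b) K = 1: the label of the single nearest neighbor
labels[order(dist_to_Z)][1]

# (c) K = 3: the majority label among the three nearest neighbors
k3 <- labels[order(dist_to_Z)][1:3]
names(sort(table(k3), decreasing = TRUE))[1]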
Repeat Exercise 7, parts a, b, and c, for the point z = (0, 1, 0).
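The distance calculation above can be reused for the new point by changing Z; a minimal sketch, where dist_to_z is a new illustrative column name:

# New reference point z = (0, 1, 0)
Z <- c(0, 1, 0)
data$dist_to_z <- sqrt((data[['$X_{1}$']] - Z[1])^2 +
                       (data[['$X_{2}$']] - Z[2])^2 +
                       (data[['$X_{3}$']] - Z[3])^2) %>%
  round(4)
data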