---
title: "solution2"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Question 3

### Load data
```{r}
train = read.csv("linear-train.csv")
test = read.csv("linear-test.csv")
```

### 3.a

```{r}
# Mean squared error between observed values y and predictions pred
getMSE <- function(y, pred) {
  return(mean((y - pred)^2))
}

# Fit on the first 2000 rows, then evaluate on held-out rows 2001 to 2200
model = lm(out ~ in1 + in2 + in3 + in4, train[1:2000, ])
pred = predict(model, train[2001:2200, ])
mse = getMSE(train[2001:2200, "out"], pred)
mse
```

The MSE of the 200 predictions on the held-out rows (2001 to 2200 of the training file) is 5.986125.

### 3.b

```{r}
# Compare the means of the four inputs to check their scales
mean(train[, "in1"])
mean(train[, "in2"])
mean(train[, "in3"])
mean(train[, "in4"])
```

```{r}
library(caret)

trainX = train[, c("in1", "in2", "in3", "in4")]

# Standardize: center each input to mean 0 and scale to standard deviation 1
standardizedPara = preProcess(trainX, method = c("center", "scale"))
standardizedTrainX = predict(standardizedPara, trainX)
standardizedTrainX["out"] = train["out"]

standardizedModel = lm(out ~ in1 + in2 + in3 + in4, standardizedTrainX[1:2000, ])
standardizedPred = predict(standardizedModel, standardizedTrainX[2001:2200, ])
standardizedMse = getMSE(standardizedTrainX[2001:2200, "out"], standardizedPred)
standardizedMse
```

```{r}
# Range scaling: map each input to the interval [0, 1]
rangePara = preProcess(trainX, method = c("range"))
rangeTrainX = predict(rangePara, trainX)
rangeTrainX["out"] = train["out"]

rangeModel = lm(out ~ in1 + in2 + in3 + in4, rangeTrainX[1:2000, ])
rangePred = predict(rangeModel, rangeTrainX[2001:2200, ])
rangeMse = getMSE(rangeTrainX[2001:2200, "out"], rangePred)
rangeMse
```

The four input variables are on different scales; `in3` in particular differs greatly from the others. Two possible improvements are:

1. Standardize: transform each variable to mean 0 and standard deviation 1.
2. Rescale each variable to the range [0, 1].

For each of these two preprocessing steps, I train the model on the first 2000 rows and test on the last 200 rows. Interestingly, both give the same MSE as the original model. This is expected: with an intercept in the model, least-squares predictions do not change when each input is shifted and rescaled, because the fitted coefficients adjust to compensate. I choose method 2 in the end, because it puts all variables on the same scale.

### 3.c

```{r}
# Apply the range scaling learned from the training inputs to the test inputs
rangeTestX = predict(rangePara, test[, c("in1", "in2", "in3", "in4")])
predTest = predict(rangeModel, rangeTestX)
predTest
```

```{r}
# Pass the observed values as a numeric vector: test[, "out"], not test["out"]
getMSE(test[, "out"], predTest)
```

The predictions are shown above, and the test MSE is 7.238796.
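
As a check on the observation in 3.b that scaling leaves the MSE unchanged, here is a minimal sketch on simulated data. The data, the coefficients, the 200/100 split, and the helper `holdoutMSE` are assumptions made for illustration and are not part of the assignment; the scaling is done in base R rather than with `preProcess`, so this is not the exact procedure used above.

```{r}
# Simulated data (an assumption for illustration; not the assignment data)
set.seed(1)
n <- 300
sim <- data.frame(
  in1 = rnorm(n),
  in2 = rnorm(n, mean = 5, sd = 2),
  in3 = rnorm(n, mean = 1000, sd = 300),  # deliberately on a much larger scale
  in4 = runif(n)
)
sim$out <- 1 + 2 * sim$in1 - 0.5 * sim$in2 + 0.01 * sim$in3 + 3 * sim$in4 + rnorm(n)

inputs <- c("in1", "in2", "in3", "in4")
fitRows <- 1:200     # rows used to fit the model
holdRows <- 201:300  # rows held out for evaluation

getMSE <- function(y, pred) mean((y - pred)^2)

# Hypothetical helper: fit on fitRows and return the MSE on holdRows
holdoutMSE <- function(d) {
  fit <- lm(out ~ in1 + in2 + in3 + in4, data = d[fitRows, ])
  getMSE(d[holdRows, "out"], predict(fit, d[holdRows, ]))
}

# Standardized copy: each input has mean 0 and standard deviation 1
standardized <- sim
standardized[inputs] <- lapply(sim[inputs], function(x) (x - mean(x)) / sd(x))

# Range-scaled copy: each input is mapped to [0, 1]
ranged <- sim
ranged[inputs] <- lapply(sim[inputs], function(x) (x - min(x)) / (max(x) - min(x)))

mseRaw <- holdoutMSE(sim)
mseStd <- holdoutMSE(standardized)
mseRange <- holdoutMSE(ranged)

# The three values should agree up to floating-point error
c(raw = mseRaw, standardized = mseStd, range = mseRange)
```

Since the three MSE values coincide, the choice between the two scalings cannot be made on predictive accuracy for plain linear regression; it matters mainly for interpreting coefficients or for methods that are sensitive to scale, such as regularized regression.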