---
title: "solution2"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Question 3
### Load data
```{r}
train = read.csv("linear-train.csv")
test = read.csv("linear-test.csv")
```
### 3.a
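```{r}
# Mean squared error between observed values y and predictions pred
getMSE <- function(y, pred) {
  mean((y - pred)^2)
}
# Fit on the first 2000 training rows, validate on rows 2001-2200
model = lm(out ~ in1 + in2 + in3 + in4, train[1:2000, ])
pred = predict(model, train[2001:2200, ])
mse = getMSE(train[2001:2200, "out"], pred)
mse
```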
The MSE of the 200 predictions on the held-out rows (rows 2001 to 2200 of the training set) is 5.986125.
### 3.b
```{r}
mean(train[, "in1"])
mean(train[, "in2"])
mean(train[, "in3"])
mean(train[, "in4"])
```
```{r}
library(caret)
trainX = train[,c("in1", "in2", "in3", "in4")]
standardizedPara = preProcess(trainX, method=c("center", "scale"))
standardizedTrainX = predict(standardizedPara, trainX)
standardizedTrainX["out"] = train["out"]
standardizedModel = lm(out~in1+in2+in3+in4, standardizedTrainX[1:2000, ])
standardizedPred = predict(standardizedModel, standardizedTrainX[2001:2200, ] )
standardizedMse = getMSE( standardizedTrainX[2001:2200, "out"], standardizedPred)
standardizedMse
```
```{r}
rangePara = preProcess(trainX, method=c("range"))
rangeTrainX = predict(rangePara, trainX)
rangeTrainX["out"] = train["out"]
rangeModel = lm(out~in1+in2+in3+in4, rangeTrainX[1:2000, ])
rangePred = predict(rangeModel, rangeTrainX[2001:2200, ] )
rangeMse = getMSE( rangeTrainX[2001:2200, "out"], rangePred)
rangeMse
```
The four input variables are on different scales; "in3" in particular differs substantially from the others. Two possible improvements are:

1. Standardize: transform each variable to mean 0 and standard deviation 1.
2. Rescale each variable to the range [0, 1].

For each of these two preprocessing steps, I train the model on the first 2000 rows and test it on the last 200 rows. Both give the same MSE as the original model. This is expected: each step is an affine transformation of the individual inputs, and a least-squares fit with an intercept absorbs such a transformation into its coefficients, so the predictions do not change.

I choose method 2, because it puts all variables on the same scale.
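
The equal MSE values can be checked independently of the assignment data. The sketch below is only an illustration under stated assumptions: it uses simulated data (not `linear-train.csv`), base R's `scale()` instead of caret's `preProcess`, and hypothetical names (`sim`, `valMSE`, `fitRows`, `valRows`). It fits the same formula on raw, standardized, and range-scaled inputs and compares the validation MSE.

```r
# Simulated data (hypothetical; not the assignment's linear-train.csv)
set.seed(1)
n <- 300
sim <- data.frame(
  in1 = rnorm(n),
  in2 = rnorm(n, mean = 5, sd = 2),
  in3 = rnorm(n, mean = 1000, sd = 300),  # deliberately on a much larger scale
  in4 = runif(n)
)
sim$out <- 1 + 2 * sim$in1 - 0.5 * sim$in2 + 0.01 * sim$in3 + 3 * sim$in4 + rnorm(n)

fitRows <- 1:250    # rows used to fit the model
valRows <- 251:300  # held-out rows used to compute the MSE
inputs <- c("in1", "in2", "in3", "in4")

# Fit on fitRows and return the validation MSE for a given version of the inputs
valMSE <- function(x) {
  d <- data.frame(x, out = sim$out)
  m <- lm(out ~ in1 + in2 + in3 + in4, data = d[fitRows, ])
  mean((d[valRows, "out"] - predict(m, d[valRows, ]))^2)
}

rawX <- sim[, inputs]
# Standardize: subtract the column mean, divide by the column standard deviation
zX <- as.data.frame(scale(rawX))
# Min-max: map each column to [0, 1]
mins <- sapply(rawX, min)
spans <- sapply(rawX, max) - mins
rangeX <- as.data.frame(scale(rawX, center = mins, scale = spans))

mseRaw <- valMSE(rawX)
mseZ <- valMSE(zX)
mseRange <- valMSE(rangeX)
c(raw = mseRaw, standardized = mseZ, range = mseRange)
```

The three values should agree up to floating-point error. Only the coefficients change between the fits; the fitted values and predictions stay the same.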
### 3.c
```{r}
rangeTestX = predict(rangePara, test[,c("in1", "in2", "in3", "in4")])
predTest = predict(rangeModel, rangeTestX)
predTest
```
```{r}
# Use test[, "out"] (a numeric vector) rather than test["out"] (a data frame)
getMSE(test[, "out"], predTest)
```
The predictions are shown above, and the MSE on the test set is 7.238796.
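
The test inputs above are rescaled with the parameters learned from the training inputs (`rangePara`), not with the test set's own minimum and maximum. As a rough sketch of that idea in base R, with toy data and hypothetical helpers `fitRange` and `applyRange` (these are not caret functions):

```r
# Hypothetical helpers illustrating min-max scaling with training parameters
fitRange <- function(x) {
  # Store each column's minimum and span (max - min) from the training data
  list(min = sapply(x, min), span = sapply(x, max) - sapply(x, min))
}
applyRange <- function(params, x) {
  # Rescale new data using the stored training minimum and span
  as.data.frame(scale(x, center = params$min, scale = params$span))
}

# Toy data, made up for illustration
toyTrain <- data.frame(in1 = c(0, 5, 10), in2 = c(100, 150, 200))
toyTest  <- data.frame(in1 = c(-2, 5, 12), in2 = c(120, 200, 260))

params <- fitRange(toyTrain)
scaledTest <- applyRange(params, toyTest)
scaledTest
```

Because the training minimum and span are reused, test values that lie outside the training range map outside [0, 1]. That is harmless for a linear model, and it keeps the test inputs on the scale the model was fitted on.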