HW1: Linear & Logistic Regression¶
In this homework, you will first read the lecture notes and then implement linear regression and logistic regression in this Jupyter notebook. Fill in all the blanks and run all the cells. Then export the notebook to a PDF file that contains all results, and turn in both the ipython notebook file and the PDF file. Some helpful resources are listed below:
1. First week mentor session video: https://video.gecacademy.cn/?id=b7d464e0-c32f-11e9-8a0d-15fd33044508&logo=
2. Second week lecture slides: https://github.com/noise-lab/ML-Networking-Primer/blob/master/1_Regression.ipynb
3. Sklearn logistic regression documentation: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
4. Sklearn linear regression documentation: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
Note: If you cannot export the notebook directly to a PDF file, try exporting it to HTML first and then saving that as a PDF.
Linear Regression¶
We will fit a simple linear regression to the provided data file. The data samples are represented as row vectors; the first column refers to the input (x-axis) and the second column refers to the output (y-axis). We assume that this dataset was generated from a linear model $y = ax+b$ plus noise, and you need to find the optimal $a^*$ and $b^*$ that fit the data. Follow the instructions below and show the results.
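As a reference point, and assuming "optimal" here means minimizing the sum of squared residuals $\sum_{i=1}^{n} (y_i - a x_i - b)^2$ (ordinary least squares, which is what sklearn's `LinearRegression` fits), the solution has a closed form. With $\bar{x}$ and $\bar{y}$ denoting the sample means of the inputs and outputs:

$$a^* = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad b^* = \bar{y} - a^* \bar{x}$$

These expressions can serve as a sanity check on the slope and intercept that the fitted model reports.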
1. Load the data¶
In [ ]:
import numpy as np
In [ ]:
data =
x =
y =
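The loading step could look roughly like the sketch below. The name and format of the provided file are not stated here, so the sketch reads from an in-memory stand-in with made-up values and assumes a comma delimiter; replace `fake_file` with the real path and adjust the delimiter as needed.

```python
import io
import numpy as np

# Stand-in for the provided data file (hypothetical values, for illustration only).
# Each row is one sample: input x in column 0, output y in column 1.
fake_file = io.StringIO("0.0,1.1\n1.0,2.9\n2.0,5.2\n3.0,6.8\n")

# np.loadtxt accepts a file path as well; the comma delimiter is an assumption.
data = np.loadtxt(fake_file, delimiter=",")

x = data[:, 0]  # first column: inputs
y = data[:, 1]  # second column: outputs

print(data.shape, x.shape, y.shape)  # (4, 2) (4,) (4,)
```

Slicing by column keeps `x` and `y` as 1-D arrays, which is convenient for plotting; sklearn estimators expect a 2-D input, so `x` needs a reshape before fitting.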
2. Visualize the data¶
In [ ]:
import matplotlib.pyplot as plt
In [ ]:
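A scatter plot is usually enough to check whether a straight line is a plausible model. The sketch below uses synthetic data (a line with slope 2 and intercept 1 plus Gaussian noise, chosen arbitrarily) in place of the provided file.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for the homework data: y = 2x + 1 plus noise (assumed values).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.shape)

# Scatter plot of the raw samples, one marker per (x, y) pair.
fig, ax = plt.subplots()
ax.scatter(x, y, s=15)
ax.set_title("Raw data")
ax.set_xlabel("x")
ax.set_ylabel("y")
plt.show()
```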
3. Fit the linear regression¶
In [ ]:
from sklearn import linear_model
In [ ]:
linear_regression =
In [ ]:
best_line_ys =
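One possible shape for the fitting step is sketched below, again on synthetic data standing in for the provided file. The names `a_hat` and `b_hat` are introduced here for illustration only. Note that sklearn expects a 2-D feature matrix, hence the reshape.

```python
import numpy as np
from sklearn import linear_model

# Synthetic stand-in for the homework data: y = 2x + 1 plus noise (assumed values).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.shape)

# sklearn estimators take X with shape (n_samples, n_features).
X = x.reshape(-1, 1)

linear_regression = linear_model.LinearRegression()
linear_regression.fit(X, y)

a_hat = linear_regression.coef_[0]    # estimated slope
b_hat = linear_regression.intercept_  # estimated intercept

# Predicted y value on the fitted line for every input x.
best_line_ys = linear_regression.predict(X)

print(f"slope = {a_hat:.3f}, intercept = {b_hat:.3f}")
```

The estimated slope and intercept should land close to the values used to generate the synthetic data, and they agree with what `np.polyfit(x, y, 1)` returns, since both solve the same least-squares problem.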
4. Plot the results along with the points¶
In [ ]:
# Plot the points and the fitted line
plt.plot(…..)
plt.title("Linear regression line fitting")
plt.xlabel("x")
plt.ylabel("y")
plt.show()
Logistic Regression¶
We will fit a logistic regression on the provided data. This dataset is modified from the Iris dataset. It contains 2 classes of 50 instances each, where each class refers to a type of iris plant. Each instance has 5 attributes: sepal length in cm, sepal width in cm, petal length in cm, petal width in cm, and class (Iris-setosa or Iris-versicolor). The first 4 attributes are numerical, while the last attribute is categorical. In this task, you will use the first 4 attributes to predict the 5th attribute by fitting a logistic regression.
1. Load the data¶
In [ ]:
import numpy as np
In [ ]:
data =
x =
y =
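Because the last column is categorical, the file cannot be read as a purely numeric array. A minimal sketch, assuming a comma-separated layout with the class name in the fifth column (the actual file format is not specified here, and the four rows below are illustrative only): read everything as strings, then convert the first four columns to floats.

```python
import io
import numpy as np

# Stand-in for the provided file: four illustrative rows in an assumed CSV layout.
fake_file = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.4,3.2,4.5,1.5,Iris-versicolor\n"
)

# Read every field as a string so the class column does not break parsing.
data = np.loadtxt(fake_file, delimiter=",", dtype=str)

x = data[:, :4].astype(float)  # four numerical attributes
y = data[:, 4]                 # class labels as strings

print(x.shape, y)
```

sklearn classifiers accept string labels directly, so `y` does not have to be converted to integers before fitting.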
2. Separate data into train and test¶
The training data should have 70 instances, and the testing data should have the remaining 30.
In [ ]:
xtrain =
ytrain =
xtest =
ytest =
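One way to obtain a 70/30 split is sklearn's `train_test_split`, sketched below. As a stand-in for the provided file, the sketch uses the setosa and versicolor rows of sklearn's bundled Iris data, which may differ from the modified dataset used in this homework. The `stratify` and `random_state` arguments are choices made for this sketch, not requirements of the assignment.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Stand-in data: keep only setosa (0) and versicolor (1), 50 instances each.
iris = load_iris()
mask = iris.target < 2
x = iris.data[mask]
y = iris.target[mask]

# An integer train_size is an absolute count; the remaining rows form the test set.
# stratify=y keeps the class proportions equal in both parts.
xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, train_size=70, shuffle=True, stratify=y, random_state=0
)

print(xtrain.shape, xtest.shape)  # (70, 4) (30, 4)
```

Shuffling matters when the rows are ordered by class: taking the first 70 rows of such a file would put all of one class in the training set and leave the test set with a single class.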
3. Fit the logistic regression¶
Fit the regression on our training data.
In [ ]:
logistic_regression =
4. Calculate the prediction accuracy¶
Find the accuracy of our fitted regression model when predicting on the training and testing data. (Hint: sklearn may have helper functions that give the result for each case in a single line.)
In [ ]:
In [ ]:
5. Try changing the number of training and testing instances, then write down what you find and explain why.¶
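The experiment could be organized as a loop over training-set sizes, as in the sketch below. It again uses the setosa and versicolor rows of sklearn's bundled Iris data as a stand-in for the provided file, so the numbers it prints are not the answer for the homework dataset. The list of sizes, `max_iter=1000`, and the `results` dictionary are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in data: setosa (0) and versicolor (1) from sklearn's bundled Iris set.
iris = load_iris()
mask = iris.target < 2
x = iris.data[mask]
y = iris.target[mask]

results = {}  # maps training-set size to (train accuracy, test accuracy)
for n_train in (10, 30, 50, 70, 90):
    xtrain, xtest, ytrain, ytest = train_test_split(
        x, y, train_size=n_train, stratify=y, random_state=0
    )
    logistic_regression = LogisticRegression(max_iter=1000)
    logistic_regression.fit(xtrain, ytrain)

    # For classifiers, score() returns the mean accuracy on the given data.
    train_acc = logistic_regression.score(xtrain, ytrain)
    test_acc = logistic_regression.score(xtest, ytest)
    results[n_train] = (train_acc, test_acc)
    print(f"train size {n_train:3d}: train acc = {train_acc:.3f}, test acc = {test_acc:.3f}")
```

`score(X, y)` gives the same number as `sklearn.metrics.accuracy_score(y, model.predict(X))`. When comparing sizes, keep in mind that a small test set makes the test accuracy a coarse estimate: with 10 test instances, each misclassification moves it by 0.1.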