
COSC 2673/2793 | Machine Learning
**Example: Week02 Lecture QandA**

Demo code for the Week 02 Lecture QandA. The task is to predict the proportion of people living in poverty for some local regions in the US. The dataset consists of features relating to the demographics of these regions.
Disclaimer: This code was written quickly to demonstrate some important concepts in regression and should not be considered an adequate approach to solving the above task.

Reading Data & some visualizations
Read data into a data frame
In [ ]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

data = pd.read_csv('train.csv', delimiter=',')
data.head()

It is always good to visualize the data before starting the model development phase. I have only done a few visualizations here; data visualization and EDA are discussed in detail in the labs. Let's visualize the training data.
In [ ]:
plt.figure(figsize=(25,30))
for i, col in enumerate(data.columns):
    plt.subplot(6,5,i+1)
    plt.hist(data[col], alpha=0.3, color='b', density=True)
    plt.title(col)
    plt.xticks(rotation='vertical')
In [ ]:
import seaborn as sns

plt.figure(figsize=(25,30))
for i, col in enumerate(data.columns):
    plt.subplot(6,5,i+1)
    sns.scatterplot(data=data, x=col, y='TARGET_poverty')
    plt.title(col)
    plt.xticks(rotation='vertical')
plt.show()

Test (unseen) data
Hold out some data for testing (simulated unseen data)
In [ ]:
from sklearn.model_selection import train_test_split

with pd.option_context('mode.chained_assignment', None):
    train_data, test_data = train_test_split(data, test_size=0.2, shuffle=True)
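Note: since shuffle=True and no random_state is given, the split (and therefore all the error values below) will change on every run. As a small variant not in the original code, passing a fixed seed such as random_state=42 (an arbitrary choice) makes the results reproducible:
In [ ]:
# Hypothetical variant: fix the seed so the train/test split is reproducible
train_data, test_data = train_test_split(data, test_size=0.2, shuffle=True, random_state=42)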

Univariate Linear Regression
Let's first try a univariate regression technique. For this part we will use percent_unemployment as the attribute (independent variable); the dependent variable is TARGET_poverty.
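The model being fitted is the linear hypothesis
$$h_\theta(x) = \theta_0 + \theta_1 x,$$
where $x$ is percent_unemployment and the parameters $\theta_0$ (intercept) and $\theta_1$ (slope) are learned from the training data.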
In [ ]:
x_train = train_data[['percent_unemployment']]
y_train = train_data[['TARGET_poverty']]
In [ ]:
plt.scatter(x_train,y_train)
plt.xlabel('percent_unemployment')
plt.ylabel('TARGET_poverty')
plt.show()
In [ ]:
from sklearn.linear_model import LinearRegression

reg_lin = LinearRegression().fit(x_train, y_train)
In [ ]:
print("Theta1 = {:.3f}, Theta0 = {:.3f}".format(reg_lin.coef_[0][0], reg_lin.intercept_[0]))

Let's plot the optimal hypothesis along with the data.
In [ ]:
# y = mx+c
x_hyp = np.linspace(0,35,100)
y_hyp = x_hyp*reg_lin.coef_[0][0] + reg_lin.intercept_[0]

plt.scatter(x_train,y_train)
plt.plot(x_hyp, y_hyp, 'r-')
plt.xlabel('percent_unemployment')
plt.ylabel('TARGET_poverty')
plt.show()
In [ ]:
y_train_hat = reg_lin.predict(x_train)
In [ ]:
rmse_train = np.sqrt(np.mean((y_train - y_train_hat)**2))
print("Root Mean Squared Error (train data): {:.3f}".format(rmse_train['TARGET_poverty']))
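For reference, the metric computed above is the root mean squared error over the $m$ training examples:
$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)} - \hat{y}^{(i)}\right)^2}$$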
In [ ]:
x_test = test_data[['percent_unemployment']]
y_test = test_data[['TARGET_poverty']]

y_test_hat = reg_lin.predict(x_test)

rmse_test = np.sqrt(np.mean((y_test - y_test_hat)**2))
print("Root Mean Squared Error (test data): {:.3f}".format(rmse_test['TARGET_poverty']))

Plot of train and test data
In [ ]:
plt.scatter(x_train,y_train)
plt.scatter(x_test,y_test)
plt.xlabel('percent_unemployment')
plt.ylabel('TARGET_poverty')
plt.legend(['Train data', 'Test data'])
plt.show()

Let's take another independent variable, this time income_per_capita.
In [ ]:
x_train2 = train_data[['income_per_capita']]
y_train2 = train_data[['TARGET_poverty']]
In [ ]:
reg_lin2 = LinearRegression().fit(x_train2, y_train2)
print("Theta1 = {:.3f}, Theta0 = {:.3f}".format(reg_lin2.coef_[0][0], reg_lin2.intercept_[0]))
In [ ]:
y_train2_hat = reg_lin2.predict(x_train2)
rmse_train2 = np.sqrt(np.mean((y_train2 - y_train2_hat)**2))
print("Root Mean Squared Error (train data): {:.3f}".format(rmse_train2['TARGET_poverty']))
In [ ]:
x_test2 = test_data[['income_per_capita']]
y_test2 = test_data[['TARGET_poverty']]

y_test2_hat = reg_lin2.predict(x_test2)

rmse_test2 = np.sqrt(np.mean((y_test2 - y_test2_hat)**2))
print("Root Mean Squared Error (test data): {:.3f}".format(rmse_test2['TARGET_poverty']))
In [ ]:
# y = mx+c
x_hyp = np.linspace(0,70000,100)
y_hyp = x_hyp*reg_lin2.coef_[0][0] + reg_lin2.intercept_[0]

plt.scatter(x_train2,y_train2)
plt.scatter(x_test2,y_test2)
plt.plot(x_hyp, y_hyp, 'r-')
plt.xlabel('income_per_capita')
plt.ylabel('TARGET_poverty')
plt.legend(['Train data', 'Test data', 'Hypothesis'])
plt.show()

Univariate Polynomial Regression
The linear model does not have enough capacity to represent the relationship between TARGET_poverty and income_per_capita.
How can we increase the capacity of the model?
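One option is to replace the straight line with a polynomial of degree $p$ in the single feature:
$$h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \dots + \theta_p x^p$$
This hypothesis is still linear in the parameters $\theta$, so after expanding the feature we can keep using LinearRegression.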
In [ ]:
x_train = train_data[['income_per_capita']]
y_train = train_data[['TARGET_poverty']]

Convert the feature to polynomial features. We can alter the capacity of the model with the hyperparameter: the polynomial degree.
In [ ]:
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(4)
poly.fit(x_train)
x_train_poly = poly.transform(x_train)
In [ ]:
print('shape of x:', x_train.shape)
print('shape of polynomial x:', x_train_poly.shape)
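To see what the expanded columns are, recent scikit-learn versions can list the generated terms (in older versions the method was named get_feature_names):
In [ ]:
# Inspect the generated polynomial terms: bias, x, x^2, x^3, x^4
print(poly.get_feature_names_out())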
In [ ]:
reg_poly = LinearRegression().fit(x_train_poly, y_train)
In [ ]:
x_hyp = np.linspace(0,70000,100)
x_hyp_poly = poly.transform(x_hyp.reshape(-1, 1))
y_hyp = reg_poly.predict(x_hyp_poly)

plt.scatter(x_train,y_train)
plt.plot(x_hyp, y_hyp, 'r-')
plt.xlabel('income_per_capita')
plt.ylabel('TARGET_poverty')
plt.legend(['Train data', 'Hypothesis'])
plt.show()
In [ ]:
y_train_hat = reg_poly.predict(x_train_poly)
rmse_train = np.sqrt(np.mean((y_train - y_train_hat)**2))
print("Root Mean Squared Error (train data): {:.3f}".format(rmse_train['TARGET_poverty']))
In [ ]:
x_test = test_data[['income_per_capita']]
y_test = test_data[['TARGET_poverty']]

x_test_poly = poly.transform(x_test)
y_test_hat = reg_poly.predict(x_test_poly)

rmse_test = np.sqrt(np.mean((y_test - y_test_hat)**2))
print("Root Mean Squared Error (test data): {:.3f}".format(rmse_test['TARGET_poverty']))
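A quick way to see how the degree hyperparameter controls capacity is to sweep over a few degrees and compare train and test RMSE; a large gap between the two suggests overfitting. The following loop is a sketch that was not in the original demo:
In [ ]:
# Sketch: compare train/test RMSE for several polynomial degrees
for degree in range(1, 7):
    p = PolynomialFeatures(degree)
    Xtr = p.fit_transform(x_train)
    Xte = p.transform(x_test)
    model = LinearRegression().fit(Xtr, y_train)
    rmse_tr = np.sqrt(np.mean((y_train.values - model.predict(Xtr))**2))
    rmse_te = np.sqrt(np.mean((y_test.values - model.predict(Xte))**2))
    print("degree {}: train RMSE {:.3f}, test RMSE {:.3f}".format(degree, rmse_tr, rmse_te))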

Multivariate Linear Regression
In [ ]:
X_train = train_data.drop(['TARGET_poverty', 'id'], axis=1)
y_train = train_data[['TARGET_poverty']]
In [ ]:
multi_reg_lin = LinearRegression().fit(X_train, y_train)
In [ ]:
y_train_hat = multi_reg_lin.predict(X_train)

rmse_train = np.sqrt(np.mean((y_train - y_train_hat)**2))
print("Root Mean Squared Error (train data): {:.3f}".format(rmse_train['TARGET_poverty']))
In [ ]:
X_test = test_data.drop(['TARGET_poverty', 'id'], axis=1)
y_test = test_data[['TARGET_poverty']]

y_test_hat = multi_reg_lin.predict(X_test)

rmse_test = np.sqrt(np.mean((y_test - y_test_hat)**2))
print("Root Mean Squared Error (test data): {:.3f}".format(rmse_test['TARGET_poverty']))
In [ ]:
plt.scatter(y_test, y_test_hat)
plt.xlabel('Target Value')
plt.ylabel('Predicted Value')
plt.show()
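For a perfect model the points would lie on the diagonal. Adding a $y = x$ reference line (a small addition to the original plot) makes this easier to judge:
In [ ]:
# Target vs. prediction, with a y = x reference line for comparison
plt.scatter(y_test, y_test_hat)
lims = [y_test.values.min(), y_test.values.max()]
plt.plot(lims, lims, 'r--')
plt.xlabel('Target Value')
plt.ylabel('Predicted Value')
plt.show()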

Multivariate Polynomial Regression
In [ ]:
X_train = train_data.drop(['TARGET_poverty', 'id'], axis=1)
y_train = train_data[['TARGET_poverty']]
In [ ]:
poly = PolynomialFeatures(2)
poly.fit(X_train)
X_train_poly = poly.transform(X_train)
In [ ]:
print('shape of x:', X_train.shape)
print('shape of polynomial x:', X_train_poly.shape)
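With $n$ input features, a degree-2 expansion produces $\binom{n+2}{2} = \frac{(n+1)(n+2)}{2}$ columns (the bias, the $n$ linear terms, and all squares and pairwise products), so the feature count grows quickly with both $n$ and the degree.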
In [ ]:
reg_multi_poly = LinearRegression().fit(X_train_poly, y_train)
In [ ]:
y_train_hat = reg_multi_poly.predict(X_train_poly)
rmse_train = np.sqrt(np.mean((y_train - y_train_hat)**2))
print("Root Mean Squared Error (train data): {:.3f}".format(rmse_train['TARGET_poverty']))
In [ ]:
X_test = test_data.drop(['TARGET_poverty', 'id'], axis=1)
y_test = test_data[['TARGET_poverty']]

X_test_poly = poly.transform(X_test)
y_test_hat = reg_multi_poly.predict(X_test_poly)

rmse_test = np.sqrt(np.mean((y_test - y_test_hat)**2))
print("Root Mean Squared Error (test data): {:.3f}".format(rmse_test['TARGET_poverty']))

Data Normalization
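StandardScaler standardizes each feature using statistics estimated on the training data:
$$x' = \frac{x - \mu}{\sigma}$$
where $\mu$ and $\sigma$ are the per-feature mean and standard deviation. This is relevant here because the features are on very different scales (e.g. income_per_capita in the tens of thousands vs. percentage features).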
In [ ]:
X_train = train_data.drop(['TARGET_poverty', 'id'], axis=1)
y_train = train_data[['TARGET_poverty']]
In [ ]:
from sklearn import preprocessing

scaler = preprocessing.StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
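As a quick sanity check (not in the original notebook), each scaled training column should now have roughly zero mean and unit standard deviation:
In [ ]:
# Each column of the scaled matrix should have mean ~0 and std ~1
print(X_train_scaled.mean(axis=0).round(3))
print(X_train_scaled.std(axis=0).round(3))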
In [ ]:
multi_reg_lin = LinearRegression().fit(X_train_scaled, y_train)
In [ ]:
y_train_hat = multi_reg_lin.predict(X_train_scaled)

rmse_train = np.sqrt(np.mean((y_train - y_train_hat)**2))
print("Root Mean Squared Error (train data): {:.3f}".format(rmse_train['TARGET_poverty']))
In [ ]:
X_test = test_data.drop(['TARGET_poverty', 'id'], axis=1)
y_test = test_data[['TARGET_poverty']]

X_test_scaled = scaler.transform(X_test)

y_test_hat = multi_reg_lin.predict(X_test_scaled)

# Note: for ordinary least squares, standardizing the features does not change
# the predictions (only the coefficients), so this RMSE should match the
# unscaled multivariate linear model above (up to numerical precision).
rmse_test = np.sqrt(np.mean((y_test - y_test_hat)**2))
print("Root Mean Squared Error (test data): {:.3f}".format(rmse_test['TARGET_poverty']))
In [ ]:
plt.scatter(y_test, y_test_hat)
plt.xlabel('Target Value')
plt.ylabel('Predicted Value')
plt.show()