
QBUS2820 – Predictive Analytics
Tutorial 3 Tasks Solutions

In [3]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_context('notebook')
%matplotlib inline

In [4]:

data = pd.read_csv('credit.csv', index_col='Obs')
train = data.sample(frac=0.7, random_state=1)
test = data[~data.index.isin(train.index)]
print(len(train), len(test))

280 120

In [5]:

values = np.arange(1, 101)
print(values)

[ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
91 92 93 94 95 96 97 98 99 100]

Problem 1
Complete the code in the companion notebook to generate a plot of the test performance for the model with one predictor as we change the number of neighbours. Interpret the results and relate them to our discussion in the first module.

In [10]:

from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

losses = []
for k in values:
    # KNN trained with one predictor, 'Limit'
    knn = KNeighborsRegressor(n_neighbors=k)
    knn.fit(train[['Limit']], train['Balance'])
    predictions = knn.predict(test[['Limit']])

    # Comment out the three lines above and uncomment the lines below to see
    # what happens if you use two predictors, 'Limit' and 'Income'.
    # The value of k with the lowest error will be different.
    # predictors = ['Limit', 'Income']
    # knn2 = KNeighborsRegressor(n_neighbors=k, metric='mahalanobis',
    #                            metric_params={'V': train[predictors].cov()})
    # knn2.fit(train[predictors], train['Balance'])
    # predictions = knn2.predict(test[predictors])

    # Test RMSE for this value of k
    loss = np.sqrt(mean_squared_error(test['Balance'], predictions))
    losses.append(loss)

# Plot the test error against the number of neighbours
fig, ax = plt.subplots()
ax.plot(values, losses)
ax.set_xlabel('Number of neighbours')
ax.set_ylabel('Test error')
plt.show()
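For reference, here is the two-predictor variant from the commented lines above as a standalone cell. It uses the Mahalanobis distance, which standardises the predictors by their sample covariance matrix, so the neighbourhoods account for the different scales of (and correlation between) Limit and Income. This is a minimal sketch assuming the same train, test, and values objects defined earlier:

```python
# Two-predictor KNN ('Limit' and 'Income') using the Mahalanobis distance.
predictors = ['Limit', 'Income']
losses2 = []
for k in values:
    knn2 = KNeighborsRegressor(n_neighbors=k, metric='mahalanobis',
                               metric_params={'V': train[predictors].cov()})
    knn2.fit(train[predictors], train['Balance'])
    predictions = knn2.predict(test[predictors])
    losses2.append(np.sqrt(mean_squared_error(test['Balance'], predictions)))
```

Plotting losses2 in the same way lets you compare the two curves directly; the value of $k$ that minimises the test error will generally differ between the one- and two-predictor models.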

As $k$ increases, the test error typically falls at first (averaging over more neighbours reduces variance) and then rises again (large neighbourhoods oversmooth and increase bias): the bias–variance trade-off from the first module. The cell below reports the value of $k$ with the lowest test error; since values starts at 1 while Python indexing starts at 0, we add one to np.argmin(losses).

In [11]:

1 + np.argmin(losses)

Out[11]:

6
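As a follow-up (not part of the original solution), one could refit the model with the selected $k$ and report its test RMSE. A minimal sketch, assuming the objects defined above:

```python
# Refit KNN with the k that minimised the test error and report the RMSE.
# (Hypothetical follow-up cell; losses comes from the loop above.)
best_k = int(1 + np.argmin(losses))
knn_best = KNeighborsRegressor(n_neighbors=best_k)
knn_best.fit(train[['Limit']], train['Balance'])
best_rmse = np.sqrt(mean_squared_error(test['Balance'],
                                       knn_best.predict(test[['Limit']])))
print(f'Best k = {best_k}, test RMSE = {best_rmse:.2f}')
```

Note that selecting $k$ on the test set and then reporting error on the same test set gives an optimistic estimate; a validation split or cross-validation would be the more careful choice.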