Tutorial 4 Tasks Solution¶
Generating Synthetic Data¶
Let's create a synthetic dataset from the polynomial below:
$f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 $
In [13]:
import numpy as np
import matplotlib.pyplot as plt
# Initialise the RNG so we get the same result every time
np.random.seed(0)
# Number of training points
m = 50
x = np.linspace(0.0, 1.0, m)
In [14]:
# Function coefficients/parameters
beta0 = 4
beta1 = 1.5
beta2 = 3.2
# f values from 2nd order polynomial
f = beta0 + beta1 * x + beta2 * np.power(x,2)
f contains values sampled exactly from the 2nd-order polynomial. In the real world we usually observe noisy data, so we will add a small amount of Gaussian noise to these values.
In [15]:
# Generate noisy sample from population
sigma2 = 0.1
y = f + np.random.normal(0, np.sqrt(sigma2), m)
In [16]:
fig1 = plt.figure()
plt.plot(x, f, label="f (Ground Truth)")
plt.scatter(x, y, label="y (Observed Points)", color="red")
plt.xlabel("Predictor/Feature Value")
plt.ylabel("Target Value")
plt.title("Underlying Function vs Observations")
plt.legend(loc="upper left")
fig1
Out[16]: [figure: Underlying Function vs Observations]
Your task is to implement the bias-variance decomposition of KNN based on the lecture and this simulated data set.¶
You may choose two (or more) trial values of k, e.g. 1 and 50, and see what the bias and variance values look like for each choice of k. Remember that the Expected Prediction Error (EPE), sometimes called the expected test MSE (Mean Squared Error), decomposes as
$EPE(x_0) = \text{Var}(\hat{f}(x_0)) + [\text{Bias}(\hat{f}(x_0))]^2 + \text{Var}(\varepsilon)$
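For a KNN regressor this decomposition takes a closed form (this is the lecture result the solution code below relies on; here $N(x_0)$ denotes the set of the $k$ training points nearest to $x_0$):
$\hat{f}(x_0) = \frac{1}{k}\sum_{x_\ell \in N(x_0)} y_\ell, \qquad \text{Bias}(\hat{f}(x_0)) = f(x_0) - \frac{1}{k}\sum_{x_\ell \in N(x_0)} f(x_\ell), \qquad \text{Var}(\hat{f}(x_0)) = \frac{\sigma^2}{k}$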
Calculate the bias and variance with 1NN¶
In [21]:
from sklearn import neighbors
# choose one specific x0
x_0 = x[25]
# noise-free target values as a column vector
true_f = f.reshape(m, 1)
# true target value at x_0
f_x_0 = true_f[25]
In [22]:
# fit a 1NN model
n_neighbors = 1
knn = neighbors.KNeighborsRegressor(n_neighbors, weights='uniform')
# generate the predictions with 1NN with the input x_0
# note: here we use f(x_l) instead of y_l when computing the expectation E_f_x0_hat. Refer to the lecture slides.
E_f_x0_hat = knn.fit(x.reshape(-1, 1), true_f.reshape(-1, 1)).predict([[x_0]])
# calculate bias at x0
bias_x_0_k1 = f_x_0 - E_f_x0_hat
# calculate variance at x0
var_x_0_k1 = sigma2 / n_neighbors
In [23]:
print("1NN bias and variance: {0}".format([bias_x_0_k1.item(), var_x_0_k1]))
1NN bias and variance: [0.0, 0.1]
Calculate the bias and variance with 50NN¶
In [24]:
# fit a 50NN model
n_neighbors = 50
knn_50 = neighbors.KNeighborsRegressor(n_neighbors, weights='uniform')
# generate the prediction with 50NN for the input x_0
# note: as in the slides, we use f(x_l) instead of y_l when computing the expectation
E_f_x0_hat = knn_50.fit(x.reshape(-1, 1), true_f.reshape(-1, 1)).predict([[x_0]])
# calculate bias at x0
bias_x_0_k50 = f_x_0 - E_f_x0_hat
# calculate variance at x0
var_x_0_k50 = sigma2 / n_neighbors
In [25]:
print("50NN bias and variance: {0}".format([bias_x_0_k50.item(), var_x_0_k50]))
50NN bias and variance: [-0.229258642232403, 0.002]
For a potentially overfitting model, e.g. 1NN, the bias is 0 while the variance is high at 0.1.
For a potentially underfitting model, e.g. 50NN, the bias is high at -0.2293 while the variance is low at 0.002.
Try to write a for loop that iterates through a range of K values, and visualise the bias and variance pattern vs K.
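A minimal sketch of such a loop is given below, assuming the variables defined above (x, true_f, f_x_0, x_0, sigma2, m) are still in scope; the names k_values, bias_sq and variance are only illustrative.
from sklearn import neighbors
import numpy as np
import matplotlib.pyplot as plt
# try every K from 1 to m and record the squared bias and variance at x_0
k_values = range(1, m + 1)
bias_sq = []
variance = []
for k in k_values:
    knn_k = neighbors.KNeighborsRegressor(n_neighbors=k, weights='uniform')
    # expected prediction at x_0: average of the noise-free f values of the k nearest points
    E_f_x0_hat_k = knn_k.fit(x.reshape(-1, 1), true_f.reshape(-1, 1)).predict([[x_0]])
    # squared bias at x_0
    bias_sq.append(((f_x_0 - E_f_x0_hat_k) ** 2).item())
    # variance at x_0 is sigma^2 / k
    variance.append(sigma2 / k)
fig2 = plt.figure()
plt.plot(k_values, bias_sq, label="Squared Bias")
plt.plot(k_values, variance, label="Variance")
plt.plot(k_values, np.array(bias_sq) + np.array(variance) + sigma2, label="EPE (Bias^2 + Var + noise)")
plt.xlabel("K (number of neighbours)")
plt.ylabel("Value")
plt.title("Bias-Variance Trade-off vs K")
plt.legend(loc="upper right")
fig2
As K increases, the variance sigma^2/K shrinks while the squared bias tends to grow, so the EPE curve typically has a minimum at some intermediate K.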