[HW 4] Kernel Ridge Regression Practice¶
In this homework, you will practice implementing ridge regression with polynomial featurization applied to the data matrix X. You will first implement featurized ridge regression the naive way, and then implement it using kernels.
Imports and Helper Functions¶
import os

import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cm
import seaborn as sns

sns.set_style("white")

# Make a result directory to store plots
os.makedirs("./result", exist_ok=True)
def heatmap(f, clip=True):
    """Generate a heatmap of the function f over the 2D domain of the dataset.

    Note: uses the global variables X and y to set the plot range and to overlay the data points.
    """
    # example: heatmap(lambda x, y: x * x + y * y)
    xx = yy = np.linspace(np.min(X), np.max(X), 72)
    x0, y0 = np.meshgrid(xx, yy)
    x0, y0 = x0.ravel(), y0.ravel()
    z0 = f(x0, y0)

    if clip:
        z0[z0 > 5] = 5
        z0[z0 < -5] = -5

    plt.hexbin(x0, y0, C=z0, gridsize=50, cmap=cm.jet, bins=None)
    plt.colorbar()
    cs = plt.contour(
        xx, yy, z0.reshape(xx.size, yy.size),
        [-2, -1, -0.5, 0, 0.5, 1, 2], cmap=cm.jet)
    plt.clabel(cs, inline=1, fontsize=10)

    # Overlay the raw data points, colored by class label
    pos = y[:] == +1.0
    neg = y[:] == -1.0
    plt.scatter(X[pos, 0], X[pos, 1], c='red', marker='+')
    plt.scatter(X[neg, 0], X[neg, 1], c='blue', marker='v')
data_names = ['circle', 'heart', 'asymmetric']
(a) Visualize the Datasets¶
The datasets we are given trace out shapes, such as circles and hearts. Each data point in X is a pair of coordinates. For instance, X[0] may be (0, 5), which would mean there is a point at (0, 5). The corresponding label y[0] is either the +1 class or the -1 class.
Task: Visualize all the datasets.
Label the points with different $y$ values with different colors and/or shapes.
def viz_data(X, y):
    """Visualize the dataset, labeling the points with different y values with different colors.

    - X: n x 2 data matrix that represents the coordinates of our data points
    - y: n x 1 vector that represents the class labels for our data points

    Returns:
    - None: Do not return anything. Just plot the data using a scatter plot.
    """
    ### YOUR CODE HERE ###
plt.figure(figsize=[6, 12])
for i, dataset in enumerate(data_names):
    data = np.load(dataset + '.npz')
    X = data["x"]
    y = data["y"]
    plt.subplot(3, 1, i + 1)
    viz_data(X, y)
    plt.legend()
    plt.title(dataset)
plt.savefig("./result/vis_data.png")
plt.show()
You should have noticed that the points labeled +1 sit in the center, surrounded by the points labeled -1. The data is therefore not linearly separable, and none of our linear classifiers will work on it directly. To solve this problem, we will featurize the data matrix to "lift" the data into a higher-dimensional space where it is linearly separable.
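As a concrete example of such a lift (using the circle dataset and a degree-2 feature map, which is one possible choice rather than the only one): mapping each point $\vec x = (x_1, x_2)$ to
\begin{equation}
\phi(\vec x) = (1,\ x_1,\ x_2,\ x_1^2,\ x_1 x_2,\ x_2^2)
\end{equation}
makes the classes linearly separable, because the rule "inside a circle of radius $r$ centered at the origin" is $\operatorname{sign}(r^2 - x_1^2 - x_2^2)$, which is linear in the lifted features even though it is quadratic in the original coordinates.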
(b) Polynomial Regression (Non-kernel)¶
We will first featurize the data matrix using polynomial features.
Implement polynomial ridge regression to
fit the datasets circle.npz, asymmetric.npz, and
heart.npz. Use the first 80% of the data as the training dataset and the
last 20% of the data as the validation dataset.
Report both the average
training squared loss and the average validation squared loss for polynomial order
$p \in \{1, \dots, 16\}$. Use the regularization term $\lambda=0.001$ for all
$p$. Visualize your results and attach the heatmap plots of the
learned predictions over the entire 2D domain for $p \in \{2, 4, 6, 8, 10,
12\}$ in your writeup.
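As a reminder of the quantities the skeleton below asks for (this is the standard closed-form solution, not a new method): with featurized data matrix $\Phi$, ridge regression computes
\begin{equation}
\vec w = (\Phi^\top \Phi + \lambda I)^{-1} \Phi^\top \vec y,
\end{equation}
and the reported error is the average squared loss $\frac{1}{n} \lVert \Phi \vec w - \vec y \rVert_2^2$.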
def featurize(X, D):
    """Create polynomial features up to order D from X.

    Your features do not need to include binomial coefficients.
    For instance, you do not need sqrt(2) * x_1 * x_2; x_1 * x_2 is sufficient.

    - X: n x 2 data matrix
    - D: Order of the polynomial features

    Returns:
    - Featurized_X: n x k featurized data matrix (note that k does not equal D!)
    """
    ### YOUR CODE HERE ###
    return Featurized_X
def ridge_regression(X, y, lambda_=0):
    """Compute the weight vector w given by the closed-form ridge regression solution.

    - X: n x d data matrix
    - y: n x 1 vector for labels
    - lambda_: Regularization hyperparameter

    Returns:
    - w: d x 1 weight vector
    """
    ### YOUR CODE HERE ###
    return w


def ridge_error(X, y, w):
    """Compute the average squared loss given X, y, and w.

    - X: n x d data matrix
    - y: n x 1 vector for labels
    - w: d x 1 weight vector

    Returns:
    - error: scalar value
    """
    ### YOUR CODE HERE ###
    return error
for ds in ['circle', 'heart', 'asymmetric']:
    data = np.load(f'{ds}.npz')
    SPLIT = 0.8
    X = data["x"]
    y = data["y"]
    X /= np.max(X)  # normalize the data
    n_train = int(X.shape[0] * SPLIT)
    X_train = X[:n_train, :]
    X_valid = X[n_train:, :]
    y_train = y[:n_train]
    y_valid = y[n_train:]
    LAMBDA = 0.001

    isubplot = 0
    fig = plt.figure(figsize=[12, 10])
    for D in range(1, 17):
        Xd_train = featurize(X_train, D)
        Xd_valid = featurize(X_valid, D)
        w = ridge_regression(Xd_train, y_train, LAMBDA)
        error_train = ridge_error(Xd_train, y_train, w)
        error_valid = ridge_error(Xd_valid, y_valid, w)
        if D in [2, 4, 6, 8, 10, 12]:
            isubplot += 1
            plt.subplot(3, 2, isubplot)
            heatmap(lambda x, y: featurize(np.vstack([x, y]).T, D) @ w)
            plt.title("D = %d" % D)
        print("p = {:2d} train_error = {:10.6f} validation_error = {:10.6f}".format(
            D, error_train, error_valid))
    fig.savefig(f"./result/{ds}_non_kernel.png")
A heatmap may seem difficult to interpret in the context of machine learning. Think of the color at a specific point on the heatmap as a single value that condenses all the information from the newly created features: it is the model's real-valued prediction at that point. The heatmap therefore lets us visualize what was learned, even though the data has been featurized into a higher dimension.
You can interpret points with different colors as having different values along this new "color" dimension.
(c) Polynomial Kernel Ridge Regression¶
Implement kernel ridge regression to fit the datasets
circle.npz, heart.npz, and, optionally (due to the
computational requirements), asymmetric.npz. Use the polynomial
kernel $K(\vec x_i, \vec x_j) = (1 + \vec x_i^\top \vec x_j)^p$. Use the first
80% of the data as the training dataset and the last 20% as the validation
dataset. Report both the average training squared loss and the average
validation squared loss for polynomial order $p \in \{1,\dots, 16\}$. Use the
regularization term $\lambda=0.001$ for all $p$. For
circle.npz, also report the average training squared loss and
average validation squared loss for polynomial order $p \in \{1,\dots, 24\}$ when you
use only the first 15% of the data as the training dataset and the final 85% as
the validation dataset. Based on the errors, comment on when you would want
to use a high-order polynomial in linear/ridge regression.
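For reference, the kernelized form of the same closed-form solution (as reviewed in "Kernel Ridge Regression: Theory" from the homework) expresses the predictor through dual coefficients:
\begin{equation}
\vec\alpha = (K + \lambda I)^{-1} \vec y, \qquad \hat y(\vec x) = \sum_{i=1}^{n} \alpha_i\, K(\vec x_i, \vec x),
\end{equation}
where $K_{ij} = K(\vec x_i, \vec x_j)$ is the $n \times n$ kernel matrix over the training points.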
def poly_kernel(X, XT, D):
    """Create the order-D polynomial kernel matrix from X and X^T.

    - X: n x d data matrix
    - XT: d x m transposed data matrix (does not have to come from the same matrix as X)
    - D: Degree of the polynomial

    Returns:
    - K: n x m kernel matrix
    """
    ### YOUR CODE HERE ###
    return K
def kernel_ridge_regression(X, y, kernel_func, kernel_param, lambda_=0):
    """Perform kernel ridge regression by computing the alpha coefficients associated with the
    kernelized version of the closed-form ridge regression solution.

    You are not required to use this skeleton code if you have an alternative method of computing
    the kernel ridge regression predictions. This skeleton code is only here to help you.
    If you are stuck, review "Kernel Ridge Regression: Theory" from the homework.

    - X: n x d training data matrix
    - y: n x 1 vector for training labels
    - kernel_func: Kernel function to be used in ridge regression
    - kernel_param: Extra parameter needed by the kernel function (i.e., D or sigma)
    - lambda_: Regularization hyperparameter

    Returns:
    - alpha: n x 1 vector of dual coefficients
    """
    ### YOUR CODE HERE ###
    return alpha
def kernel_ridge_error(X, XT, y, alpha, kernel_func, kernel_param):
    """Compute the average squared loss given X, XT, y, and alpha.

    - X: n x d data matrix to evaluate on
    - XT: d x m transposed training data matrix (does not have to come from the same matrix as X)
    - y: n x 1 vector for labels
    - alpha: m x 1 vector of dual coefficients
    - kernel_func: Kernel function to be used in ridge regression
    - kernel_param: Extra parameter needed by the kernel function (i.e., D or sigma)

    Returns:
    - error: scalar value
    """
    ### YOUR CODE HERE ###
    return error
for ds in ['circle', 'heart']:
    data = np.load(f'{ds}.npz')
    SPLIT = 0.8
    LAMBDA = 0.001
    X = data["x"]
    y = data["y"]
    X /= np.max(X)  # normalize the data
    n_train = int(X.shape[0] * SPLIT)
    X_train = X[:n_train, :]
    X_valid = X[n_train:, :]
    y_train = y[:n_train]
    y_valid = y[n_train:]

    isubplot = 0
    fig = plt.figure(figsize=[12, 10])
    for D in range(1, 17):
        alpha = kernel_ridge_regression(X_train, y_train, poly_kernel, D, LAMBDA)
        error_train = kernel_ridge_error(X_train, X_train.T, y_train, alpha, poly_kernel, D)
        error_valid = kernel_ridge_error(X_valid, X_train.T, y_valid, alpha, poly_kernel, D)
        print("p = {:2d} train_error = {:7.6f} validation_error = {:7.6f}".format(
            D, error_train, error_valid))
        if D in [2, 4, 6, 8, 10, 12]:
            isubplot += 1
            plt.subplot(3, 2, isubplot)
            heatmap(lambda x, y: poly_kernel(np.column_stack([x, y]), X_train.T, D) @ alpha)
            plt.title("D = %d" % D)
    fig.savefig(f"./result/{ds}_kernel.png")
Are the heatmaps from the kernelized implementation of polynomial ridge regression the same as or different from those of the naive implementation? Why might we be observing this?
Your comments here...
(d) RBF Kernel Ridge Regression¶
A popular kernel function that is widely used in various kernelized
learning algorithms is called the radial basis function kernel (RBF kernel).
It is defined as
\begin{equation}
K(\mathbf{x}, \mathbf{x}') = \exp\left(-\frac{\lVert \mathbf{x}-\mathbf{x}'\rVert_2^2}{2\sigma^2}\right).
\end{equation}
Implement the RBF kernel function for kernel ridge regression to fit the dataset
heart.npz. Use the regularization term $\lambda=0.001$.
Report the average squared loss, visualize your results, and attach the
heatmap plots of the fitted functions over the 2D domain for $\sigma \in \{10,
3, 1, 0.3, 0.1, 0.03\}$ in your writeup.
You may want to vectorize your kernel
functions to speed up your implementation, although doing so is not necessary to receive full points on this part. Comment on the effect of $\sigma$.
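One common vectorization trick, shown here only as an optional sketch (pairwise_sq_dists is a hypothetical helper, not part of the required skeleton), uses the identity $\lVert \vec a - \vec b \rVert_2^2 = \lVert \vec a \rVert_2^2 + \lVert \vec b \rVert_2^2 - 2\,\vec a^\top \vec b$ to build the full matrix of pairwise squared distances with NumPy broadcasting:

def pairwise_sq_dists(A, B):
    # Hypothetical helper: A is n x d, B is m x d (B corresponds to XT.T in the skeleton's convention).
    # Returns the n x m matrix of squared Euclidean distances ||a_i - b_j||^2,
    # computed as ||a||^2 + ||b||^2 - 2 a.b via broadcasting.
    return (np.sum(A ** 2, axis=1)[:, None]
            + np.sum(B ** 2, axis=1)[None, :]
            - 2 * A @ B.T)

The RBF kernel matrix is then an elementwise exponential of this distance matrix scaled by $-1/(2\sigma^2)$, following the definition above.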
def rbf_kernel(X, XT, sigma):
    """Create the RBF kernel matrix from X and X^T.

    - X: n x d data matrix
    - XT: d x m transposed data matrix (does not have to come from the same matrix as X)
    - sigma: RBF kernel bandwidth parameter

    Returns:
    - K: n x m kernel matrix
    """
    ### YOUR CODE HERE ###
    return K
# data = np.load('circle.npz')
data = np.load('heart.npz')
# data = np.load('asymmetric.npz')

SPLIT = 0.8
LAMBDA = 0.001
X = data["x"]
y = data["y"]
X /= np.max(X)  # normalize the data
n_train = int(X.shape[0] * SPLIT)
X_train = X[:n_train, :]
X_valid = X[n_train:, :]
y_train = y[:n_train]
y_valid = y[n_train:]

fig = plt.figure(figsize=[12, 10])
isubplot = 0
for sigma in [10, 3, 1, 0.3, 0.1, 0.03]:
    alpha = kernel_ridge_regression(X_train, y_train, rbf_kernel, sigma, LAMBDA)
    error_train = kernel_ridge_error(X_train, X_train.T, y_train, alpha, rbf_kernel, sigma)
    error_valid = kernel_ridge_error(X_valid, X_train.T, y_valid, alpha, rbf_kernel, sigma)
    print("sigma = {:6.3f} train_error = {:7.6f} validation_error = {:7.6f}".format(
        sigma, error_train, error_valid))
    isubplot += 1
    plt.subplot(3, 2, isubplot)
    heatmap(lambda x, y: rbf_kernel(np.column_stack([x, y]), X_train.T, sigma) @ alpha)
    plt.title("sigma = %.2f" % sigma)
fig.savefig("./result/heart_rbf.png")
plt.show()
Your comments here...