[HW 4] Kernel Ridge Regression Practice¶
In this homework, you will practice implementing ridge regression with polynomial featurization applied to the data matrix X: first the naive (explicitly featurized) way, and then using kernels.


Imports and Helper Functions¶

import os

import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cm
import seaborn as sns

sns.set_style("white")

# Make a result directory to store plots
os.makedirs(“./result”, exist_ok=True)

def heatmap(f, clip=True):
    """Generate a heatmap of the prediction function f over the data domain.

    Example: heatmap(lambda x, y: x * x + y * y)
    Note: this helper reads the global variables X and y to overlay the data points.
    """
    xx = yy = np.linspace(np.min(X), np.max(X), 72)
    x0, y0 = np.meshgrid(xx, yy)
    x0, y0 = x0.ravel(), y0.ravel()
    z0 = f(x0, y0)

    if clip:
        z0[z0 > 5] = 5
        z0[z0 < -5] = -5

    plt.hexbin(x0, y0, C=z0, gridsize=50, cmap=cm.jet, bins=None)
    plt.colorbar()
    cs = plt.contour(
        xx, yy, z0.reshape(xx.size, yy.size),
        [-2, -1, -0.5, 0, 0.5, 1, 2], cmap=cm.jet)
    plt.clabel(cs, inline=1, fontsize=10)

    pos = y[:] == +1.0
    neg = y[:] == -1.0
    plt.scatter(X[pos, 0], X[pos, 1], c='red', marker='+')
    plt.scatter(X[neg, 0], X[neg, 1], c='blue', marker='v')

data_names = ['circle', 'heart', 'asymmetric']

(a) Visualize the Datasets¶
The datasets we are given are shapes, such as circles and hearts. Each data point in X is a pair of coordinates. For instance, X[0] may be (0, 5), which means there is a point at (0, 5). The corresponding label y[0] is either the +1 class or the -1 class.

Task: Visualize all the datasets. Label the points with different $y$ values with different colors and/or shapes.

def viz_data(X, y):
    """Visualize the dataset. Label the points with different y values with different colors.

    Inputs:
    - X: n x 2 data matrix that represents the coordinates of our data points
    - y: n x 1 vector that represents the class labels for our data points

    Returns:
    - None: Do not return anything. Just plot the data using a scatter plot.
    """
    ### YOUR CODE HERE ###

plt.figure(figsize=[6, 12])
for i, dataset in enumerate(data_names):
    data = np.load(dataset + '.npz')
    X = data["x"]
    y = data["y"]
    plt.subplot(3, 1, i + 1)
    viz_data(X, y)
    plt.legend()
    plt.title(dataset)
plt.savefig("./result/vis_data.png")
plt.show()

You should notice that all of the points labeled +1 sit at the center of the points labeled -1. The data is therefore not linearly separable, and none of our linear classifiers will work! To solve this problem, we will featurize the data matrix to "lift" the data into a higher dimension where it is linearly separable.

(b) Polynomial Regression (Non-kernel)¶
We will first featurize the data matrix using polynomial features. Implement polynomial ridge regression to fit the datasets circle.npz, asymmetric.npz, and heart.npz. Use the first 80% of the data as the training dataset and the last 20% of the data as the validation dataset. Report both the average training squared loss and the average validation squared loss for polynomial order $p \in \{1, \dots, 16\}$. Use the regularization term $\lambda=0.001$ for all $p$. Visualize your result and attach the heatmap plots of the learned predictions over the entire 2D domain for $p \in \{2, 4, 6, 8, 10, 12\}$ in your writeup.
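Before the skeleton below, here is a minimal, non-authoritative sketch of what the featurization and the closed-form ridge solution could look like. It assumes plain monomial features $x_1^i x_2^j$ for all $0 \le i + j \le D$ (no binomial-coefficient weighting, matching the note in the skeleton) and the standard estimate $w = (\Phi^\top \Phi + \lambda I)^{-1} \Phi^\top y$; the names featurize_sketch and ridge_regression_sketch are placeholders, not the required interface.

# Hedged sketch (not the official solution): monomial features up to total degree D,
# followed by the closed-form ridge estimate.
def featurize_sketch(X, D):
    # X is n x 2; build one column x1^i * x2^j for every pair with 0 <= i + j <= D,
    # including the constant feature (i = j = 0).
    cols = []
    for i in range(D + 1):
        for j in range(D + 1 - i):
            cols.append((X[:, 0] ** i) * (X[:, 1] ** j))
    return np.stack(cols, axis=1)

def ridge_regression_sketch(Phi, y, lambda_=0.0):
    # Solve (Phi^T Phi + lambda * I) w = Phi^T y instead of forming an explicit inverse.
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lambda_ * np.eye(d), Phi.T @ y)

Under these assumptions, the average squared loss would simply be np.mean((featurize_sketch(X_valid, D) @ w - y_valid) ** 2), and the number of features is (D + 1)(D + 2) / 2, which is why k does not equal D in the skeleton below.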
def featurize(X, D):
    """Create a vector of polynomial features up to order D from X.

    Your features do not need to include binomial coefficients. For instance, you do
    not need to have (sqrt(2) * x_1 * x_2); (x_1 * x_2) is sufficient.

    Inputs:
    - X: n x 2 data matrix
    - D: Order of the polynomial features

    Returns:
    - Featurized_X: n x k featurized data matrix (note that k does not equal D!)
    """
    ### YOUR CODE HERE ###

def ridge_regression(X, y, lambda_=0):
    """Compute the weight vector w given by the closed-form ridge regression solution.

    Inputs:
    - X: n x d data matrix
    - y: n x 1 vector for labels
    - lambda_: Regularization hyperparameter

    Returns:
    - w: d x 1 weight vector
    """
    ### YOUR CODE HERE ###

def ridge_error(X, y, w):
    """Compute the average squared loss given X, y, and w.

    Inputs:
    - X: n x d data matrix
    - y: n x 1 vector for labels
    - w: d x 1 weight vector

    Returns:
    - error: scalar value
    """
    ### YOUR CODE HERE ###
    return error

for ds in ['circle', 'heart', 'asymmetric']:
    data = np.load(f'{ds}.npz')
    SPLIT = 0.8
    X = data["x"]
    y = data["y"]
    X /= np.max(X)  # normalize the data
    n_train = int(X.shape[0] * SPLIT)
    X_train = X[:n_train, :]
    X_valid = X[n_train:, :]
    y_train = y[:n_train]
    y_valid = y[n_train:]

    LAMBDA = 0.001
    isubplot = 0
    fig = plt.figure(figsize=[12, 10])
    for D in range(1, 17):
        Xd_train = featurize(X_train, D)
        Xd_valid = featurize(X_valid, D)
        w = ridge_regression(Xd_train, y_train, LAMBDA)
        error_train = ridge_error(Xd_train, y_train, w)
        error_valid = ridge_error(Xd_valid, y_valid, w)

        if D in [2, 4, 6, 8, 10, 12]:
            isubplot += 1
            plt.subplot(3, 2, isubplot)
            heatmap(lambda x, y: featurize(np.vstack([x, y]).T, D) @ w)
            plt.title("D = %d" % D)

        print("p = {:2d} train_error = {:10.6f} validation_error = {:10.6f}".format(
            D, error_train, error_valid))
    fig.savefig(f"./result/{ds}_non_kernel.png")

A heatmap may seem difficult to interpret in the context of machine learning. Think of the color at a specific data point on the heatmap as a "feature" that condenses all the information from the newly created features. The heatmap lets us visualize the data even though it has been featurized into a higher dimension; data points with different colors have different values along this new color dimension.

(c) Polynomial Kernel Ridge Regression¶
Implement kernel ridge regression to fit the datasets circle.npz, heart.npz, and, optionally (due to the computational requirements), asymmetric.npz. Use the polynomial kernel $K(\vec x_i, \vec x_j) = (1 + \vec x_i^\top \vec x_j)^p$. Use the first 80% of the data as the training dataset and the last 20% of the data as the validation dataset. Report both the average training squared loss and the average validation squared loss for polynomial order $p \in \{1,\dots, 16\}$. Use the regularization term $\lambda=0.001$ for all $p$.

For circle.npz, also report the average training squared loss and validation squared loss for polynomial order $p \in \{1,\dots, 24\}$ when you use only the first 15% of the data as the training dataset and the final 85% of the data as the validation dataset. Based on the errors, comment on when you would want to use a high-order polynomial in linear/ridge regression.
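For orientation, here is a minimal sketch of the dual (kernelized) form, assuming the standard kernel ridge regression solution $\alpha = (K + \lambda I)^{-1} y$ with $K_{ij} = (1 + \vec x_i^\top \vec x_j)^p$ and predictions $\hat{y}_* = K(X_*, X_{\text{train}})\,\alpha$. The names are placeholders, and unlike the skeleton below (which is passed X_train.T as its second argument), this sketch takes both inputs as row-major n x d / m x d matrices.

# Hedged sketch of kernel ridge regression with the polynomial kernel: alpha has one
# entry per training point, and predictions are weighted sums of kernel evaluations.
def poly_kernel_sketch(A, B, D):
    # A is n x d, B is m x d; returns the n x m matrix with entries (1 + a_i . b_j)^D.
    return (1.0 + A @ B.T) ** D

def kernel_ridge_sketch(X_train, y_train, D, lambda_):
    K = poly_kernel_sketch(X_train, X_train, D)
    return np.linalg.solve(K + lambda_ * np.eye(K.shape[0]), y_train)

def kernel_predict_sketch(X_new, X_train, alpha, D):
    return poly_kernel_sketch(X_new, X_train, D) @ alpha

Under these assumptions, the validation loss would be np.mean((kernel_predict_sketch(X_valid, X_train, alpha, D) - y_valid) ** 2). Note that the n x n linear solve makes the cost scale with the number of training points rather than the number of polynomial features, which is the practical appeal of the kernel trick here.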
def poly_kernel(X, XT, D):
    """Create the polynomial order D kernel matrix from X and X^T.

    Inputs:
    - X: n x d data matrix
    - XT: d x m data matrix (for example the transpose of the training data; it does
      not have to come from the same matrix as X)
    - D: Degree of the polynomial

    Returns:
    - K: n x m kernel matrix
    """
    ### YOUR CODE HERE ###

def kernel_ridge_regression(X, y, kernel_func, kernel_param, lambda_=0):
    """Perform kernel ridge regression by computing the alpha coefficients associated
    with the kernelized version of the closed-form ridge regression solution.

    You are not required to use this skeleton code if you have an alternative method of
    computing the kernel ridge regression predictions; it is only here to help you. If
    you are stuck, review "Kernel Ridge Regression: Theory" from the homework.

    Inputs:
    - X: n x d training data matrix
    - y: n x 1 vector for training labels
    - kernel_func: Kernel function to be used in ridge regression
    - kernel_param: Extra parameter needed by the kernel function (i.e. D or sigma)
    - lambda_: Regularization hyperparameter

    Returns:
    - alpha: n x 1 vector
    """
    ### YOUR CODE HERE ###
    return alpha

def kernel_ridge_error(X, XT, y, alpha, kernel_func, kernel_param):
    """Compute the average squared loss given X, XT, y, and alpha.

    Inputs:
    - X: n x d data matrix
    - XT: d x m data matrix (for example the transpose of the training data)
    - y: n x 1 vector for labels
    - alpha: m x 1 vector of training coefficients
    - kernel_func: Kernel function to be used in ridge regression
    - kernel_param: Extra parameter needed by the kernel function (i.e. D or sigma)

    Returns:
    - error: scalar value
    """
    ### YOUR CODE HERE ###
    return error

for ds in ['circle', 'heart']:
    data = np.load(f'{ds}.npz')
    SPLIT = 0.8
    X = data["x"]
    y = data["y"]
    X /= np.max(X)  # normalize the data
    n_train = int(X.shape[0] * SPLIT)
    X_train = X[:n_train, :]
    X_valid = X[n_train:, :]
    y_train = y[:n_train]
    y_valid = y[n_train:]

    LAMBDA = 0.001
    isubplot = 0
    fig = plt.figure(figsize=[12, 10])
    for D in range(1, 17):
        alpha = kernel_ridge_regression(X_train, y_train, poly_kernel, D, LAMBDA)
        error_train = kernel_ridge_error(X_train, X_train.T, y_train, alpha, poly_kernel, D)
        error_valid = kernel_ridge_error(X_valid, X_train.T, y_valid, alpha, poly_kernel, D)
        print("p = {:2d} train_error = {:7.6f} validation_error = {:7.6f}".format(
            D, error_train, error_valid))

        if D in [2, 4, 6, 8, 10, 12]:
            isubplot += 1
            plt.subplot(3, 2, isubplot)
            heatmap(lambda x, y: poly_kernel(np.column_stack([x, y]), X_train.T, D) @ alpha)
            plt.title("D = %d" % D)
    fig.savefig(f"./result/{ds}_kernel.png")

Are the heatmaps from the kernelized implementation of polynomial ridge regression the same as or different from those of the naive implementation? Why might we be observing this?

Your comments here...

(d) RBF Kernel Ridge Regression¶
A popular kernel function that is widely used in various kernelized learning algorithms is the radial basis function (RBF) kernel. It is defined as
\begin{equation}
K(\mathbf{x}, \mathbf{x}') = \exp \left(-\frac{\lVert \mathbf{x}-\mathbf{x}'\rVert_2^2}{2\sigma^2}\right).
\end{equation}
Implement the RBF kernel function for kernel ridge regression to fit the dataset heart.npz. Use the regularization term $\lambda=0.001$. Report the average squared loss, visualize your results, and attach the heatmap plots of the fitted functions over the 2D domain for $\sigma \in \{10, 3, 1, 0.3, 0.1, 0.03\}$ in your writeup. You may want to vectorize your kernel function to speed up your implementation, although it is not necessary for full points on this part. Comment on the effect of $\sigma$.
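If you do vectorize, one common approach (shown here as a sketch with the hypothetical name rbf_kernel_sketch, taking both inputs as row-major matrices rather than the X / X^T convention of the skeleton below) is to expand the squared distance as $\lVert a - b \rVert^2 = \lVert a \rVert^2 - 2\,a^\top b + \lVert b \rVert^2$, so the entire kernel matrix comes from a single matrix product.

# Hedged sketch of a vectorized RBF kernel: A is n x d, B is m x d, and the result is
# the n x m matrix exp(-||a_i - b_j||^2 / (2 sigma^2)).
def rbf_kernel_sketch(A, B, sigma):
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]      # ||a_i||^2 as a column vector
        - 2.0 * (A @ B.T)                    # cross terms a_i . b_j
        + np.sum(B ** 2, axis=1)[None, :]    # ||b_j||^2 as a row vector
    )
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

The same $(K + \lambda I)^{-1} y$ solve from part (c) applies unchanged; only the kernel matrix changes.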
def rbf_kernel(X, XT, sigma):
    """Create the RBF kernel matrix from X and X^T.

    Inputs:
    - X: n x d data matrix
    - XT: d x m data matrix (for example the transpose of the training data)
    - sigma: RBF kernel parameter

    Returns:
    - K: n x m kernel matrix
    """
    ### YOUR CODE HERE ###

# data = np.load('circle.npz')
data = np.load('heart.npz')
# data = np.load('asymmetric.npz')

SPLIT = 0.8
X = data["x"]
y = data["y"]
X /= np.max(X)  # normalize the data
n_train = int(X.shape[0] * SPLIT)
X_train = X[:n_train, :]
X_valid = X[n_train:, :]
y_train = y[:n_train]
y_valid = y[n_train:]

LAMBDA = 0.001
fig = plt.figure(figsize=[12, 10])
isubplot = 0
for sigma in [10, 3, 1, 0.3, 0.1, 0.03]:
    alpha = kernel_ridge_regression(X_train, y_train, rbf_kernel, sigma, LAMBDA)
    error_train = kernel_ridge_error(X_train, X_train.T, y_train, alpha, rbf_kernel, sigma)
    error_valid = kernel_ridge_error(X_valid, X_train.T, y_valid, alpha, rbf_kernel, sigma)
    print("sigma = {:6.3f} train_error = {:7.6f} validation_error = {:7.6f}".format(
        sigma, error_train, error_valid))

    isubplot += 1
    plt.subplot(3, 2, isubplot)
    heatmap(lambda x, y: rbf_kernel(np.column_stack([x, y]), X_train.T, sigma) @ alpha)
    plt.title("sigma = %.2f" % sigma)
fig.savefig("./result/heart_rbf.png")

Your comments here...