
COMP3670/6670 Programming Assignment 2 – Clustering, Linear Regression and Gradient Descent

Enter Your Student ID:
Your Name:
Deadline: 23:59pm, 19 September, 2021
Submit: Write your answers in this file, and submit a single Jupyter Notebook file (.ipynb) on Wattle. Rename this file with your student number as ‘uXXXXXXX.ipynb’. Note: you don’t need to submit the .png or .npy files.
Enter Discussion Partner IDs Below: You could add more IDs with the same markdown format above.
Programming Section:
• 1: 30%
• 2: 40%
• 3: 30%
In [1]:
import numpy as np
import matplotlib.pyplot as plt
import math

np.random.seed(1)

Task1: Clustering

These programming exercises will focus on K-means clustering.
If you're unsure of how k-means works, read this very helpful and freely available online breakdown from Stanford's CS221 course: https://stanford.edu/~cpiech/cs221/handouts/kmeans.html
This assignment requires a loose understanding of how k-means is a specific case of a more general algorithm named Expectation Maximisation, which is explained toward the end of the above article.
First, let's load the dataset.
In [2]:
X = np.load("./data_clustering.npy")
plt.scatter(X[:,0], X[:,1])
plt.show()

K-means is a special, simple case of the Expectation Maximisation (EM) algorithm.
This simplified EM (k-means) is divided into two steps:
The E-Step, where for every sample in your dataset you find which "centroid" that sample is closest to, and record that information.
The M-Step, where you move each "centroid" to the center of the samples that were found to be closest to it in the E-Step.
Each centroid is simply an estimated mean of a cluster. If you have $1$ centroid, then this centroid will become the mean of all your data.
Centroids are initially random values, and the k-means algorithm attempts to modify them so that each one represents the center of a cluster.
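To make these two steps concrete, here is a rough illustration on invented toy data (this is only an illustration, not the required implementation; it records assignments as centroid indices rather than in the format the tasks below ask for):

# toy data: six 2-D points forming two obvious clusters, and two random centroids
toy_X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [2.0, 2.0], [2.1, 1.9], [1.9, 2.2]])
toy_C = toy_X[np.random.choice(toy_X.shape[0], 2, replace=False)]

# E-step: distance from every sample to every centroid, then the index of the nearest centroid
dists = np.linalg.norm(toy_X[:, None, :] - toy_C[None, :, :], axis=2)   # shape (6, 2)
nearest = np.argmin(dists, axis=1)                                      # shape (6,)

# M-step: move each centroid to the mean of the samples assigned to it
for k in range(toy_C.shape[0]):
    toy_C[k] = toy_X[nearest == k].mean(axis=0)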
We have implemented a centroid initialisation function for you.
In [3]:
def initialise_parameters(m, X):
    # pick m random samples from X to serve as the initial centroids
    C = X[np.random.choice(X.shape[0], m)]
    return C

C = initialise_parameters(4, X)
print(C)

[[ 0.55638768 1.19083041]
[ 0.99468733 -0.63105385]
[-0.80861347 -0.47487527]
[ 0.83443335 0.7038998 ]]

Now let's implement the k-means algorithm.

TASK 1.1: Create a function $E\_step(C, X) = L$, where $L$ is a matrix with the same dimensions as the dataset $X$.
This function is the E-Step (or "assignment step") mentioned earlier.

HINT:
• https://stanford.edu/~cpiech/cs221/handouts/kmeans.html
• https://en.wikipedia.org/wiki/K-means_clustering#Standard_algorithm
• Each row of $L$ is a centroid taken from $C$.
In [ ]:
def E_step(C, X):
    # YOUR CODE HERE

L = E_step(C, X)
plt.scatter(L[:, 0], L[:, 1])
plt.show()

TASK 1.2: Create a function $M\_step(C, X, L) = C$ which returns $C$ modified so that each centroid in $C$ is placed in the middle of the samples assigned to it. This is the M-Step.
In other words, make each centroid in $C$ the average of all the samples which were found to be closest to it during the E-step. This is also called the “update step” for K-means.

HINT: https://docs.scipy.org/doc/numpy/reference/generated/numpy.array_equal.html
In [ ]:
def M_step(C, X, L):
    # YOUR CODE HERE

print('Before:')
print(C)
print('\nAfter:')
new_C = M_step(C, X, L)
print(new_C)

TASK 1.3: Implement $kmeans(X, m, i) = C, L$, which takes a dataset $X$ (of any dimension) and a scalar value $m$, and uses the three functions above to:
• generate $m$ centroids.
• alternate between the E and M steps $i$ times to fit the $m$ clusters.
…and then returns:
• $C$, the centers of the $m$ clusters after $i$ iterations.
• $L$, the labels (centroid values) assigned to each sample in the dataset after $i$ iterations.

HINT: Use initialise_parameters to initialise the centroids.
In [ ]:
def kmeans(X, m, i):
    # YOUR CODE HERE

#CODE TO DISPLAY YOUR RESULTS. DO NOT MODIFY.
C_final, L_final = kmeans(X, 4, 10)
print('Initial Parameters:')
print(C)
print('\nFinal Parameters:')
print(C_final)

def allocator(X, L, c):
    # collect all samples whose assigned centroid (row of L) equals c
    cluster = []
    for i in range(L.shape[0]):
        if np.array_equal(L[i, :], c):
            cluster.append(X[i, :])
    return np.asarray(cluster)

colours = ['r', 'g', 'b', 'y']
for i in range(4):
    cluster = allocator(X, L_final, C_final[i, :])
    plt.scatter(cluster[:,0], cluster[:,1], c=colours[i])
plt.show()

Your answer should look like this, maybe with different colours:

TASK 1.4: Explain how you would choose the number of centroids.

Answer:

Task 2: Linear Regression and Gradient Descent

For exercise 2, we're going to implement multiple-target batch linear regression with the mean squared loss
$$\mathcal{L} = \frac{1}{2m} \sum_{i = 1}^{m} \| x_i\theta - y_i \|^2.$$
For the following questions:
• $x \in \mathbb{R}^{m}$ is the vector directly representing input features from the provided dataset. Each row of $x$ is a single training example.
• $X \in \mathbb{R}^{m \times n}$ is the constructed feature matrix (e.g. polynomial features) used for learning. Each row of $X$ is a single training example.
• $\theta$ represents our parameters. In the linear regression you've seen thus far, this is a vector. However, as we're doing multiple-target linear regression, $\theta$ will in general be a matrix.
• $y \in \mathbb{R}^{m}$ holds the target values we're trying to estimate for each row of $X$. Row $i$ of $X$ corresponds to row $i$ of $y$.
• $m$ is the number of training examples.
• $n$ is the dimensionality of one training example.
Typically when people think of linear regression, they think of a mapping from $\mathbb{R}^n \rightarrow \mathbb{R}$, where they’re trying to predict a single scalar value.
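To make the loss concrete, here is a tiny worked example for the single-target case (the toy arrays below are purely illustrative, not part of the assignment data):

# hypothetical toy problem: m = 4 examples, n = 3 features, a single target
toy_X = np.array([[1.0, 0.5, 0.25],
                  [1.0, 1.0, 1.00],
                  [1.0, 1.5, 2.25],
                  [1.0, 2.0, 4.00]])
toy_theta = np.array([0.1, 0.2, 0.3])
toy_y = np.array([0.3, 0.7, 1.2, 1.9])

residuals = toy_X @ toy_theta - toy_y                      # x_i theta - y_i for every i
toy_loss = np.sum(residuals ** 2) / (2 * toy_X.shape[0])   # (1 / 2m) * sum of squared errors
print(toy_loss)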

First, we load the data.
In [5]:
x_train, x_val, y_train, y_val = np.load("./data_regression.npy")
plt.plot(x_train,y_train,'o')
plt.xlabel('x')
plt.ylabel('y')
plt.title("Training data")
plt.ylim([-1,3])
plt.show()

Performing linear regression directly on the input feature x will clearly not fit this data well, so we need to add polynomial features. Let's construct an appropriate feature matrix.

Task 2.1: Complete the get_polynomial_features function with the following specifications.
• Input1: an array x of shape $(m,1)$.
• Input2: degree of the polynomial (integer greater than or equal to one).
• Output: a matrix of shape $(m, degree+1)$ consisting of horizontally concatenated polynomial terms.
• Output: the first column of the output matrix should be all ones.

In [ ]:
def get_polynomial_features(x,degree=5):
    # YOUR CODE HERE

# get polynomial features
X_train = get_polynomial_features(x_train,degree=5)
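For reference, one possible vectorised construction is sketched below (the function name is hypothetical and this is only a sketch of the idea: powers $x^0$ through $x^{degree}$ stacked column-wise):

def get_polynomial_features_sketch(x, degree=5):
    # x has shape (m, 1); stacking x**0, x**1, ..., x**degree column-wise
    # gives an (m, degree + 1) matrix whose first column is all ones
    return np.concatenate([x ** d for d in range(degree + 1)], axis=1)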

Let us implement gradient descent to find the optimal $\theta$.

TASK 2.2: Write a function $initialise\_parameters(n) = \theta$, where $\theta$ is the set of parameters we will use for linear regression $X\theta = Y$, with $X \in \mathbb{R}^{m \times n}, Y \in \mathbb{R}^{m}$.
The values of $\theta$ should be randomly generated. You will be judged on whether $\theta$ is correctly constructed for this problem.

HINT: $\theta$ should be an array of length $n$.
In [ ]:
def initialise_parameters(n):
    # YOUR CODE HERE

# initialize theta
theta = initialise_parameters(X_train.shape[1])
print(theta)
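One common choice, sketched here under the assumption that small random normal values are an acceptable initialisation (the helper name is hypothetical):

def initialise_parameters_sketch(n):
    # one parameter per feature column, drawn from a small random normal distribution
    return np.random.randn(n) * 0.1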

TASK 2.3: Implement a function $ms\_error(X, \theta, y) = err$, which gives the mean squared error over all $m$ training examples.

In [ ]:
def ms_error(X, theta, y):
    # YOUR CODE HERE

print(ms_error(X_train, theta, y_train))
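As a sketch, assuming "mean squared error" here means the average squared residual over the $m$ examples (whether to keep the extra factor of $1/2$ from the loss above is up to you; the function name is hypothetical):

def ms_error_sketch(X, theta, y):
    # mean of the squared residuals over the m training examples
    return np.mean((X @ theta - y) ** 2)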

TASK 2.4: Implement $grad(X, \theta, Y) = g$, a function that returns the average gradient ($\partial \mathcal{L}/\partial {\theta}$) across all the training examples $x_i \in \mathbb{R}^{1 \times n}$.

HINT:
• The gradient should be an array with same length as $\theta$.
• https://www.sharpsightlabs.com/blog/numpy-sum/
• https://docs.scipy.org/doc/numpy/reference/generated/numpy.multiply.html
• https://docs.scipy.org/doc/numpy/reference/generated/numpy.tile.html
In [ ]:
def grad(X, theta, Y):
    # YOUR CODE HERE

print(grad(X_train, theta, y_train))
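Under the loss defined above, the average gradient works out to $\frac{1}{m} X^\top (X\theta - Y)$; a minimal sketch of that computation (the function name is hypothetical):

def grad_sketch(X, theta, Y):
    # average gradient of the mean squared loss: (1 / m) * X^T (X theta - Y)
    m = X.shape[0]
    return X.T @ (X @ theta - Y) / m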

TASK 2.5: Implement $batch\_descent(X, Y, iterations, learning\_rate) = \theta, L$, a function which implements batch gradient descent, returning $\theta$ (the parameters which estimate $Y$ from $X$) and $L$.
$iterations$ is the number of gradient descent iterations to be performed.
$learning\_rate$ is, of course, the learning rate.
$L$ is an array of length $iterations$ recording the mean squared error at every iteration of gradient descent.
You should use the functions you completed earlier to complete this.

HINT:
• Remember, the point of gradient descent is to minimise the loss function.
• It does this by taking “steps”. The gradient always points in the steepest direction uphill, so by stepping in the opposite direction of the gradient we move toward the value of $\theta$ that minimises the loss function.
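For orientation, the overall loop might look like the following sketch, which simply reuses the functions above (assuming they are completed); it is only a sketch, not the only acceptable structure:

def batch_descent_sketch(X, Y, iterations, learning_rate):
    # start from random parameters, then repeatedly step against the gradient
    theta = initialise_parameters(X.shape[1])
    L = np.zeros(iterations)
    for it in range(iterations):
        L[it] = ms_error(X, theta, Y)                      # record the error at this iteration
        theta = theta - learning_rate * grad(X, theta, Y)  # step opposite the gradient
    return theta, L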
In [ ]:
def batch_descent(X, Y, iterations, learning_rate):
    # YOUR CODE HERE

#REPORTING CODE. YOU MAY NEED TO MODIFY THE LEARNING RATE OR NUMBER OF ITERATIONS
new_theta, L = batch_descent(X_train, y_train, 5000, 0.5)
plt.plot(L)
plt.title('Mean Squared Error vs Iterations')
plt.show()
print('New Theta: \n', new_theta)
print('\nFinal Mean Squared Error: \n', ms_error(X_train, new_theta, y_train))

Task 3: Regularization and Model Selection

In task 2, we focussed on using gradient descent to do linear regression with a polynomial of degree 5.
Next, we will try to select the model that gives the best performance on the validation set.

Task 3.1: Visualize the prediction curves for different choices of polynomial feature degree, by completing the code below.
• You can use the closed-form solution or gradient descent to compute $\theta$ (a closed-form sketch is given after this list).
• Compute the predictions on the val data using x_val and the computed $\theta$.
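If you opt for the closed-form route, the normal equations $\theta = (X^\top X)^{-1} X^\top y$ give one way to compute $\theta$; a minimal sketch, assuming $X^\top X$ is invertible (the helper name is hypothetical):

def get_theta_closed_form(X, y):
    # normal equations; np.linalg.solve is preferred over computing an explicit inverse
    return np.linalg.solve(X.T @ X, X.T @ y)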

In [ ]:
def get_theta(X,y):
    # YOUR CODE HERE

def get_prediction(X,theta):
    # YOUR CODE HERE

for degree in range(1,10):
    # prepare train/val data
    X_train = get_polynomial_features(x_train,degree=degree)
    x_val = np.linspace(-0.7, 0.8, x_val.shape[0])
    X_val = get_polynomial_features(x_val,degree=degree)

    # get theta
    theta = get_theta(X_train,y_train)

    # compute predictions on train/val set
    pred_y_train = get_prediction(X_train,theta)
    pred_y_val = get_prediction(X_val,theta)

    # plot results
    plt.plot(x_train,y_train,'o',label='train')
    plt.plot(x_val,pred_y_val,label='val')
    plt.legend()
    plt.xlabel('x')
    plt.ylabel('y')
    plt.title("polynomial degree: {}".format(degree))
    plt.ylim([-1,3])
    plt.show()

Task 3.2: Draw the train and val loss curves for polynomials of different degree by completing the following code.

In [ ]:
x_train, x_val, y_train, y_val = np.load("./data_regression.npy")
# store train/val loss values
train_loss,val_loss = [],[]

for degree in range(1,10):
    # prepare train/val data
    X_train = # YOUR CODE HERE
    X_val = # YOUR CODE HERE

    # get theta
    theta = get_theta(# YOUR CODE HERE)

    # compute train/val losses
    train_loss.append(# YOUR CODE HERE)
    val_loss.append(# YOUR CODE HERE)

plt.plot(range(1,10),train_loss,'-o',label='train loss')
plt.plot(range(1,10),val_loss,'-o',label='val loss')
plt.xticks(range(1,10))
plt.legend()
plt.title('Train/Val Loss vs polynomial order')
plt.show()

Task 3.3: What is the best choice of polynomial feature degree for this problem?

Answer: