Q2
You might remember the linearly inseparable dataset from the clustering assignment. Below, we’re re-using it (see D2_temp, and X, which is the final dataset), only now the datapoints all come labelled in the matrix $Z$.
You’re going to try to predict $Z$ given $X$.
Look at how the data below is constructed. It is not only linearly inseparable, but $X$ is also composed of thousands of irrelevant variables that have no impact on $Z$.
In [10]:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D #This is for 3d scatter plots.
import math
import random
from scipy.stats import multivariate_normal
import os
from matplotlib.pyplot import imread
np.random.seed(13579201)
def gen_D2Y2(m):
    D2 = np.random.randn(m, 2)
    Y2 = np.zeros((m, 2))
    for r in range(D2.shape[0]):
        s = np.linalg.norm(D2[r, :])
        if s > 1.85:
            Y2[r, 0] = 1
        elif s < 0.3:
            Y2[r, 1] = 1
    return D2, Y2
m2 = 5000
D2_temp, Z = gen_D2Y2(m2)
colours = ['g', 'b']
def allocator(D2, Z, j):
    cluster = []
    for i in range(D2.shape[0]):
        if Z[i, j]:
            cluster.append(D2[i, :])
    return np.asarray(cluster)

for i in range(2):
    cluster = allocator(D2_temp, Z, i)
    plt.scatter(cluster[:, 0], cluster[:, 1], c=colours[i])
plt.show()
left = np.random.randint(0, 10000)
right = np.random.randint(0, 10000)
X = np.hstack([np.random.randn(m2, left), D2_temp, np.random.randn(m2, right)])
print('The shape of X is', X.shape)

The shape of X is (5000, 13505)
So, as stated above, you must predict $Z$ using only the matrix $X$.
You can re-use any of your code from previous assignments.
You will need to:
1. Understand how $X$ and $Z$ are constructed.
2. Train at least 1 model to perform prediction.
3. Complete the predict function below such that it can predict $Z$.
You will likely need to use all of your knowledge from this course.
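As a starting point for step 1, here is a minimal sketch of one way to locate the two embedded columns (the helper name find_relevant_columns is hypothetical, not part of the assignment scaffold): since the outer-ring class ($Z[:, 0] = 1$) contains only points of large radius, the two genuine coordinates have an inflated second moment within that class, while every independent noise column stays near 1.

```python
import numpy as np

def find_relevant_columns(X, Z):
    """Guess which two columns of X hold the original 2-D data.

    Hypothetical helper: restricted to the outer-ring class
    (Z[:, 0] == 1, i.e. norm > 1.85), the two genuine coordinates
    have mean squared value well above 1, while the independent
    standard-normal noise columns stay near 1.
    """
    outer = Z[:, 0] == 1
    # Per-column second moment, restricted to outer-ring points.
    col_score = np.mean(X[outer] ** 2, axis=0)
    # The two largest scores should be the embedded coordinates.
    return np.sort(np.argsort(col_score)[-2:])
```

This is only one heuristic; anything that measures dependence between each column and the labels would work equally well.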
TASK 2.1: Complete the functions below such that $predict\_Z(X) = Z\_prediction$.
If you succeed in training a model such that $predict\_Z(X) = Z\_prediction$, and the distance between $Z$ and $Z\_prediction$ is minimised, you will get the marks.
You cannot use the variables D2_temp, left, or right. You must predict $Z$ using $X$ by training a model on $X$ and $Z$.
You can train any models you wish, or create any additional functions you wish.
Unlike every other question in this course, you are now also allowed to use any of the library functions in numpy and, if you wish, scikit-learn.
HINT:
• You can add the parameters your model learns to $predict\_Z(X)$. No, you cannot add $Z$ as a parameter to $predict\_Z(X)$.
• You can combine models, add dimensions or split the dataset, using your knowledge of how the dataset is constructed.
• You can and should re-use any code you've written in previous assignments.
• Remember, there are only 2 relevant dimensions of $X$, and these 2 dimensions are hidden at a randomly determined position among the noise columns.
• Depending on which models you use, you might need to add an additional dimension to make the data linearly separable (assuming you use something like logistic regression, which requires linearly separable data).
• I encourage you to discuss this problem on the forums and with your friends. You must implement your own solution (do not share code), but feel free to share ideas.
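To illustrate the hint about adding a dimension: once the two relevant columns are found, the labels depend only on the radius $r = \lVert x \rVert$, so the single feature $r$ (or $r^2$) makes both classes linearly separable. A minimal sketch that learns one radius threshold per class from the labelled data (the names fit_radius_thresholds and predict_from_radius are my own, not part of the scaffold):

```python
import numpy as np

def fit_radius_thresholds(X2, Z):
    """X2: the two relevant columns of X only.

    Labels were assigned from the radius, so the midpoint between the
    closest positive and farthest negative radius of each class
    separates the training data perfectly.
    """
    r = np.linalg.norm(X2, axis=1)
    t_outer = (r[Z[:, 0] == 1].min() + r[Z[:, 0] == 0].max()) / 2
    t_inner = (r[Z[:, 1] == 1].max() + r[Z[:, 1] == 0].min()) / 2
    return t_outer, t_inner

def predict_from_radius(X2, t_outer, t_inner):
    r = np.linalg.norm(X2, axis=1)
    Z_prediction = np.zeros((X2.shape[0], 2))
    Z_prediction[r > t_outer, 0] = 1   # outer ring
    Z_prediction[r < t_inner, 1] = 1   # inner disc
    return Z_prediction
```

Equivalently, you could append $r^2$ as an extra column and hand the augmented data to logistic regression, as the hint suggests.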
In [11]:
#CREATE ANY ADDITIONAL FUNCTIONS YOU WISH HERE
def predict_Z(X):
    Z_prediction = np.zeros((X.shape[0], 2))
    #YOUR CODE HERE
    return Z_prediction
#Display Code. Leave it alooooooooooone.
Z_prediction = predict_Z(X)
summed_error = np.sum((Z_prediction - Z).T@(Z_prediction - Z))
print(summed_error)
1111.0
In [ ]: