代写代考

Before you start working on the exercise¶
Use Python version 3.7 up to 3.9. Make sure not to use Python 3.10
It is highly recommended to create a virtual environment for this course. You can find resources on how to create a virtual environment on the ISIS page of the course.

Copyright By PowCoder代写 加微信 powcoder

Make sure that no assertions fail or exceptions occur, otherwise points will be subtracted.
Use all the variables given to a function unless explicitly stated otherwise. If you are not using a variable you are doing something wrong.
Read the whole task description before starting with your solution.
After you submit the notebook more tests will be run on your code. The fact that no assertions fail on your computer locally does not guarantee that you completed the exercise correctly.
Please submit only the notebook file with its original name. If you do not submit an ipynb file you will fail the exercise.
Edit only between YOUR CODE HERE and END YOUR CODE.
Verify that no syntax errors are present in the file.
Before uploading your submission, make sure everything runs as expected. First, restart the kernel (in the menubar, select Kernel\Restart) and then run all cells (in the menubar, select Cell\Run All).

import sys

if (3,7) <= sys.version_info[:2] <= (3, 9): print("Correct Python version") print(f"You are using a wrong version of Python: {'.'.join(map(str,sys.version_info[:3]))}") # This cell is for grading. DO NOT remove it # Use unittest asserts import unittest; t = unittest.TestCase() from pprint import pprint Exercise Sheet 2: Timing, Numpy, Plotting¶ The previous exercise sheet introduced several methods for classification: decision trees, nearest neighbors, and nearest means. Of those, the one that could learn from the data, and that also offered enough complexity to produce an accurate decision function was k-nearest neighbors. However, nearest neighbors can be slow when implemented in pure Python (i.e. with loops). This is especially the case when the number of data points or input dimensions is large. In this exercise sheet, we will speed up nearest neighbors by utilizing numpy and scipy packages. Your task will be to replace list-based operations by vector-based operations between numpy arrays. The speed and correctness of the implementations will then be tested. In particular, performance graphs will be drawn using the library matplotlib. Make sure to have installed all the required packages (numpy, scipy). For this you can use conda install or pip install . Make sure that you restart the Jupyter Server after you have installed any packages.

import numpy
import scipy
except ImportError:
print(“Please install NumPy and SciPy using the instructions above.”)
numpy_version = tuple(map(int, numpy.__version__.split(“.”)))
scipy_version = tuple(map(int, scipy.__version__.split(“.”)))
if numpy_version >= (1, 18, 0):
print(“NumPy version ok!”)
print(“Your NumPy version is too old!!!”)

if scipy_version >= (1, 6, 0):
print(“SciPy version ok!”)
print(“Your SciPy version is too old!!!”)

Before starting the homework sheet we recommend you finish these warm-up tasks. They won’t get you any points but should help you get familiar with Numpy.

import numpy as np
np.random.seed(0) # seed for reproducibility

x1 = np.random.randint(10, size=6) # random one-dimensional integer array
x2 = np.random.randint(10, size=(5, 4)) # random two-dimensional integer array

Shape of an Array (0 P)¶
Write a function that returns the number of rows and the number of columns of an array.

Use the attribute .shape that every numpy array has.

def array_shape(array):
# YOUR CODE HERE
raise NotImplementedError(“Relplace this line with your code”)
# YOUR CODE HERE

return number_of_rows, number_of_columns

# Test array_shape function
x1_number_of_rows, x1_number_of_columns = array_shape(x1)
x2_number_of_rows, x2_number_of_columns = array_shape(x2)

t.assertEqual(x1_number_of_rows, 6)
t.assertEqual(x1_number_of_columns, 0)
t.assertEqual(x2_number_of_rows, 5)
t.assertEqual(x2_number_of_columns, 4)

Indexing (0 P)¶
Return subarrays of the given arrays according to the conditions. Use array indexing e.g. x1[1:5:-2] instead of loops or hardcoding the solutions.

Save the second to last element of x1 in the variable x1_second_to_last.
Save a subarray that has every other element of x1 in the variable x1_every_other_element.
Save a reversed x1 in the variable x1_reversed.
Save the element in row 3 and column 2 of x2 in the variable x2_element_in_row_3_and_column_2. Please note that since indexing starts at zero so row 3 is actually the forth row.
Save a subarray/matrix that contains rows 2 to 4 and columns 0 to 3 of x2 in the variable x2_rows_2_to_4_columns_0_to_3. In this case row 4 and column 3 should be INCLUDED.

Try not to use the shape or length of an array for this exercise

# YOUR CODE HERE
raise NotImplementedError(“Relplace this line with your code”)
# YOUR CODE HERE

# Test indexing solutions
t.assertEqual(x1_second_to_last, 7)
np.testing.assert_allclose(x1_every_other_element, np.array((5, 3, 7)))
np.testing.assert_allclose(x1_reversed, np.array((9,7,3,3,0,5)))
t.assertEqual(x2_element_in_row_3_and_column_2, 5)
np.testing.assert_allclose(x2_rows_2_to_4_columns_0_to_3, np.array(((1,6,7,7),(8,1,5,9),(8,9,4,3))))

Broadcasting (0 P)¶
Understanding broadcasting is an important part of understanding numpy.

Using np.newaxis, turn array_a into a column-vector and save the result in the variable array_a_to_column_vector.
Add the one-dimensional array_a and the two dimensional array_b together. Do not use any function and only the + operator.
Add the one-dimensional array_a and the two dimensional array_c together. Now it is important to use broadcasting since the dimensions of the two arrays do not match: array_a.shape = (3,) and array_c.shape = (3,2). Addition would work if the shape of array_a would be (3,1).

array_a = np.ones(3)
array_b = np.arange(6).reshape((2,3))
array_c = np.arange(6).reshape((3,2))

# YOUR CODE HERE
raise NotImplementedError(“Relplace this line with your code”)
# YOUR CODE HERE

# Test broadcasting solutions
np.testing.assert_allclose(array_a_to_column_vector, np.ones(3).reshape(3,1))
np.testing.assert_allclose(array_a_plus_array_b, np.array(((1,2,3),(4,5,6))))
np.testing.assert_allclose(array_a_plus_array_c, np.array(((1,2),(3,4),(5,6))))

# do not use numpy from now on

Python Nearest Neighbor¶
The most basic element of computation of nearest neighbors is its distance function relating two arbitrary data points x1 and x2. We assume that these points are iterable (i.e. we can use a loop over their dimensions). One way among others to compute the square Euclidean distance between two points is by computing the sum of the component-wise distances.

def pydistance(x1: ‘Vector’, x2: ‘Vector’) -> float:
Calculates the square eucledian distance between two data points x1, x2

x1, x2 (vector-like): Two vectors (ndim=1) for which we want to calculate the distance
`len(x1) == len(x2)` will always be True

float: The square eucleadian distance between the two vectors
return sum([(x1d – x2d) ** 2 for x1d, x2d in zip(x1, x2)])

x1, x2 = [1, 4, 3, 2], [4, 8, -2, 2]
print(f’pydistance({x1}, {x1}) –> {pydistance(x1, x1)}’)
print(f’pydistance({x1}, {x2}) –> {pydistance(x1, x2)}’)

where we use the prefix “py-” of the function to indicate that the latter makes use of pure Python instead of numpy. Once the distance matrix has been implemented, the nearest neighbor for a given unlabeled point u that we would like to classify is obtained by iterating over all points in the training set (X, Y), selecting the point with smallest distance to u, and returning its corresponding label. Here X denotes the list of inputs in the training set and Y denotes the list of labels.

def pynearest(u: list, X: list, Y: list, distance: callable = pydistance) -> int:
Applies the nearest neighbour to the input `u`
with training set `X` and labels `Y`. The
distance metric can be specified using the
`distance` argument.

u (list): The input vector for which we want a prediction
X (list): A 2 dimensional list containing the trainnig set
Y (list): A list containing the labels for each vector in the training set
distance (callable): The distance metric. By default the `pydistance` function

int: The label of the closest datapoint to u in X
xbest = None
ybest = None
dbest = float(‘inf’)

for x, y in zip(X, Y):
d = distance(u, x)
if d < dbest: return ybest Note that this function either uses function pydistance (given as default if the argument distance is not specified). Or one could specify as argument a more optimized function for distance compuation, for example, one that uses numpy. Finally, one might not be interested in classifying a single point, but many of them. The method below receives a collection of such unlabeled test points stored in the variable U. The function returns a list of predictions associated to each test point. def pybatch(U, X, Y, nearest=pynearest, distance=pydistance): Applies the nearest neighbor algorithm, to all the datapoints `u` $\in$ `U`, with `X` the training set and `Y` the labels. Both the distance metric and the method of finding the neearest neighbor can be specified. U (list): List of vectors for which a prediction is desired. X (list): A 2 dimensional list containing the trainnig set Y (list): A list containing the labels for each vector in the training set nearest (callable): The method by which the nearest neighbor search happens. distance (callable): The distance metric. By default the `pydistance` function list: A list of predicted labels for each `u` $\in$ `U` return [nearest(u, X, Y, distance=distance) for u in U] Again, such function uses by default the Python nearest neighbor search (with a specified distance function). However, we can also specified a more optimized nearest neighbor function, for example, based on numpy. Finally, one could consider an alternative function to pybatch that would use numpy from the beginning to the end. The implementation of such more optimized functions, and the testing of their correct behavior and higher performance will be the objective of this exercise sheet. Testing and correctness¶ As a starting point, the code below tests the output of the nearest neighbor algorithm for some toy dataset with fixed parameters. In particular, the function data.toy(M,N,d) generates a problem with M unlabeled test points stored in a matrix U of size (M x d), then N labeled training points stored in a matrix X of size (N x d) and the output label is stored in a vector Y of size N composed of zeros and ones encoding the two possible classes. The variable d denotes the number of dimensions of each point. The toy dataset is pseudo-random, that is, for fixed parameters, it produce a random-looking dataset, but every time the method is called with the same parameters, the dataset is the same. The pseudo-randomness property will be useful to verify that each nearest neighbor implementation performs the same overall computation. Please check the data.py file within the exercise folder for the implementation details. if 'data.py' not in os.listdir(): t.fail('Did you download the \'data.py\' file from ISIS?') import data U, X, Y = data.toy(20, 100, 50) print(f'Shape of U (unlabeled datapoints): {U.shape}') print(f'Shape of X (training set): {X.shape}') print(f'Shape of Y (labels): {Y.shape}') print(f'Predictions: {pybatch(U, X, Y)}') In particular, the output of this function will help us to verify that the more optimized numpy-based versions of nearest neighbor are still valid. Plotting and performance¶ We now describe how to build a plot that relates a certain parameter of the dataset (e.g. the number of input dimensions d to the time required for the computation. We first initialize the basic plotting environment. import matplotlib from matplotlib import pyplot as plt %matplotlib inline from IPython.display import set_matplotlib_formats set_matplotlib_formats('pdf', 'png') plt.rcParams['savefig.dpi'] = 90 The command "%matplotlib inline" tells IPython notebook that the plots should be rendered inside the notebook. The following code plots the computation time of predicting 100 points from the test set using a training set of size 100, and where we vary the number of input dimensions. The measurement of time happens with the timeit module. timeit provides many convinience functions for benchmarking. In particular the repeat function runs the provided code many times and returns the time it took to run it. You can find more information about repeat here import timeit from statistics import mean # Values for the number of dimensions d to test dlist = [1, 2, 5, 10, 20, 50, 100, 200, 500] # Measure the computation time for each choice of number of dimensions d tlist = [] for d in dlist: U, X, Y = data.toy(100, 100, d) # get the average of three runs delta = mean(timeit.repeat(lambda : pybatch(U,X,Y), number=1, repeat=3)) tlist.append(delta) # Plot the results in a graph fig = plt.figure(figsize=(5, 3)) plt.plot(dlist, tlist, '-o') plt.xscale('log'); plt.yscale('log'); plt.xlabel('d'); plt.ylabel('time'); plt.grid(True) The time on the vertical axis is in seconds. Note that the exact computation time depends on the speed of your computer. As expected, the computation time increases with the number of input dimensions. Unfortunately, for the small dataset considered here (100 training and test points of 100 dimensions each), the algorithm already takes more than one second to execute. Thus, it is necessary for practical applications (e.g. the digit recognition task that we will consider at the end of this exercise sheet) to accelerate this nearest neighbor algorithm. 1. Accelerating the distance computation (25 P)¶ In this first exercise, we would like to accelerate the function that compute pairwise distances. a) Implement the function npdistance(x1,x2) with the same output as pydistance(x1,x2), but that computes the squared Euclidean distance using numpy operations. Verify that in both cases (i.e. using either npdistance or pydistance in the function pybatch) the output for the above toy example with parameters M=20, N=100, d=50 (i.e. data.toy(20,100,50)) remains the same. Our goal with this exercise is to speed-up our code. In practice this means that we want to remove for loops from our code. Therefore if your implementation contains a for loop it will automatically be considered wrong and will receive 0 points. Similarlly Python functions that hide for loops such as map are also considered invalid for this exercise. Similarly, functions provided by numpy that hide for loops like vectorize and apply_along_axis are also not to be used. Note: The input vectors can be either np.ndarray or lists of floats. import numpy as np def npdistance(x1: 'vector-like', x2: 'vector-like') -> float:
Calculates the square eucledian distance between two data points x1, x2
using `numpy` vectorized operations

x1, x2 (vector-like): Two vectors (ndim=1) for which we want to calculate the distance
`len(x1) == len(x2)` will always be True

float: The distance between the two vectors x1, x2
# YOUR CODE HERE
raise NotImplementedError(“Relplace this line with your code”)
# YOUR CODE HERE

# Verify your function
x1, x2 = [0.,-1.,-2.], [2.,3.,4.]

dist_to_same = npdistance(x1, x1)
print(f’npdistance({x1}, {x1}) –> {dist_to_same}\n’)
expected_dist_to_same = 0.
t.assertAlmostEqual(dist_to_same, expected_dist_to_same,
msg=’The distance of a vector to itself should be 0′)

dist = npdistance(x1, x2)
print(f’npdistance({x1}, {x2}) –> {dist}’)
expected_dist = pydistance(x1, x2)
print(f’expected_dist –> {expected_dist}\n’)
t.assertAlmostEqual(dist, expected_dist)

U, X, Y = data.toy(20,100,50)

no_numpy = pybatch(U, X, Y, distance=pydistance)
print(f’no_numpy –> {no_numpy}’)

w_np_dist = pybatch(U, X, Y, distance=npdistance)
print(f’w_np_dist –> {w_np_dist}’)

np.testing.assert_allclose(no_numpy, w_np_dist)

b) Create a plot similar to the one above, but where the computation time required by both methods are shown in a superposed manner. Here, we fix M=100, N=100, and we let d vary from 1 to 500, taking the list of values [1, 2, 5, 10, 20, 50, 100, 200, 500]. Your plot should show a quisi-constant runtime for the pybarch call using the npdistance function, compared to pydistance.

# YOUR CODE HERE
raise NotImplementedError(“Relplace this line with your code”)
# YOUR CODE HERE

c) Based on your results, explain what kind of speedup numpy provides, and in what regime do you expect the speedup to be the most important:

Note: For this exercise you only need to provide a free text answer

Explain the speedup that numpy provides¶

2. Accelerating the nearest neighbor search (25 P)¶
Motivated by the success of the numpy optimized distance computation, we would like further accelerate the code by performing nearest neighbor search directly in numpy.

a) Implement the function npnearest(u,X,Y) as an alternative to the function pynearest(u,X,Y,distance=npdistance) that we have used in the previous exercise. Again, verify your function for the same toy example as before (i.e. data.toy(20,100,50)).

Unlike pynearest, npnearest doesn’t receive any distance argument. npnearest will work only with square eucledian distance. If you are confident that your npdistance implementation can work between a vector and a matrix, you are welcome to reuse it. It is however, perfectly acceptable to reimplement the distance algorithm in this function again.

Once again the use of for loops, or functions like map or vectorize is stictly not allowed in this exercise.

def npnearest(u: np.ndarray, X: np.ndarray, Y: np.ndarray, *args, **kwargs):
Finds x1 so that x1 is in X and u and x1 have a minimal distance (according to the
provided distance function) compared to all other data points in X. Returns the label of x1

u (np.ndarray): The vector (ndim=1) we want to classify
X (np.ndarray): A matrix (ndim=2) with training data points (vectors)
Y (np.ndarray): A vector containing the label of each data point in X
args, kwargs : Ignored. Only for compatibility with pybatch

int: The label of the data point which is closest to `u`
# YOUR CODE HERE
raise NotImplementedError(“Relplace this line with your code”)
# YOUR CODE HERE

TINY_U, TINY_X, TINY_Y = data.toy(3,3,3)
tiny_u = TINY_U[0]
print(‘u’)
pprint(tiny_u)
print(‘\nX’)
pprint(TINY_X)
print(‘\nY’)
pprint(TINY_Y)

np_nearest = npnearest(tiny_u, TINY_X, TINY_Y)
expected_nearest = pynearest(tiny_u, TINY_X, TINY_

程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com