Live Coding Wk4 – Lecture 10 – Nearest Neighbors¶

For this demo we will explore how to do classification using the K Nearest Neighbors algorithm. Let's take a jump back into the past, revisit Lab 02, and look at some flowers.

### Imports and data you will need
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# Something that should look a bit familiar
data = pd.read_csv('data/IRIS.csv')
data.head()

sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 Iris-setosa
1 4.9 3.0 1.4 0.2 Iris-setosa
2 4.7 3.2 1.3 0.2 Iris-setosa
3 4.6 3.1 1.5 0.2 Iris-setosa
4 5.0 3.6 1.4 0.2 Iris-setosa

Problem: This seems familiar¶
You are still an intern at the ANU, under the Fenner School of Environment and Society. However, you have just received a bit of a stern talking-to about presenting a method for categorising the types of flowers that amounts to printing a table of values! So let's use a more robust method for classifying these flowers.

A better first look¶
Numbers can only get us so far. Let's first use the visualisation techniques we learnt in Lab 03 to see if there is any obvious separation of species for:

The dimensions of the “sepal”;
And, the dimensions of the “petal”.

I'll provide the plotting code; you just need to partition the data accordingly:

# For you to do
setosa_df = None # TODO
versicolor_df = None # TODO
virginica_df = None # TODO

setosa_df = data[data.species == "Iris-setosa"]
versicolor_df = data[data.species == "Iris-versicolor"]
virginica_df = data[data.species == "Iris-virginica"]

# Simple plotting
plt.figure(figsize=[14,6])

ax1 = plt.subplot(121)
plt.scatter(setosa_df.sepal_length, setosa_df.sepal_width, c='r', alpha=0.5)
plt.scatter(versicolor_df.sepal_length, versicolor_df.sepal_width, c='b', alpha=0.5)
plt.scatter(virginica_df.sepal_length, virginica_df.sepal_width, c='g', alpha=0.5)
plt.title("Classification of Sepal Dimensions")
plt.xlabel("sepal_length")
plt.ylabel("sepal_width")

ax2 = plt.subplot(122)
plt.scatter(setosa_df.petal_length, setosa_df.petal_width, c='r', alpha=0.5, label='Iris-setosa')
plt.scatter(versicolor_df.petal_length, versicolor_df.petal_width, c='b', alpha=0.5, label='Iris-versicolor')
plt.scatter(virginica_df.petal_length, virginica_df.petal_width, c='g', alpha=0.5, label='Iris-virginica')
plt.title("Classification of Petal Dimensions")
plt.xlabel("petal_length")
plt.ylabel("petal_width")
plt.legend()

plt.show()

Discussion: How might we want to separate these by hand?

Making a classifier¶
As we can see, there is some separation between the species of flowers just by looking at these two-dimensional plots. Let's try to make a classifier using all the information we have, via the sklearn library. Take a look at KNeighborsClassifier.

?KNeighborsClassifier

Let's format the data. First, we define some functions to convert between integer indices and species (string) names.

# For you to define
def species_to_index(species_str):
    pass  # TODO

def index_to_species(species_index):
    pass  # TODO

# For you to define
def species_to_index(species_str):
    # 0-based indices, so they line up with the range(3) loops used for plotting below
    if species_str == "Iris-setosa":
        return 0
    elif species_str == "Iris-versicolor":
        return 1
    return 2

def index_to_species(species_index):
    if species_index == 0:
        return "Iris-setosa"
    elif species_index == 1:
        return "Iris-versicolor"
    return "Iris-virginica"
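
A quick sanity check that the two functions invert each other:

# Round-trip check: index -> species -> index should be the identity
for i in range(3):
    assert species_to_index(index_to_species(i)) == i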

Now you can format the data and split it into training and testing:

# Columns we want to get values from
info_columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species_index']

# For you to define
data['species_index'] = None # TODO

iris_array = data[info_columns].values
train_data, test_data = train_test_split(iris_array, train_size=0.6)

# Training Data
train_input = None # TODO input needs to be a 2D array
train_output = None # TODO output needs to be a 1D array

# Testing Data
test_input = None # TODO input needs to be a 2D array
test_output = None # TODO output needs to be a 1D array
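
One way to fill in these TODOs, assuming the 0-based index mapping from species_to_index above:

# Map each species name to its integer index
data['species_index'] = data['species'].apply(species_to_index)

iris_array = data[info_columns].values
train_data, test_data = train_test_split(iris_array, train_size=0.6)

# Training data: first four columns are the measurements, last column is the label
train_input = train_data[:, :4]
train_output = train_data[:, 4]

# Testing data, sliced the same way
test_input = test_data[:, :4]
test_output = test_data[:, 4]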

It's now your job to define, train, and predict using KNeighborsClassifier (with n_neighbors=3).

# Define our model
knn1 = None # TODO

# You need to train!

test_pred1 = None # TODO
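
A minimal sketch of how this could look, using the train/test arrays defined above:

# Define our model with k = 3 (the default metric is Euclidean distance, p=2)
knn1 = KNeighborsClassifier(n_neighbors=3)

# Train on the training split
knn1.fit(train_input, train_output)

# Predict labels for the test split
test_pred1 = knn1.predict(test_input)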

Let's see what the plots of only the test data now look like.

# Simple plotting
plt.figure(figsize=[14,6])

colours = ['r', 'b', 'g']
correctly_pred1 = np.equal(test_pred1, test_output)

ax1 = plt.subplot(121)
for i in range(3):
    s_indices_correct = (test_output == i) & correctly_pred1
    plt.scatter(test_input[s_indices_correct, 0], test_input[s_indices_correct, 1],
                c=colours[i], alpha=0.5)

    s_indices_incorrect = (test_output == i) & (np.logical_not(correctly_pred1))
    plt.scatter(test_input[s_indices_incorrect, 0], test_input[s_indices_incorrect, 1],
                marker='X', c=colours[i], alpha=0.5)
plt.title("Classification of Sepal Dimensions")
plt.xlabel("sepal_length")
plt.ylabel("sepal_width")

ax2 = plt.subplot(122)
for i in range(3):
    s_indices_correct = (test_output == i) & correctly_pred1
    plt.scatter(test_input[s_indices_correct, 2], test_input[s_indices_correct, 3],
                c=colours[i], alpha=0.5, label=index_to_species(i))

    s_indices_incorrect = (test_output == i) & (np.logical_not(correctly_pred1))
    plt.scatter(test_input[s_indices_incorrect, 2], test_input[s_indices_incorrect, 3],
                marker='X', c=colours[i], alpha=0.5)
plt.title("Classification of Petal Dimensions (k=3, p=2)")
plt.xlabel("petal_length")
plt.ylabel("petal_width")
plt.legend()

plt.show()

Trying things out¶
Alright, it's now your turn to make another KNN classifier, this time with n_neighbors=5 and p=1 (p is the Minkowski distance power: p=1 gives Manhattan distance, p=2 gives Euclidean distance).

# Define our model
knn2 = None # TODO

# You need to train!

test_pred2 = None # TODO
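
Following the same pattern as before, a sketch:

# Define our model with k = 5 and Manhattan distance (p=1)
knn2 = KNeighborsClassifier(n_neighbors=5, p=1)

# Train on the same training split
knn2.fit(train_input, train_output)

# Predict labels for the test split
test_pred2 = knn2.predict(test_input)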

# Simple plotting
plt.figure(figsize=[14,6])

colours = ['r', 'b', 'g']
correctly_pred2 = np.equal(test_pred2, test_output)

ax1 = plt.subplot(121)
for i in range(3):
    s_indices_correct = (test_output == i) & correctly_pred2
    plt.scatter(test_input[s_indices_correct, 0], test_input[s_indices_correct, 1],
                c=colours[i], alpha=0.5)

    s_indices_incorrect = (test_output == i) & (np.logical_not(correctly_pred2))
    plt.scatter(test_input[s_indices_incorrect, 0], test_input[s_indices_incorrect, 1],
                marker='X', c=colours[i], alpha=0.5)
plt.title("Classification of Sepal Dimensions")
plt.xlabel("sepal_length")
plt.ylabel("sepal_width")

ax2 = plt.subplot(122)
for i in range(3):
    s_indices_correct = (test_output == i) & correctly_pred2
    plt.scatter(test_input[s_indices_correct, 2], test_input[s_indices_correct, 3],
                c=colours[i], alpha=0.5, label=index_to_species(i))

    s_indices_incorrect = (test_output == i) & (np.logical_not(correctly_pred2))
    plt.scatter(test_input[s_indices_incorrect, 2], test_input[s_indices_incorrect, 3],
                marker='X', c=colours[i], alpha=0.5)
plt.title("Classification of Petal Dimensions (k=5, p=1)")
plt.xlabel("petal_length")
plt.ylabel("petal_width")
plt.legend()

plt.show()

Some loose ends¶
Let's print a simple metric: accuracy. Define the accuracy of the two models (correct predictions divided by total predictions).

def accuracy(model):
    pass  # TODO
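
One possible definition, assuming accuracy is measured on the test split defined above (sklearn's model.score(test_input, test_output) would give the same number):

def accuracy(model):
    # Fraction of test predictions that match the true labels
    predictions = model.predict(test_input)
    return np.mean(predictions == test_output)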

# Accuracy
print('Accuracy (k=3, p=2):', accuracy(knn1))
print('Accuracy (k=5, p=1):', accuracy(knn2))

Discussion: Have we done anything wrong if we want to draw conclusions about the success of KNN from the accuracy scores above?
