Tutorial 08 Tasks Solution
import numpy as np
import pandas as pd
from sklearn import neighbors
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score
# Load the data and split into training and test sets (75/25)
spam_df = pd.read_csv("spambase.txt")
trainData, testData, trainLabels, testLabels = train_test_split(
    spam_df.iloc[:, 0:-1], spam_df.iloc[:, -1], test_size=0.25, random_state=42)
# Store cv score for each k
cv_scores = []
k_vals = []
Build a KNN classifier. Use cross validation on the training set to estimate the best k.
for k in range(1, 30, 2):
    # Evaluate each odd k with 10-fold cross-validation on the training set
    model = neighbors.KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(model, trainData, trainLabels, cv=10, scoring='accuracy')
    score = scores.mean()
    print("k={0}, cv_score={1:.2f}".format(k, score * 100))
    cv_scores.append(score)
    k_vals.append(k)
k=1, cv_score=80.85
k=3, cv_score=79.29
k=5, cv_score=79.20
k=7, cv_score=78.42
k=9, cv_score=78.30
k=11, cv_score=77.75
k=13, cv_score=77.69
k=15, cv_score=77.00
k=17, cv_score=76.73
k=19, cv_score=76.59
k=21, cv_score=76.21
k=23, cv_score=76.01
k=25, cv_score=75.20
k=27, cv_score=75.31
k=29, cv_score=74.91
Find best performing k
idx = np.argmax(cv_scores)
print("k={0} achieved highest accuracy of {1:.2f}".format(k_vals[idx], cv_scores[idx] * 100))
k=1 achieved highest accuracy of 80.85
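The manual loop over k can also be written with scikit-learn's GridSearchCV, which runs the same cross-validated search and records the best setting. The following is a minimal sketch, not the tutorial's own solution: it uses a synthetic dataset from make_classification as a stand-in for spambase.txt, and the names X_train, search, and best_k are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption: replace with the spambase features and labels)
X, y = make_classification(n_samples=400, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Same candidate grid as the loop above: odd k from 1 to 29
param_grid = {"n_neighbors": list(range(1, 30, 2))}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=10, scoring="accuracy")
search.fit(X_train, y_train)

# best_score_ is the mean cross-validated accuracy of the best k
best_k = search.best_params_["n_neighbors"]
print("k={0} achieved highest accuracy of {1:.2f}".format(best_k, search.best_score_ * 100))
```

With the default refit=True, search.best_estimator_ is already refitted on the whole training set, so it could be used for prediction in place of a separately fitted model.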
Plot the confusion matrix using your final trained model and the test data
model = neighbors.KNeighborsClassifier(n_neighbors = k_vals[idx])
model.fit(trainData, trainLabels)
predictions = model.predict(testData)
print(confusion_matrix(testLabels, predictions))
[[570  95]
 [104 381]]
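The matrix above can be summarised with a few standard metrics. This short sketch copies the four counts by hand and assumes scikit-learn's layout (rows are true labels, columns are predicted labels) with class 1 meaning spam, as in the prediction step of this tutorial. The variable names are illustrative.

```python
# Counts copied from the printed confusion matrix:
# row 0 = true not-spam, row 1 = true spam; columns are predictions.
tn, fp = 570, 95
fn, tp = 104, 381

total = tn + fp + fn + tp
accuracy = (tn + tp) / total   # fraction of all test emails classified correctly
precision = tp / (tp + fp)     # of emails flagged as spam, fraction that are spam
recall = tp / (tp + fn)        # of actual spam emails, fraction that were caught

print("accuracy={0:.4f}, precision={1:.4f}, recall={2:.4f}".format(
    accuracy, precision, recall))
```

The test-set accuracy works out to roughly 82.7%, which is close to the cross-validated estimate of 80.85% for k=1.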
Predict whether email 647 is spam or not and print the result
pred_647 = model.predict(spam_df.iloc[647, 0:-1].values.reshape(1, -1))[0]
if pred_647:
    print("Email 647 is spam")
else:
    print("Email 647 is not spam")
Email 647 is spam