Tutorial_05_Tasks

August 22, 2018

QBUS6850 – Machine Learning for Business

1 Tutorial 4

1.1 Task 1 – Spam Email Classification

1. Download the spambase.txt data
2. Load the data and split into training and test sets (75/25)
3. Build a KNN classifier. Use cross validation on the training set to estimate the best k
4. Plot the confusion matrix using your final trained model and the test data
5. Predict whether email 647 is spam or not and print the result

Notes: – Data description available here http://sci2s.ugr.es/keel/dataset.php?cod=109
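
The steps in Task 1 could be sketched roughly as follows. This is a minimal sketch, not a model solution. It assumes a synthetic stand-in dataset from make_classification in place of spambase.txt, since the file layout is not given here. The feature scaling step, the grid of candidate k values, 5-fold cross-validation, and reading "email 647" as zero-based row index 647 are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): swap in the features and labels loaded
# from spambase.txt. Label 1 is treated as "spam" here.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# 75/25 train/test split, stratified so both sets keep the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# KNN is distance based, so features are standardised first (a choice
# made for this sketch, not something the task requires).
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())

# Cross-validate over odd values of k on the training set only.
param_grid = {"kneighborsclassifier__n_neighbors": list(range(1, 31, 2))}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)
best_k = search.best_params_["kneighborsclassifier__n_neighbors"]
print("best k:", best_k)

# GridSearchCV refits the best pipeline on the full training set,
# so `search` can be used directly for test-set predictions.
cm = confusion_matrix(y_test, search.predict(X_test))
print(cm)

# Classify a single email (row index 647 is an assumed interpretation).
label = int(search.predict(X[[647]])[0])
print("spam" if label == 1 else "not spam")
```

To draw the confusion matrix rather than print it, recent scikit-learn versions provide ConfusionMatrixDisplay.from_estimator(search, X_test, y_test), which needs matplotlib installed.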

1.2 Task 2 – Clustering Hand Written Digits

1. Cluster the digits dataset from sklearn using k-means
2. Cluster the digits using another method from the sklearn clustering user guide:
http://scikit-learn.org/stable/modules/clustering.html
3. Compare the clustering accuracy of k-means and your chosen method using the mutual
information score and the homogeneity score.

Notes: – Some methods can be computationally prohibitive, but there are solutions. For
example, if you choose spectral clustering, make sure to set affinity='nearest_neighbors'.
– Some methods listed are semi-supervised clustering methods. They require you to provide
a small number of training or exemplar samples.
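
One way the comparison in Task 2 might look is sketched below. It assumes spectral clustering as the second method, 10 clusters (one per digit), and adjusted_mutual_info_score as the mutual information measure. sklearn also offers mutual_info_score and normalized_mutual_info_score, so treat that choice as illustrative.

```python
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import load_digits
from sklearn.metrics import adjusted_mutual_info_score, homogeneity_score

# 8x8 digit images flattened to 64 features, with true labels 0-9.
X, y = load_digits(return_X_y=True)

# Two clustering methods to compare. The nearest-neighbours affinity
# builds a sparse graph, which keeps spectral clustering tractable.
models = {
    "k-means": KMeans(n_clusters=10, n_init=10, random_state=0),
    "spectral": SpectralClustering(
        n_clusters=10, affinity="nearest_neighbors", random_state=0
    ),
}

scores = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    # Both metrics compare cluster assignments against the true digit
    # labels and are invariant to how the clusters are numbered.
    scores[name] = {
        "ami": adjusted_mutual_info_score(y, labels),
        "homogeneity": homogeneity_score(y, labels),
    }

for name, s in scores.items():
    print(f"{name:10s} AMI={s['ami']:.3f} homogeneity={s['homogeneity']:.3f}")
```

The true labels are used only for scoring here; neither method sees them during fitting.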
