CS6735 Programming Project
Conduct an experimental study on the following machine learning algorithms: (1) ID3; (2) AdaBoost on ID3; (3) Random Forest; (4) Naïve Bayes; (5) K-nearest neighbors (kNN).
Implement the five algorithms using Java or Python.
Evaluate your implementation on the datasets in data.zip (downloadable from the course website) using 10 repetitions of 5-fold cross-validation, and report the average accuracy and standard deviation. All datasets are from the UCI Machine Learning Repository. Detailed descriptions are available at the following link:
http://www.ics.uci.edu/~mlearn/MLRepository.html
For breast cancer data see:
http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29
For car data see: http://archive.ics.uci.edu/ml/datasets/Car+Evaluation
For ecoli data see: http://archive.ics.uci.edu/ml/datasets/Ecoli
For letter recognition data see:
http://archive.ics.uci.edu/ml/datasets/Letter+Recognition
For mushroom data see: http://archive.ics.uci.edu/ml/datasets/Mushroom
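The evaluation protocol described above (10 repetitions of 5-fold cross-validation, reporting average accuracy and standard deviation) could be organized roughly as in the following sketch. The function names, the `fit_predict` callback, and the majority-class baseline are hypothetical placeholders and not part of the assignment. The sketch also assumes the standard deviation is taken over all 50 fold accuracies; taking it over the 10 per-repetition means is another reasonable reading.

```python
import random
import statistics
from collections import Counter


def k_fold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def repeated_cv(X, y, fit_predict, repeats=10, k=5, seed=0):
    """Run `repeats` rounds of k-fold CV; return (mean, std) of fold accuracies.

    fit_predict(train_X, train_y, test_X) is a hypothetical callback that
    trains one of the five algorithms and returns predicted labels.
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repeats):
        folds = k_fold_indices(len(X), k, rng)  # fresh shuffle per repetition
        for i, test_idx in enumerate(folds):
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            preds = fit_predict(
                [X[j] for j in train_idx],
                [y[j] for j in train_idx],
                [X[j] for j in test_idx],
            )
            correct = sum(p == y[j] for p, j in zip(preds, test_idx))
            accuracies.append(correct / len(test_idx))
    return statistics.mean(accuracies), statistics.stdev(accuracies)


def majority_baseline(train_X, train_y, test_X):
    """Placeholder learner: always predict the most common training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return [label] * len(test_X)


if __name__ == "__main__":
    # Toy data only; replace with a dataset from data.zip.
    X = [[i] for i in range(20)]
    y = ["a"] * 15 + ["b"] * 5
    mean_acc, std_acc = repeated_cv(X, y, majority_baseline)
    print(f"accuracy: {mean_acc:.3f} +/- {std_acc:.3f}")
```

Passing each learner as a callback keeps the splitting logic identical across all five algorithms, so differences in the reported numbers come from the learners rather than from the folds.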
Each dataset has a target variable, which is the one your model predicts. The target variable for each dataset is as follows:
Mushroom: first column (e, p)
Letter: first column (A, B, …)
Ecoli: last column (cp, im, …)
Car: last column (acc, unacc, …)
Breast-cancer: last column (2, 4)
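The target-column positions listed above could be captured in a small lookup table when loading the data, as in the sketch below. The dataset keys, the `load_dataset` helper, and the comma delimiter are assumptions made for illustration; the actual file names and separators in data.zip may differ, and any non-predictive columns (such as identifiers) would still need to be dropped separately.

```python
import csv

# Index of the target column per dataset: 0 = first column, -1 = last column.
# The keys are hypothetical names, not necessarily the file names in data.zip.
TARGET_COLUMN = {
    "mushroom": 0,
    "letter": 0,
    "ecoli": -1,
    "car": -1,
    "breast-cancer": -1,
}


def split_features_target(rows, target_index):
    """Separate each row into its feature values and its target label."""
    X, y = [], []
    for row in rows:
        t = target_index % len(row)  # turn -1 into the last valid index
        y.append(row[t])
        X.append(row[:t] + row[t + 1:])
    return X, y


def load_dataset(path, name, delimiter=","):
    """Read a delimited text file and split it using TARGET_COLUMN[name].

    The delimiter is an assumption; adjust it to match the actual files.
    """
    with open(path, newline="") as f:
        rows = [r for r in csv.reader(f, delimiter=delimiter) if r]
    return split_features_target(rows, TARGET_COLUMN[name])
```

For example, `split_features_target([["e", "x", "y"]], TARGET_COLUMN["mushroom"])` treats `"e"` as the label and `["x", "y"]` as the features.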
Compare and discuss your algorithms (implementations) based on your experimental results.
Submission:
Hand in a report of your experimental study via Desire2Learn, including:
Description of the learning algorithms you implement.
Description of the datasets you use (number of examples, number of attributes, number of classes, types of attributes, etc.).
Technical details of your implementation: pre-processing of data sets (discretization, etc.), parameter setting, etc.
Design of your programming implementation (data structures, overall program structure).
Report and analysis of your experimental results.
Submit your code via Desire2Learn.
Deadline:
Submit your report and source code via D2L no later than 11:59 pm on Thursday, April 15, 2021.