CS504 Fall 2018 Homework 5
Due: 12/07/2018 by 11:59PM
The goal of this homework is to analyze the performance of classification algorithms on several real and synthetic data sets. This will be done in the following steps:
- First, you will explore the data sets.
- Next, you will perform a series of experiments and answer questions about the results. For these experiments, you will use the Weka data mining software.
- Finally, you will compile your answers in the form of a report.
Datasets and corresponding descriptions are provided with this homework.
Datasets:
- a.arff
- b.arff
- c.arff
- sick.arff
- hepatitis.arff
Weka assumes by default that the class attribute is the last column. You need to change this for the ‘hepatitis’ data set.
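To find where the class attribute sits before changing the setting in Weka, it can help to list the attributes in declaration order. The following is a minimal sketch, not part of Weka: the helper `list_arff_attributes` and the toy header are hypothetical, and the parser assumes simple attribute names without embedded spaces.

```python
def list_arff_attributes(arff_text):
    """Return attribute names in declaration order from an ARFF header.

    Assumption: attribute names contain no spaces (quoted names with
    spaces are not handled by this simple split).
    """
    names = []
    for line in arff_text.splitlines():
        stripped = line.strip()
        lower = stripped.lower()
        if lower.startswith("@data"):
            break  # the header ends where the data section begins
        if lower.startswith("@attribute"):
            # Split into keyword, name, and type declaration.
            parts = stripped.split(None, 2)
            names.append(parts[1].strip("'\""))
    return names


# Toy header for illustration only; not one of the homework data sets.
sample = """@relation toy
@attribute label {yes,no}
@attribute age numeric
@attribute weight numeric
@data
yes,30,70
"""

attrs = list_arff_attributes(sample)
for position, name in enumerate(attrs, start=1):
    print(position, name)
```

In the toy header the class attribute (`label`) is in position 1, not last, which is the kind of situation where the default has to be overridden.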
Data Exploration
• Visually explore the data sets, and describe the following for each data set:
o types of attributes
o class distribution
o which attributes appear to be good predictors, if any
o possible correlation between attributes
o any special structure that you might observe
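Two of the items above, class distribution and correlation between attributes, can also be checked numerically alongside the visual exploration. The sketch below assumes the labels and attribute values have already been read into Python lists; the function names and the sample values are made up for illustration.

```python
from collections import Counter
from math import sqrt


def class_distribution(labels):
    """Return the fraction of instances belonging to each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient between two numeric attributes."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up values, not taken from the homework data sets.
labels = ["pos", "neg", "neg", "neg"]
dist = class_distribution(labels)
corr = pearson([1, 2, 3], [2, 4, 6])
print(dist)
print(corr)
```

A strongly skewed class distribution is worth noting in the report, since it affects how the F-measure should be read.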
Experiments
- Use the F-measure to measure performance.
- Use the default parameters for all the classifiers in Weka, unless specified otherwise.
- Experiment 1: Run a decision tree (J48), naive Bayes (NaiveBayes), and k-nearest neighbors (IBk) with k=1 and with k=21 on data sets ‘a’ and ‘b’
o For each classifier, compare its performance on data set ‘a’ to its performance on data set ‘b’.
o For data set ‘a’, compare the performance of the 4 classifiers.
o Give explanations for your observations above.
• Experiment 2: Run J48, NaiveBayes, and IBk with k=1 and with k=10 on ‘c’
o Compare the performance of the 4 classifiers.
§ Comment on the effect of k.
o Give explanations for your observations above.
• Experiment 3: Run NaiveBayes, IBk with k=3, and J48 on ‘sick’
o Compare the performance of J48 and NaiveBayes.
o Compare the performance of J48 and IBk.
o Give explanations for your observations above.
• Experiment 4: Run IBk with k=3 and NaiveBayes on ‘hepatitis’
o Compare the performance of the two classifiers.
o Give explanations for your observations above.
- Collect output from your experiments.
Report
- Write a report addressing the above questions in data exploration and experiments.
- Report limit: PDF format, no longer than 6 pages, font no smaller than 11pt.
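For reference when interpreting the results, the F-measure used as the performance metric in the experiments is the harmonic mean of precision and recall. The sketch below computes it for a single class from confusion counts. The function name `f_measure` and the counts are illustrative assumptions, not Weka output.

```python
def f_measure(tp, fp, fn):
    """F-measure (F1) for one class from confusion counts.

    tp: true positives, fp: false positives, fn: false negatives.
    Returns 0.0 when precision and recall are both undefined or zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up counts for illustration: precision and recall are both 0.8,
# so the F-measure is also approximately 0.8.
score = f_measure(tp=8, fp=2, fn=2)
print(round(score, 3))
```

Because the measure is computed per class, it is worth stating in the report whether a per-class value or an average over classes is being compared.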