CS504 Fall 2018 Homework 5

Due: 12/07/2018 by 11:59PM

The goal of this homework is to analyze the performance of classification algorithms on several real and synthetic data sets. This will be done in the following steps:

  1. First, you will explore the data sets.
  2. Next, you will run a series of experiments and answer questions about the results. For these experiments, you will be using the Weka data mining software.
  3. Finally, you will compile your answers in the form of a report.

Datasets and corresponding descriptions are provided with this homework.

Datasets
a.arff
b.arff
c.arff
sick.arff
hepatitis.arff

Weka assumes by default that the class attribute is the last column. You need to change this for the ‘hepatitis’ data set.
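
To find which column holds the class before changing the setting in Weka (for example, through the class drop-down in the Explorer's Classify panel), it can help to list the attributes declared in the ARFF header. The following is a minimal sketch, not part of the assignment: the function name list_attributes and the sample header are hypothetical, and the parser assumes unquoted attribute names without spaces.

```python
def list_attributes(arff_text):
    """Return (name, type) pairs from the @attribute lines of an ARFF header."""
    attributes = []
    for line in arff_text.splitlines():
        stripped = line.strip()
        lowered = stripped.lower()
        if lowered.startswith("@data"):
            break  # the header ends where the data section begins
        if lowered.startswith("@attribute"):
            # Expected form: @attribute <name> <type>
            # Assumption: the name is unquoted and contains no spaces.
            _, name, attr_type = stripped.split(None, 2)
            attributes.append((name, attr_type))
    return attributes


# Hypothetical header for illustration; it is NOT taken from hepatitis.arff.
SAMPLE = """@relation toy
@attribute outcome {yes,no}
@attribute age numeric
@attribute sex {male,female}
@data
yes,30,male
"""

attrs = list_attributes(SAMPLE)
for position, (name, attr_type) in enumerate(attrs, start=1):
    print(position, name, attr_type)
```

In this toy header the class-like attribute (outcome) is at position 1 rather than last, which is the kind of situation where the default must be overridden.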

Data Exploration

• Visually explore the data sets, and describe the following for each data set:
o types of attributes
o class distribution
o which attributes appear to be good predictors, if any
o possible correlation between attributes
o any special structure that you might observe
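
The class distribution asked for above can be read off Weka's Preprocess panel, but the underlying computation is just a count per class label. A minimal sketch, assuming the labels have already been extracted into a list; the function name class_distribution and the label values are hypothetical and do not come from the provided data sets.

```python
from collections import Counter


def class_distribution(labels):
    """Map each class label to (count, fraction of all instances)."""
    counts = Counter(labels)
    total = len(labels)
    return {label: (count, count / total) for label, count in counts.items()}


# Hypothetical labels for illustration only.
labels = ["negative"] * 9 + ["sick"] * 1

dist = class_distribution(labels)
for label, (count, fraction) in dist.items():
    print(f"{label}: {count} ({fraction:.0%})")
```

A strongly skewed distribution is worth noting in the report, because a classifier that always predicts the majority class can reach high accuracy while being useless on the minority class.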

Experiments

  • Use the F-measure to measure performance.
  • Use the default parameters for all the classifiers in Weka, unless specified otherwise.
  • Experiment 1: Run Decision Trees (J48), Bayes (NaiveBayes), and KNN (IBk) with k=1 and k=21 on data sets ‘a’ and ‘b’.

o For each classifier, compare its performance obtained on data set ‘a’ to its performance obtained on data set ‘b’.
o For data set ‘a’, compare the performance of the 4 classifiers.
o Give explanations for your observations above.

• Experiment 2: Run J48, NaiveBayes, and IBk with k=1 and k=10 on ‘c’.
o Compare the performance of the 4 classifiers.
§ Comment on the effect of k.
o Give explanations for your observations above.

• Experiment 3: Run NaiveBayes, IBk with k=3, and J48 on ‘sick’.
o Compare the performance of J48 and NaiveBayes.
o Compare the performance of J48 and IBk.
o Give explanations for your observations above.

• Experiment 4: Run IBk with k=3 and NaiveBayes on ‘hepatitis’.
o Compare the performance of the two classifiers.
o Give explanations for your observations above.

  • Collect output from your experiments.

Report

  • Write a report addressing the above questions in data exploration and experiments.
  • Report limit: PDF format, no longer than 6 pages, no smaller than 11pt font.
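
When interpreting the F-measure values used in the experiments, it may help to recall how the measure is built: it is the harmonic mean of precision and recall for a given class. The sketch below is an illustration under stated assumptions, not Weka's implementation. The function name f_measure and the counts are hypothetical, and returning 0.0 when precision or recall is undefined is a convention chosen for this sketch.

```python
def f_measure(tp, fp, fn):
    """F-measure (F1) for one class from true positives, false positives,
    and false negatives."""
    # Assumption: undefined precision or recall (0/0) is treated as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 correct positive predictions, 2 false alarms,
# 8 missed positives. Precision is 0.8 and recall is 0.5.
score = f_measure(tp=8, fp=2, fn=8)
print(f"{score:.3f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, a classifier cannot obtain a high F-measure on a class by doing well on precision alone or on recall alone.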