Assignment 5
Due: 3/8
Note: Show all your work.
Problem 1 (10 points) Consider the following confusion matrix.
Note: C1 is positive and C2 is negative.
Compute the sensitivity, specificity, precision, accuracy, F-measure, F2, and MCC measures.
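Every measure in Problem 1 comes from the four cells of the confusion matrix. With C1 as the positive class, TP = 437, FN = 42, FP = 213, and TN = 829. Below is a minimal Python sketch of the standard formulas. It assumes that "F-measure" means F1 and that F2 is the F-beta measure with beta = 2. The function name `confusion_measures` is illustrative only.

```python
import math

def confusion_measures(tp, fn, fp, tn):
    """Return common evaluation measures for a 2x2 confusion matrix."""
    p = tp + fn                      # actual positives
    n = fp + tn                      # actual negatives
    sensitivity = tp / p             # recall / true positive rate
    specificity = tn / n             # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (p + n)

    def f_beta(beta):
        # F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)
        b2 = beta * beta
        return (1 + b2) * precision * sensitivity / (b2 * precision + sensitivity)

    # Matthews correlation coefficient
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "F1": f_beta(1),
        "F2": f_beta(2),
        "MCC": mcc,
    }

# C1 = positive, C2 = negative (counts from the Problem 1 confusion matrix)
measures = confusion_measures(tp=437, fn=42, fp=213, tn=829)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

The answer should still show each formula with the counts substituted, since the assignment asks you to show all your work.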
Confusion matrix for Problem 1 (rows = actual class, columns = predicted class):
                 Predicted C1    Predicted C2
Actual C1        437             42
Actual C2        213             829
Problem 2 (10 points) Suppose you built two classifier models, M1 and M2, from the same training dataset and tested them on the same test dataset using 10-fold cross-validation. The error rates obtained over the 10 iterations are given in the table below. In each iteration, the same training and test partitions were used for both M1 and M2. Determine whether there is a significant difference between the two models using the statistical method we discussed in class (this method is also described in Section 8.5.5, pp. 372-373 of the textbook). Use a significance level of 5%. If there is a significant difference, which model is better?
Iteration    M1 error rate    M2 error rate
1            0.20             0.21
2            0.09             0.13
3            0.03             0.05
4            0.13             0.18
5            0.19             0.07
6            0.12             0.05
7            0.20             0.10
8            0.18             0.07
9            0.20             0.07
10           0.15             0.06
Note: When you calculate var(M1 – M2), calculate a sample variance (not a population variance).
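The sketch below assumes that the method in Section 8.5.5 is a paired t-test over the k = 10 per-iteration differences d_i = err_i(M1) - err_i(M2). The statistic is t = mean(d) / sqrt(var(d) / k), with k - 1 = 9 degrees of freedom. As the note above requires, var(d) is the sample variance (k - 1 denominator), which is what Python's `statistics.variance` computes. The two-sided 5% critical value, 2.262 (t at 0.025 with 9 degrees of freedom), is hard-coded from a standard t table rather than computed. The names `paired_t_statistic` and `T_CRIT` are illustrative.

```python
import math
import statistics

# Per-iteration error rates from the Problem 2 table
m1 = [0.20, 0.09, 0.03, 0.13, 0.19, 0.12, 0.20, 0.18, 0.20, 0.15]
m2 = [0.21, 0.13, 0.05, 0.18, 0.07, 0.05, 0.10, 0.07, 0.07, 0.06]

def paired_t_statistic(errs1, errs2):
    """Paired t statistic over k folds, using the sample variance of the differences."""
    k = len(errs1)
    diffs = [a - b for a, b in zip(errs1, errs2)]
    mean_diff = statistics.mean(diffs)
    var_diff = statistics.variance(diffs)  # sample variance: divides by k - 1
    return mean_diff / math.sqrt(var_diff / k)

# Two-sided test at the 5% level with k - 1 = 9 degrees of freedom:
# look up t at 0.05 / 2 = 0.025 in a t table (assumed value, not computed here).
T_CRIT = 2.262

t = paired_t_statistic(m1, m2)
print(f"t = {t:.4f}, critical value = {T_CRIT}")
if abs(t) > T_CRIT:
    # A positive t means M1 has the higher mean error, so M2 is better.
    better = "M1" if t < 0 else "M2"
    print(f"Significant difference; {better} has the lower mean error rate.")
else:
    print("No significant difference at the 5% level.")
```

The sketch uses only the standard library. If SciPy is available, `scipy.stats.t.ppf(0.975, 9)` should reproduce the table value.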
Problem 3 (10 points). The following table shows the test results of a classifier on a dataset.
Tuple_id    Actual Class    Probability
1           P               0.63
2           N               0.72
3           N               0.53
4           P               0.74
5           P               0.87
6           P               0.89
7           N               0.83
8           P               0.60
9           N               0.70
10          P               0.98
Problem 3-1. For each row, compute TP, FP, TN, FN, TPR, and FPR.
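A minimal sketch of Problem 3-1 follows. It assumes the usual ROC procedure: rank the tuples by decreasing probability and use each probability in turn as the threshold, so that every tuple at or above it is predicted positive (P). It also assumes that no two tuples share a probability, which holds for this table. The sketch only tabulates the counts and rates, because Problem 3-2 requires drawing the curve by hand. The name `roc_rows` is illustrative.

```python
# (Tuple_id, actual class, probability) from the Problem 3 table
tuples = [
    (1, "P", 0.63), (2, "N", 0.72), (3, "N", 0.53), (4, "P", 0.74),
    (5, "P", 0.87), (6, "P", 0.89), (7, "N", 0.83), (8, "P", 0.60),
    (9, "N", 0.70), (10, "P", 0.98),
]

def roc_rows(data):
    """Return (id, class, prob, TP, FP, TN, FN, TPR, FPR), highest probability first."""
    pos = sum(1 for _, cls, _ in data if cls == "P")
    neg = len(data) - pos
    tp = fp = 0
    rows = []
    for tid, cls, prob in sorted(data, key=lambda r: r[2], reverse=True):
        # Lowering the threshold to this tuple's probability makes it a predicted positive.
        if cls == "P":
            tp += 1
        else:
            fp += 1
        tn, fn = neg - fp, pos - tp
        rows.append((tid, cls, prob, tp, fp, tn, fn, tp / pos, fp / neg))
    return rows

rows = roc_rows(tuples)
print("id cls prob  TP FP TN FN  TPR   FPR")
for tid, cls, prob, tp, fp, tn, fn, tpr, fpr in rows:
    print(f"{tid:>2} {cls:>3} {prob:.2f} {tp:>3} {fp:>2} {tn:>2} {fn:>2}  {tpr:.3f} {fpr:.3f}")
```

Each (FPR, TPR) pair, starting from (0, 0), is one point on the hand-drawn curve.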
Problem 3-2. Plot the ROC curve for the dataset. You must draw the curve yourself (i.e., don’t use Weka, R, or other software to generate the curve).
Problem 4 (10 points). This problem gives you practice in comparing the performance of classifier models using ROC curves. You can plot ROC curves with the Weka Knowledge Flow interface, which is described in Chapter 7 of the Weka manual posted on Blackboard. Following the instructions in the manual (especially Section 7.4.2), build and test Logistic and RandomForest classifiers on the crx-data.arff dataset, and capture a screenshot that shows both ROC curves. Include this screenshot in your submission. Compare and discuss the performance of the two models using the ROC curves.
Submission:
Include all answers in a single file named LastName_FirstName_HW5.EXT, where “EXT” is an appropriate file extension (e.g., docx or pdf). If you have multiple files, combine them into a single archive file named LastName_FirstName_HW5.EXT, where “EXT” is an appropriate archive file extension (e.g., zip or rar). Upload the file to Blackboard.