Lecture 19
Evaluating Classifiers: Precision, Recall, and ROC Curves CS 189, Spring 2022 (CDSS offering)
4th March, 2022
Adapted from slides by
Previously Covered
Lecture 17, Evaluating Learned Models
– How do we decide what combination of model parameters and hyperparameters is optimal?
– Learned about risk, empirical risk, cross-validation
– (Very) brief recap of cross-validation: for k-fold cross-validation, train on k – 1 folds, validate with a suitable metric on the k-th fold, and average across all k folds (see the sketch below)
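A minimal sketch of that k-fold procedure, assuming scikit-learn is available; the synthetic data, the logistic regression model, and accuracy as the validation metric are all just illustrative stand-ins:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Hypothetical data; any estimator and metric could stand in here.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=100) > 0).astype(int)

scores = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])                  # train on k - 1 folds
    scores.append(accuracy_score(y[val_idx], model.predict(X[val_idx])))          # validate on the held-out fold
print(np.mean(scores))                                                            # average across folds
```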
Lecture 18, Bias-variance and Decision Theory
– For classification, how does the problem setting affect the decision rule and the generalization error?
– Saw that the Bayes optimal decision rule depends on the loss function, so the rule that simply maximises accuracy does not always minimise the expected loss
Today's Lecture
– We've made progress evaluating models in the context of regression problems, but how do we evaluate classifiers?
– Focus on understanding why we cannot apply the same methods, and see what we can do instead. In particular, we will need to compute new metrics
– Talk about evaluating decisions in the context of metrics like precision and recall, sensitivity and specificity, etc.
– Introduce ROC curves
– If time permits, go over an interview question related to today's content
Naive Evaluation of Classifiers (1)
[Figure: predicted probabilities from a logistic regression model and from a neural network on the same data]
– Can use predicted probability >= 0.5 as our decision rule to determine accuracy (see the sketch below)
– Both models have the same accuracy, but does that mean they are equally suited to the task?
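As a concrete illustration of that decision rule, here is a minimal sketch; the probability arrays and labels are made up purely to show that two classifiers with different predicted probabilities can land on identical accuracies:

```python
import numpy as np

# Hypothetical predicted probabilities from two classifiers and the true labels.
probs_logreg = np.array([0.9, 0.6, 0.4, 0.2, 0.7, 0.3])
probs_nn     = np.array([0.99, 0.55, 0.45, 0.05, 0.8, 0.1])
y_true       = np.array([1,    1,    0,    0,    1,   0])

def accuracy_at_threshold(probs, y_true, threshold=0.5):
    """Turn probabilities into 0/1 decisions and compare against the labels."""
    y_pred = (probs >= threshold).astype(int)
    return np.mean(y_pred == y_true)

print(accuracy_at_threshold(probs_logreg, y_true))  # 1.0
print(accuracy_at_threshold(probs_nn, y_true))      # 1.0 -- same accuracy, different probabilities
```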
Naive Evaluation of Classifiers (2)
– But what if the classifier isn't probabilistic? E.g. an SVM
– Can we compare losses?
– Is it meaningful to compare the cross-entropy loss of logistic regression with the optimisation objective of an SVM? No, they're apples and oranges
– Even if loss functions are comparable, we should be evaluating the actual decisions made by models, not the loss functions. We do this using metrics
– First metric: accuracy
– A good first choice, but when might we not want to use it?
– When we have a class imbalance in the data and/or the loss function is asymmetric
– Hence we must look at metrics beyond accuracy and loss in order to better evaluate classifiers (the example after this list shows how accuracy can mislead under class imbalance)
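A small example of why accuracy alone can mislead: with a hypothetical data set of 990 negatives and 10 positives, a classifier that always predicts the majority class still scores 99% accuracy while missing every positive:

```python
import numpy as np

# Hypothetical imbalanced data set: 990 negatives, 10 positives.
y_true = np.array([0] * 990 + [1] * 10)

# A "classifier" that always predicts the majority class.
y_pred = np.zeros_like(y_true)

accuracy = np.mean(y_pred == y_true)
print(accuracy)  # 0.99 -- looks great, yet every positive is missed
```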
Motivating Example
[Figure: negative and positive examples plotted along the feature space / predicted probability axis, with a vertical decision boundary]
– The red positives are class 1, the blue negatives are class 0
– Everything to the right of the decision boundary gets classified as positive, i.e. as class 1
Definitions
Note #1: Focusing only on binary classification today
Note #2: Positive and negative can be arbitrary labels, e.g. in binary species classification
– True Positive (TP): predicted 1, actually 1
– True Negative (TN): predicted 0, actually 0
– False Positive (FP): predicted 1, actually 0. Also known as Type 1 Error
– False Negative (FN): predicted 0, actually 1. Also known as Type 2 Error
Tip #1 for remembering: the N/P tells you the decision the classifier made, and the T/F tells you whether reality matches that decision. For example, for false positives, we know the model predicted class 1 because it says positive, and we know the true class is actually the opposite, i.e. 0, because it says false
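A quick sketch of these four counts in code, using made-up labels and predictions:

```python
import numpy as np

# Hypothetical predictions and labels; the values are illustrative only.
y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])

TP = np.sum((y_pred == 1) & (y_true == 1))  # predicted 1, actually 1
TN = np.sum((y_pred == 0) & (y_true == 0))  # predicted 0, actually 0
FP = np.sum((y_pred == 1) & (y_true == 0))  # predicted 1, actually 0 (Type 1 error)
FN = np.sum((y_pred == 0) & (y_true == 1))  # predicted 0, actually 1 (Type 2 error)

print(TP, TN, FP, FN)  # 3 3 1 1
```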
Confusion Matrix
Note #3: Other conventions have the matrix transposed, i.e. predicted classes as rows and actual classes as columns. The choice is somewhat arbitrary; just pay attention to the labels and you'll be fine
Adapted from (https://www.nbshare.io/notebook/626706996/Learn-And-Code-Confusion-Matrix-With-Python/)
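If scikit-learn is available, its confusion_matrix puts actual classes on the rows and predicted classes on the columns; as Note #3 says, just check the labels. A small sketch with the same hypothetical labels and predictions as before:

```python
from sklearn.metrics import confusion_matrix

# Same hypothetical labels/predictions as above.
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

# scikit-learn's convention: rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)
print(tn, fp, fn, tp)  # 3 1 1 3
```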
Metrics (1)
– Accuracy = (TP + TN)/N, where N = TP + TN + FP + FN, i.e. what proportion of our observations we classify correctly
– Precision = TP/(TP + FP), i.e. what proportion of the observations that we classify as 1 are actually 1. The denominator depends on the model's predictions
– Recall = TP/(TP + FN), i.e. what proportion of the 1s in the data we actually classify as 1. The denominator depends on the data
Tip #2 for remembering: Precision starts with P, so all terms in numerator and denominator have P in them
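A minimal sketch computing precision and recall from the hypothetical counts above (TP = 3, FP = 1, FN = 1); the helper functions and the guard against an empty denominator are illustrative additions:

```python
# Precision and recall from the counts above (guarding against empty denominators).
def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0

print(precision(tp=3, fp=1))  # 0.75: of everything we called positive, 75% really was
print(recall(tp=3, fn=1))     # 0.75: of all actual positives, we found 75%
```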
Metrics (2)
– Sensitivity = TP/(TP + FN), i.e. the True Positive Rate or TPR. Should look familiar (why?)
– Specificity = TN/(TN + FP), i.e. the True Negative Rate or TNR
Tip #3 for remembering:
– Sensitivity has N in it, but is actually True Positive Rate
– Specificity has P in it, but is actually True Negative Rate
– I joke to myself that whoever named Greenland and Iceland also named sensitivity and specificity
In both cases, the denominator depends on the data, not the model's predictions; in other words, it depends on the actual positives and actual negatives
Many more metrics exist that we will not cover today (F1 score, false discovery proportion, etc.)
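Continuing the same sketch, sensitivity and specificity computed from the same hypothetical counts (TP = 3, FN = 1, TN = 3, FP = 1):

```python
# Sensitivity (TPR) and specificity (TNR) from the same counts as before.
def sensitivity(tp, fn):          # identical to recall
    return tp / (tp + fn)

def specificity(tn, fp):
    return tn / (tn + fp)

print(sensitivity(tp=3, fn=1))    # 0.75
print(specificity(tn=3, fp=1))    # 0.75
```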
Motivating Example (2)
[Figure: negative and positive examples along the feature space / predicted probability axis, with a decision boundary, as in the earlier motivating example]
– Exercise #1: Compute number of TP, TN, FP, FN
– If time permits, compute precision, recall, sensitivity, specificity
Motivating Example (3)
Recall the figure from Lecture 18
If time permits, identify which regions correspond to TP, TN, FP, and FN
ROC Curves
– As stated before, a classifier is a combination of a model and a decision rule. In the case of probabilistic classifiers, we have a decision rule to convert probabilities into actual decisions.
– For each decision rule, i.e. each probability threshold, we can compute the TPR (sensitivity) and the FPR (1 – specificity)
– The line/curve joining all of these points is known as the Receiver Operating Characteristic (ROC) curve; a sketch of this threshold sweep follows below
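Here is a rough sketch of that threshold sweep, with made-up scores and labels; scikit-learn's roc_curve performs essentially the same computation if you would rather not write the loop yourself:

```python
import numpy as np

# Hypothetical predicted probabilities and labels.
probs  = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2])
y_true = np.array([1,   1,   0,   1,   0,    1,   0,   0])

# Sweep the decision threshold and record (FPR, TPR) at each one.
thresholds = np.r_[1.1, np.sort(probs)[::-1]]   # include a threshold above every score
tpr_list, fpr_list = [], []
for t in thresholds:
    y_pred = (probs >= t).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    tpr_list.append(tp / (tp + fn))
    fpr_list.append(fp / (fp + tn))

# Each (FPR, TPR) pair is one point on the ROC curve.
for f, t_, thr in zip(fpr_list, tpr_list, thresholds):
    print(f"threshold={thr:.2f}  FPR={f:.2f}  TPR={t_:.2f}")
```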
Visually Evaluating Classifiers
Source: https://upload.wikimedia.org/wikipedia/commons/3/36/Roc-draft-xkcd-style.svg
Quantitatively Evaluating Classifiers (1)
– We can summarise the ROC curve for a classifier by computing the Area Under the Curve (AUC)
– The higher the AUC, the better the model
– The AUC of a classifier is equivalent to the probability that the classifier will rank a randomly chosen positive sample higher than a randomly chosen negative sample
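A small sketch checking that ranking interpretation numerically; roc_auc_score is scikit-learn's AUC function, while the scores and labels are the same invented ones as in the ROC sketch:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Same hypothetical scores and labels as the ROC sketch.
probs  = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2])
y_true = np.array([1,   1,   0,   1,   0,    1,   0,   0])

auc = roc_auc_score(y_true, probs)

# Ranking interpretation: probability that a random positive scores above a random negative
# (ties would count as 1/2).
pos = probs[y_true == 1]
neg = probs[y_true == 0]
pairs = [(1.0 if p > n else 0.5 if p == n else 0.0) for p in pos for n in neg]
print(auc, np.mean(pairs))  # the two numbers agree (0.8125 here)
```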
Quantitatively Evaluating Classifiers (2)
Bonus (1): Precision Recall Curves
– ROC curves are insensitive to class imbalance
– We can use Precision-Recall curves instead, which are sensitive to class imbalance
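A hedged sketch using scikit-learn's precision_recall_curve on a synthetic, imbalanced data set; the class sizes and score distributions are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Hypothetical imbalanced example: few positives, many negatives.
rng = np.random.default_rng(0)
y_true = np.r_[np.ones(20), np.zeros(200)]
scores = np.r_[rng.normal(1.0, 1.0, 20), rng.normal(0.0, 1.0, 200)]

precision, recall, thresholds = precision_recall_curve(y_true, scores)
# Unlike the ROC curve, precision depends on how many negatives there are,
# so this curve degrades visibly as the class imbalance grows.
print(precision[:5], recall[:5])
```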
Bonus (2): Interview Question
How would you explain precision and recall to the CEO? You may use a confusion matrix in your answer if you wish, but are not required to do so