
Classification Error

Test and Training Error
I As with numeric responses, we are interested in making accurate decisions in terms of assigning labels.


I We have argued that the intuitive Maximum A Posteriori (MAP) decision rule is optimal, i.e. it minimizes the overall error rate:
$\hat{Y} = \arg\max_{y \in \mathcal{C}} \hat{P}(Y = y \mid X)$
I However, as with regression, we need to worry about overfitting the training data.
I As opposed to regression, we can better distinguish the types of errors we make.

Two Types of Errors
Important: the MAP decision rule minimizes the overall error rate. But this may come at the expense of a high (low) type 1 versus a low (high) type 2 error rate! (Type 1 errors are false positives; type 2 errors are false negatives.)

Precision, Recall and Accuracy
I Precision = $\frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$
I Recall = $\frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$
Also called "sensitivity"
I Specificity = $\frac{\text{True Negatives}}{\text{True Negatives} + \text{False Positives}}$
I Accuracy = $\frac{\text{True Positives} + \text{True Negatives}}{\text{All Cases}}$
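To make these definitions concrete, here is a minimal R sketch (the helper name is illustrative, not from the lecture code) that computes all four metrics from the cells of a confusion matrix, abbreviating the counts as TP, FP, FN and TN:

# Illustrative helper: all four metrics from confusion-matrix cells
classification_metrics <- function(TP, FP, FN, TN) {
  c(precision   = TP / (TP + FP),
    recall      = TP / (TP + FN),   # a.k.a. sensitivity
    specificity = TN / (TN + FP),
    accuracy    = (TP + TN) / (TP + FP + FN + TN))
}
# Example: the counts from the c-bar = 1/2 confusion matrix further below
classification_metrics(TP = 4, FP = 5, FN = 132, TN = 859)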

For Reference: Many synonyms.
Figure: Synonyms for Type 1 and Type 2 errors, taken from Hastie et al., 2014.

MAP Decision Rule and Error Types
I Suppose you were to screen only the bad loan applications intensively, i.e. those whose predicted probability of default is above a threshold c̄.
I The MAP decision rule says that the overall error rate is minimized if you set c̄ = 1/2.
I Let's see how we would do in this case.

Classification and MAP Rule: How are we doing?
The MAP decision rule says that the overall error rate is minimized if you set c̄ = 1/2.
> Defaulted=as.character(MORTGAGE[test]$Defaulted)
> glm.probs=predict(glm.fit,MORTGAGE[test],type="response")
> glm.pred=rep("Grant loan",length(glm.probs))
> glm.pred[glm.probs>.5]="Decline loan"
> addmargins(table(glm.pred,Defaulted))
              Defaulted
glm.pred       Default Non default  Sum
  Decline loan       4           5    9
  Grant loan       132         859  991
  Sum              136         864 1000
How do we do?
I Accuracy = (4+859)/1000 = 86.3%, so almost 86% correctly classified
I Precision = 4/(4+5) = 44%
I Recall = 4/(4+132) = 3%
I Specificity = 859/864 = 99%
We get really high specificity because most mortgages do not default!

Not following MAP Rule…
Suppose you set c̄ = 0.2, out of our test sample of 1000 loan applications…
> Defaulted=as.character(MORTGAGE[test]$Defaulted)
> glm.probs=predict(glm.fit,MORTGAGE[test],type="response")
> glm.pred=rep("Grant loan",length(glm.probs))
> glm.pred[glm.probs>.2]="Decline loan"
> addmargins(table(glm.pred,Defaulted))
              Defaulted
glm.pred       Default Non default  Sum
  Decline loan      32         142  174
  Grant loan       104         722  826
  Sum              136         864 1000
How do we do?
I Accuracy = (32+722)/1000 = 75.4%, so about 75% correctly classified
I Precision = 32/(32+142) = 18%
I Recall = 32/(32+104) = 24%
I Specificity = 722/864 ≈ 84%
We are trading off true positives against true negatives.

Not following MAP Rule…
Suppose you set c̄ = 0.1, out of our test sample of 1000 loan applications…
> Defaulted=as.character(MORTGAGE[test]$Defaulted)
> glm.probs=predict(glm.fit,MORTGAGE[test],type="response")
> glm.pred=rep("Grant loan",length(glm.probs))
> glm.pred[glm.probs>.1]="Decline loan"
> addmargins(table(glm.pred,Defaulted))
              Defaulted
glm.pred       Default Non default  Sum
  Decline loan      97         515  612
  Grant loan        39         349  388
  Sum              136         864 1000
How do we do?
I Accuracy = (97+349)/1000 = 44.6%, so almost 45% correctly classified
I Precision = 97/(97+515) = 16%
I Recall = 97/(97+39) = 71%
I Specificity = 349/864 = 40%
We are trading off true positives against true negatives.

Visualizing Accuracy, Recall and Specificity
Figure: Accuracy, Recall (Sensitivity) and Specificity, each plotted against the cutoff c̄.
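Curves like these can be traced along the following lines; a minimal sketch, assuming glm.probs and Defaulted as constructed in the code above, with the labels "Default"/"Non default" and "Decline loan"/"Grant loan" as in the tables:

# Sweep the cutoff c-bar and record accuracy, sensitivity and specificity
cutoffs <- seq(0.01, 0.70, by = 0.01)
metrics <- sapply(cutoffs, function(c.bar) {
  glm.pred <- ifelse(glm.probs > c.bar, "Decline loan", "Grant loan")
  TP <- sum(glm.pred == "Decline loan" & Defaulted == "Default")
  FP <- sum(glm.pred == "Decline loan" & Defaulted == "Non default")
  FN <- sum(glm.pred == "Grant loan"   & Defaulted == "Default")
  TN <- sum(glm.pred == "Grant loan"   & Defaulted == "Non default")
  c(accuracy    = (TP + TN) / length(glm.pred),
    sensitivity = TP / (TP + FN),
    specificity = TN / (TN + FP))
})
# One line per metric as a function of the cutoff
matplot(cutoffs, t(metrics), type = "l", lty = 1, col = 1:3,
        xlab = "cutoff", ylab = "rate")
legend("right", legend = rownames(metrics), lty = 1, col = 1:3)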

Alternative costs
> glm.pred[glm.probs>.5]="Decline loan"
> addmargins(table(glm.pred,Defaulted))
Suppose our costs for false positives versus false negatives are

                                  Actual outcome $y_i$
  Predicted outcome $\hat{y}_i$   Default      Non-default
  Decline loan                    $c_{1,1}$    $c_{1,2}$
  Grant loan                      $c_{2,1}$    $c_{2,2}$
What are the expected misclassification costs?
$C(y_i \mid \hat{y}_i = \text{Decline loan}) = c_{1,1}\frac{TP}{TP+FP} + c_{1,2}\frac{FP}{TP+FP}$

$C(y_i \mid \hat{y}_i = \text{Grant loan}) = c_{2,1}\frac{FN}{FN+TN} + c_{2,2}\frac{TN}{FN+TN}$

Alternative costs
What are the expected misclassification costs?
$C = p(\hat{y}_i = \text{Decline}) \times C(y_i \mid \hat{y}_i = \text{Decline}) + p(\hat{y}_i = \text{Grant}) \times C(y_i \mid \hat{y}_i = \text{Grant})$
Suppose $c_{1,1} = c_{2,2} = 0$. We can then derive conditions on $c_{1,2}$ and $c_{2,1}$ that give us the lowest expected misclassification cost. Note that the FP/FN counts are a function of the underlying cutoff c̄ – so there is a different cost function for every different value of c̄.
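A minimal sketch of this search, again assuming glm.probs and Defaulted from above, and with purely illustrative costs $c_{1,2} = 1$ and $c_{2,1} = 5$ (granting a loan that defaults is taken to be five times as costly as declining a good one):

# Expected misclassification cost as a function of the cutoff c-bar,
# with c11 = c22 = 0 and illustrative costs c12 = 1, c21 = 5
expected.cost <- function(c.bar, c12 = 1, c21 = 5) {
  glm.pred <- ifelse(glm.probs > c.bar, "Decline loan", "Grant loan")
  FP <- sum(glm.pred == "Decline loan" & Defaulted == "Non default")
  FN <- sum(glm.pred == "Grant loan"   & Defaulted == "Default")
  # with zero cost for correct decisions, the expected cost reduces to
  (c12 * FP + c21 * FN) / length(glm.pred)
}
cutoffs <- seq(0.01, 0.90, by = 0.01)
costs <- sapply(cutoffs, expected.cost)
cutoffs[which.min(costs)]  # cost-minimizing cutoff; need not be 1/2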

Trade-off between Sensitivity and Specificity
I As we decrease c̄, there is a trade-off between the type 1 and type 2 errors that occur.
I For very low c̄, a lot of loans are declined, resulting in many false positives (loans that would not have defaulted being declined) but relatively few false negatives – high recall and low specificity.
I As we increase c̄, fewer loans are declined; this reduces the false positive cases but increases the false negative cases (we fail to decline loans that would default).
I Overall, since most loans do not default, the increase in accuracy as we increase c̄ is driven by specificity.

ROC Curve to Visualize the Trade-Off between Sensitivity and Specificity
I The ROC curve is a popular graphic for simultaneously displaying the two types of errors for all possible thresholds.
I It comes from communications theory and stands for "receiver operating characteristics".
I It plots Sensitivity against Specificity as we vary c̄ (without showing c̄ itself).
I The ideal ROC curve hugs the top left corner (100% specificity and 100% sensitivity).
I The 45 degree line is the classifier that assigns observations to classes in a random fashion (e.g. a coin toss).
I The overall performance of a classifier, summarized over all possible thresholds, is given by the area under the (ROC) curve (AUC); see the sketch below.
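One way to draw the curve and compute the AUC in R is the pROC package; a minimal sketch (not necessarily what generated the figure below), again assuming glm.probs and Defaulted from above:

# install.packages("pROC")  # if not yet installed
library(pROC)
roc.obj <- roc(response = Defaulted, predictor = glm.probs,
               levels = c("Non default", "Default"))  # controls, cases
plot(roc.obj)  # sensitivity against specificity over all cutoffs c-bar
auc(roc.obj)   # 0.5 = random coin toss, 1 = ideal classifier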

ROC Curve in our example
Figure: ROC curve for our mortgage default classifier, plotting Sensitivity against Specificity.

Choice of Decision Rule depends on cost
I The aim of this exercise was to highlight that the choice of decision criterion, i.e. the cutoff c̄, need not be dictated by the MAP decision rule, which guarantees the best overall prediction performance on average.
I The question really is what the associated costs are of the different types of misclassification (false positives versus false negatives).
I The optimal choice of c̄ would take these potentially different costs into account and may give solutions that are far away from 1/2.
