
Introduction to information system

Model Evaluation Metrics

Bowei Chen

School of Computer Science

University of Lincoln

CMP3036M/CMP9063M Data Science

MASH

• Maths

• And

• Stats

• Help

• MASH

• mash@lincoln.ac.uk

• In The Library


Today’s Objectives

• What Is A Model Evaluation Metric?
• Mean Absolute Error (MAE)
• Root Mean Squared Error (RMSE)
• Confusion Matrix
• Accuracy, Error Rate, True Positive Rate, False Positive Rate, Precision
• Receiver Operator Characteristic (ROC)
• Area Under ROC Curve (AUC)

Quick Recap on Model Selection

• Basic Setup of the Learning from Data

• Cross-Validation Methods

– Test Set Method

– Leave-One-Out Cross Validation

– 𝑘-Fold Cross Validation

• Appendix A: Testing-Based/Stepwise Procedures

– Backward Elimination

– Forward Selection

– Stepwise Selection

• Appendix B: Criterion-Based Procedures

– Mallows’ 𝐶𝑝

– AIC & BIC

– Adjusted 𝑅²

Basic Setup of the Learning from Data

Source: Y. Abu-Mostafa, M. Magdon-Ismail and H. Lin.

Learning from Data. AMLbook.com, 2012, Chapter 1

Test Set Method

1) Randomly choose 30% of the data to be in a test set
2) The remainder is a training set
3) Perform your regression on the training set
4) Estimate your future performance with the test set
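The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the toy data, the helper names `fit_line` and `holdout_split`, the fixed random seed, and the use of mean absolute error as the test-set score are all assumptions made for the example.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def holdout_split(n, test_fraction=0.3, seed=0):
    """Return (train_indices, test_indices) for a random split (hypothetical helper)."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(test_fraction * n)
    return indices[n_test:], indices[:n_test]

# Hypothetical toy data: y = 2x + 1 plus a small alternating disturbance.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(xs)]

# Steps 1 and 2: 30% of the records form the test set, the rest the training set.
train_idx, test_idx = holdout_split(len(xs))

# Step 3: perform the regression on the training set only.
intercept, slope = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])

# Step 4: estimate future performance on the held-out test set.
test_errors = [abs(intercept + slope * xs[i] - ys[i]) for i in test_idx]
test_mae = sum(test_errors) / len(test_errors)
print(len(train_idx), len(test_idx), round(test_mae, 3))
```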

LOOCV (Leave-One-Out Cross Validation)

For $k = 1$ to $n$:

1) Let $(x_k, y_k)$ be the $k$th record
2) Temporarily remove $(x_k, y_k)$ from the dataset
3) Train on the remaining $n - 1$ data points
4) Note your error on $(x_k, y_k)$

When you’ve done all points, report the mean error.
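The LOOCV loop can be written almost line for line in Python. The sketch below assumes a single-predictor least-squares line as the model and absolute error as the per-record error; both are choices made for illustration, and `fit_line` and `loocv_mean_error` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def loocv_mean_error(xs, ys):
    """Mean absolute error over n fits, each leaving out one record."""
    n = len(xs)
    errors = []
    for k in range(n):
        # Temporarily remove the k-th record from the dataset.
        train_x = xs[:k] + xs[k + 1:]
        train_y = ys[:k] + ys[k + 1:]
        # Train on the remaining n - 1 data points.
        intercept, slope = fit_line(train_x, train_y)
        # Note the error on the record that was left out.
        errors.append(abs(intercept + slope * xs[k] - ys[k]))
    return sum(errors) / n

# Hypothetical data lying exactly on y = 2x + 1, so every held-out error is zero.
print(loocv_mean_error([0, 1, 2, 3], [1, 3, 5, 7]))
```

Note that the model is refitted $n$ times, which is why LOOCV becomes expensive on large datasets.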

𝑘-Fold Cross Validation

Break the dataset into 𝑘 partitions randomly. In
this example, we’ll have 𝑘 = 3 partitions
colored blue, green and violet.

For the blue partition, train on all the points not
in the blue partition. Find the test-set sum of
errors on the blue points.

For the green partition, train on all the points
not in the green partition. Find the test-set sum
of errors on the green points.

For the violet partition, train on all the points
not in the violet partition. Find the test-set sum
of errors on the violet points.

Then report the mean error
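A minimal sketch of the same procedure in Python follows. The model (a least-squares line), the absolute-error measure, the random seed, and the helper names are assumptions for illustration; the three partitions play the role of the blue, green and violet points.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def k_fold_mean_error(xs, ys, k=3, seed=0):
    """Mean absolute error over all points, each predicted by a model
    trained without that point's partition."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k partitions of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    total_error = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        intercept, slope = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Test-set sum of errors on the held-out partition.
        total_error += sum(abs(intercept + slope * xs[i] - ys[i]) for i in fold)
    return total_error / len(xs)

# Hypothetical data lying exactly on y = 2x + 1.
xs = [float(i) for i in range(9)]
ys = [2.0 * x + 1.0 for x in xs]
print(k_fold_mean_error(xs, ys, k=3))
```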

What Is A Model Evaluation Metric?

A model evaluation metric (or performance metric) measures how well your

data mining or machine learning algorithm is performing on a given dataset.

Example:

If we apply a classification algorithm on a dataset, we first check to see how

many of the data points were classified correctly. This is a performance metric

and the formal name for it is "accuracy."

MAE

The mean absolute error (MAE) metric is given by

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |\varepsilon_i| = \frac{1}{n} \sum_{i=1}^{n} |\hat{y}_i - y_i|$$

where

• $n$ is the total number of observations
• $\hat{y}_i$ is the predicted value of the $i$th observation
• $y_i$ is the actual value of the $i$th observation
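As a quick check of the definition, MAE can be computed directly from paired predictions and actual values. The function name and the four data points below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def mean_absolute_error(predicted, actual):
    """MAE: the average of |predicted_i - actual_i| over all observations."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical predictions and actual values.
predicted = [2.5, 0.0, 2.0, 8.0]
actual = [3.0, -0.5, 2.0, 7.0]

# Absolute errors are 0.5, 0.5, 0.0 and 1.0, so MAE = 2.0 / 4 = 0.5.
print(mean_absolute_error(predicted, actual))
```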

Test Set Method Using MAE

1) Randomly choose 30% of the data to be in a test set
2) The remainder is a training set
3) Perform your regression on the training set
4) Estimate your future performance with the test set

Example with three test points whose errors have absolute values 3, 7 and 1:

$$\mathrm{MAE} = \frac{1}{3}(3 + 7 + 1) = \frac{11}{3} \approx 3.67$$

RMSE

The root mean squared error (RMSE) metric is given by

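$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \varepsilon_i^2} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2}$$

where

• $n$ is the total number of observations
• $\hat{y}_i$ is the predicted value of the $i$th observation
• $y_i$ is the actual value of the $i$th observation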
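The same kind of check works for RMSE: square each error, average, then take the square root. The function name and data are hypothetical and only illustrate the formula.

```python
import math

def root_mean_squared_error(predicted, actual):
    """RMSE: the square root of the average squared error."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical predictions and actual values.
predicted = [2.5, 0.0, 2.0, 8.0]
actual = [3.0, -0.5, 2.0, 7.0]

# Squared errors are 0.25, 0.25, 0.0 and 1.0, so RMSE = sqrt(1.5 / 4).
print(round(root_mean_squared_error(predicted, actual), 4))
```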

Test Set Method Using RMSE

1) Randomly choose 30% of the data to be in a test set
2) The remainder is a training set
3) Perform your regression on the training set
4) Estimate your future performance with the test set

Example with three test points whose errors have absolute values 3, 7 and 1:

$$\mathrm{RMSE} = \sqrt{\frac{1}{3}(3^2 + 7^2 + 1^2)} = \sqrt{\frac{59}{3}} \approx 4.43$$

MAE vs RMSE

Similarity

Both measures express average model prediction error in units of the variable of interest. They range from 0 to ∞ and are indifferent to the direction of errors. Lower values indicate better model performance.

Difference

Taking the square root of the average squared errors has some interesting implications for RMSE. Since the errors are squared before they are averaged, the RMSE gives a relatively high weight to large errors. This means the RMSE should be more useful when large errors are particularly undesirable.

Editor of Human in a Machine World. MAE and RMSE — Which Metric is Better?

https://medium.com/human-in-a-machine-world/mae-and-rmse-which-metric-is-better-e60ac3bde13d#.5utapadgw
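The difference is easiest to see on two sets of errors with the same total. In the hypothetical example below, both sets have the same MAE, but the set that concentrates the error in one large miss has twice the RMSE. The error values are made up for illustration.

```python
import math

def mae(errors):
    """Mean of the absolute errors."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Square root of the mean of the squared errors."""
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))

# Hypothetical error sets with the same total absolute error (8).
even_errors = [2, -2, 2, -2]   # four moderate misses
one_big_error = [0, 0, 0, 8]   # three perfect predictions, one large miss

print(mae(even_errors), rmse(even_errors))
print(mae(one_big_error), rmse(one_big_error))
```

The signs of the errors do not matter to either metric, which matches the point that both are indifferent to the direction of errors.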



Classification with Two Classes

Housing dataset (Price is the predictor, Fullbase is the response variable)

      Price   Fullbase
 1    420     1
 2    385     0
 3    495     0
 4    605     0
 5    610     0
 6    660     1
 7    660     1
 8    690     0
 9    838     1
10    885     0
 …    …       …

Distributions of Two Classes

[Figure: number of houses plotted against 𝑃 for two classes, with Fullbase and without Fullbase.]

Threshold

[Figure: the two class distributions over 𝑃 with a threshold. Houses on one side of the threshold are called "negative" and houses on the other side are called "positive".]

True Positive (TP)

[Figure: the true positive (TP) houses, shown on the two class distributions over 𝑃 with the threshold.]

False Positive (FP)

[Figure: the false positive (FP) houses, shown on the two class distributions over 𝑃 with the threshold.]

True Negative (TN)

[Figure: the true negative (TN) houses, shown on the two class distributions over 𝑃 with the threshold.]

False Negative (FN)

[Figure: the false negative (FN) houses, shown on the two class distributions over 𝑃 with the threshold.]

2 × 2 Confusion Matrix for Two-Class Problems

              Actual 1         Actual 0         ∑
Estimate 1    TP               FP               N̂+ = TP + FP
Estimate 0    FN               TN               N̂− = FN + TN
∑             N+ = TP + FN     N− = FP + TN     N = TP + FP + TN + FN

Kevin Murphy. Machine Learning: A Probabilistic Perspective, p. 183

• N+ is the true number of positives
• N− is the true number of negatives
• N̂+ is the estimated number of positives
• N̂− is the estimated number of negatives
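The four cells can be counted directly from paired actual and estimated labels. In the sketch below, the actual labels are the Fullbase values of the ten houses listed earlier; the rule "estimate 1 when Price is at least 650" is a hypothetical threshold chosen only for illustration, and `confusion_counts` is an illustrative helper name.

```python
def confusion_counts(actual, estimated):
    """Return (TP, FP, FN, TN) for binary labels coded as 1 and 0."""
    tp = fp = fn = tn = 0
    for a, e in zip(actual, estimated):
        if e == 1 and a == 1:
            tp += 1   # estimated positive, actually positive
        elif e == 1 and a == 0:
            fp += 1   # estimated positive, actually negative
        elif e == 0 and a == 1:
            fn += 1   # estimated negative, actually positive
        else:
            tn += 1   # estimated negative, actually negative
    return tp, fp, fn, tn

# First ten rows of the housing dataset shown earlier.
price = [420, 385, 495, 605, 610, 660, 660, 690, 838, 885]
fullbase = [1, 0, 0, 0, 0, 1, 1, 0, 1, 0]

# Hypothetical classifier: call a house "positive" when Price >= 650.
estimated = [1 if p >= 650 else 0 for p in price]

tp, fp, fn, tn = confusion_counts(fullbase, estimated)
print(tp, fp, fn, tn)
```

The row and column sums then follow from the counts: `tp + fp` is the estimated number of positives and `tp + fn` is the true number of positives.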

Accuracy

Overall, how often is the classifier correct?

$$\mathrm{Accuracy} = \frac{TP + TN}{N} = \frac{100 + 50}{165} \approx 0.91$$

              Actual 1               Actual 0              ∑
Estimate 1    TP = 100               FP = 10               N̂+ = TP + FP = 110
Estimate 0    FN = 5                 TN = 50               N̂− = FN + TN = 55
∑             N+ = TP + FN = 105     N− = FP + TN = 60     N = 165

Error Rate

Overall, how often is the classifier incorrect?

$$1 - \mathrm{Accuracy} = 1 - \frac{TP + TN}{N} = \frac{FP + FN}{N} = \frac{10 + 5}{165} \approx 0.09$$


True Positive Rate/Recall

When it’s actually yes, how often does it predict yes?

$$\mathrm{TPR} = \frac{TP}{TP + FN} = \frac{100}{105} \approx 0.95$$

TPR is also known as recall.


False Positive Rate

When it’s actually no, how often does it predict yes?

$$\mathrm{FPR} = \frac{FP}{FP + TN} = \frac{10}{60} \approx 0.17$$


Precision

When it predicts yes, how often is it correct?

$$\mathrm{Precision} = \frac{TP}{TP + FP} = \frac{100}{110} \approx 0.91$$

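The five metrics above can be reproduced from the four counts of the example matrix (TP = 100, FP = 10, FN = 5, TN = 50). The function name and dictionary keys in this sketch are illustrative choices, not part of any particular library.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the slide metrics from the four confusion-matrix counts."""
    n = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / n,     # how often the classifier is correct
        "error_rate": (fp + fn) / n,   # 1 - accuracy
        "tpr": tp / (tp + fn),         # true positive rate, also called recall
        "fpr": fp / (fp + tn),         # false positive rate
        "precision": tp / (tp + fp),   # correct share of predicted positives
    }

# Counts taken from the worked example in the slides.
metrics = classification_metrics(tp=100, fp=10, fn=5, tn=50)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```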

ROC Graph/Curve/Space

• ROC graphs are two-dimensional graphs in which TPR is plotted on the Y axis and FPR is plotted on the X axis.
• An ROC graph depicts relative trade-offs between benefits (true positives) and costs (false positives).
• The figure shows an ROC graph with five classifiers/models labelled A through E.

ROC Graph/Curve/Space

• The lower left point (0, 0) represents the strategy of never issuing a positive classification. Such a classifier commits no false positive errors but also gains no true positives.
• The upper right corner (1, 1) represents the opposite strategy, of unconditionally issuing positive classifications.
• Point (0, 1) represents perfect classification. D’s performance is perfect as shown.
• One point in ROC space is better than another if it lies to the northwest of the other (higher TPR, lower FPR, or both).

Best ROC Curve

[Figure: ROC curve with TPR on the Y axis and FPR on the X axis, both running from 0% to 100%, shown alongside the two class distributions over 𝑃 with the threshold.]

TPR = 1, FPR = 0

The distributions don’t overlap at all.

Worse ROC Curve

[Figure: ROC curve with TPR on the Y axis and FPR on the X axis, both running from 0% to 100%, shown alongside the two class distributions over 𝑃 with the threshold.]

FPR = TPR

The distributions overlap completely.

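Plotting A ROC Curve

Threshold   TPR (%)   FPR (%)
0.25        99        50
0.3         97        39
0.4         83        20
0.6         60        10
0.8         40        5
0.9         20        2

[Figure: ROC curve through the points for thresholds 0.25, 0.3, 0.4, 0.6, 0.8 and 0.9, with TPR on the Y axis and FPR on the X axis.]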
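Each row of such a table comes from fixing one threshold and recomputing TPR and FPR. The sketch below shows that step on a small made-up dataset; the scores, labels, thresholds and the helper name `roc_point` are assumptions for illustration, and the rule "score at or above the threshold means positive" is one common convention.

```python
def roc_point(scores, labels, threshold):
    """Return (FPR, TPR) when every score >= threshold is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    positives = sum(labels)
    negatives = len(labels) - positives
    return fp / negatives, tp / positives

# Hypothetical classifier scores and true labels (1 = positive, 0 = negative).
scores = [0.95, 0.85, 0.70, 0.65, 0.50, 0.35, 0.30, 0.10]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

# Raising the threshold lowers both TPR and FPR, as in the table above.
for threshold in [0.2, 0.4, 0.6, 0.9]:
    fpr, tpr = roc_point(scores, labels, threshold)
    print(f"threshold={threshold:.1f}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```

Plotting the resulting (FPR, TPR) pairs, together with the end points (0, 0) and (1, 1), gives the ROC curve.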

AUC

Area under the ROC curve (AUC) has an important statistical property:

• The AUC of a model is equivalent to the probability that the classifier will rank a randomly chosen positive instance higher than a randomly chosen negative instance.
• AUC is often used to compare classifiers: the bigger the AUC, the better.
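The ranking interpretation can be checked numerically. The sketch below computes AUC in two ways on a small made-up dataset: as the fraction of positive/negative pairs ranked correctly, and as the trapezoidal area under the ROC points. The data and function names are hypothetical, and `roc_curve` assumes there are no tied scores.

```python
def auc_by_ranking(scores, labels):
    """Fraction of (positive, negative) pairs where the positive scores higher.
    Tied pairs count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def roc_curve(scores, labels):
    """(FPR, TPR) points from lowering the threshold one record at a time.
    Assumes no tied scores."""
    positives = sum(labels)
    negatives = len(labels) - positives
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, y in sorted(zip(scores, labels), reverse=True):
        if y == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points

def auc_by_trapezoid(points):
    """Area under the piecewise-linear curve through points sorted by FPR."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical classifier scores and true labels (1 = positive, 0 = negative).
scores = [0.95, 0.85, 0.70, 0.65, 0.50, 0.35, 0.30, 0.10]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

print(auc_by_ranking(scores, labels))
print(auc_by_trapezoid(roc_curve(scores, labels)))
```

Both calls give the same value on this data: 14 of the 16 positive/negative pairs are ordered correctly.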

[Figure: four ROC curves, each with TPR on the Y axis and FPR on the X axis from 0% to 100%, with AUC = 50%, AUC = 90%, AUC = 65% and AUC = 100%.]

Summary

• What Is A Model Evaluation Metric?
• Mean Absolute Error (MAE)
• Root Mean Squared Error (RMSE)
• Confusion Matrix
• Accuracy, Error Rate, True Positive Rate, False Positive Rate, Precision
• Receiver Operator Characteristic (ROC)
• Area Under ROC Curve (AUC)

References

• Editor of Human in a Machine World. MAE and RMSE: Which Metric is Better? https://medium.com/human-in-a-machine-world/mae-and-rmse-which-metric-is-better-e60ac3bde13d#.5utapadgw
• David Page. Evaluating Machine Learning Methods. University of Wisconsin-Madison Lecture Slides, 2016
• Kevin Murphy. Machine Learning: A Probabilistic Perspective. Chapters 5, 6, 7 & 8
• Christopher Manning, Prabhakar Raghavan and Hinrich Schütze. An Introduction to Information Retrieval. Chapter 1
• Jesse Davis and Mark Goadrich. The Relationship Between Precision-Recall and ROC Curves. In ICML, 2006

Thank You!

bchen@Lincoln.ac.uk
