COMP20008 Elements of Data Processing

Experimental design for supervised learning — Introduction
School of Computing and Information Systems
@University of Melbourne 2022


Supervised vs Unsupervised Learning
Supervised
• Classification
• Regression
Unsupervised
• Clustering
• Association (Recommendation)
• Dimensionality reduction: Feature selection & feature projection
Others: Reinforcement Learning, Transfer learning, etc.

Experimental Design (supervised)
• Evaluation methods
• Performance metrics
• Feature selection

Evaluation methods for supervised learning

The generalisation challenge
(Figure: a model that generalises well compared with a model that overfits the data, i.e. one with low bias error but high variance error.)

The generalisation challenge – cont.
When a model learns too much from the training data (overfitting), it has:
– Low bias error
  • Predicts well on the training data.
– But high variance error
  • Predictions change widely given different training data.
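To make low bias / high variance concrete, here is a minimal sketch on hypothetical data (a noisy sine curve; the data, sizes and polynomial degrees are illustrative assumptions, not from the slides). A degree-11 polynomial fitted to 12 training points can pass through every point, so its training error is near zero while its error on fresh data from the same distribution is much larger.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)


def make_data(n):
    # Hypothetical data: a sine curve plus Gaussian noise.
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)
    return x, y


def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))


x_train, y_train = make_data(12)
x_test, y_test = make_data(200)

results = {}
for degree in (3, 11):
    # Degree 11 with 12 training points can interpolate the training data exactly.
    model = Polynomial.fit(x_train, y_train, deg=degree)
    train_err = mse(y_train, model(x_train))
    test_err = mse(y_test, model(x_test))
    results[degree] = (train_err, test_err)
    print(f"degree={degree:2d}  train MSE={train_err:.4f}  test MSE={test_err:.4f}")
```

The gap between training and test error for the flexible model is the symptom of overfitting described above.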

Evaluation method – training and test sets
How do we know whether our model will do well on unseen data?
• We train the model on a set of data: the training set.
• We evaluate the model on a new set of data: the test set.
• Assumptions:
  • The training and test sets come from the same distribution.
  • Samples are drawn at random, independent and identically distributed (i.i.d.).
• Only one dataset? Partition it into a training set and a test set.
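A minimal sketch of partitioning one dataset into training and test sets, assuming scikit-learn is available; the iris dataset and the 25% test fraction are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 25% of the observations as the test set.
# random_state fixes the random partition so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
print(len(X_train), len(X_test))
```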

Training – validation – test split
• A separate set for model selection, the validation set, keeps the test set unseen and so prevents data leakage.
(Figure: the data split into a training set, a validation set, and a test (hold-out) set.)
• Repeat to select the best model (or hyperparameters):
  • Train the model on the training set.
  • Evaluate it on the validation set.
• Select the model that performs best on the validation set.
• Use the training + validation sets to train the selected model.
• Report performance on the independent test set (hold-out set).
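The procedure above can be sketched as follows, assuming scikit-learn; the dataset, the split sizes and the candidate values of the kNN hyper-parameter `n_neighbors` are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hold out the test set first, then split the remainder into training and validation sets.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

# Model selection: train on the training set, score on the validation set.
candidate_ks = [1, 3, 5, 7, 9]
val_scores = {}
for k in candidate_ks:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    val_scores[k] = model.score(X_val, y_val)
best_k = max(val_scores, key=val_scores.get)

# Retrain the selected model on training + validation data; use the test set only once.
final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X_rest, y_rest)
test_accuracy = final_model.score(X_test, y_test)
print(best_k, test_accuracy)
```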

Evaluation method – Cross validation
• Statistical uncertainty: a problem when the dataset is small.
• With a single training/test split, we cannot be confident in the performance estimate.
• Cross validation uses all the data to estimate model performance.

K-fold cross validation
Partition the training data randomly into k blocks.
Repeat to select the best hyper-parameters or model:
– Repeat k times:
  • k − 1 blocks for training
  • 1 block for evaluation
– Average the k scores
(Figure: the training set divided into k blocks; each block takes one turn as the test (validation) block.)
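A minimal sketch of the k-fold loop, assuming scikit-learn; the dataset, k = 5 and the decision tree are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
k = 5
# Random partition into k blocks (shuffle, since iris is sorted by class).
kfold = KFold(n_splits=k, shuffle=True, random_state=0)

scores = []
for train_idx, val_idx in kfold.split(X):
    # k - 1 blocks for training, 1 block for evaluation.
    model = DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[val_idx], y[val_idx]))

print(f"mean accuracy over {k} folds: {np.mean(scores):.3f}")
```

`sklearn.model_selection.cross_val_score(model, X, y, cv=kfold)` runs the same loop in one call.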

Leave one out cross validation
Leave one out cross validation (LOOCV)
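• n-fold cross validation (k = n, the number of observations)
• The validation set is exactly one observation.
• More expensive to run: the model is trained n times.
(Figure: each single observation of the training set takes one turn as the test/validation set; the rest are used for training.)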
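A minimal LOOCV sketch, assuming scikit-learn's `LeaveOneOut` splitter; dataset and classifier are illustrative. Each of the n scores is the accuracy on a single observation, so it is either 0 or 1.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# One fold per observation: the model is fitted n times, each time on n - 1 points.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=LeaveOneOut())
print(len(scores), scores.mean())
```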

Repeated k-fold cross validation
• Repeat k-fold cross validation r times, for example 5 times, to reduce the dependence of the estimate on one particular random partition.
Repeat r times:
  i. Partition the dataset randomly into k blocks.
  ii. Repeat k times:
    • k − 1 blocks for training and 1 block for testing
    • Record 1 performance score
Average the k × r scores.
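A minimal sketch with scikit-learn's `RepeatedKFold`, which draws a new random partition for each repeat; k = 5 and r = 3 are illustrative values. The result holds k × r scores.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import RepeatedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
k, r = 5, 3

# r repeats of k-fold CV, each with a fresh random partition into k blocks.
cv = RepeatedKFold(n_splits=k, n_repeats=r, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv)
print(len(scores), scores.mean())
```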

Evaluation method – Bootstrap validation
• Bootstrap: named after the image of a man pulling himself up and over a fence by pulling upwards on his own bootstraps.
• Relies on resamples of the observed data itself in order to draw conclusions about the population.
• Bootstrap sample: each sample is drawn randomly from the original dataset with replacement (typically of the same size as the original dataset).
• Out-of-bag (OOB) data: the remaining data points NOT in the bootstrap sample; these are the test (validation) data.
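A minimal sketch of one bootstrap sample and its out-of-bag data, using NumPy indices on a hypothetical dataset of n = 1000 rows. Because sampling is with replacement, some indices repeat and roughly (1 − 1/n)^n ≈ 1/e ≈ 37% of the rows end up out of bag.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000  # hypothetical dataset size

# Bootstrap sample: n row indices drawn uniformly at random WITH replacement.
boot_idx = rng.integers(0, n, size=n)
# Out-of-bag (OOB) data: rows never drawn into the bootstrap sample.
oob_idx = np.setdiff1d(np.arange(n), boot_idx)

print(f"distinct rows in bootstrap sample: {len(np.unique(boot_idx))}, OOB rows: {len(oob_idx)}")
```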

Bootstrap validation (bootstrapping) – cont.
Draw b bootstrap samples (from the training data).
Repeat to select the best hyper-parameters or model:
– Repeat b times, for each bootstrap sample:
  • Train the model on the bootstrap sample.
  • Evaluate the performance on the OOB data (test/validation data).
– Report the mean and standard deviation of the b performance scores.
• Can handle imbalanced datasets: use a stratified bootstrap, where sampling is done separately within each class (biased sampling) so that every bootstrap sample keeps the class proportions.
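A minimal sketch of bootstrap validation, assuming NumPy and scikit-learn; the dataset, the classifier and b = 20 are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
b = 20  # number of bootstrap samples (illustrative)
n = len(X)

scores = []
for _ in range(b):
    boot = rng.integers(0, n, size=n)        # bootstrap sample, drawn with replacement
    oob = np.setdiff1d(np.arange(n), boot)   # out-of-bag rows used for evaluation
    model = DecisionTreeClassifier(random_state=0).fit(X[boot], y[boot])
    scores.append(model.score(X[oob], y[oob]))

print(f"OOB accuracy: mean={np.mean(scores):.3f}, sd={np.std(scores):.3f}")
```

For the stratified variant, one would draw the indices separately within each class; that is not shown here.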

The overall flow
How do we know whether a decision tree is better than kNN for a given dataset?
1. Model evaluation and selection, using one of the following:
a. Hold-out: training – validation splits
b. Cross validation: k-fold, leave-one-out, repeated k-fold CV
c. Bootstrapping
After step 1, you have selected the final algorithm and hyper-parameters:
2. Fit the selected model and hyper-parameters on the entire training set.
3. Report performance on the independent test set.
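A minimal sketch of the three steps for the decision tree vs kNN question, assuming scikit-learn and using 5-fold cross validation for step 1; the dataset and hyper-parameters are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

candidates = {
    "decision tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "kNN (k=5)": KNeighborsClassifier(n_neighbors=5),
}

# Step 1: model selection with 5-fold cross validation on the training set only.
cv_means = {name: cross_val_score(m, X_train, y_train, cv=5).mean()
            for name, m in candidates.items()}
best_name = max(cv_means, key=cv_means.get)

# Step 2: fit the selected model on the entire training set.
final_model = candidates[best_name].fit(X_train, y_train)

# Step 3: report performance on the independent test set.
print(best_name, final_model.score(X_test, y_test))
```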

Performance metrics
Regression and Classification

Classification metrics

Confusion Matrix
The outcomes of the classification can be summarised in a confusion matrix (contingency table):
• Actual class: {yes, no, yes, yes, …}
• Predicted class: {no, yes, yes, no, …}

                         PREDICTED CLASS
                         yes      no
ACTUAL CLASS    yes      TP       FN
                no       FP       TN

TP: true positive, FN: false negative, FP: false positive, TN: true negative
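A minimal sketch with scikit-learn's `confusion_matrix`, using only the first four labels listed above (the rest of the lists is elided). Passing `labels=["yes", "no"]` puts the positive class first, giving the layout [[TP, FN], [FP, TN]].

```python
from sklearn.metrics import confusion_matrix

actual = ["yes", "no", "yes", "yes"]
predicted = ["no", "yes", "yes", "no"]

# Rows = actual class, columns = predicted class, in the order given by labels.
cm = confusion_matrix(actual, predicted, labels=["yes", "no"])
(tp, fn), (fp, tn) = cm
print(cm)
```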

Classification metric – Accuracy
How many observations are correctly classified out of n observations:
$$\mathrm{Accuracy} = \frac{\#TP + \#TN}{n}$$
• n is the total number of observations: n = #TP + #FN + #FP + #TN.
• In the slide's example, accuracy = 0.96.
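A minimal sketch computing accuracy from confusion-matrix counts and checking it against scikit-learn's `accuracy_score`; the labels are hypothetical (1 = positive, 0 = negative).

```python
from sklearn.metrics import accuracy_score


def accuracy(tp, fn, fp, tn):
    # (#TP + #TN) / n, where n = #TP + #FN + #FP + #TN
    return (tp + tn) / (tp + fn + fp + tn)


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

pairs = list(zip(y_true, y_pred))
tp = pairs.count((1, 1))
fn = pairs.count((1, 0))
fp = pairs.count((0, 1))
tn = pairs.count((0, 0))

print(accuracy(tp, fn, fp, tn), accuracy_score(y_true, y_pred))
```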

• Accuracy is misleading in imbalanced problems; in the slide's example, accuracy = 0.97.
• The predictions for the minority class are completely wrong, but the overall accuracy is high.
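A minimal sketch of the problem on hypothetical data: 97 negatives, 3 positives, and a trivial classifier that always predicts the majority class. Accuracy looks high while every minority-class observation is missed.

```python
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical imbalanced labels: 97 negatives (0) and 3 positives (1).
y_true = [0] * 97 + [1] * 3
# Trivial classifier: always predict the majority class.
y_pred = [0] * 100

acc = accuracy_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)  # recall for the positive (minority) class
print(acc, rec)
```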

Classification metric – Recall
$$\mathrm{Recall} = \frac{\#TP}{\#TP + \#FN}$$
• Recall: the effectiveness of a classifier at identifying the actual positive class labels.
• In the slide's example, recall ≈ 0.94.
• Use recall when false negatives are costly (e.g., detecting as many malicious programs as possible).

Classification metric – Precision
• Precision: agreement of the true class labels with the labels predicted by the classifier, i.e. the proportion of predicted positives that are actually positive.
$$\mathrm{Precision} = \frac{\#TP}{\#TP + \#FP}$$
• In the slide's example, precision ≈ 0.98.
• Use precision when false positives are costly (e.g., to avoid putting innocent people in prison).
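A minimal sketch of recall and precision from counts, checked against scikit-learn's `recall_score` and `precision_score`; the labels are hypothetical (1 = positive).

```python
from sklearn.metrics import precision_score, recall_score


def recall(tp, fn):
    # Proportion of actual positives that are predicted positive.
    return tp / (tp + fn)


def precision(tp, fp):
    # Proportion of predicted positives that are actually positive.
    return tp / (tp + fp)


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp, fn, fp = pairs.count((1, 1)), pairs.count((1, 0)), pairs.count((0, 1))
print(recall(tp, fn), precision(tp, fp))
```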

Classification metric – F1
• F1: the harmonic mean of precision and recall.
$$F_1 = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
• In the slide's example: $2 \times \frac{0.98 \times 0.94}{0.98 + 0.94} \approx 0.96$
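A minimal sketch reproducing the F1 arithmetic for precision 0.98 and recall 0.94, and confirming that it equals the harmonic mean from Python's standard `statistics` module.

```python
from statistics import harmonic_mean


def f1(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


score = f1(0.98, 0.94)
print(round(score, 2))
print(harmonic_mean([0.98, 0.94]))  # same value via the standard library
```

The harmonic mean is pulled towards the smaller value: f1(1.0, 0.1) is about 0.18, whereas the arithmetic mean would be 0.55.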

Multi-class Confusion matrix
• What about 3 classes? Calculate TP, FP, TN, FN for each class.
                              PREDICTED CLASS
                              Sunny   Rain   Overcast
ACTUAL CLASS    Sunny         c1      c2     c3
                Rain          c4      c5     c6
                Overcast      c7      c8     c9

• Sunny:    #TP: c1   #FN: c2 + c3   #FP: c4 + c7   #TN: c5 + c6 + c8 + c9
• Rain:     #TP: c5   #FN: c4 + c6   #FP: c2 + c8   #TN: c1 + c3 + c7 + c9
• Overcast: #TP: c9   #FN: c7 + c8   #FP: c3 + c6   #TN: c1 + c2 + c4 + c5

Multi-class Accuracy
• Subset accuracy:
$$\frac{\sum_{i=1}^{k} \#TP_i}{n}$$
• Average accuracy:
$$\frac{1}{k} \sum_{i=1}^{k} \frac{\#TP_i + \#TN_i}{n}$$
• n is the total number of observations, k is the number of classes.

Multi-class Accuracy – cont.
                              PREDICTED CLASS
                              Sunny   Rain   Overcast
ACTUAL CLASS    Sunny         30      0      3
                Rain          2       28     2
                Overcast      1       1      33

• Subset accuracy = (30 + 28 + 33) / 100 = 0.91

Multi-class Accuracy – cont.
• Average accuracy = $\frac{1}{k} \sum_{i=1}^{k} \frac{\#TP_i + \#TN_i}{n}$
Sunny: 94/100 (TP: 30, TN: 28 + 2 + 1 + 33)
Rain: 95/100 (TP: 28, TN: 30 + 3 + 1 + 33)
Overcast: 93/100 (TP: 33, TN: 30 + 0 + 2 + 28)
Average accuracy: (0.94 + 0.95 + 0.93) / 3 = 0.94
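A minimal NumPy sketch of both multi-class accuracies. The matrix cells are read off from the TP and TN sums in the Sunny/Rain/Overcast example (rows = actual, columns = predicted); per-class counts follow the c1…c9 rules: FN is the rest of the row, FP the rest of the column, TN everything else.

```python
import numpy as np

# Class order: Sunny, Rain, Overcast. Rows = actual, columns = predicted.
cm = np.array([
    [30, 0, 3],
    [2, 28, 2],
    [1, 1, 33],
])
n = cm.sum()
k = cm.shape[0]

tp = np.diag(cm)
fn = cm.sum(axis=1) - tp   # rest of each row
fp = cm.sum(axis=0) - tp   # rest of each column
tn = n - tp - fn - fp      # all remaining cells

subset_accuracy = tp.sum() / n
per_class_accuracy = (tp + tn) / n
average_accuracy = per_class_accuracy.sum() / k
print(subset_accuracy, per_class_accuracy, average_accuracy)
```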

Multi-class metrics – cont.
• Recall, Precision, and F1
• Involves averaging over multiple classes.
• Macro and Micro averaging options.
• We do not go into these in this subject, but it is good to know that they exist.

Regression metrics

Linear Regression – revision
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
Error term, or residual, for the X value of observation i:
$$\varepsilon_i = Y_i - \hat{Y}_i$$

Regression metrics
• MSE – Mean Square Error
• Lower is better
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$
SSE (sum of squared errors):
$$\mathrm{SSE} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$

Regression metrics – cont.
• RMSE – Root Mean Square Error
• Most used measure
• Lower value is better
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}$$

Regression metrics – cont.
• MAE – Mean Absolute Error
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| Y_i - \hat{Y}_i \right|$$
• Lower value is better
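A minimal sketch computing MSE, RMSE and MAE directly from the residuals and checking them against scikit-learn's `mean_squared_error` and `mean_absolute_error`; the observed and predicted values are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y = np.array([3.0, -0.5, 2.0, 7.0])       # hypothetical observed values
y_hat = np.array([2.5, 0.0, 2.0, 8.0])    # hypothetical predictions

residuals = y - y_hat
mse = np.mean(residuals ** 2)
rmse = np.sqrt(mse)
mae = np.mean(np.abs(residuals))
print(mse, rmse, mae)
```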

Regression metrics and outliers
• MAE and RMSE are on the same scale as the residuals; MSE is on the squared (quadratic) scale of the residuals.
• Which is sensitive to outliers?
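One way to probe the question: squaring gives large residuals disproportionate weight, so RMSE (and MSE) react more strongly to an outlier than MAE. A minimal sketch on hypothetical residuals, replacing one residual of size 1 with one of size 10:

```python
import numpy as np


def rmse(residuals):
    return float(np.sqrt(np.mean(np.square(residuals))))


def mae(residuals):
    return float(np.mean(np.abs(residuals)))


clean = np.array([1.0, -1.0, 1.0, -1.0])
with_outlier = np.array([1.0, -1.0, 1.0, -10.0])  # one hypothetical large residual

for name, r in [("clean", clean), ("with outlier", with_outlier)]:
    print(f"{name}: RMSE={rmse(r):.2f}, MAE={mae(r):.2f}")
```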

• Others (MAPE, Median Absolute Error)
• https://scikit-learn.org/stable/modules/model_evaluation.html#

Feature selection – univariate
Supervised Learning

Feature selection – univariate
Intuition: evaluate “goodness” of each feature
• Consider each feature separately: linear time in the number of attributes
• Typically the most popular and simplest strategy

Feature selection for classification
What makes a single feature good?
• Well correlated with the class
• Not independent of the class
Which of a1, a2 is a good feature for predicting the class c?

Feature selection – Mutual information (MI)
What makes a feature good? If it is well correlated with the class.
• Mutual information (revision):
$$MI(X, Y) = H(Y) - H(Y \mid X)$$
• Is feature a well correlated with the class c? $MI(a, c) = H(c) - H(c \mid a)$
• High $MI(a, c)$: a strongly predicts c; select a into the feature set.
• Low $MI(a, c)$: a cannot predict c; a is not selected into the feature set.

Feature selection – MI – cont.
Is a1 well correlated with the class c? $MI(a_1, c) = H(c) - H(c \mid a_1) = 1 - 0 = 1$ (High MI, Yes!)
• The feature a1 perfectly predicts c; select a1 as a feature.
Is a2 well correlated with the class c? $MI(a_2, c) = H(c) - H(c \mid a_2) = 1 - 1 = 0$ (Low MI, No!)
• The feature a2 cannot predict c at all; a2 is not selected as a feature.
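A minimal pure-Python sketch of $MI(a, c) = H(c) - H(c \mid a)$ in bits. The 4-sample dataset is hypothetical: it is chosen so that a1 equals the class and a2 is unrelated to it, which reproduces the values 1 and 0 above.

```python
import math
from collections import Counter


def entropy(values):
    # H(X) = -sum p(x) log2 p(x), in bits
    n = len(values)
    return -sum((cnt / n) * math.log2(cnt / n) for cnt in Counter(values).values())


def conditional_entropy(target, feature):
    # H(target | feature) = sum over feature values v of P(v) * H(target | feature = v)
    n = len(target)
    total = 0.0
    for v, count in Counter(feature).items():
        subset = [t for t, f in zip(target, feature) if f == v]
        total += (count / n) * entropy(subset)
    return total


def mutual_information(feature, target):
    return entropy(target) - conditional_entropy(target, feature)


c = ["Y", "Y", "N", "N"]
a1 = ["Y", "Y", "N", "N"]   # hypothetical: copies the class
a2 = ["Y", "N", "Y", "N"]   # hypothetical: unrelated to the class

mi_a1 = mutual_information(a1, c)
mi_a2 = mutual_information(a2, c)
print(mi_a1, mi_a2)
```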

Feature selection – Chi-square χ² test
What makes a single feature good? If it is not independent of the class: use the chi-square χ² test.

Chi-square χ² – Null hypothesis
H0: two variables are independent: $P(A \cap B) = P(A) \times P(B)$
• A statistical hypothesis test for quantifying the independence of pairs of nominal or ordinal variables.
• Takes sample size into account and has a significance test (MI does not).

Chi-square χ² test – cont.
H0: a feature and the class variable are independent.
Is there a difference between observed and expected frequencies?
1. Summarise the observed frequencies of the feature and the class.
2. Find the expected frequencies for all (feature, class) pairs.
3. Calculate the χ² statistic: the difference between observed and expected values.
4. Look up the χ² critical value to test H0.
If the χ² statistic exceeds the critical value, reject H0: the feature and class variables are not independent.

Chi-square χ² test – example
Chi-square χ² test for a1 and c:
1. Summarise the observed frequencies of a1 and c in a contingency table.
(Contingency table of observed frequencies, with columns a1 = Y and a1 = N.)

Chi-square χ² test – example
Chi-square χ² test for a1 and c:
2. Find the expected frequencies for all (feature, class) pairs.
• Probability under H0: $P(a_1 = Y \cap c = Y) = P(a_1 = Y) \times P(c = Y) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$
• Expected frequency: $E_{a_1 = Y,\, c = Y} = n \times P(a_1 = Y \cap c = Y) = 4 \times \frac{1}{4} = 1$

Chi-square c! test – example
3. Calculate the χ² statistic (difference between observed frequencies O and expected frequencies E):
$$\chi^2(a_1, c) = \sum_{i \in \{a_1 = Y,\, a_1 = N\}} \sum_{j \in \{c = Y,\, c = N\}} \frac{(O_{i,j} - E_{i,j})^2}{E_{i,j}}$$
$$\chi^2(a_1, c) = \frac{(2-1)^2}{1} + \frac{(0-1)^2}{1} + \frac{(0-1)^2}{1} + \frac{(2-1)^2}{1} = 4$$
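Steps 2 and 3 can be sketched with NumPy. The observed table is an assumption: it is the 2 × 2 table with counts 2, 0, 0, 2 implied by the worked sum (a1 always equal to c in 4 samples). Expected frequencies under H0 are n × P(row) × P(column).

```python
import numpy as np

# Assumed observed frequencies; rows: a1 = Y, a1 = N; columns: c = Y, c = N.
observed = np.array([[2, 0],
                     [0, 2]])
n = observed.sum()

# Step 2: expected frequency under independence, E(i, j) = n * P(row i) * P(column j).
row_p = observed.sum(axis=1) / n
col_p = observed.sum(axis=0) / n
expected = n * np.outer(row_p, col_p)

# Step 3: chi-square statistic, summed over all (feature, class) cells.
chi2_stat = ((observed - expected) ** 2 / expected).sum()
print(expected)
print(chi2_stat)
```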

Chi-square c! test – example
4. Look up the χ² critical value and test H0:
• With 95% confidence (α level = 0.05)
• Degrees of freedom df = (2 − 1) × (2 − 1) = 1 (a1 has 2 values, c has 2 values)
• The χ² critical value is 3.84 (look up this value)

Chi-square c! test – example
4. Look up the χ² critical value to test H0:
• The χ² statistic is 4.
• The χ² critical value is 3.84 (df = 1, α = 0.05).
• The χ² statistic > critical value, so the p-value < 0.05:
  • Reject H0.
  • a1 is NOT independent of c.
  • a1 is selected into the feature set.
(Figure: χ² distribution with the critical value 3.84 at α = 0.05. A statistic beyond it gives p-value < 0.05: reject H0; otherwise p-value > 0.05: fail to reject H0.)
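The whole test can be sketched with SciPy, again assuming the 2 × 2 table with counts 2, 0, 0, 2 implied by the example. `chi2_contingency` returns the statistic, p-value, degrees of freedom and expected frequencies; `correction=False` turns off the Yates continuity correction that SciPy otherwise applies to 2 × 2 tables, so the plain statistic from the slides is obtained. `chi2.ppf` gives the critical value.

```python
from scipy.stats import chi2, chi2_contingency

observed = [[2, 0],
            [0, 2]]   # assumed table consistent with the example

stat, p_value, dof, expected = chi2_contingency(observed, correction=False)

alpha = 0.05
critical = chi2.ppf(1 - alpha, df=dof)   # critical value for the chosen alpha level
reject_h0 = stat > critical
print(stat, dof, round(critical, 2), p_value, reject_h0)
```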
Degrees of freedom
• The maximum number of independent values that have freedom to vary in the data sample.
If a feature can have k different possible values:
• Why is the number of degrees of freedom k − 1?
• Because the frequency of the last value is totally determined by those of the other k − 1 values; the frequency of the last value is not free to vary.
Example: weather forecasts have 3 possible values: sunny, rain, or overcast. Given n samples, if n1 samples are sunny and n2 samples are rain, the number of samples that are overcast must be n − n1 − n2.

Univariate feature selection – potential issues
Difficult to control for inter-dependence of features
• Filtering on single features may remove important features. For example, where the class is the XOR of some features:
  • Given all the features, the class is totally predictable.
  • Given only one of them, the MI or chi-square statistic is 0.
• In practice, feature extraction is also used, i.e. constructing new features out of existing ones, e.g. the ratio of, or difference between, features.

Income   Expenditure   i>e   Credit
120      100           1     Good
50       30            1     Good
50       70            0     Bad
200      40            1     Good
200      210           0     Bad
…        …             …     …
160      150           1     Good
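A minimal sketch of both points, assuming scikit-learn's `mutual_info_score` (which reports MI in nats). With hypothetical binary features x1, x2 and class = x1 XOR x2, each feature alone has MI 0 with the class, yet a constructed feature combining both determines it. The last lines rebuild the i>e column from the visible (Income, Expenditure) rows of the table.

```python
from sklearn.metrics import mutual_info_score

# Hypothetical data: all four combinations of two binary features; class = XOR.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
cls = [a ^ b for a, b in zip(x1, x2)]

# Each feature on its own carries no information about the class...
mi_x1 = mutual_info_score(x1, cls)
mi_x2 = mutual_info_score(x2, cls)

# ...but a constructed feature combining both determines it completely.
pair = [f"{a}{b}" for a, b in zip(x1, x2)]
mi_pair = mutual_info_score(pair, cls)
print(mi_x1, mi_x2, mi_pair)

# Feature extraction on the table rows: i>e is 1 when income exceeds expenditure.
rows = [(120, 100), (50, 30), (50, 70), (200, 40), (200, 210), (160, 150)]
i_gt_e = [int(income > expenditure) for income, expenditure in rows]
print(i_gt_e)
```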

Feature selection for regression
1. Mutual Information
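For a numeric target, scikit-learn provides `mutual_info_regression`, which estimates MI with a nearest-neighbour method. A minimal sketch on hypothetical data where y depends non-linearly on feature 0 and feature 1 is pure noise:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 500
# Hypothetical data: y depends on column 0 only; column 1 is unrelated noise.
X = rng.uniform(-1, 1, size=(n, 2))
y = X[:, 0] ** 2 + rng.normal(0, 0.05, n)

mi = mutual_info_regression(X, y, random_state=0)  # one MI estimate per feature
print(mi)
```

A linear correlation score would rate column 0 near zero here (y = x0² is symmetric), while MI still detects the dependence.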

Model evaluation with feature selection
• Feature selection should be done within each training step.
K-fold cross validation procedure:
Partition the training data randomly into k blocks.
Repeat to select the best hyper-parameters or model:
– Repeat k times:
  • Feature selection on the k − 1 blocks
  • k − 1 blocks for training
  • 1 block for evaluation
– Average the k scores
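One way to guarantee that selection happens inside each training step is to make the selector part of the model. A minimal sketch, assuming scikit-learn's `Pipeline` with `SelectKBest(chi2, k=2)`; the dataset, k and classifier are illustrative. `cross_val_score` refits the whole pipeline on the k − 1 training blocks of every fold, so the held-out block never influences which features are kept.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("select", SelectKBest(chi2, k=2)),           # chi2 scoring needs non-negative features
    ("clf", KNeighborsClassifier(n_neighbors=5)),
])

# Selection + training on k - 1 blocks, evaluation on the remaining block, k times.
scores = cross_val_score(pipe, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))
print(scores.mean())
```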

Model evaluation with feature selection
• Feature selection should be done within each training step.
Bootstrap validation procedure:
Draw b bootstrap samples.
Repeat to select the best hyper-parameters or model:
– Repeat b times, for each bootstrap sample:
  • Feature selection on the bootstrap sample
  • Train the model on the bootstrap sample
  • Evaluate the performance on the OOB data (test/validation data)
– Report the mean and standard deviation of the b performance scores

The overall flow with feature selection
1. Model evaluation and selection with feature selection
2. Apply feature selection and fit the selected model and hyper-parameters with the entire training set
3. Report performance on the independent test set.
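A minimal sketch combining bootstrap validation with feature selection and the overall flow, assuming NumPy and scikit-learn. The selector and classifier are wrapped in one `Pipeline` (an illustrative choice, as are the dataset, k = 2 features and b = 20) and cloned for every bootstrap sample, so selection only ever sees that sample.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

pipe = Pipeline([
    ("select", SelectKBest(chi2, k=2)),
    ("clf", DecisionTreeClassifier(random_state=0)),
])

# Step 1: bootstrap validation on the training set, with selection inside each round.
rng = np.random.default_rng(0)
b = 20
n = len(X_train)
scores = []
for _ in range(b):
    boot = rng.integers(0, n, size=n)
    oob = np.setdiff1d(np.arange(n), boot)
    model = clone(pipe).fit(X_train[boot], y_train[boot])  # selection sees the bootstrap sample only
    scores.append(model.score(X_train[oob], y_train[oob]))
print(f"OOB accuracy: mean={np.mean(scores):.3f}, sd={np.std(scores):.3f}")

# Steps 2-3: refit selection + model on the entire training set, report once on the test set.
final_model = clone(pipe).fit(X_train, y_train)
test_accuracy = final_model.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.3f}")
```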