
Lecture 17: Evaluating learned models
CS 189 (CDSS offering)
2022/02/28


Today’s lecture
We now know enough about learning models to ask other important questions:
If my learned parameters minimize the training loss, am I done? Should I
deploy my model and move on?
How do I determine whether I am “satisfied” with the model?
What can I do if I am not satisfied with the model?
Answering these questions will also shed light on how to select hyperparameters,
which we have glossed over until now

True risk and empirical risk
Risk is defined as expected loss: R(θ) = 𝔼[l(θ; X, Y )]
This is sometimes called true risk to distinguish from empirical risk below
Empirical risk is the average loss on the training set: R̂(θ) = (1/N) ∑_{i=1}^{N} l(θ; x_i, y_i)
Supervised learning is oftentimes empirical risk minimization (ERM)
Is this the same as true risk minimization?
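As an illustration (not from the lecture), here is a minimal sketch of ERM for squared loss under a linear model; the NumPy implementation and the toy data are assumptions made for the example.

import numpy as np

def empirical_risk(theta, X, y):
    # R_hat(theta) = (1/N) * sum_i l(theta; x_i, y_i), with squared loss
    return ((X @ theta - y) ** 2).mean()

# Toy data, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Least squares is exactly ERM for squared loss
theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(empirical_risk(theta_hat, X, y))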

The difference between true and empirical risk
The empirical risk looks just like a Monte Carlo estimate of the true risk, so shouldn’t we have R̂(θ) ≈ R(θ)? Why might this not be the case?
Intuitively, the issue here is that we are already using the training dataset to learn θ — we can’t “reuse” the same data to then get an estimate of the risk!
When the empirical risk is low, but the true risk is high, we are overfitting
When the empirical risk is high, and the true risk is also high, we are underfitting

Overfitting and underfitting
When the empirical risk is low, but the true risk is high, we are overfitting
This can happen if the dataset is too small and/or the model is too “powerful”
When the empirical risk is high, and the true risk is also high, we are underfitting
This can happen if the model is too “weak” and/or the optimization doesn’t work well (i.e., the training loss does not decrease satisfactorily)
Generally, the true risk won’t be lower than the empirical risk
What constitutes “high”? Often, that is up to the practitioner — that is, one
must ask: “How well do I expect my model to work for this problem?”

Diagnosing overfitting and underfitting
As mentioned, we cannot rely on the empirical risk R̂(θ) being an accurate
estimate of the true risk R(θ)
But we need to estimate R(θ) in order to diagnose overfitting and underfitting!
What’s the problem? We want to use the dataset for two purposes: learning θ
and estimating R(θ)
This suggests a natural solution: divide the dataset into two parts, one part for
learning θ and one part for estimating R(θ)
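Here is a minimal sketch of such a split, assuming the dataset is stored as NumPy arrays X (features) and y (labels); the 80/20 ratio and the shuffling are illustrative choices, not prescribed by the lecture.

import numpy as np

def train_val_split(X, y, val_fraction=0.2, seed=0):
    # Shuffle, then hold out a fraction of the data for validation
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))
    n_val = int(len(X) * val_fraction)
    val_idx, train_idx = perm[:n_val], perm[n_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]

# Learn theta on (X_train, y_train); estimate R(theta) on (X_val, y_val)
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10, dtype=float)
X_train, y_train, X_val, y_val = train_val_split(X, y)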

Training and validation sets
We use the training set for training, i.e., learning θ
[Diagram: the dataset is split into a training set and a validation set]
The loss on the training set also informs us of whether or not the empirical risk is “high” — if so, we are underfitting
Thus, we also use the training set for making sure that the optimization is working, i.e., decreasing training loss satisfactorily
We reserve the validation set for diagnosing overfitting
The loss on the validation set should be an accurate estimate of the true risk, thus we can compare losses on these two sets

The machine learning workflow
1. Learn θ on the training set
2. Measure loss on the validation set
3. Not overfitting or underfitting? You’re done
if the training loss is not low enough… you are underfitting! increase model capacity, improve optimizer, … and go back to step 1
if the training loss is much smaller than the validation loss… you are overfitting! decrease model capacity, collect more data, … and go back to step 1
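The diagnosis step in the workflow above can be summarized in code. This is a rough sketch only: the function name diagnose, the acceptable_loss threshold, and the gap_factor are hypothetical placeholders, since what counts as “not low enough” or “much smaller” is up to the practitioner.

def diagnose(train_loss, val_loss, acceptable_loss=0.1, gap_factor=2.0):
    # Step 3 of the workflow: compare training and validation losses
    if train_loss > acceptable_loss:
        return "underfitting: increase model capacity, improve the optimizer, ..."
    if val_loss > gap_factor * train_loss:
        return "overfitting: decrease model capacity, collect more data, ..."
    return "neither: you're done"

print(diagnose(train_loss=0.05, val_loss=0.30))  # -> overfitting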

You’re done?
What does “you’re done” mean?
In industry, maybe it means: deploy your model
In research, competitions, this class, etc., it means: report your
model’s performance on a test set
The test set is reserved for reporting final performance only and
must never, ever be used for anything else

Selecting hyperparameters
What are some examples of hyperparameters we have seen so far?
• l1/l2-regularization strength λ
• Soft margin SVM slack penalty C
• Also other things like learning rate α, choice of featurization/kernel, etc.
We select these hyperparameters to make sure that we are neither underfitting
nor overfitting, i.e., this selection process is a part of the earlier workflow
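As an example of this selection process, here is a minimal sketch (not from the lecture) of choosing the l2-regularization strength λ by validation loss; ridge regression, the toy data, and the candidate grid are illustrative assumptions.

import numpy as np

def fit_ridge(X, y, lam):
    # ERM for squared loss with l2 penalty: theta = (X^T X + lam I)^{-1} X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def mean_squared_loss(theta, X, y):
    return ((X @ theta - y) ** 2).mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=60)
X_train, y_train, X_val, y_val = X[:40], y[:40], X[40:], y[40:]

# Pick the candidate lambda with the lowest validation loss
candidates = [1e-3, 1e-2, 1e-1, 1.0, 10.0]
best_lam = min(candidates,
               key=lambda lam: mean_squared_loss(fit_ridge(X_train, y_train, lam), X_val, y_val))
print("selected lambda:", best_lam)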

Cross validation
Setting aside a fixed portion of training data for validation can be an issue if we
repeat the training-validation workflow many times
We may start overfitting to the validation set!
An alternative approach which better utilizes the dataset but is more
computationally intensive is cross validation
High level idea: for evaluating a single setup (model, hyperparameters, …), repeat
training-validation multiple times, splitting the dataset differently each time
Then average the results in order to estimate how “good” that setup is

An example: k-fold cross validation
https://scikit-learn.org/stable/modules/cross_validation.html
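Following the linked scikit-learn documentation, a short sketch of 5-fold cross validation is shown below; the choice of model (ridge regression), k = 5, and the toy data are illustrative assumptions.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=100)

# Each of the 5 folds takes a turn as the validation set; averaging the scores
# estimates how "good" this setup (model + hyperparameters) is
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=5,
                         scoring="neg_mean_squared_error")
print("average validation MSE:", -scores.mean())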
