Tutorial Questions | Week 2
COSC2779 – Deep Learning
This tutorial is aimed at reviewing basic machine learning concepts. Please try the questions
before you join the session.
1. A computer program is said to learn from experience E with respect to some task T and some performance
measure P if its performance on T, as measured by P, improves with experience E. Suppose we feed a
learning algorithm a lot of historical weather data, and have it learn to predict weather.
(a) What would be a reasonable choice for P?
Solution: The probability of it correctly predicting a future date’s weather.
(b) What is T?
Solution: The weather prediction task.
2. Suppose you are working on stock market prediction, and you would like to predict the price of a particular
stock tomorrow (measured in dollars). You want to use a learning algorithm for this. Would you treat this
as a classification or a regression problem?
Solution: Regression
3. In which one of the following figures do you think the hypothesis has overfit the training set?
[Figure: four hypothesis curves fitted to the same training data, labelled (a)-(d); images not reproduced]
Solution: (a)
4. What’s the trade-off between bias and variance?
Solution: Bias is error due to erroneous or overly simplistic assumptions in the learning algorithm. High bias can cause the model to underfit the data, making it hard to achieve high predictive accuracy and to generalize from the training set to the test set.
Variance is error due to excessive complexity in the learning algorithm. A high-variance model is highly sensitive to small fluctuations in the training data, which can cause it to overfit: it carries too much noise from the training data to be useful on the test data.
The bias-variance decomposition writes the expected error of any learning algorithm as the sum of three terms:

Error = Bias² + Variance + irreducible error,

where the irreducible error is due to noise in the underlying dataset. Making the model more complex (e.g. adding more variables) reduces bias but increases variance; to reach the minimum total error you have to trade the two off against each other. You want neither high bias nor high variance in your model.
A good article on the bias-variance trade-off: http://scott.fortmann-roe.com/docs/BiasVariance.html
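To see the trade-off numerically, the sketch below (Python/scikit-learn; the synthetic data, noise level and polynomial degrees are illustrative assumptions, not from the tutorial) fits polynomials of increasing degree to the same noisy data: degree 1 underfits (high bias, both errors high), while degree 15 drives the training error down but the test error up (high variance).

```python
# A minimal sketch of the bias-variance trade-off; the data-generating
# function, sample size and degrees are arbitrary illustrative choices.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 60)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 60)   # noisy true function
x_train, y_train = x[:40, None], y[:40]
x_test, y_test = x[40:, None], y[40:]

for degree in (1, 4, 15):   # high bias, reasonable, high variance
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    print(degree,
          mean_squared_error(y_train, model.predict(x_train)),  # train error
          mean_squared_error(y_test, model.predict(x_test)))    # test error
```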
5. How do you ensure you’re not overfitting with a model?
Solution: This is a simple restatement of a fundamental problem in machine learning: the possibility
of overfitting training data and carrying the noise of that data through to the test set, thereby providing
poor generalizations.
There are three main methods to avoid overfitting:
1- Keep the model simpler: reduce variance by taking into account fewer variables and parameters,
thereby removing some of the noise in the training data.
2- Use regularization techniques (week 3 lecture) such as weight decay etc. that penalize certain model
parameters if they’re likely to cause over-fitting.
3- Use proper evaluation techniques such as k-folds cross-validation to identify if there is over-fitting.
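As a concrete illustration of point 3, the following sketch (Python/scikit-learn; the decision-tree model and the built-in scikit-learn dataset are placeholder choices, not part of the tutorial) compares the accuracy a model reports on its own training data with its k-fold cross-validated accuracy; a large gap between the two signals overfitting.

```python
# A minimal sketch of k-fold cross-validation for spotting overfitting.
# Note: sklearn's built-in breast cancer data is the Wisconsin diagnostic
# dataset, used here only for convenience (not the dataset in question 7).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
model = DecisionTreeClassifier()           # flexible model, prone to overfitting

train_acc = model.fit(X, y).score(X, y)    # accuracy on the data it was fit to
cv_acc = cross_val_score(model, X, y, cv=5).mean()   # 5-fold cross-validation

# A large gap between train_acc (~1.0 here) and cv_acc signals overfitting.
print(train_acc, cv_acc)
```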
6. We can regularize a regression model by introducing a ridge penalty as shown in the equation below:
L = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2 + \lambda \, \mathbf{w}^\top \mathbf{w}
What happens if we tune λ by looking at the performance on the training set?
Solution: Any λ > 0 can only increase the training loss, so tuning λ on the training set will always select λ = 0. With λ = 0 and a model of sufficient capacity to overfit, we can drive the training loss to zero (the best possible loss). This is a trivial solution and not the one that generalizes well to unseen data; λ should instead be tuned on a held-out validation set.
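To make this concrete, the sketch below (Python/scikit-learn, where λ is the `alpha` parameter of `Ridge`; the synthetic data and λ grid are illustrative assumptions) shows that training error is always minimised at λ = 0, while error on a held-out set is minimised at some λ > 0.

```python
# A minimal sketch: training error always prefers lambda = 0,
# while a held-out validation set prefers some lambda > 0.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 30))                  # few samples, many features
w_true = rng.normal(size=30)
y = X @ w_true + rng.normal(0, 1.0, 60)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

for lam in (0.0, 0.1, 1.0, 10.0):              # lambda is `alpha` in sklearn
    model = Ridge(alpha=lam).fit(X_tr, y_tr)
    print(lam,
          mean_squared_error(y_tr, model.predict(X_tr)),    # lowest at lam = 0
          mean_squared_error(y_val, model.predict(X_val)))  # lowest at lam > 0
```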
7. The breast cancer dataset (http://archive.ics.uci.edu/ml/datasets/Breast+Cancer) is a standard machine learning dataset. It contains 9 attributes describing 286 women who have suffered and survived breast cancer, and whether or not breast cancer recurred within 5 years. It is a binary classification problem. Of the 286 women, 201 did not suffer a recurrence of breast cancer, leaving the remaining 85 that did. How would you handle this imbalanced dataset?
Solution: This is a classification task with around 70% of the data in one class. That leads to problems: a model that always predicts the majority class scores about 70% accuracy while having no predictive power at all on the minority class! Here are a few tactics to get over this:
• Collect more data if possible (difficult in this case).
• Select an appropriate performance metric: confusion matrix, precision, recall, F1, Cohen's kappa (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.cohen_kappa_score.html), ROC.
• Balance the classes by re-sampling (over-sampling the minority class or under-sampling the majority class).
• Generate synthetic samples of the minority class (e.g. SMOTE).
• Penalized models: impose an additional cost on the model for making classification mistakes on the minority class during training. These penalties can bias the model to pay more attention to the minority class (see the sketch after this solution).
• Consider other approaches, such as framing the problem as anomaly detection.
What's important here is that you have identified what damage an imbalanced dataset can cause, and how to counter it. A good article on class imbalance: https://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/. See also: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3755824/
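As an illustration of the penalized-model tactic, the sketch below (Python/scikit-learn; the synthetic 286-sample dataset merely mimics the 70/30 class split and is not the real breast cancer data) compares an unweighted logistic regression with one trained using class_weight='balanced', which charges more for mistakes on the minority class.

```python
# A minimal sketch of a penalized (class-weighted) model;
# class_weight='balanced' up-weights minority-class errors in
# inverse proportion to the class frequencies.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 286-sample dataset: ~70% / 30% class split.
X, y = make_classification(n_samples=286, n_features=9, weights=[0.7],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
weighted = LogisticRegression(max_iter=1000,
                              class_weight='balanced').fit(X_tr, y_tr)

# Compare per-class precision/recall; the weighted model typically trades
# some majority-class accuracy for better minority-class recall.
print(classification_report(y_te, plain.predict(X_te)))
print(classification_report(y_te, weighted.predict(X_te)))
```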
8. What validation technique would you use on a time series dataset?
Solution: Instead of using standard k-fold cross-validation, you have to account for the fact that a time series is not randomly distributed data; it is inherently ordered chronologically. If the folds are shuffled, the model can be trained on data from later time periods and tested on earlier ones. If a pattern only emerges in later time periods, for example, the model may still pick up on it even though that effect does not hold in the earlier years it is tested on!
You'll want to do something like forward chaining, where you train on past data and then test on the data that follows it (see the sketch after the list):
• fold 1 : training [1], test [2]
• fold 2 : training [1 2], test [3]
• fold 3 : training [1 2 3], test [4]
• fold 4 : training [1 2 3 4], test [5]
• fold 5 : training [1 2 3 4 5], test [6]
A sliding window, where the oldest observations are dropped as newer ones enter the training set, might be another option.
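A minimal sketch of this forward-chaining scheme, assuming scikit-learn is available: TimeSeriesSplit generates exactly the expanding-window folds listed above (with 0-based indices).

```python
# Forward-chaining (expanding-window) validation with TimeSeriesSplit.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(6, 2)   # 6 time-ordered samples (placeholder data)
for fold, (train_idx, test_idx) in enumerate(
        TimeSeriesSplit(n_splits=5).split(X), start=1):
    print(f"fold {fold}: training {train_idx}, test {test_idx}")
# fold 1: training [0], test [1] ... fold 5: training [0 1 2 3 4], test [5]
```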
Ref:
Burman, Prabir, Edmond Chow, and Deborah Nolan. “A Cross-Validatory Method for Dependent
Data.” Biometrika 81, no. 2 (1994): 351-58. Accessed July 10, 2020. doi:10.2307/2336965.
https://robjhyndman.com/papers/cv-wp.pdf
https://www.sciencedirect.com/science/article/abs/pii/S0304407600000300