ISML-3: Regression
Dong Gong
University of Adelaide
Slides by Dong Gong and Lingqiao Liu
Outline
• Introduction to regression
• Linear Regression
– Regression to scalar values
– Regression to vectors
• Regularized Regression
– Ridge regression
– Lasso
• Support Vector Regression
What is regression?
• Review
– Types of Machine Learning?
What is regression?
• Supervised learning:
– (x, y) pairs are known at the training stage
– Predict the unknown y for a new x at the test stage
• Classification
– y is a discrete variable
• Regression
– y is a continuous variable
Example of Regression Tasks
Learning to predict the housing price y from input attributes x with an ML method
Regression vs Classification
Practical workflow
• For some methods, we also need to perform normalization
Example of Regression Tasks
Less obvious task: Image processing
● Predicting pixel values from observed pixels
Example of Regression Tasks
Less obvious task: crowd counting
Example of Regression Tasks
Less obvious task: pose prediction
● Predicting the joint positions of a skeleton, i.e., a set of 3D coordinates {(c_x, c_y, c_z)}
● Predicting the future from the past
Example of Regression Tasks
Less obvious task: generating sounds
(auto-regressive model)
Regression framework
• Goal: Fitting what from what?
– Fitting the scalar y from the input attributes x.
• Model: How to fit? An assumed form of the fitting function f().
– Linear model
– Nonlinear model
• Loss: How to measure the fitting error? Any additional objectives?
– e.g., the mean squared error (MSE) loss
Linear regression
• Goal:
– Fit the scalar value y
• Model: $f(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b$ (the bias $b$ can be absorbed into $\mathbf{w}$ by appending a constant 1 to $\mathbf{x}$)
• Loss function (MSE): $\ell(\mathbf{w}) = (y - \mathbf{w}^\top \mathbf{x})^2$
Linear regression
• Loss for N training samples:
$L(\mathbf{w}) = \sum_{i=1}^{N} (y_i - \mathbf{w}^\top \mathbf{x}_i)^2 = \|\mathbf{y} - X\mathbf{w}\|_2^2$
• The sum (or mean) of the squared errors over the N samples
Linear regression
• Solution: minimize the loss to solve for the solution $\mathbf{w}^*$:
$\mathbf{w}^* = \arg\min_{\mathbf{w}} \|\mathbf{y} - X\mathbf{w}\|_2^2$
• Solving for w: the optimum is achieved where the first-order derivative is zero:
$\nabla_{\mathbf{w}} L = 2X^\top(X\mathbf{w} - \mathbf{y}) = 0 \;\Rightarrow\; \mathbf{w}^* = (X^\top X)^{-1} X^\top \mathbf{y}$
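A minimal NumPy sketch of this closed-form solution (an illustration, not the slides' code; the toy data and variable names are my own):

```python
import numpy as np

# Toy data: N samples, d features; a constant-1 column absorbs the bias term.
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(size=(100, 3)), np.ones(100)])
w_true = np.array([2.0, -1.0, 0.5, 3.0])
y = X @ w_true + 0.1 * rng.normal(size=100)

# w* = (X^T X)^{-1} X^T y, computed with solve() rather than an explicit inverse.
w_star = np.linalg.solve(X.T @ X, X.T @ y)
print(w_star)  # recovers values close to w_true
```

The same call also handles the vector-output extension on the next slide: passing an N x c target matrix Y in place of y returns the d x c solution matrix W*.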
Linear regression
• Extension to vector outputs (multiple scalars): targets $\mathbf{y}_i \in \mathbb{R}^c$ stacked into an $N \times c$ matrix $Y$
• Solution in a similar format: $W^* = (X^\top X)^{-1} X^\top Y$
Classification as Regression
• Vector encoding of the classification target: class k is encoded as a c-dimensional vector with +1 in the k-th position and -1 elsewhere, e.g., for class 1:
$\mathbf{y} = [+1, -1, -1, \dots, -1]^\top$
• Classification as regression:
• Directly fit the values of the encoded target.
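A hedged sketch of this approach (the encoding scheme follows the slide; the function names are my own):

```python
import numpy as np

def encode_labels(labels, n_classes):
    # Class k -> vector of -1s with +1 in the k-th position.
    Y = -np.ones((len(labels), n_classes))
    Y[np.arange(len(labels)), labels] = 1.0
    return Y

def fit(X, labels, n_classes):
    # One linear regressor per class in a single closed-form solve,
    # assuming X^T X is invertible (or add a ridge term, see later slides).
    Y = encode_labels(labels, n_classes)       # N x c targets
    return np.linalg.solve(X.T @ X, X.T @ Y)   # W* = (X^T X)^{-1} X^T Y, d x c

def predict(X, W):
    # Predict the class whose regression output is largest.
    return np.argmax(X @ W, axis=1)
```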
Discussion: What is the drawback of this solution?
Classification as Regression
• Limitations:
– It imposes unnecessary constraints on the predicted output
– It may increase the fitting difficulty and lead to a poor training result
• But why is it commonly used in practice?
– Closed-form solution; little storage needed for low-dimensional data
– Quick updates for incremental learning and distributed learning
Classification as Regression
• Quick updates for incremental learning and distributed learning
– Split X (and Y) into two parts in the distributed-learning view:
$X^\top X = X_1^\top X_1 + X_2^\top X_2, \qquad X^\top Y = X_1^\top Y_1 + X_2^\top Y_2$
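A small sketch of this idea (function names are illustrative): each server summarizes its shard into two small matrices, and only the summaries are combined:

```python
import numpy as np

def local_stats(X_i, Y_i):
    # Each server shares only these sufficient statistics:
    # a d x d matrix and a d x c matrix, regardless of how many rows it holds.
    return X_i.T @ X_i, X_i.T @ Y_i

def merge_and_solve(stats):
    # X^T X and X^T Y simply add across shards, so the global
    # closed-form solution is recovered from the shard summaries.
    A = sum(s[0] for s in stats)
    B = sum(s[1] for s in stats)
    return np.linalg.solve(A, B)

# W = merge_and_solve([local_stats(X1, Y1), local_stats(X2, Y2)])
```

Incremental learning works the same way: accumulate A += x xᵀ and B += x yᵀ as new samples arrive, and re-solve when needed.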
Classification as Regression
• Discussion: Simple distributed learning
– Consider the case: N = 10^9, d = 1000, c = 10
– Data are equally distributed across two servers
– Communication is needed between the servers
– Cost of sending the raw data vs. sending $X_i^\top X_i$ and $X_i^\top Y_i$? (See the rough count below.)
• Regression as classification?
– Encode continuous variables as discrete variables
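As a rough answer to the discussion question above: each server holds $N/2 = 5\times 10^8$ rows, so sending the raw data costs on the order of $(N/2)\,d = 5\times 10^{11}$ numbers, while sending $X_i^\top X_i$ and $X_i^\top Y_i$ costs only $d^2 + dc = 10^6 + 10^4 \approx 10^6$ numbers per server, roughly five orders of magnitude less.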
Regularized linear regression model
• Why regularization?
– Avoid overfitting
– Control the complexity of the model
– Enforce certain properties of the solution
– Handle noise in the data
– Handle redundant features
p-Norm
$\|\mathbf{w}\|_p = \left(\sum_{i=1}^{d} |w_i|^p\right)^{1/p}$ (consider d = 2)
Also called the Lp-norm.
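A quick NumPy illustration of the common cases (np.linalg.norm computes p-norms directly):

```python
import numpy as np

w = np.array([3.0, -4.0])                # d = 2, as on the slide
print(np.linalg.norm(w, ord=1))          # L1 norm: |3| + |-4| = 7
print(np.linalg.norm(w, ord=2))          # L2 norm: sqrt(3^2 + 4^2) = 5
print(np.linalg.norm(w, ord=np.inf))     # L-infinity norm: max_i |w_i| = 4
```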
Ridge regression
• Using the squared L2-norm as the regularizer:
$L(\mathbf{w}) = \sum_{i=1}^{N} (y_i - \mathbf{w}^\top \mathbf{x}_i)^2 + \lambda \|\mathbf{w}\|_2^2$
• The hyperparameter $\lambda$ controls the strength of the regularization.
• The regularizer is minimized together with the error term.
• It forces w to be small (as measured by the L2-norm).
Ridge regression
• In plain English, the regularizer $\|\mathbf{w}\|_2^2$ measures the sensitivity of the regressor w.r.t. perturbations of the inputs; minimizing it reflects our expectation of a smooth predictor.
• A probabilistic interpretation of the L2-norm regularizer:
– Interpret the variables, including w, as random variables.
– Assume each element of w is centred around zero; a zero-mean Gaussian prior on w yields the squared L2 penalty under MAP estimation.
Ridge regression: Solution
Practice: Derive the solution of ridge regression
Solution: $\mathbf{w}^* = (X^\top X + \lambda I)^{-1} X^\top \mathbf{y}$
Compare with the solution of linear regression, $\mathbf{w}^* = (X^\top X)^{-1} X^\top \mathbf{y}$: the only difference is the $\lambda I$ term.
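A minimal sketch of the ridge solution (the function name is my own):

```python
import numpy as np

def ridge_fit(X, y, lam):
    # w* = (X^T X + lam * I)^{-1} X^T y.
    # For lam > 0 the matrix is always invertible (see the next slide).
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```

Setting lam = 0 recovers the plain linear regression solution.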
Discussion
• An issue with the linear regression solution:
What if $X^\top X$ is not invertible?
• It means there are multiple solutions for w that achieve the minimal loss.
• Adding the regularization term makes $X^\top X + \lambda I$ always invertible (for $\lambda > 0$).
• It essentially provides an additional criterion for choosing the optimal solution among the multiple equivalent solutions of the first term.
Lasso
• Lasso replaces the squared L2-norm with the L1-norm:
$L(\mathbf{w}) = \sum_{i=1}^{N} (y_i - \mathbf{w}^\top \mathbf{x}_i)^2 + \lambda \|\mathbf{w}\|_1$
• The L1 norm encourages a sparse solution. This can be useful for understanding the impact of various factors, e.g., performing feature selection.
• Sometimes it can lead to improved performance, since it can suppress noisy factors.
• Unfortunately, it does not have a closed-form solution.
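Since there is no closed form, Lasso is solved iteratively. One common option (a sketch under my own names, not the slides' method) is ISTA, i.e., proximal gradient descent with soft-thresholding:

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of the L1 norm: shrinks each coordinate toward zero,
    # setting small coordinates exactly to zero -- the source of sparsity.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iters=1000):
    n, d = X.shape
    w = np.zeros(d)
    # Step size = 1 / Lipschitz constant of the gradient of ||Xw - y||^2.
    eta = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iters):
        grad = 2.0 * X.T @ (X @ w - y)      # gradient of the error term
        w = soft_threshold(w - eta * grad, eta * lam)
    return w
```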
Sparse vs Dense solution
• Sparse solution: most parameters are 0
• Why does the L1 norm enforce sparsity?
• Let's consider an example in 2D:
$z = \sum_{i} (y_i - \mathbf{w}^\top \mathbf{x}_i)^2$ is an ellipsoid equation: all w leading to the same z form an ellipsoid (a level set) in the parameter space. The L1 ball $\|\mathbf{w}\|_1 \le t$ has corners on the axes, so these level sets typically first touch it at a corner, where some coordinates are exactly zero.
Support vector regression
• Key idea: if the fitting error is already small enough, we do not need to make it any smaller.
• Find the model leading to a small enough, but not “too small,” error.
• This amounts to a different way of measuring the fitting error.
Support vector regression: hard margin
• Key idea: if the fitting error is already small enough, we do not need to make it any smaller.
• Formulation:
$\min_{\mathbf{w}, b} \ \tfrac{1}{2}\|\mathbf{w}\|^2 \quad \text{s.t.} \quad |y_i - (\mathbf{w}^\top \mathbf{x}_i + b)| \le \epsilon, \ \forall i$
Support vector regression: soft-margin
• Formulation with slack variables:
$\min_{\mathbf{w}, b, \xi, \xi^*} \ \tfrac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i}(\xi_i + \xi_i^*) \quad \text{s.t.} \quad y_i - (\mathbf{w}^\top \mathbf{x}_i + b) \le \epsilon + \xi_i, \ (\mathbf{w}^\top \mathbf{x}_i + b) - y_i \le \epsilon + \xi_i^*, \ \xi_i, \xi_i^* \ge 0$
Support vector regression
• Hinge-loss view of the soft-margin SVM:
– Decision value $f(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b$: in the binary case, we hope this value is a large positive number for the positive class and a large negative number for the negative class.
– If the decision value is larger than a given value (1 in this case), we do not care whether it could be even larger (“don't care”).
– We only penalize the case where the decision value is not sufficiently large (or small).
Support vector regression
• SVR in the hinge-loss view:
– Decision value $f(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b$: in regression, we hope this value is close to the ground truth.
– But if the fitting error is smaller than a given value $\epsilon$, we do not care whether it could be even smaller, or whether it exactly matches the value in the training data.
– We only penalize the case where the fitting error is not sufficiently small, i.e., the $\epsilon$-insensitive loss $\ell_\epsilon(y, f(\mathbf{x})) = \max(0, |y - f(\mathbf{x})| - \epsilon)$.
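As a usage-level sketch (not the slides' code): the $\epsilon$-insensitive loss can be written directly, and scikit-learn's SVR exposes the C and epsilon hyperparameters; X_train and y_train below are assumed placeholders:

```python
import numpy as np
from sklearn.svm import SVR

def eps_insensitive_loss(y, y_pred, eps=0.1):
    # Zero loss inside the epsilon tube, linear penalty outside it.
    return np.maximum(0.0, np.abs(y - y_pred) - eps)

# model = SVR(kernel="linear", C=1.0, epsilon=0.1)
# model.fit(X_train, y_train)
# y_pred = model.predict(X_test)
```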
Dual form of SVR – similar to SVM
Recap and Summary
• Regression problem in Machine Learning
• Linear regression, to scalar and to vectors
• Regularization for regression models
• Support Vector Regression