ISML-3: Regression

Dong Gong
University of Adelaide

Slides by Dong Gong and Lingqiao Liu

Outline


• Introduction to regression
• Linear Regression

– Regression to scalar values
– Regression to vectors

• Regularized Regression
– Ridge regression
– Lasso

• Support Vector Regression

What is regression?


• Review
– Types of Machine Learning?

What is regression?

• Supervised learning:
– Training stage: both the input x and the target y are known
– Test stage: predict the unknown y for a given x

• Classification
– y is a discrete variable

• Regression
– y is a continuous variable


Example of Regression Tasks

Learning to predict housing price y from input features x

[Figure: features x → ML method → predicted price y]

Regression vs Classification


Practical workflow

For some methods, we also need to perform normalization.
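As a concrete illustration of that step (a minimal sketch of my own, not from the slides), standardization is a common choice; the scaling statistics must come from the training split only:

```python
import numpy as np

def standardize(X_train, X_test):
    """Scale features to zero mean and unit variance.
    Statistics are computed on the training set only, then reused at test time."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0) + 1e-8  # guard against constant features
    return (X_train - mu) / sigma, (X_test - mu) / sigma
```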

Example of Regression Tasks


Less obvious task: image processing
● Predicting pixel values from observed pixels

Example of Regression Tasks


Less obvious task: crowd counting


Example of Regression Tasks

Less obvious task: pose prediction
● Predicting the joint positions {(cx, cy, cz)} of a skeleton
● Predicting the future from the past


Example of Regression Tasks

Less obvious task: generating sounds (auto-regressive model)
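A toy sketch of the auto-regressive idea (my own illustration, with hypothetical helper names, not the model from the slides): regress the next sample on a window of the k previous samples, then generate by feeding each prediction back in as input:

```python
import numpy as np

def make_ar_dataset(signal, k):
    """Pair each window of k past samples (input) with the next sample (target)."""
    X = np.stack([signal[i:i + k] for i in range(len(signal) - k)])
    y = signal[k:]
    return X, y

def generate(w, seed, steps):
    """Auto-regressive generation: each prediction is fed back as future input."""
    k = len(seed)
    out = list(seed)
    for _ in range(steps):
        out.append(float(np.dot(w, out[-k:])))
    return np.array(out)

# Fit w by linear regression on (X, y), then call generate(w, signal[:k], n).
```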

Regression framework


• Goal: Fitting what from what?
– Fitting the scalar y from input attributes x.

• Model: How to fit? An assumed form of the fitting function f().
– Linear model
– Nonlinear model

• Loss: how to measure the fitting error? Or additional
objectives?
– E.g. Mean squared error (MSE) loss

Linear regression

• Goal:
– Fit the scalar value y

• Model:
– Linear: f(x) = wᵀx + b (the bias b can be absorbed into w by appending a constant 1 to x)

• Loss function (MSE):
– ℓ(w) = (f(x) − y)²

Linear regression

• Loss for N training samples: the sum (or mean) of the squared errors

L(w) = Σᵢ₌₁ᴺ (wᵀxᵢ − yᵢ)²

Linear regression

• Solution:
– Minimize the loss to solve for the optimal w*

• Solving for w:
– The optimum is achieved where the first-order derivative is zero. Writing X for the N × d matrix of stacked inputs and y for the vector of targets:

∇L(w) = 2Xᵀ(Xw − y) = 0  ⇒  w* = (XᵀX)⁻¹Xᵀy
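A minimal numpy sketch of this closed-form solution (function and variable names are mine); solving the linear system is numerically preferable to forming the explicit inverse:

```python
import numpy as np

def fit_linear_regression(X, y):
    """Solve the normal equations (X^T X) w = X^T y."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # absorb the bias term
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)    # last entry of w is the bias
```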

Linear regression

• Extension to vector outputs (multiple scalars):
– Stack the c target values of each sample into an N × c matrix Y

• Solution in a similar format:

W* = (XᵀX)⁻¹XᵀY
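In numpy the same solver covers vector targets, since the solve handles all columns of Y at once (a sketch under the notation above):

```python
import numpy as np

def fit_multi_output(X, Y):
    """Least squares with vector targets: W* = (X^T X)^{-1} X^T Y.
    Y has shape (N, c); the returned W has shape (d, c); predictions are X @ W."""
    return np.linalg.solve(X.T @ X, X.T @ Y)
```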

Classification as Regression

• Vector encoding of the classification target:

y = [1, −1, −1, …, −1]ᵀ  (+1 at the true class, −1 elsewhere)

• Classification as regression:
– Directly fitting the values of the encoded target
Discussion: What is the drawback of this solution?
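A brief sketch of this scheme (helper names are mine): encode the labels as ±1 vectors, fit them with the multi-output least-squares solution above, and predict the class with the largest regressed score.

```python
import numpy as np

def encode_targets(labels, c):
    """Class k -> length-c vector with +1 at position k and -1 elsewhere."""
    Y = -np.ones((len(labels), c))
    Y[np.arange(len(labels)), labels] = 1.0
    return Y

def predict_class(X, W):
    """Predict the class whose regressed score is largest."""
    return np.argmax(X @ W, axis=1)
```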

Classification as Regression

• Limitations:
– Puts unnecessary requirements on the predicted output
– May increase the fitting difficulty and lead to poor training results

• But why is it commonly used in practice?
– Closed-form solution; less storage for low-dimensional data
– Quick updates for incremental learning and distributed learning

Classification as Regression

• Quick update for incremental learning and distributed learning
– Rewrite X into two parts X₁ and X₂ (one per server) in the distributed learning view:

XᵀX = X₁ᵀX₁ + X₂ᵀX₂,  XᵀY = X₁ᵀY₁ + X₂ᵀY₂

– Each server therefore only needs to share its local statistics, as sketched below.
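A minimal sketch of this decomposition (names are mine): each server computes local sufficient statistics, and a coordinator sums them and solves once.

```python
import numpy as np

def local_stats(X_part, Y_part):
    """Each server computes its own X_i^T X_i and X_i^T Y_i."""
    return X_part.T @ X_part, X_part.T @ Y_part

def combine_and_solve(stats):
    """Coordinator: X^T X = sum of X_i^T X_i, X^T Y = sum of X_i^T Y_i."""
    XtX = sum(s[0] for s in stats)
    XtY = sum(s[1] for s in stats)
    return np.linalg.solve(XtX, XtY)
```

Incremental learning works the same way: a new batch of data simply adds its statistics to the running sums.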

Classification as Regression

• Discussion: simple distributed learning
– Consider the case: N = 10⁹, d = 1000, c = 10
– Data are equally distributed over two servers
– Communication is needed between the servers
– Cost of sending the raw data vs sending XᵀX and XᵀY? (Raw data: N × d = 10¹² numbers; the statistics: d² + d·c ≈ 10⁶ numbers.)

• Regression as classification?
– Encoding continuous variables as discrete variables

Regularized linear regression model

• Why regularization?
– Avoid overfitting
– Control the complexity of the model
– Enforce certain properties of the solution
– Handle noise in the data
– Handle redundant features

p-Norm

‖w‖ₚ = (Σᵢ₌₁ᵈ |wᵢ|ᵖ)^(1/p), also called the Lp-norm.

[Figure: unit balls of the Lp-norm for different p, drawn for d = 2]
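For example, numpy computes these norms directly:

```python
import numpy as np

w = np.array([3.0, -4.0])
print(np.linalg.norm(w, ord=1))       # L1 norm: |3| + |-4| = 7
print(np.linalg.norm(w, ord=2))       # L2 norm: sqrt(9 + 16) = 5
print(np.linalg.norm(w, ord=np.inf))  # L-infinity norm: max |w_i| = 4
```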

Ridge regression

• Using the squared L2-norm as the regularizer:

L(w) = Σᵢ₌₁ᴺ (wᵀxᵢ − yᵢ)² + λ‖w‖₂²

• The hyperparameter λ controls the strength of the regularization.
• The regularizer is minimized together with the error term.
• It forces w to be small (as measured by the L2-norm).

Ridge regression

• Using the squared L2-norm as the regularizer.
– In plain English, it measures the sensitivity of the regressor w.r.t. perturbations of the inputs. Minimizing it reflects our expectation of a smooth predictor.

• A probabilistic interpretation of the L2-norm regularizer:
– Interpret all variables, including w, as random variables.
– Assume each element of w is centred around zero (a zero-mean prior); the regularizer then corresponds to the negative log-prior.
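The derivation behind this interpretation is short (standard, though not spelled out here): with Gaussian noise on y and a zero-mean Gaussian prior on w, the maximum a posteriori (MAP) estimate minimizes exactly the ridge objective.

```latex
% y_i = w^T x_i + eps_i,  eps_i ~ N(0, sigma^2),  prior:  w ~ N(0, tau^2 I)
\hat{w}_{\mathrm{MAP}}
  = \arg\max_{w} \; p(w) \prod_{i=1}^{N} p(y_i \mid x_i, w)
  = \arg\min_{w} \sum_{i=1}^{N} (w^\top x_i - y_i)^2 + \lambda \|w\|_2^2,
  \qquad \lambda = \sigma^2 / \tau^2
```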


Ridge regression: Solution


Practice: Derive the solution of ridge regression

Solution:

w* = (XᵀX + λI)⁻¹Xᵀy

Compare with the solution of linear regression, w* = (XᵀX)⁻¹Xᵀy: the only change is the λI term added before the inversion.
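A numpy sketch of this solution (a minimal illustration; lam stands for λ):

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Ridge regression: w* = (X^T X + lam * I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```

In practice the bias term is often excluded from the penalty; this sketch regularizes all weights for brevity.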

Discussion

• An issue with the linear regression solution:

What if XᵀX is not invertible?

• It means there are multiple solutions for w that achieve the minimal loss.

• Adding the regularization term resolves this: XᵀX + λI is positive definite, hence always invertible, for any λ > 0.
• It essentially provides an additional criterion for choosing the optimal solution among the multiple equivalent solutions of the first term.
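A quick numerical check of this point (toy data of my own): a duplicated feature makes XᵀX singular, and the λI term restores invertibility.

```python
import numpy as np

X = np.array([[1.0, 1.0],
              [2.0, 2.0]])                       # second feature duplicates the first
XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))                # 1: singular, no unique solution
print(np.linalg.det(XtX + 0.1 * np.eye(2)) > 0)  # True: regularized matrix is invertible
```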

Lasso

• Lasso replaces the squared L2-norm regularizer with the L1-norm:

L(w) = Σᵢ₌₁ᴺ (wᵀxᵢ − yᵢ)² + λ‖w‖₁

• The L1 norm encourages sparse solutions. This can be useful for understanding the impact of various factors, e.g. performing feature selection.

• Sometimes it leads to improved performance, since it can suppress noisy factors.

• Unfortunately, it does not have a closed-form solution.
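It is instead solved iteratively, e.g. by coordinate descent. A small sketch using scikit-learn's Lasso on made-up data (the alpha value is arbitrary):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=100)  # only 2 informative features

model = Lasso(alpha=0.1).fit(X, y)
print(np.sum(model.coef_ != 0))  # typically ~2: weights of irrelevant features go to 0
```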

Sparse vs Dense solution

• Sparse solution: most parameters are 0.
• Why does the L1 norm enforce sparsity?

• Let's consider an example in 2D:
– ‖Xw − y‖₂² = z is an ellipsoid equation: all w leading to the same loss z form an ellipse in the 2D parameter space.
– The L1 ball {w : ‖w‖₁ ≤ t} is a diamond whose corners lie on the axes, so the smallest ellipse that touches the ball typically meets it at a corner, where one coordinate is exactly zero.

Support vector regression

• Key idea: if the fitting error is already small enough, we do not need to make it even smaller.

• Find the model with a small enough, but not "too small," error.

• This amounts to a different way of measuring the fitting error.

Support vector regression: hard margin

• Key idea: if the fitting error is already small enough, we do not need to make it even smaller.

• Formulation (with error tolerance ε):

min_{w,b} ½‖w‖₂²  s.t.  |yᵢ − (wᵀxᵢ + b)| ≤ ε,  i = 1, …, N

Support vector regression: soft-margin

• Formulation with slack variables ξᵢ, ξᵢ* that allow errors beyond ε at a cost controlled by C:

min_{w,b,ξ,ξ*} ½‖w‖₂² + C Σᵢ₌₁ᴺ (ξᵢ + ξᵢ*)
s.t.  yᵢ − (wᵀxᵢ + b) ≤ ε + ξᵢ,  (wᵀxᵢ + b) − yᵢ ≤ ε + ξᵢ*,  ξᵢ, ξᵢ* ≥ 0

Support vector regression

• Hinge loss view of the soft-margin SVM:

– Decision value f(x) = wᵀx + b: in the binary case, we hope this value is a large positive number for the positive class and a large negative number for the negative class.

– If the decision value is larger than a given value (1 in this case), we do not care whether it could be even larger ("don't care").

– We only penalize the case where the decision value is not sufficiently large (or small).

Support vector regression

• SVR in the hinge loss view:

– Decision value f(x) = wᵀx + b: in regression, we hope this value is close to the ground truth.

– But if the fitting error is smaller than a given value ε, we do not care whether it could be even smaller, or whether it exactly matches the value in the training data.

– We only penalize the case where the fitting error is not sufficiently small: the ε-insensitive loss ℓ(y, f(x)) = max(0, |y − f(x)| − ε).
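A brief usage sketch with scikit-learn's SVR (the data and hyperparameter values are placeholders): epsilon sets the width of the don't-care tube, and C penalizes errors that fall outside it.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + 0.1 * rng.normal(size=200)

# epsilon: errors below this are ignored; C: penalty for larger errors.
model = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X, y)
y_pred = model.predict(X)
```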

Dual form of SVR – similar to SVM

As with the SVM, the SVR problem can be rewritten in a dual form over Lagrange multipliers αᵢ, αᵢ* (one pair per training constraint):

max_{α,α*}  −½ Σᵢ,ⱼ (αᵢ − αᵢ*)(αⱼ − αⱼ*) xᵢᵀxⱼ − ε Σᵢ (αᵢ + αᵢ*) + Σᵢ yᵢ (αᵢ − αᵢ*)
s.t.  Σᵢ (αᵢ − αᵢ*) = 0,  0 ≤ αᵢ, αᵢ* ≤ C

The data enter only through the inner products xᵢᵀxⱼ, so kernels can be substituted exactly as in the SVM.

Recap and Summary

• Regression problem in Machine Learning
• Linear regression, to scalar and to vectors
• Regularization for regression models
• Support Vector Regression
