Logistic Regression
COSC 2673-2793 | Semester 1 2021 (Computational) Machine Learning
Revision: Supervised Learning
The Task is an unknown target function: 𝐲 = 𝑓(𝐱)
• Attributes of the task: 𝐱
• Unknown function: 𝑓(𝐱)
• Output of the function: 𝐲
ML finds a hypothesis, h, which is a function that approximates
the unknown target function.
The hypothesis is often called a model.
h∗(𝐱) ≈ 𝑓(𝐱)
Revision: Supervised Learning
In supervised learning, the output is known: 𝑦 = 𝑓(𝐱)
• Experience: examples of input-output pairs
• Task: learn a model that maps inputs to the desired output, and predict the output for new “unseen” inputs
• Performance: an error measure of how closely the hypothesis predicts the target output
Supervised learning is the most common type of learning task.
Two main types of supervised learning:
• Classification
• Regression
Revision: Regression
A form of Supervised learning
Experience: $\{(\mathbf{x}^{(i)}, y^{(i)})\}_{i=1}^{n}$, with $\mathbf{x} \in \mathbb{R}^m$ and $y \in \mathbb{R}$

Hypothesis space:
• Linear: $h_\theta(\mathbf{x}) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_m x_m = \theta^\top \mathbf{x}$
• Polynomial: $h_\theta(\mathbf{x}) = \theta_0 + \theta_1 x_1 + \theta_2 x_1^2 + \cdots + \theta_i x_m + \theta_j x_m^2 + \cdots$

Loss function: $J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta(\mathbf{x}^{(i)}) - y^{(i)} \right)^2$

Optimization: Gradient descent
Repeat { $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) \quad \forall j$ }
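As a revision aid, a minimal NumPy sketch of this update loop for linear regression (function and variable names are illustrative, not from the slides):

```python
import numpy as np

def gradient_descent_linear(X, y, alpha=0.1, iterations=1000):
    """Batch gradient descent for linear regression.

    X: (n, m) attribute matrix; y: (n,) targets.
    A column of ones is prepended so theta[0] plays the role of theta_0.
    """
    n = X.shape[0]
    Xb = np.hstack([np.ones((n, 1)), X])   # bias column for theta_0
    theta = np.zeros(Xb.shape[1])
    for _ in range(iterations):
        predictions = Xb @ theta           # h_theta(x) = theta^T x
        gradient = (Xb.T @ (predictions - y)) / n
        theta -= alpha * gradient          # theta_j := theta_j - alpha * dJ/dtheta_j
    return theta
```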
Classification
A type of supervised learning
Classification
Instead of outputs being continuous/real-valued, they are now discrete labels or classes
For example:
• Image recognition: cat/dog
• Sentiment recognition: positive/neutral/negative
• Credit card transaction: fraudulent/legitimate
• Tumour: benign/malignant
Binary class: 𝑦 ∈ {0,1} or 𝑦 ∈ {+1, −1}
Multi-class (n-ary): 𝑦 ∈ {0,1,2, … , 𝑘} or 𝑦 ∈ {cat, dog, … }
Binary Classification
In a continuous space, for classification what is:
• The input?
• The output?
[Figure: binary class labels (0 or 1) plotted against a single continuous attribute]
Binary Classification
Consider the univariate, binary-class case.
Threshold the classifier output $h_\theta(\mathbf{x})$ at 0.5:
◦ If $h_\theta(\mathbf{x}) \geq 0.5$, predict $y = 1$
◦ If $h_\theta(\mathbf{x}) < 0.5$, predict $y = 0$
[Figure: regression line fitted to the 0/1 labelled data, thresholded at 0.5]
Binary Classification
Issues with using linear regression for this:
• For binary classification we have $y \in \{0, 1\}$
• But $h_\theta(\mathbf{x})$ can be larger than 1 or less than 0
• The regression line doesn't fit the classification problem very well
Hypothesis Space
For Logistic Regression
Binary Classification
[Figure: sigmoid-shaped curve fitted to the 0/1 labelled data]
$g(z) = \dfrac{1}{1 + e^{-z}}$
Sigmoid Function
Introduce the sigmoid function, also known as the logistic function:
$g(z) = \dfrac{1}{1 + e^{-z}}$
Properties:
• Between 0 and 1
• Continuous and smooth (can take derivatives)
• Approximates a step function
[Figure: plot of $g(z)$ for $z \in [-10, 10]$]
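A minimal Python sketch of the sigmoid, assuming NumPy (the clipping guard is my own addition to avoid overflow; it is not part of the definition above):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^{-z}), applied element-wise."""
    z = np.clip(z, -500, 500)        # guard against overflow in exp()
    return 1.0 / (1.0 + np.exp(-z))

# Behaviour matches the listed properties:
print(sigmoid(-10))   # ~0.0000454 (approaches 0)
print(sigmoid(0))     # 0.5
print(sigmoid(10))    # ~0.9999546 (approaches 1)
```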
Logistic Regression Hypothesis
Goal: perform binary classification based on a linear model of the attributes
$0 \leq h_\theta(\mathbf{x}) \leq 1$, approximating a step function
Logistic Regression:
$h_\theta(\mathbf{x}) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_m x_m) = g(\theta^\top \mathbf{x})$
The function $g$ is the sigmoid function:
$g(z) = \dfrac{1}{1 + e^{-z}}$
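A sketch of this hypothesis in NumPy (names are illustrative): the linear score $\theta^\top \mathbf{x}$ is computed first, then passed through the sigmoid:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, X):
    """Logistic regression hypothesis h_theta(x) = g(theta^T x).

    theta: (m+1,) weights including theta_0; X: (n, m) attributes.
    Returns a value in (0, 1) for every row of X.
    """
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend 1 for theta_0
    return sigmoid(Xb @ theta)
```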
Logistic Regression Hypothesis
Logistic Regression hypothesis space for the univariate case:
$h_\theta(x) = g(\theta_0 + \theta_1 x_1)$, with $g(z) = \dfrac{1}{1 + e^{-z}}$
[Figure: two plots of $h_\theta(x)$ against $x$, showing how the weights shift and scale the sigmoid]
Logistic Regression Hypothesis
That is, we use the sigmoid function to transform our linear hypothesis into the logistic hypothesis. Hence for Logistic Regression:
$h_\theta(\mathbf{x}) = g(\theta^\top \mathbf{x}) = \dfrac{1}{1 + e^{-\theta^\top \mathbf{x}}}$
Importantly, we have $0 \leq h_\theta(\mathbf{x}) \leq 1$ for all inputs. However, what actually is $h_\theta(\mathbf{x})$?
Interpretation of Hypothesis
$h_\theta(\mathbf{x})$ estimates the probability that $y = 1$ for input $\mathbf{x}$
Mathematically: $h_\theta(\mathbf{x}^{(i)}) = P(y^{(i)} = 1 \mid \mathbf{x}^{(i)}, \theta)$
It is NOT the actual value of $y$
For example, if $h_\theta(\mathbf{x}) = 0.8$, it means a house with floor area 0.75 units has an 80% chance of having a price over $1 million.
Notation
Probability Notation:
$P(A = a \mid B = b)$
The probability that random variable $A$ takes value $a$, given that random variable $B$ has taken value $b$.
Interpretation of Hypothesis
To get the probability of $y = 0$, recall:
$P(y = 0 \mid \mathbf{x}, \theta) + P(y = 1 \mid \mathbf{x}, \theta) = 1$
Therefore:
$P(y = 0 \mid \mathbf{x}, \theta) = 1 - P(y = 1 \mid \mathbf{x}, \theta) = 1 - h_\theta(\mathbf{x})$
Decision Boundary
The decision boundary is the point at which the binary classification changes. With $h_\theta(\mathbf{x}) = g(\theta^\top \mathbf{x}) = \dfrac{1}{1 + e^{-\theta^\top \mathbf{x}}}$:
• Predict $y = 1$ if $h_\theta(\mathbf{x}) \geq 0.5$, i.e. $\theta^\top \mathbf{x} \geq 0$
• Predict $y = 0$ if $h_\theta(\mathbf{x}) < 0.5$, i.e. $\theta^\top \mathbf{x} < 0$
Observe that we only need to consult the weights, not the sigmoid!
[Figure: sigmoid $g(z)$ crossing 0.5 at $z = 0$, and the corresponding decision boundary on the univariate data]
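A sketch of this shortcut in NumPy (illustrative names): since $g(\theta^\top \mathbf{x}) \geq 0.5$ exactly when $\theta^\top \mathbf{x} \geq 0$, prediction needs only the sign of the linear score:

```python
import numpy as np

def predict(theta, X):
    """Predict 0/1 labels from the linear score alone.

    theta^T x >= 0  <=>  g(theta^T x) >= 0.5, so the sigmoid is never evaluated.
    """
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend 1 for theta_0
    return (Xb @ theta >= 0).astype(int)
```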
Decision Boundary
[Figure: two-attribute data separated by a linear decision boundary]
$h_\theta(\mathbf{x}) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$
Predict $y = 1$ if $\theta_0 + \theta_1 x_1 + \theta_2 x_2 \geq 0$
Non-linear Decision Boundaries
h"(𝐱) = 𝑔(𝜃, + 𝜃)𝑥) + 𝜃-𝑥- + 𝜃.𝑥)- + 𝜃/𝑥-) Predict𝑦=1if𝜃, +𝜃)𝑥) +𝜃-𝑥- +𝜃.𝑥)- +𝜃/𝑥- ≥0
Loss Function
For Logistic Regression
Revision: Loss Function
The loss function is used to find the best weights: $\theta$
Linear regression: $J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta(\mathbf{x}^{(i)}) - y^{(i)} \right)^2$
This can be written in terms of a per-example loss $L\left(h_\theta(\mathbf{x}^{(i)}), y^{(i)}\right)$:
$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} L\left(h_\theta(\mathbf{x}^{(i)}), y^{(i)}\right)$
Loss Function
This loss function can be written in a more general way:
$J(\theta) = \frac{1}{n} \sum_{i=1}^{n} \mathrm{loss}\left(h_\theta(\mathbf{x}^{(i)}), y^{(i)}\right)$
Squared loss: $\mathrm{loss}\left(h_\theta(\mathbf{x}^{(i)}), y^{(i)}\right) = \left(h_\theta(\mathbf{x}^{(i)}) - y^{(i)}\right)^2$
Logistic Regression: there are several ways to obtain the loss function, but we use the following intuitive approach.
Loss Function
Consider what happens if we substitute the sigmoid function directly into the squared loss:
$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( \frac{1}{1 + e^{-\theta^\top \mathbf{x}^{(i)}}} - y^{(i)} \right)^2$
[Figure: plotted against $\theta$, this $J(\theta)$ is non-convex, whereas the logistic loss introduced next gives a convex $J(\theta)$]
Loss Function
$L\left(h_\theta(\mathbf{x}^{(i)}), y^{(i)}\right) = \begin{cases} -\log\left(h_\theta(\mathbf{x}^{(i)})\right) & \text{if } y^{(i)} = 1 \\ -\log\left(1 - h_\theta(\mathbf{x}^{(i)})\right) & \text{if } y^{(i)} = 0 \end{cases}$
Loss Function
$L\left(h_\theta(\mathbf{x}^{(i)}), y^{(i)}\right) = \begin{cases} -\log\left(h_\theta(\mathbf{x}^{(i)})\right) & \text{if } y^{(i)} = 1 \\ -\log\left(1 - h_\theta(\mathbf{x}^{(i)})\right) & \text{if } y^{(i)} = 0 \end{cases}$
$L\left(h_\theta(\mathbf{x}^{(i)}), y^{(i)}\right) = -y^{(i)} \log\left(h_\theta(\mathbf{x}^{(i)})\right) - \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(\mathbf{x}^{(i)})\right)$
Since $y^{(i)} \in \{0, 1\}$, exactly one of the two terms is non-zero, so this single expression reproduces both cases.
Logistic Regression Loss Function
Now we come full circle, and have a loss function for logistic regression!
$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} L\left(h_\theta(\mathbf{x}^{(i)}), y^{(i)}\right)$
$J(\theta) = -\frac{1}{2n} \sum_{i=1}^{n} \left[ y^{(i)} \log\left(h_\theta(\mathbf{x}^{(i)})\right) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(\mathbf{x}^{(i)})\right) \right]$
Find weights to: $\min_\theta J(\theta)$
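A vectorised NumPy sketch of this loss (the small epsilon clamp is my own numerical safeguard, not part of the formula; it follows the slides' $\frac{1}{2n}$ scaling):

```python
import numpy as np

def logistic_loss(theta, X, y, eps=1e-12):
    """Cross-entropy loss J(theta) for logistic regression."""
    n = X.shape[0]
    Xb = np.hstack([np.ones((n, 1)), X])
    h = 1.0 / (1.0 + np.exp(-(Xb @ theta)))   # h_theta(x) for every example
    h = np.clip(h, eps, 1 - eps)              # keep log() finite
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) / (2 * n)
```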
Prediction with Logistic Regression
To make predictions:
$h_\theta(\mathbf{x}) = \dfrac{1}{1 + e^{-\theta^\top \mathbf{x}}}$
Predict $y = 1$ if $h_\theta(\mathbf{x}) \geq 0.5$
Predict $y = 0$ if $h_\theta(\mathbf{x}) < 0.5$
Gradient Descent
With Logistic Regression
Logistic Regression Gradient Descent
With a smooth loss function and a single minimum, Gradient Descent is the same as before:
$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} L\left(h_\theta(\mathbf{x}^{(i)}), y^{(i)}\right) \qquad \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$
Where the partial derivatives are:
$\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta(\mathbf{x}^{(i)}) - y^{(i)} \right) x_j^{(i)}$
Tutorial: Derive this.
◦ Requires repeated application of the chain rule, and
◦ $\frac{\partial}{\partial \theta_j} h_\theta(\mathbf{x}^{(i)}) = h_\theta(\mathbf{x}^{(i)}) \left(1 - h_\theta(\mathbf{x}^{(i)})\right) x_j^{(i)}$
◦ where $h_\theta(\mathbf{x}^{(i)})$ is the sigmoid function
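Putting the update rule and gradient together, a minimal NumPy sketch (illustrative names; it uses the $\frac{1}{n}$ gradient scaling shown above):

```python
import numpy as np

def gradient_descent_logistic(X, y, alpha=0.1, iterations=1000):
    """Batch gradient descent for logistic regression.

    dJ/dtheta_j = (1/n) * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i)
    """
    n = X.shape[0]
    Xb = np.hstack([np.ones((n, 1)), X])         # bias column for theta_0
    theta = np.zeros(Xb.shape[1])
    for _ in range(iterations):
        h = 1.0 / (1.0 + np.exp(-(Xb @ theta)))  # sigmoid of the linear scores
        gradient = (Xb.T @ (h - y)) / n
        theta -= alpha * gradient                # theta_j := theta_j - alpha * dJ/dtheta_j
    return theta
```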
Other optimisation approaches
For both linear and logistic regression, there are other optimisation methods in addition to gradient descent
Outside the scope of this course, but just to let you know, these include:
• Solving the least squares problem exactly for linear regression
• Conjugate gradient
• BFGS algorithm
Multi-class Classification
Binary vs Multi-class Classification
[Figure: Binary Classification (two classes) and Multi-class Classification (three classes), plotted over attributes $x_1$ and $x_2$]
One-vs-all
𝑥"
Class 1: Class 2: Class 3:
𝑥$
𝑥$
h "# 𝐱 = 𝑃 𝑦 = 𝑗 𝐱 , 𝜃 ∀ 𝑗 = 1 , 2 , 3
𝑥#
𝑥#
𝑥!
𝑥$
𝑥#
One-vs-all
Train a logistic regression classifier $h_\theta^{(j)}(\mathbf{x})$ for each class $j$, to predict the probability that $y = j$.
Given a new input $\mathbf{x}^{(*)}$ with unknown class, predict its class to be:
$\arg\max_j \; h_\theta^{(j)}\left(\mathbf{x}^{(*)}\right)$
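A sketch of one-vs-all prediction in NumPy, assuming one trained weight vector per class (names are illustrative):

```python
import numpy as np

def one_vs_all_predict(thetas, X):
    """Predict classes with one-vs-all logistic regression.

    thetas: (k, m+1) matrix, one weight vector per class j.
    Returns, for each row of X, the class j whose classifier h^(j) is largest.
    """
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    scores = 1.0 / (1.0 + np.exp(-(Xb @ thetas.T)))  # (n, k) probabilities
    return np.argmax(scores, axis=1)                 # argmax_j h^(j)(x)
```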
Issues & Assumptions
[Figure: the Binary and Multi-class Classification examples over attributes $x_1$ and $x_2$, revisited]