Logistic Regression
COSC 2673-2793 | Semester 1 2021 (Computational) Machine Learning
Image: Freepik.com

Revision: Supervised Learning
The Task is an unknown target function: 𝐲 = 𝑓(𝐱)
• Attributes of the task: 𝐱
• Unknown function: 𝑓(𝐱)
• Output of the function: 𝐲
ML finds a hypothesis, h: a function (often called a model) that approximates the unknown target function:
h∗(𝐱) ≈ 𝑓(𝐱)

Revision: Supervised Learning
In supervised learning, the output is known: 𝑦 = 𝑓(𝐱)
• Experience: examples of input–output pairs
• Task: learn a model that maps inputs to the desired outputs, and predict the output for new “unseen” inputs
• Performance: an error measure of how closely the hypothesis predicts the target output
This is the most typical kind of learning task.
Two main types of supervised learning:
• Classification
• Regression

Revision: Regression
A form of Supervised learning
Experience: $\{(\mathbf{x}^{(i)}, y^{(i)})\}_{i=1}^{n}$, where $\mathbf{x} \in \mathbb{R}^m$ and $y \in \mathbb{R}$
Hypothesis space:
• Linear: $h_\theta(\mathbf{x}) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_m x_m = \theta^\top \mathbf{x}$
• Polynomial: $h_\theta(\mathbf{x}) = \theta_0 + \theta_1 x_1 + \theta_2 x_1^2 + \cdots + \theta_i x_m + \theta_j x_m^2 + \cdots$
Loss function: $J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta(\mathbf{x}^{(i)}) - y^{(i)} \right)^2$
Optimization: gradient descent
Repeat { $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$ for all $j$ }
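As a quick revision aid, the update rule above is easy to write out in code. The following is a minimal NumPy sketch of batch gradient descent for linear regression; the function name, learning rate, and synthetic data are my own illustrative assumptions, not course code.

```python
import numpy as np

def gradient_descent_linear(X, y, alpha=0.1, n_iters=2000):
    """Batch gradient descent for linear regression.

    X: (n, m) attribute matrix, y: (n,) targets.
    A column of ones is prepended so theta_0 acts as the intercept.
    """
    n = X.shape[0]
    Xb = np.c_[np.ones(n), X]            # add x_0 = 1 for the bias term
    theta = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        residual = Xb @ theta - y        # h_theta(x^(i)) - y^(i), for all i
        grad = (Xb.T @ residual) / n     # dJ/dtheta_j; the 1/2 cancels the power of 2
        theta -= alpha * grad            # simultaneous update of every theta_j
    return theta

# Illustrative usage on synthetic data generated as y ≈ 2 + 3x plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 2.5, size=(50, 1))
y = 2 + 3 * X[:, 0] + rng.normal(0, 0.1, size=50)
print(gradient_descent_linear(X, y))     # approximately [2, 3]
```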

Classification
A type of supervised learning

Classification
Instead of the outputs being continuous/real-valued, they are now discrete labels or classes.
For example:
• Image recognition: cat/dog
• Sentiment recognition: positive/neutral/negative
• Credit card transaction: fraudulent/legitimate
• Tumour: benign/malignant
Binary class: 𝑦 ∈ {0, 1} or 𝑦 ∈ {+1, −1}
Multi-class (n-ary): 𝑦 ∈ {0, 1, 2, … , 𝑘} or 𝑦 ∈ {cat, dog, … }

Binary Classification
In a continuous space, for classification, what is:
• The input?
• The output?
[Figure: binary-labelled data points (𝑦 = 0 or 1) plotted against a single input 𝑥]

Binary Classification
Consider the univariate, binary-class case.
Threshold the classifier output $h_\theta(\mathbf{x})$ at 0.5:
◦ If $h_\theta(\mathbf{x}) \geq 0.5$, predict $y = 1$
◦ If $h_\theta(\mathbf{x}) < 0.5$, predict $y = 0$
[Figure: a regression line fitted to the binary-labelled data, thresholded at 0.5]

Binary Classification
Issues with using linear regression for this:
For binary classification we have $y \in \{0, 1\}$.
But $h_\theta(\mathbf{x})$ can be larger than 1 or less than 0.
The regression line doesn't fit the classification problem very well.

Hypothesis Space for Logistic Regression

Binary Classification
[Figure: an S-shaped curve fitted to the binary-labelled data, given by $g(z) = \frac{1}{1 + e^{-z}}$]

Sigmoid Function
Introduce the sigmoid function, also known as the logistic function:
$$g(z) = \frac{1}{1 + e^{-z}}$$
Properties:
• Between 0 and 1
• Continuous and smooth: can take derivatives
• Approximates a step function
[Figure: $g(z)$ plotted for $z$ from −10 to 10]

Logistic Regression Hypothesis
Goal: perform binary classification based on a linear model of the attributes, with $0 \leq h_\theta(\mathbf{x}) \leq 1$, approximating a step function.
Logistic Regression:
$$h_\theta(\mathbf{x}) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_m x_m) = g(\theta^\top \mathbf{x})$$
The function $g$ is the sigmoid function:
$$g(z) = \frac{1}{1 + e^{-z}}$$
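The properties listed above are easy to verify numerically. Below is a minimal Python sketch of the sigmoid; the function names and test values are my own illustrative choices, not course code.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z)), applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(z))  # approximately [0.00005, 0.269, 0.5, 0.731, 0.99995], always in (0, 1)

# The derivative has the closed form g'(z) = g(z) * (1 - g(z)); this identity
# is what the tutorial exercise on gradient descent (later slides) relies on.
def sigmoid_grad(z):
    g = sigmoid(z)
    return g * (1 - g)
```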

Logistic Regression Hypothesis
The Logistic Regression hypothesis space for the univariate case:
$$h_\theta(x) = g(\theta_0 + \theta_1 x_1), \qquad g(z) = \frac{1}{1 + e^{-z}}$$
[Figure: two plots of $h_\theta(x)$ against $x$ from −10 to 10, for different values of $\theta$]
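To make the univariate case concrete, here is a small sketch showing how the parameters change the curve; the specific $\theta$ values are my own illustrative assumptions.

```python
import numpy as np

def h_univariate(theta0, theta1, x):
    """Univariate logistic hypothesis h_theta(x) = g(theta0 + theta1 * x)."""
    return 1.0 / (1.0 + np.exp(-(theta0 + theta1 * x)))

x = np.linspace(-10, 10, 9)
# theta1 controls how steep the S-curve is; theta0 shifts where it crosses 0.5.
print(h_univariate(0.0, 1.0, x))   # crosses 0.5 at x = 0
print(h_univariate(-2.0, 2.0, x))  # steeper curve, crosses 0.5 at x = 1
```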

Logistic Regression Hypothesis
That is, we use the sigmoid function to transform our linear hypothesis into the logistic hypothesis. Hence, for Logistic Regression:
$$h_\theta(\mathbf{x}) = g(\theta^\top \mathbf{x}) = \frac{1}{1 + e^{-\theta^\top \mathbf{x}}}$$
Importantly, we have $0 \leq h_\theta(\mathbf{x}) \leq 1$ for all inputs. However, what actually is $h_\theta(\mathbf{x})$?

Interpretation of Hypothesis
$h_\theta(\mathbf{x})$ estimates the probability that $y = 1$ for input $\mathbf{x}$.
Mathematically: $h_\theta(\mathbf{x}^{(i)}) = P(y^{(i)} = 1 \mid \mathbf{x}^{(i)}, \theta)$
It is NOT the actual value of $y$.
For example, if $h_\theta(\mathbf{x}) = 0.8$, it means a house with floor area 0.75 units has an 80% chance of having a price over $1 million.

Notation
Probability Notation:
$$P(A = a \mid B = b)$$
The probability that random variable $A$ takes value $a$, given that random variable $B$ has taken value $b$.

Interpretation of Hypothesis
To get probability of 𝑦 = 0, recall:
$$P(y = 0 \mid \mathbf{x}, \theta) + P(y = 1 \mid \mathbf{x}, \theta) = 1$$
Therefore:
$$P(y = 0 \mid \mathbf{x}, \theta) = 1 - P(y = 1 \mid \mathbf{x}, \theta) = 1 - h_\theta(\mathbf{x})$$
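For instance, continuing the earlier example: if $h_\theta(\mathbf{x}) = 0.8$, then $P(y = 0 \mid \mathbf{x}, \theta) = 1 - 0.8 = 0.2$.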

Decision Boundary
The decision boundary is the point at which the binary classification changes: $h_\theta(\mathbf{x}) = g(\theta^\top \mathbf{x})$
Predict $y = 1$ if $h_\theta(\mathbf{x}) \geq 0.5$, i.e. $\theta^\top \mathbf{x} \geq 0$
Predict $y = 0$ if $h_\theta(\mathbf{x}) < 0.5$, i.e. $\theta^\top \mathbf{x} < 0$
Observe that we only need to consult the weights, not the sigmoid!
[Figure: the sigmoid curve and the thresholded univariate data, with the decision boundary marked]

Decision Boundary
[Figure: two classes of points in the $(x_1, x_2)$ plane separated by a straight line]
$$h_\theta(\mathbf{x}) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$$
Predict $y = 1$ if $\theta_0 + \theta_1 x_1 + \theta_2 x_2 \geq 0$

Non-linear Decision Boundaries
$$h_\theta(\mathbf{x}) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_2^2)$$
Predict $y = 1$ if $\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_2^2 \geq 0$

Loss Function for Logistic Regression

Revision: Loss Function
The loss function is used to find the best weights $\theta$.
Linear regression:
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta(\mathbf{x}^{(i)}) - y^{(i)} \right)^2$$
Written in terms of a per-example loss $L(h_\theta(\mathbf{x}^{(i)}), y^{(i)})$:
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} L\left( h_\theta(\mathbf{x}^{(i)}), y^{(i)} \right)$$

Loss Function
$$J(\theta) = \frac{1}{n} \sum_{i=1}^{n} \mathrm{loss}\left( h_\theta(\mathbf{x}^{(i)}), y^{(i)} \right)$$
Squared loss: $\mathrm{loss}\left( h_\theta(\mathbf{x}^{(i)}), y^{(i)} \right) = \left( h_\theta(\mathbf{x}^{(i)}) - y^{(i)} \right)^2$
Logistic Regression: there are several ways to obtain the loss function, but we use the following intuitive approach.

Loss Function
Consider what happens if we substitute the sigmoid function directly into the squared loss:
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( \frac{1}{1 + e^{-\theta^\top \mathbf{x}^{(i)}}} - y^{(i)} \right)^2$$
[Figure: with this substitution $J(\theta)$ is non-convex; we want a convex $J(\theta)$]

Loss Function
$$L\left( h_\theta(\mathbf{x}^{(i)}), y^{(i)} \right) = \begin{cases} -\log\left( h_\theta(\mathbf{x}^{(i)}) \right) & \text{if } y^{(i)} = 1 \\ -\log\left( 1 - h_\theta(\mathbf{x}^{(i)}) \right) & \text{if } y^{(i)} = 0 \end{cases}$$
This loss function can be written in a more general way:
$$L\left( h_\theta(\mathbf{x}^{(i)}), y^{(i)} \right) = -y^{(i)} \log\left( h_\theta(\mathbf{x}^{(i)}) \right) - \left( 1 - y^{(i)} \right) \log\left( 1 - h_\theta(\mathbf{x}^{(i)}) \right)$$

Logistic Regression Loss Function
Now we come full circle, and have a loss function for logistic regression!
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} L\left( h_\theta(\mathbf{x}^{(i)}), y^{(i)} \right) = \frac{1}{2n} \sum_{i=1}^{n} \left[ -y^{(i)} \log\left( h_\theta(\mathbf{x}^{(i)}) \right) - \left( 1 - y^{(i)} \right) \log\left( 1 - h_\theta(\mathbf{x}^{(i)}) \right) \right]$$
Find weights to: $\min_\theta J(\theta)$

Prediction with Logistic Regression
To make predictions:
$$h_\theta(\mathbf{x}) = \frac{1}{1 + e^{-\theta^\top \mathbf{x}}}$$
Predict $y = 1$ if $h_\theta(\mathbf{x}) \geq 0.5$
Predict $y = 0$ if $h_\theta(\mathbf{x}) < 0.5$

Gradient Descent with Logistic Regression

Logistic Regression Gradient Descent
With a smooth loss function and a single minimum, gradient descent is the same as before:
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} L\left( h_\theta(\mathbf{x}^{(i)}), y^{(i)} \right), \qquad \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$
where the partial derivatives are:
$$\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta(\mathbf{x}^{(i)}) - y^{(i)} \right) x_j^{(i)}$$
Tutorial: derive this.
◦ Requires repeated application of the chain rule, and
◦ $\frac{d}{d\theta} h_\theta(\mathbf{x}^{(i)}) = h_\theta(\mathbf{x}^{(i)}) \left( 1 - h_\theta(\mathbf{x}^{(i)}) \right) \mathbf{x}^{(i)}$,
◦ where $h_\theta(\mathbf{x}^{(i)})$ is the sigmoid function.
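Putting the hypothesis, the loss, its gradient, and the prediction rule together, below is a minimal NumPy sketch of logistic regression trained by batch gradient descent. It follows the formulas above; the function names, learning rate, and synthetic data are my own illustrative assumptions, not course code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(theta, Xb, y):
    """Cross-entropy loss J(theta) (up to the slides' constant factor)."""
    h = sigmoid(Xb @ theta)
    eps = 1e-12  # numerical guard against log(0)
    return -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))

def train_logistic(X, y, alpha=0.5, n_iters=5000):
    """Batch gradient descent for logistic regression.

    X: (n, m) attributes, y: (n,) labels in {0, 1}.
    Uses dJ/dtheta_j = (1/n) * sum_i (h(x^(i)) - y^(i)) * x_j^(i), as above.
    """
    n = X.shape[0]
    Xb = np.c_[np.ones(n), X]                  # prepend x_0 = 1 for theta_0
    theta = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        h = sigmoid(Xb @ theta)                # h_theta(x^(i)) for every example
        theta -= alpha * (Xb.T @ (h - y)) / n  # simultaneous update of all theta_j
    return theta

def predict(theta, X):
    """Predict y = 1 when h_theta(x) >= 0.5, i.e. when theta^T x >= 0."""
    Xb = np.c_[np.ones(X.shape[0]), X]
    return (Xb @ theta >= 0).astype(int)       # the sigmoid itself is not needed

# Illustrative usage: univariate data whose label flips from 0 to 1 near x = 1.25
rng = np.random.default_rng(0)
X = rng.uniform(0, 2.5, size=(100, 1))
y = (X[:, 0] > 1.25).astype(int)
theta = train_logistic(X, y)
print(loss(theta, np.c_[np.ones(100), X], y))    # small after training
print(predict(theta, np.array([[0.5], [2.0]])))  # expect [0, 1]
```

Note that `predict` thresholds $\theta^\top \mathbf{x}$ at 0, which is exactly the "consult the weights, not the sigmoid" observation from the decision-boundary slides.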
Other Optimisation Approaches
For both linear and logistic regression, there are other optimisation methods in addition to gradient descent.
These are outside the scope of this course, but just so you know, they include:
• Solving the least squares exactly for linear regression
• Conjugate gradient
• The BFGS algorithm

Multi-class Classification

Binary vs Multi-class Classification
[Figure: binary classification shows two classes of points in the $(x_1, x_2)$ plane; multi-class classification shows three]

One-vs-all
[Figure: a three-class dataset split into three binary sub-problems (Class 1, Class 2, Class 3), each separating one class from the rest]
$$h_\theta^{(j)}(\mathbf{x}) = P(y = j \mid \mathbf{x}, \theta) \quad \forall j = 1, 2, 3$$

One-vs-all
Train a logistic regression classifier $h_\theta^{(j)}(\mathbf{x})$ for each class $j$, to predict the probability that $y = j$.
Given a new input $\mathbf{x}^{(*)}$ with unknown class, predict its class to be:
$$\arg\max_j \, h_\theta^{(j)}\left( \mathbf{x}^{(*)} \right)$$

Issues & Assumptions
[Figure: the binary and multi-class examples revisited]
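To show how one-vs-all reduces a multi-class problem to binary logistic regression, here is a short sketch built on the `train_logistic` function from the earlier example; all names and the interface are my own illustrative assumptions, not course code.

```python
import numpy as np

def train_one_vs_all(X, y, classes, train_binary):
    """Train one binary classifier per class j, using (y == j) as the labels."""
    return {j: train_binary(X, (y == j).astype(int)) for j in classes}

def predict_one_vs_all(thetas, X):
    """Assign each input to the class whose classifier scores it highest.

    Since the sigmoid is monotonic, comparing theta^T x across classifiers
    gives the same argmax as comparing h_theta(x).
    """
    Xb = np.c_[np.ones(X.shape[0]), X]
    classes = sorted(thetas)
    scores = np.column_stack([Xb @ thetas[j] for j in classes])
    return np.array(classes)[np.argmax(scores, axis=1)]

# Illustrative usage, reusing train_logistic from the gradient-descent sketch:
# thetas = train_one_vs_all(X, y, classes=[0, 1, 2], train_binary=train_logistic)
# labels = predict_one_vs_all(thetas, X_new)
```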