LINEAR CLASSIFICATION
COMP2420/COMP6420 INTRODUCTION TO DATA MANAGEMENT, ANALYSIS AND SECURITY
WEEK 4 – LECTURE 2 Wednesday 16 March 2022
School of Computing
College of Engineering and Computer Science
Credit: (previous course convenor)
HOUSEKEEPING
Lab test (Week 4)
Carried out in Week 4
Available on Wattle over a 3-hour period, 1-4 pm on Wednesday of Week 4 (Wednesday 16 March)
Learning Outcomes
• Describe what linear classification (LC) is
• Formulate the linear classification problem
• Describe the learning and loss functions involved in LC
• Describe the metrics used in evaluating performance
• Apply logistic regression to LC
INTRODUCTION
What is linear classification?
• Using linear models to solve classification tasks
→ Dividing the feature space into a collection of regions labelled according to the target classes, where the decision boundaries between those regions are linear.
Source: https://towardsdatascience.com/linear-classifiers-an-overview-e121135bd3bb
Problem Setup
How can we decide whether a sentiment expressed in a tweet is positive or negative?
Classification as Regression
Can we do this task using what we have learned in previous lectures?
Simple hack: Ignore that the output is categorical!
Suppose we have a binary problem, $y \in \{-1, 1\}$.
Assume the standard model used for regression, $f(\mathbf{X}, \boldsymbol{\beta}) = \mathbf{X}\boldsymbol{\beta}$, and fit it to the $\pm 1$ targets.
Classification as Regression (contd..)
[Figure] One-dimensional example (the input x is 1-dimensional); the colours indicate the labels (a blue plus denotes class -1, a red circle class +1).
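A minimal sketch of this hack in Python (the data and variable names are illustrative, not from the slides): fit ordinary least squares to the $\pm 1$ targets, then classify by the sign of the prediction.

```python
import numpy as np

# Toy 1-D data (illustrative): class -1 clustered near 0, class +1 near 4
x = np.array([0.0, 0.5, 1.0, 3.0, 3.5, 4.0])
t = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])

# Design matrix with a bias column, so f(X, beta) = X @ beta
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares, exactly as in linear regression
beta, *_ = np.linalg.lstsq(X, t, rcond=None)

# Classify by the sign of the real-valued regression output
y_pred = np.sign(X @ beta)
print(beta)    # fitted [intercept, slope]
print(y_pred)  # [-1. -1. -1.  1.  1.  1.]
```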
FORMULATION
Formulation (contd..)
Our classifier has the form $f(\mathbf{X}, \boldsymbol{\beta}) = \mathbf{X}\boldsymbol{\beta}$. A reasonable decision rule is
$$y = \operatorname{sign}(f(\mathbf{X}, \boldsymbol{\beta}))$$
What does this function look like?
Formulation (contd..)
This specifies a linear classifier: it has a linear boundary (a hyperplane) which separates the space into two “half-spaces”.
Example in 1D
In 1D this is simply a threshold
Example in 2D
In 2D this is a line
Example in 3D
In 3D this is a plane
What about higher-dimensional spaces? In $d$ dimensions the boundary is a $(d-1)$-dimensional hyperplane.
LEARNING AND LOSS FUNCTIONS
Learning Linear Classifiers
Learning consists in estimating a “good” decision boundary
What does “good” mean?
We need a criterion that tells us how to select the parameters.
Loss Functions
Classifying using a linear decision boundary effectively reduces the data dimension to 1:
$$y = \operatorname{sign}(f(\mathbf{X}, \boldsymbol{\beta}))$$
What is the cost of being wrong?
Loss function: $L(y, t)$ is the loss incurred for predicting $y$ when the correct answer is $t = \pm 1$.
For medical diagnosis: in a diabetes screening test, is it better to have a test that incorrectly screens you as diabetic, or one that incorrectly tells you that you are not diabetic?
Loss Functions (contd..)
A possible loss function to minimize is the zero/one loss
$$L(y(\mathbf{x}), t) = \begin{cases} 0 & \text{if } y(\mathbf{x}) = t \\ 1 & \text{if } y(\mathbf{x}) \neq t \end{cases}$$
Is this minimization easy to do? Why?
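A quick sketch of computing this loss (the helper name is my own, not from the slides):

```python
import numpy as np

def zero_one_loss(y_pred, t):
    """Total 0/1 loss: the number of misclassified points."""
    return int(np.sum(y_pred != t))

# Example: two of the four predictions are wrong
print(zero_one_loss(np.array([1, -1, 1, 1]), np.array([1, 1, -1, 1])))  # 2
```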
Why is the 0-1 loss hard to optimize?
• It's a combinatorial optimization problem
• Shown to be NP-hard
• The loss function is non-smooth and non-convex
• Small changes in $\boldsymbol{\beta}$ can change the loss abruptly, or not at all
Loss Functions for Classification
• Hinge loss: $\max(0, 1 - t \cdot y)$
• Logistic loss: $\log(1 + e^{-t \cdot y})$
• Many others
• Analysis of these loss functions needs a deeper understanding of matrix algebra and convex optimization, so it won't be covered here; a minimal code sketch of both losses follows below
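A minimal sketch of the two surrogate losses above, vectorised with NumPy (the function names are my own):

```python
import numpy as np

def hinge_loss(y, t):
    """max(0, 1 - t*y): zero once the prediction is on the correct side with margin >= 1."""
    return np.maximum(0.0, 1.0 - t * y)

def logistic_loss(y, t):
    """log(1 + exp(-t*y)): a smooth, convex surrogate for the 0/1 loss."""
    return np.log1p(np.exp(-t * y))

y = np.array([-2.0, -0.5, 0.5, 2.0])  # raw scores f(x, beta)
t = np.ones_like(y)                   # true labels, all +1
print(hinge_loss(y, t))     # [3.  1.5 0.5 0. ]
print(logistic_loss(y, t))  # decreases smoothly as the score grows
```

Unlike the 0/1 loss, both surrogates are convex in the score $y$, which is what makes them practical to minimise.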
Some complex cases
What if movie predictions are used for rankings? Now the predicted ratings don’t matter, just the order that they imply.
In what order does a student prefer ANU, UNSW and University of Sydney?
Possibilities:
• 0-1 loss on the winner
• Permutation distance
• Accuracy of top K choices (see the sketch below)
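As a sketch of the last possibility (a hypothetical helper, not from the slides), top-K accuracy only asks whether the first K items of the two rankings agree as sets:

```python
def top_k_accuracy(ranked_pred, ranked_true, k):
    """Fraction of the true top-k items recovered in the predicted top-k.
    Both lists are sorted best-first. (Illustrative helper only.)"""
    return len(set(ranked_pred[:k]) & set(ranked_true[:k])) / k

# A student's true preference order vs a model's predicted ranking
true_rank = ["ANU", "UNSW", "University of Sydney"]
pred_rank = ["ANU", "University of Sydney", "UNSW"]
print(top_k_accuracy(pred_rank, true_rank, k=1))  # 1.0: the winner is correct
print(top_k_accuracy(pred_rank, true_rank, k=2))  # 0.5: UNSW is missing from the top 2
```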
Can we always separate the classes?
If a linear boundary can separate the classes perfectly, the problem is linearly separable.
Causes of non-perfect separation:
• Model is too simple
• Noise in the inputs (i.e., data attributes)
• Simple features that do not account for all variations
• Errors in data targets (mislabelings)
Should we make the model complex enough to have perfect separation in the training data?
Further reading on noise:
https://www.cse.fau.edu/~xqzhu/papers/AIR.Zhu.2004.Noise.pdf
How do we evaluate how good a classifier is? How is it doing on dog vs no-dog?
Confusion Matrix

             Predicted P   Predicted N
Actual P     TP            FN
Actual N     FP            TN

In the previous example: [the slide shows the corresponding TP/FN/FP/TN counts for the dog vs no-dog task]
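A sketch of producing these counts with scikit-learn (the labels below are made up for illustration, not the slide's example):

```python
from sklearn.metrics import confusion_matrix

# 1 = dog (positive), 0 = no-dog (negative); illustrative labels
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

# Rows are actual classes, columns are predicted classes
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}")  # TP=3, FN=1, FP=1, TN=3
```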
How do we evaluate how good a classifier is?
Recall: fraction of positives that are identified correctly
$$R = \frac{TP}{TP + FN} = \frac{TP}{\text{all ground-truth positives}}$$
Precision: fraction of positive identifications that are actually correct
$$P = \frac{TP}{TP + FP} = \frac{TP}{\text{all predicted positives}}$$
Confusion Matrix (annotated)
Recall uses only the Actual P row (TP and FN); accuracy uses all four cells. [The slide highlights these cells on the previous example's counts.]
F1 score: harmonic mean of precision and recall
$$F_1 = 2\,\frac{P \cdot R}{P + R}$$
Accuracy: fraction of all predictions that are correct
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
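All four metrics follow directly from the confusion-matrix counts; a minimal sketch, continuing the hypothetical counts above:

```python
def classification_metrics(tp, fn, fp, tn):
    """Precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f1, accuracy

print(classification_metrics(tp=3, fn=1, fp=1, tn=3))
# (0.75, 0.75, 0.75, 0.75)
```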
Precision-Recall Curve
$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}$$
Threshold: the chosen cut-off that turns a probability score into one class or the other
Average Precision (AP): the area under the precision-recall curve
• High precision: low false-positive rate
• High recall: low false-negative rate
• What happens when the threshold is set to zero?
Source: https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html
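A sketch in the spirit of the scikit-learn example linked above (the data here is synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve, average_precision_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]  # probability of the positive class

# One (precision, recall) point per candidate threshold
precision, recall, thresholds = precision_recall_curve(y_te, scores)
print("AP =", average_precision_score(y_te, scores))
```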
Metrics vs Loss
• Metrics on a dataset are what we care about (performance)
• We typically cannot directly optimize for the metrics
• Our loss function should reflect the problem we are solving; we then hope it will yield models that do well on our dataset
LOGISTIC REGRESSION
• Used for classification
• Can be binomial, ordinal or multinomial
• Uses a logistic function
• Predicts the probability of belonging to a particular class or category (instead of a real-valued score thresholded by its sign)
Logistic function
It is S-shaped (a sigmoid curve), with the equation:
$$f(t) = \frac{1}{1 + e^{-t}}$$
[Figure: the logistic (sigmoid) curve]
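A one-line sketch of this function in NumPy (for very negative $t$, scipy.special.expit is the numerically robust alternative):

```python
import numpy as np

def logistic(t):
    """The sigmoid f(t) = 1 / (1 + e^(-t)); squashes any real t into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))

print(logistic(np.array([-5.0, 0.0, 5.0])))  # ~[0.0067 0.5 0.9933]
```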
Logistic Curve
We note that $f(t) \in (0, 1)$ for all $t$. If we have
$$t = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$
then
$$P(Y = 1) = f(t) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p)}}$$
Here, $P(Y = 1)$ refers to the probability of the input $X$ belonging to the default class $Y = 1$.
Log odds (logit) function
The odds of $X$ belonging to the class $Y = 1$ are given by:
$$\frac{P(Y = 1)}{1 - P(Y = 1)} = e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p}$$
Taking the log of both sides gives:
$$\log \frac{P(Y = 1)}{1 - P(Y = 1)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$
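Tying this back to code: scikit-learn's LogisticRegression estimates exactly these $\beta$ coefficients, so coef_ and intercept_ live on the log-odds scale (a sketch on synthetic data):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
clf = LogisticRegression().fit(X, y)

# intercept_ is beta_0; coef_ holds beta_1..beta_p (log-odds scale)
print(clf.intercept_, clf.coef_)

# P(Y = 1) for a point, computed two ways
x_new = X[:1]
t = clf.intercept_ + x_new @ clf.coef_.T
print(1 / (1 + np.exp(-t)))             # manual logistic transform
print(clf.predict_proba(x_new)[:, 1])   # sklearn's version (matches)
```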
Live demo next by Mindika