
LINEAR CLASSIFICATION
COMP2420/COMP6420 INTRODUCTION TO DATA MANAGEMENT, ANALYSIS AND SECURITY
WEEK 4 – LECTURE 2 Wednesday 16 March 2022
School of Computing


College of Engineering and Computer Science
Credit: (previous course convenor)

HOUSEKEEPING

Lab test (Week 4)
Carried out in Week 4
Available on Wattle over a 3-hour period on Wednesday of Week 4, 1-4 pm (Wednesday 16 March)

Learning Outcomes
• Describe what Linear Classification (LC) is
• Formulate the linear classification problem
• Describe the learning and loss functions involved in LC
• Apply logistic regression to LC
• Describe the metrics used in evaluating performance

INTRODUCTION

What is linear classification?
• Using linear models to solve classification tasks
→ Dividing the feature space into a collection of regions labelled according to the target classes, where the decision boundaries between those regions are linear.
Source: https://towardsdatascience.com/linear-classifiers-an-overview-e121135bd3bb

Problem Setup
How can we decide whether a sentiment expressed in a tweet is positive or negative?

Classification as Regression
Can we do this task using what we have learned in previous lectures?
Simple hack: ignore that the output is categorical!
Suppose we have a binary problem, $y \in \{-1, 1\}$.
Assuming the standard linear model used for regression:
$f(\mathbf{X}, \boldsymbol{\beta}) = \mathbf{X}\boldsymbol{\beta}$
Classification as Regression (contd..)
One dimensional example (input x is 1-dim)
The colors indicate labels (a blue plus denotes class -1, red circle class 1)
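A minimal numpy sketch of this hack (the toy data below is hypothetical, not from the lecture): fit ordinary least squares to the ±1 targets, then threshold the fitted values at zero.

```python
import numpy as np

# Hypothetical 1-D toy data: targets are -1 or +1
x = np.array([-2.0, -1.5, -1.0, 1.0, 1.5, 2.0])
t = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])

# Treat the +/-1 labels as real numbers and fit ordinary least squares
X = np.column_stack([np.ones_like(x), x])      # design matrix with a bias column
beta, *_ = np.linalg.lstsq(X, t, rcond=None)

# Classify by thresholding the regression output at zero
y_pred = np.sign(X @ beta)
print(y_pred)                                   # [-1. -1. -1.  1.  1.  1.]
```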

FORMULATION

Formulation (contd..)
Our classifier has the form $f(\mathbf{X}, \boldsymbol{\beta}) = \mathbf{X}\boldsymbol{\beta}$. A reasonable decision rule is
$y = \mathrm{sign}(f(\mathbf{X}, \boldsymbol{\beta}))$
What does this function look like?

Formulation (contd..)
This specifies a linear classifier: it has a linear boundary (a hyperplane) which separates the space into two "half-spaces"
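A minimal sketch of this decision rule (assuming the design matrix X already includes a bias column):

```python
import numpy as np

def predict(X, beta):
    """Label each row of X by which side of the hyperplane X @ beta = 0 it lies on."""
    return np.sign(X @ beta)   # +1 in one half-space, -1 in the other
```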

Example in 1D
In 1D this is simply a threshold

Example in 2D
In 2D this is a line

Example in 3D
In 3D this is a plane

What about higher-dimensional spaces? In general, in a $d$-dimensional feature space the decision boundary is a $(d-1)$-dimensional hyperplane.

LEARNING AND LOSS FUNCTIONS

Learning Linear Classifiers
Learning consists in estimating a “good” decision boundary
What does “good” mean?
We need a criterion that tells us how to select the parameters.

Loss Functions
Classifying using a linear decision boundary reduces the data dimension to 1:
$y = \mathrm{sign}(f(\mathbf{X}, \boldsymbol{\beta}))$
What is the cost of being wrong?
Loss function: $L(y, t)$ is the loss incurred for predicting $y$ when the correct answer is $t = \pm 1$

For medical diagnosis: for a diabetes screening test, is it better to have a test that incorrectly screens you as diabetic, or one that incorrectly tells you that you are not diabetic?

Loss Functions (contd..)
A possible loss function to minimize is the zero/one loss:
$L(y(\mathbf{x}), t) = \begin{cases} 0 & \text{if } y(\mathbf{x}) = t \\ 1 & \text{if } y(\mathbf{x}) \neq t \end{cases}$
Is this minimization easy to do? Why?
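Note that *computing* the averaged 0-1 loss is trivial; it is *minimizing* it over $\boldsymbol{\beta}$ that is hard. A minimal sketch, with hypothetical arrays:

```python
import numpy as np

def zero_one_loss(y_pred, t):
    """Average 0-1 loss: the fraction of predictions that disagree with the targets."""
    return np.mean(y_pred != t)

print(zero_one_loss(np.array([1, -1, 1, 1]), np.array([1, 1, 1, -1])))  # 0.5
```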

Why is the 0-1 loss hard to optimize?
• It's a combinatorial optimization problem
• Shown to be NP-hard
• The loss function is non-smooth and non-convex
• Small changes in $\boldsymbol{\beta}$ can change the loss abruptly

Loss Functions for Classification
• Hinge loss: $\max(0, 1 - t \cdot y)$
• Logistic loss: $\log(1 + e^{-t \cdot y})$ (sketches of both follow below)
• Many others
• Analysis of these loss functions needs a deeper understanding of matrix algebra and convex optimization, so it won't be covered here
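Minimal sketches of the two surrogate losses above (the scores and targets are hypothetical):

```python
import numpy as np

def hinge_loss(y, t):
    """Hinge loss max(0, 1 - t*y): zero only when the score y is on the correct side with margin >= 1."""
    return np.maximum(0.0, 1.0 - t * y)

def logistic_loss(y, t):
    """Logistic loss log(1 + exp(-t*y)): a smooth, convex surrogate for the 0-1 loss."""
    return np.log1p(np.exp(-t * y))

scores  = np.array([2.0, 0.5, -1.0])   # raw classifier outputs
targets = np.array([1.0, 1.0, 1.0])
print(hinge_loss(scores, targets))     # [0.  0.5 2. ]
print(logistic_loss(scores, targets))  # approx [0.127 0.474 1.313]
```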

Some complex cases
What if movie predictions are used for rankings? Now the predicted ratings don’t matter, just the order that they imply.
In what order does a student prefer ANU, UNSW and University of Sydney?
Possibilities:
• 0-1 loss on the winner
• Permutation distance
• Accuracy of top-K choices (a sketch follows below)
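A minimal sketch of the last option, top-K accuracy (the preference orders below are hypothetical, reusing the university example):

```python
def top_k_accuracy(predicted_order, true_order, k):
    """Fraction of the true top-k items that also appear in the predicted top-k."""
    return len(set(predicted_order[:k]) & set(true_order[:k])) / k

print(top_k_accuracy(["ANU", "UNSW", "USyd"], ["ANU", "USyd", "UNSW"], k=2))  # 0.5
```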

Can we always separate the classes?
If we can separate the classes with a linear decision boundary, the problem is linearly separable

Causes of non-perfect separation:
• Model is too simple
• Noise in the inputs (i.e., data attributes)
• Simple features that do not account for all variations
• Errors in data targets (mislabelings)
Should we make the model complex enough to have perfect separation in the training data?
Further reading on noise:
https://www.cse.fau.edu/~xqzhu/papers/AIR.Zhu.2004.Noise.pdf

How to evaluate how good my classifier is? How is it doing on dog vs no-dog?

Confusion Matrix

             Predicted P   Predicted N
Actual P     TP            FN
Actual N     FP            TN

In the previous example: [the same table filled with the dog vs no-dog counts, shown as a figure on the slide]

How to evaluate how good my classifier is?
Recall: fraction of positives that are identified correctly
$R = \frac{TP}{TP + FN} = \frac{TP}{\text{all ground-truth positive instances}}$
Precision: fraction of positive identifications that are actually correct
$P = \frac{TP}{TP + FP} = \frac{TP}{\text{all predicted positives}}$
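A minimal sketch computing both metrics from confusion-matrix counts (the counts below are hypothetical):

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

print(precision_recall(tp=8, fp=2, fn=4))  # (0.8, 0.666...)
```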

Confusion Matrix

             Predicted P   Predicted N
Actual P     TP            FN    <- this row (TP + FN) is used for recall
Actual N     FP            TN

All four cells (TP + TN + FP + FN) are used for accuracy.

In the previous example: [the same table with the dog vs no-dog counts, shown as a figure on the slide]

F1 score: harmonic mean of precision and recall
$F_1 = 2 \cdot \frac{P \cdot R}{P + R}$
Accuracy: fraction of all identifications that are actually correct
$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
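As a sketch, all of these metrics can be computed with scikit-learn (the labels below are hypothetical):

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             f1_score, precision_score, recall_score)

# Hypothetical ground truth and predictions
t      = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

print(confusion_matrix(t, y_pred))   # rows = actual, columns = predicted
print(precision_score(t, y_pred))    # 0.75
print(recall_score(t, y_pred))       # 0.75
print(f1_score(t, y_pred))           # 0.75
print(accuracy_score(t, y_pred))     # 0.75
```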

Precision Recall Curve
$P = \frac{TP}{TP + FP}$, $R = \frac{TP}{TP + FN}$
Threshold: the chosen cut-off that turns a probability score into one class or the other
Average Precision (AP): the area under the precision-recall curve

Precision-Recall Curve
• High P: low FP rate
• High R: low FN rate
• What happens when the threshold is set to zero?
https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html
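A minimal sketch of computing the PR curve and AP with scikit-learn (the targets and scores below are hypothetical):

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

# Hypothetical targets and predicted probabilities
t      = np.array([0, 0, 1, 1, 0, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.5, 0.7, 0.3, 0.9])

# One (precision, recall) pair per candidate threshold
precision, recall, thresholds = precision_recall_curve(t, scores)

# Average precision: a step-wise approximation of the area under the PR curve
print(average_precision_score(t, scores))
```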

Metrics vs Loss
• Metrics on a dataset are what we care about (performance)
• We typically cannot directly optimize for the metrics
• Our loss function should reflect the problem we are solving. We then hope it will yield models that will do well on our dataset

LOGISTIC REGRESSION

• Used for classification
• Can be binomial, ordinal or multinomial
• Uses a logistic function
• Predicts the probability of belonging to a particular class or category (instead of predicting a real value and taking its sign)

Logistic function
Is S-shaped (a sigmoid curve), with the equation:
$f(t) = \frac{1}{1 + e^{-t}}$
[Figure: the logistic (sigmoid) curve]
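A minimal numpy sketch of this function:

```python
import numpy as np

def sigmoid(t):
    """Logistic function 1 / (1 + exp(-t)): maps any real t into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))

print(sigmoid(np.array([-4.0, 0.0, 4.0])))  # approx [0.018, 0.5, 0.982]
```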

We note that $f(t) \in (0, 1)$ for all $t$. If we have
$t = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$
then
$P(Y = 1) = f(t) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)}}$
Here, $P(Y = 1)$ refers to the probability of the input $X$ belonging to the default class $Y = 1$.
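As a sketch, scikit-learn's LogisticRegression estimates the $\beta$ coefficients and returns $P(Y = 1)$ via predict_proba (the 1-D toy data below is hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical 1-D training data
X = np.array([[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)

# predict_proba returns [P(Y=0), P(Y=1)] per row; column 1 is P(Y=1)
print(model.predict_proba([[0.2]])[:, 1])
print(model.intercept_, model.coef_)  # fitted beta_0 and beta_1
```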

Log odds (logit) function
The odds of $X$ belonging to the class $Y = 1$ are given by:
$\frac{P(Y = 1)}{1 - P(Y = 1)} = e^{\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n}$
Taking the log of both sides gives the log odds (logit), which is linear in the features:
$\log \frac{P(Y = 1)}{1 - P(Y = 1)} = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$
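A minimal numeric check that the logit inverts the sigmoid, recovering the linear score $t$:

```python
import numpy as np

def log_odds(p):
    """Logit log(p / (1 - p)): the inverse of the sigmoid."""
    return np.log(p / (1.0 - p))

# The logit of sigmoid(t) recovers t = beta_0 + beta_1*x_1 + ...
t = np.array([-2.0, 0.0, 2.0])
p = 1.0 / (1.0 + np.exp(-t))
print(np.allclose(log_odds(p), t))  # True
```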

Live demo next by Mindika
