Lecture 9: Classification for Gaussians
CS 189 (CDSS offering)
2022/02/07
Today’s lecture
• So far, we have talked a lot about regression, in particular linear regression
• Today marks our transition into classification, which will make up the bulk of the rest of the course
• We will come up with a probabilistic model for classification
• We will also see how the assumption of MVG class conditionals leads to the same probabilistic model
Classification
• In classification, we are given a dataset 𝒟 = {(x1, y1), …, (xN, yN)}
• Just like before, but now each yi ∈ {0, …, K − 1} is a discrete label
• Our goal is still to learn a model that predicts outputs given inputs: fθ(x) = y
• Just like before, we are mostly going to work with probabilistic models
• Let’s first focus on binary classification
A probabilistic model for binary classification
• In binary classification, all of the labels yi ∈ {0, 1}
• But our linear model from before, fθ(x) = θᵀx, outputs an unconstrained real number!
• Maybe we could have this output represent a probability somehow?
• How do we make our model output fθ(x) define a probability distribution over y?
• First, let’s just focus on outputting pθ(y = 1 | x)
• Then we know pθ(y = 0 | x) = 1 − pθ(y = 1 | x)
• How do we make our model output a number between 0 and 1?
The logistic function
• fθ(x) = θᵀx is some arbitrary real number; how do we turn it into a probability?
• The most common way is to use the logistic function (also called sigmoid, expit)
sigmoid(z) = 1 / (1 + exp(−z)) = exp(z) / (exp(z) + 1)
• So, pθ(y = 1 | x) = sigmoid(fθ(x))
• And pθ(y = 0 | x) = 1 − pθ(y = 1 | x)
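For concreteness, here is a minimal NumPy sketch (not from the original slides) of turning the linear score θᵀx into class probabilities with the sigmoid; the parameter vector and input below are made-up examples.

import numpy as np

def sigmoid(z):
    # logistic function: 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical parameters and input, just to show the shapes involved
theta = np.array([2.0, -1.0])
x = np.array([0.5, 0.3])

p1 = sigmoid(theta @ x)   # p_theta(y = 1 | x)
p0 = 1.0 - p1             # p_theta(y = 0 | x)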
The MLE of θ
given 𝒟 = {(x1, y1), …, (xN, yN)}, where each yi ∈ {0, 1}
θ* = argmaxθ ∏i pθ(yi | xi) = argmaxθ ∑i log pθ(yi | xi)
   = argmaxθ ∑i [ (1 − yi) log pθ(y = 0 | xi) + yi log pθ(y = 1 | xi) ]
   = argmaxθ ∑i [ yi θᵀxi − log(exp(θᵀxi) + 1) ]
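As a hedged illustration of the objective being maximized above, here is a small NumPy sketch of the log-likelihood in the form ∑i [ yi θᵀxi − log(exp(θᵀxi) + 1) ]; the arrays X (one example per row) and y are hypothetical.

import numpy as np

def log_likelihood(theta, X, y):
    # sum_i [ y_i * theta^T x_i - log(exp(theta^T x_i) + 1) ], as in the derivation above
    z = X @ theta                                  # theta^T x_i for every example
    return np.sum(y * z - np.logaddexp(0.0, z))    # logaddexp(0, z) = log(1 + exp(z)), computed stably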
Logistic regression
• This classification setup is known as logistic regression (yes, confusing…)
• Unlike linear regression, there is no analytical closed form solution
• Instead, we must rely on iterative optimization — specifically, we resort to gradient-based optimization (a short sketch follows after this slide)
• We will talk more about logistic regression next lecture
• For the rest of today, let’s go back and try to understand what we’ve done from a
different perspective: the generative perspective
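Since there is no closed-form solution, here is a minimal gradient-ascent sketch, assuming the standard logistic regression gradient ∑i (yi − sigmoid(θᵀxi)) xi (derived next lecture, not on these slides); the learning rate and step count are arbitrary placeholders.

import numpy as np

def fit_logistic_regression(X, y, lr=0.1, num_steps=1000):
    # plain gradient ascent on the log-likelihood; the gradient is X^T (y - sigmoid(X theta))
    theta = np.zeros(X.shape[1])
    for _ in range(num_steps):
        p1 = 1.0 / (1.0 + np.exp(-(X @ theta)))   # p_theta(y = 1 | x_i) for each row of X
        theta += lr * (X.T @ (y - p1))
    return theta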
Approaches to classification
• There are two probabilistic approaches to classification problems
• Discriminative: directly learn a model of pθ(y | x), forget about p(x)
• Generative: model the class conditionals pθ(x | y) and class priors pθ(y), use Bayes’ rule to compute pθ(y | x)
• There are also non-probabilistic approaches, e.g., approaches that reason directly about the decision boundary (we’ll see examples later in the semester)
• Let’s take a look at a simple generative approach to binary classification:
linear discriminant analysis (LDA)
Linear discriminant analysis
suppose pθ(y = 0) = pθ(y = 1) = 0.5, and let’s model the class conditionals as MVGs: pθ(x | y = 0) = N(x; μ0, Σ) and pθ(x | y = 1) = N(x; μ1, Σ)
by Bayes’ rule, pθ(y = 1 | x) = N(x; μ1, Σ) · 0.5 / (N(x; μ0, Σ) · 0.5 + N(x; μ1, Σ) · 0.5)
predict y = 1 when this posterior exceeds 0.5, i.e., when N(x; μ1, Σ) > N(x; μ0, Σ)
Linear discriminant analysis
so, predict y = 1 when log N(x; μ1, Σ) − log N(x; μ0, Σ) > 0
−(x − μ1)ᵀ Σ⁻¹ (x − μ1) + (x − μ0)ᵀ Σ⁻¹ (x − μ0) > 0
expanding both quadratics, the xᵀ Σ⁻¹ x terms cancel, leaving
2 (μ1 − μ0)ᵀ Σ⁻¹ x + μ0ᵀ Σ⁻¹ μ0 − μ1ᵀ Σ⁻¹ μ1 > 0
i.e., wᵀx + b > 0, where w = Σ⁻¹(μ1 − μ0) and b = ½ (μ0ᵀ Σ⁻¹ μ0 − μ1ᵀ Σ⁻¹ μ1), so the decision rule is linear in x
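To make the LDA rule concrete, here is a minimal sketch, assuming equal class priors as above: estimate the class means and shared covariance by maximum likelihood, then classify with wᵀx + b > 0. The names (fit_lda, predict_lda) are illustrative, not a standard API.

import numpy as np

def fit_lda(X, y):
    # maximum likelihood estimates of the two class means and the shared covariance
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    centered = np.vstack([X0 - mu0, X1 - mu1])
    Sigma = centered.T @ centered / X.shape[0]
    Sigma_inv = np.linalg.inv(Sigma)
    # linear rule w^T x + b > 0 from the derivation above
    w = Sigma_inv @ (mu1 - mu0)
    b = 0.5 * (mu0 @ Sigma_inv @ mu0 - mu1 @ Sigma_inv @ mu1)
    return w, b

def predict_lda(X, w, b):
    return (X @ w + b > 0).astype(int)   # predict class 1 when the rule is positive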
Modifications to the generative model
• There are a number of modifications we can make to the LDA model
• For example, what if the class conditionals have different covariances? Σ0 ≠ Σ1
• Then, we get quadratic discriminant analysis, because the classification rule is no longer linear in x but instead quadratic in x
• Or, what if pθ(y = 0) ≠ pθ(y = 1), rather than both being 0.5?
• This results in a rather simple change to the classification rule — left as an exercise
• Lastly, what if we can assume diagonal Σ? This is the naïve Bayes model/assumption
• Then the classification rule becomes even simpler to compute!
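As a small illustration of why a diagonal Σ simplifies things, here is a sketch of the per-class log-density under the naïve Bayes assumption: it is just a sum of one-dimensional Gaussian terms, so no matrix inverse is needed. The function name and variables are made up for illustration.

import numpy as np

def diag_gaussian_log_density(x, mu, var):
    # with diagonal Sigma, log N(x; mu, Sigma) is a sum of per-dimension 1-D Gaussian terms
    return np.sum(-0.5 * np.log(2.0 * np.pi * var) - 0.5 * (x - mu) ** 2 / var)

# with equal priors, predict y = 1 when
# diag_gaussian_log_density(x, mu1, var1) > diag_gaussian_log_density(x, mu0, var0)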
Generative vs. discriminative classification
• Generative classifiers have some advantages over their discriminative counterparts
• They have an explicit notion of pθ(x) — this can be useful for detecting outliers or data anomalies
• Sometimes, they can be more effective when training data is limited
• Nevertheless, discriminative classifiers are the standard way to go — why?
• Long story short, they work better — especially neural networks (next week)
• Especially when working with complex x, modeling pθ(x | y) is hard