Lecture 12:
Support-vector machines (1) CS 189 (CDSS offering)
2022/02/14
Today’s lecture
• Today, we continue on our journey of linear classifiers
• What have we seen so far? LDA, logistic regression, and perceptrons
• This week, we will dive deep into one of the most classic machine learning models for classification — the support-vector machine (SVM)
• We are taking a break from probability, probabilistic models, and MLE/MAP
• Instead, SVMs approach the linear classification problem geometrically
• Today, we will cover the most basic formulation of the SVM, known as the hard margin SVM
Recap: the perceptron algorithm (the simplest version)
• The perceptron is best understood as repeating the following update to θ
• Pick a (x_i, y_i) — if sign(θᵀx_i) = y_i, move on; otherwise, θ ← θ + y_i x_i
• We repeat this until sign(θᵀx_i) = y_i for all i (here, we have y_i ∈ {−1, 1}); a short code sketch of this loop follows this list
• For linearly separable datasets, this is guaranteed to eventually happen!
• If the data are not linearly separable, this algorithm will never terminate
• This is one of the earliest examples of a learning algorithm, and it has some interesting similarities to the (stochastic) gradient update for logistic regression
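To make the update above concrete, here is a minimal NumPy sketch of the perceptron loop; it is an illustration rather than lecture code, and the epoch cap (max_epochs) is an added safeguard since the loop never terminates on non-separable data.

```python
import numpy as np

def perceptron(X, y, max_epochs=1000):
    """Repeat the perceptron update until every point satisfies sign(theta^T x_i) = y_i.

    X: (N, d) array of inputs; y: (N,) array of labels in {-1, +1}.
    """
    N, d = X.shape
    theta = np.zeros(d)
    for _ in range(max_epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            if np.sign(theta @ x_i) != y_i:   # misclassified (points exactly on the boundary count too)
                theta = theta + y_i * x_i     # the perceptron update
                mistakes += 1
        if mistakes == 0:                     # sign(theta^T x_i) = y_i for all i: done
            return theta
    return theta  # reached the cap; the data may not be linearly separable
```

For linearly separable data this returns a separating theta after finitely many updates, matching the guarantee mentioned above.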
What does the perceptron learn?
In general, there are no guarantees about which decision boundary will be learned, but it does seem like some decision boundaries are better than others…
What is a good decision boundary?
• Logistic regression answered this question from a probabilistic viewpoint
• SVMs instead answer this question from a geometric viewpoint
• SVMs consider the distance from the boundary to the closest points
• This distance (or sometimes twice it) is called the margin
• SVMs say: find the decision boundary that maximizes the margin!
Visualizing the max-margin linear classifier
We suspect that the max-margin linear classifier will generalize better to new points than other linear classifiers that are also perfect on the training data
The SVM problem setup
• How do we find the max-margin linear classifier?
• Remember that our decision boundary is defined as wᵀx + b = 0
• We’re changing the notation compared to last time (θᵀx = 0), and we’re also explicitly pulling out the bias term b — this will be important for SVMs
• We are given {(x_1, y_1), …, (x_N, y_N)}, where all y_i ∈ {−1, 1}
• As with the perceptron, our negative examples are now labeled −1, not 0 (a tiny prediction check in code follows this list)
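As a quick sanity check on the ±1 labeling convention, the snippet below (illustrative numbers, not from the lecture) shows that a point is classified correctly exactly when y_i(wᵀx_i + b) > 0, whichever side of the boundary it lies on; this product will reappear in the SVM constraints.

```python
import numpy as np

# an arbitrary 2D hyperplane w^T x + b = 0 (numbers chosen for illustration)
w = np.array([2.0, -1.0])
b = 0.5

X = np.array([[1.0, 1.0],    # w @ x + b =  1.5 -> predicted +1
              [-1.0, 1.0]])  # w @ x + b = -2.5 -> predicted -1
y = np.array([1.0, -1.0])

predictions = np.sign(X @ w + b)
correct = y * (X @ w + b) > 0   # True exactly when the prediction matches the +/-1 label
print(predictions, correct)     # [ 1. -1.] [ True  True]
```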
Calculating the margin — setting up
• How do we calculate the margin for some setting of our parameters w and b?
• First, let’s consider calculating the distance from some point x to the hyperplane
• This distance is the length of the vector from x to the projection of x onto the hyperplane — call this projection x_p, and let d = x − x_p
• We know that d, like w, is orthogonal to the hyperplane, so d = αw for some α
• We also know that wᵀx_p + b = 0, since x_p is on the hyperplane
Calculating the margin — visualizing
Calculating the margin — solving
0 = wᵀx_p + b = wᵀ(x − d) + b = wᵀx − αwᵀw + b, so α = (wᵀx + b) / ‖w‖², and the distance is ‖d‖ = |α| ‖w‖ = |wᵀx + b| / ‖w‖
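A quick numeric check of this distance formula (the numbers are made up for illustration):

```python
import numpy as np

w = np.array([3.0, 4.0])   # ||w|| = 5
b = -5.0
x = np.array([3.0, 1.0])   # w @ x + b = 9 + 4 - 5 = 8

alpha = (w @ x + b) / (w @ w)          # from w^T (x - alpha * w) + b = 0
x_p = x - alpha * w                    # projection of x onto the hyperplane
assert np.isclose(w @ x_p + b, 0.0)    # x_p lies on the hyperplane

distance = np.abs(w @ x + b) / np.linalg.norm(w)
assert np.isclose(distance, np.linalg.norm(x - x_p))   # equals ||d|| = ||x - x_p||
print(distance)   # 1.6
```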
Formulating the optimization — objective
arg max_{w,b} margin(w, b)   s.t. all points classified correctly

Notice: if I multiply w and b by a constant c > 0, the separating hyperplane and the margin don't change; however, min_i |wᵀx_i + b| does change, so I can set it to whatever I want: let's set it to 1.

Since the distance from x_i to the hyperplane is |wᵀx_i + b| / ‖w‖, the margin is min_i |wᵀx_i + b| / ‖w‖, so with min_i |wᵀx_i + b| = 1 the margin becomes 1 / ‖w‖
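This rescaling argument is easy to verify numerically; the sketch below uses made-up points and shows that multiplying w and b by c > 0 leaves the margin unchanged while min_i |wᵀx_i + b| scales by c.

```python
import numpy as np

w = np.array([1.0, 2.0])
b = -1.0
X = np.array([[2.0, 1.0], [0.0, -1.0], [1.0, 2.0]])   # illustrative points

def margin(w, b, X):
    # distance from the hyperplane w^T x + b = 0 to the closest point
    return np.min(np.abs(X @ w + b)) / np.linalg.norm(w)

c = 10.0
print(margin(w, b, X), margin(c * w, c * b, X))   # identical margins
print(np.min(np.abs(X @ w + b)),                  # min_i |w^T x_i + b| ...
      np.min(np.abs(X @ (c * w) + c * b)))        # ... is scaled by c

# choosing c = 1 / min_i |w^T x_i + b| makes that minimum equal to 1,
# and then the margin is exactly 1 / ||c * w||
```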
Formulating the optimization — constraints
"All points classified correctly" means wᵀx_i + b > 0 if y_i = 1, and wᵀx_i + b < 0 if y_i = −1; but since we set min_i |wᵀx_i + b| = 1, this becomes y_i(wᵀx_i + b) ≥ 1 for all i

So I can turn

arg max_{w,b} 1 / ‖w‖   s.t. y_i(wᵀx_i + b) ≥ 1 for all i

into

arg min_{w,b} ½ ‖w‖²   s.t. y_i(wᵀx_i + b) ≥ 1 for all i
The SVM optimization problem
• So our final optimization problem for SVMs is: arg min_{w,b} ½ ‖w‖²  s.t. y_i(wᵀx_i + b) ≥ 1 for all i (a small solver sketch appears after these bullets)
• This is an example of a quadratic program (QP) — though we won’t talk about it,
efficient iterative optimization algorithms exist for solving these types of problems
• The solution to this QP is the max-margin linear classifier for our dataset, but this only exists if our data are linearly separable!
• What if it’s not? More on that next time…
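To close the loop, here is a sketch of solving the hard-margin QP above directly on a toy dataset. It assumes the cvxpy package, which is not part of the lecture; in practice you would usually reach for a dedicated SVM solver, but a generic QP solver makes the formulation transparent.

```python
import numpy as np
import cvxpy as cp

# a tiny linearly separable toy dataset with labels in {-1, +1}
X = np.array([[2.0, 2.0], [2.0, 3.0], [0.0, 0.0], [-1.0, 0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])

N, d = X.shape
w = cp.Variable(d)
b = cp.Variable()

# arg min (1/2) ||w||^2   s.t.   y_i (w^T x_i + b) >= 1 for all i
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print(w.value, b.value)               # the max-margin parameters
print(1.0 / np.linalg.norm(w.value))  # the margin, 1 / ||w||
```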