Introduction to Information Retrieval
SUPPORT VECTOR MACHINE
Mainly based on
https://nlp.stanford.edu/IR-book/pdf/15svm.pdf
Overview
§ SVM is a huge topic
§ Integration of MMDS, IIR, and Andrew Moore’s slides here
§ Our foci:
§ Geometric intuition → primal form
§ Alternative interpretation from Empirical Risk Minimization point
of view.
§ Understand the final solution (in the dual form)
§ Generalizes well to Kernel functions
§ SVM + Kernels
Linear classifiers: Which Hyperplane?
§ Lots of possible solutions for a, b, c.
§ Some methods find a separating
hyperplane, but not the optimal one
[according to some criterion of expected
goodness]
§ E.g., perceptron
§ Support Vector Machine (SVM) finds an
optimal* solution.
§ Maximizes the distance between the
hyperplane and the “difficult points”
close to decision boundary
§ One intuition: if there are no points near
the decision surface, then there are no
very uncertain classification decisions
This line represents
the decision
boundary:
ax + by − c = 0
Ch. 15
wTxi + b = 0
Another intuition
§ If you have to place a fat separator between classes,
you have fewer choices, and so the capacity of the
model has been decreased
Sec. 15.1
Support Vector Machine (SVM)
[Figure: a maximum-margin separator with its support vectors]
§ SVMs maximize the margin around
the separating hyperplane.
§ A.k.a. large margin classifiers
§ The decision function is fully
specified by a subset of training
samples, the support vectors.
§ Solving SVMs is a quadratic
programming problem
§ Seen by many as the most
successful current text
classification method*
*but other discriminative methods
often perform very similarly
Sec. 15.1
Maximum Margin: Formalization
§ w: decision hyperplane normal vector
§ xi: data point i
§ yi: class of data point i (+1 or −1)
§ Classifier is: f(xi) = sign(wTxi + b)   [NB: not 1/0]
§ Functional margin of xi is: yi (wTxi + b)
§ But note that we can increase this margin simply by scaling w, b…   [NB: a common trick]
§ Functional margin of dataset is twice the minimum functional margin for any point
§ The factor of 2 comes from measuring the whole width of the margin
Sec. 15.1
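The functional margin and its scaling behavior can be sketched numerically; the data points and (w, b) below are made up for illustration:

```python
# A small numpy sketch (toy values) of the functional margin y_i (wᵀx_i + b),
# and why it is not scale-invariant: doubling (w, b) doubles every margin.
import numpy as np

X = np.array([[2.0, 1.0], [0.0, 3.0], [-1.0, -2.0]])
y = np.array([1, 1, -1])
w, b = np.array([1.0, 1.0]), -1.0

margins = y * (X @ w + b)              # functional margin of each point
print(margins)                         # [2. 2. 4.]

scaled = y * (X @ (2 * w) + 2 * b)     # scaling (w, b) by 2 doubles them
assert np.allclose(scaled, 2 * margins)
```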
[Figure: positive (+) and negative (–) training points; A and B lie far from the separating hyperplane, C lies close to it]
Largest Margin
§ Distance from the
separating
hyperplane
corresponds to the
“confidence”
of prediction
§ Example:
§ We are more sure
about the class of A
and B than of C
[Figure annotations: wTx + b = 0 (the decision boundary), wTx + b = 3.7, wTx + b = 7.4; w is the normal vector]
Geometric Margin
§ Distance from example to the separator is
§ Examples closest to the hyperplane are support vectors.
§ Margin ρ of the separator is the width of separation between support vectors
of classes.
r = y(wTx + b)/|w|
[Figure: point x at distance r from the hyperplane, its projection x′, margin ρ, normal w]
Algebraic derivation of finding r:
The dotted line x′x is perpendicular to the
decision boundary, so parallel to w.
The unit vector is w/|w|, so the line is
rw/|w|, for some r.
x′ = x – yrw/|w|.
x′ satisfies wTx′ + b = 0.
So wT(x – yrw/|w|) + b = 0.
Recall that |w| = sqrt(wTw).
So wTx – yr|w| + b = 0.
So, solving for r gives:
r = y(wTx + b)/|w|
Sec. 15.1
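The distance formula r = y(wTx + b)/|w| can be checked numerically; the values of w, b, and x below are assumed for illustration:

```python
# Numeric check (toy values) of r = y (wᵀx + b) / |w|: moving x by r along the
# unit normal w/|w| lands exactly on the hyperplane wᵀx + b = 0.
import numpy as np

w, b = np.array([3.0, 4.0]), -5.0          # |w| = 5
x, y = np.array([3.0, 4.0]), 1             # a positive example

r = y * (w @ x + b) / np.linalg.norm(w)    # distance to the hyperplane
x_proj = x - y * r * w / np.linalg.norm(w) # x′ = x − y r w/|w|
assert np.isclose(w @ x_proj + b, 0.0)     # x′ lies on the hyperplane
```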
Help from Inner Product
§ Remember: Dot product / inner product
§ (Or use an algebraic derivation similar to prev slide)
§ A·B = |A||B| cos θ
§ A’s projection onto B = A·B / |B|
Derivation of the Geometric Margin
[Figure: line L given by wTx + b = 0, point A, its projection H on L, normal w, distance d(A, L), origin (0,0)]
§ d(A, L) = ⟨A, w⟩/|w| – ⟨H, w⟩/|w|
§ Note that H is on the line L, so wTH + b = 0
§ Therefore d(A, L) = (wTA + b)/|w|
Linear SVM Mathematically
The linearly separable case
§ Assume that all data is at least distance 1 from the hyperplane; then the
following two constraints follow for a training set {(xi ,yi)}:
wTxi + b ≥ 1 if yi = 1
wTxi + b ≤ −1 if yi = −1
§ For support vectors, the inequality becomes an equality
§ Then, since each example’s distance from the hyperplane is r = y(wTx + b)/|w|
§ The margin is: ρ = 2/|w|
Sec. 15.1
Derivation of ρ
§ Hyperplane
wT x + b = 0
§ Extra scale constraint:
mini=1,…,n |wTxi + b| = 1
§ This implies:
wT(xa–xb) = 2
§ Since xa–xb is parallel to w, the margin is
ρ = ||xa–xb||2 = 2/||w||2
wT x + b = 0
wTxa + b = 1
wTxb + b = -1
ρ
Sec. 15.1
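The result ρ = 2/||w|| can be verified with a fitted model; the toy data below is made up, and a large C is used to approximate the hard-margin case:

```python
# Check (toy separable data) that the fitted margin width equals 2 / ||w||.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
y = np.array([-1, -1, 1, 1])
clf = SVC(kernel="linear", C=1e6).fit(X, y)   # large C ≈ hard margin

w = clf.coef_.ravel()
rho = 2.0 / np.linalg.norm(w)                 # margin width from the slide
# geometric gap between the two closest opposite-class points along w
gap = (w @ (X[2] - X[1])) / np.linalg.norm(w)
assert np.isclose(rho, gap, rtol=1e-3)
```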
Solving the Optimization Problem
§ This is now optimizing a quadratic function subject to linear constraints
§ Quadratic optimization problems are a well-known class of mathematical
programming problem, and many (intricate) algorithms exist for solving them
(with many special ones built for SVMs)
§ The solution involves constructing a dual problem where a Lagrange
multiplier αi is associated with every constraint in the primal problem:
Find w and b such that
Φ(w) =½ wTw is minimized;
and for all {(xi ,yi)}: yi (wTxi + b) ≥ 1
Find α1…αN such that
Q(α) =Σαi – ½ΣΣαiαjyiyjxiTxj is maximized and
(1) Σαiyi = 0
(2) αi ≥ 0 for all αi
Sec. 15.1
Geometric Interpretation
• Which quantities are fixed, and which are variables?
• Linear constraint(s): Where can the variables be?
• Quadratic objective function:
Find w and b such that
Φ(w) =½ wTw is minimized;
and for all {(xi ,yi)}: yi (wTxi + b) ≥ 1
Sec. 15.1
The Optimization Problem Solution
§ The solution has the form:
§ Each non-zero αi indicates that the corresponding xi is a support vector.
§ Then the scoring function will have the form:
§ Classification is based on the sign of f(x)
§ Notice that it relies on an inner product between the test point x and the
support vectors xi
§ We will return to this later.
§ Also keep in mind that solving the optimization problem involved
computing the inner products between all pairs of training points.
w = Σαiyixi        b = yk − wTxk for any xk such that αk ≠ 0
f(x) = ΣαiyixiTx + b
Sec. 15.1
Q: What are the model parameters? What does f(x) mean intuitively?
Equivalently: f(x) = Σsv∈SV ysv · αsv · ⟨xsv, x⟩ + b
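The solution form can be inspected directly in scikit-learn (a sketch on made-up data): SVC stores the products αi·yi for the support vectors in `dual_coef_`, so w = Σαiyixi is recoverable from the fitted model.

```python
# Recover w = Σ α_i y_i x_i from a fitted linear SVM and check it matches
# scikit-learn's own coefficients. Data is synthetic.
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) + [2, 2], rng.randn(20, 2) - [2, 2]])
y = np.array([1] * 20 + [-1] * 20)

clf = SVC(kernel="linear", C=1e3).fit(X, y)   # large C ≈ hard margin

# dual_coef_ holds α_i y_i for the support vectors only
w = clf.dual_coef_ @ clf.support_vectors_     # Σ α_i y_i x_i
assert np.allclose(w, clf.coef_)
```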
Soft Margin Classification
§ If the training data is not
linearly separable, slack
variables ξi can be added to
allow misclassification of
difficult or noisy examples.
§ Allow some errors
§ Let some points be moved
to where they belong, at a
cost
§ Still, try to minimize training
set errors, and to place
hyperplane “far” from each
class (large margin)
ξj
ξi
Sec. 15.2.1
Soft Margin Classification
Mathematically
§ The old formulation:
§ The new formulation incorporating slack variables:
§ Parameter C can be viewed as a way to control overfitting
§ A regularization term
Find w and b such that
Φ(w) =½ wTw is minimized and for all {(xi ,yi)}
yi (wTxi + b) ≥ 1
Find w and b such that
Φ(w) =½ wTw + CΣξi is minimized and for all {(xi ,yi)}
yi (wTxi + b) ≥ 1- ξi and ξi ≥ 0 for all i
Sec. 15.2.1
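The role of C can be illustrated on overlapping toy data (a sketch, not from the slides): a smaller C permits more slack, i.e. more points with ξi > 0.

```python
# Count margin violations ξ_i > 0 for small vs. large C on overlapping classes.
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2) + [1, 0], rng.randn(50, 2) - [1, 0]])
y = np.array([1] * 50 + [-1] * 50)

def n_violations(C):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # slack ξ_i = max{0, 1 − y_i (wᵀx_i + b)}
    slack = np.maximum(0.0, 1.0 - y * clf.decision_function(X))
    return int(np.sum(slack > 1e-6))

# smaller C → wider margin, more tolerated violations
assert n_violations(0.01) >= n_violations(100.0)
```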
[Optional]
Alternative View
§ SVM in the “natural” form:
argmin over w, b of  ½ wTw + C Σi=1..n max{0, 1 − yi(wTxi + b)}
§ ½ wTw: the margin term
§ The sum: the empirical loss L (how well we fit training data); C is a
hyper-parameter related to regularization
§ Equivalently, as a constrained problem:
min over w, b of  ½ ||w||² + C Σi=1..n ξi
s.t. ∀i: yi(wTxi + b) ≥ 1 − ξi
§ SVM uses “Hinge Loss”: max{0, 1 − z}, where z = yi(wTxi + b)
[Plot: penalty vs. z, comparing the 0/1 loss with the hinge loss]
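The unconstrained objective above can be written out in a few lines of numpy; the data points and weights below are example values:

```python
# The "natural form" objective: ½ wᵀw + C Σ_i max{0, 1 − y_i (wᵀx_i + b)}.
import numpy as np

def svm_objective(w, b, X, y, C):
    # hinge loss per point: max{0, 1 − z} with z = y_i (wᵀx_i + b)
    hinge = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * (w @ w) + C * hinge.sum()

X = np.array([[1.0, 2.0], [-1.0, -1.0], [0.5, -0.5]])
y = np.array([1, -1, 1])
print(svm_objective(np.array([1.0, 0.0]), 0.0, X, y, C=1.0))  # → 1.0
```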
Soft Margin Classification – Solution
§ The dual problem for soft margin classification:
§ Neither slack variables ξi nor their Lagrange multipliers appear in the dual
problem!
§ Again, xi with non-zero αi will be support vectors.
§ Solution to the dual problem is:
Find α1…αN such that
Q(α) =Σαi – ½ΣΣαiαjyiyjxiTxj is maximized and
(1) Σαiyi = 0
(2) 0 ≤ αi ≤ C for all αi
w = Σαiyixi
b = yk(1 − ξk) − wTxk where k = argmaxk′ αk′
f(x) = ΣαiyixiTx + b
w is not needed explicitly
for classification!
Sec. 15.2.1
[Optional]
Classification with SVMs
§ Given a new point x, we can score its projection
onto the hyperplane normal:
§ I.e., compute score: wTx + b = ΣαiyixiTx + b
§ Decide class based on whether < or > 0
§ Can set confidence threshold t.
Score > t: yes
Score < -t: no
Else: don’t know
Sec. 15.1
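The three-way decision rule can be sketched as follows; the scores and the threshold t are assumed example values:

```python
# Three-way classification from the score wᵀx + b with confidence threshold t.
import numpy as np

def decide(score, t):
    if score > t:
        return "yes"
    if score < -t:
        return "no"
    return "don't know"

scores = np.array([2.1, -0.3, -1.7])
print([decide(s, t=1.0) for s in scores])   # ['yes', "don't know", 'no']
```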
Linear SVMs: Summary
§ The classifier is a separating hyperplane.
§ The most “important” training points are the support vectors; they define
the hyperplane.
§ Quadratic optimization algorithms can identify which training points xi are
support vectors with non-zero Lagrangian multipliers αi.
§ Both in the dual formulation of the problem and in the solution, training
points appear only inside inner products:
Find α1…αN such that
Q(α) =Σαi - ½ΣΣαiαjyiyjxiTxj is maximized and
(1) Σαiyi= 0
(2) 0 ≤ αi ≤ C for all αi
Sec. 15.2.1
f(x) = Σsv∈SV ysv · αsv · ⟨xsv, x⟩ + b
Non-linear SVMs
§ Datasets that are linearly separable (with some noise) work out great:
§ But what are we going to do if the dataset is just too hard?
§ How about … mapping data to a higher-dimensional space:
[Figure: 1-D examples on the x axis; a non-separable case becomes separable after mapping x → (x, x²)]
Sec. 15.2.3
Non-linear SVMs: Feature spaces
§ General idea: the original feature space can always
be mapped to some higher-dimensional feature
space where the training set is separable:
Φ: x→ φ(x)
Sec. 15.2.3
The “Kernel Trick”
§ The linear classifier relies on an inner product between vectors
K(xi,xj)=xiTxj
§ If every datapoint is mapped into high-dimensional space via
some transformation Φ: x→ φ(x), the inner product becomes:
K(xi,xj)= φ(xi) Tφ(xj)
§ A kernel function is some function that corresponds to an inner
product in some expanded feature space.
§ Usually, no need to construct the feature space explicitly.
Sec. 15.2.3
Example
Sec. 15.2.3
K(u,v) = (1 + uTv)²
       = 1 + 2uTv + (uTv)²
       = 1 + 2uTv + (Σi ui vi)²
       = 1 + 2uTv + (Σi ui²vi² + Σi≠j ui uj vi vj)
       = φ(u)Tφ(v)
where
φ(u) = [1, √2·u1, …, √2·ud, u1², …, ud², u1u2, …, ud−1ud]T
φ(v) = [1, √2·v1, …, √2·vd, v1², …, vd², v1v2, …, vd−1vd]T
(the terms give the linear, non-linear, and non-linear feature-combination parts)
What about K(u,v) = (1 + uTv)³?
Kernels
§ Why use kernels?
§ Make non-separable problem separable.
§ Map data into better representational space
§ Can be learned to capture some notion of “similarity”
§ Common kernels
§ Linear
§ Polynomial K(x,z) = (1+xTz)d
§ Gives feature combinations
§ Radial basis function (infinite dimensional space)
§ Text classification:
§ Usually use linear or polynomial kernels
Sec. 15.2.3
K(xi, xj; σ) = exp(−||xi − xj||² / (2σ²))
Introduction to Information Retrieval
27
Classification with SVMs with Kernels
§ Given a new point x, we can compute its score,
i.e., $f(x) = \sum_{sv \in SV} \alpha_{sv} K(x_{sv}, x) + b$
§ Decide class based on whether the score is < or > 0
§ E.g., in scikit-learn
Sec. 15.1
[Figure: score axis with tick marks at -1, 0, and 1]
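The scoring rule above can be sketched directly in NumPy. This is a hedged illustration, not the slide's own code: the support vectors, multipliers `alpha_y` (packing alpha_sv * y_sv, as scikit-learn's `dual_coef_` does), and bias `b` below are made-up values for demonstration.

```python
import numpy as np

def rbf_kernel(u, v, gamma=1.0):
    """K(u, v) = exp(-gamma * ||u - v||^2), a common PSD kernel."""
    diff = np.asarray(u) - np.asarray(v)
    return np.exp(-gamma * diff.dot(diff))

def decision_score(x, support_vectors, alpha_y, b, kernel=rbf_kernel):
    """f(x) = sum over support vectors of (alpha_sv * y_sv) * K(x_sv, x) + b."""
    return sum(a * kernel(sv, x) for sv, a in zip(support_vectors, alpha_y)) + b

# Illustrative (not trained) support vectors and dual coefficients:
svs = [np.array([1.0, 1.0]), np.array([-1.0, -1.0])]
alpha_y = [0.5, -0.5]
b = 0.0

# Classify a new point by the sign of its score.
score = decision_score(np.array([0.9, 1.1]), svs, alpha_y, b)
label = 1 if score > 0 else -1
```

In scikit-learn, a trained `SVC` exposes the same quantities: `decision_function(x)` returns this score, and `dual_coef_` / `support_vectors_` hold the alpha_sv * y_sv values and the x_sv.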
Introduction to Information Retrieval
28
Pros and Cons of the SVM Classifier
§ Pros
§ High accuracy
§ Fast classification speed
§ Works with cases where #features > #samples
§ Can adapt to different objects (due to Kernel)
§ Any K(u, v) can be used if it is symmetric, continuous, and positive
semi-definite (Mercer's condition).
§ Or explicitly engineer the feature space.
§ Cons
§ Training does not scale well with the number of training
samples (between O(d·n2) and O(d·n3))
§ Hyper-parameters (e.g., C, kernel parameters) need tuning
Sec. 15.1
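The admissibility condition listed under the pros can be checked numerically: a kernel is valid if every Gram matrix it produces is symmetric positive semi-definite. The sketch below (assumed setup: a small random sample and a linear kernel, chosen for illustration) verifies both properties via the eigenvalues.

```python
import numpy as np

def linear_kernel(u, v):
    """The simplest valid kernel: K(u, v) = u . v."""
    return float(np.dot(u, v))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))  # 5 sample points in 3 dimensions

# Gram matrix G[i, j] = K(x_i, x_j)
G = np.array([[linear_kernel(xi, xj) for xj in X] for xi in X])

symmetric = np.allclose(G, G.T)
# PSD <=> all eigenvalues >= 0 (up to numerical tolerance)
psd = bool(np.all(np.linalg.eigvalsh(G) >= -1e-9))
```

A function failing this check on some sample (e.g., a negative eigenvalue in the Gram matrix) is not a valid kernel, and the dual optimization is no longer guaranteed to be convex.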
Introduction to Information Retrieval
Resources for today's lecture
§ Christopher J. C. Burges. 1998. A Tutorial on Support Vector Machines for Pattern Recognition
§ S. T. Dumais. 1998. Using SVMs for text categorization, IEEE Intelligent Systems, 13(4)
§ S. T. Dumais, J. Platt, D. Heckerman and M. Sahami. 1998. Inductive learning algorithms and
representations for text categorization. CIKM '98, pp. 148-155.
§ Yiming Yang, Xin Liu. 1999. A re-examination of text categorization methods. 22nd Annual
International SIGIR
§ Tong Zhang, Frank J. Oles. 2001. Text Categorization Based on Regularized Linear
Classification Methods. Information Retrieval 4(1): 5-31
§ Trevor Hastie, Robert Tibshirani and Jerome Friedman. Elements of Statistical Learning: Data
Mining, Inference and Prediction. Springer-Verlag, New York.
§ T. Joachims, Learning to Classify Text using Support Vector Machines. Kluwer, 2002.
§ Fan Li, Yiming Yang. 2003. A Loss Function Analysis for Classification Methods in Text
Categorization. ICML 2003: 472-479.
§ Tie-Yan Liu, Yiming Yang, Hao Wan, et al. 2005. Support Vector Machines Classification with
Very Large Scale Taxonomy, SIGKDD Explorations, 7(1): 36-43.
§ "Classic" Reuters-21578 data set:
http://www.daviddlewis.com/resources/testcollections/reuters21578/
Ch. 15