
Lx_SVM

Introduction to Information Retrieval

SUPPORT VECTOR MACHINE


Mainly based on
https://nlp.stanford.edu/IR-book/pdf/15svm.pdf



Overview
§ SVM is a huge topic

§ Integration of MMDS, IIR, and Andrew Moore’s slides here
§ Our foci:

§ Geometric intuition → Primal form
§ Alternative interpretation from the Empirical Risk Minimization point of view

§ Understand the final solution (in the dual form)
§ Generalizes well to Kernel functions

§ SVM + Kernels



Linear classifiers: Which Hyperplane?

§ Lots of possible solutions for a, b, c.
§ Some methods find a separating hyperplane, but not the optimal one [according to some criterion of expected goodness]
§ E.g., perceptron
§ Support Vector Machine (SVM) finds an optimal* solution.
§ Maximizes the distance between the hyperplane and the "difficult points" close to the decision boundary

§ One intuition: if there are no points near
the decision surface, then there are no
very uncertain classification decisions

This line represents the decision boundary:
ax + by − c = 0  (in vector form, wTxi + b = 0)

Ch. 15


Another intuition
§ If you have to place a fat separator between classes, you have fewer choices, and so the capacity of the model has been decreased

Sec. 15.1


Support Vector Machine (SVM)
§ SVMs maximize the margin around the separating hyperplane.
§ A.k.a. large margin classifiers
§ The decision function is fully specified by a subset of training samples, the support vectors.
§ Solving SVMs is a quadratic programming problem
§ Seen by many as the most successful current text classification method*

*but other discriminative methods often perform very similarly

[Figure: support vectors sit on the maximized margin; other separating hyperplanes give a narrower margin]

Sec. 15.1


Maximum Margin: Formalization
§ w: decision hyperplane normal vector
§ xi: data point i
§ yi: class of data point i (+1 or −1)  [NB: not 1/0]
§ Classifier is: f(xi) = sign(wTxi + b)
§ Functional margin of xi is: yi (wTxi + b)
§ But note that we can increase this margin simply by scaling w, b…  [NB: a common trick]
§ Functional margin of the dataset is twice the minimum functional margin over all points
§ The factor of 2 comes from measuring the whole width of the margin

Sec. 15.1
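A quick numeric illustration of the scaling issue (the toy data and hyperplane below are hypothetical): the functional margin grows when (w, b) is rescaled, while the geometric margin y(wTx + b)/‖w‖ does not.

```python
import numpy as np

# Toy linearly separable data (hypothetical values for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1, 1, -1, -1])

# A candidate separating hyperplane w^T x + b = 0.
w = np.array([1.0, 1.0])
b = 0.0

# Functional margin of each point: y_i (w^T x_i + b).
func = y * (X @ w + b)
print(func)  # [4. 6. 4. 6.]

# Rescaling (w, b) inflates the functional margin arbitrarily...
print(y * (X @ (10 * w) + 10 * b))  # [40. 60. 40. 60.]

# ...but the geometric margin y_i (w^T x_i + b) / ||w|| is scale-invariant.
geom = func / np.linalg.norm(w)
geom_scaled = y * (X @ (10 * w) + 10 * b) / np.linalg.norm(10 * w)
print(np.allclose(geom, geom_scaled))  # True
```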

Largest Margin
§ Distance from the separating hyperplane corresponds to the "confidence" of the prediction
§ Example:
§ We are more sure about the class of A and B than of C

[Figure: positive and negative points around the boundary wTxi + b = 0; A and B lie on the level sets wTxi + b = 7.4 and wTxi + b = 3.7, far along the normal w, while C is close to the boundary]


Geometric Margin
§ Distance from an example to the separator is r = y(wTx + b)/‖w‖
§ Examples closest to the hyperplane are support vectors.
§ Margin ρ of the separator is the width of separation between support vectors of the two classes.

Algebraic derivation of finding r:
The dotted line from x to x′ is perpendicular to the decision boundary, so it is parallel to w.
The unit normal is w/‖w‖, so the line is rw/‖w‖, for some r.
x′ = x − yrw/‖w‖.
x′ satisfies wTx′ + b = 0.
So wT(x − yrw/‖w‖) + b = 0.
Recall that ‖w‖ = sqrt(wTw), so wT(w/‖w‖) = ‖w‖.
So wTx − yr‖w‖ + b = 0.
Solving for r gives: r = y(wTx + b)/‖w‖

Sec. 15.1
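A numeric sanity check of this derivation (all values hypothetical): compute r, step that distance back along the unit normal, and confirm the result lands exactly on the hyperplane.

```python
import numpy as np

# Hypothetical hyperplane and point.
w = np.array([3.0, 4.0])   # ||w|| = 5
b = -5.0
x = np.array([4.0, 3.0])
y = 1                      # x is on the positive side

# r = y (w^T x + b) / ||w||
r = y * (w @ x + b) / np.linalg.norm(w)
print(r)  # 3.8

# Step distance r back along the unit normal: x' = x - y r w/||w||.
x_prime = x - y * r * w / np.linalg.norm(w)

# x' must satisfy w^T x' + b = 0, i.e. it lies on the hyperplane.
print(round(w @ x_prime + b, 10))  # 0.0
```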

Help from Inner Product
§ Remember: the dot product / inner product ⟨A, B⟩ = ‖A‖‖B‖ cos θ
§ A's projection onto B = ⟨A, B⟩ / ‖B‖
§ (Or use an algebraic derivation similar to the previous slide)

Derivation of the Geometric Margin

[Figure: point A, its projection H onto the line L: wTx + b = 0, the normal vector w, the distance d(A, L), and the origin (0,0)]

§ d(A, L) = ⟨A − H, w⟩ / ‖w‖, the length of A − H projected onto the unit normal w/‖w‖
§ Note that H is on the line L, so wTH + b = 0
§ Therefore d(A, L) = (wTA + b)/‖w‖


Linear SVM Mathematically
The linearly separable case

§ Assume that all data is at least distance 1 from the hyperplane; then the following two constraints follow for a training set {(xi, yi)}:

wTxi + b ≥ 1 if yi = 1
wTxi + b ≤ −1 if yi = −1

§ For support vectors, the inequality becomes an equality
§ Then, since each example's distance from the hyperplane is r = y(wTx + b)/‖w‖
§ The margin is: ρ = 2/‖w‖

Sec. 15.1


Derivation of ρ

§ Hyperplane
wT x + b = 0

§ Extra scale constraint:
mini=1,…,n |wTxi + b| = 1

§ This implies:
wTxa + b = 1
wTxb + b = −1
wT(xa − xb) = 2
ρ = ‖xa − xb‖2 = 2/‖w‖2  (xa − xb is parallel to w, so its length is wT(xa − xb)/‖w‖)

[Figure: support vectors xa and xb on the two margin hyperplanes, separated by width ρ]

Sec. 15.1


Solving the Optimization Problem

§ This is now optimizing a quadratic function subject to linear constraints
§ Quadratic optimization problems are a well-known class of mathematical programming problems, and many (intricate) algorithms exist for solving them (with many special ones built for SVMs)
§ The solution involves constructing a dual problem where a Lagrange multiplier αi is associated with every constraint in the primal problem:

Find w and b such that
Φ(w) =½ wTw is minimized;
and for all {(xi ,yi)}: yi (wTxi + b) ≥ 1

Find α1…αN such that
Q(α) =Σαi – ½ΣΣαiαjyiyjxiTxj is maximized and
(1) Σαiyi = 0
(2) αi ≥ 0 for all αi

Sec. 15.1
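For intuition, the dual above can be solved directly on a tiny hypothetical two-point dataset with projected gradient ascent. This is only a sketch (real solvers use methods like SMO, and the naive projection order below is only safe because this toy problem is symmetric); by hand, the constraint Σαiyi = 0 forces α1 = α2 = a and Q(a) = 2a − 4a² peaks at a = 1/4.

```python
import numpy as np

# Dual: maximize Q(a) = sum_i a_i - 0.5 sum_ij a_i a_j y_i y_j x_i^T x_j
# s.t. a_i >= 0 and sum_i a_i y_i = 0 (hard margin, so no upper bound).
X = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([1.0, -1.0])
G = (y[:, None] * y[None, :]) * (X @ X.T)   # G_ij = y_i y_j x_i^T x_j

a = np.zeros(2)
for _ in range(1000):
    a += 0.05 * (1.0 - G @ a)               # gradient ascent step on Q
    a = np.maximum(a, 0.0)                  # enforce a_i >= 0
    a -= y * (a @ y) / (y @ y)              # project onto sum_i a_i y_i = 0

w = (a * y) @ X                             # w = sum_i a_i y_i x_i
k = int(np.argmax(a))
b = y[k] - w @ X[k]                         # b = y_k - w^T x_k for a_k > 0
print(np.round(a, 3))   # [0.25 0.25]
print(np.round(w, 3))   # [0.5 0.5]
print(round(b, 3))      # 0.0
```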


Geometric Interpretation

• What are fixed and what are variables?
• Linear constraint(s): Where can the variables be?
• Quadratic objective function:

Find w and b such that
Φ(w) =½ wTw is minimized;
and for all {(xi ,yi)}: yi (wTxi + b) ≥ 1

Sec. 15.1

Introduction to Information Retrieval

15

The Optimization Problem Solution
§ The solution has the form:

w = Σαiyixi        b = yk − wTxk for any xk such that αk ≠ 0

§ Each non-zero αi indicates that the corresponding xi is a support vector.
§ Then the scoring function will have the form:

f(x) = ΣαiyixiTx + b = Σ_{sv∈SV} ysv · αsv · ⟨xsv, x⟩ + b

§ Classification is based on the sign of f(x)
§ Notice that it relies on an inner product between the test point x and the support vectors xi
§ We will return to this later.
§ Also keep in mind that solving the optimization problem involved computing the inner products between all pairs of training points.

Q: What are the model parameters? What does f(x) mean intuitively?

Sec. 15.1
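This structure is visible in scikit-learn's SVC, whose dual_coef_ attribute stores αi·yi for the support vectors; the data below is randomly generated for illustration, and the check confirms that f(x) built from support-vector inner products matches the library's own score.

```python
import numpy as np
from sklearn.svm import SVC

# Two random, well-separated blobs (hypothetical data).
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) + 2, rng.randn(20, 2) - 2])
y = np.array([1] * 20 + [-1] * 20)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

x_test = np.array([0.5, -0.3])
# dual_coef_ already stores alpha_i * y_i for each support vector, so
# f(x) = sum_sv (alpha_sv y_sv) <x_sv, x> + b:
f = clf.dual_coef_ @ (clf.support_vectors_ @ x_test) + clf.intercept_
print(np.allclose(f, clf.decision_function([x_test])))  # True
```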


Soft Margin Classification
§ If the training data is not linearly separable, slack variables ξi can be added to allow misclassification of difficult or noisy examples.
§ Allow some errors
§ Let some points be moved to where they belong, at a cost
§ Still, try to minimize training set errors, and to place the hyperplane "far" from each class (large margin)

[Figure: two points inside or beyond the margin, with slack distances ξi and ξj]

Sec. 15.2.1


Soft Margin Classification: Mathematically
§ The old formulation:

§ The new formulation incorporating slack variables:

§ Parameter C can be viewed as a way to control overfitting
§ A regularization term

Find w and b such that
Φ(w) =½ wTw is minimized and for all {(xi ,yi)}
yi (wTxi + b) ≥ 1

Find w and b such that
Φ(w) =½ wTw + CΣξi is minimized and for all {(xi ,yi)}
yi (wTxi + b) ≥ 1- ξi and ξi ≥ 0 for all i

Sec. 15.2.1

[Optional]
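A quick empirical look at C as a regularization knob, using scikit-learn on a small hypothetical dataset with one noisy positive point: a small C tolerates slack and keeps a wide margin 2/‖w‖, while a large C shrinks the margin to fit the noise.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical 2-D data: the positive point (0.5, 0.5) is "noisy",
# sitting much closer to the negatives than the rest of its class.
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.5, 0.5],
              [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1, 1, 1, -1, -1])

margins = {}
for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margins[C] = 2.0 / np.linalg.norm(clf.coef_[0])  # margin width 2/||w||

print(margins[0.01] > margins[100.0])  # True: small C -> wider margin
```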

Alternative View
§ SVM in the "natural" (unconstrained) form:

argmin_{w,b}  ½ wTw + C Σ_{i=1..n} max{0, 1 − yi(wTxi + b)}

§ The ½ wTw term controls the margin; the sum is the empirical loss L (how well we fit the training data); C is a hyper-parameter related to regularization.
§ This is equivalent to the constrained form:

min_{w,b}  ½ ‖w‖² + C Σ_{i=1..n} ξi
s.t. ∀i: yi(wTxi + b) ≥ 1 − ξi, ξi ≥ 0

§ SVM uses the "Hinge Loss": writing z = yi(wTxi + b), each example pays max{0, 1 − z}

[Figure: penalty vs. z over −1…2, comparing the 0/1 loss to the hinge loss max{0, 1 − z}]
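Evaluating the unconstrained objective directly is straightforward; the points and weights below are hypothetical, chosen so every example sits exactly on the margin and the hinge losses vanish.

```python
import numpy as np

# Objective: 0.5 * w.w + C * sum_i max(0, 1 - y_i (w^T x_i + b)).
X = np.array([[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0], [0.0, -2.0]])
y = np.array([1, 1, -1, -1])
w = np.array([0.5, 0.5])
b = 0.0
C = 1.0

z = y * (X @ w + b)                  # margins z_i = y_i (w^T x_i + b)
hinge = np.maximum(0.0, 1.0 - z)     # hinge loss per example
objective = 0.5 * w @ w + C * hinge.sum()
print(z)          # [1. 1. 1. 1.]
print(objective)  # 0.25
```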


Soft Margin Classification – Solution
§ The dual problem for soft margin classification:

§ Neither slack variables ξi nor their Lagrange multipliers appear in the dual
problem!

§ Again, xi with non-zero αi will be support vectors.
§ Solution to the dual problem is:

Find α1…αN such that
Q(α) =Σαi – ½ΣΣαiαjyiyjxiTxj is maximized and
(1) Σαiyi = 0
(2) 0 ≤ αi ≤ C for all αi

w = Σαiyixi
b = yk(1 − ξk) − wTxk  where k = argmaxk′ αk′

f(x) = ΣαiyixiTx + b

w is not needed explicitly for classification!

Sec. 15.2.1

[Optional]


Classification with SVMs
§ Given a new point x, we can score its projection

onto the hyperplane normal:
§ I.e., compute score: wTx + b = ΣαiyixiTx + b

§ Decide class based on whether < or > 0

§ Can set confidence threshold t.

[Figure: score axis with −1, 0, 1 marked]

Score > t: yes
Score < −t: no
Else: don't know

Sec. 15.1

Linear SVMs: Summary
§ The classifier is a separating hyperplane.
§ The most "important" training points are the support vectors; they define the hyperplane.
§ Quadratic optimization algorithms can identify which training points xi are support vectors with non-zero Lagrange multipliers αi.
§ Both in the dual formulation of the problem and in the solution, training points appear only inside inner products:

Find α1…αN such that
Q(α) = Σαi − ½ΣΣαiαjyiyjxiTxj is maximized and
(1) Σαiyi = 0
(2) 0 ≤ αi ≤ C for all αi

f(x) = Σ_{sv∈SV} ysv · αsv · ⟨xsv, x⟩ + b

Sec. 15.2.1
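The score-plus-threshold rule ("yes" / "no" / "don't know") can be sketched with scikit-learn on hypothetical toy data; for this symmetric dataset the hard-margin solution is w = (0.25, 0.25), b = 0, so the scores below are exact.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data; the fitted hyperplane is w = (0.25, 0.25), b = 0.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1, 1, -1, -1])
clf = SVC(kernel="linear", C=1.0).fit(X, y)

def decide(x, t=1.0):
    """Answer only when the score w^T x + b is confidently beyond +/- t."""
    score = clf.decision_function([x])[0]
    if score > t:
        return "yes"
    if score < -t:
        return "no"
    return "don't know"

print(decide(np.array([3.0, 3.0])))    # yes        (score ~ 1.5)
print(decide(np.array([-3.0, -3.0])))  # no         (score ~ -1.5)
print(decide(np.array([1.0, 1.0])))    # don't know (score ~ 0.5)
```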


Non-linear SVMs
§ Datasets that are linearly separable (with some noise) work out great:

§ But what are we going to do if the dataset is just too hard?

§ How about … mapping data to a higher-dimensional space:

[Figure: 1-D data on the x axis; one arrangement is separable by a threshold, another is not, and the hard case becomes separable after mapping into a space with an x² axis]

Sec. 15.2.3


Non-linear SVMs: Feature spaces
§ General idea: the original feature space can always be mapped to some higher-dimensional feature space where the training set is separable:

Φ: x→ φ(x)

Sec. 15.2.3


The “Kernel Trick”
§ The linear classifier relies on an inner product between vectors K(xi, xj) = xiTxj
§ If every datapoint is mapped into high-dimensional space via some transformation Φ: x → φ(x), the inner product becomes:
K(xi, xj) = φ(xi)Tφ(xj)

§ A kernel function is some function that corresponds to an inner
product in some expanded feature space.
§ Usually, no need to construct the feature space explicitly.

Sec. 15.2.3

Example

K(u, v) = (1 + uTv)²
        = 1 + 2uTv + (uTv)²
        = 1 + 2uTv + (Σi uivi)²
        = 1 + 2uTv + Σi ui²vi² + Σ_{i≠j} uiujvivj
        = φ(u)Tφ(v)

Linear: the 2uTv term. Non-linear: the ui²vi² terms. Non-linear + feature combination: the uiujvivj cross terms.

φ(u) = [1, √2u1, …, √2ud, u1², …, ud², √2u1u2, …, √2ud−1ud]T
φ(v) = [1, √2v1, …, √2vd, v1², …, vd², √2v1v2, …, √2vd−1vd]T

(the cross-term entries run over pairs i < j; the √2 factors make φ(u)Tφ(v) reproduce the coefficients above)

§ What about K(u, v) = (1 + uTv)³?

Sec. 15.2.3
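The identity K(u, v) = φ(u)Tφ(v) is easy to verify numerically; the feature map below follows the reconstruction with √2 on the i < j cross terms, and the input vectors are hypothetical.

```python
import numpy as np
from itertools import combinations

def phi(u):
    """Explicit feature map for K(u, v) = (1 + u.v)^2 on d-dimensional input."""
    d = len(u)
    cross = [np.sqrt(2) * u[i] * u[j] for i, j in combinations(range(d), 2)]
    return np.concatenate([[1.0], np.sqrt(2) * u, u ** 2, cross])

u = np.array([1.0, 2.0, 3.0])
v = np.array([0.5, -1.0, 2.0])

lhs = (1 + u @ v) ** 2     # kernel evaluated directly: (1 + 4.5)^2 = 30.25
rhs = phi(u) @ phi(v)      # inner product in the expanded feature space
print(np.isclose(lhs, rhs))  # True
```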


Kernels
§ Why use kernels?

§ Make non-separable problem separable.

§ Map data into better representational space

§ Can be learned to capture some notion of “similarity”

§ Common kernels
§ Linear

§ Polynomial: K(x, z) = (1 + xTz)^d
§ Gives feature combinations

§ Radial basis function (infinite-dimensional feature space):

K(xi, xj; σ) = exp(−‖xi − xj‖² / (2σ²))

§ Text classification:
§ Usually use linear or polynomial kernels

Sec. 15.2.3

Introduction to Information Retrieval

27

Classification with SVMs with Kernels
§ Given a new point x, we can compute its score

i.e.,

§ Decide the class based on whether the score is < 0 or > 0

§ E.g., in scikit-learn

Sec. 15.1

(figure: decision boundary at 0, margin hyperplanes at -1 and +1)

$\text{score}(x) = \sum_{sv \in SV} \alpha_{sv} K(x_{sv}, x) + b$
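A minimal sketch of this kernelized score in plain Python. The RBF kernel, support vectors, dual coefficients, and bias below are hypothetical illustration values; in practice a trained model supplies them (in scikit-learn, `SVC.decision_function` computes the same quantity):

```python
import math

def rbf_kernel(u, v, gamma=0.5):
    """Gaussian (RBF) kernel: K(u, v) = exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def svm_score(x, support_vectors, alphas, b, kernel=rbf_kernel):
    """score(x) = sum over support vectors of alpha_sv * K(x_sv, x) + b.

    Here each alpha already carries the sign of its class label,
    so the sign of the score is the predicted class."""
    return sum(a * kernel(sv, x) for sv, a in zip(support_vectors, alphas)) + b

# Hypothetical "trained" model: two support vectors, one per class.
svs = [(0.0, 1.0), (1.0, 0.0)]
alphas = [1.0, -1.0]   # positive coefficient for class +1, negative for -1
b = 0.0

score = svm_score((0.1, 0.9), svs, alphas, b)  # test point near the +1 support vector
label = 1 if score > 0 else -1
```

The test point lies close to the positive support vector, so the positive kernel term dominates and the point is classified as +1.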


Introduction to Information Retrieval

28

Pros and Cons of the SVM Classifier

§ Pros
§ High accuracy

§ Fast classification speed

§ Works with cases where #features > #samples

§ Can adapt to different objects (due to Kernel)
§ Any K(u, v) can be used if it is symmetric, continuous, and positive semi-definite.

§ Or explicitly engineer the feature space.

§ Cons
§ Training does not scale well with the number of training samples (O(d·n²) to O(d·n³))

§ Hyper-parameters need tuning
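To make the kernel bullet concrete: a valid (symmetric, positive semi-definite) kernel is implicitly a dot product in some feature space, which is why either choice above works. A small sanity check, using the degree-2 polynomial kernel and its explicit feature map as an illustrative example:

```python
import math

def poly2_kernel(u, v):
    """Homogeneous degree-2 polynomial kernel: (u . v)^2."""
    return sum(a * b for a, b in zip(u, v)) ** 2

def phi(u):
    """Explicit feature map for the 2-D degree-2 polynomial kernel."""
    u1, u2 = u
    return (u1 * u1, math.sqrt(2) * u1 * u2, u2 * u2)

u, v = (1.0, 2.0), (3.0, 0.5)
lhs = poly2_kernel(u, v)                          # kernel in the input space
rhs = sum(a * b for a, b in zip(phi(u), phi(v)))  # dot product in feature space
assert math.isclose(lhs, rhs)                     # the two agree exactly
```

The kernel evaluates the feature-space dot product without ever materializing phi — the "kernel trick" that lets SVMs adapt to different objects.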

Sec. 15.1

Introduction to Information Retrieval

Resources for today's lecture
§ Christopher J. C. Burges. 1998. A Tutorial on Support Vector Machines for Pattern Recognition

§ S. T. Dumais. 1998. Using SVMs for text categorization, IEEE Intelligent Systems, 13(4)
§ S. T. Dumais, J. Platt, D. Heckerman and M. Sahami. 1998. Inductive learning algorithms and

representations for text categorization. CIKM '98, pp. 148-155.
§ Yiming Yang, Xin Liu. 1999. A re-examination of text categorization methods. 22nd Annual

International SIGIR

§ Tong Zhang, Frank J. Oles. 2001. Text Categorization Based on Regularized Linear
Classification Methods. Information Retrieval 4(1): 5-31

§ Trevor Hastie, Robert Tibshirani and Jerome Friedman. Elements of Statistical Learning: Data
Mining, Inference and Prediction. Springer-Verlag, New York.

§ T. Joachims, Learning to Classify Text using Support Vector Machines. Kluwer, 2002.
§ Fan Li, Yiming Yang. 2003. A Loss Function Analysis for Classification Methods in Text

Categorization. ICML 2003: 472-479.

§ Tie-Yan Liu, Yiming Yang, Hao Wan, et al. 2005. Support Vector Machines Classification with
Very Large Scale Taxonomy, SIGKDD Explorations, 7(1): 36-43.

§ "Classic" Reuters-21578 data set: http://www.daviddlewis.com/resources/testcollections/reuters21578/

Ch. 15