
Pattern Analysis & Machine Intelligence Research Group

SVM Classifier

• The goal of classification using an SVM is to separate
  two classes by a hyperplane induced from the
  available examples.

• The goal is to produce a classifier that will work
  well on unseen examples (i.e., one that generalizes well).

• The SVM therefore belongs to the decision (function)
  boundary approach.

1 of 22

Method Description

• Consider the problem of separating a set of
  training vectors belonging to two different classes,

  s = { (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) },   x ∈ R^d,   y ∈ {-1, +1}

• The hyperplane equation is

  w^T x + b = 0

• Many decision boundaries can separate these two
  classes. Which one should we choose?

[Figure: points of Class 1 and Class 2 with several candidate separating boundaries]

2 of 22
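As a small illustration of the decision rule implied by the hyperplane equation (not part of the original slides; the weight vector and bias below are assumed, made-up values), a minimal Python sketch:

import numpy as np

# Illustrative, assumed values for the hyperplane parameters.
w = np.array([2.0, -1.0])
b = -0.5

def predict(x):
    """Return +1 or -1 depending on which side of w^T x + b = 0 the point lies."""
    return 1 if w @ x + b >= 0 else -1

print(predict(np.array([1.0, 0.0])))   # w.x + b = +1.5 -> +1
print(predict(np.array([0.0, 2.0])))   # w.x + b = -2.5 -> -1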

Method Description
Examples of decision boundaries

[Figure: two example decision boundaries on the same data, one with a large margin
and one with a small margin]

3 of 22

Method Description

[Figure: Class 1 and Class 2 separated by a hyperplane, with margin m]

• Optimal separation:
  1. No errors.
  2. The distance from the closest vector to the
     hyperplane is maximized.

• The vectors closest to the hyperplane are called
  support vectors.

4 of 22

Method Description

[Figure: Class 1 and Class 2 separated by the hyperplane w^T x + b = 0, with the
support vectors circled and the margin m between the two classes]

• The two margin hyperplanes passing through the support vectors satisfy

  w^T x^+ + b = +1      and      w^T x^- + b = -1

• Subtracting these and projecting onto the unit normal w/||w|| gives the margin:

  m = w^T (x^+ - x^-) / ||w|| = 2 / ||w||

5 of 22
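To make the margin formula above concrete (an illustration added here, with an assumed weight vector), the width 2/||w|| can be evaluated directly:

import numpy as np

w = np.array([3.0, 4.0])              # assumed weight vector, ||w|| = 5
m = 2.0 / np.linalg.norm(w)           # distance between w.x + b = +1 and w.x + b = -1
print(m)                              # 0.4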

Method Description

To obtain the optimal separation, this becomes a constrained
optimization problem:

  Minimize     (1/2) ||w||^2        (||w|| is replaced by its square for convenience)

  Subject to   y_i (w^T x_i + b) ≥ 1   for all i,   where y_i = +1 or -1

(equality holds exactly for the support vectors)

6 of 22
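A minimal sketch of this constrained problem handed to a general-purpose solver (scipy's SLSQP is an assumed choice here, and the four training points are made up for illustration):

import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])   # toy data
y = np.array([1.0, 1.0, -1.0, -1.0])
d = X.shape[1]

def objective(theta):                 # theta = (w, b)
    w = theta[:d]
    return 0.5 * w @ w                # (1/2) ||w||^2

constraints = [                       # y_i (w.x_i + b) - 1 >= 0 for every i
    {"type": "ineq", "fun": lambda t, i=i: y[i] * (X[i] @ t[:d] + t[d]) - 1.0}
    for i in range(len(y))
]

res = minimize(objective, np.zeros(d + 1), method="SLSQP", constraints=constraints)
w, b = res.x[:d], res.x[d]
print("w =", w, "b =", b, "margin =", 2.0 / np.linalg.norm(w))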

Using Lagrange Multipliers

Introducing a Lagrange multiplier α_i for each constraint gives

  min_{w,b,α} { (1/2) ||w||^2 - Σ_{i=1}^n α_i [ y_i (w^T x_i + b) - 1 ] }

One problem with the above minimization: if y_i (w^T x_i + b) - 1 > 0
for all i, then one can achieve the minimum by letting α_i → ∞.

It is better instead to separate the problem like this:

  min_{w,b}  max_{α_i ≥ 0}  { (1/2) ||w||^2 - Σ_{i=1}^n α_i [ y_i (w^T x_i + b) - 1 ] }

7 of 22

Solution using Karush-Kuhn-Tucker (KKT) conditions

• Setting the derivative of the Lagrangian with respect to w to zero gives

  w = Σ_{i=1}^n α_i y_i x_i

• For all points with y_i (w^T x_i + b) - 1 > 0, set α_i = 0.
  Then only a few α_i > 0 (only for the support vectors)!

• We can now substitute this expression for w into the objective
  function.

8 of 22

Method Description
Dual Problem

  max_α  {  Σ_{i=1}^n α_i  -  (1/2) Σ_{i=1}^n Σ_{j=1}^n α_i α_j y_i y_j x_i^T x_j  }

  Subject to   Σ_{i=1}^n α_i y_i = 0   and   α_i ≥ 0

9 of 22
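The dual above can also be handed to a generic solver. The sketch below (scipy is an assumed choice, the toy data are the same made-up points as before) minimizes the negative of the dual objective under the two constraints:

import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])   # toy data
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)

Q = (y[:, None] * X) @ (y[:, None] * X).T      # Q_ij = y_i y_j x_i.x_j

def neg_dual(a):
    return 0.5 * a @ Q @ a - a.sum()           # negative of the dual objective

res = minimize(neg_dual, np.zeros(n), method="SLSQP",
               bounds=[(0.0, None)] * n,                              # alpha_i >= 0
               constraints=[{"type": "eq", "fun": lambda a: a @ y}])  # sum alpha_i y_i = 0
alpha = res.x
print(np.round(alpha, 4))                      # non-zero entries mark the support vectors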

Method Description

• The parameters of the hyperplane equation are given by

  w = Σ_{i=1}^n α_i y_i x_i        and        b = y_sv - w^T x_sv ,

  where x_sv is an arbitrary support vector with label y_sv.

• The support vectors are the x_i with non-zero α_i.

• The problem can be solved using
  Quadratic Programming or
  Gradient Descent.

10 of 22
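A tiny worked example of these formulas (the two points and their dual solution are hand-picked for illustration, not from the slides):

import numpy as np

X = np.array([[1.0, 1.0], [-1.0, -1.0]])   # one point per class
y = np.array([1.0, -1.0])
alpha = np.array([0.25, 0.25])             # dual solution of this toy problem

w = (alpha * y) @ X                        # w = sum_i alpha_i y_i x_i  -> [0.5, 0.5]
sv = int(np.argmax(alpha > 1e-8))          # index of any support vector
b = y[sv] - w @ X[sv]                      # b = y_sv - w^T x_sv        -> 0.0
print(w, b)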

Method Description

• Two possible problems:
  • The previous optimization problem does not
    have a solution if the classes are not linearly
    separable.
  • How to deal with possible noise in the training
    data?

• We can allow some error on the training data.
• This is called the soft-margin SVM.

11 of 22

Soft Margin SVM

[Figure: Class 1 and Class 2 overlap slightly; points x_i and x_j that violate the
margin have associated slack variables ξ_i and ξ_j]

• We allow some error ξ in training.

12 of 22

New Optimization Formulation

The optimization problem then becomes:

  Minimize_{w, b, ξ}   (1/2) ||w||^2  +  C Σ_{i=1}^n ξ_i

  Subject to   y_i (w^T x_i + b) ≥ 1 - ξ_i   and   ξ_i ≥ 0   for all i

where C is the soft-margin parameter.

13 of 22
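In the dual, the soft-margin formulation amounts to replacing α_i ≥ 0 with the box constraint 0 ≤ α_i ≤ C (a standard result, not shown on the slide). The effect of C can be seen with an off-the-shelf solver; scikit-learn is an assumed choice here, and the data, with one deliberately mislabelled point, are illustrative:

import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0], [0.5, 0.5]])
y = np.array([1, 1, -1, -1, -1])           # the last point sits on the "wrong" side

for C in (0.1, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # A small C tolerates the error and keeps a wide margin; a large C penalizes it heavily.
    print("C =", C, " support vectors:", len(clf.support_vectors_))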

Non-linear SVM

• An SVM can easily be turned into a non-linear
  learner.

• The vectors x_i are transformed into a higher-
  dimensional feature space using a non-linear
  mapping Φ(x_i).

• The problem becomes linearly separable in the
  feature space.

14 of 22

Method Description

[Figure: a point x in the original input space is mapped by Φ(x) to coordinates
(z_1, z_2, z_3) in a higher-dimensional feature space]

15 of 22

Method Description

The Kernel Trick

• Computing Φ(x) explicitly is computationally inefficient.

• Notice that in the formulation of the optimization
  problem we only need the dot product x_i^T x_j.

• So we need to compute Φ(x_i) · Φ(x_j) without
  computing Φ(x_i) and Φ(x_j).

• For special mappings Φ, such dot products can be
  computed very efficiently using a kernel function
  k(x_i, x_j).

16 of 22
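A quick numeric check of the trick for the homogeneous degree-2 polynomial kernel in 2-D (the explicit map Φ below is the standard one for this kernel; the two vectors are arbitrary):

import numpy as np

def phi(v):                                # explicit map: (v1^2, v2^2, sqrt(2) v1 v2)
    return np.array([v[0] ** 2, v[1] ** 2, np.sqrt(2) * v[0] * v[1]])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

print(phi(x) @ phi(z))                     # dot product in 3-D feature space -> 1.0
print((x @ z) ** 2)                        # kernel k(x, z) = (x.z)^2         -> 1.0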

Method Description

• The relation is

  Φ(x_i) · Φ(x_j) = k(x_i, x_j)

• Examples of common kernel functions:

  k_rbf(x_1, x_2)      = exp( -γ ||x_1 - x_2||^2 )
  k_poly(x_1, x_2)     = ( x_1^T x_2 + 1 )^d
  k_sigmoid(x_1, x_2)  = tanh( s x_1^T x_2 + c )

• In practice, we specify k, thereby specifying Φ
  indirectly, instead of choosing Φ.

17 of 22
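Direct translations of the three kernels above (the parameter values γ, d, s, c are assumed defaults chosen only for illustration):

import numpy as np

def k_rbf(x1, x2, gamma=0.5):
    return np.exp(-gamma * np.sum((x1 - x2) ** 2))

def k_poly(x1, x2, d=3):
    return (x1 @ x2 + 1.0) ** d

def k_sigmoid(x1, x2, s=0.1, c=-1.0):
    return np.tanh(s * (x1 @ x2) + c)

a, b = np.array([1.0, 0.0]), np.array([0.5, 0.5])
print(k_rbf(a, b), k_poly(a, b), k_sigmoid(a, b))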

Steps of SVM
1. Solve for α as a quadratic program:

   max_α  {  Σ_{i=1}^n α_i  -  (1/2) Σ_{i=1}^n Σ_{j=1}^n α_i α_j y_i y_j K(x_i, x_j)  }

   Subject to   Σ_{i=1}^n y_i α_i = 0   and   α_i ≥ 0

2. Determine the support vectors by finding the
   points for which α_i > 0.

18 of 22
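The same two steps carried out by a library QP solver (scikit-learn is an assumed choice; the data are illustrative):

import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [1.5, 3.0],
              [-2.0, -2.0], [-3.0, -1.0], [-1.0, -2.5]])
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="rbf", gamma=0.5, C=10.0).fit(X, y)   # step 1: solve the QP
print(clf.support_)        # step 2: indices of the training points with alpha_i > 0
print(clf.dual_coef_)      # the corresponding products alpha_i * y_i
print(clf.intercept_)      # the bias b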

3. Calculate

   w = Σ_{i=1}^n α_i y_i x_i        and        b = y_sv - w^T x_sv ,

   where x_sv is an arbitrary support vector with label y_sv.

19 of 22

• To classify a new sample x:

  y(x) = w^T x + b = Σ_i α_i y_i K(s_i, x) + b ,      with  w = Σ_i α_i y_i x_i ,

  where the sum runs over the support vectors s_i.

• If y(x) ≥ 0 then assign class 1, otherwise assign class 2.

20 of 22
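A self-contained sketch of this classification rule with an RBF kernel (the support vectors, multipliers, and bias below are assumed values, not the result of training):

import numpy as np

def k_rbf(a, b, gamma=0.5):
    return np.exp(-gamma * np.sum((a - b) ** 2))

S = np.array([[1.0, 1.0], [-1.0, -1.0]])   # support vectors s_i
y_s = np.array([1.0, -1.0])                # their labels
alpha = np.array([0.7, 0.7])               # their multipliers (illustrative)
b = 0.0

def classify(x):
    score = sum(a * yi * k_rbf(s, x) for a, yi, s in zip(alpha, y_s, S)) + b
    return 1 if score >= 0 else 2          # class 1 if y(x) >= 0, otherwise class 2

print(classify(np.array([0.8, 1.2])))      # near the positive support vector -> 1
print(classify(np.array([-1.5, -0.5])))    # near the negative support vector -> 2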

SVM Decision Function
The number of support vectors is kept small by maximizing the margin.

The decision function of the classifier can be written using a kernel
function k(x_i, x) of a new sample x (to be classified) and a training
sample (support vector) x_i:

  D(x) = Σ_{i ∈ s} α_i λ_i k(x_i, x) + α_0

where
  s            : the set of support vectors (a subset of the training set)
  λ_i = ±1     : the label of object x_i
  α_i ≥ 0, α_0 : the parameters optimized during training by solving the
                 optimization problem (using Quadratic Optimization),
                 which can be costly.

21 of 22

Comments on SVM

• Training requires solving a QP; depending on the choice of kernel, the
  result can be a simple linear classifier or a nonlinear one.

• A powerful classifier: it maps the data to a higher-dimensional space to
  make nonlinearly separable problems linearly separable.

• The simplest kernel is the plain dot product x_i^T x_j; for example,

  k(x_i, x_j) = ( x_i^T x_j + 1 )^p

  gives a nonlinear polynomial classifier.

22 of 22
