
CS5487 Problem Set 9

Support Vector Machines


Antoni Chan
Department of Computer Science
City University of Hong Kong

Hyperplanes

Problem 9.1 Margin

Let $f(x) = w^T x + b$ and consider the hyperplane $f(x) = 0$.

(a) Show that the Euclidean distance from a point $x_a$ to the hyperplane is $\frac{|f(x_a)|}{\|w\|}$, by minimizing $\|x - x_a\|^2$ subject to $f(x) = 0$.

(b) Show that the distance from the origin to the hyperplane is $\frac{|b|}{\|w\|}$.

(c) Show that the projection of $x_a$ onto the hyperplane is

$x_p = x_a - \frac{f(x_a)}{\|w\|^2} w.$

. . . . . . . . .
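As a quick numerical sanity check of the formulas above, the following sketch (not part of the original problem set; the hyperplane parameters and the point $x_a$ are arbitrary made-up values) verifies the distance and projection expressions with numpy.

import numpy as np

# hypothetical hyperplane f(x) = w^T x + b = 0 and an arbitrary test point x_a
w = np.array([3.0, 4.0])
b = -5.0
x_a = np.array([2.0, 1.0])

def f(x):
    return w @ x + b

# (a) distance from x_a to the hyperplane: |f(x_a)| / ||w||
dist = abs(f(x_a)) / np.linalg.norm(w)

# (c) projection of x_a onto the hyperplane: x_p = x_a - f(x_a) w / ||w||^2
x_p = x_a - f(x_a) * w / np.linalg.norm(w) ** 2

print("distance:", dist)                                        # 1.0 for these values
print("x_p lies on the hyperplane:", np.isclose(f(x_p), 0.0))
print("distance = ||x_a - x_p||:", np.isclose(dist, np.linalg.norm(x_a - x_p)))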

Problem 9.2 SVM dual problem

Consider the SVM problem for linearly separable data $\{X, y\}$,

$\min_{w,b} \; \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i(w^T x_i + b) \ge 1, \;\; \forall i.$    (9.2)

We will derive the dual formulation of the SVM.

(a) Show that the Lagrangian of (9.2) is

$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \big( y_i(w^T x_i + b) - 1 \big),$    (9.3)

where $\alpha_i \ge 0$ is a Lagrange multiplier for each constraint $i$.

(b) Show that the minimum of $L(w, b, \alpha)$ w.r.t. $\{w, b\}$ satisfies

$w = \sum_i \alpha_i y_i x_i, \qquad \sum_i \alpha_i y_i = 0.$    (9.4)

(c) Use (9.4) on the Lagrangian to obtain the dual function,

$L(\alpha) = \min_{w,b} L(w, b, \alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j x_i^T x_j,$    (9.5)

yielding the SVM dual problem,

$\max_\alpha \; \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j x_i^T x_j \quad \text{s.t.} \quad \alpha_i \ge 0, \;\; \sum_i \alpha_i y_i = 0.$    (9.6)

. . . . . . . . .
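For intuition, here is a minimal sketch that solves the dual (9.6) numerically on a tiny made-up linearly separable dataset and recovers w via (9.4). The toy data and the choice of SLSQP as solver are assumptions for illustration only.

import numpy as np
from scipy.optimize import minimize

# tiny linearly separable toy dataset (made-up values)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)

# K[i, j] = y_i y_j x_i^T x_j
K = (y[:, None] * X) @ (y[:, None] * X).T

# dual (9.6): maximize sum_i a_i - 0.5 a^T K a, s.t. a_i >= 0 and sum_i a_i y_i = 0
def neg_dual(a):
    return 0.5 * a @ K @ a - a.sum()

res = minimize(neg_dual, np.zeros(n), method="SLSQP",
               bounds=[(0.0, None)] * n,
               constraints=[{"type": "eq", "fun": lambda a: a @ y}])
alpha = res.x

# recover w from the stationarity condition (9.4): w = sum_i alpha_i y_i x_i
w = (alpha * y) @ X
print("alpha:", np.round(alpha, 4))
print("w:", np.round(w, 4))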

Problem 9.3 Calculating the bias term

Solving the dual SVM problem yields only the hyperplane parameters w. In this problem, we consider two ways to calculate the bias term b. Recall that the support vectors are those training points with non-zero Lagrange multipliers,

$SV = \{ i \mid \alpha_i > 0 \}.$    (9.7)

Since $\alpha_i > 0$ for points in $SV$, by the KKT conditions, the equality constraint must be "active" for each support vector $x_i$, i.e.,

$y_i(w^T x_i + b) = 1.$    (9.8)

(a) Using the active constraints, show that b can be calculated as

$b = \frac{1}{|SV|} \sum_{i \in SV} \big( y_i - w^T x_i \big).$    (9.9)

(b) Another method for calculating the bias is to select one support vector on each side of the hyperplane. Given points $x_+$ and $x_-$, which are on the positive and negative margins, show that

$b = -\frac{1}{2} w^T (x_+ + x_-).$    (9.10)

(c) Show that the method in (b) is a special case of (a) when there are only two support vectors.

. . . . . . . . .
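The two bias formulas can be checked numerically. The sketch below is an illustration, not a required solution: it fits an (almost) hard-margin SVM with scikit-learn on made-up data by using a very large C, then computes b by the averaging method (9.9) and the two-point method (9.10), and compares both against sklearn's intercept (agreement is up to numerical tolerance).

import numpy as np
from sklearn.svm import SVC

# toy separable data (made up); a very large C approximates the hard-margin SVM
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.5]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)
w = clf.coef_[0]
sv = clf.support_                      # indices of support vectors (alpha_i > 0)

# (a) average over the active constraints y_i (w^T x_i + b) = 1
b_avg = np.mean([y[i] - w @ X[i] for i in sv])

# (b) one support vector on each side of the hyperplane
x_pos = X[[i for i in sv if y[i] == 1][0]]
x_neg = X[[i for i in sv if y[i] == -1][0]]
b_pair = -0.5 * w @ (x_pos + x_neg)

print("sklearn intercept:      ", clf.intercept_[0])
print("averaging method (9.9): ", b_avg)
print("two-point method (9.10):", b_pair)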

Problem 9.4 Soft-margin SVM (1-norm penalty)

Consider the soft-margin SVM we saw in lecture,

$\min_{w,b,\xi} \; \frac{1}{2}\|w\|^2 + C \sum_i \xi_i \quad \text{s.t.} \quad y_i(w^T x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \;\; \forall i,$    (9.11)

where $\xi_i$ is the slack variable that allows the ith point to violate the margin. The new term in the objective function penalizes large slack variables (with parameter C). Since the penalty on the slack variables is their sum and the slacks are non-negative, the penalty function is the 1-norm of the slack variables. We will derive the dual formulation of this SVM. (This SVM is sometimes called the C-SVM or 1-norm SVM. It is also the most popular one used in practice.)

(a) Introducing Lagrange multipliers $\alpha_i \ge 0$ for the margin constraints and $r_i \ge 0$ for the non-negative slack constraints, show that the Lagrangian is

$L(w, b, \xi, \alpha, r) = \frac{1}{2}\|w\|^2 + C \sum_i \xi_i - \sum_i \alpha_i \big( y_i(w^T x_i + b) - 1 + \xi_i \big) - \sum_i r_i \xi_i.$    (9.12)

(b) Show that the minimum of $L(w, b, \xi, \alpha, r)$ w.r.t. $\{w, b, \xi\}$ satisfies

$w = \sum_i \alpha_i y_i x_i, \qquad \sum_i \alpha_i y_i = 0, \qquad C - \alpha_i - r_i = 0, \;\; \forall i.$    (9.13)

(c) Use (9.13) on the Lagrangian to obtain the dual function,

$L(\alpha) = \min_{w,b,\xi} L(w, b, \xi, \alpha, r) = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j x_i^T x_j.$    (9.14)

(d) Show that the two non-negative constraints on the Lagrange multipliers $\{\alpha_i, r_i\}$ and the equality constraint from (9.13),

$\alpha_i \ge 0,$    (9.15)
$r_i \ge 0,$    (9.16)
$C - \alpha_i - r_i = 0,$    (9.17)

are equivalent to

$0 \le \alpha_i \le C.$    (9.18)

Hence, the SVM dual problem is

$\max_\alpha \; \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j x_i^T x_j \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \;\; \sum_i \alpha_i y_i = 0.$    (9.19)

(e) Use the KKT conditions to show that exactly one of these conditions holds for each point $x_i$:

- $\xi_i = 0$ and $0 < \alpha_i < C$, and the point is on the margin.
- $\xi_i > 0$ and $\alpha_i = C$, and the point is an outlier (violates the margin).
- $\xi_i = 0$ and $\alpha_i = 0$, and the point is correctly classified.

. . . . . . . . .

Problem 9.5 Soft-margin SVM risk function

Consider the soft-margin SVM in Problem 9.4. In this problem we will interpret the soft-margin SVM as regularized risk minimization.

(a) Show that the slack variables must satisfy

$\xi_i \ge \max(0, 1 - y_i(w^T x_i + b)).$    (9.20)

Given the above, at the optimum of the SVM primal problem, what is the expression for $\xi_i$?

(b) Show that the C-SVM primal problem is equivalent to

$\min_{w,b} \; \sum_i \max(0, 1 - y_i(w^T x_i + b)) + \lambda \|w\|^2.$    (9.21)

The first term is the empirical risk, where the loss function is

$L(z_i) = \max(0, 1 - z_i), \qquad z_i = y_i(w^T x_i + b).$    (9.22)

This is called the "hinge loss". The second term is a regularization term on w to control its complexity. Hence, the SVM is optimizing a regularized risk.

(c) Plot the SVM loss function along with the other loss functions from Problem 8.5. Give an intuitive explanation about why the SVM loss function is good (and possibly better than the others).

. . . . . . . . .

Problem 9.6 Soft-margin SVM (2-norm penalty)

Consider the soft-margin SVM problem using a 2-norm penalty on the slack variables,

$\min_{w,b,\xi} \; \frac{1}{2}\|w\|^2 + \frac{C}{2} \sum_i \xi_i^2 \quad \text{s.t.} \quad y_i(w^T x_i + b) \ge 1 - \xi_i, \;\; \forall i,$    (9.23)

where $\xi_i$ is the slack variable that allows the ith point to violate the margin. We will derive the dual of this SVM.

(a) Show that the non-negative constraint on $\xi_i$ is redundant, and hence can be dropped. Hint: show that if $\xi_i < 0$ and the margin constraint is satisfied, then $\xi_i = 0$ is also a solution with lower cost.

(b) Introducing Lagrange multipliers $\alpha_i \ge 0$ for the margin constraints, show that the Lagrangian is

$L(w, b, \xi, \alpha) = \frac{1}{2}\|w\|^2 + \frac{C}{2} \sum_i \xi_i^2 - \sum_i \alpha_i \big( y_i(w^T x_i + b) - 1 + \xi_i \big).$    (9.24)

(c) Show that the minimum of $L(w, b, \xi, \alpha)$ w.r.t. $\{w, b, \xi\}$ satisfies

$w = \sum_i \alpha_i y_i x_i, \qquad \sum_i \alpha_i y_i = 0, \qquad \xi_i = \frac{\alpha_i}{C}, \;\; \forall i.$    (9.25)

(d) Use (9.25) on the Lagrangian to obtain the dual function,

$L(\alpha) = \min_{w,b,\xi} L(w, b, \xi, \alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \Big( x_i^T x_j + \frac{\delta_{ij}}{C} \Big),$    (9.26)

where $\delta_{ij} = 1$ if $i = j$ and 0 otherwise. Hence, the SVM dual problem is

$\max_\alpha \; L(\alpha) \quad \text{s.t.} \quad \alpha_i \ge 0, \;\; \sum_i \alpha_i y_i = 0.$    (9.27)

(e) (9.27) is the same as the standard SVM dual, except for the extra 1/C term. What is the role of this term, and how does it affect the solution?

. . . . . . . . .

Problem 9.7 ν-SVM

One limitation of the soft-margin SVM using the 1-norm penalty (Problem 9.4) is that there is no intuition for what the parameter C means, and it can therefore be difficult to find good values for it in practice. In this problem we consider a slightly different, but more intuitive, formulation based on the solution of the following problem, with ν as the parameter:

$\min_{w,b,\xi,\rho} \; \frac{1}{2}\|w\|^2 - \nu\rho + \frac{1}{n} \sum_i \xi_i \quad \text{s.t.} \quad y_i(w^T x_i + b) \ge \rho - \xi_i, \;\; \xi_i \ge 0, \;\; \rho \ge 0.$    (9.28)

(a) Derive the dual problem and the resulting decision function.

(b) Given the dual solution, how would you determine the values of b and ρ?

(c) Define the fraction of margin errors as

$\epsilon_\rho = \frac{1}{n} \big| \{ i : y_i f(x_i) < \rho \} \big|,$    (9.29)

and suppose that we solve the optimization problem on a dataset with the result that ρ > 0. Show that:

(a) $\nu$ is an upper bound on $\epsilon_\rho$.

(b) $\nu$ is a lower bound on the fraction of vectors that are support vectors.

(d) Show that if the solution of the second problem (the ν-SVM (9.28)) leads to ρ > 0, then the first problem (the C-SVM (9.11)), with C set a priori to $\frac{1}{n\rho}$, leads to the same decision function.

. . . . . . . . .
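As a numerical companion to Problem 9.4 (and to the slack expression (9.20) from Problem 9.5), the sketch below fits scikit-learn's C-SVM on made-up data and checks the box constraint (9.18) and the KKT cases from part (e) of Problem 9.4. The data, the value of C, and the tolerances are illustrative assumptions; the checks hold up to the solver's numerical tolerance.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# two overlapping Gaussian classes (made-up data, not linearly separable)
X = np.vstack([rng.normal([2.0, 2.0], 1.5, size=(40, 2)),
               rng.normal([-2.0, -2.0], 1.5, size=(40, 2))])
y = np.hstack([np.ones(40), -np.ones(40)])

C = 1.0
clf = SVC(kernel="linear", C=C).fit(X, y)

f = clf.decision_function(X)                      # f(x_i) = w^T x_i + b
alpha = np.zeros(len(y))
alpha[clf.support_] = np.abs(clf.dual_coef_[0])   # dual_coef_ stores y_i * alpha_i

tol = 1e-6
on_margin = (alpha > tol) & (alpha < C - tol)     # 0 < alpha_i < C
at_bound = alpha >= C - tol                       # alpha_i = C
non_sv = alpha <= tol                             # alpha_i = 0
xi = np.maximum(0.0, 1.0 - y * f)                 # optimal slacks, from (9.20)

# checks corresponding to the box constraint (9.18) and the three cases in 9.4(e)
print("box constraint 0 <= alpha <= C:", bool(np.all(alpha <= C + tol)))
print("0 < alpha < C => y_i f(x_i) ~ 1 :", np.round(y[on_margin] * f[on_margin], 3))
print("alpha = C     => y_i f(x_i) <= 1:", bool(np.all(y[at_bound] * f[at_bound] <= 1 + 1e-3)))
print("alpha = 0     => y_i f(x_i) >= 1:", bool(np.all(y[non_sv] * f[non_sv] >= 1 - 1e-3)))
print("xi > 0 only where alpha = C:", bool(np.all(xi[~at_bound] <= 1e-3)))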

Other SVMs

Problem 9.8 Adaptive SVM

In this problem we will consider an adaptive SVM. Suppose we have used a dataset $D_1$ to learn a linear classifier function $f_1(x) = w_1^T x$, with decision rule $y = \text{sign}(f_1(x))$. Since we have the classifier, we then threw away the data $D_1$. Now, suppose we receive a new set of data $D = \{(x_i, y_i)\}_{i=1}^n$, and we now wish to update our classifier function $f_1(x)$. To do this, we will add a "delta function" $\Delta f(x) = w^T x$ to adapt our original classifier,

$f(x) = f_1(x) + \Delta f(x) = f_1(x) + w^T x,$    (9.30)

where w is the parameter vector of the delta function.

Let's consider the case when D is linearly separable. We wish to maximize the margin between the updated classifier and the new training set, which yields the adaptive-SVM (ASVM) objective function

$\min_w \; \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i \big( f_1(x_i) + w^T x_i \big) \ge 1, \;\; \forall i.$    (9.31)

(a) Show that the Lagrangian of (9.31) is

$L(w, \alpha) = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \big( y_i ( f_1(x_i) + w^T x_i ) - 1 \big),$    (9.32)

where $\alpha_i \ge 0$ are the Lagrange multipliers.

(b) Show that the minimum of $L(w, \alpha)$ w.r.t. w satisfies

$w = \sum_i \alpha_i y_i x_i.$    (9.33)

(c) Show that the dual function is

$L(\alpha) = \min_w L(w, \alpha) = \sum_i \alpha_i \big( 1 - y_i f_1(x_i) \big) - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j x_i^T x_j.$    (9.34)

Hence, the ASVM dual problem is

$\max_\alpha \; L(\alpha) \quad \text{s.t.} \quad 0 \le \alpha_i, \;\; \forall i.$    (9.35)

(d) Compare the ASVM dual in (9.34) with the original SVM dual function. What is the interpretation of the ASVM dual (considering the original SVM dual)? What is the role of the original classifier $f_1(x)$?

Note that we can also define a “soft” version of ASVM with slack variables to handle the non-linearly
separable case.

. . . . . . . . .
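To make the dual (9.34) concrete, here is a small sketch (illustration only) that maximizes it with scipy on a made-up separable dataset D, given an assumed previous classifier $f_1(x) = w_1^T x$, and then forms the updated classifier (9.30). At the optimum, all margins $y_i f(x_i)$ should be at least 1 (up to solver tolerance).

import numpy as np
from scipy.optimize import minimize

# original classifier f1(x) = w1^T x, learned earlier on D1 (hypothetical parameters)
w1 = np.array([1.0, 0.0])
f1 = lambda Xq: Xq @ w1

# new, linearly separable dataset D (made-up values)
X = np.array([[1.0, 2.0], [2.0, 1.5], [-1.0, -2.0], [-2.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)

K = (y[:, None] * X) @ (y[:, None] * X).T   # K[i, j] = y_i y_j x_i^T x_j
c = 1.0 - y * f1(X)                         # linear term of the ASVM dual (9.34)

# ASVM dual: maximize c^T alpha - 0.5 alpha^T K alpha, s.t. alpha_i >= 0
neg_dual = lambda a: 0.5 * a @ K @ a - c @ a
res = minimize(neg_dual, np.zeros(n), method="SLSQP", bounds=[(0.0, None)] * n)
alpha = res.x

w = (alpha * y) @ X                         # delta-function parameters, from (9.33)
f = lambda Xq: f1(Xq) + Xq @ w              # updated classifier (9.30)
print("alpha:", np.round(alpha, 3))
print("margins y_i f(x_i):", np.round(y * f(X), 3))   # all should be >= 1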

Problem 9.9 Support Vector Regression (SVR)

In this problem, we will consider support vector regression (SVR), which applies margin principles from SVMs to regression. The goal is to learn a linear function,

$f(x) = w^T x + b,$    (9.36)

which fits a given dataset $D = \{(x_i, y_i)\}_{i=1}^n$, where $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$. Suppose we form a "band" or "tube" around $f(x)$ with width ε (see figure below).

[Figure: a linear function f(x) with an ε-tube around it.]

We can consider any training pair $(x_i, y_i)$ that falls inside of the tube as correctly regressed, while points falling outside of the tube are errors. Assuming that ε is known, the SVR problem is

$\min_{w,b} \; \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i - (w^T x_i + b) \le \epsilon, \quad (w^T x_i + b) - y_i \le \epsilon, \;\; \forall i.$    (9.37)

(a) What are the roles of the inequality constraints and the objective function in (9.37)?

(b) Show that the Lagrangian of (9.37) is

$L(w, b, \alpha, \hat\alpha) = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \big( \epsilon - y_i + w^T x_i + b \big) - \sum_i \hat\alpha_i \big( \epsilon + y_i - w^T x_i - b \big),$    (9.38)

where $\alpha_i \ge 0$ are the Lagrange multipliers for the first inequality constraint, and $\hat\alpha_i \ge 0$ are for the second inequality constraint.

(c) Show that the minimum of $L(w, b, \alpha, \hat\alpha)$ w.r.t. $\{w, b\}$ satisfies

$w = \sum_i (\alpha_i - \hat\alpha_i) x_i, \qquad \sum_i (\alpha_i - \hat\alpha_i) = 0.$    (9.39)

(d) Show that the SVR dual function is

$L(\alpha, \hat\alpha) = \min_{w,b} L(w, b, \alpha, \hat\alpha)$    (9.40)
$= -\frac{1}{2} \sum_i \sum_j (\alpha_i - \hat\alpha_i)(\alpha_j - \hat\alpha_j) x_i^T x_j - \epsilon \sum_i (\alpha_i + \hat\alpha_i) + \sum_i y_i (\alpha_i - \hat\alpha_i).$    (9.41)

Hence the SVR dual problem is

$\max_{\alpha, \hat\alpha} \; L(\alpha, \hat\alpha) \quad \text{s.t.} \quad \alpha_i \ge 0, \;\; \hat\alpha_i \ge 0, \;\; \sum_i (\alpha_i - \hat\alpha_i) = 0.$    (9.42)

(e) Use the KKT conditions to show that only one of these conditions holds for the ith data point,

- $\alpha_i = 0$ and $\hat\alpha_i = 0$, and the point is inside the ε-tube.
- $\alpha_i > 0$ and $\hat\alpha_i = 0$, and the point is on the positive margin of the tube, i.e., $y_i = f(x_i) + \epsilon$.
- $\alpha_i = 0$ and $\hat\alpha_i > 0$, and the point is on the negative margin of the tube, i.e., $y_i = f(x_i) - \epsilon$.
- $\alpha_i \hat\alpha_i = 0$, i.e., the point can't both be on the positive margin and negative margin at the same time.

. . . . . . . . .
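As an illustration of the hard-margin SVR dual (9.40)-(9.42), the sketch below solves it with scipy for a small 1-D dataset that fits inside an ε-tube around a line (the data and ε are made up), then recovers w from (9.39) and b from a point on the positive margin. This is only a sketch under those assumptions, not a prescribed solution method.

import numpy as np
from scipy.optimize import minimize

# 1-D data lying within an eps-tube around a line (made-up values)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0 + np.array([0.3, -0.3, 0.2, -0.2, 0.3])
eps = 0.5
n = len(x)
X = x[:, None]
K = X @ X.T

# dual variables packed as beta = [alpha, alpha_hat]
def neg_dual(beta):
    a, ah = beta[:n], beta[n:]
    d = a - ah
    return 0.5 * d @ K @ d + eps * (a + ah).sum() - y @ d

cons = [{"type": "eq", "fun": lambda beta: beta[:n].sum() - beta[n:].sum()}]
res = minimize(neg_dual, np.zeros(2 * n), method="SLSQP",
               bounds=[(0.0, None)] * (2 * n), constraints=cons)
a, ah = res.x[:n], res.x[n:]

w = (a - ah) @ X                        # from (9.39): w = sum_i (alpha_i - alpha_hat_i) x_i
i = int(np.argmax(a))                   # a point on the positive margin (alpha_i > 0)
b = y[i] - w @ X[i] - eps               # there, y_i = f(x_i) + eps
print("w, b:", float(w[0]), float(b))
print("residuals y - f(x):", np.round(y - (X @ w + b), 3))   # all within [-eps, eps]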

Problem 9.10 “Soft” Support Vector Regression

In this problem we consider a "soft" version of SVR. We introduce a set of slack variables $\xi_i$ and $\hat\xi_i$ to allow for some errors on either side of the tube, i.e., some points can be outside of the tube.

[Figure: the ε-tube around f(x), now with some points outside the tube.]

The amount of error is penalized linearly, similar to SVMs. The SVR problem is now:

$\min_{w,b,\xi,\hat\xi} \; \frac{1}{2}\|w\|^2 + C \sum_i (\xi_i + \hat\xi_i) \quad \text{s.t.} \quad y_i - (w^T x_i + b) \le \epsilon + \xi_i, \quad (w^T x_i + b) - y_i \le \epsilon + \hat\xi_i, \quad \xi_i \ge 0, \;\; \hat\xi_i \ge 0, \;\; \forall i.$    (9.43)

At the optimum, if the ith point is inside the positive margin of the ε-tube, then the slack variable $\xi_i = 0$. Otherwise, if the ith point is outside the positive margin, then $\xi_i$ is the distance between the point and the margin, i.e., the amount of error $y_i - f(x_i) - \epsilon$. The same holds for $\hat\xi_i$ and the negative margin. In other words, the slack variables $\xi_i$ and $\hat\xi_i$ contain the amount of error outside of the ε-tube.

(a) Let $|z|_\epsilon$ be the ε-insensitive loss function,

$|z|_\epsilon = \max(0, |z| - \epsilon) = \begin{cases} 0, & |z| \le \epsilon, \\ |z| - \epsilon, & \text{otherwise.} \end{cases}$    (9.44)

Show that the primal problem in (9.43) is equivalent to regularized regression with the ε-insensitive loss function,

$\min_{w,b} \; \sum_i |y_i - f(x_i)|_\epsilon + \lambda \|w\|^2.$    (9.45)

How does this compare with other regularized regression formulations, e.g., regularized least squares? Comment on its robustness to outliers.

(b) Show that the soft SVR dual problem is:

$\max_{\alpha, \hat\alpha} \; -\frac{1}{2} \sum_i \sum_j (\alpha_i - \hat\alpha_i)(\alpha_j - \hat\alpha_j) x_i^T x_j - \epsilon \sum_i (\alpha_i + \hat\alpha_i) + \sum_i y_i (\alpha_i - \hat\alpha_i)$
$\text{s.t.} \quad 0 \le \alpha_i \le C, \;\; 0 \le \hat\alpha_i \le C, \;\; \sum_i (\alpha_i - \hat\alpha_i) = 0.$    (9.46)

(c) Use the KKT conditions to show that only one of these conditions holds for the ith data point,

- $\alpha_i = 0$ and $\hat\alpha_i = 0$: $|y_i - f(x_i)| < \epsilon$, and the point is inside the ε-tube.
- $0 < \alpha_i < C$ and $\hat\alpha_i = 0$: $y_i = f(x_i) + \epsilon$, and the point is on the positive margin of the tube.
- $\alpha_i = 0$ and $0 < \hat\alpha_i < C$: $y_i = f(x_i) - \epsilon$, and the point is on the negative margin of the tube.
- $\alpha_i = C$ and $\hat\alpha_i = 0$: $y_i \ge f(x_i) + \epsilon$, and the point is outside the positive margin of the tube.
- $\alpha_i = 0$ and $\hat\alpha_i = C$: $y_i \le f(x_i) - \epsilon$, and the point is outside the negative margin of the tube.

. . . . . . . . .
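Finally, a small sketch (illustration only, using scikit-learn's SVR on made-up data) that is consistent with the KKT conditions in part (c): points strictly inside the ε-tube receive zero dual variables and hence are not support vectors, and the stored dual coefficients $\alpha_i - \hat\alpha_i$ are bounded by C in magnitude. The data, C, ε, and tolerances are assumptions for the example.

import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
# noisy 1-D regression data (made-up values)
X = np.linspace(-4, 6, 40)[:, None]
y = 0.8 * X[:, 0] + 1.0 + rng.normal(0, 0.3, 40)

eps, C = 0.5, 10.0
svr = SVR(kernel="linear", C=C, epsilon=eps).fit(X, y)

resid = y - svr.predict(X)
inside = np.abs(resid) < eps - 1e-2        # strictly inside the eps-tube (with tolerance)
is_sv = np.zeros(len(y), dtype=bool)
is_sv[svr.support_] = True                 # points with alpha_i or alpha_hat_i > 0

print("support vectors:", int(is_sv.sum()), "of", len(y))
print("any inside-tube support vectors?", bool((inside & is_sv).any()))
# dual_coef_ stores (alpha_i - alpha_hat_i) for the support vectors, bounded by C
print("max |alpha_i - alpha_hat_i|:", float(np.abs(svr.dual_coef_).max()), "<=", C)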