
Machine Learning for Statistics

SDGB 7847 – Spring 2017

http://www.cs.columbia.edu/~amoretti/7847

Homework # 2 Due March 12th

Problem 1 (10 points)

Recall that a quadratic form maps a vector to a scalar and is analogous to a quadratic function in multiple

dimensions:

    Q(x) = (1/2) x^T D x - d^T x + c    (1)

In the above, x ∈ R^n is a vector, D ∈ R^(n×n) is a symmetric positive definite matrix, d ∈ R^n is a constant vector and c is a scalar. This is nothing more than a multivariate version of a quadratic function:

    f(x) = αx^2 + βx + c    (2)

Now consider a system of linear constraints on the vector x ∈ R^n which can be written as follows:

    Ax = f,    Bx ≥ g    (3)

Again, A ∈ R^(m1×n) (an m1 × n matrix) where m1 ≤ n, and B ∈ R^(m2×n) (an m2 × n matrix). The vectors f ∈ R^m1 and g ∈ R^m2 have lengths m1 and m2 respectively. Without loss of generality we can assume the

sum of the components of x is equal to 1.

    Σ_{i=1}^{n} x_i = 1    (4)

The quadratic program (QP) can then be written as follows:

    minimize over x ∈ R^n:    Q(x) = (1/2) x^T D x - d^T x + c
    such that:                Ax = f
    and:                      Bx ≥ g
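To make the definition concrete, here is a small numpy sketch (illustrative only, not part of the assignment) that evaluates Q for a particular symmetric positive definite D — the same D, d, c used in the worked example of Problem 2 — and checks positive definiteness via the eigenvalues:

```python
import numpy as np

# Toy instance of the general quadratic form (values from the
# Problem 2 example; any symmetric positive definite D would do).
D = np.array([[2.0, -1.0],
              [-1.0, 2.0]])
d = np.array([-3.0, 2.0])
c = 4.0

def Q(x):
    """Evaluate Q(x) = (1/2) x^T D x - d^T x + c."""
    x = np.asarray(x, dtype=float)
    return 0.5 * x @ D @ x - d @ x + c

# D is positive definite: its eigenvalues (1 and 3) are both positive,
# so Q is a convex bowl with a unique minimizer.
eigenvalues = np.linalg.eigvalsh(D)

# Evaluate at a point satisfying the normalization sum(x_i) = 1:
value = Q([0.5, 0.5])   # = 4.75
```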

Write down the quadratic program for the support vector machine in primal form as a function of the three parameters θ, θ0, α and in dual form as a function of the Lagrange multipliers α, including the function to be maximized or minimized, the linear inequalities and the equality constraints. Show all derivations.


Problem 2 (15 points)

Write a function that uses an existing QP solver to implement the linear support vector machine. To get you started, here is an example. Suppose we want to minimize the following objective function:

    Q(x1, x2) = (1/2) [x1 x2] [  2  -1 ] [x1]  -  [-3  2] [x1]  + 4    (5)
                              [ -1   2 ] [x2]            [x2]

Note that this describes a quadratic function of two variables:
    Q(x1, x2) = x1^2 + x2^2 - x1 x2 + 3 x1 - 2 x2 + 4    (6)
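The equivalence of the matrix form (5) and the expansion (6) is easy to sanity-check numerically; a short numpy sketch (not required by the assignment):

```python
import numpy as np

D = np.array([[2.0, -1.0],
              [-1.0, 2.0]])
d = np.array([-3.0, 2.0])

def Q_matrix(x1, x2):
    """Matrix form: (1/2) x^T D x - d^T x + 4, as in equation (5)."""
    x = np.array([x1, x2])
    return 0.5 * x @ D @ x - d @ x + 4

def Q_expanded(x1, x2):
    """Expanded form, as in equation (6)."""
    return x1**2 + x2**2 - x1*x2 + 3*x1 - 2*x2 + 4

# The two forms agree at randomly chosen points.
rng = np.random.default_rng(0)
pts = rng.normal(size=(100, 2))
agree = all(np.isclose(Q_matrix(a, b), Q_expanded(a, b)) for a, b in pts)
```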

Now suppose we want to minimize this function subject to linear constraints:

x2 2×1 (7) x2 2+x1 (8) x2  3 (9)

These three inequalities define three lines (or hyperplanes), each of which splits the plane in half. The intersection of the three half-planes defines a convex region (a triangle) in which we are constrained to search for the solution. Plotting the region in R:

plot(0, 0, xlim = c(-2, 5.5), ylim = c(-1, 3.5), type = "n",
     xlab = "x1", ylab = "x2", main = "Feasible Region")
polygon(c(2, 5, -1), c(0, 3, 3), border = TRUE, lwd = 4, col = "blue")

There are several good QP solvers (quadprog in R, CVX in Matlab, and CVXOPT in Python) that are all called similarly. To translate the objective function into a form acceptable to these solvers we will need to define the matrices and vectors of equation (1).

Define the matrices of the objective:

    D = [  2  -1 ]        d = [ -3 ]    (10)
        [ -1   2 ]            [  2 ]

Define the constraints:

    A^T = [  1   1 ]      b0 = [  2 ]    (11)
          [ -1   1 ]           [ -2 ]
          [  0  -1 ]           [ -3 ]

So that the following holds:

    [  1   1 ]           [  2 ]
    [ -1   1 ]  [x1]  ≥  [ -2 ]    (12)
    [  0  -1 ]  [x2]     [ -3 ]

This is simple enough in R (or alternatively Matlab or Python):


require(quadprog)

Dmat <- 2 * matrix(c(1, -1/2, -1/2, 1), nrow = 2, byrow = TRUE)
dvec <- c(-3, 2)
A    <- matrix(c(1, 1, -1, 1, 0, -1), ncol = 2, byrow = TRUE)
bvec <- c(2, -2, -3)
Amat <- t(A)
sol  <- solve.QP(Dmat, dvec, Amat, bvec, meq = 0)

The parameter meq gives the number of leading constraints to treat as equalities; here meq = 0, so all three constraints are inequalities. Examining the results from quadprog:

> sol$solution
[1] 0.1666667 1.8333333
> sol$value
[1] -0.08333333
> sol$unconstrained.solution
[1] -1.3333333  0.3333333
> sol$iterations
[1] 2 0
> sol$Lagrangian
[1] 1.5 0.0 0.0
> sol$iact
[1] 1

The first command returns the unique minimizer of Q(x1, x2) within the feasible set as the point (0.167, 1.83). The value -0.083 is the height of the objective there excluding the constant c = 4, which solve.QP drops; Q itself equals 3.917 at the minimizer. We can examine a contour plot (bird's-eye view) of the solution subject to the constraints as follows (let x2 be y for simplicity):

Figure 1: Minimum of Q subject to constraint 3

# Contour plot with feasible region overlay
require(lattice)

qp_sol <- sol$solution
uc_sol <- sol$unconstrained.solution

x <- seq(-2, 5.5, length.out = 500)
y <- seq(-1, 3.5, length.out = 500)
grid <- expand.grid(x = x, y = y)
grid$z <- with(grid, x^2 + y^2 - x*y + 3*x - 2*y + 4)

levelplot(z ~ x*y, grid, cuts = 30,
          panel = function(...) {
            panel.levelplot(...)
            panel.polygon(c(2, 5, -1), c(0, 3, 3),
                          border = TRUE, lwd = 4, col = "transparent")
            panel.points(c(uc_sol[1], qp_sol[1]),
                         c(uc_sol[2], qp_sol[2]),
                         lwd = 5, col = c("red", "yellow"), pch = 19)
          },
          colorkey = FALSE,
          col.regions = terrain.colors(30))
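The numbers above can also be verified by hand: only the constraint x1 + x2 ≥ 2 is active at the minimizer (sol$iact is 1), so solving the equality-constrained KKT system reproduces quadprog's output. A numpy sketch (Python used here purely as a cross-check, not as one of the solvers above):

```python
import numpy as np

D = np.array([[2.0, -1.0],
              [-1.0, 2.0]])
d = np.array([-3.0, 2.0])

# Unconstrained minimizer: grad Q = D x - d = 0  =>  x = D^{-1} d.
x_unconstrained = np.linalg.solve(D, d)          # (-4/3, 1/3)

# Active-set view: with x1 + x2 = 2 (normal a, right-hand side b = 2)
# the KKT system is
#   [ D   -a ] [x  ]   [ d ]
#   [ a^T  0 ] [lam] = [ b ]
a = np.array([[1.0], [1.0]])
KKT = np.block([[D, -a],
                [a.T, np.zeros((1, 1))]])
x1, x2, lam = np.linalg.solve(KKT, np.array([-3.0, 2.0, 2.0]))

# Objective value as solve.QP reports it (constant c = 4 dropped):
x = np.array([x1, x2])
value = 0.5 * x @ D @ x - d @ x                  # about -0.0833
```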

Your function should take as input a dataset for classification (Fisher's famous Iris dataset can be called from R or Python or downloaded from the UCI Machine Learning Repository), return the values of θ and θ0, and be able to predict the category of a new data point. Note you do not need to plot the objective function or feasible region as this will be high dimensional.
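Once θ and θ0 have been recovered from the QP solution, prediction is just a sign rule. A minimal sketch (the variable names and the example boundary here are hypothetical, and ±1 labels are assumed):

```python
import numpy as np

def predict(theta, theta0, X):
    """Linear SVM decision rule: label = sign(theta^T x + theta0).

    theta and theta0 would come from your QP solution; X is an
    (n_points, n_features) array of new inputs. Returns +1/-1 labels.
    """
    return np.sign(X @ theta + theta0)

# Hypothetical example: decision boundary x1 - x2 = 0.
theta, theta0 = np.array([1.0, -1.0]), 0.0
X_new = np.array([[2.0, 0.0],   # one side of the boundary -> +1
                  [0.0, 2.0]])  # the other side -> -1
labels = predict(theta, theta0, X_new)
```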

Problem 3 (15 points)

Run your function on the Iris dataset. Cross-validate by splitting the data into two sets. Plot the decision boundary. Report your prediction error. If stuck, you may use an existing implementation (see e1071 or sklearn) for partial credit.

Problem 4 (5 points)

Recall that a Mercer kernel is defined by k(x, x̃) = φ(x)^T φ(x̃). The kernel (Gram) matrix K is an n × n matrix formed by evaluating all pairs of the n inputs S = {x1, · · · , xn} so that K_{i,j} = k(x_i, x_j). Mercer's Theorem states that a symmetric function k(·, ·) is a kernel iff the kernel matrix K is positive semi-definite for any finite sample S.

• Prove that for any Mercer kernel and finite sample S, the kernel matrix K is positive semi-definite: c^T K c ≥ 0 for all c ∈ R^n.

• Prove that for any two Mercer kernels k1(·, ·) and k2(·, ·) the following are Mercer kernels. Hint: the power series representations of several functions may be useful.

  – k(x, x̃) = a · k1(x, x̃) + b · k2(x, x̃) for a, b ≥ 0.
  – k(x, x̃) = k1(x, x̃) × k2(x, x̃).
  – k(x, x̃) = f(k1(x, x̃)) where f is a polynomial with positive coefficients.
  – k(x, x̃) = exp(k1(x, x̃)).

• Write a formula for φ corresponding to the Gaussian kernel below:

    K(x, y) = exp(-(1/2) ‖x - y‖^2) = φ(x) · φ(y)
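Mercer's condition can be illustrated numerically (this is not a substitute for the proofs above): build the Gram matrix of the Gaussian kernel on a random finite sample and confirm its eigenvalues are non-negative. A numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(42)
S = rng.normal(size=(20, 3))     # finite sample of n = 20 points in R^3

def gaussian_kernel(x, y):
    """k(x, y) = exp(-(1/2) ||x - y||^2)."""
    diff = x - y
    return np.exp(-0.5 * diff @ diff)

n = S.shape[0]
K = np.array([[gaussian_kernel(S[i], S[j]) for j in range(n)]
              for i in range(n)])

# K is symmetric, so eigvalsh applies; all eigenvalues should be
# non-negative (up to floating point), i.e. K is PSD.
eigvals = np.linalg.eigvalsh(K)
assert eigvals.min() > -1e-10
```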

Problem 5 (bonus)

Modify your code to implement Gaussian (RBF) and polynomial kernels. Rerun your SVM, reporting your classification accuracy on a test set. Plot the decision boundaries of the RBF and polynomial kernels. You may also wish to try your function on the MNIST dataset of handwritten digits. You may use the code linked in the references to download the data using R.
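As a starting point for the bonus, both kernels can be written directly with numpy (a sketch; γ and the polynomial degree/offset are hypothetical hyperparameters you would tune):

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel: k(x, y) = exp(-gamma * ||x - y||^2)."""
    diff = np.asarray(x) - np.asarray(y)
    return np.exp(-gamma * diff @ diff)

def poly_kernel(x, y, degree=3, coef0=1.0):
    """Polynomial kernel: k(x, y) = (x^T y + coef0)^degree."""
    return (np.dot(x, y) + coef0) ** degree

# In the dual SVM these replace the inner product x_i^T x_j when
# building the Gram matrix K_ij = k(x_i, x_j).
```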

References

The following may be useful as references:

  1. https://www.r-bloggers.com/more-on-quadratic-progamming-in-r/
  2. https://www.mathworks.com/help/optim/ug/quadprog.html
  3. http://cvxopt.org/examples/
  4. http://blog.ryanwalker.us/2014/01/solving-quadratic-progams-with-rs.html
  5. https://people.eecs.berkeley.edu/~jordan/courses/281B-spring04/lectures/lec3.pdf
  6. https://gist.github.com/brendano/39760
