
Machine Learning 1 TU Berlin, WiSe 2020/21
Kernel Support Vector Machines
In this exercise sheet, we will implement a kernel SVM. Our implementation will be based on a generic quadratic programming optimizer provided in CVXOPT (python-cvxopt package, or directly from the website www.cvxopt.org). The SVM will then be tested on the UCI breast cancer dataset, a simple binary classification dataset accessible via the scikit-learn library.
1. Building the Gaussian Kernel (5 P)
As a starting point, we would like to implement the Gaussian kernel, which we will make use of in our kernel SVM implementation. It is defined as:
\[
k(x, x') = \exp\left(-\frac{\lVert x - x'\rVert^2}{2\sigma^2}\right)
\]
• Implement a function getGaussianKernel that returns, for a Gaussian kernel of scale σ, the Gram matrix between the two datasets given as arguments.
In [1]: import numpy,scipy,scipy.spatial

        def getGaussianKernel(X1,X2,scale):
            ### TODO: REPLACE BY YOUR OWN CODE
            import solutions
            K = solutions.getGaussianKernel(X1,X2,scale)
            return K
            ###
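For reference, a minimal sketch of a possible implementation is given below. It assumes that X1 and X2 are two-dimensional numpy arrays with one data point per row and that scale plays the role of σ in the formula above; the pairwise squared Euclidean distances are obtained with scipy.spatial.distance.cdist.

    import numpy, scipy.spatial

    def getGaussianKernel(X1, X2, scale):
        # Pairwise squared Euclidean distances between the rows of X1 and the rows of X2
        D2 = scipy.spatial.distance.cdist(X1, X2, 'sqeuclidean')
        # Gram matrix of the Gaussian kernel with bandwidth sigma = scale
        return numpy.exp(-D2 / (2.0 * scale ** 2))

The resulting Gram matrix has one row per point in X1 and one column per point in X2.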
2. Building the Matrices for the CVXOPT Quadratic Solver (20 P)
We would like to learn a nonlinear SVM by optimizing its dual. An advantage of the dual SVM compared to the primal SVM is that it allows the use of nonlinear kernels such as the Gaussian kernel. The dual SVM consists of solving the following quadratic program:

\[
\max_{\alpha} \; \sum_{i=1}^{N} \alpha_i \;-\; \frac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j y_i y_j k(x_i, x_j)
\]

subject to:

\[
0 \le \alpha_i \le C \quad \text{for all } i \qquad \text{and} \qquad \sum_{i=1}^{N} \alpha_i y_i = 0.
\]

We would like to rely on a CVXOPT solver to obtain a solution to our SVM dual. The function cvxopt.solvers.qp solves an optimization problem of the type:

\[
\min_{x} \; \tfrac{1}{2} x^\top P x + q^\top x \qquad \text{subject to} \quad Gx \preceq h \quad \text{and} \quad Ax = b,
\]

which is of a similar form to our dual SVM (note that x will correspond to the parameters (αi)i of the SVM). We need to build the data structures (vectors and matrices) that make solving this quadratic problem equivalent to solving our dual SVM.
• Implement a function getQPMatrices that builds the matrices P, q, G, h, A, b (of type cvxopt.matrix) that need to be passed as arguments to the optimizer cvxopt.solvers.qp.

In [2]: import cvxopt,cvxopt.solvers

        cvxopt.solvers.options['show_progress'] = False

        def getQPMatrices(K,T,C):
            ### TODO: REPLACE BY YOUR CODE
            import solutions
            P,q,G,h,A,b = solutions.getQPMatrices(K,T,C)
            return P,q,G,h,A,b
            ###
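For reference, one possible way of building these matrices is sketched below. It assumes that K is the N×N Gram matrix on the training data and that T is the vector of ±1 labels. Since cvxopt.solvers.qp minimizes while our dual is a maximization problem, the quadratic objective is encoded with P_ij = y_i y_j k(x_i, x_j) and q = −1; the stacked inequality Gx ≼ h expresses 0 ≤ αi ≤ C, and Ax = b expresses Σ αi yi = 0.

    import numpy, cvxopt

    def getQPMatrices(K, T, C):
        # Sketch: map the SVM dual onto  min_x 0.5 x'Px + q'x  s.t.  Gx <= h, Ax = b
        N = len(T)
        y = numpy.asarray(T, dtype=float)

        P = cvxopt.matrix(numpy.outer(y, y) * K)           # P_ij = y_i y_j k(x_i, x_j)
        q = cvxopt.matrix(-numpy.ones(N))                  # maximization -> negated linear term
        G = cvxopt.matrix(numpy.vstack([-numpy.eye(N),     # -alpha_i <= 0
                                         numpy.eye(N)]))   #  alpha_i <= C
        h = cvxopt.matrix(numpy.hstack([numpy.zeros(N), C * numpy.ones(N)]))
        A = cvxopt.matrix(y.reshape(1, N))                 # sum_i alpha_i y_i = 0
        b = cvxopt.matrix(numpy.zeros(1))
        return P, q, G, h, A, b

The dual variables can then be recovered with, for example, alpha = numpy.array(cvxopt.solvers.qp(P,q,G,h,A,b)['x']).flatten().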
3. Computing the Bias Parameters (10 P)
Given the parameters (αi)i the optimization procedure has found, the prediction of the SVM is given by:
\[
f(x) = \operatorname{sign}\left( \sum_{i=1}^{N} \alpha_i y_i k(x, x_i) + \theta \right)
\]

Note that the parameter θ has not been computed yet. It can be obtained from any support vector that lies exactly on the margin, or equivalently, whose associated parameter α is not equal to 0 or C. Calling one such vector x_M, the parameter θ can be computed as:

\[
\theta = y_M - \sum_{j=1}^{N} \alpha_j y_j k(x_M, x_j)
\]
• Implement a function getTheta that takes as input the Gram matrix used for training, the label vector, the solution of our quadratic program, and the hyperparameter C. The function should return the parameter θ.
In [3]: def getTheta(K,T,alpha,C):
            ### TODO: REPLACE BY YOUR CODE
            import solutions
            theta = solutions.getTheta(K,T,alpha,C)
            return theta
            ###
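For reference, a minimal sketch is given below. It assumes that alpha is the numpy array of dual variables returned by the solver, K the training Gram matrix and T the ±1 label vector; the numerical tolerance used to select a margin support vector (one whose α is strictly between 0 and C) is an assumption, not part of the exercise statement.

    import numpy

    def getTheta(K, T, alpha, C):
        # Sketch: select one margin support vector, i.e. a point whose dual
        # variable lies strictly between 0 and C (up to a small tolerance)
        tol = 1e-6 * C
        M = numpy.argmax((alpha > tol) * (alpha < C - tol))
        # theta = y_M - sum_j alpha_j y_j k(x_M, x_j)
        return T[M] - numpy.dot(alpha * T, K[M])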
4. Implementing a class GaussianSVM (15 P)
All functions that are needed to learn the SVM have now been built. We would like to implement an SVM class that connects them and makes the SVM easy to use. The class structure is given below and contains two functions, one for training the model and one for applying it to test data.
• Implement the function fit that makes use of the functions getGaussianKernel, getQPMatrices and getTheta you have already implemented. The function should learn the SVM model and store the support vectors, their labels, (αi)i and θ in the object (self).
• Implement the function predict that makes use of the stored information to compute the SVM output for any new collection of data points.
In [4]: class GaussianSVM:

            def __init__(self,C=1.0,scale=1.0):
                self.C, self.scale = C, scale

            def fit(self,X,T):
                ### TODO: REPLACE BY YOUR CODE
                import solutions
                solutions.fit(self,X,T)
                ###

            def predict(self,X):
                ### TODO: REPLACE BY YOUR CODE
                import solutions
                Y = solutions.predict(self,X)
                return Y
                ###
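For reference, a possible sketch of fit and predict is given below. It relies on the functions getGaussianKernel, getQPMatrices and getTheta from the previous exercises, stores the support vectors (here taken as the points whose αi exceeds a small tolerance, which is an assumption), their labels, their coefficients and θ in the object, and evaluates the decision function sign(Σ αi yi k(x, xi) + θ) in predict.

    import numpy, cvxopt.solvers

    class GaussianSVM:

        def __init__(self, C=1.0, scale=1.0):
            self.C, self.scale = C, scale

        def fit(self, X, T):
            # Sketch: train the SVM with the helper functions defined above
            K = getGaussianKernel(X, X, self.scale)
            P, q, G, h, A, b = getQPMatrices(K, T, self.C)
            alpha = numpy.array(cvxopt.solvers.qp(P, q, G, h, A, b)['x']).flatten()
            self.theta = getTheta(K, T, alpha, self.C)

            # Keep only the support vectors (alpha_i significantly larger than 0)
            sv = alpha > 1e-6
            self.X, self.T, self.alpha = X[sv], T[sv], alpha[sv]

        def predict(self, X):
            # Decision function: sign( sum_i alpha_i y_i k(x, x_i) + theta )
            K = getGaussianKernel(X, self.X, self.scale)
            return numpy.sign(numpy.dot(K, self.alpha * self.T) + self.theta)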
5. Analysis
The following code tests the SVM on the UCI breast cancer binary classification dataset for a range of scale and soft-margin parameters. For each combination of parameters, we output the number of support vectors as well as the train and test accuracy averaged over a number of random train/test splits. Running the code below should take approximately 1-2 minutes.
In [5]: import numpy,sklearn,sklearn.datasets

        D = sklearn.datasets.load_breast_cancer()
        X = D['data']
        T = D['target']
        T = (D['target']==1)*2.0-1.0

        for scale in [30,100,300,1000,3000]:
            for C in [10,100,1000,10000]:
                acctrain,acctest,nbsvs = [],[],[]
                svm = GaussianSVM(C=C,scale=scale)
                for i in range(10):
                    # Split the data
                    R = numpy.random.mtrand.RandomState(i).permutation(len(X))
                    Xtrain,Xtest = X[R[:len(R)//2]]*1,X[R[len(R)//2:]]*1
                    Ttrain,Ttest = T[R[:len(R)//2]]*1,T[R[len(R)//2:]]*1
                    # Train and test the SVM
                    svm.fit(Xtrain,Ttrain)
                    acctrain += [(svm.predict(Xtrain)==Ttrain).mean()]
                    acctest += [(svm.predict(Xtest)==Ttest).mean()]
                    nbsvs += [len(svm.X)*1.0]
                print('scale=%9.1f C=%9.1f nSV: %4d train: %.3f test: %.3f'%(
                    scale,C,numpy.mean(nbsvs),numpy.mean(acctrain),numpy.mean(acctest)))
            print('')
scale= 30.0 C= 10.0 nSV: 183 train: 0.997 test: 0.921
scale= 30.0 C= 100.0 nSV: 178 train: 1.000 test: 0.918
scale= 30.0 C= 1000.0 nSV: 184 train: 1.000 test: 0.918
scale= 30.0 C= 10000.0 nSV: 182 train: 1.000 test: 0.918
scale= 100.0 C= 10.0 nSV: 117 train: 0.965 test: 0.935
scale= 100.0 C= 100.0 nSV: 97 train: 0.987 test: 0.940
scale= 100.0 C= 1000.0 nSV: 85 train: 0.998 test: 0.932
scale= 100.0 C= 10000.0 nSV: 71 train: 1.000 test: 0.926
scale= 300.0 C= 10.0 nSV: 88 train: 0.939 test: 0.924
scale= 300.0 C= 100.0 nSV: 48 train: 0.963 test: 0.943
scale= 300.0 C= 1000.0 nSV: 36 train: 0.978 test: 0.946
scale= 300.0 C= 10000.0 nSV: 32 train: 0.991 test: 0.941
scale= 1000.0 C= 10.0 nSV: 66 train: 0.926 test: 0.916
scale= 1000.0 C= 100.0 nSV: 55 train: 0.935 test: 0.929
scale= 1000.0 C= 1000.0 nSV: 49 train: 0.956 test: 0.946
scale= 1000.0 C= 10000.0 nSV: 38 train: 0.971 test: 0.951
scale= 3000.0 C= 10.0 nSV: 87 train: 0.912 test: 0.903
scale= 3000.0 C= 100.0 nSV: 68 train: 0.926 test: 0.919
scale= 3000.0 C= 1000.0 nSV: 58 train: 0.934 test: 0.929
scale= 3000.0 C= 10000.0 nSV: 49 train: 0.953 test: 0.943
We observe that the highest test accuracy is obtained for a scale parameter that is neither too small nor too large. The best parameter combinations are also often associated with a low number of support vectors.