Introduction to Machine Learning Assignment 4
1. [40 points] Soft-Margin Classifier
In class, we considered a “soft-margin” classifier, which can tolerate some misclassified points. Specifically, we considered the classifier given by the linear program:
\[
\min_{\vec{w},\, b,\, \vec{\xi}} \;\sum_i \xi_i
\quad \text{such that} \quad
\forall i,\; y_i\bigl(\langle \vec{w}, \vec{x}^{(i)} \rangle + b\bigr) \ge 1 - \xi_i,
\quad \text{and} \quad
\forall i,\; \xi_i \ge 0.
\]
As we discussed in class, the geometric meaning of the ξi in this optimization problem is a bit tricky, as they must be interpreted in the context of a particular ∥w⃗∥.
For example, consider the one-dimensional dataset (−2, −1), (−1, 1), (1, −1), (2, 1), pictured below:
Figure 1: A simple non-separable dataset. Data points for which yi = +1 are marked in black; those for which yi = −1 are marked in white.
What are the optimal w ∈ ℝ and ξi for this dataset? (You may assume that b = 0.)
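A minimal numerical sketch, offered only as an optional sanity check and not as the intended written answer: with b = 0 the problem above becomes a small linear program in the scalar w and the four ξi, so a hand-derived solution can be verified with an LP solver. The use of SciPy's linprog and the variable ordering [w, ξ1, ..., ξ4] below are choices made for this sketch, not part of the assignment.

    # Optional sanity check for Problem 1 (b = 0), not the graded derivation.
    # Variable order: z = [w, xi_1, xi_2, xi_3, xi_4].
    import numpy as np
    from scipy.optimize import linprog

    X = np.array([-2.0, -1.0, 1.0, 2.0])   # 1-D inputs x^(i)
    y = np.array([-1.0, 1.0, -1.0, 1.0])   # labels y_i
    n = len(X)

    # Objective: minimize sum_i xi_i (w itself carries no cost).
    c = np.concatenate(([0.0], np.ones(n)))

    # Constraint y_i * w * x_i >= 1 - xi_i, rewritten for linprog's
    # A_ub @ z <= b_ub form as  -y_i * x_i * w - xi_i <= -1.
    A_ub = np.hstack([(-y * X).reshape(-1, 1), -np.eye(n)])
    b_ub = -np.ones(n)

    # w is free; each xi_i >= 0.
    bounds = [(None, None)] + [(0, None)] * n

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    w_opt, xi_opt = res.x[0], res.x[1:]
    print("w =", w_opt, " xi =", xi_opt, " sum xi =", xi_opt.sum())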
2. [20 points] Hard-Margin Classifier and Kernel
Write down (1) the convex optimization form of the standard hard-margin SVM, and (2) the associated hard-margin classifier for a particular kernel K(·,·) with Φ(x⃗) as its feature mapping.
3. [40 points] As introduced in lecture, the soft-margin SVM solves the following optimization problem, where $w^T x + b = 0$ is the linear separator and $w^T x + b = 1$ and $w^T x + b = -1$ are the two margin lines:
\[
\min_{w,\, b,\, \vec{\varepsilon}} \;\frac{1}{2}\|w\|^2 + c \sum_{j=1}^{N} \varepsilon_j
\quad \text{such that} \quad
\forall j,\; y_j\bigl(w^T x_j + b\bigr) \ge 1 - \varepsilon_j,
\quad \text{and} \quad
\forall j,\; \varepsilon_j \ge 0.
\]
1). The second term in the objective function (i.e., $c \sum_{j=1}^{N} \varepsilon_j$) is zero when what conditions are satisfied? Does this mean that every sample is correctly classified?
2). If 0 < εj < 1, what does this imply about sample (xj, yj), and where is this sample located?
3). If 1 < εj < 2, what does this imply about sample (xj, yj), and where is this sample located?
4). If εj > 2, what does this imply about sample (xj, yj), and where is this sample located?
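As an optional illustration of the regimes asked about in questions 1) to 4): at the optimum each slack satisfies εj = max(0, 1 − yj(wT xj + b)), so evaluating this quantity for a fixed separator shows where a sample with a given εj must lie. The separator w = 1, b = 0 and the one-dimensional sample points in the sketch below are assumptions chosen purely for illustration.

    # Illustrative only: slack values for an assumed separator w = 1, b = 0.
    import numpy as np

    w, b = np.array([1.0]), 0.0            # assumed separator: w^T x + b = x

    def slack(x, y):
        """Slack needed by (x, y) given the fixed separator above."""
        return max(0.0, 1.0 - y * (w @ x + b))

    samples = [
        (np.array([2.0]),  1.0),   # on or beyond its margin line:   eps = 0
        (np.array([0.5]),  1.0),   # inside margin, correct side:    0 < eps < 1
        (np.array([-0.5]), 1.0),   # wrong side of the separator:    1 < eps < 2
        (np.array([-1.5]), 1.0),   # beyond the opposite margin:     eps > 2
    ]
    for x, y in samples:
        print(f"x = {x[0]:+.1f}, y = {y:+.0f}, eps = {slack(x, y):.2f}")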