MAST90083 Computational Statistics & Data Mining SVM and ANNs
Tutorial & Practical 11: SVM and ANNs
Question 1
Assume a given data set of feature vectors $x_i \in \mathbb{R}^p$, $i = 1, \dots, N$, with corresponding label values
$t_i \in \{-1, +1\}$. Within each class, we further assume that the density of the feature vector is
modeled using a kernel density estimator with kernel $k(x, x')$.
1. Provide the form of the estimated probability density of the feature vector x in each
class.
2. Derive the minimum misclassification-rate decision rule, assuming the two classes have
equal prior probabilities.
3. Show that, if the kernel is chosen to be $k(x, x') = x^\top x'$, then the classification rule
reduces to simply assigning a new input feature vector to the class having the largest
mean.
4. Show that, if the kernel takes the form $k(x, x') = \phi(x)^\top \phi(x')$, then the classification is
based on the closest mean in the feature space $\phi(x)$.
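As a rough illustration of the setup in this question, the Python sketch below scores a new point by its average kernel value against the training points of each class and assigns it to the class with the larger score, which is what comparing two kernel density estimates amounts to under equal priors. The function names (`linear_kernel`, `class_score`, `classify`, `mean_vector`) and the toy data are assumptions made for illustration only. With the linear kernel the score is not a proper density, so the example only checks the algebraic form of the rule: the score coincides with the inner product of the new point with the class mean.

```python
def linear_kernel(x, z):
    """Inner product x^T z between two equal-length vectors."""
    return sum(a * b for a, b in zip(x, z))


def class_score(x, points, kernel):
    """Average kernel value between x and the training points of one class."""
    return sum(kernel(x, p) for p in points) / len(points)


def classify(x, pos_points, neg_points, kernel):
    """Return +1 or -1, whichever class has the larger average kernel value."""
    pos = class_score(x, pos_points, kernel)
    neg = class_score(x, neg_points, kernel)
    return 1 if pos >= neg else -1


def mean_vector(points):
    """Component-wise mean of a list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


# Toy data (hypothetical): two small clouds in R^2.
pos_points = [[2.0, 1.0], [3.0, 2.0], [2.5, 1.5]]
neg_points = [[-1.0, 0.0], [-2.0, -1.0], [-1.5, 0.5]]
x_new = [1.0, 1.0]

# With the linear kernel, the class score equals x^T (class mean).
score_pos = class_score(x_new, pos_points, linear_kernel)
score_via_mean = linear_kernel(x_new, mean_vector(pos_points))
label = classify(x_new, pos_points, neg_points, linear_kernel)
```

Swapping `linear_kernel` for any other kernel function with the same two-argument signature leaves `classify` unchanged, which is the point of writing the rule in terms of $k(x, x')$.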
Question 2
Show that the value $M = \frac{d}{2}$ of the margin for the maximum margin hyperplane is given by
$$\frac{1}{M^2} = \sum_{i=1}^{n} \mu_i$$
where $\{\mu_i\}$, $i = 1, \dots, n$, are obtained by maximizing
$$L_D = \sum_{i=1}^{n} \mu_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \mu_i \mu_j y_i y_j \, (x_i^\top x_j)$$
subject to
$$\mu_i \ge 0, \quad i = 1, \dots, n, \quad \text{and} \quad \sum_{i=1}^{n} \mu_i y_i = 0.$$
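The identity can be sanity-checked on the smallest possible case, one training point per class, where the dual problem has a closed-form solution. The sketch below is only a numerical check under that assumption and under the usual canonical scaling in which the margin equals $1 / \lVert w \rVert$; the helper name `two_point_svm` and the two points are hypothetical.

```python
import math


def two_point_svm(x_pos, x_neg):
    """Closed-form dual solution for one point per class.

    The constraint sum_i mu_i y_i = 0 forces mu_pos = mu_neg = mu, so
    L_D = 2*mu - 0.5 * mu**2 * ||x_pos - x_neg||**2, maximized at
    mu = 2 / ||x_pos - x_neg||**2.
    """
    diff = [a - b for a, b in zip(x_pos, x_neg)]
    sq_dist = sum(c * c for c in diff)
    mu = 2.0 / sq_dist
    w = [mu * c for c in diff]  # w = sum_i mu_i y_i x_i
    margin = 1.0 / math.sqrt(sum(c * c for c in w))  # M = 1 / ||w||
    return [mu, mu], w, margin


x_pos, x_neg = [1.0, 1.0], [-1.0, -1.0]
mus, w, M = two_point_svm(x_pos, x_neg)

# In this two-point case the full gap d is the distance between the points.
d = math.dist(x_pos, x_neg)
```

Here `sum(mus)` and `1 / M**2` both come out as 0.5, and `M` equals `d / 2`.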
Question 3
1. Find the relation between the hyperbolic tangent function
$$f(u) = \frac{e^{u} - e^{-u}}{e^{u} + e^{-u}}$$
and the logistic or sigmoid function
$$g(u) = \frac{1}{1 + e^{-u}}$$
2. Consider a two-layer neural network function in which $g(\cdot)$ is used for the hidden unit
nonlinear activation functions. Using the relation between $f(\cdot)$ and $g(\cdot)$, show that there
exists an equivalent neural network which computes the same function but where $f(\cdot)$ is
used for the hidden unit activation functions.
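A numerical check can help confirm a candidate answer to both parts. The sketch below assumes the relation $f(u) = 2g(2u) - 1$, equivalently $g(a) = (f(a/2) + 1)/2$, and a linear output unit; the helper names (`network`, `sigmoid_to_tanh`) and all parameter values are hypothetical. It rescales the weights of a sigmoid network and compares the outputs of the two networks on one input.

```python
import math


def g(u):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-u))


def network(x, W1, b1, w2, b2, act):
    """Two-layer network: linear output on top of hidden units with activation `act`."""
    hidden = [act(b + sum(w * xi for w, xi in zip(row, x))) for row, b in zip(W1, b1)]
    return b2 + sum(w * h for w, h in zip(w2, hidden))


def sigmoid_to_tanh(W1, b1, w2, b2):
    """Map sigmoid-network parameters to tanh-network parameters with the same output.

    Uses g(a) = (tanh(a / 2) + 1) / 2: halve first-layer weights and biases,
    halve output weights, and absorb the constant 1/2 terms into the output bias.
    """
    W1_t = [[w / 2.0 for w in row] for row in W1]
    b1_t = [b / 2.0 for b in b1]
    w2_t = [w / 2.0 for w in w2]
    b2_t = b2 + sum(w2) / 2.0
    return W1_t, b1_t, w2_t, b2_t


# Hypothetical parameters: 2 inputs, 3 hidden units, 1 output.
W1 = [[0.5, -1.2], [1.5, 0.3], [-0.7, 0.8]]
b1 = [0.1, -0.4, 0.25]
w2 = [2.0, -1.0, 0.5]
b2 = 0.3
x = [0.9, -1.1]

y_sigmoid = network(x, W1, b1, w2, b2, g)
y_tanh = network(x, *sigmoid_to_tanh(W1, b1, w2, b2), math.tanh)
```

The two outputs agree up to floating-point rounding. Agreement on a single input is a check, not a proof; the derivation still has to hold for every input.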
Question 4
Consider a single hidden layer neural network function for a quantitative output, with identity
output function $g_k(t) = t$ and logistic sigmoid activation functions for the hidden units. Assume the
weights $\beta_m$ from the inputs to the hidden units are nearly zero. Show that the resulting model
is nearly linear in the inputs.
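One way to see the effect numerically is to compare a hidden unit with its first-order Taylor expansion around zero, $g(v) \approx \tfrac{1}{2} + \tfrac{v}{4}$. The sketch below assumes that the bias (written `beta0` here, a name not used in the question) is also small; the weights and the input are hypothetical values chosen for illustration.

```python
import math


def g(v):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-v))


def hidden_unit(x, beta0, beta):
    """Exact hidden-unit output g(beta0 + beta^T x)."""
    return g(beta0 + sum(b * xi for b, xi in zip(beta, x)))


def linearized_unit(x, beta0, beta):
    """First-order Taylor approximation around 0: g(v) ~ 1/2 + v/4."""
    v = beta0 + sum(b * xi for b, xi in zip(beta, x))
    return 0.5 + v / 4.0


# Hypothetical small weights and a moderately sized input.
beta0, beta = 0.001, [0.01, -0.02]
x = [1.5, -0.5]

exact = hidden_unit(x, beta0, beta)
approx = linearized_unit(x, beta0, beta)
gap = abs(exact - approx)
```

Because the approximation is affine in the inputs and the output function is the identity, a weighted sum of such hidden units is again affine in the inputs, up to an error that shrinks with the size of the weights.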