2 Discriminant Functions
1. Consider a dichotomizer defined using the following linear discriminant function g(x) = w^t x + w0, where w = (2, 1)^t and w0 = −5. Plot the decision boundary, and determine the class of the following feature vectors: (1, 1)^t, (2, 2)^t, and (3, 3)^t.
At the decision boundary g(x) = 0, so the decision boundary is defined by w^t x + w0 = 0, i.e.,
(2 1)x − 5 = 0
w has two elements, so we know we are working in a 2D feature space.
We also know that we are dealing with a linear discriminant function, so the boundary is a straight line. We can plot a straight line if we know two points on that line.
Let’s find where this line intercepts the axes.
Setting x2 = 0: (2 1)(x1, 0)^t − 5 = 0  ⇒  2x1 − 5 = 0  ⇒  x1 = 2.5, i.e., intercept at (2.5, 0).
Setting x1 = 0: (2 1)(0, x2)^t − 5 = 0  ⇒  x2 − 5 = 0  ⇒  x2 = 5, i.e., intercept at (0, 5).
So, the hyperplane has equation:
x2 = m x1 + c = −2x1 + 5
We could have got this by simply re-arranging (2 1)x − 5 = 0. In general,
x2 = m x1 + c = −(w1/w2) x1 − (w0/w2)
To classify a point x we calculate g(x), x is in class 1 if g(x) > 0.
g((1, 1)^t) = (2 1)(1, 1)^t − 5 = 2 + 1 − 5 = −2
g((2, 2)^t) = (2 1)(2, 2)^t − 5 = 4 + 2 − 5 = 1
g((3, 3)^t) = (2 1)(3, 3)^t − 5 = 6 + 3 − 5 = 4
Therefore (1, 1)^t is in class 2, (2, 2)^t is in class 1, and (3, 3)^t is in class 1.
Note, the vector normal to the hyperplane points towards class 1. The value of g(x) provides a measure of how far x is from the decision boundary; the actual distance is given by |g(x)| / ||w||.
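As a quick check of the arithmetic above, the classification can be reproduced in a few lines of code. The following is a minimal illustrative sketch in Python/numpy; it is not part of the original solution and the variable names are my own.

import numpy as np

# Linear discriminant g(x) = w.x + w0 with w = (2, 1)^t, w0 = -5
w = np.array([2.0, 1.0])
w0 = -5.0

def g(x):
    """Evaluate the linear discriminant for a 2-D feature vector x."""
    return w @ x + w0

for x in [np.array([1.0, 1.0]), np.array([2.0, 2.0]), np.array([3.0, 3.0])]:
    label = 1 if g(x) > 0 else 2          # class 1 if g(x) > 0, else class 2
    dist = abs(g(x)) / np.linalg.norm(w)  # distance of x from the boundary
    print(x, g(x), label, round(dist, 3))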
2. In augmented feature space, a dichotomizer is defined using the following linear discriminant function g(x) = a^t y, where a = (−5, 2, 1)^t and y = (1, x^t)^t. Determine the class of the following feature vectors: (1, 1)^t, (2, 2)^t, and (3, 3)^t.
To classify a point x we calculate g(x), x is in class 1 if g(x) > 0.
g(x = (1, 1)^t) = (−5 2 1)(1, 1, 1)^t = −5 + 2 + 1 = −2
g(x = (2, 2)^t) = (−5 2 1)(1, 2, 2)^t = −5 + 4 + 2 = 1
g(x = (3, 3)^t) = (−5 2 1)(1, 3, 3)^t = −5 + 6 + 3 = 4
Note, same as previous question, just using augmented vectors.
Therefore (1, 1)^t is in class 2, (2, 2)^t is in class 1, and (3, 3)^t is in class 1.
3. Consider a 3-dimensional feature space and a quadratic discriminant function, g(x), where: g(x) = x1^2 − x3^2 + 2x2x3 + 4x1x2 + 3x1 − 2x2 + 2
This discriminant function defines two classes, such that g(x) > 0 if x ∈ ω1 and g(x) ≤ 0 if x ∈ ω2. Determine the class of each of the following pattern vectors: (1 1 1)t, (−1 0 3)t, and (−1 0 0)t.
g(x = (1 1 1)^t) = 1^2 − 1^2 + 2 + 4 + 3 − 2 + 2 = 9, hence (1 1 1)^t is in class 1.
g(x = (−1 0 3)^t) = (−1)^2 − 3^2 + 0 + 0 − 3 − 0 + 2 = −9, hence (−1 0 3)^t is in class 2.
g(x = (−1 0 0)^t) = (−1)^2 − 0 + 0 + 0 − 3 − 0 + 2 = 0, hence (−1 0 0)^t is in class 2.
4. Consider a dichotomizer defined in a 2-dimensional feature space using a quadratic discriminant function, g(x), where:
g(x) = x^t A x + x^t b + c
Classify the following feature vectors: (0, −1)^t and (1, 1)^t, when:
i) A = (2 1; 1 4), b = (1, 2)^t, and c = −3.
ii) A = (−2 5; 5 −8), b = (1, 2)^t, and c = −3.
i) g(x) = (x1, x2)(2 1; 1 4)(x1, x2)^t + (x1, x2)(1, 2)^t − 3
= (2x1 + x2, x1 + 4x2)(x1, x2)^t + x1 + 2x2 − 3
= 2x1^2 + x1x2 + x1x2 + 4x2^2 + x1 + 2x2 − 3
= 2x1^2 + 4x2^2 + 2x1x2 + x1 + 2x2 − 3
When x = (0, −1)^t, g(x) = 0 + 4(−1)^2 + 0 + 0 + 2(−1) − 3 = −1; g(x) ≤ 0, so x is in class 2.
When x = (1, 1)^t, g(x) = 2 + 4 + 2 + 1 + 2 − 3 = 8; g(x) > 0, so x is in class 1.
The decision surface looks like this: [figure not reproduced]
ii) g(x) = (x1, x2)(−2 5; 5 −8)(x1, x2)^t + (x1, x2)(1, 2)^t − 3
= (−2x1 + 5x2, 5x1 − 8x2)(x1, x2)^t + x1 + 2x2 − 3
= −2x1^2 + 5x1x2 + 5x1x2 − 8x2^2 + x1 + 2x2 − 3
= −2x1^2 − 8x2^2 + 10x1x2 + x1 + 2x2 − 3
When x = (0, −1)^t, g(x) = 0 − 8(−1)^2 + 0 + 0 + 2(−1) − 3 = −13; g(x) ≤ 0, so x is in class 2.
When x = (1, 1)^t, g(x) = −2 − 8 + 10 + 1 + 2 − 3 = 0; g(x) = 0, so x is in class 2.
The decision surface looks like this: [figure not reproduced]
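Both parts differ only in the matrix A, so the evaluations are easy to script. The following is an illustrative Python/numpy sketch, not part of the original solution.

import numpy as np

def g_quadratic(x, A, b, c):
    """Quadratic discriminant g(x) = x^t A x + x^t b + c."""
    return x @ A @ x + x @ b + c

A1 = np.array([[2.0, 1.0], [1.0, 4.0]])
A2 = np.array([[-2.0, 5.0], [5.0, -8.0]])
b = np.array([1.0, 2.0])
c = -3.0

for A in (A1, A2):
    for x in (np.array([0.0, -1.0]), np.array([1.0, 1.0])):
        val = g_quadratic(x, A, b, c)
        print(x, val, "class 1" if val > 0 else "class 2")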
5. In augmented feature space, a dichotomizer is defined using the following linear discriminant function g(x) = a^t y, where a^t = (−3, 1, 2, 2, 2, 4) and y^t = (1, x1, x2, x1x2, x1^2, x2^2). Determine the class of the following feature vectors, x: (0, −1)^t and (1, 1)^t.
To classify a point x we calculate g(x), x is in class 1 if g(x) > 0.
g(x = (0, −1)^t) = (−3 1 2 2 2 4)(1, 0, −1, 0, 0, 1)^t = −3 + 0 − 2 + 0 + 0 + 4 = −1. Therefore class 2.
g(x = (1, 1)^t) = (−3 1 2 2 2 4)(1, 1, 1, 1, 1, 1)^t = −3 + 1 + 2 + 2 + 2 + 4 = 8. Therefore class 1.
Note, this is the same as part i) of the previous question, just using a generalised linear discriminant function where y^t = (1, x1, x2, x1x2, x1^2, x2^2).
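The mapping from x to the generalised feature vector y can be made explicit in code. A minimal illustrative sketch (my own, assuming the expansion y^t = (1, x1, x2, x1x2, x1^2, x2^2) stated above):

import numpy as np

def expand(x):
    """Map a 2-D feature vector x to the generalised feature vector y."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x2, x1**2, x2**2])

a = np.array([-3.0, 1.0, 2.0, 2.0, 2.0, 4.0])

for x in (np.array([0.0, -1.0]), np.array([1.0, 1.0])):
    g = a @ expand(x)
    print(x, g, "class 1" if g > 0 else "class 2")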
6. A Linear Discriminant Function is used to define a Dichotomizer, such that x is assigned to class 1 if g(x) > 0, and x is assigned to class 2 otherwise. Use the Batch Perceptron Learning Algorithm (with augmented notation and sample normalisation), to find appropriate parameters for the linear discriminant function, when the data set is as shown.
Assume initial values of a = (w0, w^t)^t = (−25, 6, 3)^t, and use a learning rate of 1.
Using augmented notation and sample normalisation, the dataset is:
x          class   y
(1, 5)^t     1     (1, 1, 5)^t
(2, 5)^t     1     (1, 2, 5)^t
(4, 1)^t     2     (−1, −4, −1)^t
(5, 1)^t     2     (−1, −5, −1)^t
For the Batch Perceptron Learning Algorithm, weights are updated once per epoch such that: a ← a + η Σ_{y∈χ} y, where χ is the set of misclassified samples. Here, η = 1.
Epoch 1: initial a = (−25, 6, 3)^t
y               g(x) = a^t y                               misclassified (i.e., g(x) ≤ 0)?
(1, 1, 5)^t     (−25 × 1) + (6 × 1) + (3 × 5) = −4         yes
(1, 2, 5)^t     (−25 × 1) + (6 × 2) + (3 × 5) = 2          no
(−1, −4, −1)^t  (−25 × −1) + (6 × −4) + (3 × −1) = −2      yes
(−1, −5, −1)^t  (−25 × −1) + (6 × −5) + (3 × −1) = −8      yes
a ← (−25, 6, 3)^t + (1, 1, 5)^t + (−1, −4, −1)^t + (−1, −5, −1)^t = (−26, −2, 6)^t
Epoch 2: a = (−26, −2, 6)^t
y               g(x) = a^t y                               misclassified (i.e., g(x) ≤ 0)?
(1, 1, 5)^t     (−26 × 1) + (−2 × 1) + (6 × 5) = 2         no
(1, 2, 5)^t     (−26 × 1) + (−2 × 2) + (6 × 5) = 0         yes
(−1, −4, −1)^t  (−26 × −1) + (−2 × −4) + (6 × −1) = 28     no
(−1, −5, −1)^t  (−26 × −1) + (−2 × −5) + (6 × −1) = 30     no
a ← (−26, −2, 6)^t + (1, 2, 5)^t = (−25, 0, 11)^t
Epoch 3: a = (−25, 0, 11)^t
y               g(x) = a^t y                               misclassified (i.e., g(x) ≤ 0)?
(1, 1, 5)^t     (−25 × 1) + (0 × 1) + (11 × 5) = 30        no
(1, 2, 5)^t     (−25 × 1) + (0 × 2) + (11 × 5) = 30        no
(−1, −4, −1)^t  (−25 × −1) + (0 × −4) + (11 × −1) = 14     no
(−1, −5, −1)^t  (−25 × −1) + (0 × −5) + (11 × −1) = 14     no
Learning has converged, so the required parameters are a = (−25, 0, 11)^t.
In feature space, the decision surface found at each epoch looks like this: [figure showing the boundary at the end of epoch 1, epoch 2, and epoch 3 — not reproduced]
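For comparison, the batch update above can be reproduced with a short script. The following is a minimal illustrative sketch in Python/numpy (not part of the original solution); it assumes the sample-normalised data and the initial weights given in the question.

import numpy as np

# Sample-normalised, augmented data (class-2 samples negated)
Y = np.array([[ 1.0,  1.0,  5.0],
              [ 1.0,  2.0,  5.0],
              [-1.0, -4.0, -1.0],
              [-1.0, -5.0, -1.0]])

a = np.array([-25.0, 6.0, 3.0])   # initial weights (w0, w1, w2)
eta = 1.0

for epoch in range(100):
    g = Y @ a                      # discriminant value for every sample
    mis = Y[g <= 0]                # misclassified samples (g <= 0)
    if len(mis) == 0:              # converged: nothing misclassified
        break
    a = a + eta * mis.sum(axis=0)  # batch update using all misclassified samples
    print("end of epoch", epoch + 1, "a =", a)

print("final a =", a)              # -> [-25. 0. 11.]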
7. Repeat the previous question using the Sequential Perceptron Learning Algorithm (with augmented notation and sample normalisation).
Using augmented notation and sample normalisation, the dataset is:
x          y
(1, 5)^t     (1, 1, 5)^t
(2, 5)^t     (1, 2, 5)^t
(4, 1)^t     (−1, −4, −1)^t
(5, 1)^t     (−1, −5, −1)^t
For the Sequential Perceptron Learning Algorithm, weights are updated such that: a ← a + ηyk, where yk is a misclassified exemplar. Here, η = 1.
Epoch 1: initial a = (−25, 6, 3)^t
y^t            g(x) = a^t y                              updated a^t
(1, 1, 5)      (−25 × 1) + (6 × 1) + (3 × 5) = −4        (−25, 6, 3) + (1, 1, 5) = (−24, 7, 8)
(1, 2, 5)      (−24 × 1) + (7 × 2) + (8 × 5) = 30        (−24, 7, 8)
(−1, −4, −1)   (−24 × −1) + (7 × −4) + (8 × −1) = −12    (−24, 7, 8) + (−1, −4, −1) = (−25, 3, 7)
(−1, −5, −1)   (−25 × −1) + (3 × −5) + (7 × −1) = 3      (−25, 3, 7)

Epoch 2: a = (−25, 3, 7)^t
y^t            g(x) = a^t y                              updated a^t
(1, 1, 5)      (−25 × 1) + (3 × 1) + (7 × 5) = 13        (−25, 3, 7)
(1, 2, 5)      (−25 × 1) + (3 × 2) + (7 × 5) = 16        (−25, 3, 7)
(−1, −4, −1)   (−25 × −1) + (3 × −4) + (7 × −1) = 6      (−25, 3, 7)
(−1, −5, −1)   (−25 × −1) + (3 × −5) + (7 × −1) = 3      (−25, 3, 7)
Learning has converged, so required parameters are a = (−25, 3, 7)t.
In feature space, the decision surface found at each update looks like this: [figure showing the boundary after the 1st and 2nd updates — not reproduced]
8. Write pseudo-code for the sequential Perceptron Learning Algorithm
Version using sample normalisation:
Augment and apply sample normalisation to the feature vectors. Initialise a and set the learning rate η.
For each sample, yk, in the data-set:
  Calculate g(yk) = a^t yk
  If yk is misclassified (i.e., if g(yk) ≤ 0)
    Update the solution: a ← a + η yk
Repeat until a is unchanged by a complete pass through the data (i.e., all samples are correctly classified).
Alternative version, without sample normalisation:
Augment the feature vectors.
Set ωk = 1 for samples in class 1, and ωk = −1 for samples in class 2. Initialise a and set the learning rate η.
For each sample, yk, in the data-set:
  Calculate g(yk) = a^t yk
  If yk is misclassified (i.e., if sign(g(yk)) ≠ ωk)
    Update the solution: a ← a + η ωk yk
Repeat until a is unchanged by a complete pass through the data (i.e., all samples are correctly classified).
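A runnable version of the first (sample-normalised) variant is sketched below. This is an illustrative Python/numpy implementation, not the original author's code; applied to the data of question 7 with a = (−25, 6, 3)^t and η = 1 it reproduces the answer a = (−25, 3, 7)^t.

import numpy as np

def sequential_perceptron(Y, a, eta=1.0, max_epochs=100):
    """Sequential perceptron learning on sample-normalised, augmented data Y."""
    for _ in range(max_epochs):
        changed = False
        for y in Y:                    # one update per misclassified sample
            if a @ y <= 0:             # misclassified (g(y) <= 0)
                a = a + eta * y
                changed = True
        if not changed:                # a full pass with no updates: converged
            break
    return a

# Data from question 7 (class-2 samples negated by sample normalisation)
Y = np.array([[1, 1, 5], [1, 2, 5], [-1, -4, -1], [-1, -5, -1]], dtype=float)
print(sequential_perceptron(Y, np.array([-25.0, 6.0, 3.0])))   # -> [-25. 3. 7.]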
9. Consider the following linearly separable data set.
x           class (ω)
(0, 2)^t      1
(1, 2)^t      1
(2, 1)^t      1
(−3, 1)^t     −1
(−2, −1)^t    −1
(−3, −2)^t    −1
Apply the Sequential Perceptron Learning Algorithm to determine the parameters of a linear discriminant function that will correctly classify this data. Assume initial values of a = (w0, w^t)^t = (1, 0, 0)^t, and use a learning rate of 1.
For the Sequential Perceptron Learning Algorithm, weights are updated such that: a ← a + ηωk yk , where yk is a misclassified exemplar, and ωk is the class label associated with yk. Here, η = 1.
iteration   a^t_old     y^t           g(x) = a^t y   ωk    a^t_new = a^t_old + ωk y^t if misclassified
1           [1, 0, 0]   [1, 0, 2]      1              1    [1, 0, 0]
2           [1, 0, 0]   [1, 1, 2]      1              1    [1, 0, 0]
3           [1, 0, 0]   [1, 2, 1]      1              1    [1, 0, 0]
4           [1, 0, 0]   [1, −3, 1]     1             −1    [0, 3, −1]
5           [0, 3, −1]  [1, −2, −1]   −5             −1    [0, 3, −1]
6           [0, 3, −1]  [1, −3, −2]   −7             −1    [0, 3, −1]
7           [0, 3, −1]  [1, 0, 2]     −2              1    [1, 3, 1]
8           [1, 3, 1]   [1, 1, 2]      6              1    [1, 3, 1]
9           [1, 3, 1]   [1, 2, 1]      8              1    [1, 3, 1]
10          [1, 3, 1]   [1, −3, 1]    −7             −1    [1, 3, 1]
11          [1, 3, 1]   [1, −2, −1]   −6             −1    [1, 3, 1]
12          [1, 3, 1]   [1, −3, −2]   −10            −1    [1, 3, 1]
13          [1, 3, 1]   [1, 0, 2]      3              1    [1, 3, 1]
Learning has converged (we have gone through all the data without needing to update the parameters), so required parameters are a = (1,3,1)t. In feature space, the decision surface found after each update looks like this:
[figure showing the boundary after the 1st and 2nd updates — not reproduced]
10. Repeat previous question using the sample normalisation method of implementing the Sequential Perceptron Learning Algorithm.
Using augmented notation and sample normalisation, the dataset is:
x            y
(0, 2)^t      (1, 0, 2)^t
(1, 2)^t      (1, 1, 2)^t
(2, 1)^t      (1, 2, 1)^t
(−3, 1)^t     (−1, 3, −1)^t
(−2, −1)^t    (−1, 2, 1)^t
(−3, −2)^t    (−1, 3, 2)^t
For the Sequential Perceptron Learning Algorithm, weights are updated such that: a ← a + ηyk , where yk is a misclassified exemplar. Here, η = 1.
iteration   a^t_old     y^t            g(x) = a^t y   a^t_new = a^t_old + y^t if misclassified
1           [1, 0, 0]   [1, 0, 2]       1             [1, 0, 0]
2           [1, 0, 0]   [1, 1, 2]       1             [1, 0, 0]
3           [1, 0, 0]   [1, 2, 1]       1             [1, 0, 0]
4           [1, 0, 0]   [−1, 3, −1]    −1             [0, 3, −1]
5           [0, 3, −1]  [−1, 2, 1]      5             [0, 3, −1]
6           [0, 3, −1]  [−1, 3, 2]      7             [0, 3, −1]
7           [0, 3, −1]  [1, 0, 2]      −2             [1, 3, 1]
8           [1, 3, 1]   [1, 1, 2]       6             [1, 3, 1]
9           [1, 3, 1]   [1, 2, 1]       8             [1, 3, 1]
10          [1, 3, 1]   [−1, 3, −1]     7             [1, 3, 1]
11          [1, 3, 1]   [−1, 2, 1]      6             [1, 3, 1]
12          [1, 3, 1]   [−1, 3, 2]      10            [1, 3, 1]
13          [1, 3, 1]   [1, 0, 2]       3             [1, 3, 1]
Learning has converged, so required parameters are a = (1, 3, 1)t.
11. A data-set consists of exemplars from three classes.
Class 1: (1, 1)^t, (2, 0)^t. Class 2: (0, 2)^t, (−1, 1)^t. Class 3: (−1, −1)^t. Use the Sequential Multiclass Perceptron Learning algorithm to find the parameters for three linear discriminant functions that will correctly classify this data. Assume initial values for all parameters are zero, and use a learning rate of 1. If more than one discriminant function produces the maximum output, choose the function with the highest index (i.e., the one that represents the largest class label).
Sequential Multiclass Perceptron Learning algorithm:
• Initialise aj for each class.
• For each exemplar (yk, ωk) in turn:
  – Find the predicted class j = arg max_j′ (a_j′^t yk)
  – If the predicted class is not the true class (i.e., j ≠ ωk), update the weights:
    a_ωk ← a_ωk + η yk
    a_j ← a_j − η yk
• Repeat until the weights stop changing.
Using augmented notation, the dataset is:
y^t            ω
(1, 1, 1)      1
(1, 2, 0)      1
(1, 0, 2)      2
(1, −1, 1)     2
(1, −1, −1)    3
it   a1^t old     a2^t old     a3^t old        y^t           g1, g2, g3    ω   a1^t new     a2^t new     a3^t new
1    [0, 0, 0]    [0, 0, 0]    [0, 0, 0]       (1, 1, 1)     0, 0, 0       1   [1, 1, 1]    [0, 0, 0]    [−1, −1, −1]
2    [1, 1, 1]    [0, 0, 0]    [−1, −1, −1]    (1, 2, 0)     3, 0, −3      1   [1, 1, 1]    [0, 0, 0]    [−1, −1, −1]
3    [1, 1, 1]    [0, 0, 0]    [−1, −1, −1]    (1, 0, 2)     3, 0, −3      2   [0, 1, −1]   [1, 0, 2]    [−1, −1, −1]
4    [0, 1, −1]   [1, 0, 2]    [−1, −1, −1]    (1, −1, 1)    −2, 3, −1     2   [0, 1, −1]   [1, 0, 2]    [−1, −1, −1]
5    [0, 1, −1]   [1, 0, 2]    [−1, −1, −1]    (1, −1, −1)   0, −1, 1      3   [0, 1, −1]   [1, 0, 2]    [−1, −1, −1]
6    [0, 1, −1]   [1, 0, 2]    [−1, −1, −1]    (1, 1, 1)     0, 3, −3      1   [1, 2, 0]    [0, −1, 1]   [−1, −1, −1]
7    [1, 2, 0]    [0, −1, 1]   [−1, −1, −1]    (1, 2, 0)     5, −2, −3     1   [1, 2, 0]    [0, −1, 1]   [−1, −1, −1]
8    [1, 2, 0]    [0, −1, 1]   [−1, −1, −1]    (1, 0, 2)     1, 2, −3      2   [1, 2, 0]    [0, −1, 1]   [−1, −1, −1]
9    [1, 2, 0]    [0, −1, 1]   [−1, −1, −1]    (1, −1, 1)    −1, 2, −1     2   [1, 2, 0]    [0, −1, 1]   [−1, −1, −1]
10   [1, 2, 0]    [0, −1, 1]   [−1, −1, −1]    (1, −1, −1)   −1, 0, 1      3   [1, 2, 0]    [0, −1, 1]   [−1, −1, −1]
11   [1, 2, 0]    [0, −1, 1]   [−1, −1, −1]    (1, 1, 1)     3, 0, −3      1   [1, 2, 0]    [0, −1, 1]   [−1, −1, −1]
(At iteration 1 all three discriminants tie at 0, so the highest index, class 3, is predicted; the true class is 1, so a1 and a3 are updated. Further updates occur at iterations 3 and 6; at every other iteration the predicted class matches the true class.)
Learning has converged, so the required parameters are a1 = (1, 2, 0)^t, a2 = (0, −1, 1)^t, and a3 = (−1, −1, −1)^t.
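A compact script for the same procedure is sketched below. It is an illustration of the algorithm as described above (written in Python/numpy), not the original author's code; the tie-breaking rule of choosing the largest class index is implemented explicitly.

import numpy as np

def multiclass_perceptron(Y, labels, n_classes, eta=1.0, max_epochs=100):
    """Sequential multiclass perceptron; labels are 1-based class indices."""
    A = np.zeros((n_classes, Y.shape[1]))          # one weight vector per class
    for _ in range(max_epochs):
        changed = False
        for y, w in zip(Y, labels):
            g = A @ y
            # predicted class: arg max of g, ties broken towards the largest index
            j = n_classes - 1 - int(np.argmax(g[::-1]))
            if j != w - 1:                          # wrong prediction: update
                A[w - 1] += eta * y
                A[j] -= eta * y
                changed = True
        if not changed:
            break
    return A

Y = np.array([[1, 1, 1], [1, 2, 0], [1, 0, 2], [1, -1, 1], [1, -1, -1]], dtype=float)
labels = [1, 1, 2, 2, 3]
print(multiclass_perceptron(Y, labels, 3))
# expected: a1 = [1, 2, 0], a2 = [0, -1, 1], a3 = [-1, -1, -1]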
12. Consider the following linearly separable data set.
x           class
(0, 2)^t      1
(1, 2)^t      1
(2, 1)^t      1
(−3, 1)^t     −1
(−2, −1)^t    −1
(−3, −2)^t    −1
Use the pseudoinverse (pinv in MATLAB) to calculate the parameters of a linear discriminant function that can be used to classify this data. Use an arbitrary margin vector b = [1 1 1 1 1 1]t.
Using augmented notation and sample normalisation, the dataset is:
x            y
(0, 2)^t      (1, 0, 2)^t
(1, 2)^t      (1, 1, 2)^t
(2, 1)^t      (1, 2, 1)^t
(−3, 1)^t     (−1, 3, −1)^t
(−2, −1)^t    (−1, 2, 1)^t
(−3, −2)^t    (−1, 3, 2)^t
We require Ya = b, where
Y = (  1  0  2
       1  1  2
       1  2  1
      −1  3 −1
      −1  2  1
      −1  3  2 )
Find the pseudo-inverse of Y (Y† = (Y^t Y)^−1 Y^t), e.g. using the MATLAB command pinv:
Y† = (  0.0682   0.1648   0.3807   0.1023  −0.2330  −0.2557
       −0.0341   0.0426   0.1847   0.1989  −0.0085   0.0028
        0.1402   0.0748  −0.1203  −0.2064   0.1184   0.1828 )
Thus,
a = Y†b = Y† (1, 1, 1, 1, 1, 1)^t = (0.2273, 0.3864, 0.1894)^t
Evaluating g(y) = a^t y = (0.2273, 0.3864, 0.1894) y for each of the (normalised) training samples gives a positive value in every case (for example, g((1, 0, 2)^t) = 0.6061).
All positive, so the discriminant function provides the correct classification.
In feature space, the decision surface for b = [1, 1, 1, 1, 1, 1]^t looks like this: [figure not reproduced]
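The pseudoinverse solution can be checked numerically; numpy's pinv plays the same role as MATLAB's pinv here. This is an illustrative sketch only, not the original MATLAB session.

import numpy as np

# Sample-normalised, augmented data matrix Y (one row per training sample)
Y = np.array([[ 1, 0, 2],
              [ 1, 1, 2],
              [ 1, 2, 1],
              [-1, 3, -1],
              [-1, 2, 1],
              [-1, 3, 2]], dtype=float)

b = np.ones(6)                 # arbitrary margin vector
a = np.linalg.pinv(Y) @ b      # least-squares solution of Y a = b
print(a)                       # -> approx [0.2273, 0.3864, 0.1894]
print(Y @ a)                   # all entries positive => data correctly classified

Replacing b with the margin vectors used in the next question reproduces the corresponding weight vectors.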
13. Repeat the previous question using (a) b = [2, 2, 2, 1, 1, 1]^t, and (b) b = [1, 1, 1, 2, 2, 2]^t.
a) a = Y†b = Y† (2, 2, 2, 1, 1, 1)^t = (0.8409, 0.5795, 0.2841)^t
In feature space, the decision surface for b = [2, 2, 2, 1, 1, 1]^t looks like this: [figure not reproduced]
b) a = Y†b = Y† (1, 1, 1, 2, 2, 2)^t = (−0.1591, 0.5795, 0.2841)^t
In feature space, the decision surface for b = [1, 1, 1, 2, 2, 2]^t looks like this: [figure not reproduced]
Each element of b corresponds to a data point. Increasing an element of b increases the margin (i.e., the distance) of the hyperplane from the corresponding data point.
14. For the same dataset used in the preceding question, apply 12 iterations of the Sequential Widrow-Hoff Learning Algorithm. Assume an initial value of a = (w0, w^t)^t = (1, 0, 0)^t, use a margin vector b = [1, 1, 1, 1, 1, 1]^t, and a learning rate of 0.1.
Using augmented notation and sample normalisation, the dataset is:
x            y
(0, 2)^t      (1, 0, 2)^t
(1, 2)^t      (1, 1, 2)^t
(2, 1)^t      (1, 2, 1)^t
(−3, 1)^t     (−1, 3, −1)^t
(−2, −1)^t    (−1, 2, 1)^t
(−3, −2)^t    (−1, 3, 2)^t
For the Sequential Widrow-Hoff Learning Algorithm, weights are updated such that: a ← a + η(bk − a^t yk) yk. Here, η = 0.1 and bk = 1 for every sample.
a^t                          yk^t          a^t yk    a^t_new = a^t + 0.1(bk − a^t yk) yk^t
[1, 0, 0]                    [1, 0, 2]     1         [1, 0, 0] + 0.1(1 − 1)[1, 0, 2] = [1, 0, 0]
[1, 0, 0]                    [1, 1, 2]     1         [1, 0, 0] + 0.1(1 − 1)[1, 1, 2] = [1, 0, 0]
[1, 0, 0]                    [1, 2, 1]     1         [1, 0, 0] + 0.1(1 − 1)[1, 2, 1] = [1, 0, 0]
[1, 0, 0]                    [−1, 3, −1]   −1        [1, 0, 0] + 0.1(1 − (−1))[−1, 3, −1] = [0.8, 0.6, −0.2]
[0.8, 0.6, −0.2]             [−1, 2, 1]    0.2       [0.8, 0.6, −0.2] + 0.1(1 − 0.2)[−1, 2, 1] = [0.72, 0.76, −0.12]
[0.72, 0.76, −0.12]          [−1, 3, 2]    1.32      [0.72, 0.76, −0.12] + 0.1(1 − 1.32)[−1, 3, 2] = [0.752, 0.664, −0.184]
[0.752, 0.664, −0.184]       [1, 0, 2]     0.384     [0.752, 0.664, −0.184] + 0.1(1 − 0.384)[1, 0, 2] = [0.8136, 0.6640, −0.0608]
[0.8136, 0.6640, −0.0608]    [1, 1, 2]     1.356     [0.8136, 0.6640, −0.0608] + 0.1(1 − 1.356)[1, 1, 2] = [0.778, 0.6284, −0.1320]
[0.778, 0.6284, −0.1320]     [1, 2, 1]     1.9028    [0.778, 0.6284, −0.1320] + 0.1(1 − 1.9028)[1, 2, 1] = [0.6877, 0.4478, −0.2223]
[0.6877, 0.4478, −0.2223]    [−1, 3, −1]   0.8781    [0.6877, 0.4478, −0.2223] + 0.1(1 − 0.8781)[−1, 3, −1] = [0.6755, 0.4844, −0.2345]
[0.6755, 0.4844, −0.2345]    [−1, 2, 1]    0.0588    [0.6755, 0.4844, −0.2345] + 0.1(1 − 0.0588)[−1, 2, 1] = [0.5814, 0.6726, −0.1404]
[0.5814, 0.6726, −0.1404]    [−1, 3, 2]    1.1558    [0.5814, 0.6726, −0.1404] + 0.1(1 − 1.1558)[−1, 3, 2] = [0.597, 0.6259, −0.1715]
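The same twelve updates can be generated with a few lines of code. The following is a minimal illustrative sketch of the sequential Widrow-Hoff (LMS) rule as used above, written in Python/numpy; it is not the original author's implementation.

import numpy as np

Y = np.array([[ 1, 0, 2], [ 1, 1, 2], [ 1, 2, 1],
              [-1, 3, -1], [-1, 2, 1], [-1, 3, 2]], dtype=float)
b = np.ones(6)                  # margin vector (one target per sample)
a = np.array([1.0, 0.0, 0.0])   # initial weights
eta = 0.1

for it in range(12):            # 12 sequential iterations, cycling through the data
    k = it % len(Y)
    y = Y[k]
    a = a + eta * (b[k] - a @ y) * y
    print(it + 1, np.round(a, 4))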
15. The table shows the training data set for a simple classification problem.
Class    feature vector
1        (0.15, 0.35)
2        (0.15, 0.28)
2        (0.12, 0.2)
3        (0.1, 0.32)
3        (0.06, 0.25)
Use the k-nearest-neighbour classifier to determine the class of a new feature vector x = (0.1, 0.25). Use Euclidean distance and a) k = 1, b) k = 3.
How would these results be affected if the first dimension of the feature space was scaled by a factor of two?
Calculate distance of each sample to x:
Class    feature vector    Euclidean distance to (0.1, 0.25)
1        (0.15, 0.35)      √((0.1 − 0.15)^2 + (0.25 − 0.35)^2) = 0.1118
2        (0.15, 0.28)      √((0.1 − 0.15)^2 + (0.25 − 0.28)^2) = 0.0583
2        (0.12, 0.2)       √((0.1 − 0.12)^2 + (0.25 − 0.2)^2) = 0.0539
3        (0.1, 0.32)       √((0.1 − 0.1)^2 + (0.25 − 0.32)^2) = 0.0700
3        (0.06, 0.25)      √((0.1 − 0.06)^2 + (0.25 − 0.25)^2) = 0.0400
a) The nearest neighbour has class label 3. Therefore classify the new sample as class 3.
b) The three nearest neighbours have class labels 3, 2, and 2. Therefore classify the new sample as class 2.
Note, the feature space looks like this: [figure not reproduced]
If we scale the first dimension by a factor of two:
Class    feature vector    Euclidean distance to (0.2, 0.25)
1        (0.3, 0.35)       √((0.2 − 0.3)^2 + (0.25 − 0.35)^2) = 0.1414
2        (0.3, 0.28)       √((0.2 − 0.3)^2 + (0.25 − 0.28)^2) = 0.1044
2        (0.24, 0.2)       √((0.2 − 0.24)^2 + (0.25 − 0.2)^2) = 0.0640
3        (0.2, 0.32)       √((0.2 − 0.2)^2 + (0.25 − 0.32)^2) = 0.0700
3        (0.12, 0.25)      √((0.2 − 0.12)^2 + (0.25 − 0.25)^2) = 0.0800
For k = 1, the new sample is classified as class 2.
For k = 3, the new sample is classified as class 3.
Note, this is the opposite of the result we got previously. kNN is very sensitive to the scale of the feature dimensions!
The feature space now looks like this: [figure not reproduced]
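A small script makes the effect of scaling easy to reproduce. This is an illustrative Python/numpy sketch (not part of the original answer); it implements the plain Euclidean-distance kNN vote described above.

import numpy as np
from collections import Counter

def knn_predict(X, labels, x, k):
    """Classify x by a majority vote among its k nearest training samples."""
    d = np.linalg.norm(X - x, axis=1)          # Euclidean distances to every sample
    nearest = np.argsort(d)[:k]                # indices of the k closest samples
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

X = np.array([[0.15, 0.35], [0.15, 0.28], [0.12, 0.2], [0.1, 0.32], [0.06, 0.25]])
labels = [1, 2, 2, 3, 3]
x = np.array([0.1, 0.25])

print(knn_predict(X, labels, x, 1), knn_predict(X, labels, x, 3))   # -> 3 2
scale = np.array([2.0, 1.0])                                        # double the first dimension
print(knn_predict(X * scale, labels, x * scale, 1),
      knn_predict(X * scale, labels, x * scale, 3))                 # -> 2 3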
16. a) Plot the decision boundaries that result from applying a nearest neighbour classifier (i.e. a kNN classifier with k=1), to the data shown in the table. Assume Euclidean distance is used to define distance.
Class    x1    x2
1        0     5
1        5     8
1        10    0
2        5     0
2        10    5
b) For the same data, plot the decision boundaries that result from applying a nearest mean classifier (i.e. one in which a new feature vector, x, is classified by assigning it to the same category as nearest sample mean).
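To sketch in code how the two classifiers of this question differ, a minimal Python/numpy illustration follows. It uses the data as reconstructed in the table above (which may not match the original exactly) and arbitrary test points; it is an illustration only, not the original plotted solution.

import numpy as np

X = np.array([[0, 5], [5, 8], [10, 0], [5, 0], [10, 5]], dtype=float)
labels = np.array([1, 1, 1, 2, 2])

def nearest_neighbour(x):
    """1-NN: class of the single closest training sample."""
    return labels[np.argmin(np.linalg.norm(X - x, axis=1))]

def nearest_mean(x):
    """Nearest-mean: class whose sample mean is closest to x."""
    classes = np.unique(labels)
    means = np.array([X[labels == c].mean(axis=0) for c in classes])
    return classes[np.argmin(np.linalg.norm(means - x, axis=1))]

# The two rules can disagree, which is why their decision boundaries differ.
for x in (np.array([9.0, 1.0]), np.array([2.0, 2.0])):
    print(x, nearest_neighbour(x), nearest_mean(x))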