Fundamentals of Machine Learning for Predictive Data Analytics
Chapter 5: Similarity-based Learning (Sections 5.4, 5.5)
John D. Kelleher, Brian Mac Namee, and Aoife D'Arcy
1 Handling Noisy Data
2 Data Normalization
3 Predicting Continuous Targets
4 Other Measures of Similarity
5 Feature Selection
6 Efficient Memory Search
7 Summary
Handling Noisy Data
Figure: Is the instance at the top right of the diagram really noise?
The k nearest neighbor model predicts the target level of a query q by taking the majority vote over the set of the k nearest neighbors to q:

$$\mathbb{M}_k(\mathbf{q}) = \underset{l \in levels(t)}{\arg\max} \sum_{i=1}^{k} \delta(t_i, l) \tag{1}$$

where δ(t_i, l) is the Kronecker delta function, which returns 1 if both of its parameters are the same and 0 otherwise.
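A minimal sketch of the majority-vote prediction in Equation (1), written in Python with numpy and assuming numeric descriptive features compared by Euclidean distance (the function and variable names are illustrative, not from the book):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, query, k=3):
    """Majority-vote k nearest neighbor prediction (Equation (1))."""
    dists = np.linalg.norm(np.asarray(X_train, dtype=float) - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]               # indices of the k nearest neighbors
    votes = Counter(y_train[i] for i in nearest)  # count the target levels among the neighbors
    return votes.most_common(1)[0][0]             # the level with the most votes
```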
Figure: The decision boundary using majority classification of the nearest 3 neighbors.
Figure: The decision boundary using majority classification of the nearest 5 neighbors.
Figure: The decision boundary when k is set to 15.
In a distance-weighted k nearest neighbor algorithm the contribution of each neighbor to the classification decision is weighted by the reciprocal of the squared distance between the neighbor d and the query q:

$$\frac{1}{dist(\mathbf{q}, \mathbf{d})^2} \tag{2}$$

The weighted k nearest neighbor model is defined as:

$$\mathbb{M}_k(\mathbf{q}) = \underset{l \in levels(t)}{\arg\max} \sum_{i=1}^{k} \frac{1}{dist(\mathbf{q}, \mathbf{d}_i)^2} \times \delta(t_i, l) \tag{3}$$
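A sketch of the distance-weighted vote in Equation (3), under the same assumptions as the earlier snippet (Euclidean distance, numpy); the eps parameter is an illustrative guard for the case where the query coincides with a training instance:

```python
import numpy as np

def weighted_knn_classify(X_train, y_train, query, k=5, eps=1e-12):
    """Distance-weighted k nearest neighbor prediction (Equation (3))."""
    dists = np.linalg.norm(np.asarray(X_train, dtype=float) - np.asarray(query, dtype=float), axis=1)
    scores = {}
    for i in np.argsort(dists)[:k]:
        weight = 1.0 / (dists[i] ** 2 + eps)      # reciprocal of the squared distance
        scores[y_train[i]] = scores.get(y_train[i], 0.0) + weight
    return max(scores, key=scores.get)            # level with the largest total weight
```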
Figure: The weighted k nearest neighbor model decision boundary.
Data Normalization
Table: A dataset listing the salary and age information for customers and whether or not they purchased a pension plan.
ID Salary Age Purchased
1 53700 41 No
2 65300 37 No
3 48900 45 Yes
4 64800 49 Yes
5 44200 30 No
6 55900 57 Yes
7 48600 26 No
8 72800 60 Yes
9 45300 34 No
10 73200 52 Yes
The marketing department wants to decide whether or not they should contact a customer with the following profile:
⟨SALARY = 56,000, AGE = 35⟩
Figure: The salary and age feature space with the data in Table 1 plotted. The instances are labelled with their IDs; triangles represent the negative instances and crosses represent the positive instances. The location of the query ⟨SALARY = 56,000, AGE = 35⟩ is indicated by the ?.
Table: The distances (Dist.) between the query ⟨SALARY = 56,000, AGE = 35⟩ and each instance, and the resulting neighbor rankings (Neigh.), computed using both features, salary only, and age only.

ID  Salary  Age  Purch.   Salary+Age Dist.  Neigh.   Salary Only Dist.  Neigh.   Age Only Dist.  Neigh.
1   53700   41   No       2300.0078         2        2300               2        6               4
2   65300   37   No       9300.0002         6        9300               6        2               2
3   48900   45   Yes      7100.0070         3        7100               3        10              6
4   64800   49   Yes      8800.0111         5        8800               5        14              7
5   44200   30   No       11800.0011        8        11800              8        5               3
6   55900   57   Yes      102.3914          1        100                1        22              9
7   48600   26   No       7400.0055         4        7400               4        9               5
8   72800   60   Yes      16800.0186        9        16800              9        25              10
9   45300   34   No       10700.0000        7        10700              7        1               1
10  73200   52   Yes      17200.0084        10       17200              10       17              8
This odd prediction is caused by the features taking very different ranges of values, which is equivalent to the features having different variances.
We can adjust for this using normalization. The equation for range normalization is:

$$a'_i = \frac{a_i - \min(a)}{\max(a) - \min(a)} \times (high - low) + low \tag{4}$$
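A small sketch of range normalization from Equation (4) in Python with numpy (illustrative names; note that a query should be normalized using the training set's min and max, not its own):

```python
import numpy as np

def range_normalize(a, low=0.0, high=1.0):
    """Range normalization (Equation (4)): rescale values into [low, high]."""
    a = np.asarray(a, dtype=float)
    return (a - a.min()) / (a.max() - a.min()) * (high - low) + low

salaries = [53700, 65300, 48900, 64800, 44200, 55900, 48600, 72800, 45300, 73200]
print(range_normalize(salaries))   # first value: (53700 - 44200) / 29000 = 0.3276
```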
Table: The normalized dataset, and the distances (Dist.) and neighbor rankings (Neigh.) between the normalized query and each instance, computed using both features, salary only, and age only.

ID  Salary  Age     Purch.   Salary+Age Dist.  Neigh.   Salary Only Dist.  Neigh.   Age Only Dist.  Neigh.
1   0.3276  0.4412  No       0.1935            1        0.0793             2        0.17647         4
2   0.7276  0.3235  No       0.3260            2        0.3207             6        0.05882         2
3   0.1621  0.5588  Yes      0.3827            5        0.2448             3        0.29412         6
4   0.7103  0.6765  Yes      0.5115            7        0.3034             5        0.41176         7
5   0.0000  0.1176  No       0.4327            6        0.4069             8        0.14706         3
6   0.4034  0.9118  Yes      0.6471            8        0.0034             1        0.64706         9
7   0.1517  0.0000  No       0.3677            3        0.2552             4        0.26471         5
8   0.9862  1.0000  Yes      0.9361            10       0.5793             9        0.73529         10
9   0.0379  0.2353  No       0.3701            4        0.3690             7        0.02941         1
10  1.0000  0.7647  Yes      0.7757            9        0.5931             10       0.50000         8
Normalizing the data is an important thing to do for almost all machine learning algorithms, not just nearest neighbor!
Predicting Continuous Targets
To predict a continuous target, return the average target value in the neighborhood:

$$\mathbb{M}_k(\mathbf{q}) = \frac{1}{k} \sum_{i=1}^{k} t_i \tag{5}$$
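A sketch of Equation (5) in Python with numpy, averaging the target values of the k nearest neighbors (illustrative names, Euclidean distance assumed):

```python
import numpy as np

def knn_regress(X_train, t_train, query, k=3):
    """Predict a continuous target as the average of the k nearest neighbors (Equation (5))."""
    dists = np.linalg.norm(np.asarray(X_train, dtype=float) - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]
    return float(np.mean(np.asarray(t_train, dtype=float)[nearest]))
```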
Table: A dataset of whiskeys listing the age (in years), the rating (between 1 and 5, with 5 being the best), and the bottle price of each whiskey.
ID  Age  Rating  Price
1   0    2       30.00
2   12   3.5     40.00
3   10   4       55.00
4   21   4.5     550.00
5   12   3       35.00
6   15   3.5     45.00
7   16   4       70.00
8   18   3       85.00
9   18   3.5     78.00
10  16   3       75.00
11  19   5       500.00
12  6    4.5     200.00
13  8    3.5     65.00
14  22   4       120.00
15  6    2       12.00
16  8    4.5     250.00
17  10   2       18.00
18  30   4.5     450.00
19  1    1       10.00
20  4    3       30.00
Table: The whiskey dataset after the descriptive features have been normalized.
ID  Age     Rating  Price
1   0.0000  0.25    30.00
2   0.4000  0.63    40.00
3   0.3333  0.75    55.00
4   0.7000  0.88    550.00
5   0.4000  0.50    35.00
6   0.5000  0.63    45.00
7   0.5333  0.75    70.00
8   0.6000  0.50    85.00
9   0.6000  0.63    78.00
10  0.5333  0.50    75.00
11  0.6333  1.00    500.00
12  0.2000  0.88    200.00
13  0.2667  0.63    65.00
14  0.7333  0.75    120.00
15  0.2000  0.25    12.00
16  0.2667  0.88    250.00
17  0.3333  0.25    18.00
18  1.0000  0.88    450.00
19  0.0333  0.00    10.00
20  0.1333  0.50    30.00
Figure: The AGE and RATING feature space for the whiskey dataset. The location of the query instance is indicated by the ? symbol. The circle plotted with a dashed line demarcates the border of the neighborhood around the query when k = 3. The three nearest neighbors to the query are labelled with their ID values.
The model will return a price prediction that is the average price of the three neighbors:
$$\frac{200.00 + 250.00 + 55.00}{3} = 168.33$$
In a weighted k nearest neighbor model the prediction equation is changed to:

$$\mathbb{M}_k(\mathbf{q}) = \frac{\displaystyle\sum_{i=1}^{k} \frac{1}{dist(\mathbf{q}, \mathbf{d}_i)^2} \times t_i}{\displaystyle\sum_{i=1}^{k} \frac{1}{dist(\mathbf{q}, \mathbf{d}_i)^2}} \tag{6}$$
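A sketch of Equation (6) in Python with numpy; in the whiskey calculations below every training instance contributes, so the function simply weights all instances. The eps guard is an illustrative addition for a query that coincides with a training instance:

```python
import numpy as np

def weighted_knn_regress(X_train, t_train, query, eps=1e-12):
    """Distance-weighted prediction over the training instances (Equation (6))."""
    dists = np.linalg.norm(np.asarray(X_train, dtype=float) - np.asarray(query, dtype=float), axis=1)
    weights = 1.0 / (dists ** 2 + eps)            # reciprocal of the squared distance
    targets = np.asarray(t_train, dtype=float)
    return float(np.sum(weights * targets) / np.sum(weights))
```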
Table: The calculations for the weighted k nearest neighbor prediction.

ID       Price   Distance  Weight    Price × Weight
1        30.00   0.7530    1.7638    52.92
2        40.00   0.5017    3.9724    158.90
3        55.00   0.3655    7.4844    411.64
4        550.00  0.6456    2.3996    1319.78
5        35.00   0.6009    2.7692    96.92
6        45.00   0.5731    3.0450    137.03
7        70.00   0.5294    3.5679    249.75
8        85.00   0.7311    1.8711    159.04
9        78.00   0.6520    2.3526    183.50
10       75.00   0.6839    2.1378    160.33
11       500.00  0.5667    3.1142    1557.09
12       200.00  0.1828    29.9376   5987.53
13       65.00   0.4250    5.5363    359.86
14       120.00  0.7120    1.9726    236.71
15       12.00   0.7618    1.7233    20.68
16       250.00  0.2358    17.9775   4494.38
17       18.00   0.7960    1.5783    28.41
18       450.00  0.9417    1.1277    507.48
19       10.00   1.0006    0.9989    9.99
20       30.00   0.5044    3.9301    117.90
Totals:                    99.2604   16,249.85

The weighted prediction is therefore 16,249.85 / 99.2604 = 163.71.
Other Measures of Similarity
Table: A binary dataset listing the behavior of two individuals on a website during a trial period and whether or not they subsequently signed up for the website.
ID Profile FAQ HelpForum Newsletter Liked Signup
1   1        1    1          0           1      Yes
2   1        0    0          0           0      No
Is q more similar to d1 or to d2?

q = ⟨PROFILE: 1, FAQ: 0, HELP FORUM: 1, NEWSLETTER: 0, LIKED: 0⟩

ID  Profile  FAQ  HelpForum  Newsletter  Liked  Signup
1   1        1    1          0           1      Yes
2   1        0    0          0           0      No
Table: The similarity between the current trial user, q, and the two users in the dataset, d1 and d2, in terms of co-presence (CP), co-absence (CA), presence-absence (PA), and absence-presence (AP).

d1: CP = 2, PA = 0, AP = 2, CA = 1
d2: CP = 1, PA = 1, AP = 0, CA = 3
Russel-Rao similarity is the ratio between the number of co-presences and the total number of binary features considered:

$$sim_{RR}(\mathbf{q}, \mathbf{d}) = \frac{CP(\mathbf{q}, \mathbf{d})}{|\mathbf{q}|} \tag{7}$$
Russel-Rao
$$sim_{RR}(\mathbf{q}, \mathbf{d}) = \frac{CP(\mathbf{q}, \mathbf{d})}{|\mathbf{q}|} \tag{8}$$

q = ⟨PROFILE: 1, FAQ: 0, HELP FORUM: 1, NEWSLETTER: 0, LIKED: 0⟩

d1: CP = 2, PA = 0, AP = 2, CA = 1
d2: CP = 1, PA = 1, AP = 0, CA = 3
Russel-Rao

$$sim_{RR}(\mathbf{q}, \mathbf{d}) = \frac{CP(\mathbf{q}, \mathbf{d})}{|\mathbf{q}|}$$

Example

$$sim_{RR}(\mathbf{q}, \mathbf{d}_1) = \frac{2}{5} = 0.4 \qquad sim_{RR}(\mathbf{q}, \mathbf{d}_2) = \frac{1}{5} = 0.2 \tag{9}$$

The current trial user is judged to be more similar to instance d1 than to d2.
Sokal-Michener similarity is defined as the ratio between the total number of co-presences and co-absences, and the total number of binary features considered:

$$sim_{SM}(\mathbf{q}, \mathbf{d}) = \frac{CP(\mathbf{q}, \mathbf{d}) + CA(\mathbf{q}, \mathbf{d})}{|\mathbf{q}|} \tag{10}$$
Sokal-Michener
$$sim_{SM}(\mathbf{q}, \mathbf{d}) = \frac{CP(\mathbf{q}, \mathbf{d}) + CA(\mathbf{q}, \mathbf{d})}{|\mathbf{q}|} \tag{11}$$

q = ⟨PROFILE: 1, FAQ: 0, HELP FORUM: 1, NEWSLETTER: 0, LIKED: 0⟩

d1: CP = 2, PA = 0, AP = 2, CA = 1
d2: CP = 1, PA = 1, AP = 0, CA = 3
Sokal-Michener

$$sim_{SM}(\mathbf{q}, \mathbf{d}) = \frac{CP(\mathbf{q}, \mathbf{d}) + CA(\mathbf{q}, \mathbf{d})}{|\mathbf{q}|}$$

Example

$$sim_{SM}(\mathbf{q}, \mathbf{d}_1) = \frac{3}{5} = 0.6 \qquad sim_{SM}(\mathbf{q}, \mathbf{d}_2) = \frac{4}{5} = 0.8 \tag{12}$$

The current trial user is judged to be more similar to instance d2 than to d1.
Jaccard

The Jaccard index ignores co-absences:

$$sim_{J}(\mathbf{q}, \mathbf{d}) = \frac{CP(\mathbf{q}, \mathbf{d})}{CP(\mathbf{q}, \mathbf{d}) + PA(\mathbf{q}, \mathbf{d}) + AP(\mathbf{q}, \mathbf{d})} \tag{13}$$
Jaccard
$$sim_{J}(\mathbf{q}, \mathbf{d}) = \frac{CP(\mathbf{q}, \mathbf{d})}{CP(\mathbf{q}, \mathbf{d}) + PA(\mathbf{q}, \mathbf{d}) + AP(\mathbf{q}, \mathbf{d})} \tag{14}$$

q = ⟨PROFILE: 1, FAQ: 0, HELP FORUM: 1, NEWSLETTER: 0, LIKED: 0⟩

d1: CP = 2, PA = 0, AP = 2, CA = 1
d2: CP = 1, PA = 1, AP = 0, CA = 3
Jaccard

$$sim_{J}(\mathbf{q}, \mathbf{d}) = \frac{CP(\mathbf{q}, \mathbf{d})}{CP(\mathbf{q}, \mathbf{d}) + PA(\mathbf{q}, \mathbf{d}) + AP(\mathbf{q}, \mathbf{d})}$$

Example

$$sim_{J}(\mathbf{q}, \mathbf{d}_1) = \frac{2}{4} = 0.5 \qquad sim_{J}(\mathbf{q}, \mathbf{d}_2) = \frac{1}{2} = 0.5 \tag{15}$$

The current trial user is judged to be equally similar to instances d1 and d2!
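A small sketch in Python computing all three binary similarity measures for the worked example; the function name and dictionary keys are illustrative:

```python
def binary_similarities(q, d):
    """Russel-Rao, Sokal-Michener, and Jaccard similarities for binary feature vectors."""
    cp = sum(1 for a, b in zip(q, d) if a == 1 and b == 1)   # co-presences
    ca = sum(1 for a, b in zip(q, d) if a == 0 and b == 0)   # co-absences
    pa = sum(1 for a, b in zip(q, d) if a == 1 and b == 0)   # present in q, absent in d
    ap = sum(1 for a, b in zip(q, d) if a == 0 and b == 1)   # absent in q, present in d
    return {
        "russel_rao": cp / len(q),
        "sokal_michener": (cp + ca) / len(q),
        "jaccard": cp / (cp + pa + ap),
    }

q  = [1, 0, 1, 0, 0]   # PROFILE, FAQ, HELP FORUM, NEWSLETTER, LIKED
d1 = [1, 1, 1, 0, 1]
d2 = [1, 0, 0, 0, 0]
print(binary_similarities(q, d1))   # {'russel_rao': 0.4, 'sokal_michener': 0.6, 'jaccard': 0.5}
print(binary_similarities(q, d2))   # {'russel_rao': 0.2, 'sokal_michener': 0.8, 'jaccard': 0.5}
```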
Continuous features
Cosine similarity between two instances is the cosine of the inner angle between the two vectors that extend from the origin to each instance.
Cosine
$$sim_{COSINE}(\mathbf{a}, \mathbf{b}) = \frac{\sum_{i=1}^{m} a[i] \times b[i]}{\sqrt{\sum_{i=1}^{m} a[i]^2} \times \sqrt{\sum_{i=1}^{m} b[i]^2}}$$
Calculate the cosine similarity between the following two instances:
d1 = ⟨SMS = 97, VOICE = 21⟩ d2 = ⟨SMS = 181, VOICE = 184⟩
$$sim_{COSINE}(\mathbf{a}, \mathbf{b}) = \frac{\sum_{i=1}^{m} a[i] \times b[i]}{\sqrt{\sum_{i=1}^{m} a[i]^2} \times \sqrt{\sum_{i=1}^{m} b[i]^2}}$$
Calculate the cosine similarity between the following two instances:
d1 = ⟨SMS = 97, VOICE = 21⟩ d2 = ⟨SMS = 181, VOICE = 184⟩
$$sim_{COSINE}(\mathbf{d}_1, \mathbf{d}_2) = \frac{(97 \times 181) + (21 \times 184)}{\sqrt{97^2 + 21^2} \times \sqrt{181^2 + 184^2}} = 0.8362$$
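A quick check of this calculation in Python with numpy (illustrative helper name):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the inner angle between two instance vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity([97, 21], [181, 184]))   # 0.8362
print(cosine_similarity([97, 21], [194, 42]))    # 1.0, since d3 points in the same direction as d1
```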
Figure: (a) θ represents the inner angle between the vector emanating from the origin to instance d1 = ⟨SMS = 97, VOICE = 21⟩ and the vector emanating from the origin to instance d2 = ⟨SMS = 181, VOICE = 184⟩; (b) shows d1 and d2 normalized to the unit circle.
Calculate the cosine similarity between the following two instances:
d1 = ⟨SMS = 97, VOICE = 21⟩ d3 = ⟨SMS = 194, VOICE = 42⟩
$$sim_{COSINE}(\mathbf{a}, \mathbf{b}) = \frac{\sum_{i=1}^{m} a[i] \times b[i]}{\sqrt{\sum_{i=1}^{m} a[i]^2} \times \sqrt{\sum_{i=1}^{m} b[i]^2}}$$
Calculate the cosine similarity between the following two instances:
d1 = ⟨SMS = 97, VOICE = 21⟩ d3 = ⟨SMS = 194, VOICE = 42⟩
$$sim_{COSINE}(\mathbf{d}_1, \mathbf{d}_3) = \frac{(97 \times 194) + (21 \times 42)}{\sqrt{97^2 + 21^2} \times \sqrt{194^2 + 42^2}} = 1$$
The Mahalanobis distance is another measure of similarity between instances with continuous descriptive features; unlike Euclidean distance, it takes the covariance of the dataset into account.
Figure: Scatter plots of three bivariate datasets with the same center point A and two queries B and C, both equidistant from A. (a) A dataset uniformly spread around the center point. (b) A dataset with negative covariance. (c) A dataset with positive covariance.
The Mahalanobis distance uses covariance to scale distances so that distances along a direction where the dataset is spread out a lot are scaled down, and distances along directions where the dataset is tightly packed are scaled up.

$$Mahalanobis(\mathbf{a}, \mathbf{b}) = \sqrt{\left[a[1]-b[1], \ldots, a[m]-b[m]\right] \times \boldsymbol{\Sigma}^{-1} \times \begin{bmatrix} a[1]-b[1] \\ \vdots \\ a[m]-b[m] \end{bmatrix}} \tag{16}$$

where Σ⁻¹ is the inverse of the covariance matrix of the dataset.
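A sketch of Equation (16) in Python with numpy, estimating the covariance matrix from a dataset supplied by the caller (illustrative names; assumes the covariance matrix is invertible):

```python
import numpy as np

def mahalanobis(a, b, data):
    """Mahalanobis distance between a and b using the covariance of `data` (Equation (16))."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    cov_inv = np.linalg.inv(np.cov(np.asarray(data, dtype=float), rowvar=False))
    diff = a - b
    return float(np.sqrt(diff @ cov_inv @ diff))
```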
Similar to Euclidean distance, the Mahalanobis distance squares the differences between the feature values.
But it also rescales those differences (using the inverse covariance matrix) so that all the features have unit variance and the effects of covariance are removed.
Figure: The coordinate systems defined by the Mahalanobis metric using the covariance matrix for the dataset in Figure 9(c), with three different origins: (a) (50, 50), (b) (63, 71), (c) (42, 35). The ellipses in each figure plot the 1, 3, and 5 unit distance contours from each origin.
Figure: The effect of using a Mahalanobis versus a Euclidean distance. Point A is the center of mass of the dataset in Figure 9(c). The ellipses plot the Mahalanobis distance contours from A that B and C lie on (9.35 for B and 2.15 for C). In Euclidean terms B and C are equidistant from A; however, using the Mahalanobis metric, C is much closer to A than B.
Feature Selection
Figure: A set of scatter plots illustrating the curse of dimensionality. Across figures (a), (b) and (c) the density of the marked unit hypercubes decreases as the number of dimensions increases.
Figure: Figures (d) and (e) illustrate the cost we must incur if we wish to maintain the density of the instances in the feature space as the dimensionality of the feature space increases.
During our discussion of feature selection approaches it will be useful to distinguish between different classes of descriptive features:
1 Predictive
2 Interacting
3 Redundant
4 Irrelevant
When framed as a local search problem feature selection is defined in terms of an iterative process consisting of the following stages:
1 Subset Generation
2 Subset Selection
3 Termination Condition
The search can move through the search space in a number of ways:
Forward sequential selection (sketched below)
Backward sequential selection
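A minimal sketch of forward sequential selection in Python; `evaluate` is an assumed caller-supplied function that scores a candidate feature subset (for example, the cross-validated accuracy of a nearest neighbor model restricted to those features), and all names are illustrative:

```python
def forward_sequential_selection(n_features, evaluate):
    """Greedy forward search over feature subsets.

    Repeats subset generation (add one remaining feature), subset selection
    (keep the best-scoring candidate), and a termination check (stop when no
    addition improves the score).
    """
    selected, best_score = [], float("-inf")
    remaining = list(range(n_features))
    while remaining:
        score, feature = max((evaluate(selected + [f]), f) for f in remaining)
        if score <= best_score:        # termination condition
            break
        best_score = score
        selected.append(feature)       # subset selection
        remaining.remove(feature)
    return selected
```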
Figure: The feature subset space for a dataset with three features X, Y, Z, showing the directions explored by forward selection and backward selection.

Figure: The process of model induction with feature selection.
Efficient Memory Search
Assuming that the training set will remain relatively stable, it is possible to speed up the predictions of a nearest neighbor model by investing in some one-off computation to create an index of the instances that enables efficient retrieval of the nearest neighbors.
The k-d tree, which is short for k-dimensional tree, is one of the best known of these indexes.
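A sketch of one common k-d tree construction in Python: cycle through the descriptive features by tree depth and split on the median instance, which matches the alternating SPEED/AGILITY splits in the example that follows. The node layout and names are illustrative, not the book's pseudocode:

```python
from collections import namedtuple

Node = namedtuple("Node", "instance split_feature split_value left right")

def build_kdtree(instances, depth=0):
    """Build a k-d tree from a list of (feature_vector, target) pairs."""
    if not instances:
        return None
    num_features = len(instances[0][0])
    f = depth % num_features                        # feature to split on at this depth
    instances = sorted(instances, key=lambda inst: inst[0][f])
    mid = len(instances) // 2                       # the median instance is stored at this node
    return Node(
        instance=instances[mid],
        split_feature=f,
        split_value=instances[mid][0][f],
        left=build_kdtree(instances[:mid], depth + 1),
        right=build_kdtree(instances[mid + 1:], depth + 1),
    )
```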
Example
Let’s build a k-d tree for the college athlete dataset
Table: The speed and agility ratings for 22 college athletes labelled with the decisions for whether they were drafted or not.

ID  Speed  Agility  Draft
1   2.50   6.00     No
2   3.75   8.00     No
3   2.25   5.50     No
4   3.25   8.25     No
5   2.75   7.50     No
6   4.50   5.00     No
7   3.50   5.25     No
8   3.00   3.25     No
9   4.00   4.00     No
10  4.25   3.75     No
11  2.00   2.00     No
12  5.00   2.50     No
13  8.25   8.50     No
14  5.75   8.75     Yes
15  4.75   6.25     Yes
16  5.50   6.75     Yes
17  5.25   9.50     Yes
18  7.00   4.25     Yes
19  7.50   8.00     Yes
20  7.25   5.75     Yes
21  6.75   3.00     Yes
First split on the SPEED feature
Root node: ID = 6, split on SPEED at 4.5
  Speed < 4.5: IDs 1, 2, 3, 4, 5, 7, 8, 9, 10, 11
  Speed ≥ 4.5: IDs 12, 13, 14, 15, 16, 17, 18, 19, 20, 21

Figure: (a) The k-d tree generated for the dataset in Table 7 after the initial split using the SPEED feature. (b) The partitioning of the feature space by the k-d tree in (a).
Next split on the AGILITY feature
Root node: ID = 6, split on SPEED at 4.5
  Speed < 4.5: node ID = 3, split on AGILITY at 5.5
    Agility < 5.5: IDs 7, 8, 9, 10, 11
    Agility ≥ 5.5: IDs 1, 2, 4, 5
  Speed ≥ 4.5: IDs 12, 13, 14, 15, 16, 17, 18, 19, 20, 21

Figure: (a) The k-d tree after the dataset at the left child of the root has been split using the AGILITY feature with a threshold of 5.5. (b) The partitioning of the feature space by the k-d tree in (a).
After completing the tree building process
Figure: (a) The complete k-d tree generated for the dataset in Table 7, with internal nodes splitting alternately on SPEED and AGILITY. (b) The partitioning of the feature space by the k-d tree in (a).
Finding neighbors for the query SPEED = 6.00, AGILITY = 3.50.

Figure: (a) The path taken from the root node to a leaf node when we search the tree. (b) The location of the query in the feature space is represented by the ? and the target hypersphere defining the region that must contain the true nearest neighbor.
Figure: (a) The state of the retrieval process after instance d21 has been stored as current-best. (b) The dashed circle illustrates the extent of the target hypersphere after current-best-distance has been updated.
Figure: The greyed-out branches indicate the portions of the k-d tree pruned from the search. The node with the bold outline represents the instance (d21) that was returned as the nearest neighbor to the query.
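A sketch of the corresponding retrieval procedure, using the Node structure from the earlier construction sketch: descend to the leaf on the query's side of each split, keep a current-best, and only search the other side of a split if the target hypersphere crosses the splitting boundary (names are illustrative):

```python
import math

def nearest_neighbour(node, query, best=None):
    """Return (distance, instance) of the nearest neighbor stored in the k-d tree."""
    if node is None:
        return best
    features, target = node.instance
    d = math.dist(query, features)
    if best is None or d < best[0]:
        best = (d, node.instance)                  # update current-best and current-best-distance
    f = node.split_feature
    near, far = (node.left, node.right) if query[f] < node.split_value else (node.right, node.left)
    best = nearest_neighbour(near, query, best)    # descend the query's side of the split first
    if abs(query[f] - node.split_value) < best[0]: # does the hypersphere cross the boundary?
        best = nearest_neighbour(far, query, best)
    return best
```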
Summary
Nearest neighbor models are very sensitive to noise in the target feature; the easiest way to address this is to use a k nearest neighbor model rather than relying on a single nearest neighbor.
Normalization techniques should almost always be applied when nearest neighbor models are used.
It is easy to adapt a nearest neighbor model to continuous targets.
There are many different measures of similarity.
Feature selection is a particularly important process for nearest neighbor algorithms, as it alleviates the curse of dimensionality.
As the number of instances becomes large, a nearest neighbor model will become slower—techniques such as the k-d tree can help with this issue.