K NEAREST NEIGHBOURS
COMP2420/COMP6420 INTRODUCTION TO DATA MANAGEMENT, ANALYSIS AND SECURITY
WEEK 4 – LECTURE 2 (recorded)
School of Computing
College of Engineering and Computer Science
Credit: (previous course convenor)
Learning Outcomes
01 Recall what classification is
02 Explain various challenges in classification
03 Describe what K Nearest Neighbours (KNN) classification is
04 Explain different distance metrics
05 Discuss the effect of the value of K
06 Discuss how to choose hyperparameters
07 Evaluate advantages and disadvantages of KNN
CLASSIFICATION AND ITS CHALLENGES
Supervised Learning (recap)
• labeled data available
• Goal: Learn a function that maps inputs to outputs
Given a training set of N example input-output pairs (x_1, y_1), (x_2, y_2), …, (x_N, y_N),
where each y_i was generated by an unknown function y = f(x),
discover a function h that approximates the true function f.
Supervised Learning – Classification and Regression (recap)
Classification: when output y is one of a finite set of values [discrete categorical] (e.g. sunny, cloudy or rainy)
Regression: when output y is a real-valued number (e.g. average temperature, height)
Classification
Classify an image
Semantic Gap
Challenges
Viewpoint variation
Challenges (contd…)
Illumination
Challenges (contd…)
Deformation
Challenges (contd…)
Occlusion
Challenges (contd…)
Background clutter
Challenges (contd…)
Intraclass variation
Image Classifier
def classify_image(image):
    # Some logic / rules?
    return class_label
Unlike, for example, sorting a list of elements, there is no obvious way
to come up with an algorithm that classifies an image as a cat
or any other class.
Previous Attempts
Classical computer vision
Machine Learning Approach
def train(images, labels):
    # ML code: build a model from the training data
    return model

def predict(model, test_images):
    # Use the model to get predictions
    return test_labels
NEAREST NEIGHBOUR
What is a NN algorithm for classification?
What is this?
Example Dataset
Attribution:CIFAR10
How to find the nearest image? Use a distance metric to compare images.
Step by Step Algorithm
Further reading on Algorithm complexity
More formally
Given
• a training set T = {(X_i, Y_i) | i = 1, 2, …, n},
• a distance (or similarity) metric d: X × X → [0, ∞), and
• a test value X_q,
find X_j such that d(X_q, X_j) = min{ d(X_q, X_i) : (X_i, Y_i) ∈ T }.
Then the predicted label for X_q is Y_j.
Load the training data as (value, label) ordered pairs (x_i, y_i).
For a given test value x_q:
Calculate d(x_q, x_i) for each ordered pair (x_i, y_i) in the training data
Find the x_j that has the minimum d value
Return the associated y_j as the predicted label
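A minimal sketch of this nearest-neighbour procedure in Python (using NumPy; the function and variable names are illustrative, not the lecture's code):

import numpy as np

def nn_predict(train_X, train_y, x_q):
    # train_X: (n, d) array of training points, train_y: length-n array of labels
    # Euclidean distance from the query point to every training point
    dists = np.linalg.norm(train_X - x_q, axis=1)
    # Index of the closest training point
    j = np.argmin(dists)
    # Its label is the prediction
    return train_y[j]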
K-NEAREST NEIGHBORS
K-Nearest Neighbours concept
Instead of copying label from nearest neighbour, take majority vote from K closest points
The shading shows the predicted class at each pixel with respect to the given points; white corresponds to tie cases.
Different distances
Manhattan or taxicab distance
Definition: The distance between two points measured along axes at right angles. In a plane with p1 at (x1, y1) and p2 at (x2, y2), it is |x1 − x2| + |y1 − y2|.
https://xlinux.nist.gov/dads/HTML/manhattanDistance.html
Different distances
Euclidean or Pythagorean distance
Definition: The straight-line distance between two points. In a plane with p1 at (x1, y1) and p2 at (x2, y2), it is √((x1 − x2)² + (y1 − y2)²).
https://xlinux.nist.gov/dads/HTML/euclidndstnc.html
Minkowski distance
Generalized distance metric
d(X, Y) = ( Σ_{i=1}^{n} |x_i − y_i|^p )^(1/p)
p = 1: Manhattan distance
p = 2: Euclidean distance
p → ∞: Chebyshev distance
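As an illustrative sketch (NumPy-based; the helper name minkowski is an assumption, not lecture code), the same formula covers all three special cases:

import numpy as np

def minkowski(x, y, p):
    # Minkowski distance of order p between vectors x and y
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

x = np.array([1.0, 2.0])
y = np.array([4.0, 6.0])
print(minkowski(x, y, 1))      # Manhattan distance: 7.0
print(minkowski(x, y, 2))      # Euclidean distance: 5.0
print(np.max(np.abs(x - y)))   # Chebyshev distance (the p -> infinity limit): 4.0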
Different distances (contd..)
http://vision.stanford.edu/teaching/cs231n-demos/knn/
Distance measures
Distance function properties
1. d(a, b) ≥ 0, and d(a, b) = 0 if and only if a = b, for any a, b
2. d(a, b) = d(b, a) for any a, b
3. d(a, c) ≤ d(a, b) + d(b, c) for any a, b, c
https://en.wikipedia.org/wiki/Triangle_inequality
Some other distance metrics
• Cosine distance
• Mahalanobis distance
• Chi square distance
• Euclidean distance is commonly used for kNN classification. However, other distance measures may perform better in certain contexts
Further reading: Cosine distance, Mahalanobis distance, Chi-square distance
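For reference, several of these metrics are available in SciPy; a quick sketch (the vectors below are invented for illustration):

from scipy.spatial.distance import chebyshev, cityblock, cosine, euclidean

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
print(cityblock(x, y))   # Manhattan distance
print(euclidean(x, y))   # Euclidean distance
print(chebyshev(x, y))   # Chebyshev distance
print(cosine(x, y))      # Cosine distance (~0 here: the vectors point in the same direction)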
More formally kNN classification
Given
• a training set T = {(X_i, Y_i) | i = 1, 2, …, n},
• a distance metric d: X × X → [0, ∞),
• a value of K, and
• a test value X_q,
sort the training pairs by distance from X_q so that d(X_q, X_(1)) ≤ … ≤ d(X_q, X_(n)), and let R_K be the first K (X_i, Y_i) pairs in this ordering.
Then the predicted label for X_q is the mode of the Y_i values in R_K.
Load the training data as (value, label) ordered pairs (x_i, y_i).
For a given test value x_q:
Ø Calculate d(x_q, x_i) for each ordered pair (x_i, y_i) in the training data
Ø Find the K (x_i, y_i) pairs with the lowest d values
Ø Find the y value that occurs most often among these K pairs
Return that y as the predicted label
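A minimal sketch of this kNN procedure in Python (NumPy-based; names are illustrative, not the lecture's code):

import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, x_q, k=3):
    # train_X: (n, d) array of training points, train_y: length-n array of labels
    # Distance from the query point to every training point
    dists = np.linalg.norm(train_X - x_q, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(dists)[:k]
    # Majority vote among the labels of those k neighbours
    return Counter(train_y[nearest]).most_common(1)[0][0]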
What happens if there is a tie?
KNN – ties?
What approaches can you use to break a tie?
Ø break the tie randomly
Ø assign more weight to the nearest points
Note: sklearn uses random tie breaking,
but you can set the kNN model's weights parameter to 'distance' to assign more weight to the nearer of the nearest neighbours.
https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html
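For example, a distance-weighted kNN classifier can be configured in scikit-learn roughly as follows (the toy data below is invented for illustration):

from sklearn.neighbors import KNeighborsClassifier

X_train = [[0, 0], [1, 1], [2, 2], [3, 3]]
y_train = [0, 0, 1, 1]

# weights='distance' gives nearer neighbours more influence than farther ones
knn = KNeighborsClassifier(n_neighbors=3, weights='distance')
knn.fit(X_train, y_train)
print(knn.predict([[2.2, 2.1]]))   # expected to predict class 1 for this query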
HYPERPARAMETERS
Hyperparameters
• What is the best value of k to use?
• What is the best distance to use?
• These are hyperparameters: choices about the algorithm that we set rather than learn
• Very problem-dependent. Must try them all out and see what works best.
A hyperparameter is a parameter that is set before the learning process begins. These parameters are tunable and can directly affect how well a model trains.
https://deepai.org/machine-learning-glossary-and-terms/hyperparameter
Setting Hyperparameters
Cross-validation: split the data into folds, use each fold in turn as the validation set, and average the results
Useful for small datasets, but not used frequently for larger datasets.
Example of 5-fold cross-validation for different values of k
• Each point: a single outcome
• The line goes through the mean; error bars indicate the standard deviation
• It appears that k ≈ 7 works best for this data
• (Heuristic) An odd value of k is normally chosen for binary classification to avoid ties (odd k for an even number of classes, even k for an odd number of classes)
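A minimal sketch of choosing k via 5-fold cross-validation in scikit-learn (the Iris dataset is used here purely as an illustrative stand-in):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Mean 5-fold cross-validation accuracy for a range of candidate k values
for k in range(1, 16, 2):
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X, y, cv=5)
    print(f"k={k:2d}  mean accuracy={scores.mean():.3f}")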
Elbow method
https://www.analyticsvidhya.com/blog/2018/03/introduction-k-neighbours-algorithm-clustering/ https://www.youtube.com/watch?v=bz6TtWbOoqE
Effect of k (intuition)
• Low k: low bias, high variance
• High k: high bias, low variance
Think about what happens if k = #total points in data set?
k is a smoothing parameter that allows us to achieve the best trade-off between the variance and bias of the model.
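A quick sanity check of the extreme case (a sketch with invented data): when k equals the total number of training points, every query sees the same neighbours, so kNN always predicts the overall majority class.

from sklearn.neighbors import KNeighborsClassifier

X_train = [[0], [1], [2], [3], [10]]
y_train = [0, 0, 0, 1, 1]

# k = size of the training set: the vote is identical for every query,
# so the prediction is always the majority class (class 0 here)
knn = KNeighborsClassifier(n_neighbors=len(X_train))
knn.fit(X_train, y_train)
print(knn.predict([[0], [10]]))   # both queries predicted as class 0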
Curse of Dimensionality
Dimension reduction is carried out for high-dimensional data, to avoid the effects of the curse of dimensionality. Techniques used include:
• Principal Component Analysis (PCA)
• Linear Discriminant Analysis (LDA)
• Random Forest
• Independent Component Analysis (ICA)
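As one illustrative combination (a sketch, not part of the lecture), PCA can be applied before kNN so the classifier works in a lower-dimensional space:

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)

# Reduce the 64-dimensional digit images to 16 principal components before kNN
model = make_pipeline(PCA(n_components=16), KNeighborsClassifier(n_neighbors=5))
print(cross_val_score(model, X, y, cv=5).mean())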
KNN Advantages
• Simpletoimplement(doesnot require estimation)
• Adaptswelltonewtrainingdata • Easytointerprete
KNN Disadvantages
• Slowtopredictduetomanydistance calculations
• Noinsightintodatagenerating process (the model is obtained by memorizing training data)
• Doesnotscalewell(increasing memory needed for larger datasets)
• Sensitivetonoiseandoutliers
• Accuracycanbreakdownwhenthere are many predictors (curse of dimensionality)
Conclusion
KNN: Non-parametric lazy learning method
K-Nearest Neighbor predicts labels based on nearest training examples
Distance metric and K are hyperparameters
Use validation set to choose hyperparameters, run algorithm only once on test set
Not a desirable model for large datasets as prediction time is linear w.r.t. training points.