Machine learning lecture slides
COMS 4771 Fall 2020
Nearest neighbor classification
Outline
- Optical character recognition (OCR) example
- Nearest neighbor rule
- Error rate, test error rate
- k-nearest neighbor rule
- Hyperparameter tuning via cross-validation
- Distance functions, features
- Computational issues
Example: OCR for digits
- Goal: Automatically label images of handwritten digits
- Possible labels are {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
- Start with a large collection of already-labeled images
  - D := {(x1, y1), ..., (xn, yn)}
  - xi is the i-th image; yi ∈ {0, 1, ..., 9} is the corresponding label.
- The National Institute of Standards and Technology (NIST) has amassed such a data set.
Figure 1: Some images of handwritten digits from MNIST data set
Nearest neighbor (NN) classifier
- Nearest neighbor (NN) classifier NND (sketched in code below):
  - Represented using a collection of labeled examples
    D := ((x1, y1), ..., (xn, yn)), plus a snippet of code
  - Input: x
  - Find the xi in D that is “closest” to x (the nearest neighbor)
    - (Break ties in some arbitrary fixed way)
  - Return yi
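A minimal sketch of this rule (not from the slides; it assumes the training images are stored as the rows of a NumPy array X with a matching label vector y, and uses Euclidean distance):

```python
import numpy as np

def nn_predict(X, y, x):
    """1-NN rule: return the label of the training point closest to x."""
    dists = np.linalg.norm(X - x, axis=1)  # distance from x to every row of X
    i = int(np.argmin(dists))              # nearest neighbor; argmin breaks ties by first index
    return y[i]
```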
Naïve distance between images of handwritten digits (1)
- Treat (grayscale) images as vectors in Euclidean space Rd
  - d = 28^2 = 784
  - Generalizes physical 3-dimensional space
- Each point x = (x1, ..., xd) ∈ Rd is a vector of d real numbers
- ‖x − z‖2 = √( Σ_{j=1}^{d} (xj − zj)^2 )
  - Also called ℓ2 distance (see the code sketch after the figure)
- WARNING: Here, xj refers to the j-th coordinate of x
Figure 2: Grayscale pixel representation of an image of a handwritten “4”
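As a concrete illustration (hypothetical arrays, not part of the slides), two 28×28 grayscale images can be flattened into vectors in R^784 and compared with the ℓ2 distance:

```python
import numpy as np

# Hypothetical 28x28 grayscale images with pixel intensities in [0, 255].
rng = np.random.default_rng(0)
img_a = rng.integers(0, 256, size=(28, 28)).astype(float)
img_b = rng.integers(0, 256, size=(28, 28)).astype(float)

# Flatten each image into a vector in R^784 and take the l2 (Euclidean) distance.
a, b = img_a.ravel(), img_b.ravel()
dist = np.sqrt(np.sum((a - b) ** 2))   # equivalently: np.linalg.norm(a - b)
```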
Naïve distance between images of handwritten digits (2)
- Why use this for images?
- Why not use this for images?
Recap: OCR via NN
- What is the core prediction problem?
- What features (i.e., predictive variables) are available?
- Will these features be available at time of prediction?
- Is there enough information (“training data”) to learn the relationship between the features and label?
- What are the modeling assumptions?
- Is high-accuracy prediction a useful goal for the application?
Error rate
- Error rate (on a collection of labeled examples S)
  - Fraction of labeled examples in S that have an incorrect label prediction from f̂
  - Written as err(f̂, S) (see the sketch below)
  - (Often, the word “rate” is omitted)
- OCR via NN: err(NND, D) = 0
  - (Each training example is its own nearest neighbor, so the NN rule makes no errors on its own training data, assuming no duplicate images with conflicting labels.)
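A direct translation of this definition into code (a sketch; it assumes the classifier is given as a Python function predict and the labeled examples as NumPy arrays):

```python
import numpy as np

def error_rate(predict, X, y):
    """err(f, S): fraction of labeled examples (X, y) on which the
    classifier's prediction disagrees with the given label."""
    y_hat = np.array([predict(x) for x in X])
    return float(np.mean(y_hat != y))
```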
Test error rate (1)
- Better evaluation: test error rate (see the sketch below)
  - Train/test split: S ∩ T = ∅
  - S is training data, T is test data
  - Classifier f̂ is based only on S
  - Training error rate: err(f̂, S)
  - Test error rate: err(f̂, T)
- On OCR data: test error rate is err(NNS, T) = 3.09%
  - Is this good?
  - What is the test error rate of uniformly random predictions?
  - What is the test error rate of a constant prediction?
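A self-contained sketch of the split-and-evaluate procedure (the data here are hypothetical stand-ins, not the actual OCR data):

```python
import numpy as np

def nn_predict(X_S, y_S, x):
    """1-NN prediction that only consults the training data (X_S, y_S)."""
    return y_S[int(np.argmin(np.linalg.norm(X_S - x, axis=1)))]

# Hypothetical labeled data standing in for the OCR data set.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(1000, 784))
y = rng.integers(0, 10, size=1000)

# Random split into disjoint training data S and test data T.
perm = rng.permutation(len(X))
S, T = perm[:800], perm[800:]
X_S, y_S, X_T, y_T = X[S], y[S], X[T], y[T]

# Test error rate err(NN_S, T): fraction of test examples predicted incorrectly.
test_err = np.mean([nn_predict(X_S, y_S, x) != label for x, label in zip(X_T, y_T)])
print(test_err)
```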
Test error rate (2)
- Why is test error rate meaningful?
- What are the drawbacks of evaluation via test error rate?
Figure 3: A test example and its nearest neighbor in training data (2, 8)
Figure 4: A test example and its nearest neighbor in training data (3, 5)
Figure 5: A test example and its nearest neighbor in training data (5, 4)
Figure 6: A test example and its nearest neighbor in training data (4, 1)
More on the modeling assumptions
- Modeling assumption: Nearby images are more likely to have the same label than different labels.
  - This is an assumption about the choice of distance function
  - In our OCR example, this is an assumption about the choice of features
Diagnostics
- What are the kinds of errors made by NNS?
Figure 7: A test example and its nearest neighbor in training data (2, 8)
Figure 8: Three nearest neighbors of the test example (8, 2, 2)
Upgrade: k-NN
- k-nearest neighbor (k-NN) classifier NNk,D (sketched in code below)
  - Input: x
  - Find the k nearest neighbors of x in D
  - Return the plurality (most common) label among the corresponding labels
  - As before, break ties in some arbitrary fixed way
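A minimal sketch of the k-NN rule (same hypothetical layout as before: training points in the rows of a NumPy array X, labels in y):

```python
import numpy as np
from collections import Counter

def knn_predict(X, y, x, k=3):
    """k-NN rule: return the most common label among the k training
    points closest to x (Euclidean distance)."""
    dists = np.linalg.norm(X - x, axis=1)
    nearest = np.argsort(dists)[:k]        # indices of the k nearest neighbors
    votes = Counter(y[nearest].tolist())
    return votes.most_common(1)[0][0]      # plurality label; ties broken in a fixed way
```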
Typical effect of k
- Smaller k: smaller training error rate
- Larger k: higher training error rate, but predictions more “stable” due to voting
- On OCR data: lowest test error rate achieved at k = 3

  k              1        3        5        7        9
  err(NNk,S, T)  0.0309   0.0295   0.0312   0.0306   0.0341
Hyperparameter tuning
- k is a hyperparameter of k-NN
- How to choose hyperparameters?
- Bad idea: Choose the k that yields the lowest training error rate (degenerate choice: k = 1)
- Better idea: Simulate a train/test split on the training data
- Hold-out validation (see the sketch below)
  - Randomly split S into A and B
  - Compute the validation error rate for all k ∈ {1, 3, 5, 7, 9}: Vk := err(NNk,A, B)
  - Let k̂ be the value of k for which Vk is smallest
  - Classifier to use is NNk̂,S
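A sketch of hold-out validation under the same hypothetical setup (NumPy arrays X_S, y_S holding the training data S; the 80/20 split fraction and the random seed are arbitrary choices, not from the slides):

```python
import numpy as np

def holdout_select_k(X_S, y_S, ks=(1, 3, 5, 7, 9), frac=0.8, seed=0):
    """Randomly split S into A and B, compute V_k = err(NN_{k,A}, B) for
    each candidate k, and return the k with the smallest validation error."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X_S))
    cut = int(frac * len(X_S))
    A, B = perm[:cut], perm[cut:]
    X_A, y_A, X_B, y_B = X_S[A], y_S[A], X_S[B], y_S[B]

    def knn_predict(x, k):
        nearest = np.argsort(np.linalg.norm(X_A - x, axis=1))[:k]
        labels, counts = np.unique(y_A[nearest], return_counts=True)
        return labels[np.argmax(counts)]   # plurality vote among the k neighbors

    V = {k: np.mean([knn_predict(x, k) != label for x, label in zip(X_B, y_B)])
         for k in ks}
    return min(V, key=V.get)               # k-hat; the deployed classifier is then NN_{k-hat,S}
```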
Upgrade: Distance functions (1)
- Specialize to input types
  - Edit distance for strings
  - Shape distance for images
  - Time warping distance for audio waveforms
Upgrade: Distance functions (2)
- Generic distances for vectors of real numbers
- ℓp distances (see the sketch below):
  ‖x − z‖p = ( Σ_{j=1}^{d} |xj − zj|^p )^{1/p}
- What are the unit balls for these distances (in R2)?
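A one-function sketch of this family of distances (hypothetical helper; NumPy assumed):

```python
import numpy as np

def lp_distance(x, z, p=2.0):
    """l_p distance: ||x - z||_p = (sum_j |x_j - z_j|^p)^(1/p)."""
    return float(np.sum(np.abs(x - z) ** p) ** (1.0 / p))
```

With p = 2 this recovers the Euclidean distance used earlier; p = 1 gives the "Manhattan" distance.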
Upgrade: Distance functions (3)
- Distance functions for images of handwritten digits

  distance         ℓ2       ℓ3       tangent   shape
  test error rate  0.0309   0.0283   0.0110    0.0063
Features
- When using numerical features (arranged in a vector from Rd):
  - Scale of features matters
  - Noisy features can ruin NN (see the sketch below)
    - E.g., consider what happens in the OCR example if you have another 10000 additional features that are pure “noise”
    - Or a single pure-noise feature whose scale is 10000× the nominal scale of pixel values
- “Curse of dimension”
  - Weird effects in Rd for large d
  - Can find 2^Ω(d) points that are approximately equidistant
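A small illustration of the scale issue (hypothetical numbers, not from the slides): a single pure-noise coordinate whose scale dwarfs the pixel values ends up dominating the Euclidean distance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "images" that are genuinely close in their 784 informative coordinates.
x = rng.uniform(0.0, 1.0, size=784)
z = x + 0.01 * rng.normal(size=784)
print(np.linalg.norm(x - z))            # small: the informative features say x and z are near

# Append one pure-noise feature at 10000x the nominal pixel scale.
x_big = np.append(x, 10000.0 * rng.normal())
z_big = np.append(z, 10000.0 * rng.normal())
print(np.linalg.norm(x_big - z_big))    # large: the single noisy coordinate dominates the distance
```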
Computation for NN
- Brute-force search: Θ(dn) time for each prediction (using Euclidean distance in Rd); see the sketch below
- Clever data structures: “improve” to 2^d log(n) time
- Approximate nearest neighbors: sub-linear time to get “approximate” answers
  - E.g., find a point among the top 1% of closest points?
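A vectorized version of the brute-force query (a sketch; each call still does the Θ(dn) work of comparing the query against every training point):

```python
import numpy as np

def nn_index_bruteforce(X, x):
    """Index of the training point nearest to x: about n*d operations per query."""
    # (X - x) broadcasts to an (n, d) array; summing squares over axis 1 gives all
    # n squared distances, whose argmin is the same as the argmin of the distances.
    sq_dists = np.sum((X - x) ** 2, axis=1)
    return int(np.argmin(sq_dists))
```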