
Lecture 21: Nearest neighbor
CS 189 (CDSS offering)
2022/03/14

Today’s lecture


So far, we have focused on methods for linear regression and classification
Then, we combined these with featurization to get nonlinear models
This week, we will study two models that are inherently nonlinear by construction
Today we will learn about nearest neighbor; Wednesday/Friday will be about decision trees
We will see that nonlinear models are oftentimes harder to learn, use, or analyze
Nevertheless, they also often work better and thus are popular in practice

Why study nonlinear models?
We already learned how to use featurizations to make linear models nonlinear, so why do we still need to learn about other nonlinear models?
The kernel trick is a great tool, but it only allows for some featurizations
Also, the model has to be compatible with kernelization
The models that we will learn this week and later in the course further push the
boundaries of what types of problems we can tackle

Nearest neighbor
The nearest neighbor model is the simplest model to explain
At training time, do nothing — the only O(0) learning algorithm!
• However, make sure to store all of the training data (x1, y1), …, (xN, yN)
• When given a new point x′, compute the closest point xi from the training data, and predict the corresponding yi
This procedure is the same for both classification and regression tasks
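As a minimal sketch of this procedure (assuming the training inputs are stored as rows of a NumPy array; this is illustrative code, not the course's reference implementation):

```python
import numpy as np

def nn_predict(X_train, y_train, x_new):
    """1-nearest-neighbor prediction for a single query point x_new.

    X_train: (N, d) array of stored training inputs
    y_train: (N,) array of stored labels (classification) or targets (regression)
    x_new:   (d,) array, the new query point
    """
    # Squared Euclidean distance from x_new to every stored training point
    dists = np.sum((X_train - x_new) ** 2, axis=1)
    # Predict the label/target of the closest stored point
    return y_train[np.argmin(dists)]
```

The same function works for both tasks, since it simply returns the stored yi of the closest xi.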

k-nearest neighbor
The nearest neighbor model can be susceptible to outliers and noise
One way to “smooth out” the model is to instead consider the k closest training
points to a new point x′ when making a prediction, for some k > 1
For regression, our prediction is simply the average of the targets of these k training points
Now, for classification, we will predict according to a "majority vote" amongst these k training points
Outliers will tend to be “outcompeted” by the other points, whereas noise will
tend to be “averaged out”
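A sketch of the k-nearest-neighbor version (same assumptions as the 1-nearest-neighbor sketch above; the regression branch averages the k targets and the classification branch takes the majority vote):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5, task="classification"):
    # Squared Euclidean distances from x_new to all N training points
    dists = np.sum((X_train - x_new) ** 2, axis=1)
    # Indices of the k closest training points (unordered among themselves)
    nearest = np.argpartition(dists, k)[:k]
    if task == "regression":
        # Average the targets of the k nearest neighbors
        return np.mean(y_train[nearest])
    # Majority vote amongst the k nearest neighbors
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]
```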

Nearest neighbor classifiers
“I will judge you by the company you keep”
[Figure: decision boundaries of a 1-nearest-neighbor classifier vs. a 15-nearest-neighbor classifier]

The Bayes optimal classifier
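As a reference for the next slide (stated here in standard form, assuming the true conditional distribution p(y | x) were known), the Bayes optimal classifier predicts the most probable label, and its error rate is the lowest achievable by any classifier:

```latex
f^*(x) = \arg\max_y \; p(y \mid x),
\qquad
\text{Bayes error rate} = \mathbb{E}_x\!\left[\, 1 - \max_y p(y \mid x) \,\right]
```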

Optimality of nearest neighbor classifiers
For nearest neighbor classifiers, as N → ∞, the error rate converges to at most twice the Bayes error rate
For k-nearest neighbor classifiers, as both N → ∞ and k → ∞ but k/N → 0, the error rate converges to the Bayes error rate!
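In equation form, this is the classical Cover & Hart (1967) result; for binary classification, with R* the Bayes error rate and R_NN the asymptotic 1-nearest-neighbor error rate, a commonly cited statement is:

```latex
R^* \;\le\; R_{\mathrm{NN}} \;\le\; 2\,R^*\,(1 - R^*) \;\le\; 2\,R^*
```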

Challenges with nearest neighbor models
Nearest neighbor models seem great! Simple and theoretically optimal (sort of)
What's the catch?
We have to store all the training data, which may be impractical or infeasible
We have to loop through the whole training set to make a prediction
The curse of dimensionality: we need exponentially more data in high dimensions because all data points are far away from each other
How should we even define distance? Are simple distances always good?
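A small illustrative experiment (a sketch with made-up uniform data, not course code) of the curse of dimensionality: as the dimension grows, the nearest and farthest training points end up at nearly the same distance from a query, so "nearest" carries less and less information:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
for d in [2, 10, 100, 1000]:
    X = rng.uniform(size=(N, d))   # N random "training" points in [0, 1]^d
    x = rng.uniform(size=d)        # one random query point
    dists = np.linalg.norm(X - x, axis=1)
    # A ratio near 1 means every point is roughly equally far away
    print(f"d = {d:4d}   nearest/farthest distance ratio = {dists.min() / dists.max():.3f}")
```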

Searching for the nearest neighbor(s)
Naïvely searching through the entire training set and computing a distance for each point has O(dN) runtime, where d is the dimensionality of the input x
In some cases, smarter ways of storing the data (e.g., k-d trees) can result in O(d log N) runtime on average, by organizing the data beforehand and only searching through the necessary parts of the dataset
In many cases, it is good enough to use an approximate nearest neighbor method, which may only return a close (but not the nearest) neighbor
These more advanced techniques are out of scope for this class, but good implementations exist online that take care of the details for you
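As one example of such an implementation, SciPy ships a k-d tree; a minimal sketch (with made-up data and parameters) of building it once and then querying for the k nearest neighbors:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X_train = rng.normal(size=(10_000, 5))        # 10,000 stored points, d = 5
y_train = rng.integers(0, 2, size=10_000)     # their stored labels

tree = cKDTree(X_train)                       # organize the data beforehand (build once)
dists, idx = tree.query(rng.normal(size=5), k=3)   # 3 nearest neighbors of a query point
print(y_train[idx])                           # their labels, ready for a majority vote
```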

Bad distance metrics
It is easy to imagine how the wrong distance metric may ruin nearest neighbor
E.g., is Euclidean distance a good choice for digit recognition?
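A toy illustration of the problem (hypothetical one-dimensional "images", not real digit data): a single "on" pixel shifted by one position is, in Euclidean distance, farther from the original than a completely blank image, even though it is visually almost identical:

```python
import numpy as np

stroke  = np.array([0., 0., 1., 0., 0., 0., 0., 0.])  # one "on" pixel at position 2
shifted = np.array([0., 0., 0., 1., 0., 0., 0., 0.])  # the same pixel shifted right by one
blank   = np.zeros(8)                                  # no pixels on at all

print(np.linalg.norm(stroke - shifted))  # ≈ 1.41: the shifted copy looks far away
print(np.linalg.norm(stroke - blank))    # = 1.00: the blank image looks closer!
```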

Defining the right distance metric
In general, a good distance metric will heavily depend on the problem at hand
We can benefit from, e.g., computing statistics of the training data and/or just
having domain expertise with the specific problem
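One simple instance of "computing statistics of the training data" (a sketch, assuming the raw features live on very different scales) is to standardize each feature before taking Euclidean distances, so that no single feature dominates:

```python
import numpy as np

def standardize(X_train, X_query):
    # Per-feature mean and standard deviation, estimated from the training data only
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0) + 1e-12   # guard against constant (zero-variance) features
    return (X_train - mu) / sigma, (X_query - mu) / sigma

# The standardized arrays can then be fed to any of the nearest neighbor sketches above.
```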
