
K Nearest Neighbours
- The first classification algorithm we present is called K nearest neighbours, or KNN.
- KNN assigns a label to a new observation y_i by asking "what is the most common class around the vector x_i?"
- The idea is that observations with similar X values should also have similar labels y.


- One-dimensional space [draw example]

K Nearest Neighbours
Figure: A three class nearest neighbours visual example with two features X1, X2

K Nearest Neighbours
Algorithm (K Nearest Neighbours)
1. Store all the training data $(y_i, x_{i1}, \dots, x_{ip})$.
2. For each new point $x_q = (x_{q1}, \dots, x_{qp})$, find the K nearest neighbours in the training set, indexed by the set $N_0$.
3. For each possible class $j \in C$, compute
   $$\hat{P}(Y = j \mid X = x_q) = \frac{1}{K} \sum_{i \in N_0} I(y_i = j)$$
4. Assign the class $j$ for which $\hat{P}(Y = j \mid X = x_q)$ is maximal.
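To make these steps concrete, the following is a minimal sketch in base R. It is an illustration only: the helper name knn_classify and the simulated data are made up here, and the class::knn() function used later in these slides implements the same idea efficiently.

# Minimal sketch of the algorithm above (illustration only)
knn_classify <- function(X_train, y_train, x_q, K = 5) {
  # Step 2: Euclidean distance from the query point to every training point
  d <- sqrt(rowSums(sweep(X_train, 2, x_q)^2))
  neighbours <- order(d)[1:K]                # indices of the K nearest points (the set N_0)
  # Step 3: estimated class probabilities = share of each class among the neighbours
  p_hat <- table(y_train[neighbours]) / K
  # Step 4: assign the class with the highest estimated probability
  names(which.max(p_hat))
}

# Tiny illustration with simulated data
set.seed(1)
X <- matrix(rnorm(40), ncol = 2)
y <- rep(c("A", "B"), each = 10)
knn_classify(X, y, x_q = c(0, 0), K = 5)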

K Nearest Neighbours Estimates Complicated Functions
Figure: A three class nearest neighbours visual example with two features X1, X2

Some thoughts on KNN
- KNN is among the simplest of all machine learning algorithms.
- It is explicitly non-parametric (you do not estimate any coefficients).
- KNN is referred to as a lazy learning algorithm: the function is only approximated locally and all computation is deferred until classification.
- So there is actually no explicit training step.
- Results from KNN are sensitive to the local structure of the data (for example, if the local class frequencies are highly skewed, the "majority voting" gets distorted; distance-based weights are one remedy, sketched after this list).
- It is (potentially) too slow, as it may be impossible to hold all the training data in memory.
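One common remedy for the distorted "majority voting" mentioned above is to weight the neighbours by inverse distance. The sketch below is only an illustration of that idea; the helper name knn_weighted is made up and this variant is not used elsewhere in these slides.

# Distance-weighted voting: closer neighbours count for more (illustration only)
knn_weighted <- function(X_train, y_train, x_q, K = 5, eps = 1e-8) {
  d <- sqrt(rowSums(sweep(X_train, 2, x_q)^2))   # Euclidean distances
  nn <- order(d)[1:K]                            # K nearest neighbours
  w <- 1 / (d[nn] + eps)                         # inverse-distance weights
  scores <- tapply(w, y_train[nn], sum)          # total weight per class
  names(which.max(scores))                       # predicted class
}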

Applications in Research: Fetzer and Marden, 2014
Figure: Classifying historical Landsat imagery into Forest or Non-Forest status.
[See forested.R]


Plotting Out Band Values for Red, Near Infrared
> options(stringsAsFactors=FALSE)
> library(data.table)
> library(plyr)
> library(class)
> FORESTED <- data.table(read.csv("R/forested.csv"))
> COMPOSITE <- data.table(read.csv("R/composite.csv"))
> MODIS <- data.table(read.csv("R/modis.csv"))
> setnames(MODIS, "mean", "landcover")
> setnames(FORESTED, "mean", "forestcover")
> DF <- join(join(FORESTED[, c("system.index", "forestcover"), with=F],
+                 COMPOSITE),   # inner join argument truncated in the source; COMPOSITE assumed
+            MODIS[, c("system.index", "landcover"), with=F])
> DF[, Forested := forestcover > .8]
> DF[, MODISforested := (landcover > 0 & landcover <= 5)]
> DF[, B3 := scale(B3)]
> DF[, B4 := scale(B4)]
> df.xlim <- range(DF$B3)
> df.ylim <- range(DF$B4)
> plot(DF[Forested==TRUE][1:120, c("B3", "B4"), with=F], col="green")   # line truncated in the source
> points(DF[Forested==FALSE][1:80, c("B3", "B4"), with=F], col="red")   # line truncated in the source

Plotting Out Band Values for Red, Near Infrared
Figure: Classifying historical Landsat imagery into Forest or Non-Forest status.

Results for KNN with K=10
Figure: 10-nearest neighbour classification (K = 10)

How do we perform in terms of classification?
> knn10 <- knn(DF[train, c("B3","B4"), with=F],
+              test = DF[-train, c("B3","B4"), with=F],
+              cl = DF[train]$Forested, k = 10, prob = TRUE)
> prob <- attr(knn10, "prob")
> prob <- ifelse(knn10==TRUE, prob, 1-prob)
> table(prob>0.5, DF[-train]$Forested)

        FALSE TRUE
  FALSE    59   14
  TRUE     13  114
Overall error rate is quite low: 27/200 = 13.5%.
- Pixels wrongly classified as forested (type 1 error): 13/200.
- Pixels wrongly classified as non-forested: 14/200.
- Suppose you are sending an enforcement agent to fine people who have illegally deforested land (predicted value = FALSE), i.e. for 73 pixels (around 36.5%).
- It costs a lot of money to send police out there, but in 19% of the cases (14/(59+14)) you find the forest is actually standing…
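The figures above can be read straight off the confusion matrix. A small sketch of the arithmetic, assuming the objects prob, DF and train from the earlier code are still in memory:

tab <- table(prob > 0.5, DF[-train]$Forested)
sum(tab[row(tab) != col(tab)]) / sum(tab)    # overall error rate: 27/200 = 13.5%
tab["TRUE", "FALSE"] / sum(tab)              # wrongly classified as forested: 13/200
tab["FALSE", "TRUE"] / sum(tab)              # wrongly classified as non-forested: 14/200
tab["FALSE", "TRUE"] / sum(tab["FALSE", ])   # forest still standing among the 73
                                             # predicted-FALSE pixels: 14/73, about 19%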

Changing c̄ …
Make it less likely that pixels are wrongly classified as forested: reduce the threshold c̄.
> table(prob>0.3,DF[-train]$Forested)
        FALSE TRUE
  FALSE    57    7
  TRUE     15  121
- Now we send enforcement teams out 64 times, but only in about 10% of the cases (7/64) will the forest still be standing (i.e. there is nothing to enforce).
- While the threshold c̄ = 0.5 theoretically minimizes the overall prediction error, you may still prefer a different c̄ when there are different costs attached to type 1 and type 2 errors.
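To see how the trade-off moves as c̄ varies, one can simply tabulate the errors over a small grid of thresholds. This is only a sketch, reusing prob, DF and train from above:

for (cbar in c(0.3, 0.4, 0.5, 0.6, 0.7)) {
  tab <- table(factor(prob > cbar, levels = c(FALSE, TRUE)),
               DF[-train]$Forested)
  cat("cbar =", cbar,
      " overall error =", sum(tab[row(tab) != col(tab)]) / sum(tab),
      " wasted visits =", tab["FALSE", "TRUE"] / sum(tab["FALSE", ]), "\n")
}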

Exercise: Plot an ROC Curve…
Can you try to compute ROC curves and plot them out?
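One possible approach (a sketch, not the only way, and it reuses prob, DF and train from the earlier code): sweep the threshold over a fine grid and trace out the true positive rate against the false positive rate.

actual <- DF[-train]$Forested
thresholds <- seq(0, 1, by = 0.01)
tpr <- sapply(thresholds, function(cbar) mean(prob[actual == TRUE]  > cbar))  # true positive rate
fpr <- sapply(thresholds, function(cbar) mean(prob[actual == FALSE] > cbar))  # false positive rate
plot(fpr, tpr, type = "l", xlab = "False positive rate",
     ylab = "True positive rate", main = "ROC curve for 10-NN")
abline(0, 1, lty = 2)   # reference line of a random classifier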

How to Choose K?
- As you will have guessed, K is a tuning parameter that we need to select.
- In binary (two-class) classification problems, it is helpful to choose K to be an odd number – why? (See the small example after this list.)
- Cross validation is our preferred method, i.e. for each different value of K we perform cross validation and plot the overall average error rates.
- Why does the choice of K matter?
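A small illustration of the odd-K point: with an even K the two-class vote can tie, and class::knn() then breaks the tie at random. The toy data below are made up for this example.

library(class)
x_train <- matrix(c(0, 0,   0.2, 0,   1, 0,   1.3, 0), ncol = 2, byrow = TRUE)
y_train <- factor(c("A", "A", "B", "B"))
x_query <- matrix(c(0.55, 0), ncol = 2)
# k = 4 gives a 2-2 vote, broken at random, so the prediction typically varies across runs
table(replicate(20, as.character(knn(x_train, x_query, cl = y_train, k = 4))))
# k = 3 (odd) always yields a strict 2-1 majority, here for class "A"
table(replicate(20, as.character(knn(x_train, x_query, cl = y_train, k = 3))))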

Results for KNN with K=15
Figure: 15-nearest neighbour classification (K = 15)

Results for KNN with K=1
Figure: 1-nearest neighbour classification (K = 1)

Cross Validation and choice of K
> RES <- NULL
> for (K in 1:50) {
+   knncv <- knn.cv(DF[, c("B3","B4"), with=F],
+                   cl = DF$Forested, k = K, prob = TRUE)
+   RES <- rbind(RES, cbind(K, 1 - (table(knncv, DF$Forested)[1,1] +
+                                     table(knncv, DF$Forested)[2,2]) / 1000))
+ }
> plot(RES)
Figure: Leave-one-out cross-validation error rate (vertical axis, roughly 0.11 to 0.14) against K (horizontal axis, 1 to 50)

Comparing Logistic Regression Versus kNN Classification
- Cross validation suggested that we should use K ≈ 10 for kNN classification.
- Let's compare the performance of logistic regression versus 10-NN for this example.
> set.seed(1151)
> train <- sample(1:1000, 800)
> glm.fit <- glm(Forested ~ B3 + B4, data=DF[train], family=binomial)   # family argument truncated in the source
> glm.predict <- predict.glm(glm.fit, DF[-train], type="response")
> knn10 <- knn(DF[train, c("B3","B4"), with=F],
+              test = DF[-train, c("B3","B4"), with=F],
+              cl = DF[train]$Forested, k = 10, prob = TRUE)
> prob <- attr(knn10, "prob")
> prob <- ifelse(knn10==TRUE, prob, 1-prob)
> table(prob>0.5, DF[-train]$Forested)
        FALSE TRUE
  FALSE    59   14
  TRUE     13  114
> table(glm.predict>0.5, DF[-train]$Forested)

        FALSE TRUE
  FALSE    58    9
  TRUE     14  119
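A compact way to summarise the two tables is to compute each classifier's overall test error directly (a sketch reusing the objects above):

mean((prob > 0.5)        != DF[-train]$Forested)   # 10-NN:               (13 + 14) / 200 = 13.5%
mean((glm.predict > 0.5) != DF[-train]$Forested)   # logistic regression: (14 +  9) / 200 = 11.5%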

Comparing Logistic Regression Versus kNN Classification
- The result is not surprising: the data suggests a near-linear decision boundary, and both kNN and logistic regression estimate that linear boundary reasonably well.
- You may still decide that you prefer logistic regression, since it is much more interpretable and less of a black box.
- With logistic regression you get interpretable coefficients; here we do not care much about inference, so it does not matter much.
- But in general, a reasonable rule of thumb is to prefer a simple, parametric, interpretable method over a non-parametric method when they yield similar results.

When does kNN outperform?
Figure: Example of Non-Linear Classification Problems


kNN Regression
- With a numeric outcome variable, KNN also gives a simple way to compute predictions (kNN regression).
- Compute the average outcome of the K nearest neighbours as the predicted value, as sketched below.
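A minimal sketch of kNN regression in base R; the helper name knn_regress and the simulated data are made up for this illustration.

knn_regress <- function(X_train, y_train, x_q, K = 5) {
  d <- sqrt(rowSums(sweep(X_train, 2, x_q)^2))   # Euclidean distances to the query point
  mean(y_train[order(d)[1:K]])                   # average outcome of the K nearest neighbours
}

# Tiny illustration with simulated data
set.seed(1)
X <- matrix(runif(200), ncol = 2)
y <- 3 * X[, 1] - 2 * X[, 2] + rnorm(100, sd = 0.1)
knn_regress(X, y, x_q = c(0.5, 0.5), K = 10)     # close to 3*0.5 - 2*0.5 = 0.5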

KNN is a discriminative classifier [non examinable]
KNN is a discriminative algorithm since it models the conditional probability of a sample belonging to a given class. To see this, consider how one arrives at the decision rule of kNN.
A class label corresponds to a set of points which belong to some region R of the feature space. If you draw sample points from the actual probability distribution p(x) independently, then the probability of drawing a sample from that class is
$$P = \int_R p(x)\,dx$$
What if you have N points? The probability that K of those N points fall in the region R follows the binomial distribution,
$$\text{Prob}(K) = \binom{N}{K} P^K (1-P)^{N-K}$$
As $N \to \infty$ this distribution is sharply peaked around its mean, so that P can be approximated by K/N.

KNN is a discriminative classifier [non examinable]
An additional approximation is that the probability distribution over R remains approximately constant, so that one can approximate the integral by
$$\int_R p(x)\,dx \approx p(x)V,$$
where V is the total volume of the region. Under these approximations,
$$p(x) \approx \frac{K}{NV}.$$
Now, if we had several classes, we could repeat the same analysis for each one, which would give us
$$p(x \mid C_k) = \frac{K_k}{N_k V},$$
where $K_k$ is the number of points from class k which fall within that region and $N_k$ is the total number of points from class k. Notice $\sum_k N_k = N$.
Repeating the analysis with the binomial distribution, it is easy to see that we can estimate the prior $P(C_k) = \frac{N_k}{N}$.
Using Bayes' rule,
$$P(C_k \mid x) = \frac{p(x \mid C_k)\,P(C_k)}{p(x)} = \frac{K_k}{K},$$
which is the decision rule for kNN.
