Q: KNN in R — strange behavior

Submitted by 谁说我不能喝 on 2020-01-07 11:32:01

Question


(in continuation to this post)

Does anyone know why the KNN R code below gives different predictions for different seeds? This is strange because K <- 5, so the majority is well defined. In addition, the floating-point values are large, so no precision problem should arise, and the data is centered and scaled.

library(class)

from = -(2^30)
to = -(from)

seed <- -229881389  
set.seed(seed)

K <- 5
m = as.integer(runif(1, K, 20))   
n = as.integer(runif(1, 5, 1000)) 
train = matrix(runif(m*n, from, to), nrow=m, ncol=n)
trainLabels = sample.int(2, size = m, replace=T)-1
test = matrix(runif(n, from, to), nrow=1)

sc<-function(x){(x-mean(x))/sd(x)}
train<-apply(train,2,sc)

test<-t(apply(test,1,sc))

seed <-  as.integer(runif(1, from, to))
set.seed(seed)
pred_1 <- knn(train=train, test=test, cl = trainLabels, k=K)
message("predicted: ", pred_1, ", seed: ", seed)

seed <- as.integer(runif(1, from, to))
set.seed(seed)
pred_2 <- knn(train=train, test=test, cl = trainLabels, k=K)
message("predicted: ", pred_2, ", seed: ", seed)

A manual check:

euc.dist <- function(x1, x2) sqrt(sum((x1 - x2) ^ 2))
result = vector(mode="numeric", length=nrow(train))
for(i in 1:nrow(train)) {
  result[i] <- euc.dist(train[i,], test)
}
a <- data.frame(result, trainLabels)
names(a) = c("RSSE", "labels")
b <- a[with(a, order(result, decreasing =T)), ]
headK <- head(b, K)
message("Manual predicted K: ", paste(K," class:", names(which.max(table(headK[,2])))))
print(b)

This gives the prediction 0 for the top K (= 5).


Answer 1:


There are several mistakes:

  • The wrong test set is passed to knn: use test_, the centered and scaled variable.
  • When creating b, there is no variable sums; a plain order() is enough, since it sorts in increasing order by default.
  • The order has to be increasing in distance: you are looking for the nearest neighbours, so look at the smallest distances.
  • Calling set.seed before code that has nothing stochastic (random) in it has no effect on the result.
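
As a rough illustration of the scaling and ordering points above, here is a minimal base-R sketch. It is not the asker's data: the toy matrix, the labels and the names train_sc, test_sc, dists, nearest, farthest, manual_pred and wrong_pred are all made up for this example, and it assumes the test row should be scaled with the training columns' means and standard deviations.

```r
# Hypothetical toy data: 6 training points in 2 dimensions, two classes (0/1).
train <- matrix(c( 0,  0,
                   1,  0,
                   0,  1,
                  10, 10,
                  11, 10,
                  10, 11), ncol = 2, byrow = TRUE)
train_labels <- c(0, 0, 0, 1, 1, 1)
test <- matrix(c(1, 1), nrow = 1)
K <- 3

# Centre and scale the training columns, then apply the SAME
# column means and sds to the test row (assumption of this sketch).
col_means <- colMeans(train)
col_sds   <- apply(train, 2, sd)
train_sc  <- scale(train, center = col_means, scale = col_sds)
test_sc   <- scale(test,  center = col_means, scale = col_sds)

# Euclidean distance from the test row to every training row.
dists <- sqrt(rowSums(sweep(train_sc, 2, as.numeric(test_sc))^2))

# Nearest neighbours: order() sorts in increasing order by default.
nearest     <- order(dists)[seq_len(K)]
manual_pred <- names(which.max(table(train_labels[nearest])))

# For contrast: decreasing = TRUE picks the K FARTHEST points instead.
farthest   <- order(dists, decreasing = TRUE)[seq_len(K)]
wrong_pred <- names(which.max(table(train_labels[farthest])))

message("nearest-K vote: ", manual_pred, ", farthest-K vote: ", wrong_pred)
```

On this toy data the two votes disagree, which is the kind of mismatch a decreasing sort can produce in a manual check. The nearest-K vote is the one to compare against knn from the class package when the same scaled matrices are passed to it.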

So it's basically same as I tried to explain in the previous post.



Source: https://stackoverflow.com/questions/38941780/q-knn-in-r-strange-behavior
