Value of k in k nearest neighbor algorithm

Submitted by ﹥>﹥吖頭↗ on 2019-12-22 07:27:12

Question


I have 7 classes that need to be classified and I have 10 features. Is there an optimal value for k that I should use in this case, or do I have to run KNN for values of k between 1 and 10 (around 10) and determine the best value with the help of the algorithm itself?


Answer 1:


In addition to the article I posted in the comments there is this one as well that suggests:

Choice of k is very critical. A small value of k means that noise will have a higher influence on the result. A large value makes it computationally expensive and somewhat defeats the basic philosophy behind KNN (that points that are near are likely to have similar densities or classes). A simple approach to selecting k is to set k = n^(1/2).

It's going to depend a lot on your individual case; sometimes it is best to run through each possible value of k and decide for yourself.
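Running through each candidate k is easy to sketch. The following is a minimal, stdlib-only illustration (not the asker's actual data): it generates two synthetic Gaussian classes, implements KNN voting directly, and picks the k that scores best on a held-out split.

```python
import math
import random
from collections import Counter

random.seed(0)

# Two synthetic, roughly separable 2-D classes (illustrative data only).
data = [((random.gauss(0, 1), random.gauss(0, 1)), "A") for _ in range(60)] + \
       [((random.gauss(3, 1), random.gauss(3, 1)), "B") for _ in range(60)]
random.shuffle(data)
train, test = data[:80], data[80:]

def knn_predict(k, point):
    """Classify `point` by majority vote among its k nearest training samples."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(k):
    hits = sum(knn_predict(k, p) == label for p, label in test)
    return hits / len(test)

# Try each candidate k and keep the one most accurate on the held-out split.
best_k = max(range(1, 11), key=accuracy)
print(best_k, accuracy(best_k))
```

In practice you would use cross-validation rather than a single split, and a library implementation (e.g. scikit-learn's `KNeighborsClassifier`) rather than hand-rolled distance sorting; the loop over k is the same idea either way.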




Answer 2:


An important thing to note is that neither the number of features nor the number of classes plays a part in determining the value of k in the k-NN algorithm. k-NN is an ad-hoc classifier that labels test data based on a distance metric: a test sample is classified as Class-1 if more Class-1 training samples are close to it than training samples of any other class. For example, if k = 5, the 5 closest training samples are selected according to the distance metric and a vote is taken over their classes. If 3 of those samples belong to Class-1 and 2 belong to Class-5, the test sample is classified as Class-1. So the value of k is the number of training samples used to classify each test sample.
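The voting step in the example above can be written in a few lines. This is a sketch with made-up (label, distance) pairs, mirroring the 3-votes-to-2 scenario described:

```python
from collections import Counter

# Hypothetical (label, distance) pairs from a test sample to training
# samples -- illustrative values only.
neighbours = [("Class-1", 0.4), ("Class-5", 0.6), ("Class-1", 0.7),
              ("Class-5", 0.9), ("Class-1", 1.1), ("Class-2", 1.5)]

k = 5
# Take the k closest training samples by distance, then vote by class.
k_nearest = sorted(neighbours, key=lambda pair: pair[1])[:k]
votes = Counter(label for label, _ in k_nearest)
predicted = votes.most_common(1)[0][0]
print(predicted)  # Class-1 (3 votes) beats Class-5 (2 votes)
```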

Coming to your question: k-NN is non-parametric, and a general rule of thumb for choosing k is k = sqrt(N)/2, where N is the number of samples in your training dataset. Another tip is to keep the value of k odd, so that votes cannot tie between two classes. If ties between classes keep arising anyway, that points to the training data being highly correlated between classes, in which case a simple classification algorithm such as k-NN will give poor classification performance.
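The two heuristics above (k = sqrt(N)/2, rounded to an odd number) combine into a one-line starting point. The function name here is my own; this is only a heuristic to seed a proper search over k, not a guarantee of the best value:

```python
import math

def rule_of_thumb_k(n_train):
    """Heuristic k = sqrt(N)/2, nudged up to the nearest odd integer."""
    k = max(1, round(math.sqrt(n_train) / 2))
    return k if k % 2 == 1 else k + 1

print(rule_of_thumb_k(1000))  # sqrt(1000)/2 ~ 15.8 -> 16 -> 17
print(rule_of_thumb_k(100))   # sqrt(100)/2 = 5, already odd -> 5
```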




Answer 3:


In KNN, finding the value of k is not easy. A small value of k means that noise will have a higher influence on the result, while a large value makes it computationally expensive.

Data scientists usually choose:

1. An odd number for k if the number of classes is 2.

2. Another simple approach: set k = sqrt(n), where n is the number of data points in the training data.

Hope this helps.



Source: https://stackoverflow.com/questions/11568897/value-of-k-in-k-nearest-neighbor-algorithm
