kNN: training, testing, and validation

前端未结


I am extracting image features from 10 classes with 1000 images each. Since there are 50 features that I can extract, I am thinking of finding the best feature combination t


2 Answers

心在旅途

2021-01-01 00:09

kNN is not trained. All of the data is kept and used at run time for prediction, which makes it one of the most time- and space-consuming classification methods. Feature reduction can mitigate both costs. Cross-validation is a much better way of testing than a single train/test split.

我寻月下人不归

2021-01-01 00:24

kNN is an exception to the general workflow for building and testing supervised machine learning models. In particular, the model created via kNN is just the available labeled data, placed in some metric space.

In other words, for kNN, there is no training step because there is no model to build. Template matching and interpolation are all that is going on in kNN.
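To make the "no training step" point concrete, here is a minimal sketch in plain Python. The class name `TinyKNN`, the toy points, and the choice of Euclidean distance with a majority vote are all assumptions made for illustration; they are not taken from the question.

```python
import math
from collections import Counter


class TinyKNN:
    """Hypothetical minimal kNN classifier, for illustration only."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, points, labels):
        # "Training" only stores the labeled data; nothing is estimated.
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict(self, query):
        # Rank every stored point by Euclidean distance to the query.
        ranked = sorted(
            zip(self.points, self.labels),
            key=lambda pair: math.dist(pair[0], query),
        )
        # Majority vote among the labels of the k nearest points.
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


# Toy data (assumed): two small clusters in a 2-D feature space.
model = TinyKNN(k=3).fit(
    [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)],
    ["a", "a", "a", "b", "b", "b"],
)
print(model.predict((0.15, 0.15)))
print(model.predict((0.95, 1.0)))
```

All of the work happens in `predict`, which scans the full stored data set on every call. That is where the run-time and memory cost of kNN comes from.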

Neither is there a validation step. Validation measures model error on held-out data as a function of iteration count (training progress). Overfitting shows up as an upward turn in this empirical curve, which marks the point at which training should cease. Because kNN builds no model iteratively, there is nothing to validate in this sense.

But you can still test--i.e., assess the quality of the predictions using data in which the targets (labels or scores) are concealed from the model.

But even testing is a little different for kNN than for other supervised machine learning techniques. In particular, the quality of kNN predictions depends on the amount of data, or more precisely on its density (number of points per unit volume)--i.e., if you are going to predict unknown values by averaging the 2-3 points closest to them, then it helps to have points close to the one you wish to predict. Therefore, keep the test set small, or better yet use k-fold cross-validation or leave-one-out cross-validation, both of which give you more thorough testing without shrinking your kNN neighbor population.
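As a rough sketch of the cross-validation suggestion, the plain-Python code below holds out one fold at a time and predicts it from all the remaining points. The helper names `knn_predict` and `cross_val_accuracy`, the toy data, and the strided fold assignment are assumptions for illustration. With real data you would normally shuffle first. Setting `n_folds` equal to the number of points gives leave-one-out.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    # Majority vote among the k stored points nearest to the query.
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(points, labels, n_folds, k=3):
    # Each point is held out exactly once; the rest serve as neighbors.
    n = len(points)
    correct = 0
    for fold in range(n_folds):
        # Strided fold assignment: indices fold, fold + n_folds, ...
        test_idx = set(range(fold, n, n_folds))
        train_pts = [p for i, p in enumerate(points) if i not in test_idx]
        train_lbl = [y for i, y in enumerate(labels) if i not in test_idx]
        for i in test_idx:
            if knn_predict(train_pts, train_lbl, points[i], k) == labels[i]:
                correct += 1
    return correct / n


# Toy data (assumed): two well-separated clusters of six points each.
points = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.2, 0.4), (0.4, 0.1),
    (2.0, 2.0), (2.2, 2.1), (2.1, 2.3), (2.3, 2.2), (2.2, 2.4), (2.4, 2.1),
]
labels = ["a"] * 6 + ["b"] * 6

four_fold = cross_val_accuracy(points, labels, n_folds=4)
leave_one_out = cross_val_accuracy(points, labels, n_folds=len(points))
print(four_fold, leave_one_out)
```

With leave-one-out, every prediction is made against all but one of the labeled points, so the neighbor density stays close to what the deployed classifier would see.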
