SciKit One-class SVM classifier training time increases exponentially with size of training data

I am using the Python SciKit OneClass SVM classifier to detect outliers in lines of text. The text is converted to numerical features first using bag of words and TF-IDF.
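For reference, the setup described in the question might look roughly like the sketch below. The toy corpus and the nu and gamma values are assumptions made for illustration, not details taken from the question.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import make_pipeline
from sklearn.svm import OneClassSVM

# Hypothetical toy corpus; the real input would be the lines of text to screen.
lines = [
    "user alice logged in",
    "user bob logged in",
    "user carol logged out",
    "user dave logged in",
    "kernel panic at address 0xdeadbeef",
]

# Bag of words -> TF-IDF weighting -> one-class SVM.
# nu and gamma are assumed values, not settings taken from the question.
model = make_pipeline(
    CountVectorizer(),
    TfidfTransformer(),
    OneClassSVM(kernel="rbf", nu=0.2, gamma="scale"),
)
model.fit(lines)

# predict returns +1 for inliers and -1 for outliers.
labels = model.predict(lines)
print(list(zip(labels, lines)))
```

The vectorizer output is a sparse matrix, which OneClassSVM accepts directly, so there is no need to densify the features before fitting.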

2 Answers

孤城傲影

2021-01-14 13:20

Well, scikit-learn's SVM is a high-level implementation, so there is only so much you can do. On speed, their website notes: "SVMs do not directly provide probability estimates, these are calculated using an expensive five-fold cross-validation."

You can increase the kernel cache size (the cache_size parameter, given in MB) based on your available RAM, but this increase does not help much.
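A minimal sketch of raising the kernel cache, assuming random dense features as stand-in data. The 1000 MB figure is an arbitrary example; scikit-learn's default is 200 MB.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Hypothetical dense features standing in for the real training data.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))

# cache_size is the kernel cache in MB (scikit-learn's default is 200).
# 1000 MB is an assumed figure; pick one that fits the RAM actually available.
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1, cache_size=1000)
clf.fit(X)

print(clf.support_vectors_.shape)
```

A larger cache only avoids recomputing kernel values; it does not change how the training cost grows with the number of samples.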

You can try a different kernel, though the resulting model may no longer fit your data correctly.

Here is some advice from http://scikit-learn.org/stable/modules/svm.html#tips-on-practical-use: Scale your data.
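As one way to follow that tip with sparse text features, the sketch below uses MaxAbsScaler, which scales each column into [-1, 1] without destroying sparsity. The small matrix is made up for illustration. Note that TfidfVectorizer and TfidfTransformer already L2-normalize each row by default, so extra scaling may matter less for pure TF-IDF input.

```python
from scipy import sparse
from sklearn.preprocessing import MaxAbsScaler

# Hypothetical sparse feature matrix (rows = samples, columns = features).
X = sparse.csr_matrix(
    [
        [0.0, 4.0, 0.0],
        [2.0, 0.0, 0.0],
        [0.0, 8.0, 1.0],
    ]
)

# Divide each column by its maximum absolute value.
# Zeros stay zero, so the matrix remains sparse.
scaler = MaxAbsScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled.toarray())
```

StandardScaler with its default mean centering would not work here, because centering turns a sparse matrix dense; that is why a max-abs scaler was chosen for this sketch.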

Otherwise, don't use scikit-learn, and implement it yourself using neural nets.

南笙

2021-01-14 13:37

Hope I'm not too late. OCSVM, like SVM in general, is resource hungry, and training time grows quadratically with the number of samples rather than exponentially (the numbers you show follow this). If you can, check whether Isolation Forest or Local Outlier Factor works for you. If you are considering applying this to a larger dataset, I would suggest building a custom anomaly detection (AD) model that closely resembles what these off-the-shelf solutions do. That way you should be able to run the work either in parallel or with threads.
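A rough sketch of trying the two alternatives named above on TF-IDF features. The corpus and all parameter values are assumptions for illustration; n_jobs=-1 asks scikit-learn to use all available cores.

```python
from sklearn.ensemble import IsolationForest
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import LocalOutlierFactor

# Hypothetical toy corpus; replace with the real lines of text.
lines = [
    "user alice logged in",
    "user bob logged in",
    "user carol logged out",
    "user dave logged in",
    "user erin logged out",
    "kernel panic at address 0xdeadbeef",
]

# Sparse TF-IDF features; both estimators below accept sparse input.
X = TfidfVectorizer().fit_transform(lines)

# Isolation Forest: tree ensemble, built in parallel with n_jobs=-1.
iso = IsolationForest(n_estimators=100, n_jobs=-1, random_state=0)
iso_labels = iso.fit_predict(X)

# Local Outlier Factor: density based on nearest neighbours.
# n_neighbors=3 is an assumed value suited to this tiny corpus.
lof = LocalOutlierFactor(n_neighbors=3, n_jobs=-1)
lof_labels = lof.fit_predict(X)

# Both return +1 for inliers and -1 for outliers.
for line, a, b in zip(lines, iso_labels, lof_labels):
    print(a, b, line)
```

Whether either detector flags the same lines as the one-class SVM depends on the data, so it is worth comparing their output on a labelled sample before switching.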
