SciKit One-class SVM classifier training time increases exponentially with size of training data

野趣味 2021-01-14 12:52

I am using the Python SciKit OneClass SVM classifier to detect outliers in lines of text. The text is converted to numerical features first using bag of words and TF-IDF.
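For reference, here is a minimal sketch of the setup described above. It assumes a scikit-learn pipeline in which `TfidfVectorizer` performs both the bag-of-words counting and the TF-IDF weighting. The corpus and the `nu` value are hypothetical placeholders, not the asker's data or settings.

```python
from itertools import product

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import OneClassSVM

# Hypothetical "normal" lines used for training (80 distinct lines).
users = ["alice", "bob", "carol", "dave", "erin"]
actions = ["logged in", "logged out", "opened report", "saved report"]
places = ["office", "vpn", "home", "branch"]
train_lines = [f"{u} {a} from {p}" for u, a, p in product(users, actions, places)]

test_lines = [
    "alice logged in from office",
    "kernel panic unexpected opcode",
]

# TfidfVectorizer = bag-of-words counts + TF-IDF weighting (rows L2-normalized by default).
# nu=0.05 is an assumed upper bound on the fraction of training outliers.
model = make_pipeline(
    TfidfVectorizer(),
    OneClassSVM(kernel="rbf", gamma="scale", nu=0.05),
)
model.fit(train_lines)

# predict() returns +1 for inliers and -1 for outliers.
labels = model.predict(test_lines)
print(labels)
```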

2 answers
  • 2021-01-14 13:20

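    Well, scikit-learn's SVM is a high-level wrapper around libsvm, so there is only so much you can do. Regarding speed, the scikit-learn documentation says: "SVMs do not directly provide probability estimates, these are calculated using an expensive five-fold cross-validation." Note that this cost only applies to classifiers fitted with `probability=True` (such as `SVC`); `OneClassSVM` does not compute probability estimates at all.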
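    To illustrate the probability point, here is a small sketch on synthetic data (the random features are a stand-in for real TF-IDF vectors). `OneClassSVM` exposes no `predict_proba`, and its continuous outlier score comes from `decision_function`, which involves no extra cross-validation.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))  # synthetic stand-in for real features
# One point at the center of the data and one far away from it.
X_new = np.vstack([np.zeros((1, 5)), np.full((1, 5), 8.0)])

ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)

# No predict_proba, so no internal five-fold cross-validation is ever run.
print(hasattr(ocsvm, "predict_proba"))  # False

# Signed distance to the learned boundary: higher means more "normal".
scores = ocsvm.decision_function(X_new)
print(scores)
```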

    You can increase the kernel cache size (the `cache_size` parameter, in MB) to match your available RAM, but this usually does not help much.
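    A rough way to check both how training time grows and whether `cache_size` matters on your machine is to time fits over a few sample sizes. This sketch uses synthetic Gaussian data as a stand-in for real features (the `fit_seconds` helper is hypothetical); absolute timings depend entirely on the hardware.

```python
import time

import numpy as np
from sklearn.svm import OneClassSVM


def fit_seconds(n_samples, cache_size_mb, n_features=20, seed=0):
    """Time a single OneClassSVM fit on synthetic Gaussian data."""
    X = np.random.default_rng(seed).normal(size=(n_samples, n_features))
    model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1, cache_size=cache_size_mb)
    start = time.perf_counter()
    model.fit(X)
    return time.perf_counter() - start


# cache_size is the kernel cache in MB (scikit-learn's default is 200).
timings = {}
for n in (250, 500, 1000):
    for cache in (200, 1000):
        timings[(n, cache)] = fit_seconds(n, cache)
        print(f"n={n:5d} cache_size={cache:5d} MB -> {timings[(n, cache)]:.3f} s")
```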

    You can try a different kernel, although a simpler kernel may model your data less accurately.

    The scikit-learn "Tips on Practical Use" guide (http://scikit-learn.org/stable/modules/svm.html#tips-on-practical-use) also advises you to scale your data.
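    With TF-IDF features part of this is already done: `TfidfVectorizer` L2-normalizes every row by default (`norm="l2"`). The sketch below checks that on a hypothetical corpus. It also shows where a sparse-friendly scaler such as `MaxAbsScaler` would go in the pipeline; that scaler does not center the data, so the matrix stays sparse.

```python
from itertools import product

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MaxAbsScaler
from sklearn.svm import OneClassSVM

# Hypothetical training corpus of "normal" lines.
users = ["alice", "bob", "carol", "dave", "erin"]
actions = ["logged in", "logged out", "opened report", "saved report"]
places = ["office", "vpn", "home", "branch"]
train_lines = [f"{u} {a} from {p}" for u, a, p in product(users, actions, places)]

# Rows from TfidfVectorizer already have unit L2 norm (norm="l2" is the default).
X = TfidfVectorizer().fit_transform(train_lines)
row_norms = np.sqrt(np.asarray(X.multiply(X).sum(axis=1)).ravel())
print(np.allclose(row_norms, 1.0))  # True

# MaxAbsScaler scales each column into [-1, 1] without centering, so sparse input stays sparse.
model = make_pipeline(TfidfVectorizer(), MaxAbsScaler(), OneClassSVM(nu=0.05))
model.fit(train_lines)
```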

    Otherwise, drop scikit-learn and implement the detector yourself with neural networks.

  • 2021-01-14 13:37

    Hope I'm not too late. OCSVM, like SVM in general, is resource hungry, and training time grows roughly quadratically with the number of samples (the numbers you show follow this). If you can, check whether Isolation Forest or Local Outlier Factor works for you. If you plan to apply this to a larger dataset, I would suggest building your own anomaly detection (AD) model that closely follows how these off-the-shelf solutions work; that way you can split the work across parallel processes or threads.
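    Here is a minimal sketch of the two suggested alternatives on TF-IDF features, using a hypothetical corpus. `n_jobs=-1` asks scikit-learn to use all CPU cores, which covers the parallelism point for these two estimators. `novelty=True` is needed for `LocalOutlierFactor` to score lines it did not see during training.

```python
from itertools import product

from sklearn.ensemble import IsolationForest
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import LocalOutlierFactor

# Hypothetical training corpus of "normal" lines.
users = ["alice", "bob", "carol", "dave", "erin"]
actions = ["logged in", "logged out", "opened report", "saved report"]
places = ["office", "vpn", "home", "branch"]
train_lines = [f"{u} {a} from {p}" for u, a, p in product(users, actions, places)]
test_lines = ["carol opened report from vpn", "kernel panic unexpected opcode"]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_lines)  # sparse CSR matrix
X_test = vectorizer.transform(test_lines)

# Isolation Forest: a tree ensemble; n_jobs=-1 builds the trees in parallel.
iso = IsolationForest(n_estimators=100, random_state=0, n_jobs=-1).fit(X_train)
iso_labels = iso.predict(X_test)  # +1 inlier, -1 outlier

# LOF with novelty=True so predict() can be used on unseen lines.
# Sparse input is handled with brute-force neighbor search.
lof = LocalOutlierFactor(n_neighbors=20, metric="euclidean", novelty=True, n_jobs=-1)
lof.fit(X_train)
lof_labels = lof.predict(X_test)

print(iso_labels, lof_labels)
```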
