SciKit One-class SVM classifier training time increases exponentially with size of training data

后端 未结 2 1739
野趣味
野趣味 2021-01-14 12:52

I am using the Python SciKit OneClass SVM classifier to detect outliers in lines of text. The text is converted to numerical features first using bag of words and TF-IDF.

2条回答
  •  南笙
    南笙 (楼主)
    2021-01-14 13:37

    Hope I'm not too late. OCSVM, and SVM, is resource hungry, and the length/time relationship is quadratic (the numbers you show follow this). If you can, see if Isolation Forest or Local Outlier Factor work for you, but if you're considering applying on a lengthier dataset I would suggest creating a manual AD model that closely resembles the context of these off-the-shelf solutions. By doing this then you should be able to work either in parallel or with threads.

提交回复
热议问题