I am using the scikit-learn library for SVM. I have a huge amount of data that I can't read in all at once to pass to the fit() function.
I want to iterate over my data in chunks and fit the model incrementally. Is there a way to do this?
Support Vector Machines (at least as implemented in libsvm, which scikit-learn wraps) are fundamentally a batch algorithm: they need access to all the data in memory at once. Hence they are not scalable to data that does not fit in memory.
Instead you should use a model that supports incremental learning via the `partial_fit` method. For instance, some linear models such as `sklearn.linear_model.SGDClassifier` support `partial_fit`. You can slice your dataset and load it as a sequence of minibatches with shape `(batch_size, n_features)`. `batch_size` can be 1, but that is inefficient because of the Python interpreter overhead (plus the data-loading overhead), so it is recommended to load samples in minibatches of at least 100.
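For illustration, here is a minimal sketch of such a `partial_fit` loop. The `iter_minibatches` generator and the random data it yields are placeholders; substitute your own code that reads chunks of your dataset from disk:

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    # Placeholder loader: replace with logic that reads your data in chunks
    # of shape (batch_size, n_features) together with the matching labels.
    def iter_minibatches(n_batches=10, batch_size=100, n_features=20):
        rng = np.random.RandomState(0)
        for _ in range(n_batches):
            X = rng.randn(batch_size, n_features)
            y = rng.randint(0, 2, size=batch_size)
            yield X, y

    clf = SGDClassifier(loss="hinge")   # hinge loss gives a linear SVM-like model
    all_classes = np.array([0, 1])      # partial_fit needs the full set of classes up front

    for X_batch, y_batch in iter_minibatches():
        clf.partial_fit(X_batch, y_batch, classes=all_classes)

Note that the `classes` argument must list every class that can appear in the data, because any single minibatch may not contain all of them.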