Train scikit svm one by one (online or stochastic training)

盖世英雄少女心 2021-01-02 20:10

I am using the scikit-learn library's SVM. I have a huge amount of data which I can't read in all at once to pass to the fit() function.
I want to iterate over all my data, which is

1 Answer
  • 2021-01-02 21:07

    A Support Vector Machine (at least as implemented in libsvm, which scikit-learn wraps) is fundamentally a batch algorithm: it needs access to all the data in memory at once. Hence it does not scale to datasets that cannot fit in memory.

    Instead you should use models that support incremental learning via the partial_fit method. For instance, some linear models such as sklearn.linear_model.SGDClassifier support partial_fit. You can slice your dataset and load it as a sequence of minibatches of shape (batch_size, n_features). batch_size can be 1, but that is inefficient because of the Python interpreter overhead (plus the data loading overhead), so it is recommended to load samples in minibatches of at least 100.
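    As a minimal sketch of this approach: the loop below feeds an SGDClassifier one minibatch at a time with partial_fit. The synthetic in-memory array stands in for data that would, in practice, be streamed from disk; all names here are illustrative. Note that partial_fit requires the full set of class labels on the first call, since later batches may not contain every class.

    ```python
    import numpy as np
    from sklearn.linear_model import SGDClassifier

    # Synthetic stand-in for a dataset too large for memory;
    # in a real setting each minibatch would be read from disk.
    rng = np.random.RandomState(0)
    X = rng.randn(1000, 20)
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    clf = SGDClassifier(random_state=0)
    classes = np.unique(y)  # all classes must be declared up front

    batch_size = 100  # minibatches of >= 100 amortize the Python overhead
    for start in range(0, len(X), batch_size):
        X_batch = X[start:start + batch_size]
        y_batch = y[start:start + batch_size]
        # Each call updates the model incrementally on one minibatch
        clf.partial_fit(X_batch, y_batch, classes=classes)

    print(clf.score(X, y))
    ```

    When streaming from disk, the same loop works with each (X_batch, y_batch) produced by a generator instead of array slicing.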
