Incremental Learning in scikit-learn with PassiveAggressiveClassifier's partial_fit

长发绾君心 2021-01-23 09:07

I'm trying to train a PassiveAggressiveClassifier on TfidfVectorizer features using partial_fit, with the script below:

4 answers

北荒 (original poster)

2021-01-23 09:17

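Quoting the question: "I'm trying to train PassiveAggressiveClassifier using TfidfVectorizer with partial_fit technique with below script:"

You can't, because TfidfVectorizer does not work for online learning: it has to see the whole corpus to build its vocabulary and IDF weights. You want HashingVectorizer for that, which is stateless and always produces the same number of features.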
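A minimal sketch of the HashingVectorizer route may make this concrete. The batches, labels, and the n_features value are made up for illustration and are not from the question's script; only the vectorizer and classifier calls are scikit-learn's own.

```python
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier

# Hypothetical mini-batches of (texts, labels), standing in for the real data stream.
batches = [
    (["cheap pills buy now", "meeting moved to noon"], [1, 0]),
    (["win a free prize today", "lunch tomorrow with the team"], [1, 0]),
    (["free pills prize", "project meeting notes attached"], [1, 0]),
]
classes = np.array([0, 1])  # partial_fit needs every possible label on the first call

# HashingVectorizer has no fit step: each token is hashed to a column index,
# so every batch is mapped into the same fixed number of columns.
vect = HashingVectorizer(n_features=2**18)
model = PassiveAggressiveClassifier(random_state=0)

for texts, labels in batches:
    X = vect.transform(texts)  # always (len(texts), 2**18)
    model.partial_fit(X, np.array(labels), classes=classes)

# New text goes through the same stateless transform before prediction.
pred = model.predict(vect.transform(["free prize pills"]))
```

The trade-off is that hashing is one-way: there is no vocabulary to inspect afterwards, and there are no IDF weights unless a separate weighting step is added.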

As for what exactly is going on in your code, the problem is here:

    training_set = vect.fit_transform(a)
    print(training_set.shape)
    training_result = np.array(r)
    model = model.partial_fit(training_set, training_result, classes=cls)

You are refitting your TF-IDF object at each step, so each batch gets its own vocabulary. Nothing stops the vocabulary from having one size at one iteration and a different size at the next, while partial_fit expects the same number of features on every call. That mismatch is exactly the error you are getting.
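The effect is easy to reproduce in isolation. The two toy batches below are invented for illustration; refitting the same TfidfVectorizer on each one yields matrices with different column counts.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Two made-up batches with different words in them.
batch_1 = ["the cat sat", "the dog barked"]
batch_2 = ["a completely different set of words appears here"]

vect = TfidfVectorizer()

# Each fit_transform call throws away the old vocabulary and learns a new one.
shape_1 = vect.fit_transform(batch_1).shape  # (2, 5): the, cat, sat, dog, barked
shape_2 = vect.fit_transform(batch_2).shape  # (1, 7): "a" is dropped as a one-character token

print(shape_1, shape_2)
```

A model trained on the 5-column matrix cannot accept the 7-column one, and even when the counts happen to match, column i would refer to a different word in each batch.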

You can try a few things if you insist on using TF-IDF:

Pad with zeroes or trim the matrix returned by fit_transform so that its width matches that of the first batch. This is very unlikely to work well, because the same column would then stand for different words in different batches;

Call fit on the TF-IDF object once with an initial data set (preferably a large one) and then call only transform on the later batches. This might work better, but I still suggest the HashingVectorizer.
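The second option, fitting TF-IDF once and only transforming afterwards, could look roughly like this. The initial corpus and the batches are hypothetical placeholders; in practice the initial set should be large enough to cover most of the vocabulary that will show up later.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier

# Hypothetical initial corpus used only to fix the vocabulary and IDF weights.
initial_texts = [
    "cheap pills buy now",
    "meeting moved to noon",
    "win a free prize today",
    "lunch tomorrow with the team",
]
# Hypothetical later batches of (texts, labels).
batches = [
    (["free pills today", "team meeting at noon"], [1, 0]),
    (["buy a prize now", "lunch with an unseen colleague"], [1, 0]),
]
classes = np.array([0, 1])

vect = TfidfVectorizer()
vect.fit(initial_texts)  # fit exactly once
n_features = len(vect.vocabulary_)

model = PassiveAggressiveClassifier(random_state=0)
for texts, labels in batches:
    X = vect.transform(texts)  # transform only: the column layout never changes
    model.partial_fit(X, np.array(labels), classes=classes)
```

Words that were not in the initial corpus (such as "unseen" here) are silently ignored by transform, and the IDF weights stay frozen at their initial values, which is why this remains a compromise rather than true online TF-IDF.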
