Incremental Learning in Scikit with PassiveAggressiveClassifier's partial_fit


长发绾君心 2021-01-23 09:07

I'm trying to train a PassiveAggressiveClassifier incrementally, using TfidfVectorizer and partial_fit, in the script below:

4 answers

余生分开走 (original poster)

2021-01-23 09:16

This is what I understand of your problem:

1) You need to train the model online, using partial_fit.

2) Your feature space is very large.

If I have understood correctly, I faced the same problem. And if you use HashingVectorizer, there is a risk of key collisions.

From the HashingVectorizer documentation:

There are also a couple of cons (vs using a CountVectorizer with an in-memory vocabulary):

- there is no way to compute the inverse transform (from feature indices to string feature names), which can be a problem when trying to introspect which features are most important to a model.
- there can be collisions: distinct tokens can be mapped to the same feature index. However, in practice this is rarely an issue if n_features is large enough (e.g. 2 ** 18 for text classification problems).
- no IDF weighting, as this would render the transformer stateful.
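Because HashingVectorizer is stateless, each batch can be transformed without any prior fit, which is what makes it convenient for partial_fit. A minimal sketch of that alternative, assuming scikit-learn is installed; the toy texts, labels, and batch split are made up for illustration and are not from the question's script:

```python
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier

# Stateless vectorizer: transform() works without fit(), so every batch
# maps into the same fixed-width feature space.
vectorizer = HashingVectorizer(n_features=2**18)
clf = PassiveAggressiveClassifier(random_state=0)
classes = np.array([0, 1])  # every label must be declared up front

# Hypothetical mini-batches standing in for data that arrives over time.
batches = [
    (["cheap pills buy now", "meeting at noon"], [1, 0]),
    (["buy cheap watches", "lunch meeting tomorrow"], [1, 0]),
]

for texts, labels in batches:
    X = vectorizer.transform(texts)
    clf.partial_fit(X, labels, classes=classes)

pred = clf.predict(vectorizer.transform(["buy cheap pills"]))
print(pred)
```

The trade-offs are the ones listed in the documentation excerpt: no IDF weighting, no way back from feature indices to tokens, and possible collisions.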

If keys collide, accuracy may drop.
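To get a feel for how n_features affects collisions, one can count how many distinct columns a set of distinct tokens ends up in. This is only an illustrative sketch: the 100 synthetic tokens and the helper name used_columns are assumptions, and alternate_sign=False is set so that colliding tokens add up instead of cancelling.

```python
from sklearn.feature_extraction.text import HashingVectorizer

# 100 distinct synthetic tokens packed into a single document.
tokens = [f"token{i}" for i in range(100)]
doc = [" ".join(tokens)]

def used_columns(n_features):
    """Number of feature columns actually occupied by the 100 tokens."""
    hv = HashingVectorizer(n_features=n_features, alternate_sign=False)
    return hv.transform(doc).nnz

small = used_columns(16)     # at most 16 columns exist, so collisions are forced
large = used_columns(2**18)  # far more columns than tokens, collisions are rare

print("columns used with n_features=16:   ", small)
print("columns used with n_features=2**18:", large)
```

With only 16 columns, 100 tokens must share indices; with 2 ** 18 columns, nearly every token gets its own index.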

In my online training, I first trained the classifier with partial_fit like this:

classifier = MultinomialNB(alpha=alpha_optimized).partial_fit(X_train_tfidf,y_train,classes=np.array([0,1]))

On the second day, I loaded the pickled classifier, count_vect and tfidf from the first day's training. Then I applied only transform (not fit) with count_vect and tfidf, and it worked:

X_train_counts = count_vect.transform(x_train)
X_train_tfidf = tfidf.transform(X_train_counts)
pf_classifier.partial_fit(X_train_tfidf, y_train)
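The two-day workflow above can be sketched end to end as follows. This is a sketch under assumptions: the toy texts and labels are invented, alpha=1.0 stands in for alpha_optimized, and pickle.dumps/loads in memory stands in for saving to and loading from disk.

```python
import pickle
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

# Day 1: fit the vectorizers and start the classifier.
day1_texts = ["cheap pills buy now", "meeting at noon",
              "buy cheap watches", "lunch meeting tomorrow"]
day1_labels = np.array([1, 0, 1, 0])

count_vect = CountVectorizer()
tfidf = TfidfTransformer()
X1 = tfidf.fit_transform(count_vect.fit_transform(day1_texts))
classifier = MultinomialNB(alpha=1.0).partial_fit(
    X1, day1_labels, classes=np.array([0, 1]))

# Persist all three objects together (in memory here, a file in practice).
saved = pickle.dumps((classifier, count_vect, tfidf))

# Day 2: reload, then transform only (no fit), so columns keep their meaning.
pf_classifier, count_vect2, tfidf2 = pickle.loads(saved)
day2_texts = ["buy pills now", "project meeting notes"]
day2_labels = np.array([1, 0])
X2 = tfidf2.transform(count_vect2.transform(day2_texts))
pf_classifier.partial_fit(X2, day2_labels)

print(X1.shape, X2.shape)
```

One consequence of this design: the vocabulary and IDF weights are frozen on day 1, so words first seen on day 2 (here "project" and "notes") are silently ignored by transform.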

If anything is unclear, please reply.
