I'm trying to train a PassiveAggressiveClassifier using TfidfVectorizer with the partial_fit technique in the script below:
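You can't, because TfidfVectorizer does not work for online learning: it has to learn its vocabulary and IDF weights from the data in fit, so it cannot be updated batch by batch. You want HashingVectorizer for that, since it is stateless and always produces the same number of features.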
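Here is a minimal sketch of what that could look like. The batches and labels are made-up toy data, and n_features=2**18 is just an assumed size, so adjust both to your case:

```python
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier

# Hypothetical toy batches of (texts, labels); replace with your own data stream.
batches = [
    (["good movie", "great film"], [1, 1]),
    (["bad movie", "terrible plot and awful acting"], [0, 0]),
    (["great acting", "awful film"], [1, 0]),
]
classes = np.array([0, 1])  # every class must be known up front for partial_fit

# Stateless vectorizer: no fit step, the output width is always n_features.
vect = HashingVectorizer(n_features=2**18, alternate_sign=False)
model = PassiveAggressiveClassifier(random_state=0)

shapes = []
for texts, labels in batches:
    X = vect.transform(texts)  # transform only, nothing is learned here
    shapes.append(X.shape[1])
    model.partial_fit(X, np.array(labels), classes=classes)

print(shapes)  # the feature count is identical for every batch
```

Note that HashingVectorizer gives you raw hashed term counts (normalized per document), not IDF weighting, and you cannot map a feature index back to a word.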
As for what exactly is going on in your code, the problem is here:
training_set = vect.fit_transform(a)
print(training_set.shape)
training_result = np.array(r)
model = model.partial_fit(training_set, training_result, classes=cls)
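You are refitting your TF-IDF object at each step with fit_transform. Each batch therefore gets its own vocabulary, so nothing stops the number of features from being one size at one iteration and a different size at the next, while partial_fit expects the same number of features on every call. That mismatch is exactly the error you are getting.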
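You can see the effect in isolation with a small sketch. The two batches are hypothetical strings chosen so that they contain different numbers of distinct tokens:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical batches with different vocabularies.
batch_1 = ["the cat sat", "the dog barked"]
batch_2 = ["a bird sang loudly at dawn today"]

vect = TfidfVectorizer()

# Calling fit_transform again throws away the old vocabulary and learns a new one.
X1 = vect.fit_transform(batch_1)
X2 = vect.fit_transform(batch_2)

# The second dimension (number of features) differs between the two batches,
# which is what makes the second partial_fit call fail.
print(X1.shape, X2.shape)
```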
You can try a few things if you insist on using TF-IDF:
- Append zeroes to / trim the vector returned by fit_transform to match the length of the first one: very unlikely to work well;
- Call fit on the TF-IDF object with an initial data set (preferably a large one) and then call transform on the others. This might work better, but I still suggest the HashingVectorizer.
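A rough sketch of the second option, assuming you have some initial corpus available before streaming starts. The corpus, batches, and labels here are invented for illustration:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier

# Hypothetical initial corpus, used only to fix the vocabulary and IDF weights.
initial_corpus = [
    "good movie with great acting",
    "bad movie with a terrible plot",
    "awful film",
    "great film",
]
# Hypothetical later batches of (texts, labels).
batches = [
    (["great movie", "terrible film"], [1, 0]),
    (["awful acting", "good plot and superb soundtrack"], [0, 1]),
]
classes = np.array([0, 1])

vect = TfidfVectorizer()
vect.fit(initial_corpus)  # fit exactly once
n_features = len(vect.vocabulary_)

model = PassiveAggressiveClassifier(random_state=0)
widths = []
for texts, labels in batches:
    X = vect.transform(texts)  # transform only, never refit
    widths.append(X.shape[1])
    model.partial_fit(X, np.array(labels), classes=classes)
```

The catch is that any word missing from the initial corpus (such as "superb" above) is silently ignored, and the IDF weights never change after the first fit, which is why a large initial data set matters.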