Incremental Learning in Scikit with PassiveAggressiveClassifier's partial_fit

长发绾君心 2021-01-23 09:07

I'm trying to train a PassiveAggressiveClassifier using TfidfVectorizer with the partial_fit technique in the script below:

Co

4 answers
  •  北荒 (original poster)
    2021-01-23 09:17

    "I'm trying to train PassiveAggressiveClassifier using TfidfVectorizer with partial_fit technique with below script:"

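    You can't, because TfidfVectorizer does not work for online learning: it has to learn its vocabulary and IDF weights from the data it is fitted on, so its feature space depends on what it has seen. You want HashingVectorizer for that, since it is stateless and always produces the same number of columns.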
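
For illustration, here is a minimal sketch of the HashingVectorizer route. The toy batches, the labels and the n_features value are assumptions made up for this example, not taken from the question's script.

```python
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier

# Hypothetical toy batches of (texts, labels); a real script would stream these.
batches = [
    (["cheap pills buy now", "meeting at noon tomorrow"], [1, 0]),
    (["win money fast today", "lunch with the team", "free prize claim now"], [1, 0, 1]),
]

# Stateless vectorizer: the output width is fixed by n_features, not by the data.
vect = HashingVectorizer(n_features=2**18, alternate_sign=False)
model = PassiveAggressiveClassifier(random_state=0)
classes = np.array([0, 1])  # every possible label, needed on the first partial_fit call

shapes = []
for texts, labels in batches:
    X = vect.transform(texts)  # no fit step, so nothing changes between batches
    shapes.append(X.shape)
    model.partial_fit(X, np.array(labels), classes=classes)

print(shapes)
```

Every batch ends up with the same number of columns, so partial_fit never sees a feature-count mismatch. The trade-off is that hashing gives no IDF weighting and no way to map a column back to a word.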

    As for what exactly is going on in your code, the problem is here:

    training_set = vect.fit_transform(a)
    print(training_set.shape)
    training_result = np.array(r)
    model = model.partial_fit(training_set, training_result, classes=cls)
    

    You are refitting your TF-IDF object at each step, so nothing stops the vocabulary from having one size at one iteration and a different size at the next. The model then receives a different number of features from the one it was first trained on, which is exactly the error you are getting.
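
A small sketch of the effect, using two made-up batches (the sentences are assumptions chosen only to have different numbers of distinct words):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

vect = TfidfVectorizer()

batch_1 = ["red apple", "green apple"]           # 3 distinct words
batch_2 = ["blue sky over the sea", "grey sky"]  # 6 distinct words

# Each fit_transform call throws away the old vocabulary and learns a new one.
X1 = vect.fit_transform(batch_1)
X2 = vect.fit_transform(batch_2)

print(X1.shape, X2.shape)
```

The two matrices have different widths, and even when the widths happen to match, a given column index refers to a different word in each batch.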

    You can try a few things if you insist on using TF-IDF:

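    1. Pad with zeroes or trim the matrix returned by fit_transform so that it has the same number of columns as the first one. This is very unlikely to work well, because a given column would then stand for different words in different batches;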

    2. Call fit on the TF-IDF object with an initial data set (preferably a large one) and then call transform on the others. This might work better, but I still suggest the HashingVectorizer.
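
The second option could be sketched roughly as follows. The initial corpus, the later batches and the labels are hypothetical placeholders; the point is only that fit is called once and transform afterwards.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier

# Hypothetical initial corpus; in practice this should be as large as possible.
initial_texts = [
    "cheap pills buy now",
    "meeting at noon tomorrow",
    "win money fast today",
    "lunch with the team",
]

vect = TfidfVectorizer()
vect.fit(initial_texts)            # vocabulary and IDF weights are frozen here
n_features = len(vect.vocabulary_)

# Hypothetical later batches of (texts, labels).
batches = [
    (["buy cheap pills", "team meeting tomorrow"], [1, 0]),
    (["win money now", "completely unseen words"], [1, 0]),
]

model = PassiveAggressiveClassifier(random_state=0)
classes = np.array([0, 1])

widths = []
for texts, labels in batches:
    X = vect.transform(texts)      # transform only, never fit again
    widths.append(X.shape[1])
    model.partial_fit(X, np.array(labels), classes=classes)

# Words missing from the initial vocabulary are silently dropped.
unseen_row = vect.transform(["completely unseen words"])
print(widths, unseen_row.nnz)
```

The width stays constant across batches, but any word that was not in the initial data set contributes nothing, which is why a large and representative first set matters here.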
