Why is the following TF-IDF vectorization failing?

Question

Hello, I am running the following experiment. First, I created a vectorizer called tfidf_vectorizer:

from sklearn.feature_extraction.text import TfidfVectorizer
tfidf_vectorizer = TfidfVectorizer(min_df=10, ngram_range=(1, 3), analyzer='word', max_features=500)

Then I vectorized the following list:

tfidf = tfidf_vectorizer.fit_transform(listComments)
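To see what this call actually produces, here is a small sketch. It assumes a made-up three-document corpus and uses `min_df=1` instead of `min_df=10`, because a term cannot appear in ten documents of such a tiny list. `fit_transform` returns a SciPy sparse document-term matrix, while the learned vocabulary and IDF weights stay on the vectorizer object:

```python
import scipy.sparse
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for listComments
docs = ["hello this is a test", "the car is red", "this car is fast"]

# min_df=1 (not 10) so the toy corpus keeps some terms
vectorizer = TfidfVectorizer(min_df=1, ngram_range=(1, 3), analyzer="word", max_features=500)
matrix = vectorizer.fit_transform(docs)

print(scipy.sparse.issparse(matrix))     # True: the result is a sparse matrix
print(matrix.shape)                      # (number of documents, number of features)
print(hasattr(matrix, "transform"))      # False: the matrix has no transform method
print(hasattr(vectorizer, "transform"))  # True: the fitted vectorizer does
```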

My list of comments looks as follows:

listComments = ["hello this is a test","the car is red",...]

I tried to save the model as follows:

import pickle

# Saving tfidf
with open('vectorizerTFIDF.pickle', 'wb') as idxf:
    pickle.dump(tfidf, idxf, pickle.HIGHEST_PROTOCOL)

I would like to use my vectorizer to apply the same TF-IDF transformation to the following list:

lastComment = ["this is a car"]
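Applying "the same tfidf" means calling `transform` on the already fitted vectorizer, so the new text is mapped onto the vocabulary and IDF weights learned from the training comments. As a sketch with a made-up corpus (again `min_df=1`), compare that with refitting a fresh vectorizer on the new comment: only `transform` yields a row in the same feature space as the training matrix.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for listComments
docs = ["hello this is a test", "the car is red", "this car is fast"]
lastComment = ["this is a car"]

fitted = TfidfVectorizer(min_df=1, ngram_range=(1, 3))
train_matrix = fitted.fit_transform(docs)

# transform: reuses the vocabulary and IDF weights learned from docs
same_space = fitted.transform(lastComment)

# fit_transform on the new text: learns a new, unrelated vocabulary
refit = TfidfVectorizer(min_df=1, ngram_range=(1, 3)).fit_transform(lastComment)

# Column counts: training matrix, transformed comment, refitted comment
print(train_matrix.shape[1], same_space.shape[1], refit.shape[1])
```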

Loading the model:

with open('vectorizerTFIDF.pickle', 'rb') as infile:
    tdf = pickle.load(infile)

vector = tdf.transform(lastComment)

However, I get the following error:

Traceback (most recent call last):
  File "C:/Users/LDA_test/ldaTest.py", line 141, in <module>
    vector = tdf.transform(lastComment)
  File "C:\Program Files\Anaconda3\lib\site-packages\scipy\sparse\base.py", line 559, in __getattr__
    raise AttributeError(attr + " not found")
AttributeError: transform not found

I hope someone can help me with this issue. Thanks in advance!

Answer 1:

You've pickled the vectorized array (the sparse matrix returned by fit_transform), not the transformer. You need to pickle the vectorizer itself: pickle.dump(tfidf_vectorizer, idxf, pickle.HIGHEST_PROTOCOL)
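A minimal end-to-end sketch of that fix, assuming a small made-up corpus and the file name from the question. It uses `min_df=1` only because `min_df=10` cannot be satisfied by a three-document toy list. The fitted `tfidf_vectorizer` is pickled, loaded back, and then used to `transform` the new comment:

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for the real listComments
listComments = ["hello this is a test", "the car is red", "this car is fast"]
lastComment = ["this is a car"]

# min_df=1 so this toy corpus keeps its terms; the question uses min_df=10 on real data
tfidf_vectorizer = TfidfVectorizer(min_df=1, ngram_range=(1, 3), analyzer="word", max_features=500)
tfidf = tfidf_vectorizer.fit_transform(listComments)  # sparse matrix, not a model

# Save the fitted vectorizer (the transformer), not the tfidf matrix
with open("vectorizerTFIDF.pickle", "wb") as idxf:
    pickle.dump(tfidf_vectorizer, idxf, pickle.HIGHEST_PROTOCOL)

# Load it back; it keeps the learned vocabulary and IDF weights
with open("vectorizerTFIDF.pickle", "rb") as infile:
    tdf = pickle.load(infile)

vector = tdf.transform(lastComment)
print(vector.shape)  # one row, same number of columns as tfidf
```

`transform` (not `fit_transform`) is used on the new comment so that it lands in the feature space learned from `listComments`.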

Source: https://stackoverflow.com/questions/41213978/why-the-following-tfidf-vectorization-is-failing

Tags: scikit-learn, tf-idf
