How do I classify documents with scikit-learn using TfidfVectorizer?

你的背包 2021-02-11 02:44

The following example shows how to train a classifier on the scikit-learn 20 newsgroups data.

>>> from sklearn.feature_extraction.text import TfidfVec         


        
2 answers
  •  暗喜 (original poster)
    2021-02-11 03:10

    To address the questions from the comments, the basic process for using a tf-idf representation in a classification task is:

    1. Fit the vectorizer to your training data and keep it in a variable; let's call it tfidf.
    2. Transform the training data (text only, without labels): data = tfidf.transform(...)
    3. Fit the model (classifier) with some_classifier.fit(data, labels), where labels are in the same order as the documents in data.
    4. At test time, call tfidf.transform(...) on the new data (do not refit the vectorizer) and check the predictions of your model.
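
The four steps above can be sketched roughly as follows. This is only a minimal illustration: the toy corpus, the label names, and the choice of MultinomialNB as the classifier are assumptions made for the example, not something taken from the question. Any scikit-learn classifier that accepts sparse input should slot into step 3 in the same way.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy corpus, made up for illustration; labels are in the same order as the texts.
train_texts = [
    "the team won the hockey game last night",
    "the goalie made a great save in the hockey match",
    "nasa launched a new rocket into orbit",
    "the shuttle reached orbit around the moon",
]
train_labels = ["sport", "sport", "space", "space"]

# Step 1: fit the vectorizer on the training text only and keep it around.
tfidf = TfidfVectorizer()
tfidf.fit(train_texts)

# Step 2: transform the training text (no labels) into a tf-idf matrix.
data = tfidf.transform(train_texts)

# Step 3: fit the classifier (an assumed choice here) on the matrix and labels.
some_classifier = MultinomialNB()
some_classifier.fit(data, train_labels)

# Step 4: reuse the already fitted vectorizer on new text, then predict.
new_texts = [
    "a hockey game with a late save",
    "a rocket on its way to orbit",
]
new_data = tfidf.transform(new_texts)
predictions = some_classifier.predict(new_data)
print(list(predictions))
```

Steps 1 and 2 can also be combined into a single call, data = tfidf.fit_transform(train_texts). The important part is that new data only ever goes through transform, so it is mapped onto the vocabulary learned from the training set.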
