问题
I know that the formula for tfidf vectorizer
is
Count of word/Total count * log(Number of documents / no.of documents where word is present)
I saw there's tfidf transformer in the scikit learn and I just wanted to difference between them. I could't find anything that's helpful.
回答1:
TfidfVectorizer is used on sentences, while TfidfTransformer is used on an existing count matrix, such as one returned by CountVectorizer
回答2:
Artem's answer pretty much sums up the difference. To make things clearer here is an example as referenced from here.
TfidfTransformer can be used as follows:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
train_set = ["The sky is blue.", "The sun is bright."]
vectorizer = CountVectorizer(stop_words='english')
trainVectorizerArray = vectorizer.fit_transform(article_master['stemmed_content'])
transformer = TfidfTransformer()
res = transformer.fit_transform(trainVectorizerArray)
print ((res.todense()))
## RESULT:
Fit Vectorizer to train set
[[1 0 1 0]
[0 1 0 1]]
[[0.70710678 0. 0.70710678 0. ]
[0. 0.70710678 0. 0.70710678]]
Extraction of count features, TF-IDF normalization and row-wise euclidean normalization can be done in one operation with TfidfVectorizer:
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf = TfidfVectorizer(stop_words='english')
res1 = tfidf.fit_transform(train_set)
print ((res1.todense()))
## RESULT:
[[0.70710678 0. 0.70710678 0. ]
[0. 0.70710678 0. 0.70710678]]
Both processes produce a sparse matrix comprising of the same values.
Other useful references would be tfidfTransformer.fit_transform, countVectoriser_fit_transform and tfidfVectoriser .
回答3:
With Tfidftransformer you will compute word counts using CountVectorizer and then compute the IDF values and only then compute the Tf-idf scores. With Tfidfvectorizer you will do all three steps at once.
I think you should read this article which sums it up with an example.
来源:https://stackoverflow.com/questions/54745482/what-is-the-difference-between-tfidf-vectorizer-and-tfidf-transformer