As the title states: Is a CountVectorizer the same as a TfidfVectorizer with use_idf=False? If not, why not?
No, they're not the same. TfidfVectorizer
normalizes its results, i.e. each vector in its output has norm 1:
>>> CountVectorizer().fit_transform(["foo bar baz", "foo bar quux"]).A
array([[1, 1, 1, 0],
       [1, 0, 1, 1]])
>>> TfidfVectorizer(use_idf=False).fit_transform(["foo bar baz", "foo bar quux"]).A
array([[0.57735027, 0.57735027, 0.57735027, 0.        ],
       [0.57735027, 0.        , 0.57735027, 0.57735027]])
This normalization is done so that the dot product of two rows is their cosine similarity. TfidfVectorizer can also apply logarithmically discounted term frequencies, replacing each tf with 1 + log(tf), when given the option sublinear_tf=True.
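A quick sketch of why the normalization matters: on unit-norm rows, a plain dot product already equals the cosine similarity (the document names, not the example corpus, come from the question above):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["foo bar baz", "foo bar quux"]
X = TfidfVectorizer(use_idf=False).fit_transform(docs).toarray()

# Each row has L2 norm 1, so the dot product is the cosine similarity.
dot = X[0] @ X[1]
cos = dot / (np.linalg.norm(X[0]) * np.linalg.norm(X[1]))
assert np.isclose(dot, cos)
```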
To make TfidfVectorizer
behave like CountVectorizer
, give it the constructor options use_idf=False, norm=None
(note the parameter is norm, not normalize).