tfidf.transform() function not returning correct values

Submitted by 心不动则不痛 on 2019-12-02 00:58:45

This happens because of L2 normalization, which TfidfVectorizer applies by default (norm='l2'). Before normalization, the first row of the transform() output is what you expect:

array([[ 1.40546511,  1.40546511,  0.        ,  0.        ,  0.        ,
     0.        ]])

The normalization step then divides this vector by its L2 (Euclidean) norm, i.e. the square root of the sum of its squared entries:

divisor = sqrt(1.40546511^2 + 1.40546511^2 + 0^2 + 0^2 + 0^2 + 0^2)
        = sqrt(1.975332175 + 1.975332175 + 0 + 0 + 0 + 0)
        = 1.98762782
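The same divisor can be checked with a few lines of plain Python. This is a sketch of the arithmetic only, assuming the six-value row shown above; it is not scikit-learn's internal code.

```python
import math

# Unnormalized tf-idf row for the first document, copied from the output above.
raw = [1.40546511, 1.40546511, 0.0, 0.0, 0.0, 0.0]

# L2 norm: square root of the sum of the squared entries.
divisor = math.sqrt(sum(x * x for x in raw))
print(round(divisor, 8))
```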

So the final, normalized array is:

array([[ 0.70710678,  0.70710678,  0.        ,  0.        ,  0.        ,
     0.        ]])

When you then call sum() on this row, the result is 1.4142135623730951, which is sqrt(2): each of the two nonzero entries equals 1/sqrt(2) ≈ 0.70710678.
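A minimal pure-Python sketch of the whole calculation, assuming the six-value row shown above. The helper name `l2_normalize` is hypothetical and not part of scikit-learn. It also shows why the sum is sqrt(2) rather than 1: L2 normalization makes the *squares* of the entries sum to 1, not the entries themselves.

```python
import math


def l2_normalize(row):
    """Divide each entry by the row's Euclidean (L2) norm.

    Hypothetical helper for illustration; this sketch leaves an
    all-zero row unchanged instead of dividing by zero.
    """
    norm = math.sqrt(sum(x * x for x in row))
    if norm == 0:
        return list(row)
    return [x / norm for x in row]


# Unnormalized tf-idf row for the first document, copied from the output above.
raw = [1.40546511, 1.40546511, 0.0, 0.0, 0.0, 0.0]

normalized = l2_normalize(raw)
print(normalized[:2])  # both nonzero entries are approximately 1/sqrt(2)

total = sum(normalized)                          # approximately sqrt(2)
sum_of_squares = sum(x * x for x in normalized)  # approximately 1
print(total, sum_of_squares)
```

If you need rows whose entries sum to 1 instead, TfidfVectorizer also accepts norm='l1', and norm=None disables normalization entirely; check the documentation for your scikit-learn version.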

I hope this makes it clear. You can refer to my answer here for a complete walkthrough of how TfidfVectorizer works.
