Why does word2Vec use cosine similarity?

I have been reading the papers on Word2Vec (e.g. this one), and I think I understand training the vectors to maximize the probability of other words found in the same contexts.

2 Answers

粉色の甜心

2021-02-01 21:41

Those two distance metrics are probably strongly correlated, so it might not matter all that much which one you use. As you point out, cosine distance means we don't have to worry about the length of the vectors at all.
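One way to see how closely the two can be related, taking the two metrics here to be cosine and Euclidean distance and assuming the vectors are first scaled to unit length (that normalization is an assumption added for this sketch; raw word2vec vectors generally are not unit length). Write \hat{A} = A / \|A\| and \hat{B} = B / \|B\|, with \theta the angle between A and B:

$$
\|\hat{A} - \hat{B}\|^2 = \|\hat{A}\|^2 + \|\hat{B}\|^2 - 2\,\hat{A}\cdot\hat{B} = 2 - 2\cos\theta = 2\,(1 - \cos\theta)
$$

So for normalized vectors the squared Euclidean distance is just twice the cosine distance, and both rank neighbours in the same order. The two only disagree when vector lengths differ.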

This paper indicates that there is a relationship between the frequency of the word and the length of the word2vec vector. http://arxiv.org/pdf/1508.02297v1.pdf

终归单人心

2021-02-01 21:54

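The cosine similarity of two n-dimensional vectors A and B is defined as

$$
\cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}
$$

which is simply the cosine of the angle between A and B,

while the Euclidean distance is defined as

$$
d(A, B) = \|A - B\| = \sqrt{\sum_{i=1}^{n} (A_i - B_i)^2}
$$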

Now think about the distance between two random elements of the vector space. The cosine similarity is bounded: since the range of cos is [-1, 1], it can never exceed 1 (and the corresponding cosine distance, 1 - cos, stays within [0, 2]).

The Euclidean distance, however, can be any non-negative value.
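A minimal sketch of that difference, using plain Python and two made-up 3-dimensional vectors (the values of `a` and `b` and the helper names are illustrative, not taken from any word2vec model). Stretching one vector leaves the cosine similarity unchanged while the Euclidean distance grows without bound:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (assumes neither is the zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical example vectors, chosen only for illustration.
a = [1.0, 2.0, 3.0]
b = [2.0, 1.0, 0.5]

for scale in (1.0, 10.0, 100.0):
    scaled_b = [scale * y for y in b]  # same direction as b, different length
    print(f"scale={scale:>6}: cos={cosine_similarity(a, scaled_b):.4f}, "
          f"euclid={euclidean_distance(a, scaled_b):.2f}")
```

The cosine column should stay constant across the three rows, while the Euclidean column keeps increasing with the scale factor.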

As the dimension n grows, the angle between two randomly chosen vectors gets closer and closer to 90°, whereas two random points in the unit cube of R^n have a Euclidean distance of roughly 0.41 (n)^0.5 (source)
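Both effects can be checked with a small simulation. This is only a sketch under stated assumptions: "random vectors" for the angle are drawn with independent standard-normal components (so directions are uniformly distributed), while the distance uses uniform points in the unit cube [0, 1]^n. The function name `simulate`, the trial count, and the seed are arbitrary choices.

```python
import math
import random
import statistics

def angle_degrees(a, b):
    """Angle between vectors a and b, in degrees."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos = max(-1.0, min(1.0, dot / (norm_a * norm_b)))  # clamp rounding error
    return math.degrees(math.acos(cos))

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def simulate(n, trials=200, seed=0):
    """Return (mean angle, std of angle, mean cube distance / sqrt(n))."""
    rng = random.Random(seed)
    angles, distances = [], []
    for _ in range(trials):
        # Random directions: independent standard-normal components (assumption).
        u = [rng.gauss(0.0, 1.0) for _ in range(n)]
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        angles.append(angle_degrees(u, v))
        # Random points in the unit cube [0, 1]^n.
        p = [rng.random() for _ in range(n)]
        q = [rng.random() for _ in range(n)]
        distances.append(euclidean_distance(p, q))
    return (statistics.mean(angles),
            statistics.pstdev(angles),
            statistics.mean(distances) / math.sqrt(n))

for n in (2, 10, 100, 1000):
    mean_angle, std_angle, ratio = simulate(n)
    print(f"n={n:>4}: angle = {mean_angle:6.2f} +/- {std_angle:5.2f} deg, "
          f"distance / sqrt(n) = {ratio:.3f}")
```

The spread of the angle around 90° should shrink as n increases, and the last column should settle near 0.41. That constant is 1/sqrt(6): each coordinate difference of two uniform points has mean square 1/6, so the squared distance averages n/6.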

TL;DR

Cosine distance is better for vectors in a high-dimensional space because of the curse of dimensionality. (I'm not absolutely sure about it, though.)
