What is word vector dimension?

时光说笑 2021-01-01 04:59

I am currently an amateur in deep learning and was reading about word2vec on this site: https://www.kaggle.com/c/word2vec-nlp-tutorial/details/part-3-more-fun-with-word-ve

2 answers
  • 2021-01-01 05:38

    "Word vector dimension" is the length of each vector you train on your training corpus. Technically you can choose any dimension: 10, 100, 300, even 1000. The industry norm is 300-500, because we have experimented with different dimensions (300, 400, 500, ... 1000) and haven't noticed a significant performance improvement beyond 300-400, though this also depends on your training data. More dimensions mean heavier computation. However, if the dimension is set too low, the vector space does not have enough room to capture the information that the training corpus contains.

    How to visualize it?

    You can't easily visualize a 300-dimensional vector, and doing so directly probably wouldn't be very useful to you anyway. What we can do is project those vectors down to 2-d space, the space we are most familiar with and can understand easily.
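    One common way to do that projection is PCA. The snippet below is only a rough sketch: it assumes NumPy is available, uses random numbers as a stand-in for trained word vectors, and `project_to_2d` is a hypothetical helper name, not part of any word2vec library.

```python
import numpy as np

def project_to_2d(vectors):
    """Project row vectors onto their top two principal components (PCA via SVD)."""
    X = np.asarray(vectors, dtype=float)
    X_centered = X - X.mean(axis=0)  # PCA works on mean-centered data
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:2].T

# Stand-in data (assumption): 50 "words", each a 300-dimensional vector.
rng = np.random.default_rng(0)
word_vectors = rng.normal(size=(50, 300))

points = project_to_2d(word_vectors)
print(points.shape)  # one (x, y) pair per word
```

    Each row of `points` can then be drawn on an ordinary scatter plot and labeled with its word.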

    Your last statement, "So I guess the word vector dimension should be equal to the vocabulary size", is WRONG! The vocab size is around 171,476 words (a commonly cited count of the words in English)! The word vector dimension (mostly 300-500; you don't want to train 1-billion-dimensional vectors, do you?) is the size of the vector that you decide on in advance, before training on the data. My video (shameless plug) will help you understand the important word vector concepts: AI with the Best
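    To make the difference concrete, here is a minimal sketch of an embedding table with a toy vocabulary and randomly initialized (untrained) vectors. The words, the helper `build_embedding_table`, and the dimension of 300 are all illustrative assumptions.

```python
import random

def build_embedding_table(vocab, dim, seed=0):
    """Map each word to a randomly initialized vector of length `dim`."""
    rng = random.Random(seed)
    return {word: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for word in vocab}

vocab = ["king", "queen", "apple", "banana", "car"]  # toy vocabulary (assumption)
table = build_embedding_table(vocab, dim=300)

print(len(table))          # number of entries = vocabulary size
print(len(table["king"]))  # length of each entry = word vector dimension
```

    Adding more words grows the number of entries, but each vector keeps the length chosen up front.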

  • 2021-01-01 05:40

    Actually, the word vector dimension does not reflect the vocabulary size. What Word2Vec does is map words to their representations in a vector space, and you can give this space any dimension you want. Each word is represented by a point in this space, and the word vector dimension is the number of coordinates that point has. Also, words that tend to appear in the same contexts end up close to each other in this space.
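    "Close to each other" is often measured with cosine similarity between the coordinate vectors. The sketch below uses hand-made 3-dimensional vectors purely for illustration; they are not output from a trained Word2Vec model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hand-made toy vectors (assumption): "cat" and "dog" point in similar directions.
toy_vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

sim_cat_dog = cosine_similarity(toy_vectors["cat"], toy_vectors["dog"])
sim_cat_car = cosine_similarity(toy_vectors["cat"], toy_vectors["car"])
print(sim_cat_dog > sim_cat_car)
```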

    Hope this helps
