What does the vector of a word in word2vec represent?

粉色の甜心 2021-01-30 07:49

word2vec is an open-source tool by Google:

  • For each word it provides a vector of float values. What exactly do they represent?

  • There is also a p

2 answers
  • 2021-01-30 07:54

    TL;DR: Word2Vec builds word projections (embeddings) in a latent space of N dimensions, N being the size of the word vectors obtained. The float values represent the coordinates of the words in this N-dimensional space.

    The main idea behind latent space projections, that is, placing objects in a different, continuous vector space, is that each object gets a representation (a vector) with more useful mathematical properties than the raw object: you can measure distances, compute similarities, and add or subtract vectors.

    For words, the benefit is a dense vector space that encodes similarity (e.g. the vector for tree is more similar to the vector for wood than to the vector for dancing). This contrasts with classical sparse one-hot or "bag-of-words" encodings, which treat each word as its own dimension and therefore make all words orthogonal by design (e.g. tree, wood and dancing are all the same distance from each other).
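
    To make the contrast concrete, here is a minimal sketch comparing cosine similarity under one-hot encoding and under dense vectors. The dense values are hand-picked toy numbers chosen for illustration, not real word2vec output, and the `cosine` helper is a hypothetical name.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# One-hot encoding: one dimension per word, so every pair is orthogonal.
one_hot = {
    "tree":    [1.0, 0.0, 0.0],
    "wood":    [0.0, 1.0, 0.0],
    "dancing": [0.0, 0.0, 1.0],
}

# Hand-picked toy dense vectors (assumption: NOT real word2vec output),
# chosen so that "tree" and "wood" point in similar directions.
dense = {
    "tree":    [0.9, 0.8, 0.1],
    "wood":    [0.8, 0.9, 0.2],
    "dancing": [0.1, 0.2, 0.9],
}

one_hot_tree_wood = cosine(one_hot["tree"], one_hot["wood"])
one_hot_tree_dancing = cosine(one_hot["tree"], one_hot["dancing"])
dense_tree_wood = cosine(dense["tree"], dense["wood"])
dense_tree_dancing = cosine(dense["tree"], dense["dancing"])

print("one-hot:", one_hot_tree_wood, one_hot_tree_dancing)
print("dense:  ", round(dense_tree_wood, 3), round(dense_tree_dancing, 3))
```

    With one-hot vectors both similarities are zero, so the encoding cannot say that tree is closer to wood than to dancing. With dense vectors the two similarities differ, which is the property a trained embedding is meant to provide.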

    Word2Vec algorithms do this:

    Imagine that you have a sentence:

    The dog has to go ___ for a walk in the park.

    You obviously want to fill the blank with the word "outside", but "out" would also work. The w2v algorithms are inspired by this idea: you'd like all words that can fill the same blank to end up near each other, because they belong together. This follows from the Distributional Hypothesis (words that occur in similar contexts tend to have similar meanings). Therefore the words "out" and "outside" will be closer together, whereas a word like "carrot" will be farther away.
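
    The fill-in-the-blank idea can be sketched as the extraction of (context, target) training pairs from a sentence. This is only an illustration of the windowing step, assuming a symmetric window of 2 words; `context_target_pairs` is a hypothetical helper, not part of the word2vec tool.

```python
def context_target_pairs(tokens, window=2):
    """Return a list of (context_words, target_word), one per position."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]       # up to `window` words before
        right = tokens[i + 1:i + 1 + window]      # up to `window` words after
        pairs.append((left + right, target))
    return pairs

sentence = "the dog has to go outside for a walk in the park".split()
pairs = context_target_pairs(sentence, window=2)

# The pair built around the blank from the example sentence.
context, target = pairs[5]
print(context, "->", target)
```

    Replacing "outside" with "out" in the sentence would produce the same context words for that position, so both words receive a similar training signal and drift toward each other in the vector space.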

    This is the intuition behind word2vec. For a more theoretical explanation of what's going on, I'd suggest reading:

    • GloVe: Global Vectors for Word Representation
    • Linguistic Regularities in Sparse and Explicit Word Representations
    • Neural Word Embedding as Implicit Matrix Factorization

    For paragraph vectors, the idea is the same as in w2v. Each paragraph can be represented by its words. Two models are presented in the paper.

    1. In a "bag-of-words" way (the PV-DBOW model), where one fixed-length paragraph vector is used to predict the words in the paragraph.
    2. By adding a fixed-length paragraph token to word contexts (the PV-DM model). By backpropagating the gradient, the paragraph vectors get "a sense" of what's missing from the context, which brings paragraphs with the same "missing" words/topic close together.

    Bits from the article:

    The paragraph vector and word vectors are averaged or concatenated to predict the next word in a context. [...] The paragraph token can be thought of as another word. It acts as a memory that remembers what is missing from the current context – or the topic of the paragraph
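
    As a rough sketch of the averaging variant described in the quote, the forward pass below averages one paragraph vector with the context word vectors and scores every vocabulary word as the possible next word. All vectors are random and untrained, the vocabulary and the `doc_0` id are made up, and the function names are hypothetical, so the probabilities are meaningless until training adjusts the vectors by backpropagation.

```python
import math
import random

random.seed(0)
DIM = 4
vocab = ["the", "dog", "went", "outside", "carrot"]

def random_vector(dim):
    return [random.uniform(-0.5, 0.5) for _ in range(dim)]

word_vectors = {w: random_vector(DIM) for w in vocab}     # input word embeddings
output_vectors = {w: random_vector(DIM) for w in vocab}   # output-layer weights
paragraph_vectors = {"doc_0": random_vector(DIM)}         # one vector per paragraph

def average(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def softmax(scores):
    m = max(scores)                                # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pv_dm_predict(paragraph_id, context_words):
    """Probability of each vocabulary word being the next word."""
    # The paragraph vector is treated as one more "word" in the context.
    inputs = [paragraph_vectors[paragraph_id]]
    inputs += [word_vectors[w] for w in context_words]
    hidden = average(inputs)
    scores = [sum(h * o for h, o in zip(hidden, output_vectors[w])) for w in vocab]
    return dict(zip(vocab, softmax(scores)))

probs = pv_dm_predict("doc_0", ["the", "dog", "went"])
print(probs)
```

    Concatenating the vectors instead of averaging them would only change how `hidden` is built; the output weights would then need a matching, larger dimension.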

    To fully understand how these vectors are built, you'll need to learn how neural nets are built and how the backpropagation algorithm works (I'd suggest starting with this video and Andrew Ng's Coursera class).

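    NB: Softmax is the function that turns the network's scores into a probability distribution over classes, so here it amounts to classification: each word in the vocabulary is treated as a class in the w2v algorithms. Hierarchical softmax and negative sampling are tricks to speed this up and to handle a very large number of classes.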
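
    The difference between a full softmax and negative sampling can be sketched with plain numbers. The scores below are hypothetical, one per vocabulary word, and the function names are illustrative; the negative-sampling loss follows the usual logistic form (true word scored high, a few sampled wrong words scored low).

```python
import math

def softmax(scores):
    m = max(scores)                                # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def full_softmax_loss(scores, target_index):
    """Cross-entropy over all classes: cost grows with vocabulary size."""
    return -math.log(softmax(scores)[target_index])

def negative_sampling_loss(target_score, negative_scores):
    """Logistic loss on the true word plus a few sampled wrong words."""
    loss = -math.log(sigmoid(target_score))
    for s in negative_scores:
        loss -= math.log(sigmoid(-s))
    return loss

scores = [2.0, 0.5, -1.0, 0.1]    # hypothetical scores, one per vocabulary word
probs = softmax(scores)
loss_full = full_softmax_loss(scores, target_index=0)
loss_neg = negative_sampling_loss(scores[0], [scores[2], scores[3]])

print(probs, loss_full, loss_neg)
```

    The full softmax touches every score in the vocabulary for each training example, while the negative-sampling loss only touches the true word and the handful of sampled negatives, which is why it scales to large vocabularies.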

  • 2021-01-30 08:15

    Fixed-width contexts for each word are used as input to a neural network. What the network learns for each word is a vector of float values, aka the word embedding, of a given dimension (typically 50 or 100). The network is trained so as to provide good word embeddings for the training corpus.

    One can easily come up with a fixed-size input for any word, say M words to the left and N words to the right of it. How to do so for a sentence or paragraph, whose sizes vary, is not as apparent, or at least it wasn't at first. Without having read the paper, my guess is that one can combine the fixed-width embeddings of all the words in the sentence/paragraph to come up with a fixed-length vector embedding for the sentence/paragraph.
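
    One simple way to act on that guess is to average the word embeddings, which yields a vector of the same length no matter how many words the text has. This is a baseline sketch of the guess above, not the method from the paragraph vector paper; the embeddings are toy values and `average_embedding` is a hypothetical helper.

```python
def average_embedding(tokens, embeddings):
    """Mean of the embeddings of the known tokens; length is independent of len(tokens)."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        raise ValueError("no token has a known embedding")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]

# Toy 2-dimensional embeddings (assumption: made-up values for illustration).
toy_embeddings = {
    "dog":  [1.0, 0.0],
    "walk": [0.0, 1.0],
    "park": [1.0, 1.0],
}

short_vec = average_embedding(["dog", "walk"], toy_embeddings)
long_vec = average_embedding(["dog", "walk", "park", "dog"], toy_embeddings)

print(short_vec, long_vec)
```

    Both results have two components even though the inputs have different lengths. Averaging discards word order, which is one reason a dedicated paragraph vector can be preferable.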
