Embedding in pytorch

清歌不尽 2021-01-30 08:38

I have checked the PyTorch tutorial and similar questions on Stack Overflow.

I am confused: does the embedding in PyTorch (nn.Embedding) make similar words closer to each other, or is it just a lookup table whose vectors I have to provide myself?

4 Answers
  •  清歌不尽
    2021-01-30 09:30

    You can treat nn.Embedding as a lookup table where the key is the word index and the value is the corresponding word vector. However, before using it you must specify the size of the lookup table and initialize the word vectors yourself. The following code example demonstrates this.

    import torch
    import torch.nn as nn

    # vocab_size is the number of words in your train, val and test set
    # vector_size is the dimension of the word vectors you are using
    embed = nn.Embedding(vocab_size, vector_size)

    # initialize the word vectors; pretrained_weights is a
    # numpy array of shape (vocab_size, vector_size) where
    # pretrained_weights[i] is the word vector of the
    # i-th word in the vocabulary
    embed.weight.data.copy_(torch.from_numpy(pretrained_weights))

    # then turn word indices into actual word vectors;
    # nn.Embedding expects a LongTensor of indices as input
    vocab = {"some": 0, "words": 1}
    word_indexes = [vocab[w] for w in ["some", "words"]]
    word_vectors = embed(torch.LongTensor(word_indexes))
    

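    On newer PyTorch versions there is also nn.Embedding.from_pretrained, which combines the construction and weight-copy steps above into one call. A minimal sketch, assuming pretrained_weights is the same (vocab_size, vector_size) numpy array as before (here replaced by a random stand-in just so the snippet runs):

    import torch
    import torch.nn as nn
    import numpy as np

    # stand-in for a real pretrained matrix (e.g. GloVe/word2vec),
    # shape (vocab_size, vector_size)
    pretrained_weights = np.random.rand(10, 5).astype("float32")

    # freeze=True keeps the vectors fixed during training;
    # use freeze=False if you want to fine-tune them
    embed = nn.Embedding.from_pretrained(torch.from_numpy(pretrained_weights), freeze=True)

    word_vectors = embed(torch.LongTensor([0, 1]))  # shape: (2, 5)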