I have checked the PyTorch tutorial and similar questions on Stack Overflow.
I am confused: does the embedding in PyTorch (Embedding) make the similar words
I think one part is still missing: showing that when you create the embedding layer its weights are initialized automatically (randomly), and that you can instead supply your own weights by building the layer with
nn.Embedding.from_pretrained(weight)
import torch
import torch.nn as nn

# 10 embeddings of dimension 4; the weights are initialized randomly
embedding = nn.Embedding(10, 4)
print(type(embedding))
print(embedding)

# look up all 10 rows; valid indices are 0..9, so adding index 10 would raise an error
t1 = embedding(torch.LongTensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))
print(t1.shape)
print(t1)

# look up rows 1, 2 and 3 only
t2 = embedding(torch.LongTensor([1, 2, 3]))
print(t2.shape)
print(t2)

# predefined weights: 2 embeddings of dimension 3
weight = torch.FloatTensor([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
print(weight.shape)
embedding = nn.Embedding.from_pretrained(weight)

# get embeddings for indices 0 and 1
print(embedding(torch.LongTensor([0, 1])))
Output:
<class 'torch.nn.modules.sparse.Embedding'>
Embedding(10, 4)
torch.Size([10, 4])
tensor([[-0.7007,  0.0169, -0.9943, -0.6584],
        [-0.7390, -0.6449,  0.1481, -1.4454],
        [-0.1407, -0.1081,  0.6704, -0.9218],
        [-0.2738, -0.2832,  0.7743,  0.5836],
        [ 0.4950, -1.4879,  0.4768,  0.4148],
        [ 0.0826, -0.7024,  1.2711,  0.7964],
        [-2.0595,  2.1670, -0.1599,  2.1746],
        [-2.5193,  0.6946, -0.0624, -0.1500],
        [ 0.5307, -0.7593, -1.7844,  0.1132],
        [-0.0371, -0.5854, -1.0221,  2.3451]], grad_fn=<EmbeddingBackward>)
torch.Size([3, 4])
tensor([[-0.7390, -0.6449,  0.1481, -1.4454],
        [-0.1407, -0.1081,  0.6704, -0.9218],
        [-0.2738, -0.2832,  0.7743,  0.5836]], grad_fn=<EmbeddingBackward>)
torch.Size([2, 3])
tensor([[0.1000, 0.2000, 0.3000],
        [0.4000, 0.5000, 0.6000]])
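The output above suggests that the layer behaves like a plain lookup table: row i of the weight matrix is returned for index i. Here is a minimal sketch that checks this directly. The seed and the variable names (`looked_up`, `rows`) are my own choices for illustration, and the exact exception type for an out-of-range index may depend on the PyTorch version, so both common ones are caught.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # only so the random initial weights are repeatable

embedding = nn.Embedding(10, 4)
indices = torch.tensor([1, 2, 3])

# Calling the layer returns the rows of the weight matrix at those indices.
looked_up = embedding(indices)
rows = embedding.weight[indices]
print(torch.equal(looked_up, rows))  # True

# An index outside 0..9 is rejected (on CPU, recent versions raise IndexError).
out_of_range_failed = False
try:
    embedding(torch.tensor([10]))
except (IndexError, RuntimeError) as err:
    out_of_range_failed = True
    print("lookup failed:", err)
```

So a freshly created layer knows nothing about word similarity; any such structure has to come from training or from pretrained weights.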
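Finally, the Embedding layer's weights are ordinary trainable parameters, so they can be learned with gradient descent like any other layer's weights. Note that a layer built with from_pretrained is frozen by default; pass freeze=False if you want to keep training it.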
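To make the training point concrete, here is a small sketch of a single gradient-descent step on an embedding layer. The all-zero target, the squared-error loss and the learning rate are made up purely for illustration; in a real model the loss would come from whatever task the embeddings feed into.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # only so the random initial weights are repeatable

embedding = nn.Embedding(10, 4)
optimizer = torch.optim.SGD(embedding.parameters(), lr=0.1)

before = embedding.weight.detach().clone()

indices = torch.tensor([1, 2, 3])
target = torch.zeros(3, 4)  # hypothetical target, for illustration only

# One training step: forward pass, loss, backward pass, parameter update.
loss = ((embedding(indices) - target) ** 2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Only the rows that were looked up receive a non-zero gradient.
changed = (embedding.weight.detach() != before).any(dim=1)
print(changed)

# Pretrained weights are frozen unless freeze=False is passed.
pretrained = torch.tensor([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
frozen = nn.Embedding.from_pretrained(pretrained)
trainable = nn.Embedding.from_pretrained(pretrained, freeze=False)
print(frozen.weight.requires_grad, trainable.weight.requires_grad)  # False True
```

With plain SGD (no momentum or weight decay), the rows for indices 1, 2 and 3 move while the other rows stay exactly as they were.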