PyTorch: Embedding

-柚子皮-

torch.nn.Embedding(num_embeddings: int, embedding_dim: int, padding_idx: Optional[int] = None, max_norm: Optional[float] = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False, _weight: Optional[torch.Tensor] = None)

Parameters

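num_embeddings (int) – size of the dictionary of embeddings. This is the vocabulary size: if 5000 distinct words occur, pass 5000, and the valid indices are then 0-4999. Note that num_embeddings must be greater than the largest index assigned to any word; being at least the number of words is not enough.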

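embedding_dim (int) – the size of each embedding vector, i.e. how many dimensions are used to represent one symbol. Choose embedding_dim with the number of symbols in mind. For example, with a dictionary of 1024 entries, even the tightest possible encoding (binary) needs 10 dimensions ($\log_2 1024 = 10$); allowing for the relationships between words, roughly 15-20 dimensions is a sensible floor as a rule of thumb. An embedding is meant to reduce dimensionality, but keep this lower limit in mind and pick a value that fits the actual task.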

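padding_idx (int, optional) – If given, pads the output with the embedding vector at padding_idx (initialized to zeros) whenever it encounters the index. This is the padding id: its vector is initialized to zeros and does not contribute to the gradient, so the network does not learn any relationship between the padding symbol and the other symbols.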

max_norm (float, optional) – If given, each embedding vector with norm larger than max_norm is renormalized to have norm max_norm. In other words, max_norm is an upper bound on the norm of an embedding vector, and a vector that exceeds it is rescaled.

norm_type (float, optional) – The p of the p-norm to compute for the max_norm option. Default 2. It selects which norm is computed and compared against max_norm; the default is the 2-norm.

scale_grad_by_freq (boolean, optional) – If given, this will scale gradients by the inverse of frequency of the words in the mini-batch. Default False. That is, the gradient of each word is rescaled according to how often the word occurs in the mini-batch.

sparse (bool, optional) – If True, gradient w.r.t. weight matrix will be a sparse tensor. See Notes for more details regarding sparse gradients.

_weight (Tensor) – a tensor of shape (num_embeddings, embedding_dim) used as the initial value of the module's learnable weights.
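The parameter descriptions above can be checked with a small experiment. The following is a minimal sketch run on CPU, not part of the original post; all variable names and sizes are chosen for illustration only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# num_embeddings=5 means the valid indices are 0..4; index 5 is out of range.
emb = nn.Embedding(5, 3, padding_idx=0)
try:
    emb(torch.tensor([5]))
    out_of_range_raised = False
except IndexError:  # raised on CPU for an index >= num_embeddings
    out_of_range_raised = True

# padding_idx: the row starts as zeros and receives a zero gradient.
emb(torch.tensor([0, 1, 1])).sum().backward()
padding_row = emb.weight[0].detach().clone()
padding_grad = emb.weight.grad[0].clone()

# max_norm: rows that are looked up are rescaled in place to norm <= max_norm.
clipped = nn.Embedding(5, 3, max_norm=1.0)
with torch.no_grad():
    clipped.weight.fill_(10.0)  # every row now has a 2-norm far above 1
looked_up = clipped(torch.tensor([2]))
norm_after = looked_up.norm(p=2, dim=1)

# scale_grad_by_freq: index 1 occurs twice, so its gradient is divided by 2.
scaled = nn.Embedding(5, 3, scale_grad_by_freq=True)
scaled(torch.tensor([1, 1, 3])).sum().backward()
scaled_grad = scaled.weight.grad.clone()

# sparse=True: the gradient of the weight matrix is a sparse tensor.
sparse_emb = nn.Embedding(5, 3, sparse=True)
sparse_emb(torch.tensor([1, 3])).sum().backward()
grad_is_sparse = sparse_emb.weight.grad.is_sparse

print(out_of_range_raised, padding_row, padding_grad)
print(norm_after, scaled_grad[1], grad_is_sparse)
```

Note that max_norm only touches the rows that are actually looked up; rows that never appear in the input keep their original values.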

Variables

~Embedding.weight (Tensor) – the learnable weights of the module of shape (num_embeddings, embedding_dim) initialized from $N(0, 1)$

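The Embedding class has an attribute weight of type torch.nn.parameter.Parameter, which stores the actual word embeddings. If weight is not assigned explicitly, the Embedding class initializes it automatically. The source code [SOURCE] shows that in this case a torch.nn.parameter.Parameter object is created and reset_parameters() is called, which fills self.weight in place with normal_(0, 1), i.e. samples from $N(0, 1)$. So the default initialization of nn.Embedding.weight is the standard normal distribution $N(0, 1)$, with mean $\mu=0$ and standard deviation $\sigma=1$.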
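A quick empirical check of the default initialization is to look at the sample mean and standard deviation of a freshly created weight matrix. This is only a sketch; the table size (1000 x 64) is an arbitrary choice made so that the statistics are stable.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A reasonably large table so the sample statistics are close to the true ones.
emb = nn.Embedding(1000, 64)

mean = emb.weight.mean().item()
std = emb.weight.std().item()
is_parameter = isinstance(emb.weight, nn.Parameter)

print(f"mean={mean:.3f}, std={std:.3f}, is_parameter={is_parameter}")
```

The printed mean should be close to 0 and the standard deviation close to 1, consistent with sampling from $N(0, 1)$.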

[Initialization distribution of nn.Embedding.weight]

Settings

1 If the embedding does not need to be updated during training, you can use

if args["fix_embedding"]:
    self.embedding.weight.requires_grad = False
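To see the effect of freezing, the snippet above can be placed in a small module and followed by one optimizer step. This is a sketch under assumptions: TinyModel, its layer sizes, and the fix_embedding constructor argument are hypothetical names introduced here, standing in for the author's args["fix_embedding"] flag.

```python
import torch
import torch.nn as nn


class TinyModel(nn.Module):
    """Hypothetical model: an embedding followed by a linear layer."""

    def __init__(self, fix_embedding: bool):
        super().__init__()
        self.embedding = nn.Embedding(7, 5)
        self.fc = nn.Linear(5, 2)
        if fix_embedding:
            # Freeze the embedding table so it receives no gradient.
            self.embedding.weight.requires_grad = False

    def forward(self, idx):
        # Average the token embeddings of each sequence, then classify.
        return self.fc(self.embedding(idx).mean(dim=1))


torch.manual_seed(0)
model = TinyModel(fix_embedding=True)
before = model.embedding.weight.detach().clone()
fc_before = model.fc.weight.detach().clone()

# Hand only the trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1)

loss = model(torch.tensor([[0, 2, 1], [5, 4, 6]])).sum()
loss.backward()
optimizer.step()

embedding_unchanged = torch.equal(before, model.embedding.weight)
fc_changed = not torch.equal(fc_before, model.fc.weight)
print(embedding_unchanged, fc_changed, model.embedding.weight.grad)
```

Filtering the parameters passed to the optimizer is optional here, since a frozen parameter has no gradient and is skipped anyway, but it makes the intent explicit.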

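2 If you want to change the initialization after the embedding has been created

For example, to replace the default $N(0, 1)$ normal initialization with a normal distribution of mean 0 and standard deviation 0.1: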

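row_ids = list(range(0, 7))
row_ids.remove(padding_idx)  # leave the padding row at zero
# Indexing with a list returns a copy, so calling normal_() on the indexed
# result would not change the weights; assign the new values back instead.
with torch.no_grad():
    embedding.weight[row_ids] = torch.randn(len(row_ids), embedding.embedding_dim) * 0.1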

This can probably also be done through the _weight parameter.
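As a sketch of that idea, the initial tensor can be built by hand and passed in at construction time. Two assumptions are worth flagging: _weight is a private (underscore-prefixed) argument, so its behavior may change between versions, and when it is supplied the padding row is not zeroed automatically, which is why the sketch zeroes it explicitly. nn.Embedding.from_pretrained is shown as a public alternative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
padding_idx = 1

# Initial values drawn from a normal distribution with mean 0 and std 0.1.
init = torch.randn(7, 5) * 0.1
init[padding_idx] = 0.0  # zero the padding row ourselves

# Option 1: the private _weight argument of the constructor.
embedding = nn.Embedding(7, 5, padding_idx=padding_idx, _weight=init)
same_values = torch.equal(embedding.weight.detach(), init)

# Option 2: the public from_pretrained classmethod; freeze=False keeps the
# table trainable (the default freeze=True would disable gradient updates).
pretrained = nn.Embedding.from_pretrained(
    init.clone(), freeze=False, padding_idx=padding_idx
)

print(same_values, pretrained.weight.requires_grad)
```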


Example

import torch
import torch.nn as nn

padding_idx = 1
embedding = nn.Embedding(7, 5, padding_idx=padding_idx)

hello_idx = torch.tensor([[0, 2, 1], [5, 4, 6]]) # batch(2)*maxlen(3)
hello_embed = embedding(hello_idx)
print(hello_embed)

tensor([[[-0.3082, -0.9863, -0.4503,  0.2426, -1.3222],
         [-0.6704, -0.2935, -0.7002, -0.3181, -1.5412],
         [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],

        [[ 1.0155, -0.4772,  0.2604,  1.0059, -0.5082],
         [ 0.1376, -0.7339,  0.3480, -1.7744, -0.6694],
         [-0.7294, -0.3488, -0.0429, -1.4107, -0.9397]]],
       grad_fn=<EmbeddingBackward>)

[https://www.cnblogs.com/lindaxin/p/7991436.html]
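One way to read the example is that the embedding layer is a table lookup: each index selects the corresponding row of weight. The sketch below repeats the example and checks this, along with the output shape and the zero padding row. The random values will differ from the output shown above, since they depend on the seed.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
padding_idx = 1
embedding = nn.Embedding(7, 5, padding_idx=padding_idx)

hello_idx = torch.tensor([[0, 2, 1], [5, 4, 6]])  # batch(2) * maxlen(3)
hello_embed = embedding(hello_idx)

# The output has one embedding_dim-sized vector per input index.
shape = tuple(hello_embed.shape)

# The forward pass matches plain row indexing into the weight matrix.
same_as_indexing = torch.equal(hello_embed, embedding.weight[hello_idx])

# Position [0, 2] holds padding_idx, so its vector is all zeros.
padding_is_zero = bool((hello_embed[0, 2] == 0).all())

print(shape, same_as_indexing, padding_is_zero)
```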


ref: [torch.nn > Embedding]

[A plain-language explanation of how nn.Embedding works in PyTorch and how to use it]

Source: oschina

Link: https://my.oschina.net/u/4375351/blog/4633249
