Understanding a simple LSTM in PyTorch

北荒 2021-01-30 04:13
import torch
import ipdb  # interactive debugger, optional
import torch.autograd as autograd
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable         
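
The answer below refers to an LSTM set up roughly like this (a minimal sketch; the sizes here are assumptions, not values from the original question):

seq_len, batch, input_size, hidden_size, num_layers = 5, 3, 10, 20, 2
rnn = nn.LSTM(input_size, hidden_size, num_layers)
input = Variable(torch.randn(seq_len, batch, input_size))   # (seq_len, batch, input_size)
h0 = Variable(torch.randn(num_layers, batch, hidden_size))  # initial hidden state
c0 = Variable(torch.randn(num_layers, batch, hidden_size))  # initial cell state
output, hn = rnn(input, (h0, c0))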


        
3 Answers

  傲寒 2021-01-30 05:05

    The output of the LSTM is the output of all the hidden nodes in the final layer.
    hidden_size - the number of LSTM blocks (hidden units) per layer.
    input_size - the number of input features per time step.
    num_layers - the number of stacked recurrent layers.
    In total there are hidden_size * num_layers LSTM blocks.

    The input dimensions are (seq_len, batch, input_size).
    seq_len - the number of time steps in each input stream.
    batch - the size of each batch of input sequences.

    The hidden and cell dimensions are: (num_layers, batch, hidden_size)

    output (seq_len, batch, hidden_size * num_directions): tensor containing the output features (h_t) from the last layer of the RNN, for each t.

    So there will be hidden_size * num_directions output features per time step. You didn't initialise the RNN to be bidirectional, so num_directions is 1 and output_size = hidden_size.
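
    A quick shape check, using the sizes assumed in the sketch above:

    print(output.size())  # torch.Size([5, 3, 20]) -> (seq_len, batch, hidden_size * num_directions)
    print(hn[0].size())   # torch.Size([2, 3, 20]) -> h_n: (num_layers, batch, hidden_size)
    print(hn[1].size())   # torch.Size([2, 3, 20]) -> c_n: same shape as h_n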

    Edit: You can change the number of outputs by using a linear layer:

    out_rnn, hn = rnn(input, (h0, c0))
    lin = nn.Linear(hidden_size, output_size)  # output_size: desired number of output features
    # flatten the time and batch dimensions, project, then restore the original shape
    flat = out_rnn.view(seq_len * batch, hidden_size)
    output = lin(flat).view(seq_len, batch, output_size)
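
    In recent PyTorch versions nn.Linear is applied to the last dimension of its input, so the reshaping is optional:

    output = lin(out_rnn)  # same result: maps hidden_size -> output_size on the last dimension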
    

    Note: for this answer I assumed that we're only talking about non-bidirectional LSTMs.
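
    For comparison, a small sketch of the bidirectional case (same assumed sizes as above), where num_directions becomes 2 and the output feature size doubles:

    birnn = nn.LSTM(input_size, hidden_size, num_layers, bidirectional=True)
    h0 = Variable(torch.randn(num_layers * 2, batch, hidden_size))  # num_layers * num_directions
    c0 = Variable(torch.randn(num_layers * 2, batch, hidden_size))
    out, _ = birnn(input, (h0, c0))
    print(out.size())  # torch.Size([5, 3, 40]) -> hidden_size * num_directions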

    Source: PyTorch docs.
