import torch
import ipdb
import torch.autograd as autograd
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable
The output of the LSTM is the output of all the hidden nodes on the final layer.

- hidden_size - the number of LSTM blocks per layer.
- input_size - the number of input features per time-step.
- num_layers - the number of hidden layers.

In total there are hidden_size * num_layers LSTM blocks.
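As a rough sketch of how those three arguments fit together (the sizes here are made-up example values, not taken from your code):

rnn = nn.LSTM(input_size=10,   # assumed: 10 input features per time step
              hidden_size=20,  # assumed: 20 LSTM blocks per layer
              num_layers=2)    # assumed: 2 layers, so 20 * 2 = 40 blocks in total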
The input dimensions are (seq_len, batch, input_size).

- seq_len - the number of time steps in each input stream.
- batch - the size of each batch of input sequences.
The hidden and cell dimensions are (num_layers, batch, hidden_size).

output (seq_len, batch, hidden_size * num_directions): tensor containing the output features (h_t) from the last layer of the RNN, for each t.

So there will be hidden_size * num_directions outputs at each time step. You didn't initialise the RNN to be bidirectional, so num_directions is 1 and output_size = hidden_size.
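A quick shape check under those assumed sizes (seq_len and batch are also made-up values):

seq_len, batch = 5, 3
input = torch.randn(seq_len, batch, 10)   # (seq_len, batch, input_size)
h0 = torch.zeros(2, batch, 20)            # (num_layers, batch, hidden_size)
c0 = torch.zeros(2, batch, 20)
output, (hn, cn) = rnn(input, (h0, c0))
print(output.shape)                       # torch.Size([5, 3, 20]), i.e. hidden_size outputs per time step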
Edit: You can change the number of outputs by using a linear layer:
out_rnn, hn = rnn(input, (h0, c0))
lin = nn.Linear(hidden_size, output_size)
# flatten the time and batch dimensions, apply the linear layer, then restore the shape
output = lin(out_rnn.view(seq_len * batch, hidden_size)).view(seq_len, batch, output_size)
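With output_size = 1, for example, output ends up with shape (seq_len, batch, 1), i.e. a single value per time step for each sequence in the batch.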
Note: for this answer I assumed that we're only talking about non-bidirectional LSTMs.
Source: PyTorch docs.