Difference between 1 LSTM with num_layers = 2 and 2 LSTMs in pytorch

我们两清 提交于 2021-01-20 16:39:02

问题


I am new to deep learning and currently working on using LSTMs for language modeling. I was looking at the pytorch documentation and was confused by it.

If I create a

nn.LSTM(input_size, hidden_size, num_layers) 

where hidden_size = 4 and num_layers = 2, I think I will have an architecture something like:

op0    op1 ....
LSTM -> LSTM -> h3
LSTM -> LSTM -> h2
LSTM -> LSTM -> h1
LSTM -> LSTM -> h0
x0     x1 .....

If I do something like

nn.LSTM(input_size, hidden_size, 1)
nn.LSTM(input_size, hidden_size, 1)

I think the network architecture will look exactly like above. Am I wrong? And if yes, what is the difference between these two?


回答1:


The multi-layer LSTM is better known as stacked LSTM where multiple layers of LSTM are stacked on top of each other.

Your understanding is correct. The following two definitions of stacked LSTM are same.

nn.LSTM(input_size, hidden_size, 2)

and

nn.Sequential(OrderedDict([
    ('LSTM1', nn.LSTM(input_size, hidden_size, 1),
    ('LSTM2', nn.LSTM(hidden_size, hidden_size, 1)
]))

Here, the input is feed into the lowest layer of LSTM and then the output of the lowest layer is forwarded to the next layer and so on so forth. Please note, the output size of the lowest LSTM layer and the rest of the LSTM layer's input size is hidden_size.

However, you may have seen people defined stacked LSTM in the following way:

rnns = nn.ModuleList()
for i in range(nlayers):
    input_size = input_size if i == 0 else hidden_size
    rnns.append(nn.LSTM(input_size, hidden_size, 1))

The reason people sometimes use the above approach is that if you create a stacked LSTM using the first two approaches, you can't get the hidden states of each individual layer. Check out what LSTM returns in PyTorch.

So, if you want to have the intermedia layer's hidden states, you have to declare each individual LSTM layer as a single LSTM and run through a loop to mimic the multi-layer LSTM operations. For example:

outputs = []
for i in range(nlayers):
    if i != 0:
        sent_variable = F.dropout(sent_variable, p=0.2, training=True)
    output, hidden = rnns[i](sent_variable)
    outputs.append(output)
    sent_variable = output

In the end, outputs will contain all the hidden states of each individual LSTM layer.



来源:https://stackoverflow.com/questions/49224413/difference-between-1-lstm-with-num-layers-2-and-2-lstms-in-pytorch

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!