Question
My question concerns the answer provided by @Shai in "LSTM module for Caffe", where caffe.NetSpec() is used to explicitly unroll LSTM units in time for training.
Using this implementation, why does the "DummyData" layer (or any data layer used instead as input X) appear at the end of the t0 time step, just before "t1/lstm/Mx", in the prototxt file? I don't get it...
A manual manipulation (cut / paste) is hence needed.
Answer 1:
Shai's NetSpec implementation of LSTM unrolls the net in time. Hence, for every time step there is an LSTM unit, with weights shared across time steps.
The "bottom" of each unit in time (e.g. t1/lstm/Mx) is a different time step of the input X.
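To make the unrolling idea concrete, here is a minimal toy sketch in plain Python (it does not use Caffe or NetSpec). It only lists one layer per time step, each reading a different slice of X while reusing one parameter name. The `t<k>/lstm/Mx` naming follows the prototxt mentioned in the question; the slice names (`x/t<k>`), the parameter name (`lstm/Mx_w`) and the `unroll` helper are hypothetical, chosen purely for illustration.

```python
# Toy model of an unrolled recurrent net: plain Python, no Caffe required.
# Everything except the "t<k>/lstm/Mx" naming pattern is a hypothetical
# placeholder, not taken from Shai's actual implementation.

def unroll(num_steps):
    """Return one layer description per unrolled time step."""
    layers = []
    for t in range(num_steps):
        layers.append({
            "name": "t%d/lstm/Mx" % t,
            # each time step reads its own temporal slice of the input X
            "bottom": "x/t%d" % t,
            # the same parameter name at every step stands for shared weights
            "param": "lstm/Mx_w",
        })
    return layers


layers = unroll(3)
for layer in layers:
    print(layer["name"], "<-", layer["bottom"], "| weights:", layer["param"])
```

Every unit has a distinct name and a distinct bottom, but all of them point at the same parameter name, which is what "shared weights across time steps" amounts to in this simplified picture.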
By the way, I suggest you use the draw_net.py Caffe utility to draw the resulting prototxt and see the flow of data and the temporal repetitions of the unrolled LSTM unit.
Here's what the unrolled net looks like:
You can see the components of the three LSTM cells, and the different temporal slices of X going to each temporally unrolled LSTM unit.
Source: https://stackoverflow.com/questions/36748063/datalayer-placement-in-the-prototxt-file-generated-by-shais-lstm-implementatio