I am confused about the correct way to use the initial state tensor in TensorFlow for RNNs. There is almost a 50/50 split between tutorials that use LSTMStateTuple and ones that use cell.zero_state.
The two are different things. state_is_tuple is used on LSTM cells because the state of an LSTM cell is a tuple, while cell.zero_state is the initializer of the state for all RNN cells.

You will generally prefer the cell.zero_state function, as it will initialize the required state class depending on whether state_is_tuple is True or not.
See this GitHub issue, where cell.zero_state is recommended: "use the zero_state function on the cell object".
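As a minimal sketch of the difference (TensorFlow 1.x API; num_units and batch_size are made-up values), zero_state returns an LSTMStateTuple when state_is_tuple=True and a single concatenated tensor when it is False:

import tensorflow as tf

num_units, batch_size = 128, 32   # hypothetical sizes

tuple_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units, state_is_tuple=True)
tuple_state = tuple_cell.zero_state(batch_size, tf.float32)
# -> an LSTMStateTuple of two tensors (c and h), each of shape (32, 128)

flat_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units, state_is_tuple=False)
flat_state = flat_cell.zero_state(batch_size, tf.float32)
# -> a single tensor of shape (32, 256), with c and h concatenated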
Another reason why you may want cell.zero_state is that it is agnostic to the type of the cell (LSTM, GRU, RNN), so you can do something like this:
import tensorflow as tf

num_units, batch_size = 128, 32   # example sizes; cell_type is e.g. 'GRU' or 'LSTM'

if cell_type == 'GRU':
    cell = tf.nn.rnn_cell.GRUCell(num_units)
else:
    cell = tf.nn.rnn_cell.BasicLSTMCell(num_units, state_is_tuple=True)
init_state = cell.zero_state(batch_size, tf.float32)
and the initial state will be set up correctly either way.
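That init_state can then be passed straight to the RNN, for example (a rough sketch; inputs is an assumed placeholder, with 50 as a made-up feature size):

# hypothetical input batch: [batch_size, time_steps, features]
inputs = tf.placeholder(tf.float32, [batch_size, None, 50])

outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, initial_state=init_state)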
LSTMStateTuple will work only on cells that have their state as a tuple.
When to use LSTMStateTuple?
You'll want to use LSTMStateTuple when you're initializing your state with custom values (passed by the trainer). cell.zero_state() will return the state with all the values equal to 0.0.
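For example (a sketch, assuming the LSTM cell from above; the placeholder names c_in and h_in are made up), you can build the initial state from placeholders and feed whatever values you like:

# hypothetical placeholders for a custom initial state
c_in = tf.placeholder(tf.float32, [batch_size, num_units], name='c_in')
h_in = tf.placeholder(tf.float32, [batch_size, num_units], name='h_in')
custom_init_state = tf.nn.rnn_cell.LSTMStateTuple(c_in, h_in)
# pass initial_state=custom_init_state to tf.nn.dynamic_rnn instead of init_state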
If you want to keep the state between batches, then you'll have to fetch it after each batch and add it to your feed_dict for the next batch, as sketched below.
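A rough sketch of that loop, assuming the graph was built with the placeholder-based initial state from the previous snippet (so outputs and final_state depend on c_in and h_in), and a hypothetical next_batch() data loader:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # start from zeros, then carry the final state into the next batch
    state = sess.run(cell.zero_state(batch_size, tf.float32))
    for _ in range(num_batches):   # num_batches is a placeholder count
        x = next_batch()           # hypothetical data loader
        out, state = sess.run(
            [outputs, final_state],
            feed_dict={inputs: x, c_in: state.c, h_in: state.h})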
See this for an explanation of why the LSTM state is a tuple.