My question is: are tf.nn.dynamic_rnn and keras.layers.RNN(cell) truly identical, as stated in the docs?
I am planning on building an RNN; however, it seems that tf.nn.dynamic_rnn is deprecated in favour of Keras.
In particular, it states that:
Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Please use keras.layers.RNN(cell), which is equivalent to this API
But I don't see how the APIs are equivalent, in the case of variable sequence lengths!
In raw TF, we can pass a sequence_length tensor of shape (batch_size,) holding the true length of each sequence. That way, if our sequence is [0, 1, 2, 3, 4] and the longest sequence in the batch has length 10, we can pad it with zeros to [0, 1, 2, 3, 4, 0, 0, 0, 0, 0] and pass seq_length=5, so that only [0, 1, 2, 3, 4] is processed.
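For example, a minimal sketch of that raw-TF usage (TF 1.x style; the cell type and shapes here are just illustrative):
import numpy as np
import tensorflow as tf

# One sequence of real length 5, padded to the batch maximum of 10
# (the feature size of 3 and the GRU size of 4 are arbitrary).
batch = np.zeros((1, 10, 3), dtype=np.float32)
seq_len = tf.constant([5])                     # shape (batch_size,), one length per sequence

cell = tf.nn.rnn_cell.GRUCell(4)
outputs, state = tf.nn.dynamic_rnn(
    cell, batch, sequence_length=seq_len, dtype=tf.float32)
# Steps beyond index 5 are not run: outputs[:, 5:, :] comes back as zeros
# and `state` is the state reached after step 5.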
However, in Keras this is not how it works! What we can do is set mask_zero=True in earlier layers, e.g. the Embedding layer. But that also masks index 0, which is a real vocabulary word!
I can work around it by adding one to the whole vector, but that is extra preprocessing I need to do after processing with tft.compute_vocabulary(), which maps vocabulary words to a 0-indexed vector.
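A sketch of that workaround (the vocabulary size, embedding width and layer choices below are illustrative, not from my actual pipeline):
import numpy as np
import tensorflow as tf

# Shift the 0-indexed vocabulary ids by +1 so that id 0 is free to mean "padding",
# then let mask_zero=True in the Embedding layer skip the padded steps.
vocab_size = 5
seq = np.array([0, 1, 2, 3, 4]) + 1                        # real tokens now occupy 1..vocab_size
padded = np.pad(seq, (0, 10 - len(seq)), mode='constant')  # -> [1 2 3 4 5 0 0 0 0 0]

emb = tf.keras.layers.Embedding(vocab_size + 1, 8, mask_zero=True)  # extra row for padding id 0
rnn = tf.keras.layers.GRU(4)
out = rnn(emb(tf.constant(padded[None, :])))   # padded steps are masked, like seq_length=5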
No, but they are (or can be made to be) not so different either.
TL;DR
tf.nn.dynamic_rnn replaces elements after the sequence end with zeros. As far as I know, this cannot be replicated with tf.keras.layers.*, but you can get a similar behaviour with the RNN(Masking(...)) approach: it simply stops the computation and carries the last outputs and states forward. You will get the same (non-padding) outputs as those obtained from tf.nn.dynamic_rnn.
Experiment
Here is a minimal working example demonstrating the differences between tf.nn.dynamic_rnn and tf.keras.layers.GRU, with and without the use of the tf.keras.layers.Masking layer.
import numpy as np
import tensorflow as tf

# Two padded sequences; their real lengths are 3 and 4.
test_input = np.array([
    [1, 2, 1, 0, 0],
    [0, 1, 2, 1, 0]
], dtype=int)
seq_length = tf.constant(np.array([3, 4], dtype=int))

# Fixed embedding that maps index 0 to 0.37, so no zeros survive the embedding.
emb_weights = (np.ones(shape=(3, 2)) * np.transpose([[0.37, 1, 2]])).astype(np.float32)
emb = tf.keras.layers.Embedding(
    *emb_weights.shape,
    weights=[emb_weights],
    trainable=False
)
mask = tf.keras.layers.Masking(mask_value=0.37)

# Deterministic linear GRU so the three variants can be compared number by number.
rnn = tf.keras.layers.GRU(
    1,
    return_sequences=True,
    activation=None,
    recurrent_activation=None,
    kernel_initializer='ones',
    recurrent_initializer='zeros',
    use_bias=True,
    bias_initializer='ones'
)

def old_rnn(inputs):
    # Same cell, driven by the deprecated tf.nn.dynamic_rnn with sequence_length.
    rnn_outputs, rnn_states = tf.nn.dynamic_rnn(
        rnn.cell,
        inputs,
        dtype=tf.float32,
        sequence_length=seq_length
    )
    return rnn_outputs

x = tf.keras.layers.Input(shape=test_input.shape[1:])
m0 = tf.keras.Model(inputs=x, outputs=emb(x))             # embedding only
m1 = tf.keras.Model(inputs=x, outputs=rnn(emb(x)))        # GRU without masking
m2 = tf.keras.Model(inputs=x, outputs=rnn(mask(emb(x))))  # GRU with masking

print(m0.predict(test_input).squeeze())
print(m1.predict(test_input).squeeze())
print(m2.predict(test_input).squeeze())

sess = tf.keras.backend.get_session()
print(sess.run(old_rnn(mask(emb(x))), feed_dict={x: test_input}).squeeze())
The outputs from m0 are there to show the result of applying the embedding layer. Note that there are no zero entries at all:
[[[1.   1.  ]
  [2.   2.  ]
  [1.   1.  ]
  [0.37 0.37]
  [0.37 0.37]]

 [[0.37 0.37]
  [1.   1.  ]
  [2.   2.  ]
  [1.   1.  ]
  [0.37 0.37]]]
Now here are the actual outputs from the m1, m2 and old_rnn architectures:
m1:  [[  -6.        -50.       -156.      -272.7276  -475.83362]
      [  -1.2876     -9.862801  -69.314   -213.94202 -373.54672]]
m2:  [[  -6.  -50. -156. -156. -156.]
      [   0.   -6.  -50. -156. -156.]]
old: [[  -6.  -50. -156.    0.    0.]
      [   0.   -6.  -50. -156.    0.]]
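To see where the m1 numbers come from: with these initializers and linear activations, the GRU collapses to a simple recurrence. A hand-rolled sketch (assuming Keras' update rule h' = z*h + (1 - z)*h_tilde) reproduces the first row of m1:
import numpy as np

# With recurrent_initializer='zeros', a kernel of ones, a bias of one and no
# activations, both the update gate z and the candidate state collapse to
# sum(x) + 1, so each step is h' = z*h + (1 - z)*h_tilde with z = h_tilde.
def gru_step(x, h):
    z = x.sum() + 1.0
    h_tilde = x.sum() + 1.0
    return z * h + (1.0 - z) * h_tilde

emb_weights = (np.ones((3, 2)) * np.transpose([[0.37, 1, 2]])).astype(np.float32)
h = 0.0
for token in [1, 2, 1, 0, 0]:                # first row of test_input
    h = gru_step(emb_weights[token], h)
    print(h)                                  # ≈ -6, -50, -156, -272.7276, -475.8336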
Summary
- The old tf.nn.dynamic_rnn used to mask padding elements with zeros.
- The new RNN layers, without masking, run over the padding elements as if they were data.
- The new rnn(mask(...)) approach simply stops the computation and carries the last outputs and states forward. Note that the (non-padding) outputs that I obtained for this approach are exactly the same as those from tf.nn.dynamic_rnn.
Anyway, I cannot cover all possible edge cases, but I hope that you can use this script to figure things out further.
Source: https://stackoverflow.com/questions/54989442/rnn-in-tensorflow-vs-keras-depreciation-of-tf-nn-dynamic-rnn