RNN in TensorFlow vs Keras, deprecation of tf.nn.dynamic_rnn()

-上瘾入骨i 2021-02-01 21:18

My question is: are tf.nn.dynamic_rnn and keras.layers.RNN(cell) truly identical, as stated in the docs?

I am planning on building an RNN, however, it seems

1 Answer
  • 2021-02-01 22:12

    No, but they are (or can be made to be) not so different either.

    TL;DR

    tf.nn.dynamic_rnn replaces the outputs after the end of each sequence with 0s. As far as I know, this cannot be replicated with tf.keras.layers.*, but you can get similar behaviour with the RNN(Masking(...)) approach: it simply stops the computation and carries the last outputs and states forward. The outputs at the non-padding positions are the same as those obtained from tf.nn.dynamic_rnn.
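
    To make the three padding behaviours concrete, here is a minimal pure-Python sketch. It does not use TensorFlow: the recurrence h_t = h_{t-1} + x_t, the function run_rnn and its mode argument are all hypothetical and exist only for illustration.

```python
def run_rnn(inputs, length, mode):
    """Toy recurrence h_t = h_{t-1} + x_t over one padded sequence.

    mode is a hypothetical switch, not a TensorFlow argument:
      "none"  - treat padding as data
      "carry" - skip padded steps and repeat the last valid output
      "zero"  - skip padded steps and emit 0 for them
    """
    state = 0
    outputs = []
    for t, x in enumerate(inputs):
        is_padding = t >= length
        if is_padding and mode != "none":
            # The state is frozen here; only the emitted value differs.
            outputs.append(state if mode == "carry" else 0)
            continue
        state = state + x
        outputs.append(state)
    return outputs


padded = [1, 2, 1, 9, 9]  # last two entries are padding (9 is arbitrary)
print(run_rnn(padded, 3, "none"))   # [1, 3, 4, 13, 22]
print(run_rnn(padded, 3, "carry"))  # [1, 3, 4, 4, 4]
print(run_rnn(padded, 3, "zero"))   # [1, 3, 4, 0, 0]
```

    The "carry" and "zero" modes agree on the first three steps and differ only in what they emit at the padded positions, which is the relationship described above between the masked Keras layer and tf.nn.dynamic_rnn.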

    Experiment

    Here is a minimal working example demonstrating the differences between tf.nn.dynamic_rnn and tf.keras.layers.GRU, with and without a tf.keras.layers.Masking layer.

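    # Targets TensorFlow 1.x: tf.nn.dynamic_rnn and tf.keras.backend.get_session
    # are TF 1.x APIs (in TF 2.x they live under tf.compat.v1).
    import numpy as np
    import tensorflow as tf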
    
    test_input = np.array([
        [1, 2, 1, 0, 0],
        [0, 1, 2, 1, 0]
    ], dtype=int)
    seq_length = tf.constant(np.array([3, 4], dtype=int))
    
    # Embedding table with 2 features per token: index 0 -> 0.37 (padding), 1 -> 1.0, 2 -> 2.0
    emb_weights = (np.ones(shape=(3, 2)) * np.transpose([[0.37, 1, 2]])).astype(np.float32)
    emb = tf.keras.layers.Embedding(
        *emb_weights.shape,
        weights=[emb_weights],
        trainable=False
    )
    # Padding (index 0) embeds to 0.37, so mask on that value rather than on 0
    mask = tf.keras.layers.Masking(mask_value=0.37)
    rnn = tf.keras.layers.GRU(
        1,
        return_sequences=True,
        activation=None,
        recurrent_activation=None,
        kernel_initializer='ones',
        recurrent_initializer='zeros',
        use_bias=True,
        bias_initializer='ones'
    )
    
    
    def old_rnn(inputs):
        rnn_outputs, rnn_states = tf.nn.dynamic_rnn(
            rnn.cell,
            inputs,
            dtype=tf.float32,
            sequence_length=seq_length
        )
        return rnn_outputs
    
    
    x = tf.keras.layers.Input(shape=test_input.shape[1:])
    m0 = tf.keras.Model(inputs=x, outputs=emb(x))
    m1 = tf.keras.Model(inputs=x, outputs=rnn(emb(x)))
    m2 = tf.keras.Model(inputs=x, outputs=rnn(mask(emb(x))))
    
    print(m0.predict(test_input).squeeze())
    print(m1.predict(test_input).squeeze())
    print(m2.predict(test_input).squeeze())
    
    sess = tf.keras.backend.get_session()
    print(sess.run(old_rnn(mask(emb(x))), feed_dict={x: test_input}).squeeze())
    

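    The outputs from m0 show the result of applying the embedding layer (one sequence per column below). Note that there are no zero entries at all: the padding index 0 is embedded as 0.37, which is why the Masking layer uses mask_value=0.37.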

    [[[1.   1.  ]    [[0.37 0.37]
      [2.   2.  ]     [1.   1.  ]
      [1.   1.  ]     [2.   2.  ]
      [0.37 0.37]     [1.   1.  ]
      [0.37 0.37]]    [0.37 0.37]]]
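
    As I understand the Keras documentation, the Masking layer skips a timestep only when every feature at that timestep equals mask_value. The sketch below reproduces that rule in plain Python on the embedded values shown above; compute_mask here is a hypothetical stand-in written for illustration, not the Keras method.

```python
MASK_VALUE = 0.37

# Embedded inputs copied from the m0 output: shape (batch, time, features).
embedded = [
    [[1.0, 1.0], [2.0, 2.0], [1.0, 1.0], [0.37, 0.37], [0.37, 0.37]],
    [[0.37, 0.37], [1.0, 1.0], [2.0, 2.0], [1.0, 1.0], [0.37, 0.37]],
]


def compute_mask(batch, mask_value):
    """Return True for timesteps to keep, False for timesteps to skip."""
    return [
        # A step is kept if at least one feature differs from mask_value.
        [any(v != mask_value for v in step) for step in seq]
        for seq in batch
    ]


for row in compute_mask(embedded, MASK_VALUE):
    print(row)
# [True, True, True, False, False]
# [False, True, True, True, False]
```

    Under this rule the leading index 0 of the second sequence is masked as well, not only the trailing padding.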
    

    Now here are the actual outputs from the m1, m2 and old_rnn architectures:

    m1:  [[  -6.      -50.       -156.     -272.7276  -475.83362]
          [  -1.2876   -9.862801  -69.314  -213.94202 -373.54672]]
    m2:  [[  -6.  -50. -156. -156. -156.]
          [   0.   -6.  -50. -156. -156.]]
    old: [[  -6.  -50. -156.    0.    0.]
          [   0.   -6.  -50. -156.    0.]]
    

    Summary

    • The old tf.nn.dynamic_rnn zeroed the outputs at padding positions.
    • The new RNN layers without masking run over the padding elements as if they were data.
    • The new rnn(mask(...)) approach stops the computation at masked steps and carries the last outputs and states forward. The outputs I obtained at the non-padding positions are exactly the same as those from tf.nn.dynamic_rnn.
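
    If the zeros themselves are needed, one possible workaround is to zero the padded positions after the fact using the sequence lengths. The sketch below does this in plain Python on the m2 and old outputs listed above; zero_padding is a hypothetical helper, and I have not tested the equivalent inside a Keras model (tf.sequence_mask could probably be used to build the same mask there).

```python
def zero_padding(outputs, lengths):
    """Replace every output at position t >= length with 0."""
    return [
        [y if t < n else 0.0 for t, y in enumerate(seq)]
        for seq, n in zip(outputs, lengths)
    ]


# Values copied from the experiment output.
m2_outputs = [
    [-6.0, -50.0, -156.0, -156.0, -156.0],
    [0.0, -6.0, -50.0, -156.0, -156.0],
]
old_outputs = [
    [-6.0, -50.0, -156.0, 0.0, 0.0],
    [0.0, -6.0, -50.0, -156.0, 0.0],
]
seq_length = [3, 4]

print(zero_padding(m2_outputs, seq_length) == old_outputs)  # True
```

    This only changes the emitted outputs; the final states are unaffected, since both approaches already stop updating the state at the end of each sequence.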

    Anyway, I cannot cover all possible edge cases, but I hope that you can use this script to figure things out further.
