My question is: are tf.nn.dynamic_rnn and keras.layers.RNN(cell) truly identical, as stated in the docs?
I am planning on building an RNN; however, it seems that tf.nn.dynamic_rnn is deprecated in favour of Keras.
In particular, it states that:
Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Please use keras.layers.RNN(cell), which is equivalent to this API
But I don't see how the APIs are equivalent, in the case of variable sequence lengths!
In raw TF, we can pass a sequence_length tensor of shape (batch_size,) holding the true length of each sequence. That way, if our sequence is [0, 1, 2, 3, 4] and the longest sequence in the batch has length 10, we can pad it with zeros to [0, 1, 2, 3, 4, 0, 0, 0, 0, 0] and pass seq_length=5, so that only [0, 1, 2, 3, 4] is processed.
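For example, a minimal sketch of that raw-TF usage (TF 1.x style; the cell type and shapes here are just illustrative):
import numpy as np
import tensorflow as tf

# One sequence of real length 5, padded to the batch maximum of 10
# (the feature size of 3 and the GRU size of 4 are arbitrary).
batch = np.zeros((1, 10, 3), dtype=np.float32)
seq_len = tf.constant([5])                     # shape (batch_size,), one length per sequence

cell = tf.nn.rnn_cell.GRUCell(4)
outputs, state = tf.nn.dynamic_rnn(
    cell, batch, sequence_length=seq_len, dtype=tf.float32)
# Steps beyond index 5 are not run: outputs[:, 5:, :] comes back as zeros
# and `state` is the state reached after step 5.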
However, in Keras this is not how it works! What we can do is set mask_zero=True in earlier layers, e.g. the Embedding layer. But that also masks index 0, which is a real vocabulary word!
I can work around it by adding one to the whole vector, but that is extra preprocessing I need to do after processing with tft.compute_vocabulary(), which maps vocabulary words to a 0-indexed vector.
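A sketch of that workaround (the vocabulary size, embedding width and layer choices below are illustrative, not from my actual pipeline):
import numpy as np
import tensorflow as tf

# Shift the 0-indexed vocabulary ids by +1 so that id 0 is free to mean "padding",
# then let mask_zero=True in the Embedding layer skip the padded steps.
vocab_size = 5
seq = np.array([0, 1, 2, 3, 4]) + 1                        # real tokens now occupy 1..vocab_size
padded = np.pad(seq, (0, 10 - len(seq)), mode='constant')  # -> [1 2 3 4 5 0 0 0 0 0]

emb = tf.keras.layers.Embedding(vocab_size + 1, 8, mask_zero=True)  # extra row for padding id 0
rnn = tf.keras.layers.GRU(4)
out = rnn(emb(tf.constant(padded[None, :])))   # padded steps are masked, like seq_length=5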
No, but they are (or can be made to be) not so different either.
TL;DR
tf.nn.dynamic_rnn replaces elements after the sequence end with zeros. As far as I know, this cannot be replicated with tf.keras.layers.*, but you can get a similar behaviour with the RNN(Masking(...)) approach: it simply stops the computation and carries the last outputs and states forward. You will get the same (non-padding) outputs as those obtained from tf.nn.dynamic_rnn.
Experiment
Here is a minimal working example demonstrating the differences between tf.nn.dynamic_rnn and tf.keras.layers.GRU, with and without the use of the tf.keras.layers.Masking layer.
import numpy as np
import tensorflow as tf

# Two padded sequences; their real lengths are 3 and 4.
test_input = np.array([
    [1, 2, 1, 0, 0],
    [0, 1, 2, 1, 0]
], dtype=int)
seq_length = tf.constant(np.array([3, 4], dtype=int))

# Fixed embedding that maps index 0 to 0.37, so no zeros survive the embedding.
emb_weights = (np.ones(shape=(3, 2)) * np.transpose([[0.37, 1, 2]])).astype(np.float32)
emb = tf.keras.layers.Embedding(
    *emb_weights.shape,
    weights=[emb_weights],
    trainable=False
)
mask = tf.keras.layers.Masking(mask_value=0.37)

# Deterministic linear GRU so the three variants can be compared number by number.
rnn = tf.keras.layers.GRU(
    1,
    return_sequences=True,
    activation=None,
    recurrent_activation=None,
    kernel_initializer='ones',
    recurrent_initializer='zeros',
    use_bias=True,
    bias_initializer='ones'
)

def old_rnn(inputs):
    # Same cell, driven by the deprecated tf.nn.dynamic_rnn with sequence_length.
    rnn_outputs, rnn_states = tf.nn.dynamic_rnn(
        rnn.cell,
        inputs,
        dtype=tf.float32,
        sequence_length=seq_length
    )
    return rnn_outputs

x = tf.keras.layers.Input(shape=test_input.shape[1:])
m0 = tf.keras.Model(inputs=x, outputs=emb(x))             # embedding only
m1 = tf.keras.Model(inputs=x, outputs=rnn(emb(x)))        # GRU without masking
m2 = tf.keras.Model(inputs=x, outputs=rnn(mask(emb(x))))  # GRU with masking

print(m0.predict(test_input).squeeze())
print(m1.predict(test_input).squeeze())
print(m2.predict(test_input).squeeze())

sess = tf.keras.backend.get_session()
print(sess.run(old_rnn(mask(emb(x))), feed_dict={x: test_input}).squeeze())
The outputs from m0 are there to show the result of applying the embedding layer. Note that there are no zero entries at all:
[[[1.   1.  ]
  [2.   2.  ]
  [1.   1.  ]
  [0.37 0.37]
  [0.37 0.37]]

 [[0.37 0.37]
  [1.   1.  ]
  [2.   2.  ]
  [1.   1.  ]
  [0.37 0.37]]]
Now here are the actual outputs from the m1, m2 and old_rnn architectures:
m1:  [[  -6.        -50.       -156.      -272.7276  -475.83362]
      [  -1.2876     -9.862801  -69.314   -213.94202 -373.54672]]
m2:  [[  -6.  -50. -156. -156. -156.]
      [   0.   -6.  -50. -156. -156.]]
old: [[  -6.  -50. -156.    0.    0.]
      [   0.   -6.  -50. -156.    0.]]
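To see where the m1 numbers come from: with these initializers and linear activations, the GRU collapses to a simple recurrence. A hand-rolled sketch (assuming Keras' update rule h' = z*h + (1 - z)*h_tilde) reproduces the first row of m1:
import numpy as np

# With recurrent_initializer='zeros', a kernel of ones, a bias of one and no
# activations, both the update gate z and the candidate state collapse to
# sum(x) + 1, so each step is h' = z*h + (1 - z)*h_tilde with z = h_tilde.
def gru_step(x, h):
    z = x.sum() + 1.0
    h_tilde = x.sum() + 1.0
    return z * h + (1.0 - z) * h_tilde

emb_weights = (np.ones((3, 2)) * np.transpose([[0.37, 1, 2]])).astype(np.float32)
h = 0.0
for token in [1, 2, 1, 0, 0]:                # first row of test_input
    h = gru_step(emb_weights[token], h)
    print(h)                                  # ≈ -6, -50, -156, -272.7276, -475.8336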
Summary
- The old tf.nn.dynamic_rnn used to mask padding elements with zeros.
- The new RNN layers, without masking, run over the padding elements as if they were data.
- The new rnn(mask(...)) approach simply stops the computation and carries the last outputs and states forward. Note that the (non-padding) outputs that I obtained for this approach are exactly the same as those from tf.nn.dynamic_rnn.
Anyway, I cannot cover all possible edge cases, but I hope that you can use this script to figure things out further.
Source: https://stackoverflow.com/questions/54989442/rnn-in-tensorflow-vs-keras-depreciation-of-tf-nn-dynamic-rnn