Question
I'm using some LSTM layers from TF 2.0. For training I use the LearningRateScheduler callback, and for speed I disable TensorFlow's eager mode (disable_eager_execution). But when I use both of these together, TensorFlow raises a warning:
Operation ... was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session
Here is a custom script that illustrates the problem:
import tensorflow as tf
import numpy as np
import time
import math

EAGER = False  # False: disable eager execution (graph mode) for speed
DECAY = True   # True: train with the LearningRateScheduler callback
EPOCHS = 5

if not EAGER:
    tf.compat.v1.disable_eager_execution()

def decay_func(lr_init):
    # Step decay: multiply the learning rate by 0.1 every 10 epochs
    def step_decay(epoch):
        lrate = lr_init * math.pow(0.1, math.floor(epoch / 10))
        return lrate
    return step_decay

decay = tf.keras.callbacks.LearningRateScheduler(decay_func(0.1))

class MySequence(tf.keras.utils.Sequence):
    # 200 random batches: inputs (batch_size, 20, 30), targets (batch_size, 20, 10)
    def __init__(self, batch_size):
        super(MySequence, self).__init__()
        self.batch_size = batch_size

    def __len__(self):
        return 200

    def __getitem__(self, item):
        # (20, 1) offsets broadcast over the random (batch_size, 20, features) noise
        x = np.expand_dims(np.arange(20), axis=1) + np.random.rand(self.batch_size, 20, 30)
        y = np.expand_dims(np.arange(20, 40), axis=1) + np.random.rand(self.batch_size, 20, 10)
        return x, y

my_sequence = MySequence(batch_size=4)

def build_model():
    inputs = tf.keras.Input(shape=(20, 30))
    x = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(20))(inputs)
    x = tf.keras.layers.LSTM(20, return_sequences=True)(x)
    outputs = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(10))(x)
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    return model

model = build_model()
model.compile(optimizer='adam', loss='mae')

start_train = time.time()
callbacks = []
if DECAY:
    callbacks.append(decay)
history = model.fit_generator(generator=my_sequence, epochs=EPOCHS, callbacks=callbacks)
end = time.time()
min_train, sec_train = int((end - start_train) // 60), int((end - start_train) % 60)
print(f'Time to train: {min_train}min{sec_train}sec')
So when EAGER == False and DECAY == True, here is the output:
WARNING:tensorflow:From D:\...\VirtualEnv\lib\site-packages\tensorflow_core\python\ops\resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
2019-12-13 17:35:17.211443: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
Epoch 1/5
2019-12-13 17:35:17.604649: W tensorflow/c/c_api.cc:326] Operation '{name:'lstm/while' id:229 op device:{} def:{{{node lstm/while}} = While[T=[DT_INT32, DT_INT32, DT_INT32, DT_VARIANT, DT_FLOAT, ..., DT_VARIANT, DT_VARIANT, DT_VARIANT, DT_VARIANT, DT_VARIANT], _lower_using_switch_merge=true, _num_original_outputs=45, body=lstm_while_body_124[], cond=lstm_while_cond_123[], output_shapes=[[], [], [], [], [?,20], ..., [], [], [], [], []], parallel_iterations=32](lstm/while/loop_counter, lstm/while/maximum_iterations, lstm/time, lstm/TensorArrayV2_1, lstm/zeros, lstm/zeros_1, lstm/strided_slice_1, lstm/TensorArrayUnstack/TensorListFromTensor, lstm/kernel, lstm/recurrent_kernel, lstm/bias, lstm/while/EmptyTensorList, lstm/while/EmptyTensorList_1, lstm/while/EmptyTensorList_2, lstm/while/EmptyTensorList_3, lstm/while/EmptyTensorList_4, lstm/while/EmptyTensorList_5, lstm/while/EmptyTensorList_6, lstm/while/EmptyTensorList_7, lstm/while/EmptyTensorList_8, lstm/while/EmptyTensorList_9, lstm/while/EmptyTensorList_10, lstm/while/EmptyTensorList_11, lstm/while/EmptyTensorList_12, lstm/while/EmptyTensorList_13, lstm/while/EmptyTensorList_14, lstm/while/EmptyTensorList_15, lstm/while/EmptyTensorList_16, lstm/while/EmptyTensorList_17, lstm/while/EmptyTensorList_18, lstm/while/EmptyTensorList_19, lstm/while/EmptyTensorList_20, lstm/while/EmptyTensorList_21, lstm/while/EmptyTensorList_22, lstm/while/EmptyTensorList_23, lstm/while/EmptyTensorList_24, lstm/while/EmptyTensorList_25, lstm/while/EmptyTensorList_26, lstm/while/EmptyTensorList_27, lstm/while/EmptyTensorList_28, lstm/while/EmptyTensorList_29, lstm/while/EmptyTensorList_30, lstm/while/EmptyTensorList_31, lstm/while/EmptyTensorList_32, lstm/while/EmptyTensorList_33)}}' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.
200/200 [==============================] - 2s 10ms/step - loss: 5.8431
Epoch 2/5
200/200 [==============================] - 2s 8ms/step - loss: 4.6052
Epoch 3/5
200/200 [==============================] - 1s 7ms/step - loss: 4.5750
Epoch 4/5
200/200 [==============================] - 2s 8ms/step - loss: 4.5366
Epoch 5/5
200/200 [==============================] - 2s 8ms/step - loss: 4.4898
Time to train: 0min8sec
The model still seems to work, but with a bigger model it takes a long time (around 10 minutes) for TensorFlow to raise the warning, which is pretty annoying.
How can I resolve this?
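The step_decay schedule from the script can be evaluated on its own, without TensorFlow, to see which learning rates it actually produces. Keras passes 0-based epoch indices to the schedule, so with EPOCHS = 5 every epoch gets 0.1 and the rate never changes in this run; as far as I can tell from the TF 2.0 source, the callback still writes the value back to the optimizer at the start of every epoch. A minimal sketch that reproduces only the closure (no Keras involved):

```python
import math

def decay_func(lr_init):
    # Same closure as in the script: multiply the LR by 0.1 every 10 epochs
    def step_decay(epoch):
        return lr_init * math.pow(0.1, math.floor(epoch / 10))
    return step_decay

schedule = decay_func(0.1)

# Epoch indices as Keras would pass them (0-based)
for epoch in (0, 4, 9, 10, 19, 20):
    print(epoch, schedule(epoch))
```

Epochs 0 to 9 share the initial rate, 10 to 19 get roughly 0.01, and so on, so a run needs more than 10 epochs before the scheduler has any visible effect.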
Answer 1:
I ran into similar performance issues while upgrading my code from TensorFlow 1.15 to 2.0. I was using fit_generator(), which is unfortunately buggy: it literally executes everything eagerly if eager mode is enabled, instead of compiling a graph. I reported this as #35513, to which someone replied that fit_generator() is deprecated as of TF 2.1 and people should use fit() instead. However, I haven't yet managed to use fit() with a generator; that might be my own bug, and I'm not sure whether it is already supposed to work in TF 2.0. In any case, this is likely why you see slow training with eager mode enabled and why disabling it speeds things up. (By the way, this issue also causes insane GPU memory usage.)
However, due to another bug that I reported as #35501, TF 2.0 fails to use the cuDNN implementations of the LSTM and GRU layers when eager mode is disabled, which again makes training slower than I was used to with TF 1.15. If you have an Nvidia device, you definitely want cuDNN to be used, because it is a lot faster than the regular implementations.
If you want maximum training speed, you could use TF 2.0 with fit_generator(), leave eager mode enabled (to get the cuDNN benefits), and use model.compile(..., experimental_run_tf_function=False) to fall back to the old training function (or set model._experimental_run_tf_function = False if loading a model). Then upgrade to TF 2.1 as soon as it becomes available; a release candidate for 2.1 is already out.
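The recommendation above boils down to a choice that depends on the installed TensorFlow version. A minimal sketch that encodes it as a lookup; recommended_setup and its return keys are hypothetical names, not a TensorFlow API, and the function only restates what this answer suggests for TF 2.0 versus TF 2.1+:

```python
def recommended_setup(tf_version: str) -> dict:
    """Hypothetical helper: map a TF 2.x version string to the setup suggested above."""
    major, minor = (int(part) for part in tf_version.split(".")[:2])
    if major != 2:
        raise ValueError("this advice only covers TensorFlow 2.x")
    if minor < 1:
        # TF 2.0: keep eager mode on (for cuDNN) and fall back to the old training loop
        return {
            "eager": True,
            "train_call": "fit_generator",
            "compile_kwargs": {"experimental_run_tf_function": False},
        }
    # TF 2.1+: fit_generator() is deprecated; pass the Sequence to fit() instead
    return {"eager": True, "train_call": "fit", "compile_kwargs": {}}

print(recommended_setup("2.0.0"))
print(recommended_setup("2.1.0rc1"))
```

In a real script you would pass tf.__version__ and forward compile_kwargs to model.compile(...); "eager": True reflects that disabling eager mode costs you cuDNN.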
Edit: #35501 was closed as invalid. Apparently you can't have any cuDNN with eager mode disabled. This makes very little sense to me, but I can live with it. In the long term you want to use TF in the way it's intended to be used anyway, which is with eager mode enabled.
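The TensorFlow documentation for tf.keras.layers.LSTM lists the conditions under which the cuDNN kernel is used: activation is tanh, recurrent_activation is sigmoid, recurrent_dropout is 0, unroll is False, use_bias is True, any mask is strictly right-padded, and eager execution is enabled in the outermost context. The last condition matches why #35501 was closed. A minimal sketch that checks a layer configuration against those conditions; lstm_can_use_cudnn is a hypothetical helper, and its input is assumed to be a plain dict shaped like the relevant keys of LSTM.get_config():

```python
def lstm_can_use_cudnn(config: dict, eager_enabled: bool) -> bool:
    """Hypothetical helper: check the documented cuDNN criteria for an LSTM config.

    The masking criterion depends on the input data, not the config,
    so it is not checked here. A GPU is of course also required.
    """
    return bool(
        config.get("activation") == "tanh"
        and config.get("recurrent_activation") == "sigmoid"
        and config.get("recurrent_dropout", 0) == 0
        and not config.get("unroll", False)
        and config.get("use_bias", True)
        and eager_enabled  # disable_eager_execution() rules cuDNN out
    )

default_lstm = {
    "activation": "tanh",
    "recurrent_activation": "sigmoid",
    "recurrent_dropout": 0.0,
    "unroll": False,
    "use_bias": True,
}
print(lstm_can_use_cudnn(default_lstm, eager_enabled=True))   # True
print(lstm_can_use_cudnn(default_lstm, eager_enabled=False))  # False
```

The LSTM(20, return_sequences=True) layer in the question uses the default arguments, so in this sketch it only loses cuDNN because eager mode is disabled.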
Source: https://stackoverflow.com/questions/59319853/tf-2-0-w-operation-was-changed-when-disabling-eager-mode-and-using-a-callbac