Question
I've been working on a dataset of shape (1000, 3253) using a CNN. I'm computing gradients with tf.GradientTape, but the script keeps running out of memory. Yet if I remove the line that appends a gradient to a list, the script runs through all the epochs. I'm not sure why this happens, but I am also new to TensorFlow and to GradientTape. Any advice or input would be appreciated.
#create a batch loop
for x, y_true in train_dataset:
    #create a tape to record actions
    with tf.GradientTape(watch_accessed_variables=False) as tape:
        x_var = tf.Variable(x)
        tape.watch([model.trainable_variables,x_var])
        y_pred = model(x_var,training=True)
        tape.stop_recording()
        loss = los_func(y_true, y_pred)
    epoch_loss_avg.update_state(loss)
    epoch_accuracy.update_state(y_true, y_pred)
    #pdb.set_trace()
    gradients,something = tape.gradient(loss, (model.trainable_variables,x_var))
    #sa_input.append(tape.gradient(loss, x_var))
    del tape
    #apply gradients
    sa_input.append(something)
    opti_func.apply_gradients(zip(gradients, model.trainable_variables))
train_loss_results.append(epoch_loss_avg.result())
train_accuracy_results.append(epoch_accuracy.result())
Answer 1:
Since you are new to TF2, I would recommend going through this guide. It covers training, evaluation, and prediction (inference) with models in TensorFlow 2.0 in two broad situations:
- When using built-in APIs for training & validation (such as model.fit(), model.evaluate(), model.predict()). This is covered in the section "Using built-in training & evaluation loops".
- When writing custom loops from scratch using eager execution and the GradientTape object. This is covered in the section "Writing your own training & evaluation loops from scratch".
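As a rough illustration of the second situation, here is a minimal sketch of a custom loop that, like the loop in the question, takes gradients with respect to both the weights and the input batch. The tiny Dense model, the random data, and the names loss_fn, optimizer, and input_gradients are hypothetical stand-ins, not the asker's code. The sketch uses the default tape (which watches trainable variables automatically) rather than watch_accessed_variables=False. Each input gradient is copied to a NumPy array before being appended, so the list holds host memory instead of keeping device tensors alive. Whether that alone resolves the OOM in the question is an assumption.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-ins for the question's model, loss, optimizer and data.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(5,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(3),
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

features = np.random.rand(32, 5).astype("float32")
labels = np.random.randint(0, 3, size=(32,))
train_dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(8)

input_gradients = []  # NumPy arrays in host memory, not device tensors

for x, y_true in train_dataset:
    with tf.GradientTape() as tape:
        tape.watch(x)  # x is a plain tensor, so it must be watched explicitly
        y_pred = model(x, training=True)
        loss = loss_fn(y_true, y_pred)
    # One call returns gradients for both the weights and the input batch.
    weight_grads, x_grad = tape.gradient(loss, (model.trainable_variables, x))
    optimizer.apply_gradients(zip(weight_grads, model.trainable_variables))
    # Copy to NumPy so the growing list does not pin tensors on the accelerator.
    input_gradients.append(x_grad.numpy())

# Stack the per-batch gradients into one array with the same shape as the data.
all_input_gradients = np.concatenate(input_gradients, axis=0)
```

If only a summary of the input gradients is needed (for example a mean over the batch), reducing each gradient before appending keeps the list smaller still.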
Below is a program that computes the gradients after every epoch and appends them to a list. At the end of the program, the list is converted to an array for simplicity.
Code - This program throws an OOM error if I use a deeper network with many layers and larger filter sizes.
# Import dependencies (the %tensorflow_version magic below works only in Colab)
%tensorflow_version 2.x
from tensorflow import keras
from tensorflow.keras import backend as K
from tensorflow.keras import datasets
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout, Flatten, Conv2D, MaxPooling2D
from tensorflow.keras.layers import BatchNormalization
import numpy as np
import tensorflow as tf
# Import Data
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
# Build Model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(32,32, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(10))
# Model Summary
model.summary()
# Model Compile
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
# Define the Gradient Function
epoch_gradient = []
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
@tf.function
def get_gradient_func(model):
    # Runs the whole training set through the model in a single forward pass
    with tf.GradientTape() as tape:
        logits = model(train_images, training=True)
        loss = loss_fn(train_labels, logits)
    grad = tape.gradient(loss, model.trainable_weights)
    model.optimizer.apply_gradients(zip(grad, model.trainable_variables))
    return grad
# Define the Required Callback Function
class GradientCalcCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        # Compute the gradients once per epoch and keep them in the list
        grad = get_gradient_func(model)
        epoch_gradient.append(grad)
epoch = 4
print(train_images.shape, train_labels.shape)
model.fit(train_images, train_labels, epochs=epoch, validation_data=(test_images, test_labels), callbacks=[GradientCalcCallback()])
# Convert to a 2-dimensional array of (epoch, gradients) type
gradient = np.asarray(epoch_gradient)
print("Total number of epochs run:", epoch)
Output -
Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_12 (Conv2D)           (None, 30, 30, 32)        896
_________________________________________________________________
max_pooling2d_8 (MaxPooling2 (None, 15, 15, 32)        0
_________________________________________________________________
conv2d_13 (Conv2D)           (None, 13, 13, 64)        18496
_________________________________________________________________
max_pooling2d_9 (MaxPooling2 (None, 6, 6, 64)          0
_________________________________________________________________
conv2d_14 (Conv2D)           (None, 4, 4, 64)          36928
_________________________________________________________________
flatten_4 (Flatten)          (None, 1024)              0
_________________________________________________________________
dense_11 (Dense)             (None, 64)                65600
_________________________________________________________________
dense_12 (Dense)             (None, 10)                650
=================================================================
Total params: 122,570
Trainable params: 122,570
Non-trainable params: 0
_________________________________________________________________
(50000, 32, 32, 3) (50000, 1)
Epoch 1/4
1563/1563 [==============================] - 109s 70ms/step - loss: 1.7026 - accuracy: 0.4081 - val_loss: 1.4490 - val_accuracy: 0.4861
Epoch 2/4
1563/1563 [==============================] - 145s 93ms/step - loss: 1.2657 - accuracy: 0.5506 - val_loss: 1.2076 - val_accuracy: 0.5752
Epoch 3/4
1563/1563 [==============================] - 151s 96ms/step - loss: 1.1103 - accuracy: 0.6097 - val_loss: 1.1122 - val_accuracy: 0.6127
Epoch 4/4
1563/1563 [==============================] - 152s 97ms/step - loss: 1.0075 - accuracy: 0.6475 - val_loss: 1.0508 - val_accuracy: 0.6371
Total number of epochs run: 4
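To see where the memory is likely to go, the numbers in the summary above can be turned into a rough estimate. The sketch below assumes every tensor is stored as float32 (4 bytes per element) and counts only the layer outputs, the weight gradients, and an input gradient with the question's (1000, 3253) shape. It ignores framework overhead and any buffers TensorFlow allocates internally, so treat the results as order-of-magnitude figures. The helper tensor_bytes is a hypothetical name introduced for this example.

```python
from math import prod

BYTES_PER_FLOAT32 = 4  # assumption: tensors are stored as float32


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the storage size in bytes of a dense tensor with this shape."""
    return prod(shape) * bytes_per_element


# Per-sample output shapes copied from the model summary above.
layer_output_shapes = [
    (30, 30, 32),  # conv2d_12
    (15, 15, 32),  # max_pooling2d_8
    (13, 13, 64),  # conv2d_13
    (6, 6, 64),    # max_pooling2d_9
    (4, 4, 64),    # conv2d_14
    (1024,),       # flatten_4
    (64,),         # dense_11
    (10,),         # dense_12
]
num_train_images = 50000  # CIFAR-10 training set size printed in the output
total_params = 122570     # "Total params" from the summary

# Layer outputs for one image, then for the whole training set in one pass.
activations_per_sample = sum(prod(shape) for shape in layer_output_shapes)
full_batch_activation_bytes = tensor_bytes(
    (num_train_images, activations_per_sample)
)

# One gradient value per trainable parameter is appended each epoch.
gradient_bytes_per_epoch = tensor_bytes((total_params,))

# A gradient with respect to the input has the same shape as the input.
question_input_gradient_bytes = tensor_bytes((1000, 3253))

print(f"activations, full training set: {full_batch_activation_bytes / 1e9:.2f} GB")
print(f"weight gradients per epoch:     {gradient_bytes_per_epoch / 1e6:.2f} MB")
print(f"input gradients, (1000, 3253):  {question_input_gradient_bytes / 1e6:.2f} MB")
```

Under these assumptions, a single forward pass over all 50,000 images produces roughly 10 GB of layer outputs, while the appended weight gradients add only about 0.5 MB per epoch. That suggests the full-batch call in get_gradient_func, not the list itself, dominates memory in this program. In the question's setup, storing input gradients for the whole dataset adds about 13 MB per epoch, which grows linearly with the number of epochs.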
Hope this answers your question. Happy Learning.
Source: https://stackoverflow.com/questions/61843077/out-of-memory-oom-using-tensorflow-gradient-tape-but-only-happens-when-i-append