I've been messing with Keras, and like it so far. There's one big issue I have been having when working with fairly deep networks: when calling model.train_on_batch, or model
Both Theano and TensorFlow augment the symbolic graph that is created, though each does so differently.
To analyze how memory is being consumed, you can start with a smaller model and grow it, watching the corresponding growth in memory. Similarly, you can grow the batch_size and watch how memory grows with it.
Here is a code snippet that increases batch_size, based on your initial code:
# Note: uses the Keras 1.x layer API (Convolution2D(n, 3, 3), border_mode=...)
import os

import numpy as np
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense, Convolution2D, MaxPooling2D, Flatten


def gpu_memory():
    """Return the GPU memory (MiB) that nvidia-smi reports for this process."""
    out = os.popen("nvidia-smi").read()
    ret = '0MiB'
    for item in out.split("\n"):
        # Look for the process-table row that belongs to this Python process
        if str(os.getpid()) in item and 'python' in item:
            ret = item.strip().split(' ')[-2]
    return float(ret[:-3])  # strip the trailing 'MiB'


gpu_mem = []
gpu_mem.append(gpu_memory())  # before the model is built

model = Sequential()
model.add(Convolution2D(100, 3, 3, border_mode='same', input_shape=(16, 16, 1)))
model.add(Convolution2D(256, 3, 3, border_mode='same'))
model.add(Convolution2D(32, 3, 3, border_mode='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(4))
model.add(Dense(1))
model.summary()
gpu_mem.append(gpu_memory())  # after the model is built

model.compile(optimizer='sgd',
              loss='mse',
              metrics=['accuracy'])
gpu_mem.append(gpu_memory())  # after compile

batches = []
n_batches = 20
for ibatch in range(n_batches):
    # Grow the batch size by 10 on every iteration: 10, 20, ..., 200
    batch_size = (ibatch + 1) * 10
    batches.append(batch_size)
    x = np.random.rand(batch_size, 16, 16, 1)
    y = np.random.rand(batch_size, 1)
    print(y.shape)
    model.train_on_batch(x, y)
    print("Trained one iteration")
    gpu_mem.append(gpu_memory())

# The first three x values are placeholders for the three pre-training readings
fig = plt.figure()
plt.plot([-100, -50, 0] + batches, gpu_mem)
plt.show()
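The gpu_memory() helper above scrapes the human-readable nvidia-smi table, which is fragile if the table layout changes. As a hedged alternative, here is a minimal sketch that asks nvidia-smi for machine-readable CSV instead. The function names parse_used_memory and gpu_memory are my own, and I am assuming your driver's nvidia-smi supports the --query-compute-apps option; on failure the sketch simply reports 0.

```python
import os
import subprocess


def parse_used_memory(csv_text, pid):
    """Sum the MiB used by `pid` in nvidia-smi CSV rows of the form 'pid, used_memory'."""
    total = 0.0
    for line in csv_text.splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 2 or not fields[0].isdigit():
            continue  # skip blank or malformed rows
        if int(fields[0]) == pid:
            try:
                total += float(fields[1])
            except ValueError:
                pass  # memory not reported for this row, e.g. "[N/A]"
    return total


def gpu_memory(pid=None):
    """Return the GPU memory (MiB) used by `pid` (default: this process), or 0.0 on failure."""
    pid = os.getpid() if pid is None else pid
    try:
        out = subprocess.check_output(
            ["nvidia-smi",
             "--query-compute-apps=pid,used_memory",
             "--format=csv,noheader,nounits"],
            universal_newlines=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return 0.0  # nvidia-smi missing or returned an error
    return parse_used_memory(out, pid)
```

This version matches on the process ID only, so it does not depend on the process being listed under the name 'python', and it adds up the rows if the process appears on more than one GPU.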
Also, for speed, TensorFlow grabs all of the available GPU memory up front. To stop that, you need to add config.gpu_options.allow_growth = True in get_session():
# keras/backend/tensorflow_backend.py
def get_session():
    global _SESSION
    if tf.get_default_session() is not None:
        session = tf.get_default_session()
    else:
        if _SESSION is None:
            if not os.environ.get('OMP_NUM_THREADS'):
                config = tf.ConfigProto(allow_soft_placement=True)
            else:
                nb_thread = int(os.environ.get('OMP_NUM_THREADS'))
                config = tf.ConfigProto(intra_op_parallelism_threads=nb_thread,
                                        allow_soft_placement=True)
            # Added line: allocate GPU memory on demand instead of all at once
            config.gpu_options.allow_growth = True
            _SESSION = tf.Session(config=config)
        session = _SESSION
    if not _MANUAL_VAR_INIT:
        _initialize_variables()
    return session
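If you would rather not edit the Keras source, the same option can probably be set from your own script by handing Keras a pre-configured session. This is only a sketch: the helper name enable_gpu_memory_growth is mine, and it assumes TensorFlow 1.x (where tf.ConfigProto and tf.Session exist) with Keras running on the TensorFlow backend.

```python
def enable_gpu_memory_growth():
    """Give Keras a TensorFlow session that allocates GPU memory on demand.

    Assumes TensorFlow 1.x and the Keras TensorFlow backend.
    """
    # Imported here so the sketch only needs TensorFlow when it is called.
    import tensorflow as tf
    from keras import backend as K

    config = tf.ConfigProto(allow_soft_placement=True)
    config.gpu_options.allow_growth = True  # same flag as in get_session()
    session = tf.Session(config=config)
    K.set_session(session)  # Keras will use this session from now on
    return session
```

Call enable_gpu_memory_growth() once, before building the model, so that the first session Keras uses already has the option set.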
Now, if you run the previous snippet, you get plots like these:
[Plots of GPU memory against batch size: one for Theano, one for TensorFlow]
Theano: Whatever memory is in use after model.compile() almost doubles at the start of training. This is because Theano augments the symbolic graph to do back-propagation, and each tensor needs a corresponding tensor to carry the backward flow of gradients. The memory needs don't seem to grow with batch_size, which is unexpected to me, as the placeholder size should increase to accommodate the data flowing from CPU to GPU.
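To put rough numbers on what growth one might naively expect, here is a back-of-the-envelope estimate for the model in the snippet. It is a sketch under my own assumptions, not a description of what either backend actually allocates: float32 values, channels-last input of 16x16x1, one gradient tensor per weight tensor, one stored activation per layer output, and no framework workspace or cuDNN buffers. All names in it are illustrative.

```python
BYTES_PER_FLOAT32 = 4


def conv_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus biases of a 2D convolution layer."""
    return kernel_h * kernel_w * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units


# Parameters of the model in the snippet (16x16x1 input, 'same' padding).
n_params = (
    conv_params(3, 3, 1, 100)
    + conv_params(3, 3, 100, 256)
    + conv_params(3, 3, 256, 32)
    + dense_params(8 * 8 * 32, 4)   # after 2x2 max pooling and Flatten
    + dense_params(4, 1)
)

# Activation values produced per sample: three conv outputs, the pooled
# output, and the two dense outputs.
acts_per_sample = (
    16 * 16 * 100 + 16 * 16 * 256 + 16 * 16 * 32 + 8 * 8 * 32 + 4 + 1
)


def estimate_mib(batch_size):
    """Naive estimate (MiB) of weights + gradients + activations for one batch."""
    weights = n_params * BYTES_PER_FLOAT32
    gradients = weights  # one gradient value per weight
    activations = batch_size * acts_per_sample * BYTES_PER_FLOAT32
    return (weights + gradients + activations) / (1024.0 ** 2)


for batch_size in (10, 100, 200):
    print(batch_size, round(estimate_mib(batch_size), 1), "MiB")
```

Under these assumptions the weights and their gradients are a fixed cost, while the activations add roughly 0.4 MiB per sample, so a flat curve across batch sizes suggests the backend is reusing or pre-allocating buffers rather than allocating per batch.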
TensorFlow: No GPU memory is allocated even after model.compile(), because Keras does not call get_session(), which in turn calls _initialize_variables(), until after that point. TensorFlow seems to grab memory in chunks for speed, so memory does not grow linearly with batch_size.
Having said all that, TensorFlow seems to be memory hungry, but for big graphs it is very fast. Theano, on the other hand, is very efficient with GPU memory but takes a long time to initialize the graph at the start of training. After that, it is also pretty fast.