Memory leak with TensorFlow

Submitted by 蹲街弑〆低调 on 2021-02-07 11:24:46

Question


I have a memory leak with TensorFlow. I referred to Tensorflow : Memory leak even while closing Session? to address my issue, and I followed the advice of the answer, which seemed to solve the problem. However, it does not work here.

In order to recreate the memory leak, I have created a simple example. First, I use this function (which I got here: How to get current CPU and RAM usage in Python?) to check the memory use of the Python process:

def memory():
    import os
    import psutil
    pid = os.getpid()
    py = psutil.Process(pid)
    memoryUse = py.memory_info()[0] / 2.**30  # resident set size (RSS) in GB
    print('memory use:', memoryUse)

Then, every time I call the build_model function, the memory use increases.

Here is the build_model function that has a memory leak :

import tensorflow as tf

def build_model():

    '''Model'''

    tf.reset_default_graph()


    with tf.Graph().as_default(), tf.Session() as sess:
        tf.contrib.keras.backend.set_session(sess)

        labels = tf.placeholder(tf.float32, shape=(None, 1))
        input = tf.placeholder(tf.float32, shape=(None, 1))

        x = tf.contrib.keras.layers.Dense(30, activation='relu', name='dense1')(input)
        x1 = tf.contrib.keras.layers.Dropout(0.5)(x)
        x2 = tf.contrib.keras.layers.Dense(30, activation='relu', name='dense2')(x1)
        y = tf.contrib.keras.layers.Dense(1, activation='sigmoid', name='dense3')(x2)


        loss = tf.reduce_mean(tf.contrib.keras.losses.binary_crossentropy(labels, y))

        train_step = tf.train.AdamOptimizer(0.004).minimize(loss)

        #Initialize all variables
        init_op = tf.global_variables_initializer()
        sess.run(init_op)

        sess.close()

    tf.reset_default_graph()

    return 

I would have thought that using the with tf.Graph().as_default(), tf.Session() as sess: block, then closing the session and calling tf.reset_default_graph(), would clear all the memory used by TensorFlow. Apparently it does not.

The memory leak can be recreated as follows:

memory()
build_model()
memory()
build_model()
memory()

The output of this is (on my machine):

memory use: 0.1794891357421875
memory use: 0.184417724609375
memory use: 0.18923568725585938

Clearly, not all the memory used by TensorFlow is freed afterwards. Why?

I plotted the memory use over 100 iterations of calling build_model, and this is what I get:

[plot not shown: memory use over 100 calls to build_model]

I think that goes to show that there is a memory leak.


Answer 1:


The problem was due to TensorFlow version 0.11. As of today, TensorFlow 0.12 is out and the bug is resolved. Upgrade to a newer version and it should work as expected. Don't forget to call tf.contrib.keras.backend.clear_session() at the end.
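
A minimal sketch of where that call fits, assuming the same build_model structure as in the question:

import tensorflow as tf

def build_model():
    with tf.Graph().as_default(), tf.Session() as sess:
        tf.contrib.keras.backend.set_session(sess)
        # ... define placeholders, layers, loss, and optimizer as in the question ...
        sess.run(tf.global_variables_initializer())
    # Ask the Keras backend to drop its reference to the graph/session,
    # so those objects can actually be garbage-collected.
    tf.contrib.keras.backend.clear_session()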




Answer 2:


I had this same problem. TensorFlow (v2.0.0) was consuming roughly 0.3 GB every epoch in an LSTM model I was training. I discovered that the TensorFlow callback hooks were the main culprit. I removed the TensorBoard callback and it worked fine afterwards:

history = model.fit(
    train_x,
    train_y,
    epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    validation_data=(test_x, test_y),
    callbacks=[tensorboard, checkpoint]
)
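
For clarity, here is a sketch of the fixed call with the TensorBoard callback dropped (assuming the checkpoint callback is kept, as the answer only names TensorBoard as the culprit):

history = model.fit(
    train_x,
    train_y,
    epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    validation_data=(test_x, test_y),
    callbacks=[checkpoint],  # TensorBoard callback removed to stop the per-epoch leak
)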



Answer 3:


Normally the training loop sits outside of the session. I think what is happening here is that every call runs init_op = tf.global_variables_initializer() again, and each run allocates more memory. If the loop were outside the session, the variables would be initialized only once; here they are re-initialized on every call, and those allocations stay in memory.

Edit, since the memory issue persists:

The problem is possibly the graph: each call creates a new graph, which holds on to memory. Try removing the explicit tf.Graph() and running again; without it, all operations go on the default graph. You probably also need some kind of memory-flushing mechanism outside of TensorFlow, because each run stacks up another graph.
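
One hypothetical way to get such a flush from outside TensorFlow (my sketch, not part of the original answer) is to run each build in a child process, so the operating system reclaims all of its memory when the process exits:

import multiprocessing as mp

def run_isolated(fn, *args):
    # Run fn in a child process; everything it allocated is
    # returned to the OS when the process exits.
    p = mp.Process(target=fn, args=args)
    p.start()
    p.join()

run_isolated(build_model)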




Answer 4:


I faced something similar in TF 1.12 as well. Don't create the graph and session for every iteration. Every time the graph is created and the variables are initialized, you are not redefining the old graph but creating new ones, leading to memory leaks. I was able to solve this by defining the graph once and then passing the session to my iterative logic; a sketch of that pattern follows the quote below.

From How not to program Tensorflow:

  • Be conscious of when you’re creating ops, and only create the ones you need. Try to keep op creation distinct from op execution.
  • Especially if you’re just working with the default graph and running interactively in a regular REPL or a notebook, you can end up with a lot of abandoned ops in your graph. Every time you re-run a notebook cell that defines any graph ops, you aren’t just redefining ops—you’re creating new ones.
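
A minimal sketch of the define-once pattern described above, assuming TF 1.x; the layer sizes mirror the question's model, and train_batches is a hypothetical iterable of (x, y) batches:

import tensorflow as tf

# Build the graph exactly once...
graph = tf.Graph()
with graph.as_default():
    inputs = tf.placeholder(tf.float32, shape=(None, 1))
    labels = tf.placeholder(tf.float32, shape=(None, 1))
    hidden = tf.layers.dense(inputs, 30, activation=tf.nn.relu)
    preds = tf.layers.dense(hidden, 1, activation=tf.nn.sigmoid)
    loss = tf.reduce_mean(tf.keras.losses.binary_crossentropy(labels, preds))
    train_step = tf.train.AdamOptimizer(0.004).minimize(loss)
    init_op = tf.global_variables_initializer()

# ...then reuse one session for all iterations, instead of
# creating a fresh graph and session per call.
sess = tf.Session(graph=graph)
sess.run(init_op)
for batch_x, batch_y in train_batches:
    sess.run(train_step, feed_dict={inputs: batch_x, labels: batch_y})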

Also, see this great answer for better understanding.




Answer 5:


This memory leak issue was resolved in the recent stable version, TensorFlow 1.15.0. I ran the code from the question and I see almost no leak, as shown below. There were lots of performance improvements in the recent stable TF 1.15 and TF 2.0 releases.

memory use: 0.4033699035644531
memory use: 0.4062042236328125
memory use: 0.4088172912597656

Please check the colab gist here. Thanks!



Source: https://stackoverflow.com/questions/44327803/memory-leak-with-tensorflow
