TensorFlow OOM on GPU


Take a look at this note from the TensorFlow CNN tutorial:

Be careful not to run the evaluation and training binary on the same GPU or else you might run out of memory. Consider running the evaluation on a separate GPU if available or suspending the training binary while running the evaluation on the same GPU.

https://www.tensorflow.org/tutorials/deep_cnn
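If you do have to share one GPU between evaluation and training, one option is to stop TensorFlow from pre-allocating the whole card so the second process has room. A minimal sketch, assuming TensorFlow 2.x (this must run before any GPU op executes):

import tensorflow as tf

# Let TensorFlow allocate GPU memory on demand instead of grabbing
# the whole card up front, leaving room for a second process.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)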

I resolved this issue by reducing batch_size to 52. The most direct way to reduce memory use is to reduce the batch size.

The workable batch_size depends on your GPU: the model of the card, the amount of VRAM, cache size, and so on.
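For example, with the Keras API the batch size is just an argument to model.fit. A minimal, self-contained sketch (the data and model here are toy placeholders; only the batch_size argument matters):

import numpy as np
import tensorflow as tf

# Toy data and model just to show where batch_size goes.
x = np.random.rand(1000, 32).astype('float32')
y = np.random.randint(0, 2, size=(1000,))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(32,)),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')

# Smaller batch_size means less activation memory per training step.
model.fit(x, y, batch_size=52, epochs=2)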

Please also refer to this other Stack Overflow link.

I came across the same problem. I shut down all the Anaconda Prompt windows and killed all the Python tasks, then reopened an Anaconda Prompt window and ran the train.py file. It worked the next time. The idle Anaconda and Python terminals were holding GPU memory, leaving no room for the training process.

Also, try reducing the batch size of the training process if the above approach doesn't work.

Hope this helps 👍

When you hit OOM on the GPU, I believe changing the batch size is the right option to try first.

Different GPUs need different batch sizes, depending on how much GPU memory you have.

I recently faced a similar problem and tweaked a lot of settings while running different kinds of experiments.

Here is the link to the question (also some tricks are included).
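One trick along those lines is to let the script fall back to smaller batch sizes automatically when it hits OOM. A rough sketch, not a definitive recipe (build_model, x, and y are placeholders, and note that an in-process OOM is not always cleanly recoverable):

import tensorflow as tf

def fit_with_fallback(build_model, x, y, batch_sizes=(256, 128, 64, 32)):
    # build_model is a function returning a fresh compiled model.
    for bs in batch_sizes:
        try:
            tf.keras.backend.clear_session()  # drop graphs from failed attempts
            model = build_model()
            model.fit(x, y, batch_size=bs, epochs=1)
            return model, bs
        except tf.errors.ResourceExhaustedError:
            print(f'OOM at batch_size={bs}, retrying with a smaller one...')
    raise RuntimeError('Even the smallest batch size ran out of memory')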

However, while reducing the batch size you may find that training gets slower. So if you have multiple GPUs, you can use them. To check your GPUs, run this in a terminal:

nvidia-smi

It will show you the relevant information about your GPUs: utilization, memory use, and which processes are holding memory.
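If nvidia-smi shows more than one card, here is a hedged sketch of spreading training across them with tf.distribute.MirroredStrategy (assuming TensorFlow 2.x; the model is a toy placeholder):

import tensorflow as tf

# Replicates the model on every visible GPU and splits each batch across
# them, so the per-GPU memory footprint shrinks for the same global batch.
strategy = tf.distribute.MirroredStrategy()
print('Replicas:', strategy.num_replicas_in_sync)

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu', input_shape=(32,)),
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy')
# model.fit(...) will then distribute each batch across the replicas.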

I have recently had a very similar error and it was due to accidentally having a training process running in the background while trying to train in a different process. Stopping one fixed the error immediately.

I had the same OOM problem when running model permutations one after another. It seems that after completing one model and then defining and running a new one, GPU memory is NOT completely cleared of the previous model(s); something builds up in memory and eventually causes the OOM error.

An answer from g-eoj to another question:

keras.backend.clear_session()

should clear the previous model. From https://keras.io/backend/: "Destroys the current TF graph and creates a new one. Useful to avoid clutter from old models / layers." So after running and saving one model, clear the session, then run the next model.
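In practice that looks like the following sketch (the hyperparameter sweep and model here are hypothetical placeholders):

import tensorflow as tf
from tensorflow import keras

for units in (32, 64, 128):  # hypothetical model permutations
    model = keras.Sequential([
        keras.layers.Dense(units, activation='relu', input_shape=(32,)),
        keras.layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy')
    # ... model.fit(...) and model.save(...) would go here ...
    keras.backend.clear_session()  # destroy the old graph before the next run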
