This is a follow-up to my question posted here: Memory error with larger images when running convolutional neural network using TensorFlow on AWS instance g2.2xlarge
It would be good if you could upload your code, or at least a minimal example, so we can see what is going on. Just looking at these numbers, it seems allow_growth
is working as it should, that is, it is only allocating the amount of memory it actually needs (the 2.148 GiB you calculated above).
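For reference, here is a minimal sketch of how allow_growth is typically enabled in TF 1.x; this assumes you are building the session yourself, so adjust it to however your training script creates its session:

```python
import tensorflow as tf

# Ask the BFC allocator to start small and grow GPU memory on demand,
# instead of reserving nearly all GPU memory up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

sess = tf.Session(config=config)
```

With this set, the numbers you see in the allocator log should track what your graph actually uses, which is consistent with the 2.148 GiB figure.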
Also, can you provide the full console output of the error you are getting? My guess is that you are confusing a non-fatal warning message from the TF resource allocator with the actual error that is causing your program to fail.
Is this similar to the message that you are seeing?
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_1_bfc) ran out of memory trying to allocate 2.55GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
This is just a warning that you can ignore unless you want to optimize the runtime performance of your code. It is not something that will cause your program to fail.
Looking at the error log, it looks like either you are running out of GPU memory or the tensor is not initialized at that point. You can try inserting Tensor::IsInitialized before the line where the problem starts (99) to make sure it's the GPU; if it is, you may have some code still running on the GPU from previous attempts, so make sure that is not happening. There are two discussions that I think may be relevant to your problem, here: https://github.com/tensorflow/tensorflow/issues/7025 and here: https://github.com/aymericdamien/TensorFlow-Examples/issues/38 Good luck
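If you are working from Python rather than C++, a rough way to rule out the uninitialized-tensor case is TF 1.x's report_uninitialized_variables; this is a sketch assuming a standard graph/session workflow, with a hypothetical variable x standing in for whatever tensor line 99 touches:

```python
import tensorflow as tf

# Hypothetical variable standing in for the tensor suspected at line 99.
x = tf.Variable(tf.zeros([2, 2]), name="x")
init = tf.global_variables_initializer()

with tf.Session() as sess:
    # Before running the initializer, 'x' is reported as uninitialized.
    print(sess.run(tf.report_uninitialized_variables()))
    sess.run(init)
    # After initialization the returned list should be empty.
    print(sess.run(tf.report_uninitialized_variables()))
```

If the list is empty right before the failing line, the initialization hypothesis is out and GPU memory (or a stale process holding it, which you can check with nvidia-smi) is the more likely culprit.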