I am trying to train my model on a GPU instead of a CPU on an AWS p2.xlarge instance from my Jupyter Notebook. I am using the tensorflow-gpu backend (only tensorflow-gpu is installed).
That happens because you're using LSTM layers.

TensorFlow's standard LSTM implementation doesn't benefit much from a GPU. The likely reason is that recurrent calculations are sequential rather than parallel (each timestep depends on the previous one), while GPUs shine at parallel processing.
I have confirmed this in my own experience, and articles about using GPUs with TensorFlow report the same behavior.
You may try the new CuDNNLSTM layer, which appears to be built specifically for running on GPUs. I haven't tested it myself, but you will most probably get better performance with it.
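It should work as a drop-in replacement for LSTM. A minimal sketch, assuming the standalone Keras API on top of the tensorflow-gpu backend (the layer size and input shape below are just illustrative, not taken from your model):

```python
from keras.models import Sequential
from keras.layers import CuDNNLSTM, Dense

timesteps, features = 50, 10  # hypothetical input shape, adjust to your data

model = Sequential()
# CuDNNLSTM runs the whole recurrence in a fused cuDNN kernel on the GPU,
# but it only supports the default activations and has no recurrent_dropout.
model.add(CuDNNLSTM(128, input_shape=(timesteps, features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
```

Keep in mind that CuDNNLSTM only runs on a GPU, so a model built with it cannot be used on a CPU-only machine without swapping the layer back.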
Another thing I haven't tested, and I'm not sure it's designed for this reason, but I suspect it is: you can set unroll=True in your LSTM layers. With that, I suspect the recurrent calculations will be transformed into parallel ones.
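As a sketch of that idea (same illustrative shapes as above; the speed-up is not guaranteed, and unrolling requires a fixed sequence length and uses more memory):

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

timesteps, features = 50, 10  # hypothetical input shape, adjust to your data

model = Sequential()
# unroll=True replaces the symbolic loop over timesteps with an unrolled
# graph, which can be faster for short sequences at the cost of memory.
model.add(LSTM(128, unroll=True, input_shape=(timesteps, features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
```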