Keras shows no improvement in training speed with GPU (partial GPU usage?!)

独厮守ぢ 2021-02-07 20:49

I am trying to train my model on a GPU instead of a CPU on an AWS p2.xlarge instance from my Jupyter Notebook. I am using the tensorflow-gpu backend (only tensorflow-gpu…

3 Answers
  • 2021-02-07 21:05

    That happens because you're using LSTM layers.

    TensorFlow's implementation of LSTM layers is not that fast. The likely reason is that recurrent calculations are inherently sequential, while GPUs excel at parallel processing.

    I confirmed this from my own experience:

    • Got terrible speed using LSTMs in my model
    • Decided to test the model after removing all LSTMs (leaving a purely convolutional model)
    • The resulting speed was simply astonishing!!!

    This article about using GPUs with TensorFlow also confirms this:

    • http://minimaxir.com/2017/07/cpu-or-gpu/

    A possible solution?

    You may try the new CuDNNLSTM layer, which appears to be designed specifically for GPUs.

    I have never tested it, but you will most probably get better performance with it.

    Another thing that I haven't tested, and I'm not sure it was designed for this reason, but I suspect it was: you can set unroll=True in your LSTM layers. With that, I suspect the recurrent calculations will be transformed into parallel ones. Both options are sketched below.
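
    A minimal sketch of both options, assuming Keras 2.x with the tensorflow-gpu backend (the layer size and input shape are placeholders, not taken from the question):

    from keras.models import Sequential
    from keras.layers import Dense, CuDNNLSTM

    timesteps, features = 50, 10   # placeholder input shape

    model = Sequential()
    # CuDNNLSTM runs on NVIDIA's cuDNN kernels; it accepts fewer arguments than
    # LSTM (no custom activation, no recurrent_dropout) and requires a GPU.
    model.add(CuDNNLSTM(128, input_shape=(timesteps, features)))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')

    # Alternative (untested): keep a plain LSTM but unroll it. Unrolling needs a
    # fixed number of timesteps and uses more memory.
    # from keras.layers import LSTM
    # model.add(LSTM(128, input_shape=(timesteps, features), unroll=True))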

  • 2021-02-07 21:13

    Try a larger value for batch_size in model.fit, because the default is 32. Increase it until you reach 100% GPU utilization.
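
    As a rough illustration (this assumes the model and the X_np / y_np arrays from the question; the candidate batch sizes are arbitrary), you can time one epoch at increasing batch sizes while watching nvidia-smi in another terminal:

    import time

    # Sketch: `model`, `X_np` and `y_np` are assumed to exist already.
    # Monitor utilization with `watch -n 1 nvidia-smi` while this runs.
    for bs in (32, 128, 512, 1024):          # arbitrary candidate batch sizes
        start = time.time()
        model.fit(X_np, y_np, epochs=1, batch_size=bs, verbose=0)
        print('batch_size=%d: %.1f s/epoch' % (bs, time.time() - start))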

    Following the suggestion from @dgumo, you can also put your data into /run/shm. This is an in-memory file system, which allows data to be accessed in the fastest possible way. Alternatively, ensure that your data at least resides on an SSD, for example in /tmp.
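
    A small sketch of that idea (the file names and paths are purely illustrative):

    import shutil
    import numpy as np

    # Copy the dataset into the in-memory tmpfs mount, then load it from there.
    shutil.copy('/home/ubuntu/data/train.npy', '/run/shm/train.npy')
    X_np = np.load('/run/shm/train.npy')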

  • 2021-02-07 21:13

    The bottleneck in your case is transferring data to and from the GPU. The best way to speed up your computation (and maximize your GPU usage) is to load as much of your data as your memory can hold at once. Since you have plenty of memory, you can load all your data in a single batch by doing:

    model.fit(X_np, y_np, epochs=100, validation_split=0.25, batch_size=X_np.shape[0])
    

    (You should probably also increase the number of epochs when doing this.)

    Note, however, that there are advantages to minibatching (e.g. better handling of local minima), so you should probably consider a batch_size somewhere in between.
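
    For instance (1024 is just an arbitrary middle ground between the default of 32 and the full dataset):

    # Compromise: large enough to keep the GPU busy, small enough to keep some
    # of the regularizing noise of minibatching.
    model.fit(X_np, y_np, epochs=100, validation_split=0.25, batch_size=1024)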
