I am trying to train my model on a GPU instead of the CPU on an AWS p2.xlarge instance from my Jupyter Notebook. I am using the tensorflow-gpu backend (only tensorflow-gpu is installed, not plain tensorflow).
That happens because you're using LSTM layers.
TensorFlow's implementation of LSTM layers is not very efficient on GPUs. The likely reason is that recurrent calculations are sequential rather than parallel (each timestep depends on the previous one), while GPUs excel at parallel processing.
My own experiments confirmed this, and this article about using GPUs with TensorFlow reports the same behaviour.
You may try the new CuDNNLSTM layer, which is implemented specifically for NVIDIA GPUs via cuDNN.
I never tested it myself, but you'll most probably get much better performance with it.
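As a minimal sketch (the layer sizes, input shape, and loss are placeholders, not taken from your model), swapping the layer class is usually all that is needed:

from keras.models import Sequential
from keras.layers import CuDNNLSTM, Dense  # CuDNNLSTM requires Keras >= 2.0.9 and the tensorflow-gpu backend

# Hypothetical shapes -- replace with the ones from your own data
timesteps, features, n_classes = 100, 8, 3

model = Sequential()
# Drop-in replacement for LSTM; note it does not accept activation or
# recurrent_dropout arguments (it uses the fixed cuDNN implementation)
model.add(CuDNNLSTM(64, input_shape=(timesteps, features)))
model.add(Dense(n_classes, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy')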
Another thing that I haven't tested, and I'm not sure it was designed for this reason, but I suspect it was: you can set unroll=True in your LSTM layers. With unrolling, the recurrent loop is expanded into a flat graph, which gives the GPU more operations it can run in parallel, at the cost of more memory and a fixed number of timesteps.
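A small sketch of that idea (the layer size and shapes are again placeholders); because unrolling materializes every timestep in the graph, it only makes sense for fairly short sequences:

from keras.layers import LSTM

timesteps, features = 100, 8  # placeholders -- use the shape of your own data

# unroll=True replaces the symbolic loop over timesteps with a flat graph,
# which can expose more parallelism to the GPU but increases memory use
# and requires the timestep dimension to be fixed.
lstm_layer = LSTM(64, unroll=True, input_shape=(timesteps, features))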
Try a bigger value for batch_size in model.fit, because the default is only 32. Increase it until you reach 100% GPU utilization (you can watch it with nvidia-smi).
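For example, assuming model, x_train and y_train stand for your compiled model and training arrays (256 is just a starting point, not a recommendation from the answer -- keep doubling it while GPU memory allows):

# Default batch_size is 32; larger batches give the GPU more parallel work per step
model.fit(x_train, y_train, epochs=10, batch_size=256, validation_split=0.25)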
Following the suggestion from @dgumo, you can also put your data into /run/shm. This is an in-memory file system, which gives the fastest possible access to the data. Alternatively, you can at least make sure your data resides on an SSD, for example in /tmp.
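A small sketch of that idea in Python (the file names and array keys are hypothetical; the point is just to copy the data once and then always read it from the in-memory path):

import os
import shutil
import numpy as np

src = "data/train.npz"        # hypothetical location of your dataset on disk
dst = "/run/shm/train.npz"    # /run/shm is backed by RAM (tmpfs)

if not os.path.exists(dst):
    shutil.copy(src, dst)

with np.load(dst) as data:    # subsequent reads come from memory, not disk
    X_np, y_np = data["X"], data["y"]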
The bottleneck in your case is transferring data to and from the GPU. The best way to speed up your computation (and maximize your GPU usage) is to load as much of your data as your memory can hold at once. Since you have plenty of memory, you can pass all of your data in a single batch by doing:
model.fit(X_np, y_np, epochs=100, validation_split=0.25, batch_size=X_np.shape[0])
(You should probably also increase the number of epochs when doing this, since each epoch is now only a single gradient update.)
Note, however, that mini-batching has advantages of its own (for example, the gradient noise can help escape poor local minima), so you should probably choose a batch_size somewhere in between; see the sketch below.
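For instance, something like this (the exact value is a judgment call; powers of two that still fit in GPU memory are a common choice):

# A compromise between one huge batch and the default of 32
model.fit(X_np, y_np, epochs=100, validation_split=0.25, batch_size=1024)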