Question
Recently, I have been using TensorFlow to develop an NMT system. To speed up training, I tried to train the system on multiple GPUs using the standard data-parallelism approach widely used in TensorFlow. For example, to run it on a machine with 8 GPUs, I first construct a large batch that is 8 times the size of the batch used on a single GPU. Then I split this large batch equally into 8 mini-batches and train them separately on the different GPUs. Finally, I collect the gradients from all GPUs to update the parameters. However, I find that when I use dynamic_rnn, the average time for one iteration on 8 GPUs is twice as long as for one iteration on a single GPU. I made sure the batch size on each GPU is the same. Does anyone have a better way to speed up RNN training in TensorFlow?
Source: https://stackoverflow.com/questions/47530023/data-parallelism-for-rnn-in-tensorflow
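
For reference, below is a minimal sketch of the tower-style data parallelism described in the question, assuming TensorFlow 1.x (the API current at the time of the post). The model, loss, and constants (`NUM_GPUS`, `HIDDEN`, `VOCAB`, `EMB`) are illustrative placeholders, not the asker's actual NMT model, and masking of padded time steps is omitted for brevity.

```python
import tensorflow as tf

NUM_GPUS = 8
HIDDEN, VOCAB, EMB = 256, 10000, 128

def tower_loss(inputs, lengths, labels):
    """Build one replica of the RNN and return its training loss."""
    embedding = tf.get_variable("emb", [VOCAB, EMB])
    cell = tf.nn.rnn_cell.LSTMCell(HIDDEN)
    outputs, _ = tf.nn.dynamic_rnn(
        cell, tf.nn.embedding_lookup(embedding, inputs),
        sequence_length=lengths, dtype=tf.float32)
    proj_w = tf.get_variable("proj_w", [HIDDEN, VOCAB])
    proj_b = tf.get_variable("proj_b", [VOCAB])
    logits = tf.tensordot(outputs, proj_w, axes=1) + proj_b
    # Loss masking of padded positions is omitted to keep the sketch short.
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=labels, logits=logits))

def average_gradients(tower_grads):
    """Average the (gradient, variable) lists produced by each tower."""
    averaged = []
    for grads_and_vars in zip(*tower_grads):
        grads = [g for g, _ in grads_and_vars]
        averaged.append((tf.reduce_mean(tf.stack(grads), axis=0),
                         grads_and_vars[0][1]))
    return averaged

# One large batch is fed in and split equally across the GPUs.
inputs  = tf.placeholder(tf.int32, [None, None])   # [big_batch, time]
lengths = tf.placeholder(tf.int32, [None])
labels  = tf.placeholder(tf.int32, [None, None])
in_shards  = tf.split(inputs,  NUM_GPUS, axis=0)
len_shards = tf.split(lengths, NUM_GPUS, axis=0)
lab_shards = tf.split(labels,  NUM_GPUS, axis=0)

optimizer = tf.train.AdamOptimizer(1e-3)
tower_grads = []
with tf.variable_scope(tf.get_variable_scope()):
    for i in range(NUM_GPUS):
        with tf.device("/gpu:%d" % i):
            loss = tower_loss(in_shards[i], len_shards[i], lab_shards[i])
            # All towers share the same weights.
            tf.get_variable_scope().reuse_variables()
            tower_grads.append(optimizer.compute_gradients(loss))

# Apply the averaged gradients once per iteration.
train_op = optimizer.apply_gradients(average_gradients(tower_grads))
```

Note that in this layout the shared weights are created on whichever device builds them first, so every iteration ships parameters out to each GPU and gradients back for averaging; that per-step communication cost comes on top of the RNN computation itself.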