I have a standard TensorFlow Estimator with a model, and I want to run it on multiple GPUs instead of just one. How can this be done using data parallelism?
I searched, and the standard example I found is: https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/contrib/learn/python/learn/estimators/estimator.py
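One way to make it data-parallel is to do everything inside your `model_fn`: loop over the available GPU devices, send one chunk of each batch to a copy (a "tower") of your model on each device, and then merge the results, for example by averaging the per-tower losses and gradients before applying a single update to the shared weights.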
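The split/compute/merge pattern can be sketched without TensorFlow to show why it is equivalent to training on the full batch. This is a framework-agnostic NumPy illustration, not the Estimator API: the linear model, the MSE loss, and the helper names `loss_and_grad` and `data_parallel_step` are hypothetical. In a TF 1.x `model_fn`, each loop iteration would instead build one tower under `tf.device('/gpu:%d' % i)` with shared variables.

```python
import numpy as np


def loss_and_grad(w, x, y):
    """Mean squared error of a linear model x @ w and its gradient w.r.t. w."""
    err = x @ w - y
    loss = np.mean(err ** 2)
    grad = 2.0 * x.T @ err / len(y)
    return loss, grad


def data_parallel_step(w, x, y, num_devices, lr=0.1):
    """One synchronous data-parallel SGD step over simulated 'devices'.

    The batch size must be divisible by num_devices (np.split raises
    ValueError otherwise), so every tower gets an equal-sized chunk.
    """
    x_chunks = np.split(x, num_devices)
    y_chunks = np.split(y, num_devices)
    tower_losses, tower_grads = [], []
    for x_i, y_i in zip(x_chunks, y_chunks):
        # Each tower sees only its chunk but uses the same shared weights w.
        loss_i, grad_i = loss_and_grad(w, x_i, y_i)
        tower_losses.append(loss_i)
        tower_grads.append(grad_i)
    # Merge step: average the per-tower results, then apply one update.
    loss = np.mean(tower_losses)
    grad = np.mean(tower_grads, axis=0)
    return w - lr * grad, loss, grad


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))
y = rng.normal(size=8)
w = np.zeros(3)
_, _, g_parallel = data_parallel_step(w, x, y, num_devices=4)
_, g_single = loss_and_grad(w, x, y)
print(np.allclose(g_parallel, g_single))  # True
```

Because the loss is a mean and the chunks are equal-sized, the average of the per-tower gradients equals the full-batch gradient, so the multi-GPU update matches the single-GPU one while each device only processes `batch_size / num_devices` examples.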