How to run Tensorflow Estimator on multiple GPUs with data parallelism

青春惊慌失措 2021-01-31 06:17

I have a standard TensorFlow Estimator with some model and want to run it on multiple GPUs instead of just one. How can this be done using data parallelism?

I searched

5 Answers
  •  臣服心动
    2021-01-31 06:58

    I think tf.contrib.estimator.replicate_model_fn is a cleaner solution. The following is from the tf.contrib.estimator.replicate_model_fn documentation:

    ...
    def model_fn(...):  # See `model_fn` in `Estimator`.
      loss = ...
      optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
      optimizer = tf.contrib.estimator.TowerOptimizer(optimizer)
      if mode == tf.estimator.ModeKeys.TRAIN:
        #  See the section below on `EstimatorSpec.train_op`.
        return EstimatorSpec(mode=mode, loss=loss,
                             train_op=optimizer.minimize(loss))
    
      #  No change for `ModeKeys.EVAL` or `ModeKeys.PREDICT`.
      return EstimatorSpec(...)
    ...
    classifier = tf.estimator.Estimator(
      model_fn=tf.contrib.estimator.replicate_model_fn(model_fn))
    

    What you need to do is wrap the optimizer with tf.contrib.estimator.TowerOptimizer and model_fn() with tf.contrib.estimator.replicate_model_fn(). I followed the description and made a TPU SqueezeNet model work on a machine with 4 GPUs. My modifications are here.
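
    For completeness, here is a minimal end-to-end sketch of that pattern, assuming TensorFlow 1.x where tf.contrib is still available (replicate_model_fn was later deprecated in favor of tf.distribute). The toy regression model, the feature name 'x', and the random-data input_fn are hypothetical stand-ins for illustration, not part of the original answer:

    import numpy as np
    import tensorflow as tf

    def model_fn(features, labels, mode):
      # Toy stand-in for a real model: a single dense layer.
      predictions = tf.layers.dense(features['x'], units=1)
      if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)
      loss = tf.losses.mean_squared_error(labels, predictions)
      if mode == tf.estimator.ModeKeys.EVAL:
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss)
      optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
      # TowerOptimizer aggregates the gradients computed by each GPU tower.
      optimizer = tf.contrib.estimator.TowerOptimizer(optimizer)
      return tf.estimator.EstimatorSpec(
          mode=mode, loss=loss,
          train_op=optimizer.minimize(loss, global_step=tf.train.get_global_step()))

    # replicate_model_fn builds one tower per visible GPU and splits each
    # batch evenly across them, so batch_size should be divisible by the
    # number of GPUs.
    estimator = tf.estimator.Estimator(
        model_fn=tf.contrib.estimator.replicate_model_fn(model_fn))

    # Hypothetical random-data input_fn, just to make the sketch runnable.
    train_input_fn = tf.estimator.inputs.numpy_input_fn(
        x={'x': np.random.rand(1024, 8).astype(np.float32)},
        y=np.random.rand(1024, 1).astype(np.float32),
        batch_size=128, num_epochs=None, shuffle=True)

    estimator.train(input_fn=train_input_fn, steps=100)

    Each tower computes gradients on its shard of the batch, and TowerOptimizer averages them before applying a single update, which is exactly the data parallelism the question asks about.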
