Distributed tensorflow with multiple gpu

旧巷少年郎 2021-01-17 05:31

It seems that tf.train.replica_device_setter doesn't allow specifying which GPU to work with.

What I want to do is something like this:

 with tf.devi         


        
2 Answers
  • 2021-01-17 05:53

    I didn't check previous versions, but in TensorFlow 1.4/1.5 you can specify the GPU through the worker_device argument of replica_device_setter, e.g. replica_device_setter(worker_device='/job:worker/task:%d/gpu:%d' % (FLAGS.task_index, i), cluster=self.cluster).

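    See tensorflow/python/training/device_setter.py, lines 199-202, which set the default ps_ops: only ops of these types are placed on the parameter server, and every other op goes to worker_device: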

    if ps_ops is None:
      # TODO(sherrym): Variables in the LOCAL_VARIABLES collection should not be
      # placed in the parameter server.
      ps_ops = ["Variable", "VariableV2", "VarHandleOp"]

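    Thanks to @Yaroslav Bulatov for the code in the other answer. However, it differs from replica_device_setter: it only sends ops of type "Variable" to the parameter server, so it may fail in some cases (for example, with VariableV2 or VarHandleOp ops).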
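The worker_device pattern from this answer can be generated for every GPU on a worker with a small helper. This is a minimal sketch: the name worker_gpu_devices and the num_gpus parameter are hypothetical, and each returned string is meant to be passed as worker_device= to tf.train.replica_device_setter (together with cluster=), typically once per model tower.

```python
def worker_gpu_devices(task_index, num_gpus):
    """Hypothetical helper: one worker_device string per GPU on a worker.

    The format matches the worker_device argument used with
    tf.train.replica_device_setter in the answer above.
    """
    return ["/job:worker/task:%d/gpu:%d" % (task_index, i) for i in range(num_gpus)]


# Intended use in TensorFlow 1.x graph mode (commented out: needs a real cluster):
# for i, dev in enumerate(worker_gpu_devices(FLAGS.task_index, num_gpus)):
#     with tf.device(tf.train.replica_device_setter(worker_device=dev, cluster=cluster)):
#         ...  # build tower i here

print(worker_gpu_devices(1, 2))  # ['/job:worker/task:1/gpu:0', '/job:worker/task:1/gpu:1']
```

Because replica_device_setter only sends the ps_ops types to the parameter server, every other op in tower i ends up on the GPU named by the i-th string.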

  • 2021-01-17 06:08

    If your parameters are not sharded, you can do it with a simplified version of replica_device_setter, like this:

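    import tensorflow as tf  # TensorFlow 1.x, graph mode

    def assign_to_device(worker=0, gpu=0, ps_device="/job:ps/task:0/cpu:0"):
        """Device function: variables on the ps, all other ops on one worker GPU."""
        def _assign(op):
            # The function may receive either an Operation or a NodeDef.
            node_def = op if isinstance(op, tf.NodeDef) else op.node_def
            if node_def.op == "Variable":
                # Only the "Variable" op type is sent to the parameter server.
                return ps_device
            else:
                return "/job:worker/task:%d/gpu:%d" % (worker, gpu)
        return _assign
    
    with tf.device(assign_to_device(1, 2)):
        # this op goes on worker 1, GPU 2
        my_op = tf.ones(())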
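A sketch combining this function with the ps_ops list quoted in the first answer, so that VariableV2 ops (created by tf.Variable in TF 1.x) and VarHandleOp ops (created by resource variables) are pinned to the parameter server too. As an assumption beyond the original answer, it also accepts several ps devices and hands variables out round-robin, loosely imitating replica_device_setter's default strategy for sharded parameters; the names assign_to_device_rr and fake_op and the ps_devices parameter are hypothetical. Since it only reads op.node_def.op (or .op on a NodeDef), it can be exercised without TensorFlow using stand-in objects, as below.

```python
import itertools
from types import SimpleNamespace

# Default ps_ops list of replica_device_setter in TF 1.4/1.5, as quoted above.
PS_OPS = ("Variable", "VariableV2", "VarHandleOp")


def assign_to_device_rr(worker=0, gpu=0, ps_devices=("/job:ps/task:0/cpu:0",)):
    """Hypothetical device function: variable ops go round-robin over
    ps_devices, every other op goes to /job:worker/task:<worker>/gpu:<gpu>."""
    next_ps = itertools.cycle(ps_devices)
    worker_device = "/job:worker/task:%d/gpu:%d" % (worker, gpu)

    def _assign(op):
        # A tf.Operation exposes .node_def; a NodeDef has .op directly.
        node_def = getattr(op, "node_def", op)
        if node_def.op in PS_OPS:
            return next(next_ps)
        return worker_device

    return _assign


def fake_op(op_type):
    """Hypothetical stand-in for a tf.Operation (no TensorFlow needed)."""
    return SimpleNamespace(node_def=SimpleNamespace(op=op_type))


choose = assign_to_device_rr(1, 2, ps_devices=("/job:ps/task:0", "/job:ps/task:1"))
for t in ("VariableV2", "VarHandleOp", "Const", "Variable"):
    print(t, "->", choose(fake_op(t)))
# VariableV2 -> /job:ps/task:0
# VarHandleOp -> /job:ps/task:1
# Const -> /job:worker/task:1/gpu:2
# Variable -> /job:ps/task:0
```

In TF 1.x graph mode it would be used just like assign_to_device, e.g. with tf.device(assign_to_device_rr(1, 2, ps_devices=("/job:ps/task:0", "/job:ps/task:1"))):. The cycle advances on every matching op, so placement depends on the order in which variables are created.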
    