TensorFlow Cross-Device Communication

孤城傲影 2020-12-28 10:37

As the TensorFlow paper states, TensorFlow's cross-device communication is achieved by adding "receive" and "send" nodes to devices.

From my understanding,

1 answer
  • 2020-12-28 11:24

    In TensorFlow, cross-device communication is achieved using the Rendezvous interface, which has multiple different implementations, depending on the deployment. The comment on that interface describes the general idea:

    // A Rendezvous is an abstraction for passing a Tensor
    // from a producer to a consumer, where the consumer may safely
    // request the Tensor before or after it has been produced.  A
    // producer never blocks when using a Rendezvous.  A consumer has the
    // choice of making a blocking call or providing a callback: in either
    // case, the consumer receives the Tensor as soon as it is available.
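
To make the contract in that comment concrete, here is a minimal Python sketch of the idea. It is not TensorFlow's implementation or API: the class name `ToyRendezvous`, its methods, and the keys `edge_0` and `edge_1` are made up for illustration. It only shows that the producer returns immediately and that the consumer's callback fires whether it registers before or after the value is sent.

```python
import threading


class ToyRendezvous:
    """Toy in-process analogue of the Rendezvous idea (hypothetical, not TensorFlow's API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ready = {}    # key -> value sent before any consumer asked for it
        self._waiting = {}  # key -> callback registered before the value arrived

    def send(self, key, value):
        # The producer never blocks: it either hands the value to a waiting
        # callback or parks it in the table and returns.
        with self._lock:
            callback = self._waiting.pop(key, None)
            if callback is None:
                self._ready[key] = value
                return
        callback(value)  # run the callback outside the lock

    def recv_async(self, key, callback):
        # The consumer may ask before or after the value has been produced.
        with self._lock:
            if key in self._ready:
                value = self._ready.pop(key)
            else:
                self._waiting[key] = callback
                return
        callback(value)


rendezvous = ToyRendezvous()
received = []

# Consumer registers first, producer sends second.
rendezvous.recv_async("edge_0", received.append)
rendezvous.send("edge_0", [1.0, 2.0])

# Producer sends first, consumer registers second.
rendezvous.send("edge_1", [3.0])
rendezvous.recv_async("edge_1", received.append)
```

In both orderings the callback receives the value as soon as both sides have arrived, which is the property the comment describes.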
    

    As you noted in your question, TensorFlow represents communication in the dataflow graph using Send and Recv ops that are added to the graph automatically when the graph is partitioned across devices. For each edge that has a source and destination on different devices, the graph partitioner inserts a pair of Send and Recv ops that share the same "rendezvous key" (an automatically generated string name that is used as a key in the rendezvous' index of pending tensors to be communicated). The implementation of the Send op is simple: it calls Rendezvous::Send(), passing in its rendezvous key and single input tensor, then returns immediately without blocking. The implementation of the Recv op is slightly more complicated: it registers a callback to be called when the tensor with the given key becomes available. That callback is responsible for "producing" the output of the Recv op, and unblocking subsequent computation.
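
The partitioning step described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the real graph partitioner: the function `insert_send_recv`, the node names, the device names, and the key format are all hypothetical, and the sketch ignores details such as sharing one Send between several consumers.

```python
def insert_send_recv(placement, edges):
    """Replace each cross-device edge with a send/recv pair sharing one key.

    placement: dict mapping node name -> device name
    edges: list of (src_node, dst_node) pairs
    Returns an updated placement and edge list.
    """
    new_placement = dict(placement)
    new_edges = []
    for src, dst in edges:
        src_dev, dst_dev = placement[src], placement[dst]
        if src_dev == dst_dev:
            # Same device: the edge is kept unchanged.
            new_edges.append((src, dst))
            continue
        # Hypothetical key format; the real keys are generated automatically.
        key = f"{src_dev};{dst_dev};{src}"
        send, recv = f"send/{key}", f"recv/{key}"
        new_placement[send] = src_dev  # the send node lives with the producer
        new_placement[recv] = dst_dev  # the recv node lives with the consumer
        new_edges.append((src, send))
        new_edges.append((recv, dst))
    return new_placement, new_edges


placement = {"a": "/cpu:0", "b": "/cpu:0", "c": "/gpu:0"}
edges = [("a", "b"), ("b", "c")]
new_placement, new_edges = insert_send_recv(placement, edges)
```

After the rewrite, every remaining edge connects two nodes on the same device, and the only link between the two partitions is the key shared by the send and recv nodes.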

    The Rendezvous implementations perform the actual work of transferring the data:

    • IntraProcessRendezvous handles the transfer of data between devices in the same process. In the (unlikely) event that the transfer is between two CPU devices in the same process, the transfer can be achieved by a simple Tensor assignment. Otherwise, TensorFlow kicks off a device-specific DMA routine to transfer data between a CPU and GPU device.

    • The BaseRemoteRendezvous class and its subclasses handle cross-device communication in the case that the sender and receiver can be in different processes. The main implementation of this class is RpcRemoteRendezvous, which uses gRPC to handle the remote transfers.
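
As a rough summary of the two cases above, the choice of transfer mechanism could be modelled like this. The `Device` type, the `choose_transfer` function, and the returned labels are hypothetical and only mirror the prose; the real decision is made inside the Rendezvous implementations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    process: str  # hypothetical process identifier
    kind: str     # "CPU" or "GPU"


def choose_transfer(src, dst):
    """Return a label for the transfer mechanism described in the answer."""
    if src.process != dst.process:
        # Different processes: handled by a remote rendezvous (gRPC in the main implementation).
        return "remote"
    if src.kind == "CPU" and dst.kind == "CPU":
        # Same process, both CPU devices: a simple tensor assignment suffices.
        return "assign"
    # Same process, at least one GPU: a device-specific DMA copy.
    return "dma"


cpu0 = Device(process="worker0", kind="CPU")
gpu0 = Device(process="worker0", kind="GPU")
cpu1 = Device(process="worker1", kind="CPU")
```

For example, `choose_transfer(cpu0, gpu0)` yields `"dma"`, while `choose_transfer(cpu0, cpu1)` yields `"remote"`.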
