In distributed TensorFlow, is it possible to share the same queue across different workers?

北荒 · asked 2021-02-09 02:23

In TensorFlow, I want to have a filename queue shared across different workers on different machines, so that each machine can get a subset of the files to train on. I have searched a lot.

1 Answer

  • answered 2021-02-09 02:52

    It is possible to share the same queue across workers, by setting the optional shared_name argument when creating the queue. Just as with tf.Variable objects, you can place the queue on any device that can be accessed from different workers. For example:

    with tf.device("/job:ps/task:0"):  # Place queue on parameter server.
      q = tf.FIFOQueue(..., shared_name="shared_queue")
    

    A few notes:

    • The value for shared_name must be unique to the particular queue that you are sharing. Unfortunately, the Python API does not currently use scoping or automatic name uniquification to make this easier, so you will have to ensure this manually.

    • You do not need to place the queue on a parameter server. One possible configuration would be to set up an additional "input job" (e.g. "/job:input") containing a set of tasks that perform pre-processing, and export a shared queue for the workers to use.
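
    For concreteness, here is a minimal end-to-end sketch, not taken from the original answer: it assumes the TF1-style tf.compat.v1 API and a hypothetical two-task cluster (one "ps" task that hosts the queue, one "worker" task that dequeues from it); the addresses, filenames, and the shared_filename_queue helper are placeholders you would adapt to your own setup.

    import sys
    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()

    # Hypothetical cluster: adjust job names, hosts, and ports to your setup.
    CLUSTER = tf.train.ClusterSpec({
        "ps": ["localhost:2222"],      # hosts the shared queue
        "worker": ["localhost:2223"],  # consumes filenames from it
    })

    def shared_filename_queue():
        # Every process builds this op; the matching shared_name ties them all
        # to one queue resource that lives on /job:ps/task:0.
        with tf.device("/job:ps/task:0"):
            return tf.FIFOQueue(capacity=100, dtypes=[tf.string], shapes=[[]],
                                shared_name="shared_filename_queue")

    def run_ps():
        server = tf.train.Server(CLUSTER, job_name="ps", task_index=0)
        q = shared_filename_queue()
        # Fill the queue with (placeholder) filenames, then keep serving.
        enqueue = q.enqueue_many([["file_0.tfrecord", "file_1.tfrecord"]])
        with tf.Session(server.target) as sess:
            sess.run(enqueue)
        server.join()

    def run_worker():
        server = tf.train.Server(CLUSTER, job_name="worker", task_index=0)
        q = shared_filename_queue()
        filename = q.dequeue()  # each dequeue pops a distinct filename
        with tf.Session(server.target) as sess:
            print(sess.run(filename))

    if __name__ == "__main__":
        run_ps() if sys.argv[1:] == ["ps"] else run_worker()

    Run one process with the "ps" argument and one or more worker processes without it; because every graph creates the queue with the same shared_name on the same device, each worker's dequeue pulls from the single queue filled by the ps task.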
