In TensorFlow, I want to have a filename queue shared across different workers on different machines, so that each machine gets a subset of the files to train on. I searched a lot, but could not find a solution.
It is possible to share the same queue across workers by setting the optional shared_name argument when creating the queue. Just as with tf.Variable objects, you can place the queue on any device that can be accessed from the different workers. For example:
with tf.device("/job:ps/task:0"):  # Place the queue on a parameter server.
  q = tf.FIFOQueue(..., shared_name="shared_queue")
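To make the sharing concrete, here is a hedged end-to-end sketch. The cluster layout, host names, and TFRecord file names are purely illustrative, and each machine would run this same graph-construction code with its own task_index:

import tensorflow as tf

# Hypothetical cluster: one parameter server and two workers.
cluster = tf.train.ClusterSpec({
    "ps": ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
})

# Every worker builds this same graph. Because the queue is placed on
# the same device and given the same shared_name, all workers resolve
# to one underlying queue.
with tf.device("/job:ps/task:0"):
  filename_queue = tf.FIFOQueue(100, tf.string, shared_name="shared_queue")

# One task (e.g. the chief worker) enqueues the full file list once...
enqueue_op = filename_queue.enqueue_many(
    [["file0.tfrecords", "file1.tfrecords", "file2.tfrecords"]])

# ...and every worker dequeues from the same queue, so each file name
# is handed out exactly once.
filename = filename_queue.dequeue()

# On each machine, connect a session to that machine's in-process server.
server = tf.train.Server(cluster, job_name="worker", task_index=0)
with tf.Session(server.target) as sess:
  sess.run(enqueue_op)       # chief only
  print(sess.run(filename))  # consumes one file name per call

Because each dequeue removes a file name from the single shared queue, every machine ends up training on a disjoint subset of the files, which is exactly the behavior the question asks for.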
A few notes:
The value for shared_name must be unique to the particular queue that you are sharing. Unfortunately, the Python API does not currently use scoping or automatic name uniquification to make this easier, so you will have to ensure this manually.
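For example, a minimal sketch of picking the names by hand when a program uses two logical queues (the specific names here are illustrative):

with tf.device("/job:ps/task:0"):
  # Each logical queue gets its own hand-picked shared_name. Reusing one
  # name for both would make the runtime resolve them to the same
  # underlying queue (or fail if their dtypes/shapes disagree).
  train_queue = tf.FIFOQueue(100, tf.string,
                             shared_name="train_filename_queue")
  eval_queue = tf.FIFOQueue(100, tf.string,
                            shared_name="eval_filename_queue")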
You do not need to place the queue on a parameter server. One possible configuration would be to set up an additional "input job" (e.g. "/job:input") containing a set of tasks that perform the pre-processing and export a shared queue for the workers to use.
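A hedged sketch of that configuration, assuming a cluster with a dedicated "input" job (the host names, job layout, and queue name are all illustrative):

import tensorflow as tf

# Hypothetical cluster with an input job alongside the workers.
cluster = tf.train.ClusterSpec({
    "input": ["input0.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
})

# The shared queue lives on an input task rather than a parameter server.
with tf.device("/job:input/task:0"):
  example_queue = tf.FIFOQueue(1000, tf.string,
                               shared_name="preprocessed_examples")

# Input tasks read and pre-process records, then enqueue them; a
# placeholder stands in for the real pre-processing pipeline here.
record = tf.placeholder(tf.string)
enqueue_op = example_queue.enqueue([record])

# Workers dequeue ready-to-use examples from the same shared queue.
next_example = example_queue.dequeue()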