How do I get TensorFlow example queues into proper batches for training?
I've got some images and labels:
IMG_6642.JPG 1
IMG_6643.JPG 2
To make this input pipeline work, you need to add an asynchronous queueing mechanism that generates batches of examples. This is done by creating a tf.RandomShuffleQueue or a tf.FIFOQueue and inserting JPEG images that have been read, decoded and preprocessed.
You can use convenience functions that create the queues and the corresponding threads for running them: tf.train.shuffle_batch_join or tf.train.batch_join. Here is a simplified example of what this would look like. Note that this code is untested:
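import tensorflow as tf

# Let's assume there is a queue that maintains a list of all filenames,
# called 'filename_queue'.
reader = tf.WholeFileReader()
_, file_buffer = reader.read(filename_queue)
# Decode the JPEG image. Batching needs a fixed image shape, so resize or
# crop to a common size as part of your preprocessing.
image = tf.image.decode_jpeg(file_buffer)
# Generate batches of images of this size.
batch_size = 32
# Depends on the number of files and the training speed.
min_queue_examples = batch_size * 100
# shuffle_batch_join expects a list of tensor lists, one inner list per
# reader; here there is a single reader producing a single tensor.
images_batch = tf.train.shuffle_batch_join(
    [[image]],
    batch_size=batch_size,
    capacity=min_queue_examples + 3 * batch_size,
    min_after_dequeue=min_queue_examples)
# Run your network on this batch of images ('my_inference' is your model).
predictions = my_inference(images_batch)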
Depending on how you need to scale up your job, you might need to run multiple independent threads that read, decode and preprocess images and put them into your example queue. A complete example of such a pipeline is provided in the Inception/ImageNet model. Take a look at batch_inputs:
https://github.com/tensorflow/models/blob/master/inception/inception/image_processing.py#L407
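To illustrate the idea of several independent threads feeding one example queue, here is a minimal plain-Python analogy. It uses only the standard library's queue and threading modules, not the TensorFlow API, and preprocess, worker and run_pipeline are hypothetical names made up for this sketch. Treat it as a picture of the data flow rather than a replacement for the linked code.

```python
import queue
import threading


def preprocess(filename):
    # Hypothetical stand-in for reading, decoding and preprocessing one JPEG.
    return "decoded:" + filename


def worker(filenames, example_queue):
    # Each worker processes its own slice of files and enqueues the results.
    for name in filenames:
        example_queue.put(preprocess(name))


def run_pipeline(filenames, num_threads, batch_size):
    # Bounded queue: producers block when the consumer falls behind.
    example_queue = queue.Queue(maxsize=4 * batch_size)
    # Round-robin split so every thread gets a disjoint slice of the files.
    slices = [filenames[i::num_threads] for i in range(num_threads)]
    threads = [threading.Thread(target=worker, args=(s, example_queue))
               for s in slices]
    for t in threads:
        t.start()
    # Dequeue exactly as many examples as were scheduled, in batches.
    batches = []
    remaining = len(filenames)
    while remaining > 0:
        size = min(batch_size, remaining)
        batches.append([example_queue.get() for _ in range(size)])
        remaining -= size
    for t in threads:
        t.join()
    return batches


filenames = ["IMG_%04d.JPG" % i for i in range(10)]
batches = run_pipeline(filenames, num_threads=3, batch_size=4)
print([len(b) for b in batches])
```

The order of examples inside the batches depends on thread scheduling, which is roughly why the multi-threaded TensorFlow pipelines also produce examples in a nondeterministic order.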
Finally, if you are working with >O(1000) JPEG images, keep in mind that it is extremely inefficient to individually read thousands of small files. This will slow down your training quite a bit.
A more robust and faster solution is to convert the dataset of images to a sharded TFRecord of Example protos. Here is a fully worked script for converting the ImageNet data set to such a format. And here is a set of instructions for running a generic version of this preprocessing script on an arbitrary directory containing JPEG images.
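As a rough sketch of what "sharded" means here, the snippet below splits a list of image files into a fixed number of contiguous shards and gives each shard an output file name. It is plain Python and does not write any TFRecord data. The helper names and the name-00000-of-00004 naming pattern are assumptions chosen for illustration, not something taken from the scripts mentioned above.

```python
def shard_ranges(num_files, num_shards):
    # Split [0, num_files) into num_shards contiguous, near-equal ranges.
    bounds = [num_files * i // num_shards for i in range(num_shards + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(num_shards)]


def shard_filename(name, shard, num_shards):
    # Assumed naming pattern, e.g. 'train-00000-of-00004'.
    return "%s-%05d-of-%05d" % (name, shard, num_shards)


def plan_shards(filenames, name, num_shards):
    # Map each output shard name to the input files it would contain.
    plan = {}
    ranges = shard_ranges(len(filenames), num_shards)
    for shard, (start, end) in enumerate(ranges):
        plan[shard_filename(name, shard, num_shards)] = filenames[start:end]
    return plan


files = ["IMG_%04d.JPG" % i for i in range(10)]
plan = plan_shards(files, "train", 4)
for shard_name, members in plan.items():
    print(shard_name, len(members))
```

Each shard would then be written as one file of serialized Example protos, so training reads a handful of large files sequentially instead of thousands of small ones.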