How to read data into TensorFlow batches from example queue?

青春惊慌失措 2020-12-23 15:27

How do I get TensorFlow example queues into proper batches for training?

I've got some images and labels:

IMG_6642.JPG 1
IMG_6643.JPG 2
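
Before any queue can be built, a listing like this has to be split into parallel lists of filenames and labels. A minimal sketch, assuming one "<filename> <integer label>" pair per line; parse_label_lines is a hypothetical helper, and two parallel lists are the kind of input a TensorFlow 1.x producer such as tf.train.slice_input_producer takes:

```python
def parse_label_lines(lines):
    """Hypothetical helper: split '<filename> <label>' lines into two lists."""
    filenames, labels = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # rsplit keeps filenames that contain spaces intact; the label is
        # assumed to be the last whitespace-separated token on the line.
        name, label = line.rsplit(None, 1)
        filenames.append(name)
        labels.append(int(label))
    return filenames, labels
```

For example, parse_label_lines(["IMG_6642.JPG 1", "IMG_6643.JPG 2"]) returns (['IMG_6642.JPG', 'IMG_6643.JPG'], [1, 2]).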
  • 2020-12-23 15:39

    If you wish to make this input pipeline work, you will need to add an asynchronous queueing mechanism that generates batches of examples. This is done by creating a tf.RandomShuffleQueue or a tf.FIFOQueue and enqueuing JPEG images that have been read, decoded and preprocessed.
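
    To see what a shuffle queue does differently from a FIFO queue, here is a toy, single-threaded Python model. ToyRandomShuffleQueue is hypothetical and is not TensorFlow's implementation; it only assumes the behaviour implied by the min_after_dequeue parameter: elements come out in random order, and a dequeue only proceeds while at least min_after_dequeue elements would remain, so there is always a pool to mix from. Where the real queue would block, the toy returns False or None instead:

```python
import random


class ToyRandomShuffleQueue:
    """Single-threaded toy model of a random-shuffle queue (hypothetical)."""

    def __init__(self, capacity, min_after_dequeue, seed=None):
        if min_after_dequeue >= capacity:
            raise ValueError("toy queue requires min_after_dequeue < capacity")
        self.capacity = capacity
        self.min_after_dequeue = min_after_dequeue
        self._items = []
        self._rng = random.Random(seed)

    def enqueue(self, item):
        if len(self._items) >= self.capacity:
            return False  # a real queue would block until space frees up
        self._items.append(item)
        return True

    def dequeue_many(self, n):
        if len(self._items) - n < self.min_after_dequeue:
            return None  # a real queue would block until more items arrive
        batch = []
        for _ in range(n):
            # Swap a random element to the end and pop it: O(1) random removal.
            i = self._rng.randrange(len(self._items))
            self._items[i], self._items[-1] = self._items[-1], self._items[i]
            batch.append(self._items.pop())
        return batch
```

    With capacity=10 and min_after_dequeue=4, ten enqueues succeed, two dequeue_many(3) calls succeed, and a further dequeue_many(1) returns None because only three elements would remain. A tf.FIFOQueue, by contrast, hands elements back in insertion order.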

    You can use handy constructs that generate the queues, and the threads that run them, via tf.train.shuffle_batch_join or tf.train.batch_join. Here is a simplified example of what this would look like. Note that this code is untested:

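    import tensorflow as tf  # TensorFlow 1.x queue-based input API

    # Let's assume there is a Queue that maintains a list of all filenames
    # called 'filename_queue', e.g. tf.train.string_input_producer(filenames).

    def read_image(filename_queue):
        # Read one whole JPEG file and decode it into a [height, width, 3] tensor.
        reader = tf.WholeFileReader()
        _, file_buffer = reader.read(filename_queue)
        image = tf.image.decode_jpeg(file_buffer, channels=3)
        # Batching requires a fully defined shape, so resize to a fixed size
        # (224x224 is only an example; use whatever your model expects).
        image = tf.image.resize_images(image, [224, 224])
        return [image]

    # shuffle_batch_join starts one enqueue thread per entry in this list.
    num_readers = 4
    images_list = [read_image(filename_queue) for _ in range(num_readers)]

    # Generate batches of images of this size.
    batch_size = 32

    # Depends on the number of files and the training speed.
    min_queue_examples = batch_size * 100
    images_batch = tf.train.shuffle_batch_join(
      images_list,
      batch_size=batch_size,
      capacity=min_queue_examples + 3 * batch_size,
      min_after_dequeue=min_queue_examples)

    # Run your network ('my_inference') on this batch of images. The enqueue
    # threads only start after tf.train.start_queue_runners(sess=sess).
    predictions = my_inference(images_batch)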
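
    The sizing heuristic in that snippet is easy to check by hand: with batch_size = 32 the queue keeps at least 3,200 examples after each dequeue and holds up to 3,296. The sketch below (queue_sizing is a hypothetical helper) reproduces that arithmetic and also estimates the memory a full queue of decoded images would occupy, assuming images resized to 224x224x3 and stored as float32:

```python
def queue_sizing(batch_size, batches_buffered=100, extra_batches=3,
                 image_shape=(224, 224, 3), bytes_per_value=4):
    """Hypothetical helper: capacity heuristic plus a memory estimate.

    image_shape and bytes_per_value are assumptions (224x224 RGB, float32).
    """
    min_after_dequeue = batch_size * batches_buffered
    capacity = min_after_dequeue + extra_batches * batch_size
    values_per_image = 1
    for dim in image_shape:
        values_per_image *= dim
    full_queue_bytes = capacity * values_per_image * bytes_per_value
    return min_after_dequeue, capacity, full_queue_bytes
```

    For batch_size = 32 this gives 3,296 x 150,528 values x 4 bytes = 1,984,561,152 bytes (about 1.85 GiB), which is why min_queue_examples has to be tuned to your image size and available memory, not only to the number of files.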
    

    Depending on how you need to scale up your job, you might need to run multiple independent threads that read/decode/preprocess images and dump them in your example queue. A complete example of such a pipeline is provided in the Inception/ImageNet model. Take a look at batch_inputs:

    https://github.com/tensorflow/models/blob/master/inception/inception/image_processing.py#L407
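
    The multi-threaded pattern described above (several threads reading, decoding and preprocessing in parallel, all feeding one example queue) can be illustrated without TensorFlow using Python's standard threading and queue modules. This is a toy model, not the Inception code: run_pipeline is hypothetical, and decode stands in for the real read/decode/preprocess step:

```python
import queue
import threading


def run_pipeline(filenames, decode, num_threads=4, batch_size=2):
    """Toy model: worker threads decode items from a shared filename queue
    into a bounded example queue; the caller's thread assembles batches."""
    filename_queue = queue.Queue()
    for name in filenames:
        filename_queue.put(name)

    # Bounded, so fast readers block instead of using unlimited memory.
    example_queue = queue.Queue(maxsize=4 * batch_size)
    done = object()  # sentinel each worker sends when no filenames remain

    def worker():
        while True:
            try:
                name = filename_queue.get_nowait()
            except queue.Empty:
                example_queue.put(done)
                return
            example_queue.put(decode(name))

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()

    batches, current, finished = [], [], 0
    while finished < num_threads:
        item = example_queue.get()
        if item is done:
            finished += 1
            continue
        current.append(item)
        if len(current) == batch_size:
            batches.append(current)
            current = []

    for t in threads:
        t.join()
    # Leftover examples that do not fill a batch are dropped, similar to
    # allow_smaller_final_batch=False in tf.train.batch.
    return batches
```

    For example, run_pipeline(["img_%d.jpg" % i for i in range(10)], decode=str.upper, num_threads=3, batch_size=4) returns two batches of four. The order of examples inside them depends on thread scheduling, and the last two examples are dropped because they do not fill a batch.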

    Finally, if you are working with more than roughly 1,000 JPEG images, keep in mind that it is extremely inefficient to read thousands of small files individually. This will slow down your training quite a bit.

    A more robust and faster solution is to convert the dataset of images to a sharded TFRecord of Example protos. A fully worked script exists for converting the ImageNet data set to such a format, along with instructions for running a generic version of this preprocessing script on an arbitrary directory containing JPEG images.
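
    The sharding step amounts to splitting the list of images into a fixed number of contiguous ranges, one output file each, so that training reads a few large files instead of thousands of small ones. Below is a minimal sketch of just that partitioning; shard_ranges, shard_filenames and the "train-00000-of-00004" naming pattern are illustrative assumptions. In TensorFlow 1.x, the serialized Example protos for each range would then be written out with tf.python_io.TFRecordWriter:

```python
def shard_ranges(num_items, num_shards):
    """Split indices [0, num_items) into num_shards contiguous ranges whose
    sizes differ by at most one."""
    base, extra = divmod(num_items, num_shards)
    ranges, start = [], 0
    for shard in range(num_shards):
        size = base + (1 if shard < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def shard_filenames(prefix, num_shards):
    # Hypothetical naming scheme, e.g. 'train-00000-of-00004'.
    return ["%s-%05d-of-%05d" % (prefix, i, num_shards) for i in range(num_shards)]
```

    For 10 images and 4 shards, shard_ranges(10, 4) returns [(0, 3), (3, 6), (6, 8), (8, 10)].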
