Getting good mixing with many input datafiles in tensorflow

后端 未结 2 1937
无人及你
无人及你 2021-01-12 08:05

I\'m working with tensorflow hoping to train a deep CNN to do move prediction for the game Go. The dataset I created consists of 100,000 binary data files, where each datafi

相关标签:
2条回答
  • 2021-01-12 08:23

    In your case it is not a problem to do some preprocessing and create one file out of all the files you have. For this type of games, where the history is not important and the position determines everything your dataset can consist just from position -> next_move.


    For a more broad case TF provides everything to allow the shuffling you want. There are two types shuffling which serve different purposes and shuffle different things:

    • tf.train.string_input_producer shuffle: Boolean. If true, the strings are randomly shuffled within each epoch.. So if you have a few files ['file1', 'file2', ..., 'filen'] this randomly selects a file from this list. If case of false, the files follow one after each other.
    • tf.train.shuffle_batch Creates batches by randomly shuffling tensors. So it takes batch_size tensors from your queue (you will need to create a queue with tf.train.start_queue_runners ) and shuffles them.
    0 讨论(0)
  • 2021-01-12 08:35

    Yes - what you want is to use a combination of two things. (Note that this answer was written for TensorFlow v1, and some of the functionality has been replaced by the new tf.data pipelines; I've updated the answers to point to the v1 compat versions of things, but if you're coming to this answer for new code, please consult tf.data instead.)

    First, randomly shuffle the order in which you input your datafiles, by reading from them using a tf.train.string_input_producer with shuffle=True that feeds into whatever input method you use (if you can put your examples into tf.Example proto format, that's easy to use with parse_example). To be very clear, you put the list of filenames in the string_input_producer and then read them with another method such as read_file, etc.

    Second, you need to mix at a finer granularity. You can accomplish this by feeding the input examples into a tf.train.shuffle_batch node with a large capacity and large value of min_after_dequeue. One particularly nice way is to use a shuffle_batch_join that receives input from multiple files, so that you get a lot of mixing. Set the capacity of the batch big enough to mix well without exhausting your RAM. Tens of thousands of examples usually works pretty well.

    Keep in mind that the batch functions add a QueueRunner to the QUEUE_RUNNERS collection, so you need to run tf.train.start_queue_runners()

    0 讨论(0)
提交回复
热议问题