Question
For instance, after I have created my operations, fed the batch data through them and run them, does tf.train.batch automatically feed in another batch of data to the session?
I ask this because tf.train.batch has an allow_smaller_final_batch argument
which makes it possible for the final batch to be loaded with a size smaller than the indicated batch size. Does this mean that, even without a loop, the next batch could be fed automatically? The tutorial code leaves me rather confused. When I load a single batch, I get literally a single batch of shape [batch_size, height, width, num_channels], but the documentation says it "Creates batches of tensors in tensors."
Also, when I read the tutorial code in the tf-slim walkthrough tutorial, the load_batch function returns only 3 tensors: images, images_raw, labels
. Where are the 'batches' of data as explained in the documentation?
Thank you for your help.
Answer 1:
... does tf.train.batch automatically feed in another batch of data to the session?
No. Nothing happens automatically. You must call sess.run(...) again to load a new batch.
Does this mean even without a loop, the next batch could be automatically fed?
No. tf.train.batch(..) will always load batch_size tensors. If you have, for example, 100 images and batch_size=30, then you will get 3 batches of 30, i.e. you can call sess.run(batch) three times before the input queue starts from the beginning (or stops if num_epochs=1). This means that you miss out on 100 - 3*30 = 10 samples during training. In case you do not want to miss them, you can use tf.train.batch(..., allow_smaller_final_batch=True), so now you will get 3x 30-sample batches and 1x 10-sample batch before the input queue restarts.
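To make allow_smaller_final_batch concrete, here is a minimal, self-contained sketch; it uses 10 integers as stand-in samples instead of real images, and the variable names are just illustrative:

import tensorflow as tf  # TF 1.x queue-based input pipeline

data = list(range(10))  # 10 toy "samples" standing in for real images

# num_epochs=1 -> go through the data once; shuffle=False keeps the printed order readable
sample = tf.train.slice_input_producer([data], num_epochs=1, shuffle=False)
batch = tf.train.batch(sample, batch_size=3, allow_smaller_final_batch=True)[0]

with tf.Session() as sess:
    # num_epochs is tracked in a local variable, so both initializers are needed
    sess.run([tf.global_variables_initializer(),
              tf.local_variables_initializer()])
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        while not coord.should_stop():
            # each sess.run(...) fetches one batch: [0 1 2], [3 4 5], [6 7 8], then the smaller [9]
            print(sess.run(batch))
    except tf.errors.OutOfRangeError:
        pass  # the queue is exhausted after one pass over the data
    finally:
        coord.request_stop()
        coord.join(threads)

With allow_smaller_final_batch=False (the default), the final partial batch [9] would simply never be produced.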
Let me also elaborate with a code sample:
import tensorflow as tf  # TF 1.x

queue = tf.train.string_input_producer(filenames,
                                       num_epochs=1)  # only iterate through all samples in the dataset once

reader = tf.TFRecordReader()  # or any reader you need
_, example = reader.read(queue)
image, label = your_conversion_fn(example)

# tf.train.batch will now load up to 100 image-label pairs on each sess.run(...)
# most tf ops are tuned to work on batches
# this is faster and also gives better results on e.g. gradient calculation
images, labels = tf.train.batch([image, label], batch_size=100)

with tf.Session() as sess:
    # "boilerplate" code
    sess.run([
        tf.local_variables_initializer(),
        tf.global_variables_initializer(),
    ])

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

    try:
        # in most cases coord.should_stop() will return True
        # when there are no more samples to read
        # if num_epochs=None (the default) it will run forever
        while not coord.should_stop():
            # will start reading, working data from the input queue
            # and "fetch" the results of the computation graph
            # into raw_images and raw_labels
            raw_images, raw_labels = sess.run([images, labels])
    finally:
        coord.request_stop()
        coord.join(threads)
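One practical detail: since num_epochs=1 makes sess.run raise tf.errors.OutOfRangeError once the queue is exhausted, the loop above is often written with an explicit except clause, roughly like this:

try:
    while not coord.should_stop():
        raw_images, raw_labels = sess.run([images, labels])
except tf.errors.OutOfRangeError:
    # raised once the producer has delivered num_epochs passes over the data
    print('Input queue exhausted -- epoch limit reached')
finally:
    coord.request_stop()
    coord.join(threads)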
Answer 2:
You need to call sess.run, passing it the batch op, every time you want to load the next batch. See the code below.
import tensorflow as tf  # TF 1.x

img = [0, 1, 2, 3, 4, 5, 6, 7, 8]
lbl = [0, 1, 2, 3, 4, 5, 6, 7, 8]

images = tf.convert_to_tensor(img)
labels = tf.convert_to_tensor(lbl)

input_queue = tf.train.slice_input_producer([images, labels])
sliced_img = input_queue[0]
sliced_lbl = input_queue[1]

img_batch, lbl_batch = tf.train.batch([sliced_img, sliced_lbl], batch_size=3)

with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    for i in range(0, 3):  # fetch 3 batches of 3 samples each
        image_batch, label_batch = sess.run([img_batch, lbl_batch])
        print(image_batch, label_batch)

    coord.request_stop()
    coord.join(threads)
The output would be something like this (the order is random because slice_input_producer shuffles by default):
[4,1,8] [4,1,8]
[2,3,7] [2,3,7]
[2,6,8] [2,6,8]
Source: https://stackoverflow.com/questions/41673889/tensorflow-does-tf-train-batch-automatically-load-the-next-batch-when-the-batch