Accessing filename from file queue in Tensor Flow

前端未结

关注

 5  1505

I have a directory of images, and a separate file matching image filenames to labels. So the directory of images has files like \'train/001.jpg\' and the labeling file looks lik

相关标签:

5条回答

伪装坚强ぢ

2021-02-20 10:34

Here's what I was able to do.

I first shuffled the filenames and matched the labels to them in Python:

np.random.shuffle(filenames)
labels = [label_dict[f] for f in filenames]

Then created a string_input_producer for the filenames with shuffle off, and a FIFO for labels:

lv = tf.constant(labels)
label_fifo = tf.FIFOQueue(len(filenames),tf.int32, shapes=[[]])
file_fifo = tf.train.string_input_producer(filenames, shuffle=False, capacity=len(filenames))
label_enqueue = label_fifo.enqueue_many([lv])

Then to read the image I could use a WholeFileReader and to get the label I could dequeue the fifo:

reader = tf.WholeFileReader()
image = tf.image.decode_jpeg(value, channels=3)
image.set_shape([128,128,3])
result.uint8image = image
result.label = label_fifo.dequeue()

And generate the batches as follows:

min_fraction_of_examples_in_queue = 0.4
min_queue_examples = int(num_examples_per_epoch *
                         min_fraction_of_examples_in_queue)
num_preprocess_threads = 16
images, label_batch = tf.train.shuffle_batch(
  [result.uint8image, result.label],
  batch_size=FLAGS.batch_size,
  num_threads=num_preprocess_threads,
  capacity=min_queue_examples + 3 * FLAGS.batch_size,
  min_after_dequeue=min_queue_examples)

0 讨论(0)

忘掉有多难

2021-02-20 10:43
I used this:
```
 filename = filename.strip().decode('ascii')
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
走了就别回头了

2021-02-20 10:44
Given that your data is not too large for you to supply the list of filenames as a python array, I'd suggest just doing the preprocessing in Python. Create two lists (same order) of the filenames and the labels, and insert those into either a randomshufflequeue or a queue, and dequeue from that. If you want the "loops infinitely" behavior of the string_input_producer, you could re-run the 'enqueue' at the start of every epoch.

A very toy example:
```
import tensorflow as tf

f = ["f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8"]
l = ["l1", "l2", "l3", "l4", "l5", "l6", "l7", "l8"]

fv = tf.constant(f)
lv = tf.constant(l)

rsq = tf.RandomShuffleQueue(10, 0, [tf.string, tf.string], shapes=[[],[]])
do_enqueues = rsq.enqueue_many([fv, lv])

gotf, gotl = rsq.dequeue()

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    tf.train.start_queue_runners(sess=sess)
    sess.run(do_enqueues)
    for i in xrange(2):
        one_f, one_l = sess.run([gotf, gotl])
        print "F: ", one_f, "L: ", one_l
```
The key is that you're effectively enqueueing pairs of filenames/labels when you do the enqueue, and those pairs are returned by the dequeue.
0 讨论(0)
发布评论:

提交评论
- 加载中...
逝去的感伤

2021-02-20 10:46
Another suggestion is to save your data in TFRecord format. In this case you would be able to save all images and all labels in the same file. For a big number of files it gives a lot of advantages:
- can store data and labels at the same place
- data is allocated at one place (no need to remember various directories)
- if there are many files (images), opening/closing a file is time consuming. Seeking the location of the file from ssd/hdd also takes time
0 讨论(0)
发布评论:

提交评论
- 加载中...

花落未央

2021-02-20 10:52

There is tf.py_func() you could utilize to implement a mapping from file path to label.

files = gfile.Glob(data_pattern)
filename_queue = tf.train.string_input_producer(
files, num_epochs=num_epochs, shuffle=True) #  list of files to read

def extract_label(s):
    # path to label logic for cat&dog dataset
    return 0 if os.path.basename(str(s)).startswith('cat') else 1

def read(filename_queue):
  key, value = reader.read(filename_queue)
  image = tf.image.decode_jpeg(value, channels=3)
  image = tf.cast(image, tf.float32)
  image = tf.image.resize_image_with_crop_or_pad(image, width, height)
  label = tf.cast(tf.py_func(extract_label, [key], tf.int64), tf.int32)
  label = tf.reshape(label, [])

training_data = [read(filename_queue) for _ in range(num_readers)]

...

tf.train.shuffle_batch_join(training_data, ...)

0 讨论(0)