How to generate/read sparse sequence labels for CTC loss within Tensorflow?


Question


From a list of word images with their transcriptions, I am trying to create and read sparse sequence labels (for tf.nn.ctc_loss) using a tf.train.slice_input_producer, avoiding

  1. serializing pre-packaged training data to disk in TFRecord format

  2. the apparent limitations of tf.py_func,

  3. any unnecessary or premature padding, and

  4. reading the entire data set to RAM.

The main issue seems to be converting a string to the sequence of labels (a SparseTensor) needed for tf.nn.ctc_loss.

For example, with the character set in the (ordered) range [A-Z], I'd want to convert the text label string "BAD" to the sequence label class list [1,0,3].

Each example image I want to read contains the text as part of the filename, so it's straightforward to extract and do the conversion in plain Python. (If there's a way to do it within TensorFlow computations, I haven't found it.)
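To make the target concrete, here's the conversion in plain Python (the helper name is my own, not from any API):

```python
out_charset = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def text_to_labels(text, charset=out_charset):
    """Map each character to its index in the (ordered) charset."""
    return [charset.index(c) for c in text]

print(text_to_labels("BAD"))      # [1, 0, 3]
print(text_to_labels("NETWORK"))  # [13, 4, 19, 22, 14, 17, 10]
```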

Several previous questions glance at these issues, but I haven't been able to integrate them successfully. For example,

  • Tensorflow read images with labels shows a straightforward framework with discrete, categorical labels, which I've used as a starting point.

  • How to load sparse data with TensorFlow? nicely explains an approach for loading sparse data, but assumes pre-packaging tf.train.Examples.

Is there a way to integrate these approaches?

Another example (SO question #38012743) shows how I might delay the conversion from string to list until after dequeuing the filename for decoding, but it relies on tf.py_func, which has caveats. (Should I worry about them?)

I recognize that "SparseTensors don't play well with queues" (per the tf docs), so it might be necessary to do some voodoo on the result (serialization?) before batching, or even rework where the computation happens; I'm open to that.
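One batch-level alternative to that voodoo is to build the (indices, values, dense_shape) triplet that tf.SparseTensorValue (and hence tf.nn.ctc_loss) expects directly from the Python label lists, and feed it per batch instead of queueing SparseTensors. A minimal pure-Python sketch (the function name is mine):

```python
def sequences_to_sparse(sequences):
    """Flatten a batch of variable-length label lists into the
    (indices, values, dense_shape) triplet used by tf.SparseTensorValue."""
    indices, values = [], []
    for batch_idx, seq in enumerate(sequences):
        for time_idx, label in enumerate(seq):
            indices.append([batch_idx, time_idx])  # position in the 2-D batch
            values.append(label)                   # class index at that slot
    # dense_shape is [batch_size, longest_sequence]; no padding is stored
    dense_shape = [len(sequences), max(len(s) for s in sequences)]
    return indices, values, dense_shape

# Two transcriptions of different lengths, e.g. "BAD" and "NE"
indices, values, shape = sequences_to_sparse([[1, 0, 3], [13, 4]])
# shape == [2, 3]
```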

Following MarvMind's outline, here is a basic framework with the computations I want (iterate over lines containing example filenames, extract each label string, and convert it to a sequence), but I have not successfully determined the "TensorFlow" way to do it.

Thank you in advance for the right "tweak", a more appropriate strategy for my goals, or an indication that tf.py_func won't wreck training efficiency or something else downstream (e.g., loading trained models for future use).

EDIT (+7 hours) I found the missing ops to patch things up. While I still need to verify that this connects with ctc_loss downstream, I have checked that the edited version below correctly batches and reads in the images and sparse tensors.

out_charset="ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def input_pipeline(data_filename):
    filenames,seq_labels = _get_image_filenames_labels(data_filename)
    data_queue = tf.train.slice_input_producer([filenames, seq_labels])
    image,label = _read_data_format(data_queue)
    image,label = tf.train.batch([image,label],batch_size=2,dynamic_pad=True)
    label = tf.deserialize_many_sparse(label,tf.int32)
    return image,label

def _get_image_filenames_labels(data_filename):
    filenames = []
    labels = []
    with open(data_filename) as f:
        for line in f:
            # Carve out the ground truth string and file path from 
            # lines formatted like:
            # ./241/7/158_NETWORK_51375.jpg 51375
            filename = line.split(' ',1)[0][2:] # split off "./" and number
            # Extract label string embedded within image filename
            # between underscores, e.g. NETWORK
            text = os.path.basename(filename).split('_',2)[1]
            # Transform string text to sequence of indices using charset, e.g.,
            # NETWORK -> [13, 4, 19, 22, 14, 17, 10]
            indices = [[i] for i in range(0,len(text))]
            values = [out_charset.index(c) for c in list(text)]
            shape = [len(text)]
            label = tf.SparseTensorValue(indices,values,shape)
            label = tf.convert_to_tensor_or_sparse_tensor(label)
            label = tf.serialize_sparse(label) # needed for batching
            # Add data to lists for conversion
            filenames.append(filename)
            labels.append(label)
    filenames = tf.convert_to_tensor(filenames)
    labels = tf.convert_to_tensor_or_sparse_tensor(labels)
    return filenames, labels

def _read_data_format(data_queue):
    label = data_queue[1]
    raw_image = tf.read_file(data_queue[0])
    image = tf.image.decode_jpeg(raw_image,channels=1)
    return image,label

Answer 1:


The key ideas seem to be: create a SparseTensorValue from the data you want, pass it through tf.convert_to_tensor_or_sparse_tensor, and then (if you want to batch the data) serialize it with tf.serialize_sparse. After batching, you can restore the values with tf.deserialize_many_sparse.

Here's the outline. Create the sparse values, convert to tensor, and serialize:

indices = [[i] for i in range(0,len(text))]
values = [out_charset.index(c) for c in list(text)]
shape = [len(text)]
label = tf.SparseTensorValue(indices,values,shape)
label = tf.convert_to_tensor_or_sparse_tensor(label)
label = tf.serialize_sparse(label) # needed for batching

Then, you can do the batching and deserialize:

image,label = tf.train.batch([image,label],batch_size=2,dynamic_pad=True)
label = tf.deserialize_many_sparse(label,tf.int32)


Source: https://stackoverflow.com/questions/42578532/how-to-generate-read-sparse-sequence-labels-for-ctc-loss-within-tensorflow
