How to get original string data back from TFRecordData

Question

I followed the TensorFlow guide to save my string data using:

import tensorflow as tf

def _create_string_feature(values):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[values.encode('utf-8')]))

I also used ["tf.string", "FixedLenFeature"] as the feature's original type, and "tf.string" as the feature's converted type.

However, during training, when I run my session and create iterators, my string feature for a batch size of 2 (for example, ['food fruit', 'cupcake food']) looks like the output below. The problem is that this list appears to have size 1 rather than 2 (batch_size=2). Why are the instances in one batch stuck together rather than split?

[b'food fruit' b'cupcake food']

My other features, which are int or float, come back as NumPy arrays of shape (batch_size, feature_len), which is fine, but I am not sure why the string features are not separated within a single batch.

Any help would be appreciated.

Answer 1:

This converts the first entry of a BytesList object (the bytes_list field of a feature) to a string:

my_bytes_list_object.value[0].decode()

Or, when extracting the string from a TFRecord Example object:

my_example.features.feature['MyFeatureName'].bytes_list.value[0].decode()

From what I can see, bytes_list returns a BytesList object, from which we can read the value field. This returns a RepeatedScalarContainer, which behaves like a simple list object. In fact, wrapping it with list() converts it to a list. However, we can also just index it as if it were a list and use [0] to get the first item. The returned item is a bytes object, which can be converted to a standard str object with the decode() method.
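To make the decoding step concrete, here is a minimal sketch of the round trip. It assumes a hypothetical feature name "text" (not from the question) and builds the batch by hand as a NumPy array of bytes objects, which is how fetched string tensors typically appear. Note that NumPy prints arrays without commas, so [b'food fruit' b'cupcake food'] may well be two separate elements rather than one.

```python
import numpy as np
import tensorflow as tf


def _create_string_feature(value):
    # Same helper as in the question: wrap one UTF-8 encoded string in a BytesList.
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value.encode("utf-8")]))


# "text" is a hypothetical feature name chosen for this sketch.
example = tf.train.Example(
    features=tf.train.Features(feature={"text": _create_string_feature("food fruit")})
)

# Round-trip through the serialized form, as it would be stored in a TFRecord file.
restored = tf.train.Example.FromString(example.SerializeToString())
raw = restored.features.feature["text"].bytes_list.value[0]  # a bytes object
text = raw.decode("utf-8")  # back to a regular str

# Hand-built stand-in for a fetched batch (an assumption, not real session output).
batch = np.array([b"food fruit", b"cupcake food"], dtype=object)
decoded_batch = [item.decode("utf-8") for item in batch]
print(batch.shape, decoded_batch)
```

Checking batch.shape (or len(batch)) rather than relying on the printed form is a quick way to confirm whether the batch really holds one element or two.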

Source: https://stackoverflow.com/questions/60177218/how-to-get-original-string-data-back-from-tfrecorddata

Tags

python

string

tensorflow

tfrecord