Question
I followed the TensorFlow guide to save my string data using:
def _create_string_feature(values):
    # Encode the Python string as UTF-8 bytes and wrap it in a BytesList feature.
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[values.encode('utf-8')]))
I also used ["tf.string", "FixedLenFeature"] as my feature original type, and "tf.string" as my feature convert type.
However, during training, when I run my session and create iterators, my string feature for a batch size of 2 (for example: ['food fruit', 'cupcake food']) looks like the output below. The problem is that this list is of size 1, and not 2 (batch_size=2). Why are the instances in one batch stuck together rather than being split?
[b'food fruit' b'cupcake food']
For my other features, which are int or float, the results are NumPy arrays of shape (batch_size, feature_len), which is fine, but I am not sure why the string features are not separated within a single batch.
Any help would be appreciated.
Answer 1:
This will convert a BytesList or bytes_list string object to a string:
my_bytes_list_object.value[0].decode()
Or, when extracting the string from a TFRecord Example object:
my_example.features.feature['MyFeatureName'].bytes_list.value[0].decode()
From what I can see, bytes_list returns a BytesList object, from which we can read the value field. This returns a RepeatedScalarContainer, which behaves like a simple list object. In fact, wrapping it with list() converts it to a list. However, we can instead index it directly as if it were a list and use [0] to get the zeroth item. The returned item is a bytes object, which can be converted to a standard str object with the decode() method.
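To tie the steps above together, here is a minimal round-trip sketch. It assumes TensorFlow and NumPy are installed. The helper names create_string_feature and read_string_feature are hypothetical, "MyFeatureName" is just the placeholder name used above, and the batch array is built by hand to stand in for what a session might return. It is an illustration under those assumptions, not the asker's actual pipeline.

```python
import numpy as np
import tensorflow as tf


def create_string_feature(value):
    # Same idea as in the question: store one UTF-8 encoded bytes entry.
    return tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[value.encode("utf-8")])
    )


def read_string_feature(serialized, name):
    # Parse the serialized Example proto, then decode the first bytes entry.
    example = tf.train.Example()
    example.ParseFromString(serialized)
    return example.features.feature[name].bytes_list.value[0].decode("utf-8")


texts = ["food fruit", "cupcake food"]

# Serialize one Example per string, as would be written to a TFRecord file.
serialized = [
    tf.train.Example(
        features=tf.train.Features(
            feature={"MyFeatureName": create_string_feature(text)}
        )
    ).SerializeToString()
    for text in texts
]

# Read the strings back out of the serialized protos.
restored = [read_string_feature(s, "MyFeatureName") for s in serialized]

# Hand-built stand-in (an assumption) for a batch of byte strings held in a
# NumPy array. NumPy prints array elements separated by spaces, not commas.
batch = np.array([text.encode("utf-8") for text in texts], dtype=object)
decoded_batch = [item.decode("utf-8") for item in batch]

print(restored)
print(batch.shape)
print(decoded_batch)
```

If the batch in the question is a NumPy array like the stand-in here, the printed [b'food fruit' b'cupcake food'] would be two separate elements shown without a comma, and checking its shape or len() should confirm whether it really holds two items.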
Source: https://stackoverflow.com/questions/60177218/how-to-get-original-string-data-back-from-tfrecorddata