How to decode a Unicode string in TensorFlow's graph pipeline

Posted by 左心房为你撑大大i on 2019-12-10 13:35:33

Question


I have created a TFRecord file to store data. I need to store Hindi text, so I saved it as bytes using string.encode('utf-8').
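For illustration, here is a minimal pure-Python sketch of the encoding step described above. The sample word is a hypothetical stand-in for the actual data; the point is only that the value stored in the record is a plain UTF-8 byte string that decodes back to the original text without loss.

```python
# Hypothetical sample text; any Hindi string behaves the same way.
text = "नमस्ते"

# str -> bytes: this is the form that goes into a TFRecord bytes feature.
encoded = text.encode("utf-8")

# bytes -> str: the plain-Python decoding step the question wants to avoid.
restored = encoded.decode("utf-8")

print(type(encoded).__name__, len(text), len(encoded))
print(restored == text)
```

Each Devanagari character takes more than one byte in UTF-8, so the byte string is longer than the character count of the text.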

However, I am stuck when reading the data back. I read it with the TensorFlow Dataset API. I know that I can decode it in Python using string.decode('utf-8'), but that is not what I am looking for. I want a way to decode the byte string back to a Unicode string inside the graph itself.

I have tried as_text and decode_raw, but both raise errors.

My parse(map) function:

    import tensorflow as tf

    def _parse_function(tfrecord_serialized):
        # Feature spec: variable-length float and int lists, plus one scalar byte string.
        features = {
            'float': tf.FixedLenSequenceFeature([], tf.float32, allow_missing=True),
            'byte': tf.FixedLenFeature([], tf.string),
            'int': tf.FixedLenSequenceFeature([], tf.int64, allow_missing=True),
        }
        parsed_features = tf.parse_single_example(tfrecord_serialized, features)
        return parsed_features['float'], parsed_features['byte'], parsed_features['int']

I read my TFRecord file as follows:

    filenames = ["data.tfrecord"]  # List of filenames; several files can be given together.
    dataset = tf.data.TFRecordDataset(filenames)
    dataset = dataset.map(_parse_function)
    iterator = dataset.make_initializable_iterator()

    t1, t2, t3 = iterator.get_next()
    sess = tf.Session()
    sess.run(iterator.initializer)
    a, b, c = sess.run([t1, t2, t3])
    print(a, b, c)
    b.decode('utf-8')

Calling b.decode('utf-8') gives exactly the output I expect, but I want to do the decoding inside the graph, because moving data from TensorFlow to Python and back again is generally not a good idea.
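As a hedged sketch of what in-graph decoding could look like: a tf.string tensor always holds raw bytes, so the closest in-graph equivalent of a decoded string is a tensor of Unicode code points. The example assumes a TensorFlow version that provides tf.strings.unicode_decode and tf.compat.v1 (roughly 1.14 or later, including 2.x). The helper name decode_utf8 and the sample word are hypothetical, and a placeholder stands in for the dataset's 'byte' feature.

```python
import tensorflow as tf

tf1 = tf.compat.v1  # TF 1.x style graph/session API, also available in TF 2.x


def decode_utf8(byte_string):
    """Hypothetical helper: scalar UTF-8 string tensor -> 1-D int32 code points."""
    return tf.strings.unicode_decode(byte_string, input_encoding="UTF-8")


hindi = "नमस्ते"  # hypothetical sample text

graph = tf.Graph()
with graph.as_default():
    # Stand-in for parsed_features['byte'] from the question's parse function.
    raw = tf1.placeholder(tf.string, shape=[])
    code_points = decode_utf8(raw)  # decoding happens inside the graph

with tf1.Session(graph=graph) as sess:
    result = sess.run(code_points, feed_dict={raw: hindi.encode("utf-8")})

# Only for checking: rebuild a Python str from the code points.
decoded = "".join(chr(int(cp)) for cp in result)
print(decoded == hindi)
```

Under the same assumptions, the call could go inside _parse_function, applied to parsed_features['byte'], so that later graph ops work on code points. Anything fetched through sess.run as a tf.string still arrives in Python as bytes.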

Source: https://stackoverflow.com/questions/52578654/how-to-decode-unicode-string-in-tensorflows-graph-pipeline
