TensorFlow Dataset API: input pipeline with Parquet files


Question


I am trying to design an input pipeline with the Dataset API. I am working with Parquet files. What is a good way to add them to my pipeline?


Answer 1:


We have released Petastorm, an open-source library that allows you to use Apache Parquet files directly via the TensorFlow Dataset API.

Here is a small example:

    import tensorflow as tf
    from petastorm.reader import Reader
    from petastorm.tf_utils import make_petastorm_dataset

    # Open the Petastorm dataset and expose it as a tf.data.Dataset.
    with Reader('hdfs://.../some/hdfs/path') as reader:
        dataset = make_petastorm_dataset(reader)
        iterator = dataset.make_one_shot_iterator()
        tensor = iterator.get_next()
        with tf.Session() as sess:
            sample = sess.run(tensor)
            # 'id' is a field of the example dataset's schema.
            print(sample.id)

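Note: later Petastorm releases recommend the make_reader factory over constructing Reader directly (plain Parquet stores written without Petastorm's schema metadata go through make_batch_reader instead). Below is a minimal sketch of the same pipeline via make_reader, assuming the placeholder HDFS path points at a Petastorm-materialized dataset with an id field, as in the example above:

    import tensorflow as tf
    from petastorm import make_reader
    from petastorm.tf_utils import make_petastorm_dataset

    # make_reader returns a reader usable as a context manager,
    # just like Reader in the example above.
    with make_reader('hdfs://.../some/hdfs/path') as reader:
        dataset = make_petastorm_dataset(reader)
        # The result is an ordinary tf.data.Dataset, so standard
        # transformations such as batching apply.
        dataset = dataset.batch(32)
        iterator = dataset.make_one_shot_iterator()
        batch = iterator.get_next()
        with tf.Session() as sess:
            print(sess.run(batch).id)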

Source: https://stackoverflow.com/questions/51732446/tensorflow-dataset-api-input-pipeline-with-parquet-files
