TensorFlow Dataset API: input pipeline with Parquet files
Question: I am trying to design an input pipeline with the Dataset API. I am working with Parquet files. What is a good way to add them to my pipeline?

Answer 1: We have released Petastorm, an open source library that allows you to use Apache Parquet files directly via the TensorFlow Dataset API. Here is a small example:

import tensorflow as tf
from petastorm.reader import Reader
from petastorm.tf_utils import make_petastorm_dataset

with Reader('hdfs://.../some/hdfs/path') as reader:
    # Wrap the Petastorm reader in a tf.data.Dataset
    dataset = make_petastorm_dataset(reader)
    iterator = dataset.make_one_shot_iterator()
    tensor = iterator.get_next()
    with tf.Session() as sess:
        # Each run fetches one sample as a named tuple of fields
        sample = sess.run(tensor)
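As a minimal sketch of how this fits into a larger input pipeline (assuming the same Petastorm reader as above and TF 1.x session-style code), the object returned by make_petastorm_dataset is an ordinary tf.data.Dataset, so standard transformations such as batching can be chained onto it:

import tensorflow as tf
from petastorm.reader import Reader
from petastorm.tf_utils import make_petastorm_dataset

# The HDFS path is a placeholder from the original answer;
# substitute the location of your own Parquet dataset.
with Reader('hdfs://.../some/hdfs/path') as reader:
    dataset = make_petastorm_dataset(reader)
    # Regular tf.data transformations apply, e.g. batching samples
    dataset = dataset.batch(16)
    iterator = dataset.make_one_shot_iterator()
    batch = iterator.get_next()
    with tf.Session() as sess:
        # Fetch one batch; fields arrive as a named tuple of arrays
        first_batch = sess.run(batch)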