Input multiple files into a TensorFlow dataset

余生长醉 submitted on 2019-12-03 17:41:28
mikkola

You have the right idea using tf.data.TextLineDataset. However, your current implementation yields every line of every file in its input tensor of filenames except the first line of the first file. The way you skip the first line now affects only the very first line of the very first file; the first line of the second file (and of any later file) is not skipped.

Based on the example in the Datasets guide, you should first create a regular Dataset from the filenames, then call flat_map to read each file with TextLineDataset, skipping the first line of each file as it is read:

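import tensorflow as tf

# One dataset element per filename (filenames is the list from your question)
d = tf.data.Dataset.from_tensor_slices(filenames)
# Build a line dataset for each file, skipping the first line of each file
d = d.flat_map(lambda filename: tf.data.TextLineDataset(filename).skip(1))
# Parse each remaining line with your own _parse_line, plus whatever else you need
d = d.map(_parse_line)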

Here, flat_map maps every filename in the original dataset to a new dataset (the lines of that file, minus the first one) and flattens the results into a single dataset of lines.
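
To make the difference concrete, here is a minimal sketch that compares the two approaches on two small throwaway files. It assumes TensorFlow 2.x with eager execution (so the dataset can be iterated directly); the write_csv helper, the file names a.csv and b.csv, and their contents are made up for illustration and are not from the question.

```python
import os
import tempfile

import tensorflow as tf  # assumption: TensorFlow 2.x, eager execution enabled


def write_csv(path, header, rows):
    """Hypothetical helper: write a header line followed by the data rows."""
    with open(path, "w") as f:
        f.write(header + "\n")
        for row in rows:
            f.write(row + "\n")


# Two illustrative files, each starting with a header line.
tmp_dir = tempfile.mkdtemp()
filenames = [os.path.join(tmp_dir, "a.csv"), os.path.join(tmp_dir, "b.csv")]
write_csv(filenames[0], "x,y", ["1,2", "3,4"])
write_csv(filenames[1], "x,y", ["5,6"])

# Skipping once on the combined dataset drops only the first file's header.
naive = tf.data.TextLineDataset(filenames).skip(1)

# Skipping inside flat_map drops the header of every file.
per_file = tf.data.Dataset.from_tensor_slices(filenames).flat_map(
    lambda filename: tf.data.TextLineDataset(filename).skip(1)
)

# String elements come back as bytes, so decode them for display.
naive_lines = [line.decode("utf-8") for line in naive.as_numpy_iterator()]
per_file_lines = [line.decode("utf-8") for line in per_file.as_numpy_iterator()]

print(naive_lines)     # the header of b.csv is still present
print(per_file_lines)  # only data rows remain
```

In the first result the header of the second file still shows up among the data rows, which is the behaviour described at the top of this answer; in the second result only data rows are left.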
