tensorflow-datasets

Iterating over a Dataset in TF 2.0 with a for loop

独自空忆成欢 submitted on 2019-12-20 03:14:43
Question: This problem is about how to iterate over a TF Dataset given that make_initializable_iterator() is deprecated. I read a dataset with the function below:

    def read_dataset_new(filename, target='delay'):
        ds = tf.data.TFRecordDataset(filename)
        ds = ds.map(lambda buf: parse(buf, target=target))
        ds = ds.batch(1)
        return ds

Then I want to iterate over the dataset. I have been using https://www.tensorflow.org/api_docs/python/tf/data/Dataset#make_initializable_iterator :

    with tf.compat.v1.Session() as …
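In TF 2.x, a tf.data.Dataset is directly iterable in eager mode, so no explicit iterator is needed. A minimal sketch, assuming read_dataset_new above and that parse returns a (features, label) pair (the filename is illustrative):

    import tensorflow as tf

    ds = read_dataset_new('data.tfrecords', target='delay')
    for features, label in ds:  # each element is one batch of size 1
        print(features, label)

If parse returns a different element structure, the loop variables change accordingly.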

tf.data.Dataset: how to get the dataset size (number of elements in an epoch)?

青春壹個敷衍的年華 submitted on 2019-12-19 16:09:08
Question: Let's say I have defined a dataset in this way:

    filename_dataset = tf.data.Dataset.list_files("{}/*.png".format(dataset))

How can I get the number of elements inside the dataset (hence, the number of single elements that compose an epoch)? I know that tf.data.Dataset already knows the dimension of the dataset, because the repeat() method allows repeating the input pipeline for a specified number of epochs, so there must be a way to get this information.

Answer 1: tf.data.Dataset.list_files …
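In recent TF versions, tf.data.experimental.cardinality returns the element count when it is statically known; for list_files the cardinality is typically unknown, so counting by iteration is the fallback. A sketch (the glob pattern is illustrative):

    import tensorflow as tf

    filename_dataset = tf.data.Dataset.list_files("images/*.png")

    n = tf.data.experimental.cardinality(filename_dataset)
    if n == tf.data.experimental.UNKNOWN_CARDINALITY:
        # count by iterating over the dataset once
        n = filename_dataset.reduce(0, lambda count, _: count + 1)
    print(int(n))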

Using tensorflow's Dataset pipeline, how do I *name* the results of a `map` operation?

女生的网名这么多〃 submitted on 2019-12-19 08:06:25
Question: I have the map function below (runnable example), which inputs a string and outputs a string and an integer. In tf.data.Dataset.from_tensor_slices I named the original input 'filenames', but when I return the values from the map function map_element_counts I can only return a tuple (returning a dictionary generates an exception). Is there a way to name the 2 elements returned from my map_element_counts function?

    import tensorflow as tf

    filelist = ['fileA_6', 'fileB_10', 'fileC_7']

    def map…
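In current TF versions, a map function can simply return a dict, which names each component of the dataset element. A sketch under that assumption (the body of map_element_counts is hypothetical, parsing the trailing number out of names like 'fileA_6'):

    import tensorflow as tf

    filelist = ['fileA_6', 'fileB_10', 'fileC_7']

    def map_element_counts(filename):
        parts = tf.strings.split(filename, '_')  # e.g. ['fileA', '6']
        return {'filename': parts[0],
                'count': tf.strings.to_number(parts[1], out_type=tf.int32)}

    ds = tf.data.Dataset.from_tensor_slices(filelist)
    ds = ds.map(map_element_counts)
    for element in ds:
        print(element['filename'].numpy(), element['count'].numpy())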

How to use tf.data's initializable iterators within a tf.estimator's input_fn?

孤街醉人 submitted on 2019-12-19 05:47:47
Question: I would like to manage my training with a tf.estimator.Estimator but have some trouble using it alongside the tf.data API. I have something like this:

    def model_fn(features, labels, params, mode):
        # Defines the model's ops.
        # Initializes with tf.train.Scaffold.
        # Returns a tf.estimator.EstimatorSpec.

    def input_fn():
        dataset = tf.data.TextLineDataset("test.txt")
        # map, shuffle, padded_batch, etc.
        iterator = dataset.make_initializable_iterator()
        return iterator.get_next()

    estimator = tf.estimator…
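In recent TF versions, the simplest resolution is to return the dataset itself from input_fn and let the Estimator create and initialize the iterator internally. A sketch of the same pipeline:

    def input_fn():
        dataset = tf.data.TextLineDataset("test.txt")
        # map, shuffle, padded_batch, etc.
        return dataset  # the Estimator builds and initializes the iterator

If an initializable iterator really is required, one known workaround is to register its initializer so it runs at session creation, e.g. tf.add_to_collection(tf.GraphKeys.TABLE_INITIALIZERS, iterator.initializer), or to run it from a tf.train.SessionRunHook's after_create_session.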

Tensorflow Dataset API: dataset.batch(n).prefetch(m) prefetches m batches or samples?

假装没事ソ submitted on 2019-12-19 02:54:09
Question: If I use dataset.batch(n).prefetch(m), will m batches or m samples be prefetched?

Answer 1: The Dataset.prefetch(m) transformation prefetches m elements of its direct input. In this case, since its direct input is dataset.batch(n) and each element of that dataset is a batch (of n elements), it will prefetch m batches.

Source: https://stackoverflow.com/questions/49707062/tensorflow-dataset-api-dataset-batchn-prefetchm-prefetches-m-batches-or-sam
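A small sketch making the distinction concrete (the sizes are illustrative):

    import tensorflow as tf

    ds = tf.data.Dataset.range(100)
    ds = ds.batch(8)     # each element is now a batch of 8 samples
    ds = ds.prefetch(2)  # buffers 2 elements, i.e. 2 batches = 16 samples

    # swapping the order instead buffers individual samples:
    ds2 = tf.data.Dataset.range(100).prefetch(2).batch(8)  # buffers 2 samples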

How do I create padded batches in Tensorflow for tf.train.SequenceExample data using the DataSet API?

三世轮回 submitted on 2019-12-18 13:06:12
Question: For training an LSTM model in TensorFlow, I have structured my data into the tf.train.SequenceExample format and stored it in a TFRecord file. I would now like to use the new Dataset API to generate padded batches for training. The documentation has an example of using padded_batch, but for my data I can't figure out what the value of padded_shapes should be. For reading the TFRecord file into batches I have written the following Python code:

    import math
    import tensorflow as …
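padded_shapes mirrors the structure of a dataset element, with None marking axes that should be padded to the longest sequence in the batch. A sketch, assuming each element is parsed from a SequenceExample into a variable-length token vector plus a scalar length (the feature names are hypothetical):

    import tensorflow as tf

    def parse(buf):
        context, sequence = tf.io.parse_single_sequence_example(
            buf,
            context_features={'length': tf.io.FixedLenFeature([], tf.int64)},
            sequence_features={'tokens': tf.io.FixedLenSequenceFeature([], tf.int64)})
        return sequence['tokens'], context['length']

    ds = tf.data.TFRecordDataset('data.tfrecords').map(parse)
    # [None]: pad the variable-length axis; []: the scalar needs no padding
    ds = ds.padded_batch(32, padded_shapes=([None], []))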

How to create tf.data.dataset from directories of tfrecords?

空扰寡人 submitted on 2019-12-18 12:42:07
Question: My dataset has different directories, and each directory corresponds to one class. There are different numbers of .tfrecord files in each directory. My question is: how can I sample 5 images (each .tfrecord file corresponds to one image) from each directory? My other question is: how can I sample 5 of these directories and then sample 5 images from each? I just want to do it with tf.data.Dataset, so I want to have a dataset from which I get an iterator, and that iterator.next() gives me …
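One way to sketch this (the directory layout is an assumption): build a per-class dataset of shuffled file names, take 5 from each, read the records, and mix the results with sample_from_datasets. Sampling 5 directories first is just a random slice of the directory list:

    import random
    import tensorflow as tf

    class_dirs = tf.io.gfile.glob('data/*')   # hypothetical: one dir per class
    class_dirs = random.sample(class_dirs, 5)  # sample 5 of the directories

    per_class = [
        tf.data.Dataset.list_files(d + '/*.tfrecord', shuffle=True)
        .take(5)                               # 5 random files = 5 images
        .interleave(tf.data.TFRecordDataset)   # read the sampled records
        for d in class_dirs
    ]

    # uniformly mix elements drawn from the per-class datasets
    ds = tf.data.experimental.sample_from_datasets(per_class)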

Tensorflow v1.10: store images as byte strings or per channel?

心已入冬 submitted on 2019-12-18 07:08:18
Question: Context: It is known that, at the moment, TF's Record documentation leaves something to be desired. My question regards what is optimal for storing a sequence, its per-element class probabilities, and some (context?) information (e.g. the name of the sequence) as a TFRecord. Namely, this question considers storing the sequence and class probabilities as channels vs. as a byte string, and whether or not the meta information should go in as features of a tf.train.Example or as the context …
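A sketch of the two storage options for a single (T, C) float sequence (shapes and feature names are illustrative):

    import numpy as np
    import tensorflow as tf

    seq = np.random.rand(10, 3).astype(np.float32)  # (T, C) sequence

    # Option A: one float feature per channel
    per_channel = tf.train.Features(feature={
        'ch%d' % i: tf.train.Feature(
            float_list=tf.train.FloatList(value=seq[:, i]))
        for i in range(seq.shape[1])
    })

    # Option B: the whole array serialized as one byte string
    as_bytes = tf.train.Features(feature={
        'seq': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[seq.tobytes()]))
    })

Option B is compact and fast to write but discards shape and dtype, which must then be stored alongside or fixed by convention.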

How do I split Tensorflow datasets?

一个人想着一个人 submitted on 2019-12-17 16:17:41
Question: I have a TensorFlow dataset based on one .tfrecord file. How do I split the dataset into test and train datasets? E.g. 70% train and 30% test?

Edit: My TensorFlow version: 1.8. I've checked; there is no "split_v" function as mentioned in the possible duplicate. Also, I am working with a tfrecord file.

Answer 1: You may use Dataset.take() and Dataset.skip():

    train_size = int(0.7 * DATASET_SIZE)
    val_size = int(0.15 * DATASET_SIZE)
    test_size = int(0.15 * DATASET_SIZE)

    full_dataset = tf.data…
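A sketch of how that pattern continues (DATASET_SIZE is assumed known, e.g. by counting records once):

    import tensorflow as tf

    DATASET_SIZE = 1000  # assumed known
    train_size = int(0.7 * DATASET_SIZE)
    val_size = int(0.15 * DATASET_SIZE)

    full_dataset = tf.data.TFRecordDataset('data.tfrecords')
    full_dataset = full_dataset.shuffle(DATASET_SIZE, seed=42,
                                        reshuffle_each_iteration=False)
    train_dataset = full_dataset.take(train_size)
    rest = full_dataset.skip(train_size)
    val_dataset = rest.take(val_size)
    test_dataset = rest.skip(val_size)

Note that any shuffling must be deterministic across the take/skip traversals (fixed seed, reshuffle_each_iteration=False); otherwise the splits can overlap.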

parallelising tf.data.Dataset.from_generator

走远了吗. submitted on 2019-12-17 09:22:57
Question: I have a non-trivial input pipeline that from_generator is perfect for:

    dataset = tf.data.Dataset.from_generator(complex_img_label_generator,
                                             (tf.int32, tf.string))
    dataset = dataset.batch(64)
    iter = dataset.make_one_shot_iterator()
    imgs, labels = iter.get_next()

where complex_img_label_generator dynamically generates images and returns a numpy array representing an (H, W, 3) image and a simple string label. The processing is not something I can represent as reading from files and tf.image …
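A common way to parallelise this is to shard the generator and interleave the shards; from_generator itself runs single-threaded Python, so the gain comes from running several generator instances. A sketch, where a sharded variant of the generator is assumed to exist:

    import tensorflow as tf

    N_SHARDS = 4

    def make_shard(shard_id):
        # complex_img_label_generator(shard_id, n_shards) is a hypothetical
        # variant that yields only its own shard of the data
        return tf.data.Dataset.from_generator(
            complex_img_label_generator,
            (tf.int32, tf.string),
            args=(shard_id, N_SHARDS))

    dataset = (tf.data.Dataset.range(N_SHARDS)
               .interleave(make_shard, cycle_length=N_SHARDS,
                           num_parallel_calls=tf.data.experimental.AUTOTUNE)
               .batch(64))

Because the generators still hold the GIL while running Python code, this helps most when the heavy lifting is in GIL-releasing calls (e.g. numpy); otherwise moving the work into dataset.map(..., num_parallel_calls=...) over a cheap generator of seeds or indices parallelises better.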