tensorflow-datasets

Iterating over a Dataset in TF 2.0 with a for loop

独自空忆成欢 submitted on 2019-12-20 03:14:43
Question: This problem is about how to iterate over a TF Dataset given that make_initializable_iterator() is deprecated. I read a dataset with the function below:

    def read_dataset_new(filename, target='delay'):
        ds = tf.data.TFRecordDataset(filename)
        ds = ds.map(lambda buf: parse(buf, target=target))
        ds = ds.batch(1)
        return ds

Then I want to iterate over the dataset. I have been using https://www.tensorflow.org/api_docs/python/tf/data/Dataset#make_initializable_iterator :

    with tf.compat.v1.Session() as …
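In TF 2.x, a tf.data.Dataset is directly iterable in eager mode, so no explicit iterator is needed. A minimal sketch, assuming read_dataset_new above and that parse returns a (features, label) pair (the filename is illustrative):

    import tensorflow as tf

    ds = read_dataset_new('data.tfrecords', target='delay')
    for features, label in ds:  # each element is one batch of size 1
        print(features, label)

If parse returns a different element structure, the loop variables change accordingly.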

tf.data.Dataset: how to get the dataset size (number of elements in an epoch)?

青春壹個敷衍的年華 submitted on 2019-12-19 16:09:08
Question: Let's say I have defined a dataset in this way:

    filename_dataset = tf.data.Dataset.list_files("{}/*.png".format(dataset))

How can I get the number of elements inside the dataset (hence, the number of single elements that compose an epoch)? I know that tf.data.Dataset already knows the dimension of the dataset, because the repeat() method allows repeating the input pipeline for a specified number of epochs, so there must be a way to get this information.

Answer 1: tf.data.Dataset.list_files …
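In recent TF versions, tf.data.experimental.cardinality returns the element count when it is statically known; for list_files the cardinality is typically unknown, so counting by iteration is the fallback. A sketch (the glob pattern is illustrative):

    import tensorflow as tf

    filename_dataset = tf.data.Dataset.list_files("images/*.png")

    n = tf.data.experimental.cardinality(filename_dataset)
    if n == tf.data.experimental.UNKNOWN_CARDINALITY:
        # count by iterating over the dataset once
        n = filename_dataset.reduce(0, lambda count, _: count + 1)
    print(int(n))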

Using tensorflow's Dataset pipeline, how do I *name* the results of a `map` operation?

女生的网名这么多〃 submitted on 2019-12-19 08:06:25
Question: I have the map function below (runnable example), which inputs a string and outputs a string and an integer. In tf.data.Dataset.from_tensor_slices I named the original input 'filenames', but when I return the values from the map function map_element_counts I can only return a tuple (returning a dictionary generates an exception). Is there a way to name the 2 elements returned from my map_element_counts function?

    import tensorflow as tf

    filelist = ['fileA_6', 'fileB_10', 'fileC_7']

    def map…
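In current TF versions, a map function can simply return a dict, which names each component of the dataset element. A sketch under that assumption (the body of map_element_counts is hypothetical, parsing the trailing number out of names like 'fileA_6'):

    import tensorflow as tf

    filelist = ['fileA_6', 'fileB_10', 'fileC_7']

    def map_element_counts(filename):
        parts = tf.strings.split(filename, '_')  # e.g. ['fileA', '6']
        return {'filename': parts[0],
                'count': tf.strings.to_number(parts[1], out_type=tf.int32)}

    ds = tf.data.Dataset.from_tensor_slices(filelist)
    ds = ds.map(map_element_counts)
    for element in ds:
        print(element['filename'].numpy(), element['count'].numpy())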

How to use tf.data's initializable iterators within a tf.estimator's input_fn?

孤街醉人 submitted on 2019-12-19 05:47:47
Question: I would like to manage my training with a tf.estimator.Estimator but have some trouble using it alongside the tf.data API. I have something like this:

    def model_fn(features, labels, params, mode):
        # Defines the model's ops.
        # Initializes with tf.train.Scaffold.
        # Returns a tf.estimator.EstimatorSpec.

    def input_fn():
        dataset = tf.data.TextLineDataset("test.txt")
        # map, shuffle, padded_batch, etc.
        iterator = dataset.make_initializable_iterator()
        return iterator.get_next()

    estimator = tf.estimator…
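In recent TF versions, the simplest resolution is to return the dataset itself from input_fn and let the Estimator create and initialize the iterator internally. A sketch of the same pipeline:

    def input_fn():
        dataset = tf.data.TextLineDataset("test.txt")
        # map, shuffle, padded_batch, etc.
        return dataset  # the Estimator builds and initializes the iterator

If an initializable iterator really is required, one known workaround is to register its initializer so it runs at session creation, e.g. tf.add_to_collection(tf.GraphKeys.TABLE_INITIALIZERS, iterator.initializer), or to run it from a tf.train.SessionRunHook's after_create_session.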

Tensorflow Dataset API: dataset.batch(n).prefetch(m) prefetches m batches or samples?

假装没事ソ submitted on 2019-12-19 02:54:09
Question: If I use dataset.batch(n).prefetch(m), will m batches or m samples be prefetched?

Answer 1: The Dataset.prefetch(m) transformation prefetches m elements of its direct input. In this case, since its direct input is dataset.batch(n) and each element of that dataset is a batch (of n elements), it will prefetch m batches.

Source: https://stackoverflow.com/questions/49707062/tensorflow-dataset-api-dataset-batchn-prefetchm-prefetches-m-batches-or-sam
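A small sketch making the distinction concrete (the sizes are illustrative):

    import tensorflow as tf

    ds = tf.data.Dataset.range(100)
    ds = ds.batch(8)     # each element is now a batch of 8 samples
    ds = ds.prefetch(2)  # buffers 2 elements, i.e. 2 batches = 16 samples

    # swapping the order instead buffers individual samples:
    ds2 = tf.data.Dataset.range(100).prefetch(2).batch(8)  # buffers 2 samples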

How do I create padded batches in Tensorflow for tf.train.SequenceExample data using the DataSet API?

三世轮回 submitted on 2019-12-18 13:06:12
Question: For training an LSTM model in TensorFlow, I have structured my data into the tf.train.SequenceExample format and stored it in a TFRecord file. I would now like to use the new Dataset API to generate padded batches for training. The documentation has an example of using padded_batch, but for my data I can't figure out what the value of padded_shapes should be. For reading the TFRecord file into batches I have written the following Python code:

    import math
    import tensorflow as …
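padded_shapes mirrors the structure of a dataset element, with None marking axes that should be padded to the longest sequence in the batch. A sketch, assuming each element is parsed from a SequenceExample into a variable-length token vector plus a scalar length (the feature names are hypothetical):

    import tensorflow as tf

    def parse(buf):
        context, sequence = tf.io.parse_single_sequence_example(
            buf,
            context_features={'length': tf.io.FixedLenFeature([], tf.int64)},
            sequence_features={'tokens': tf.io.FixedLenSequenceFeature([], tf.int64)})
        return sequence['tokens'], context['length']

    ds = tf.data.TFRecordDataset('data.tfrecords').map(parse)
    # [None]: pad the variable-length axis; []: the scalar needs no padding
    ds = ds.padded_batch(32, padded_shapes=([None], []))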

How to create tf.data.dataset from directories of tfrecords?

空扰寡人 submitted on 2019-12-18 12:42:07
Question: My dataset has different directories, and each directory corresponds to one class. There are different numbers of .tfrecord files in each directory. My question is: how can I sample 5 images (each .tfrecord file corresponds to one image) from each directory? My other question is: how can I sample 5 of these directories and then sample 5 images from each? I just want to do it with tf.data.Dataset, so I want to have a dataset from which I get an iterator, and that iterator.next() gives me …
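One way to sketch this (the directory layout is an assumption): build a per-class dataset of shuffled file names, take 5 from each, read the records, and mix the results with sample_from_datasets. Sampling 5 directories first is just a random slice of the directory list:

    import random
    import tensorflow as tf

    class_dirs = tf.io.gfile.glob('data/*')   # hypothetical: one dir per class
    class_dirs = random.sample(class_dirs, 5)  # sample 5 of the directories

    per_class = [
        tf.data.Dataset.list_files(d + '/*.tfrecord', shuffle=True)
        .take(5)                               # 5 random files = 5 images
        .interleave(tf.data.TFRecordDataset)   # read the sampled records
        for d in class_dirs
    ]

    # uniformly mix elements drawn from the per-class datasets
    ds = tf.data.experimental.sample_from_datasets(per_class)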

Tensorflow v1.10: store images as byte strings or per channel?

心已入冬 submitted on 2019-12-18 07:08:18
Question: Context: It is known that, at the moment, TF's Record documentation leaves something to be desired. My question regards what is optimal for storing a sequence, its per-element class probabilities, and some (context?) information (e.g. the name of the sequence) as a TFRecord. Namely, this question considers storing the sequence and class probabilities as channels vs. as a byte string, and whether or not the meta information should go in as features of a tf.train.Example or as the context …
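A sketch of the two storage options for a single (T, C) float sequence (shapes and feature names are illustrative):

    import numpy as np
    import tensorflow as tf

    seq = np.random.rand(10, 3).astype(np.float32)  # (T, C) sequence

    # Option A: one float feature per channel
    per_channel = tf.train.Features(feature={
        'ch%d' % i: tf.train.Feature(
            float_list=tf.train.FloatList(value=seq[:, i]))
        for i in range(seq.shape[1])
    })

    # Option B: the whole array serialized as one byte string
    as_bytes = tf.train.Features(feature={
        'seq': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[seq.tobytes()]))
    })

Option B is compact and fast to write but discards shape and dtype, which must then be stored alongside or fixed by convention.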

How do I split Tensorflow datasets?

一个人想着一个人 submitted on 2019-12-17 16:17:41
Question: I have a TensorFlow dataset based on one .tfrecord file. How do I split the dataset into test and train datasets? E.g. 70% train and 30% test?

Edit: My TensorFlow version: 1.8. I've checked; there is no "split_v" function as mentioned in the possible duplicate. Also, I am working with a tfrecord file.

Answer 1: You may use Dataset.take() and Dataset.skip():

    train_size = int(0.7 * DATASET_SIZE)
    val_size = int(0.15 * DATASET_SIZE)
    test_size = int(0.15 * DATASET_SIZE)

    full_dataset = tf.data…
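A sketch of how that pattern continues (DATASET_SIZE is assumed known, e.g. by counting records once):

    import tensorflow as tf

    DATASET_SIZE = 1000  # assumed known
    train_size = int(0.7 * DATASET_SIZE)
    val_size = int(0.15 * DATASET_SIZE)

    full_dataset = tf.data.TFRecordDataset('data.tfrecords')
    full_dataset = full_dataset.shuffle(DATASET_SIZE, seed=42,
                                        reshuffle_each_iteration=False)
    train_dataset = full_dataset.take(train_size)
    rest = full_dataset.skip(train_size)
    val_dataset = rest.take(val_size)
    test_dataset = rest.skip(val_size)

Note that any shuffling must be deterministic across the take/skip traversals (fixed seed, reshuffle_each_iteration=False); otherwise the splits can overlap.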

parallelising tf.data.Dataset.from_generator

走远了吗. submitted on 2019-12-17 09:22:57
Question: I have a non-trivial input pipeline that from_generator is perfect for:

    dataset = tf.data.Dataset.from_generator(complex_img_label_generator,
                                             (tf.int32, tf.string))
    dataset = dataset.batch(64)
    iter = dataset.make_one_shot_iterator()
    imgs, labels = iter.get_next()

where complex_img_label_generator dynamically generates images and returns a numpy array representing an (H, W, 3) image and a simple string label. The processing is not something I can represent as reading from files and tf.image …
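A common way to parallelise this is to shard the generator and interleave the shards; from_generator itself runs single-threaded Python, so the gain comes from running several generator instances. A sketch, where a sharded variant of the generator is assumed to exist:

    import tensorflow as tf

    N_SHARDS = 4

    def make_shard(shard_id):
        # complex_img_label_generator(shard_id, n_shards) is a hypothetical
        # variant that yields only its own shard of the data
        return tf.data.Dataset.from_generator(
            complex_img_label_generator,
            (tf.int32, tf.string),
            args=(shard_id, N_SHARDS))

    dataset = (tf.data.Dataset.range(N_SHARDS)
               .interleave(make_shard, cycle_length=N_SHARDS,
                           num_parallel_calls=tf.data.experimental.AUTOTUNE)
               .batch(64))

Because the generators still hold the GIL while running Python code, this helps most when the heavy lifting is in GIL-releasing calls (e.g. numpy); otherwise moving the work into dataset.map(..., num_parallel_calls=...) over a cheap generator of seeds or indices parallelises better.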