tensorflow-datasets

How to use the tf.Dataset design in both training and inference?

寵の児 Submitted on 2020-01-06 06:46:05
Question: Say we have input x and label y:

    iterator = tf.data.Iterator.from_structure((x_type, y_type), (x_shape, y_shape))
    tf_x, tf_y = iterator.get_next()

Now I use a generator function to create the dataset:

    def gen():
        for ....:
            yield (x, y)

    ds = tf.data.Dataset.from_generator(gen, (x_type, y_type), (x_shape, y_shape))

In my graph I use tf_x and tf_y to do training, and that is fine. But now I want to do inference, where I don't have the label y. One workaround I made is to fake a y (like tf.zeros(y_shape)),
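A minimal TF 1.x sketch of one way to handle this (not the asker's full code; the generator contents and shapes below are hypothetical): build a second dataset for inference that pairs x with a dummy zero label, so both datasets match the reinitializable iterator's structure and tf_y is simply ignored at inference time.

```python
import numpy as np
import tensorflow as tf

x_type, y_type = tf.float32, tf.int64
x_shape, y_shape = tf.TensorShape([2]), tf.TensorShape([])

def train_gen():
    for _ in range(4):
        yield np.random.rand(2).astype(np.float32), np.int64(1)

def infer_gen():
    # No real labels at inference time; yield a zero placeholder so the
    # element structure still matches the training dataset.
    for _ in range(4):
        yield np.random.rand(2).astype(np.float32), np.int64(0)

train_ds = tf.data.Dataset.from_generator(train_gen, (x_type, y_type), (x_shape, y_shape))
infer_ds = tf.data.Dataset.from_generator(infer_gen, (x_type, y_type), (x_shape, y_shape))

iterator = tf.data.Iterator.from_structure(train_ds.output_types, train_ds.output_shapes)
tf_x, tf_y = iterator.get_next()

train_init = iterator.make_initializer(train_ds)
infer_init = iterator.make_initializer(infer_ds)

with tf.Session() as sess:
    sess.run(train_init)            # training phase: tf_x and tf_y hold real data
    print(sess.run([tf_x, tf_y]))
    sess.run(infer_init)            # inference phase: tf_y is only the dummy label
    print(sess.run(tf_x))
```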

Unable to use canned Tensorflow RNN Estimator

可紊 Submitted on 2020-01-06 05:45:06
Question: I am trying to use the canned RNN Estimator from TensorFlow as follows:

    import tensorflow as tf

    sequence_feature_colums = [tf.contrib.feature_column.sequence_numeric_column("test")]

    estimator = tf.contrib.estimator.RNNEstimator(
        head=tf.contrib.estimator.regression_head(),
        sequence_feature_columns=sequence_feature_colums)

    def input_fn_train():
        dataset = tf.data.Dataset.from_tensor_slices(({"test": [0]}, [0]))
        dataset = dataset.batch(1)
        return dataset

    estimator.train(input_fn=input_fn_train,
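A sketch of an input_fn with an explicit time dimension, offered only as an assumption since the question is cut off before the error itself: sequence feature columns generally expect each example to be a sequence, so the toy data below has shape [num_examples, num_timesteps] rather than a single scalar per example.

```python
import tensorflow as tf

def input_fn_train():
    # Hypothetical toy data: 2 examples, each a sequence of 3 timesteps.
    features = {"test": [[0.0, 1.0, 2.0],
                         [3.0, 4.0, 5.0]]}
    labels = [0.0, 1.0]
    dataset = tf.data.Dataset.from_tensor_slices((features, labels))
    return dataset.batch(1)
```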

Is there a way to stack two tensorflow datasets?

大城市里の小女人 Submitted on 2020-01-03 16:40:06
Question: I want to stack two dataset objects in TensorFlow (like the rbind function in R). I have created one dataset A from TFRecord files and one dataset B from NumPy arrays. Both have the same variables. Do you know if there is a way to stack these two datasets to create a bigger one? Or to create an iterator that will randomly read data from these two sources? Thanks
Answer 1: The tf.data.Dataset.concatenate() method is the closest analog of tf.stack() when working with datasets. If you have two datasets with
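A short sketch of both options (using tiny range datasets in place of the asker's TFRecord- and NumPy-backed ones): concatenate() appends B after A, while sample_from_datasets draws randomly from both sources.

```python
import tensorflow as tf

ds_a = tf.data.Dataset.range(0, 5)        # stand-in for dataset A (TFRecord-backed)
ds_b = tf.data.Dataset.range(100, 105)    # stand-in for dataset B (NumPy-backed)

# Sequential "rbind": all elements of A, then all elements of B.
stacked = ds_a.concatenate(ds_b)

# Random mixing of the two sources; sample_from_datasets lives under
# tf.data.experimental in recent 1.x releases (tf.contrib.data in older ones).
mixed = tf.data.experimental.sample_from_datasets([ds_a, ds_b], weights=[0.5, 0.5])
```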

How to improve the performance of this data pipeline for my tensorflow model

北慕城南 Submitted on 2019-12-31 11:01:30
Question: I have a TensorFlow model which I am training on Google Colab. The actual model is more complex, but I condensed it into a reproducible example (removed saving/restoring, learning rate decay, asserts, TensorBoard events, gradient clipping and so on). The model works reasonably well (converges to an acceptable loss) and I am looking for a way to speed up the training (iterations per second). Currently on Colab's GPU it takes 10 minutes to train for 1,000 iterations. With my current batch size of 512 it
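Since the actual model is not shown above, here is only a generic sketch of the usual tf.data speed-ups such a question tends to attract: a parallelized map, batching, and prefetch to overlap preprocessing with training. The preprocess function and range dataset are placeholders.

```python
import tensorflow as tf

def preprocess(x):
    # Placeholder for the real per-example work.
    return tf.cast(x, tf.float32) / 255.0

dataset = (tf.data.Dataset.range(100000)
           .map(preprocess, num_parallel_calls=tf.data.experimental.AUTOTUNE)
           .batch(512)
           .prefetch(tf.data.experimental.AUTOTUNE))
```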

What is the difference between Dataset.from_tensors and Dataset.from_tensor_slices?

自闭症网瘾萝莉.ら Submitted on 2019-12-31 08:27:28
Question: I have a dataset represented as a NumPy matrix of shape (num_features, num_examples) and I wish to convert it to the TensorFlow type tf.Dataset. I am struggling to understand the difference between these two methods: Dataset.from_tensors and Dataset.from_tensor_slices. Which is the right one and why? The TensorFlow documentation (link) says that both methods accept a nested structure of tensors, although when using from_tensor_slices the tensors should have the same size in the 0-th dimension.
Answer 1:
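A minimal illustration of the difference (my own toy tensor, not the asker's data): from_tensors keeps the whole tensor as a single element, while from_tensor_slices splits it along the 0-th dimension into one element per row.

```python
import tensorflow as tf

t = tf.constant([[1, 2], [3, 4]])  # shape (2, 2)

ds_whole  = tf.data.Dataset.from_tensors(t)        # 1 element of shape (2, 2)
ds_sliced = tf.data.Dataset.from_tensor_slices(t)  # 2 elements, each of shape (2,)
```

For a (num_features, num_examples) matrix, slicing into per-example elements would presumably require transposing first, since from_tensor_slices always splits along the 0-th dimension.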

How to feed .h5 files in tf.data pipeline in tensorflow model

岁酱吖の Submitted on 2019-12-31 05:33:08
Question: I'm trying to optimize the input pipeline for .h5 data with tf.data, but I encountered a TypeError: expected str, bytes or os.PathLike object, not Tensor. I did some research but can't find anything about converting a tensor of strings to a string. This simplified code is executable and returns the same error:

    batch_size = 1000
    conv_size = 3
    nb_conv = 32
    learning_rate = 0.0001

    # define parser function
    def parse_function(fname):
        with h5py.File(fname, 'r') as f:  # Error comes from here
            X = f['X']
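A hedged sketch of the usual workaround (the file names and dataset here are hypothetical): h5py cannot open a Tensor, so the file is read inside a plain Python function wrapped with tf.py_func (tf.py_function in newer releases), which hands the filename to Python as bytes.

```python
import h5py
import numpy as np
import tensorflow as tf

def _read_h5(fname):
    # fname arrives as bytes inside the py_func, so decode it before opening.
    with h5py.File(fname.decode('utf-8'), 'r') as f:
        return f['X'][:].astype(np.float32)

def parse_function(fname):
    return tf.py_func(_read_h5, [fname], tf.float32)

filenames = tf.constant(["sample_0.h5", "sample_1.h5"])  # hypothetical files
dataset = tf.data.Dataset.from_tensor_slices(filenames).map(parse_function)
```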

Tensorflow error: unsupported callable

吃可爱长大的小学妹 Submitted on 2019-12-29 06:32:00
Question: I am following the tutorial https://www.tensorflow.org/tutorials/layers and I want to adapt it to use my own dataset.

    def train_input_fn_custom(filenames_array, labels_array, batch_size):
        # Reads an image from a file, decodes it into a dense tensor, and resizes it to a fixed shape.
        def _parse_function(filename, label):
            image_string = tf.read_file(filename)
            image_decoded = tf.image.decode_png(image_string, channels=1)
            image_resized = tf.image.resize_images(image_decoded, [40, 40])
            return image_resized,
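A hedged sketch of how such an input function is usually completed and handed to an Estimator. This is not the asker's full code, and the "unsupported callable" error is not shown above; one common cause of it is passing an input_fn that takes arguments directly to estimator.train, which expects a zero-argument callable.

```python
import tensorflow as tf

def train_input_fn_custom(filenames_array, labels_array, batch_size):
    def _parse_function(filename, label):
        image_string = tf.read_file(filename)
        image_decoded = tf.image.decode_png(image_string, channels=1)
        image_resized = tf.image.resize_images(image_decoded, [40, 40])
        return image_resized, label

    dataset = tf.data.Dataset.from_tensor_slices((filenames_array, labels_array))
    dataset = dataset.map(_parse_function)
    dataset = dataset.shuffle(buffer_size=1000).batch(batch_size).repeat()
    return dataset

# Wrap the call in a lambda so the Estimator receives a callable with no arguments:
# estimator.train(input_fn=lambda: train_input_fn_custom(files, labels, 32), steps=1000)
```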

Parallelism isn't reducing the time in dataset map

烂漫一生 Submitted on 2019-12-28 06:21:10
Question: The TF map function supports parallel calls, but I'm seeing no improvement from passing num_parallel_calls to map. With num_parallel_calls=1 and num_parallel_calls=10, there is no difference in run time. Here is a simple example:

    import time

    def test_two_custom_function_parallelism(num_parallel_calls=1, batch=False,
                                             batch_size=1, repeat=1, num_iterations=10):
        tf.reset_default_graph()
        start = time.time()
        dataset_x = tf.data.Dataset.range(1000).map(lambda x: tf.py_func(
            squarer, [x], [tf.int64]),
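A likely explanation, offered as an assumption since the squarer function is not shown: tf.py_func runs Python code under the GIL, so parallel map calls largely serialize. With a native TF op instead of a py_func, num_parallel_calls can genuinely overlap work, as in this small sketch.

```python
import tensorflow as tf

# Native TF op inside map: no Python/GIL involvement, so parallel calls can overlap.
dataset = tf.data.Dataset.range(1000).map(lambda x: tf.square(x),
                                          num_parallel_calls=10)
```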

Raising to a square with TensorFlow using a Dataset class

爱⌒轻易说出口 Submitted on 2019-12-24 22:00:54
Question: I want to write a neural network that learns an x^2 relationship without a predefined model. Precisely, it is given some points in [-1,1] together with their squares for training, and then it should reproduce and predict similar values for, e.g., [-10,10]. I've more or less done it, without datasets. But then I tried to modify it to use datasets and learn how to use them. Now I have succeeded in making the program run, but the output is worse than before; mostly it's a constant 0. The previous version was
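A minimal sketch (my own toy pipeline, not the asker's network) showing one way to feed (x, x^2) training pairs through tf.data in TF 1.x:

```python
import numpy as np
import tensorflow as tf

x = np.random.uniform(-1.0, 1.0, size=(1000, 1)).astype(np.float32)
y = np.square(x)

dataset = (tf.data.Dataset.from_tensor_slices((x, y))
           .shuffle(1000)
           .batch(32)
           .repeat())
batch_x, batch_y = dataset.make_one_shot_iterator().get_next()
# batch_x and batch_y can then feed the network in place of placeholders.
```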

Tensorflow 1.14+: Make intentionally unbalanced mini batch with Dataset API

白昼怎懂夜的黑 Submitted on 2019-12-24 18:31:07
Question: This question is somewhat of an extension of Produce balanced mini batch with Dataset API and references the interleave function from the tf.data.Dataset documentation. Context: suppose you have a dataset with n=4 classes, a list of filenames where each file corresponds to a record, and the label for each file. Then we can construct the labeled dataset as follows:

    path_ds = tf.data.Dataset.from_tensor_slices(files)
    indx_ds = tf.data.Dataset.from_tensor_slices(labels)
    ds = tf.data
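A hedged sketch of one way to get intentionally unbalanced mini batches (this is not the asker's interleave-based construction; the per-class datasets and weights below are hypothetical): build one dataset per class and sample from them with unequal weights.

```python
import tensorflow as tf

# One (repeating) dataset per class; here just integer stand-ins for the real records.
class_datasets = [tf.data.Dataset.range(c * 100, c * 100 + 100).repeat()
                  for c in range(4)]
weights = [0.7, 0.1, 0.1, 0.1]  # hypothetical, intentionally skewed sampling proportions

ds = tf.data.experimental.sample_from_datasets(class_datasets, weights=weights)
ds = ds.batch(32)
```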