tensorflow-datasets

How to use the tf.Dataset design in both training and inference?

寵の児 Submitted on 2020-01-06 06:46:05
Question: Say we have input x and label y:

    iterator = tf.data.Iterator.from_structure((x_type, y_type), (x_shape, y_shape))
    tf_x, tf_y = iterator.get_next()

Now I use a generator function to create the dataset:

    def gen():
        for ....:
            yield (x, y)

    ds = tf.data.Dataset.from_generator(gen, (x_type, y_type), (x_shape, y_shape))

In my graph I use tf_x and tf_y to do training, and that is fine. But now I want to do inference, where I don't have the label y. One workaround I made is to fake a y (like tf.zeros(y_shape)),
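A minimal TF 1.x sketch of one way to handle this (not the asker's full code; the generator contents and shapes below are hypothetical): build a second dataset for inference that pairs x with a dummy zero label, so both datasets match the reinitializable iterator's structure and tf_y is simply ignored at inference time.

```python
import numpy as np
import tensorflow as tf

x_type, y_type = tf.float32, tf.int64
x_shape, y_shape = tf.TensorShape([2]), tf.TensorShape([])

def train_gen():
    for _ in range(4):
        yield np.random.rand(2).astype(np.float32), np.int64(1)

def infer_gen():
    # No real labels at inference time; yield a zero placeholder so the
    # element structure still matches the training dataset.
    for _ in range(4):
        yield np.random.rand(2).astype(np.float32), np.int64(0)

train_ds = tf.data.Dataset.from_generator(train_gen, (x_type, y_type), (x_shape, y_shape))
infer_ds = tf.data.Dataset.from_generator(infer_gen, (x_type, y_type), (x_shape, y_shape))

iterator = tf.data.Iterator.from_structure(train_ds.output_types, train_ds.output_shapes)
tf_x, tf_y = iterator.get_next()

train_init = iterator.make_initializer(train_ds)
infer_init = iterator.make_initializer(infer_ds)

with tf.Session() as sess:
    sess.run(train_init)            # training phase: tf_x and tf_y hold real data
    print(sess.run([tf_x, tf_y]))
    sess.run(infer_init)            # inference phase: tf_y is only the dummy label
    print(sess.run(tf_x))
```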

Unable to use canned Tensorflow RNN Estimator

可紊 Submitted on 2020-01-06 05:45:06
Question: I am trying to use the canned RNN Estimator from TensorFlow as follows:

    import tensorflow as tf

    sequence_feature_colums = [tf.contrib.feature_column.sequence_numeric_column("test")]

    estimator = tf.contrib.estimator.RNNEstimator(
        head=tf.contrib.estimator.regression_head(),
        sequence_feature_columns=sequence_feature_colums)

    def input_fn_train():
        dataset = tf.data.Dataset.from_tensor_slices(({"test": [0]}, [0]))
        dataset = dataset.batch(1)
        return dataset

    estimator.train(input_fn=input_fn_train,
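A sketch of an input_fn with an explicit time dimension, offered only as an assumption since the question is cut off before the error itself: sequence feature columns generally expect each example to be a sequence, so the toy data below has shape [num_examples, num_timesteps] rather than a single scalar per example.

```python
import tensorflow as tf

def input_fn_train():
    # Hypothetical toy data: 2 examples, each a sequence of 3 timesteps.
    features = {"test": [[0.0, 1.0, 2.0],
                         [3.0, 4.0, 5.0]]}
    labels = [0.0, 1.0]
    dataset = tf.data.Dataset.from_tensor_slices((features, labels))
    return dataset.batch(1)
```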

Is there a way to stack two tensorflow datasets?

大城市里の小女人 Submitted on 2020-01-03 16:40:06
Question: I want to stack two dataset objects in TensorFlow (like the rbind function in R). I have created one dataset A from TFRecord files and one dataset B from NumPy arrays. Both have the same variables. Do you know if there is a way to stack these two datasets to create a bigger one? Or to create an iterator that will randomly read data from these two sources? Thanks
Answer 1: The tf.data.Dataset.concatenate() method is the closest analog of tf.stack() when working with datasets. If you have two datasets with
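A short sketch of both options (using tiny range datasets in place of the asker's TFRecord- and NumPy-backed ones): concatenate() appends B after A, while sample_from_datasets draws randomly from both sources.

```python
import tensorflow as tf

ds_a = tf.data.Dataset.range(0, 5)        # stand-in for dataset A (TFRecord-backed)
ds_b = tf.data.Dataset.range(100, 105)    # stand-in for dataset B (NumPy-backed)

# Sequential "rbind": all elements of A, then all elements of B.
stacked = ds_a.concatenate(ds_b)

# Random mixing of the two sources; sample_from_datasets lives under
# tf.data.experimental in recent 1.x releases (tf.contrib.data in older ones).
mixed = tf.data.experimental.sample_from_datasets([ds_a, ds_b], weights=[0.5, 0.5])
```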

How to improve the performance of this data pipeline for my tensorflow model

北慕城南 Submitted on 2019-12-31 11:01:30
Question: I have a TensorFlow model which I am training on Google Colab. The actual model is more complex, but I condensed it into a reproducible example (removed saving/restoring, learning rate decay, asserts, TensorBoard events, gradient clipping and so on). The model works reasonably well (converges to an acceptable loss) and I am looking for a way to speed up the training (iterations per second). Currently on Colab's GPU it takes 10 minutes to train for 1,000 iterations. With my current batch size of 512 it
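Since the actual model is not shown above, here is only a generic sketch of the usual tf.data speed-ups such a question tends to attract: a parallelized map, batching, and prefetch to overlap preprocessing with training. The preprocess function and range dataset are placeholders.

```python
import tensorflow as tf

def preprocess(x):
    # Placeholder for the real per-example work.
    return tf.cast(x, tf.float32) / 255.0

dataset = (tf.data.Dataset.range(100000)
           .map(preprocess, num_parallel_calls=tf.data.experimental.AUTOTUNE)
           .batch(512)
           .prefetch(tf.data.experimental.AUTOTUNE))
```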

What is the difference between Dataset.from_tensors and Dataset.from_tensor_slices?

自闭症网瘾萝莉.ら Submitted on 2019-12-31 08:27:28
Question: I have a dataset represented as a NumPy matrix of shape (num_features, num_examples) and I wish to convert it to the TensorFlow type tf.Dataset. I am struggling to understand the difference between these two methods: Dataset.from_tensors and Dataset.from_tensor_slices. Which is the right one and why? The TensorFlow documentation (link) says that both methods accept a nested structure of tensors, although when using from_tensor_slices the tensors should have the same size in the 0-th dimension.
Answer 1:
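A minimal illustration of the difference (my own toy tensor, not the asker's data): from_tensors keeps the whole tensor as a single element, while from_tensor_slices splits it along the 0-th dimension into one element per row.

```python
import tensorflow as tf

t = tf.constant([[1, 2], [3, 4]])  # shape (2, 2)

ds_whole  = tf.data.Dataset.from_tensors(t)        # 1 element of shape (2, 2)
ds_sliced = tf.data.Dataset.from_tensor_slices(t)  # 2 elements, each of shape (2,)
```

For a (num_features, num_examples) matrix, slicing into per-example elements would presumably require transposing first, since from_tensor_slices always splits along the 0-th dimension.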

How to feed .h5 files in tf.data pipeline in tensorflow model

岁酱吖の Submitted on 2019-12-31 05:33:08
Question: I'm trying to optimize the input pipeline for .h5 data with tf.data, but I encountered a TypeError: expected str, bytes or os.PathLike object, not Tensor. I did some research but can't find anything about converting a tensor of strings to a string. This simplified code is executable and returns the same error:

    batch_size = 1000
    conv_size = 3
    nb_conv = 32
    learning_rate = 0.0001

    # define parser function
    def parse_function(fname):
        with h5py.File(fname, 'r') as f:  # Error comes from here
            X = f['X']
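A hedged sketch of the usual workaround (the file names and dataset here are hypothetical): h5py cannot open a Tensor, so the file is read inside a plain Python function wrapped with tf.py_func (tf.py_function in newer releases), which hands the filename to Python as bytes.

```python
import h5py
import numpy as np
import tensorflow as tf

def _read_h5(fname):
    # fname arrives as bytes inside the py_func, so decode it before opening.
    with h5py.File(fname.decode('utf-8'), 'r') as f:
        return f['X'][:].astype(np.float32)

def parse_function(fname):
    return tf.py_func(_read_h5, [fname], tf.float32)

filenames = tf.constant(["sample_0.h5", "sample_1.h5"])  # hypothetical files
dataset = tf.data.Dataset.from_tensor_slices(filenames).map(parse_function)
```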

Tensorflow error: unsupported callable

吃可爱长大的小学妹 Submitted on 2019-12-29 06:32:00
Question: I am following the tutorial https://www.tensorflow.org/tutorials/layers and I want to adapt it to use my own dataset.

    def train_input_fn_custom(filenames_array, labels_array, batch_size):
        # Reads an image from a file, decodes it into a dense tensor, and resizes it to a fixed shape.
        def _parse_function(filename, label):
            image_string = tf.read_file(filename)
            image_decoded = tf.image.decode_png(image_string, channels=1)
            image_resized = tf.image.resize_images(image_decoded, [40, 40])
            return image_resized,
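A hedged sketch of how such an input function is usually completed and handed to an Estimator. This is not the asker's full code, and the "unsupported callable" error is not shown above; one common cause of it is passing an input_fn that takes arguments directly to estimator.train, which expects a zero-argument callable.

```python
import tensorflow as tf

def train_input_fn_custom(filenames_array, labels_array, batch_size):
    def _parse_function(filename, label):
        image_string = tf.read_file(filename)
        image_decoded = tf.image.decode_png(image_string, channels=1)
        image_resized = tf.image.resize_images(image_decoded, [40, 40])
        return image_resized, label

    dataset = tf.data.Dataset.from_tensor_slices((filenames_array, labels_array))
    dataset = dataset.map(_parse_function)
    dataset = dataset.shuffle(buffer_size=1000).batch(batch_size).repeat()
    return dataset

# Wrap the call in a lambda so the Estimator receives a callable with no arguments:
# estimator.train(input_fn=lambda: train_input_fn_custom(files, labels, 32), steps=1000)
```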

Parallelism isn't reducing the time in dataset map

烂漫一生 Submitted on 2019-12-28 06:21:10
Question: The TF map function supports parallel calls, but I'm seeing no improvement from passing num_parallel_calls to map. With num_parallel_calls=1 and num_parallel_calls=10, there is no difference in run time. Here is a simple example:

    import time

    def test_two_custom_function_parallelism(num_parallel_calls=1, batch=False,
                                             batch_size=1, repeat=1, num_iterations=10):
        tf.reset_default_graph()
        start = time.time()
        dataset_x = tf.data.Dataset.range(1000).map(lambda x: tf.py_func(
            squarer, [x], [tf.int64]),
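A likely explanation, offered as an assumption since the squarer function is not shown: tf.py_func runs Python code under the GIL, so parallel map calls largely serialize. With a native TF op instead of a py_func, num_parallel_calls can genuinely overlap work, as in this small sketch.

```python
import tensorflow as tf

# Native TF op inside map: no Python/GIL involvement, so parallel calls can overlap.
dataset = tf.data.Dataset.range(1000).map(lambda x: tf.square(x),
                                          num_parallel_calls=10)
```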

Raising to a square with TensorFlow using a Dataset class

爱⌒轻易说出口 Submitted on 2019-12-24 22:00:54
Question: I want to write a neural network that learns an x^2 relationship without a predefined model. Precisely, it is given some points in [-1,1] together with their squares for training, and then it should reproduce and predict similar values for, e.g., [-10,10]. I've more or less done it, without datasets. But then I tried to modify it to use datasets and learn how to use them. Now I have succeeded in making the program run, but the output is worse than before; mostly it's a constant 0. The previous version was
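A minimal sketch (my own toy pipeline, not the asker's network) showing one way to feed (x, x^2) training pairs through tf.data in TF 1.x:

```python
import numpy as np
import tensorflow as tf

x = np.random.uniform(-1.0, 1.0, size=(1000, 1)).astype(np.float32)
y = np.square(x)

dataset = (tf.data.Dataset.from_tensor_slices((x, y))
           .shuffle(1000)
           .batch(32)
           .repeat())
batch_x, batch_y = dataset.make_one_shot_iterator().get_next()
# batch_x and batch_y can then feed the network in place of placeholders.
```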

Tensorflow 1.14+: Make intentionally unbalanced mini batch with Dataset API

白昼怎懂夜的黑 Submitted on 2019-12-24 18:31:07
Question: This question is somewhat of an extension of Produce balanced mini batch with Dataset API and references the interleave function from the tf.data.Dataset documentation. Context: suppose you have a dataset with n=4 classes, a list of filenames where each file corresponds to a record, and the label for each file. Then we can construct the labeled dataset as follows:

    path_ds = tf.data.Dataset.from_tensor_slices(files)
    indx_ds = tf.data.Dataset.from_tensor_slices(labels)
    ds = tf.data
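A hedged sketch of one way to get intentionally unbalanced mini batches (this is not the asker's interleave-based construction; the per-class datasets and weights below are hypothetical): build one dataset per class and sample from them with unequal weights.

```python
import tensorflow as tf

# One (repeating) dataset per class; here just integer stand-ins for the real records.
class_datasets = [tf.data.Dataset.range(c * 100, c * 100 + 100).repeat()
                  for c in range(4)]
weights = [0.7, 0.1, 0.1, 0.1]  # hypothetical, intentionally skewed sampling proportions

ds = tf.data.experimental.sample_from_datasets(class_datasets, weights=weights)
ds = ds.batch(32)
```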