tensorflow-datasets

parallelising tf.data.Dataset.from_generator

Submitted by 早过忘川 on 2019-12-17 09:21:35
Question: I have a non-trivial input pipeline that from_generator is perfect for... dataset = tf.data.Dataset.from_generator(complex_img_label_generator, (tf.int32, tf.string)) dataset = dataset.batch(64) iter = dataset.make_one_shot_iterator() imgs, labels = iter.get_next() where complex_img_label_generator dynamically generates images and returns a numpy array representing an (H, W, 3) image and a simple string label. The processing is not something I can represent as reading from files and tf.image
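
A common workaround (my sketch, not the asker's code): keep the Python generator cheap by yielding only indices, and move the heavy per-sample work into a map() wrapped in tf.py_func with num_parallel_calls, which is the part tf.data can actually parallelize. The helper make_img_and_label and all shapes below are assumptions standing in for complex_img_label_generator's body.

```python
import numpy as np
import tensorflow as tf

# Hypothetical heavy per-sample work, standing in for the generator's body.
def make_img_and_label(idx):
    img = np.random.randint(0, 255, size=(64, 64, 3)).astype(np.int32)  # (H, W, 3)
    label = ("label_%d" % idx).encode()
    return img, label

# Yield only cheap indices; the expensive work runs in the parallel map below.
dataset = tf.data.Dataset.range(1000)
dataset = dataset.map(
    lambda idx: tf.py_func(make_img_and_label, [idx], (tf.int32, tf.string)),
    num_parallel_calls=4)
dataset = dataset.batch(64).prefetch(1)

imgs, labels = dataset.make_one_shot_iterator().get_next()
```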

Numpy to TFrecords: Is there a simpler way to handle batch inputs from tfrecords?

Submitted by 倖福魔咒の on 2019-12-17 04:27:32
Question: My question is about how to get batch inputs from multiple (or sharded) tfrecords. I've read the example https://github.com/tensorflow/models/blob/master/inception/inception/image_processing.py#L410. The basic pipeline is, taking the training set as an example: (1) first generate a series of tfrecords (e.g., train-000-of-005 , train-001-of-005 , ...), (2) from these filenames, generate a list and feed them into tf.train.string_input_producer to get a queue, (3) simultaneously generate a tf
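
For reference, a hedged sketch of the tf.data-style alternative to that queue-based pipeline (assuming TF 1.4+); the shard names follow the question, but the feature keys image_raw/label and the image shape are my assumptions.

```python
import tensorflow as tf

# Assumed shard names and feature spec; adjust to the real tfrecords.
filenames = ["train-%03d-of-005" % i for i in range(5)]

def parse_example(serialized):
    features = tf.parse_single_example(serialized, {
        "image_raw": tf.FixedLenFeature([], tf.string),
        "label": tf.FixedLenFeature([], tf.int64),
    })
    image = tf.decode_raw(features["image_raw"], tf.uint8)
    image = tf.reshape(image, [28, 28, 1])  # assumed fixed image shape
    return image, features["label"]

dataset = (tf.data.TFRecordDataset(filenames)
           .map(parse_example, num_parallel_calls=4)
           .shuffle(buffer_size=10000)
           .batch(32)
           .repeat())

images, labels = dataset.make_one_shot_iterator().get_next()
```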

Batch sequential data coming from multiple TFRecord files with tf.data

Submitted by ☆樱花仙子☆ on 2019-12-14 02:30:02
Question: Let's consider a dataset split into multiple TFRecord files: 1.tfrecord , 2.tfrecord , etc. I would like to generate sequences of size t (say 3) consisting of consecutive elements from the same TFRecord file; I do not want a sequence to have elements belonging to different TFRecord files. For instance, if we have two TFRecord files containing data like: 1.tfrecord : {0, 1, 2, ..., 7} 2.tfrecord : {1000, 1001, 1002, ..., 1007} then, without any shuffling, I would like to get the following batches:
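
A minimal sketch of one way to do this (my assumption, not a confirmed answer): batch each file's records into length-t windows before interleaving, so a sequence can never mix files.

```python
import tensorflow as tf

seq_len = 3  # t in the question
filenames = ["1.tfrecord", "2.tfrecord"]

# Group records into length-t sequences per file *before* interleaving,
# so every sequence holds consecutive records from a single file.
def per_file(fname):
    return tf.data.TFRecordDataset(fname).batch(seq_len)

dataset = (tf.data.Dataset.from_tensor_slices(filenames)
           .interleave(per_file, cycle_length=len(filenames)))

sequences = dataset.make_one_shot_iterator().get_next()
```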

What happens if number of samples changes every epoch using tf.data.Dataset filter method? (my program hangs)

Submitted by 删除回忆录丶 on 2019-12-13 16:07:01
Question: I want to have a different number of samples in every epoch. For example, at epoch 1 I want to have 100 samples (all samples) and at the second epoch I want only 50 samples. Right now I'm doing this with the tf.data.Dataset filter method. I'm using the models/official/resnet tf code with multiple GPUs. My problem: after a random number of epochs, the program hangs and even CTRL+C cannot kill it. My question is: would the different number of samples per epoch cause any problem? my filter's
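
For illustration only, a sketch of an epoch-dependent filter (assuming TF 1.x): the keep probability is fed through a placeholder each time the iterator is re-initialized, so different epochs keep different numbers of samples. Whether this interacts badly with the multi-GPU resnet code is exactly the open question.

```python
import tensorflow as tf

# Keep a fraction of samples that can change between epochs.
keep_fraction = tf.placeholder(tf.float32, shape=[])

dataset = tf.data.Dataset.range(100)
dataset = dataset.filter(lambda x: tf.random_uniform([]) < keep_fraction)

iterator = dataset.make_initializable_iterator()
next_element = iterator.get_next()

with tf.Session() as sess:
    for fraction in [1.0, 0.5]:  # epoch 1: ~all samples, epoch 2: ~half
        sess.run(iterator.initializer, feed_dict={keep_fraction: fraction})
        count = 0
        while True:
            try:
                sess.run(next_element)
                count += 1
            except tf.errors.OutOfRangeError:
                break
        print("samples this epoch:", count)
```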

Could two tf.data.Dataset pipelines coexist and be controlled by tf.cond()?

Submitted by 不问归期 on 2019-12-13 06:18:25
Question: I put two Dataset pipelines for a train/test = 9:1 split in my graph and control the flow with tf.cond. I encountered a problem: during training, both pipelines are activated at each step, and the test set runs out before the train set because it has fewer elements, giving: OutOfRangeError (see above for traceback): End of sequence First, nest the input pipeline in a function: def input_pipeline(*args): ... # construct iterator it = batch.make_initializable_iterator() iter_init_op = it
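
The usual explanation is that both get_next() ops get pulled every step when they feed a tf.cond; a common workaround is a feedable iterator (tf.data.Iterator.from_string_handle), so only the selected pipeline is advanced. A minimal sketch, assuming TF 1.x; the dataset contents are placeholders for the real pipelines:

```python
import tensorflow as tf

train_ds = tf.data.Dataset.range(90).batch(9)
test_ds = tf.data.Dataset.range(10).batch(1)

# One shared "next batch" op whose source is chosen by the fed handle.
handle = tf.placeholder(tf.string, shape=[])
iterator = tf.data.Iterator.from_string_handle(
    handle, train_ds.output_types, train_ds.output_shapes)
next_batch = iterator.get_next()

train_iter = train_ds.make_one_shot_iterator()
test_iter = test_ds.make_initializable_iterator()

with tf.Session() as sess:
    train_handle = sess.run(train_iter.string_handle())
    test_handle = sess.run(test_iter.string_handle())
    sess.run(test_iter.initializer)
    print(sess.run(next_batch, feed_dict={handle: train_handle}))  # advances train only
    print(sess.run(next_batch, feed_dict={handle: test_handle}))   # advances test only
```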

The established way to use the TF Dataset API in Keras is to feed `model.fit` with `make_one_shot_iterator()`, but this iterator is only good for one epoch

Submitted by 跟風遠走 on 2019-12-13 03:49:55
Question: Edit: To clarify why this question is different from the suggested duplicates, this SO question follows up on those suggested duplicates, asking what exactly Keras does with the techniques described in those SO questions. The suggested duplicates specify using the dataset API's make_one_shot_iterator() in model.fit; my follow-up is that make_one_shot_iterator() can only go through the dataset once, yet in the solutions given, several epochs are specified. This is a follow up to these SO
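
For context, the pattern those answers rely on looks roughly like the sketch below (my reconstruction, assuming TF 1.9-era Keras): repeat() makes the one-shot iterator yield batches indefinitely, and steps_per_epoch tells Keras where each epoch ends, which is how several epochs can run off a single iterator.

```python
import tensorflow as tf

num_samples, batch_size, epochs = 1000, 32, 5

# Synthetic data standing in for the real dataset.
dataset = (tf.data.Dataset.from_tensor_slices(
               (tf.random_uniform([num_samples, 10]),
                tf.random_uniform([num_samples, 1])))
           .batch(batch_size)
           .repeat())  # repeat indefinitely; Keras stops per steps_per_epoch

x, y = dataset.make_one_shot_iterator().get_next()

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])
model.compile(optimizer="sgd", loss="mse")
model.fit(x, y, epochs=epochs, steps_per_epoch=num_samples // batch_size)
```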

tf.datasets input_fn getting error after 1 epoch

Submitted by 南楼画角 on 2019-12-13 03:39:54
Question: So I am trying to switch to an input_fn() using tf.data as described in this question. While I have been able to get superior steps/sec using tf.data with the input_fn() below, I appear to run into an error after 1 epoch when running this experiment on GCMLE. Consider this input_fn(): def input_fn(...): files = tf.data.Dataset.list_files(filenames).shuffle(num_shards) dataset = files.apply(tf.contrib.data.parallel_interleave(lambda filename: tf.data.TextLineDataset(filename).skip(1),
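
The truncated input_fn above usually continues along these lines; this is a hedged reconstruction, not the asker's code: the CSV schema is invented, and the .repeat(num_epochs) call is the typical fix when the pipeline raises OutOfRangeError after the first epoch.

```python
import tensorflow as tf

def input_fn(filenames, num_shards=5, batch_size=128, num_epochs=None):
    files = tf.data.Dataset.list_files(filenames).shuffle(num_shards)
    dataset = files.apply(tf.contrib.data.parallel_interleave(
        lambda filename: tf.data.TextLineDataset(filename).skip(1),
        cycle_length=num_shards))

    def parse_csv(line):
        # Hypothetical schema: two float features and an integer label.
        f1, f2, label = tf.decode_csv(line, record_defaults=[[0.0], [0.0], [0]])
        return {"f1": f1, "f2": f2}, label

    dataset = (dataset
               .map(parse_csv, num_parallel_calls=4)
               .shuffle(10000)
               .repeat(num_epochs)  # None = repeat indefinitely
               .batch(batch_size)
               .prefetch(1))
    return dataset.make_one_shot_iterator().get_next()
```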

Tensorflow not predicting accurate enough results

Submitted by 限于喜欢 on 2019-12-12 18:05:53
Question: I have some fundamental questions about the algorithms I picked for my Tensorflow project. I fed in around 1 million sets of training data and still couldn't get accurate enough prediction results. The code I am using is based on an old Tensorflow example (https://github.com/tensorflow/tensorflow/blob/r1.3/tensorflow/examples/tutorials/estimators/abalone.py). The goal of this example is to predict the age of an abalone based on the training features provided. My purpose is very similar.

iterator.get_next() causes "terminate called after throwing an instance of 'std::system_error'"

Submitted by 元气小坏坏 on 2019-12-12 16:17:52
Question: I am training a ResNet50 with tensorflow on a shared server with these properties: Ubuntu 16.04, 3 GTX 1080 GPUs, tensorflow 1.3, python 2.7. But always after two epochs, during the third epoch, I encounter this error: terminate called after throwing an instance of 'std::system_error' what(): Resource temporarily unavailable Aborted This is the code that converts the tfrecord to a dataset: filenames = ["balanced_t.tfrecords"] dataset = tf.contrib.data.TFRecordDataset(filenames) def parser(record): keys
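
A hedged sketch of how such a parser is usually completed (TF 1.3-era API); the feature keys, image size, and batching parameters below are my assumptions, not the asker's schema.

```python
import tensorflow as tf

filenames = ["balanced_t.tfrecords"]
dataset = tf.contrib.data.TFRecordDataset(filenames)

def parser(record):
    # Assumed feature keys; adjust to the keys used when writing the tfrecord.
    keys_to_features = {
        "image_raw": tf.FixedLenFeature([], tf.string),
        "label": tf.FixedLenFeature([], tf.int64),
    }
    parsed = tf.parse_single_example(record, keys_to_features)
    image = tf.decode_raw(parsed["image_raw"], tf.uint8)
    image = tf.reshape(image, [224, 224, 3])  # assumed ResNet50 input size
    return image, tf.cast(parsed["label"], tf.int32)

dataset = dataset.map(parser).shuffle(1000).batch(32).repeat()
iterator = dataset.make_one_shot_iterator()
images, labels = iterator.get_next()
```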