tensorflow-datasets

How to convert “tensor” to “numpy” array in tensorflow?

徘徊边缘 submitted on 2019-12-11 00:21:55

Question: I am trying to convert a tensor to a numpy array in TensorFlow 2.0. Since TF 2.0 has eager execution enabled by default, this should work out of the box, and it does in the normal eager runtime. But when the code runs inside the tf.data.Dataset API, it raises "AttributeError: 'Tensor' object has no attribute 'numpy'". I have tried calling ".numpy()" on the TensorFlow variable, and ".eval()" does not work because I am unable to get a default session. from __future__ import absolute_import, division, print_function, unicode_literals import
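
A minimal sketch of the usual workaround (not the asker's code): inside Dataset.map() the function is traced as a graph, so its tensors are symbolic and have no .numpy(); wrapping the eager logic in tf.py_function restores eager tensors, and outside the pipeline .numpy() works as expected.

```python
import tensorflow as tf

def eager_transform(x):
    # Inside tf.py_function, x is an eager tensor, so .numpy() is available.
    return x.numpy() * 2

dataset = tf.data.Dataset.range(5)
dataset = dataset.map(
    lambda x: tf.py_function(eager_transform, [x], Tout=tf.int64))

for item in dataset:
    print(item.numpy())  # eager tensors outside the pipeline expose .numpy()
```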

Split .tfrecords file into many .tfrecords files

ぐ巨炮叔叔 submitted on 2019-12-10 20:15:58

Question: Is there any way to split a .tfrecords file into many .tfrecords files directly, without writing back each Dataset example? Answer 1: You can use a function like this: import tensorflow as tf def split_tfrecord(tfrecord_path, split_size): with tf.Graph().as_default(), tf.Session() as sess: ds = tf.data.TFRecordDataset(tfrecord_path).batch(split_size) batch = ds.make_one_shot_iterator().get_next() part_num = 0 while True: try: records = sess.run(batch) part_path = tfrecord_path + '.{:03d}'.format
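
The answer's code is cut off above; a sketch of how it is commonly completed (the writer choice and loop structure are assumptions) reads batches of split_size records and writes each batch to a numbered part file:

```python
import tensorflow as tf  # TF 1.x style, matching the excerpted answer

def split_tfrecord(tfrecord_path, split_size):
    with tf.Graph().as_default(), tf.Session() as sess:
        ds = tf.data.TFRecordDataset(tfrecord_path).batch(split_size)
        batch = ds.make_one_shot_iterator().get_next()
        part_num = 0
        while True:
            try:
                records = sess.run(batch)
                part_path = tfrecord_path + '.{:03d}'.format(part_num)
                # Write this batch of serialized records to its own part file.
                with tf.io.TFRecordWriter(part_path) as writer:
                    for record in records:
                        writer.write(record)
                part_num += 1
            except tf.errors.OutOfRangeError:
                break
```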

How to mix unbalanced Datasets to reach a desired distribution per label?

℡╲_俬逩灬. submitted on 2019-12-10 17:16:51

Question: I am running my neural network on Ubuntu 16.04, with 1 GPU (GTX 1070) and 4 CPUs. My dataset contains around 35,000 images, but it is not balanced: class 0 has 90% of the examples, and classes 1, 2, 3, 4 share the other 10%. Therefore I over-sample classes 1-4 by using dataset.repeat(class_weight) [I also use a function to apply random augmentation], and then concatenate them. The re-sampling strategy is: 1) At the very beginning, class_weight[n] will be set to a large number so that each class will have
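
One way to reach a target label distribution without hand-tuning repeat counts is tf.data.experimental.sample_from_datasets; the sketch below assumes a hypothetical class_file_lists (one file list per class) and a uniform target distribution, not the asker's actual pipeline.

```python
import tensorflow as tf

per_class_datasets = [
    tf.data.Dataset.from_tensor_slices(files).repeat()  # repeat so no class runs dry
    for files in class_file_lists  # hypothetical: list of per-class file lists
]
target_dist = [0.2, 0.2, 0.2, 0.2, 0.2]  # desired sampling probability per class

mixed = tf.data.experimental.sample_from_datasets(
    per_class_datasets, weights=target_dist)
mixed = mixed.shuffle(1024).batch(32).prefetch(tf.data.experimental.AUTOTUNE)
```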

How to improve data input pipeline performance?

怎甘沉沦 submitted on 2019-12-10 16:31:13

Question: I am trying to optimize my data input pipeline. The dataset is a set of 450 TFRecord files of ~70 MB each, hosted on GCS. The job is executed with GCP ML Engine. There is no GPU. Here is the pipeline: def build_dataset(file_pattern): return tf.data.Dataset.list_files( file_pattern ).interleave( tf.data.TFRecordDataset, num_parallel_calls=tf.data.experimental.AUTOTUNE ).shuffle( buffer_size=2048 ).batch( batch_size=2048, drop_remainder=True, ).cache( ).repeat( ).map( map_func=_parse_example
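
A sketch of a commonly recommended reordering (assuming _parse_example from the question can parse batched serialized examples and the parsed data fits in memory for cache): parse in parallel on whole batches, cache after the expensive work, and end with prefetch so preparation overlaps the training step.

```python
import tensorflow as tf

AUTOTUNE = tf.data.experimental.AUTOTUNE

def build_dataset(file_pattern):
    return (
        tf.data.Dataset.list_files(file_pattern)
        .interleave(tf.data.TFRecordDataset, num_parallel_calls=AUTOTUNE)
        .shuffle(buffer_size=2048)
        .batch(batch_size=2048, drop_remainder=True)
        .map(_parse_example, num_parallel_calls=AUTOTUNE)  # vectorized, parallel parse
        .cache()      # cache the parsed tensors, not the raw records
        .repeat()
        .prefetch(AUTOTUNE)  # keep the next batch ready while the model trains
    )
```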

TensorFlow performance bottleneck on IteratorGetNext

情到浓时终转凉″ submitted on 2019-12-10 15:42:59

Question: While fiddling around with TensorFlow, I noticed that a relatively simple task (batching some of our 3D accelerometer data and taking the sum of each epoch) was showing relatively poor performance. Here's the essence of what I had running, once I got the (incredibly nifty!) Timeline functionality up: import numpy as np import tensorflow as tf from tensorflow.python.client import timeline # Some dummy functions to compute "features" from the data def compute_features( data ): feature_functions
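
Time attributed to IteratorGetNext in a trace usually means the training step is waiting on the input pipeline rather than doing useful work. A minimal sketch (hypothetical file name and parse function, not the asker's code) of the standard mitigation, parallel map plus prefetch:

```python
import tensorflow as tf

dataset = tf.data.TFRecordDataset(["data.tfrecords"])  # hypothetical input file
dataset = dataset.map(parse_fn,                        # hypothetical parse function
                      num_parallel_calls=tf.data.experimental.AUTOTUNE)
dataset = dataset.batch(256)
dataset = dataset.prefetch(1)  # prepare the next batch while the current one is consumed
```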

Get input (filenames) from tensorflow dataset iterators

*爱你&永不变心* submitted on 2019-12-10 11:26:27

Question: I am using TensorFlow datasets to train a model. The dataset takes a list of filenames to read during the session, and I would like to get the filename together with the image. In more detail, I have something like this: filenames = tf.constant(["/var/data/image1.jpg", "/var/data/image2.jpg", ...]) labels = tf.constant([0, 37, ...]) dataset = tf.data.Dataset.from_tensor_slices((filenames, labels)) dataset.shuffle() def _parse_function(filename, label): image_string = tf.read_file
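
A minimal sketch of one common answer (the decoder and shapes are assumptions): simply pass the filename through the parse function so every element the iterator yields carries its filename alongside the image.

```python
import tensorflow as tf  # TF 1.x style, matching the question's tf.read_file

def _parse_function(filename, label):
    image_string = tf.read_file(filename)
    image = tf.image.decode_jpeg(image_string, channels=3)
    return image, label, filename  # the filename rides along with the image

filenames = tf.constant(["/var/data/image1.jpg", "/var/data/image2.jpg"])
labels = tf.constant([0, 37])
dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))
dataset = dataset.map(_parse_function)
```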

Tensorflow - String processing in Dataset API

十年热恋 submitted on 2019-12-10 08:17:52

Question: I have .txt files in a directory with the format <text>\t<label>. I am using the TextLineDataset API to consume these text records: filenames = ["/var/data/file1.txt", "/var/data/file2.txt"] dataset = tf.contrib.data.Dataset.from_tensor_slices(filenames) dataset = dataset.flat_map( lambda filename: ( tf.contrib.data.TextLineDataset(filename) .map(_parse_data))) def _parse_data(line): line_split = tf.string_split([line], '\t') features = {"raw_text": tf.string(line_split.values[0].strip().lower()),
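
Symbolic string tensors have no Python .strip() or .lower(); the tf.strings ops cover this instead. A sketch under that assumption (and assuming a TF version that provides tf.strings.* and tf.data.TextLineDataset), not the asker's exact parser:

```python
import tensorflow as tf

def _parse_data(line):
    parts = tf.strings.split([line], '\t').values             # ["<text>", "<label>"]
    raw_text = tf.strings.lower(tf.strings.strip(parts[0]))   # tensor-aware strip/lower
    label = tf.strings.to_number(parts[1], out_type=tf.int32)
    return {"raw_text": raw_text}, label

filenames = ["/var/data/file1.txt", "/var/data/file2.txt"]
dataset = tf.data.TextLineDataset(filenames).map(_parse_data)
```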

How can I return the same batch twice from a tensorflow dataset iterator?

匆匆过客 submitted on 2019-12-10 07:54:43

Question: I am converting some legacy code to use the Dataset API. This code uses feed_dict to feed one batch to the train operation (actually three times) and then recalculates the losses for display using the same batch. So I need an iterator that returns the exact same batch two (or several) times. Unfortunately, I can't seem to find a way of doing it with TensorFlow datasets - is it possible? Answer 1: You can repeat individual elements of a Dataset using Dataset.flat_map(), Dataset.from
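
The truncated answer points at the Dataset.flat_map() / Dataset.from_tensors() pattern; a sketch of how it is usually completed: after batching, flat_map each batch into a tiny Dataset that repeats it, so the iterator yields the exact same batch several times in a row.

```python
import tensorflow as tf

REPEATS = 2  # how many consecutive times each batch should be returned

dataset = tf.data.Dataset.range(10).batch(4)
dataset = dataset.flat_map(
    lambda batch: tf.data.Dataset.from_tensors(batch).repeat(REPEATS))

for batch in dataset:
    print(batch.numpy())  # each batch appears REPEATS times consecutively
```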

TensorFlow tf.data.Dataset and bucketing

假装没事ソ submitted on 2019-12-10 02:52:03

Question: For an LSTM network, I've seen great improvements with bucketing. I've come across the bucketing section in the TensorFlow docs (tf.contrib). However, in my network I am using the tf.data.Dataset API; specifically, I'm working with TFRecords, so my input pipeline looks something like this: dataset = tf.data.TFRecordDataset(TFRECORDS_PATH) dataset = dataset.map(_parse_function) dataset = dataset.map(_scale_function) dataset = dataset.shuffle(buffer_size=10000) dataset = dataset.padded_batch
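
For bucketing directly inside a tf.data pipeline, tf.data.experimental.bucket_by_sequence_length groups examples of similar length and pads each bucket's batches separately. A sketch with assumed boundaries, batch sizes, dummy data, and length function (the question's pipeline may differ):

```python
import tensorflow as tf

# Dummy variable-length sequences standing in for the parsed TFRecord examples.
dataset = tf.data.Dataset.from_generator(
    lambda: (list(range(n)) for n in [30, 80, 150, 250]),
    output_types=tf.int32, output_shapes=[None])

def element_length_fn(example):
    return tf.shape(example)[0]  # sequence length of each example

dataset = dataset.apply(tf.data.experimental.bucket_by_sequence_length(
    element_length_func=element_length_fn,
    bucket_boundaries=[50, 100, 200],     # assumed length cut points
    bucket_batch_sizes=[64, 32, 16, 8]))  # one batch size per bucket (boundaries + 1)
```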

How to make tf.data.Dataset return all of the elements in one call?

笑着哭i submitted on 2019-12-09 15:59:17

Question: Is there an easy way to get the entire set of elements in a tf.data.Dataset? i.e. I want to set the batch size of the Dataset to the size of my dataset without explicitly passing it the number of elements. This would be useful for a validation dataset where I want to measure accuracy on the entire dataset in one go. I'm surprised there isn't a method to get the size of a tf.data.Dataset. Answer 1: In short, there is not a good way to get the size/length; tf.data.Dataset is built for pipelines of
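
A sketch of the workaround most answers suggest (the sentinel batch size is an assumption, not an API guarantee): batch with a number larger than the dataset could ever be, so the single resulting batch contains every element.

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(100)  # stand-in for a validation dataset

# One batch that holds the whole dataset; the huge batch size just means
# "accumulate until the dataset is exhausted".
everything = next(iter(dataset.batch(1_000_000_000)))
print(everything.shape)  # (100,)
```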