tensorflow-datasets

Tensorflow dataset API - Apply windows to multiple sequences

不问归期 submitted at 2021-02-08 06:55:43
Question: I want to set up a data pipeline for sequential data. Each data point in a sequence has a fixed dimensionality, e.g. 64x64. I have multiple sequences of variable length, so my dataset can be simplified to:

```
seq1 = np.arange(5)[:, None, None]
seq2 = np.arange(8)[:, None, None]
seq3 = np.arange(7)[:, None, None]
sequences = [seq1, seq2, seq3]
```

Now I want to operate on a series of time frames within the sequences, resulting in 3-dimensional data cubes [N_frames, data_dim1, data_dim2].
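Not an answer from the thread, just a minimal sketch of one way to do this in TF 2.x: wrap each sequence as its own inner dataset and window each one independently, so windows never cross a sequence boundary. The `window_size` of 3 is an arbitrary choice for illustration.

```python
import numpy as np
import tensorflow as tf

seq1 = np.arange(5)[:, None, None]
seq2 = np.arange(8)[:, None, None]
seq3 = np.arange(7)[:, None, None]
sequences = [seq1, seq2, seq3]

window_size = 3  # arbitrary choice for illustration

def gen():
    for seq in sequences:
        yield seq.astype(np.int64)  # match the declared dtype below

# One element per sequence; window each sequence independently so that
# no window spans the boundary between two sequences.
ds = tf.data.Dataset.from_generator(
    gen, output_signature=tf.TensorSpec(shape=(None, 1, 1), dtype=tf.int64))
ds = ds.flat_map(
    lambda seq: tf.data.Dataset.from_tensor_slices(seq)
    .window(window_size, shift=1, drop_remainder=True)
    .flat_map(lambda w: w.batch(window_size)))

# Each element is now a cube of shape [window_size, 1, 1].
```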

Tensorflow input pipeline where multiple rows correspond to a single observation?

社会主义新天地 submitted at 2021-02-08 05:44:24
Question: I've just started using TensorFlow, and I'm struggling to properly understand input pipelines. The problem I'm working on is sequence classification. I'm trying to read in a CSV file with shape (100000, 4). The first 3 columns are features; the 4th column is the label. However, the data represents sequences of length 10, i.e. rows 1-10 are sequence 1, rows 11-20 are sequence 2, etc. This also means each label is repeated 10 times. So at some point in this input pipeline, I'll need to reshape my feature…
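A sketch of one way to approach this (not taken from the thread): since every 10 consecutive rows form one observation, batching the row-level dataset by 10 turns each batch into one sequence. The file name "data.csv" is a placeholder, and the CSV is assumed to have no header row.

```python
import tensorflow as tf

SEQ_LEN = 10  # rows 1-10 form sequence 1, rows 11-20 form sequence 2, ...

# Three float feature columns plus an int label column; no header row.
ds = tf.data.experimental.CsvDataset(
    "data.csv",  # placeholder file name
    record_defaults=[tf.float32, tf.float32, tf.float32, tf.int32])

def to_sequence(f1, f2, f3, label):
    features = tf.stack([f1, f2, f3], axis=-1)  # shape [SEQ_LEN, 3]
    return features, label[0]  # the label is repeated, keep one copy

ds = ds.batch(SEQ_LEN, drop_remainder=True).map(to_sequence)
```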

How to use the Tensorflow Dataset Pipeline for Variable Length Inputs?

风流意气都作罢 submitted at 2021-02-06 12:49:53
Question: I am training a recurrent neural network in TensorFlow on a dataset of number sequences of varying lengths, and have been trying to use the tf.data API to create an efficient pipeline. However, I can't seem to get this to work. My approach: my dataset is a NumPy array of shape [10000, ?, 32, 2], saved on disk as a file in the .npy format. Here the ? denotes that elements have variable length in the second dimension, and 10000 denotes the number of minibatches in the dataset and…
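The question is cut off above, but a common pattern for variable-length elements (a sketch, assuming the .npy file holds an object array whose elements have shape [length_i, 32, 2]) is `from_generator` plus `padded_batch`:

```python
import numpy as np
import tensorflow as tf

# Assumption: data.npy stores an object array of variable-length
# elements, each of shape [length_i, 32, 2].
data = np.load("data.npy", allow_pickle=True)

ds = tf.data.Dataset.from_generator(
    lambda: (x.astype(np.float32) for x in data),
    output_signature=tf.TensorSpec(shape=(None, 32, 2), dtype=tf.float32))

# Pad every batch up to the length of its longest element.
ds = ds.padded_batch(32, padded_shapes=[None, 32, 2])
```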

TF data API: how to efficiently sample small patches from images

人走茶凉 submitted at 2021-02-05 20:35:45
Question: Consider the problem of creating a dataset by sampling random small image patches from a directory of high-resolution images. The TensorFlow dataset API makes this very easy: construct a dataset of image names, shuffle it, map it to loaded images, then to random cropped patches. However, this naive implementation is very inefficient, since a separate high-resolution image will be loaded and cropped to generate each patch. Ideally an image could be loaded once and…
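A sketch of the "load once, crop many" idea (not from the thread; the glob pattern and counts are placeholders): load each image a single time and fan it out into several random crops with `flat_map`.

```python
import tensorflow as tf

PATCH = 64              # patch side length (placeholder)
PATCHES_PER_IMAGE = 16  # crops taken from each loaded image (placeholder)

def load(path):
    return tf.io.decode_image(
        tf.io.read_file(path), channels=3, expand_animations=False)

def crops(img):
    # The image is loaded once; several random crops are emitted from it.
    # Assumes every image is at least PATCH pixels on each side.
    patches = tf.stack([
        tf.image.random_crop(img, [PATCH, PATCH, 3])
        for _ in range(PATCHES_PER_IMAGE)])
    return tf.data.Dataset.from_tensor_slices(patches)

files = tf.data.Dataset.list_files("images/*.jpg")  # placeholder glob
ds = files.map(load, num_parallel_calls=tf.data.AUTOTUNE).flat_map(crops)
ds = ds.shuffle(1024)  # mix patches from different source images
```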

Alternative function for tf.contrib.layers.flatten(x) in TensorFlow

家住魔仙堡 submitted at 2021-01-29 00:12:50
Question: I am using TensorFlow version 0.8.0 on a Jetson TK1 with CUDA 6.5 on a 32-bit ARM architecture. Because of that I can't upgrade the TensorFlow version, and I am running into trouble with the flatten function:

```
x = tf.placeholder(dtype=tf.float32, shape=[None, 28, 28])
y = tf.placeholder(dtype=tf.int32, shape=[None])
images_flat = tf.contrib.layers.flatten(x)
```

The error I get at this point is AttributeError: 'module' object has no attribute 'flatten'. Is there any alternative to this function that may be…
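Not verified against the TF 0.8 docs specifically, but `tf.reshape` has been available since the earliest releases, so a likely workaround is to flatten manually while keeping the batch dimension:

```python
import tensorflow as tf

x = tf.placeholder(dtype=tf.float32, shape=[None, 28, 28])

# Keep the batch dimension (-1), collapse everything else:
images_flat = tf.reshape(x, [-1, 28 * 28])
```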

Splitting TensorFlow Dataset created with make_csv_dataset into 3 parts (X1_Train, X2_Train and Y_Train) for multi-input model

ぐ巨炮叔叔 submitted at 2021-01-28 21:58:56
Question: I am training a deep learning model with TensorFlow 2 and Keras. I read my big CSV file with tf.data.experimental.make_csv_dataset and then split it into train and test datasets. However, I need to split my train dataset into three parts, since my deep learning model takes two sets of inputs in different layers, so I need to pass [x1_train, x2_train], y_train to model.fit. My question is: how can I split train_dataset into x1_train, x2_train and y_train? (Some features shall be in x1_train…
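A sketch of one way to do the split (the file and column names are placeholders): `make_csv_dataset` yields `(features_dict, label)` pairs, so a `map` can regroup the feature columns into the two input tensors.

```python
import tensorflow as tf

X1_COLS = ["f1", "f2"]  # placeholder column names for the first input
X2_COLS = ["f3", "f4"]  # placeholder column names for the second input

train_dataset = tf.data.experimental.make_csv_dataset(
    "train.csv", batch_size=32, label_name="label",  # placeholders
    num_epochs=1)  # otherwise the dataset repeats indefinitely

def split(features, label):
    x1 = tf.stack([features[c] for c in X1_COLS], axis=-1)
    x2 = tf.stack([features[c] for c in X2_COLS], axis=-1)
    return (x1, x2), label

train_dataset = train_dataset.map(split)
# model.fit(train_dataset, ...) now feeds [x1, x2] and y to the model.
```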

How to use shared_name on initializable_iterator

你说的曾经没有我的故事 submitted at 2021-01-28 14:13:01
Question: In distributed TensorFlow, I need to process input data on one worker and consume it in other, different sessions. make_initializable_iterator has an undocumented parameter shared_name, but how can I initialize the iterator without creating the datasets in every session?

```
def make_initializable_iterator(self, shared_name=None):
    """Creates an `Iterator` for enumerating the elements of this dataset.

    Note: The returned iterator will be in an uninitialized state, and you must run the…
```
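A graph-mode sketch based only on the shared_name parameter quoted above, under the assumption that both sessions connect to the same tf.train.Server and build the same graph:

```python
import tensorflow as tf  # TF 1.x, graph mode

dataset = tf.data.Dataset.range(100)  # placeholder dataset
iterator = dataset.make_initializable_iterator(shared_name="shared_iter")
next_element = iterator.get_next()

# On the worker that owns the data:
#   sess_a.run(iterator.initializer)
# In another session targeting the same server with the same graph,
# the iterator state is shared via shared_name, so it can consume
# elements without re-running the initializer:
#   sess_b.run(next_element)
```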

Subsampling an unbalanced dataset in tensorflow

跟風遠走 submitted at 2021-01-28 10:55:45
Question: TensorFlow beginner here. This is my first project and I am working with pre-defined estimators. I have an extremely unbalanced dataset where positive outcomes represent roughly 0.1% of the total data, and I suspect this imbalance considerably affects the performance of my model. As a first attempt to solve the issue, since I have tons of data, I would like to throw away most of my negatives in order to create a balanced dataset. I can see two ways of doing it: preprocess the data to keep…
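A sketch of subsampling inside the pipeline itself (assuming `dataset` is already built and yields `(features, label)` pairs, with label 1 marking the rare positive class): keep every positive and a random ~0.1% of negatives.

```python
import tensorflow as tf

KEEP_NEG = 0.001  # roughly balances classes when positives are ~0.1%

def keep_example(features, label):
    is_positive = tf.equal(label, 1)
    lucky_negative = tf.random.uniform([], 0.0, 1.0) < KEEP_NEG
    return tf.logical_or(is_positive, lucky_negative)

balanced = dataset.filter(keep_example)  # `dataset` is assumed given
```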

How to generate custom mini-batches using Tensorflow 2.0, such as those in the paper “In defense of the triplet loss”?

给你一囗甜甜゛ submitted at 2021-01-28 07:03:55
Question: I want to implement a custom mini-batch generator in TensorFlow 2.0 using the tf.data.Dataset API. Concretely, I have image data: 100 classes with ~200 examples each. For each mini-batch, I want to randomly sample P classes and K images from each class, for a total of P*K examples per mini-batch (as described in the paper In Defense of the Triplet Loss for Person Re-Identification). I've been searching through the documentation for tf.data.Dataset but can't seem to find the right method. I've…
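One way to get P*K batches (a sketch, not the paper's reference code): build one dataset per class, then drive `tf.data.experimental.choose_from_datasets` with a generator that picks P random classes and repeats each choice K times.

```python
import numpy as np
import tensorflow as tf

P, K = 8, 4        # classes per batch, images per class (placeholders)
NUM_CLASSES = 100

# Assumption: class_datasets[i] is a shuffled, repeated dataset that
# yields single images of class i (e.g. built from that class's files).

def class_choices():
    while True:
        classes = np.random.choice(NUM_CLASSES, size=P, replace=False)
        for c in classes:
            for _ in range(K):
                yield c  # emit each chosen class index K times

choices = tf.data.Dataset.from_generator(
    class_choices, output_signature=tf.TensorSpec([], tf.int64))

ds = tf.data.experimental.choose_from_datasets(class_datasets, choices)
ds = ds.batch(P * K)  # each batch holds K images from each of P classes
```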