tensorflow-datasets

TensorFlow pipeline for pickled pandas data input

Submitted by 我的未来我决定 on 2019-12-24 10:16:43
Question: I would like to feed compressed pandas DataFrames (pd.read_pickle(filename, compression='xz')) into a TensorFlow pipeline. I want to use the high-level tf.estimator classifier API, which requires an input function. My data files are large float matrices of shape ~(1400, 16), and each matrix corresponds to a particular type (label). Each type (label) is contained in a different directory, so I know a matrix's label from its directory. At the low level, I know I can populate data using a feed_dict={X…
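
A minimal sketch of one possible tf.data-based input_fn for this setup, assuming a directory layout like data_dir/<label_name>/*.pkl; the helper names, shapes, and shuffle/batch parameters are illustrative, not from the question:

    import glob
    import os

    import pandas as pd
    import tensorflow as tf

    def make_input_fn(data_dir, batch_size=32):
        # Assumed layout: data_dir/<label_name>/*.pkl, one matrix per file.
        label_names = sorted(os.listdir(data_dir))
        label_to_id = {name: i for i, name in enumerate(label_names)}

        def generator():
            for name in label_names:
                for path in glob.glob(os.path.join(data_dir, name, '*.pkl')):
                    df = pd.read_pickle(path, compression='xz')
                    yield df.values.astype('float32'), label_to_id[name]

        def input_fn():
            dataset = tf.data.Dataset.from_generator(
                generator,
                output_types=(tf.float32, tf.int64),
                output_shapes=((1400, 16), ()))  # approximate shape from the question
            dataset = dataset.shuffle(100).batch(batch_size).repeat()
            features, labels = dataset.make_one_shot_iterator().get_next()
            return {'x': features}, labels

        return input_fn

The returned input_fn can then be handed to an estimator in the usual way, e.g. classifier.train(input_fn=make_input_fn('data_dir')).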

FailedPreconditionError: Table already initialized

Submitted by ≯℡__Kan透↙ on 2019-12-24 04:21:07
Question: I am reading data from TFRecords with the Dataset API and converting string data to dummy (indicator) features with the following code:

    SFR1 = tf.feature_column.indicator_column(
        tf.feature_column.categorical_column_with_vocabulary_list(
            "SFR1 ", vocabulary_list=("1", "2")))

But when I run my code, TensorFlow throws the following error:

    tensorflow.python.framework.errors_impl.FailedPreconditionError: Table already initialized.
    [[Node: Generator/input_layer/SFR1 _indicator/SFR1 _lookup/hash_table/table_init =…
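
This error usually means the vocabulary lookup table behind the feature column was created more than once in the same graph, for example by rebuilding the columns inside a loop, a Dataset.map call, or a repeatedly invoked model function. A minimal sketch of the common workaround, constructing the columns a single time and reusing them (the surrounding model_fn is illustrative):

    import tensorflow as tf

    # Build the feature column once, at module level, so its lookup
    # table is created and initialized only once per graph.
    SFR1 = tf.feature_column.indicator_column(
        tf.feature_column.categorical_column_with_vocabulary_list(
            "SFR1 ", vocabulary_list=("1", "2")))
    columns = [SFR1]

    def model_fn(features):
        # Reuse the prebuilt columns; do not recreate them on every call.
        return tf.feature_column.input_layer(features, columns)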

What does experimental in TensorFlow mean?

Submitted by 五迷三道 on 2019-12-24 01:17:22
Question: In the TensorFlow 2.0 APIs there is a module tf.experimental, and the same name also appears in other places such as tf.data.experimental. I would just like to know what the motivation for designing these modules is.

Answer 1: tf.experimental indicates that the class/method in question is in early development, incomplete, or, less commonly, not up to standards. It is a collection of contributions that have not yet been integrated with main TensorFlow but are still available as part of the open-source release for users to test…
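
As a concrete example of this lifecycle, tf.data.experimental.AUTOTUNE started under the experimental namespace and was promoted to tf.data.AUTOTUNE in later releases; the snippet below is a trivial illustration, not from the answer:

    import tensorflow as tf

    dataset = tf.data.Dataset.range(10)
    # Experimental in TF 2.0; promoted to tf.data.AUTOTUNE in later releases.
    dataset = dataset.map(lambda x: x * 2,
                          num_parallel_calls=tf.data.experimental.AUTOTUNE)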

How to pad to fixed BATCH_SIZE in tf.data.Dataset?

Submitted by 心已入冬 on 2019-12-23 13:07:05
Question: I have a dataset with 11 samples, and when I choose a BATCH_SIZE of 2, the following code produces errors:

    dataset = tf.contrib.data.TFRecordDataset(filenames)
    dataset = dataset.map(parser)
    if shuffle:
        dataset = dataset.shuffle(buffer_size=128)
    dataset = dataset.batch(batch_size)
    dataset = dataset.repeat(count=1)

The problem lies in dataset = dataset.batch(batch_size): when the Dataset reaches the last batch, only 1 sample remains, so is there any way to pick…
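
Two common workarounds, sketched below with a stand-in dataset: drop the incomplete final batch with drop_remainder=True, or pad the dataset with recycled samples so every batch is full (the padding policy is an assumption, not from the question):

    import tensorflow as tf

    BATCH_SIZE = 2
    NUM_SAMPLES = 11
    dataset = tf.data.Dataset.range(NUM_SAMPLES)  # stand-in for the real data

    # Option 1: simply drop the incomplete final batch (5 batches of 2).
    dropped = dataset.batch(BATCH_SIZE, drop_remainder=True)

    # Option 2: recycle a few leading samples to fill the last batch (6 batches of 2).
    pad = (-NUM_SAMPLES) % BATCH_SIZE
    padded = dataset.concatenate(dataset.take(pad)).batch(BATCH_SIZE)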

Dataset API for TensorFlow: Variable-sized Input

Submitted by 余生颓废 on 2019-12-23 10:06:39
Question: I have my entire dataset in memory as a list of tuples, where each tuple corresponds to a batch of fixed size N, i.e. (x[i], label[i], length[i]):

x[i]: numpy array of shape [N, W, F]; N examples, each with W timesteps, and every timestep has a fixed number of features F
label[i]: class labels, shape [N,], one per example in the batch
length[i]: number of timesteps (W) in the data, shape [N,], one per example in the batch

Main problem: across the batches, W…
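
A minimal sketch of one way to express this, using Dataset.from_generator with None in the W position of output_shapes so each batch may carry its own timestep count (the variable name data for the in-memory list is assumed):

    import tensorflow as tf

    def make_dataset(data):
        # data: list of (x, label, length) tuples; x has shape [N, W, F]
        # where W may differ from batch to batch.
        def gen():
            for x, label, length in data:
                yield x, label, length

        return tf.data.Dataset.from_generator(
            gen,
            output_types=(tf.float32, tf.int64, tf.int64),
            # None in the W (second) position allows a different W per batch.
            output_shapes=((None, None, None), (None,), (None,)))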

Upgrade to tf.data.Dataset not working properly when parsing CSV

Submitted by 我是研究僧i on 2019-12-23 08:47:15
Question: I have a GCMLE (Google Cloud ML Engine) experiment and I am trying to upgrade my input_fn to use the new tf.data functionality. I have created the following input_fn based off of this sample:

    def input_fn(...):
        # Shuffle the list of input files.
        dataset = tf.data.Dataset.list_files(filenames).shuffle(num_shards)
        # Mix together records from cycle_length shards at a time.
        dataset = dataset.interleave(
            lambda filename: tf.data.TextLineDataset(filename).skip(1)
                .map(lambda row: parse_csv(row, hparams)),
            cycle_length=5)
        if…
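
For reference, a self-contained sketch of the same list_files/interleave pattern with a concrete parse_csv; the column schema, record defaults, and batching are illustrative assumptions, not from the question:

    import tensorflow as tf

    def parse_csv(row):
        # Assumed schema: two float features followed by an integer label.
        feat1, feat2, label = tf.decode_csv(row, record_defaults=[[0.0], [0.0], [0]])
        return {'feat1': feat1, 'feat2': feat2}, label

    def input_fn(filenames, num_shards=5, batch_size=128):
        dataset = tf.data.Dataset.list_files(filenames).shuffle(num_shards)
        dataset = dataset.interleave(
            lambda filename: tf.data.TextLineDataset(filename)
                               .skip(1)        # skip the CSV header row
                               .map(parse_csv),
            cycle_length=num_shards)
        return dataset.shuffle(1000).batch(batch_size)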

How to expand tf.data.Dataset with additional example transformations in Tensorflow

Submitted by 被刻印的时光 ゝ on 2019-12-23 04:37:27
Question: I would like to double the size of an existing dataset I'm using to train a neural network in TensorFlow, on the fly, by adding random noise to it, so that when I'm done I'll have all the existing examples plus all the examples with noise added to them. I'd also like to interleave these as I transform them, so they come out in this order: example 1 without noise, example 1 with noise, example 2 without noise, example 2 with noise, etc. I'm struggling to accomplish this using the Dataset API. I…
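
A minimal sketch of this interleaving with Dataset.flat_map, which replaces each example by a two-element dataset (the clean example, then a noisy copy); the noise distribution and scale are assumptions:

    import tensorflow as tf

    def with_noisy_copy(example):
        # Emit the original example followed immediately by a noisy copy.
        noisy = example + tf.random_normal(tf.shape(example), stddev=0.1)
        return tf.data.Dataset.from_tensors(example).concatenate(
            tf.data.Dataset.from_tensors(noisy))

    dataset = tf.data.Dataset.from_tensor_slices(
        tf.random_uniform([4, 3]))               # stand-in for the real examples
    dataset = dataset.flat_map(with_noisy_copy)  # e1, e1+noise, e2, e2+noise, ...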

TensorFlow dataset questions about .shuffle, .batch and .repeat

Submitted by 穿精又带淫゛_ on 2019-12-23 01:35:22
Question: I have a question about the use of batch, repeat, and shuffle with tf.data.Dataset. It is not clear to me exactly how repeat and shuffle are used. I understand that .batch dictates how many training examples undergo each step of stochastic gradient descent, but the uses of .repeat and .shuffle are still not clear to me. First question: even after reviewing here and here, my understanding is that .repeat is used to reiterate over the dataset once tf.errors.OutOfRangeError is thrown. Therefore, in my code, does that mean I no longer…
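
For orientation, a sketch of the conventional ordering, shuffle before repeat before batch, which reshuffles every epoch and yields batches indefinitely instead of raising tf.errors.OutOfRangeError; the buffer and batch sizes are illustrative:

    import tensorflow as tf

    dataset = (tf.data.Dataset.range(100)
               .shuffle(buffer_size=100)  # reshuffled at the start of each epoch
               .repeat()                  # loop forever; no OutOfRangeError
               .batch(16))                # 16 examples per gradient step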