tfrecord | 易学教程

tensorflow: Reading time series data from TFRecord

Question: I'm using a SequenceExample protobuf to read and write time-series data in a TFRecord file. I serialized a pair of np arrays as follows:

    writer = tf.python_io.TFRecordWriter(file_name)
    context = tf.train.Features( ... Feature( ... ) ... )
    feature_data = tf.train.FeatureList(feature=[
        tf.train.Feature(float_list=tf.train.FloatList(value=
            np.random.normal(size=([4065000,]))))])
    labels = tf.train.FeatureList(feature=[
        tf.train.Feature(int64_list=tf.train.Int64List(value=
            np.random.random_integers(0

Unable to read from Tensorflow tfrecord file

Question: I am able to create the tfrecords file using the code below.

    def _int64_feature(value):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

    def _bytes_feature(value):
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

    def convert_to_tfrecord(images, labels, file_name):
        # images is a numpy array of shape (num_images, channel, rows, column)
        # labels is a numpy array of shape (num_images,)
        num_labels = np.shape(labels)
        (num_images, depth, rows, cols) = np.shape

Split .tfrecords file into many .tfrecords files

Question: Is there any way to split a .tfrecords file into many .tfrecords files directly, without writing back each Dataset example?

Answer 1: You can use a function like this:

    import tensorflow as tf

    def split_tfrecord(tfrecord_path, split_size):
        with tf.Graph().as_default(), tf.Session() as sess:
            ds = tf.data.TFRecordDataset(tfrecord_path).batch(split_size)
            batch = ds.make_one_shot_iterator().get_next()
            part_num = 0
            while True:
                try:
                    records = sess.run(batch)
                    part_path = tfrecord_path + '.{:03d}'.format
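As an alternative to the tf.data approach in the answer above, a TFRecord file can also be split at the byte level, because each record is framed as a little-endian uint64 length, a 4-byte CRC of the length, the payload, and a 4-byte CRC of the payload. The following is a minimal sketch under that assumption, not the answerer's method: the function name `split_tfrecord_raw` and the `.000`, `.001`, ... part naming are hypothetical, the input is assumed to be uncompressed, and checksums are copied rather than verified.

```python
import struct


def split_tfrecord_raw(path, records_per_part):
    """Copy the records of one TFRecord file into numbered part files.

    Record bytes are copied verbatim: no Example is parsed and no
    checksum is verified. Returns the list of part file paths.
    """
    part_paths = []
    out = None
    count = 0
    try:
        with open(path, "rb") as src:
            while True:
                header = src.read(12)  # 8-byte length + 4-byte length CRC
                if not header:
                    break  # clean end of file
                if len(header) < 12:
                    raise ValueError("truncated record header")
                (length,) = struct.unpack("<Q", header[:8])
                body = src.read(length + 4)  # payload + 4-byte payload CRC
                if len(body) < length + 4:
                    raise ValueError("truncated record body")
                # Start a new part file when none is open or the current one is full.
                if out is None or count == records_per_part:
                    if out is not None:
                        out.close()
                    part_path = "{}.{:03d}".format(path, len(part_paths))
                    part_paths.append(part_path)
                    out = open(part_path, "wb")
                    count = 0
                out.write(header)
                out.write(body)
                count += 1
    finally:
        if out is not None:
            out.close()
    return part_paths
```

Calling `split_tfrecord_raw("data.tfrecords", 1000)` on a hypothetical file would write parts of up to 1000 records each next to the original. Since the bytes are untouched, each part should remain readable by `tf.data.TFRecordDataset`.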

Numpy array to TFrecord

Question: I'm trying to train on a custom dataset with the TensorFlow Object Detection API. The dataset contains 40k training images and labels, which are in numpy ndarray format (uint8). training dataset shape=2 ([40000,23456]) and labels shape=1 ([0..., 3]). I want to generate a tfrecord for this dataset. How do I do that? I'm quite new to TensorFlow.

Answer 1: This tutorial will walk you through the process of creating TFRecords from your data: https://medium.com/mostly-ai/tensorflow-records-what-they-are-and

How to decode Unicode string in Tensorflow's graph pipeline

Question: I have created a TFRecord file to store data. I have to store Hindi text, so I saved it as bytes using string.encode('utf-8'). But I am stuck when reading the data back. I am reading the data with the TensorFlow Dataset API. I know that I can decode it using string.decode('utf-8'), but this is not what I am looking for. I want a solution that decodes my byte string back to a Unicode string inside the graph only. I have tried as_text, decoding_raw but they are giving

Shuffling tfrecords files

Question: I have 5 tfrecords files, one for each object. While training I want to read data equally from all 5 files, i.e. if my batch size is 50, I should get 10 samples from the 1st tfrecords file, 10 samples from the second tfrecords file, and so on. Currently, it just reads sequentially from the files, i.e. I get 50 samples from the same tfrecords file. Is there a way to sample from different tfrecords files?

Answer 1: I advise you to read the tutorial by @mrry on tf.data. On slide 42 he explains how to use tf.data.Dataset.interleave() to read multiple tfrecord files at the same time. For instance if you
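To illustrate why interleaving gives the per-file balance the question asks for, here is a plain-Python model of round-robin block interleaving. It is not TensorFlow code: `interleave_blocks` is a hypothetical helper that mimics the ordering `tf.data.Dataset.interleave()` is documented to produce when the cycle length equals the number of files and the block length is 10, and the five lists stand in for the five tfrecords files.

```python
from itertools import islice


def interleave_blocks(sources, block_length):
    """Yield block_length items from each source in turn until all are exhausted."""
    iterators = [iter(s) for s in sources]
    while iterators:
        still_active = []
        for it in iterators:
            block = list(islice(it, block_length))
            yield from block
            # A short (or empty) block means this source has run out.
            if len(block) == block_length:
                still_active.append(it)
        iterators = still_active


# Five stand-in "files", each holding 20 (file_id, sample_id) pairs.
files = [[(file_id, i) for i in range(20)] for file_id in range(5)]

stream = interleave_blocks(files, block_length=10)
batch = list(islice(stream, 50))  # one batch of 50, as in the question

counts = {file_id: sum(1 for src, _ in batch if src == file_id) for file_id in range(5)}
print(counts)  # {0: 10, 1: 10, 2: 10, 3: 10, 4: 10}
```

Each batch of 50 draws exactly 10 consecutive samples from every source. Shuffling after interleaving would break this exact 10-per-file split, so any shuffling would have to happen per file beforehand.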

tensorflow ValueError: features should be a dictionary of `Tensor`s. Given type: <class 'tensorflow.python.framework.ops.Tensor'>

Question: This is my code. My TensorFlow version is 1.6.0 and my Python version is 3.6.4. If I use a dataset to read the CSV file directly, training works without errors. But after I convert the CSV file to a tfrecords file, it fails. I searched online and most people say TensorFlow should be updated, but that doesn't work for me.

    import tensorflow as tf

    tf.logging.set_verbosity(tf.logging.INFO)

    feature_names = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth']

    def my_input_fn(is_shuffle=False, repeat_count=1):
        dataset = tf.data.TFRecordDataset(['csv.tfrecords'])  # filename is a list

        def parser(record):
            keys_to

Best way to process terabytes of data on gcloud ml-engine with keras

Question: I want to train a model on about 2TB of image data in gcloud storage. I saved the image data as separate tfrecords and tried to use the TensorFlow data API following this example: https://medium.com/@moritzkrger/speeding-up-keras-with-tfrecord-datasets-5464f9836c36 But it seems that Keras' model.fit(...) doesn't support validation for tfrecord datasets, based on https://github.com/keras-team/keras/pull/8388 Is there a better approach for processing large amounts of data with Keras from ml-engine that I'm missing? Thanks a lot!

Answer 1: If you are willing to use tf.keras instead of actual Keras, you can

TensorFlow strings: what they are and how to work with them

Question: When I read a file with tf.read_file I get something of type tf.string. The documentation says only that it is "Variable length byte arrays. Each element of a Tensor is a byte array." ( https://www.tensorflow.org/versions/r0.10/resources/dims_types.html ). I have no idea how to interpret this. I can do nothing with this type. In ordinary Python you can get elements by index, like my_string[:4], but when I run the following code I get an error.

    import tensorflow as tf
    import numpy as np

    x = tf.constant("This is string")
    y = x[:4]
    init = tf.initialize_all_variables()
    sess = tf.Session()
    sess.run(init)
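The slice in the question fails because `x` is a scalar tensor: indexing a tensor selects along tensor dimensions, and a scalar has none, so `x[:4]` never reaches the bytes inside the string. One way to take a byte range from a tf.string element is `tf.strings.substr`. The sketch below assumes TensorFlow 2.x with eager execution, which differs from the Session-based API used in the question.

```python
import tensorflow as tf

x = tf.constant("This is string")  # scalar tensor of dtype tf.string

# Take 4 bytes starting at byte offset 0 of the string element.
first_four = tf.strings.substr(x, pos=0, len=4)

# The element comes back as a Python bytes object, not a str.
print(first_four.numpy())  # b'This'
```

The result is a byte string, consistent with the documentation's description of tf.string elements as byte arrays. Turning it into a Python str still requires an explicit `.decode("utf-8")` outside the graph.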