Reshape and write ImageDataGenerator output to CSV file

Submitted by 跟風遠走 on 2021-01-07 01:05:55

Question


I'm working with the MNIST data set. I have the training data vectors in one CSV file (i.e. 60,000 rows, each with 784 columns), and the labels in a separate CSV file.

I want to bulk up the amount of training data and append it to the CSV. It has to be done this way because the CSV file is then fed into a separate pipeline.

I originally wrote this script:

import keras
from keras.preprocessing.image import ImageDataGenerator
import pandas as pd

X_train = pd.read_csv('train-images-idx3-ubyte.csv')


datagen = ImageDataGenerator(
        featurewise_center=False,  
        samplewise_center=False,  
        featurewise_std_normalization=False,  
        samplewise_std_normalization=False, 
        zca_whitening=False,  
        rotation_range=10,  
        zoom_range = 0.2, 
        width_shift_range=0.2,  
        height_shift_range=0.2,  
        horizontal_flip=False,  
        vertical_flip=False) 


datagen.fit(X_train)

And I got the error:

ValueError: Input to `.fit()` should have rank 4. Got array with shape: (59999, 784)
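Two things can be read from that error. `fit()` expects a rank-4 array of shape (samples, height, width, channels), and the row count is 59999 rather than 60,000, which suggests `pd.read_csv` treated the first image as a header row. A minimal sketch of both points, assuming the real CSV has no header line; the small in-memory CSV below is a made-up stand-in for the actual file:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the real file: 3 "images" of 784 pixels, no header row.
pixels = np.arange(3 * 784).reshape(3, 784) % 256
csv_text = "\n".join(",".join(str(v) for v in row) for row in pixels)

# Default behaviour: the first data row is used as column names and is lost.
with_default = pd.read_csv(io.StringIO(csv_text))

# header=None keeps every row as data.
with_header_none = pd.read_csv(io.StringIO(csv_text), header=None)

# Rank-4 layout expected by ImageDataGenerator: (samples, height, width, channels).
images = with_header_none.values.reshape(-1, 28, 28, 1)

print(with_default.shape)      # one row short
print(with_header_none.shape)  # all rows kept
print(images.shape)
```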

So then I reshaped the data, and ran it again:

import keras
from keras.preprocessing.image import ImageDataGenerator
import pandas as pd

X_train = pd.read_csv('train-images-idx3-ubyte.csv', header=None)  # header=None keeps the first image as data
X_train = X_train.values.reshape(-1,28,28,1)  # (samples, height, width, channels)

datagen = ImageDataGenerator(
        featurewise_center=False,  
        samplewise_center=False,  
        featurewise_std_normalization=False, 
        samplewise_std_normalization=False,
        zca_whitening=False,  
        rotation_range=10, 
        zoom_range = 0.2, 
        width_shift_range=0.2,  
        height_shift_range=0.2,  
        horizontal_flip=False, 
        vertical_flip=False)  


datagen.fit(X_train)

But now I'm stuck. How do I (1) reshape the data back to its original format, and (2) append the extra output to the CSV file (or write it to a new one), so that the output looks exactly like the input (i.e. 784 columns), just with extra rows added?

When I change the last line from:

datagen.fit(X_train)

To:

output = datagen.fit(X_train)
print(output[0])

The error is:

    print(output[0])
TypeError: 'NoneType' object is not subscriptable

So I can't work out how specifically to do this. If someone could show me the code, I'd appreciate it.
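For context on the `TypeError`: `fit()` only computes dataset statistics (used by `featurewise_center`, `featurewise_std_normalization` and `zca_whitening`) and returns `None`, so there is nothing to index. The augmented images come from `datagen.flow(...)`, which yields batches indefinitely. Below is a rough sketch of the reshape-and-append step, not a definitive solution. The helper `augmented_rows`, the stand-in generator `fake_flow`, the blank sample data and the output path are all hypothetical; `fake_flow` only replaces `datagen.flow` so the sketch runs without Keras installed.

```python
import itertools
import os
import tempfile

import numpy as np
import pandas as pd


def augmented_rows(batches, n_batches):
    """Take n_batches from a batch iterator and flatten each image to one row."""
    rows = [np.asarray(b).reshape(len(b), -1)
            for b in itertools.islice(batches, n_batches)]
    return np.concatenate(rows, axis=0)


def fake_flow(x, batch_size):
    """Stand-in for datagen.flow(x, batch_size=batch_size, shuffle=False):
    yields batches forever, as the Keras iterator does."""
    while True:
        for start in range(0, len(x), batch_size):
            yield x[start:start + batch_size]


# Hypothetical data: 10 blank 28x28x1 images instead of the real X_train.
X_train = np.zeros((10, 28, 28, 1), dtype=np.float32)

# With Keras available this would be:
#   batches = datagen.flow(X_train, batch_size=5, shuffle=False)
batches = fake_flow(X_train, batch_size=5)

extra = augmented_rows(batches, n_batches=2)  # back to 784 columns per row

# Augmented pixels come back as floats; round and clip to restore 0-255 integers.
extra = np.clip(np.rint(extra), 0, 255).astype(np.uint8)

out_path = os.path.join(tempfile.mkdtemp(), "augmented.csv")  # hypothetical path
# mode="a" appends; header=False and index=False keep the plain 784-column layout.
pd.DataFrame(extra).to_csv(out_path, mode="a", header=False, index=False)
```

To keep the labels aligned with the new rows, the labels can be passed as the second argument, `datagen.flow(X_train, y_train, ...)`, in which case each item is an `(x_batch, y_batch)` tuple and the helper would need to unpack it.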

Just to note that this data needs to eventually be put back into the MNIST-specific binary format.
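For that last step, the IDX layout used by the MNIST files is small: a 4-byte magic number (two zero bytes, a type code of 0x08 for unsigned byte, then the number of dimensions), followed by each dimension size as a big-endian 32-bit integer, followed by the raw bytes. A minimal sketch under that description; `to_idx_bytes` is a hypothetical helper name and the arrays are made-up sample data:

```python
import struct

import numpy as np


def to_idx_bytes(array):
    """Serialise an array of 0-255 values to IDX bytes (hypothetical helper)."""
    array = np.ascontiguousarray(array, dtype=np.uint8)
    # Magic number: 0x00, 0x00, type code 0x08 (unsigned byte), number of dimensions.
    header = struct.pack(">BBBB", 0, 0, 0x08, array.ndim)
    # Each dimension size is a big-endian 32-bit unsigned integer.
    header += struct.pack(">" + "I" * array.ndim, *array.shape)
    return header + array.tobytes()


# Hypothetical data: 784-column rows reshaped to (samples, 28, 28) images.
rows = np.zeros((3, 784), dtype=np.uint8)
image_bytes = to_idx_bytes(rows.reshape(-1, 28, 28))

# Labels are a 1-D array, so the same helper covers the label file.
label_bytes = to_idx_bytes(np.array([5, 0, 4], dtype=np.uint8))
```

Writing either result to disk is then a plain binary write, for example `open(path, "wb").write(image_bytes)`.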

Edit 1: I've just added the tensorflow tag because I know the two are closely linked, and if there's a better method in tensorflow for this purpose, that would be great too.

Source: https://stackoverflow.com/questions/65142970/reshape-and-write-imagedatagenerator-output-to-csv-file
