I have about 0.8 million 256x256 RGB images, which amount to over 7 GB.
I want to use them as training data for a convolutional neural network, and I want to put them, along with their labels, into a cPickle file.
Now, building these arrays takes so much memory that the process starts swapping to my hard drive and nearly fills it. In hindsight that's not surprising: the JPEGs are compressed on disk, but once decoded each image is 256 * 256 * 3 = 196,608 bytes, so 0.8 million of them need roughly 157 GB of RAM as raw uint8 pixels.
Is this a bad idea?
What would be a smarter or more practical way to load the data into the CNN, or to pickle it, without running into these memory problems?
This is what the code looks like:
import numpy as np
import cPickle
import os
from PIL import Image

pixels = []
labels = []
data = []

for subdir, dirs, files in os.walk('images'):
    for file in files:
        if file.endswith(".jpg"):
            floc = os.path.join(subdir, file)
            im = Image.open(floc)
            # flatten the image into one row of pixel values;
            # uint8 keeps it at 1 byte per channel instead of 8
            pix = np.array(im.getdata(), dtype=np.uint8)
            pixels.append(pix)
            labels.append(1)

pixels = np.array(pixels)
labels = np.array(labels)
traindata = [pixels, labels]

# ... do the same for validation and test data
# ... put all data and labels into the 'data' list

with open('data.pkl', 'wb') as f:
    cPickle.dump(data, f)
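One alternative I'm considering (a minimal sketch, assuming every image really is 256x256x3; the file names 'pixels.dat' and 'labels.npy' and the constant label 1 are placeholders mirroring the code above) is to stream the decoded pixels into a memory-mapped NumPy array instead of building a giant list in RAM:

import os
import numpy as np
from PIL import Image

# Collect the paths first so the memmap can be sized up front.
paths = [os.path.join(subdir, f)
         for subdir, dirs, files in os.walk('images')
         for f in files if f.endswith('.jpg')]

n = len(paths)
# One uint8 row per image, backed by a file on disk, so only the
# image currently being decoded has to live in memory.
pixels = np.memmap('pixels.dat', dtype=np.uint8, mode='w+',
                   shape=(n, 256 * 256 * 3))
labels = np.zeros(n, dtype=np.int32)

for i, floc in enumerate(paths):
    im = Image.open(floc)
    # write the flattened image straight into its row on disk
    pixels[i] = np.asarray(im, dtype=np.uint8).reshape(-1)
    labels[i] = 1  # placeholder label, as in the code above

pixels.flush()  # push any buffered rows out to disk
np.save('labels.npy', labels)

At training time the same file could be reopened with np.memmap(..., mode='r') and sliced batch by batch, so the CNN never needs the whole dataset in memory at once. Would that be reasonable, or is something like HDF5 the more standard choice here?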