Question
I'm trying to run a loop that iterates through an image folder and returns two numpy arrays: x, which stores the images as numpy arrays, and y, which stores the labels.
A folder can easily have over 40,000 RGB images with dimensions (224, 224). I have around 12 GB of memory, but after some iterations the used memory just spikes up and everything stops.
What can I do to fix this issue?
import glob
import cv2
import numpy as np

def create_set(path, quality):
    x_file = glob.glob(path + '*')
    x = []
    for i, img in enumerate(x_file):
        # Read every image and keep it in one ever-growing list
        image = cv2.imread(img, cv2.IMREAD_COLOR)
        x.append(np.asarray(image))
        if i % 50 == 0:
            print('{} - {} images processed'.format(path, i))
    x = np.asarray(x)
    x = x / 255
    # One-hot labels: column 0 for quality == 0, column 1 otherwise
    y = np.zeros((x.shape[0], 2))
    if quality == 0:
        y[:, 0] = 1
    else:
        y[:, 1] = 1
    return x, y
Answer 1:
You just can't load that many images into memory. You're trying to load every file in the given path into memory by appending them to x.
Try processing them in batches, or, if you're doing this for a TensorFlow application, try writing them to .tfrecords first.
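For the .tfrecords route, here is a minimal sketch, assuming TensorFlow 2.x (tf.io.TFRecordWriter); write_set is a hypothetical name, and it reuses the glob pattern and the integer quality label from the question:

import glob
import cv2
import tensorflow as tf

def write_set(path, quality, out_file):
    # Stream images to disk one at a time; only a single image is ever held in memory.
    with tf.io.TFRecordWriter(out_file) as writer:
        for img_path in glob.glob(path + '*'):
            image = cv2.imread(img_path, cv2.IMREAD_COLOR)
            if image is None:  # skip unreadable files
                continue
            example = tf.train.Example(features=tf.train.Features(feature={
                'image': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image.tobytes()])),
                'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[quality])),
            }))
            writer.write(example.SerializeToString())

The records can then be read back lazily with tf.data.TFRecordDataset, so training never needs the whole set in memory at once.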
If you want to save some memory, leave the images as np.uint8 rather than casting them to float (which happens automatically when you normalise them with x = x/255).
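A quick back-of-the-envelope check of why that cast is fatal here: 40,000 uint8 images of shape (224, 224, 3) take about 6 GB, and dividing by 255 upcasts them to float64, eight times as much:

import numpy as np

n_images = 40000
bytes_uint8 = n_images * 224 * 224 * 3    # one byte per channel value
bytes_float64 = bytes_uint8 * 8           # x / 255 produces a float64 array
print(bytes_uint8 / 1e9)                  # ~6.0 GB
print(bytes_float64 / 1e9)                # ~48.2 GB, far beyond 12 GB of RAM
print((np.zeros(1, dtype=np.uint8) / 255).dtype)   # float64 – true division upcasts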
You also don't need np.asarray in your x.append(np.asarray(image)) line. image is already an array; np.asarray is for converting lists, tuples, etc. to arrays.
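As a sanity check (the file name below is just a placeholder), np.asarray on an existing ndarray returns the very same object, so the call is a no-op here:

import cv2
import numpy as np

image = cv2.imread('example.jpg', cv2.IMREAD_COLOR)   # placeholder file name
print(type(image))                     # <class 'numpy.ndarray'>
print(np.asarray(image) is image)      # True: no conversion, no copy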
edit:
a very rough batching example:
def batching_function(imlist, batchsize):
    ims = []
    batch = imlist[:batchsize]          # take the next batchsize entries
    for image in batch:
        ims.append(image)
        other_processing()
    new_imlist = imlist[batchsize:]     # drop the entries we just consumed
    return ims, new_imlist

def main():
    imlist = all_the_globbing_here()
    batchsize = 50                      # whatever comfortably fits in memory
    total_files = len(imlist)
    for i in range(total_files // batchsize):
        ims, imlist = batching_function(imlist, batchsize)
        process_images(ims)
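Assuming imlist holds the file paths returned by the glob, a hypothetical process_images would then read and normalise only the current batch, so at most batchsize float images exist at any time:

import cv2
import numpy as np

def process_images(ims):
    # Read just this batch from disk and normalise it as float32;
    # the rest of the dataset stays on disk as files.
    batch = np.asarray([cv2.imread(p, cv2.IMREAD_COLOR) for p in ims], dtype=np.float32) / 255
    # ... feed `batch` to the model / training step here ...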
Source: https://stackoverflow.com/questions/50721762/out-of-memory-converting-image-files-to-numpy-array