Out of memory converting image files to numpy array

Posted by 十年热恋 on 2020-01-06 05:11:16

Question


I'm trying to run a loop that iterates through an image folder and returns two numpy arrays: x, which stores the images as a numpy array, and y, which stores the labels.

A folder can easily have over 40,000 RGB images, each with dimensions (224, 224). I have around 12 GB of memory, but after some iterations the memory usage spikes and everything stops.

What can I do to fix this issue?

import glob

import cv2
import numpy as np

def create_set(path, quality):
    x_file = glob.glob(path + '*')
    x = []

    for i, img in enumerate(x_file):
        image = cv2.imread(img, cv2.IMREAD_COLOR)
        x.append(np.asarray(image))
        if i % 50 == 0:
            print('{} - {} images processed'.format(path, i))

    x = np.asarray(x)
    x = x/255

    # One-hot labels: column 0 for quality == 0, column 1 otherwise
    y = np.zeros((x.shape[0], 2))
    if quality == 0:
        y[:,0] = 1
    else:
        y[:,1] = 1

    return x, y

Answer 1:


You can't hold that many images in memory at once the way this code does. You're trying to load every file in the given path into memory by appending each one to x.

Try processing them in batches or, if you're doing this for a TensorFlow application, try writing them to .tfrecords files first.

If you want to save some memory, leave the images as np.uint8 rather than casting them to float, which happens automatically when you normalise them in the line x = x/255. The result of that division is float64, which takes eight times the memory of uint8.
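
To put rough numbers on this, here is a back-of-the-envelope calculation. It assumes exactly 40,000 images of shape (224, 224, 3), as described in the question; the helper name footprint_gb is made up for this sketch.

```python
import numpy as np

n_images = 40_000          # assumed count, taken from the question
shape = (224, 224, 3)      # height, width, colour channels

def footprint_gb(dtype):
    """Size in GB (10**9 bytes) of all images stored with the given dtype."""
    bytes_per_image = int(np.prod(shape)) * np.dtype(dtype).itemsize
    return n_images * bytes_per_image / 1e9

uint8_gb = footprint_gb(np.uint8)
float64_gb = footprint_gb(np.float64)
print(f"uint8: {uint8_gb:.1f} GB, float64: {float64_gb:.1f} GB")

# Dividing a uint8 array by 255 silently changes the dtype to float64.
sample = np.zeros((2, 2), dtype=np.uint8)
result_dtype = (sample / 255).dtype
print(result_dtype)
```

So the uint8 data alone already takes about half of 12 GB, and the normalised float64 copy cannot fit at all.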

You also don't need np.asarray in your x.append(np.asarray(image)) line. image is already an array; np.asarray is for converting lists, tuples, etc. to arrays.
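
A quick check of that behaviour, using a zero-filled array as a stand-in for a loaded image:

```python
import numpy as np

# Stand-in for what cv2.imread returns: an ndarray of dtype uint8
image = np.zeros((224, 224, 3), dtype=np.uint8)

# np.asarray hands back the very same object when given an ndarray,
# so calling it here does nothing useful.
same_object = np.asarray(image) is image
print(same_object)

# It only does real work on non-array inputs such as nested lists.
from_list = np.asarray([[1, 2], [3, 4]])
print(type(from_list).__name__, from_list.shape)
```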

edit:

a very rough batching example:

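def batching_function(imlist, batchsize):
    # Split off the first batchsize items and return them with the remainder.
    ims = []
    batch = imlist[:batchsize]

    for image in batch:
        ims.append(image)
        other_processing()  # placeholder for any per-image work

    new_imlist = imlist[batchsize:]
    return ims, new_imlist

def main():
    imlist = all_the_globbing_here()  # placeholder: build the list of files
    batchsize = 32  # arbitrary example value
    # Loop until the list is empty so a final partial batch is not skipped.
    while imlist:
        ims, imlist = batching_function(imlist, batchsize)
        process_images(ims)  # placeholder for whatever consumes a batch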
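
The same idea can also be written as a generator, so that only one batch of pixel data exists at a time. This is only a sketch: iter_batches, load_fn and fake_load are hypothetical names, not part of the original answer, and it assumes every image has the same shape. In real use load_fn would be something like lambda p: cv2.imread(p, cv2.IMREAD_COLOR); a stand-in loader is used here so the example runs without any image files.

```python
import numpy as np

def iter_batches(paths, batch_size, load_fn):
    """Yield one uint8 array per batch, loading images only when needed."""
    for start in range(0, len(paths), batch_size):
        chunk = paths[start:start + batch_size]
        # np.stack requires every image in the chunk to have the same shape.
        batch = np.stack([load_fn(p) for p in chunk])
        yield batch.astype(np.uint8, copy=False)

# Stand-in loader (an assumption for this sketch): returns a blank image.
def fake_load(path):
    return np.zeros((224, 224, 3), dtype=np.uint8)

paths = [f"img_{i}.png" for i in range(10)]
batch_shapes = [batch.shape for batch in iter_batches(paths, 4, fake_load)]
print(batch_shapes)
```

If normalised values are needed, one option is to convert each batch at the point of use, for example batch.astype(np.float32) / 255, so the float copy only ever covers a single batch.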


Source: https://stackoverflow.com/questions/50721762/out-of-memory-converting-image-files-to-numpy-array
