Memory leakage issue in python list

问题

The identities list contains an big array of approximately 57000 images. Now, I am creating a negative list with the help of itertools.product(). This store the whole list in memory which is very costly and my system hanged after 4 minutes.

How can i optimize the below code and avoid saving in memory?`

for i in range(0, len(idendities) - 1):
    for j in range(i + 1, len(idendities)):
        cross_product = itertools.product(samples_list[i], samples_list[j])
        cross_product = list(cross_product)

        for cross_sample in cross_product:
            negative = []
            negative.append(cross_sample[0])
            negative.append(cross_sample[1])
            negatives.append(negative)
            print(len(negatives))

negatives = pd.DataFrame(negatives, columns=["file_x", "file_y"])
negatives["decision"] = "No"

negatives = negatives.sample(positives.shape[0])

The memory 9.30 is going to be higher and higher and on one point the system has been completely hanged.

I also implemented the below answer and modified code according to his answer.

for i in range(0, len(idendities) - 1):
    for j in range(i + 1, len(idendities)):
        for cross_sample in itertools.product(samples_list[i], samples_list[j]):
            negative = [cross_sample[0], cross_sample[1]]
            negatives.append(negative)
            print(len(negatives))

negatives = pd.DataFrame(negatives, columns=["file_x", "file_y"])
negatives["decision"] = "No"

Third version of code

This CSV file is too big even if you open a file then it gives an alert that your program can not load all files. Regarding the process, it ten minutes, and then again system going to be hanged completely.

for i in range(0, len(idendities) - 1):
    for j in range(i + 1, len(idendities)):
        for cross_sample in itertools.product(samples_list[i], samples_list[j]):
            with open('/home/khawar/deepface/tests/results.csv', 'a+') as csvfile:
                writer = csv.writer(csvfile)
                writer.writerow([cross_sample[0], cross_sample[1]])
            negative = [cross_sample[0], cross_sample[1]]
            negatives.append(negative)

negatives = pd.DataFrame(negatives, columns=["file_x", "file_y"])
negatives["decision"] = "No"

negatives = negatives.sample(positives.shape[0])

Memory screenshot.

回答1:

The product from itertools is a generator so naturally it dose not store the whole list in memory, but in the next line, cross_product = list(cross_product) you convert it to list object which store the whole data in your memory.

The idea of a generator is that you don't do all the calculation at the same time, as you do with your call list(itertools.product(samples_list[i], samples_list[j])). So what you want to do is generate the results one by one:

Try something like this:

for i in range(len(idendities) - 1):
    for j in range(i + 1, len(idendities)):
        for cross_sample in itertools.product(samples_list[i], samples_list[j]):
            # do something ...

So i guess i found your problem; you are appending all samples to negatives list first because of that your memory is going to be higher and higher, you need to write each row on realtime, one line at time;

Your data is csv right? so you can do this like:

import csv
for i in range(0, len(idendities) - 1):
    for j in range(i + 1, len(idendities)):
        for cross_sample in itertools.product(samples_list[i], samples_list[j]):

            with open('results.csv', 'a+') as csvfile:
                writer = csv.writer(csvfile)
                writer.writerow([cross_sample[0], cross_sample[1]])

The idea is writing your rows realtime

Check this link too how to write the real time data into csv file in python

Some credits to @9mat, @cybot and this question How to get Cartesian product in Python using a generator?, how to write the real time data into csv file in python

来源：https://stackoverflow.com/questions/65855793/memory-leakage-issue-in-python-list

标签

python

list

optimization

product

itertools