I wrote a Python program that acts on a large input file to create a few million objects representing triangles. The algorithm is:
If you don't care about vertex reuse, you could have two output files--one for vertices and one for triangles. Then append the triangle file to the vertex file when you are done.
Unfortunately (depending on your version and release of Python) some types of objects use "free lists" which are a neat local optimization but may cause memory fragmentation, specifically by making more and more memory "earmarked" for only objects of a certain type and thereby unavailable to the "general fund".
The only really reliable way to ensure that a large but temporary use of memory DOES return all resources to the system when it's done, is to have that use happen in a subprocess, which does the memory-hungry work then terminates. Under such conditions, the operating system WILL do its job, and gladly recycle all the resources the subprocess may have gobbled up. Fortunately, the multiprocessing
module makes this kind of operation (which used to be rather a pain) not too bad in modern versions of Python.
In your use case, it seems that the best way for the subprocesses to accumulate some results and yet ensure those results are available to the main process is to use semi-temporary files (by semi-temporary I mean, NOT the kind of files that automatically go away when closed, just ordinary files that you explicitly delete when you're all done with them).
According to Python Official Documentation, you can force the Garbage Collector to release unreferenced memory with gc.collect()
. Example:
import gc
gc.collect()
(del
can be your friend, as it marks objects as being deletable when there no other references to them. Now, often the CPython interpreter keeps this memory for later use, so your operating system might not see the "freed" memory.)
Maybe you would not run into any memory problem in the first place by using a more compact structure for your data.
Thus, lists of numbers are much less memory-efficient than the format used by the standard array
module or the third-party numpy
module. You would save memory by putting your vertices in a NumPy 3xN array and your triangles in an N-element array.