I am using the Python-based Sage mathematics software to create a very long list of vectors. The list contains roughly 100,000,000 elements, and sys.getsizeof() tells me that it is a little less than 1GB in size.
I pickle this list into a file (which already takes a long time, but fair enough). It is only when I unpickle the list that it gets annoying: the RAM usage increases from 1.15GB to 4.3GB, and I am wondering what's going on.
How can I find out in Sage what all the memory is used for? And do you have any ideas for optimizing this, maybe by applying some Python tricks?
This is a reply to kcrisman's comment.
I cannot post the exact code since it would be too long, but here is a simple example where the phenomenon can be observed. I am working on Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.51-1 x86_64 GNU/Linux.
Start Sage and execute:
import pickle
L = [vector([1,2,3]) for k in range(1000000)]
f = open("mylist", 'w')
pickle.dump(L, f)
f.close()
On my system the list is 8697472 bytes big, and the file I pickled into is roughly 130MB. Now close Sage and watch your memory (with htop, for example). Then execute the following lines:
import pickle
f = open("mylist", 'r')
L = pickle.load(f)
f.close()
Without Sage, my Linux system uses 1035MB of memory; with Sage running, the usage increases to 1131MB. After I unpickled the file it uses 2535MB, which I find odd.
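(Side note on the numbers: sys.getsizeof() only measures the list object itself, i.e. its array of pointers, not the vectors it points to, which is why 8697472 bytes for the list and a 130MB pickle file are not contradictory. A rough deep estimate can be made with a recursive helper like the sketch below; this is only an approximation of mine, and memory held at the C level inside Sage vectors will still be undercounted.)
import sys
def deep_getsizeof(obj, seen=None):
    # Recursively sum sys.getsizeof() over nested Python containers.
    # Memory held by C extensions (e.g. inside Sage vectors) stays invisible.
    if seen is None:
        seen = set()
    if id(obj) in seen:  # don't count shared objects twice
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(x, seen) for x in obj)
    return size
deep_getsizeof(L)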
It's probably better not to use Python's pickle module directly. cPickle is already a bit better, but a lot of pickling in Sage assumes protocol 2, which (c)pickle doesn't use by default. You can use Sage's own wrappers of pickle. If I do your example with
sage: open("mylist",'w').write(dumps(L))
and then load it in a fresh session via
sage: L = loads(open("mylist",'r').read())
I observe no problems.
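If you do want to stay with the plain pickle interface, a minimal sketch of the same idea with cPickle and an explicit protocol 2 (assuming your objects pickle cleanly that way; Sage's wrappers take care of this for you) would be:
import cPickle
# dump with protocol 2 in binary mode
f = open("mylist", 'wb')
cPickle.dump(L, f, 2)
f.close()
# and load it back in a fresh session
f = open("mylist", 'rb')
L = cPickle.load(f)
f.close()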
Note that the above interface is not the best one for pickling/unpickling to a file in Sage. You'd be better off using save/load; I just did it that way to stay as close as possible to your example.
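For completeness, the save/load variant amounts to:
sage: save(L, "mylist")   # writes mylist.sobj
sage: L = load("mylist")  # in a fresh session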
Source: https://stackoverflow.com/questions/20294628/using-pythons-pickle-in-sage-results-in-high-memory-usage