It\'s taking me up to an hour to read a 1-gigabyte NetworkX graph data structure using cPickle (its 1-GB when stored on disk as a binary pickle file).
Note that the
This is ridiculous.
I have a huge ~150MB dictionary (collections.Counter
actually) that I was reading and writing using cPickle in the binary format.
Writing it took about 3 min.
I stopped reading it in at the 16 min mark, with my RAM completely choked up.
I'm now using marshal, and it takes:
write: ~3s
read: ~5s
I poked around a bit, and came across this article.
Guess I've never looked at the pickle source, but it builds an entire VM to reconstruct the dictionary?
There should be a note about performance on very large objects in the documentation IMHO.
Maybe the best thing you can do is to split the big data into smallest object smaller, let's say, than 50MB, so can be stored in ram, and recombine it.
Afaik there's no way to automatic splitting data via pickle module, so you have to do by yourself.
Anyway, another way (which is quite harder) is to use some NoSQL Database like MongoDB to store your data...