How can I speed up unpickling large objects if I have plenty of RAM?

别那么骄傲 2020-12-09 09:07

It's taking me up to an hour to read a 1-gigabyte NetworkX graph data structure using cPickle (it's 1 GB when stored on disk as a binary pickle file).

Note that the

8 Answers
  • 2020-12-09 09:52

    This is ridiculous.

    I have a huge ~150MB dictionary (collections.Counter actually) that I was reading and writing using cPickle in the binary format.

    Writing it took about 3 min.
    I stopped reading it in at the 16 min mark, with my RAM completely choked up.

    I'm now using marshal, and it takes:
    write: ~3 s
    read: ~5 s

    I poked around a bit, and came across this article.
    I'd never looked at the pickle source before, but it essentially runs a small virtual machine to reconstruct the dictionary?
    IMHO, the documentation should include a note about performance on very large objects.
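    To illustrate the switch described above, here is a minimal sketch of reading and writing a `collections.Counter` with `marshal` instead of pickle. Note the caveats: `marshal` only handles built-in types (so the `Counter` is converted to a plain `dict` first), and its on-disk format is not guaranteed to be stable across Python versions. The file path is a made-up example.

    ```python
    import marshal
    import os
    import tempfile
    from collections import Counter

    counts = Counter({"alpha": 3, "beta": 1, "gamma": 7})
    path = os.path.join(tempfile.mkdtemp(), "counts.marshal")

    # Write: Counter itself is not marshalable, so dump it as a plain dict.
    with open(path, "wb") as f:
        marshal.dump(dict(counts), f)

    # Read: load the dict back and rewrap it in a Counter.
    with open(path, "rb") as f:
        restored = Counter(marshal.load(f))

    assert restored == counts
    ```

    The dict-conversion round trip is lossless for a `Counter`, since its state is just its key/count mapping.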

  • 2020-12-09 09:52

    Maybe the best thing you can do is to split the big data into smaller objects, say under 50 MB each, so they fit comfortably in RAM, and then recombine them after loading.

    As far as I know, the pickle module has no way to split data automatically, so you have to do it yourself.

    Alternatively (and this is quite a bit harder), you could use a NoSQL database like MongoDB to store your data...
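    The manual splitting suggested above could look like the following sketch: pickle a large dict in fixed-size chunks, one file per chunk, then merge the chunks back on load. The function names, chunk size, and directory layout are illustrative assumptions, not part of any library API.

    ```python
    import os
    import pickle
    import tempfile

    def dump_chunked(data, directory, chunk_size=50_000):
        """Pickle `data` (a dict) as numbered chunk files of `chunk_size` items each."""
        items = list(data.items())
        for start in range(0, len(items), chunk_size):
            name = "part%06d.pkl" % (start // chunk_size)
            with open(os.path.join(directory, name), "wb") as f:
                pickle.dump(dict(items[start:start + chunk_size]), f,
                            protocol=pickle.HIGHEST_PROTOCOL)

    def load_chunked(directory):
        """Load every chunk file in `directory` and merge them into one dict."""
        combined = {}
        for name in sorted(os.listdir(directory)):
            if name.endswith(".pkl"):
                with open(os.path.join(directory, name), "rb") as f:
                    combined.update(pickle.load(f))
        return combined

    # Round-trip check with a small dict and a tiny chunk size.
    directory = tempfile.mkdtemp()
    data = {i: i * i for i in range(10)}
    dump_chunked(data, directory, chunk_size=4)
    assert load_chunked(directory) == data
    ```

    Each chunk file stays small, so loading never needs more working memory than one chunk plus the growing combined dict.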
