How can I speed up unpickling large objects if I have plenty of RAM?

前端未结


It's taking me up to an hour to read a 1-gigabyte NetworkX graph data structure using cPickle (it's 1 GB when stored on disk as a binary pickle file).

Note that the


8 Answers

隐瞒了意图╮

2020-12-09 09:52

This is ridiculous.

I have a huge ~150MB dictionary (collections.Counter actually) that I was reading and writing using cPickle in the binary format.

Writing it took about 3 min.
I stopped reading it in at the 16 min mark, with my RAM completely choked up.

I'm now using marshal, and it takes:
write: ~3s
read: ~5s
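
For anyone who wants to try the same thing, here is a minimal sketch of a marshal round trip for a Counter. The helper names `save_counter` and `load_counter` are made up for illustration, and the code is written for Python 3. It converts the Counter to a plain dict first, on the assumption that marshal should only be handed built-in types. Keep in mind that the marshal format is not guaranteed to be compatible across Python versions, so this is only reasonable for data you can regenerate.

```python
import marshal
from collections import Counter


def save_counter(counter, path):
    """Write a Counter to disk with marshal (hypothetical helper)."""
    # marshal is meant for built-in types, so store a plain dict.
    with open(path, "wb") as f:
        marshal.dump(dict(counter), f)


def load_counter(path):
    """Read the dict back and wrap it in a Counter again."""
    with open(path, "rb") as f:
        return Counter(marshal.load(f))
```

Keys and values must themselves be types marshal supports (numbers, strings, bytes, tuples, lists, dicts and so on); arbitrary class instances will not work.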

I poked around a bit, and came across this article.
Guess I've never looked at the pickle source, but it builds an entire VM to reconstruct the dictionary?
There should be a note about performance on very large objects in the documentation IMHO.
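
To see the "VM" part for yourself: a pickle is a stream of opcodes that the unpickler executes one by one. A rough sketch using the standard `pickletools` module to count them is below; `opcode_histogram` is a name invented here, and protocol 2 is picked only because it was the binary protocol available to cPickle.

```python
import pickle
import pickletools
from collections import Counter


def opcode_histogram(obj, protocol=2):
    """Count how often each pickle opcode appears for obj (illustrative helper)."""
    data = pickle.dumps(obj, protocol=protocol)
    # genops yields (opcode_info, argument, position) for every instruction.
    return Counter(op.name for op, arg, pos in pickletools.genops(data))
```

Running this on a small sample of your data gives a feel for how many instructions the unpickler has to step through per entry.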

梦如初夏

2020-12-09 09:52

Maybe the best thing you can do is to split the big data into smaller objects, say under 50 MB each, so that each one fits comfortably in RAM, and then recombine them after loading.

AFAIK there's no way to split data automatically with the pickle module, so you have to do it yourself.
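
A minimal sketch of doing the split by hand, assuming the data is a dict-like mapping: write several smaller pickles back to back into one file, then read them until the file runs out. `dump_in_chunks`, `load_chunks` and the `chunk_size` default are invented for this example (the chunk size is counted in entries, not megabytes), and the code is written for Python 3.

```python
import pickle
from itertools import islice


def dump_in_chunks(mapping, path, chunk_size=100000):
    """Pickle a mapping as a sequence of smaller dicts (hypothetical helper)."""
    items = iter(mapping.items())
    with open(path, "wb") as f:
        while True:
            # Take the next chunk_size (key, value) pairs.
            chunk = dict(islice(items, chunk_size))
            if not chunk:
                break
            pickle.dump(chunk, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_chunks(path):
    """Read the chunks back one at a time and merge them into one dict."""
    merged = {}
    with open(path, "rb") as f:
        while True:
            try:
                merged.update(pickle.load(f))
            except EOFError:
                # No more pickles left in the file.
                break
    return merged
```

Whether this actually helps depends on where the time goes, so it is worth timing on a sample first.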

Another option (which is quite a bit harder) is to store your data in a NoSQL database such as MongoDB.
