Python: storing big data structures

后端未结

关注

 5  1164

南方客 2021-01-15 12:04

I\'m currently doing a project in python that uses dictionaries that are relatively big (around 800 MB). I tried to store one of this dictionaries by using pickle, but got a

5条回答

滥情空心 (楼主)

2021-01-15 12:13
As opposed to shelf, klepto doesn't need to store the entire dict in a single file (using a single file is very slow for read-write when you only need one entry). Also, as opposed to shelf, klepto can store almost any type of python object you can put in a dictionary (you can store functions, lambdas, class instances, sockets, multiprocessing queues, whatever).

klepto provides a dictionary abstraction for writing to a database, including treating your filesystem as a database (i.e. writing the entire dictionary to a single file, or writing each entry to it's own file). For large data, I often choose to represent the dictionary as a directory on my filesystem, and have each entry be a file. klepto also offers a variety of caching algorithms (like mru, lru, lfu, etc) to help you manage your in-memory cache, and will use the algorithm do the dump and load to the archive backend for you.
```
>>> from klepto.archives import dir_archive
>>> d = {'a':1, 'b':2, 'c':map, 'd':None}
>>> # map a dict to a filesystem directory
>>> demo = dir_archive('demo', d, serialized=True) 
>>> demo['a']
1
>>> demo['c']

>>> demo          
dir_archive('demo', {'a': 1, 'c': , 'b': 2, 'd': None}, cached=True)
>>> # is set to cache to memory, so use 'dump' to dump to the filesystem 
>>> demo.dump()
>>> del demo
>>> 
>>> demo = dir_archive('demo', {}, serialized=True)
>>> demo
dir_archive('demo', {}, cached=True)
>>> # demo is empty, load from disk
>>> demo.load()
>>> demo
dir_archive('demo', {'a': 1, 'c': , 'b': 2, 'd': None}, cached=True)
>>> demo['c']

>>> 
```
klepto also provides the use of memory-mapped file backends, for fast read-write. There are other flags such as compression that can be used to further customize how your data is stored. It's equally easy (the same exact interface) to use a (MySQL, etc) database as a backend instead of your filesystem. You can use the flag cached=False to turn off memory caching completely, and directly read and write to and from disk or database.
```
>>> from klepto.archives import dir_archive
>>> # does not hold entries in memory, each entry will be stored on disk
>>> demo = dir_archive('demo', {}, serialized=True, cached=False)
>>> demo['a'] = 10
>>> demo['b'] = 20
>>> demo['c'] = min
>>> demo['d'] = [1,2,3]
```
Get klepto here: https://github.com/uqfoundation
0 讨论(0)

查看其它5个回答
发布评论:

提交评论
- 加载中...