I\'m currently doing a project in python that uses dictionaries that are relatively big (around 800 MB). I tried to store one of this dictionaries by using pickle, but got a
As opposed to shelf
, klepto
doesn't need to store the entire dict in a single file (using a single file is very slow for read-write when you only need one entry). Also, as opposed to shelf
, klepto
can store almost any type of python object you can put in a dictionary (you can store functions, lambdas, class instances, sockets, multiprocessing queues, whatever).
klepto
provides a dictionary abstraction for writing to a database, including treating your filesystem as a database (i.e. writing the entire dictionary to a single file, or writing each entry to it's own file). For large data, I often choose to represent the dictionary as a directory on my filesystem, and have each entry be a file. klepto
also offers a variety of caching algorithms (like mru
, lru
, lfu
, etc) to help you manage your in-memory cache, and will use the algorithm do the dump and load to the archive backend for you.
>>> from klepto.archives import dir_archive
>>> d = {'a':1, 'b':2, 'c':map, 'd':None}
>>> # map a dict to a filesystem directory
>>> demo = dir_archive('demo', d, serialized=True)
>>> demo['a']
1
>>> demo['c']
>>> demo
dir_archive('demo', {'a': 1, 'c': , 'b': 2, 'd': None}, cached=True)
>>> # is set to cache to memory, so use 'dump' to dump to the filesystem
>>> demo.dump()
>>> del demo
>>>
>>> demo = dir_archive('demo', {}, serialized=True)
>>> demo
dir_archive('demo', {}, cached=True)
>>> # demo is empty, load from disk
>>> demo.load()
>>> demo
dir_archive('demo', {'a': 1, 'c': , 'b': 2, 'd': None}, cached=True)
>>> demo['c']
>>>
klepto
also provides the use of memory-mapped file backends, for fast read-write. There are other flags such as compression
that can be used to further customize how your data is stored. It's equally easy (the same exact interface) to use a (MySQL, etc) database as a backend instead of your filesystem. You can use the flag cached=False
to turn off memory caching completely, and directly read and write to and from disk or database.
>>> from klepto.archives import dir_archive
>>> # does not hold entries in memory, each entry will be stored on disk
>>> demo = dir_archive('demo', {}, serialized=True, cached=False)
>>> demo['a'] = 10
>>> demo['b'] = 20
>>> demo['c'] = min
>>> demo['d'] = [1,2,3]
Get klepto
here: https://github.com/uqfoundation