Python: storing big data structures

南方客 2021-01-15 12:04

I'm currently doing a project in Python that uses relatively big dictionaries (around 800 MB). I tried to store one of these dictionaries using pickle, but got a

5 answers
  •  滥情空心
    2021-01-15 12:13

    As opposed to shelve, klepto doesn't need to store the entire dict in a single file (a single file is very slow for read-write when you only need one entry). Also, as opposed to shelve, klepto can store almost any type of Python object you can put in a dictionary (you can store functions, lambdas, class instances, sockets, multiprocessing queues, whatever).
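
The point about lambdas can be checked against the standard library alone. shelve serializes its values with pickle, and plain pickle stores functions by reference to an importable name, so a lambda has nothing to refer to. The sketch below is only an illustration of that limitation; `can_pickle` is a hypothetical helper, not part of shelve or klepto.

```python
import pickle


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


square = lambda x: x * x  # an anonymous function, no importable name

results = {
    "int": can_pickle(1),          # plain data pickles fine
    "builtin": can_pickle(map),    # pickled by reference to builtins.map
    "lambda": can_pickle(square),  # fails: pickle cannot look up '<lambda>'
}
print(results)
```

A store built directly on pickle would therefore reject a dictionary entry like `'f': lambda x: x * x`, which is the kind of entry the answer says klepto accepts.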

    klepto provides a dictionary abstraction for writing to a database, including treating your filesystem as a database (i.e. writing the entire dictionary to a single file, or writing each entry to its own file). For large data, I often choose to represent the dictionary as a directory on my filesystem, with each entry stored as a file. klepto also offers a variety of caching algorithms (mru, lru, lfu, etc.) to help you manage your in-memory cache, and will use the chosen algorithm to do the dump and load to the archive backend for you.

    >>> from klepto.archives import dir_archive
    >>> d = {'a':1, 'b':2, 'c':map, 'd':None}
    >>> # map a dict to a filesystem directory
    >>> demo = dir_archive('demo', d, serialized=True) 
    >>> demo['a']
    1
    >>> demo['c']
    <built-in function map>
    >>> demo
    dir_archive('demo', {'a': 1, 'c': <built-in function map>, 'b': 2, 'd': None}, cached=True)
    >>> # demo caches entries in memory, so use 'dump' to write them to the filesystem
    >>> demo.dump()
    >>> del demo
    >>> 
    >>> demo = dir_archive('demo', {}, serialized=True)
    >>> demo
    dir_archive('demo', {}, cached=True)
    >>> # demo is empty, load from disk
    >>> demo.load()
    >>> demo
    dir_archive('demo', {'a': 1, 'c': <built-in function map>, 'b': 2, 'd': None}, cached=True)
    >>> demo['c']
    <built-in function map>
    >>> 
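
To make the "directory as a dictionary" idea concrete, here is a minimal standard-library sketch of a store that writes one pickle file per key. It is not how klepto is implemented; `DirStore`, the `.pkl` suffix, and the assumption that keys are short strings safe to use as file names are all made up for illustration.

```python
import os
import pickle
import tempfile


class DirStore:
    """Hypothetical dict-like store: each key lives in its own file."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        # Assumption: keys are short strings that are valid file names.
        return os.path.join(self.root, f"{key}.pkl")

    def __setitem__(self, key, value):
        # Only the file for this one key is rewritten.
        with open(self._path(key), "wb") as fh:
            pickle.dump(value, fh)

    def __getitem__(self, key):
        # Only the file for this one key is read.
        try:
            with open(self._path(key), "rb") as fh:
                return pickle.load(fh)
        except FileNotFoundError:
            raise KeyError(key) from None

    def keys(self):
        return sorted(
            name[:-4] for name in os.listdir(self.root) if name.endswith(".pkl")
        )


root = tempfile.mkdtemp()
store = DirStore(root)
store["a"] = 1
store["d"] = [1, 2, 3]

# A fresh object pointed at the same directory sees the same entries.
reopened = DirStore(root)
value_d = reopened["d"]
print(reopened.keys(), value_d)
```

Reading or updating one entry touches one small file instead of an 800 MB blob, which is the advantage the answer describes over a single-file layout.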
    

    klepto also provides memory-mapped file backends for fast reads and writes. Other flags, such as compression, can be used to further customize how your data is stored. It's equally easy (the exact same interface) to use a database (MySQL, etc.) as the backend instead of your filesystem. You can use the flag cached=False to turn off memory caching completely, and read and write directly to and from disk or the database.

    >>> from klepto.archives import dir_archive
    >>> # does not hold entries in memory, each entry will be stored on disk
    >>> demo = dir_archive('demo', {}, serialized=True, cached=False)
    >>> demo['a'] = 10
    >>> demo['b'] = 20
    >>> demo['c'] = min
    >>> demo['d'] = [1,2,3]
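
The difference between the cached and uncached modes can be sketched without klepto. In the toy model below, a plain dict named `disk` stands in for the backend, and `TinyArchive` is a hypothetical class written only to mirror the behaviour described above (buffer in memory until `dump()`, or write straight through when `cached=False`); it is not klepto's code.

```python
class TinyArchive:
    """Hypothetical stand-in: an in-memory cache in front of a backend mapping."""

    def __init__(self, backend, cached=True):
        self.backend = backend  # stands in for a directory, file, or database
        self.cached = cached
        self.cache = {}

    def __setitem__(self, key, value):
        if self.cached:
            self.cache[key] = value    # stays in memory until dump()
        else:
            self.backend[key] = value  # written through immediately

    def __getitem__(self, key):
        return self.cache[key] if self.cached else self.backend[key]

    def dump(self):
        # Push everything held in memory to the backend.
        self.backend.update(self.cache)

    def load(self):
        # Pull everything from the backend into memory.
        self.cache.update(self.backend)


disk = {}

buffered = TinyArchive(disk, cached=True)
buffered["a"] = 10
before_dump = dict(disk)  # nothing has reached the backend yet
buffered.dump()
after_dump = dict(disk)

direct = TinyArchive(disk, cached=False)
direct["b"] = 20          # lands in the backend right away
after_direct = dict(disk)

print(before_dump, after_dump, after_direct)
```

For an 800 MB dictionary the uncached mode keeps memory use low at the cost of a backend access on every read and write, while the cached mode trades memory for speed.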
    

    Get klepto here: https://github.com/uqfoundation
