How to split dictionary into multiple dictionaries fast

清酒与你 2020-12-01 09:40

I have found a solution but it is really slow:

def chunks(self,data, SIZE=10000):
    for i in xrange(0, len(data), SIZE):
        yield dict(data.items()[i:         


        
4 Answers
  • 2020-12-01 10:01

    Another method is to zip iterators:

    >>> from itertools import izip_longest, ifilter
    >>> d = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5, 'f':6, 'g':7, 'h':8}
    

    Create a list that holds the same dict iterator repeated n times, where n is the number of elements wanted in each resulting dict. Because every entry refers to the same iterator object, passing the list to izip_longest pulls n consecutive items from the source dict on each step. izip_longest pads the last group with None, and ifilter(None, ...) removes that padding. Wrapping the whole thing in a generator expression keeps memory usage low:

    >>> chunks = [d.iteritems()]*3
    >>> g = (dict(ifilter(None, v)) for v in izip_longest(*chunks))
    >>> list(g)
    [{'a': 1, 'c': 3, 'b': 2},
     {'e': 5, 'd': 4, 'g': 7},
     {'h': 8, 'f': 6}]
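
The snippet above is Python 2 (izip_longest, ifilter, iteritems). A minimal sketch of the same idea for Python 3 follows, assuming Python 3.7+ so that dicts keep insertion order; the function name chunk_dict_by_zip is made up for illustration and is not part of the original answer.

```python
from itertools import zip_longest


def chunk_dict_by_zip(d, size):
    """Yield dicts of at most `size` items by zipping one shared iterator."""
    # The same iterator object repeated `size` times: each zip step pulls
    # `size` consecutive (key, value) pairs from it.
    shared = [iter(d.items())] * size
    for group in zip_longest(*shared):
        # zip_longest pads the final group with None; drop the padding.
        yield dict(pair for pair in group if pair is not None)


d = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6, 'g': 7, 'h': 8}
result = list(chunk_dict_by_zip(d, 3))
print(result)
```

Testing `pair is not None` instead of filtering on truthiness makes the intent explicit, although (key, value) tuples are never falsy anyway.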
    
  • 2020-12-01 10:13

    This code takes a large dictionary and splits it into a list of smaller dictionaries. The max_limit parameter sets the maximum number of key-value pairs allowed in each sub-dictionary. Splitting is cheap: it takes a single pass over the dictionary.

    def split_dict_to_multiple(input_dict, max_limit=200):
        """Splits dict into multiple dicts with given maximum size.
        Returns a list of dictionaries."""
        chunks = []
        curr_dict = {}
        for k, v in input_dict.items():
            if len(curr_dict) < max_limit:
                curr_dict[k] = v
            else:
                # curr_dict is full; it is rebound below, so no copy is needed
                chunks.append(curr_dict)
                curr_dict = {k: v}
        # append the last, possibly partial, curr_dict
        chunks.append(curr_dict)
        return chunks
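
The same single pass could also be written as a generator, so that chunks are handed out one at a time instead of being collected in a list. This is only a sketch under Python 3; the name iter_dict_chunks is hypothetical, and unlike the list-building function it yields nothing for an empty input.

```python
def iter_dict_chunks(input_dict, max_limit=200):
    """Yield dicts holding at most `max_limit` items each, in one pass."""
    curr = {}
    for k, v in input_dict.items():
        curr[k] = v
        if len(curr) == max_limit:
            # Chunk is full: hand it out and start a fresh one.
            yield curr
            curr = {}
    if curr:
        # Emit the last, partially filled chunk.
        yield curr


sizes = [len(c) for c in iter_dict_chunks({i: i for i in range(5)}, max_limit=2)]
print(sizes)
```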
    
  • 2020-12-01 10:20

    Since the dictionary is so big, it would be better to keep everything involved as iterators and generators, like this:

    from itertools import islice
    
    def chunks(data, SIZE=10000):
        it = iter(data)
        for i in xrange(0, len(data), SIZE):
            yield {k:data[k] for k in islice(it, SIZE)}
    

    Sample run:

    for item in chunks({i:i for i in xrange(10)}, 3):
        print item
    

    Output

    {0: 0, 1: 1, 2: 2}
    {3: 3, 4: 4, 5: 5}
    {8: 8, 6: 6, 7: 7}
    {9: 9}
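
The code above targets Python 2 (xrange, print statement). A possible Python 3 adaptation is sketched below. It slices the items iterator directly, which avoids the extra data[k] lookup per key; that is a small variation on the answer, not the answer's exact code.

```python
from itertools import islice


def chunks(data, size=10000):
    """Yield successive dicts with at most `size` items from `data`."""
    it = iter(data.items())
    while True:
        # islice consumes the next `size` (key, value) pairs from `it`.
        chunk = dict(islice(it, size))
        if not chunk:
            return  # iterator exhausted
        yield chunk


parts = list(chunks({i: i for i in range(10)}, 3))
for part in parts:
    print(part)
```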
    
  • 2020-12-01 10:20
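
    import numpy as np

    # array_split takes the number of chunks, not the size of each chunk
    num_chunks = 3
    # d is the dictionary to split
    chunked_data = [[k, v] for k, v in d.items()]
    chunked_data = np.array_split(chunked_data, num_chunks)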
    

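    Afterwards you have a list of ndarrays, which you can iterate over like this. Note that NumPy stores each chunk with a single dtype, so keys and values of different types are converted to a common type (for example, strings):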

    for chunk in chunked_data:
        for key, value in chunk:
            print(key)
            print(value)
    

    These chunks can then be converted back into a list of dicts with a simple for loop.
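
A sketch of that final conversion is shown below, assuming NumPy is installed and Python 3.7+. To keep the original key and value types intact, this variant splits index positions with np.array_split instead of the pairs themselves; the sample dict and the names num_chunks, index_groups, and list_of_dicts are illustrative only.

```python
import numpy as np

d = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6, 'g': 7, 'h': 8}
num_chunks = 3  # number of resulting dicts, not items per dict

items = list(d.items())
# Split the positions 0..len(items)-1 into num_chunks nearly equal groups.
index_groups = np.array_split(np.arange(len(items)), num_chunks)

# Rebuild one dict per group from the original (key, value) tuples.
list_of_dicts = [dict(items[i] for i in group) for group in index_groups]
print(list_of_dicts)
```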
