Elegant way to reduce a list of dictionaries?

Asked 2021-01-13 02:30 · 5 answers · 1210 views

I have a list of dictionaries, and each dictionary contains exactly the same keys. I want to find the average value for each key, and I would like to know how to do it using reduce().

5 Answers
  • 2021-01-13 03:05

    Here is a terrible one-liner using a list comprehension. You are probably better off not using it.

    final = dict(zip(lst[0].keys(), [n / len(lst) for n in [sum(i) for i in zip(*[tuple(x1.values()) for x1 in lst])]]))
    
    for key, value in final.items():
        print (key, value)
    
    # Output
    recall 0.818703703704
    precision 0.820167289406
    f_measure 0.818495375556
    accuracy 0.79
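
    One caveat worth noting: the one-liner above works only because `zip(*...)` pairs values positionally, so it silently assumes every dictionary yields its values in the same key order. A sketch of the same computation keyed explicitly by name (hypothetical `lst` in the shape of the question's data):

```python
# Hypothetical input in the same shape as the question's list of dicts.
lst = [
    {"accuracy": 0.78, "recall": 0.8172222222222223},
    {"accuracy": 0.77, "recall": 0.8161111111111111},
    {"accuracy": 0.82, "recall": 0.8227777777777778},
]

# Average per key, looked up by name rather than by position.
final = {k: sum(d[k] for d in lst) / len(lst) for k in lst[0]}
```

    Looking values up by key makes the result independent of insertion order across the dictionaries.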
    
  • 2021-01-13 03:18

    Here's another way, a little more step-by-step:

    from functools import reduce
    
    d = [
      {
        "accuracy": 0.78,
        "f_measure": 0.8169374016795885,
        "precision": 0.8192088044235794,
        "recall": 0.8172222222222223
      },
      {
        "accuracy": 0.77,
        "f_measure": 0.8159133315763016,
        "precision": 0.8174754717495807,
        "recall": 0.8161111111111111
      },
      {
        "accuracy": 0.82,
        "f_measure": 0.8226353934130455,
        "precision": 0.8238175920455686,
        "recall": 0.8227777777777778
      }
    ]
    
    key_arrays = {}
    for item in d:
      for k, v in item.items():
        key_arrays.setdefault(k, []).append(v)
    
    ave = {k: reduce(lambda x, y: x+y, v) / len(v) for k, v in key_arrays.items()}
    
    print(ave)
    # {'accuracy': 0.79, 'recall': 0.8187037037037038,
    #  'f_measure': 0.8184953755563118, 'precision': 0.820167289406243}
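
    A small aside: `reduce(lambda x, y: x + y, v)` is just a long way to spell the built-in `sum(v)`, so the averaging step needs no `functools` at all. A sketch on the same intermediate shape as `key_arrays` above (values hard-coded here for illustration):

```python
# Hypothetical intermediate: values collected per key, as built above.
key_arrays = {
    "accuracy": [0.78, 0.77, 0.82],
    "recall": [0.8172222222222223, 0.8161111111111111, 0.8227777777777778],
}

# sum() replaces the reduce(lambda x, y: x + y, ...) call.
ave = {k: sum(v) / len(v) for k, v in key_arrays.items()}
```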
    
  • 2021-01-13 03:20

    Here you go, a solution using reduce():

    from functools import reduce  # Python 3 compatibility
    
    summed = reduce(
        lambda a, b: {k: a[k] + b[k] for k in a},
        list_of_dicts,
        dict.fromkeys(list_of_dicts[0], 0.0))
    result = {k: v / len(list_of_dicts) for k, v in summed.items()}
    

    This produces a starting point with 0.0 values from the keys of the first dictionary, then sums all values (per key) into a final dictionary. The sums are then divided to produce an average.

    Demo:

    >>> from functools import reduce
    >>> list_of_dicts = [
    ...   {
    ...     "accuracy": 0.78,
    ...     "f_measure": 0.8169374016795885,
    ...     "precision": 0.8192088044235794,
    ...     "recall": 0.8172222222222223
    ...   },
    ...   {
    ...     "accuracy": 0.77,
    ...     "f_measure": 0.8159133315763016,
    ...     "precision": 0.8174754717495807,
    ...     "recall": 0.8161111111111111
    ...   },
    ...   {
    ...     "accuracy": 0.82,
    ...     "f_measure": 0.8226353934130455,
    ...     "precision": 0.8238175920455686,
    ...     "recall": 0.8227777777777778
    ...   }, # ...
    ... ]
    >>> summed = reduce(
    ...     lambda a, b: {k: a[k] + b[k] for k in a},
    ...     list_of_dicts,
    ...     dict.fromkeys(list_of_dicts[0], 0.0))
    >>> summed
    {'recall': 2.4561111111111114, 'precision': 2.4605018682187287, 'f_measure': 2.4554861266689354, 'accuracy': 2.37}
    >>> {k: v / len(list_of_dicts) for k, v in summed.items()}
    {'recall': 0.8187037037037038, 'precision': 0.820167289406243, 'f_measure': 0.8184953755563118, 'accuracy': 0.79}
    >>> from pprint import pprint
    >>> pprint(_)
    {'accuracy': 0.79,
     'f_measure': 0.8184953755563118,
     'precision': 0.820167289406243,
     'recall': 0.8187037037037038}
    
  • 2021-01-13 03:22

    You could use a Counter to do the summing elegantly:

    from collections import Counter
    
    summed = sum((Counter(d) for d in folds), Counter())
    averaged = {k: v/len(folds) for k, v in summed.items()}
    

    If you really feel like it, it can even be turned into a one-liner:

    averaged = {
        k: v/len(folds)
        for k, v in sum((Counter(d) for d in folds), Counter()).items()
    }
    

    In any case, I consider either of these more readable than a complicated reduce(); sum() is itself an appropriately specialized version of it.

    An even simpler one-liner that doesn't require any imports:

    averaged = {
        k: sum(d[k] for d in folds)/len(folds)
        for k in folds[0]
    }
    

    Interestingly, it is also considerably faster (even faster than pandas), and the statistic is easier to change.

    I tried replacing the manual calculation with the statistics.mean() function in Python 3.5, but that made it over 10 times slower.
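
    "The statistic is easier to change" is worth illustrating: with the per-key comprehension, swapping the mean for, say, the median is a one-word change (hypothetical `folds` data; `statistics.median` is in the standard library):

```python
import statistics

# Hypothetical fold results in the same shape as the question's data.
folds = [
    {"accuracy": 0.78, "recall": 0.8172222222222223},
    {"accuracy": 0.77, "recall": 0.8161111111111111},
    {"accuracy": 0.82, "recall": 0.8227777777777778},
]

# Same comprehension shape as above, with the statistic swapped in.
medians = {k: statistics.median(d[k] for d in folds) for k in folds[0]}
```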

  • 2021-01-13 03:25

    As an alternative, if you are going to be doing such calculations on data regularly, you may wish to use pandas (which will be overkill for a one-off, but will greatly simplify such tasks):

    import pandas as pd
    
    data = [
      {
        "accuracy": 0.78,
        "f_measure": 0.8169374016795885,
        "precision": 0.8192088044235794,
        "recall": 0.8172222222222223
      },
      {
        "accuracy": 0.77,
        "f_measure": 0.8159133315763016,
        "precision": 0.8174754717495807,
        "recall": 0.8161111111111111
      },
      {
        "accuracy": 0.82,
        "f_measure": 0.8226353934130455,
        "precision": 0.8238175920455686,
        "recall": 0.8227777777777778
      }, # ...
    ]
    
    result = pd.DataFrame.from_records(data).mean().to_dict()
    

    Which gives you:

    {'accuracy': 0.79000000000000004,
     'f_measure': 0.8184953755563118,
     'precision': 0.82016728940624295,
     'recall': 0.81870370370370382}
    