elegant way to reduce a list of dictionaries?

Posted by 不羁的心 on 2020-06-08 07:51:31


I have a list of dictionaries, and every dictionary contains exactly the same keys. I want to find the average value for each key, and I would like to know how to do it using reduce (or, if that is not possible, in some other way that is more elegant than nested for loops).

Here is the list:

[
  {
    "accuracy": 0.78,
    "f_measure": 0.8169374016795885,
    "precision": 0.8192088044235794,
    "recall": 0.8172222222222223
  },
  {
    "accuracy": 0.77,
    "f_measure": 0.8159133315763016,
    "precision": 0.8174754717495807,
    "recall": 0.8161111111111111
  },
  {
    "accuracy": 0.82,
    "f_measure": 0.8226353934130455,
    "precision": 0.8238175920455686,
    "recall": 0.8227777777777778
  }, ...
]

I would like to get back a dictionary like this:

{
  "accuracy": 0.81,
  "f_measure": 0.83,
  "precision": 0.84,
  "recall": 0.83
}

Here is what I had so far, but I don't like it:

folds = [ ... ]

keys = folds[0].keys()
results = dict.fromkeys(keys, 0)

for fold in folds:
    for k in keys:
        results[k] += fold[k] / len(folds)



As an alternative, if you're going to be doing such calculations on data, then you may wish to use pandas (which is overkill for a one-off, but greatly simplifies such tasks):

import pandas as pd

data = [
  {
    "accuracy": 0.78,
    "f_measure": 0.8169374016795885,
    "precision": 0.8192088044235794,
    "recall": 0.8172222222222223
  },
  {
    "accuracy": 0.77,
    "f_measure": 0.8159133315763016,
    "precision": 0.8174754717495807,
    "recall": 0.8161111111111111
  },
  {
    "accuracy": 0.82,
    "f_measure": 0.8226353934130455,
    "precision": 0.8238175920455686,
    "recall": 0.8227777777777778
  }, # ...
]

result = pd.DataFrame.from_records(data).mean().to_dict()

Which gives you:

{'accuracy': 0.79000000000000004,
 'f_measure': 0.8184953755563118,
 'precision': 0.82016728940624295,
 'recall': 0.81870370370370382}


Here you go, a solution using reduce():

from functools import reduce  # Python 3 compatibility

summed = reduce(
    lambda a, b: {k: a[k] + b[k] for k in a},
    list_of_dicts,  # the iterable to fold over
    dict.fromkeys(list_of_dicts[0], 0.0))  # initial accumulator: every key set to 0.0
result = {k: v / len(list_of_dicts) for k, v in summed.items()}

This builds a starting dictionary of 0.0 values from the keys of the first dictionary, then sums all values (per key) into a final dictionary. The sums are then divided by the number of dictionaries to produce the averages.

Demo:


>>> from functools import reduce
>>> list_of_dicts = [
...   {
...     "accuracy": 0.78,
...     "f_measure": 0.8169374016795885,
...     "precision": 0.8192088044235794,
...     "recall": 0.8172222222222223
...   },
...   {
...     "accuracy": 0.77,
...     "f_measure": 0.8159133315763016,
...     "precision": 0.8174754717495807,
...     "recall": 0.8161111111111111
...   },
...   {
...     "accuracy": 0.82,
...     "f_measure": 0.8226353934130455,
...     "precision": 0.8238175920455686,
...     "recall": 0.8227777777777778
...   }, # ...
... ]
>>> summed = reduce(
...     lambda a, b: {k: a[k] + b[k] for k in a},
...     list_of_dicts,
...     dict.fromkeys(list_of_dicts[0], 0.0))
>>> summed
{'recall': 2.4561111111111114, 'precision': 2.4605018682187287, 'f_measure': 2.4554861266689354, 'accuracy': 2.37}
>>> {k: v / len(list_of_dicts) for k, v in summed.items()}
{'recall': 0.8187037037037038, 'precision': 0.820167289406243, 'f_measure': 0.8184953755563118, 'accuracy': 0.79}
>>> from pprint import pprint
>>> pprint(_)
{'accuracy': 0.79,
 'f_measure': 0.8184953755563118,
 'precision': 0.820167289406243,
 'recall': 0.8187037037037038}
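
If this calculation is needed in more than one place, the reduce() call shown above could be wrapped in a small helper. The following is only a sketch: the name `average_dicts`, the sample values, and the choice to return an empty dict for empty input are assumptions for illustration, not part of the original answer.

```python
from functools import reduce


def average_dicts(dicts):
    """Average the values of same-keyed dicts, key by key (hypothetical helper)."""
    if not dicts:
        # Assumption: empty input yields an empty result instead of an IndexError.
        return {}
    summed = reduce(
        lambda a, b: {k: a[k] + b[k] for k in a},
        dicts,
        dict.fromkeys(dicts[0], 0.0),  # accumulator starts at 0.0 for every key
    )
    return {k: v / len(dicts) for k, v in summed.items()}


# Made-up sample values, chosen so the sums are exact in floating point.
folds = [
    {"accuracy": 0.5, "recall": 0.25},
    {"accuracy": 0.75, "recall": 0.5},
]
result = average_dicts(folds)
```

Without the guard, `list_of_dicts[0]` raises on an empty list, so some decision about that case has to be made either way.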


You could use a Counter to do the summing elegantly:

from collections import Counter

summed = sum((Counter(d) for d in folds), Counter())
averaged = {k: v/len(folds) for k, v in summed.items()}

If you really feel like it, it can even be turned into a oneliner:

averaged = {
    k: v/len(folds)
    for k, v in sum((Counter(d) for d in folds), Counter()).items()
}

In any case, I consider either more readable than a complicated reduce(); sum() itself is an appropriately specialized version of that.
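
One caveat that may matter for other data sets: adding two Counter objects keeps only keys whose running total is positive, so zero or negative values can be silently dropped along the way. The metrics in the question are all positive, so they are unaffected. The sketch below uses made-up values (the `bias` key is hypothetical) to show the effect next to a plain per-key sum.

```python
from collections import Counter

# Made-up data: "bias" is negative in the first fold and sums to zero overall.
folds = [
    {"accuracy": 0.5, "bias": -0.25},
    {"accuracy": 0.75, "bias": 0.25},
]

# Counter addition discards non-positive totals at every step, so the
# -0.25 from the first fold is lost before the second fold is added.
summed = sum((Counter(d) for d in folds), Counter())
counter_avg = {k: v / len(folds) for k, v in summed.items()}

# A plain per-key sum keeps negative values and gives the true mean.
plain_avg = {k: sum(d[k] for d in folds) / len(folds) for k in folds[0]}
```

Here `counter_avg["bias"]` comes out as 0.125, while the true mean in `plain_avg["bias"]` is 0.0.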

An even simpler oneliner that doesn't require any imports:

averaged = {
    k: sum(d[k] for d in folds)/len(folds)
    for k in folds[0]
}

Interestingly, it's considerably faster (even than pandas?!), and also the statistic is easier to change.

I tried replacing the manual calculation with the statistics.mean() function in Python 3.5, but that made it over 10 times slower.
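
For reference, swapping in a different statistic might look like the sketch below. It assumes `folds` holds the three dictionaries from the question and uses `statistics.mean()` and `statistics.median()` from the standard library. No timings are claimed here.

```python
from statistics import mean, median

# The three folds from the question.
folds = [
    {"accuracy": 0.78, "f_measure": 0.8169374016795885,
     "precision": 0.8192088044235794, "recall": 0.8172222222222223},
    {"accuracy": 0.77, "f_measure": 0.8159133315763016,
     "precision": 0.8174754717495807, "recall": 0.8161111111111111},
    {"accuracy": 0.82, "f_measure": 0.8226353934130455,
     "precision": 0.8238175920455686, "recall": 0.8227777777777778},
]

# Same comprehension shape as the import-free version; only the
# aggregating function changes.
averaged = {k: mean(d[k] for d in folds) for k in folds[0]}
medians = {k: median(d[k] for d in folds) for k in folds[0]}
```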


Here is a terrible one-liner using list comprehensions. You are probably better off not using it.

final = dict(zip(lst[0].keys(), [n/len(lst) for n in [sum(i) for i in zip(*[tuple(x1.values()) for x1 in lst])]]))

for key, value in final.items():
    print(key, value)

recall 0.818703703704
precision 0.820167289406
f_measure 0.818495375556
accuracy 0.79


Here's another way, a little more step-by-step:

from functools import reduce

d = [
  {
    "accuracy": 0.78,
    "f_measure": 0.8169374016795885,
    "precision": 0.8192088044235794,
    "recall": 0.8172222222222223
  },
  {
    "accuracy": 0.77,
    "f_measure": 0.8159133315763016,
    "precision": 0.8174754717495807,
    "recall": 0.8161111111111111
  },
  {
    "accuracy": 0.82,
    "f_measure": 0.8226353934130455,
    "precision": 0.8238175920455686,
    "recall": 0.8227777777777778
  }
]

key_arrays = {}
for item in d:
  for k, v in item.items():
    key_arrays.setdefault(k, []).append(v)

ave = {k: reduce(lambda x, y: x+y, v) / len(v) for k, v in key_arrays.items()}

# {'accuracy': 0.79, 'recall': 0.8187037037037038,
#  'f_measure': 0.8184953755563118, 'precision': 0.820167289406243}
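
A side effect of grouping the values per key first is that the dictionaries no longer need identical keys: each key is averaged over however many values it actually has. The sketch below illustrates this with made-up data (not the folds from the question) and uses the built-in sum() in place of reduce(). Whether a missing key should be skipped like this or treated as an error is a design choice.

```python
# Made-up data: the second dict has no "recall" entry.
folds = [
    {"accuracy": 0.5, "recall": 0.25},
    {"accuracy": 0.75},
]

# Collect every value seen for each key.
key_arrays = {}
for item in folds:
    for k, v in item.items():
        key_arrays.setdefault(k, []).append(v)

# Average each key over the values it actually has.
ave = {k: sum(v) / len(v) for k, v in key_arrays.items()}
```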

