I have a list of dictionaries, and each dictionary contains exactly the same keys. I want to find the average value for each key. How can I do this, for example using reduce()?
Here is a terrible one-liner using a list comprehension. You are probably better off not using this.
# relies on every dict yielding its values in the same key order as lst[0]
final = dict(zip(lst[0].keys(), [n / len(lst) for n in [sum(i) for i in zip(*[tuple(x1.values()) for x1 in lst])]]))
for key, value in final.items():
    print(key, value)
# Output
recall 0.818703703704
precision 0.820167289406
f_measure 0.818495375556
accuracy 0.79
Here's another way, a little more step-by-step:
from functools import reduce
d = [
    {
        "accuracy": 0.78,
        "f_measure": 0.8169374016795885,
        "precision": 0.8192088044235794,
        "recall": 0.8172222222222223
    },
    {
        "accuracy": 0.77,
        "f_measure": 0.8159133315763016,
        "precision": 0.8174754717495807,
        "recall": 0.8161111111111111
    },
    {
        "accuracy": 0.82,
        "f_measure": 0.8226353934130455,
        "precision": 0.8238175920455686,
        "recall": 0.8227777777777778
    }
]

# collect the values for each key across all the dicts
key_arrays = {}
for item in d:
    for k, v in item.items():
        key_arrays.setdefault(k, []).append(v)

# average each list of values
ave = {k: reduce(lambda x, y: x + y, v) / len(v) for k, v in key_arrays.items()}
print(ave)
# {'accuracy': 0.79, 'recall': 0.8187037037037038,
# 'f_measure': 0.8184953755563118, 'precision': 0.820167289406243}
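Note that the reduce(lambda x, y: x + y, v) above is just a spelled-out sum. Assuming the same key_arrays as built above, the built-in gives an identical result without the functools import:
ave = {k: sum(v) / len(v) for k, v in key_arrays.items()}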
Here you go, a solution using reduce():
from functools import reduce # Python 3 compatibility
summed = reduce(
    lambda a, b: {k: a[k] + b[k] for k in a},
    list_of_dicts,
    dict.fromkeys(list_of_dicts[0], 0.0))
result = {k: v / len(list_of_dicts) for k, v in summed.items()}
This produces a starting point with 0.0 values for the keys of the first dictionary, then sums all values (per key) into a final dictionary. Those sums are then divided by the number of dictionaries to produce the averages.
Demo:
>>> from functools import reduce
>>> list_of_dicts = [
...     {
...         "accuracy": 0.78,
...         "f_measure": 0.8169374016795885,
...         "precision": 0.8192088044235794,
...         "recall": 0.8172222222222223
...     },
...     {
...         "accuracy": 0.77,
...         "f_measure": 0.8159133315763016,
...         "precision": 0.8174754717495807,
...         "recall": 0.8161111111111111
...     },
...     {
...         "accuracy": 0.82,
...         "f_measure": 0.8226353934130455,
...         "precision": 0.8238175920455686,
...         "recall": 0.8227777777777778
...     },  # ...
... ]
>>> summed = reduce(
...     lambda a, b: {k: a[k] + b[k] for k in a},
...     list_of_dicts,
...     dict.fromkeys(list_of_dicts[0], 0.0))
>>> summed
{'recall': 2.4561111111111114, 'precision': 2.4605018682187287, 'f_measure': 2.4554861266689354, 'accuracy': 2.37}
>>> {k: v / len(list_of_dicts) for k, v in summed.items()}
{'recall': 0.8187037037037038, 'precision': 0.820167289406243, 'f_measure': 0.8184953755563118, 'accuracy': 0.79}
>>> from pprint import pprint
>>> pprint(_)  # _ is the previous result in the interactive interpreter
{'accuracy': 0.79,
 'f_measure': 0.8184953755563118,
 'precision': 0.820167289406243,
 'recall': 0.8187037037037038}
You could use a Counter to do the summing elegantly:
from collections import Counter
summed = sum((Counter(d) for d in folds), Counter())
averaged = {k: v/len(folds) for k, v in summed.items()}
If you really feel like it, it can even be turned into a oneliner:
averaged = {
    k: v / len(folds)
    for k, v in sum((Counter(d) for d in folds), Counter()).items()
}
In any case, I consider either more readable than a complicated reduce(); sum() itself is an appropriately specialized version of that.
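One caveat: Counter addition keeps only positive counts, so keys whose values sum to zero or below are silently dropped. That cannot happen with metrics like the ones above, but a small illustration with made-up numbers shows the edge case:
from collections import Counter

a = Counter({"x": 1.0, "y": -2.0})
b = Counter({"x": 2.0, "y": 1.0})
print(a + b)  # Counter({'x': 3.0}) -- 'y' summed to -1.0 and was dropped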
An even simpler oneliner that doesn't require any imports:
averaged = {
    k: sum(d[k] for d in folds) / len(folds)
    for k in folds[0]
}
Interestingly, it's considerably faster (even than pandas?!), and the statistic is also easier to change. I tried replacing the manual calculation with the statistics.mean() function in Python 3.5, but that made it over 10 times slower.
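If you want to reproduce that timing comparison, here is a minimal timeit sketch (assuming folds is the list of dictionaries from the question; exact numbers will vary by machine and data size):
import timeit

setup = "from __main__ import folds; import statistics"
manual = "{k: sum(d[k] for d in folds) / len(folds) for k in folds[0]}"
with_mean = "{k: statistics.mean(d[k] for d in folds) for k in folds[0]}"

# time each averaging expression over repeated runs
print(timeit.timeit(manual, setup=setup, number=10000))
print(timeit.timeit(with_mean, setup=setup, number=10000))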
As an alternative, if you're going to be doing such calculations on data regularly, you may wish to use pandas (which will be overkill for a one-off, but will greatly simplify such tasks):
import pandas as pd
data = [
    {
        "accuracy": 0.78,
        "f_measure": 0.8169374016795885,
        "precision": 0.8192088044235794,
        "recall": 0.8172222222222223
    },
    {
        "accuracy": 0.77,
        "f_measure": 0.8159133315763016,
        "precision": 0.8174754717495807,
        "recall": 0.8161111111111111
    },
    {
        "accuracy": 0.82,
        "f_measure": 0.8226353934130455,
        "precision": 0.8238175920455686,
        "recall": 0.8227777777777778
    },  # ...
]
result = pd.DataFrame.from_records(data).mean().to_dict()
Which gives you:
{'accuracy': 0.79000000000000004,
 'f_measure': 0.8184953755563118,
 'precision': 0.82016728940624295,
 'recall': 0.81870370370370382}
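Once the records are in a DataFrame, other per-key statistics come almost for free; for example, with the same data:
df = pd.DataFrame.from_records(data)
print(df.median().to_dict())  # per-key medians
print(df.std().to_dict())     # per-key sample standard deviations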