Group by multiple keys and summarize/average values of a list of dictionaries

What is the most Pythonic way to group a list of dictionaries by multiple keys and then sum or average the values in Python? Say I have a list of dictionaries as below:

7 answers

栀梦

2020-11-30 01:16

Inspired by Eelco Hoogendoorn's answer, here is another way to solve this using the pandas package. I find the code more readable.

import pandas as pd

def sum_by_cusip_and_dept(data):
    df = pd.DataFrame(data)
    # Group on both keys and sum only the 'qty' column
    totals = df.groupby(['sku', 'dept'], as_index=False)['qty'].sum()
    # One dict per group: {'sku': ..., 'dept': ..., 'qty': ...}
    return totals.to_dict('records')

广开言路

2020-11-30 01:17

You can store the running total of the quantities and the number of occurrences for each key pair in a single defaultdict:

from collections import defaultdict

counts = defaultdict(lambda: [0, 0])
for line in input_data:
    entry = counts[(line['dept'], line['sku'])]
    entry[0] += line['qty']
    entry[1] += 1

All that remains is to turn these numbers into lists of dicts:

sums_dict = [{'dept': k[0], 'sku': k[1], 'qty': v[0]}
             for k, v in counts.items()]
avg_dict = [{'dept': k[0], 'sku': k[1], 'avg': float(v[0]) / v[1]}
            for k, v in counts.items()]

The results for the sums:

sums_dict

[{'dept': '002', 'qty': 600, 'sku': 'qux'},
 {'dept': '001', 'qty': 400, 'sku': 'foo'},
 {'dept': '003', 'qty': 700, 'sku': 'foo'},
 {'dept': '002', 'qty': 900, 'sku': 'baz'},
 {'dept': '001', 'qty': 200, 'sku': 'bar'}]

and for the averages:

avg_dict

[{'avg': 600.0, 'dept': '002', 'sku': 'qux'},
 {'avg': 200.0, 'dept': '001', 'sku': 'foo'},
 {'avg': 700.0, 'dept': '003', 'sku': 'foo'},
 {'avg': 450.0, 'dept': '002', 'sku': 'baz'},
 {'avg': 200.0, 'dept': '001', 'sku': 'bar'}]

An alternative version without defaultdict, using dict.setdefault:

counts = {}
for line in input_data:
    entry = counts.setdefault((line['dept'], line['sku']), [0, 0])
    entry[0] += line['qty']
    entry[1] += 1

The rest is the same:

sums_dict = [{'dept': k[0], 'sku': k[1], 'qty': v[0]}
             for k, v in counts.items()]
avg_dict = [{'dept': k[0], 'sku': k[1], 'avg': float(v[0]) / v[1]}
            for k, v in counts.items()]
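
The two steps above (accumulate, then reshape) can be folded into one helper that returns the total and the average together. This is only a sketch: the function name `summarize`, its parameters, and the sample rows are made up for illustration and are not part of the original question.

```python
from collections import defaultdict


def summarize(rows, keys=("dept", "sku"), value="qty"):
    """Return one dict per key combination with the total and the mean."""
    # Each bucket holds [running total, number of rows seen]
    buckets = defaultdict(lambda: [0, 0])
    for row in rows:
        bucket = buckets[tuple(row[k] for k in keys)]
        bucket[0] += row[value]
        bucket[1] += 1
    # Rebuild a flat dict from the key tuple plus the two aggregates
    return [
        {**dict(zip(keys, key)), value: total, "avg": total / count}
        for key, (total, count) in buckets.items()
    ]


# Hypothetical sample rows, made up for illustration
sample = [
    {"dept": "001", "sku": "foo", "qty": 100},
    {"dept": "001", "sku": "foo", "qty": 300},
    {"dept": "002", "sku": "baz", "qty": 400},
]
summary = summarize(sample)
print(summary)
```

Because the key is built with `tuple(...)`, the same helper should also work for one key or for three, at the cost of one small tuple per row.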


天命终不由人

2020-11-30 01:19
@thefourtheye If we group by only one key, itemgetter returns the bare value instead of a tuple, so we should check the type of the key after grouping and wrap it in a list if it is not a tuple.
```
for key, grp in groupby(sorted(input_data, key = grouper), grouper):
  if not isinstance(key, tuple):
    key = [key]
```
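
To see why the check matters, here is a small sketch that wraps the loop in a function taking any number of keys. The name `group_sums` and the sample rows are hypothetical. Without the `isinstance` check, `zip(["dept"], "001")` would pair the key name with the first character of the string.

```python
from itertools import groupby
from operator import itemgetter


def group_sums(rows, keys, value="qty"):
    grouper = itemgetter(*keys)
    result = []
    for key, grp in groupby(sorted(rows, key=grouper), grouper):
        # With a single key, itemgetter yields the bare value, not a tuple
        if not isinstance(key, tuple):
            key = (key,)
        entry = dict(zip(keys, key))
        entry[value] = sum(item[value] for item in grp)
        result.append(entry)
    return result


# Hypothetical sample rows, made up for illustration
rows = [
    {"dept": "001", "sku": "foo", "qty": 100},
    {"dept": "001", "sku": "bar", "qty": 200},
    {"dept": "002", "sku": "baz", "qty": 400},
]
by_dept = group_sums(rows, ["dept"])
by_both = group_sums(rows, ["dept", "sku"])
print(by_dept)
print(by_both)
```

One caveat of this approach: if a single key's value is itself a tuple, the `isinstance` test cannot tell the two cases apart, and checking `len(keys) == 1` would be the safer condition.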

天命终不由人

2020-11-30 01:29

To get the aggregated results:

from itertools import groupby
from operator import itemgetter

grouper = itemgetter("dept", "sku")
result = []
for key, grp in groupby(sorted(input_data, key = grouper), grouper):
    temp_dict = dict(zip(["dept", "sku"], key))
    temp_dict["qty"] = sum(item["qty"] for item in grp)
    result.append(temp_dict)

from pprint import pprint
pprint(result)

Output

[{'dept': '001', 'qty': 200, 'sku': 'bar'},
 {'dept': '001', 'qty': 400, 'sku': 'foo'},
 {'dept': '002', 'qty': 900, 'sku': 'baz'},
 {'dept': '002', 'qty': 600, 'sku': 'qux'},
 {'dept': '003', 'qty': 700, 'sku': 'foo'}]

To get the averages instead, change the body of the for loop like this:

temp_dict = dict(zip(["dept", "sku"], key))
temp_list = [item["qty"] for item in grp]
temp_dict["avg"] = sum(temp_list) / len(temp_list)
result.append(temp_dict)

Output

[{'avg': 200, 'dept': '001', 'sku': 'bar'},
 {'avg': 200, 'dept': '001', 'sku': 'foo'},
 {'avg': 450, 'dept': '002', 'sku': 'baz'},
 {'avg': 600, 'dept': '002', 'sku': 'qux'},
 {'avg': 700, 'dept': '003', 'sku': 'foo'}]

Suggestion: I would put both the qty and the avg in the same dict, like this:

temp_dict = dict(zip(["dept", "sku"], key))
temp_list = [item["qty"] for item in grp]
temp_dict["qty"] = sum(temp_list)
temp_dict["avg"] = temp_dict["qty"] / len(temp_list)
result.append(temp_dict)

Output

[{'avg': 200, 'dept': '001', 'qty': 200, 'sku': 'bar'},
 {'avg': 200, 'dept': '001', 'qty': 400, 'sku': 'foo'},
 {'avg': 450, 'dept': '002', 'qty': 900, 'sku': 'baz'},
 {'avg': 600, 'dept': '002', 'qty': 600, 'sku': 'qux'},
 {'avg': 700, 'dept': '003', 'qty': 700, 'sku': 'foo'}]
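
One detail the snippets above rely on: `itertools.groupby` only merges adjacent items with equal keys, which is why the input is sorted with the same `grouper` first. A minimal sketch of the difference, using hypothetical rows and a hypothetical helper `totals`:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows: the two 'foo' rows are not adjacent
rows = [
    {"dept": "001", "sku": "foo", "qty": 100},
    {"dept": "001", "sku": "bar", "qty": 200},
    {"dept": "001", "sku": "foo", "qty": 300},
]
grouper = itemgetter("dept", "sku")


def totals(iterable):
    # Sum 'qty' for each run of consecutive rows sharing the same key
    return [(key, sum(r["qty"] for r in grp))
            for key, grp in groupby(iterable, grouper)]


unsorted_totals = totals(rows)                       # 'foo' appears twice
sorted_totals = totals(sorted(rows, key=grouper))    # 'foo' merged once
print(unsorted_totals)
print(sorted_totals)
```

The sort makes this approach O(n log n), whereas a dictionary-based accumulation makes a single pass over the data.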


被撕碎了的回忆

2020-11-30 01:32

As always, there are lots of valid solutions. I like the defaultdict one, since I find it easier to understand.

from collections import defaultdict

# Nested mapping: transId -> sku -> dept -> qty
food = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
for dct in input_data:
    food[dct['transId']][dct['sku']][dct['dept']] = dct['qty']
# One (transId, sku, total qty) tuple per pair, summed over all depts
output_tupl = [(d1, d2, sum(food[d1][d2].values()))
               for d1 in food for d2 in food[d1]]


春和景丽

2020-11-30 01:35
Using the numpy EP you can find here, you could write:
```
# Turn the list of row dicts into a dict of columns
inputs = {k: [i[k] for i in input] for k in input[0]}
print(group_by((inputs['dept'], inputs['sku'])).mean(inputs['qty']))
```
However, you may want to consider using the pandas package if you are doing a lot of relational operations of this kind.