Pandas DataFrame: selecting groups with minimal cardinality


I have a problem where I need to take groups of rows from a data frame where the number of items in a group exceeds a certain number (cutoff). For those groups, I need to take s


1 Answer

醉酒成梦

2021-01-22 18:09
Use groupby/filter:
```
>>> df.groupby('id').filter(lambda x: len(x) > cutoff)
```
This returns only the rows of your dataframe that belong to groups with more than `cutoff` rows. It should also perform quite a bit better. I timed `filter` on a dataframe with 30,039 'id' groups and a little over 4 million observations:
```
In [9]: %timeit df.groupby('id').filter(lambda x: len(x) > 12)
1 loops, best of 3: 12.6 s per loop
```
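For a concrete picture of what the filter keeps, here is a minimal sketch on made-up data. The `value` column, the sample rows, and the `cutoff` of 2 are assumptions for illustration only; just the `id` column name comes from the snippet above. It also shows a commonly used alternative, `transform('size')` with a boolean mask, which avoids calling a Python lambda once per group and may be faster on large frames. That alternative was not timed here.

```python
import pandas as pd

# Hypothetical toy data: only the 'id' column name comes from the answer.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3],
    "value": [10, 11, 12, 20, 21, 30],
})
cutoff = 2  # assumed threshold for this example

# Approach from the answer: keep rows whose group has more than `cutoff` rows.
filtered = df.groupby("id").filter(lambda g: len(g) > cutoff)

# Alternative sketch: broadcast each group's size back onto its rows,
# then select rows with a boolean mask.
group_sizes = df.groupby("id")["id"].transform("size")
masked = df[group_sizes > cutoff]

print(filtered)
print(masked)
```

Both selections keep the original row order and index, so on this data they pick the same rows (the three rows with `id == 1`).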