Pandas DataFrame: selecting groups with minimal cardinality

太阳男子 2021-01-22 17:24

I have a problem where I need to take groups of rows from a data frame where the number of items in a group exceeds a certain number (cutoff). For those groups, I need to take s

1 Answer
  • 2021-01-22 18:09

    Use groupby/filter:

    >>> df.groupby('id').filter(lambda x: len(x) > cutoff)
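A minimal, self-contained run of the same call, assuming a small toy DataFrame (the 'val' column and the data are hypothetical, made up for illustration). filter() invokes the lambda once per group, passing that group's sub-DataFrame as x, and keeps every row of the groups for which it returns True; the original row order and index are preserved.

```python
import pandas as pd

# Hypothetical toy data: group "a" has 3 rows, "b" has 1, "c" has 2.
df = pd.DataFrame({
    "id":  ["a", "a", "a", "b", "c", "c"],
    "val": [1, 2, 3, 4, 5, 6],
})
cutoff = 1  # keep groups with strictly more than `cutoff` rows

# The lambda is called once per group; x is that group's sub-DataFrame.
# All rows of groups where it returns True are kept, in original order.
result = df.groupby("id").filter(lambda x: len(x) > cutoff)
print(result)
```

With cutoff = 1, group "b" (a single row) is dropped and the rows of "a" and "c" are returned with their original index labels.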
    

    The filter call returns only the rows of your DataFrame whose 'id' group has more than cutoff rows. It should also perform considerably better than the approach in the question. I timed filter on a DataFrame with 30,039 'id' groups and a little over 4 million observations:

    In [9]: %timeit df.groupby('id').filter(lambda x: len(x) > 12)
    1 loops, best of 3: 12.6 s per loop
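A common vectorized alternative, not part of the original answer, is to broadcast each group's size onto its rows with transform('size') and then select with a boolean mask. This avoids calling a Python lambda once per group, which is where filter() tends to spend its time when there are tens of thousands of groups; any speedup is an assumption to measure on your own data. The toy frame and the 'val' column are hypothetical.

```python
import pandas as pd

# Hypothetical toy data, same shape of problem as above.
df = pd.DataFrame({
    "id":  ["a", "a", "a", "b", "c", "c"],
    "val": [1, 2, 3, 4, 5, 6],
})
cutoff = 1

# transform("size") returns a Series aligned with df's index in which every
# row holds the number of rows in its own 'id' group.
group_sizes = df.groupby("id")["val"].transform("size")

# Boolean mask: keep rows whose group is larger than the cutoff.
mask_result = df[group_sizes > cutoff]

# For comparison, the filter() version from the answer.
filter_result = df.groupby("id").filter(lambda x: len(x) > cutoff)
print(mask_result.equals(filter_result))
```

Both approaches select the same rows in the same order; the mask version does the per-group work in pandas' compiled groupby code instead of in Python.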
    