Conditional counting within groups

不思量自难忘° 2021-01-21 12:53

I want to do conditional counting after a groupby: for example, group by the values of column A, and then count, within each group, how often the value 5 appears in column B.

1 answer
  • 2021-01-21 13:04

    Select all rows where B equals 5, and then apply groupby/size:

    In [43]: df.loc[df['B']==5].groupby('A').size()
    Out[43]: 
    A
    0    1
    4    2
    dtype: int64
    

    Alternatively, you could use groupby/agg with a custom function:

    In [44]: df.groupby('A')['B'].agg(lambda ser: (ser==5).sum())
    Out[44]: 
    A
    0    1
    4    2
    Name: B, dtype: int64
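
    The two transcripts above rely on a df that is not shown. As a minimal sketch, assuming the sample frame from the reindex example further down in this answer, both options can be run side by side. On this data they agree for every group that contains a 5, but they differ for A = 6, which has no match: the filter approach drops that group, while the agg approach reports it with a count of 0. The names by_filter and by_agg are only illustrative.

```python
import pandas as pd

# Sample data borrowed from the reindex example later in this answer.
df = pd.DataFrame({'A': [0, 4, 0, 4, 4, 6], 'B': [5, 10, 10, 5, 5, 10]})

# Option 1: keep only the rows where B == 5, then count rows per group.
# Groups without any matching row (here A = 6) are absent from the result.
by_filter = df.loc[df['B'] == 5].groupby('A').size()

# Option 2: group everything, then count the matches inside each group.
# Every group is present; groups without a match get a count of 0.
by_agg = df.groupby('A')['B'].agg(lambda ser: (ser == 5).sum())

print(by_filter)
print(by_agg)
```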
    

    Note that, generally speaking, agg with a custom function is slower than a built-in groupby method such as size, so prefer the first option over the second:

    In [45]: %timeit df.groupby('A')['B'].agg(lambda ser: (ser==5).sum())
    1000 loops, best of 3: 927 µs per loop
    
    In [46]: %timeit df.loc[df['B']==5].groupby('A').size()
    1000 loops, best of 3: 649 µs per loop
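
    The %timeit magic only works inside IPython. A rough way to repeat the comparison from a plain script is the standard-library timeit module, sketched below. The frame size, the repetition count n, and the helper names with_agg and with_size are assumptions chosen for illustration, and the absolute numbers will vary by machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical test frame: the small sample from this answer, repeated 1000 times.
df = pd.DataFrame({
    'A': [0, 4, 0, 4, 4, 6] * 1000,
    'B': [5, 10, 10, 5, 5, 10] * 1000,
})


def with_agg():
    # Custom Python function called once per group.
    return df.groupby('A')['B'].agg(lambda ser: (ser == 5).sum())


def with_size():
    # Boolean filter followed by the built-in size method.
    return df.loc[df['B'] == 5].groupby('A').size()


n = 50  # number of calls per measurement
t_agg = timeit.timeit(with_agg, number=n) / n
t_size = timeit.timeit(with_size, number=n) / n

print(f"agg with lambda: {t_agg * 1e6:.0f} µs per call")
print(f"filter + size:   {t_size * 1e6:.0f} µs per call")
```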
    

    To include A values where the size is zero, you could reindex the result:

    import pandas as pd
    df = pd.DataFrame({'A': [0, 4, 0, 4, 4, 6], 'B': [5, 10, 10, 5, 5, 10]})
    result = df.loc[df['B'] == 5].groupby('A').size()
    result = result.reindex(df['A'].unique())
    

    yields

    A
    0    1.0
    4    2.0
    6    NaN
    dtype: float64
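
    The NaN for A = 6 is what turns the counts into floats. If integer zeros are preferred, one option is to pass fill_value=0 to reindex. Another is to skip the filter and sum a boolean mask per group, which keeps every group without any reindexing. Both are sketched below on the same sample frame; the second variant is not part of the original answer, and the names counts, filled, and masked are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 4, 0, 4, 4, 6], 'B': [5, 10, 10, 5, 5, 10]})

# Same filter + size as above; A = 6 is missing from counts.
counts = df.loc[df['B'] == 5].groupby('A').size()

# fill_value=0 inserts 0 for labels missing from counts,
# so no NaN appears and the integer dtype is kept.
filled = counts.reindex(df['A'].unique(), fill_value=0)

# Alternative: build the True/False mask first, group it by A, and sum.
# True counts as 1, so each sum is the number of matches in that group.
masked = df['B'].eq(5).groupby(df['A']).sum()

print(filled)
print(masked)
```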
    