I wanted to do conditional counting after groupby; for example, group by values of column A, and then count within each group how often value 5 appears in column B.
Select all rows where B equals 5, and then apply groupby/size:
In [43]: df.loc[df['B']==5].groupby('A').size()
Out[43]:
A
0 1
4 2
dtype: int64
Alternatively, you could use groupby/agg with a custom function:
In [44]: df.groupby('A')['B'].agg(lambda ser: (ser==5).sum())
Out[44]:
A
0 1
4 2
Name: B, dtype: int64
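For reference, here is a minimal, self-contained sketch that runs both options side by side. The sample frame is an assumption chosen for illustration; it includes an A value (6) with no matching rows, to show that the two options differ in how they treat such groups.

```python
import pandas as pd

# Illustrative sample data; A == 6 has no row where B == 5.
df = pd.DataFrame({'A': [0, 4, 0, 4, 4, 6], 'B': [5, 10, 10, 5, 5, 10]})

# Option 1: filter first, then count the remaining rows per group.
# Groups with no matching rows disappear from the result.
by_filter = df.loc[df['B'] == 5].groupby('A').size()

# Option 2: group first, then count the matches inside each group.
# Every group is kept, so groups without matches get a count of 0.
by_agg = df.groupby('A')['B'].agg(lambda ser: (ser == 5).sum())

print(by_filter)
print(by_agg)
```

The counts agree for every group that has at least one match; only the second option reports the group with no matches.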
Note that, generally speaking, using agg with a custom function will be slower than using groupby with a builtin method such as size, so prefer the first option over the second.
In [45]: %timeit df.groupby('A')['B'].agg(lambda ser: (ser==5).sum())
1000 loops, best of 3: 927 µs per loop
In [46]: %timeit df.loc[df['B']==5].groupby('A').size()
1000 loops, best of 3: 649 µs per loop
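If you want the per-group behaviour of the agg version without a Python-level lambda, one possible variant is to build the boolean mask first and group it by column A, then call the builtin sum. This is a sketch on assumed sample data, not something timed above, so measure it on your own data before relying on it.

```python
import pandas as pd

# Illustrative sample data; A == 6 has no row where B == 5.
df = pd.DataFrame({'A': [0, 4, 0, 4, 4, 6], 'B': [5, 10, 10, 5, 5, 10]})

# Boolean Series: True where B equals 5.
mask = df['B'] == 5

# Group the mask by the values of column A and sum the True values.
# True counts as 1, so the sum is the number of matches per group.
counts = mask.groupby(df['A']).sum()

print(counts)
```

Because every row stays in the grouping, A values with no matches appear with a count of 0.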
To include A values where the size is zero, you could reindex the result:
import pandas as pd
df = pd.DataFrame({'A': [0, 4, 0, 4, 4, 6], 'B': [5, 10, 10, 5, 5, 10]})
result = df.loc[df['B'] == 5].groupby('A').size()
result = result.reindex(df['A'].unique())
yields the following, where the missing group shows up as NaN (which also upcasts the result to float64):
A
0 1.0
4 2.0
6 NaN
dtype: float64
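If you would rather see an integer 0 than NaN for those groups, reindex accepts a fill_value argument. A minimal sketch using the same sample frame:

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 4, 0, 4, 4, 6], 'B': [5, 10, 10, 5, 5, 10]})

# Count rows with B == 5 per group; A == 6 is dropped by the filter.
result = df.loc[df['B'] == 5].groupby('A').size()

# Re-add every A value, filling the missing ones with 0 instead of NaN.
# No NaN is introduced, so the counts stay integers.
result = result.reindex(df['A'].unique(), fill_value=0)

print(result)
```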