I don't understand why apply and transform return different dtypes when called on the same data frame. The way I explained the two functions to my …
It looks like SeriesGroupBy.transform() tries to cast the result back to the dtype of the original column, but DataFrameGroupBy.transform() doesn't seem to do that:
In [139]: df.groupby('id')['cat'].transform(lambda x: (x == 1).any())
Out[139]:
0 1
1 1
2 1
3 1
4 1
5 1
6 1
7 0
8 0
9 1
Name: cat, dtype: int64
# note the double brackets below: [['cat']] selects a DataFrame, not a Series
In [140]: df.groupby('id')[['cat']].transform(lambda x: (x == 1).any())
Out[140]:
cat
0 True
1 True
2 True
3 True
4 True
5 True
6 True
7 False
8 False
9 True
In [141]: df.dtypes
Out[141]:
cat int64
id int64
dtype: object
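If the goal is simply a boolean flag per row, one option is to stop relying on whatever dtype each path infers (which may also differ between pandas versions) and pin it down explicitly. The sketch below uses made-up data, since the original df is not shown. It casts both results with astype(bool) and, as an alternative, builds the mask first and broadcasts the named 'any' reduction with transform.

```python
import pandas as pd

# Hypothetical data: the df from the question is not shown
df = pd.DataFrame({'id': [1, 1, 2, 2, 3], 'cat': [1, 0, 0, 0, 1]})

# Series path (single brackets) and DataFrame path (double brackets)
series_flag = df.groupby('id')['cat'].transform(lambda x: (x == 1).any())
frame_flag = df.groupby('id')[['cat']].transform(lambda x: (x == 1).any())

# Whatever dtype each path inferred, an explicit cast fixes it
series_flag = series_flag.astype(bool)
frame_flag = frame_flag.astype(bool)
print(series_flag.dtype, frame_flag['cat'].dtype)  # bool bool

# Alternative without a lambda: compare first, then broadcast the
# per-group 'any' reduction back to every row
flag = df['cat'].eq(1).groupby(df['id']).transform('any')
print(flag.tolist())  # [True, True, False, False, True]
```

The second form has the advantage that the input to the reduction is already boolean, so there is nothing for transform to cast back to int64.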
Just adding another illustrative example using sum, as I find it more explicit:
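import numpy as np
import pandas as pd

# pd.np was deprecated and later removed, so import numpy directly
df = (
    pd.DataFrame(np.random.rand(10, 3), columns=['a', 'b', 'c'])
    .assign(a=lambda df: df.a > 0.5)  # turn column a into a boolean key
)
df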
Out[70]:
a b c
0 False 0.126448 0.487302
1 False 0.615451 0.735246
2 False 0.314604 0.585689
3 False 0.442784 0.626908
4 False 0.706729 0.508398
5 False 0.847688 0.300392
6 False 0.596089 0.414652
7 False 0.039695 0.965996
8 True 0.489024 0.161974
9 False 0.928978 0.332414
df.groupby('a').apply(sum)  # one row per group
a b c
a
False 0.0 4.618465 4.956997
True 1.0 0.489024 0.161974
df.groupby('a').transform(sum)  # same number of rows as df
b c
0 4.618465 4.956997
1 4.618465 4.956997
2 4.618465 4.956997
3 4.618465 4.956997
4 4.618465 4.956997
5 4.618465 4.956997
6 4.618465 4.956997
7 4.618465 4.956997
8 0.489024 0.161974
9 4.618465 4.956997
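The same contrast can be reproduced with fixed, made-up values instead of random ones, which makes the relationship easy to check by hand. This sketch uses the string 'sum' (the groupby's own sum) rather than the builtin sum passed above, and selects only b and c so the key column is left out of both results. The reduction gives one row per key; transform gives each original row its own group's total, which is the same as looking the row's key up in the reduced table.

```python
import pandas as pd

# Hypothetical fixed data standing in for the random frame above
df = pd.DataFrame({
    'a': [False, False, True, False],
    'b': [1.0, 2.0, 3.0, 4.0],
    'c': [10.0, 20.0, 30.0, 40.0],
})

# Reduction: one row per group, indexed by the group key
per_group = df.groupby('a')[['b', 'c']].sum()

# transform: keeps df's index, every row receives its group's total
broadcast = df.groupby('a')[['b', 'c']].transform('sum')

print(per_group.shape, broadcast.shape)  # (2, 2) (4, 2)
print(broadcast['b'].tolist())  # [7.0, 7.0, 3.0, 7.0]

# Equivalent lookup: map each row's key to its group total
looked_up = df['a'].map(per_group['b'])
print(looked_up.tolist() == broadcast['b'].tolist())  # True
```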
However, when applied to a pd.DataFrame rather than a GroupBy object, I was not able to see any difference.
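One place where the two do part ways on a plain DataFrame is a reducing function. A minimal sketch, assuming a small made-up numeric frame: apply is allowed to collapse each column to a scalar, while DataFrame.transform requires the result to keep the caller's index and raises ValueError otherwise. With element-wise functions both return the same frame, which may be why no difference showed up.

```python
import pandas as pd

# Hypothetical numeric frame with fixed values
df = pd.DataFrame({'b': [1.0, 2.0, 3.0], 'c': [10.0, 20.0, 30.0]})

# apply: each column may be reduced to a scalar
col_sums = df.apply(lambda col: col.sum())
print(col_sums.tolist())  # [6.0, 60.0]

# transform: a reduction changes the index, so pandas rejects it
try:
    df.transform('sum')
    transform_rejected = False
except ValueError as err:
    transform_rejected = True
    print('transform refused a reduction:', err)

# Element-wise functions give identical results with either method
doubled_apply = df.apply(lambda col: col * 2)
doubled_transform = df.transform(lambda col: col * 2)
print(doubled_apply.equals(doubled_transform))  # True
```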