Pandas transform() vs apply()


I don't understand why apply and transform return different dtypes when called on the same data frame. The way I explained the two functions to my

2 Answers
  • 2021-01-04 02:09

    It looks like SeriesGroupBy.transform() tries to cast the result back to the dtype of the original column, whereas DataFrameGroupBy.transform() doesn't seem to do that:

    In [139]: df.groupby('id')['cat'].transform(lambda x: (x == 1).any())
    Out[139]:
    0    1
    1    1
    2    1
    3    1
    4    1
    5    1
    6    1
    7    0
    8    0
    9    1
    Name: cat, dtype: int64
    
    #                         v       v
    In [140]: df.groupby('id')[['cat']].transform(lambda x: (x == 1).any())
    Out[140]:
         cat
    0   True
    1   True
    2   True
    3   True
    4   True
    5   True
    6   True
    7  False
    8  False
    9   True
    
    In [141]: df.dtypes
    Out[141]:
    cat    int64
    id     int64
    dtype: object
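
To check what your own pandas version does, here is a small self-contained sketch. The frame is made up for illustration (the original df was not posted in full), and `has_one` is just a hypothetical name for the lambda used above. Whether the Series path comes back as int64 or bool may depend on the pandas version, so the sketch prints both dtypes instead of assuming one, and casts with astype(bool) to get the same result either way.

```python
import pandas as pd

# Made-up data: groups 1, 2 and 4 contain a 1 in 'cat', group 3 does not.
df = pd.DataFrame({
    "id":  [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    "cat": [1, 0, 0, 0, 1, 0, 0, 0, 1, 1],
})

# Hypothetical helper name; same logic as the lambda in the answer.
def has_one(x):
    return (x == 1).any()

# Single brackets select a SeriesGroupBy, double brackets a DataFrameGroupBy.
s = df.groupby("id")["cat"].transform(has_one)
f = df.groupby("id")[["cat"]].transform(has_one)

# The two dtypes may or may not match, depending on the pandas version.
print(s.dtype, f["cat"].dtype)

# An explicit cast makes the result independent of that behaviour.
s_bool = s.astype(bool)
f_bool = f["cat"].astype(bool)
```

If the dtype matters downstream, the explicit cast is probably safer than relying on either code path.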
    
  • 2021-01-04 02:29

    Just adding another illustrative example with sum as I find it more explicit:

    import numpy as np
    import pandas as pd

    df = (
        pd.DataFrame(np.random.rand(10, 3), columns=['a', 'b', 'c'])
            .assign(a=lambda df: df.a > 0.5)  # turn column a into a boolean key
    )
    
    Out[70]: 
           a         b         c
    0  False  0.126448  0.487302
    1  False  0.615451  0.735246
    2  False  0.314604  0.585689
    3  False  0.442784  0.626908
    4  False  0.706729  0.508398
    5  False  0.847688  0.300392
    6  False  0.596089  0.414652
    7  False  0.039695  0.965996
    8   True  0.489024  0.161974
    9  False  0.928978  0.332414
    
    df.groupby('a').apply(sum)  # aggregates: one row per group
    
             a         b         c
    a                             
    False  0.0  4.618465  4.956997
    True   1.0  0.489024  0.161974
    
    
    df.groupby('a').transform(sum)  # broadcasts: same number of rows as df
    
              b         c
    0  4.618465  4.956997
    1  4.618465  4.956997
    2  4.618465  4.956997
    3  4.618465  4.956997
    4  4.618465  4.956997
    5  4.618465  4.956997
    6  4.618465  4.956997
    7  4.618465  4.956997
    8  0.489024  0.161974
    9  4.618465  4.956997
    

    However, when applied to a pd.DataFrame rather than a GroupBy object, I was not able to see any difference.
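
One difference that does seem to show up on a plain DataFrame, as far as I can tell: transform expects a result with the same index as the input, so a reducing function is rejected, while apply accepts it. A minimal sketch with a made-up frame (the variable names are only illustrative):

```python
import numpy as np
import pandas as pd

# Made-up frame: column b holds 0, 2, 4 and column c holds 1, 3, 5.
df = pd.DataFrame(np.arange(6.0).reshape(3, 2), columns=["b", "c"])

# apply accepts a reducing function and returns one value per column.
reduced = df.apply(lambda col: col.sum())

# transform rejects the same function, because the result is not
# aligned with the original rows.
try:
    df.transform(lambda col: col.sum())
    transform_rejected = False
except ValueError:
    transform_rejected = True

# For a function that keeps the shape, both give the same result.
doubled_apply = df.apply(lambda col: col * 2)
doubled_transform = df.transform(lambda col: col * 2)
```

So for shape-preserving functions the two look identical, which may be why no difference was visible.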
