Apply vs transform on a group object

别跟我提以往 2020-11-22 15:04

Consider the following dataframe:

     A      B         C         D
0  foo    one  0.162003  0.087469
1  bar    one -1.156319 -1.526272
2  foo    two  0.8338         


        
4 answers
  •  伪装坚强ぢ
    2020-11-22 15:34

    I am going to use a very simple snippet to illustrate the difference:

    import pandas as pd

    test = pd.DataFrame({'id':[1,2,3,1,2,3,1,2,3], 'price':[1,2,3,2,3,1,3,1,2]})
    grouping = test.groupby('id')['price']
    

    The DataFrame looks like this:

        id  price   
    0   1   1   
    1   2   2   
    2   3   3   
    3   1   2   
    4   2   3   
    5   3   1   
    6   1   3   
    7   2   1   
    8   3   2   
    

    There are 3 customer IDs in this table. Each customer made three transactions, paying 1, 2, and 3 dollars across them (in varying order).

    Now, I want to find the minimum payment made by each customer. There are two ways of doing it:

    1. Using apply (written here as the built-in aggregation min, which reduces each group to a single value):

      grouping.min()

    The return looks like this:

    id
    1    1
    2    1
    3    1
    Name: price, dtype: int64
    
    pandas.core.series.Series # return type
    Int64Index([1, 2, 3], dtype='int64', name='id') # The returned Series' index
    # length is 3
    
    2. Using transform:

      grouping.transform('min')

    The return looks like this:

    0    1
    1    1
    2    1
    3    1
    4    1
    5    1
    6    1
    7    1
    8    1
    Name: price, dtype: int64
    
    pandas.core.series.Series # return type
    RangeIndex(start=0, stop=9, step=1) # The returned Series' index
    # length is 9    
    

    Both methods return a Series object, but the length of the first one is 3 and the length of the second one is 9.
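
    The length difference comes from how each result is indexed. As a minimal sketch, reusing the same test data (the names per_group and per_row are only illustrative), the following checks that the aggregated result is indexed by id while the transformed result keeps the index of the original rows:

```python
import pandas as pd

test = pd.DataFrame({'id': [1, 2, 3, 1, 2, 3, 1, 2, 3],
                     'price': [1, 2, 3, 2, 3, 1, 3, 1, 2]})
grouping = test.groupby('id')['price']

# Aggregation: one value per group, indexed by the group key (id).
per_group = grouping.min()

# Transform: one value per original row, indexed like the original frame.
per_row = grouping.transform('min')

print(len(per_group), per_group.index.tolist())        # 3 [1, 2, 3]
print(len(per_row), per_row.index.equals(test.index))  # 9 True
```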

    If you want to answer "What is the minimum price paid by each customer?", then the apply method is the more suitable choice.

    If you want to answer "What is the difference between the amount paid in each transaction and that customer's minimum payment?", then you want to use transform, because:

    test['minimum'] = grouping.transform('min') # creates an extra column filled with each customer's minimum payment
    test.price - test.minimum # returns the difference for each row
    

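    Apply does not work here because it returns a Series of length 3 indexed by id, while the original df has 9 rows with its own row index. The result cannot be assigned directly as a new column; it would need an extra step, such as a merge or a map on id, to line it up with the original rows.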
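
    For comparison, here is a hedged sketch of what bringing the aggregated result back onto the original rows could look like, assuming the same test data. The variable names (minima, via_map, via_transform, diff) are illustrative and not part of the original answer. Mapping the id column through the per-group minima gives the same values that transform produces in one step:

```python
import pandas as pd

test = pd.DataFrame({'id': [1, 2, 3, 1, 2, 3, 1, 2, 3],
                     'price': [1, 2, 3, 2, 3, 1, 3, 1, 2]})
grouping = test.groupby('id')['price']

# Aggregated result: length 3, indexed by id.
minima = grouping.min()

# Extra step: look up each row's id in the aggregated Series.
via_map = test['id'].map(minima)

# transform does the aggregation and the broadcast in a single call.
via_transform = grouping.transform('min')

# Both routes give one minimum per original row.
print((via_map == via_transform).all())  # True

# Difference between each payment and that customer's minimum.
diff = test['price'] - via_transform
print(diff.tolist())  # [0, 1, 2, 1, 2, 0, 2, 0, 1]
```

    The map route works, but transform is the more direct choice whenever the result has to line up with the original rows.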
