pandas: GroupBy .pipe() vs .apply()

别那么骄傲 2021-02-01 04:08

In the example from the pandas documentation about the new .pipe() method for GroupBy objects, an .apply() method accepting the same lambda would retur

1 answer
  •  再見小時候
    2021-02-01 04:58

    pipe takes a callable and passes it the very object that pipe was called on. On a groupby, that means the callable receives the GroupBy object itself.
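
As a minimal sketch of that behaviour (the helper name group_range is made up for illustration), calling g.pipe(f) should give the same result as calling f(g) by hand:

```python
import pandas as pd

df = pd.DataFrame({"A": list("XXXXYYYYYY"), "B": range(10)})
g = df.groupby("A").B

def group_range(grouped):
    # `grouped` is the whole SeriesGroupBy object, not a single group
    return grouped.max() - grouped.min()

piped = g.pipe(group_range)   # pipe hands `g` itself to group_range
direct = group_range(g)       # the same call written by hand
print(piped.equals(direct))
```

The practical benefit of pipe here is mostly readability: it keeps a chain of method calls going instead of wrapping the chain in a function call.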

    apply assumes that the calling object has subcomponents and passes each one to the callable in turn. For a groupby, the subcomponents are slices of the dataframe that called groupby, and each slice is itself a dataframe. A series groupby works analogously, with each slice being a series.
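
To make the "one slice at a time" behaviour concrete, here is a small sketch on a series groupby. The function name describe_slice is an illustrative assumption; it only reports what kind of object it was handed:

```python
import pandas as pd

df = pd.DataFrame({"A": list("XXXXYYYYYY"), "B": range(10)})

def describe_slice(s):
    # `s` is one group's slice of column B: a plain Series, not the GroupBy object
    return f"{type(s).__name__} with {len(s)} rows"

# apply calls describe_slice once per group and collects the results by group key
per_group = df.groupby("A").B.apply(describe_slice)
print(per_group)
```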

    The main difference in a groupby context is that with pipe the callable has access to the entire groupby object, whereas with apply the callable only knows about the local slice.
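
A rough way to see the difference in scope is to ask each callable for a group-level fact such as the number of groups. This is only a sketch; the variable names are made up:

```python
import pandas as pd

df = pd.DataFrame({"A": list("XXXXYYYYYY"), "B": range(10)})
g = df.groupby("A").B

# pipe: the callable gets the GroupBy object, so group-level facts are visible
n_groups = g.pipe(lambda grouped: grouped.ngroups)

# apply: the callable gets one Series slice at a time, which has no `ngroups`
slice_knows = g.apply(lambda s: hasattr(s, "ngroups"))

print(n_groups)
print(slice_knows)
```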

    Setup
    Consider df

    import pandas as pd

    # Two groups in column A: four 'X' rows and six 'Y' rows
    df = pd.DataFrame(dict(
        A=list('XXXXYYYYYY'),
        B=range(10)
    ))
    
       A  B
    0  X  0
    1  X  1
    2  X  2
    3  X  3
    4  Y  4
    5  Y  5
    6  Y  6
    7  Y  7
    8  Y  8
    9  Y  9
    

    Example 1
    Make the entire 'B' column sum to 1 while each sub-group sums to the same amount. The calculation therefore needs to know how many groups exist, which the callable passed to apply cannot find out on its own because it only ever sees one slice.

    s = df.groupby('A').B.pipe(lambda g: df.B / g.transform('sum') / g.ngroups)
    s
    
    0    0.000000
    1    0.083333
    2    0.166667
    3    0.250000
    4    0.051282
    5    0.064103
    6    0.076923
    7    0.089744
    8    0.102564
    9    0.115385
    Name: B, dtype: float64
    

    Note:

    s.sum()
    
    0.99999999999999989
    

    And:

    s.groupby(df.A).sum()
    
    A
    X    0.5
    Y    0.5
    Name: B, dtype: float64
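
For comparison, an apply-based version is possible only if the group count is worked out outside the callable and handed in. The sketch below assumes that workaround (n_groups and s_apply are illustrative names) and checks it against the pipe version:

```python
import pandas as pd

df = pd.DataFrame({"A": list("XXXXYYYYYY"), "B": range(10)})

# The group count has to be computed outside the callable and captured by it
n_groups = df.A.nunique()

s_apply = df.groupby("A", group_keys=False).B.apply(
    lambda x: x / x.sum() / n_groups   # x is one group's Series
)

# The pipe version reads the group count from the GroupBy object itself
s_pipe = df.groupby("A").B.pipe(lambda g: df.B / g.transform("sum") / g.ngroups)

print((s_apply.sort_index() - s_pipe).abs().max())
```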
    

    Example 2
    Subtract the mean of one group from the values of another. Again, this can't be done with apply because apply doesn't know about other groups.

    # Series.append was removed in pandas 2.0, so the two pieces are joined with pd.concat
    df.groupby('A').B.pipe(
        lambda g: pd.concat([
            g.get_group('X') - g.get_group('Y').mean(),
            g.get_group('Y') - g.get_group('X').mean(),
        ])
    )
    
    0   -6.5
    1   -5.5
    2   -4.5
    3   -3.5
    4    2.5
    5    3.5
    6    4.5
    7    5.5
    8    6.5
    9    7.5
    Name: B, dtype: float64
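
The same idea can be written with a named function instead of a lambda. This is a sketch under the assumption that the two group labels are passed in as extra arguments; cross_demean is a hypothetical helper, not a pandas function:

```python
import pandas as pd

df = pd.DataFrame({"A": list("XXXXYYYYYY"), "B": range(10)})

def cross_demean(grouped, first, second):
    # Hypothetical helper: subtract each group's mean from the *other* group's values
    a, b = grouped.get_group(first), grouped.get_group(second)
    return pd.concat([a - b.mean(), b - a.mean()])

# Extra positional arguments after the callable are forwarded by pipe
result = df.groupby("A").B.pipe(cross_demean, "X", "Y")
print(result)
```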
    
