pandas GroupBy and cumulative mean of previous rows in group

太阳男子 2021-01-13 23:12

I have a dataframe which looks like this:

df = pd.DataFrame({'category':    [1, 1, 1, 2, 2, 2, 3, 3, 3, 4],
                   'order_start': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1],
                   'time':        [1, 4, 3, 6, 8, 17, 14, 12, 13, 16]})

2 answers
    别那么骄傲
    2021-01-13 23:16

    "create a new column which contains the mean of the previous times of the same category" sounds like a good use case for GroupBy.expanding (and a shift):

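    # transform() returns a result aligned with df's index; apply() may add
    # the group keys to the index in recent pandas versions
    df['mean'] = (
        df.groupby('category')['time']
          .transform(lambda x: x.shift().expanding().mean()))
    df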
       category  order_start  time  mean
    0         1            1     1   NaN
    1         1            2     4   1.0
    2         1            3     3   2.5
    3         2            1     6   NaN
    4         2            2     8   6.0
    5         2            3    17   7.0
    6         3            1    14   NaN
    7         3            2    12  14.0
    8         3            3    13  13.0
    9         4            1    16   NaN
    
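To see what the shift contributes, it can help to look at the intermediate step on its own. The sketch below is not part of the original answer; it rebuilds the example frame from the output table above (the `df` construction is an assumption based on that table) and prints the per-group shifted `time` next to the result. Each row only sees the times before it, so the first row of every group has nothing to average and gets NaN.

```python
import pandas as pd

# Example frame rebuilt from the output table above (mean column omitted)
df = pd.DataFrame({
    'category':    [1, 1, 1, 2, 2, 2, 3, 3, 3, 4],
    'order_start': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1],
    'time':        [1, 4, 3, 6, 8, 17, 14, 12, 13, 16],
})

# Step 1: within each category, shift time down one row, so each row holds
# the time of the previous row in the same group (NaN for the first row)
prev_time = df.groupby('category')['time'].shift()

# Step 2: expanding (running) mean of those previous times, restarted per group
prev_mean = prev_time.groupby(df['category']).transform(lambda s: s.expanding().mean())

print(pd.DataFrame({'category': df['category'], 'time': df['time'],
                    'prev_time': prev_time, 'mean': prev_mean}))
```

Regrouping in step 2 matters: a plain `prev_time.expanding().mean()` would keep averaging across category boundaries.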

    Another way to calculate this, without a per-group lambda, is to chain two groupby calls:

    df['mean'] = (
        df.groupby('category')['time']
          .shift()
          .groupby(df['category'])
          .expanding()
          .mean()
          .to_numpy())  # positional; relies on df being sorted by 'category'
                        # use `.values` instead of to_numpy() for pd.__version__ < 0.24
    df
       category  order_start  time  mean
    0         1            1     1   NaN
    1         1            2     4   1.0
    2         1            3     3   2.5
    3         2            1     6   NaN
    4         2            2     8   6.0
    5         2            3    17   7.0
    6         3            1    14   NaN
    7         3            2    12  14.0
    8         3            3    13  13.0
    9         4            1    16   NaN
    
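The trailing `.to_numpy()` assigns the result by position, while `groupby(...).expanding()` returns its rows ordered by category. The two line up here only because the example frame is already sorted by `category`. The sketch below (not from the original answer; the interleaved ordering is a hypothetical example) reorders the rows so categories are interleaved, and compares the positional result with a label-based one that drops the category level of the MultiIndex with `droplevel(0)`, so that assignment aligns on the original row labels.

```python
import pandas as pd

df = pd.DataFrame({
    'category':    [1, 1, 1, 2, 2, 2, 3, 3, 3, 4],
    'order_start': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1],
    'time':        [1, 4, 3, 6, 8, 17, 14, 12, 13, 16],
})

# Interleave the categories by ordering rows on order_start instead of category.
# Within each category the rows keep their relative order, so the expected
# mean for each row label is unchanged.
df = df.sort_values(['order_start', 'category'])

prev = df.groupby('category')['time'].shift()
rolled = prev.groupby(df['category']).expanding().mean()  # index: (category, row label)

# Positional: values come out in category order, not in df's current row order
positional = rolled.to_numpy()

# Label-based: drop the category level so pandas aligns on the row labels
df['mean'] = rolled.droplevel(0)

print(df.sort_index())
```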

    As for performance, which version is faster depends on the number and size of your groups, so it is worth timing both on your own data.
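If the expanding windows turn out to be a bottleneck, one option not mentioned in the answer is to compute the mean of the previous rows directly from a group-wise cumulative sum and row count: for the k-th row of a group (counting from 0), the previous mean is (running total minus current time) / k. This is a different technique from the answer's, it has not been benchmarked here, and it assumes `time` has no missing values (`expanding().mean()` skips NaN, this formula does not).

```python
import pandas as pd

df = pd.DataFrame({
    'category':    [1, 1, 1, 2, 2, 2, 3, 3, 3, 4],
    'order_start': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1],
    'time':        [1, 4, 3, 6, 8, 17, 14, 12, 13, 16],
})

# Sum of all earlier times in the group: running total minus the current row
# (assumes 'time' contains no NaN)
prev_sum = df.groupby('category')['time'].cumsum() - df['time']

# Number of earlier rows in the group (0 for the first row of each group)
prev_count = df.groupby('category').cumcount()

# Mask zero counts so the first row of each group gets NaN, as in the expanding version
df['mean'] = prev_sum / prev_count.where(prev_count > 0)

print(df)
```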
