How to use groupby on multiple columns of a pandas DataFrame in a pct_change calculation

醉梦人生 2021-01-24 13:47

I am applying a pct_change calculation to a pandas DataFrame. Everything works fine when the month column is in order, but when it is not, the calculation comes out incorrect.
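A minimal reproduction of the symptom, assuming only the product_a rows from the sample data in the answer (kept in their original, unsorted order). Without a sort, pct_change compares each row with the previous row of the same product in whatever order the rows happen to be stored, not with the previous month.

```python
import pandas as pd

# product_a rows from the answer's sample data, in the same unsorted order
data = [
    ("product_a", "1/31/2014", 53),
    ("product_a", "11/30/2013", 52),
    ("product_a", "3/31/2014", 50),
    ("product_a", "12/31/2013", 50),
    ("product_a", "2/28/2014", 52),
]
df = pd.DataFrame(data, columns=["prod_desc", "activity_month", "prod_count"])
df["activity_month"] = pd.to_datetime(df["activity_month"], format="%m/%d/%Y")

# Unsorted: each row is compared with the row stored above it in the same group
unsorted = df.groupby("prod_desc")["prod_count"].pct_change()

# Sorted chronologically first: each row is compared with the previous month
sorted_df = df.sort_values(["prod_desc", "activity_month"])
ordered = sorted_df.groupby("prod_desc")["prod_count"].pct_change()

print(unsorted.round(6).tolist())
print(ordered.round(6).tolist())
```

For example, the unsorted result for the 2/28/2014 row is 52/50 - 1 (compared with 12/31/2013), while the sorted result is 52/53 - 1 (compared with 1/31/2014).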

1 Answer
  醉梦人生
    2021-01-24 14:07

    So I think the issue is that groupby computes the percentage change between adjacent rows that share the same prod_desc, and those rows are not in date order when the operation runs. Moving the sort above the groupby fixes that. You can also drop the for loop and do the whole calculation in one line with pandas.
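To see why row order matters: within each group, pct_change should be equivalent to dividing each value by the previous row's value in that group (a per-group shift(1)) and subtracting 1. The small frame below is illustrative only; it checks that equivalence so it is clear that "previous" means "previous row as currently stored", not "previous month".

```python
import pandas as pd

# Illustrative frame: two products interleaved, rows already in date order
df = pd.DataFrame({
    "prod_desc": ["product_a", "product_b", "product_a", "product_b"],
    "prod_count": [50, 41, 53, 44],
})

# Built-in: (current / previous row of the same group) - 1
via_pct = df.groupby("prod_desc")["prod_count"].pct_change()

# Same thing spelled out with a per-group shift of one row
prev = df.groupby("prod_desc")["prod_count"].shift(1)
via_shift = df["prod_count"] / prev - 1

print(pd.DataFrame({"pct_change": via_pct, "shift_formula": via_shift}))
```

Because shift(1) only looks at the row position within each group, neither version knows anything about dates; that is why the sort has to happen first.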

    import pandas as pd

    data = [
        ('product_a', '1/31/2014', 53),
        ('product_b', '1/31/2014', 44),
        ('product_c', '1/31/2014', 36),
        ('product_a', '11/30/2013', 52),
        ('product_b', '11/30/2013', 43),
        ('product_c', '11/30/2013', 35),
        ('product_a', '3/31/2014', 50),
        ('product_b', '3/31/2014', 41),
        ('product_c', '3/31/2014', 34),
        ('product_a', '12/31/2013', 50),
        ('product_b', '12/31/2013', 41),
        ('product_c', '12/31/2013', 34),
        ('product_a', '2/28/2014', 52),
        ('product_b', '2/28/2014', 43),
        ('product_c', '2/28/2014', 35),
    ]

    product_df = pd.DataFrame(data, columns=['prod_desc', 'activity_month', 'prod_count'])

    # Parse the month strings so that sorting is chronological, not alphabetical
    product_df['activity_month'] = pd.to_datetime(product_df['activity_month'],
                                                  format='%m/%d/%Y')

    # Sort first, so the rows of each product are in date order
    product_df = product_df.sort_values(['prod_desc', 'activity_month'])

    # Each row is now compared with the previous month of the same product
    product_df['pct_ch'] = product_df.groupby('prod_desc')['prod_count'].pct_change()
    

    This should produce the output you want:

        prod_desc activity_month  prod_count    pct_ch
    3   product_a     2013-11-30          52       NaN
    9   product_a     2013-12-31          50 -0.038462
    0   product_a     2014-01-31          53  0.060000
    12  product_a     2014-02-28          52 -0.018868
    6   product_a     2014-03-31          50 -0.038462
    4   product_b     2013-11-30          43       NaN
    10  product_b     2013-12-31          41 -0.046512
    1   product_b     2014-01-31          44  0.073171
    13  product_b     2014-02-28          43 -0.022727
    7   product_b     2014-03-31          41 -0.046512
    5   product_c     2013-11-30          35       NaN
    11  product_c     2013-12-31          34 -0.028571
    2   product_c     2014-01-31          36  0.058824
    14  product_c     2014-02-28          35 -0.027778
    8   product_c     2014-03-31          34 -0.028571
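If you would rather keep the DataFrame in its original row order, one option (a sketch, not the only way) is to compute on a sorted copy and assign the result back. pandas aligns the assigned Series on index labels, so each value lands on its original row. The sketch reuses the product_a rows from the sample data and also shows why the to_datetime step matters: sorting the raw month strings is alphabetical, which puts 1/31/2014 before 11/30/2013.

```python
import pandas as pd

# product_a rows from the sample data, in their original unsorted order
data = [
    ("product_a", "1/31/2014", 53),
    ("product_a", "11/30/2013", 52),
    ("product_a", "3/31/2014", 50),
    ("product_a", "12/31/2013", 50),
    ("product_a", "2/28/2014", 52),
]
df = pd.DataFrame(data, columns=["prod_desc", "activity_month", "prod_count"])

# Sorting the raw strings is alphabetical, not chronological
print(sorted(df["activity_month"]))
# prints ['1/31/2014', '11/30/2013', '12/31/2013', '2/28/2014', '3/31/2014']

df["activity_month"] = pd.to_datetime(df["activity_month"], format="%m/%d/%Y")

# Compute on a sorted copy; the assignment aligns on index labels,
# so df itself keeps its original row order
df["pct_ch"] = (
    df.sort_values(["prod_desc", "activity_month"])
      .groupby("prod_desc")["prod_count"]
      .pct_change()
)
print(df)
```

Sorting in place, as in the answer, makes the result easy to read month by month; assigning back by index leaves any existing row order untouched, which can matter if other code depends on it.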
    
