Find days since last event pandas dataframe

灰色年华 2021-01-01 02:39

I have a pandas data frame:

df12 = pd.DataFrame({'group_ids':[1,1,1,2,2,2],'dates':['2016-04-01','2016-04-20','2016-04-28','2016-04-05','2016-04


        
1 Answer
  • 2021-01-01 03:26

    As I mentioned earlier, this will get you the non-cumulative difference between dates within each group:

    # 'dates' must already be datetime64, e.g. df['dates'] = pd.to_datetime(df['dates'])
    df['days_since_last_event'] = df.groupby('group_ids')['dates'].diff().dt.days
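
A minimal, self-contained sketch of that line, for anyone who wants to run it. The sample frame is rebuilt from the result table at the end of this answer (the question's own definition is cut off), so the exact values are an assumption. The dates column is converted with pd.to_datetime first, because diff() needs datetime values to produce day counts.

```python
import pandas as pd

# Sample data reconstructed from the result table in this answer (an assumption).
df = pd.DataFrame({
    'group_ids': [1, 1, 1, 2, 2, 2],
    'dates': ['2016-04-01', '2016-04-20', '2016-04-28',
              '2016-04-05', '2016-04-20', '2016-04-29'],
    'event_today_in_group': [1, 0, 1, 1, 1, 0],
})
df['dates'] = pd.to_datetime(df['dates'])

# Gap in days to the previous row of the same group; the first row of each group is NaN.
df['days_since_last_event'] = df.groupby('group_ids')['dates'].diff().dt.days
print(df['days_since_last_event'].tolist())
# [nan, 19.0, 8.0, nan, 15.0, 9.0]
```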
    

    To get a cumulative sum of this difference that restarts after each row where event_today_in_group is 1, I propose using shift to get the previous row's value and then taking a cumulative sum, like so:

    df['event_today_in_group'].shift().cumsum()
    

    Output:

    0    NaN
    1    1.0
    2    1.0
    3    2.0
    4    3.0
    5    4.0
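
To see why these values work as a grouping key, it may help to look at the two steps side by side. This is only a sketch on the flag column by itself, with the flag values taken from the result table in this answer; the names shifted and block_id are made up for illustration.

```python
import pandas as pd

# Flag values as they appear in the result table of this answer.
events = pd.Series([1, 0, 1, 1, 1, 0], name='event_today_in_group')

shifted = events.shift()      # previous row's flag; the first row becomes NaN
block_id = shifted.cumsum()   # goes up by 1 on the row right after each event

print(pd.DataFrame({'event': events, 'shifted': shifted, 'block_id': block_id}))
```

Each event row therefore closes the block it sits in, and the next row starts a new one. Note that the shift here runs over the whole frame rather than per group, so the first row of group 2 picks up the flag from the last row of group 1; that only stays harmless if group_ids is also part of the groupby.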
    

    This gives us the second grouping value we need to get the cumulative sums. You could assign the above values to a new column, but if you're only using them for the calculation, then you can simply include them in the subsequent groupby operation like so:

    df.loc[:, 'days_since_last_event'] = df.groupby(['group_ids', df['event_today_in_group'].shift().cumsum()])['days_since_last_event'].cumsum()
    

    Result:

       group_ids      dates  event_today_in_group  days_since_last_event
    0          1 2016-04-01                     1                    NaN
    1          1 2016-04-20                     0                   19.0
    2          1 2016-04-28                     1                   27.0
    3          2 2016-04-05                     1                    NaN
    4          2 2016-04-20                     1                   15.0
    5          2 2016-04-29                     0                    9.0
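
For reference, the whole approach can be sketched end to end as below. The helper name add_days_since_last_event is hypothetical, the sample data is reconstructed from the table above rather than taken from the question, and the sketch assumes the rows are already sorted by group and date.

```python
import pandas as pd


def add_days_since_last_event(df):
    """Hypothetical helper: per-group running day count that restarts after each event row."""
    out = df.copy()
    out['dates'] = pd.to_datetime(out['dates'])
    # Step 1: gap in days to the previous row within each group.
    gaps = out.groupby('group_ids')['dates'].diff().dt.days
    # Step 2: block id that increases on the row after each event.
    block_id = out['event_today_in_group'].shift().cumsum()
    # Step 3: running sum of the gaps within each (group, block) pair.
    out['days_since_last_event'] = gaps.groupby([out['group_ids'], block_id]).cumsum()
    return out


# Sample data reconstructed from the result table (an assumption).
sample = pd.DataFrame({
    'group_ids': [1, 1, 1, 2, 2, 2],
    'dates': ['2016-04-01', '2016-04-20', '2016-04-28',
              '2016-04-05', '2016-04-20', '2016-04-29'],
    'event_today_in_group': [1, 0, 1, 1, 1, 0],
})
result = add_days_since_last_event(sample)
print(result)
```

Working on a copy keeps the caller's frame unchanged, and keeping the gaps in a separate Series avoids overwriting the non-cumulative differences before the cumulative sum is taken.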
    