Time difference within group by objects in Python Pandas

半城伤御伤魂 提交于 2020-01-12 12:14:38

问题


I have a dataframe that looks like this:

from    to         datetime              other
-------------------------------------------------
11      1     2016-11-06 22:00:00          -
11      1     2016-11-06 20:00:00          -
11      1     2016-11-06 15:45:00          -
11      12    2016-11-06 15:00:00          -
11      1     2016-11-06 12:00:00          -
11      18    2016-11-05 10:00:00          -
11      12    2016-11-05 10:00:00          -
12      1     2016-10-05 10:00:59          -
12      3     2016-09-06 10:00:34          -

I want to groupby "from" and then "to" columns and then sort the "datetime" in descending order and then finally want to calculate the time difference within these grouped by objects between the current time and the next time. For eg, in this case, I would like to have a dataframe like the following:

from    to     timediff in minutes                                          others
11      1            120
11      1            255
11      1            225
11      1            0 (preferrably subtract this date from the epoch)
11      12           300
11      12           0
11      18           0
12      1            25
12      3            0

I can't get my head around figuring this out!! Is there a way out for this? Any help will be much much appreciated!! Thank you so much in advance!


回答1:


df.assign(
    timediff=df.sort_values(
        'datetime', ascending=False
    ).groupby(['from', 'to']).datetime.diff(-1).dt.seconds.div(60).fillna(0))




回答2:


I think you need:

groupby with apply sort_values with diff, convert Timedelta to minutes by seconds and floor division 60

fillna and sort_index, remove level 2 in index

df = df.groupby(['from','to']).datetime
       .apply(lambda x: x.sort_values().diff().dt.seconds // 60)
       .fillna(0)
       .sort_index()
       .reset_index(level=2, drop=True)
       .reset_index(name='timediff in minutes')

print (df)

   from  to  timediff in minutes 
0    11   1                 120.0
1    11   1                 255.0
2    11   1                 225.0
3    11   1                   0.0
4    11  12                 300.0
5    11  12                   0.0
6    11  18                   0.0
7    12   3                   0.0
8    12   3                   0.0

df = df.join(df.groupby(['from','to'])
               .datetime
               .apply(lambda x: x.sort_values().diff().dt.seconds // 60)
               .fillna(0)
               .reset_index(level=[0,1], drop=True)
               .rename('timediff in minutes'))
print (df)
   from  to            datetime other  timediff in minutes
0    11   1 2016-11-06 22:00:00     -                120.0
1    11   1 2016-11-06 20:00:00     -                255.0
2    11   1 2016-11-06 15:45:00     -                225.0
3    11  12 2016-11-06 15:00:00     -                300.0
4    11   1 2016-11-06 12:00:00     -                  0.0
5    11  18 2016-11-05 10:00:00     -                  0.0
6    11  12 2016-11-05 10:00:00     -                  0.0
7    12   3 2016-10-05 10:00:59     -                  0.0
8    12   3 2016-09-06 10:00:34     -                  0.0



回答3:


Almost as above, but without apply:

result = df.sort_values(['from','to','datetime'])\
           .groupby(['from','to'])['datetime']\
           .diff().dt.seconds.fillna(0)


来源:https://stackoverflow.com/questions/41929772/time-difference-within-group-by-objects-in-python-pandas

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!