Question
I have a dataframe that looks like this:
from  to  datetime             other
-------------------------------------
11    1   2016-11-06 22:00:00  -
11    1   2016-11-06 20:00:00  -
11    1   2016-11-06 15:45:00  -
11    12  2016-11-06 15:00:00  -
11    1   2016-11-06 12:00:00  -
11    18  2016-11-05 10:00:00  -
11    12  2016-11-05 10:00:00  -
12    1   2016-10-05 10:00:59  -
12    3   2016-09-06 10:00:34  -
I want to group by the "from" and "to" columns, sort "datetime" in descending order within each group, and then calculate the time difference between each row's time and the next one in the same group. For example, in this case I would like to end up with a dataframe like the following:
from to timediff in minutes others
11 1 120
11 1 255
11 1 225
11 1 0 (preferably subtract this date from the epoch)
11 12 300
11 12 0
11 18 0
12 1 25
12 3 0
I can't get my head around this! Is there a way to do it? Any help will be much appreciated. Thank you so much in advance!
Answer 1:
df.assign(
    timediff=df.sort_values('datetime', ascending=False)
               .groupby(['from', 'to'])
               .datetime.diff(-1)
               .dt.seconds.div(60)
               .fillna(0)
)
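For anyone who wants to try this end to end, here is a minimal sketch of the same approach on the sample data from the question. The "other" column is left out for brevity, and the names `timediff` and `out` are my own choices, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question; the "other" column is omitted.
df = pd.DataFrame({
    "from": [11, 11, 11, 11, 11, 11, 11, 12, 12],
    "to": [1, 1, 1, 12, 1, 18, 12, 1, 3],
    "datetime": pd.to_datetime([
        "2016-11-06 22:00:00",
        "2016-11-06 20:00:00",
        "2016-11-06 15:45:00",
        "2016-11-06 15:00:00",
        "2016-11-06 12:00:00",
        "2016-11-05 10:00:00",
        "2016-11-05 10:00:00",
        "2016-10-05 10:00:59",
        "2016-09-06 10:00:34",
    ]),
})

# Newest first, then the difference to the next (older) row in each group.
timediff = (
    df.sort_values("datetime", ascending=False)
      .groupby(["from", "to"])["datetime"]
      .diff(-1)
      .dt.seconds   # seconds component only; whole days are not counted
      .div(60)
      .fillna(0)    # the oldest row of each group has nothing to subtract
)

# assign aligns on the index, so the original row order is preserved.
out = df.assign(timediff=timediff)
print(out)
```

With this data the (11, 1) rows come out as 120, 255, 225 and 0 minutes, which matches the expected output in the question.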
Answer 2:
I think you need:
- groupby with apply, using sort_values with diff inside each group, then convert the Timedelta to minutes by taking seconds and floor-dividing by 60
- fillna and sort_index, then remove level 2 of the index
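# Parentheses let the method chain span several lines.
# The result goes into df1 so that the original df stays available.
df1 = (df.groupby(['from','to']).datetime
         .apply(lambda x: x.sort_values().diff().dt.seconds // 60)
         .fillna(0)
         .sort_index()
         .reset_index(level=2, drop=True)
         .reset_index(name='timediff in minutes'))
print(df1)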
   from  to  timediff in minutes
0    11   1                120.0
1    11   1                255.0
2    11   1                225.0
3    11   1                  0.0
4    11  12                300.0
5    11  12                  0.0
6    11  18                  0.0
7    12   3                  0.0
8    12   3                  0.0
To keep the original columns, join the result back onto the original df by its index:
df = df.join(df.groupby(['from','to'])
               .datetime
               .apply(lambda x: x.sort_values().diff().dt.seconds // 60)
               .fillna(0)
               .reset_index(level=[0,1], drop=True)
               .rename('timediff in minutes'))
print(df)
   from  to            datetime other  timediff in minutes
0    11   1 2016-11-06 22:00:00     -                120.0
1    11   1 2016-11-06 20:00:00     -                255.0
2    11   1 2016-11-06 15:45:00     -                225.0
3    11  12 2016-11-06 15:00:00     -                300.0
4    11   1 2016-11-06 12:00:00     -                  0.0
5    11  18 2016-11-05 10:00:00     -                  0.0
6    11  12 2016-11-05 10:00:00     -                  0.0
7    12   3 2016-10-05 10:00:59     -                  0.0
8    12   3 2016-09-06 10:00:34     -                  0.0
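One thing worth noting about `.dt.seconds` as used above: it returns only the seconds component of each Timedelta, so whole days are dropped. That is why the (11, 12) pair, which is 29 hours apart, shows 300 minutes. If the full elapsed time is wanted, `.dt.total_seconds()` is probably the better fit. A small sketch of the difference follows; the variable names are just illustrative.

```python
import pandas as pd

# The (11, 12) pair from the question: 1 day and 5 hours apart.
gap = pd.Series(
    pd.to_datetime(["2016-11-05 10:00:00", "2016-11-06 15:00:00"])
).diff()

# Seconds component only: the "1 day" part is ignored.
minutes_component = gap.dt.seconds // 60

# Full duration, days included.
minutes_total = gap.dt.total_seconds() // 60

print(minutes_component.iloc[1])  # 300.0
print(minutes_total.iloc[1])      # 1740.0
```

Which of the two is right depends on what the question intends. The expected output there shows 300, so the answers above reproduce it as asked.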
Answer 3:
Almost the same as above, but without apply:
# Differences in seconds per (from, to) group; divide by 60 for minutes.
result = df.sort_values(['from','to','datetime'])\
           .groupby(['from','to'])['datetime']\
           .diff().dt.seconds.fillna(0)
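As a rough sketch of how this could be turned into the minutes column the question asks for, the snippet below runs the same no-apply chain on a shortened subset of the sample data and attaches the result to the frame. The subset and the `seconds` variable name are assumptions made for illustration.

```python
import pandas as pd

# A shortened subset of the question's data, for illustration only.
df = pd.DataFrame({
    "from": [11, 11, 11, 11],
    "to": [1, 1, 12, 1],
    "datetime": pd.to_datetime([
        "2016-11-06 22:00:00",
        "2016-11-06 20:00:00",
        "2016-11-06 15:00:00",
        "2016-11-06 12:00:00",
    ]),
})

# Sort ascending, then take each row minus the previous row in its group.
seconds = (
    df.sort_values(["from", "to", "datetime"])
      .groupby(["from", "to"])["datetime"]
      .diff()
      .dt.seconds
      .fillna(0)  # the earliest row of each group has no predecessor
)

# Column assignment aligns on the index, so the sorted order does not leak
# back into df; rows stay where they were.
df["timediff in minutes"] = seconds // 60
print(df)
```

Because `diff` on a groupby keeps the original index labels, no `reset_index` juggling is needed here, which is the main convenience over the apply version.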
Source: https://stackoverflow.com/questions/41929772/time-difference-within-group-by-objects-in-python-pandas