How to calculate time between events in a pandas

Submitted by 冷眼眸甩不掉的悲伤 on 2020-01-22 09:58:05

Question


Original Question

I'm stuck on the following problem. I'm trying to figure out at which moments in time, and for how long, a vehicle is at the factory. I have an Excel sheet in which all events are stored; each event is either a delivery route or a maintenance event. The ultimate goal is a dataframe that lists each vehicle registration number together with its arrival time at the factory and the time spent there (including maintenance actions). For those interested: I ultimately want to be able to schedule non-critical maintenance actions on the vehicles.

An example of my dataframe would be:

  Registration RoutID       Date Dep Loc Arr Loc Dep Time Arr Time  Days
0         XC66    A58  20/May/17    Home   Loc A    10:54    21:56     0
1         XC66    A59  21/May/17   Loc A    Home    00:12    10:36     0
2         XC66   A345  21/May/17   Home    Loc B    12:41    19:16     0
3         XC66   A346  21/May/17   Loc B   Loc C    20:50    03:49     1
4         XC66   A347  22/May/17   Loc C    Home    06:10    07:40     0
5         XC66    #M1  22/May/17    Home    Home    10:51    13:00     0

I have written a script that processes the dates and times into proper arrival and departure datetime columns. Maintenance periods are the rows where both "Dep Loc" and "Arr Loc" are Home, and I single them out with the following code:

df_home = df[df["Dep Loc"].isin(["Home"])]
df_home = df_home[df_home["Arr Loc"].isin(["Home"])]

From here I can easily subtract the dates to create the duration column.
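As a minimal sketch of the steps described so far (the asker's actual script is not shown, so this is only one way it might look): the code below rebuilds the sample table, combines "Date" with each time string into full datetimes, and computes the maintenance duration. The column names `dep_dt`, `arr_dt` and `duration_min` are hypothetical, and it assumes that "Days" is the day offset of the arrival relative to "Date", which is what row 3 of the sample suggests.

```python
import pandas as pd

# Sample rows copied from the table in the question
df = pd.DataFrame({
    "Registration": ["XC66"] * 6,
    "RoutID": ["A58", "A59", "A345", "A346", "A347", "#M1"],
    "Date": ["20/May/17", "21/May/17", "21/May/17",
             "21/May/17", "22/May/17", "22/May/17"],
    "Dep Loc": ["Home", "Loc A", "Home", "Loc B", "Loc C", "Home"],
    "Arr Loc": ["Loc A", "Home", "Loc B", "Loc C", "Home", "Home"],
    "Dep Time": ["10:54", "00:12", "12:41", "20:50", "06:10", "10:51"],
    "Arr Time": ["21:56", "10:36", "19:16", "03:49", "07:40", "13:00"],
    "Days": [0, 0, 0, 1, 0, 0],
})

fmt = "%d/%b/%y %H:%M"  # e.g. "21/May/17 20:50"

# Hypothetical column names: full departure and arrival datetimes
df["dep_dt"] = pd.to_datetime(df["Date"] + " " + df["Dep Time"], format=fmt)
# Assumption: "Days" shifts the arrival to a later day (row 3 arrives after midnight)
df["arr_dt"] = (
    pd.to_datetime(df["Date"] + " " + df["Arr Time"], format=fmt)
    + pd.to_timedelta(df["Days"], unit="D")
)

# Maintenance rows depart from and arrive at Home
maint = df[(df["Dep Loc"] == "Home") & (df["Arr Loc"] == "Home")].copy()
maint["duration_min"] = (maint["arr_dt"] - maint["dep_dt"]).dt.total_seconds() / 60
```

Keeping the parsed datetimes in separate columns leaves the original strings intact, which makes it easier to check the parsing against the spreadsheet.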

So far so good. However, I'm stuck on calculating the other times, i.e. the time a vehicle spends at Home between routes. Because there may be intermediate stops, the .shift() function does not work: the number of rows to shift by is not constant.

I have searched for this, but I could only find solutions based on shift, or answers that deal with the duration of an event itself rather than the time between events.

Any guidance in the right direction would be greatly appreciated!

Regards

Attempted Solution

I have been stuck on this question for a while now, but shortly after posting this question I tried this solution:

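durations = {}
for idx, loc in enumerate(df["Arr Loc"]):
    if loc == "Home":
        # Index of the next row after idx that departs from Home, or None if there is none
        later_home = (idx2 for idx2, obj in enumerate(df["Dep Loc"]) if obj == "Home" and idx2 > idx)
        idx_next = next(later_home, None)
        if idx_next is None:
            continue  # the vehicle does not leave Home again within the data

        # Assumes "Arr Time" and "Dep Time" already hold full datetimes and df has a default 0..n-1 index
        # Time at Home = next departure from Home minus this arrival
        durations[idx] = df["Dep Time"][idx_next] - df["Arr Time"][idx]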

Here I used the next function to find the next occurrence of Home as the departure location (i.e. the moment the vehicle leaves the base). I then subtract the two datetimes to find the time difference.

It works for the small sample, but still not for the entire dataset.


Answer 1:


After filtering the relevant rows, convert "Arr Time" and "Dep Time" to timestamps using the "Date" and "Days" columns:

import pandas as pd

# Keep only rows that both depart from and arrive at Home;
# .copy() avoids SettingWithCopyWarning on the assignments below
df_home = df[(df["Dep Loc"] == "Home") & (df["Arr Loc"] == "Home")].copy()

# Combine the date with each time string and parse, e.g. "22/May/17 10:51"
fmt = "%d/%b/%y %H:%M"
df_home["Dep Time"] = pd.to_datetime(df_home["Date"] + " " + df_home["Dep Time"], format=fmt)
df_home["Arr Time"] = pd.to_datetime(df_home["Date"] + " " + df_home["Arr Time"], format=fmt)
df_home["Date"] = pd.to_datetime(df_home["Date"], format="%d/%b/%y")

# Assumption from the sample data: "Days" is the day offset of the arrival
# (row 3 departs at 20:50 and arrives at 03:49 with Days = 1)
df_home["Arr Time"] += pd.to_timedelta(df_home["Days"], unit="D")

Finally, the difference between "Arr Time" and "Dep Time" gives the duration in minutes:

df_home['diff_duration'] = (df_home['Arr Time'] - df_home['Dep Time']).dt.total_seconds() / 60
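The steps above cover only the maintenance rows. For the time a vehicle spends at Home between routes, one possible sketch (not part of the original answer) is to pair each arrival at Home with the next departure from Home to another location, per vehicle. It assumes the full datetimes already exist in hypothetical columns `dep_dt` and `arr_dt`, that the rows of each vehicle can be ordered by departure time, and that Home-to-Home maintenance rows count as part of the stay. The output names `arrived_home`, `left_home` and `stay_min` are also hypothetical.

```python
import pandas as pd

# Sample events with full datetimes already built (hypothetical dep_dt / arr_dt columns)
df = pd.DataFrame({
    "Registration": ["XC66"] * 6,
    "Dep Loc": ["Home", "Loc A", "Home", "Loc B", "Loc C", "Home"],
    "Arr Loc": ["Loc A", "Home", "Loc B", "Loc C", "Home", "Home"],
    "dep_dt": pd.to_datetime([
        "2017-05-20 10:54", "2017-05-21 00:12", "2017-05-21 12:41",
        "2017-05-21 20:50", "2017-05-22 06:10", "2017-05-22 10:51",
    ]),
    "arr_dt": pd.to_datetime([
        "2017-05-20 21:56", "2017-05-21 10:36", "2017-05-21 19:16",
        "2017-05-22 03:49", "2017-05-22 07:40", "2017-05-22 13:00",
    ]),
})

# Order the events chronologically within each vehicle
df = df.sort_values(["Registration", "dep_dt"]).reset_index(drop=True)

from_home = df["Dep Loc"] == "Home"
to_home = df["Arr Loc"] == "Home"
starts = to_home & ~from_home   # arrives Home from elsewhere: a stay begins
ends = from_home & ~to_home     # leaves Home for elsewhere: a stay ends

# Departure time on rows that end a stay, NaT everywhere else
leave = df["dep_dt"].where(ends)

# For every row, the first stay-ending departure on a later row of the same vehicle
reg = df["Registration"]
next_leave = leave.groupby(reg).shift(-1).groupby(reg).bfill()

stays = df.loc[starts, ["Registration", "arr_dt"]].rename(
    columns={"arr_dt": "arrived_home"}
)
stays["left_home"] = next_leave[starts]
stays["stay_min"] = (
    (stays["left_home"] - stays["arrived_home"]).dt.total_seconds() / 60
)
```

Because the lookup uses a backward fill instead of a fixed shift, the number of intermediate rows (extra stops or maintenance events) does not matter. A stay with no later departure in the data, such as the last arrival in the sample, is left with a missing `left_home` and `stay_min`.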


Source: https://stackoverflow.com/questions/45307933/how-to-calculate-time-between-events-in-a-pandas
