问题
Original Question
I'm stuck on the following problem. I'm trying to figure out at which moments in time and for how long a vehicle is situated at the factory. I have an excel sheet in which all events are stored which are either delivery routes or maintenance events. The ultimate goal is to obtain a dataframe in which the vehicle registration number is given with the corresponding arrival at the factory and the time spend there(including maintenance actions). For people interested, this is because I ultimately want to be able to schedule non-critical maintenance actions on the vehicles.
An example of my dataframe would be:
Registration RoutID Date Dep Loc Arr Loc Dep Time Arr Time Days
0 XC66 A58 20/May/17 Home Loc A 10:54 21:56 0
1 XC66 A59 21/May/17 Loc A Home 00:12 10:36 0
2 XC66 A345 21/May/17 Home Loc B 12:41 19:16 0
3 XC66 A346 21/May/17 Loc B Loc C 20:50 03:49 1
4 XC66 A347 22/May/17 Loc C Home 06:10 07:40 0
5 XC66 #M1 22/May/17 Home Home 10:51 13:00 0
I have created a script in which the dates and times are all processed to create the correct datetime columns for the arrival and departure datetimes. For the maintenance periods: "Dep Loc" = Home and "Arr Loc" = Home the following code is used to single out the relevant lines:
df_home = df[df["Dep Loc"].isin(["Home"])]
df_home = df_home[df_home["Arr Loc"].isin(["Home"])]
From here I can easily subtract the dates to create the duration column.
So far so good. However, I'm stuck on using calculating the other times. This because there might be intermediate stops, so the .shift() function does not work as the amount of rows to shift by is not-constant.
I have tried to search on this matter but I could only find shift solutions, or answers that are based in the internal event times, but not on the time between events.
Any guidance in the right direction would be greatly appreciated!
Regards
Attempt of the Solution
I have been stuck on this question for a while now, but shortly after posting this question I tried this solution:
for idx, loc in enumerate(df["Arr Loc"]):
if loc == "Home":
a = ((idx2, obj) for idx2, obj in enumerate(df["Dep Loc"]) if (obj == "Home" and idx2 > idx))
idx_next = next(a)
idx_next = idx_next[0]
Arrival_times = df["Arr Time"]
Departure_times = df["Dep Time"]
Duration = Arrival_times[idx] - Departure_times[idx_next]
Here I used the next function to find the next occurrence of Home as the starting location(i.e. the time the vehicle leaves the base). Subsequently I subtract the two dates to find the proper time difference.
It works for the small data set, but not still for the entire dataset.
回答1:
After filtering the relevant data rows, convert the "Arr time" & "Dep time" to timestamps according to the "Date" & "Days" columns
df_home = df[df["Dep Loc"].isin(["Home"])]
df_home = df_home[df_home["Arr Loc"].isin(["Home"])]
df_home['Dep Time']=df_home['Date']+' '+df_home['Dep Time']
df_home['Arr Time']=df_home['Date']+' '+df_home['Arr Time']
df_home['Date']=pd.to_datetime(df_home['Date'])
df_home['Dep Time']=pd.to_datetime(df_home['Dep Time'])
df_home['Arr Time']=pd.to_datetime(df_home['Arr Time'])
df_home['Dep Time']=pd.to_datetime(df_home['Dep Time'])+pd.to_timedelta(df_home['Days'], unit='d')
Finally, difference between "Dep time" & "Arr time" would give the time duration(in minutes)
df_home['diff_duration']=(df_home['Dep Time']-df_home['Arr Time']).astype('timedelta64[m]')
来源:https://stackoverflow.com/questions/45307933/how-to-calculate-time-between-events-in-a-pandas