问题
New to Python here and trying to see if there is a more elegant solution.
I have a time series data of telematics devices that has motion indicator. I need to expand the motion indicator to +/- 1 row of the actual motion start and stop (denoted by motion2 column below). I was doing it in SQL using case statements and lead and lag window functions. Trying to convert my codes to python...
Here is the data. import pandas as pd
data = {'device':[1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2],
'time':[1,2,3,4,5,6,7,8,9,10,11,12,5,6,7,8,9,10,11,12,13,14],
'motion':[0,0,1,1,1,0,0,0,1,1,0,0,0,0,0,1,1,1,0,1,0,0]}
df = pd.DataFrame.from_dict(data)
df = df[['device','time','motion']]
##sort data chronologically for each device
df.sort_values(['device','time'], ascending = True, inplace = True)
This is how df looks like
device, time, motion
1,1,0
1,2,0
1,3,1
1,4,1
1,5,1
1,6,0
1,7,0
1,8,0
1,9,1
1,10,1
1,11,0
1,12,0
2,5,0
2,6,0
2,7,0
2,8,1
2,9,1
2,10,1
2,11,0
2,12,1
2,13,0
2,14,0
What I need is the motion2 column below added to the data frame.
device, time, motion, motion2
1,1,0,0
1,2,0,1
1,3,1,1
1,4,1,1
1,5,1,1
1,6,0,1
1,7,0,0
1,8,0,1
1,9,1,1
1,10,1,1
1,11,0,1
1,12,0,0
2,5,0,0
2,6,0,0
2,7,0,1
2,8,1,1
2,9,1,1
2,10,1,1
2,11,0,1
2,12,1,1
2,13,0,1
2,14,0,0
Below is the python code that does works. However, wondering if there is a more elegant way.
##create new columns for prior and next motion indicator
df['prev_motion'] = df.groupby(['device'])['motion'].shift(1)
df['next_motion'] = df.groupby(['device'])['motion'].shift(-1)
##create the desired motion2 indicator to expand +/- 1 record of the motion
start and stop
df['motion2'] = df[['prev_motion', 'motion', 'next_motion']].apply(lambda
row: 1 if row['motion']==1 else (1 if row['prev_motion']==1 or
row['next_motion']==1 else 0), axis=1)
##drop unwanted columns
df.drop(columns=['prev_motion', 'next_motion'], inplace = True)
This was much easier in SQL using case statement and windows functions (lead and lag).
case
when motion = 1 then 1
when motion = 0 and (lead(motion) over (partition by device order by time) = 1) then 1
when motion = 0 and (lag(motion) over (partition by device order by time) = 1) then 1
else 0
end as motion2
回答1:
This isn't necessarily the most elegant, but it works: find any point where motion
is 1
, or where motion
shifted by 1 in either direction is 1
. Here are 2 ways to do that with numpy
functions (note that numpy
functions don't require explicitely importing numpy
, as they are also built into pandas
and can be accessed via pd.np
, but see @Abhi's comment for a pure pandas
equivalent):
df['motion2'] = pd.np.where(df.motion.values|pd.np.roll(df.motion.values,1)|pd.np.roll(df.motion.values,-1),1,0)
# The following is Essentially the equivalent, but maybe a bit clearer / more efficient
df['motion2'] = pd.np.stack((df.motion.values,pd.np.roll(df.motion.values,1),pd.np.roll(df.motion.values,-1))).any(0).astype(int)
>>> df
device time motion motion2
0 1 1 0 0
1 1 2 0 1
2 1 3 1 1
3 1 4 1 1
4 1 5 1 1
5 1 6 0 1
6 1 7 0 0
7 1 8 0 1
8 1 9 1 1
9 1 10 1 1
10 1 11 0 1
11 1 12 0 0
12 2 5 0 0
13 2 6 0 0
14 2 7 0 1
15 2 8 1 1
16 2 9 1 1
17 2 10 1 1
18 2 11 0 1
19 2 12 1 1
20 2 13 0 1
21 2 14 0 0
来源:https://stackoverflow.com/questions/52902343/python-pandas-equivalent-for-sql-case-statement-using-lead-and-lag-window-functi