I have a pandas dataframe with the following general format:
id,atr1,atr2,orig_date,fix_date
1,bolt,l,2000-01-01,nan
1,screw,l,2000-01-01,nan
1,stem,l,2000-01-01,nan
2,stem,l,2000-01-01,nan
2,screw,l,2000-01-01,nan
2,stem,l,2001-01-01,2001-01-01
3,bolt,r,2000-01-01,nan
3,stem,r,2000-01-01,nan
3,bolt,r,2001-01-01,2001-01-01
3,stem,r,2001-01-01,2001-01-01
The desired result would be the following, with a new failed_part_ind column:
id,atr1,atr2,orig_date,fix_date,failed_part_ind
1,bolt,l,2000-01-01,nan,0
1,screw,l,2000-01-01,nan,0
1,stem,l,2000-01-01,nan,0
2,stem,l,2000-01-01,nan,1
2,screw,l,2000-01-01,nan,0
2,stem,l,2001-01-01,2001-01-01,0
3,bolt,r,2000-01-01,nan,1
3,stem,r,2000-01-01,nan,1
3,bolt,r,2001-01-01,2001-01-01,0
3,stem,r,2001-01-01,2001-01-01,0
Any tips or tricks most welcome!
Update 2:
A better way to describe what I need: within each group of .groupby(['id','atr1','atr2']),
I want to create a new indicator column that is 1 for records meeting the following criterion against the other records in the same group, and 0 otherwise:
(df['orig_date'] < df['fix_date'])
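To make the criterion concrete, here is a minimal sketch of one possible reading of it. It assumes a record counts as failed when any fix_date in its (id, atr1, atr2) group is strictly later than the record's own orig_date, which is the same as comparing against the group's latest fix_date. That interpretation is inferred from the expected output above, and the names csv_text and latest_fix are only illustrative.

```python
import io
import pandas as pd

# Sample data from the question; "nan" is read as a missing value by default.
csv_text = """id,atr1,atr2,orig_date,fix_date
1,bolt,l,2000-01-01,nan
1,screw,l,2000-01-01,nan
1,stem,l,2000-01-01,nan
2,stem,l,2000-01-01,nan
2,screw,l,2000-01-01,nan
2,stem,l,2001-01-01,2001-01-01
3,bolt,r,2000-01-01,nan
3,stem,r,2000-01-01,nan
3,bolt,r,2001-01-01,2001-01-01
3,stem,r,2001-01-01,2001-01-01
"""
df = pd.read_csv(io.StringIO(csv_text))

# Convert both date columns explicitly; missing values become NaT.
df["orig_date"] = pd.to_datetime(df["orig_date"])
df["fix_date"] = pd.to_datetime(df["fix_date"])

# Assumption: compare each row against the latest fix_date in its group.
# transform("max") broadcasts the group maximum back onto every row.
latest_fix = df.groupby(["id", "atr1", "atr2"])["fix_date"].transform("max")

# Comparisons against NaT are False, so groups with no fix_date get 0.
df["failed_part_ind"] = (df["orig_date"] < latest_fix).astype(int)

print(df)
```

Using transform rather than apply keeps the result aligned with the original row order, so the new column can be assigned directly.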