Python Groupby with Boolean Mask

Anonymous (unverified), submitted 2019-12-03 01:41:02

Question:

I have a pandas dataframe with the following general format:

id,atr1,atr2,orig_date,fix_date
1,bolt,l,2000-01-01,nan
1,screw,l,2000-01-01,nan
1,stem,l,2000-01-01,nan
2,stem,l,2000-01-01,nan
2,screw,l,2000-01-01,nan
2,stem,l,2001-01-01,2001-01-01
3,bolt,r,2000-01-01,nan
3,stem,r,2000-01-01,nan
3,bolt,r,2001-01-01,2001-01-01
3,stem,r,2001-01-01,2001-01-01

The desired result would be the following:

id,atr1,atr2,orig_date,fix_date,failed_part_ind
1,bolt,l,2000-01-01,nan,0
1,screw,l,2000-01-01,nan,0
1,stem,l,2000-01-01,nan,0
2,stem,l,2000-01-01,nan,1
2,screw,l,2000-01-01,nan,0
2,stem,l,2001-01-01,2001-01-01,0
3,bolt,r,2000-01-01,nan,1
3,stem,r,2000-01-01,nan,1
3,bolt,r,2001-01-01,2001-01-01,0
3,stem,r,2001-01-01,2001-01-01,0

Any tips or tricks most welcome!

Update 2:

A better way to describe what I need: within a .groupby(['id','atr1','atr2']), I want to create a new indicator column that flags the records in each group that meet the following criterion:

(df['orig_date'] < df['fix_date']) 

Answer 1:

I think this should work:

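# Note: the first three comparisons check each row against itself,
# so they are always true; only the date comparison has any effect.
df['failed_part_ind'] = df.apply(
    lambda row: 1 if ((row['id'] == row['id']) &
                      (row['atr1'] == row['atr1']) &
                      (row['atr2'] == row['atr2']) &
                      (row['orig_date'] < row['fix_date']))
                else 0,
    axis=1)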
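To see why a row-wise check like the one above does not seem to be enough, here is a small sketch that rebuilds the sample data from the question and applies only the date comparison. It assumes both date columns are parsed with pd.to_datetime (the question does not state their dtype); csv_text and row_wise are illustrative names.

```python
import io
import pandas as pd

# Sample data copied from the question; "nan" is read as a missing value.
csv_text = """id,atr1,atr2,orig_date,fix_date
1,bolt,l,2000-01-01,nan
1,screw,l,2000-01-01,nan
1,stem,l,2000-01-01,nan
2,stem,l,2000-01-01,nan
2,screw,l,2000-01-01,nan
2,stem,l,2001-01-01,2001-01-01
3,bolt,r,2000-01-01,nan
3,stem,r,2000-01-01,nan
3,bolt,r,2001-01-01,2001-01-01
3,stem,r,2001-01-01,2001-01-01
"""
df = pd.read_csv(io.StringIO(csv_text))

# Assumption: dates are compared as datetimes, with missing fixes as NaT.
df['orig_date'] = pd.to_datetime(df['orig_date'])
df['fix_date'] = pd.to_datetime(df['fix_date'])

# Each row is compared only with its own fix_date.
# Comparisons against NaT are False, and the fixed rows have equal dates.
row_wise = (df['orig_date'] < df['fix_date']).astype(int)
print(row_wise.tolist())
```

On this data every row comes out as 0, because the rows that should be flagged have no fix_date of their own. The fix date has to come from another record in the same group.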

Update: I think this is what you want:

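import pandas as pd

def f(g):
    # Earliest fix date in the group; missing if nothing in the group was fixed
    min_fix_date = g['fix_date'].min()
    # pd.isna handles both NaN and NaT (np.isnan raises on dates and strings)
    if pd.isna(min_fix_date):
        g['failed_part_ind'] = 0
    else:
        # Flag records whose original date precedes the earliest fix date
        g['failed_part_ind'] = g['orig_date'].apply(lambda d: 1 if d < min_fix_date else 0)
    return g

result = df.groupby(['id', 'atr1', 'atr2'], group_keys=False).apply(f)
# Assign back by index so the new column lands on the original rows
df['failed_part_ind'] = result['failed_part_ind']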
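As a possible alternative to the groupby/apply approach above, the same indicator can probably be computed without a Python-level function by broadcasting each group's earliest fix_date back to its rows with transform('min'). This is a sketch under the assumption that both date columns are datetimes; csv_text, keys and min_fix are illustrative names, not part of the original answer.

```python
import io
import pandas as pd

# Sample data copied from the question; "nan" is read as a missing value.
csv_text = """id,atr1,atr2,orig_date,fix_date
1,bolt,l,2000-01-01,nan
1,screw,l,2000-01-01,nan
1,stem,l,2000-01-01,nan
2,stem,l,2000-01-01,nan
2,screw,l,2000-01-01,nan
2,stem,l,2001-01-01,2001-01-01
3,bolt,r,2000-01-01,nan
3,stem,r,2000-01-01,nan
3,bolt,r,2001-01-01,2001-01-01
3,stem,r,2001-01-01,2001-01-01
"""
df = pd.read_csv(io.StringIO(csv_text))

# Assumption: dates are compared as datetimes, with missing fixes as NaT.
df['orig_date'] = pd.to_datetime(df['orig_date'])
df['fix_date'] = pd.to_datetime(df['fix_date'])

keys = ['id', 'atr1', 'atr2']

# Earliest fix date per group, repeated on every row of that group.
# Groups with no fix at all get NaT.
min_fix = df.groupby(keys)['fix_date'].transform('min')

# A record is flagged when it predates the group's earliest fix.
# Comparisons against NaT evaluate to False, so unfixed groups stay 0.
df['failed_part_ind'] = (df['orig_date'] < min_fix).astype(int)

print(df['failed_part_ind'].tolist())
```

On the sample data this reproduces the failed_part_ind column from the desired result in the question.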

