Question
I have a multi-level (MultiIndex) DataFrame that looks like this:
                      date_time  name  note  value
list index
1    0      2015-05-22 05:37:59   Tom   129  False
     1      2015-05-22 05:38:59   Tom     0   True
     2      2015-05-22 05:39:59   Tom     0  False
     3      2015-05-22 05:40:59   Tom    45   True
2    4      2015-05-22 05:37:59  Kate   129   True
     5      2015-05-22 05:41:59  Kate     0  False
     5      2015-05-22 05:37:59  Kate     0   True
I want to iterate over the list level and, for the first row of each list group, check the value column. If it is False, that row should be deleted. So the final goal is to delete every first row of a list group that has False in value.
I use this code, which seems logical to me:
def delete_first_false():
    for list, new_df in df.groupby(level=0):
        for index, row in new_df.iterrows():
            new_df=new_df.groupby('name').first().loc([new_df['value']!='False'])
        return new_df
    return df
but I get this error:
AttributeError: '_LocIndexer' object has no attribute 'groupby'
Could you explain what's wrong with my method?
Answer 1:
Your general approach, using loops, rarely works the way you want in pandas.
If you have a groupby object, you should use the apply, agg, filter or transform methods. In your case apply is appropriate.
As for the AttributeError itself: .loc is an indexer and expects square brackets, as in .loc[...]. In the pandas version that produced this traceback, calling it with parentheses, .loc(...), hands back the indexer object rather than a DataFrame, so on the next pass of the inner loop new_df is a _LocIndexer and new_df.groupby fails.
Your main goal is the following:
"So the final goal is to delete all the first rows in (each group defined by) list that have False in (the) value (column)."
So let's write a simple function to do just that on a single, stand-alone dataframe:
def filter_firstrow_falses(df):
    # Drop the first row only when its 'value' entry is False
    if not df['value'].iloc[0]:
        return df.iloc[1:]
    else:
        return df
OK. Simple enough.
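Before handing the helper to groupby, it can be checked on its own. The sketch below repeats the function so that it runs standalone and tries it on two small frames. The frames starts_false and starts_true are hypothetical toy data, not the question's data.

```python
import pandas as pd

def filter_firstrow_falses(df):
    # Drop the first row only when its 'value' entry is False.
    if not df['value'].iloc[0]:
        return df.iloc[1:]
    return df

# Hypothetical toy frames, made up for this check.
starts_false = pd.DataFrame({'note': [129, 0, 45], 'value': [False, True, False]})
starts_true = pd.DataFrame({'note': [129, 0], 'value': [True, False]})

trimmed = filter_firstrow_falses(starts_false)
untouched = filter_firstrow_falses(starts_true)

print(trimmed)    # the leading False row is gone; the later False row stays
print(untouched)  # nothing removed, because the first row is True
```

Only the first row is ever inspected, so False values further down a group are left alone. The helper assumes a non-empty frame with a boolean value column, which holds for the groups that groupby produces here.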
Now, let's apply that to each group of your real dataframe:
import pandas
from io import StringIO

csv = StringIO("""\
list,date_time,name,note,value
1,2015-05-22 05:37:59,Tom,129,False
1,2015-05-22 05:38:59,Tom,0,True
1,2015-05-22 05:39:59,Tom,0,False
1,2015-05-22 05:40:59,Tom,45,True
2,2015-05-22 05:37:59,Kate,129,True
2,2015-05-22 05:41:59,Kate,0,False
2,2015-05-22 05:37:59,Kate,0,True
""")

df = pandas.read_csv(csv)

final = (
    df.groupby(by=['list'])             # create the groupby object
      .apply(filter_firstrow_falses)    # apply our function to each group
      .reset_index(drop=True)           # clean up the index
)
print(final)
   list            date_time  name  note  value
0     1  2015-05-22 05:38:59   Tom     0   True
1     1  2015-05-22 05:39:59   Tom     0  False
2     1  2015-05-22 05:40:59   Tom    45   True
3     2  2015-05-22 05:37:59  Kate   129   True
4     2  2015-05-22 05:41:59  Kate     0  False
5     2  2015-05-22 05:37:59  Kate     0   True
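The example above reads list as an ordinary column, whereas the question's frame carries list and index as index levels. A minimal sketch of the same idea on that shape follows, assuming the frame can be rebuilt as shown (the repeated index label 5 is kept exactly as posted). It groups by the index level with groupby(level='list') and cross-checks the result against a loop-free variant built on cumcount. The names via_apply and via_mask are illustrative only.

```python
import pandas as pd

# Rebuild the question's frame with 'list' and 'index' as index levels.
# The label 5 appears twice in the question and is reproduced as posted.
df = pd.DataFrame(
    {
        'list': [1, 1, 1, 1, 2, 2, 2],
        'index': [0, 1, 2, 3, 4, 5, 5],
        'date_time': [
            '2015-05-22 05:37:59', '2015-05-22 05:38:59',
            '2015-05-22 05:39:59', '2015-05-22 05:40:59',
            '2015-05-22 05:37:59', '2015-05-22 05:41:59',
            '2015-05-22 05:37:59',
        ],
        'name': ['Tom', 'Tom', 'Tom', 'Tom', 'Kate', 'Kate', 'Kate'],
        'note': [129, 0, 0, 45, 129, 0, 0],
        'value': [False, True, False, True, True, False, True],
    }
).set_index(['list', 'index'])

def filter_firstrow_falses(group):
    # Drop the first row only when its 'value' entry is False.
    if not group['value'].iloc[0]:
        return group.iloc[1:]
    return group

# group_keys=False keeps the original (list, index) labels instead of
# prepending the group key as an extra index level.
via_apply = df.groupby(level='list', group_keys=False).apply(filter_firstrow_falses)

# Loop-free alternative: cumcount() numbers the rows of each group from 0,
# so a count of 0 marks the first row of every group.
is_first = df.groupby(level='list').cumcount() == 0
via_mask = df[~(is_first & ~df['value'])]

print(via_apply)
print(via_mask)
```

Both variants remove only the (1, 0) row. The mask version avoids calling a Python function per group, which may matter on large frames, but it relies on value being a genuine boolean column.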
Source: https://stackoverflow.com/questions/33505339/returning-subset-of-each-group-from-a-pandas-groupby-object