Question
I'm trying to select the first row of each group of a data frame.
import pandas as pd
import numpy as np
x = [{'id':"a",'val':np.nan, 'val2':-1},{'id':"a",'val':'TREE','val2':15}]
df = pd.DataFrame(x)
#   id   val  val2
# 0  a   NaN    -1
# 1  a  TREE    15
When I try to do this with groupby, I get:
df.groupby('id', as_index=False).first()
#   id   val  val2
# 0  a  TREE    -1
The row returned to me is nowhere in the original data frame. Do I need to do something special with NaN values in columns other than the groupby columns?
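To make the mismatch concrete, here is a minimal sketch that compares the row returned by first() against every row of the original frame. The names returned, original_rows and row_exists are only illustrative and are not part of the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    {'id': 'a', 'val': np.nan, 'val2': -1},
    {'id': 'a', 'val': 'TREE', 'val2': 15},
])

result = df.groupby('id', as_index=False).first()

# Turn the single returned row and the original rows into plain dicts
returned = result.iloc[0].to_dict()
original_rows = df.to_dict('records')

# True only if the returned row matches some original row exactly
row_exists = returned in original_rows
print(returned, row_exists)
```

Here row_exists comes out False, because 'TREE' and -1 sit on different rows of df.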
Answer 1:
I found the following on the pandas GitHub site, and it appears to be a workaround. It uses the nth() method instead of first():
df.groupby('id', as_index=False).nth(0,dropna=False)
I didn't dig into it much. It seems odd that first() would actually use the val from a different row, but I haven't found the documentation on first() to check whether this is by design.
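For what it's worth, the result above is consistent with first() picking the first non-null value in each column independently, rather than returning a whole row. The sketch below checks that reading on the example data and shows two other ways to get the literal first row of each group, head(1) and drop_duplicates(). These alternatives are my suggestion, not something taken from the GitHub thread, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    {'id': 'a', 'val': np.nan, 'val2': -1},
    {'id': 'a', 'val': 'TREE', 'val2': 15},
])

# First non-null value of each column, computed by hand for the single group
per_column = {col: df[col].dropna().iloc[0] for col in ['val', 'val2']}

# What groupby().first() returns for the same data
first_result = df.groupby('id', as_index=False).first()

# head(1) keeps the first row of each group as is, NaN included
head_result = df.groupby('id').head(1)

# drop_duplicates keeps the first occurrence of each id by default
dedup_result = df.drop_duplicates('id')

print(per_column)
print(first_result)
print(head_result)
print(dedup_result)
```

Both head(1) and drop_duplicates('id') keep the original index and leave the NaN in val untouched, so the returned row is one that actually exists in df.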
Source: https://stackoverflow.com/questions/26108181/selecting-first-row-with-groupby-and-nan-columns