Selecting first row with groupby and NaN columns

Question

I'm trying to select the first row of each group of a data frame.

import pandas as pd
import numpy as np
x = [{'id':"a",'val':np.nan, 'val2':-1},{'id':"a",'val':'TREE','val2':15}]
df = pd.DataFrame(x)

#   id   val  val2
# 0  a   NaN    -1
# 1  a  TREE    15

When I try to do this with groupby, I get

df.groupby('id', as_index=False).first()
#   id   val  val2
# 0  a  TREE    -1

The row returned to me is nowhere in the original data frame. Do I need to do something special with NaN values in columns other than the groupby columns?

Answer 1:

I found the following on the pandas GitHub site; it appears to be a workaround. It uses the nth() method instead of first():

     df.groupby('id', as_index=False).nth(0, dropna=False)

I didn't dig into it much. It seems odd that first() would take val from a different row, but I haven't found the documentation for first() to check whether this is by design.
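For what it's worth, the behaviour in the question looks consistent with first() picking the first non-null value in each column independently, which is why the result can combine values from different rows. Below is a minimal sketch, using the same data as the question, that contrasts first() with two alternatives that return the first physical row of each group. The variable names (mixed, first_rows, first_rows_alt) are only illustrative, and head(1) and drop_duplicates() are swapped in here as alternatives to the nth() workaround above, not as the method the answer proposes.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame([
    {'id': 'a', 'val': np.nan, 'val2': -1},
    {'id': 'a', 'val': 'TREE', 'val2': 15},
])

# first() works column by column and skips NaN,
# so the result can mix values from different rows.
mixed = df.groupby('id', as_index=False).first()

# head(1) keeps the first physical row of each group, NaN included,
# and preserves the original index.
first_rows = df.groupby('id').head(1)

# drop_duplicates on the key column selects the same rows without a groupby.
first_rows_alt = df.drop_duplicates(subset='id', keep='first')

print(mixed)
print(first_rows)
print(first_rows_alt)
```

Both alternatives rely on the current row order, so sort the frame first if "first" should mean something other than the order of appearance. Recent pandas releases (2.2.1 and later, as far as I know) also accept a skipna argument on first(); check the documentation for your installed version before relying on it.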

Source: https://stackoverflow.com/questions/26108181/selecting-first-row-with-groupby-and-nan-columns

Tags

python

pandas

dataframe
