Edit: the rookie mistake I made was storing np.nan as a string,
as pointed out by @coldspeed, @wen-ben, and @ALollz. The answers are quite good, so I don't d
Here is the difference. You need to convert the 'np.nan' entries to real NaN values; in your original df they are strings. After converting them, you will see the difference:
df=df.mask(df=='np.nan')
df.groupby('A', as_index=False).head(1) #df.groupby('A', as_index=False).nth(0)
Out[8]:
A B
0 1 NaN
3 2 8
df.groupby('A', as_index=False).first()
# first() resets the index because it may pick values from different rows within the group:
# when the first item is NaN, it skips it and takes the first non-null value
# instead of staying on the same row.
# Keeping the original row index would therefore be misleading.
Out[9]:
A B
0 1 4
1 2 8
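To make the string-versus-NaN difference concrete, here is a minimal sketch. The frame below is hypothetical data shaped like the output above, not the original df from the question, and the astype(float) step is an extra assumption added so that column B is unambiguously numeric.

```python
import pandas as pd

# Hypothetical data: the missing value was typed as the string 'np.nan'.
df = pd.DataFrame({'A': [1, 1, 1, 2, 2],
                   'B': ['np.nan', 4, 5, 8, 9]})

# The string counts as a valid (non-null) value, so first() returns it.
with_string = df.groupby('A', as_index=False).first()

# Replace the string with a real NaN, then make the column numeric.
fixed = df.mask(df == 'np.nan')
fixed['B'] = fixed['B'].astype(float)

heads = fixed.groupby('A', as_index=False).head(1)   # keeps whole rows, NaN included
firsts = fixed.groupby('A', as_index=False).first()  # skips NaN within each column

print(with_string)
print(heads)
print(firsts)
```

With the string in place, first() has nothing to skip; once it is a real NaN, head(1) still returns the row holding it while first() moves on to the next non-null value.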
The main issue is that you likely have the string 'np.nan' stored rather than a real null value. Here is how the three methods handle null values differently:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [1, 1, 2, 2, 3, 3], 'B': [None, '1', np.nan, '2', 3, 4]})
first
This returns the first non-null value within each group. Oddly enough, it does not skip None, though the kwarg dropna=True makes it do so. Because the lookup is done per column, the returned row may combine values that originally belonged to different rows:
df.groupby('A', as_index=False).first()
# A B
#0 1 None
#1 2 2
#2 3 3
df.groupby('A', as_index=False).first(dropna=True)
# A B
#0 1 1
#1 2 2
#2 3 3
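The mixing of rows is easier to see with two value columns. The sketch below uses a hypothetical frame (the column C is not part of the original example) in which no single row has both B and C filled in.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column example: each row is missing one of B or C.
df = pd.DataFrame({'A': [1, 1],
                   'B': [np.nan, 2.0],
                   'C': [3.0, np.nan]})

mixed = df.groupby('A', as_index=False).first()
# B is taken from row 1 and C from row 0,
# so the returned row never existed in df.
print(mixed)
```

This is why first() is better read as "first non-null value per column" than as "first row".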
head(n)
Returns the top n rows within a group. Values remain bound within rows. If you give it an n that is larger than the number of rows, it returns all rows in that group without complaining:
df.groupby('A', as_index=False).head(1)
# A B
#0 1 None
#2 2 NaN
#4 3 3
df.groupby('A', as_index=False).head(200)
# A B
#0 1 None
#1 1 1
#2 2 NaN
#3 2 2
#4 3 3
#5 3 4
nth
This takes the nth row, so again values remain bound within the row. .nth(0) is the same as .head(1), though they have different uses. For instance, if you need the 0th and 2nd rows, that is difficult to do with .head() but easy with .nth([0,2]). Conversely, it is far easier to write .head(10) than .nth(list(range(10))).
df.groupby('A', as_index=False).nth(0)
# A B
#0 1 None
#2 2 NaN
#4 3 3
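As a small sketch of the different uses of nth and head, the example below selects the 0th and 2nd rows of each group with a list argument. The frame is hypothetical (three rows per group, no nulls) so that only the row selection is being shown.

```python
import pandas as pd

# Hypothetical data with three rows per group, already sorted by A.
df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2],
                   'B': [10, 11, 12, 20, 21, 22]})

g = df.groupby('A', as_index=False)

picked = g.nth([0, 2])                # 0th and 2nd row of every group
top_two = g.head(2)                   # first two rows of every group
all_via_head = g.head(10)             # n exceeds the group size: all rows
all_via_nth = g.nth(list(range(10)))  # same rows, more typing

print(picked['B'].tolist())
print(top_two['B'].tolist())
```

Only the B values are compared here, since the index that nth returns has differed between pandas versions.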
nth also supports dropping rows with any null values, so unlike .head() you can use it to return the first row without any null values:
df.groupby('A', as_index=False).nth(0, dropna='any')
# A B
#A
#1 1 1
#2 2 2
#3 3 3
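If the dropna argument of nth is not available or behaves differently in your pandas version, a similar result can be sketched by dropping incomplete rows first and then taking head(1). This is an alternative technique, not what the call above does internally; it reuses the example df from earlier in this answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2, 3, 3],
                   'B': [None, '1', np.nan, '2', 3, 4]})

# Drop every row that contains a null (None and NaN both count),
# then keep the first remaining row of each group.
first_complete = df.dropna().groupby('A', as_index=False).head(1)
print(first_complete)
```

Because head(1) keeps whole rows, the values stay bound to their original row and the original index labels are preserved.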