I have a pandas DataFrame like the following.
df = pd.DataFrame([[1.1, 1.1, 1.1, 2.6, 2.5, 3.4, 2.6, 2.6, 3.4, 3.4, 2.6, 1.1, 1.1, 3.3], list('AAABBBBABCBDDD'), [1.1,
If you use apply on the groupby, the function you pass is called once per group, and each group is passed in as a DataFrame. So you can do:
df.groupby('ID').apply(lambda t: t.iloc[1])
However, this will raise an error if any group has fewer than two rows. Excluding those groups is trickier: I'm not aware of a way to drop the result of apply for only certain groups. You could filter out the small groups first, or return a one-row NaN-filled DataFrame for them and call dropna on the result.
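The "filter out the small groups first" idea can be sketched roughly as below. The frame here is hypothetical toy data (the original DataFrame is not reproduced), and using groupby(...).filter for the pre-filtering step is one possible choice, not necessarily what the answer had in mind.

```python
import pandas as pd

# Hypothetical toy data standing in for the question's DataFrame.
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3],
    "col1": [1.1, 2.6, 3.4, 2.5, 1.1, 3.3],
})

# Keep only the rows that belong to groups with at least two rows.
big_enough = df.groupby("ID").filter(lambda grp: len(grp) >= 2)

# Every remaining group has a second row, so iloc[1] cannot fail.
# Selecting the value columns explicitly keeps the grouping column
# out of what the lambda receives.
second = big_enough.groupby("ID")[["col1"]].apply(lambda t: t.iloc[1])
print(second)
```

Group 2 has a single row, so it is removed before apply runs and never reaches t.iloc[1].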
I think the nth method is supposed to do just that:
In [10]: g = df.groupby('ID')
In [11]: g.nth(1).dropna()
Out[11]:
    col1 col2  col3     col4  col5
ID
1    1.1    D   4.7    x/y/z   200
2    3.4    B   3.8    x/u/v   404
3    1.1    A   2.5  x/y/z/n   404
5    2.6    B   4.6      x/y   500
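For reference, here is a small self-contained sketch of the nth call on hypothetical toy data (not the frame from the question). The index of the result differs between pandas versions: older releases index it by the group key, while pandas 2.0 and later keep the original row labels. The sketch therefore inspects only the selected values.

```python
import pandas as pd

# Hypothetical toy data standing in for the question's DataFrame.
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3],
    "col1": [1.1, 2.6, 3.4, 2.5, 1.1, 3.3],
})

g = df.groupby("ID")

# nth(1) selects the second row (position 1) of each group.
second = g.nth(1)

# Look only at the values, since the result's index is version-dependent.
print(second["col1"].tolist())
```

On recent pandas versions, a group with fewer than two rows (ID 2 here) contributes no row, so the trailing dropna() is probably only needed on older releases such as the one in the transcript above.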
In pandas 0.13, another way to do this is to use cumcount, where n is the 1-based position of the row you want from each group:
df[g.cumcount() == n - 1]
...which is significantly faster.
In [21]: %timeit g.nth(1).dropna()
100 loops, best of 3: 11.3 ms per loop
In [22]: %timeit df[g.cumcount() == 1]
1000 loops, best of 3: 286 µs per loop
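A minimal, self-contained sketch of the cumcount approach, again on hypothetical toy data rather than the frame from the question:

```python
import pandas as pd

# Hypothetical toy data standing in for the question's DataFrame.
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3],
    "col1": [1.1, 2.6, 3.4, 2.5, 1.1, 3.3],
})

n = 2  # 1-based position of the row wanted from each group
g = df.groupby("ID")

# cumcount numbers the rows inside each group 0, 1, 2, ...,
# so comparing with n - 1 gives a boolean mask over the original rows.
nth_rows = df[g.cumcount() == n - 1]
print(nth_rows)
```

Because the mask is built over the original rows, groups with fewer than n rows simply contribute nothing, and the result keeps the original index and all columns, including ID.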