pandas dataframe groupby and get nth row

轻奢々 2020-12-06 03:03

I have a pandas DataFrame like the following.

df = pd.DataFrame([[1.1, 1.1, 1.1, 2.6, 2.5, 3.4,2.6,2.6,3.4,3.4,2.6,1.1,1.1,3.3], list('AAABBBBABCBDDD'), [1.1,


        
2 Answers
  • 2020-12-06 03:30

    If you use apply on the groupby, the function you pass is called once per group, and each group is passed in as a DataFrame. So you can take the second row of each group with:

    df.groupby('ID').apply(lambda t: t.iloc[1])
    

    However, this raises an IndexError for any group that has fewer than two rows. Excluding those groups is trickier, since I'm not aware of a way to make apply skip its result for only certain groups. You could filter out the small groups first, or return a one-row NaN-filled DataFrame for them and call dropna on the result.
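
The "filter out the small groups first" idea could look roughly like the sketch below. The sample data and the column name col1 are made up for illustration (the question's full DataFrame is not reproduced here), and groupby(...).filter is just one way to drop the short groups.

```python
import pandas as pd

# Hypothetical sample data; only the "ID" column name comes from the answer.
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3],
    "col1": [1.1, 2.6, 3.4, 2.5, 1.1, 3.3],
})

# Drop every group that has fewer than two rows (here: ID == 2).
big_enough = df.groupby("ID").filter(lambda t: len(t) >= 2)

# Now t.iloc[1] is safe: every remaining group has a second row.
second_rows = big_enough.groupby("ID")[["col1"]].apply(lambda t: t.iloc[1])
print(second_rows)
```

Selecting [["col1"]] before apply keeps the grouping column out of the frames passed to the lambda, which recent pandas versions otherwise warn about.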

  • 2020-12-06 03:33

    I think the nth method is supposed to do just that:

    In [10]: g = df.groupby('ID')
    In [11]: g.nth(1).dropna()
    Out[11]: 
        col1 col2  col3     col4 col5
    ID                               
    1    1.1    D   4.7    x/y/z  200
    2    3.4    B   3.8    x/u/v  404
    3    1.1    A   2.5  x/y/z/n  404
    5    2.6    B   4.6      x/y  500
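
For reference, a minimal self-contained version of the nth call, using made-up data (the column name col1 and the values are assumptions). Note that the index layout of the result differs between pandas versions: since pandas 2.0, nth behaves as a filter and keeps the original row index rather than indexing by ID as in the output above, so this sketch only looks at the values.

```python
import pandas as pd

# Hypothetical sample data; only the "ID" column name comes from the answer.
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3],
    "col1": [1.1, 2.6, 3.4, 2.5, 1.1, 3.3],
})

g = df.groupby("ID")

# nth is 0-based, so nth(1) picks the second row of each group.
# dropna() mirrors the answer; it discards rows that contain NaN.
second_rows = g.nth(1).dropna()
print(second_rows)
```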
    

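    In pandas 0.13 and later, another way to do this is to use cumcount, which numbers the rows within each group starting from 0 (so n here is the 1-based position of the row you want):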

    df[g.cumcount() == n - 1]
    

    This is significantly faster than nth followed by dropna:

    In [21]: %timeit g.nth(1).dropna()
    100 loops, best of 3: 11.3 ms per loop
    
    In [22]: %timeit df[g.cumcount() == 1]
    1000 loops, best of 3: 286 µs per loop
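
A small sketch of the cumcount approach on made-up data (the column name col1 and the values are assumptions, not the question's DataFrame). Groups with fewer than n rows simply contribute no row, so no dropna is needed.

```python
import pandas as pd

# Hypothetical sample data; only the "ID" column name comes from the answer.
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3],
    "col1": [1.1, 2.6, 3.4, 2.5, 1.1, 3.3],
})

n = 2  # 1-based position of the row wanted from each group
g = df.groupby("ID")

# cumcount() numbers rows within each group as 0, 1, 2, ...
# The boolean mask keeps the rows whose within-group position is n - 1.
nth_rows = df[g.cumcount() == n - 1]
print(nth_rows)
```

The result keeps the original row index and all columns, including ID, since it is a plain boolean selection on df.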
    