Pandas get topmost n records within each group

无人共我 2020-11-22 06:07

Suppose I have a pandas DataFrame like this:

>>> import pandas as pd
>>> df = pd.DataFrame({'id':[1,1,1,2,2,2,2,3,4],'value':[1,2,3,1,2,3,4,1,1]})
>>> df
   id  value
0   1      1
1   1      2
2   1      3
3   2      1
4   2      2
5   2      3
6   2      4
7   3      1
8   4      1

3 answers
  •  醉酒成梦
    2020-11-22 06:46

    Did you try df.groupby('id').head(2)?

    Output generated:

    >>> df.groupby('id').head(2)
           id  value
    id             
    1  0   1      1
       1   1      2 
    2  3   2      1
       4   2      2
    3  7   3      1
    4  8   4      1
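The MultiIndex output above reflects the pandas version this answer was written against. In recent pandas releases (0.14 onward, as far as I know), groupby().head() behaves like a filter: it returns the first rows of each group with their original row labels and does not add id as an extra index level. A minimal sketch to check this on your own installation, rebuilding the same df:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 2, 3, 4],
                   'value': [1, 2, 3, 1, 2, 3, 4, 1, 1]})

# Keep the first two rows of every id group, in their existing row order.
top2 = df.groupby('id').head(2)
print(top2)
```

On a current pandas this should print the rows labelled 0, 1, 3, 4, 7 and 8, with no separate id index level.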
    

    (Keep in mind that head() simply takes the first rows of each group in their current order, so you might need to sort first, depending on your data.)
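For example, if "topmost" means the largest value within each id, one approach is to sort by id and then by value in descending order before calling head(). This is a minimal sketch; n = 2 and the descending sort on value are assumptions chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 2, 3, 4],
                   'value': [1, 2, 3, 1, 2, 3, 4, 1, 1]})

n = 2  # number of rows to keep per group (illustrative choice)

# Sort so that, within each id, the largest values come first,
# then keep the first n rows of every group and renumber the index 0..k-1.
top_n = (
    df.sort_values(['id', 'value'], ascending=[True, False])
      .groupby('id')
      .head(n)
      .reset_index(drop=True)
)
print(top_n)
```

With this data it keeps values 3 and 2 for id 1, 4 and 3 for id 2, and the single rows for ids 3 and 4.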

    EDIT: As mentioned by the questioner, use df.groupby('id').head(2).reset_index(drop=True) to remove the MultiIndex and flatten the results.

    >>> df.groupby('id').head(2).reset_index(drop=True)
        id  value
    0   1      1
    1   1      2
    2   2      1
    3   2      2
    4   3      1
    5   4      1
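If you only need a single column, a different technique, SeriesGroupBy.nlargest, picks the n largest values per group directly. Its result carries a two-level index (the id key plus the original row label), so it needs a reset_index step similar to the one above. A sketch, assuming "topmost" means largest value and n = 2:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 2, 3, 4],
                   'value': [1, 2, 3, 1, 2, 3, 4, 1, 1]})

# Series indexed by (id, original row label), largest values first per id.
top = df.groupby('id')['value'].nlargest(2)

# Drop the original row labels (level 1), then turn the id level into a column.
flat = top.reset_index(level=1, drop=True).reset_index()
print(flat)
```

Unlike head(), this only returns the grouped column, so it suits cases where the other columns are not needed.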
    
