Attempting to find the 5 largest values per month using groupby

悲&欢浪女 2021-01-25 13:56

I am attempting to show the top three values of nc_type for each month. I tried using nlargest, but that doesn't group by date.

Original Data:

2 Answers
  • 2021-01-25 14:28

    I'd include group_keys=False, so the group key is not prepended to the result index as an extra level:

    df.groupby('occurred_date', group_keys=False).nlargest(3)
    
    occurred_date  nc_type
    1.0            f          34
                   w          24
                   z          13
    12.0           w          44
                   g          42
                   a          27
    Name: value, dtype: int64
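
As a minimal, self-contained sketch of the call above: the original data is not shown in the question, so the Series below is a hypothetical sample chosen to reproduce the output, and the variable names `s` and `top3` are illustrative only.

```python
import pandas as pd

# Hypothetical sample data (the question's original data is not shown).
index = pd.MultiIndex.from_tuples(
    [(1.0, "x"), (1.0, "y"), (1.0, "z"), (1.0, "w"), (1.0, "f"),
     (12.0, "d"), (12.0, "g"), (12.0, "w"), (12.0, "a"), (12.0, "g")],
    names=["occurred_date", "nc_type"],
)
s = pd.Series([3, 4, 13, 24, 34, 18, 10, 44, 27, 42], index=index, name="value")

# Group on the "occurred_date" index level and keep the 3 largest values per group.
# group_keys=False avoids adding the group key as an extra index level.
top3 = s.groupby(level="occurred_date", group_keys=False).nlargest(3)
print(top3)
```

Changing `nlargest(3)` to `nlargest(5)` would give the five largest values per month.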
    
  • 2021-01-25 14:48

    Scenario 1
    MultiIndex series

    occurred_date  nc_type
    1.0            x           3
                   y           4
                   z          13
                   w          24
                   f          34
    12.0           d          18
                   g          10
                   w          44
                   a          27
                   g          42
    Name: test, dtype: int64
    

    Call sort_values + groupby + head:

    df.sort_values(ascending=False).groupby(level=0).head(2)
    
    occurred_date  nc_type
    12.0           w          44
                   g          42
    1.0            f          34
                   w          24
    Name: test, dtype: int64
    

    Change head(2) to head(5) for your situation.

    Or, expanding upon my comment with nlargest, you could do:

    df.groupby(level=0).nlargest(2).reset_index(level=0, drop=True)
    
    occurred_date  nc_type
    1.0            f          34
                   w          24
    12.0           w          44
                   g          42
    Name: test, dtype: int64
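
The two Scenario 1 approaches select the same rows but return them in a different order. The sketch below rebuilds the Series shown above and runs both; the names `s`, `n`, `by_head`, and `by_nlargest` are illustrative, not part of the original answer.

```python
import pandas as pd

# Rebuild the MultiIndex Series from Scenario 1.
index = pd.MultiIndex.from_tuples(
    [(1.0, "x"), (1.0, "y"), (1.0, "z"), (1.0, "w"), (1.0, "f"),
     (12.0, "d"), (12.0, "g"), (12.0, "w"), (12.0, "a"), (12.0, "g")],
    names=["occurred_date", "nc_type"],
)
s = pd.Series([3, 4, 13, 24, 34, 18, 10, 44, 27, 42], index=index, name="test")

n = 2  # number of rows to keep per month

# Approach 1: sort everything descending, then take the first n rows of each group.
# head() preserves the sorted order, so months are not kept together in index order.
by_head = s.sort_values(ascending=False).groupby(level=0).head(n)

# Approach 2: nlargest per group, then drop the duplicated group-key level.
# The result is ordered by month, descending within each month.
by_nlargest = s.groupby(level=0).nlargest(n).reset_index(level=0, drop=True)

print(by_head)
print(by_nlargest)
```

If the months should stay in index order, the nlargest variant is probably the more convenient of the two.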
    

    Scenario 2
    3-col dataframe

       occurred_date nc_type  value
    0            1.0       x      3
    1            1.0       y      4
    2            1.0       z     13
    3            1.0       w     24
    4            1.0       f     34
    5           12.0       d     18
    6           12.0       g     10
    7           12.0       w     44
    8           12.0       a     27
    9           12.0       g     42
    

    You can use sort_values + groupby + head:

    df.sort_values(['occurred_date', 'value'], 
            ascending=[True, False]).groupby('occurred_date').head(2)
    
       occurred_date nc_type  value
    4            1.0       f     34
    3            1.0       w     24
    7           12.0       w     44
    9           12.0       g     42
    

    Change head(2) to head(5) for your scenario.
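
For reuse, the Scenario 2 pattern can be wrapped in a small helper. This is only a sketch: `top_n_per_group` and its parameters are hypothetical names introduced here, not a pandas API.

```python
import pandas as pd

# Rebuild the 3-column DataFrame from Scenario 2.
df = pd.DataFrame({
    "occurred_date": [1.0] * 5 + [12.0] * 5,
    "nc_type": ["x", "y", "z", "w", "f", "d", "g", "w", "a", "g"],
    "value": [3, 4, 13, 24, 34, 18, 10, 44, 27, 42],
})

def top_n_per_group(frame, group_col, value_col, n):
    """Return the n rows with the largest value_col within each group_col."""
    ordered = frame.sort_values([group_col, value_col], ascending=[True, False])
    return ordered.groupby(group_col).head(n)

top2 = top_n_per_group(df, "occurred_date", "value", n=2)
print(top2)
```

Calling the helper with `n=5` corresponds to the "5 largest values per month" in the title.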


    Scenario 3
    MultiIndex Dataframe

                           test
    occurred_date nc_type      
    1.0           x           3
                  y           4
                  z          13
                  w          24
                  f          34
    12.0          d          18
                  g          10
                  w          44
                  a          27
                  g          42
    

    Or, with nlargest.

    df.groupby(level=0).test.nlargest(2)\
                  .reset_index(level=0, drop=True)
    
    occurred_date  nc_type
    1.0            f          34
                   w          24
    12.0           w          44
                   g          42
    Name: test, dtype: int64
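
Note that nlargest on the selected column returns a Series. If the result should stay a DataFrame, sort_values + head appears to work here as well. A sketch under that assumption, with illustrative variable names:

```python
import pandas as pd

# Rebuild the MultiIndex DataFrame from Scenario 3.
index = pd.MultiIndex.from_tuples(
    [(1.0, "x"), (1.0, "y"), (1.0, "z"), (1.0, "w"), (1.0, "f"),
     (12.0, "d"), (12.0, "g"), (12.0, "w"), (12.0, "a"), (12.0, "g")],
    names=["occurred_date", "nc_type"],
)
df = pd.DataFrame({"test": [3, 4, 13, 24, 34, 18, 10, 44, 27, 42]}, index=index)

# nlargest on one column: the result is a Series ordered by month.
top2_series = (
    df.groupby(level=0)["test"]
      .nlargest(2)
      .reset_index(level=0, drop=True)
)

# sort_values + head: the result is still a DataFrame, in globally sorted order.
top2_frame = df.sort_values("test", ascending=False).groupby(level=0).head(2)

print(top2_series)
print(top2_frame)
```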
    