Select the largest N values of a column in each groupby group using pandas

谎友^ 2020-12-19 18:37

My df:

{'city1': {0: 'Chicago',
  1: 'Chicago',
  2: 'Chicago',
  3: 'Chicago',
  4: 'Miami',
  5: 'Houston',
  6: 'Austin'},
 'city2': {0:


        
1 Answer
  • 2020-12-19 19:16

    I added an issue here

    Theory:

    When a groupby operation on a pd.Series produces exactly the same values as the original pd.Series, in the same order, the original index is returned instead of a MultiIndex keyed by group.

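    Boiled-down example

    import pandas as pd

    df = pd.DataFrame(dict(A=[0, 1, 2, 3]))
    
    # result is identical to df.A, so it comes back with the original flat index
    print(df.groupby(df.A // 2).A.nsmallest(2))
    
    # same values in a different order, so it comes back with a (group, original index) MultiIndex
    print(df.groupby(df.A // 2).A.nlargest(2))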
    
    0    0
    1    1
    2    2
    3    3
    Name: A, dtype: int64
    A   
    0  1    1
       0    0
    1  3    3
       2    2
    Name: A, dtype: int64
    

    I'd argue that both calls should return the same, consistently structured index.
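
To get a predictable index regardless of which shape comes back, one option is to strip any group-key levels after the fact. This is only a sketch: `with_original_index` is a hypothetical helper name, not part of pandas, and it assumes that group keys, when present, are prepended as the outermost index levels.

```python
import pandas as pd


def with_original_index(result, original):
    """Hypothetical helper: drop group-key levels that groupby may have prepended."""
    extra = result.index.nlevels - original.index.nlevels
    if extra > 0:
        # Assumption: group keys are the outermost levels of the result index.
        result = result.droplevel(list(range(extra)))
    return result


df = pd.DataFrame(dict(A=[0, 1, 2, 3]))
grouped = df.groupby(df.A // 2).A

smallest = with_original_index(grouped.nsmallest(2), df.A)
largest = with_original_index(grouped.nlargest(2), df.A)

# Both results are now indexed by the original row labels only.
print(smallest.index.nlevels, largest.index.nlevels)
```

After this normalization, both results can be aligned with the original frame via `df.loc[largest.index]`.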

    The most egregious consequence of this inconsistency:

    # most egregious
    # the shape of the returned index varies randomly from run to run
    print(df.groupby(df.A // 2).A.apply(pd.Series.sample, n=2))
    

    It returns this on one execution:

    A   
    0  1    1
       0    0
    1  2    2
       3    3
    Name: A, dtype: int64
    

    And this on another:

    0    0
    1    1
    2    2
    3    3
    Name: A, dtype: int64
    

    Of course, sampling a single row per group never shows the issue, because the result cannot contain the same values as the original:

    print(df.groupby(df.A // 2).A.apply(pd.Series.sample, n=1))
    
    A   
    0  0    0
    1  2    2
    Name: A, dtype: int64
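
When investigating the `n=2` case above, it can help to make the sampling repeatable. A minimal sketch, assuming the same toy `df`: `sample_pairs` is a hypothetical wrapper, and `random_state` is simply forwarded by `apply` to `pd.Series.sample`. Seeding only makes repeated runs identical; it does not remove the underlying inconsistency.

```python
import pandas as pd

df = pd.DataFrame(dict(A=[0, 1, 2, 3]))


def sample_pairs(seed):
    # Extra keyword arguments to apply are passed on to pd.Series.sample,
    # so every group is sampled with the same seed.
    return df.groupby(df.A // 2).A.apply(pd.Series.sample, n=2, random_state=seed)


first = sample_pairs(0)
second = sample_pairs(0)

# With a fixed seed, values and index are identical across calls.
print(first.equals(second))
```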
    

    Workaround
    Use set_index

    cols = ['plant1_type', 'plant2_type', 'city2']
    (df.set_index(cols)
       .groupby(level=cols)['p234_r_c']
       .nlargest(1)
       .reset_index())
    
      plant1_type plant2_type     city2  p234_r_c
    0    COMBCYCL        COAL   Toronto       5.0
    1    COMBCYCL        COAL   Detroit       4.0
    2        NUKE    COMBCYCL  St.Louis       2.0
    3        COAL    COMBCYCL     Miami       0.5
    4        NUKE        COAL    Dallas       1.0
    5    COMBCYCL        NUKE    Dallas       4.0
    6        COAL        NUKE    Dallas       3.0
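
As an alternative to the set_index workaround above (not the method used in this answer), sorting first and then taking `head(n)` per group avoids the index question, since `groupby(...).head` returns rows of the frame with their original index labels. A minimal sketch on a hypothetical toy frame; the column names and values are illustrative only, not the asker's data.

```python
import pandas as pd

# Hypothetical toy data, chosen without ties so the ordering is unambiguous.
toy = pd.DataFrame({
    "plant": ["COAL", "COAL", "COAL", "NUKE", "NUKE"],
    "city": ["Miami", "Miami", "Dallas", "Dallas", "Dallas"],
    "value": [0.5, 3.0, 1.0, 4.0, 2.0],
})

n = 1  # number of largest rows to keep per group
top_n = (
    toy.sort_values("value", ascending=False)  # largest values first
       .groupby(["plant", "city"])
       .head(n)                                # first n rows of each group
)
print(top_n)
```

The result keeps every column of the frame, ordered by `value` descending; append `.sort_index()` to restore the original row order.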
    