More bizarre results using: groupby and nlargest() in pandas

后端未结

关注

 1  1103

眼角桃花

This question is an extension of the following post: select largest N of a column of each groupby group using pandas

Lets use the same df and the workaround proposed

相关标签:

1条回答

盖世英雄少女心

2021-01-14 09:55

Try this:

In [76]: df.groupby(cols2)['p234_r_c'].nlargest(1).reset_index(level=3, drop=True).reset_index()
Out[76]:
     city1 plant1_type plant2_type  p234_r_c
0   Austin        COAL        NUKE       3.0
1  Chicago        COAL    COMBCYCL       0.5
2  Chicago    COMBCYCL        COAL       5.0
3  Chicago        NUKE    COMBCYCL       2.0
4  Houston    COMBCYCL        NUKE       4.0
5    Miami        NUKE        COAL       1.0

Frankly speaking I don't understand the following behavior:

In [77]: df.set_index(cols2).groupby(level=cols2)['p234_r_c'].nlargest(1)
Out[77]:
city1    plant1_type  plant2_type  city1    plant1_type  plant2_type
Austin   COAL         NUKE         Austin   COAL         NUKE           3.0
Chicago  COAL         COMBCYCL     Chicago  COAL         COMBCYCL       0.5
         COMBCYCL     COAL         Chicago  COMBCYCL     COAL           5.0
         NUKE         COMBCYCL     Chicago  NUKE         COMBCYCL       2.0
Houston  COMBCYCL     NUKE         Houston  COMBCYCL     NUKE           4.0
Miami    NUKE         COAL         Miami    NUKE         COAL           1.0
Name: p234_r_c, dtype: float64

where:

In [78]: cols2
Out[78]: ['city1', 'plant1_type', 'plant2_type']

0 讨论(0)