Given the problems with groupby() and nlargest() described here and here, I am trying to work around them.
Note: for simplicity I us
Unless I'm missing something (and I agree there are bugs lurking in the pandas code here), we can bypass any difficulties relatively simply.
Method #1: use loc and idxmax:
In [21]: df.loc[df.groupby(cols2)["p234_r_c"].idxmax()]
Out[21]:
     city1     city2  p234_r_c  plant1_type  plant2_type
6   Austin    Dallas       3.0         COAL         NUKE
3  Chicago     Miami       0.5         COAL     COMBCYCL
0  Chicago   Toronto       5.0     COMBCYCL         COAL
2  Chicago  St.Louis       2.0         NUKE     COMBCYCL
5  Houston    Dallas       4.0     COMBCYCL         NUKE
4    Miami    Dallas       1.0         NUKE         COAL
In [22]: df.loc[df.groupby(cols)["p234_r_c"].idxmax()]
Out[22]:
     city1     city2  p234_r_c  plant1_type  plant2_type
6   Austin    Dallas       3.0         COAL         NUKE
5  Houston    Dallas       4.0     COMBCYCL         NUKE
4    Miami    Dallas       1.0         NUKE         COAL
1  Chicago   Detroit       4.0     COMBCYCL         COAL
3  Chicago     Miami       0.5         COAL     COMBCYCL
2  Chicago  St.Louis       2.0         NUKE     COMBCYCL
0  Chicago   Toronto       5.0     COMBCYCL         COAL
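A self-contained sketch of Method #1, for trying it outside the original session. The frame is rebuilt from the printed output above, and the definitions of cols and cols2 are assumptions inferred from that output (the part of the question that defined them is cut off here): cols2 is taken to be city1 plus the two plant types, and cols to be city2 plus the two plant types.

```python
import pandas as pd

# Data reconstructed from the printed output in this answer.
df = pd.DataFrame(
    {
        "city1": ["Chicago", "Chicago", "Chicago", "Chicago", "Miami", "Houston", "Austin"],
        "city2": ["Toronto", "Detroit", "St.Louis", "Miami", "Dallas", "Dallas", "Dallas"],
        "p234_r_c": [5.0, 4.0, 2.0, 0.5, 1.0, 4.0, 3.0],
        "plant1_type": ["COMBCYCL", "COMBCYCL", "NUKE", "COAL", "NUKE", "COMBCYCL", "COAL"],
        "plant2_type": ["COAL", "COAL", "COMBCYCL", "COMBCYCL", "COAL", "NUKE", "NUKE"],
    }
)

# Assumed grouping keys, inferred from the outputs above (not taken from the question).
cols2 = ["city1", "plant1_type", "plant2_type"]
cols = ["city2", "plant1_type", "plant2_type"]

# idxmax gives, for each group, the index label of the row holding the maximum;
# .loc then pulls those complete rows back out of the original frame.
best_idx = df.groupby(cols2)["p234_r_c"].idxmax()
best_rows = df.loc[best_idx]
print(best_rows)

# Same idea with the other set of keys.
by_city2 = df.loc[df.groupby(cols)["p234_r_c"].idxmax()]

# For the smallest value per group, swap in idxmin.
worst_rows = df.loc[df.groupby(cols2)["p234_r_c"].idxmin()]
```

Because .loc looks rows up by label, this relies on the index being unique; if it is not, calling reset_index() first avoids pulling in extra rows that share a label.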
Method #2: sort by p234_r_c and use last:
In [17]: df.sort_values("p234_r_c").groupby(cols2, as_index=False).last()
Out[17]:
     city1  plant1_type  plant2_type     city2  p234_r_c
0   Austin         COAL         NUKE    Dallas       3.0
1  Chicago         COAL     COMBCYCL     Miami       0.5
2  Chicago     COMBCYCL         COAL   Toronto       5.0
3  Chicago         NUKE     COMBCYCL  St.Louis       2.0
4  Houston     COMBCYCL         NUKE    Dallas       4.0
5    Miami         NUKE         COAL    Dallas       1.0
In [18]: df.sort_values("p234_r_c").groupby(cols, as_index=False).last()
Out[18]:
      city2  plant1_type  plant2_type    city1  p234_r_c
0    Dallas         COAL         NUKE   Austin       3.0
1    Dallas     COMBCYCL         NUKE  Houston       4.0
2    Dallas         NUKE         COAL    Miami       1.0
3   Detroit     COMBCYCL         COAL  Chicago       4.0
4     Miami         COAL     COMBCYCL  Chicago       0.5
5  St.Louis         NUKE     COMBCYCL  Chicago       2.0
6   Toronto     COMBCYCL         COAL  Chicago       5.0
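One caveat about Method #2, shown as a small sketch: GroupBy.last() returns the last non-null value of each column separately, not the last row. On data with no missing values, like the frame above, the two coincide. If the winning row has a NaN, however, a value from an earlier row can leak into the result. The two-row toy frame below is hypothetical, made up only to show this. Sorting and then using tail(1), or drop_duplicates(keep="last"), keeps the whole row intact.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row group: the row with the larger p234_r_c has a missing plant2_type.
toy = pd.DataFrame(
    {
        "city1": ["Chicago", "Chicago"],
        "p234_r_c": [4.0, 5.0],
        "plant2_type": ["COAL", np.nan],
    }
)

ordered = toy.sort_values("p234_r_c")

# last() skips NaN column by column, so plant2_type comes from the 4.0 row
# while p234_r_c comes from the 5.0 row: a "row" that never existed.
mixed = ordered.groupby("city1", as_index=False).last()

# tail(1) and drop_duplicates(keep="last") both return the real final row.
whole_row = ordered.groupby("city1").tail(1)
deduped = ordered.drop_duplicates(subset=["city1"], keep="last")

print(mixed)
print(whole_row)
```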
If you also want multiple rows per group, then while nlargest and nsmallest are broken, I think it's simplest to sort and then use head or tail. For example:
In [27]: df.sort_values("p234_r_c").groupby(cols, as_index=False).tail(2)
Out[27]:
     city1     city2  p234_r_c  plant1_type  plant2_type
3  Chicago     Miami       0.5         COAL     COMBCYCL
4    Miami    Dallas       1.0         NUKE         COAL
2  Chicago  St.Louis       2.0         NUKE     COMBCYCL
6   Austin    Dallas       3.0         COAL         NUKE
1  Chicago   Detroit       4.0     COMBCYCL         COAL
5  Houston    Dallas       4.0     COMBCYCL         NUKE
0  Chicago   Toronto       5.0     COMBCYCL         COAL
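The sort-then-head/tail idea can be wrapped in a small helper that behaves like a per-group nlargest or nsmallest. This is a sketch under assumptions: the name top_n_per_group is hypothetical (not a pandas API), the frame is rebuilt from the printed output above, and cols2 is again assumed to be city1 plus the two plant types. It groups on cols2 rather than cols because under cols every group holds a single row (which is why tail(2) above returns the whole frame), whereas under cols2 the Chicago/COMBCYCL/COAL group has two rows, so the effect of n is visible. kind="mergesort" is chosen so that tied values keep their original order.

```python
import pandas as pd

# Data reconstructed from the printed output in this answer.
df = pd.DataFrame(
    {
        "city1": ["Chicago", "Chicago", "Chicago", "Chicago", "Miami", "Houston", "Austin"],
        "city2": ["Toronto", "Detroit", "St.Louis", "Miami", "Dallas", "Dallas", "Dallas"],
        "p234_r_c": [5.0, 4.0, 2.0, 0.5, 1.0, 4.0, 3.0],
        "plant1_type": ["COMBCYCL", "COMBCYCL", "NUKE", "COAL", "NUKE", "COMBCYCL", "COAL"],
        "plant2_type": ["COAL", "COAL", "COMBCYCL", "COMBCYCL", "COAL", "NUKE", "NUKE"],
    }
)

# Assumed grouping keys, inferred from the outputs above.
cols2 = ["city1", "plant1_type", "plant2_type"]


def top_n_per_group(frame, keys, value_col, n, largest=True):
    """Hypothetical helper: the n rows with the largest (or smallest) value_col per group.

    Sorting first and then taking head(n) within each group gives whole rows,
    with their original index labels, in the spirit of nlargest/nsmallest.
    """
    # Stable sort so that ties keep their original relative order.
    ordered = frame.sort_values(value_col, ascending=not largest, kind="mergesort")
    return ordered.groupby(keys).head(n)


top1 = top_n_per_group(df, cols2, "p234_r_c", 1)
top2 = top_n_per_group(df, cols2, "p234_r_c", 2)
bottom1 = top_n_per_group(df, cols2, "p234_r_c", 1, largest=False)
print(top1)
```

On this data, top_n_per_group(df, cols2, "p234_r_c", 1) keeps the same six rows as Method #1 (everything except row 1), and n=2 returns all seven rows because no group has more than two.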