My df:

{'city1': {0: 'Chicago',
  1: 'Chicago',
  2: 'Chicago',
  3: 'Chicago',
  4: 'Miami',
  5: 'Houston',
  6: 'Austin'},
 'city2': {0:
I added an issue here
Theory:

When the result of a groupby operation on a pd.Series is identical to the original pd.Series, the original index is returned; otherwise the result comes back with a MultiIndex of (group key, original index).
Boiled-down example

import pandas as pd

df = pd.DataFrame(dict(A=[0, 1, 2, 3]))

# returns results identical to df.A, so the original index is kept
print(df.groupby(df.A // 2).A.nsmallest(2))

# returns the same values out of order, so a MultiIndex comes back
print(df.groupby(df.A // 2).A.nlargest(2))
0    0
1    1
2    2
3    3
Name: A, dtype: int64

A
0  1    1
   0    0
1  3    3
   2    2
Name: A, dtype: int64
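A quick sanity check of the theory (a minimal sketch; the exact index you get back can depend on your pandas version):

import pandas as pd

df = pd.DataFrame(dict(A=[0, 1, 2, 3]))

small = df.groupby(df.A // 2).A.nsmallest(2)
large = df.groupby(df.A // 2).A.nlargest(2)

# On the version described above, nsmallest(2) hands back every row, so the flat
# original index survives, while nlargest(2) returns the same values out of order
# and comes back with a (group key, original index) MultiIndex.
print(small.index.equals(df.index))
print(isinstance(large.index, pd.MultiIndex))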
I'd argue these should return the same, consistent index.
The most egregious consequence of this:

# the index structure of this result will differ randomly from run to run
print(df.groupby(df.A // 2).A.apply(pd.Series.sample, n=2))
returns this on one execution
A
0  1    1
   0    0
1  2    2
   3    3
Name: A, dtype: int64
And this on another
0    0
1    1
2    2
3    3
Name: A, dtype: int64
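Because the sampled result can randomly match the original Series, the index flips between the two shapes above. A minimal sketch that just reports which shape came back on each run (behavior may differ on newer pandas versions):

import pandas as pd

df = pd.DataFrame(dict(A=[0, 1, 2, 3]))

# repeat the sample a few times and report which index structure came back
for _ in range(5):
    res = df.groupby(df.A // 2).A.apply(pd.Series.sample, n=2)
    if isinstance(res.index, pd.MultiIndex):
        print('MultiIndex (group key, original index)')
    else:
        print('original flat index')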
Of course, this version never has the issue, because sampling one row per group can never return the same values as the original Series:
print(df.groupby(df.A // 2).A.apply(pd.Series.sample, n=1))
A
0  0    0
1  2    2
Name: A, dtype: int64
Workaround

Use set_index to move the grouping columns into the index, group on those levels, then reset_index to get the keys back as ordinary columns:

cols = ['plant1_type', 'plant2_type', 'city2']
df.set_index(cols).groupby(level=cols)['p234_r_c'].nlargest(1).reset_index()
  plant1_type plant2_type     city2  p234_r_c
0    COMBCYCL        COAL   Toronto       5.0
1    COMBCYCL        COAL   Detroit       4.0
2        NUKE    COMBCYCL  St.Louis       2.0
3        COAL    COMBCYCL     Miami       0.5
4        NUKE        COAL    Dallas       1.0
5    COMBCYCL        NUKE    Dallas       4.0
6        COAL        NUKE    Dallas       3.0
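A related pattern that sidesteps the index question entirely (a sketch with made-up column names, not the set_index approach above) is to sort first and then take the head of each group; the result is always a plain slice of the original frame, so its index structure never depends on what the groups happen to contain:

import pandas as pd

# hypothetical data, column names invented for illustration
df = pd.DataFrame({
    'city': ['Chicago', 'Chicago', 'Miami', 'Miami', 'Houston'],
    'value': [3, 5, 2, 7, 1],
})

# sort by value, then keep the first (largest) row of each city group;
# groupby(...).head() returns rows of the original frame with their original labels
top1 = df.sort_values('value', ascending=False).groupby('city', sort=False).head(1)
print(top1)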