Question
I have a large dataframe (500k to 1M rows) that contains, for example, these three numeric columns: ID, A, B.
I want to filter it to obtain a table like the one in the image below, where, for each unique value of the ID column, I have the maximum and minimum values of A and B. How can I do this?
EDIT: I have updated the image below to make it clearer: when I get the max or min from a column, I also need the associated data from the other columns.
Answer 1:
Sample data (note that you posted an image which can't be used by potential answerers without retyping, so I'm making a simple example in its place):
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 2, 2, 2, 2],
                   'a': range(8), 'b': range(8, 0, -1)})
The key to this is just using idxmax and idxmin and then futzing with the indexes so that you can merge things in a readable way. Here's the whole answer; you may wish to examine the intermediate dataframes to see how it works.
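# Row labels in df of the max / min of each column within each id
df_max = df.groupby('id').idxmax()
df_max['type'] = 'max'
df_min = df.groupby('id').idxmin()
df_min['type'] = 'min'

# Stack the max and min labels into one long Series named 'index'
df2 = (pd.concat([df_max, df_min])
         .set_index('type', append=True).stack().rename('index'))

# Look up the full rows in df and attach the (type, column) labels
df3 = pd.concat([df2.reset_index().drop('id', axis=1).set_index('index'),
                 df.loc[df2.values]], axis=1)
df3.set_index(['id', 'level_2', 'type']).sort_index()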
                 a  b
id level_2 type
1  a       max   3  5
           min   0  8
   b       max   0  8
           min   3  5
2  a       max   7  1
           min   4  4
   b       max   4  4
           min   7  1
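To make the intermediate step concrete, here is a minimal sketch of what idxmax returns on the sample data and how those labels lead back to full rows. The variable name rows_at_max_a is only illustrative and is not part of the answer's code.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({'id': [1, 1, 1, 1, 2, 2, 2, 2],
                   'a': range(8), 'b': range(8, 0, -1)})

# For each id, the row label in df where each column reaches its maximum.
df_max = df.groupby('id').idxmax()
print(df_max)

# Passing those labels to .loc recovers the complete rows,
# including the values of the other columns.
rows_at_max_a = df.loc[df_max['a'].to_numpy()]  # illustrative name
print(rows_at_max_a)
```

The values in df_max are positions in the original index, not the maxima themselves, which is why a lookup with .loc is needed afterwards.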
Note in particular that df2 looks like this:
id  type
1   max   a    3
          b    0
2   max   a    7
          b    4
1   min   a    0
          b    3
2   min   a    4
          b    7
The last column there holds the index values in df that were derived with idxmax & idxmin. So basically all the information you need is in df2. The rest of it is just a matter of merging back with df and making it more readable.
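If the index juggling feels heavy, the same idea can be sketched more directly: look up the rows for each column and each extreme separately, tag them, and concatenate. This is an alternative arrangement under the same assumptions as the sample data, not the answer's original code; the names pieces, result, and the label column 'column' are illustrative.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({'id': [1, 1, 1, 1, 2, 2, 2, 2],
                   'a': range(8), 'b': range(8, 0, -1)})

grouped = df.groupby('id')
pieces = []
for col in ['a', 'b']:
    for kind, labels in [('max', grouped[col].idxmax()),
                         ('min', grouped[col].idxmin())]:
        # Full rows at the extreme of `col`, tagged with which column
        # and which extreme they correspond to.
        rows = df.loc[labels.to_numpy()].assign(column=col, type=kind)
        pieces.append(rows)

result = (pd.concat(pieces)
            .set_index(['id', 'column', 'type'])
            .sort_index())
print(result)
```

Note that idxmax and idxmin return the first matching row when a group contains ties, so each (id, column, type) combination yields exactly one row.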
Source: https://stackoverflow.com/questions/40568438/python-pandas-dataframe-find-max-for-each-unique-values-of-an-another-column