Question:
SQL: SELECT MAX(A), MIN(B), C FROM Table GROUP BY C
I want to do the same operation in pandas on a DataFrame. The closest I got was:
DF2 = DF1.groupby(by=['C']).max()
which gives me the max of both columns. How do I apply more than one operation while grouping?
Answer 1:
Try the agg() function:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 5, size=(20, 3)), columns=list('ABC'))
print(df)
print(df.groupby('C').agg({'A': max, 'B': min}))
Output:
    A  B  C
0   2  3  0
1   2  2  1
2   4  0  1
3   0  1  4
4   3  3  2
5   0  4  3
6   2  4  2
7   3  4  0
8   4  2  2
9   3  2  1
10  2  3  1
11  4  1  0
12  4  3  2
13  0  0  1
14  3  1  1
15  4  1  1
16  0  0  0
17  4  0  1
18  3  4  0
19  0  2  4

   A  B
C
0  4  0
1  4  0
2  4  2
3  0  4
4  0  1
Alternatively, you may want to look at the pandas.read_sql_query() function, which runs a SQL query against a database connection and returns the result as a DataFrame.
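To check a pandas result against the SQL it is meant to mirror, the query from the question can be run through pandas.read_sql_query(). The following is a minimal sketch, assuming an in-memory SQLite database and a hypothetical table name `t` filled with a small hand-made frame; none of these come from the question.

```python
import sqlite3

import pandas as pd

# Small hand-made frame (illustrative data only).
df = pd.DataFrame({
    "A": [1, 7, 2, 3],
    "B": [5, 9, 10, 2],
    "C": ["a", "a", "c", "c"],
})

# Load the frame into an in-memory SQLite table named "t" (hypothetical name).
conn = sqlite3.connect(":memory:")
df.to_sql("t", conn, index=False)

# Run the SQL from the question; ORDER BY makes the row order deterministic.
sql_result = pd.read_sql_query(
    "SELECT MAX(A) AS A, MIN(B) AS B, C FROM t GROUP BY C ORDER BY C",
    conn,
)
conn.close()

# The same aggregation in pandas, keeping C as a regular column.
pandas_result = df.groupby("C", as_index=False).agg({"A": "max", "B": "min"})

print(sql_result)
print(pandas_result)
```

Both results should hold the same values per group; only the column order differs, because pandas puts the group key first.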
Answer 2:
You can use the agg function:
DF2 = DF1.groupby('C').agg({'A': max, 'B': min})
Sample:
print(DF1)
   A   B  C  D
0  1   5  a  a
1  7   9  a  b
2  2  10  c  d
3  3   2  c  c

DF2 = DF1.groupby('C').agg({'A': max, 'B': min})
print(DF2)
   A  B
C
a  7  5
c  3  2
See also "GroupBy-fu: improvements in grouping and aggregating data in pandas", which has nice explanations.
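If the result should look more like the SQL output, with C as an ordinary column and explicit names for the aggregates, named aggregation (available since pandas 0.25) is one option. This is a sketch rather than part of the original answer; the output names `max_A` and `min_B` are hypothetical, and the aggregations are passed as the strings 'max' and 'min' instead of the Python builtins.

```python
import pandas as pd

# Sample frame with the same values as the DF1 printed in this answer.
DF1 = pd.DataFrame({
    "A": [1, 7, 2, 3],
    "B": [5, 9, 10, 2],
    "C": ["a", "a", "c", "c"],
    "D": ["a", "b", "d", "c"],
})

# Named aggregation: output_name=(input_column, function_name).
# max_A and min_B are illustrative output names, not from the answer.
DF2 = (
    DF1.groupby("C")
       .agg(max_A=("A", "max"), min_B=("B", "min"))
       .reset_index()  # turn the group key C back into a regular column
)
print(DF2)
```

Column D is simply left out of the result, much as an unselected column would be in the SQL query.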
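Answer 3:
You can use the agg function:
import pandas as pd
import numpy as np

# df is an existing DataFrame; 'something', 'column1' and 'columns2' are placeholder column names
df.groupby('something').agg({'column1': np.max, 'columns2': np.min})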
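The dictionary passed to agg can also map a column to a list of functions, which covers the case of several operations on the same column. Below is a minimal sketch with made-up data; the column names `key`, `column1` and `column2` are hypothetical. The result has two-level column labels, which the last step flattens into single strings.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({
    "key": ["x", "x", "y", "y", "y"],
    "column1": [1, 4, 2, 8, 5],
    "column2": [3.0, 1.5, 9.0, 7.0, 6.0],
})

# A list of function names per column yields one output column per function.
summary = df.groupby("key").agg({
    "column1": ["max", "min"],
    "column2": ["min"],
})

# The columns are now (column, function) pairs; join them into flat names.
summary.columns = ["_".join(pair) for pair in summary.columns]
print(summary)
```

Flattening is optional; the two-level columns can also be kept and accessed as summary[("column1", "max")] before the rename step.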