Multiple aggregation in group by in Pandas Dataframe

Anonymous (unverified), submitted 2019-12-03 01:03:01

Question:

SQL: SELECT MAX(A), MIN(B), C FROM Table GROUP BY C

I want to do the same operation in pandas on a DataFrame. The closest I got was:

DF2 = DF1.groupby(by=['C']).max()

but this gives me the max of both columns. How do I apply more than one operation while grouping?

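Answer 1:

Try the agg() function:

import numpy as np
import pandas as pd

# 20 rows of random integers in [0, 5) for columns A, B and C
df = pd.DataFrame(np.random.randint(0, 5, size=(20, 3)), columns=list('ABC'))
print(df)

# max of A and min of B within each group of C
print(df.groupby('C').agg({'A': max, 'B': min}))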

Output:

    A  B  C
0   2  3  0
1   2  2  1
2   4  0  1
3   0  1  4
4   3  3  2
5   0  4  3
6   2  4  2
7   3  4  0
8   4  2  2
9   3  2  1
10  2  3  1
11  4  1  0
12  4  3  2
13  0  0  1
14  3  1  1
15  4  1  1
16  0  0  0
17  4  0  1
18  3  4  0
19  0  2  4

   A  B
C
0  4  0
1  4  0
2  4  2
3  0  4
4  0  1

Alternatively, if the data lives in a SQL database, you may want to look at the pandas.read_sql_query() function.
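To illustrate the read_sql_query() route, here is a minimal sketch that assumes the data sits in a SQLite database. The in-memory database, the table name my_table, the column aliases and the sample rows are all assumptions made for illustration; they are not part of the original question.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data standing in for the asker's table
df = pd.DataFrame({
    "A": [1, 7, 2, 3],
    "B": [5, 9, 10, 2],
    "C": ["a", "a", "c", "c"],
})

# Load the frame into an in-memory SQLite database (assumed setup)
con = sqlite3.connect(":memory:")
df.to_sql("my_table", con, index=False)

# Let the database do the grouping; pandas only reads the result
query = """
    SELECT MAX(A) AS max_a, MIN(B) AS min_b, C
    FROM my_table
    GROUP BY C
    ORDER BY C
"""
result = pd.read_sql_query(query, con)
con.close()

print(result)
```

This only makes sense when the data is already in a database; for a DataFrame that is already in memory, agg() is the more direct choice.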



Answer 2:

You can use the agg function:

DF2 = DF1.groupby('C').agg({'A': max, 'B': min}) 

Sample:

print(DF1)
   A   B  C  D
0  1   5  a  a
1  7   9  a  b
2  2  10  c  d
3  3   2  c  c

DF2 = DF1.groupby('C').agg({'A': max, 'B': min})

print(DF2)
   A  B
C
a  7  5
c  3  2

See also "GroupBy-fu: improvements in grouping and aggregating data in pandas", which has nice explanations.
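As a possible variation on the dict form, newer pandas releases (0.25 and later, as far as I know) also accept "named aggregation", which lets you choose the output column names. The sketch below rebuilds the sample frame from this answer; the output names max_A and min_B are made up for illustration. The functions are given as the strings "max" and "min", which pandas accepts in place of the callables.

```python
import pandas as pd

# Same sample data as in the answer above
DF1 = pd.DataFrame({
    "A": [1, 7, 2, 3],
    "B": [5, 9, 10, 2],
    "C": ["a", "a", "c", "c"],
    "D": ["a", "b", "d", "c"],
})

# Each keyword is an output column: name=(input column, aggregation)
DF2 = (
    DF1.groupby("C")
       .agg(max_A=("A", "max"), min_B=("B", "min"))
       .reset_index()  # turn the group key C back into a regular column
)

print(DF2)
```

With reset_index(), C comes back as an ordinary column, which is closer to the shape the SQL query would return.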



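Answer 3:

You can use the agg function:

import pandas as pd
import numpy as np

# df is an existing DataFrame; 'something', 'column1' and 'column2' are placeholder column names
df.groupby('something').agg({'column1': np.max, 'column2': np.min})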
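The same dict form can also take a list of functions per column, which covers the case of more than one operation on the same column. This is a minimal sketch: the sample data and the placeholder column names are assumptions, and "mean" is added only to show a second function.

```python
import pandas as pd

# Hypothetical data using placeholder column names
df = pd.DataFrame({
    "something": ["a", "a", "c", "c"],
    "column1": [1, 7, 2, 3],
    "column2": [5, 9, 10, 2],
})

# A list of function names per column yields two-level column labels
summary = df.groupby("something").agg({
    "column1": ["max", "mean"],
    "column2": ["min"],
})

# Flatten ('column1', 'max') into 'column1_max', and so on
summary.columns = ["_".join(col) for col in summary.columns]

print(summary)
```

Flattening the column labels is optional; it just makes the result easier to use in later steps.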

