Best way (run time) to aggregate (calculate the ratio of) sum to total count based on groupby

若如初见. Submitted on 2019-12-11 12:57:20

Question


I'm trying to find the ratio of approved applications (identified by the flag 1, otherwise 0) to total applications for each person (Cust_ID). I have implemented this logic with the following code, but it takes about 10 minutes to compute for 1.6M records. Is there a faster way to perform the same operation?
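Written out, the quantity wanted for each customer c is the number of that customer's rows with flag 1 divided by the number of that customer's rows. Using illustrative notation (not from the question) where s_i is the STATUS_Approved value of row i and |c| is the number of applications of customer c, the numerator is the approved count because the flag is 1 for approved and 0 otherwise, and a sum divided by a count is by definition the group mean:

$$
r_c = \frac{\sum_{i \in c} s_i}{|c|} = \frac{1}{|c|}\sum_{i \in c} s_i = \bar{s}_c
$$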

# Finding ratio of approved out of total applications
df_approved_ratio = df.groupby('Cust_ID').apply(lambda x:x['STATUS_Approved'].sum()/len(x))
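For comparison, the same ratio can be expressed without a per-customer Python lambda by dividing two built-in groupby reductions (approved count and row count). This is a minimal sketch assuming a DataFrame with the question's Cust_ID and STATUS_Approved columns; the synthetic data (row count, number of customers, random seed) is hypothetical and only serves to check that both expressions agree.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data: 10,000 rows over up to 500 customers,
# with a random 0/1 approval flag (not the asker's real data).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'Cust_ID': rng.integers(0, 500, size=10_000),
    'STATUS_Approved': rng.integers(0, 2, size=10_000),
})

g = df.groupby('Cust_ID')

# Question's approach: the lambda is called once per customer.
# Selecting only STATUS_Approved keeps the grouping column out of x.
ratio_apply = g[['STATUS_Approved']].apply(
    lambda x: x['STATUS_Approved'].sum() / len(x))

# Vectorized approach: approved count per customer / rows per customer.
ratio_vec = g['STATUS_Approved'].sum() / g.size()

# Both Series are indexed by Cust_ID in sorted order, so compare values directly.
print(np.allclose(ratio_apply.to_numpy(dtype=float), ratio_vec.to_numpy(dtype=float)))
```

On large frames the per-group function call is the usual bottleneck of apply, which is why built-in reductions such as sum, size and mean tend to scale much better.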

Answer 1:


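I think you need to aggregate with mean. Because STATUS_Approved is 1 for approved and 0 otherwise, the group mean is exactly the number of approved applications divided by the number of applications, and it is computed by a built-in groupby reduction instead of calling a Python lambda once per customer: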

import pandas as pd

df = pd.DataFrame({'STATUS_Approved':[0,1,0,0,1,1],
                   'Cust_ID':list('aaabbb')})

print (df)
   STATUS_Approved Cust_ID
0                0       a
1                1       a
2                0       a
3                0       b
4                1       b
5                1       b

df_approved_ratio = df.groupby('Cust_ID')['STATUS_Approved'].mean()
print (df_approved_ratio)
Cust_ID
a    0.333333
b    0.666667
Name: STATUS_Approved, dtype: float64

print (df.groupby('Cust_ID').apply(lambda x:x['STATUS_Approved'].sum()/len(x)))
Cust_ID
a    0.333333
b    0.666667
dtype: float64
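If the approved count and the total are wanted alongside the ratio, named aggregation can return all of them in one call. One caveat worth checking on real data: sum and mean skip missing values, while len(x) and size count every row, so mean only matches the question's sum/len formula when STATUS_Approved has no NaN. The sketch below reuses the answer's toy data plus a hypothetical customer c with one missing flag.

```python
import numpy as np
import pandas as pd

# The answer's toy data, plus a hypothetical customer 'c' with one missing flag.
df = pd.DataFrame({'STATUS_Approved': [0, 1, 0, 0, 1, 1, 1, np.nan],
                   'Cust_ID': list('aaabbbcc')})

# Named aggregation: approved count, row count and ratio in one call.
summary = df.groupby('Cust_ID').agg(
    approved=('STATUS_Approved', 'sum'),  # NaN is skipped
    total=('STATUS_Approved', 'size'),    # counts every row, NaN included
    ratio=('STATUS_Approved', 'mean'),    # NaN skipped: divides by non-null rows only
)

# Same formula as the question (sum / len): missing flags count in the denominator.
summary['ratio_all_rows'] = summary['approved'] / summary['total']
print(summary)
```

For customers a and b both ratio columns agree, but for c the mean gives 1.0 while sum/size gives 0.5; which one is appropriate depends on whether a missing flag should count as an unapproved application.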


Source: https://stackoverflow.com/questions/51519379/best-wayrun-time-to-aggregate-calculate-ratio-of-sum-to-total-count-based-on
