I have a pandas DataFrame which details online activities in terms of "clicks" during a user session. There are as many as 50,000 unique users, and the dataframe has around 1
The first thing to do is filter out session dates that precede the registration date, then group on User_ID and sum the clicks.
gb = (df[df.Session >= df.Registration]
      .groupby('User_ID')
      .agg(Total_Clicks=('clicks', 'sum')))
>>> gb
         Total_Clicks
User_ID
1987293             1
2234214             7
2349876             2
9874452             2
For the use case you mentioned, this should scale well; it depends, of course, on your available memory.
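A minimal self-contained sketch of the same approach, using made-up data shaped like the question describes (the column names and values here are assumptions for illustration):

```python
import pandas as pd

# Hypothetical click log: one row per click event, with the user's
# registration date and the session timestamp.
df = pd.DataFrame({
    'User_ID': [1987293, 2234214, 2234214, 2349876, 2349876, 9874452],
    'Registration': pd.to_datetime(['2020-01-05'] * 6),
    'Session': pd.to_datetime(['2020-01-06', '2020-01-04', '2020-01-07',
                               '2020-01-08', '2020-01-09', '2020-01-10']),
    'clicks': [1, 3, 7, 1, 1, 2],
})

# Keep only clicks that happened on or after registration,
# then sum clicks per user using named aggregation.
gb = (df[df.Session >= df.Registration]
      .groupby('User_ID')
      .agg(Total_Clicks=('clicks', 'sum')))

print(gb)
```

Note that the second row for user 2234214 (session 2020-01-04, before registration) is dropped by the boolean filter, so only the post-registration clicks are counted.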