How to sum in pandas by unique index in several columns?

耶瑟儿~ 2021-02-04 09:32

I have a pandas DataFrame which details online activities in terms of "clicks" during a user session. There are as many as 50,000 unique users, and the dataframe has around 1

3 answers
  •  挽巷 (original poster)
    2021-02-04 10:30

    The first thing to do is filter out the sessions whose dates precede the registration date, then group on User_ID and sum the clicks.

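    # Keep sessions on or after registration, then sum clicks per user.
    # Named aggregation replaces the dict-renaming form, which pandas 1.0 removed.
    gb = (df[df.Session >= df.Registration]
          .groupby('User_ID')
          .clicks.agg(Total_Clicks='sum'))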
    
    >>> gb
             Total_Clicks
    User_ID              
    1987293             1
    2234214             7
    2349876             2
    9874452             2
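
    As a self-contained illustration of the filter-then-group step, here is a minimal sketch on made-up data. The column names (User_ID, Registration, Session, clicks) follow the snippet above, but the values are hypothetical and are not the data from the question.

```python
import pandas as pd

# Hypothetical sample: two users, some sessions dated before registration.
df = pd.DataFrame({
    "User_ID": [1, 1, 1, 2, 2],
    "Registration": pd.to_datetime(
        ["2021-01-10", "2021-01-10", "2021-01-10", "2021-01-05", "2021-01-05"]),
    "Session": pd.to_datetime(
        ["2021-01-08", "2021-01-10", "2021-01-12", "2021-01-04", "2021-01-06"]),
    "clicks": [5, 2, 3, 4, 1],
})

# Step 1: drop sessions that happened before the user's registration date.
on_or_after = df[df["Session"] >= df["Registration"]]

# Step 2: sum the remaining clicks per user, naming the output column.
totals = on_or_after.groupby("User_ID")["clicks"].agg(Total_Clicks="sum")
print(totals)
```

    Users whose sessions all precede registration drop out of the result entirely, because the filter removes their rows before grouping.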
    

    For the use case you mentioned, I believe this is scalable. It always depends, of course, on your available memory.
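
    To get a rough idea of whether a frame fits in memory, one option is to measure its footprint directly. The sketch below uses a tiny hypothetical frame; on real data you would call the same method on your own DataFrame.

```python
import pandas as pd

# Hypothetical stand-in for the real click log.
df = pd.DataFrame({"User_ID": [1, 1, 2], "clicks": [5, 2, 3]})

# deep=True also counts the contents of object (e.g. string) columns.
bytes_used = int(df.memory_usage(deep=True).sum())
print(f"{bytes_used / 1e6:.6f} MB")
```

    The boolean filter creates a copy of the surviving rows, so peak usage during the operation can be noticeably higher than this figure.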
