Groupby first two earliest dates, then average time between first two dates - pandas

闹比i 2021-01-27 17:43

I'm hoping to groupby users and find the first two uploads. I've figured out how to get the first date via minimum, but I'm having trouble getting that second upload date. Th

3 Answers
  • 2021-01-27 18:25

    Since the other answers explain pretty well how to achieve this, I'll give you a one-liner for a change:

     In [1]: df.groupby('User_ID').apply(lambda g: g.sort_values('Date_Uploaded')['Date_Uploaded'][:2].diff()).mean()
     Out[1]: Timedelta('21 days 12:00:00')
    
  • 2021-01-27 18:26

    Sort, calculate the difference between consecutive rows, then use groupby + nth(1) to get the difference between each user's first two uploads, if it exists (users with only one date will not show up).

    import pandas as pd
    
    df['Date_Uploaded'] = pd.to_datetime(df.Date_Uploaded)
    df = df.sort_values(['User_ID', 'Date_Uploaded'])
    
    df.Date_Uploaded.diff().groupby(df.User_ID).nth(1)
    
    #User_ID
    #abc123   36 days
    #efg123    7 days
    #Name: Date_Uploaded, dtype: timedelta64[ns]
    

    If you just want the average then average that series:

    df.Date_Uploaded.diff().groupby(df.User_ID).nth(1).mean()
    #Timedelta('21 days 12:00:00')
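
A note on versions: as far as I know, pandas 2.0 and later treat `nth` as a filter that keeps the original row index, so the result of `nth(1)` may no longer be labelled by `User_ID` as in the output above. The sketch below is one possible variant that avoids `nth` by combining a per-group `diff` with `cumcount`. The sample frame is an assumption: its four earliest rows match the dates shown in this thread, while the later `abc123` upload and the single-upload user `xyz789` are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; only the 36-day and 7-day gaps come from the thread.
df = pd.DataFrame({
    "User_ID": ["abc123", "abc123", "abc123", "efg123", "efg123", "xyz789"],
    "Date_Uploaded": ["2018-10-27", "2018-11-15", "2018-09-21",
                      "2018-08-01", "2018-07-25", "2018-06-01"],
})
df["Date_Uploaded"] = pd.to_datetime(df["Date_Uploaded"])

# Sort within each user and index the dates by User_ID.
dates = (
    df.sort_values(["User_ID", "Date_Uploaded"])
      .set_index("User_ID")["Date_Uploaded"]
)

# Difference to the previous upload of the same user (NaT for each first upload).
gap = dates.groupby(level=0).diff()

# cumcount() == 1 marks each user's second upload; users with a single
# upload have no such row and drop out.
first_gap = gap[dates.groupby(level=0).cumcount() == 1]

average_gap = first_gap.mean()
```

Here `first_gap` is indexed by `User_ID` regardless of how `nth` behaves in the installed version, and `average_gap` is the mean time between the first two uploads.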
    
  • 2021-01-27 18:27

    Using sort_values + head

    df.sort_values('Date_Uploaded').groupby('User_ID').head(2)
    Out[152]: 
      Date_Uploaded User_ID Display_Status
    6    2018-07-25  efg123        Pending
    5    2018-08-01  efg123        Pending
    3    2018-09-21  abc123        Pending
    0    2018-10-27  abc123        Cleared
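
The `head(2)` result still has to be reduced to one gap per user before averaging. A minimal sketch of one way to do that follows, assuming the same column names as above. The sample frame is partly hypothetical: the four earliest rows match the output shown, and the extra `abc123` row and the user `xyz789` are made up to show how later uploads and single-upload users are handled.

```python
import pandas as pd

# Hypothetical sample data; only the four earliest rows appear in the thread.
df = pd.DataFrame({
    "User_ID": ["abc123", "abc123", "abc123", "efg123", "efg123", "xyz789"],
    "Date_Uploaded": ["2018-10-27", "2018-11-15", "2018-09-21",
                      "2018-08-01", "2018-07-25", "2018-06-01"],
})
df["Date_Uploaded"] = pd.to_datetime(df["Date_Uploaded"])

# Keep each user's two earliest uploads.
first_two = df.sort_values("Date_Uploaded").groupby("User_ID").head(2)

# With at most two rows per user, max - min is the gap between them.
grouped = first_two.groupby("User_ID")["Date_Uploaded"]
gaps = grouped.max() - grouped.min()

# Drop users with a single upload, whose "gap" would otherwise be zero.
gaps = gaps[grouped.size() > 1]

average_gap = gaps.mean()
```

Filtering on the group size matters here: without it, users with only one upload would contribute a zero-length gap and pull the average down.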
    