I'm hoping to groupby users and find the first two uploads. I've figured out how to get the first date via minimum, but I'm having trouble getting that second upload date. Th
Since the other answers explain pretty well how to achieve this, I'll give you a one-liner for a change:
In [1]: df.groupby('User_ID').apply(lambda g: g.sort_values('Date_Uploaded')['Date_Uploaded'][:2].diff()).mean()
Out[1]: Timedelta('21 days 12:00:00')
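For anyone who wants to run this, here is a minimal sketch of the same idea, unpacked into steps. The question's full data isn't shown, so the frame below is hypothetical: the two earliest rows per user match the output posted in this thread, and the rest (including the single-upload user xyz789) are made up. It also selects the column before apply and uses .iloc for the positional slice, which is my variation rather than the exact one-liner.

```python
import pandas as pd

# Hypothetical sample data: only the two earliest uploads of abc123 and efg123
# come from the thread; every other row is invented for illustration.
df = pd.DataFrame({
    "Date_Uploaded": ["2018-10-27", "2018-11-15", "2018-12-01", "2018-09-21",
                      "2018-08-20", "2018-08-01", "2018-07-25", "2018-06-05"],
    "User_ID": ["abc123", "abc123", "abc123", "abc123",
                "efg123", "efg123", "efg123", "xyz789"],
})
df["Date_Uploaded"] = pd.to_datetime(df["Date_Uploaded"])

# Per user: sort the dates, keep the two earliest, and take their difference.
# diff() leaves NaT in the first position, so each user contributes at most
# one real Timedelta (and a user with a single upload contributes only NaT).
per_user = df.groupby("User_ID")["Date_Uploaded"].apply(
    lambda s: s.sort_values().iloc[:2].diff()
)

# mean() skips NaT, so this averages the first-to-second gaps only.
mean_gap = per_user.mean()
print(mean_gap)
```

With this sample data the gaps are 36 days and 7 days, so the mean works out to 21 days 12:00:00, the same figure as above.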
sort, calculate the difference, and then groupby + nth(1) to get the difference between the first two uploads, if it exists (users with only one date will not show up).
import pandas as pd
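# Parse the dates, then sort so each user's uploads are in chronological order
df['Date_Uploaded'] = pd.to_datetime(df.Date_Uploaded)
df = df.sort_values(['User_ID', 'Date_Uploaded'])
# Within each user, row 1 of the diff holds (second upload - first upload)
df.Date_Uploaded.diff().groupby(df.User_ID).nth(1)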
#User_ID
#abc123 36 days
#efg123 7 days
#Name: Date_Uploaded, dtype: timedelta64[ns]
If you just want the average then average that series:
df.Date_Uploaded.diff().groupby(df.User_ID).nth(1).mean()
#Timedelta('21 days 12:00:00')
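One caveat, as far as I know: from pandas 2.0 on, nth behaves like a filter and keeps the original row index, so the result may not be labelled by User_ID the way the output above is. If you want the User_ID labels regardless of version, a rough alternative is to pick the second row per user with cumcount. This is only a sketch on hypothetical data (the two earliest rows per user match the thread, the rest are invented).

```python
import pandas as pd

# Hypothetical sample data; xyz789 has a single upload on purpose.
df = pd.DataFrame({
    "Date_Uploaded": ["2018-10-27", "2018-11-15", "2018-12-01", "2018-09-21",
                      "2018-08-20", "2018-08-01", "2018-07-25", "2018-06-05"],
    "User_ID": ["abc123", "abc123", "abc123", "abc123",
                "efg123", "efg123", "efg123", "xyz789"],
})
df["Date_Uploaded"] = pd.to_datetime(df["Date_Uploaded"])
df = df.sort_values(["User_ID", "Date_Uploaded"])

# Difference to the previous upload within each user (first row per user is NaT).
gap = df.groupby("User_ID")["Date_Uploaded"].diff()

# cumcount() numbers each user's rows 0, 1, 2, ...; position 1 is the second upload.
is_second = df.groupby("User_ID").cumcount() == 1

# Keep only the second-upload rows and label them by user.
second_gap = df[is_second].assign(gap=gap[is_second]).set_index("User_ID")["gap"]

print(second_gap)
print(second_gap.mean())
```

Users with a single upload never reach position 1, so they drop out here just as they do with nth(1).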
Using sort_values + head:
df.sort_values('Date_Uploaded').groupby('User_ID').head(2)
Out[152]:
Date_Uploaded User_ID Display_Status
6 2018-07-25 efg123 Pending
5 2018-08-01 efg123 Pending
3 2018-09-21 abc123 Pending
0 2018-10-27 abc123 Cleared
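If you then want the gap between those two rows, one possible follow-up (my own sketch, not part of the answer above) is to aggregate the head(2) result per user and subtract. The data is hypothetical apart from the four rows shown in the output; the count column is only there to drop users who have a single upload.

```python
import pandas as pd

# Hypothetical sample data; the four rows printed above are included,
# the others are invented for illustration.
df = pd.DataFrame({
    "Date_Uploaded": ["2018-10-27", "2018-11-15", "2018-12-01", "2018-09-21",
                      "2018-08-20", "2018-08-01", "2018-07-25", "2018-06-05"],
    "User_ID": ["abc123", "abc123", "abc123", "abc123",
                "efg123", "efg123", "efg123", "xyz789"],
})
df["Date_Uploaded"] = pd.to_datetime(df["Date_Uploaded"])

# The two earliest uploads per user (one row for users with a single upload).
first_two = df.sort_values("Date_Uploaded").groupby("User_ID").head(2)

# Earliest date, second date, and how many rows each user actually has.
summary = first_two.groupby("User_ID")["Date_Uploaded"].agg(["min", "max", "count"])

# Only users with two uploads have a meaningful gap.
summary = summary[summary["count"] == 2]
gap = summary["max"] - summary["min"]

print(gap)
```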