Python newbie here, this is my first question. I tried to find a solution on similar SO questions, like this one, this one, and also this one, but I think my problem is differe
Your intuition seems fine to me, but you can't apply it this way, since your dataframe foo doesn't have the same size as your groupby dataframe (one row per session instead of one row per record). What you could do is map the values like this:
foo['last'] = foo.sess_id.map(foo.groupby('sess_id').DATE.max())
foo['first'] = foo.sess_id.map(foo.groupby('sess_id').DATE.min())
But I don't think that's necessary; you can just use the groupby dataframe as it is.
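To make the size mismatch concrete, here is a minimal sketch on a made-up four-row frame (the data is hypothetical, not taken from your question). It shows that the groupby result has one row per session, and that map broadcasts it back to one value per row. The transform call at the end is an alternative I'm adding for comparison; on this toy data it gives the same column.

```python
import pandas as pd

# Hypothetical toy data, only for illustration
foo = pd.DataFrame({
    "DATE": pd.to_datetime([
        "2018-01-01 00:19:01",
        "2018-01-01 00:19:05",
        "2018-01-01 00:21:07",
        "2018-01-01 00:22:07",
    ]),
    "sess_id": ["a", "b", "a", "b"],
})

# One row per session, so it cannot be assigned to foo directly
per_session_max = foo.groupby("sess_id").DATE.max()
print(len(foo), len(per_session_max))

# map looks up each row's sess_id in the per-session result
foo["last"] = foo.sess_id.map(per_session_max)

# Alternative: transform returns a result aligned with foo's index
last_via_transform = foo.groupby("sess_id").DATE.transform("max")
print(foo)
```

Both approaches give a column of the same length as foo, which is what the direct assignment was missing.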
One way to solve your problem is to look for the missing values in the sess_id column and apply a custom function to the corresponding dates:
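def my_custom_function(time):
    # Sessions whose (min, max) interval strictly contains this timestamp
    current_sessions = my_agg.loc[(my_agg['min'] < time) & (my_agg['max'] > time)]
    count = len(current_sessions)
    if count == 0:
        return 0    # no session is active at that time
    if count > 1:
        return -99  # ambiguous: several sessions overlap
    return current_sessions.index[0]

# First and last timestamp of each session
my_agg = foo.groupby('sess_id').DATE.agg(['min', 'max'])
missing = foo.sess_id.isnull()
foo.loc[missing, 'sess_id'] = foo.loc[missing, 'DATE'].apply(my_custom_function)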
Output:
                  DATE sess_id
0  2018-01-01 00:19:01       a
1  2018-01-01 00:19:05       b
2  2018-01-01 00:21:07       a
3  2018-01-01 00:22:07       b
4  2018-01-01 00:25:09       c
5  2018-01-01 00:25:11     -99
6  2018-01-01 00:27:28       c
7  2018-01-01 00:29:29       a
8  2018-01-01 00:30:35       b
9  2018-01-01 00:31:16       b
10 2018-01-01 00:35:22       0
I think it performs what you are looking for, though the output you posted in your question seems to contain typos.
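If you want to run this end to end, here is a self-contained sketch. The input frame is reconstructed from the output above, under the assumption that rows 5 and 10 were the ones with a missing sess_id; adjust it if your real data differs.

```python
import numpy as np
import pandas as pd

# Input reconstructed from the output above; rows 5 and 10 are
# assumed to be the ones with a missing sess_id.
foo = pd.DataFrame({
    "DATE": pd.to_datetime([
        "2018-01-01 00:19:01", "2018-01-01 00:19:05", "2018-01-01 00:21:07",
        "2018-01-01 00:22:07", "2018-01-01 00:25:09", "2018-01-01 00:25:11",
        "2018-01-01 00:27:28", "2018-01-01 00:29:29", "2018-01-01 00:30:35",
        "2018-01-01 00:31:16", "2018-01-01 00:35:22",
    ]),
    "sess_id": ["a", "b", "a", "b", "c", np.nan, "c", "a", "b", "b", np.nan],
})

# First and last timestamp per session (rows with NaN sess_id are skipped)
my_agg = foo.groupby("sess_id").DATE.agg(["min", "max"])


def my_custom_function(time):
    # Sessions whose (min, max) interval strictly contains this timestamp
    current_sessions = my_agg.loc[(my_agg["min"] < time) & (my_agg["max"] > time)]
    count = len(current_sessions)
    if count == 0:
        return 0    # no session is active at that time
    if count > 1:
        return -99  # ambiguous: several sessions overlap
    return current_sessions.index[0]


missing = foo.sess_id.isnull()
foo.loc[missing, "sess_id"] = foo.loc[missing, "DATE"].apply(my_custom_function)
print(foo)
```

Row 5 falls inside the intervals of a, b and c at once, hence the -99, while row 10 comes after every session has ended, hence the 0.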