Pandas fill missing values of a column based on the datetime values of another column

后端未结

关注

 1  1551

Python newbie here, this is my first question. I tried to find a solution on similar SO questions, like this one, this one, and also this one, but I think my problem is differe

相关标签:

1条回答

我在风中等你

2021-01-25 12:03

Your intuition seems fine by me, but you can't apply it this way since your dataframe foo doens't have the same size as your groupby dataframe. What you could do is map the values like this:

foo['last'] = foo.sess_id.map(foo.groupby('sess_id').DATE.max())
foo['first'] = foo.sess_id.map(foo.groupby('sess_id').DATE.min())

But I don't think it's necessary, you can just use the groupby dataframe as such.

A way to solve your problem could be to look for the missing values in sess_id column, and apply a custom function to the corresponding dates:

def my_custom_function(time):
    current_sessions = my_agg.loc[(my_agg['min']<time) & (my_agg['max']>time)]
    count = len(current_sessions)
    if count == 0:
        return 0
    if count > 1:
        return -99
    return current_sessions.index[0]

my_agg = foo.groupby('sess_id').DATE.agg([min,max])
foo.loc[foo.sess_id.isnull(),'sess_id'] = foo.loc[foo.sess_id.isnull(),'DATE'].apply(my_custom_function)

Output:

    DATE                    sess_id
0   2018-01-01 00:19:01     a
1   2018-01-01 00:19:05     b
2   2018-01-01 00:21:07     a
3   2018-01-01 00:22:07     b
4   2018-01-01 00:25:09     c
5   2018-01-01 00:25:11     -99
6   2018-01-01 00:27:28     c
7   2018-01-01 00:29:29     a
8   2018-01-01 00:30:35     b
9   2018-01-01 00:31:16     b
10  2018-01-01 00:35:22     0

I think it performs what you are looking for, though the output you posted in your question seems to contain typos.

0 讨论(0)