Pandas fill missing values of a column based on the datetime values of another column

妖精的绣舞 提交于 2019-12-02 06:25:47

Your intuition seems fine by me, but you can't apply it this way since your dataframe foo doens't have the same size as your groupby dataframe. What you could do is map the values like this:

foo['last'] = foo.sess_id.map(foo.groupby('sess_id').DATE.max())
foo['first'] = foo.sess_id.map(foo.groupby('sess_id').DATE.min())

But I don't think it's necessary, you can just use the groupby dataframe as such.

A way to solve your problem could be to look for the missing values in sess_id column, and apply a custom function to the corresponding dates:

def my_custom_function(time):
    current_sessions = my_agg.loc[(my_agg['min']<time) & (my_agg['max']>time)]
    count = len(current_sessions)
    if count == 0:
        return 0
    if count > 1:
        return -99
    return current_sessions.index[0]

my_agg = foo.groupby('sess_id').DATE.agg([min,max])
foo.loc[foo.sess_id.isnull(),'sess_id'] = foo.loc[foo.sess_id.isnull(),'DATE'].apply(my_custom_function)

Output:

    DATE                    sess_id
0   2018-01-01 00:19:01     a
1   2018-01-01 00:19:05     b
2   2018-01-01 00:21:07     a
3   2018-01-01 00:22:07     b
4   2018-01-01 00:25:09     c
5   2018-01-01 00:25:11     -99
6   2018-01-01 00:27:28     c
7   2018-01-01 00:29:29     a
8   2018-01-01 00:30:35     b
9   2018-01-01 00:31:16     b
10  2018-01-01 00:35:22     0

I think it performs what you are looking for, though the output you posted in your question seems to contain typos.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!