Pandas fill missing values of a column based on the datetime values of another column

一向 2021-01-25 11:39

Python newbie here; this is my first question. I tried to find a solution in similar SO questions, like this one, this one, and also this one, but I think my problem is different.

1 Answer
  • 2021-01-25 12:03

    Your intuition seems fine to me, but you can't apply it this way, because your dataframe foo doesn't have the same number of rows as the result of your groupby. What you could do is map the values like this:

    foo['last'] = foo.sess_id.map(foo.groupby('sess_id').DATE.max())
    foo['first'] = foo.sess_id.map(foo.groupby('sess_id').DATE.min())
    
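    An equivalent way to broadcast per-session aggregates back onto every row is `groupby(...).transform`, which returns a Series already aligned with the original index, so no separate `map` step is needed. The sketch below compares the two on a small hypothetical `foo`: only the column names `DATE` and `sess_id` come from the question, the timestamps are made up, and every row has a session id, so it does not depend on how a given pandas version handles missing group keys.

```python
import pandas as pd

# Hypothetical toy data: only the column names come from the question.
foo = pd.DataFrame({
    "DATE": pd.to_datetime([
        "2018-01-01 00:19:01", "2018-01-01 00:19:05",
        "2018-01-01 00:21:07", "2018-01-01 00:22:07",
    ]),
    "sess_id": ["a", "b", "a", "b"],
})

grouped = foo.groupby("sess_id").DATE

# Approach from the answer: aggregate per session, then map back by sess_id
last_via_map = foo.sess_id.map(grouped.max())
first_via_map = foo.sess_id.map(grouped.min())

# transform: one value per row, aligned with foo's index
foo["last"] = grouped.transform("max")
foo["first"] = grouped.transform("min")

print(foo)
```

    Both routes give the same columns; `transform` just skips the intermediate per-session Series.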

    But I don't think that's necessary; you can work with the grouped result directly.

    One way to solve your problem is to find the missing values in the sess_id column and apply a custom function to the corresponding dates:

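    # Per-session time span: one row per sess_id, with 'min' and 'max' columns
    my_agg = foo.groupby('sess_id').DATE.agg(['min', 'max'])

    def my_custom_function(time):
        # Sessions whose (min, max) interval strictly contains this timestamp
        current_sessions = my_agg.loc[(my_agg['min'] < time) & (my_agg['max'] > time)]
        count = len(current_sessions)
        if count == 0:
            return 0      # no session covers this time
        if count > 1:
            return -99    # ambiguous: several sessions overlap this time
        return current_sessions.index[0]  # exactly one session: use its id

    missing = foo.sess_id.isnull()
    foo.loc[missing, 'sess_id'] = foo.loc[missing, 'DATE'].apply(my_custom_function)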
    
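    Because `apply` rescans `my_agg` once per missing row, a vectorized variant may help on larger frames. The sketch below swaps in NumPy broadcasting (not the answer's code) to compare every missing timestamp against every session interval at once, keeping the same strict-containment rule and the answer's sentinels, 0 and -99. The toy data is hypothetical: session a spans 00:19:01 to 00:24:00 and b spans 00:22:00 to 00:25:00.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names match the question.
foo = pd.DataFrame({
    "DATE": pd.to_datetime([
        "2018-01-01 00:19:01",  # a
        "2018-01-01 00:20:00",  # missing, inside a only
        "2018-01-01 00:22:00",  # b
        "2018-01-01 00:23:00",  # missing, inside both a and b
        "2018-01-01 00:24:00",  # a
        "2018-01-01 00:25:00",  # b
        "2018-01-01 00:30:00",  # missing, inside no session
    ]),
    "sess_id": ["a", None, "b", None, "a", "b", None],
})

# Per-session (min, max) span; rows with a missing sess_id are dropped
my_agg = foo.groupby("sess_id").DATE.agg(["min", "max"])

missing = foo.sess_id.isna()
times = foo.loc[missing, "DATE"].to_numpy()   # shape (m,)
starts = my_agg["min"].to_numpy()             # shape (k,)
ends = my_agg["max"].to_numpy()               # shape (k,)

# inside[i, j] is True when missing time i lies strictly inside session j
inside = (starts[None, :] < times[:, None]) & (ends[None, :] > times[:, None])
count = inside.sum(axis=1)

# One match: that session's id; none: 0; several: -99
ids = my_agg.index.to_numpy()[inside.argmax(axis=1)]
filled = np.where(count == 1, ids, np.where(count == 0, 0, -99))

foo.loc[missing, "sess_id"] = filled
print(foo)
```

    The (m, k) boolean matrix trades memory for speed, so for very many missing rows and sessions the per-row `apply` may still be the safer choice.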

    Output:

        DATE                    sess_id
    0   2018-01-01 00:19:01     a
    1   2018-01-01 00:19:05     b
    2   2018-01-01 00:21:07     a
    3   2018-01-01 00:22:07     b
    4   2018-01-01 00:25:09     c
    5   2018-01-01 00:25:11     -99
    6   2018-01-01 00:27:28     c
    7   2018-01-01 00:29:29     a
    8   2018-01-01 00:30:35     b
    9   2018-01-01 00:31:16     b
    10  2018-01-01 00:35:22     0
    

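    In this output, 0 marks a timestamp that no session spans, and -99 marks one that falls inside several overlapping sessions. I think this does what you are looking for, although the expected output you posted in your question seems to contain typos.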
