Pandas: Average value for the past n days

隐瞒了意图╮ 2020-12-24 03:50

I have a Pandas data frame like this:

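import pandas as pd

test = pd.DataFrame({'Date': ['2016-04-01', '2016-04-01', '2016-04-02',
                              '2016-04-02', '2016-04-03', '2016-04-04',
                              '2016-04-05', '2016-04-06', '2016-04-06'],
                     'User': ['Mike', 'John', 'Mike', 'John', 'Mike',
                              'Mike', 'Mike', 'Mike', 'John'],
                     'Value': [1, 2, 1, 3, 4.5, 1, 2, 3, 6]})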
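The rest of the question text did not survive in this copy. Judging from the output both answers print, the goal appears to be a new column holding, for each row, the mean Value of the same user over the n calendar days before Date (current day excluded, NaN when there is nothing to average). Below is a deliberately slow, loop-based sketch of that assumed definition, handy as a reference for checking faster versions; the helper `past_n_day_average` is hypothetical and the sample data is copied from the answers' output.

```python
import pandas as pd

# Sample data copied from the rows printed in the answers (assumed to match the question).
test = pd.DataFrame({
    'Date': pd.to_datetime(['2016-04-01', '2016-04-01', '2016-04-02',
                            '2016-04-02', '2016-04-03', '2016-04-04',
                            '2016-04-05', '2016-04-06', '2016-04-06']),
    'User': ['Mike', 'John', 'Mike', 'John', 'Mike',
             'Mike', 'Mike', 'Mike', 'John'],
    'Value': [1, 2, 1, 3, 4.5, 1, 2, 3, 6],
})


def past_n_day_average(df, n):
    """Hypothetical reference helper: for each row, average the same user's
    Values dated in [Date - n days, Date); NaN when no such rows exist."""
    averages = []
    for _, row in df.iterrows():
        start = row['Date'] - pd.Timedelta(days=n)
        in_window = ((df['User'] == row['User'])
                     & (df['Date'] >= start)
                     & (df['Date'] < row['Date']))
        # The mean of an empty selection is NaN.
        averages.append(df.loc[in_window, 'Value'].mean())
    return pd.Series(averages, index=df.index)


test['Value_Average_past_2_days'] = past_n_day_average(test, 2)
print(test)
```

The loop is O(rows²), so it only suits small frames or tests; the answers below compute the same values with vectorized rolling windows.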
2 Answers
  • 2020-12-24 04:11
    import pandas as pd
    
    n = 2
    
    # Cast your dates as timestamps.
    test['Date'] = pd.to_datetime(test.Date)
    
    # Create a daily index spanning the range of the original dates. Naming it
    # 'Date' keeps that column name when the index is reset later on.
    idx = pd.date_range(test.Date.min(), test.Date.max(), freq='D', name='Date')
    
    # Pivot by dates and users, then insert the missing days as rows of NaN.
    df = test.pivot(index='Date', values='Value', columns='User').reindex(idx)
    >>> df.head(3)
    User        John  Mike
    Date                  
    2016-04-01   2.0   1.0
    2016-04-02   3.0   1.0
    2016-04-03   NaN   4.5
    
    # Shift by one day so each day's own value is excluded, take a rolling mean
    # over the previous n days, and reset the index. This uses the .rolling()
    # method (pandas 0.18.0+); the older pd.rolling_mean function has been removed.
    df2 = (df.shift().rolling(window=n, min_periods=1).mean()
           .reset_index()
           .drop_duplicates())
    
    # Melt the result back into the original form.
    df3 = (pd.melt(df2, id_vars='Date', var_name='User', value_name='Value')
           .sort_values(['Date', 'User'])
           .reset_index(drop=True))
    >>> df3.head()
            Date  User  Value
    0 2016-04-01  John    NaN
    1 2016-04-01  Mike    NaN
    2 2016-04-02  John    2.0
    3 2016-04-02  Mike    1.0
    4 2016-04-03  John    2.5
    
    # Merge the results back into the original dataframe.
    >>> test.merge(df3, on=['Date', 'User'], how='left', 
                   suffixes=['', '_Average_past_{0}_days'.format(n)])
    
            Date  User  Value  Value_Average_past_2_days
    0 2016-04-01  Mike    1.0                        NaN
    1 2016-04-01  John    2.0                        NaN
    2 2016-04-02  Mike    1.0                       1.00
    3 2016-04-02  John    3.0                       2.00
    4 2016-04-03  Mike    4.5                       1.00
    5 2016-04-04  Mike    1.0                       2.75
    6 2016-04-05  Mike    2.0                       2.75
    7 2016-04-06  Mike    3.0                       1.50
    8 2016-04-06  John    6.0                        NaN
    
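Why shift() comes before rolling(): on one user's gap-free daily series (Mike's values from the sample data, assumed already reindexed to one row per day), a 2-row window ending at day t covers t-1 and t, so the current day leaks into the average; after shift() it covers t-2 and t-1. Only because of the reindex does "2 rows" mean "2 calendar days" here.

```python
import pandas as pd

# Mike's values on consecutive days (assumed already reindexed to daily rows).
s = pd.Series([1.0, 1.0, 4.5, 1.0, 2.0, 3.0],
              index=pd.date_range('2016-04-01', periods=6, freq='D'))

n = 2
# Window ending at day t covers days t-1 and t, so it includes today.
with_today = s.rolling(window=n, min_periods=1).mean()
# shift() moves every value one day later, so the window covers t-2 and t-1.
past_only = s.shift().rolling(window=n, min_periods=1).mean()

print(pd.DataFrame({'Value': s, 'with_today': with_today,
                    'past_only': past_only}))
```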

    Summary

    import pandas as pd
    
    n = 2
    test['Date'] = pd.to_datetime(test.Date)
    idx = pd.date_range(test.Date.min(), test.Date.max(), freq='D', name='Date')
    df = test.pivot(index='Date', values='Value', columns='User').reindex(idx)
    df2 = (df.shift().rolling(window=n, min_periods=1).mean()
           .reset_index()
           .drop_duplicates())
    df3 = (pd.melt(df2, id_vars='Date', var_name='User', value_name='Value')
           .sort_values(['Date', 'User'])
           .reset_index(drop=True))
    test.merge(df3, on=['Date', 'User'], how='left',
               suffixes=['', '_Average_past_{0}_days'.format(n)])
    
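A different technique from the pivot and reindex above: pandas also supports time-based rolling windows. The sketch below assumes a reasonably recent pandas (1.x or later), in which `rolling('2D', closed='left')` over each user's Date-indexed values averages the rows dated in [Date - 2 days, Date). Missing days then need no reindexing, and the current day is excluded. The sample data is copied from the answers' output.

```python
import pandas as pd

# Sample data copied from the rows printed in the answers (assumed to match the question).
test = pd.DataFrame({
    'Date': pd.to_datetime(['2016-04-01', '2016-04-01', '2016-04-02',
                            '2016-04-02', '2016-04-03', '2016-04-04',
                            '2016-04-05', '2016-04-06', '2016-04-06']),
    'User': ['Mike', 'John', 'Mike', 'John', 'Mike',
             'Mike', 'Mike', 'Mike', 'John'],
    'Value': [1, 2, 1, 3, 4.5, 1, 2, 3, 6],
})

n = 2
col = f'Value_Average_past_{n}_days'

# For each user, average the rows dated in [Date - n days, Date);
# closed='left' drops the current date from its own window.
past = (test.set_index('Date')
            .groupby('User')['Value']
            .rolling(f'{n}D', closed='left')
            .mean()
            .rename(col)
            .reset_index())

# The rolling result is keyed by (User, Date); merge it back onto the original rows.
result = test.merge(past, on=['User', 'Date'], how='left')
print(result)
```

Time-based windows need each user's dates sorted in ascending order. If a user can have several rows on one date, they should all be averaged, whereas `pivot` raises an error on such duplicates.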
  • 2020-12-24 04:17

    I think you can first convert column Date with to_datetime, then add the missing days with groupby and resample, and finally apply rolling:

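    import pandas as pd
    
    test['Date'] = pd.to_datetime(test['Date'])
    
    # Resample each user's rows to one row per calendar day (missing days become NaN).
    df = test.groupby('User').apply(lambda x: x.set_index('Date').resample('1D').first())
    print(df)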
                     User  Value
    User Date                   
    John 2016-04-01  John    2.0
         2016-04-02  John    3.0
         2016-04-03   NaN    NaN
         2016-04-04   NaN    NaN
         2016-04-05   NaN    NaN
         2016-04-06  John    6.0
    Mike 2016-04-01  Mike    1.0
         2016-04-02  Mike    1.0
         2016-04-03  Mike    4.5
         2016-04-04  Mike    1.0
         2016-04-05  Mike    2.0
         2016-04-06  Mike    3.0
    
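    # Within each user, shift by one day (to exclude the current day) and average
    # the previous two days; transform() keeps the (User, Date) index intact.
    df1 = (df.groupby(level=0)['Value']
             .transform(lambda x: x.shift().rolling(min_periods=1, window=2).mean())
             .reset_index(name='Value_Average_Past_2_days'))
    
    print(df1)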
        User       Date  Value_Average_Past_2_days
    0   John 2016-04-01                        NaN
    1   John 2016-04-02                       2.00
    2   John 2016-04-03                       2.50
    3   John 2016-04-04                       3.00
    4   John 2016-04-05                        NaN
    5   John 2016-04-06                        NaN
    6   Mike 2016-04-01                        NaN
    7   Mike 2016-04-02                       1.00
    8   Mike 2016-04-03                       1.00
    9   Mike 2016-04-04                       2.75
    10  Mike 2016-04-05                       2.75
    11  Mike 2016-04-06                       1.50
    
    print(pd.merge(test, df1, on=['Date', 'User'], how='left'))
            Date  User  Value  Value_Average_Past_2_days
    0 2016-04-01  Mike    1.0                        NaN
    1 2016-04-01  John    2.0                        NaN
    2 2016-04-02  Mike    1.0                       1.00
    3 2016-04-02  John    3.0                       2.00
    4 2016-04-03  Mike    4.5                       1.00
    5 2016-04-04  Mike    1.0                       2.75
    6 2016-04-05  Mike    2.0                       2.75
    7 2016-04-06  Mike    3.0                       1.50
    8 2016-04-06  John    6.0                        NaN
    
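This answer hard-codes both the window (`window=2`) and the column name. Below is a sketch of the same resample, shift and rolling steps with n as a parameter. The helper `add_past_average` is hypothetical, and it swaps in `SeriesGroupBy.resample` and `transform` for the `apply` calls so the (User, Date) index is kept without extra reshaping. The sample data is copied from the answers' output.

```python
import pandas as pd

# Sample data copied from the rows printed in the answers (assumed to match the question).
test = pd.DataFrame({
    'Date': ['2016-04-01', '2016-04-01', '2016-04-02',
             '2016-04-02', '2016-04-03', '2016-04-04',
             '2016-04-05', '2016-04-06', '2016-04-06'],
    'User': ['Mike', 'John', 'Mike', 'John', 'Mike',
             'Mike', 'Mike', 'Mike', 'John'],
    'Value': [1, 2, 1, 3, 4.5, 1, 2, 3, 6],
})


def add_past_average(df, n):
    """Hypothetical helper: add each user's mean Value over the previous
    n calendar days (current day excluded) as a new column."""
    df = df.assign(Date=pd.to_datetime(df['Date']))
    col = f'Value_Average_Past_{n}_days'
    # One value per user per calendar day; days without rows become NaN.
    daily = (df.set_index('Date')
               .groupby('User')['Value']
               .resample('1D')
               .first())
    # Shift by one day so today is excluded, then average the last n days.
    past = (daily.groupby(level='User')
                 .transform(lambda s: s.shift().rolling(window=n, min_periods=1).mean())
                 .rename(col)
                 .reset_index())
    return df.merge(past, on=['User', 'Date'], how='left')


out2 = add_past_average(test, 2)
out3 = add_past_average(test, 3)
print(out3)
```

With n=2 this reproduces the column shown above; with n=3, for example, Mike's 2016-04-06 row averages the three previous days (4.5, 1 and 2) to 2.5.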