Resampling a pandas dataframe with multi-index containing timeseries

一个人的身影 2020-12-19 08:52

Apologies for creating what appears to be a duplicate of this question. I have a dataframe that is shaped more or less like the one below:

df_length = 240
d         


        
2 Answers
  • 2020-12-19 09:25

    First let's define a resampler function:

    def resampler(x):    
        return x.set_index('datetime').resample('D').mean().rolling(window=2).mean()
    

    Then we move the datetime level (index level 2) back into a column, group by job_id (index level 1), and apply the resampler function to each group:

    df.reset_index(level=2).groupby(level=1).apply(resampler)
    
    Out[657]: 
                              a         b
    job_id datetime                      
    job1   2017-06-23       NaN       NaN
           2017-06-24  0.053378  0.004727
           2017-06-25  0.265074  0.234081
           2017-06-26  0.192286  0.138148
    job2   2017-06-26       NaN       NaN
           2017-06-27 -0.016629 -0.041284
           2017-06-28 -0.028662  0.055399
           2017-06-29  0.113299 -0.204670
    job3   2017-06-29       NaN       NaN
           2017-06-30  0.233524 -0.194982
           2017-07-01  0.068839 -0.237573
           2017-07-02 -0.051211 -0.069917
    

    Let me know if this is what you are after.
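
For readers without the original dataframe, here is a minimal self-contained sketch of the same approach. The `toy` frame and its values are made up for illustration (they are not the asker's data); they are chosen so that the daily means are easy to check by hand. It assumes the same three-level index layout (integer position, job_id, datetime) as in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 2 jobs, 48 hourly rows each (two full days per job).
hours = pd.date_range('2017-06-23', periods=48, freq=pd.Timedelta(hours=1))
toy = pd.DataFrame({
    'job_id': ['job1'] * 48 + ['job2'] * 48,
    'datetime': list(hours) * 2,
    # Day 1 / day 2 values: job1 -> 1.0 / 3.0, job2 -> 10.0 / 30.0
    'a': [1.0] * 24 + [3.0] * 24 + [10.0] * 24 + [30.0] * 24,
})
# Mimic the three-level index (integer, job_id, datetime) from the question.
toy = toy.set_index(['job_id', 'datetime'], append=True)

def resampler(x):
    # Daily mean for one job, then a 2-day rolling mean within that job only.
    return x.set_index('datetime').resample('D').mean().rolling(window=2).mean()

# Move datetime back to a column, group by job_id (level 1), resample per job.
out = toy.reset_index(level=2).groupby(level=1).apply(resampler)
print(out)
```

Because the rolling mean runs inside `resampler`, the window restarts for every job, so the first day of each job is NaN and the second day is the average of that job's two daily means.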

  • 2020-12-19 09:27

    IIUC, you wish to group by job_id and (daily) datetimes, and wish to ignore the first level of the DataFrame index. Therefore, instead of grouping by

    ( [ level_values(i) for i in [0,1] ] + [ pd.Grouper(freq='D', level=2) ] )
    

    you'd want to groupby

    [df.index.get_level_values(1), pd.Grouper(freq='D', level=2)]
    

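    import numpy as np
    import pandas as pd
    np.random.seed(2017)
    
    # Build 240 hourly rows of random data starting on 23 June 2017.
    # (freq='H' is the hourly alias; pandas 2.2+ prefers lowercase 'h'.)
    df_length = 240
    df = pd.DataFrame(np.random.randn(df_length, 2), columns=['a', 'b'])
    df['datetime'] = pd.date_range('2017-06-23', periods=df_length, freq='H')
    
    # Assign 80 consecutive rows to each job: job1, then job2, then job3.
    unique_jobs = ['job1', 'job2', 'job3']
    job_id = [unique_jobs for i in range(df_length // len(unique_jobs))]
    df['job_id'] = sorted(val for sublist in job_id for val in sublist)
    
    # The index becomes (original integer index, job_id, datetime).
    df.set_index(['job_id', 'datetime'], append=True, inplace=True)
    
    # Group by job_id (level 1) and calendar day (level 2), ignoring level 0.
    grouped = df.groupby([df.index.get_level_values(1), pd.Grouper(freq='D', level=2)])
    # Daily means, then a 2-row rolling mean over the whole grouped result.
    result = grouped.mean().rolling(window=2).mean()
    
    print(result)
    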
    

    yields

                              a         b
    job_id datetime                      
    job1   2017-06-23       NaN       NaN
           2017-06-24 -0.203083  0.176141
           2017-06-25 -0.077083  0.072510
           2017-06-26 -0.237611 -0.493329
    job2   2017-06-26 -0.297775 -0.370543
           2017-06-27  0.005124  0.052603
           2017-06-28  0.226142 -0.015584
           2017-06-29 -0.065595  0.210628
    job3   2017-06-29 -0.186865  0.347683
           2017-06-30  0.051508  0.029909
           2017-07-01  0.005341  0.075378
           2017-07-02 -0.027131  0.132192
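
Note that in the output above the first rows of job2 and job3 are not NaN: `rolling(window=2)` is applied to the whole grouped frame, so the window at a job boundary averages the last day of the previous job with the first day of the next one. If the window should instead restart for each job (an assumption about what the asker wants), one option is to roll within each job_id group. The sketch below uses a small made-up `daily` frame standing in for `grouped.mean()`; the names `daily`, `across_jobs` and `per_job` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for grouped.mean(): daily means indexed by (job_id, datetime).
daily = pd.DataFrame(
    {'a': [1.0, 3.0, 10.0, 30.0]},
    index=pd.MultiIndex.from_product(
        [['job1', 'job2'], pd.to_datetime(['2017-06-23', '2017-06-24'])],
        names=['job_id', 'datetime'],
    ),
)

# Rolling over the whole frame: job2's first row mixes in job1's last day.
across_jobs = daily.rolling(window=2).mean()

# Rolling inside each job: the window restarts, so each job's first row is NaN.
per_job = daily.groupby(level='job_id').transform(
    lambda s: s.rolling(window=2).mean()
)

print(across_jobs)
print(per_job)
```

Applied to the code above, this would presumably read `grouped.mean().groupby(level='job_id').transform(lambda s: s.rolling(window=2).mean())`, since the grouped result carries the level names job_id and datetime.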
    