Python pandas time series interpolation and regularization

[愿得一人] 2021-02-02 13:38

I am using pandas for the first time. I have traffic data recorded at 5-minute intervals in CSV format:

...
2015-01-04 08:29:05,271238
2015-01-04 08:34:05,329285
2015-01-04 08         


        
1 answer
  •  闹比i (original poster)
    2021-02-02 14:09

    Change the -1s to NaNs:

    ts[ts==-1] = np.nan
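
    A non-mutating variant of the same step, as a minimal sketch: Series.mask returns a new Series with NaN wherever the condition holds and leaves the original untouched. The names raw and clean are illustrative, and treating -1 as the missing-value marker is an assumption taken from the step above.

```python
import numpy as np
import pandas as pd

# Illustrative readings; -1 is assumed to mark a missing measurement.
raw = pd.Series([271238, 329285, -1, 260260])

# mask() replaces entries where the condition is True with NaN
# and returns a new Series; `raw` itself is not modified.
clean = raw.mask(raw == -1)

print(clean)
```

    Because NaN is a float, the result is float64 even though the input held integers.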
    

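    Then resample the data to a 5-minute frequency:

    ts = ts.resample('5min').mean()
    

    Note that resample only groups the data into 5-minute bins; the .mean() call is what aggregates each bin, so if two measurements fall within the same 5-minute period, their values are averaged together. (Older pandas releases applied the mean implicitly and spelled the frequency '5T'.)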
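
    The binning behaviour can be checked on a tiny series. This is a sketch with made-up readings (the name demo and its values are hypothetical): two points share one 5-minute bin and the following bin is empty.

```python
import numpy as np
import pandas as pd

# Hypothetical readings: two fall in the 08:30 bin, none in the 08:35 bin.
idx = pd.to_datetime(['2015-01-04 08:31:00',
                      '2015-01-04 08:33:00',
                      '2015-01-04 08:41:00'])
demo = pd.Series([100.0, 200.0, 400.0], index=idx)

# Bins are labelled by their left edge: 08:30, 08:35, 08:40.
binned = demo.resample('5min').mean()

print(binned)
```

    The 08:30 bin holds the average of the two readings, and the empty 08:35 bin comes out as NaN, which is what the interpolation step then fills.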

    Finally, you could linearly interpolate the time series according to the time:

    ts = ts.interpolate(method='time')
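
    To see what method='time' changes, here is a small sketch on deliberately uneven timestamps (the series gappy is hypothetical). With method='linear' the points are treated as equally spaced, while method='time' weights by the elapsed time between index values.

```python
import numpy as np
import pandas as pd

# Hypothetical series: the missing point is 1 minute after the first
# reading and 9 minutes before the last one.
idx = pd.to_datetime(['2015-01-04 08:00:00',
                      '2015-01-04 08:01:00',
                      '2015-01-04 08:10:00'])
gappy = pd.Series([0.0, np.nan, 100.0], index=idx)

by_time = gappy.interpolate(method='time')        # uses the DatetimeIndex
by_position = gappy.interpolate(method='linear')  # ignores the index spacing

print(by_time.iloc[1], by_position.iloc[1])
```

    On a regularly resampled series the two methods give the same values; the difference only matters when the index spacing is irregular.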
    

    Since your data already appears to have roughly a 5-minute frequency, you may need to resample at a higher frequency (for example, every minute) so that cubic or spline interpolation can smooth out the curve:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    
    # -1 marks a missing reading
    values = [271238, 329285, -1, 260260, 263711]
    timestamps = pd.to_datetime(['2015-01-04 08:29:05',
                                 '2015-01-04 08:34:05',
                                 '2015-01-04 08:39:05',
                                 '2015-01-04 08:44:05',
                                 '2015-01-04 08:49:05'])
    
    # Store the values as floats so that NaN can be assigned directly
    ts = pd.Series(values, index=timestamps, dtype=float)
    ts[ts == -1] = np.nan
    # Upsample to 1-minute bins; bins without a reading become NaN
    ts = ts.resample('min').mean()
    
    # method='spline' requires SciPy to be installed
    ts.interpolate(method='spline', order=3).plot(label='spline')
    ts.interpolate(method='time').plot(label='time')
    plt.legend(loc='best')
    plt.show()
    
