Python pandas time series interpolation and regularization

陌路散爱 submitted on 2019-12-02 22:56:03

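Replace the -1 placeholders with NaN so that pandas treats them as missing values. If the series has an integer dtype, cast it to float first (ts = ts.astype(float)), since newer pandas versions may refuse to write NaN into an integer series:

ts[ts==-1] = np.nan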
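As a minimal sketch of the same step without in-place assignment, Series.mask returns a new series with NaN wherever the condition holds. The sample values and the names cleaned and n_missing are made up for illustration and are not from the question:

```python
import pandas as pd

# Hypothetical readings; -1.0 marks a missing measurement.
ts = pd.Series([10.0, -1.0, 30.0],
               index=pd.to_datetime(['2015-01-04 08:29:05',
                                     '2015-01-04 08:34:05',
                                     '2015-01-04 08:39:05']))

# mask() puts NaN wherever the condition is True and leaves ts untouched.
cleaned = ts.mask(ts == -1)

# Count how many placeholders were turned into NaN.
n_missing = int(cleaned.isna().sum())
```

This leaves the original series unchanged, which can be handy if you want to compare the raw and cleaned data later.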

Then resample the data to have a 5 minute frequency.

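ts = ts.resample('5min').mean()

Note that .mean() averages any measurements that fall within the same 5-minute period. Older pandas versions applied this mean by default; current versions return a Resampler object from resample, so the aggregation has to be called explicitly.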
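To see the averaging in action, here is a small sketch with made-up readings (not the data from the question) in which two measurements land in the same 5-minute bin and one bin stays empty:

```python
import pandas as pd

# Hypothetical readings: the first two fall in the 08:30-08:35 bin,
# nothing falls in 08:35-08:40, and the last one falls in 08:40-08:45.
ts = pd.Series([10.0, 20.0, 40.0],
               index=pd.to_datetime(['2015-01-04 08:31:00',
                                     '2015-01-04 08:34:00',
                                     '2015-01-04 08:41:00']))

# Each bin is labelled by its left edge; empty bins come out as NaN.
regular = ts.resample('5min').mean()
```

Here regular has three rows labelled 08:30, 08:35 and 08:40: the first holds the mean of 10 and 20, the second is NaN, and the third is 40. Note that the labels snap to the bin edges, so the original timestamps are not preserved.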

Finally, you could linearly interpolate the time series according to the time:

ts = ts.interpolate(method='time')
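The difference between method='time' and the default method='linear' only shows up when the index is unevenly spaced. A minimal sketch with made-up values (the names by_time and by_position are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical series with a gap 1 minute after the first point
# and 9 minutes before the last one.
ts = pd.Series([0.0, np.nan, 10.0],
               index=pd.to_datetime(['2015-01-04 08:00:00',
                                     '2015-01-04 08:01:00',
                                     '2015-01-04 08:10:00']))

# 'time' weights the fill by elapsed time between timestamps.
by_time = ts.interpolate(method='time')

# 'linear' ignores the index and treats the points as equally spaced.
by_position = ts.interpolate(method='linear')
```

With these values, by_time fills the gap with roughly 1.0 (one tenth of the way in time), while by_position fills it with 5.0 (halfway by position). On an index that has already been resampled to a regular frequency, the two methods should agree.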

Since it looks like your data already has roughly a 5-minute frequency, you might need to resample at a shorter frequency so cubic or spline interpolation can smooth out the curve:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

values = [271238, 329285, -1, 260260, 263711]
timestamps = pd.to_datetime(['2015-01-04 08:29:05',
                             '2015-01-04 08:34:05',
                             '2015-01-04 08:39:05',
                             '2015-01-04 08:44:05',
                             '2015-01-04 08:49:05'])

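# Use a float dtype so the series can hold NaN.
ts = pd.Series(values, index=timestamps, dtype=float)
ts[ts==-1] = np.nan
# Upsample to a 1-minute grid; the new, empty bins become NaN.
ts = ts.resample('min').mean()

# method='spline' requires SciPy to be installed.
ts.interpolate(method='spline', order=3).plot()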
ts.interpolate(method='time').plot()
lines, labels = plt.gca().get_legend_handles_labels()
labels = ['spline', 'time']
plt.legend(lines, labels, loc='best')
plt.show()
