NaN in data frame: when first observation of time series is NaN, frontfill with first available, otherwise carry over last / previous observation

Submitted by 孤街醉人 on 2019-12-24 17:33:56

Question


I am performing an ADF test with statsmodels. The value series can have missing observations. In fact, I drop the analysis if the fraction of NaNs is larger than c. However, if the series makes it through that check, I run into the problem that adfuller cannot deal with missing data. Since this is training data with a minimum frame size, I would like to do the following:

1) if x(t=0) = NaN, then find the next non-NaN value (t>0)
2) otherwise, if x(t) = NaN, then x(t) = x(t-1)

This compromises my first value, but makes sure the input data always has the same dimension. Alternatively, if the first value is missing, I could fill it with 0, making use of the limit option of fillna.

From the documentation, the different options are not 100% clear to me:

method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
Method to use for filling holes in reindexed Series
pad / ffill: propagate last valid observation forward to next valid
backfill / bfill: use NEXT valid observation to fill gap

pad / ffill: does that mean I carry over the previous value?
backfill / bfill: does that mean the value is taken from a valid one in the future?

df.fillna(method='bfill', limit=1, inplace=True)
df.fillna(method='ffill', inplace=True)

Would that work with limit? The documentation uses 'limit = 1', but only in an example where the fill value is predetermined.


Answer 1:


1) if x(t=0) = NaN, then find the next non-NaN value (t>0)
2) otherwise, if x(t) = NaN, then x(t) = x(t-1)

To forward-fill all observations except for (possibly) the leading ones, which should be backfilled, you can chain two calls to fillna, the first with method='ffill' and the second with method='bfill':

>>> import pandas as pd
>>> df = pd.DataFrame({'a': [None, None, 1, None, 2, None]})
>>> df.fillna(method='ffill').fillna(method='bfill')
     a
0  1.0
1  1.0
2  1.0
3  1.0
4  2.0
5  2.0
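
Recent pandas releases deprecate the method argument of fillna in favor of the dedicated ffill() and bfill() methods. The following is a minimal sketch of the same idea combined with the NaN-fraction check described in the question. The helper name fill_gaps and its max_nan_fraction parameter (standing in for the threshold c) are hypothetical and not part of pandas or statsmodels.

```python
import numpy as np
import pandas as pd


def fill_gaps(series, max_nan_fraction=0.2):
    """Hypothetical helper: return a gap-free copy of `series`,
    or None if the fraction of NaNs exceeds `max_nan_fraction`."""
    if series.isna().mean() > max_nan_fraction:
        return None  # too sparse, skip the analysis
    # ffill carries the previous observation forward; any NaNs left over
    # can only be leading ones, which bfill replaces with the first
    # valid observation that follows them.
    return series.ffill().bfill()


x = pd.Series([np.nan, np.nan, 1.0, np.nan, 2.0, np.nan])

forward_only = x.ffill()                         # leading NaNs are still present
filled = fill_gaps(x, max_nan_fraction=0.7)      # 4 of 6 values are NaN, below 0.7
skipped = fill_gaps(x, max_nan_fraction=0.5)     # above 0.5, so None is returned
```

The order matters: running bfill first would fill every interior gap from the future as well, which is not what rule 2) asks for. The result keeps the original length, so it can be passed on to adfuller without changing the input dimension.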


Source: https://stackoverflow.com/questions/49663350/nan-in-data-frame-when-first-observation-of-time-series-is-nan-frontfill-with
