问题
I am performing an ADF-test from statsmodels. The value series can have missing obversations. In fact, I am dropping the analysis if the fraction of NaNs is larger than c. However, if the series makes it through the I get the problem, that the adfuller cannot deal with missing data. Since this is training data with a minimum framesize, I would like to do:
1) if x(t=0) = NaN, then find the next non-NaN value (t>0) 2) otherwise if x(t) = NaN, then x(t) = x(t-1)
So I am compromising here my first value, but making sure the input data has always the same dimension. Alternatively, I could fill if the first value is missing with 0 making use of the limit option from dropna.
From the documentation the different option are not 100% clear to me: method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
Method to use for filling holes in reindexed Series pad / ffill: propagate last valid observation forward to next valid backfill / bfill: use NEXT valid observation to fill gap
pad / ffill: does that mean I carry over the previous value? backfill / bfill: does that mean I the value is taken from a valid one in the future?
df.dropna(method = 'bfill', limit 1, inplace = True)
df.dropna(method = 'ffill', inplace = True)
Would that work with limit? The documentation uses 'limit = 1' but has predetermined a value to be filled.
回答1:
1) if x(t=0) = NaN, then find the next non-NaN value (t>0) 2) otherwise if x(t) = NaN, then x(t) = x(t-1)
To front-fill all observations except for (possibly) the first ones, which should be backfilled, you can chain two calls to fillna, the first with method='ffill'
and the second with method='fill'
:
df = pd.DataFrame({'a': [None, None, 1, None, 2, None]})
>>> df.fillna(method='ffill').fillna(method='bfill')
a
0 1.0
1 1.0
2 1.0
3 1.0
4 2.0
5 2.0
来源:https://stackoverflow.com/questions/49663350/nan-in-data-frame-when-first-observation-of-time-series-is-nan-frontfill-with