Is there a way in pandas to interpolate only single missing data points? That is, if there are 2+ consecutive NaNs, I'd like to leave them alone.
so, as an example:
My opinion is that this would be a great capability to include in interpolate.
That said, this boils down to masking the places where more than one consecutive np.nan exists. I'll wrap that up with some numpy logic in a handy function.
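import numpy as np

def cnan(s):
    """Return a mask that is False wherever s has two or more consecutive NaNs."""
    v = s.values
    k = v.size
    # Pad with False so that n[:-1] & n[1:] has length k
    n = np.append(np.isnan(v), False)
    m = np.ones(k, dtype=bool)
    # Start index of every pair of adjacent NaNs, as a column so that
    # adding np.arange(2) yields both positions of each pair
    i = np.where(n[:-1] & n[1:])[0][:, None] + np.arange(2)
    m[i[i < k]] = False
    return m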
s.interpolate().where(cnan(s))
0    1.0
1    1.5
2    2.0
3    3.0
4    NaN
5    NaN
6    4.5
dtype: float64
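The same masking idea can also be sketched with pandas alone, by labelling each run of equal NaN flags and measuring its length. This is only a minimal sketch: `interpolate_single_nans` is a hypothetical helper name, and the input Series is assumed from the output shown above.

```python
import numpy as np
import pandas as pd

def interpolate_single_nans(s):
    """Interpolate only NaNs that are not part of a longer NaN run (hypothetical helper)."""
    isna = s.isna()
    # A new run starts wherever the NaN flag changes; cumsum labels each run
    run_id = isna.astype(int).diff().ne(0).cumsum()
    # Length of the run each element belongs to
    run_len = isna.groupby(run_id).transform("size")
    # Blank out interpolated values that fall inside NaN runs of length >= 2
    return s.interpolate().where(~(isna & (run_len > 1)))

# Assumed example input, chosen to match the output above
s = pd.Series([1, np.nan, 2, 3, np.nan, np.nan, 4.5])
result = interpolate_single_nans(s)
print(result)
```

Note that interpolate() by default also fills a single trailing NaN with the last valid value; passing limit_area="inside" to interpolate() should avoid that if it is not wanted.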
For those interested in a general solution using advanced numpy techniques:
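import pandas as pd
import numpy as np
from numpy.lib.stride_tricks import as_strided as strided

def mask_knans(a, x):
    """Return a mask that is False wherever a has x or more consecutive NaNs."""
    a = np.asarray(a)
    k = a.size
    # NaN flags, padded with x - 1 trailing False values
    n = np.append(np.isnan(a), [False] * (x - 1))
    m = np.ones(k, dtype=bool)
    s = n.strides[0]
    # Each row of the strided view is a length-x window; keep the start
    # index of every window that is entirely NaN
    i = np.where(strided(n, (k + 1 - x, x), (s, s)).all(1))[0][:, None]
    i = i + np.arange(x)
    i = pd.unique(i[i < k])
    m[i] = False
    return m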
Demo:
a = np.array([1, np.nan, np.nan, np.nan, 3, np.nan, 4, 5, np.nan, np.nan, 6, 7])
print(mask_knans(a, 3))
[ True False False False  True  True  True  True  True  True  True  True]
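Since as_strided performs no bounds checking, the same windowing can be sketched with sliding_window_view (available in NumPy 1.20 and later). This is a rough equivalent under that assumption, not a drop-in replacement; `mask_knans_windowed` is a hypothetical name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def mask_knans_windowed(a, x):
    """Mask that is False wherever a has x or more consecutive NaNs (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    m = np.ones(a.size, dtype=bool)
    if a.size < x:
        # No window of length x fits, so nothing can be masked
        return m
    # Row j of the view is a[j:j + x]; keep the start of every all-NaN window
    windows = sliding_window_view(np.isnan(a), x)
    starts = np.flatnonzero(windows.all(axis=1))
    # Mark every position covered by one of those windows
    idx = (starts[:, None] + np.arange(x)).ravel()
    m[idx] = False
    return m

a = np.array([1, np.nan, np.nan, np.nan, 3, np.nan, 4, 5, np.nan, np.nan, 6, 7])
print(mask_knans_windowed(a, 3))
```

The resulting mask can be passed to where() on the interpolated Series in the same way as the mask from cnan.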
Another option is to fill each isolated NaN with the mean of its two neighbors:
s[s.isnull() & s.shift(-1).notnull() & s.shift(1).notnull()] = (s.shift(-1) + s.shift(1))/2
Actually,
s[s.isnull()] = (s.shift(-1) + s.shift(1))/2
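works as well, if you are doing simple linear interpolation: the sum is NaN whenever either neighbor is NaN, so runs of consecutive NaNs are left untouched.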
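As a small illustration of the shift-based fill above, the neighbor mean can also be passed to fillna, which leaves the original Series unmodified. The input Series here is an assumed example, and the approach presumes evenly spaced data.

```python
import numpy as np
import pandas as pd

# Assumed example input with one isolated NaN and one run of two NaNs
s = pd.Series([1, np.nan, 2, 3, np.nan, np.nan, 4.5])

# Mean of the previous and next value; NaN whenever either neighbor is NaN
neighbor_mean = (s.shift(-1) + s.shift(1)) / 2

# fillna aligns on the index and only touches positions where s is NaN
filled = s.fillna(neighbor_mean)
print(filled)
```

Because the first and last elements each lack one neighbor, a NaN at either end is never filled by this approach.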