Interpolate only if single NaN

清歌不尽 2021-02-10 14:33

Is there a way in pandas to interpolate only single missing data points? That is, if there are two or more consecutive NaNs, I'd like to leave them alone.

so, as an example:

2 answers
  •  谎友^ (original poster)
    2021-02-10 15:24

    I think this would be a great capability to include in interpolate.
    That said, the problem boils down to masking the positions where two or more consecutive np.nan values occur. I'll wrap that up in a small helper function using numpy.

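    import numpy as np

    def cnan(s):
        # Boolean mask: False where a NaN sits next to another NaN, True elsewhere.
        v = s.values
        k = v.size
        # Pad with a trailing False so n[:-1] and n[1:] both have length k.
        n = np.append(np.isnan(v), False)
        m = np.ones(k, dtype=bool)
        # Start index of every adjacent NaN pair, expanded to both positions.
        i = np.where(n[:-1] & n[1:])[0][:, None] + np.arange(2)
        m[i[i < k]] = False
        return m
    
    # s is the Series from the question.
    s.interpolate().where(cnan(s))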
    
    0    1.0
    1    1.5
    2    2.0
    3    3.0
    4    NaN
    5    NaN
    6    4.5
    dtype: float64
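
For comparison, here is a minimal pandas-only sketch of the same masking idea. It is not the approach used above: it labels each run of consecutive values with a cumulative sum and measures the run length with groupby. The function name `interpolate_single_nans` and the input Series are assumptions (the Series is reconstructed from the output shown above, not taken from the question).

```python
import numpy as np
import pandas as pd

def interpolate_single_nans(s):
    """Interpolate isolated NaNs; leave runs of 2+ consecutive NaNs untouched."""
    isna = s.isna()
    # A new run starts wherever the NaN flag changes; cumsum gives each run an id.
    run_id = isna.astype(int).diff().ne(0).cumsum()
    # Length of the run each element belongs to.
    run_len = isna.groupby(run_id).transform("size")
    # Keep everything except NaNs that belong to a run longer than one.
    keep = ~(isna & (run_len > 1))
    return s.interpolate().where(keep)

# Assumed input, chosen to be consistent with the output shown above.
s = pd.Series([1, np.nan, 2, 3, np.nan, np.nan, 4.5])
result = interpolate_single_nans(s)
print(result)
```

The threshold `run_len > 1` could be swapped for any other run length if longer gaps should also be filled.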
    

    For those interested, here is a general solution that masks runs of x or more consecutive NaNs using numpy stride tricks:

    import pandas as pd
    import numpy as np
    from numpy.lib.stride_tricks import as_strided as strided
    
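    def mask_knans(a, x):
        # Boolean mask: False on every position inside a run of x or more NaNs.
        a = np.asarray(a)
        k = a.size
        n = np.append(np.isnan(a), [False] * (x - 1))
        m = np.ones(k, dtype=bool)
    
        # View n as overlapping windows of length x; an all-True window
        # marks the start of x consecutive NaNs.
        s = n.strides[0]
        i = np.where(strided(n, (k + 1 - x, x), (s, s)).all(1))[0][:, None]
        # Expand each start index to the x positions it covers.
        i = i + np.arange(x)
        i = pd.unique(i[i < k])
    
        m[i] = False
    
        return m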
    

    Demo:

    a = np.array([1, np.nan, np.nan, np.nan, 3, np.nan, 4, 5, np.nan, np.nan, 6, 7])
    
    print(mask_knans(a, 3))
    
    [ True False False False  True  True  True  True  True  True  True  True]
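
As a hedged alternative to `as_strided`, the same windowing can be sketched with `numpy.lib.stride_tricks.sliding_window_view` (available in NumPy 1.20 and later), which computes shape and strides itself. The name `mask_nan_runs` is hypothetical, and the last line shows one way the mask might be combined with `interpolate` for the original question (runs of two or more NaNs stay untouched).

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def mask_nan_runs(a, x):
    """Return a mask that is False on every position inside a run of >= x NaNs."""
    a = np.asarray(a, dtype=float)
    keep = np.ones(a.size, dtype=bool)
    if x < 1 or a.size < x:
        # No window of length x fits, so nothing can be masked.
        return keep
    # Row j of the view is isnan[j:j+x]; an all-True row starts a run of x NaNs.
    starts = np.flatnonzero(sliding_window_view(np.isnan(a), x).all(axis=1))
    # Expand each start index to the x positions it covers and mask them out.
    keep[(starts[:, None] + np.arange(x)).ravel()] = False
    return keep

a = np.array([1, np.nan, np.nan, np.nan, 3, np.nan, 4, 5, np.nan, np.nan, 6, 7])
mask3 = mask_nan_runs(a, 3)
print(mask3)

# Interpolate everything, then restore NaN wherever a run of 2+ NaNs was.
filled = pd.Series(a).interpolate().where(mask_nan_runs(a, 2))
print(filled)
```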
    
