Replace NaN's in NumPy array with closest non-NaN value

你的背包 2021-02-05 04:22

I have a NumPy array a like the following:

>>> str(a)
'[        nan         nan         nan  1.44955726  1.44628034  1.44409573\n  1.4408…

7 Answers
  • 2021-02-05 05:08

    I want to replace each NaN with the closest non-NaN value... there will be no NaN's in the middle of the numbers

    The following will do it:

    ind = np.where(~np.isnan(a))[0]
    first, last = ind[0], ind[-1]
    a[:first] = a[first]
    a[last + 1:] = a[last]
    

    This is a straight numpy solution requiring no Python loops, no recursion, no list comprehensions etc.
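
    For reuse, the same slicing idea can be wrapped in a small function. This is only a sketch: the name fill_edge_nans, the copy of the input, and the handling of an all-NaN array are my assumptions, not part of the answer above.

```python
import numpy as np

def fill_edge_nans(a):
    """Return a copy of 1-D `a` with leading/trailing NaNs set to the nearest valid value."""
    a = np.array(a, dtype=float)  # np.array copies, so the input is left untouched
    ind = np.flatnonzero(~np.isnan(a))  # indices of the non-NaN entries
    if ind.size == 0:
        return a  # all NaN (or empty): there is no value to copy from
    first, last = ind[0], ind[-1]
    a[:first] = a[first]      # leading NaNs take the first valid value
    a[last + 1:] = a[last]    # trailing NaNs take the last valid value
    return a

example = np.array([np.nan, np.nan, 1.5, 2.5, np.nan])
print(fill_edge_nans(example))
```

    The guard matters because ind[0] would raise an IndexError on an array that contains no valid values.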

  • 2021-02-05 05:08

    I came across this problem and needed a custom solution for scattered NaNs. The code below replaces each NaN with the first number to its right; if none exists, it uses the nearest number to its left. With further changes it could instead use the mean of the two neighbouring values.

    import numpy as np
    
    Data = np.array([np.nan,1.3,np.nan,1.4,np.nan,np.nan])
    
    nansIndx = np.where(np.isnan(Data))[0]
    isanIndx = np.where(~np.isnan(Data))[0]
    for nan in nansIndx:
        replacementCandidates = np.where(isanIndx>nan)[0]
        if replacementCandidates.size != 0:
            replacement = Data[isanIndx[replacementCandidates[0]]]
        else:
            replacement = Data[isanIndx[np.where(isanIndx<nan)[0][-1]]]
        Data[nan] = replacement
    

    Result is:

    >>> Data
    array([ 1.3,  1.3,  1.4,  1.4,  1.4,  1.4])
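
    The same right-first, then-left rule can probably be written without the Python loop by using np.searchsorted. A minimal sketch, assuming a 1-D float array; fill_nan_next_valid is a hypothetical name I made up for illustration:

```python
import numpy as np

def fill_nan_next_valid(data):
    """Replace each NaN with the next valid value to its right, else the last valid value."""
    data = np.asarray(data, dtype=float)
    valid = np.flatnonzero(~np.isnan(data))  # sorted indices of the non-NaN entries
    if valid.size == 0:
        return data.copy()  # nothing to fill with
    # For every position, find the first valid index that is >= that position.
    pos = np.searchsorted(valid, np.arange(data.size), side="left")
    # Positions past the last valid index fall back to the last valid one.
    pos = np.minimum(pos, valid.size - 1)
    return data[valid[pos]]

Data = np.array([np.nan, 1.3, np.nan, 1.4, np.nan, np.nan])
print(fill_nan_next_valid(Data))
```

    On the example array this gives the same values as the loop version, and it returns a new array instead of modifying Data in place.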
    
  • 2021-02-05 05:14

    Here is a solution using simple Python loops. They can be more efficient here than numpy.where, especially with big arrays, because each loop stops at the first finite value instead of scanning the whole array. See comparison of similar code here.

    import numpy as np
    
    # np.nan is used instead of the np.NAN alias, which was removed in NumPy 2.0
    a = np.array([np.nan, np.nan, np.nan, 1.44955726, 1.44628034, 1.44409573, 1.4408188, 1.43657094, 1.43171624, 1.42649744, 1.42200684, 1.42117704, 1.42040255, 1.41922908, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan])
    
    mask = np.isfinite(a)
    
    # find the index of the first finite value
    for i in range(len(mask)):
        if mask[i]:
            first = i
            break
    
    # find the index of the last finite value
    for i in range(len(mask)-1, -1, -1):
        if mask[i]:
            last = i
            break
    
    # fill NaN with near known value on the edges
    a = np.copy(a)
    a[:first] = a[first]
    a[last + 1:] = a[last]
    
    print(a)
    

    Output:

    [1.44955726 1.44955726 1.44955726 1.44955726 1.44628034 1.44409573
     1.4408188  1.43657094 1.43171624 1.42649744 1.42200684 1.42117704
     1.42040255 1.41922908 1.41922908 1.41922908 1.41922908 1.41922908
     1.41922908 1.41922908]
    

    It replaces only the leading and trailing NaNs, as requested in the question.
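
    Another way to locate the first and last finite entries without explicit loops is argmax on the boolean mask, which returns the index of the first True. This is just a sketch on a made-up array; I have not benchmarked it against the loops above.

```python
import numpy as np

a = np.array([np.nan, np.nan, 1.0, 2.0, 3.0, np.nan])
mask = np.isfinite(a)

filled = a.copy()
if mask.any():  # argmax returns 0 for an all-False mask, so guard first
    first = int(mask.argmax())                       # index of the first True
    last = len(mask) - 1 - int(mask[::-1].argmax())  # index of the last True
    filled[:first] = filled[first]
    filled[last + 1:] = filled[last]

print(filled)
```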

  • 2021-02-05 05:19

    I got something like this

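    # indices of the non-NaN entries
    valid = [i for i in range(len(a)) if not np.isnan(a[i])]
    # clamp every position into [valid[0], valid[-1]]; the result is a new array
    a = np.array([a[valid[0]] if x < valid[0] else (a[valid[-1]] if x > valid[-1] else a[x]) for x in range(len(a))])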
    

    It's a bit clunky though given it's split up in two lines with nested inline if's in one of them.
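
    The nested conditional is really clamping each index into the range of valid positions, so np.clip can express it in one step. A rough sketch on a made-up array, assuming there is at least one non-NaN value and no NaNs in the middle (interior NaNs would be left as they are):

```python
import numpy as np

a = np.array([np.nan, 1.0, 2.0, np.nan, np.nan])
valid = np.flatnonzero(~np.isnan(a))  # indices of the non-NaN entries
# Clamp every index into [first valid, last valid], then look the values up.
idx = np.clip(np.arange(a.size), valid[0], valid[-1])
filled = a[idx]
print(filled)
```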

  • 2021-02-05 05:21

    As an alternative solution (this will also linearly interpolate over NaNs in the middle of the array):

    import numpy as np
    
    # Generate data...
    data = np.random.random(10)
    data[:2] = np.nan
    data[-1] = np.nan
    data[4:6] = np.nan
    
    print(data)
    
    # Fill in NaN's...
    mask = np.isnan(data)
    # interpolate at the NaN positions from the known (index, value) pairs
    data[mask] = np.interp(np.flatnonzero(mask), np.flatnonzero(~mask), data[~mask])
    
    print(data)
    

    This yields:

    [        nan         nan  0.31619306  0.25818765         nan         nan
      0.27410025  0.23347532  0.02418698         nan]
    
    [ 0.31619306  0.31619306  0.31619306  0.25818765  0.26349185  0.26879605
      0.27410025  0.23347532  0.02418698  0.02418698]
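
    The edge NaNs end up with the nearest valid value because np.interp, by default, returns the first and last known values for points outside the known range. A small deterministic sketch with made-up numbers shows both effects:

```python
import numpy as np

data = np.array([np.nan, np.nan, 1.0, np.nan, np.nan, 4.0, np.nan])
mask = np.isnan(data)

filled = data.copy()
# Known points are at indices 2 and 5; everything else is interpolated or clamped.
filled[mask] = np.interp(np.flatnonzero(mask), np.flatnonzero(~mask), data[~mask])
print(filled)
```

    Indices 3 and 4 lie on the straight line between 1.0 and 4.0, while indices 0, 1 and 6 are outside the known range and take the edge values.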
    
  • 2021-02-05 05:21

    NaNs have the interesting property of comparing unequal to themselves, so we can quickly find the indexes of the non-NaN elements:

    idx = np.nonzero(a==a)[0]
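
    To see the self-comparison trick in isolation, here is a tiny sketch on a made-up array; it also checks that the mask agrees with the more explicit ~np.isnan(a):

```python
import numpy as np

a = np.array([np.nan, 1.0, 2.0, np.nan])
print(a == a)                   # False wherever a is NaN
idx = np.nonzero(a == a)[0]     # indices of the non-NaN elements
print(idx)
print(np.array_equal(a == a, ~np.isnan(a)))
```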
    

    It's now easy to replace the NaNs with the desired value:

    for i in range(0, idx[0]):
        a[i] = a[idx[0]]
    for i in range(idx[-1] + 1, a.size):
        a[i] = a[idx[-1]]
    

    Finally, we can put this in a function:

    import numpy as np
    
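    def FixNaNs(arr):
        if arr.ndim > 1:
            raise ValueError("Only 1D arrays are supported.")
        # NaN != NaN, so arr == arr is True exactly at the non-NaN positions
        idxs = np.nonzero(arr == arr)[0]
    
        if len(idxs) == 0:
            return None  # every element is NaN, so there is nothing to fill with
    
        ret = arr.copy()  # work on a copy so the caller's array is not modified
    
        # leading NaNs take the first valid value
        for i in range(0, idxs[0]):
            ret[i] = ret[idxs[0]]
    
        # trailing NaNs take the last valid value
        for i in range(idxs[-1] + 1, ret.size):
            ret[i] = ret[idxs[-1]]
    
        return ret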
    

    edit

    Ouch, coming from C++ I always forget about list ranges... @aix's solution is way more elegant and efficient than my C++ish loops, use that instead of mine.
