Fill cell containing NaN with average of value before and after

野趣味 2021-01-06 16:27

I would like to fill missing values in a pandas DataFrame with the average of the cells directly before and after the missing value. So if it was [1, NaN, 3], the NaN value would become 2, the average of 1 and 3.

3 answers
  • 2021-01-06 17:07

    Use spies006's example df.

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'a': [10, 6, -3, -2, 4, 12, 3, 3],
                       'b': [6, -3, np.nan, 12, 8, 11, -5, -5],
                       'id': [1, 1, 1, 1, np.nan, 2, 2, 4]})

    # Use DataFrame.where to keep the non-null cells and fill each NaN with the
    # average of the nearest valid values above (ffill) and below (bfill) it.
    df.where(df.notnull(), other=(df.ffill() + df.bfill()) / 2)
    Out[2517]: 
        a     b   id
    0  10   6.0  1.0
    1   6  -3.0  1.0
    2  -3   4.5  1.0
    3  -2  12.0  1.0
    4   4   8.0  1.5
    5  12  11.0  2.0
    6   3  -5.0  2.0
    7   3  -5.0  4.0
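
    The same idea can be wrapped in a small function to see how it treats edge cases. This is only a sketch: the name fill_with_neighbor_mean and the sample column x are made up for illustration. Because ffill and bfill reach for the nearest valid value, a run of consecutive NaNs is filled with the mean of the valid values on either side of the run, while a NaN at the very start or end stays NaN.

```python
import numpy as np
import pandas as pd


def fill_with_neighbor_mean(frame: pd.DataFrame) -> pd.DataFrame:
    """Fill each NaN with the mean of the nearest valid values above and below it.

    Hypothetical helper, not part of pandas.
    """
    # ffill carries the previous valid value down; bfill carries the next one up.
    neighbor_mean = (frame.ffill() + frame.bfill()) / 2
    # Keep original values where present; use the mean only where a value is missing.
    return frame.where(frame.notna(), neighbor_mean)


# Leading NaN, a run of two NaNs, and a trailing NaN.
gaps = pd.DataFrame({"x": [np.nan, 2.0, np.nan, np.nan, 8.0, np.nan]})
filled = fill_with_neighbor_mean(gaps)
print(filled["x"].tolist())  # [nan, 2.0, 5.0, 5.0, 8.0, nan]
```

    If a NaN at the first or last row should also be filled, a further fillna step would be needed; which value to use there is a choice the question does not specify.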
    
  • 2021-01-06 17:15

    This will work as long as no NaN sits at the first or last index, which your imputation method implies, since such a cell has no value on one side to average with.

    >>> import numpy as np
    >>> import pandas as pd
    >>> data = pd.DataFrame({'a': [10, 6, -3, -2, 4, 12, 3, 3],
    ...                      'b': [6, -3, np.nan, 12, 8, 11, -5, -5],
    ...                      'id': [1, 1, 1, 1, np.nan, 2, 2, 4]})
    >>> data
        a     b   id
    0  10   6.0  1.0
    1   6  -3.0  1.0
    2  -3   NaN  1.0
    3  -2  12.0  1.0
    4   4   8.0  NaN
    5  12  11.0  2.0
    6   3  -5.0  2.0
    7   3  -5.0  4.0
    
    
    
    >>> nan_cols = data.columns[data.isnull().any(axis=0)]
    >>> for col in nan_cols:
    ...     for i in range(len(data)):
    ...         if pd.isnull(data.loc[i, col]):
    ...             # Average the rows directly above and below the missing cell.
    ...             data.loc[i, col] = (data.loc[i-1, col] + data.loc[i+1, col]) / 2
    ...
    
    
    >>> data
        a     b   id
    0  10   6.0  1.0
    1   6  -3.0  1.0
    2  -3   4.5  1.0
    3  -2  12.0  1.0
    4   4   8.0  1.5
    5  12  11.0  2.0
    6   3  -5.0  2.0
    7   3  -5.0  4.0
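
    The loop above relies on a default 0..n-1 index and on the first and last rows being valid. A guarded variant might look like the sketch below. The function name fill_interior_nans and the sample data are assumptions for illustration. It addresses cells by position with iat, so it does not depend on the index labels, and it skips the first and last rows because they lack one neighbour.

```python
import numpy as np
import pandas as pd


def fill_interior_nans(frame: pd.DataFrame) -> pd.DataFrame:
    """Loop-based fill that leaves the first and last rows alone.

    Hypothetical helper, not part of pandas.
    """
    result = frame.copy()  # work on a copy so the caller's frame is unchanged
    for col_pos in range(result.shape[1]):
        # Skip columns without missing values entirely.
        if not result.iloc[:, col_pos].isna().any():
            continue
        # Start at 1 and stop before the last row so both neighbours exist.
        for row_pos in range(1, len(result) - 1):
            if pd.isna(result.iat[row_pos, col_pos]):
                before = result.iat[row_pos - 1, col_pos]
                after = result.iat[row_pos + 1, col_pos]
                result.iat[row_pos, col_pos] = (before + after) / 2
    return result


# A non-numeric index, with NaNs at both ends and one in the middle.
sample = pd.DataFrame({"b": [np.nan, 6.0, np.nan, 12.0, np.nan]},
                      index=list("vwxyz"))
print(fill_interior_nans(sample)["b"].tolist())  # [nan, 6.0, 9.0, 12.0, nan]
```

    If the neighbour below is itself NaN, the average is NaN and the cell stays missing, which matches the behaviour of the original loop.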
    
  • 2021-01-06 17:28

    Consider this dataframe

    df = pd.DataFrame({'val': [1,np.nan, 4, 5, np.nan, 10]})
    
       val
    0   1.0
    1   NaN
    2   4.0
    3   5.0
    4   NaN
    5  10.0
    

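    You can use fillna() together with shift() to get the desired output. This works as long as each NaN is isolated, i.e. the values directly before and after it are both valid.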

    df.val = df.val.fillna((df.val.shift() + df.val.shift(-1))/2)
    

    You get

       val
    0   1.0
    1   2.5
    2   4.0
    3   5.0
    4   7.5
    5  10.0
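
    For comparison, here is a sketch that runs the shift() approach next to Series.interpolate, a different technique that none of the answers use. With the default linear method, an isolated NaN between two valid values ends up as their average, so both give the same result on this data. They differ on consecutive NaNs: shift() leaves them missing, while interpolate spaces the values evenly across the gap. The variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, np.nan, 4, 5, np.nan, 10], name="val")

# Approach from the answer: average of the previous and next row.
by_shift = s.fillna((s.shift() + s.shift(-1)) / 2)
# Alternative: linear interpolation, restricted to gaps surrounded by valid values.
by_interp = s.interpolate(method="linear", limit_area="inside")

print(by_shift.tolist())   # [1.0, 2.5, 4.0, 5.0, 7.5, 10.0]
print(by_interp.tolist())  # [1.0, 2.5, 4.0, 5.0, 7.5, 10.0]

# Two NaNs in a row: each one has a NaN neighbour, so shift() cannot fill them.
run = pd.Series([1.0, np.nan, np.nan, 4.0])
run_shift = run.fillna((run.shift() + run.shift(-1)) / 2)
run_interp = run.interpolate(method="linear", limit_area="inside")

print(run_shift.tolist())   # [1.0, nan, nan, 4.0]
print(run_interp.tolist())  # [1.0, 2.0, 3.0, 4.0]
```

    Which behaviour is preferable for longer gaps depends on the data; the question only describes the single-NaN case.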
    