How to replace NaNs by preceding values in pandas DataFrame?

無奈伤痛 2020-11-22 06:04

Suppose I have a DataFrame with some NaNs:

>>> import pandas as pd
>>> df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, N         


        
9 answers
  • 2020-11-22 06:40

    ffill now has its own method, pd.DataFrame.ffill:

    df.ffill()
    
         0    1    2
    0  1.0  2.0  3.0
    1  4.0  2.0  3.0
    2  4.0  2.0  9.0
    
  • 2020-11-22 06:41

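    One thing I noticed when trying this solution: if there are NaNs at the start or the end of the array, neither ffill nor bfill alone fills everything. ffill leaves a leading NaN untouched (there is no earlier value to copy), and bfill leaves a trailing NaN untouched. You need both.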

    In [224]: df = pd.DataFrame([None, 1, 2, 3, None, 4, 5, 6, None])
    
    In [225]: df.ffill()
    Out[225]:
         0
    0  NaN
    1  1.0
    ...
    7  6.0
    8  6.0
    
    In [226]: df.bfill()
    Out[226]:
         0
    0  1.0
    1  1.0
    ...
    7  6.0
    8  NaN
    
    In [227]: df.bfill().ffill()
    Out[227]:
         0
    0  1.0
    1  1.0
    ...
    7  6.0
    8  6.0
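
    When NaNs also occur in the middle of the data, the order in which the two fills are chained changes the result, because whichever fill runs first claims the interior gaps. A minimal sketch of that, using a made-up Series that is not part of the example above:

```python
import pandas as pd

# Illustrative data: NaN at both ends and one interior gap.
s = pd.Series([None, 1.0, None, 3.0, None])

# ffill first: the interior gap takes the preceding value (1.0),
# then bfill only has the leading NaN left to fill.
forward_then_back = s.ffill().bfill()

# bfill first: the interior gap takes the following value (3.0),
# then ffill only has the trailing NaN left to fill.
back_then_forward = s.bfill().ffill()

print(forward_then_back.tolist())   # [1.0, 1.0, 1.0, 3.0, 3.0]
print(back_then_forward.tolist())   # [1.0, 1.0, 3.0, 3.0, 3.0]
```

    So if interior gaps should carry the preceding value, ffill().bfill() is probably the order you want.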
    
  • 2020-11-22 06:42

    You can use fillna to fill in or replace NaN values.

    Forward-fill NaN

    import pandas as pd
    
    df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
    
    # The method= argument of fillna is deprecated since pandas 2.1; df.ffill() is the equivalent.
    df.fillna(method='ffill')
         0    1    2
    0  1.0  2.0  3.0
    1  4.0  2.0  3.0
    2  4.0  2.0  9.0
    

    Replace NaN with a constant

    df.fillna(0)  # 0 is the value that replaces each NaN
         0    1    2
    0  1.0  2.0  3.0
    1  4.0  0.0  0.0
    2  0.0  0.0  9.0
    

    Reference: pandas.DataFrame.fillna

  • 2020-11-22 06:49

    You could use the fillna method on the DataFrame and specify the method as ffill (forward fill):

    >>> df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
    >>> df.fillna(method='ffill')  # method= is deprecated since pandas 2.1; df.ffill() is equivalent
         0    1    2
    0  1.0  2.0  3.0
    1  4.0  2.0  3.0
    2  4.0  2.0  9.0
    

    This method "propagate[s] last valid observation forward to next valid".

    To go the opposite way, there's also a bfill method.
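
    As a rough illustration of the backward direction on the same data (a sketch, not part of the original answer): bfill copies the next valid value upward, so NaNs with no valid value below them stay NaN.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])

# Backward fill: each NaN takes the next valid value further down its column.
back_filled = df.bfill()
print(back_filled)

# Column 2 is fully filled (3.0, 9.0, 9.0); columns 0 and 1 keep their
# trailing NaNs because nothing valid follows them.
```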

    This method doesn't modify the DataFrame in place. You'll need to rebind the returned DataFrame to a variable, or else specify inplace=True:

    df.fillna(method='ffill', inplace=True)
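
    The difference between rebinding and inplace=True can be checked by counting NaNs before and after. A small sketch, assuming a pandas version that has DataFrame.ffill; the variable names are only for illustration:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])

# Without inplace, a new DataFrame is returned and df is left unchanged.
filled = df.ffill()
nans_in_original = int(df.isna().sum().sum())
nans_in_copy = int(filled.isna().sum().sum())

# With inplace=True, df itself is modified.
df.ffill(inplace=True)
nans_after_inplace = int(df.isna().sum().sum())

print(nans_in_original, nans_in_copy, nans_after_inplace)  # 4 0 0
```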
    
  • 2020-11-22 06:53

    Single-column version

    • Fill NaN with the last valid value
    # Assign the result back; an inplace call on df[column_name] may act on a temporary copy.
    df[column_name] = df[column_name].ffill()
    
    • Fill NaN with the next valid value
    df[column_name] = df[column_name].bfill()
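
    A self-contained sketch of the single-column case, with made-up column names "a" and "b". It assigns the filled column back rather than calling an inplace method on df[column_name], since recent pandas versions warn that such a chained inplace call may operate on a temporary copy:

```python
import pandas as pd

# Illustrative frame; "a" and "b" are hypothetical column names.
df = pd.DataFrame({"a": [1.0, None, 3.0], "b": [None, 5.0, None]})

# Forward-fill only column "a" and write the result back into the frame.
df["a"] = df["a"].ffill()

print(df["a"].tolist())         # [1.0, 1.0, 3.0]
print(df["b"].isna().tolist())  # [True, False, True]  (column "b" is untouched)
```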
    
  • 2020-11-22 06:53

    In my case, we have time series from several devices, and some devices sent no value during some periods. So we first create NA values for every combination of device and time period, and then forward-fill.

    df = pd.DataFrame([["device1", 1, 'first val of device1'], ["device2", 2, 'first val of device2'], ["device3", 3, 'first val of device3']])
    # Pivot to one column per device, forward-fill down the time index, then return to long format.
    df.pivot(index=1, columns=0, values=2).ffill().unstack().reset_index(name='value')
    

    Result:

            0   1   value
    0   device1     1   first val of device1
    1   device1     2   first val of device1
    2   device1     3   first val of device1
    3   device2     1   None
    4   device2     2   first val of device2
    5   device2     3   first val of device2
    6   device3     1   None
    7   device3     2   None
    8   device3     3   first val of device3
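
    The same idea can be sketched without a pivot by reindexing onto the full device × time grid and forward-filling per device. This is one possible approach rather than the one used above; the column names "device", "t" and "value" and the sample readings are assumptions made for illustration:

```python
import pandas as pd

# Hypothetical long-format readings: device1 has no reading at t=3,
# device2 has none at t=2.
readings = pd.DataFrame(
    {
        "device": ["device1", "device1", "device2", "device2"],
        "t": [1, 2, 1, 3],
        "value": [10.0, 11.0, 20.0, 22.0],
    }
)

# Build every (device, t) combination so that missing periods
# become explicit NaN rows after reindexing.
full_index = pd.MultiIndex.from_product(
    [readings["device"].unique(), sorted(readings["t"].unique())],
    names=["device", "t"],
)
grid = readings.set_index(["device", "t"]).reindex(full_index)

# Forward-fill within each device only, so a value never leaks
# from one device into another.
grid["value"] = grid.groupby(level="device")["value"].ffill()
result = grid.reset_index()

print(result)
```

    Grouping by device before filling matters once the data is in long format: a plain ffill over the whole column would carry the last value of one device into the first rows of the next.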
    