How to replace NaNs by preceding values in pandas DataFrame?

無奈伤痛 2020-11-22 06:04

Suppose I have a DataFrame with some NaNs:

>>> import pandas as pd
>>> df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, N         


        
9 answers
  • 2020-11-22 06:55

    I agree with the ffill approach. One addition: the limit keyword argument caps how many consecutive NaNs are forward-filled.

    >>> import pandas as pd    
    >>> df = pd.DataFrame([[1, 2, 3], [None, None, 6], [None, None, 9]])
    
    >>> df
         0    1   2
    0  1.0  2.0   3
    1  NaN  NaN   6
    2  NaN  NaN   9
    
    >>> df[1] = df[1].ffill()  # forward fill column 1; same result as fillna(method='ffill')
    >>> df
         0    1    2
    0  1.0  2.0    3
    1  NaN  2.0    6
    2  NaN  2.0    9
    

    Now with the limit keyword argument:

    >>> df[0] = df[0].ffill(limit=1)  # fill at most one consecutive NaN in column 0
    
    >>> df
         0    1  2
    0  1.0  2.0  3
    1  1.0  2.0  6
    2  NaN  2.0  9
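
The same limit can be applied to the whole frame at once rather than column by column. A minimal sketch, assuming the frame from the answer above; the name `filled` is only illustrative:

```python
import numpy as np
import pandas as pd

# Same frame as above: columns 0 and 1 each end with two consecutive NaNs.
df = pd.DataFrame([[1, 2, 3], [None, None, 6], [None, None, 9]])

# Fill at most one consecutive NaN per column; the second NaN in each gap stays.
# ffill returns a new frame, so df itself is left unchanged.
filled = df.ffill(limit=1)
print(filled)
```

Row 1 is filled in columns 0 and 1, while row 2 remains NaN in both because the limit has been reached.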
    
  • 2020-11-22 06:57

    The accepted answer is perfect. I had a related but slightly different situation: I needed to forward fill, but only within groups. In case someone has the same need, forward filling also works on a groupby object.

    >>> import pandas as pd
    >>> from numpy import nan
    >>> example = pd.DataFrame({'number': [0, 1, 2, nan, 4, nan, 6, 7, 8, 9], 'name': list('aaabbbcccc')})
    >>> example
      name  number
    0    a     0.0
    1    a     1.0
    2    a     2.0
    3    b     NaN
    4    b     4.0
    5    b     NaN
    6    c     6.0
    7    c     7.0
    8    c     8.0
    9    c     9.0
    >>> example.groupby('name')['number'].ffill()  # same as fillna(method='ffill'); fills row 5 but not row 3
    0    0.0
    1    1.0
    2    2.0
    3    NaN
    4    4.0
    5    4.0
    6    6.0
    7    7.0
    8    8.0
    9    9.0
    Name: number, dtype: float64
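
To keep the group-wise result next to the original data, it can be assigned back as a column, since the returned Series shares the frame's index. A small sketch, assuming the `example` frame above; the column name `number_filled` is made up for illustration:

```python
import numpy as np
import pandas as pd

example = pd.DataFrame({
    "number": [0, 1, 2, np.nan, 4, np.nan, 6, 7, 8, 9],
    "name": list("aaabbbcccc"),
})

# Forward fill within each group; values never leak from one group into the next.
# The result is aligned to the original index, so it can be assigned directly.
example["number_filled"] = example.groupby("name")["number"].ffill()
print(example)
```

Row 3 stays NaN because it is the first row of group b, so there is no earlier value in that group to carry forward.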
    
  • 2020-11-22 07:00

    You can use pandas.DataFrame.fillna with the method='ffill' option. 'ffill' stands for 'forward fill' and propagates the last valid observation forward. The alternative is 'bfill', which works the same way but backwards, filling each NaN from the next valid observation.

    import pandas as pd
    
    df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
    df = df.fillna(method='ffill')
    
    print(df)
    # Every column contains a None, so all three are stored as float64:
    #      0    1    2
    # 0  1.0  2.0  3.0
    # 1  4.0  2.0  3.0
    # 2  4.0  2.0  9.0
    

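    There is also a direct synonym for this, pandas.DataFrame.ffill, which makes things simpler. Recent pandas releases (2.1 and later) deprecate the method argument of fillna, so ffill and bfill are the forms to prefer in new code.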
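
For comparison, here is a short sketch of the two directions using the dedicated methods on the frame from this answer; the names `forward` and `backward` are only illustrative:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])

# Forward fill: each NaN takes the last valid value above it in the same column.
forward = df.ffill()

# Backward fill: each NaN takes the next valid value below it in the same column.
backward = df.bfill()

print(forward)
print(backward)
```

Note that bfill leaves trailing NaNs in place (columns 0 and 1 here), just as ffill would leave leading NaNs, because there is no value on that side to copy from.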
