How to pass another entire column as argument to pandas fillna()

太阳男子 2020-11-22 07:01

I would like to fill missing values in one column with values from another column, using the fillna method.

(I read that looping through each row would be

6 answers
  • 2020-11-22 07:13

    pandas.DataFrame.combine_first also works.

    (Note: because "Result index columns will be the union of the respective indexes and columns", check that the indexes and columns of the two objects match.)

    import numpy as np
    import pandas as pd
    df = pd.DataFrame([["1","cat","mouse"],
        ["2","dog","elephant"],
        ["3","cat","giraf"],
        ["4",np.nan,"ant"]],columns=["Day","Cat1","Cat2"])
    
    In: df["Cat1"].combine_first(df["Cat2"])
    Out: 
    0    cat
    1    dog
    2    cat
    3    ant
    Name: Cat1, dtype: object
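
To make the note about the index union concrete, here is a minimal sketch. The two series `left` and `right` are made up for illustration and their labels only partly overlap; the point is only to show how the result index differs between `combine_first` and `fillna`.

```python
import numpy as np
import pandas as pd

# Hypothetical series whose index labels only partly overlap.
left = pd.Series(["cat", np.nan], index=[0, 1], name="Cat1")
right = pd.Series(["ant", "bee"], index=[1, 2], name="Cat2")

# combine_first returns the union of both indexes, so label 2 appears.
combined = left.combine_first(right)

# fillna keeps only the caller's index, so label 2 is not added.
filled = left.fillna(right)

print(combined)
print(filled)
```

If the extra rows from the other series are unwanted, `fillna` is probably the safer choice of the two.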
    

    Compare with other answers:

    %timeit df["Cat1"].combine_first(df["Cat2"])
    181 µs ± 11.3 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    
    %timeit df['Cat1'].fillna(df['Cat2'])
    253 µs ± 10.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    
    %timeit np.where(df.Cat1.isnull(), df.Cat2, df.Cat1)
    88.1 µs ± 793 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
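
The timings above come from a four-row frame, where fixed overhead probably dominates. A rough way to repeat the comparison on more data is sketched below; the frame `big`, its size, and the 30% missing rate are assumptions for illustration, and the measured numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame: 100,000 rows, roughly 30% of Cat1 missing.
rng = np.random.default_rng(0)
n = 100_000
cat1 = pd.Series(rng.choice(["cat", "dog"], size=n)).mask(rng.random(n) < 0.3)
cat2 = pd.Series(rng.choice(["mouse", "ant"], size=n))
big = pd.DataFrame({"Cat1": cat1, "Cat2": cat2})

candidates = {
    "combine_first": lambda: big["Cat1"].combine_first(big["Cat2"]),
    "fillna": lambda: big["Cat1"].fillna(big["Cat2"]),
    # np.where returns an array, so wrap it to compare like with like.
    "np.where": lambda: pd.Series(
        np.where(big["Cat1"].isnull(), big["Cat2"], big["Cat1"]), index=big.index
    ),
}

# Average seconds per call over 5 runs of each candidate.
timings = {name: timeit.timeit(fn, number=5) / 5 for name, fn in candidates.items()}
results = {name: fn() for name, fn in candidates.items()}

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.2f} ms per call")
```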
    

    I didn't use this method below:

    def is_missing(Cat1,Cat2):    
        if np.isnan(Cat1):        
            return Cat2
        else:
            return Cat1
    
    df['Cat1'] = df.apply(lambda x: is_missing(x['Cat1'],x['Cat2']),axis=1)
    

    because it will raise an Exception:

    TypeError: ("ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''", 'occurred at index 0')
    

    This means np.isnan can be applied to NumPy arrays of a native dtype (such as np.float64), but raises TypeError when applied to object arrays or to non-numeric values such as the strings in this column.

    So I revised the method to use pd.isnull:

    def is_missing(Cat1,Cat2):    
        if pd.isnull(Cat1):        
            return Cat2
        else:
            return Cat1
    
    %timeit df.apply(lambda x: is_missing(x['Cat1'],x['Cat2']),axis=1)
    701 µs ± 7.38 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
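
The difference between the two checks can be seen on single values. This is a small sketch; the helper name `describe` is made up for illustration.

```python
import numpy as np
import pandas as pd

def describe(value):
    """Return (np.isnan outcome, pd.isnull outcome) for one scalar."""
    try:
        numpy_result = bool(np.isnan(value))
    except TypeError:
        # np.isnan has no loop for strings or None.
        numpy_result = "TypeError"
    return numpy_result, bool(pd.isnull(value))

for value in [1.5, np.nan, "cat", None]:
    print(repr(value), describe(value))
```

pd.isnull accepts any Python object, which is why it works inside a row-wise apply over an object column.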
    
  • 2020-11-22 07:18

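    Here is a more general approach (the fillna method is probably better). It uses pd.isnull rather than np.isnan, because np.isnan raises TypeError on the string values in this column:

    import pandas as pd

    def is_missing(Cat1, Cat2):
        # Return the fallback value when the primary value is missing.
        if pd.isnull(Cat1):
            return Cat2
        else:
            return Cat1

    # Apply row by row (axis=1); slower than the vectorized alternatives.
    df['Cat1'] = df.apply(lambda x: is_missing(x['Cat1'], x['Cat2']), axis=1)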
    
  • 2020-11-22 07:19

    You could do

    df.Cat1 = np.where(df.Cat1.isnull(), df.Cat2, df.Cat1)
    

    The overall construct on the RHS uses the ternary pattern from the pandas cookbook (which is worth reading in any case). It is a vectorized version of a ? b : c.
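
One detail worth knowing is that np.where returns a plain NumPy array rather than a Series. The sketch below rebuilds the example frame and shows Series.where as a possible alternative that stays in pandas; the variable names `as_array` and `as_series` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Cat1": ["cat", "dog", "cat", np.nan],
                   "Cat2": ["mouse", "elephant", "giraf", "ant"]})

# np.where(condition, value_if_true, value_if_false), evaluated elementwise.
as_array = np.where(df["Cat1"].isnull(), df["Cat2"], df["Cat1"])

# Series.where keeps values where the condition is True
# and takes the second argument elsewhere.
as_series = df["Cat1"].where(df["Cat1"].notna(), df["Cat2"])

print(type(as_array).__name__, as_array)
print(type(as_series).__name__, as_series.tolist())
```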

  • 2020-11-22 07:20

    You can pass this column to fillna (see the docs); it uses those values, aligned on matching index labels, to fill the missing entries:

    In [17]: df['Cat1'].fillna(df['Cat2'])
    Out[17]:
    0    cat
    1    dog
    2    cat
    3    ant
    Name: Cat1, dtype: object
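
"Matching indexes" means the fill values are aligned by index label, not by position. A minimal sketch with two made-up series, `s1` and `s2`, whose labels are in different orders, compared against the positional behavior of np.where:

```python
import numpy as np
import pandas as pd

# Hypothetical data: s2 carries the same labels as s1 but in reverse order.
s1 = pd.Series(["cat", np.nan, np.nan], index=[0, 1, 2])
s2 = pd.Series(["x", "y", "z"], index=[2, 1, 0])

# fillna aligns on labels: label 1 gets s2[1], label 2 gets s2[2].
by_label = s1.fillna(s2)

# np.where ignores labels and pairs values by position.
by_position = np.where(s1.isnull(), s2, s1)

print(by_label.tolist())
print(by_position.tolist())
```

When both columns come from the same DataFrame the labels already match, so the two approaches give the same values.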
    
  • 2020-11-22 07:22

    I know this is an old question, but I had a need for doing something similar recently. I was able to use the following:

    df = pd.DataFrame([["1","cat","mouse"],
        ["2","dog","elephant"],
        ["3","cat","giraf"],
        ["4",np.nan,"ant"]],columns=["Day","Cat1","Cat2"])
    
    print(df)
    
      Day Cat1      Cat2
    0   1  cat     mouse
    1   2  dog  elephant
    2   3  cat     giraf
    3   4  NaN       ant
    
    df1 = df.bfill(axis=1).iloc[:, 1]
    df1 = df1.to_frame()
    print(df1)
    

    Which yields:

      Cat1
    0  cat
    1  dog
    2  cat
    3  ant
    

    Hope this is helpful to someone!
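
Because bfill(axis=1) pulls each missing value from the next column to the right, the result depends on column order and on which columns are included. A variation that selects the relevant columns by name first might look like the sketch below; the list `priority` is an assumption added for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([["1", "cat", "mouse"],
                   ["2", "dog", "elephant"],
                   ["3", "cat", "giraf"],
                   ["4", np.nan, "ant"]], columns=["Day", "Cat1", "Cat2"])

# Hypothetical priority order: prefer Cat1, fall back to Cat2.
priority = ["Cat1", "Cat2"]

# Back-fill across the selected columns only, then keep the first one.
filled = df[priority].bfill(axis=1)[priority[0]]

print(filled.tolist())
```

Selecting by name avoids relying on Cat1 sitting at position 1, and the same pattern extends to more than two fallback columns.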

  • 2020-11-22 07:28

    Just use the value parameter instead of method:

    In [20]: df
    Out[20]:
      Cat1      Cat2  Day
    0  cat     mouse    1
    1  dog  elephant    2
    2  cat     giraf    3
    3  NaN       ant    4
    
    In [21]: df.Cat1 = df.Cat1.fillna(value=df.Cat2)
    
    In [22]: df
    Out[22]:
      Cat1      Cat2  Day
    0  cat     mouse    1
    1  dog  elephant    2
    2  cat     giraf    3
    3  ant       ant    4
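
The distinction between the two parameters can be sketched as follows: value takes the replacement data (here another column), while method-style filling propagates neighboring values within the same column. Recent pandas releases steer users toward ffill() and bfill() for the latter, so the sketch uses ffill() rather than the method argument. The series below are rebuilt from the example data.

```python
import numpy as np
import pandas as pd

cat1 = pd.Series(["cat", "dog", "cat", np.nan], name="Cat1")
cat2 = pd.Series(["mouse", "elephant", "giraf", "ant"], name="Cat2")

# value=: take the replacement from another column, aligned by index.
from_other = cat1.fillna(value=cat2)

# Forward fill: take the replacement from the previous row of the same column.
from_previous = cat1.ffill()

print(from_other.tolist())
print(from_previous.tolist())
```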
    