Error: 'float' object has no attribute 'notnull'

爱一瞬间的悲伤 2021-02-18 19:15

I have a dataframe:

  a     b     c
0 nan   Y     nan
1  23   N      3
2 nan   N      2
3  44   Y     nan

I wish to have this output:



        
4 answers
  •  闹比i (OP)
    2021-02-18 20:03

    Since you just want NaNs to be propagated, multiplying the columns takes care of that for you:

    >>> import numpy as np
    >>> import pandas as pd
    >>> df = pd.read_clipboard()
    >>> df
          a  b    c
    0   NaN  Y  NaN
    1  23.0  N  3.0
    2   NaN  N  2.0
    3  44.0  Y  NaN
    >>> df.a * df.c
    0     NaN
    1    69.0
    2     NaN
    3     NaN
    dtype: float64
    >>>
    
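    pd.read_clipboard() only works if the table is currently on your clipboard, so here is a self-contained sketch that builds the question's frame directly (values copied from the question) and checks that multiplying two columns propagates NaN:

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame directly instead of reading it from the clipboard.
df = pd.DataFrame({
    "a": [np.nan, 23.0, np.nan, 44.0],
    "b": ["Y", "N", "N", "Y"],
    "c": [np.nan, 3.0, 2.0, np.nan],
})

# Elementwise product: any row where either operand is NaN yields NaN.
product = df.a * df.c
print(product)
```

    Only row 1 has both a and c present, so it is the only non-NaN entry (23 * 3 = 69).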

    If you want to do it conditionally, you can use np.where here instead of .apply. All you need is the following:

    >>> df
          a  b    c
    0   NaN  Y  NaN
    1  23.0  N  3.0
    2   NaN  N  2.0
    3  44.0  Y  NaN
    >>> np.where(df.b == 'N', df.a*df.c, df.a)
    array([ nan,  69.,  nan,  44.])
    

    Propagating NaN is the default behavior for most arithmetic operations involving NaN, so you can simply assign the result of the above:

    >>> df['d'] = np.where(df.b == 'N', df.a*df.c, df.a)
    >>> df
          a  b    c     d
    0   NaN  Y  NaN   NaN
    1  23.0  N  3.0  69.0
    2   NaN  N  2.0   NaN
    3  44.0  Y  NaN  44.0
    >>>
    
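    If you would rather stay inside pandas, Series.where expresses the same rule: it keeps values where the condition is True and substitutes other everywhere else. This is an alternative to the answer's np.where, not what the answer itself uses; a minimal sketch on the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [np.nan, 23.0, np.nan, 44.0],
    "b": ["Y", "N", "N", "Y"],
    "c": [np.nan, 3.0, 2.0, np.nan],
})

# Keep a*c where b == 'N'; everywhere else fall back to column a.
df["d"] = (df.a * df.c).where(df.b == "N", df.a)

# The np.where version from the answer, for comparison.
expected = np.where(df.b == "N", df.a * df.c, df.a)
print(df)
```

    One difference: Series.where returns a Series aligned on the index, whereas np.where returns a plain NumPy array that is assigned by position.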

    To elaborate on what this expression is doing:

    np.where(df.b == 'N', df.a*df.c, df.a)
    

    you can read it as "where df.b == 'N', give me the result of df.a * df.c; otherwise, give me just df.a":

    >>> np.where(df.b == 'N', df.a*df.c, df.a)
    array([ nan,  69.,  nan,  44.])
    
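    Element by element, np.where(cond, x, y) picks x[i] where cond[i] is True and y[i] otherwise. The plain-Python loop below is only an illustrative sketch of that selection rule (np.where itself is vectorized and does not loop in Python):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [np.nan, 23.0, np.nan, 44.0],
    "b": ["Y", "N", "N", "Y"],
    "c": [np.nan, 3.0, 2.0, np.nan],
})

cond = (df.b == "N").to_numpy()
x = (df.a * df.c).to_numpy()  # chosen where cond is True
y = df.a.to_numpy()           # chosen where cond is False

# Spell out the selection rule one element at a time.
manual = np.array([x[i] if cond[i] else y[i] for i in range(len(cond))])

vectorized = np.where(cond, x, y)
print(manual)
print(vectorized)
```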

    Also note, if your dataframe were a little different:

    >>> df
          a  b    c
    0   NaN  Y  NaN
    1  23.0  N  3.0
    2   NaN  N  2.0
    3  44.0  Y  NaN
    >>> df.loc[0,'a'] = 99
    >>> df.loc[0, 'b']= 'N'
    >>> df
          a  b    c
    0  99.0  N  NaN
    1  23.0  N  3.0
    2   NaN  N  2.0
    3  44.0  Y  NaN
    

    Then the following two expressions would no longer give the same result:

    >>> np.where(df.b == 'N', df.a*df.c, df.a)
    array([ nan,  69.,  nan,  44.])
    >>> np.where((df.b == 'N') & (~df.c.isnull()), df.a*df.c, df.a)
    array([ 99.,  69.,  nan,  44.])
    
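    Row 0 is where the two calls diverge: b is 'N' but c is NaN, so the plain condition picks a*c = 99 * NaN = NaN, while the extended condition is False there and falls back to a = 99. As an aside (not something the answer uses), ~df.c.isnull() can also be written as df.c.notna(); Series.notnull is an alias of Series.notna. A minimal sketch that rebuilds the modified frame:

```python
import numpy as np
import pandas as pd

# The modified frame: row 0 now has a = 99 and b = 'N', but c is still NaN.
df = pd.DataFrame({
    "a": [99.0, 23.0, np.nan, 44.0],
    "b": ["N", "N", "N", "Y"],
    "c": [np.nan, 3.0, 2.0, np.nan],
})

plain = np.where(df.b == "N", df.a * df.c, df.a)
# Series.notna() is the positive form of ~Series.isnull().
guarded = np.where((df.b == "N") & df.c.notna(), df.a * df.c, df.a)

print(plain)
print(guarded)
```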

    So you might want to use the slightly more verbose version:

    >>> df['d'] = np.where((df.b == 'N') & (~df.c.isnull()), df.a*df.c, df.a)
    >>> df
          a  b    c     d
    0  99.0  N  NaN  99.0
    1  23.0  N  3.0  69.0
    2   NaN  N  2.0   NaN
    3  44.0  Y  NaN  44.0
    >>>
    
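    About the error in the title: the question's own code is not shown, but 'float' object has no attribute 'notnull' is what you get when .notnull() is called on a single cell, for example inside a row-wise .apply, because each cell is a scalar float rather than a Series. The scalar-friendly function is pd.notnull() (or pd.notna()). Below is a sketch of a row-wise version equivalent to the verbose np.where above, assuming that is roughly what the question attempted; the helper name d_for_row is hypothetical, and the vectorized np.where is usually the better choice:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [99.0, 23.0, np.nan, 44.0],
    "b": ["N", "N", "N", "Y"],
    "c": [np.nan, 3.0, 2.0, np.nan],
})

# A single cell is a scalar float: it has no .notnull() method.
error_message = None
try:
    float("nan").notnull()
except AttributeError as exc:
    error_message = str(exc)

def d_for_row(row):
    """Hypothetical row-wise helper mirroring the verbose np.where condition."""
    # pd.notnull accepts scalars, unlike the Series.notnull() method.
    if row["b"] == "N" and pd.notnull(row["c"]):
        return row["a"] * row["c"]
    return row["a"]

df["d"] = df.apply(d_for_row, axis=1)
print(error_message)
print(df)
```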
