Error: float object has no attribute notnull

爱一瞬间的悲伤 2021-02-18 19:15

I have a dataframe:

  a     b     c
0 nan   Y     nan
1  23   N      3
2 nan   N      2
3  44   Y     nan

I wish to have this output:



        
4 answers
  • 2021-02-18 19:43

    You don't need apply; use np.where:

    df['d'] = np.where(df.a.isnull(),
             np.nan,
             np.where((df.b == "N")&(~df.c.isnull()),
                      df.a*df.c,
                      df.a))
    

    Output:

          a  b    c     d
    0   NaN  Y  NaN   NaN
    1  23.0  N  3.0  69.0
    2   NaN  N  2.0   NaN
    3  44.0  Y  NaN  44.0
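
    For anyone who wants to run this without the original data, here is a minimal self-contained sketch. The frame is rebuilt by hand from the table in the question, and df.c.notnull() is used in place of ~df.c.isnull(), which is equivalent.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt by hand from the table in the question.
df = pd.DataFrame({
    "a": [np.nan, 23, np.nan, 44],
    "b": ["Y", "N", "N", "Y"],
    "c": [np.nan, 3, 2, np.nan],
})

# Outer where: rows with a missing `a` stay NaN.
# Inner where: multiply only when b == "N" and c is present, otherwise keep a.
df["d"] = np.where(
    df["a"].isnull(),
    np.nan,
    np.where((df["b"] == "N") & df["c"].notnull(), df["a"] * df["c"], df["a"]),
)

print(df)
```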
    
  • 2021-02-18 19:46

    Use

    pd.isnull(df['Description'][i])
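
    The 'Description' column above is not part of the frame in the question, so the sketch below uses bare scalar values instead. It is only meant to show why the module-level functions pd.isnull and pd.notnull work on a single cell while the method call fails with the error in the title.

```python
import numpy as np
import pandas as pd

value = np.nan  # a single cell value: a plain Python float, not a Series

# The module-level functions accept scalars.
print(pd.isnull(value))
print(pd.notnull(23.0))

# The method form only exists on Series/DataFrame objects,
# so calling it on a float raises AttributeError.
try:
    value.notnull()
except AttributeError as exc:
    message = str(exc)

print(message)
```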
    
  • 2021-02-18 19:51

    You can try

    df['d'] = np.where((df.b == 'N') & (pd.notnull(df.c)), df.a*df.c, np.where(pd.notnull(df.a), df.a, np.nan))
    
    
        a       b   c      d
    0   NaN     Y   NaN    NaN
    1   23.0    N   3.0    69.0
    2   NaN     N   2.0    NaN
    3   44.0    Y   NaN    44.0
    

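    See the documentation for pandas notnull. The error says the value you are calling .notnull on is a float, and a float has no such method. In your current code, you just need to change series.notnull to pd.notnull(series) for it to work, though np.where should be more efficient than apply.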

    def f4(row):
        if pd.isnull(row['a']):  # NaN never equals itself, so == np.nan is always False
            return np.nan
        elif (row['b']=="N") and pd.notnull(row['c']):
            return row['a']*row['c']
        else:
            return row['a']
    df['d']=df.apply(f4,axis=1)
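
    A NaN check inside a row-wise function has to go through pd.isnull, because NaN compares unequal to everything, itself included. The sketch below shows that, and then checks a row-wise function against the vectorized np.where form on the sample frame. The name row_rule is made up for illustration, and the frame is rebuilt by hand from the question.

```python
import numpy as np
import pandas as pd

# Equality never detects NaN; pd.isnull does.
print(np.nan == np.nan)
print(pd.isnull(np.nan))

# Sample frame rebuilt by hand from the table in the question.
df = pd.DataFrame({
    "a": [np.nan, 23, np.nan, 44],
    "b": ["Y", "N", "N", "Y"],
    "c": [np.nan, 3, 2, np.nan],
})

def row_rule(row):
    """Hypothetical row-wise rule: a * c when b == 'N' and c is present, else a."""
    if pd.isnull(row["a"]):
        return np.nan
    if row["b"] == "N" and pd.notnull(row["c"]):
        return row["a"] * row["c"]
    return row["a"]

by_row = df.apply(row_rule, axis=1)

# The same rule expressed once over whole columns.
vectorized = np.where(
    (df["b"] == "N") & df["c"].notnull(), df["a"] * df["c"], df["a"]
)

print(by_row.to_numpy())
print(vectorized)
```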
    
  • 2021-02-18 20:03

    Since you just want NaNs to be propagated, multiplying the columns takes care of that for you:

    >>> df = pd.read_clipboard()
    >>> df
          a  b    c
    0   NaN  Y  NaN
    1  23.0  N  3.0
    2   NaN  N  2.0
    3  44.0  Y  NaN
    >>> df.a * df.c
    0     NaN
    1    69.0
    2     NaN
    3     NaN
    dtype: float64
    >>>
    

    If you want to do it on a condition, you can use np.where here instead of .apply. All you need is the following:

    >>> df
          a  b    c
    0   NaN  Y  NaN
    1  23.0  N  3.0
    2   NaN  N  2.0
    3  44.0  Y  NaN
    >>> np.where(df.b == 'N', df.a*df.c, df.a)
    array([ nan,  69.,  nan,  44.])
    

    NaN propagates like this by default in most arithmetic operations. So, you can simply assign the result of the above:

    >>> df['d'] = np.where(df.b == 'N', df.a*df.c, df.a)
    >>> df
          a  b    c     d
    0   NaN  Y  NaN   NaN
    1  23.0  N  3.0  69.0
    2   NaN  N  2.0   NaN
    3  44.0  Y  NaN  44.0
    >>>
    

    Just to elaborate on what this:

    np.where(df.b == 'N', df.a*df.c, df.a)
    

    Is doing, you can think of it as "where df.b == 'N', give me the result of df.a * df.c; else, give me just df.a":

    >>> np.where(df.b == 'N', df.a*df.c, df.a)
    array([ nan,  69.,  nan,  44.])
    

    Also note, if your dataframe were a little different:

    >>> df
          a  b    c
    0   NaN  Y  NaN
    1  23.0  N  3.0
    2   NaN  N  2.0
    3  44.0  Y  NaN
    >>> df.loc[0,'a'] = 99
    >>> df.loc[0, 'b']= 'N'
    >>> df
          a  b    c
    0  99.0  N  NaN
    1  23.0  N  3.0
    2   NaN  N  2.0
    3  44.0  Y  NaN
    

    Then the following two expressions would not be equivalent, because row 0 now has b == 'N' but a missing c:

    >>> np.where(df.b == 'N', df.a*df.c, df.a)
    array([ nan,  69.,  nan,  44.])
    >>> np.where((df.b == 'N') & (~df.c.isnull()), df.a*df.c, df.a)
    array([ 99.,  69.,  nan,  44.])
    

    So you might want to use the slightly more verbose:

    >>> df['d'] = np.where((df.b == 'N') & (~df.c.isnull()), df.a*df.c, df.a)
    >>> df
          a  b    c     d
    0  99.0  N  NaN  99.0
    1  23.0  N  3.0  69.0
    2   NaN  N  2.0   NaN
    3  44.0  Y  NaN  44.0
    >>>
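
    To reproduce the difference outside an interactive session, here is a small sketch. The modified frame is rebuilt by hand from the output above, and the names plain and guarded are made up for illustration.

```python
import numpy as np
import pandas as pd

# Modified frame from above: row 0 has a value in `a`, b == "N", and c missing.
df = pd.DataFrame({
    "a": [99, 23, np.nan, 44],
    "b": ["N", "N", "N", "Y"],
    "c": [np.nan, 3, 2, np.nan],
})

# Without the guard, row 0 becomes 99 * NaN = NaN.
plain = np.where(df["b"] == "N", df["a"] * df["c"], df["a"])

# With the guard, rows where c is missing fall back to a.
guarded = np.where(
    (df["b"] == "N") & df["c"].notnull(), df["a"] * df["c"], df["a"]
)

print(plain)
print(guarded)
```

    Which variant is right depends on whether a missing c should wipe out the value of a or leave it alone.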
    