Python pandas: replace NaN with 0 and every other value with 1 in a DataFrame

迷失自我 2021-01-04 08:41

I have a DataFrame, df, as follows:

 row  id  name    age   url           
  1   e1   tom    NaN   http1   
  2   e2   john   25    NaN
  3   e3   lucy   NaN         


        
3 answers
  • 2021-01-04 09:11

    You can use where with a condition built by isnull, followed by fillna:

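    # where() keeps the NaN cells and writes 1 everywhere else;
    # fillna(0) then turns the remaining NaN cells into 0.
    # The parentheses are needed so the chained call can span several lines.
    df[['age', 'url']] = (df[['age', 'url']]
                          .where(df[['age', 'url']].isnull(), 1)
                          .fillna(0).astype(int))
    print(df)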
    
       row  id  name  age  url
    0    1  e1   tom    0    1
    1    2  e2  john    1    0
    2    3  e3  lucy    0    1
    3    4  e4  tick    1    0
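
    The chained call above can be unpacked step by step. This is a minimal sketch, not the answerer's exact session: the sample frame is hypothetical, because the table in the question is incomplete, and the non-missing age and url values are placeholders chosen only so that the flags match the printed output.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame: the question's table is incomplete, so the
# non-missing values below are placeholders, not the asker's real data.
df = pd.DataFrame({
    "row": [1, 2, 3, 4],
    "id": ["e1", "e2", "e3", "e4"],
    "name": ["tom", "john", "lucy", "tick"],
    "age": [np.nan, 25.0, np.nan, 30.0],
    "url": pd.Series(["http1", np.nan, "http3", np.nan], dtype=object),
})
cols = ["age", "url"]

# Step 1: boolean mask, True where a cell is missing.
is_missing = df[cols].isnull()

# Step 2: where() keeps the original value where the mask is True
# (the NaN cells) and writes 1 into every other cell.
kept_nan = df[cols].where(is_missing, 1)

# Step 3: the cells that are still NaN become 0, then cast to int.
flags = kept_nan.fillna(0).astype(int)

df[cols] = flags
print(df)
```

    The point to notice is that where() replaces the cells where the condition is False, which is why the condition is isnull rather than notnull.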
    

    Or numpy.where with isnull:

    df[['age', 'url']] = np.where(df[['age', 'url']].isnull(), 0, 1)
    print (df)
       row  id  name  age  url
    0    1  e1   tom    0    1
    1    2  e2  john    1    0
    2    3  e3  lucy    0    1
    3    4  e4  tick    1    0
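
    One detail of the numpy.where variant is that it returns a plain NumPy array rather than a DataFrame, so the assignment back is positional. A small sketch, using hypothetical placeholder data since the question's table is incomplete:

```python
import numpy as np
import pandas as pd

# Hypothetical placeholder data; only the missing/non-missing pattern matters.
df = pd.DataFrame({
    "age": [np.nan, 25.0, np.nan, 30.0],
    "url": pd.Series(["http1", np.nan, "http3", np.nan], dtype=object),
})
cols = ["age", "url"]

# np.where(condition, value_if_true, value_if_false) works element-wise.
flags = np.where(df[cols].isnull(), 0, 1)

# The result is a 2-D ndarray with one column per selected DataFrame column.
print(type(flags).__name__, flags.shape)

# Columns are matched by position when the array is assigned back.
df[cols] = flags
print(df)
```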
    

    The fastest solution uses notnull and astype:

    df[['age', 'url']] = df[['age', 'url']].notnull().astype(int)
    print (df)
       row  id  name  age  url
    0    1  e1   tom    0    1
    1    2  e2  john    1    0
    2    3  e3  lucy    0    1
    3    4  e4  tick    1    0
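
    To see why this one-liner works, it can help to look at the intermediate boolean frame. The sketch below again uses hypothetical placeholder data; notna is used alongside notnull only to show that the two names are aliases.

```python
import numpy as np
import pandas as pd

# Hypothetical placeholder data; only the missing/non-missing pattern matters.
df = pd.DataFrame({
    "age": [np.nan, 25.0, np.nan, 30.0],
    "url": pd.Series(["http1", np.nan, "http3", np.nan], dtype=object),
})
cols = ["age", "url"]

# notnull() gives True for present values and False for missing ones.
present = df[cols].notnull()
print(present)

# notna() is an alias of notnull() and yields the same frame.
same_as_notna = present.equals(df[cols].notna())

# Casting booleans to int maps False -> 0 and True -> 1.
flags = present.astype(int)
df[cols] = flags
print(df)
```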
    

    EDIT:

    I tried modifying your solution:

    df[['age', 'url']] = df[['age', 'url']].applymap(lambda x: 0 if pd.isnull(x) else 1)
    print (df)
       row  id  name  age  url
    0    1  e1   tom    0    1
    1    2  e2  john    1    0
    2    3  e3  lucy    0    1
    3    4  e4  tick    1    0
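
    On newer pandas releases, applymap is deprecated (since pandas 2.1) in favour of DataFrame.map. The following sketch picks whichever method is available; the helper name to_flag and the sample data are hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical placeholder data; only the missing/non-missing pattern matters.
df = pd.DataFrame({
    "age": [np.nan, 25.0, np.nan, 30.0],
    "url": pd.Series(["http1", np.nan, "http3", np.nan], dtype=object),
})
cols = ["age", "url"]


def to_flag(value):
    """Return 0 for a missing value and 1 for anything else."""
    return 0 if pd.isnull(value) else 1


subset = df[cols]
# DataFrame.map is the element-wise method on pandas >= 2.1;
# older versions only provide applymap.
if hasattr(subset, "map"):
    flags = subset.map(to_flag)
else:
    flags = subset.applymap(to_flag)

df[cols] = flags
print(df)
```

    Either way, this calls a Python function once per cell, which is consistent with it being the slowest option in the timings.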
    

    Timings:

    len(df)=4k:

    In [127]: %timeit df[['age', 'url']] = df[['age', 'url']].applymap(lambda x: 0 if pd.isnull(x) else 1)
    100 loops, best of 3: 11.2 ms per loop
    
    In [128]: %timeit df[['age', 'url']] = np.where(df[['age', 'url']].isnull(), 0, 1)
    100 loops, best of 3: 2.69 ms per loop
    
    In [129]: %timeit df[['age', 'url']] = np.where(pd.notnull(df[['age', 'url']]), 1, 0)
    100 loops, best of 3: 2.78 ms per loop
    
    In [131]: %timeit df.loc[:, ['age', 'url']] = df[['age', 'url']].notnull() * 1
    1000 loops, best of 3: 1.45 ms per loop
    
    In [136]: %timeit df[['age', 'url']] = df[['age', 'url']].notnull().astype(int)
    1000 loops, best of 3: 1.01 ms per loop
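
    The comparison can be rerun outside IPython with the standard timeit module. This is only a rough sketch under stated assumptions: the 4,000-row frame is randomly generated (the original benchmark data is not shown), and it times the computation of the flags only, not the assignment back into df, so absolute numbers will differ from those above.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed benchmark data: 4,000 rows with roughly half the cells missing.
n = 4000
rng = np.random.default_rng(0)
age = rng.integers(18, 80, size=n).astype(float)
age[rng.random(n) < 0.5] = np.nan
url = pd.Series(["http"] * n, dtype=object)
url[rng.random(n) < 0.5] = np.nan

df = pd.DataFrame({"age": age, "url": url})
src = df[["age", "url"]]

candidates = {
    "np.where + isnull": lambda: np.where(src.isnull(), 0, 1),
    "notnull * 1": lambda: src.notnull() * 1,
    "notnull + astype": lambda: src.notnull().astype(int),
}

results = {}
for label, func in candidates.items():
    # Best of 3 repeats, 100 calls each, reported as time per call.
    best = min(timeit.repeat(func, number=100, repeat=3)) / 100
    results[label] = best
    print(f"{label:20s} {best * 1e3:.3f} ms per call")
```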
    
  • 2021-01-04 09:17
    df.loc[:, ['age', 'url']] = df[['age', 'url']].notnull() * 1
    df
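
    This answer relies on the fact that multiplying a boolean frame by 1 turns False into 0 and True into 1. A short sketch with hypothetical placeholder data, comparing it with the astype(int) form:

```python
import numpy as np
import pandas as pd

# Hypothetical placeholder data; only the missing/non-missing pattern matters.
df = pd.DataFrame({
    "age": [np.nan, 25.0, np.nan, 30.0],
    "url": pd.Series(["http1", np.nan, "http3", np.nan], dtype=object),
})
cols = ["age", "url"]

present = df[cols].notnull()

# Arithmetic on booleans treats False as 0 and True as 1.
by_multiplication = present * 1
by_astype = present.astype(int)

# Both approaches give the same values.
same = bool((by_multiplication.to_numpy() == by_astype.to_numpy()).all())
print(same)

df[cols] = by_multiplication
print(df.dtypes)
```

    Note that this sketch assigns with df[cols] rather than df.loc[:, cols]. Depending on the pandas version, the .loc form may write into the existing columns and keep their original dtypes, so it is worth checking df.dtypes afterwards if integer columns are required.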
    

  • 2021-01-04 09:20

    Use np.where with pd.notnull to replace the missing and valid elements with 0 and 1 respectively:

    In [90]:
    df[['age', 'url']] = np.where(pd.notnull(df[['age', 'url']]), 1, 0)
    df
    
    Out[90]:
       row  id  name  age  url
    0    1  e1   tom    0    1
    1    2  e2  john    1    0
    2    3  e3  lucy    0    1
    3    4  e4  tick    1    0
    