How to replace all non-NaN entries of a dataframe with 1 and all NaN with 0

轻奢々 2021-02-01 18:07

I have a dataframe with 71 columns and 30597 rows. I want to replace all non-nan entries with 1 and the nan values with 0.

Initially I tried for-loop on each value of th

9 answers
  • 2021-02-01 18:41

    Regarding fmarc's answer:

    df.loc[~df.isnull()] = 1  # not nan
    df.loc[df.isnull()] = 0   # nan
    

    The code above does not work for me, but the following does:

    df[~df.isnull()] = 1  # not nan
    df[df.isnull()] = 0   # nan
    

    This was with pandas 0.25.3.

    If you only want to change values in specific columns, you may need to create a temporary DataFrame and assign it back to those columns of the original DataFrame:

    change_col = ['a', 'b']
    tmp = df[change_col].copy()   # explicit copy avoids SettingWithCopyWarning
    tmp[tmp.isnull()] = 'xxx'     # 'xxx' is a placeholder replacement value
    df[change_col] = tmp
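
    For the 1/0 replacement asked about in the question, the same column subset can probably be handled without a temporary DataFrame. The following is only a sketch: the sample data and the column names 'a', 'b' and 'c' are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are placeholders.
df = pd.DataFrame({
    "a": [np.nan, 4.0],
    "b": [1.0, np.nan],
    "c": [np.nan, 7.0],
})

change_col = ["a", "b"]

# Overwrite only the selected columns with 1 (value present) or 0 (NaN).
df[change_col] = df[change_col].notnull().astype(int)

print(df)
```

    Column 'c' is not in change_col, so it keeps its original values, including the NaN.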
    
  • 2021-02-01 18:43

    I do a lot of data analysis and am interested in finding new/faster methods of carrying out operations. I had never come across jezrael's method, so I was curious to compare it with my usual method (i.e. replace by indexing). NOTE: This is not an answer to the OP's question, rather it is an illustration of the efficiency of jezrael's method. Since this is NOT an answer I will remove this post if people do not find it useful (and after being downvoted into oblivion!). Just leave a comment if you think I should remove it.

    I created a moderately sized dataframe and did multiple replacements using both the df.notnull().astype(int) method and simple indexing (how I would normally do this). It turns out that the latter is slower by approximately five times. Just an fyi for anyone doing larger-scale replacements.

    from __future__ import division, print_function
    
    import numpy as np
    import pandas as pd
    import datetime as dt
    
    
    # create dataframe with randomly placed NaNs
    # (shape and sample size must be integers, so use 100 and floor division)
    data = np.ones((100, 100))
    data.ravel()[np.random.choice(data.size, data.size // 10, replace=False)] = np.nan
    
    df = pd.DataFrame(data=data)
    
    trials = np.arange(100)
    
    
    d1 = dt.datetime.now()
    
    for r in trials:
        new_df = df.notnull().astype(int)
    
    print( (dt.datetime.now()-d1).total_seconds()/trials.size )
    
    
    # create a dummy copy of df.  I use a dummy copy here to prevent biasing the 
    # time trial with dataframe copies/creations within the upcoming loop
    df_dummy = df.copy()
    
    d1 = dt.datetime.now()
    
    for r in trials:
        df_dummy[df.isnull()] = 0
        df_dummy[df.isnull()==False] = 1
    
    print( (dt.datetime.now()-d1).total_seconds()/trials.size )
    

    This yields times of 0.142 s and 0.685 s respectively. It is clear who the winner is.
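
    A comparison like this can also be run with the standard timeit module. The version below is a sketch under my own assumptions, not the script above: the helper names by_cast and by_indexing, the seed and the repeat count are arbitrary, and by_indexing includes the DataFrame copy in its timing. Actual timings depend on the machine and the pandas version, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Build a 100 x 100 frame of ones with NaN in 10% of the cells.
rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
data = np.ones((100, 100))
nan_idx = rng.choice(data.size, size=data.size // 10, replace=False)
data.ravel()[nan_idx] = np.nan  # ravel() is a view of this contiguous array
df = pd.DataFrame(data)


def by_cast():
    """Boolean mask cast to int: 1 where a value is present, 0 where NaN."""
    return df.notnull().astype(int)


def by_indexing():
    """Same result via boolean-mask assignment on a copy of df."""
    out = df.copy()
    mask = df.isnull()
    out[mask] = 0
    out[~mask] = 1
    return out


repeats = 20
t_cast = timeit.timeit(by_cast, number=repeats) / repeats
t_index = timeit.timeit(by_indexing, number=repeats) / repeats
print(f"cast: {t_cast:.6f} s per call, indexing: {t_index:.6f} s per call")
```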

  • 2021-02-01 18:45

    You can take the return value of df.notnull(), which is False where the DataFrame contains NaN and True otherwise, and cast it to integer. This gives you 0 where the DataFrame is NaN and 1 otherwise:

    newdf = df.notnull().astype('int')
    

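    If you really want to write into your original DataFrame, this may work (another answer in this thread reports that it fails on pandas 0.25.3, where plain df[...] indexing with the same masks works instead):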

    df.loc[~df.isnull()] = 1  # not nan
    df.loc[df.isnull()] = 0   # nan
    
  • 2021-02-01 18:47

    Here is a suggestion for a single column: replace the NaN rows in that column with 0 and every other value with 1.

    The line below changes the NaN values in your column to 0:

    df["YourColumnName"] = df["YourColumnName"].fillna(0)
    

    The remaining non-NaN values are then replaced by 1 with the code below (note that a value that was already 0 stays 0):

    df["YourColumnName"] = df["YourColumnName"].apply(lambda x: 1 if x != 0 else 0)
    

    The same idea can be applied to the whole DataFrame by calling fillna(0) on df itself and then using an element-wise comparison such as (df != 0).astype(int).
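
    As a rough illustration of the whole-DataFrame variant, and of how it differs from notnull() when the data already contains zeros, consider the sketch below. The sample values and the names via_fillna and via_notnull are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical data that contains genuine zeros as well as NaN.
df = pd.DataFrame({
    "a": [np.nan, 0.0, 4.0],
    "b": [1.0, np.nan, 0.0],
})

# fillna-then-compare: NaN becomes 0, and so does every genuine 0.
via_fillna = (df.fillna(0) != 0).astype(int)

# notnull-then-cast: only NaN becomes 0; a genuine 0 counts as present.
via_notnull = df.notnull().astype(int)

print(via_fillna)
print(via_notnull)
```

    The two results differ exactly where the original data holds a real 0, so the fillna route is only equivalent when the data contains no zeros.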

  • 2021-02-01 18:53

    Use notnull and cast the boolean result to int with astype:

    print ((df.notnull()).astype('int'))
    

    Sample:

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame({'a': [np.nan, 4, np.nan], 'b': [1,np.nan,3]})
    print (df)
         a    b
    0  NaN  1.0
    1  4.0  NaN
    2  NaN  3.0
    
    print (df.notnull())
           a      b
    0  False   True
    1   True  False
    2  False   True
    
    print ((df.notnull()).astype('int'))
       a  b
    0  0  1
    1  1  0
    2  0  1
    
  • 2021-02-01 18:55

    There is a method .fillna() on DataFrames which handles the NaN part of what you need. For example:

    df = df.fillna(0)  # Replace all NaN values with zero, returning the modified DataFrame
    

    or

    df.fillna(0, inplace=True)   # Replace all NaN values with zero, updating the DataFrame directly
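
    Note that fillna(0) on its own leaves the non-NaN entries unchanged, so the question's other requirement (non-NaN becomes 1) needs one more step. One possible way to do that, sketched here with made-up sample data and the illustrative names only_filled and indicator, is to overwrite the non-NaN cells with DataFrame.where before filling:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [np.nan, 4.0, np.nan], "b": [1.0, np.nan, 3.0]})

# fillna alone: NaN -> 0, but 4.0 and 3.0 are left as they are.
only_filled = df.fillna(0)

# where() keeps cells for which the condition is True (the NaN cells)
# and replaces all others with 1; fillna(0) then turns the NaN into 0.
indicator = df.where(df.isna(), 1).fillna(0).astype(int)

print(only_filled)
print(indicator)
```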
    