How to replace all non-NaN entries of a dataframe with 1 and all NaN with 0


I have a dataframe with 71 columns and 30597 rows. I want to replace all non-NaN entries with 1 and the NaN values with 0.

Initially I tried a for-loop on each value of th…


9 Answers

我寻月下人不归

2021-02-01 18:41
Regarding fmarc's answer:
```
df.loc[~df.isnull()] = 1  # not nan
df.loc[df.isnull()] = 0   # nan
```
The code above does not work for me, but the code below does:
```
df[~df.isnull()] = 1  # not nan
df[df.isnull()] = 0   # nan
```
(Tested with pandas 0.25.3.)
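A minimal self-contained run of the plain boolean-indexing version, on a small made-up dataframe. One detail the answer does not mention: because the NaNs make the columns float64, the result holds 1.0/0.0 until it is cast explicitly.

```python
import numpy as np
import pandas as pd

# Small example frame, made up for illustration
df = pd.DataFrame({'a': [np.nan, 4, np.nan], 'b': [1, np.nan, 3]})

# In-place boolean indexing with a boolean DataFrame as the key
df[~df.isnull()] = 1  # non-NaN -> 1
df[df.isnull()] = 0   # NaN -> 0

# The columns are still float64 at this point (1.0 / 0.0)
before_cast = df.copy()
print(before_cast.dtypes)

# Cast if integer 0/1 values are wanted
df = before_cast.astype(int)
print(df)
```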

If you only want to change the values in specific columns, you may need to create a temporary dataframe and assign it back to those columns of the original dataframe:
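```
change_col = ['a', 'b']
tmp = df[change_col].copy()  # explicit copy avoids SettingWithCopyWarning
tmp[tmp.isnull()] = 'xxx'    # replace NaN in the selected columns only
df[change_col] = tmp         # write the modified columns back
```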
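For the original 0/1 task, the same column-subset idea can be written without a temporary frame by combining it with notnull().astype(int). This is a sketch; the dataframe and its extra column 'c' are made up to show that unselected columns are left alone.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'a' and 'b' get converted, 'c' is left untouched
df = pd.DataFrame({'a': [np.nan, 4, np.nan],
                   'b': [1, np.nan, 3],
                   'c': [np.nan, 'x', 'y']})

change_col = ['a', 'b']
# Build the 0/1 indicator only for the selected columns and assign it back
df[change_col] = df[change_col].notnull().astype(int)
print(df)
```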

情深已故

2021-02-01 18:43
I do a lot of data analysis and am interested in finding new or faster ways of carrying out operations. I had never come across jezrael's method, so I was curious to compare it with my usual method (i.e. replacing by indexing). NOTE: This is not an answer to the OP's question; rather, it illustrates the efficiency of jezrael's method. Since this is NOT an answer, I will remove this post if people do not find it useful (and after it has been downvoted into oblivion!). Just leave a comment if you think I should remove it.

I created a moderately sized dataframe and performed multiple replacements using both the df.notnull().astype(int) method and simple indexing (how I would normally do this). It turns out that the latter is roughly five times slower. Just an FYI for anyone doing larger-scale replacements.
```
import datetime as dt

import numpy as np
import pandas as pd


# create a 100 x 100 dataframe with randomly placed NaNs (10% of the cells);
# the shape and the sample count must be integers, not 1e2 or a float division
data = np.ones((100, 100))
data.ravel()[np.random.choice(data.size, data.size // 10, replace=False)] = np.nan

df = pd.DataFrame(data=data)

trials = np.arange(100)


d1 = dt.datetime.now()

for r in trials:
    new_df = df.notnull().astype(int)

# average seconds per trial
print((dt.datetime.now() - d1).total_seconds() / trials.size)


# create a dummy copy of df.  I use a dummy copy here to prevent biasing the
# time trial with dataframe copies/creations within the upcoming loop
df_dummy = df.copy()

d1 = dt.datetime.now()

for r in trials:
    df_dummy[df.isnull()] = 0
    df_dummy[df.notnull()] = 1

# average seconds per trial
print((dt.datetime.now() - d1).total_seconds() / trials.size)
```
This yields average times of 0.142 s and 0.685 s per trial, respectively, so df.notnull().astype(int) is the clear winner.
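The same comparison can be sketched with the standard-library timeit module, which is designed for this kind of micro-benchmark. This is an illustrative variant, not the poster's script: unlike the original, the indexing function copies the frame on every call so each run starts from identical input, and it checks that both methods give the same 0/1 result. No timings are claimed here; they depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# 100 x 100 frame of ones with 10% of the cells set to NaN at random
rng = np.random.default_rng(0)
data = np.ones((100, 100))
data.ravel()[rng.choice(data.size, data.size // 10, replace=False)] = np.nan
df = pd.DataFrame(data)


def by_astype():
    # cast the boolean "is not NaN" mask straight to integers
    return df.notnull().astype(int)


def by_indexing():
    # boolean-mask assignment on a fresh copy (result stays float64)
    out = df.copy()
    mask = out.isnull()
    out[mask] = 0
    out[~mask] = 1
    return out


n = 100
for fn in (by_astype, by_indexing):
    # average seconds per call
    print(fn.__name__, timeit.timeit(fn, number=n) / n)
```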

孤独总比滥情好

2021-02-01 18:45
You can take the return value of df.notnull(), which is False where the DataFrame contains NaN and True otherwise, and cast it to integer, giving you 0 where the DataFrame is NaN and 1 otherwise:
```
newdf = df.notnull().astype('int')
```
If you really want to write into your original DataFrame, this will work:
```
df.loc[~df.isnull()] = 1  # not nan
df.loc[df.isnull()] = 0   # nan
```
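The first line of this answer can also be expressed with NumPy's np.where, a swapped-in alternative rather than something from the answer. np.where returns a plain ndarray, so this sketch reattaches the original index and column labels; the small dataframe is made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [np.nan, 4, np.nan], 'b': [1, np.nan, 3]})

# 0 where the cell is NaN, 1 elsewhere; the result is an ndarray,
# so wrap it back into a DataFrame with the original labels
newdf = pd.DataFrame(np.where(df.isnull(), 0, 1),
                     index=df.index, columns=df.columns)
print(newdf)
```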

轮回少年

2021-02-01 18:47
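Here is a suggestion for a single column: replace the rows of that column that are NaN with 0, and the rows that have values with 1.

The line below replaces the NaN values in your column with 0:
```
# assign back instead of using a chained inplace=True call, which recent pandas versions warn about
df["YourColumnName"] = df["YourColumnName"].fillna(0)
```
Then the remaining non-NaN values are replaced with 1 by the code below:
```
# caveat: values that were already 0 before fillna are also mapped to 0
df["YourColumnName"] = df["YourColumnName"].apply(lambda x: 1 if x != 0 else 0)
```
The same idea can be applied to the whole dataframe by calling fillna(0) on df itself and then mapping element-wise. Note that DataFrame.apply passes whole columns rather than single values, so use DataFrame.applymap (renamed DataFrame.map in pandas 2.1) with the same lambda.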
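The fillna-then-map approach has a pitfall worth seeing concretely: a value that was already 0 before fillna becomes indistinguishable from a filled NaN. This sketch uses a made-up frame containing genuine zeros, and a vectorised `!= 0` comparison in place of the lambda (it gives the same result on numeric data and avoids the applymap/map naming difference).

```python
import numpy as np
import pandas as pd

# Made-up frame that contains real 0.0 values as well as NaN
df = pd.DataFrame({'a': [np.nan, 0.0, 5.0], 'b': [2.0, np.nan, 0.0]})

# Whole-frame fillna(0), then "1 if value != 0 else 0"
via_fillna = (df.fillna(0) != 0).astype(int)
print(via_fillna)  # the genuine zeros come out as 0, like the NaNs

# notnull() only looks at missingness, so real zeros map to 1
correct = df.notnull().astype(int)
print(correct)
```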

星月不相逢

2021-02-01 18:53

Use notnull and cast the boolean result to int with astype:

```
print(df.notnull().astype('int'))
```
Sample:
```
import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [np.nan, 4, np.nan], 'b': [1, np.nan, 3]})
print(df)
     a    b
0  NaN  1.0
1  4.0  NaN
2  NaN  3.0

print(df.notnull())
       a      b
0  False   True
1   True  False
2  False   True

print(df.notnull().astype('int'))
   a  b
0  0  1
1  1  0
2  0  1
```


你的背包

2021-02-01 18:55

There is a method .fillna() on DataFrames that handles the NaN part of this: it replaces NaN with a given value, but it does not set the other entries to 1. For example:

```
df = df.fillna(0)  # Replace all NaN values with zero, returning the modified DataFrame

df.fillna(0, inplace=True)   # Replace all NaN values with zero, updating the DataFrame directly
```
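To turn fillna into a complete answer, it can be paired with DataFrame.mask, which replaces values wherever a condition is True. This is a sketch on a made-up frame, combining two real pandas methods rather than reproducing the answer: first every non-NaN cell becomes 1, then fillna turns the remaining NaN cells into 0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [np.nan, 4, np.nan], 'b': [1, np.nan, 3]})

# mask(cond, other): where cond is True (cell is not NaN), put 1;
# NaN cells are left as NaN, and fillna(0) then fills them with 0
result = df.mask(df.notna(), 1).fillna(0).astype(int)
print(result)
```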

