I have a dataframe with 71 columns and 30597 rows. I want to replace all non-nan entries with 1 and the nan values with 0.
Initially I tried for-loop on each value of th
Use df.fillna(0) to fill NaN with 0.
Generally there are two steps: substitute all non-NaN values, then substitute all NaN values.
dataframe.where(~dataframe.notna(), 1)
- this line replaces all non-NaN values with 1.
dataframe.fillna(0)
- this line replaces all NaNs with 0.
Side note: if you take a look at the pandas documentation, .where keeps the values where the condition is True and replaces the values where it is False. This is the important part. That is why we invert the mask, ~dataframe.notna(), so that .where() replaces exactly the non-NaN values.
Both calls return a new dataframe rather than modifying the original, so assign the result of the first call (or chain the two) before filling the NaNs.
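A minimal sketch of the two steps chained together, assuming a small made-up numeric dataframe (the column names and values are only for illustration, not from the question). It also shows a one-step alternative that casts the notna() mask to integers, which should give the same result:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not from the question
dataframe = pd.DataFrame({"a": [1.5, np.nan, 3.0], "b": [np.nan, 2.0, np.nan]})

# Step 1: .where keeps values where the mask is True (the NaNs)
#         and replaces everything else with 1
# Step 2: .fillna replaces the remaining NaNs with 0
result = dataframe.where(~dataframe.notna(), 1).fillna(0).astype(int)

# Alternative: True/False mask of non-NaN entries cast to 1/0
shortcut = dataframe.notna().astype(int)
```

The chained version follows the answer's reasoning step by step; the mask-cast version avoids the intermediate dataframe and may be easier to read on a 71-column frame.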
I'd advise making a new column rather than just replacing. You can always delete the previous column if necessary, but it's always helpful to keep the source for a column populated via an operation on another.
e.g. if df['col1'] is the existing column
df['col2'] = df['col1'].apply(lambda x: 0 if pd.isnull(x) else 1)
where col2 is the new column. This should also work if col1 has string entries.
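A short runnable sketch of the new-column approach, assuming the goal stated in the question (1 for present values, 0 for missing ones). The sample values in col1 are hypothetical and mix strings with missing entries:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; col1 holds strings and missing values
df = pd.DataFrame({"col1": ["a", np.nan, "c", None]})

# Build the indicator in a new column so col1 stays available as the source
df["col2"] = df["col1"].apply(lambda x: 0 if pd.isnull(x) else 1)

# col1 can be dropped later if it is no longer needed:
# df = df.drop(columns="col1")
```

Note that .apply works on one column at a time, so for all 71 columns this would need a loop over df.columns or a dataframe-wide operation.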