I am working on a large dataset with many columns of different types. There are a mix of numeric values and strings with some NULL values. I need to change the NULL Value to
Use DataFrame.select_dtypes for numeric columns, filter by subset and replace values to 0
, then repalce all another columns to empty string:
print (df)
0 1 2 3 4 5 6 7 8 9
0 1 John 2.0 Doe 3 Mike 4.0 Orange 5 Stuff
1 9 NaN NaN NaN 8 NaN NaN Lemon 12 NaN
print (df.dtypes)
0 int64
1 object
2 float64
3 object
4 int64
5 object
6 float64
7 object
8 int64
9 object
dtype: object
c = df.select_dtypes(np.number).columns
df[c] = df[c].fillna(0)
df = df.fillna("")
print (df)
0 1 2 3 4 5 6 7 8 9
0 1 John 2.0 Doe 3 Mike 4.0 Orange 5 Stuff
1 9 0.0 8 0.0 Lemon 12
Another solution is create dictionary for replace:
num_cols = df.select_dtypes(np.number).columns
d1 = dict.fromkeys(num_cols, 0)
d2 = dict.fromkeys(df.columns.difference(num_cols), "")
d = {**d1, **d2}
print (d)
{0: 0, 2: 0, 4: 0, 6: 0, 8: 0, 1: '', 3: '', 5: '', 7: '', 9: ''}
df = df.fillna(d)
print (df)
0 1 2 3 4 5 6 7 8 9
0 1 John 2.0 Doe 3 Mike 4.0 Orange 5 Stuff
1 9 0.0 8 0.0 Lemon 12
You could try this to substitute a different value for each different column (A
to C
are numeric, while D
is a string):
import pandas as pd
import numpy as np
df_pd = pd.DataFrame([[np.nan, 2, np.nan, '0'],
[3, 4, np.nan, '1'],
[np.nan, np.nan, np.nan, '5'],
[np.nan, 3, np.nan, np.nan]],
columns=list('ABCD'))
df_pd.fillna(value={'A':0.0,'B':0.0,'C':0.0,'D':''})