I have a 227x4 DataFrame with country names and numerical values to clean (wrangle?).
Here's an abstraction of the DataFrame:
import pandas as pd
i
Use pd.to_numeric with errors='coerce', which turns every value that cannot be parsed as a number into NaN, i.e.:
cols = ['Measure1','Measure2']
df[cols] = df[cols].apply(pd.to_numeric,errors='coerce')
  Country Name  Measure1  Measure2
0          PuB       7.0       6.0
1          JHq       2.0       NaN
2          opE       4.0       3.0
3          pxl       3.0       6.0
4          ouP       NaN       4.0
5          qZR       4.0       6.0
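For anyone who wants to try this without the original data, here is a minimal self-contained sketch. The sample frame and its junk values ("bad", "n/a") are made up for illustration and are not the asker's data. Note that to_numeric also parses numeric strings such as "3" into numbers rather than discarding them.

```python
import pandas as pd

# Hypothetical sample data: the strings stand in for the dirty entries.
df = pd.DataFrame({
    "Country Name": ["PuB", "JHq", "opE"],
    "Measure1": [7, 2, "bad"],
    "Measure2": [6, "n/a", "3"],
})

cols = ["Measure1", "Measure2"]

# Unparseable values become NaN; numeric strings like "3" are converted.
df[cols] = df[cols].apply(pd.to_numeric, errors="coerce")

print(df)
```

If numeric strings should be treated as invalid instead, a type-based filter may be a better fit than parsing.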
Build a boolean mask of the numeric entries and assign back only the columns of interest:
cols = ['Measure1','Measure2']
mask = df[cols].applymap(lambda x: isinstance(x, (int, float)))
df[cols] = df[cols].where(mask)
print(df)
Country Name Measure1 Measure2
0 uFv 7 8
1 vCr 5 NaN
2 qPp 2 6
3 QIC 10 10
4 Suy NaN 8
5 eFS 6 4
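A runnable sketch of the mask idea, on made-up sample data. It builds the mask column by column with Series.map instead of applymap, which is my substitution so that it does not depend on the pandas version. Unlike parsing, the type check rejects the numeric string "3" as well.

```python
import pandas as pd

# Hypothetical sample data with a junk string and a numeric string.
df = pd.DataFrame({
    "Country Name": ["uFv", "vCr", "Suy"],
    "Measure1": [7, 5, "x"],
    "Measure2": [8, "3", 8],
})

cols = ["Measure1", "Measure2"]

# True where the value is an int or float (NaN counts as float), False otherwise.
mask = df[cols].apply(lambda s: s.map(lambda x: isinstance(x, (int, float))))

# Keep values where the mask is True, replace the rest with NaN.
df[cols] = df[cols].where(mask)

print(df)
```

The cleaned columns may still have object dtype after this step, so a follow-up such as df[cols].astype(float) is one option if numeric dtypes are needed.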
A meta-question: is it normal that it takes me more than 3 hours to formulate a question here (including research)?
In my opinion, yes. Creating a good question is really hard.
import numpy as np
cols = ['Measure1','Measure2']
df[cols] = df[cols].applymap(lambda x: x if not isinstance(x, str) else np.nan)
or
df[cols] = df[cols].applymap(lambda x: np.nan if isinstance(x, str) else x)
Result:
In [22]: df
Out[22]:
Country Name Measure1 Measure2
0 nBl 10.0 9.0
1 Ayp 8.0 NaN
2 diz 4.0 1.0
3 aad 7.0 3.0
4 JYI NaN 10.0
5 BJO 9.0 8.0
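One caveat: applymap was deprecated in pandas 2.1 in favour of DataFrame.map. The sketch below, on made-up sample data, picks whichever method is available; the name elementwise is my own helper and not part of pandas. The final astype(float) is an optional extra step to get float columns like the ones in the output above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: strings mark the entries to discard.
df = pd.DataFrame({
    "Country Name": ["nBl", "Ayp", "JYI"],
    "Measure1": [10, 8, "missing"],
    "Measure2": [9, "?", 10],
})

cols = ["Measure1", "Measure2"]

# Use DataFrame.map where it exists (pandas >= 2.1), else fall back to applymap.
elementwise = getattr(pd.DataFrame, "map", None) or pd.DataFrame.applymap

# Replace every string with NaN and leave all other values untouched.
df[cols] = elementwise(df[cols], lambda x: np.nan if isinstance(x, str) else x)

# Optional: convert the cleaned object columns to float.
df[cols] = df[cols].astype(float)

print(df)
```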