I have a DataFrame in which a column might have three kinds of values: integers (12331), integers as strings ('345') or some other string ('text').
Is there a way to
You can use df._get_numeric_data() directly.
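For reference, _get_numeric_data() is a private pandas method (note the leading underscore), and it selects whole columns that already have a numeric dtype; a rough public counterpart is select_dtypes(include='number'). A minimal sketch, with a hypothetical extra column 'amount', showing that a mixed column stored with the object dtype is left out rather than converted:

```python
import pandas as pd

# Hypothetical frame: one float column plus the mixed-type column
df = pd.DataFrame({
    'amount': [1.5, 2.0, 3.25],
    'mixed_types': [12331, '345', 'text'],
})

# Keep only columns whose dtype is numeric; 'mixed_types' is object dtype
numeric_only = df.select_dtypes(include='number')
print(list(numeric_only.columns))  # ['amount']
```

So this route only helps if the column has already been converted to a numeric dtype.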
You could use pd.to_numeric with errors='coerce' to replace your non-numeric values with NaN, applying it to each column. Then you can use dropna or fillna, whichever you prefer.
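import pandas as pd
df = pd.read_csv('file.csv')
# Convert every column; values that can't be parsed become NaN
df = df.apply(pd.to_numeric, errors='coerce')
# Drop any row that now contains a NaN
df = df.dropna()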
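dropna() removes every row that has a NaN in any column. If you would rather keep the rows and substitute a value, fillna is the other option. A minimal sketch comparing the two, using an in-memory frame as a stand-in for file.csv; the column names 'a' and 'b' and the fill value 0 are arbitrary assumptions:

```python
import pandas as pd

# Stand-in for pd.read_csv('file.csv'); column names are hypothetical
df = pd.DataFrame({
    'a': [12331, '345', 'text'],
    'b': ['1', 'x', '3'],
})

# Coerce every column to numbers; anything unparseable becomes NaN
converted = df.apply(pd.to_numeric, errors='coerce')

# Option 1: drop rows that contain a NaN in any column
dropped = converted.dropna()

# Option 2: keep all rows, replacing NaN with a placeholder value
filled = converted.fillna(0)
```

Here only row 0 survives dropna(), because rows 1 and 2 each have one unparseable value, while fillna keeps all three rows.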
Pandas has some tools for converting these kinds of columns, but they may not suit your needs exactly. pd.to_numeric with errors='coerce' converts mixed columns like yours, turning non-numeric strings into NaN. This means you'll get a float column, not an integer one, since the default NumPy integer dtype can't hold NaN values. That usually doesn't matter much, but it's good to be aware of.
import pandas as pd

df = pd.DataFrame({'mixed_types': [12331, '345', 'text']})
pd.to_numeric(df['mixed_types'], errors='coerce')
# 0    12331.0
# 1      345.0
# 2        NaN
# Name: mixed_types, dtype: float64
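If you do need integers, one option not covered above is pandas' nullable 'Int64' dtype, which stores missing values as <NA> instead of forcing a float column. Alternatively, drop the missing values first and cast to a plain int64. A minimal sketch, assuming every numeric value in the column is a whole number (casting a non-integral float such as 1.5 to 'Int64' raises an error):

```python
import pandas as pd

mixed = pd.Series([12331, '345', 'text'], name='mixed_types')

# Coerce to numbers (float64 because of the NaN), then cast to the
# nullable integer dtype; the NaN becomes <NA>
as_int = pd.to_numeric(mixed, errors='coerce').astype('Int64')

# Or drop the missing value first and use an ordinary int64 column
as_plain_int = (
    pd.to_numeric(mixed, errors='coerce')
    .dropna()
    .astype('int64')
)
```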
If you then want to drop all the rows containing NaN:
# Replace the column with the converted values
df['mixed_types'] = pd.to_numeric(df['mixed_types'], errors='coerce')
# Drop NA values, listing the converted columns explicitly
# so NA values in other columns aren't dropped
df = df.dropna(subset=['mixed_types'])
print(df)
#    mixed_types
# 0      12331.0
# 1        345.0
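The approach above overwrites the column with floats. If you instead want to keep the original values and only filter out the rows that aren't numeric, you can use the coerced result as a boolean mask. A minimal sketch; the names is_numeric and numeric_rows are just illustrative variables:

```python
import pandas as pd

df = pd.DataFrame({'mixed_types': [12331, '345', 'text']})

# True where the value can be parsed as a number, False otherwise
is_numeric = pd.to_numeric(df['mixed_types'], errors='coerce').notna()

# Filter rows without changing the stored values ('345' stays a string)
numeric_rows = df[is_numeric]
```

Note that the column keeps its object dtype this way, so you would still need to convert it before doing arithmetic.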