I have a DataFrame in which a column might have three kinds of values: integers (12331), integers as strings ('345'), or some other string ('text').
Is there a way to
Pandas has some tools for converting these kinds of columns, but they may not suit your needs exactly. pd.to_numeric converts mixed columns like yours, and with errors='coerce' it turns non-numeric strings into NaN (by default it raises an error instead). This means you'll get a float column, not an integer one, since NumPy's integer dtypes cannot hold NaN. That usually doesn't matter much, but it's good to be aware of.
import pandas as pd

df = pd.DataFrame({'mixed_types': [12331, '345', 'text']})
pd.to_numeric(df['mixed_types'], errors='coerce')
Out[7]:
0    12331.0
1      345.0
2        NaN
Name: mixed_types, dtype: float64
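If you would rather keep an integer column while still marking the non-numeric rows as missing, one option is pandas' nullable integer dtype. The following is a minimal sketch, assuming a pandas version that supports the 'Int64' extension dtype; the variable name as_nullable_int is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'mixed_types': [12331, '345', 'text']})

# Coerce non-numeric strings to NaN (this yields float64), then cast to the
# nullable integer dtype 'Int64' (capital I), which stores missing values as <NA>.
as_nullable_int = pd.to_numeric(df['mixed_types'], errors='coerce').astype('Int64')
print(as_nullable_int)
```

The cast works here because every non-missing value is a whole number; a value such as '3.5' would make the conversion to 'Int64' fail.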
If you then want to drop all the NaN rows:
# Replace the column with the converted values
df['mixed_types'] = pd.to_numeric(df['mixed_types'], errors='coerce')
# Drop NA values, listing the converted columns explicitly
# so NA values in other columns aren't dropped
df.dropna(subset=['mixed_types'])
Out[11]:
   mixed_types
0      12331.0
1        345.0
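Note that dropna returns a new DataFrame and leaves df unchanged unless you assign the result. Once the NaN rows are gone, the column can also be cast back to a plain integer dtype. A short sketch of both steps, where the name cleaned is just an illustrative choice:

```python
import pandas as pd

df = pd.DataFrame({'mixed_types': [12331, '345', 'text']})
df['mixed_types'] = pd.to_numeric(df['mixed_types'], errors='coerce')

# dropna returns a new DataFrame, so assign the result to keep it
cleaned = df.dropna(subset=['mixed_types'])

# No NaN values remain, so the float column can be cast to a plain integer dtype
cleaned = cleaned.astype({'mixed_types': 'int64'})
print(cleaned)
```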