Ignoring non-numerical string values in pandas dataframe


I have a DataFrame in which a column might have three kinds of values: integers (12331), integers as strings ('345'), or some other string ('text').

Is there a way to


3 Answers

庸人自扰

2021-02-15 05:14

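You can use df._get_numeric_data() directly. Note that it is a private method and it keeps only the columns whose dtype is numeric, so a mixed column stored as object is dropped as a whole.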
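As a rough sketch of what column-level selection does, here is the public counterpart, select_dtypes(include='number'). The column name counts is made up for illustration. The mixed column has object dtype, so it is left out entirely rather than filtered value by value.

```python
import pandas as pd

# 'counts' is a hypothetical second column added for illustration
df = pd.DataFrame({
    'mixed_types': [12331, '345', 'text'],  # object dtype (mixed values)
    'counts': [1, 2, 3],                    # int64 dtype
})

# Keep only columns whose dtype is numeric; this works per column,
# not per value, so 'mixed_types' disappears completely
numeric_only = df.select_dtypes(include='number')
print(list(numeric_only.columns))
```

So this approach only helps if the column is already stored with a numeric dtype.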

北恋

2021-02-15 05:33
You could use pd.to_numeric with errors='coerce' to replace the non-numeric values with NaN, applying it to each column. Then you could use dropna or fillna, whichever you prefer.
```
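import pandas as pd

df = pd.read_csv('file.csv')
# Convert every column; values that cannot be parsed become NaN
df = df.apply(pd.to_numeric, errors='coerce')
df = df.dropna()  # drop rows that contain any NaN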
```

鱼传尺愫

2021-02-15 05:35

pandas has some tools for converting these kinds of columns, but they may not suit your needs exactly. pd.to_numeric converts mixed columns like yours, and with errors='coerce' it turns non-numeric strings into NaN. This means you'll get a float column, not an integer one, because the default NumPy-backed integer dtypes cannot hold NaN. That usually doesn't matter too much, but it's good to be aware of.

```
import pandas as pd

df = pd.DataFrame({'mixed_types': [12331, '345', 'text']})

pd.to_numeric(df['mixed_types'], errors='coerce')
Out[7]: 
0    12331.0
1      345.0
2        NaN
Name: mixed_types, dtype: float64
```
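If the float result is a problem, one possible workaround (not part of the original answer) is pandas' nullable integer dtype 'Int64', which can hold missing values. A minimal sketch, assuming every value that parses is a whole number:

```python
import pandas as pd

df = pd.DataFrame({'mixed_types': [12331, '345', 'text']})

# Step 1: coerce to numbers; 'text' becomes NaN and the dtype is float64
converted = pd.to_numeric(df['mixed_types'], errors='coerce')

# Step 2: cast to the nullable integer dtype (capital "I");
# NaN becomes <NA> and the remaining values are stored as integers
as_int = converted.astype('Int64')
print(as_int.dtype)
```

The cast raises an error if any parsed value has a fractional part, so it only fits columns that are integers apart from the junk strings.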

If you want to then drop all the NaN rows:

```
# Replace the column with the converted values
df['mixed_types'] = pd.to_numeric(df['mixed_types'], errors='coerce')

# Drop NA values, listing the converted columns explicitly
#   so NA values in other columns aren't dropped
df.dropna(subset=['mixed_types'])
Out[11]: 
   mixed_types
0      12331.0
1        345.0
```
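If you would rather keep the original values untouched and only skip the rows that are not numeric, the coerced result can serve as a boolean mask instead of a replacement. This is a sketch under that assumption; the label column is hypothetical and only shows that other columns are carried along.

```python
import pandas as pd

# 'label' is a hypothetical extra column added for illustration
df = pd.DataFrame({
    'mixed_types': [12331, '345', 'text'],
    'label': ['a', 'b', 'c'],
})

# True where the value can be parsed as a number, False otherwise
is_numeric = pd.to_numeric(df['mixed_types'], errors='coerce').notna()

# Filter rows with the mask; the original values are not converted
numeric_rows = df[is_numeric]
print(numeric_rows)
```

Here '345' stays a string in the result, so convert afterwards if you need real numbers.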
