Ignoring non-numerical string values in pandas dataframe

情话喂你 2021-02-15 04:59

I have a DataFrame in which a column might have three kinds of values: integers (12331), integers as strings ('345'), or some other string ('text').

Is there a way to

3 Answers
  • 2021-02-15 05:14

    You can use df._get_numeric_data() directly.
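
One caveat worth illustrating: `_get_numeric_data()` is a private method, and it selects whole columns by dtype rather than filtering individual values. The sketch below uses the public `select_dtypes` instead and a made-up two-column frame (the `all_numbers` column is an assumption added for illustration). It suggests that a mixed column, which pandas stores as `object`, is dropped entirely rather than cleaned.

```python
import pandas as pd

# Hypothetical frame: one mixed column (object dtype) and one purely numeric column.
df = pd.DataFrame({
    'mixed_types': [12331, '345', 'text'],
    'all_numbers': [1.5, 2.5, 3.5],
})

# select_dtypes is the public way to pick columns by dtype.
numeric_only = df.select_dtypes(include='number')

# The mixed column is object dtype, so it is not selected at all.
print(list(numeric_only.columns))
```

So this approach is likely only useful when the goal is to discard non-numeric columns, not non-numeric values inside a column.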

  • 2021-02-15 05:33

    You could use pd.to_numeric with errors='coerce' to replace non-numeric values with NaN, applying it to each column. Then you could use dropna or fillna, whichever you prefer.

    import pandas as pd
    df = pd.read_csv('file.csv')
    df = df.apply(pd.to_numeric, errors='coerce')  # unparseable values become NaN
    df = df.dropna()  # drops every row that has NaN in any column
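
To make the dropna/fillna choice concrete, here is a minimal sketch on an in-memory frame instead of a CSV file. The column names `a` and `b` and their values are invented for illustration.

```python
import pandas as pd

# Hypothetical data: both columns mix numbers, numeric strings, and text.
df = pd.DataFrame({
    'a': [1, '2', 'x'],
    'b': ['4', 'y', 6],
})

# Extra keyword arguments to apply() are forwarded to pd.to_numeric.
converted = df.apply(pd.to_numeric, errors='coerce')

# Option 1: keep only rows where every column converted successfully.
dropped = converted.dropna()

# Option 2: keep all rows and substitute a placeholder for failed conversions.
filled = converted.fillna(0)

print(dropped)
print(filled)
```

With this data only the first row survives `dropna()`, because each of the other rows has a failed conversion in one column. `fillna(0)` keeps all three rows but makes failed conversions indistinguishable from real zeros.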
    
  • 2021-02-15 05:35

    Pandas has some tools for converting these kinds of columns, but they may not suit your needs exactly. pd.to_numeric converts mixed columns like yours, but with errors='coerce' it turns non-numeric strings into NaN. This means you'll get float columns, not integer ones, since NaN is a floating-point value and a plain NumPy integer column cannot hold it. That usually doesn't matter too much, but it's good to be aware of.

    import pandas as pd
    df = pd.DataFrame({'mixed_types': [12331, '345', 'text']})
    
    pd.to_numeric(df['mixed_types'], errors='coerce')
    Out[7]: 
    0    12331.0
    1      345.0
    2        NaN
    Name: mixed_types, dtype: float64
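
If the float result is a problem, one possible workaround (not part of the original answer) is pandas' nullable integer dtype `Int64`, which can hold missing values alongside integers. A minimal sketch, assuming a pandas version that supports nullable integers and that all numeric values are whole numbers:

```python
import pandas as pd

s = pd.Series([12331, '345', 'text'], name='mixed_types')

# Step 1: coerce to float64, with NaN for anything unparseable.
as_float = pd.to_numeric(s, errors='coerce')

# Step 2: cast to the nullable integer dtype; NaN becomes <NA>.
# This cast is expected to fail if any value has a fractional part.
as_nullable_int = as_float.astype('Int64')

print(as_nullable_int)
```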
    

    If you want to then drop all the NaN rows:

    # Replace the column with the converted values
    df['mixed_types'] = pd.to_numeric(df['mixed_types'], errors='coerce')
    
    # Drop NA values, listing the converted columns explicitly
    #   so NA values in other columns aren't dropped
    df.dropna(subset=['mixed_types'])
    Out[11]: 
       mixed_types
    0      12331.0
    1        345.0
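
Once the non-numeric rows are gone, nothing forces the column to stay float. The sketch below is an alternative arrangement of the same idea: it builds a boolean mask from the coerced values, filters the original frame with it, and stores the survivors as plain `int64`. The variable names `numeric` and `kept` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'mixed_types': [12331, '345', 'text']})

# Coerce once; the result is only used for masking and conversion.
numeric = pd.to_numeric(df['mixed_types'], errors='coerce')
is_number = numeric.notna()

# Keep the rows that converted, leaving the original df untouched.
kept = df.loc[is_number].copy()

# No NaN remains in the kept rows, so a plain integer dtype is safe here
# (assuming the values are whole numbers).
kept['mixed_types'] = numeric[is_number].astype('int64')

print(kept)
```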
    