Ignoring non-numerical string values in pandas dataframe

情话喂你 2021-02-15 04:59

I have a DataFrame in which a column might have three kinds of values: integers (12331), integers as strings ('345'), or some other string ('text').

Is there a way to

3 answers
  •  鱼传尺愫
    2021-02-15 05:35

    pandas has tools for converting columns like this, though they may not suit your needs exactly. pd.to_numeric with errors='coerce' converts a mixed column like yours and turns non-numeric strings into NaN. The result is a float column rather than an integer one, because NaN is a floating-point value and the default integer dtype cannot hold it. That usually doesn't matter much, but it's good to be aware of.

    import pandas as pd

    df = pd.DataFrame({'mixed_types': [12331, '345', 'text']})
    
    pd.to_numeric(df['mixed_types'], errors='coerce')
    Out[7]: 
    0    12331.0
    1      345.0
    2        NaN
    Name: mixed_types, dtype: float64
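
If integer values matter, one option is pandas' nullable integer dtype, which can hold missing values. This is a minimal sketch rather than part of the original answer; it assumes a pandas version that supports the 'Int64' extension dtype, and the name as_int is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'mixed_types': [12331, '345', 'text']})

# Coerce non-numeric strings to NaN, then cast to the nullable
# integer dtype so the numeric values stay integers.
as_int = pd.to_numeric(df['mixed_types'], errors='coerce').astype('Int64')

print(as_int)
```

Note that the cast is expected to fail if the column contains non-integer numbers such as '3.5', so this only fits columns whose numeric values are all whole numbers.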
    

    If you want to then drop all the NaN rows:

    # Replace the column with the converted values
    df['mixed_types'] = pd.to_numeric(df['mixed_types'], errors='coerce')
    
    # Drop NA values, listing the converted columns explicitly
    #   so NA values in other columns aren't dropped
    df.dropna(subset=['mixed_types'])
    Out[11]: 
       mixed_types
    0      12331.0
    1        345.0
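
If the goal is only to skip the non-numeric rows without overwriting the column, the converted values can serve as a filter instead. A hedged sketch, not from the original answer: the 'other' column and the names converted and numeric_rows are hypothetical, added just to show that the rest of the row is preserved.

```python
import pandas as pd

df = pd.DataFrame({
    'mixed_types': [12331, '345', 'text'],
    'other': ['a', 'b', 'c'],  # hypothetical extra column
})

# Convert a copy of the column; the DataFrame itself is left untouched.
converted = pd.to_numeric(df['mixed_types'], errors='coerce')

# Keep only the rows whose value could be parsed as a number.
# The original values (int 12331, str '345') are kept as they were.
numeric_rows = df[converted.notna()]

print(numeric_rows)
```

The trade-off is that the kept column still has object dtype with mixed ints and strings, so any arithmetic would need the converted Series instead.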
    
