Remove rows where column value type is string Pandas

暗喜 2020-12-17 10:21

I have a pandas dataframe. One of my columns should only be floats. When I try to convert that column to floats, I'm alerted that there are strings in there. I'd like to delete all rows where values in this column are strings

4 Answers
  • 2020-12-17 10:55

    You can find the data type of a column from its dtype.kind attribute, e.g. df[col].dtype.kind. See the NumPy docs for the meaning of each kind code. If the values you want to check are laid out along rows, transpose the dataframe first so that they become columns.
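As a minimal sketch of how this check plays out (the sample frame is assumed for illustration): a column that mixes floats and strings reports kind 'O' (object), so dtype.kind only tells you that something non-numeric may be present. Finding the offending rows needs a per-value test, for example with isinstance:

```python
import pandas as pd

# Sample frame (assumed for illustration): one column mixing floats and a string.
df = pd.DataFrame({'a': [0.1, 0.5, 'jasdh', 9.0]})

# dtype.kind is 'O' (object) because the column holds mixed Python objects.
kind = df['a'].dtype.kind

# Per-value check: True where the stored value is a Python str.
is_str = df['a'].map(lambda v: isinstance(v, str))

# Keep the non-string rows, then cast the remaining values to float.
cleaned = df[~is_str].copy()
cleaned['a'] = cleaned['a'].astype(float)
```

Note that this removes every str value, including strings such as '1.5' that could have been parsed as numbers.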

  • 2020-12-17 10:59

    One of my columns should only be floats. I'd like to delete all rows where values in this column are strings

    You can convert your series to numeric via pd.to_numeric with errors='coerce', which turns unparseable values into NaN, and then filter with pd.Series.notnull. Conversion to float is required as a separate step, because the filtered column otherwise keeps its object dtype.

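    import pandas as pd
    
    # Data from @EdChum
    df = pd.DataFrame({'a': [0.1, 0.5, 'jasdh', 9.0]})
    
    # Keep only rows whose value parses as a number; .copy() avoids SettingWithCopyWarning
    res = df[pd.to_numeric(df['a'], errors='coerce').notnull()].copy()
    # The filtered column is still object dtype, so cast it explicitly
    res['a'] = res['a'].astype(float)
    
    print(res)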
    
         a
    0  0.1
    1  0.5
    3  9.0
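One edge case worth keeping in mind: filtering on notnull after coercion also drops rows that were already NaN before the conversion. The following is a rough sketch, on assumed sample data, of one way to separate genuinely missing values from strings that failed to parse:

```python
import numpy as np
import pandas as pd

# Assumed sample data: a genuine missing value, a numeric string, and a non-numeric string.
df = pd.DataFrame({'a': [0.1, np.nan, '2.5', 'jasdh', 9.0]})

converted = pd.to_numeric(df['a'], errors='coerce')

# Rows that became NaN only because of coercion (they were not missing before).
unparseable = converted.isna() & df['a'].notna()

# Drop just the unparseable rows; the original NaN stays and '2.5' becomes 2.5.
res = df.loc[~unparseable].assign(a=converted)
```

Here assign aligns converted on the index, so the surviving rows pick up their float values directly.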
    
  • 2020-12-17 11:07

    Use convert_objects with the parameter convert_numeric=True; this coerces any non-numeric values to NaN:

    In [24]:
    
    df = pd.DataFrame({'a': [0.1,0.5,'jasdh', 9.0]})
    df
    Out[24]:
           a
    0    0.1
    1    0.5
    2  jasdh
    3      9
    In [27]:
    
    df.convert_objects(convert_numeric=True)
    Out[27]:
         a
    0  0.1
    1  0.5
    2  NaN
    3  9.0

    You can then drop them:

    In [29]:
    
    df.convert_objects(convert_numeric=True).dropna()
    Out[29]:
         a
    0  0.1
    1  0.5
    3  9.0
    

    UPDATE

    Since version 0.17.0 this method is deprecated (and it has been removed in later releases), so you need to use to_numeric instead. Unfortunately, to_numeric operates on a Series rather than a whole DataFrame, so the equivalent code is now:

    df.apply(lambda x: pd.to_numeric(x, errors='coerce')).dropna()
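Applying to_numeric to every column works when the whole frame is meant to be numeric. As a hedged sketch (the 'name' column and its values are assumed for illustration), a frame that also carries a legitimate text column loses all of its rows that way, so it may be safer to coerce only the target column and pass subset to dropna:

```python
import pandas as pd

# Assumed sample data: 'name' is legitimately text, 'a' should be numeric.
df = pd.DataFrame({'name': ['w', 'x', 'y', 'z'],
                   'a': [0.1, 0.5, 'jasdh', 9.0]})

# Coercing every column turns all of 'name' into NaN, so dropna() removes every row.
everything = df.apply(lambda x: pd.to_numeric(x, errors='coerce')).dropna()

# Coerce only the target column and drop rows where that column is NaN.
res = df.assign(a=pd.to_numeric(df['a'], errors='coerce')).dropna(subset=['a'])
```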
    
  • 2020-12-17 11:08

    Assume your data frame is df and you want to ensure that all values in one of its columns are numeric with a specific pandas dtype, e.g. float:

    df[df.columns[n]] = pd.to_numeric(df[df.columns[n]], errors='coerce').fillna(0).astype(float)

    Note that fillna(0) replaces non-numeric values with 0 instead of removing their rows, so every row is kept.
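To make the difference between filling and dropping concrete, here is a small sketch on assumed sample data, where n is a hypothetical column position chosen for illustration:

```python
import pandas as pd

# Assumed sample data; n is the (hypothetical) position of the column to clean.
df = pd.DataFrame({'id': [1, 2, 3, 4], 'a': [0.1, 0.5, 'jasdh', 9.0]})
n = 1
col = df.columns[n]

numeric = pd.to_numeric(df[col], errors='coerce')

# Option 1: keep every row, replacing unparseable values with 0.0.
filled = df.assign(**{col: numeric.fillna(0)})

# Option 2: drop the rows whose value could not be parsed.
dropped = df.assign(**{col: numeric}).dropna(subset=[col])
```

Which option fits depends on whether a 0 is a meaningful stand-in for the bad values; for the question as asked (deleting the rows), the second option is the closer match.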
    