python: remove all rows in pandas dataframe that contain a string

野趣味 2021-02-07 13:52

I've got a pandas dataframe called data and I want to remove all rows that contain a string in any column. For example, below we see the 'gdp' column has a string at index 3,

2 Answers
  • 2021-02-07 14:13

    You can apply a function row-wise that tests your DataFrame for the presence of strings. Say that df is your DataFrame:

     # basestring exists only in Python 2; in Python 3 use str
     rows_with_strings = df.apply(
         lambda row: any(isinstance(e, str) for e in row),
         axis=1)
    

    This produces a boolean mask indicating which rows of your DataFrame contain at least one string. You can then select the rows without strings by inverting the mask:

     df_with_no_strings = df[~rows_with_strings]

    Example:

     import pandas as pd

     a = [[1, 2], ['a', 2], [3, 4], [7, 'd']]
     df = pd.DataFrame(a, columns=['a', 'b'])
    
    
     df 
       a  b
    0  1  2
    1  a  2
    2  3  4
    3  7  d
    
     select = df.apply(lambda r: any(isinstance(e, str) for e in r), axis=1)

     df[~select]
    
        a  b
     0  1  2
     2  3  4
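
For reference, the approach in this answer can be put together as one self-contained Python 3 script. This is only a minimal sketch: the helper name `drop_rows_with_strings` is hypothetical (it is not part of pandas), and `str` is assumed as the Python 3 stand-in for `basestring`.

```python
import pandas as pd


def drop_rows_with_strings(df):
    """Return only the rows of df in which no cell is a str instance.

    Hypothetical helper for illustration, not part of pandas.
    """
    # Row-wise test: True for every row that holds at least one string.
    has_string = df.apply(
        lambda row: any(isinstance(e, str) for e in row), axis=1
    )
    # Invert the mask to keep the string-free rows.
    return df[~has_string]


df = pd.DataFrame([[1, 2], ['a', 2], [3, 4], [7, 'd']], columns=['a', 'b'])
clean = drop_rows_with_strings(df)
print(clean)
```

The surviving columns still have the object dtype; calling `clean.infer_objects()` is one way to get integer columns back.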
    
  • 2021-02-07 14:15

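    You can take the transpose, call `convert_objects` (which works column-wise), and then compare the data types to get a boolean key. `convert_objects` was removed in pandas 1.0; on current versions `infer_objects` performs the same soft conversion:

    df[df.T.infer_objects().dtypes != object]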
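
To see why the transpose works, it can help to look at the intermediate dtypes on a small frame. The sketch below assumes pandas 0.21 or later and uses `infer_objects` as a stand-in for `convert_objects`; the sample data is borrowed from the other answer.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], ['a', 2], [3, 4], [7, 'd']], columns=['a', 'b'])

# After transposing, each original row becomes a column, so dtype
# inference runs once per original row.
row_dtypes = df.T.infer_objects().dtypes

# A row made only of numbers gets a numeric dtype; a row that mixes in
# a string stays object.
keep = row_dtypes != object

# keep is indexed by the original row labels, so it works as a row mask.
result = df[keep]
print(result)
```

Because this check looks at types only, a row such as `['1', 2]` is still dropped even though `'1'` looks numeric.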
    