python: remove all rows in pandas dataframe that contain a string

I've got a pandas DataFrame called data, and I want to remove all rows that contain a string in any column. For example, below we see that the 'gdp' column has a string at index 3.

2 Answers

情书的邮戳

2021-02-07 14:13

You can apply a function that tests each row of your DataFrame for the presence of strings. For example, assuming df is your DataFrame:

 # basestring existed only in Python 2; in Python 3 test against str
 rows_with_strings = df.apply(
     lambda row: any(isinstance(e, str) for e in row),
     axis=1)

This produces a boolean mask (a Series aligned with the DataFrame's index) indicating which rows contain at least one string. You can then select the rows without strings by inverting the mask:

 df_with_no_strings = df[~rows_with_strings]
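The two steps above (build the mask, then invert it) can be wrapped in a small reusable helper. This is a minimal sketch assuming Python 3, where str covers all text values; the name drop_rows_with_strings is hypothetical and not part of pandas.

```python
import pandas as pd


def drop_rows_with_strings(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without any row that holds at least one str value.

    Hypothetical helper: it applies the same row-wise isinstance test as above.
    """
    # Boolean Series aligned with df.index: True where the row contains a str.
    has_string = df.apply(lambda row: any(isinstance(v, str) for v in row), axis=1)
    # Keep only the rows whose mask value is False.
    return df[~has_string].copy()


df = pd.DataFrame([[1, 2], ['a', 2], [3, 4], [7, 'd']], columns=['a', 'b'])
print(drop_rows_with_strings(df))
```

Returning a copy avoids chained-assignment warnings if you later modify the filtered frame.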

Example:

 import pandas as pd

 a = [[1, 2], ['a', 2], [3, 4], [7, 'd']]
 df = pd.DataFrame(a, columns=['a', 'b'])

 df
    a  b
 0  1  2
 1  a  2
 2  3  4
 3  7  d

 select = df.apply(lambda r: any(isinstance(e, str) for e in r), axis=1)

 df[~select]
    a  b
 0  1  2
 2  3  4
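One thing worth checking after the filter: dropping rows does not change the column dtypes. Because the original columns mixed numbers and strings, they remain object dtype even once only numbers are left. A follow-up infer_objects() call (available since pandas 0.21) soft-converts them back to a numeric dtype. This sketch reuses the example data and assumes Python 3 (str instead of basestring). Also note that a numeric-looking string such as '3' is still a str, so its row would be dropped too.

```python
import pandas as pd

a = [[1, 2], ['a', 2], [3, 4], [7, 'd']]
df = pd.DataFrame(a, columns=['a', 'b'])

select = df.apply(lambda r: any(isinstance(e, str) for e in r), axis=1)
clean = df[~select]
print(clean.dtypes)  # both columns are still object dtype

# Soft-convert the object columns now that only numbers remain.
clean = clean.infer_objects()
print(clean.dtypes)  # both columns now have an integer dtype
```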

太阳男子

2021-02-07 14:15
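You can take the transpose, call `convert_objects` (which works column-wise), and then compare the data types to get a boolean key. Note that `convert_objects` was removed in pandas 1.0; on current versions, `infer_objects` performs the equivalent soft conversion:
```
df[df.T.infer_objects().dtypes != object]
```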
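Why the transpose is needed: the dtype conversion works one column at a time, so transposing turns each original row into a column whose dtype can be inferred on its own. A row containing only numbers becomes a numeric column, while a row with any string stays object. A minimal sketch on the example data from the first answer, assuming pandas 1.0 or later, where infer_objects stands in for the removed convert_objects:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], ['a', 2], [3, 4], [7, 'd']], columns=['a', 'b'])

# Each column of df.T corresponds to one row of df. Because df mixes types,
# every column of df.T starts out as object dtype before inference.
row_dtypes = df.T.infer_objects().dtypes

# row_dtypes is indexed by df's row labels, so the comparison is a row mask.
print(row_dtypes)
print(df[row_dtypes != object])
```

Unlike the row-wise apply in the first answer, this relies on dtype inference, so it treats a row as "clean" only if pandas can store all of its values in a non-object dtype.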