I've got a pandas DataFrame called data and I want to remove all rows that contain a string in any column. For example, the 'gdp' column has a string at index 3.
You can apply a function that tests your DataFrame row by row for the presence of strings. For example, if df is your DataFrame:
# On Python 3 use str; basestring only exists on Python 2
rows_with_strings = df.apply(
    lambda row: any(isinstance(e, str) for e in row),
    axis=1)
This produces a boolean mask for your DataFrame indicating which rows contain at least one string. You can then select the rows without strings by inverting the mask:
df_with_no_strings = df[~rows_with_strings]
Example:
import pandas as pd
a = [[1, 2], ['a', 2], [3, 4], [7, 'd']]
df = pd.DataFrame(a, columns=['a', 'b'])
df
   a  b
0  1  2
1  a  2
2  3  4
3  7  d
select = df.apply(lambda r: any(isinstance(e, str) for e in r), axis=1)  # str on Python 3; basestring on Python 2
df[~select]
   a  b
0  1  2
2  3  4
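The same mask can also be built column by column instead of row by row. The following is a minimal sketch, assuming Python 3 and a reasonably recent pandas; the helper name `drop_rows_with_strings` is hypothetical and only used for illustration.

```python
import pandas as pd

def drop_rows_with_strings(frame):
    """Return only the rows of `frame` that contain no str values."""
    # Element-wise test, one column at a time: True where the cell is a str.
    is_string = frame.apply(lambda col: col.map(lambda e: isinstance(e, str)))
    # Collapse to one flag per row: True if any cell in that row is a str.
    rows_with_strings = is_string.any(axis=1)
    return frame[~rows_with_strings]

df = pd.DataFrame([[1, 2], ['a', 2], [3, 4], [7, 'd']], columns=['a', 'b'])
result = drop_rows_with_strings(df)
print(result)
```

Note that the remaining columns keep dtype object after filtering; calling `result.infer_objects()` afterwards is one way to get numeric dtypes back.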
You can take the transpose, call `convert_objects`, which works column-wise, and then compare the data types to get a boolean key. Note that `convert_objects` was removed in pandas 1.0, so this only works on older versions:
df[df.T.convert_objects().dtypes != object]
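On pandas versions where `convert_objects` is no longer available, the same transpose idea can be sketched with `infer_objects`, which performs a similar soft dtype conversion. This is a substitute under that assumption, not the method from the original answer, and since it relies on dtype inference its behavior may differ between pandas versions, so it is worth checking on your own data. The names `row_dtypes` and `numeric_rows` are just illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], ['a', 2], [3, 4], [7, 'd']], columns=['a', 'b'])

# After transposing, each original row becomes a column of dtype object.
# infer_objects() converts columns holding only numbers to a numeric dtype,
# so rows that mix numbers and strings are the ones left as object.
row_dtypes = df.T.infer_objects().dtypes

# row_dtypes is indexed by the original row labels, so it works as a row mask.
numeric_rows = df[row_dtypes != object]
print(numeric_rows)
```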