If I have a DataFrame:
myDF = DataFrame(data=[[11,11],[22,'2A'],[33,33]], columns = ['A','B'])
Gives the following dataframe (Starting ou
I had the same question, but for a more general case where it was hard to tell whether the function would raise an exception (i.e. you couldn't check the condition explicitly with something as straightforward as isdigit).
After thinking about it for a while, I came up with the solution of wrapping the try/except in a separate function. I'm posting a toy example in case it helps anyone.
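import pandas as pd
import numpy as np

# np.array upcasts the mixed input to strings, so column 0 holds 'a' and '1'
x = pd.DataFrame(np.array([['a', 'a'], [1, 2]]))

def augment(value):
    # Attempt the risky conversion; return a marker string instead of raising
    try:
        return int(value) + 1
    except Exception:
        return 'error:' + str(value)

x[0].apply(augment)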
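The same try/except idea can be packaged as a reusable wrapper, so any converter can be applied without writing a new function each time. This is only a sketch: the helper name `safe` and its `default` parameter are hypothetical, not part of pandas. Applied to column B of the question's DataFrame, it returns NaN wherever the conversion fails.

```python
import numpy as np
import pandas as pd

def safe(func, default=np.nan):
    """Hypothetical helper: wrap func so that any exception returns `default`."""
    def wrapper(value):
        try:
            return func(value)
        except Exception:
            return default
    return wrapper

myDF = pd.DataFrame(data=[[11, 11], [22, '2A'], [33, 33]], columns=['A', 'B'])

# int('2A') raises ValueError, so that row becomes NaN instead of crashing apply
converted = myDF['B'].apply(safe(int))
print(converted)
```

Because the wrapper catches `Exception` rather than a specific error type, it suits the "hard to predict which exception" case, at the cost of also hiding genuine bugs inside `func`.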
A way to achieve that with a lambda:
myDF['B'].apply(lambda x: int(x) if str(x).isdigit() else None)
For your input:
>>> myDF
A B
0 11 11
1 22 2A
2 33 33
[3 rows x 2 columns]
>>> myDF['B'].apply(lambda x: int(x) if str(x).isdigit() else None)
0 11
1 NaN
2 33
Name: B, dtype: float64
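One caveat worth checking: `str.isdigit` only accepts strings made entirely of digit characters, so a negative number such as '-5' or a decimal such as '3.5' is also mapped to None even though it is numeric. A quick sketch with made-up sample values (not from the question):

```python
import pandas as pd

# Made-up sample values for illustration
s = pd.Series(['11', '2A', '-5', '3.5', 33])

# Same lambda as above: only all-digit strings (or ints) survive
result = s.apply(lambda x: int(x) if str(x).isdigit() else None)
print(result)
```

Here only '11' and 33 are kept; '2A', '-5' and '3.5' all end up missing.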
It is much better (and faster) to do:
In [1]: myDF = DataFrame(data=[[11,11],[22,'2A'],[33,33]], columns = ['A','B'])
In [2]: myDF.convert_objects(convert_numeric=True)
Out[2]:
A B
0 11 11
1 22 NaN
2 33 33
[3 rows x 2 columns]
In [3]: myDF.convert_objects(convert_numeric=True).dtypes
Out[3]:
A int64
B float64
dtype: object
This is a vectorized way of doing exactly this. The convert_numeric=True flag tells pandas to coerce anything that cannot be converted to a number into NaN.
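`convert_objects` was deprecated in later pandas releases and is not available in current versions. A sketch of the same coercion using `pd.to_numeric`, whose `errors='coerce'` option turns unconvertible values into NaN, applied column by column:

```python
import pandas as pd

myDF = pd.DataFrame(data=[[11, 11], [22, '2A'], [33, 33]], columns=['A', 'B'])

# DataFrame.apply forwards errors='coerce' to pd.to_numeric for every column
numeric = myDF.apply(pd.to_numeric, errors='coerce')
print(numeric)
# A is already numeric and is left as integers;
# B becomes float64 because NaN is a float
print(numeric.dtypes)
```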
You can of course do this to a single column if you'd like.
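For a single column on a pandas version without `convert_objects`, a minimal sketch is to coerce just that column with `pd.to_numeric` and assign it back:

```python
import pandas as pd

myDF = pd.DataFrame(data=[[11, 11], [22, '2A'], [33, 33]], columns=['A', 'B'])

# Coerce only column B; column A is untouched
myDF['B'] = pd.to_numeric(myDF['B'], errors='coerce')
print(myDF)
```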