I have a dataframe such that the column contains both json objects and strings. I want to get rid of rows that does not contains json objects.
Below is how my dataframe
I think I would prefer to use an isinstance
check:
In [11]: df.loc[df.A.apply(lambda d: isinstance(d, dict))]
Out[11]:
A
2 {'a': 5, 'b': 6, 'c': 8}
5 {'d': 9, 'e': 10, 'f': 11}
If you want to include numbers too, you can do:
In [12]: df.loc[df.A.apply(lambda d: isinstance(d, (dict, np.number)))]
Out[12]:
A
2 {'a': 5, 'b': 6, 'c': 8}
5 {'d': 9, 'e': 10, 'f': 11}
Adjust this to whichever types you want to include...
The last step, json_normalize takes a list of json objects, for whatever reason a Series is no good (and gives the KeyError), you can make this a list and your good to go:
In [21]: df1 = df.loc[df.A.apply(lambda d: isinstance(d, (dict, np.number)))]
In [22]: json_normalize(list(df1["A"]))
Out[22]:
a b c d e f
0 5.0 6.0 8.0 NaN NaN NaN
1 NaN NaN NaN 9.0 10.0 11.0